ECG Rhythm Recognition by Deep Convolutional Neural Network

According to the World Health Organization, cardiovascular disease (CVD) is the leading cause of death worldwide. It’s estimated that 17.9 million people died from CVD in 2016, accounting for 31% of all deaths in the world. Of these, 85% occurred as a result of a heart attack or stroke. The main and most affordable way to diagnose CVD is ECG. The ability to receive, automatically recognize, and make decisions based on remotely obtained ECG data provides doctors and patients with new ways to reduce these unwelcome statistics.

Automatic ECG rhythm recognition is already a classic task. Despite the fact that the first studies in the field of digital processing of ECG recordings appeared back in the 1970s, this area remains relevant for healthcare and continues to develop. Mainly, the changes concern improving the availability of continuous remote cardiac monitoring for ordinary patients within the framework of telemedicine systems.

Over recent years, research on this topic has focused on finding algorithms that are more accurate and less demanding of the source data. As automatic recognition methods become more accurate, they require ever more labeled data for training and testing models. The most accessible open data is collected on the PhysioBank project website. This resource is also noteworthy because it hosts annual competitions on determining the properties of physiological data. In the 2017 competition, for example, the task was to detect atrial fibrillation. Similar recognition quality was achieved by two radically different approaches: feeding a large number of traditional indicators into an automatic algorithm, and feeding primary raw data into a neural network.

The classical approach to training recognition models involves preliminary filtering of the input data to remove power-line interference and the broadband interference caused by electrode movement and by the body's own muscle currents. Often, QRS complexes are detected in the signal, and the data is cut according to their position.

Feeding the data directly to a trained neural network is certainly easier in terms of data pre-processing and requires significantly fewer computing resources. Such networks can be based on a deep convolutional neural network (DCNN) structure. According to experience with atrial fibrillation (AFIB) recognition, using 10-second recordings is a compromise between recognition accuracy and the desire to reduce the amount of simultaneously processed data.

A separate issue facing the engineering community is the lack of data for training. When solving recognition problems, it is first necessary to determine the minimum sufficient size of the training sample. This exact problem was investigated by the Auriga team based on data from the publicly available MIT-BIH Arrhythmia (mitdb) database and competition materials. We reproduced and evaluated .

First of all, of the ECG recordings of 48 patients, those numbered 102 and 104 were excluded because they do not have the MLII lead, which was required for our analysis. Fifteen rhythms already present in the markup were used for the study. Because the classes differ in the number of records, the data of the smaller classes is multiplied to equalize class sizes. Data preprocessing consists only of subtracting the average. The amplitude of the signal is not normalized, since a drop in amplitude is known to be the most important sign of a critical condition of a patient, such as asystole. There is no asystole in the current data, but the plan is to continue the work by extending the data with records from other databases.

For “poor” classes, the training data is multiplied by sampling overlapping 10-second windows from a long recording. Examination of the data shows that the manual rhythm annotations contain a systematic error in the first segment: the expert tends to start a rhythm at a preferred phase of the beat, whereas in real recognition a 10-second segment can start at an arbitrary point. For this reason, each continuous rhythm interval is rounded down to a whole number of seconds, and the shortened interval is centered on the original one, which gives a random start offset from zero to half a second (a quarter of a second on average).
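The rounding, centering, and overlapping-window sampling described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the `shifts_per_second` parameter are hypothetical, and the 360 Hz sampling rate is the one used by mitdb recordings.

```python
FS = 360  # sampling rate of mitdb recordings, in Hz


def trim_to_whole_seconds(start, end, fs=FS):
    """Round a rhythm interval [start, end) given in samples down to whole
    seconds and centre the shortened interval on the original one."""
    n_seconds = (end - start) // fs
    slack = (end - start) - n_seconds * fs  # leftover samples, always < fs
    new_start = start + slack // 2          # offset is between 0 and 0.5 s
    return new_start, n_seconds


def window_starts(start, n_seconds, shifts_per_second=1, window_seconds=10, fs=FS):
    """Start samples of overlapping windows inside a trimmed interval.

    shifts_per_second is a hypothetical parameter: 1 means that consecutive
    windows are shifted by one second, 2 by half a second, and so on.
    """
    last = n_seconds - window_seconds
    if last < 0:
        return []  # the interval is shorter than one window
    step = fs // shifts_per_second
    return list(range(start, start + last * fs + 1, step))


# A 12-second interval with 100 spare samples yields three 10-second windows
# when the step is one second.
trimmed_start, seconds = trim_to_whole_seconds(1000, 1000 + 12 * FS + 100)
starts = window_starts(trimmed_start, seconds)
```

Working with start samples and durations rather than with the signal itself keeps this step cheap; the actual samples only need to be read once the final set of windows is known.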

To clear the data of non-systematic outliers, several types of data are excluded from the sample:

  • recordings marked by experts as noise;
  • areas of normal sinus rhythm where rare episodes of disturbances such as extrasystoles were detected;
  • fragments marked Q (unclassified beat), U (ECG cannot be read), or I (isolated QRS-like artifact).

Within the pacer rhythm, normal beats are also allowed, because peculiarities of the recording process (tape recording, amplitude-frequency response distortion, and others) smooth the leading edge of the beat.

Then, a set of intervals is formed, each containing a single rhythm, with a length that is a multiple of 1 second and not less than 10 seconds. Data for the final validation, which must not overlap with the training sample, is separated from the study sample. The volume of test data is equivalent to 10% of the training data. To generate the required number of samples, the data must be multiplied. Table 1 presents the distribution of prepared data by class.

Rhythm  Files  Parts  Seconds  Pieces  PieTst  Test  Shifts  Learn
N          33    603    36731    3427    2824    10       0   3417
AFIB        8     77     7392     706     629    10       0    696
P           2     68     2516     227     159    10       0    217
SBR         1     10     1567     152     142    10       0    142
B           6     40     1443     127      87    10       0    117
T           7     36      819      72      36     7       1    164
BII         1      5      698      68      63     7       3    115
AFL         3     17      538      48      31     5       1    101
PREX        1     19      415      35      16     4       2    161
SVTA        3      5      141      12       7     1       5    116
VFL         1      4      132      12       8     1       8    107
IVR         2      2      130      12      10     1       9    101
AB          1      2       80       7       5     1      10    106
VT          1      2       74       6       4     1       7    103
NOD         2      5       73       6       1     1       8    109

 Table 1. Classification of data

Rhythm: The label of this rhythm in standard annotations.

Files: Number of files where this rhythm is encountered.

Parts: Number of source intervals (length of at least 10 seconds, a multiple of a second).

Seconds: The total length of Parts in seconds (the table is sorted by this column in descending order).

Pieces: The number of non-overlapping 10-second intervals into which Parts can be cut (the length of each Part divided by 10 with the remainder discarded, summed over all Parts).

PieTst: Parts lasting 20 seconds or more can give (Len // 10 – 1) Pieces for testing. In that case, no remainders shorter than 10 seconds are lost.

Test: The number of intervals for final testing. Minimum of three numbers:

  • 10% of Pieces, rounded to the nearest integer;
  • PieTst (we can cut as much as possible without small remainders);
  • 10% of the ordered number of items in the class.

Shifts: The number of overlapping-window steps per second needed to obtain slightly more windows than the ordered number of elements of this class. If it is 0, windows are chosen from non-overlapping Pieces.

Learn: The number of resulting intervals, which is further thinned out to achieve a given number of class elements.
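The Pieces, PieTst, and Test columns can be reproduced from the definitions above with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the ordered class size of 100 is an assumption inferred from the value Test = 10 for the large classes in Table 1.

```python
def pieces(part_lengths):
    """Non-overlapping 10-second pieces, counted separately for each part."""
    return sum(length // 10 for length in part_lengths)


def pieces_for_test(part_lengths):
    """Pieces that can be cut for testing without leaving short remainders.

    Only parts of 20 seconds or more contribute, each giving Len // 10 - 1.
    """
    return sum(length // 10 - 1 for length in part_lengths if length >= 20)


def held_out_count(n_pieces, n_pie_tst, ordered=100):
    """Minimum of: 10% of Pieces (rounded), PieTst, and 10% of the ordered size.

    ordered=100 is an assumed value, not stated in the article.
    """
    ten_percent_of_pieces = (n_pieces + 5) // 10  # integer round-half-up
    return min(ten_percent_of_pieces, n_pie_tst, ordered // 10)


# Hypothetical part lengths in seconds: 25 s, 10 s, and 47 s.
example_parts = [25, 10, 47]
example_pieces = pieces(example_parts)            # 2 + 1 + 4
example_pie_tst = pieces_for_test(example_parts)  # 1 + 0 + 3
```

With the Pieces and PieTst values from Table 1, `held_out_count` returns 10 for N (3427, 2824), 7 for T (72, 36), 5 for AFL (48, 31), and 1 for NOD (6, 1), matching the Test column.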

All the work on preparing the training and test samples was carried out not with the data itself, but with index records containing the sample number of the beginning of the fragment and its duration in seconds. Based on the prepared indices of these fragments, the data is extracted and subjected to the simplest preprocessing: subtraction of the constant component. In addition, each element is also present in inverted form, to handle recordings made with reversed electrode placement (record 114). As a result, the real amount of data is doubled.
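A minimal sketch of this preprocessing, assuming each fragment is already available as a plain list of sample values (the function names are hypothetical):

```python
def remove_dc(samples):
    """Subtract the constant component (the mean) from a fragment.

    The amplitude is deliberately left unscaled.
    """
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


def with_inverted(fragments):
    """Return every fragment twice: mean-centred, and mean-centred with
    inverted polarity, so the amount of data is doubled."""
    result = []
    for fragment in fragments:
        centred = remove_dc(fragment)
        result.append(centred)
        result.append([-x for x in centred])
    return result


augmented = with_inverted([[1.0, 2.0, 3.0], [0.5, 0.5, 2.0]])
```

Inverting after the mean has been removed keeps both copies centred on zero, so the network sees the same waveform with either polarity.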

After training and testing the DCNN, the following results were obtained:

Classification report

Rhythm            Precision  Recall  F1-score  Support
N                      0.91    1.00      0.95       20
AFIB                   0.87    1.00      0.93       20
P                      1.00    1.00      1.00       20
SBR                    1.00    1.00      1.00       20
B                      0.95    1.00      0.98       20
T                      1.00    0.86      0.92       14
BII                    1.00    1.00      1.00       14
AFL                    1.00    0.90      0.95       10
PREX                   1.00    1.00      1.00        8
SVTA                   1.00    1.00      1.00        2
VFL                    1.00    1.00      1.00        2
IVR                    0.00    0.00      0.00        2
AB                     1.00    1.00      1.00        2
VT                     1.00    1.00      1.00        2
NOD                    0.00    0.00      0.00        2
Accuracy                                 0.96      158
Macro average          0.85    0.85      0.85      158
Weighted average       0.94    0.96      0.95
Ranking-based average precision: 0.968

Confusion matrix (rows: annotated rhythm; columns: recognized rhythm)

Rhythm     N  AFIB     P   SBR     B     T   BII   AFL  PREX  SVTA   VFL   IVR    AB    VT   NOD
N         20     0     0     0     0     0     0     0     0     0     0     0     0     0     0
AFIB       0    20     0     0     0     0     0     0     0     0     0     0     0     0     0
P          0     0    20     0     0     0     0     0     0     0     0     0     0     0     0
SBR        0     0     0    20     0     0     0     0     0     0     0     0     0     0     0
B          0     0     0     0    20     0     0     0     0     0     0     0     0     0     0
T          0     2     0     0     0    12     0     0     0     0     0     0     0     0     0
BII        0     0     0     0     0     0    14     0     0     0     0     0     0     0     0
AFL        0     1     0     0     0     0     0     9     0     0     0     0     0     0     0
PREX       0     0     0     0     0     0     0     0     8     0     0     0     0     0     0
SVTA       0     0     0     0     0     0     0     0     0     2     0     0     0     0     0
VFL        0     0     0     0     0     0     0     0     0     0     2     0     0     0     0
IVR        0     0     0     0     1     0     0     0     0     0     0     0     0     0     1
AB         0     0     0     0     0     0     0     0     0     0     0     0     2     0     0
VT         0     0     0     0     0     0     0     0     0     0     0     0     0     2     0
NOD        2     0     0     0     0     0     0     0     0     0     0     0     0     0     0

Table 2. DCNN network training and testing results

Based on the results obtained, we can draw the first conclusions about the training. Some classes with very few records, for example IVR and NOD, cannot significantly affect network training. For the remaining small classes, the network is most likely overfitted. This is easily verified by validating on ECG records of people whose data was not used in training the neural network.

It should be noted that if training is carried out based on one group of people and validation is done based on another group, then the rhythms present in only one record, as well as records containing one rhythm (6 rhythms, 4 records), will drop out of the classification.
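The first of these two conditions can be checked directly against the Files column of Table 1. The sketch below is illustrative only (the function name is hypothetical) and does not model the second condition, records that contain a single rhythm.

```python
# Number of files in which each rhythm occurs (Files column of Table 1).
FILES_PER_RHYTHM = {
    "N": 33, "AFIB": 8, "P": 2, "SBR": 1, "B": 6, "T": 7, "BII": 1, "AFL": 3,
    "PREX": 1, "SVTA": 3, "VFL": 1, "IVR": 2, "AB": 1, "VT": 1, "NOD": 2,
}


def usable_for_patient_split(files_per_rhythm):
    """Rhythms found in at least two records, so that one record can go to
    training and a different one to validation."""
    return [rhythm for rhythm, n_files in files_per_rhythm.items() if n_files >= 2]


kept = usable_for_patient_split(FILES_PER_RHYTHM)
dropped = [rhythm for rhythm in FILES_PER_RHYTHM if rhythm not in kept]
```

Here `dropped` contains the six rhythms that occur in a single record (SBR, BII, PREX, VFL, AB, VT), and `kept` contains the nine classes that remain available for patient-wise validation.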

Out of the remaining 9 classes, validations showed good results only for 4.

Classification report

Rhythm            Precision  Recall  F1-score  Support
N                      0.50    1.00      0.66      154
AFIB                   0.77    0.76      0.77      126
B                      1.00    0.85      0.92       26
P                      1.00    0.99      1.00      290
T                      0.00    0.00      0.00       70
AFL                    0.00    0.00      0.00       50
SVTA                   0.00    0.00      0.00       10
NOD                    0.00    0.00      0.00        8
IVR                    0.00    0.00      0.00       20
Accuracy                                 0.74      754
Macro average          0.36    0.40      0.37      754
Weighted average       0.65    0.74      0.68      754
Ranking-based average precision: 0.804

Confusion matrix (rows: annotated rhythm; columns: recognized rhythm)

Rhythm     N  AFIB     B     P     T   AFL  SVTA   NOD   IVR
N        154     0     0     0     0     0     0     0     0
AFIB      26    96     0     0     0     2     2     0     0
B          0     3    22     0     1     0     0     0     0
P          0     2     0   288     0     0     0     0     0
T         59    11     0     0     0     0     0     0     0
AFL       48     1     0     0     0     0     1     0     0
SVTA       0     6     0     0     0     4     0     0     0
NOD        8     0     0     0     0     0     0     0     0
IVR       15     5     0     0     0     0     0     0     0

Table 3. Four classes with good results

It is easy to see that the neural network is strongly overfitted for classes with a small training sample: T, AFL, SVTA. The good accuracy and specificity obtained in cross-validation on the training sample are not confirmed by testing on the patients set aside for validation. Moreover, the confusion matrix shows a tendency to confuse small classes with normal sinus rhythm (N).

For the remaining 4 classes, it makes sense to repeat the training and validation process. Validation results are slightly better for 3 of these classes, presumably because the noise contributed by the small classes has been filtered out of the training sample:

Classification report

Rhythm            Precision  Recall  F1-score  Support
N                      0.93    1.00      0.97      154
AFIB                   0.93    0.92      0.92      126
B                      1.00    0.62      0.76       26
P                      1.00    1.00      1.00      290
Accuracy                                 0.97      596
Macro average          0.97    0.88      0.91      596
Weighted average       0.97    0.97      0.96      596
Ranking-based average precision: 0.983

Confusion matrix (rows: annotated rhythm; columns: recognized rhythm)

Rhythm     N  AFIB     B     P
N        154     0     0     0
AFIB      10   116     0     0
B          1     9    16     0
P          0     0     0   290

Table 4. Validation results for the remaining four classes
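The per-class figures in Table 4 follow directly from its confusion matrix. As a worked check, the sketch below recomputes precision, recall, F1-score, and accuracy from the four rows of that matrix, with rows taken as the annotated rhythm and columns as the recognized rhythm. The function names are hypothetical and this is not the code used in the study.

```python
LABELS = ["N", "AFIB", "B", "P"]
CONFUSION = [
    [154, 0, 0, 0],   # annotated N
    [10, 116, 0, 0],  # annotated AFIB
    [1, 9, 16, 0],    # annotated B
    [0, 0, 0, 290],   # annotated P
]


def per_class_metrics(matrix):
    """Return a (precision, recall, f1) tuple for every class."""
    size = len(matrix)
    metrics = []
    for k in range(size):
        true_positive = matrix[k][k]
        recognized = sum(matrix[i][k] for i in range(size))  # column sum
        annotated = sum(matrix[k])                            # row sum
        precision = true_positive / recognized if recognized else 0.0
        recall = true_positive / annotated if annotated else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def accuracy(matrix):
    """Share of fragments on the main diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


metrics = dict(zip(LABELS, per_class_metrics(CONFUSION)))
overall = accuracy(CONFUSION)
```

For class B, for example, 16 of the 26 annotated fragments are recognized correctly, which gives the recall of 0.62 shown in the table, while none of the other classes is ever recognized as B, which gives a precision of 1.00.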

From the studies conducted, we can conclude that records of even 2 patients may be sufficient for reliable recognition of heart rhythm pathologies. The quality of such models should be checked in practice, with a group of patients always set aside for validation. It appears that the amount of data necessary in each case depends on the features of the rhythm disturbance that are specific to the particular form of pathology.

The article was initially published at Medical Product Outsourcing