According to the World Health Organization, cardiovascular disease (CVD) is the leading cause of death worldwide. It’s estimated that 17.9 million people died from CVD in 2016, accounting for 31% of all deaths in the world. Of these, 85% occurred as a result of a heart attack or stroke. The main and most affordable way to diagnose CVD is ECG. The ability to receive, automatically recognize, and make decisions based on remotely obtained ECG data provides doctors and patients with new ways to reduce these unwelcome statistics.
Automatic ECG rhythm recognition is already a classic task. Despite the fact that the first studies in the field of digital processing of ECG recordings appeared back in the 1970s, this area remains relevant for healthcare and continues to develop. Mainly, the changes concern improving the availability of continuous remote cardiac monitoring for ordinary patients within the framework of telemedicine systems.
Over recent years, research on this topic has focused on finding algorithms that are more accurate and less demanding of the source data. As recognition accuracy increases, the methods require a growing amount of labeled data for training and testing models. The most accessible open data is collected on the PhysioBank project website. This resource is also noteworthy because it hosts annual competitions on characterizing physiological data. In the 2017 competition, for example, the task was to detect atrial fibrillation. Similar recognition quality was achieved by two radically different approaches: feeding a large number of traditional indicators into an automatic algorithm, and feeding raw primary data into a neural network.
The classical approach to training recognition models involves preliminary filtering of the input data to remove power-line interference and broadband interference caused by electrode movement and by the body's natural currents of muscular origin. Often, QRS complexes are detected in the signal, and the data is cut in accordance with their position.
Feeding data directly to a neural network is certainly simpler in terms of data pre-processing and requires significantly fewer computing resources. Such networks can be based on a deep convolutional neural network (DCNN) structure. Experience with atrial fibrillation (AFIB) recognition suggests that 10-second recordings are a reasonable compromise between recognition accuracy and the desire to reduce the amount of data processed at once.
A separate issue that the engineering community is facing is the lack of data for training. When solving recognition problems, it is first necessary to determine the minimum sufficient size of the training sample. This exact problem was investigated by the Auriga team based on data from the publicly available MIT-BIH Arrhythmia (mitdb) database and competition materials. We reproduced and evaluated .
First of all, patients 102 and 104 were excluded from the ECG recordings of 48 patients because they lacked the MLII lead, which was required for our analysis. Fifteen rhythms already present in the annotations were used for the study. Because the classes contain different numbers of records, the data of the smaller classes is multiplied to equalize class sizes. Data preprocessing consists only of subtracting the mean. The amplitude of the signal is not normalized, since a drop in amplitude is known to be the most important sign of a critical patient condition such as asystole. There is no asystole in the current data, but we plan to continue the work by expanding the data with records from other databases.
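The preprocessing described above can be illustrated with a minimal sketch. The function name `remove_dc` and the toy values are hypothetical, and plain Python lists stand in for whatever array type the original pipeline used.

```python
from statistics import fmean


def remove_dc(segment):
    """Subtract the mean (constant component) from one ECG segment.

    The amplitude is deliberately left unscaled, so a drop in amplitude
    stays visible to the classifier.
    """
    baseline = fmean(segment)
    return [x - baseline for x in segment]


# Toy segment in arbitrary units; real input would be MLII samples.
segment = [1.0, 1.2, 0.9, 2.5, 0.4]
centered = remove_dc(segment)
```

After this step the segment has zero mean, while its peak-to-peak range is the same as before.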
To multiply data for the “poor” classes, training samples are drawn as overlapping 10-second windows from a long recording. When examining the data, one can notice that manual marking of rhythms contains a systematic error in the first segment: the expert tends to start a rhythm at a preferred beat phase, whereas a 10-second segment in real recognition can start at an arbitrary place. To compensate, the continuous rhythm intervals are rounded down to the nearest second, and the shortened interval is centered on the original, which gives a random start offset from zero to half a second (a quarter of a second on average).
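One way to read the rounding and centering step is sketched below. It assumes the 360 Hz sampling rate of mitdb; the function names and the `step_s` parameter are hypothetical and are not taken from the original pipeline.

```python
FS = 360        # mitdb sampling rate in Hz
WINDOW_S = 10   # window length in seconds


def center_whole_seconds(start, end, fs=FS):
    """Trim the sample range [start, end) to a whole number of seconds.

    The trimmed interval is centered on the original one, so the start
    moves forward by less than half a second.
    """
    n_seconds = (end - start) // fs
    leftover = (end - start) - n_seconds * fs
    return start + leftover // 2, n_seconds


def window_starts(start, n_seconds, step_s=1.0, fs=FS, window_s=WINDOW_S):
    """Start indices of overlapping windows that fit inside the interval."""
    step = int(round(step_s * fs))
    last = start + (n_seconds - window_s) * fs
    return list(range(start, last + 1, step))


# A hypothetical rhythm interval of 12 s plus 200 extra samples.
new_start, n_seconds = center_whole_seconds(1000, 1000 + 12 * FS + 200)
starts = window_starts(new_start, n_seconds, step_s=1.0)
```

For this interval, `new_start` is 1100, `n_seconds` is 12, and three windows start at samples 1100, 1460, and 1820. An interval shorter than 10 seconds yields an empty list.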
To clear the data of non-systematic outliers, several types of data are excluded from the sample:
- recordings marked by experts as noise;
- areas of normal sinus rhythm where rare episodes of disturbances such as extrasystoles were detected;
- fragments marked Q (unclassified beat), U (ECG cannot be read), or I (isolated QRS-like artifact).
Within the paced rhythm, normal beats are also allowed, because recording peculiarities such as tape recording and amplitude-frequency response distortion can smooth the leading edge of the beat.
Then, a set of intervals is formed, each containing a single rhythm, with a length that is a multiple of 1 second and at least 10 seconds. Data for the final validation, which must not overlap with the training sample, is separated from the study sample. The volume of test data is equivalent to 10% of the training data. To generate the required number of samples, the data must be multiplied. Table 1 presents the distribution of prepared data by class.
Rhythm | Files | Parts | Seconds | Pieces | PieTst | Test | Shifts | Learn |
N | 33 | 603 | 36731 | 3427 | 2824 | 10 | 0 | 3417 |
AFIB | 8 | 77 | 7392 | 706 | 629 | 10 | 0 | 696 |
P | 2 | 68 | 2516 | 227 | 159 | 10 | 0 | 217 |
SBR | 1 | 10 | 1567 | 152 | 142 | 10 | 0 | 142 |
B | 6 | 40 | 1443 | 127 | 87 | 10 | 0 | 117 |
T | 7 | 36 | 819 | 72 | 36 | 7 | 1 | 164 |
BII | 1 | 5 | 698 | 68 | 63 | 7 | 3 | 115 |
AFL | 3 | 17 | 538 | 48 | 31 | 5 | 1 | 101 |
PREX | 1 | 19 | 415 | 35 | 16 | 4 | 2 | 161 |
SVTA | 3 | 5 | 141 | 12 | 7 | 1 | 5 | 116 |
VFL | 1 | 4 | 132 | 12 | 8 | 1 | 8 | 107 |
IVR | 2 | 2 | 130 | 12 | 10 | 1 | 9 | 101 |
AB | 1 | 2 | 80 | 7 | 5 | 1 | 10 | 106 |
VT | 1 | 2 | 74 | 6 | 4 | 1 | 7 | 103 |
NOD | 2 | 5 | 73 | 6 | 1 | 1 | 8 | 109 |
Table 1. Classification of data
Rhythm: The label of this rhythm in standard annotations.
Files: Number of files where this rhythm is encountered.
Parts: Number of source intervals (length of at least 10 seconds, a multiple of a second).
Seconds: The total length of Parts in seconds (the table is sorted by this column in descending order).
Pieces: The number of non-overlapping 10-second intervals into which Parts can be cut (the sum over Parts of each length integer-divided by 10).
PieTst: Parts lasting 20 seconds or more can give (Len // 10 - 1) Pieces for testing. That way, no remainders shorter than 10 seconds are left over.
Test: The number of intervals for final testing. Minimum of three numbers:
- 10% of Pieces, rounded to the nearest integer;
- PieTst (we can cut as much as possible without small remainders);
- 10% of the ordered number of items in the class.
Shifts: The number of overlapping-window steps per second needed to obtain slightly more windows than the number of elements ordered for this class. If 0, windows are chosen from the non-overlapping Pieces.
Learn: The number of resulting intervals, which is further thinned out to achieve a given number of class elements.
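The Pieces, PieTst, and Test definitions can be checked with a short sketch. The function name `split_budget` is hypothetical. The part lengths below are invented (chosen only so that they total the 80 seconds of the AB row), and the ordered class size of 100 is an assumption inferred from the cap of 10 in the Test column.

```python
def split_budget(part_lengths_s, ordered, window_s=10):
    """Return (Pieces, PieTst, Test) for one rhythm class.

    part_lengths_s: lengths of the source Parts in whole seconds.
    ordered: requested number of elements in the class (assumed value).
    """
    pieces = sum(length // window_s for length in part_lengths_s)
    # Each Part keeps one window for training, so it offers one fewer for testing.
    pie_tst = sum(max(length // window_s - 1, 0) for length in part_lengths_s)
    # Integer arithmetic gives "10%, rounded to nearest" with halves rounded up.
    ten_percent_pieces = (pieces + 5) // 10
    ten_percent_ordered = (ordered + 5) // 10
    test = min(ten_percent_pieces, pie_tst, ten_percent_ordered)
    return pieces, pie_tst, test


# Two invented Parts totalling 80 s, as in the AB row of Table 1.
ab_like = split_budget([45, 35], ordered=100)
```

With these invented lengths the result is `(7, 5, 1)`, which agrees with the Pieces, PieTst, and Test values in the AB row.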
All the work on preparing the training and test samples was carried out not on the data itself, but on records containing the sample number of the beginning of each fragment and its duration in seconds. Based on the prepared indices of these fragments, the data is extracted and subjected to the simplest preprocessing: subtraction of the constant component. In addition, each element is also included in inverted form to handle reversed electrode placement (record 114). As a result, the actual amount of data is doubled.
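A minimal sketch of this index-based bookkeeping follows. The `Fragment` type and both function names are hypothetical, the 360 Hz rate is that of mitdb, and the mean subtraction is omitted for brevity.

```python
from typing import NamedTuple

FS = 360  # mitdb sampling rate in Hz


class Fragment(NamedTuple):
    """Index record: where a fragment starts and how long it lasts."""
    start_sample: int
    duration_s: int


def extract(signal, frag, fs=FS):
    """Cut the samples described by an index record out of a full recording."""
    stop = frag.start_sample + frag.duration_s * fs
    return signal[frag.start_sample:stop]


def with_inverted(segments):
    """Return every segment followed by its sign-inverted copy."""
    doubled = []
    for seg in segments:
        doubled.append(list(seg))
        doubled.append([-x for x in seg])
    return doubled


# Toy recording; a real one would hold MLII samples.
signal = [float(i % 7) for i in range(5000)]
segment = extract(signal, Fragment(start_sample=360, duration_s=10))
doubled = with_inverted([segment])
```

Keeping only the index records until the last step makes it cheap to reshuffle or thin out the sample without copying signal data.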
After training and testing the DCNN network, the following results were obtained:
Classification report | Confusion matrix |
precision | recall | f1-score | support | rhythm | N | AFIB | P | SBR | B | T | BII | AFL | PREX | SVTA | VFL | IVR | AB | VT | NOD |
0.91 | 1.00 | 0.95 | 20 | N | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.87 | 1.00 | 0.93 | 20 | AFIB | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1.00 | 1.00 | 1.00 | 20 | P | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1.00 | 1.00 | 1.00 | 20 | SBR | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.95 | 1.00 | 0.98 | 20 | B | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1.00 | 0.86 | 0.92 | 14 | T | 0 | 2 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1.00 | 1.00 | 1.00 | 14 | BII | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1.00 | 0.90 | 0.95 | 10 | AFL | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1.00 | 1.00 | 1.00 | 8 | PREX | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
1.00 | 1.00 | 1.00 | 2 | SVTA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
1.00 | 1.00 | 1.00 | 2 | VFL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
0.00 | 0.00 | 0.00 | 2 | IVR | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1.00 | 1.00 | 1.00 | 2 | AB | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
1.00 | 1.00 | 1.00 | 2 | VT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
0.00 | 0.00 | 0.00 | 2 | NOD | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.96 | 158 | Accuracy |
0.85 | 0.85 | 0.85 | 158 | Macro average |
0.94 | 0.96 | 0.95 | 158 | Weighted average |
0.968 | Ranking-based average precision |
Table 2. DCNN network training and testing results
Based on the results obtained, we can draw the first conclusions about the training. Some classes with very few records, for example IVR and NOD, cannot significantly affect network training. For the remaining small classes, the network is most likely overfitted. This is easily verified by validating on ECG records of people whose data was not used in training the neural network.
It should be noted that if training is carried out based on one group of people and validation is done based on another group, then the rhythms present in only one record, as well as records containing one rhythm (6 rhythms, 4 records), will drop out of the classification.
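The effect of a patient-wise split on the class list can be sketched as follows. The function names, record identifiers, and fragment list are hypothetical toy data, not values from mitdb.

```python
from collections import defaultdict


def rhythms_in_multiple_records(fragments):
    """Return the rhythms that occur in at least two different records.

    fragments: iterable of (record_id, rhythm) pairs.
    Only such rhythms can appear on both sides of a patient-wise split.
    """
    records_by_rhythm = defaultdict(set)
    for record_id, rhythm in fragments:
        records_by_rhythm[rhythm].add(record_id)
    return {r for r, recs in records_by_rhythm.items() if len(recs) >= 2}


def split_by_record(fragments, validation_records):
    """Send every fragment of a validation record to the validation set."""
    train = [f for f in fragments if f[0] not in validation_records]
    valid = [f for f in fragments if f[0] in validation_records]
    return train, valid


# Toy data: VFL occurs in a single record, so it drops out.
fragments = [("r1", "N"), ("r1", "AFIB"), ("r2", "N"),
             ("r3", "AFIB"), ("r3", "VFL")]
usable = rhythms_in_multiple_records(fragments)
train, valid = split_by_record(fragments, {"r3"})
```

Here `usable` contains only N and AFIB, and no record contributes fragments to both `train` and `valid`.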
Out of the remaining 9 classes, validations showed good results only for 4.
Classification report | Confusion matrix |
precision | recall | f1-score | support | rhythm | N | AFIB | B | P | T | AFL | SVTA | NOD | IVR |
0.50 | 1.00 | 0.66 | 154 | N | 154 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.77 | 0.76 | 0.77 | 126 | AFIB | 26 | 96 | 0 | 0 | 0 | 2 | 2 | 0 | 0 |
1.00 | 0.85 | 0.92 | 26 | B | 0 | 3 | 22 | 0 | 1 | 0 | 0 | 0 | 0 |
1.00 | 0.99 | 1.00 | 290 | P | 0 | 2 | 0 | 288 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 70 | T | 59 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 50 | AFL | 48 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 10 | SVTA | 0 | 6 | 0 | 0 | 0 | 4 | 0 | 0 | 0 |
0 | 0 | 0 | 8 | NOD | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 20 | IVR | 15 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.74 | 754 | Accuracy |
0.36 | 0.40 | 0.37 | 754 | Macro average |
0.65 | 0.74 | 0.68 | 754 | Weighted average |
0.804 | Ranking-based average precision |
Table 3. Validation on a separate group of patients: good results for only four of the nine classes
It is easy to see that the neural network is strongly overfitted for classes with a small training sample: T, AFL, SVTA. Good accuracy and specificity indicators for cross-validation on the training sample are not confirmed by testing on the patients set aside for validation. Moreover, the confusion matrix shows a tendency to confuse small classes with normal sinus rhythm.
For the remaining 4 classes, it makes sense to repeat the training and validation process. Validation results are slightly better for 3 classes, presumably because the noise from the small classes has been filtered out of the training sample. We achieve the following results:
Classification report | Confusion matrix |
precision | recall | f1-score | support | rhythm | N | AFIB | B | P |
0.93 | 1.00 | 0.97 | 154 | N | 154 | 0 | 0 | 0 |
0.93 | 0.92 | 0.92 | 126 | AFIB | 10 | 116 | 0 | 0 |
1.00 | 0.62 | 0.76 | 26 | B | 1 | 9 | 16 | 0 |
1.00 | 1.00 | 1.00 | 290 | P | 0 | 0 | 0 | 290 |
0.97 | 596 | Accuracy |
0.97 | 0.88 | 0.91 | 596 | Macro average |
0.97 | 0.97 | 0.96 | 596 | Weighted average |
0.983 | Ranking-based average precision |
Table 4. Validation results for the remaining four classes
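As a cross-check of how the classification report relates to the confusion matrix, the per-class figures in Table 4 can be recomputed from its matrix. This is a plain-Python sketch with a hypothetical function name; rows are taken as the true rhythm and columns as the predicted rhythm, which is the reading consistent with the support column.

```python
LABELS = ["N", "AFIB", "B", "P"]

# Confusion matrix from Table 4 (rows: true rhythm, columns: predicted).
CM = [
    [154, 0, 0, 0],
    [10, 116, 0, 0],
    [1, 9, 16, 0],
    [0, 0, 0, 290],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        actual = sum(cm[k])                          # row sum (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


metrics = per_class_metrics(CM)
accuracy = sum(CM[k][k] for k in range(len(CM))) / sum(map(sum, CM))
macro_recall = sum(m[1] for m in metrics) / len(metrics)
```

Rounded to two decimals, these values agree with Table 4: for class B, recall is 16/26, or 0.62, and the overall accuracy is 576/596, or 0.97.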
From the studies conducted, we can conclude that records of even 2 patients may be sufficient for reliable recognition of heart rhythm pathologies. The quality of such models should be checked in practice, with a mandatory separate group of patients set aside for validation. It appears that the amount of data necessary for each case depends on the rhythm disturbance traits peculiar to the specific form of pathology.
The article was initially published at Medical Product Outsourcing