https://doi.org/10.3389/frai.2022.1116416
“The identification and characterization of signal regions in Nuclear Magnetic Resonance (NMR) spectra is a challenging but crucial phase in the analysis and determination of complex chemical compounds. Here, we present a novel supervised deep learning approach to perform automatic detection and classification of multiplets in 1H NMR spectra. Our deep neural network was trained on a large number of synthetic spectra, with complete control over the features represented in the samples. We show that our model can detect signal regions effectively and minimize classification errors between different types of resonance patterns. We demonstrate that the network generalizes remarkably well on real experimental 1H NMR spectra.”
Since its discovery, Nuclear Magnetic Resonance (NMR) spectroscopy has become an effective and reliable tool for investigating complex molecular compounds, exploiting the interaction of nuclear spins, an intrinsic property of atomic nuclei, with an external magnetic field. An NMR spectrum contains different kinds of resonances as a function of frequency, ranging from isolated peaks (singlets) and pairs of peaks (doublets) up to composite sets of multiple peaks, generally referred to as multiplets (Keeler, 2002). The frequency position of a multiplet (its chemical shift), the integral of the multiplet profile, the resonance pattern and the distance between consecutive peaks within the same multiplet (the coupling constant) together serve as a molecular fingerprint. These features provide information about the relative abundances of the atoms, their local chemical environments within the molecule, and their connectivity and stereochemistry.
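To make the link between resonance pattern, chemical shift and coupling constant concrete, the idealized first-order case (the n+1 rule) can be sketched in a few lines: a nucleus coupled to n equivalent spin-1/2 neighbors with a single coupling constant J gives n+1 equally spaced lines with binomial intensities, and a spacing in Hz converts to ppm as Δδ(ppm) = Δν(Hz) / ν₀(MHz). The functions `first_order_multiplet` and `coupling_constant_hz` are hypothetical; the sketch ignores second-order (strong-coupling) effects, line shape and noise, and it is not the synthetic-spectrum generator used in this work.

```python
from math import comb

def first_order_multiplet(center_ppm, n_neighbors, j_hz, spectrometer_mhz):
    """Lines of an idealized first-order multiplet (n+1 rule).

    A nucleus coupled to `n_neighbors` equivalent spin-1/2 nuclei with one
    coupling constant J gives n_neighbors + 1 equally spaced lines whose
    relative intensities follow the binomial coefficients (Pascal's triangle).
    Returns a list of (position_ppm, relative_intensity) pairs.
    """
    j_ppm = j_hz / spectrometer_mhz  # a spacing in Hz divided by the field in MHz gives ppm
    lines = []
    for k in range(n_neighbors + 1):
        offset = (k - n_neighbors / 2) * j_ppm  # lines are symmetric about the chemical shift
        lines.append((center_ppm + offset, comb(n_neighbors, k)))
    return lines

def coupling_constant_hz(lines, spectrometer_mhz):
    """Recover J (Hz) as the mean spacing between consecutive lines."""
    positions = sorted(p for p, _ in lines)
    if len(positions) < 2:
        raise ValueError("a singlet has no coupling constant")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return spectrometer_mhz * sum(gaps) / len(gaps)

# Hypothetical example: a triplet at 1.20 ppm with J = 7 Hz on a 400 MHz spectrometer.
triplet = first_order_multiplet(1.20, n_neighbors=2, j_hz=7.0, spectrometer_mhz=400.0)
for position, intensity in triplet:
    print(f"{position:.4f} ppm  intensity {intensity}")
print(coupling_constant_hz(triplet, 400.0))
```

Here the triplet lines fall at about 1.1825, 1.2000 and 1.2175 ppm with intensities 1:2:1, and the recovered J is 7 Hz up to floating-point rounding. Note that the same J in Hz corresponds to a smaller ppm spacing on a higher-field spectrometer.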
The traditional field in which NMR spectroscopy is extensively employed is organic chemistry, where it is used for the structure elucidation of new natural compounds and reaction products (Jackman and Sternhell, 1969). However, NMR spectroscopy is now widespread across numerous scientific fields, in both qualitative and quantitative applications. The information extracted from chemical shifts, coupling constants and peak integrals is used to study the dynamics and compartmentation of metabolic pathways in systems biochemistry (Fan and Lane, 2016), to diagnose tumors, hematomas and other pathologies (e.g., multiple sclerosis) in medicine (Zia et al., 2019), to characterize humic substances and analyze contaminants in environmental sciences (Cardoza et al., 2004), and to evaluate soil components, plant tissues and complex food compounds in agriculture (Mazzei and Piccolo, 2017) and food chemistry (Cao et al., 2021).
Nevertheless, there is a major drawback: retrieving information from the spectra is often demanding, time-consuming and error-prone. It requires expert spectroscopists to annotate the spectra manually, extract chemical shifts and coupling constants, and elucidate structures. Moreover, the interpretation of NMR spectra is not always straightforward or unambiguous, because of spectral artifacts and overlapping resonances. Automating NMR analysis could therefore accelerate and simplify the process while increasing the robustness and reproducibility of the results.
We ran the classification model on 10,000 segments of synthetic spectra generated independently of the training set. An example of the classification of a synthetic segment is displayed in Figure 2. The prediction performance for each multiplet class is reported in the confusion matrix in Figure 3A. The entries of the matrix were normalized along the rows, that is, over the total number of ground-truth points in each class, so that the diagonal holds the true positive rate of each class. All true positive rates on the diagonal are above 99%. Although rare, the most frequent errors involve the baseline class: either baseline points are predicted as signal points, or signal points are predicted as baseline points. Visual inspection showed that these errors occur at the borders of signal regions and can be explained by a slight offset, of the order of a few points, between the predicted label and the ground-truth label. Such positioning inaccuracies are minor compared with the more serious error of mistaking a noise region for signal. Among the multiplet classes, the error rate decreases as the number of peaks in the multiplet increases, with the highest error rate, of the order of 1.5%, belonging to the singlet class. This behavior can be explained by a slight class imbalance, which persists in a point-by-point classification even though the training set is synthetic. When generating the training set, the pattern of each resonance is chosen randomly, so that on average each multiplet class contains the same number of resonances. However, the extent of a multiplet grows with its number of peaks, so resonances with more peaks spread over more spectral points. Therefore, during training, the network is presented with more points belonging to multiplets with a higher number of peaks. This effect should be kept in mind; nevertheless, the excellent results obtained even for singlets confirm that the classification algorithm works properly.
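The row normalization of a point-wise confusion matrix can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' evaluation code: the function `row_normalized_confusion` and the three-class toy labels (0 = baseline, 1 = singlet, 2 = doublet) are hypothetical, and the toy prediction reproduces the kind of one-point border offset discussed in the text.

```python
import numpy as np

def row_normalized_confusion(y_true, y_pred, n_classes):
    """Point-wise confusion matrix normalized over the ground-truth points of each class.

    Rows index the ground-truth class and columns the predicted class, so the
    diagonal holds the per-class true positive rate.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    counts = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(counts, (y_true, y_pred), 1)  # one count per spectral point
    totals = counts.sum(axis=1, keepdims=True)
    # Divide each row by its class total; classes absent from y_true stay all-zero.
    return np.divide(counts, totals, out=np.zeros((n_classes, n_classes)), where=totals > 0)

# Hypothetical toy labels: 0 = baseline, 1 = singlet, 2 = doublet.
# The predicted regions are offset by one point at their borders.
y_true = [0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 1, 0, 0, 0, 2, 2, 2, 0]
cm = row_normalized_confusion(y_true, y_pred, n_classes=3)
print(np.diag(cm))  # per-class true positive rates
```

With these labels the diagonal is 0.8, 1.0 and 0.75, and every off-diagonal entry involves the baseline class, mirroring the border errors described in the text.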
Figure 2. Color-coded predictions on 1H NMR spectra: (top) segment of a synthetic spectrum; (bottom) segment of an experimental spectrum from the testing set of 10 small molecules.
Figure 3. Evaluation of performance. (A) Point-wise approach: confusion matrix of the performance on synthetic spectra. (B) Object-wise approach: precision-recall curves for each multiplet class (baseline class excluded) at an IOU threshold of 75%. (C) Point-wise approach: confusion matrix of the performance on the ten 1H NMR spectra of the testing set.
From the confusion matrix, we computed accuracy, precision, recall and F1 score. The results for each multiplet class, together with the average over all classes, are reported in Table 1. Accuracy is always above 99%, and precision, recall and F1 score never fall below 98.5%. Within each multiplet class, precision and recall are close, differing by 0.21% on average. Therefore, the model successfully minimizes false positives and false negatives at the same time. This is confirmed by the object-wise precision-recall curves (Figure 3B): an optimal classification algorithm would yield a precision of 1.0 for all values of recall, and our model approaches this limit. The intersection over union (IOU) threshold above which a predicted label is counted as a true positive rather than a false positive is an arbitrary choice, and each value of the threshold defines a different average precision (AP) metric. In the present work, we report for each class the AP metric and the average over all classes (mAP) for two threshold values, 50% and 75% (see Table 1). Raising the IOU threshold makes it more likely that a label prediction is counted as a false positive, which lowers the overall performance metrics. On average, going from a threshold of 50% to 75% decreases the AP metrics by only 0.47%. In the object-wise approach as well, the singlet class is the one whose classification performance declines fastest as the IOU threshold increases.
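Both families of metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: per-class accuracy is taken one-vs-rest, signal regions are represented as hypothetical half-open intervals of spectral point indices, predictions are matched greedily (in descending score order) to not-yet-matched ground-truth regions, and AP uses all-point interpolation of the precision-recall curve, a common convention whose exact form in this work is not stated. All function names and toy data are hypothetical.

```python
import numpy as np

def per_class_metrics(counts):
    """One-vs-rest accuracy, precision, recall and F1 for each class.

    `counts` is an unnormalized confusion matrix: rows are ground-truth
    classes, columns are predicted classes.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    tp = np.diag(counts)
    fp = counts.sum(axis=0) - tp  # predicted as the class, but belonging to another
    fn = counts.sum(axis=1) - tp  # belonging to the class, but predicted as another
    tn = total - tp - fp - fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def interval_iou(a, b):
    """IOU of two half-open intervals (start, end) of spectral point indices."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, truths, iou_threshold):
    """AP for one multiplet class.

    preds:  list of (score, (start, end)) predicted regions
    truths: list of (start, end) ground-truth regions
    """
    matched = [False] * len(truths)
    is_tp = []
    for _, region in sorted(preds, key=lambda p: p[0], reverse=True):
        # Match to the unmatched ground-truth region with the highest IOU >= threshold.
        best, best_iou = -1, iou_threshold
        for i, gt in enumerate(truths):
            iou = interval_iou(region, gt)
            if not matched[i] and iou >= best_iou:
                best, best_iou = i, iou
        if best >= 0:
            matched[best] = True
        is_tp.append(best >= 0)  # unmatched predictions are false positives
    tp_cum = np.cumsum(is_tp)
    precision = tp_cum / np.arange(1, len(is_tp) + 1)
    recall = tp_cum / len(truths)
    # All-point interpolation: make precision non-increasing, then sum it over recall steps.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(recall_steps * precision))

# Hypothetical toy data for one class: two ground-truth regions, three scored predictions.
truths = [(10, 20), (40, 50)]
preds = [(0.9, (11, 21)), (0.8, (40, 46)), (0.3, (60, 70))]
for threshold in (0.50, 0.75):
    print(threshold, average_precision(preds, truths, threshold))
```

In the toy example the second prediction overlaps its ground truth with an IOU of 0.6, so it counts as a true positive at the 50% threshold but as a false positive at 75%, and the AP drops from 1.0 to 0.5. This illustrates why a stricter IOU threshold can only lower AP, and why classes with narrow regions, where a few points of offset weigh more in the IOU, are the most sensitive to the threshold.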