A preprocessing study of serum mass spectrometry data using machine learning as a base
19 July 2024
Jiaxin Han, Ruimin Dong, Jinjin Liang
Proceedings Volume 13181, Third International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024); 1318155 (2024) https://doi.org/10.1117/12.3031004
Event: Third International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024), 2024, Beijing, China
Abstract
Preprocessing is the most basic and important step in applying mass spectrometry data. To address the high dimensionality of serum mass spectrometry data, an efficient and usable preprocessing method is proposed that handles inconsistent dimensions across groups, non-normalized data, and non-unique attribute columns. First, each Mass value is rounded down to determine the attribute column name. Second, because rounding down can map several entries to the same Mass term, four ways are used to determine a unique value for each such term. Third, missing values are filled in four ways, and the overall data is then transposed. Because four treatments are applied both to repeated Mass terms and to missing values, there are 4 × 4 = 16 combinations, yielding 16 new datasets. Finally, three machine learning methods, SVM (linear), Random Forest, and Logistic Regression, are used to compare the performance of the different preprocessing approaches on these datasets. The numerical results show that the classification accuracies of the three classifiers reach 0.8296, 0.8906, and 0.8088, respectively, which validates that the proposed preprocessing algorithms can efficiently process raw mass spectrometry data into valid, normalized data.
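The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its four duplicate-resolution rules or its four fill rules, so the ones below (mean, max, min, sum; zero, mean, median, min) are hypothetical stand-ins, as are all function and variable names. The sketch also builds the sample-by-mass layout directly, which is assumed here to be what the transposition step achieves.

```python
import math
from itertools import product
from statistics import mean, median

# Hypothetical rules: the abstract says "four ways" but does not list them.
DUPLICATE_RULES = {"mean": mean, "max": max, "min": min, "sum": sum}
FILL_RULES = {
    "zero": lambda observed: 0.0,
    "mean": mean,
    "median": median,
    "min": min,
}


def collapse_spectrum(peaks, rule):
    """Round each mass down to an integer and merge intensities sharing that integer."""
    bins = {}
    for mass, intensity in peaks:
        bins.setdefault(math.floor(mass), []).append(intensity)
    return {mass: rule(values) for mass, values in bins.items()}


def build_matrix(spectra, dup_rule, fill_rule):
    """Return (columns, rows): one row per sample, one column per integer mass."""
    collapsed = [collapse_spectrum(p, DUPLICATE_RULES[dup_rule]) for p in spectra]
    # The union of all integer masses gives every sample the same dimension.
    columns = sorted(set().union(*collapsed))
    # The fill value for a column is computed from the samples that do have it.
    fill = {}
    for mass in columns:
        observed = [s[mass] for s in collapsed if mass in s]
        fill[mass] = FILL_RULES[fill_rule](observed)
    rows = [[s.get(mass, fill[mass]) for mass in columns] for s in collapsed]
    return columns, rows


def all_variants(spectra):
    """Build one dataset per (duplicate rule, fill rule) pair: 4 x 4 = 16."""
    return {
        (d, f): build_matrix(spectra, d, f)
        for d, f in product(DUPLICATE_RULES, FILL_RULES)
    }


# Toy input: each sample is a list of (mass, intensity) pairs.
spectra = [
    [(100.2, 1.0), (100.7, 3.0), (101.5, 2.0)],
    [(100.9, 4.0)],
]
variants = all_variants(spectra)
```

Each of the 16 resulting matrices could then be passed to a classifier to compare preprocessing choices, which is the comparison the abstract reports for SVM (linear), Random Forest, and Logistic Regression.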
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jiaxin Han, Ruimin Dong, and Jinjin Liang "A preprocessing study of serum mass spectrometry data using machine learning as a base", Proc. SPIE 13181, Third International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024), 1318155 (19 July 2024); https://doi.org/10.1117/12.3031004
KEYWORDS
Mass spectrometry, Machine learning, Biological samples, Analytical research, Ions, Data processing, Diseases and disorders