Research Article

Machine Learning-Based Multimodal Molecular Biomarkers for Predictive Health Analytics

DOI:

10.3791/69241

January 16th, 2026

In This Article

Summary

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

Machine learning with multimodal biomarkers enhances disease prediction and monitoring, offering promise for healthcare advancements in various domains, improving healthcare outcomes through accurate disease prediction.

Abstract

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

Every year, many people around the world are progressively affected by the devastating conditions of health problems such as heart disease, respiratory infections, neurological dysfunction, cognitive stress, cancer, stroke, diabetes, etc., which lead to severe health complications and associated abnormalities. Thus, early health analytics are crucial, as they enable timely intervention with targeted therapies, potentially providing immediate relief and sustained long-term benefits that may slow disease progression. Due to the complex pathophysiological processes and heterogeneous clinical trials in various health conditions, there is a need for highly sensitive, multimodal biomarkers and effective investigative approaches to accurately detect and monitor patient health outcomes. Therefore, machine learning algorithms with various categories and techniques are considered for predicting outcomes, including prognosis, risk assessment, patient stratification, and disease monitoring. The flow of the proposed work is divided into three stages, as the first stage defines the importance of healthcare with case studies, followed by the traditional Machine Learning (ML) algorithms, traditional Deep Learning (DL) approaches, and modern DL techniques (TabNet and AutoInt) in the second stage. Finally, the experiments are implemented to justify the results. This work highlights the grouping of modalities by integrating molecular protein, chemical, and genetic biomarkers with emerging ML features. The results indicate a significant improvement in predicting the accuracy using the proposed methodology.

Introduction

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The healthcare system is increasingly utilizing predictive analytics to monitor patients and diagnose health outcomes in a timely manner, facilitating proactive care and improving patient outcomes. By integrating predictive analytics with biomarkers, clinicians can enhance diagnostic accuracy, identify effective treatments, monitor therapy progress, and gain valuable insights into disease mechanisms, ultimately leading to more informed and effective treatment decisions. Biomarkers refer to the characteristics of biological molecules or indicators found in samples, such as biofluids, tissues, cells, DNA, RNA, blood, and urine, that measure the presence or progression o....

Access restricted. Please log in or start a trial to view this content.

Protocol

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

1. Experimental workflow

Figure 1 illustrates an organized and modular workflow pipeline designed to categorize molecular biomarkers into Risk or Diagnostic indication categories using machine learning and deep learning techniques. Here is a detailed step-by-step explanation of the process:

  1. Dataset acquisition
    The dataset is collected from MarkerDB 2.0, a curated database of molecular biomarkers.
  2. Data preprocessing
    This ensures data quality and consistency by handling null values and missing entries, encoding categorical features using LabelEncoder, and scali....

Access restricted. Please log in or start a trial to view this content.

Results

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The experiments are conducted using ML and DL approaches to compare their performance. At the same time, creating ML models that utilize both supervised and unsupervised learning mechanisms is considered. The supervised techniques used in this methodology are Logistic Regression, Decision Tree, Random Forest, Gradient Boost, Support Vector Machine, and K-Nearest Neighbor algorithms. These algorithms range from simple statistical models to complex tree-based analyses. On the other hand, unsupervised ML algorithms are K-Me.......

Access restricted. Please log in or start a trial to view this content.

Discussion

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The proposed work clearly discusses the importance of biomarkers in the identification and monitoring of a disease and its characteristics. The suggested approach for biomarker methodology is based on four essential steps: careful data preparation (cleaning, encoding, eliminating identifiers); uniform target encoding (Risk=0, Diagnostic=1); normalization of features and high-quality deep embedding for models such as AutoInt37/TabNet; and a solid evaluation process integrity (appropriate train/test.......

Access restricted. Please log in or start a trial to view this content.

Disclosures

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The authors have nothing to disclose.

Acknowledgements

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,

The authors received no external funding.

....

Access restricted. Please log in or start a trial to view this content.

Materials

List of materials used in this article
NameCompanyCatalog NumberComments
AutoInt ModelPeking Universityhttps://github.com/shichence/AutoIntAutomatic Feature Interaction Learning via Self-Attentive Neural Networks: Efficient algorithm to automatically learn high-order feature interactions for (sparse) categorical and numerical features
deepctr_torch.modelsOpen sourcehttps://github.com/shenweichen/DeepCTR-TorchModular and Extendible package of deep-learning based CTR models to build own custom model easily.
Google ColaboratoryGooglehttps://colab.research.google.com/Cloud-based  environment to write and execute Python code through the browser for machine learning, data analysis
Keras 2.6.0Kerashttps://keras.io/Provides a Python interface for executing neural networks which runs on top of Tensorflow
MarkerDB 2.0MarkerDBhttps://markerdb.ca/downloadsPublicky avilable Database of Molecular Biomarkers
Matplotlib 3.4.3Matplotlib https://matplotlib.org/Visualization library for python
Numpy 1.21.4Numpyhttps://numpy.org/Fundamental package for scientific computing with Python
Pandas 1.3.4Pandashttps://pandas.pydata.org/Powerful, flexible and easy to use open source data analysis and manipulation tool built on top of the Python programming language.
Python 3.8.10Python Software Foundationhttps://www.python.org/Popular programming language that integrate deep learning systems more effectively.
pytorch_tabnet.tab_modelPytorchhttps://pypi.org/project/pytorch-tabnet/For implementing binary/multi-class classification and regression problems.
Scikit-learnScikit-learnhttps://scikit-learn.org/stable/Python module for machine learning algorithms
SeabornSeabornhttps://seaborn.pydata.org/High level data visualization library for drawing attractive and informative statistical graphics
TabNet ModelGooglehttps://github.com/dreamquark-ai/tabnetTabNet is an end-to-end neural network designed to directly handle tabular data
Tensorflow 2.6.0Googlehttps://www.tensorflow.org/An end-to-end platform for machine learning

References

Loading...
$$\rightleftharpoonup{xx}$$ $$\longleftharp{xx}$$, $$\longrightharp{xx}$$,
  1. Mollarasouli, F., Bakirhan, N. K., Ozkan, S. A. Introduction to biomarkers. The detection of biomarkers. , Academic Press. 1-22 (2022).
  2. Doroszkiewicz, J., Groblewska, M., Mroczko, B. Molecular biomarkers and their implications for the early diagnosis of selected neurodegenerative diseases. Int J Mo....

Access restricted. Please log in or start a trial to view this content.

Reprints and Permissions

Request permission to reuse the text or figures of this JoVE article

Request Permission

Tags

Machine Learning BiomarkersMultimodal BiomarkersPredictive Health AnalyticsMolecular BiomarkersDisease MonitoringRisk AssessmentPatient StratificationDeep Learning TechniquesProtein BiomarkersGenetic Biomarkers

Related Articles