Method Article

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

DOI: 10.3791/57738

October 11th, 2018

Summary

Existing algorithms generate a single solution for a biomarker detection dataset. This protocol demonstrates that multiple similarly effective solutions exist and presents user-friendly software that helps biomedical researchers explore their own datasets for such solutions. Computer scientists may also add this feature to their biomarker detection algorithms.

Abstract

Biomarker detection is one of the more important biomedical questions for high-throughput 'omics' researchers, and almost all existing biomarker detection algorithms generate a single biomarker subset that optimizes the performance measurement for a given dataset. However, a recent study demonstrated the existence of multiple biomarker subsets with similarly effective or even identical classification performances. This protocol presents a simple and straightforward methodology for detecting biomarker subsets whose binary classification performances are better than a user-defined cutoff. The protocol consists of data preparation and loading, baseline information summarization, parameter tuning, biomarker screening, result visualization and interpretation, biomarker gene annotation, and export of results and visualizations at publication quality. The proposed biomarker screening strategy is intuitive and demonstrates a general rule for developing biomarker detection algorithms. A user-friendly graphical user interface (GUI) was developed in Python, giving biomedical researchers direct access to their results. The source code and manual of kSolutionVis can be downloaded from http://www.healthinformaticslab.org/supp/resources.php.
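The idea of keeping every biomarker subset that beats a user-defined cutoff, rather than only the single best one, can be illustrated with a minimal sketch. This is not the kSolutionVis implementation: the nearest-centroid classifier, the leave-one-out evaluation, the exhaustive enumeration of subsets, the toy data, and all function names (`loo_accuracy`, `screen_subsets`) are assumptions chosen to keep the example dependency-free.

```python
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(samples, labels, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    correct = 0
    for i, (held_out, truth) in enumerate(zip(samples, labels)):
        train = {0: [], 1: []}
        for j, (sample, label) in enumerate(zip(samples, labels)):
            if j != i:
                train[label].append([sample[f] for f in subset])
        point = [held_out[f] for f in subset]
        # Predict the class whose centroid is closest to the held-out sample.
        pred = min(train, key=lambda c: sq_dist(point, centroid(train[c])))
        correct += pred == truth
    return correct / len(samples)


def screen_subsets(samples, labels, size, cutoff):
    """Return every feature subset of `size` whose accuracy is better than `cutoff`."""
    if min(labels.count(0), labels.count(1)) < 2:
        raise ValueError("each class needs at least two samples for leave-one-out")
    hits = []
    for subset in combinations(range(len(samples[0])), size):
        acc = loo_accuracy(samples, labels, subset)
        if acc > cutoff:
            hits.append((subset, acc))
    return hits


# Toy data (made up): rows are samples, columns are features.
# Features 0 and 1 both separate the classes; features 2 and 3 are noise.
samples = [
    [1.0, 0.9, 5.0, 2.0],
    [1.2, 1.1, 1.0, 2.1],
    [0.9, 1.0, 4.0, 1.9],
    [3.0, 3.1, 1.5, 2.0],
    [3.2, 2.9, 4.5, 2.2],
    [2.9, 3.0, 1.2, 1.8],
]
labels = [0, 0, 0, 1, 1, 1]

hits = screen_subsets(samples, labels, size=1, cutoff=0.9)
for subset, acc in hits:
    print(subset, acc)
```

On this toy dataset two different single-feature subsets pass the cutoff, which is the situation the protocol is designed to expose. Exhaustive enumeration is only practical for very small feature counts; a real dataset would need a feature-ranking or search heuristic in front of it.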

Introduction

Binary classification is one of the most commonly investigated and challenging data mining problems in the biomedical area. It builds a classification model, trained on two groups of samples, with the most accurate discrimination power [1-7]. However, the big data generated in the biomedical field has the inherent "large p small n" paradigm, with the number of features usually much larger than the number of samples

Protocol

NOTE: The following protocol describes the details of the informatics analytic procedure and the pseudocode of the major modules. The automatic analysis system was developed using Python version 3.6.0 and the Python modules pandas, abc, numpy, scipy, sklearn, sys, PyQt5, mRMR, math, and matplotlib. The materials used in this study are listed in the Table of Materials.
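Before starting, it may help to confirm that the listed modules are available in the local Python environment. The snippet below is a minimal sketch using the standard-library call `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical, and the import names are taken verbatim from the note above, so a package whose import name differs from its listed name (possibly mRMR) would be reported as missing even if it is installed.

```python
import importlib.util

# Module names as listed in the protocol note (assumed to be the import names).
REQUIRED_MODULES = [
    "pandas", "abc", "numpy", "scipy", "sklearn",
    "sys", "PyQt5", "mRMR", "math", "matplotlib",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only locates the modules without importing them, so it is quick and has no side effects.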

1. Prepare the Data Matrix and Class Labels

  1. Prepare the data matrix file as a TAB- or comma-delimited matrix file, as illustrated in Figure 1A.
    NOTE: Each row has all the values of a feature, and the first item ....
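A file in this layout could be read with a short sketch like the one below. The note above is cut off, so two points are assumptions rather than the documented format: that the first line is a header whose remaining items are sample names, and that the first item of every other row is the feature name. The function name `parse_feature_matrix` and the example values are hypothetical.

```python
import csv
import io


def parse_feature_matrix(text):
    """Parse a TAB- or comma-delimited matrix with one feature per row.

    Assumptions (not confirmed by the protocol text): the first line is a
    header of sample names, and the first item of each later row is the
    feature name.
    """
    # Pick the delimiter from the first line: TAB if present, otherwise comma.
    first_line = text.splitlines()[0]
    delimiter = "\t" if "\t" in first_line else ","
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    sample_ids = header[1:]
    features = {}
    for row in body:
        name, values = row[0], [float(v) for v in row[1:]]
        if len(values) != len(sample_ids):
            raise ValueError(f"row {name!r} has {len(values)} values, "
                             f"expected {len(sample_ids)}")
        features[name] = values
    return sample_ids, features


# Made-up example content in both supported delimiters.
tab_text = "Feature\tS1\tS2\tS3\nGeneA\t1.5\t2.0\t0.5\nGeneB\t3.0\t1.0\t2.5\n"
csv_text = tab_text.replace("\t", ",")

sample_ids, features = parse_feature_matrix(tab_text)
print(sample_ids)
print(features["GeneA"])
```

Checking the row length against the header catches ragged rows early, which is a common problem in hand-edited matrix files.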

Results

The goal of this workflow (Figure 6) is to detect multiple biomarker subsets with similar efficiencies for a binary classification dataset. The whole process is illustrated with two example datasets, ALL1 and ALL2, extracted from a recently published biomarker detection study [12,48]. A user may install kSolutionVis by following the instructions in the supplementary materials.
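One simple way to read "similar efficiencies" is to keep every subset whose score lies within a small tolerance of the best score. The sketch below illustrates that reading only; it is not taken from kSolutionVis, and the `tolerance` parameter, the function name `similarly_effective`, the gene labels, and the scores are all made up for illustration.

```python
def similarly_effective(scored_subsets, tolerance):
    """Keep the subsets whose score is within `tolerance` of the best score."""
    if not scored_subsets:
        return []
    best = max(score for _, score in scored_subsets)
    return [(subset, score) for subset, score in scored_subsets
            if best - score <= tolerance]


# Hypothetical (subset, accuracy) pairs; names and values are illustrative only.
scored = [
    (("G1", "G2"), 0.95),
    (("G3", "G4"), 0.94),
    (("G1", "G5"), 0.80),
    (("G2", "G6"), 0.95),
]

solutions = similarly_effective(scored, tolerance=0.02)
for subset, score in solutions:
    print(subset, score)
```

A tolerance of zero keeps only the subsets with identical best performance, while a larger value admits near-ties.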

Discussion

This study presents an easy-to-follow multi-solution biomarker detection and characterization protocol for a user-specified binary classification dataset. The software puts an emphasis on user-friendliness and flexible import/export interfaces for various file formats, allowing a biomedical researcher to investigate their dataset easily using the GUI of the software. This study also highlights the necessity of generating more than one solution with similarly effective modeling performances, previously ignored by many exi.......

Disclosures

We have no conflicts of interest related to this report.

Acknowledgements

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040400) and the startup grant from Jilin University. We thank the anonymous reviewers and the biomedical test users for their constructive comments on improving the usability and functionality of kSolutionVis.

Materials

List of materials used in this article

Hardware

| Name | Company | Catalog Number | Comments |
| --- | --- | --- | --- |
| laptop | Lenovo | X1 carbon | Any computer works. Recommended minimum configuration: 1 GB extra hard disk space, 1 GB memory, 2.0 MHz CPU |

Software

| Name | Company | Catalog Number | Comments |
| --- | --- | --- | --- |
| Python 3.0 | WingWare | Wing Personal | Any Python programming and running environment that supports Python version 3.0 or above |

References

  1. Heckerman, D., et al. Genetic variants associated with physical performance and anthropometry in old age: a genome-wide association study in the ilSIRENTE cohort. Scientific Reports. 7, 15879 (2017).
  2. Li, Z., et al.

Tags

Biomarker Detection, Binary Classification, Feature Subset Selection, Performance Measurement, Graphical User Interface, Data Preparation, Parameter Tuning, Result Visualization, Gene Annotation, Export Visualization
