Method Article

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size (LEfSe) in Microbiome Data

DOI: 10.3791/61715

May 16th, 2022

Retraction Notice

The article "Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size (LEfSe) in Microbiome Data" (10.3791/61715) has been retracted by the journal upon the authors' request due to a conflict regarding the data and methodology.

Summary

LEfSe (LDA Effect Size) is a tool for high-dimensional biomarker mining to identify genomic features (such as genes, pathways, and taxonomies) that significantly characterize two or more groups in microbiome data.

Abstract

There is growing attention toward closed biological genomes in the environment and in health. To explore and reveal the intergroup differences among samples or environments, it is crucial to discover biomarkers that differ statistically among groups. Linear discriminant analysis Effect Size (LEfSe) can help find such biomarkers. Starting from the original genome data, quality control and quantification of sequences by taxon or gene are carried out. First, the Kruskal-Wallis rank-sum test is used to detect features that differ significantly among the biological groups. Then, the Wilcoxon rank-sum test is applied to the features retained in the previous step to assess whether the differences are consistent between groups. Finally, linear discriminant analysis (LDA) is conducted to estimate the effect of each biomarker on the separation of the groups, expressed as an LDA score. In summary, LEfSe provides a convenient way to identify genomic biomarkers that characterize statistical differences among biological groups.
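The first of the three steps, the Kruskal-Wallis screen, can be illustrated with a minimal standard-library sketch. It is not the LEfSe implementation: the function names, the example abundances, and the significance level `ALPHA = 0.05` are assumptions made for illustration, and the Wilcoxon and LDA steps are left out.

```python
import math

def chi2_sf(x, df):
    """Upper tail of the chi-square distribution, built with the
    recurrence Q(a+1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a+1)."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # df = 1
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def average_ranks(values):
    """Rank values from 1..N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def kruskal_wallis(groups):
    """Return (H, p) for a list of groups, with the usual tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / float(n ** 3 - n)
    if correction == 0.0:      # all values identical: no evidence of difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical relative abundances of two features in three groups.
ALPHA = 0.05  # assumed significance level
features = {
    "taxon_A": [[0.10, 0.12, 0.11], [0.30, 0.33, 0.31], [0.52, 0.55, 0.50]],
    "taxon_B": [[0.20, 0.25, 0.22], [0.21, 0.24, 0.23], [0.26, 0.19, 0.27]],
}
retained = [name for name, groups in features.items()
            if kruskal_wallis(groups)[1] < ALPHA]
print(retained)
```

Features that pass this screen would then go on to the pairwise Wilcoxon checks and the LDA effect-size estimate described above.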

Introduction

Biomarkers are measurable biological characteristics that indicate phenomena such as infection, disease, or environment. Functional biomarkers may be biological functions specific to a single species or common to several species, such as genes, proteins, metabolites, and pathways. Taxonomic biomarkers, in turn, indicate an unusual species, a group of organisms (kingdom, phylum, class, order, family, genus, species), an Amplicon Sequence Variant (ASV)1, or an Operational Taxonomic Unit (OTU)2. In order to find biomarkers more quickly and accurately, a tool for analyzing the biological data is necessa....

Protocol

NOTE: The protocol was sourced and modified from the research of Segata et al.3. The method is provided at https://bitbucket.org/biobakery/biobakery/wiki/lefse.

1. Preparation of input file for analysis

  1. Prepare the LEfSe input file (Table 1), which can be generated by many workflows8 or previous protocols9 from the original files (the sample file and the corresponding species annotation file).

2. LEfSe native analysis (limited to the Linux server)

  1. LEfSe Installation
    NOTE: The LEfSe pi....
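For the input file of step 1.1, the layout of Table 1 is not reproduced here. The sketch below assumes the commonly described LEfSe layout: a tab-delimited text file with features in rows and samples in columns, preceded by metadata rows for class, optional subclass, and subject. The function name `build_lefse_input`, the row labels, and the example values are hypothetical.

```python
import csv
import io

def build_lefse_input(sample_ids, classes, abundances, subclasses=None):
    """Return tab-delimited text: metadata rows first, then one row per feature.

    sample_ids, classes, subclasses: one entry per sample (column).
    abundances: dict mapping a feature name to its per-sample values.
    """
    n = len(sample_ids)
    if len(classes) != n or (subclasses is not None and len(subclasses) != n):
        raise ValueError("metadata rows must have one entry per sample")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["class"] + list(classes))
    if subclasses is not None:
        writer.writerow(["subclass"] + list(subclasses))
    writer.writerow(["subject_id"] + list(sample_ids))
    for feature, values in abundances.items():
        if len(values) != n:
            raise ValueError("feature %r has the wrong number of values" % feature)
        writer.writerow([feature] + ["%g" % v for v in values])
    return buffer.getvalue()

# Hypothetical example: four samples in two groups, two taxa.
table_text = build_lefse_input(
    sample_ids=["s1", "s2", "s3", "s4"],
    classes=["control", "control", "treated", "treated"],
    abundances={
        "Bacteria|Firmicutes": [0.61, 0.58, 0.32, 0.30],
        "Bacteria|Bacteroidetes": [0.25, 0.27, 0.49, 0.52],
    },
)
print(table_text)
```

Writing `table_text` to a plain text file gives a table that can be compared against Table 1 before running the analysis.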

Results

Figure 8 shows the LDA scores of the microbial taxa that differ significantly among groups, obtained by analyzing the 16S rRNA gene sequences of three samples. The color of each bar indicates the group, and its length indicates the LDA score, that is, the effect size of a species that differs significantly between groups. Only species whose LDA score exceeds the preset value are shown. The default preset value is 2........
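How the preset value shapes such a histogram can be sketched in a few lines. The feature names, groups, and scores below are invented for illustration, and the helper functions are hypothetical rather than part of LEfSe; the threshold of 2 follows the default mentioned in the text.

```python
def select_biomarkers(results, threshold=2.0):
    """Keep (feature, group, lda_score) rows above the threshold,
    ordered by group and then by descending score."""
    kept = [row for row in results if row[2] > threshold]
    return sorted(kept, key=lambda row: (row[1], -row[2]))

def text_histogram(rows, chars_per_unit=5):
    """Render one text bar per biomarker; bar length tracks the LDA score."""
    lines = []
    for feature, group, score in rows:
        bar = "#" * int(round(score * chars_per_unit))
        lines.append("%-8s %-18s %s %.2f" % (group, feature, bar, score))
    return lines

# Hypothetical LEfSe-style results: (feature, enriched group, LDA score).
demo_results = [
    ("Lactobacillus", "groupA", 3.8),
    ("Bacteroides", "groupB", 4.2),
    ("Clostridium", "groupA", 1.6),   # below the threshold, so not plotted
    ("Prevotella", "groupB", 2.4),
]
selected = select_biomarkers(demo_results)
for line in text_histogram(selected):
    print(line)
```

Raising `threshold` shortens the list to the strongest biomarkers, while lowering it admits features with smaller effect sizes.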

Discussion

Here, the protocol for the identification and characterization of biomarkers within different groups is described. This protocol can easily be adapted for other sample types, such as OTUs of microorganisms. The statistical method used by LEfSe identifies the characteristic microorganisms of each group (default threshold: LDA score > 2), that is, the microorganisms that are more abundant in that group than in the others12. LEfSe is available in both native (Linux) and web versions, where users can also perform LEfS.......

Disclosures

The authors have nothing to disclose.

Acknowledgements

This work was supported by grants from the Fundamental Research Funds for the Central Public Welfare Research Institutes (TKS170205) and the Foundation for Development of Science and Technology, Tianjin Research Institute for Water Transport Engineering (TIWTE), M.O.T. (KJFZJJ170201).

Materials

No materials were used in this article.

References

  1. Bolyen, E., et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 37 (8), 852-857 (2019).
  2. Knight, R., et al. Best practices for analysing microbiomes. Nature Reviews. Mic....

Tags

Linear Discriminant Analysis, Biomarker Selection, Microbiome Data, Kruskal Wallis Test, Wilcoxon Rank Test, LDA Effect Size, LEfSe Analysis, Galaxy Server, Principal Component Analysis, Genomic Biomarkers