A Bioinformatics Pipeline to Accurately and Efficiently Analyze the MicroRNA Transcriptomes in Plants

* These authors contributed equally
This article has been accepted and is currently in production

Abstract

MicroRNAs (miRNAs) are 20- to 24-nucleotide (nt) endogenous small RNAs (sRNAs) that are widespread in plants and animals and play potent roles in regulating gene expression at the post-transcriptional level. Over the last decade, sequencing of sRNA libraries by next-generation sequencing (NGS) has been widely used to identify and analyze miRNA transcriptomes, resulting in a rapid increase in miRNA discovery. However, two major challenges have arisen in plant miRNA annotation because of the increasing depth of sequenced sRNA libraries and the size and complexity of plant genomes. First, many other types of sRNAs in these libraries, in particular short interfering RNAs (siRNAs), are erroneously annotated as miRNAs by many computational tools. Second, analyzing miRNA transcriptomes in plant species with large and complex genomes has become extremely time-consuming. To overcome these challenges, we recently upgraded miRDeep-P (a popular tool for miRNA transcriptome analyses) to miRDeep-P2 (miRDP2 for short) by employing a new filtering strategy, overhauling the scoring algorithm, and incorporating newly updated plant miRNA annotation criteria. We tested miRDP2 against sequenced sRNA populations from five representative plants of increasing genomic complexity: Arabidopsis, rice, tomato, maize, and wheat. The results indicate that miRDP2 processed these tasks with very high efficiency. In addition, miRDP2 outperformed other prediction tools in sensitivity and accuracy. Taken together, our results demonstrate that miRDP2 is a fast and accurate tool for analyzing plant miRNA transcriptomes and therefore a useful tool for helping the community better annotate miRNAs in plants.