Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relations hips. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
Failure to account for covariation patterns in helical regions of ribosomal RNA (rRNA) genes has the potential to misdirect the estimation of the phylogenetic signal of the data. Furthermore, the extremes of length variation among taxa, combined with regional substitution rate variation can mislead the alignment of rRNA sequences and thus distort subsequent tree reconstructions. However, recent developments in phylogenetic methodology now allow a comprehensive integration of secondary structures in alignment and tree reconstruction analyses based on rRNA sequences, which has been shown to correct some of these problems. Here, we explore the potentials of RNA substitution models and the interactions of specific model setups with the inherent pattern of covariation in rRNA stems and substitution rate variation among loop regions.
The use of secondary structures has been advocated to improve both the alignment and the tree reconstruction processes of ribosomal RNA (rRNA) data sets. We used simulated and empirical rRNA data to test the impact of secondary structure consideration in both steps of molecular phylogenetic analyses. A simulation approach was used to generate realistic rRNA data sets based on real 16S, 18S, and 28S sequences and structures in combination with different branch length and topologies. Alignment and tree reconstruction performance of four recent structural alignment methods was compared with exclusively sequence-based approaches. As empirical data, we used a hexapod rRNA data set to study the influence of nucleotide interdependencies in sequence alignment and tree reconstruction. Structural alignment methods delivered significantly better sequence alignments compared with pure sequence-based methods. Also, structural alignment methods delivered better trees judged by topological congruence to simulation base trees. However, the advantage of structural alignments was less pronounced and even vanished in several instances. For simulated data, application of mixed RNA/DNA models to stems and loops, respectively, led to significantly shorter branches. The application of mixed RNA/DNA models in the hexapod analyses delivered partly implausible relationships. This can be interpreted as a stronger sensitivity of mixed model setups to nonphylogenetic signal. Secondary structure consideration clearly influenced sequence alignment and tree reconstruction of ribosomal genes. Although sequence alignment quality can considerably be improved by the use of secondary structure information, the application of mixed models in tree reconstructions needs further studies to understand the observed effects.
Ribosomal RNA (rRNA) genes are probably the most frequently used data source in phylogenetic reconstruction. Individual columns of rRNA alignments are not independent as a consequence of their highly conserved secondary structures. Unless explicitly taken into account, these correlation can distort the phylogenetic signal and/or lead to gross overestimates of tree stability. Maximum likelihood and Bayesian approaches are of course amenable to using RNA-specific substitution models that treat conserved base pairs appropriately, but require accurate secondary structure models as input. So far, however, no accurate and easy-to-use tool has been available for computing structure-aware alignments and consensus structures that can deal with the large rRNAs. The RNAsalsa approach is designed to fill this gap. Capitalizing on the improved accuracy of pairwise consensus structures and informed by a priori knowledge of group-specific structural constraints, the tool provides both alignments and consensus structures that are of sufficient accuracy for routine phylogenetic analysis based on RNA-specific substitution models. The power of the approach is demonstrated using two rRNA data sets: a mitochondrial rRNA set of 26 Mammalia, and a collection of 28S nuclear rRNAs representative of the five major echinoderm groups.
Whenever different data sets arrive at conflicting phylogenetic hypotheses, only testable causal explanations of sources of errors in at least one of the data sets allow us to critically choose among the conflicting hypotheses of relationships. The large (28S) and small (18S) subunit rRNAs are among the most popular markers for studies of deep phylogenies. However, some nodes supported by this data are suspected of being artifacts caused by peculiarities of the evolution of these molecules. Arthropod phylogeny is an especially controversial subject dotted with conflicting hypotheses which are dependent on data set and method of reconstruction. We assume that phylogenetic analyses based on these genes can be improved further i) by enlarging the taxon sample and ii) employing more realistic models of sequence evolution incorporating non-stationary substitution processes and iii) considering covariation and pairing of sites in rRNA-genes.
Molecular phylogenies are being published increasingly and many biologists rely on the most recent topologies. However, different phylogenetic trees often contain conflicting results and contradict significant background data. Not knowing how reliable traditional knowledge is, a crucial question concerns the quality of newly produced molecular data. The information content of DNA alignments is rarely discussed, as quality statements are mostly restricted to the statistical support of clades. Here we present a case study of a recently published mollusk phylogeny that contains surprising groupings, based on five genes and 108 species, and we apply new or rarely used tools for the analysis of the information content of alignments and for the filtering of noise (masking of random-like alignment regions, split decomposition, phylogenetic networks, quartet mapping).
Secondary structure models of mitochondrial and nuclear (r)RNA sequences are frequently applied to aid the alignment of these molecules in phylogenetic analyses. Additionally, it is often speculated that structure variation of (r)RNA sequences might profitably be used as phylogenetic markers. The benefit of these approaches depends on the reliability of structure models. We used a recently developed approach to show that reliable inference of large (r)RNA secondary structures as a prerequisite of simultaneous sequence and structure alignment is feasible. The approach iteratively establishes local structure constraints of each sequence and infers fully folded individual structures by constrained MFE optimization. A comparison of structure edit distances of individual constraints and fully folded structures showed pronounced phylogenetic signal in fully folded structures. As model sequences we characterized secondary structures of 28S rRNA sequences of selected insects and examined their phylogenetic signal according to established phylogenetic hypotheses.
In this study, we investigated the relationships among insect orders with a main focus on Polyneoptera (lower Neoptera: roaches, mantids, earwigs, grasshoppers, etc.), and Paraneoptera (thrips, lice, bugs in the wide sense). The relationships between and within these groups of insects are difficult to resolve because only few informative molecular and morphological characters are available. Here, we provide the first phylogenomic expressed sequence tags data (EST: short sub-sequences from a c(opy) DNA sequence encoding for proteins) for stick insects (Phasmatodea) and webspinners (Embioptera) to complete published EST data. As recent EST datasets are characterized by a heterogeneous distribution of available genes across taxa, we use different rationales to optimize the data matrix composition. Our results suggest a monophyletic origin of Polyneoptera and Eumetabola (Paraneoptera + Holometabola). However, we identified artefacts of tree reconstruction (human louse Pediculus humanus assigned to Odonata (damselflies and dragonflies) or Holometabola (insects with a complete metamorphosis); mayfly genus Baetis nested within Neoptera), which were most probably rooted in a data matrix composition bias due to the inclusion of sequence data of entire proteomes. Until entire proteomes are available for each species in phylogenomic analyses, this potential pitfall should be carefully considered.
Related JoVE Video
Journal of Visualized Experiments
What is Visualize?
JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.
How does it work?
We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.
Video X seems to be unrelated to Abstract Y...
In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.