The ITS2 Database

Benjamin Merget; Christian Koetschan; Thomas Hackl; Frank Förster; Thomas Dandekar; Tobias Müller; Jörg Schultz; Matthias Wolf

doi:10.3791/3806

Method Article

March 12th, 2012

Benjamin Merget¹,², Christian Koetschan¹, Thomas Hackl¹, Frank Förster¹, Thomas Dandekar¹, Tobias Müller¹, Jörg Schultz¹, Matthias Wolf¹

¹Department of Bioinformatics, Biocenter, University of Würzburg, ²Institute of Pharmacy and Food Chemistry, University of Würzburg

Summary

The ITS2 Database is a workbench for phylogenetic inference that simultaneously considers the sequence and the secondary structure of the internal transcribed spacer 2 (ITS2). It covers data collection with accurate annotation, structure prediction, multiple sequence-structure alignment and fast tree calculation. In a nutshell, this workbench reduces initial phylogenetic analyses to a few clicks.

Abstract

The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. Because ITS2 research has mainly focused on the highly variable ITS2 sequence, the marker has been confined to low-level phylogenetics. However, combining the ITS2 sequence with its highly conserved secondary structure improves phylogenetic resolution¹ and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation²⁻⁸.

The ITS2 Database⁹ presents an exhaustive dataset of ITS2 sequences from NCBI GenBank¹¹ that have been accurately reannotated¹⁰. After annotation with profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, the database tests whether a minimum-energy fold¹² (direct fold) yields the correct four-helix conformation. If it does not, the structure is predicted by homology modeling¹³, in which an already known secondary structure is transferred to an ITS2 sequence that could not be folded correctly by the direct fold.

The ITS2 Database is not only a database for storing and retrieving ITS2 sequence-structures. It also provides several tools for processing your own ITS2 sequences, including annotation, structure prediction, motif detection and BLAST¹⁴ searches on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE¹⁵,¹⁶ and ProfDistS¹⁷ for multiple sequence-structure alignment and Neighbor Joining¹⁸ tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure.

In a nutshell, this workbench reduces initial phylogenetic analyses to a few mouse clicks, while also providing tools and data for comprehensive large-scale analyses.

Protocol

1. Correct Annotation of ITS2 Sequence

Access the ITS2 Database phylogeny workbench here: http://its2.bioapps.biozentrum.uni-wuerzburg.de
Begin your analysis by clicking the "Annotate" icon in the "Tools" section. Then type or paste your sequence into the sequence editor at the top of the website. The sequence editor automatically checks whether your ITS2 sequences are valid.
Choose an HMM model suitable for your sequences (e.g. Viridiplantae for plants).
Start the process by clicking "Annotate."
Hover over the "Hybridize" icon to view an image of the 5.8S-28S rRNA hybrid, which serves as a check of the HMM annotation's accuracy.
Click the green plus sign of the resulting ITS2 sequence to choose how its secondary structure is predicted: to predict the structure without a known template, click "Predict structure"; to use your own template for homology modeling, click "Model structure."
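The protocol does not spell out what the sequence editor treats as "valid." The following Python sketch shows one plausible check, assuming that valid input is FASTA text whose records contain only IUPAC nucleotide codes; `parse_fasta`, `invalid_records` and the accepted symbol set are hypothetical and not taken from the ITS2 Database.

```python
# Minimal FASTA validity check (a sketch; the accepted symbols are an assumption).
from typing import List, Tuple

# IUPAC nucleotide codes, allowing both T and U (assumed, not the editor's real rule set)
VALID_SYMBOLS = set("ACGTURYSWKMBDHVN")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) records; multi-line sequences are joined."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_records(text: str) -> List[str]:
    """Return the headers of records that are empty or contain non-nucleotide symbols."""
    bad = []
    for header, seq in parse_fasta(text):
        if not seq or set(seq.upper()) - VALID_SYMBOLS:
            bad.append(header)
    return bad
```

For example, `invalid_records(">ok\nACGUACGU\n>bad\nACGXZ\n")` flags only the record named `bad`.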

2. Secondary Structure Prediction

Predict
1. The annotated ITS2 sequence is automatically pasted into the sequence editor.
2. To start the secondary structure prediction with default settings, click the "Predict structures" button.
3. Save the resulting ITS2 sequence, including its predicted secondary structure, to the data pool by clicking the green plus sign and then "Add to pool." Alternatively, add it to your data pool via drag and drop (Figure 1).
4. If the sequence could not be folded directly, the best homology modeling results are shown. Save the most suitable sequence-structure to the data pool via drag and drop, or right-click it and then click "Add to pool."
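The decision between a direct fold and homology modeling can be pictured with a short Python sketch. It assumes structures in dot-bracket notation and approximates the "four-helix conformation" as exactly four helices that branch off at the outermost level; `count_top_level_helices` and `choose_structure` are hypothetical names, and the folding and modeling steps are passed in as placeholder functions rather than calls to the database's real backend.

```python
# Sketch of the direct fold / homology modeling decision (hypothetical helpers).
# Assumption: "four-helix conformation" = exactly four bracket groups opening at depth 0.
from typing import Callable, Optional, Tuple


def count_top_level_helices(dot_bracket: str) -> int:
    """Count helices that branch off at the outermost level of a dot-bracket string."""
    depth = 0
    helices = 0
    for symbol in dot_bracket:
        if symbol == "(":
            if depth == 0:
                helices += 1  # a new helix starts at the outermost level
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced dot-bracket string")
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if depth != 0:
        raise ValueError("unbalanced dot-bracket string")
    return helices


def choose_structure(
    sequence: str,
    direct_fold: Callable[[str], str],
    homology_model: Callable[[str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Keep the direct fold if it has four helices; otherwise fall back to homology modeling."""
    structure = direct_fold(sequence)
    if count_top_level_helices(structure) == 4:
        return "direct fold", structure
    return "homology modeling", homology_model(sequence)
```

For example, `count_top_level_helices("..((..))..((..))..((..))..((..))..")` returns 4, so such a fold would be kept. The ITS2 Database itself relies on an energy-based folding program¹² for the direct fold; the stub functions here only stand in for that step.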
Custom Modeling
1. Type or paste one or multiple templates (with known structure) into the upper sequence editor.
2. Type or paste one or multiple target sequences (without structure) into the lower sequence editor.
3. Click on "Predict best template(s)" to start the Homology Modeling with default settings.
4. The best template-target combinations are shown in the resulting list.
5. Save the modeled sequence-structure(s) of your choice either via drag and drop to the data pool or by a right click and a click on "Add to pool."
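The core idea of homology modeling, transferring a known structure onto a new sequence, can be sketched in Python as follows. This is a simplified illustration, not the ITS2 Database's algorithm: it assumes that template and target are already pairwise aligned (gaps as `-`), that the template structure has one dot-bracket symbol per alignment column, and that a base pair is transferred only if both target positions hold bases that form a canonical (Watson-Crick or G-U) pair. The real homology modeling applies further criteria not shown here; `pair_list` and `transfer_structure` are hypothetical names.

```python
# Simplified structure transfer from an aligned template to a target (a sketch).
from typing import Dict, List, Tuple

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_list(dot_bracket: str) -> List[Tuple[int, int]]:
    """Return the (i, j) base pairs of a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.append((stack.pop(), i))
    return pairs


def transfer_structure(aln_template: str, aln_structure: str, aln_target: str) -> str:
    """Transfer template base pairs onto the ungapped target sequence."""
    if not len(aln_template) == len(aln_structure) == len(aln_target):
        raise ValueError("aligned strings must have equal length")
    target = aln_target.upper().replace("T", "U")
    # map each column holding a target base to its position in the ungapped target
    column_to_pos: Dict[int, int] = {}
    for col, base in enumerate(target):
        if base != "-":
            column_to_pos[col] = len(column_to_pos)
    result = ["."] * len(column_to_pos)
    for i, j in pair_list(aln_structure):
        # keep a pair only if both columns are bases in the target and can pair
        if i in column_to_pos and j in column_to_pos and (target[i], target[j]) in CANONICAL:
            result[column_to_pos[i]] = "("
            result[column_to_pos[j]] = ")"
    return "".join(result)
```

Because only a subset of the template's nested pairs is kept, the transferred structure stays balanced. For example, transferring `(((...)))` from `GGGAAACCC` onto the aligned target `GCGAAACAC` drops the non-canonical C-A pair and yields `(.(...).)`.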

3. Motif Search

Type or paste your query sequence(s) into the sequence editor at the top of the website.
Choose the correct HMM model (e.g. Viridiplantae for plants).
Click on "Motif search" to start the process.
ITS2 sequences with highlighted motifs are illustrated at the bottom of the website.
Click on the icon beside the sequence header to display the motifs highlighted in the secondary structure.
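The motif search itself relies on the selected HMM model. As a much simpler illustration of what highlighting motifs in a sequence involves, the Python sketch below reports the positions of regular-expression matches; the function name `find_motifs` and any motif patterns you pass in are hypothetical, and no HMM scoring is involved.

```python
# Exact pattern scan: a simplified stand-in for the HMM-based motif search.
import re
from typing import Dict, List, Tuple


def find_motifs(sequence: str, motifs: Dict[str, str]) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for every match; 0-based, end exclusive."""
    seq = sequence.upper().replace("T", "U")
    hits: List[Tuple[str, int, int]] = []
    for name, pattern in motifs.items():
        # the lookahead lets overlapping occurrences be reported as well
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.start() + len(match.group(1))))
    return sorted(hits, key=lambda hit: hit[1])
```

The returned coordinates are what a viewer would need to color the matching bases in the sequence or in its secondary structure.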

4. Search and Browse

Search
1. Type either a taxon name or a GenBank Identifier (GI) into the search field at the top of the website.
2. When you search by taxon name, a live-search box appears to assist you.
3. To search for several queries at once, separate them with commas.
4. Click the "Search" button to execute the search.
5. Your results appear listed in a new tab.
6. Click a column name to sort your results by that column. You can also add or remove columns with the column menu, which opens when you click the arrow icon that appears in a column name.
7. Click on "Show details" to view the details of a sequence-structure.
8. Save the sequence-structure(s) of your choice either via drag and drop to the data pool or by a right click and a click on "Add to pool."
9. To save your results to an external file, click on "Save selection" or "Save all."
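A comma-separated search mixes GenBank identifiers and taxon names. The Python sketch below shows one way such a query string could be split, assuming that a term consisting only of digits is a GI and anything else is a taxon name; `split_queries` is a hypothetical helper, not the database's actual parser.

```python
# Split a comma-separated search string into GIs and taxon names (a sketch).
import re
from typing import List, Tuple


def split_queries(query: str) -> Tuple[List[int], List[str]]:
    """Separate search terms into GIs (ASCII digits only) and taxon names."""
    gis: List[int] = []
    taxa: List[str] = []
    for term in (part.strip() for part in query.split(",")):
        if not term:
            continue  # skip empty entries, e.g. from a trailing comma
        if re.fullmatch(r"[0-9]+", term):
            gis.append(int(term))
        else:
            taxa.append(term)
    return gis, taxa
```

For example, `split_queries("Chlorophyta, 123456, Hypnales,")` returns `([123456], ["Chlorophyta", "Hypnales"])`.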
Browse
1. Browse the ITS2 Database by navigating through the tree-like structure at the left of the website.
2. Click on a plus-sign to view the taxa one level lower.
3. Click on a taxon name to open a new tab containing each sequence-structure of the taxon.
4. Click on "Show details" to view the details of a sequence-structure pair.
5. Save the sequence-structure(s) of your choice either via drag and drop to the data pool or by a right click and a click on "Add to pool."
6. To save your results to an external file, click on "Save selection" or "Save all."

5. ITS2 Blast

Type or paste one or multiple query sequences into the sequence editor. Your sequences may be either plain nucleotide sequences or sequence-structure pairs. You can also type several secondary structures below one sequence. By checking the box "Serialize XXFASTA sequences" these structures are used subsequently as individual queries.
To start BLAST with default settings, click "Blast." Depending on your query, either a standard BLASTN search or the ITS2 sequence-structure BLAST is performed.
A new tab, "BLAST Results," opens, containing an overview of the executed searches and a sub-tab for each query sequence.
Click on "Show Alignments" to view the calculated BLAST alignments.
Save the BLAST hits of your choice either via drag and drop to the data pool or by a right click and a click on "Add to pool."
To save your results to an external file, click on "Save selection" or "Save all."
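The serialization option and the choice between BLASTN and sequence-structure BLAST can be pictured with the Python sketch below. It assumes that each record consists of a `>` header, sequence line(s) and zero or more dot-bracket lines, and it mimics the behavior with the serialize box checked; the helper names and the `_1`, `_2` suffixes for the resulting queries are hypothetical.

```python
# Expand records with several structures into individual queries (a sketch).
from typing import List, Optional, Tuple

STRUCTURE_CHARS = set(".()")


def parse_records(text: str) -> List[Tuple[str, str, List[str]]]:
    """Split text into (header, sequence, [structures]) records."""
    raw: List[Tuple[str, List[str], List[str]]] = []
    for line in (part.strip() for part in text.splitlines()):
        if not line:
            continue
        if line.startswith(">"):
            raw.append((line[1:].strip(), [], []))
        elif not raw:
            raise ValueError("data found before the first '>' header")
        elif set(line) <= STRUCTURE_CHARS:
            raw[-1][2].append(line)  # dot-bracket line
        else:
            raw[-1][1].append(line)  # sequence line
    return [(header, "".join(seq), structs) for header, seq, structs in raw]


def serialize(text: str) -> List[Tuple[str, str, Optional[str], str]]:
    """Return one (name, sequence, structure, search type) tuple per query."""
    queries: List[Tuple[str, str, Optional[str], str]] = []
    for header, seq, structures in parse_records(text):
        if not structures:
            queries.append((header, seq, None, "BLASTN"))  # plain sequence query
            continue
        for k, structure in enumerate(structures, start=1):
            if len(structure) != len(seq):
                raise ValueError(f"{header}: structure {k} does not match the sequence length")
            name = header if len(structures) == 1 else f"{header}_{k}"
            queries.append((name, seq, structure, "sequence-structure BLAST"))
    return queries
```

A record with two structures thus becomes two sequence-structure queries, while a record without a structure falls back to a plain BLASTN query.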

6. Multiple Sequence-structure Alignment

Take a look at your data pool by clicking "Manage dataset" and then the magnifying glass symbol right next to the number of sequences in your pool. Alternatively, you can click on the data pool sign at the bottom left of the website.
Click on a sequence-structure pair in your data pool to view its details.
To create a multiple sequence-structure alignment of all sequence-structure pairs in your pool, click on "Analyze dataset" and then "Sequence & Structure."
Now you are asked to select the graphic mode of your alignment. If your alignment contains only a few sequences, decline the slim mode by clicking "No." Otherwise choose the slim graphic mode by clicking "Yes."
In a few moments, your alignment is shown in a new tab (Figure 2). Moreover, it is automatically saved to the data pool.
To save your alignment to an external file, click on "Save alignment."
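4SALE aligns sequence and structure synchronously; its publications describe translating each sequence-structure pair into a 12-letter alphabet, one letter per combination of nucleotide and structural state (unpaired, opening or closing bracket). The Python sketch below illustrates that encoding. The concrete letters are hypothetical and not 4SALE's internal table, and only the four unambiguous nucleotides are handled.

```python
# Encode/decode a sequence-structure pair in a 12-letter alphabet (a sketch).
from typing import Tuple

BASES = "ACGU"
STATES = ".()"
SYMBOLS = "abcdefghijkl"  # hypothetical: one symbol per (base, state) combination
ENCODE = {(b, s): SYMBOLS[i * 3 + j] for i, b in enumerate(BASES) for j, s in enumerate(STATES)}
DECODE = {symbol: pair for pair, symbol in ENCODE.items()}


def encode(sequence: str, structure: str) -> str:
    """Merge a sequence and its dot-bracket structure into one string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = sequence.upper().replace("T", "U")
    return "".join(ENCODE[(b, s)] for b, s in zip(seq, structure))


def decode(encoded: str) -> Tuple[str, str]:
    """Split an encoded string back into sequence and structure."""
    pairs = [DECODE[c] for c in encoded]
    return "".join(b for b, _ in pairs), "".join(s for _, s in pairs)
```

Once both layers share one alphabet, an aligner scoring these symbols with a 12×12 substitution matrix places sequence and structure in the same columns at the same time, which is why the alignment can be viewed and edited synchronously.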

7. Phylogenetic Tree

To calculate a sequence-structure based Neighbor Joining tree of your multiple alignment, click on "Analyze Dataset" and then "Neighbor Joining."
The resulting tree is illustrated in a new tab (Figure 3).
Scale your tree freely with the scroll bar "Zoom tree."
Reroot your tree by clicking on a node or leaf of the tree and then "Reroot at this node."
If you want to remove a taxon from your data pool, click on the leaf and choose "Remove this node from pool." Now you can recalculate your alignment and tree with the reduced taxon sampling.
Click "Save tree" to export your phylogenetic tree, the final result of your analysis, to an external NEWICK file.
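To show what the Neighbor Joining¹⁸ step does, here is a compact Python sketch of the standard Saitou-Nei algorithm that turns a distance matrix into a NEWICK string. It assumes the pairwise distances are already given; ProfDistS derives them from the sequence-structure alignment with its own substitution model, which is not reproduced here, and `neighbor_joining` is a hypothetical function name.

```python
# Neighbor Joining on a precomputed distance matrix, returning NEWICK (a sketch).
from typing import Dict, List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted Neighbor Joining tree and return it as a NEWICK string."""
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    d: Dict[int, Dict[int, float]] = {i: {j: dist[i][j] for j in range(n)} for i in range(n)}
    labels: Dict[int, str] = {i: names[i] for i in range(n)}
    active = list(range(n))
    next_id = n
    while len(active) > 3:
        r = len(active)
        totals = {i: sum(d[i][k] for k in active if k != i) for i in active}
        # choose the pair (i, j) that minimizes the Q criterion
        best = None
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                q = (r - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # branch lengths from the new internal node u to i and j
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (r - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        labels[u] = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    # the last three nodes are joined at a single central node
    i, j, k = active
    li = 0.5 * (d[i][j] + d[i][k] - d[j][k])
    lj = 0.5 * (d[i][j] + d[j][k] - d[i][k])
    lk = 0.5 * (d[i][k] + d[j][k] - d[i][j])
    return f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f},{labels[k]}:{lk:.4f});"
```

The result is unrooted, which is why the web interface lets you reroot the tree afterwards. Taxon names containing spaces, colons or parentheses would need quoting in NEWICK; the sketch does not handle that, and it does not clamp the negative branch lengths that Neighbor Joining can produce on non-additive data.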

8. Additional Software

Click on "About this website"-"Tools" to find additional information about the stand-alone tools 4SALE and ProfDistS.
Besides the alignment and Neighbor Joining functions provided by the ITS2 Database web interface, the stand-alone tools offer several additional functions, e.g. species delimitation based on compensatory base changes (CBCs).
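A compensatory base change is a base pair in which both nucleotides differ between two sequences while the pairing itself is maintained (e.g. G-C versus A-U). The Python sketch below counts CBCs and hemi-CBCs (only one side changed) between two aligned sequence-structure pairs. It assumes column-aligned inputs, compares only base pairs present in both structures, and accepts Watson-Crick and G-U pairs as canonical; it is an illustration, not the implementation used by the stand-alone tools, and `count_cbcs` is a hypothetical name.

```python
# Count CBCs and hemi-CBCs between two aligned sequence-structure pairs (a sketch).
from typing import Set, Tuple

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs encoded in a dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.add((stack.pop(), i))
    return pairs


def count_cbcs(seq1: str, struct1: str, seq2: str, struct2: str) -> Tuple[int, int]:
    """Return (CBCs, hemi-CBCs) over base pairs shared by both aligned structures."""
    if not len(seq1) == len(struct1) == len(seq2) == len(struct2):
        raise ValueError("all aligned strings must have equal length")
    s1 = seq1.upper().replace("T", "U")
    s2 = seq2.upper().replace("T", "U")
    cbc = hemi = 0
    for i, j in base_pairs(struct1) & base_pairs(struct2):
        p1, p2 = (s1[i], s1[j]), (s2[i], s2[j])
        if p1 not in CANONICAL or p2 not in CANONICAL:
            continue  # gaps or non-canonical pairs are not scored
        changed = (p1[0] != p2[0]) + (p1[1] != p2[1])
        if changed == 2:
            cbc += 1  # both sides changed, pairing maintained
        elif changed == 1:
            hemi += 1  # only one side changed
    return cbc, hemi
```

For species delimitation⁸, the number of CBCs between two taxa is the quantity of interest; hemi-CBCs are reported separately because they are usually treated differently.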

9. Representative Results

The workflow described above has been successfully applied in several open access surveys³,⁴. Examples can be viewed through the following links:

In these large-scale studies, we obtained well-resolved phylogenies of Chlorophyta and of Hypnales (Bryophyta). In both cases, an exhaustive taxon sampling was gathered from the ITS2 Database⁹, automatically aligned with 4SALE¹⁵,¹⁶ and finally processed into a phylogenetic tree by ProfDistS¹⁷. In all of these steps, sequence and structure information were used simultaneously. Bootstrap support for the phylogenetic backbone was assessed with Profile Neighbor Joining (PNJ)¹⁹, which is available in the stand-alone version of ProfDistS.

For a smaller set of sequence-structure pairs, Figures 1 to 3 illustrate the key steps of this automated workflow⁵ directly on the new ITS2 Database workbench: taxon sampling, multiple sequence-structure alignment and, finally, phylogenetic tree calculation.

Figure 1. Taxon sampling via drag and drop. Sequences or sequence-structure pairs can be added to the data pool at any time, for instance via drag and drop. Here, a sequence-structure pair is added via drag and drop after secondary structure prediction. The blue ellipse marks the area where the sequence-structure pair is dropped into the data pool.

Figure 2. Multiple sequence-structure alignment in full graphic mode. Because the data pool contains only a few sequences, the full graphic mode was chosen. Bases are colored; base pairs can be highlighted with red circles by clicking on one base or bracket of a base pair.

Figure 3. Sequence-structure Neighbor Joining tree. The freely scalable tree, calculated from a multiple sequence-structure alignment of seven taxa, can be saved in NEWICK format.

Discussion

The ITS2 Database is a complete and fully functional workbench for phylogenetics based on internal transcribed spacer 2 sequences and structures. The website is fast and intuitive to use. While other phylogeny workbenches such as ARB²⁰ or Mobyle²¹ can only work with sequence and/or consensus structure information, the ITS2 Database⁹ simultaneously considers the sequence and the individual secondary structure of each taxon. However, because the computational capacity of the web server is limited, we strongly recommend using the stand-alone tools 4SALE¹⁵,¹⁶ and ProfDistS¹⁷ for multiple alignment and Neighbor Joining¹⁸ calculation, respectively, on large datasets. Besides the basic ITS2 sequence-structure phylogeny workflow⁵, these tools offer several additional functions, such as bootstrap replicates, Profile Neighbor Joining (PNJ)¹⁹ and species delimitation based on compensatory base changes (CBCs)⁸. They can be downloaded, along with detailed information, from the "About this website"-"Tools" section. Both tools require input files in the correct format: a taxon sampling to be processed by 4SALE must have the file extension .fasta or .txt, whereas a sequence-structure alignment used as input for ProfDistS must have the extension .xfasta.
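Because the stand-alone tools depend on file extensions, a small pre-flight check can save a failed run. The Python sketch below checks a file name against the extensions named above and writes records in an XFASTA-like layout. The layout (header line, aligned sequence line, aligned dot-bracket line per record) is an assumption about the format, not its specification, and `has_valid_suffix` and `format_xfasta` are hypothetical helpers.

```python
# Pre-flight checks for 4SALE / ProfDistS input files (a sketch).
from pathlib import Path
from typing import Iterable, List, Tuple

# File extensions named in the text above (compared case-sensitively here).
REQUIRED_SUFFIXES = {
    "4SALE": {".fasta", ".txt"},
    "ProfDistS": {".xfasta"},
}


def has_valid_suffix(filename: str, tool: str) -> bool:
    """True if the file name ends with an extension the given tool expects."""
    return Path(filename).suffix in REQUIRED_SUFFIXES[tool]


def format_xfasta(records: Iterable[Tuple[str, str, str]]) -> str:
    """Format (header, aligned sequence, aligned structure) records as text.

    Assumed layout per record: '>' header line, sequence line, dot-bracket line.
    """
    lines: List[str] = []
    for header, sequence, structure in records:
        if len(sequence) != len(structure):
            raise ValueError(f"{header}: sequence and structure lengths differ")
        lines.extend([f">{header}", sequence, structure])
    return "\n".join(lines) + "\n"
```

For example, `has_valid_suffix("alignment.fasta", "ProfDistS")` returns `False`, signaling that the file should be saved with the .xfasta extension first.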

We are currently implementing alternative methods for phylogenetic tree reconstruction in the ITS2 Database and in the related tools, so that sequence-structure-based Maximum Parsimony²² and/or Maximum Likelihood²³ methods will become available in the future.

Disclosures

No conflicts of interest declared.

Acknowledgements

We cordially thank the ITS2 group, Biocenter, University of Würzburg, for rich and valuable feedback. We also thank the Deutsche Forschungsgemeinschaft (DFG; grant Mu-2831/1-1) for funding.

Materials

List of materials used in this article
Name	Company	Comments
Internet access		Preferably high-speed
ITS2 Database⁹	University of Würzburg	Website: http://its2.bioapps.biozentrum.uni-wuerzburg.de
Software: 4SALE¹⁵,¹⁶	University of Würzburg	Download: http://4sale.bioapps.biozentrum.uni-wuerzburg.de/
Software: ProfDistS¹⁷	University of Würzburg	Download: http://profdist.bioapps.biozentrum.uni-wuerzburg.de/

References

1. Keller, A. Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees. Biology Direct. 5, 4 (2010).
2. Schultz, J., Maisel, S., Gerlach, D., Müller, T., Wolf, M. A common core of secondary structure of the internal transcribed spacer 2 (ITS2) throughout the Eukaryota. RNA. 11, 361-364 (2005).
3. Buchheim, M. Internal Transcribed Spacer 2 (nu ITS2 rRNA) Sequence-Structure Phylogenetics: Towards an Automated Reconstruction of the Green Algal Tree of Life. PLoS ONE. 6, e16931 (2011).
4. Merget, B., Wolf, M. A molecular phylogeny of Hypnales (Bryophyta) inferred from ITS2 sequence-structure data. BMC Research Notes. 3 (2010).
5. Schultz, J., Wolf, M. ITS2 sequence-structure analysis in phylogenetics: a how-to manual for molecular systematics. Molecular Phylogenetics and Evolution. 52, 520-523 (2009).
6. Coleman, A. ITS2 is a double-edged tool for eukaryote evolutionary comparisons. Trends in Genetics. 19, 370-375 (2003).
7. Coleman, A. The significance of a coincidence between evolutionary landmarks found in mating affinity and a DNA sequence. Protist. 151, 1-9 (2000).
8. Müller, T., Philippi, N., Dandekar, T., Schultz, J., Wolf, M. Distinguishing species. RNA. 13, 1469-1472 (2007).
9. Koetschan, C. The ITS2 Database III - sequences and structures for phylogeny. Nucleic Acids Research. 38, 275-279 (2010).
10. Keller, A. 5.8S-28S rRNA interaction and HMM-based ITS2 annotation. Gene. 430, 50-57 (2009).
11. Benson, D., Karsch-Mizrachi, I., Lipman, D., Ostell, J., Sayers, E. GenBank. Nucleic Acids Research. 39, 32-37 (2011).
12. Markham, N., Zuker, M. UNAFold: software for nucleic acid folding and hybridization. Methods in Molecular Biology. 453 (2008).
13. Wolf, M., Achtziger, M., Schultz, J., Dandekar, T., Müller, T. Homology modeling revealed more than 20,000 rRNA internal transcribed spacer 2 (ITS2) secondary structures. RNA. 11, 1616-1623 (2005).
14. Altschul, S. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 25, 3389-3402 (1997).
15. Seibel, P., Müller, T., Dandekar, T., Wolf, M. Synchronous visual analysis and editing of RNA sequence and secondary structure alignments using 4SALE. BMC Research Notes. 1 (2008).
16. Seibel, P., Müller, T., Dandekar, T., Schultz, J., Wolf, M. 4SALE - A tool for synchronous RNA sequence and secondary structure alignment and editing. BMC Bioinformatics. 7 (2006).
17. Wolf, M., Ruderisch, B., Dandekar, T., Schultz, J., Müller, T. ProfDistS: (profile-) distance based phylogeny on sequence-structure alignments. Bioinformatics. 24, 2401-2402 (2008).
18. Saitou, N., Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution. 4, 406-425 (1987).
19. Müller, T., Rahmann, S., Dandekar, T., Wolf, M. Accurate and robust phylogeny estimation based on profile distances: a study of the Chlorophyceae (Chlorophyta). BMC Evolutionary Biology. 4 (2004).
20. Ludwig, W. ARB: a software environment for sequence data. Nucleic Acids Research. 32, 1363-1371 (2004).
21. Néron, B. Mobyle: a new full web bioinformatics framework. Bioinformatics. 25, 3005-3011 (2009).
22. Camin, J. H., Sokal, R. R. A method for deducing branching sequences in phylogeny. Evolution. 19, 311-326 (1965).
23. Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution. 17, 368-376 (1981).
