The ITS2 Database is a workbench for phylogenetic inference simultaneously considering sequence and secondary structure of the internal transcribed spacer 2. This includes data collection with accurate annotation, structure prediction, multiple sequence-structure alignment and fast tree calculation. In a nutshell, this workbench simplifies first phylogenetic analyses to a few clicks.
The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. Because ITS2 research mainly focused on the highly variable ITS2 sequence, the marker was long confined to low-level phylogenetics. However, combining the ITS2 sequence with its highly conserved secondary structure improves the phylogenetic resolution1 and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation2-8.
The ITS2 Database9 presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank11 that have been accurately reannotated10. Following annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum-energy-based fold12 (direct fold) results in a correct four-helix conformation. If this is not the case, the structure is predicted by homology modeling13. In homology modeling, an already known secondary structure is transferred to another ITS2 sequence that could not be folded correctly by the direct fold.
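The four-helix test described above can be illustrated on a structure written in dot-bracket notation, the usual output of minimum-energy folding programs. The following is a minimal sketch, not the database's own acceptance test: it only counts helices that branch directly off the outermost loop, and the function name and the toy structure are assumptions made for illustration.

```python
def count_top_level_helices(dot_bracket: str) -> int:
    """Count helices that open directly from the outermost loop.

    A helix is counted each time a bracket opens at nesting depth 0.
    Raises ValueError if the string is not balanced dot-bracket notation.
    """
    depth = 0
    helices = 0
    for symbol in dot_bracket:
        if symbol == "(":
            if depth == 0:
                helices += 1  # a new helix starts at the central loop
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without partner")
        elif symbol != ".":
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if depth != 0:
        raise ValueError("opening bracket without partner")
    return helices


# Toy structure (hypothetical, far shorter than a real ITS2): four hairpins.
toy = "((...)).(((....))).((((...))))..((...))"
has_four_helices = count_top_level_helices(toy) == 4
```

A structure that fails such a check would, in the workflow described here, be passed on to homology modeling instead of being accepted as a direct fold.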
The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST14 search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE15,16 and ProfDistS17 for multiple sequence-structure alignment calculation and Neighbor Joining18 tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure.
In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.
1. Correct Annotation of ITS2 Sequence
2. Secondary Structure Prediction
3. Motif Search
4. Search and Browse
5. ITS2 Blast
6. Multiple Sequence-structure Alignment
7. Phylogenetic Tree
8. Additional Software
9. Representative Results
The workflow described above has been successfully applied in several open access surveys3,4.
In these large-scale studies, we were able to resolve the phylogeny of Chlorophyta as well as Hypnales (Bryophyta) with high resolution. In both cases, an exhaustive taxon sampling was gathered from the ITS2 Database9, automatically aligned with 4SALE15,16 and finally processed by ProfDistS17 into a phylogenetic tree. In all these steps, sequence and structure information were used simultaneously. Bootstrap support for the phylogenetic backbone was obtained using Profile Neighbor Joining (PNJ)19, which is available in the stand-alone version of ProfDistS.
For a smaller set of sequence-structure pairs, Figures 1 to 3 show the key steps of this automated workflow5 directly on the new ITS2 Database workbench: taxon sampling, multiple sequence-structure alignment and, finally, phylogenetic tree calculation.
Figure 1. Taxon sampling by drag and drop. Sequences or sequence-structure pairs can be added to the data pool at any time, for instance via drag and drop. Here a sequence-structure pair is added by drag and drop after secondary structure prediction. The blue ellipse marks the area where the sequence-structure pair is dropped into the data pool.
Figure 2. Multiple sequence-structure alignment in full graphic mode. For the few sequences in the data pool, the full graphic mode was chosen. Bases are colored; base pairs can be highlighted with red circles by clicking on one base or bracket of a base pair.
Figure 3. Sequence-structure Neighbor Joining tree. The freely scalable tree, calculated from a multiple sequence-structure alignment of seven taxa, can be saved in the Newick format.
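As a brief illustration of the Newick format mentioned for Figure 3, the sketch below serializes a small seven-taxon topology into a Newick string. The taxon labels A to G and the nested-tuple representation are assumptions made for this example; they are not taken from the figure, and branch lengths are omitted.

```python
def to_newick(tree) -> str:
    """Serialize a nested-tuple tree to a Newick string (topology only)."""

    def render(node) -> str:
        if isinstance(node, str):
            return node  # a leaf is written as its taxon label
        # an inner node is a parenthesized, comma-separated list of children
        return "(" + ",".join(render(child) for child in node) + ")"

    return render(tree) + ";"  # every Newick tree ends with a semicolon


# Hypothetical seven-taxon topology.
example_tree = (("A", "B"), ("C", ("D", "E")), ("F", "G"))
newick = to_newick(example_tree)  # "((A,B),(C,(D,E)),(F,G));"
```

A file containing such a string can be opened by most tree viewers, which is why the format is convenient for passing trees between tools.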
The ITS2 Database is a complete and fully functional workbench for internal transcribed spacer 2 sequence-structure-based phylogenetics. The website can be operated quickly and intuitively. While other web-based phylogeny workbenches like ARB20 or Mobyle21 are only able to work on sequence and/or consensus structure information, the ITS2 Database9 considers sequences and individual secondary structures for each taxon simultaneously. However, because the computational capacity of the web server is limited, it is highly recommended to use the stand-alone tools for large datasets: 4SALE15,16 for multiple alignment and ProfDistS17 for Neighbor Joining18 calculation. Besides the basic ITS2 sequence-structure phylogeny workflow5, these tools feature several additional functions, such as calculating bootstrap replicates, Profile Neighbor Joining (PNJ)19 or species delimitation based on compensatory base changes (CBCs)8. They are available for download, together with detailed information, through the “About this website”-“Tools” section. Both 4SALE and ProfDistS require input files in the correct format. A taxon sampling to be processed by 4SALE must have the ending .fasta or .txt, whereas the sequence-structure alignment used as input for ProfDistS must end with .xfasta.
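The file-name requirements stated above can be checked before a file is handed to either stand-alone tool. The following is a minimal sketch under stated assumptions: the function name, the dictionary and the example file names are hypothetical, only the file ending is inspected (not the file content), and the comparison is case-sensitive because the text does not say how the tools treat upper-case endings.

```python
from pathlib import Path

# File endings named in the text; the dictionary keys are labels for this sketch.
ACCEPTED_ENDINGS = {
    "4SALE": {".fasta", ".txt"},
    "ProfDistS": {".xfasta"},
}


def has_accepted_ending(filename: str, tool: str) -> bool:
    """Return True if the file name carries an ending accepted by the tool."""
    return Path(filename).suffix in ACCEPTED_ENDINGS[tool]


# Hypothetical file names.
sampling_ok = has_accepted_ending("taxon_sampling.fasta", "4SALE")
alignment_ok = has_accepted_ending("alignment.xfasta", "ProfDistS")
wrong_ending = has_accepted_ending("alignment.fasta", "ProfDistS")
```

In this sketch `sampling_ok` and `alignment_ok` are true, while `wrong_ending` is false because a .fasta ending does not match the ending expected for ProfDistS input.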
We are currently implementing alternative methods for phylogenetic tree reconstruction in the ITS2 Database as well as in the related tools. Thus, methods like sequence-structure-based Maximum Parsimony22 and/or Maximum Likelihood23 will become accessible in the future.
The authors have nothing to disclose.
We cordially thank the ITS2 group, Biocenter, University of Würzburg, for rich and valuable feedback. We also thank the Deutsche Forschungsgemeinschaft (DFG; grant Mu-2831/1-1) for funding.
Name of the reagent | Company | Comments |
Internet access | | Preferably high-speed |
ITS2 Database9 | Department of Bioinformatics, University of Würzburg | Website: http://its2.bioapps.biozentrum.uni-wuerzburg.de |
Software: 4SALE15,16 | Department of Bioinformatics, University of Würzburg | Download: http://4sale.bioapps.biozentrum.uni-wuerzburg.de/ |
Software: ProfDistS17 | Department of Bioinformatics, University of Würzburg | Download: http://profdist.bioapps.biozentrum.uni-wuerzburg.de/ |