Summary

A Bioinformatics Pipeline to Accurately and Efficiently Analyze the MicroRNA Transcriptomes in Plants

Published: January 21, 2020
doi:

Summary

A bioinformatics pipeline, namely miRDeep-P2 (miRDP2 for short), with updated plant miRNA criteria and an overhauled algorithm, could accurately and efficiently analyze microRNA transcriptomes in plants, especially for species with complex and large genomes.

Abstract

MicroRNAs (miRNAs) are 20- to 24-nucleotide (nt) endogenous small RNAs (sRNAs) extensively existing in plants and animals that play potent roles in regulating gene expression at the post-transcriptional level. Sequencing sRNA libraries by Next Generation Sequencing (NGS) methods has been widely employed to identify and analyze miRNA transcriptomes in the last decade, resulting in a rapid increase of miRNA discovery. However, two major challenges arise in plant miRNA annotation due to increasing depth of sequenced sRNA libraries as well as the size and complexity of plant genomes. First, many other types of sRNAs, in particular, short interfering RNAs (siRNAs) from sRNA libraries, are erroneously annotated as miRNAs by many computational tools. Second, it becomes an extremely time-consuming process for analyzing miRNA transcriptomes in plant species with large and complex genomes. To overcome these challenges, we recently upgraded miRDeep-P (a popular tool for miRNA transcriptome analyses) to miRDeep-P2 (miRDP2 for short) by employing a new filtering strategy, overhauling the scoring algorithm and incorporating newly updated plant miRNA annotation criteria. We tested miRDP2 against sequenced sRNA populations in five representative plants with increasing genomic complexity, including Arabidopsis, rice, tomato, maize and wheat. The results indicate that miRDP2 processed these tasks with very high efficiency. In addition, miRDP2 outperformed other prediction tools regarding sensitivity and accuracy. Taken together, our results demonstrate miRDP2 as a fast and accurate tool for analyzing plant miRNA transcriptomes, therefore a useful tool in helping the community better annotate miRNAs in plants.

Introduction

One of the most exciting discoveries in the last two decades in biology is the proliferating role of sRNA species in regulating diverse functions of the genome1. In particular, miRNAs constitute an important class of 20- to 24-nt sRNAs in eukaryotes, and mainly function at post-transcriptional level as prominent gene regulators throughout life cycle development stages as well as in stimulus and stress responses2,3. In plants, miRNAs arise from primary transcripts called pri-miRNAs, which are generally transcribed by RNA polymerase II as individual transcription units4,5. Processed by evolutionarily conserved cellular machinery (Drosha RNase III in animals, DICER-like in plants), pri-miRNAs are excised into the immediate miRNA precursors, pre-miRNAs, which contain sequences forming intra-molecular stem-loop structures6,7. Pre-miRNAs are then processed into double-stranded intermediates, namely miRNA duplexes, consisting of the functional strand, mature miRNA, and the less frequently functional partner, miRNA*2,8. After loaded into the RNA-induced silencing complex (RISC), the mature miRNAs could recognize their mRNA targets based on sequence complementarity, resulting in a negative regulatory function2,8. miRNAs could either destabilize their target transcripts or prevent target translation but the former manner is dominated in plants8,9.

Since the fortuitous discovery of the first miRNA in the nematode Caenorhabditis elegans10,11, much research has been committed to miRNA identification and its functional analysis, especially after the availability of NGS method. The wide application of the NGS method has greatly promoted the utilization of computational tools that were designed to capture the unique feature of miRNAs, such as the stem-loop structure of precursors and their preferential accumulation of sequence reads on mature miRNA and miRNA*. As a result, researchers have achieved remarkable success in identifying miRNAs in diverse species. Based on a previously described probability model12, we developed miRDeep-P13, which was the first computational tool for discovering plant miRNAs from NGS data. miRDeep-P was specifically aimed at conquering the challenges of decoding plant miRNAs featuring more variable precursor length and large paralogous families13,14,15. After its release, this program has been downloaded thousands of times and used to annotate miRNA transcriptomes in more than 40 plant species16. Propelled by NGS-based tools like miRDeep-P, there has been a dramatic increase in the number of registered miRNAs in the public miRNA repository miRBase17, where over 38,000 miRNA items are currently hosted (release 22.1) in comparison to only ~500 miRNA items (release 2.0) in 200818.

However, two new challenges have arisen from plant miRNA annotation. First, high ratios of false-positives have heavily impacted the quality of plant miRNA annotations16,19 for the following reasons: 1) a deluge of endogenous short interfering RNAs (siRNAs) from NGS sRNA libraries were erroneously annotated as miRNAs due to lacking of a stringent miRNA annotation criteria; 2) for species without a priori miRNA information, false-positives predicted based on NGS data are difficult to eliminate. Using miRBase as an example, Taylor et al.20 found one third of plant miRNA entries in the public repository21 (release 21) lacked convincing supporting evidence and even three-fourths of plant miRNA families were questionable. Second, it becomes an extremely time-consuming process for predicting plant miRNAs with large and complex genomes16. To overcome these challenges, we updated miRDeep-P by adding a new filtering strategy, overhauling the scoring algorithm and integrating new criteria for plant miRNA annotation, and released the new version miRDP2. In addition, we tested miRDP2 using NGS sRNA datasets with gradually increasing genome sizes: Arabidopsis, rice, tomato, maize and wheat. Compared to other five widely used tools and its old version, miRDP2 parsed these sRNA data and analyzed miRNA transcriptomes faster with improved accuracy and sensitivity.

Contents of the miRDP2 package
The miRDP2 package consists of six documented Perl scripts that should be run sequentially by the prepared bash script. Of the six scripts, three (convert_bowtie_to_blast.pl, filter_alignments.pl, and excise_candidate.pl) are inherited from miRDeep-P. The other scripts are modified from the original version. Functions of the six scripts are described in the following:

preprocess_reads.pl filters input reads, including reads that are too long or too short (<19 nt or >25 nt), and reads correlated with Rfam ncRNA sequences, as well as reads with RPM (Reads Per Million) less than 5. The script then retrieves reads correlated to known miRNA mature sequences. The input files are original reads in FASTA/FASTQ format and bowtie2 output of reads mapping to miRNA and ncRNA sequences.

The formula for calculating RPM is as the following:

Equation 1

convert_bowtie_to_blast.pl changes the bowtie format into BLAST-parsed format. BLAST-parsed format is a custom tabular separated format derived from standard NCBI BLASToutput format.

filter_alignments.pl filters the alignments of deep sequencing reads to a genome. It filters partial alignments as well as multi-aligned reads (user-specified frequency cutoff). The basic input is a file in BLAST-parsed format.

excise_candidate.pl cuts out potential precursor sequences from a reference sequence using aligned reads as guidelines. The basic input is a file in BLAST-parsed format and a FASTA file. The output is all potential precursor sequences in FASTA format.

mod-miRDP.pl needs two input files, signature file and structure file, which is modified from the core miRDeep-P algorithm by changing the scoring system with plant specific parameters. The input files are dot-bracket precursor structure file and reads distribution signature file.

mod-rm_redundant_meet_plant.pl needs three input files: chromosome_length, precursors and original_prediction generated by mod-miRDP.pl. It generates two output files, non-redundant predicted file and predicted file filtered by newly updated plant miRNA criteria. Details on the format of output file are described in section 1.4.

Protocol

1. Installation and testing

  1. Download required dependencies: Bowtie222 and RNAfold23. Compiled packages are recommended.
    1. Download Bowtie2, a read mapping tool, from its home site (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml).
    2. Download RNAfold, a tool of the Vienna package used to predict RNA secondary structure, from http://www.tbi.univie.ac.at/~ivo/RNA/.
    3. Before installing miRDP2, ensure that these two dependencies are correctly installed, and customize the bash environment file (e.g., .bashrc) to set a correct PATH for these two dependencies.
      NOTE: Other mapping tools such as Bowtie24 are also suitable to miRDP2; either Bowtie or Bowtie2 can be used after version 1.1.3.
  2. To download the miRDP2 package, go to https://sourceforge.net/projects/mirdp2/files/latest_version/ and fetch the tarball files.
  3. Before installing miRDP2, make sure that Perl is in the PATH. To install miRDP2, extract all contents of the downloaded tarball file into one folder (command lines as in 1.4.2), and then set the folder path into the PATH.
    NOTE: A computer or computing node with at least 8 GB RAM and 100 GB storage are recommended to run miRDP2.
  4. Test the MiRDP2 pipeline.
    1. To test whether miRDP2 has been correctly installed, use the test data and the expected output found in https://sourceforge.net/projects/mirdp2/files/TestData/. Test data contain one formatted GSM sequencing file and one Arabidopsis thaliana genome file.
    2. Move all downloaded files to the current working directory:
      mv miRDP2-v*.tar.gz TestData.tar.gz ncRNA_rfam.tar.gz <user_selected_folder>
      cd <user_selected_folder>
    3. Extract the compressed tarball files:
      tar –xvzf miRDP2-v*.tar.gz
      tar –xvzf TestData.tar.gz
      tar –xvzf ncRNA_rfam.tar.gz
    4. Build the Arabidopsis genome reference index:
      bowtie2-build -f ./TestData/TAIR10_genome.fa ./TestData/TAIR10_genome
    5. Build the ncRNA reference index:
      bowtie2-build -f ./ncRNA_rfam.fa ./1.1.3/script/index/rfam_index
    6. Run the miRDP2 pipeline:
      bash ./1.1.3/miRDP2-v1.1.3_pipeline.bash –g ./TestData/TAIR10_genome.fa -i ./ TestData/TAIR 10_genome –f ./TestData/GSM2094927.fa –o .
      NOTE: Linux commands used are in bold and italic fonts, with command line options in italics. *indicates the version of miRDP2 (the current version is 1.1.3). The bowtie2-build command should take roughly 10 minutes, and the miRDP2 pipeline should finish within several minutes
  5. Check testing outputs.
    1. Note that a folder named 'GSM2094927-15-0-10' is automatically generated in <user_selected_folder>, containing all intermediate files and results.
    2. Check that the tab-delimited output file GSM2094927-15-0-10_filter_P_prediction, the final output of predicted miRNAs, contains columns that indicate chromosome id, strand direction, representative reads id, precursor id, mature miRNA location, precursor location, mature sequence, and precursor sequence. Note the additional bed file derived from this file to facilitate further analysis.
    3. Check the file "progress_log", which provides information about finished steps, and the files "script_log" and "script_err", that contain program output and warnings.
      NOTE: Currently, we have tested miRDP2 on two Linux platforms, including CentOS release 6.5 on a cluster server, and Cygwin 2.6.0 on PC Windows system, and miRDP2 should work on similar systems that support Perl.

2. Identifying novel miRNAs

  1. Before running the pipeline, ensure that the input reads are preprocessed into proper format.
    NOTE: The new version 1.1.3 of miRDP2 can accept original FASTQ format files as inputs, although the process of formatting reads is carried out as in previous versions.
    1. First, remove adapters from the 5' and 3' ends of the deep sequencing reads (if present).
    2. Second, parse the deep sequencing reads into FASTA format.
    3. Third, remove redundancy such that reads with identical sequence are represented with a single and unique FASTA entry.
    4. Finally, ensure that all of the FASTA identifiers are unique. Each sequence identifier must end with a '_x' and an integer, indicating the copy number of the exact sequence that was retrieved in the deep sequencing datasets. One way to ensure unique FASTA identifier is to include a running number in the ID. For reference, see the file GSM2094927.fa in the test data (https://sourceforge.net/projects/mirdp2/files/TestData/).
    5. See the following for examples of correctly formatted reads:

      >read0_x29909
      TTTGGATTGAAGGGAGCTCTA
      >read1_x36974
      TTCCACAGCTTTCTTGAACTG
      >read2_x32635
      TTCCACAGCTTTCTTGAACTT
  2. Build reference indices.
    1. For the genome reference, to save time, download Bowtie2 index files from the iGenomes website (https://support.illumina.com/sequencing/sequencing_software/igenome.html) if the genome sequences of the species of interest have been indexed. Otherwise, users index reference sequences and keep the index file for a while till the project is finished since the genome sequence might need to be re-indexed. Details on how to index a genome reference are included in bowtie2 manual (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).
    2. Another non-miRNA ncRNA index is also needed to filter out noisy sequences from other non-coding RNA fragments. The file is a collection of main ncRNA sequences from Rfam, including rRNA, tRNA, snRNA, and snoRNA. To build this index, please refer to part 1.4, as the index should be placed and named correctly, i.e. <miRDP2_version>/script/index/rfam_index.
  3. Run miRDP2.
    1. To use miRDP2 to detect new miRNAs from deep sequencing data, run the bash script in the package to start the analysis pipeline (An example can be found in step 1.4):
      <path_to_miRDP2_folder>/miRDP2-v*.*_pipeline.bash –g <genome_file> -i <path_to_index/index_prefix> -f <seq_file > -o <output_folder>
      where * indicates the version of the pipeline bash script. There are three parameters that can be modified: 1) the number of different locations a read could be mapped to, 2) the mismatch number for running bowtie2, and 3) the threshold of RPM (Reads Per Million). Modify these using the –L, -M, and –R options, respectively. A detailed explanation is in section 3.1.
  4. Check the miRDP2 outputs.
    1. Note that the output folder will be automatically generated under <output_folder>, and named '<seq_file_name>-15-0-10'; the last 3 numbers indicate the values (default in this case) for parameters 1, 2, and 3, respectively. The file <seq_file_name>_filter_P_prediction contains information of the final predicted miRNAs satisfying the newly updated plant miRNA annotation criteria. Details on the format of output file are described in part 1.4.

3. Modifications and caution using miRDP2

  1. Parameters that can be modified
    1. Use the '-L' option to set the limit of how many locations a read could be mapped to (parameter 1). Read mapping to too many sites are possibly associated with repeat sequences, and are not likely to miRNAs. The default setting is 15. For specific species, if there are miRNA families with many members, the first parameter may be increased manually to adapt to the genome landscape.
    2. Use the '-M' option to set the allowed mismatches for bowtie (parameter 2). The default setting is 0.
    3. Use the '-R' option to set the threshold for reads potentially corresponding to mature miRNAs (parameter 3). To reduce time consumption and false-positives, filter reads by RPM. Only reads exceeding a certain RPM threshold may represent mature sequences of miRNAs rather than background noise, and would be kept for further analysis. The default setting is 10 RPM.
    4. Note that changing these parameters can potentially affect performance and time consumption. In general, an increase of parameter 1 and 2 and a decrease of parameter 3 would generate a less stringent result and longer running time and vice versa.
  2. Redundancy and miRNA*
    1. Note that the output miRNAs from miRDP2 may differ from the known miRNAs. We found that this is mainly due to one of two reasons: heterogeneity of the mature miRNAs or the relative abundance of miRNA and miRNA*. We found that this does not impact the optimal length selection of precursors and the profiling of known miRNA genes.

Representative Results

The miRNA annotation pipeline, miRDP2, described herein is applied to 10 public sRNA-seq libraries from 5 plant species with gradually increased genome length, including Arabidopsis thaliana, Oryza sativa (rice), Solanum lycopersicum (tomato), Zea mays (maize) and Triticum aestivum (wheat) (Figure 1A). Overall, for each species, 2 representative sRNA libraries from different tissues (collapsed into unique reads, details in the protocol section) and their indexed genome sequences are processed as two inputs (Table 1). Five miRNA computational prediction tools (miRDeep-P13, miRPlant25, miR-PREFeR26, miRA27, miReNA28) were selected to make the comparison.

Running time test
To compare the runtime and performance of miRDP2 and other five tools, we installed five tools (miRDP2, miRDeep-P, miR-PREFeR, miRA, and miReNA) in a cluster server with Cent OS release 6.5 system. These programs were run with the same input files, hardware and resource (details in Supplementary File 1). Especially, miRPlant is controlled from a GUI written in Java and was not able to run on the server. Instead, we tested miRPlant on a PC with Windows 10 while we have also tested miRDP2 and miRDeep-P on this PC (details in Supplementary File 1).

For small genome species as Arabidopsis thaliana, Oryza sativa, and Solanum lycopersium, all the programs ran properly. However, for large genomes species such as Zea mays and Triticum aestivum (including Solanum lycopersium for miRA), some of the programs depleted all computing resources and broke down halfway. For instance, miReNA, miRA, and miR-PREFeR failed to generate results, probably due to memory deficiency while dealing with large sam files or intermediate files. In particular, miRPlant temporary files consumed too much space, and the result was not able to run on the PC when dealing with large genome species. miRDP2 finished these prediction processes in a very short time, from minutes to hours (Figure 1B). Thus, compared to its old version and other tools, the running time of miRDP2 was markedly shortened.

Sensitivity and accuracy test
Since miRNAs in Arabidopsis are intensively studied, we made use of known miRNAs in Arabidopsis in miRBase21 (release 22.1) to evaluate miRDP2, and made the comparison with other tools. As previously reported19,26, the following formulas are employed to calculate sensitivity and accuracy:

Equation 2

Equation 3

Known miRNAs are those annotated in miRBase. A miRNA is designated as expressed if the mature sequences have more than 5 RPM, and ≥75% reads on the precursor mapped to mature and star miRNA sequences. Two sequenced sRNA libraries from Arabidopsis (Table 1) were used to make the test. miRDP2 (Figure 1C,D) performed better in both sensitivity and accuracy compared to other tools.

Taken together, these results demonstrate that miRDP2 is a fast and accurate tool for analyzing the miRNA transcriptome in plants.

Figure 1
Figure 1: Performance of miRDP2. (A) Genome size (in Gb) of Arabidopsis thaliana (Ath), Oryza sativa (Osa), Solanum lycopersicum (Sly), Zea mays (Zma), Triticum aestivum (Tae). (B-D) Comparison of runtime, sensitivity and accuracy of miRDP2 and other five tools. Two dots corresponding to each tool indicate two tests were made by each tool. This figure has been adapted from Kuang et al.16. Please click here to view a larger version of this figure.

Species (abb.) Genome version sRNA libraries
Library ID File size Total reads Unique reads Tissue
Arabidopsis thaliana (Ath) version 10 GSM2094927 24.9 Mb 40.5M 9.7M Adult leaf
GSM2412287 29.5 Mb 45.1M 11.1M Leaf
Oryza sativa (Osa) version 7 GSM2883136 44.2 Mb 54.9M 16.3M Seedling
GSM3030848 34.7 Mb 49.1M 13.0M Flagleaf
Solanum lycopersicum (Sly) version 3 GSM1213985 205.4 Mb 161.5M 58.0M Leaf
GSM1976413 118.5 Mb 139.3M 46.2M Root
Zea mays (Zma) version 4 GSM1277437 158.4 Mb 266.1M 60.5M Seedling
GSM1428531 144.1 Mb 172.5M 56.3M Seed
Triticum aestivum (Tae) iwgsc 1 GSM1294660 76.1 Mb 59.2M 29.6M Shoot
GSM1294661 113.6 Mb 84.0M 44.0M Leaf

Table 1: Genomes and sRNA libraries used for testing miRDP2 and other tools. This table has been adapted from Kuang et al.16.

Supplementary File 1: Comparison of runtime, sensitivity and accuracy of miRDP2 and other five tools. Please click here to download this file.

Supplementary File 2: Examples of authentic miRNAs with bifurcate structure in loops. Please click here to download this file.

Supplementary File 3: Updated criteria for plant miRNA annotation and criteria for 23-nt and 24-nt miRNAs. Please click here to download this file.

Supplementary File 4: Diagram of the workflow of miRDP2. Please click here to download this file.

Discussion

With the advent of NGS, a large number of miRNA loci have been identified from an ever-increasing amount of sRNA sequencing data in diverse species29,30. In the centralized community database miRBase21, the deposited miRNA items have increased almost 100 times in the last decade. However, in comparison to miRNAs in animals, plant miRNAs have many unique features that make the identification/annotation more complicated13,14.

First, the precursors of plant miRNAs are more variable in length and structure (Supplementary File 2)16. Not like the relatively uniform length of animal miRNA precursors around 70-90 nt, the length of plant precursors vary by several folds and could reach several hundred nts13,31. This difference introduces a lot of uncertainty when predicting the secondary structure of miRNA precursors even though a cutoff of precursor length is usually set arbitrarily such as not exceeding 300 nt19 (this parameter was embedded in miRDP2, and experienced users of miRDP2 could adjust this by themselves). In addition, conserved plant miRNA families tend to have more members, and the length variation of these members is also often significant13. This is the reason why miRDP2 has the parameter –L, which indicates the potential largest miRNA families in member size. Together, the heterogeneity of plant miRNA precursors raises many difficulties for their accurate annotation.

Second, the noise or false-positives introduced by siRNAs is hard to eliminate. Alongside miRNAs, NGS methods also produce a deluge of siRNAs in the sequenced sRNA libraries. Even though siRNAs could be separated from miRNAs by their biogenesis and functions32,33, it is extremely difficult to distinguish them based on sequencing data and mining tools. The public databases such as miRBase, argued by many researchers, have deteriorated sharply by the large number of false-positives siRNAs, which are erroneously annotated as miRNAs20,31. Thus, refined tools with a new and strict set of criteria for plant miRNA annotation like the newly updated criteria25 (Supplementary File 3) are highly desired in the miRNA annotation pipeline/process.

Last but not least, the computational time for parsing sRNA libraries has increased exponentially when the same method is transplanted from a small size genome species to a large size one. The computational tools such as miRDeep-P13 and miR-PREFeR26, by capturing and quantifying the signature distribution of sRNA reads along miRNA precursors, have become two popular methods and are widely used to annotate miRNAs. The mapping strategy, the process of excising precursor candidates and subsequent secondary structure prediction demand considerable computing time16. When these tools are employed to parse the data from small size genomes like Arabidopsis to large ones like maize, the data processing time is increased from hours to days even weeks (Figure 1B), resulting in frequent collapse of the process. An innovation on the foregoing limitations is thus urgently in need.

Our new miRDP216 program, updated from miRDeep-P13, is designed to overcome the challenges mentioned above (Supplementary File 4). In this program, we employed a new filtering strategy, optimized the scoring algorithm, and incorporated newly updated plant miRNA annotation criteria. As a result of these new features, the running time was markedly shortened when tested using ten sRNA libraries from five plant species with increasing genome size. Additionally, compared to other tools, miRDP2 displayed superior performance in both sensitivity and accuracy (Figure 1). Taken together, these results demonstrate that miRDP2 is a fast and accurate tool for analyzing the miRNA transcriptomes in plants.

It should be cautioned that the current understanding on miRNA characteristics might limit the performance of any computational tools. Even the newly updated miRNA annotation criteria are based on a limited set of well-studied examples. The deduced information is thus only empirical. In fact, unique features of miRNAs have been shown to exist in different plant species or lineages3. In addition, characteristics such as the structures of upstream and downstream regions of the miRNA/miRNA* duplex also play critical roles in miRNA biogenesis34,35, which are not taken into account in current annotation tools. With the accumulation of well-studied examples in more plant species, it is likely that even more advanced annotation tools are developed in the future that can capture more subtle distinctions and classify miRNAs with a higher degree of accuracy than current methods. A promising new miRNA annotation direction is to incorporate machine learning approaches36 as the quality of training datasets and annotation criteria continually evolve.

Declarações

The authors have nothing to disclose.

Acknowledgements

This work has been supported by Beijing Academy of Agriculture and Forestry Sciences (KJCX201917, KJCX20180425, and KJCX20180204) to XY and National Natural Science Foundation of China (31621001) to LL.

Materials

Computer/computing node N/A N/A Perl is required; at least 8 GB RAM and 100 GB storage are recommended

Referências

  1. Ghildiyal, M., Zamore, P. D. Small silencing RNAs: an expanding universe. Nature Reviews Genetics. 10 (2), 94-108 (2009).
  2. Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell. 136 (2), 215-233 (2009).
  3. Moran, Y., Agron, M., Praher, D., Technau, U. The evolutionary origin of plant and animal microRNAs. Nature Ecology Evolution. 1 (3), 27 (2017).
  4. Xie, Z., et al. Expression of Arabidopsis MIRNA genes. Plant Physiology. 138 (4), 2145-2154 (2005).
  5. Zhao, X., Zhang, H., Li, L. Identification and analysis of the proximal promoters of microRNA genes in Arabidopsis. Genomics. 101 (3), 187-194 (2013).
  6. Bologna, N. G., Mateos, J. L., Bresso, E. G., Palatnik, J. F. A loop-to-base processing mechanism underlies the biogenesis of plant microRNAs miR319 and miR159. EMBO JOURNAL. 28 (23), 3646-3656 (2009).
  7. Rogers, K., Chen, X. Biogenesis, turnover, and mode of action of plant microRNAs. Plant Cell. 25 (7), 2383-2399 (2013).
  8. Voinnet, O. Origin, biogenesis, and activity of plant microRNAs. Cell. 136 (4), 669-687 (2009).
  9. Iwakawa, H. O., Tomari, Y. The Functions of MicroRNAs: mRNA Decay and Translational Repression. Trends in Cell Biology. 25 (11), 651-665 (2015).
  10. Lee, R. C., Feinbaum, R. L., Ambros, V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 75 (5), 843-854 (1993).
  11. Wightman, B., Ha, I., Ruvkun, G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell. 75 (5), 855-862 (1993).
  12. Friedlander, M. R., et al. Discovering microRNAs from deep sequencing data using miRDeep. Nature Biotechnology. 26 (4), 407-415 (2008).
  13. Yang, X., Li, L. miRDeep-P: a computational tool for analyzing the microRNA transcriptome in plants. Bioinformatics. 27 (18), 2614-2615 (2011).
  14. Meyers, B. C., et al. Criteria for annotation of plant MicroRNAs. Plant Cell. 20 (12), 3186-3190 (2008).
  15. Yang, X., Zhang, H., Li, L. Global analysis of gene-level microRNA expression in Arabidopsis using deep sequencing data. Genomics. 98 (1), 40-46 (2011).
  16. Kuang, Z., Wang, Y., Li, L., Yang, X. miRDeep-P2: accurate and fast analysis of the microRNA transcriptome in plants. Bioinformatics. , (2018).
  17. Kozomara, A., Birgaoanu, M., Griffiths-Jones, S. miRBase: from microRNA sequences to function. Nucleic Acids Research. 47 (1), 155-162 (2019).
  18. Griffiths-Jones, S., Saini, H. K., van Dongen, S., Enright, A. J. miRBase: tools for microRNA genomics. Nucleic Acids Research. 36, 154-158 (2008).
  19. Axtell, M. J., Meyers, B. C. Revisiting Criteria for Plant MicroRNA Annotation in the Era of Big Data. Plant Cell. 30 (2), 272-284 (2018).
  20. Taylor, R. S., Tarver, J. E., Hiscock, S. J., Donoghue, P. C. Evolutionary history of plant microRNAs. Trends in Plant Science. 19 (3), 175-182 (2014).
  21. Kozomara, A., Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Research. 42, 68-73 (2014).
  22. Langmead, B., Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods. 9 (4), 357-359 (2012).
  23. Lorenz, R., et al. ViennaRNA Package 2.0. Algorithms for Molecular Biology. 6, 26 (2011).
  24. Langmead, B., Trapnell, C., Pop, M., Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 10 (3), 25 (2009).
  25. An, J., Lai, J., Sajjanhar, A., Lehman, M. L., Nelson, C. C. miRPlant: an integrated tool for identification of plant miRNA from RNA sequencing data. BMC Bioinformatics. 15, 275 (2014).
  26. Lei, J., Sun, Y. miR-PREFeR: an accurate, fast and easy-to-use plant miRNA prediction tool using small RNA-Seq data. Bioinformatics. 30 (19), 2837-2839 (2014).
  27. Evers, M., Huttner, M., Dueck, A., Meister, G., Engelmann, J. C. miRA: adaptable novel miRNA identification in plants using small RNA sequencing data. BMC Bioinformatics. 16, 370 (2015).
  28. Mathelier, A., Carbone, A. MIReNA: finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data. Bioinformatics. 26 (18), 2226-2234 (2010).
  29. Zhu, Q. H., et al. A diverse set of microRNAs and microRNA-like small RNAs in developing rice grains. Genome Research. 18 (9), 1456-1465 (2008).
  30. Fahlgren, N., et al. MicroRNA gene evolution in Arabidopsis lyrata and Arabidopsis thaliana. Plant Cell. 22 (4), 1074-1089 (2010).
  31. Fromm, B., et al. A Uniform System for the Annotation of Vertebrate microRNA Genes and the Evolution of the Human microRNAome. Annual Review of Genetics. 49, 213-242 (2015).
  32. Blevins, T., et al. Identification of Pol IV and RDR2-dependent precursors of 24 nt siRNAs guiding de novo DNA methylation in Arabidopsis. Elife. 4, 09591 (2015).
  33. Zhai, J., et al. A One Precursor One siRNA Model for Pol IV-Dependent siRNA Biogenesis. Cell. 163 (2), 445-455 (2015).
  34. Werner, S., Wollmann, H., Schneeberger, K., Weigel, D. Structure determinants for accurate processing of miR172a in Arabidopsis thaliana. Current Biology. 20 (1), 42-48 (2010).
  35. Mateos, J. L., Bologna, N. G., Chorostecki, U., Palatnik, J. F. Identification of microRNA processing determinants by random mutagenesis of Arabidopsis MIR172a precursor. Current Biology. 20 (1), 49-54 (2010).
  36. Vitsios, D. M., et al. Mirnovo: genome-free prediction of microRNAs from small RNA sequencing data and single-cells using decision forests. Nucleic Acids Research. 45 (21), 177 (2017).

Play Video

Citar este artigo
Wang, Y., Kuang, Z., Li, L., Yang, X. A Bioinformatics Pipeline to Accurately and Efficiently Analyze the MicroRNA Transcriptomes in Plants. J. Vis. Exp. (155), e59864, doi:10.3791/59864 (2020).

View Video