Combining genomics, gene co-expression analysis and the identification of target compounds via metabolomics enables gene functional annotation.
Given the ever-expanding number of model plant species for which complete genome sequences are available, and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a growing demand for gene functional annotation. This protocol describes the annotation of plant gene function by combining gene co-expression analysis, metabolomics and informatics (Figure 1). The approach uses target genes of known function to identify non-annotated genes likely to be involved in a given metabolic process, together with the identification of target compounds via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although neither is effortless. As a corollary, the approach can also be used to characterise unknown peaks representing new or specific secondary metabolites that occur only in particular tissues, plant species or stress treatments, which is currently an important challenge in understanding plant metabolism.
1. Sample Preparation
2. Extraction for Metabolite Profiling
3. Metabolite Profiling by LC-MS
4. Data Analysis
5. Prediction of Metabolic Pathway
6. Preparation of Gene List with Arabidopsis Orthologous Gene ID
7. Co-expressed Gene Analysis
8. Integration of All Information to Predict New Pathways
9. Experiments for Gene Identification Using Bio-resources
10. Representative Results
The integrated analysis described in this protocol can be implemented in many ways, depending on the experimental purpose and the choice of biological and analytical combinations. Procedures and experimental design should be chosen on the basis of the target pathway, compounds and plant species. The integration strategy described here focuses on the annotation of plant gene function and the discovery of novel gene functions through efficient use of several bio- and data-resources. The expected outcome can only be obtained when the prediction is conclusive. This means that if the combined profiles do not provide enough evidence, the experiment should not be started. For this reason, additional preliminary experiments, such as targeted gene expression profiling by RT-PCR, can in any case support the prediction of gene function. The accuracy of a prediction improves with the qualitative diversity and the number of profiles that are combined. In addition, good candidates and valid outcomes can only come from accurate prediction of pathways. Peak annotation should be conducted by a combination of several approaches, for example literature survey, reference plant extracts, MSn analysis, organ specificity and mutant analysis 13.
Figure 1. Overview of the experimental flow of gene annotation via the combined approach. In some cases, projects start with the discovery of a novel peak that is detected under special conditions or in particular tissues, and the desire to understand its role in metabolism. In other instances the purpose of the project is gene identification or the discovery of key regulatory factors such as transcription factors. The experiment should be planned around a data set that shows clear differences in metabolite levels in the target pathway, obtained by subjecting a wide range of tissue samples from different organs, from differentially grown plants or from plants exposed to stress conditions to metabolite profiling. Mutant and transgenic plants as well as QTL-harbouring breeding material also represent suitable genetic material for these studies. Prediction of a novel pathway should be performed carefully, with accurate peak annotation and a combined approach using different types of metabotype, such as organ specificities and stress responses, in accordance with gene expression data for the pathway of interest. In the last step, metabolite and transcript profiling should be performed which, when combined with in silico analysis of web resources and in vitro characterisation of gene expression via heterologous expression, will eventually lead to confirmation of the gene candidate and elucidation of its function and position within a metabolic pathway. Abbreviations: QTL, Quantitative Trait Loci.
Figure 2. Workflow of the combinatorial approach for peak annotation. A procedure for peak identification and annotation using standard compounds, comparison of wild type and knock-out mutants, and multi-dimensional mass spectrometry of the target peak with reference to mass spectra of pure compounds from databases 12. Abbreviations: DB, database; KO, knock-out; 1-D, one-dimensional; 2-D, two-dimensional; NMR, nuclear magnetic resonance; IR, infra-red; MSn, multi-stage mass spectrometry.
Figure 3. Example co-regulation network analysis of the anthocyanin pathway. Co-expression analyses were performed using PRIMe (http://prime.psc.riken.jp/?action=coexpression_index), based on the ATTED-II version 3 data set 8,2, together with the Pajek program (http://vlado.fmf.uni-lj.si/pub/networks/pajek/). Positive correlations (r>0.5) were used to make network connections. Red nodes: twelve anthocyanin enzymatic genes (At5g13930, CHS, TT4, chalcone synthase; At3g55120, CHI, TT5, chalcone isomerase; At3g51240, F3H, TT6, flavanone 3-hydroxylase; At5g07990, F3′H, TT7, flavonoid 3′-hydroxylase; At5g17050, Fd3GT, UGT78D2, flavonoid 3-O-glucosyltransferase; At5g17220, AtGSTF12, TT19; At5g42800, DFR, TT3, dihydroflavonol reductase; At4g22880, ANS/LDOX, TT18, anthocyanidin synthase; At4g14090, A5GT, anthocyanin 5-O-glucosyltransferase; At5g54060, A3G2″XT, putative anthocyanin 3-O-glucoside 2″-O-xylosyltransferase; At3g29590, A5GMaT, anthocyanin 5-O-glucoside 6‴-O-malonyltransferase; At1g03940, A3GCouT, anthocyanin 3-O-glucoside 6″-O-p-coumaroyltransferase) and two transcription factors for anthocyanin production (At1g56650, PAP1; At1g66390, PAP2), which were used as queries to search for candidate genes. Candidate genes were found by an “intersection of sets” search across all fourteen queried anthocyanin genes with a correlation coefficient threshold of r>0.50. A co-expression network including the correlated candidate genes (68 genes) and the queried genes (14 genes) was reconstructed by an “interconnection of sets” search with r>0.50 using the PRIMe database. The output files from the PRIMe database were in ‘.net’ format, and networks were drawn using Pajek software. Blue nodes: candidate genes correlated with the anthocyanin genes.
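The “intersection of sets” search and the ‘.net’ output described in the Figure 3 legend can be illustrated with a minimal Python sketch. This is not the PRIMe implementation: the function names (`pearson`, `intersection_search`, `to_pajek`), the expression values and the two candidate gene labels are hypothetical and chosen only for illustration, and Pearson correlation is assumed as the co-expression measure. Only the two query gene IDs (CHS and DFR) are taken from the legend.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def intersection_search(expr, queries, threshold=0.5):
    """Return non-query genes whose r exceeds the threshold for EVERY query gene."""
    hits = []
    for gene, profile in expr.items():
        if gene in queries:
            continue
        if all(pearson(profile, expr[q]) > threshold for q in queries):
            hits.append(gene)
    return sorted(hits)


def to_pajek(expr, genes, threshold=0.5):
    """Build Pajek .net text with one edge per gene pair with r > threshold."""
    genes = list(genes)
    lines = [f"*Vertices {len(genes)}"]
    lines += [f'{i} "{g}"' for i, g in enumerate(genes, start=1)]
    lines.append("*Edges")
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r = pearson(expr[genes[i]], expr[genes[j]])
            if r > threshold:
                lines.append(f"{i + 1} {j + 1} {r:.2f}")
    return "\n".join(lines) + "\n"


# Toy expression profiles over five hypothetical samples (invented values).
expr = {
    "At5g13930": [1.0, 2.0, 3.0, 4.0, 5.0],   # CHS (query)
    "At5g42800": [1.1, 2.2, 2.9, 4.1, 5.3],   # DFR (query)
    "candidateA": [0.9, 2.1, 3.2, 3.9, 4.8],  # hypothetical co-expressed gene
    "candidateB": [5.0, 4.0, 3.0, 2.0, 1.0],  # hypothetical anti-correlated gene
}
queries = ["At5g13930", "At5g42800"]
candidates = intersection_search(expr, queries)
network = to_pajek(expr, queries + candidates)
print(candidates)
print(network)
```

Requiring the threshold to hold for every query gene, rather than for any one of them, keeps the candidate list short and biased towards genes that follow the whole pathway; the resulting text can be saved with a ‘.net’ extension and opened in Pajek.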
Species | Major secondary metabolites |
Arabidopsis thaliana | Glucosinolate, flavonol, anthocyanin, sinapoyl derivative |
Populus trichocarpa | Flavonol, anthocyanin, salicylate derivative |
Vitis vinifera | Flavonol, anthocyanin, tannin, stilbene |
Solanum lycopersicum | Flavonol, anthocyanin, glycoalkaloid, chlorogenate related |
Nicotiana tabacum | Flavonol, anthocyanin, nicotianamide, chlorogenate related, acylsugar |
Oryza sativa | Glycoflavone, anthocyanin, sterol derivatives |
Zea mays | Glycoflavone, anthocyanin, benzoxazinone, sterol derivatives |
Medicago truncatula | Isoflavone, anthocyanin, saponin |
Lotus japonicus | Isoflavone, flavonol, anthocyanin, saponin |
Table I. Major secondary metabolites in model plant species.
Co-expression database | Address |
Plant cross species | |
COP | http://webs2.kazusa.or.jp/kagiana/cop0911/ |
PlaNet | http://aranet.mpimp-golm.mpg.de/ |
Plant species | |
ATTED-II | http://atted.jp/ |
BAR | http://142.150.214.117/welcome.htm |
COP | http://webs2.kazusa.or.jp/kagiana/cop |
GeneCAT | http://genecat.mpg.de/ |
Arabidopsis | |
ACT | http://www.arabidopsis.leeds.ac.uk/act/coexpanalyser |
AthCoR@CSB.DB | http://csbdb.mpimp-golm.mpg.de/csbdb/dbcor/ath.html |
CressExpress | http://cressexpress.org/ |
PRIMe | http://prime.psc.riken.jp/?action=coexpression_index |
Oryza sativa | |
RiceArrayNet | http://arraynet.mju.ac.kr/arraynet/ |
Rice Array Database | http://www.ricearray.org/coexpression/coexpression.shtml |
Table II. Available gene expression databases for in silico co-expression analysis.
Given that transcriptomics and metabolomics technologies have been in use for several years, the process of data integration for metabolomics-assisted gene annotation generally begins with the identification of a novel peak representing an unknown metabolite. The next stage is to evaluate quantitative variance in the metabolite peaks or in the novel candidate genes thought to be responsible for their biosynthesis. The strategy described in this protocol, however, has three major problems: i) the difficulty of peak annotation, ii) the complexity of pathway prediction, and iii) the resolution of gene information and the quality of gene expression data. To counter the first problem, peak annotation should be carried out by co-elution with standard compounds or by a combinatorial approach utilizing information from MSn analysis, reference extracts, mutant analysis, metabolite database searches and literature survey (Figure 2, 12). Regarding the second problem, a reliable pathway prediction can only be obtained from correct peak annotation. However, metabolite profiling of tissue specificity can also support peak annotation, because metabolite accumulation should be correlated with the expression of the related genes. Combined profiles of different tissues and growth conditions can therefore help with this second problem. The third problem, concerning the resolution of gene information, depends on the progress of sequence data. For a model plant whose genome sequence is not yet complete, co-expression analysis using orthologous genes in other model plants is useful. Detailed alignment comparison and phylogenetic tree analysis of amino acid sequences can help to connect model organisms to other species.
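The expectation that metabolite accumulation correlates with the expression of the related genes across tissues can be checked with a short Python sketch such as the one below. It is an illustration under stated assumptions, not part of the protocol: the tissue list, peak areas, gene labels (`geneX`, `geneY`, `geneZ`) and the function `rank_genes_by_metabolite` are all hypothetical, and Pearson correlation is assumed as the similarity measure.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def rank_genes_by_metabolite(peak, expression):
    """Sort genes by how well their expression tracks the metabolite peak."""
    scored = [(gene, pearson(peak, profile)) for gene, profile in expression.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented example: one unknown peak measured in five tissues.
tissues = ["root", "stem", "leaf", "flower", "silique"]
peak_area = [0.1, 0.4, 2.0, 8.0, 1.0]
expression = {
    "geneX": [0.2, 0.5, 1.8, 7.5, 1.2],  # follows the peak (hypothetical)
    "geneY": [5.0, 4.0, 3.0, 0.5, 3.5],  # opposite trend (hypothetical)
    "geneZ": [1.0, 1.0, 1.0, 1.0, 1.0],  # constant expression (hypothetical)
}
ranking = rank_genes_by_metabolite(peak_area, expression)
for gene, r in ranking:
    print(f"{gene}\t{r:.2f}")
```

With only a handful of tissues such correlations are easily inflated by a single extreme sample, so a high-ranking gene is a starting point for the combined evidence described above rather than a conclusion.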
This protocol is applicable to all areas of metabolism. It is most efficient for intermediate and secondary metabolism, which are well known to be subject to strong transcriptional control 1,5,11,16. Co-expression analysis has been applied successfully to sulfur assimilation, genes for β-oxidation, branched-chain amino acid degradation, chlorophyll breakdown and lysine catabolism 3, cell wall metabolism 10,7 and the light signalling cascade 14. Annotation of gene function via combined genomics, metabolomics and informatics is therefore useful not only for identifying biosynthetic genes and the transcription factors that directly regulate them, but also for understanding physiological processes and responses (see the example in Figure 3 and 14).
To extend this approach from model plants to crop species, metabolic comparison across plant species is a powerful approach for some general areas of metabolism. For example, if the same compound is detected in different plant species and orthologous genes are found in these species, cross-species co-expression analysis using the orthologous genes can provide strong support for a prediction. This approach can be performed in Arabidopsis, poplar and Medicago, as well as in important crops such as barley, rice, wheat and soybean, by cross-species co-expression analysis (6, PlaNet: http://aranet.mpimp-golm.mpg.de/; 9, COP: http://webs2.kazusa.or.jp/kagiana/cop0911/; see an example, 15).
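The cross-species reasoning above can be sketched in Python as a set comparison: a co-expression partner of the query gene in one species counts as conserved if its ortholog is also co-expressed with the query's ortholog in the other species. The sketch assumes that the co-expression partner lists and the ortholog table have already been obtained (for example from the databases in Table II); the function `conserved_partners` and all gene labels are hypothetical placeholders, not real identifiers or an API of PlaNet or COP.

```python
def conserved_partners(query, coexp_a, coexp_b, orthologs):
    """Partners of `query` in species A whose orthologs are co-expressed
    with the ortholog of `query` in species B."""
    query_b = orthologs.get(query)
    if query_b is None:
        return []  # no ortholog known, nothing can be compared
    partners_b = coexp_b.get(query_b, set())
    return sorted(
        gene for gene in coexp_a.get(query, set())
        if orthologs.get(gene) in partners_b
    )


# Hypothetical co-expression partners in species A and species B.
coexp_a = {"A_CHS": {"A_DFR", "A_UGT", "A_X"}}
coexp_b = {"B_CHS": {"B_DFR", "B_UGT", "B_Y"}}
# Hypothetical ortholog table mapping species A genes to species B genes.
orthologs = {
    "A_CHS": "B_CHS",
    "A_DFR": "B_DFR",
    "A_UGT": "B_UGT",
    "A_X": "B_X",
}
conserved = conserved_partners("A_CHS", coexp_a, coexp_b, orthologs)
print(conserved)
```

Partners that survive this filter are supported by independent expression data sets from two species, which is the kind of additional evidence the paragraph above refers to.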
The authors have nothing to disclose.
We thank Prof. Kazuki Saito in RIKEN PSC and Dr. Bjoern Usadel in MPIMP for useful discussions. TT is supported by a fellowship from the Alexander von Humboldt foundation.
Name of the reagent | Company | Catalogue number |
Distilled water ULC/MS grade | BIOSOLVE | 23214102 |
Acetonitrile (ACN) ULC/MS grade | BIOSOLVE | 01204102 |
Methanol (MeOH) ULC/MS grade | BIOSOLVE | 13684102 |
Formic acid (HCOOH) ULC/MS grade for liquid chromatography | BIOSOLVE | 06914131 |
Standard compounds | EXTRASYNTHESE | |
Linear ion trap (IT) ESI-MS system FINNIGAN-LTQ | Thermo Finnigan | |
HPLC system Surveyor | Thermo Finnigan | |
Analytical column Luna C18(2), 2.0 mm diameter, 150 mm length, 100 Å pore size and spherical particles of 3 µm | Phenomenex | 00F-4251-B0 |
Xcalibur software | Thermo Finnigan | |