A straightforward and robust method to identify potential regulatory motifs in co-regulated genes is presented. SCOPE does not require any user parameters and returns motifs that represent excellent candidates for regulatory signals. Identifying such regulatory signals helps researchers understand the underlying biology.
SCOPE is an ensemble motif finder that runs three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference1. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data1. In this article, we use the web version of SCOPE2 to examine genes involved in telomere maintenance. SCOPE has been incorporated into at least two other motif-finding programs3,4 and has been used in other studies5-8.
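SCOPE's own position-preference measure is defined in ref. 1 and is not reproduced here. As a rough illustration of the idea only, the sketch below uses a swapped-in statistic: a chi-square comparison of motif start positions against a uniform spread along upstream sequences. The function name, the equal-length-sequence assumption, and the bin count are all hypothetical choices for illustration.

```python
from collections import Counter


def position_preference_chi2(positions, seq_len, n_bins=10):
    """Chi-square statistic comparing motif start positions to a uniform spread.

    Assumes (for illustration) that every upstream sequence has length seq_len
    and that positions are 0-based motif start coordinates in those sequences.
    A large value suggests the motif prefers particular positions.
    """
    if not positions:
        return 0.0
    bin_width = seq_len / n_bins
    # Assign each occurrence to a bin; clamp anything at the far edge to the last bin.
    counts = Counter(min(int(p // bin_width), n_bins - 1) for p in positions)
    expected = len(positions) / n_bins
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(n_bins))


# Evenly spread sites give 0.0; sites bunched in the first 100 bp give a large value.
spread = position_preference_chi2([50, 150, 250, 350, 450, 550, 650, 750, 850, 950], 1000)
bunched = position_preference_chi2([10, 20, 30, 40, 50, 60, 70, 80, 90, 95], 1000)
```

A chi-square against a uniform background is only one simple way to quantify "preference"; it ignores sequence composition and assumes aligned sequences of equal length.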
The three algorithms that comprise SCOPE are BEAM9, which finds non-degenerate motifs (ACCGGT), PRISM10, which finds degenerate motifs (ASCGWT), and SPACER11, which finds longer bipartite motifs (ACCnnnnnnnnGGT). Each is optimized for its own type of motif, and running them together lets SCOPE cover all three kinds of motif in a single analysis.
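The three example motifs use standard IUPAC nucleotide codes (S = C or G, W = A or T, n = any base). The sketch below, which is illustrative and not part of SCOPE, translates such a consensus into a regular expression so you can see which sequences each kind of motif matches. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes; lower-case 'n' (as in ACCnnnnnnnnGGT) is upper-cased to N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC consensus into a compiled regex.

    The lookahead wrapper lets overlapping matches be found.
    Unknown characters raise KeyError.
    """
    body = "".join(IUPAC[c.upper()] for c in motif)
    return re.compile("(?=(" + body + "))")


def find_sites(motif, seq):
    """Return 0-based start positions of matches on the given strand only."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]


print(find_sites("ACCGGT", "TTACCGGTAA"))              # non-degenerate
print(find_sites("ASCGWT", "ACCGATAGCGTT"))            # degenerate: S and W each allow two bases
print(find_sites("ACCnnnnnnnnGGT", "ACCTTTTTTTTGGT"))  # bipartite: 8-base spacer of any bases
```

Here the degenerate motif matches both ACCGAT and AGCGTT, and the bipartite motif matches regardless of what fills the spacer.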
Once a gene set has been analyzed and candidate motifs identified, SCOPE can search for additional genes that contain the motif and whose addition to the original set improves the motif score, either through greater over-representation or through stronger motif position preference. When tested on partial gene sets with biologically verified transcription factor binding sites, SCOPE identified most of the remaining genes regulated by the given transcription factor.
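The expansion step can be pictured as a greedy search: try each candidate gene, and keep it only if the motif score rises. SCOPE's actual score combines over-representation and position preference (ref. 1). The minimal sketch below assumes a simpler over-representation score only, namely the -log10 binomial probability of seeing at least this many motif-containing genes given a hypothetical background probability `p_background`. The gene names and helper functions are hypothetical.

```python
from math import comb, log10


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrep_score(gene_set, has_motif, p_background):
    """-log10 probability that at least this many genes contain the motif by chance."""
    n = len(gene_set)
    k = sum(1 for g in gene_set if has_motif[g])
    return -log10(max(binomial_tail(k, n, p_background), 1e-300))


def expand_gene_set(gene_set, candidates, has_motif, p_background):
    """Greedily add each candidate gene only if it raises the score."""
    current = list(gene_set)
    score = overrep_score(current, has_motif, p_background)
    for gene in candidates:
        trial = current + [gene]
        trial_score = overrep_score(trial, has_motif, p_background)
        if trial_score > score:
            current, score = trial, trial_score
    return current, score


# Hypothetical data: g4 carries the motif, g5 does not.
has_motif = {"g1": True, "g2": True, "g3": True, "g4": True, "g5": False}
expanded, final_score = expand_gene_set(["g1", "g2", "g3"], ["g4", "g5"], has_motif, 0.1)
```

With these inputs, adding g4 raises the score and adding g5 lowers it, so only g4 is kept. A greedy pass like this depends on candidate order; it is meant only to show the "add if the score improves" logic.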
Output from SCOPE shows candidate motifs, their significance, and other information, both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site, which also includes a “Sample Search” button that lets the user perform a trial run.
SCOPE has a user-friendly interface that gives novice users access to the algorithm’s full power without requiring expertise in the bioinformatics of motif finding. As input, SCOPE accepts either a list of gene names or FASTA sequences, which can be entered in browser text fields or read from a file. The output from SCOPE lists all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, the result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of every motif occurrence (with exact position and “strand” indicated). Results are returned in a browser window and, optionally, by email. Previous papers describe the SCOPE algorithms in detail1,2,9-11.
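To show how a position weight matrix and a consensus relate to the listed motif instances, here is a minimal sketch of the common log-odds construction. This is not necessarily how SCOPE builds its matrix. The pseudocount of 0.5, the uniform background of 0.25, and the example instances are assumptions for illustration. The consensus here picks the single best base per position, whereas SCOPE's consensus may use degenerate IUPAC codes.

```python
import math

BASES = "ACGT"


def build_pwm(instances, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length, aligned motif instances.

    pseudocount and background are illustrative assumptions (uniform base composition).
    """
    length = len(instances[0])
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in instances]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def consensus(pwm):
    """Pick the highest-scoring base at each position."""
    return "".join(max(col, key=col.get) for col in pwm)


instances = ["ACCGGT", "ACCGGT", "ACCGAT", "TCCGGT"]
pwm = build_pwm(instances)
print(consensus(pwm))
```

Positive matrix entries mark bases seen more often than the background expects; the pseudocount keeps unseen bases from producing log(0).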
SCOPE provides the researcher with a powerful tool for identifying potential regulatory motifs in sets of coordinately regulated genes. Unlike many other motif-finding tools, SCOPE does not require the user to guess the size of the motif or its number of occurrences; these parameters are generally unknowable until the motif has been identified. The interface is simple, both for entering sequences or gene names and for viewing the output.
SCOPE output provides detailed information about every identified motif in three representations: a consensus sequence, a sequence logo, and a position weight matrix. Each instance of the motif in every gene is listed with its position and “strand”. Motif maps display these results graphically, making patterns among the identified motifs easy and intuitive to see.
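Reporting a site's "strand" means checking both the given sequence and its reverse complement. The sketch below is a hypothetical exact-match scanner and not SCOPE's code. It assumes 0-based coordinates on the forward sequence, reports a '-' hit at the leftmost forward coordinate the site covers, and uses the made-up motif GATTC for illustration.

```python
# Translation table for base complementation (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan_both_strands(motif, seq):
    """Return (position, strand) for exact motif matches on both strands.

    A '-' hit means the motif reads on the reverse strand, i.e. the forward
    window equals the motif's reverse complement. Palindromic sites are
    reported once, on '+'.
    """
    motif, seq = motif.upper(), seq.upper()
    k = len(motif)
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


print(scan_both_strands("GATTC", "GATTCAAGAATC"))
```

GAATC is the reverse complement of GATTC, so the second site is reported on the '-' strand.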
SCOPE is highly robust to noise in the data. Typically, noise takes the form of extra genes in the starting set that are not actually co-regulated with the rest. This often happens when the starting set consists of genes that are co-expressed in microarray experiments: the experiment may be noisy, or several transcription factors, likely with different target sites on the DNA, may be activated under the experimental conditions used. Even when extraneous genes outnumber co-regulated genes four to one (a noise:signal ratio of 4:1), SCOPE still retains 50% of its accuracy in predicting sites1.
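To make the 4:1 ratio concrete, suppose a hypothetical starting set has 10 truly co-regulated genes plus 40 extraneous ones. Only one gene in five then carries the real regulatory signal:

```latex
f_{\text{signal}} = \frac{n_{\text{signal}}}{n_{\text{signal}} + n_{\text{noise}}}
                  = \frac{10}{10 + 40} = \frac{1}{1 + 4} = 0.2
```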
Although SCOPE contains over 2 million synonyms for gene names, it sometimes fails to recognize a gene name. We update our synonym lists constantly, but sometimes find that a single synonym refers to more than one gene. In those cases, we exclude the synonym because of the ambiguity. If SCOPE does not recognize a gene name, consult the species-specific genome database for an alternative name to use in SCOPE. SCOPE provides examples of appropriate gene names for each species.
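One way to read the ambiguity rule is that a name shared by more than one gene cannot be resolved safely, so it is left out of the lookup table. The sketch below implements that reading. The data layout (a dictionary from gene ID to synonyms), the function names, and the gene names are hypothetical, not SCOPE's internals.

```python
from collections import defaultdict


def build_synonym_index(gene_synonyms):
    """Map each name (case-insensitive) to a single gene ID.

    Names claimed by more than one gene are dropped as ambiguous.
    """
    owners = defaultdict(set)
    for gene_id, synonyms in gene_synonyms.items():
        for name in set(synonyms) | {gene_id}:
            owners[name.upper()].add(gene_id)
    return {name: next(iter(ids)) for name, ids in owners.items() if len(ids) == 1}


def resolve(names, index):
    """Split user-supplied names into resolved gene IDs and unrecognized names."""
    resolved, missing = {}, []
    for name in names:
        gene_id = index.get(name.upper())
        if gene_id is None:
            missing.append(name)
        else:
            resolved[name] = gene_id
    return resolved, missing


# Hypothetical synonym table: "SHARED" is claimed by two genes, so it is excluded.
index = build_synonym_index({"GENE1": ["ALPHA", "SHARED"], "GENE2": ["BETA", "SHARED"]})
resolved, missing = resolve(["alpha", "shared", "beta", "gamma"], index)
```

Names that end up in `missing` are the ones a user would look up under an alternative name in the genome database for that species.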
SCOPE currently contains 72 species, and more are added regularly. The web site provides video help as well as FAQs. Source code is freely available to academic users by writing to RHG.
The authors have nothing to disclose.
This research was supported by a grant to RHG from the National Science Foundation, DBI-0445967.