Here, we present a protocol to utilize the latest version of the US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool. This protocol demonstrates the application of the online tool to rapidly analyze protein conservation and provide customizable and easily interpretable predictions of chemical susceptibility across species.
The US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool is a fast, freely available, online screening application that allows researchers and regulators to extrapolate toxicity information across species. For biological targets in model systems such as human cells, mice, rats, and zebrafish, toxicity data are available for a variety of chemicals. Through the evaluation of protein target conservation, this tool can be used to extrapolate data generated from such model systems to thousands of other species lacking toxicity data, yielding predictions of relative intrinsic chemical susceptibility. The latest releases of the tool (versions 2.0-6.1) have incorporated new features that allow for the rapid synthesis, interpretation, and use of the data for publication, along with presentation-quality graphics.
Among these features are customizable data visualizations and a comprehensive report designed to summarize SeqAPASS data for ease of interpretation. This paper describes the protocol to guide users through submitting jobs, navigating the various levels of protein sequence comparisons, and interpreting and displaying the resulting data. New features of SeqAPASS v2.0-6.0 are highlighted. Furthermore, two use-cases focused on transthyretin and opioid receptor protein conservation using this tool are described. Finally, SeqAPASS' strengths and limitations are discussed to define the domain of applicability for the tool and highlight different applications for cross-species extrapolation.
Traditionally, the field of toxicology has relied heavily on the use of whole-animal testing to provide the data necessary for chemical safety evaluations. Such methods are typically costly and resource-intensive. However, due to the large number of chemicals currently used and the rapid pace at which new chemicals are being developed, globally there is a recognized need for more efficient methods of chemical screening1,2. This need and the resulting paradigm shift away from animal testing has led to the development of many new approach methods, including high-throughput screening assays, high-throughput transcriptomics, next-generation sequencing, and computational modeling, which are promising alternative testing strategies3,4.
Evaluating chemical safety across the diversity of species potentially impacted by chemical exposures has been an enduring challenge, not only with traditional toxicity testing but also with new approach methods. Advances in comparative and predictive toxicology have provided frameworks for understanding the relative sensitivity of different species, and technological advances in computational methods continue to increase the applicability of these methods. Several strategies have been discussed over the last decade that leverage existing gene and protein sequence databases, along with knowledge of specific chemical molecular targets, to support predictive approaches for cross-species extrapolation and enhance chemical safety evaluations beyond the typical model organisms5,6,7,8.
To advance the science into action, build upon these foundational studies in predictive toxicology, prioritize chemical testing efforts, and support decision-making, the US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was created. This tool is a public and freely available web-based application that uses public repositories of constantly expanding protein sequence information to predict chemical susceptibility across the diversity of species9. Based on the principle that a species' relative intrinsic susceptibility to a particular chemical can be determined by evaluating the conservation of the known protein targets of that chemical, this tool rapidly compares protein amino acid sequences from a species with known sensitivity to all species with existing protein sequence data. This evaluation is completed through three levels of analysis, including (1) primary amino acid sequence, (2) functional domain, and (3) critical amino acid residue comparisons, each requiring more in-depth knowledge of the chemical-protein interaction and providing greater taxonomic resolution in the susceptibility prediction. A major strength of SeqAPASS is that users can customize and refine their evaluation by adding lines of evidence toward target conservation based on how much information is available regarding the chemical-protein or protein-protein interaction of interest.
The first version, released in 2016, allowed users to evaluate primary amino acid sequences and functional domains in a streamlined manner to predict chemical susceptibility but contained minimal data visualization capabilities (Table 1). Individual amino acid differences have been shown to be important determinants of cross-species differences in chemical-protein interactions, which can affect species' chemical susceptibility10,11,12. Therefore, subsequent versions were developed to consider the critical amino acids that are important for direct chemical interaction13. Responding to stakeholder and user feedback, this tool has undergone annual version releases with additional new features designed to meet the needs of both researchers and regulatory communities for addressing challenges in cross-species extrapolation (Table 1). The launch of SeqAPASS version 5.0 in 2020 brought forth user-centered features that incorporate data visualization and data synthesis options, external links, summary table and report options, and graphical features. Overall, the new attributes and capabilities of this version improved data synthesis, interoperability amongst external databases, and the ease of data interpretation for predictions of cross-species susceptibility.
1. Getting started
NOTE: The protocol presented here is focused on tool utility and key features. Detailed descriptions of methods, features, and components can be found on the website in a comprehensive User Guide (Table 1).
Table 1: Evolution of the SeqAPASS tool. A list of features and updates added to the SeqAPASS tool from its initial deployment. Abbreviations: SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility; ECOTOX = ECOTOXicology knowledgebase.
Figure 1: SeqAPASS problem formulation: schematic diagram of the preliminary information necessary for a successful analysis. Abbreviations: SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility; LBD = ligand-binding domain.
Figure 2: SeqAPASS interoperability across databases. Schematic diagram of external tools, databases, and resources integrated into SeqAPASS. Abbreviations: SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility; AOP = adverse outcome pathway; NCBI = National Center for Biotechnology Information; ECOTOX = ECOTOXicology knowledgebase.
Table 2: Links, resources, and tools integrated into the SeqAPASS tool. A list of the various data sources, links, and resources leveraged into the SeqAPASS tool. Abbreviation: SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility.
2. Developing and running a SeqAPASS query: Level 1
NOTE: In a Level 1 analysis, the entire primary amino acid sequence of a query protein is compared to the primary amino acid sequences of all species with available sequence information. This tool uses algorithms to mine, collect, and compile publicly available data to rapidly align and compare amino acid sequences across species. The backend stores information from National Center for Biotechnology Information (NCBI) databases and strategically makes use of the standalone versions of the Protein Basic Local Alignment Search Tool (BLASTp)54 and the Constraint-based Multiple Alignment Tool (COBALT)55.
3. Developing and running a SeqAPASS query: Level 2
NOTE: As the entire protein sequence is not directly involved in a chemical interaction, a Level 2 analysis compares only the amino acid sequence of the functional domain to make susceptibility predictions at lower taxonomic ranks (e.g., class, order, family).
4. Accessing and understanding the data: SeqAPASS Level 1 and Level 2
5. Manipulating data settings: SeqAPASS Level 1 and Level 2
NOTE: In both Level 1 and Level 2 analyses, it is assumed that the greater the protein similarity, the greater the likelihood that a chemical will interact with the protein in a similar manner to the query species/protein, making that species susceptible to potential impacts of chemicals with this molecular target. Due to the similarity of these data, steps for understanding Level 1 and 2 data are outlined together in a single protocol.
6. Visualizing the data: SeqAPASS Level 1 and Level 2
7. Developing and running a SeqAPASS analysis: Level 3
NOTE: A Level 3 analysis assesses user-identified amino acid residues within the query protein and rapidly compares the conservation of these residues across species. Species in which these residues are conserved are assumed to be more likely to interact with a chemical in a similar manner to the template species/protein. As Level 3 focuses on individual amino acids, an analysis can only be performed when detailed knowledge of the amino acid residues critical to the chemical-protein or protein-protein interaction is available.
8. Identify critical amino acid residues using identified literature
9. Visualizing Level 3 SeqAPASS data
NOTE: As in previous Levels, Primary and Full reports are available. In addition to data identical to the data in Level 1 and 2, the Primary Report displays amino acid positions, abbreviations, and a yes/no (Y/N) prediction of whether susceptibility is similar to that of the template. The Full Report additionally contains information on amino acid side chain classification and molecular weight.
10. Interpretation of SeqAPASS Results: Lines of evidence for protein conservation
NOTE: For ease of interpretation, this tool includes a Decision Summary Report (DS Report) designed to integrate data across Levels. The DS Report contains the results (i.e., data tables and/or visualizations) that the user has selected and allows for the quick evaluation of susceptibility predictions across multiple Levels for multiple species simultaneously.
To demonstrate the application of the SeqAPASS tool and highlight new features, two case studies are described representing instances in which protein conservation predicts that there are differences in chemical susceptibility across species (human transthyretin) and that there are no differences (µ opioid receptor [MOR]). The first of these examples addresses protein sequence/structural comparisons to predict the domain of applicability for adverse outcome pathways (AOPs, see Table 2 for definition), while the second is focused on developing research hypotheses relevant to cross-species susceptibility to opioids present in wastewater. The basic approaches described in these case studies can be applied to any chemical and demonstrate the broad utility of this tool for decision-making and research.
Thyroid hormones are essential for normal growth and development. They are synthesized in the thyroid gland and secreted into the bloodstream where they bind to distribution proteins and are circulated throughout the body14,15,16,17,18. Recent studies have shown that environmental contaminants, such as polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs), and per- and polyfluoroalkyl substances (PFAS), can competitively bind to the distribution protein transthyretin (TTR) and disrupt normal thyroid processes19,20,21,22,23,24,25. An AOP has been developed describing competitive binding to TTR leading to human neurodevelopmental toxicity (https://aopwiki.org/aops/152). While there is evidence that this AOP is also applicable to rodents, the applicability to other taxonomic groups has yet to be defined. As TTR-binding chemicals are present in the environment, it is important to understand the taxonomic relevance of this AOP, a challenge that can be addressed in part through SeqAPASS analysis. Using the tool’s problem formulation strategy, the objective of the analysis can be stated as follows: With the knowledge that TTR-binding compounds lead to adverse outcomes in humans, what taxonomic groups would be predicted to share similar susceptibility?
The human transthyretin protein is well characterized, and there are several well-studied ligands known to bind at the human TTR (hTTR) binding site, making it an optimal target for SeqAPASS analysis8,9,13. Using the NCBI accession for human transthyretin, P02766.1, a Level 1 analysis was conducted with default settings. The results of the Level 1 analysis set the percent similarity cutoff at 49%, with mammals (Mammalia), birds (Aves), reptiles (Testudines, Lepidosauria, Crocodylia), amphibians (Amphibia), and most fish species (Actinopteri, Coelacanthimorpha, Cladista, Chondrichthyes) falling above this cutoff (Figure 3). Thus, all species from these taxonomic groups resulted in a susceptibility prediction of "Y" (i.e., yes) and are likely susceptible to chemicals known to interact with hTTR (Figure 3 and Supplemental File 1).
For the Level 2 assessment of functional domain(s), the NCBI Conserved Domain Database was used to identify TR_THY (accession smart00095) as a conserved domain comprising the mature chain of the TTR subunit protein from residues 27 to 147. As the protein sequence of TTR reported in NCBI includes a 20 amino acid pre-segment not relevant to the current analysis, focusing the comparison on the mature chain provides an additional, more specific, line of evidence toward the conservation of this protein across species. From the Level 2 evaluation, a percent similarity cutoff of 58% was reported, with mammals, birds, reptiles, amphibians, and most fish species again falling above this cutoff (Figure 4). Consequently, SeqAPASS concluded a susceptibility prediction of "Y" (i.e., yes) for species from these taxonomic groups, indicating they are likely susceptible to chemicals that interact with the hTTR protein (Figure 4 and Supplemental File 1). Overall, results from Level 1 and Level 2 analyses suggest that most vertebrate species share conservation of the hTTR and are likely to be susceptible to chemicals known to interact with this protein.
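The cutoff-based predictions reported for Levels 1 and 2 can be illustrated with a minimal Python sketch. It is not part of the SeqAPASS tool: the `Hit` class, the `predict_susceptibility` function, and the percent similarity values are hypothetical, and treating a value exactly equal to the cutoff as "Y" is an assumption. Only the two cutoffs (49% for Level 1 and 58% for Level 2) are taken from the hTTR analysis described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One aligned sequence; all example values below are hypothetical."""
    species: str
    taxonomic_group: str
    percent_similarity: float


def predict_susceptibility(hits, cutoff):
    """Map each species to "Y" when its percent similarity is at or above
    the cutoff and to "N" otherwise (boundary handling is an assumption)."""
    return {
        hit.species: "Y" if hit.percent_similarity >= cutoff else "N"
        for hit in hits
    }


# Cutoffs reported for the hTTR case study.
LEVEL_1_CUTOFF = 49.0
LEVEL_2_CUTOFF = 58.0

# Hypothetical hits for illustration, not actual SeqAPASS output.
example_hits = [
    Hit("Species A", "Mammalia", 92.5),
    Hit("Species B", "Actinopteri", 61.0),
    Hit("Species C", "Insecta", 23.4),
]

level_1_predictions = predict_susceptibility(example_hits, LEVEL_1_CUTOFF)
print(level_1_predictions)
```

The same function can be reused with `LEVEL_2_CUTOFF` once functional-domain similarities are available, which mirrors how the two Levels provide separate lines of evidence.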
Figure 3: SeqAPASS Level 1 analysis of transthyretin conservation across taxonomic groups with available sequence information relative to the human protein. Percent similarity of the protein amino acid sequence is displayed on the Y-axis; taxonomic group is displayed on the X-axis. Open circles (○) indicate the query sequence, and closed circles (●) indicate the species within the taxonomic group with the highest percent similarity. Within the plot, the top and bottom of each box represent the 75th and 25th percentiles, the whiskers extend to 1.5 times the interquartile range, and the mean and median values are represented by horizontal black lines on the box. The dashed line indicates the cutoff for susceptibility predictions. Abbreviations: TTR = transthyretin; SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility.
Figure 4: SeqAPASS Level 2 analysis of the transthyretin receptor ligand-binding domain conservation across taxonomic groups with available sequence information relative to the human protein LBD. Percent similarity of the ligand binding domain amino acid sequence is displayed on the Y-axis; taxonomic group is displayed on the X-axis. Open circles (○) indicate the query sequence, and closed circles (●) indicate the species within the taxonomic group with the highest percent similarity. Within the plot, the top and bottom of each box represent the 75th and 25th percentiles, the whiskers extend to 1.5 times the interquartile range, and the mean and median values are represented by horizontal black lines on the box. The dashed line indicates the cutoff for susceptibility predictions. Abbreviations: TTR = transthyretin; SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility.
Through the analysis of molecular modeling and protein crystallography studies, amino acids were identified in the binding region of TTR that are predicted to interact with the endogenous ligand 3,3′,5,5′-tetraiodo-L-thyronine (T4, PDB 2ROX), as well as three environmental chemicals: perfluorooctane sulfonate (PFOS, PDB 5JIM), tetrabromobisphenol A (TBBPA, PDB 5HJG), and diethylstilbestrol (DES, PDB 1TZ8)19,21,22,26. The amino acid residues Lys35, Ser137, Leu130, Ala128, Ala129, and Thr139 were all identified as playing a key role in protein-ligand interactions, either through direct hydrogen bond interactions or van der Waals interactions. These six amino acid residues were evaluated in a Level 3 analysis across species using hTTR as the template sequence and excluding non-homologous, hypothetical, partial, and low-quality sequences (Supplemental File 1). As it was previously determined that TTR is conserved only across vertebrate species, invertebrate species were excluded from this analysis (Figure 3 and Figure 4). Additionally, it is important to note that amino acid positions reported in the literature exclude a 20 amino acid pre-segment that is absent in the mature hTTR protein and, for this reason, positions submitted in Level 3 were adjusted from those reported in the literature to ensure accurate alignment to the selected template protein15 (Supplemental File 1).
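The position adjustment described above can be sketched in a few lines of Python. This is an illustrative sketch under a stated assumption: that the 20 amino acid pre-segment shifts every position by the same amount. The function names are hypothetical, and the literature numbering printed by the example is derived here by subtraction from the template positions given in the text rather than quoted from the cited studies.

```python
# Length of the pre-segment excluded from literature numbering (from the text).
PRE_SEGMENT_LENGTH = 20

# Template positions (NCBI accession P02766.1) of the six residues named above.
TEMPLATE_RESIDUES = [
    ("Lys", 35), ("Ala", 128), ("Ala", 129),
    ("Leu", 130), ("Ser", 137), ("Thr", 139),
]


def to_template_position(literature_position, offset=PRE_SEGMENT_LENGTH):
    """Convert mature-protein numbering to template (full-sequence) numbering."""
    return literature_position + offset


def to_literature_position(template_position, offset=PRE_SEGMENT_LENGTH):
    """Convert template numbering back to mature-protein numbering."""
    if template_position <= offset:
        raise ValueError("position lies within the pre-segment")
    return template_position - offset


# Derived mature-protein numbering, assuming a uniform offset of 20.
literature_residues = [
    (name, to_literature_position(position))
    for name, position in TEMPLATE_RESIDUES
]
print(literature_residues)
```

Checking that each converted position lands on the expected residue in the template sequence is a simple safeguard against submitting misaligned positions in Level 3.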
In the Level 3 analysis of TTR, 294 vertebrate species were selected for alignment (mammals, birds, amphibians, reptiles, and fish). Of the species evaluated, 18 displayed differences in key amino acids resulting in a susceptibility prediction of "N" (i.e., no). Interestingly, five species of marine mammals presented with an amino acid substitution at position 2 (128A), while four species of fish demonstrated substitutions at either position 2 (128A) or position 6 (139T) (Figure 5). As these amino acids play important roles in protein-ligand interactions in the binding channel of TTR, these data suggest that TTR ligands may interact differently in these species, which could result in different chemical susceptibility relative to humans.
Figure 5: SeqAPASS Level 3 analysis of the conservation of amino acid residues important for TTR-chemical binding. (A) Level 3 summary table displaying the number of species with available sequence data across all taxonomic groups, the number of species predicted to be similarly susceptible (Y), and the number of species predicted to not be similarly susceptible (N). (B) Level 3 heatmap displaying select species predicted to not be similarly susceptible relative to the human transthyretin (TTR) protein, demonstrating full, partial, and non-matching amino acids. Abbreviations: TTR = transthyretin; SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility.
In response to stakeholder and user feedback, new features have been designed and incorporated into the SeqAPASS tool, including the ability to connect to empirical data for different applications. The interoperability of this tool with the ECOTOX knowledgebase (Table 1) was achieved by both embedding external links in the Level 1 and Level 2 data tables for accessions present in ECOTOX and creating an ECOTOX widget within the tool to directly filter to the most relevant toxicity data in ECOTOX. Through the links and the widget, users can rapidly query ECOTOX and gather corresponding toxicity data for species with SeqAPASS susceptibility predictions. Currently, SeqAPASS predictions are connected to ECOTOX through a chemical stressor and species; however, toxicity data are not yet linked to specific genes/proteins, which would allow for direct connections to the specific endpoints/molecular targets of interest in SeqAPASS. While linking predictions to toxicity data based on a chemical stressor is not ideal, as data may not be specific to a given pathway, establishing a connection to bring results together is the first step. As the first iteration of a SeqAPASS-ECOTOX integration, the current approach provides users with all available toxicity data for the chemical stressor(s) and species at a broad level. These data, when combined with SeqAPASS predictions, can provide context at broad levels (vertebrate vs. invertebrate) and can be considered within the context of the AOP framework.
TTR presents a good case example for examining this connection as the existing AOP (AOP 152) provides context for interpreting potentially relevant ECOTOX toxicity data. Starting with the ligands examined in SeqAPASS Level 3, environmental toxicity data were collected across species for four chemicals known to interact with the TTR ligand binding domain (diethylstilbestrol [DES], perfluorohexanoic acid [PFHxA], perfluorooctane sulfonate [PFOS], and tetrabromobisphenol A [TBBPA])19,21,23,24. For each chemical, ECOTOX was queried for Aquatic and Terrestrial data by Chemical Abstracts Service (CAS) number using custom search parameters (Supplemental File 1). Data were filtered to species groups of interest (amphibians, birds, fish, invertebrates, mammals, reptiles). Within the filtered query results, for any hit that did not report a mean effect concentration, the average of the study's minimum and maximum effect concentrations was calculated and used as an approximation of the mean (Figure 6A and Supplemental File 1). Within the context of a single chemical, Kruskal-Wallis tests were conducted to compare the mean effect concentrations of different taxonomic groups, as the data did not meet ANOVA test assumptions. Post-hoc pairwise comparisons were then conducted using Dunn's test for all chemicals, as the taxonomic groups consisted of unequal sample sizes. Aquatic and Terrestrial results were analyzed separately, as data between the two types of exposures are not directly comparable. Within ECOTOX, aquatic toxicity data for the selected chemicals were available for amphibians, birds, invertebrates, and fish species (Figure 6A). Terrestrial toxicity data were available only for mammals and only for DES (Supplemental File 1).
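The data handling described above (midpoint approximation of a missing mean, followed by a Kruskal-Wallis comparison across taxonomic groups) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the analysis code used for Figure 6: the record layout, function names, and example values are hypothetical. Only the H statistic is computed; the p-value and the Dunn's post-hoc comparisons are left to a statistics package (for example, a routine such as `scipy.stats.kruskal` returns both the statistic and its p-value).

```python
from collections import defaultdict


def effect_concentration(record):
    """Use the reported mean; otherwise approximate it by the min/max midpoint."""
    if record.get("mean") is not None:
        return record["mean"]
    return (record["min"] + record["max"]) / 2.0


def group_effect_concentrations(records):
    """Collect one effect concentration per record, keyed by taxonomic group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["group"]].append(effect_concentration(record))
    return dict(groups)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [value for values in groups.values() for value in values]
    n = len(pooled)
    if n < 2:
        raise ValueError("at least two observations are required")
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for values in groups.values():
        group_ranks = ranks[start:start + len(values)]
        rank_term += sum(group_ranks) ** 2 / len(values)
        start += len(values)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    tie_counts = defaultdict(int)
    for value in pooled:
        tie_counts[value] += 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical records for one chemical; values are not from ECOTOX.
records = [
    {"group": "fish", "mean": 1.0},
    {"group": "fish", "mean": None, "min": 2.0, "max": 4.0},
    {"group": "fish", "mean": 2.0},
    {"group": "amphibians", "mean": 4.0},
    {"group": "amphibians", "mean": None, "min": 5.0, "max": 7.0},
    {"group": "amphibians", "mean": 5.0},
]
groups = group_effect_concentrations(records)
h_statistic = kruskal_wallis_h(groups)
print(groups, round(h_statistic, 3))
```

A rank-based test is used here for the same reason given in the text: it does not rely on the distributional assumptions of ANOVA.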
Figure 6: Linking SeqAPASS results with empirical data. (A) Mean effect concentrations across taxonomic groups with data available in the ECOTOXicology knowledgebase for select chemicals known to bind to the human TTR protein. (B) Overlap in the number of species included in each SeqAPASS analysis with the species for which ECOTOX data were available. In panel A, parentheses along the x-axis indicate the number of query hits for which data were aggregated. Asterisks indicate pairs of significantly different effect concentrations between species groups within the context of a single chemical (Dunn's test, p < 0.05), where higher numbers of asterisks indicate stronger levels of significance (*: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001). The center lines within each box represent the median, with box edges demonstrating the interquartile range. The whiskers extend up to 1.5 times the interquartile range. Outliers falling outside of that range are shown as individual points. Abbreviations: TTR = transthyretin; SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility; ECOTOX = ECOTOXicology knowledgebase.
Overall, these data suggest that bioactivity for the chemicals assessed occurs in vertebrate species but does not occur in invertebrates. While the lack of biological target and pathway information in ECOTOX makes it impossible to directly link these empirical data to TTR, these results do support the SeqAPASS predictions that invertebrate species do not share susceptibility. All vertebrate species with available data demonstrated susceptibility to both PFOS and TBBPA, although the mean effect concentrations were significantly higher in fish and birds than in amphibians. These data suggest potential differences in sensitivity between taxonomic groups that may be attributed to biological pathway differences (including TTR). It is noteworthy that other variables such as metabolism and excretion may also play a role in differences in sensitivity. For PFHxA and PFOS, fish were found to be significantly more sensitive than both invertebrates and birds, and, for DES, amphibians presented with significantly higher mean effect concentrations than invertebrates. Again, these data support our SeqAPASS prediction that invertebrates do not share susceptibility with vertebrate species. Of all species assessed using this tool for which there were available TTR sequence data, only a small number had corresponding ECOTOX data for the four chemicals of interest (Figure 6B, Supplemental Table S1, and Supplemental Table S2). For those species lacking apical data, SeqAPASS predictions of susceptibility provide additional lines of evidence that related species may behave similarly to those with apical data. Data for all SeqAPASS and ECOTOX analyses are available in Supplemental File 1.
According to the Centers for Disease Control and Prevention (CDC), in 2017, opioids contributed to about 47,600 deaths by overdose in the United States, a number that continues to rise27. Wastewater plants in the US are regulated nationally by the US Environmental Protection Agency's National Pollutant Discharge Elimination System, which does not require testing for opioids or other pharmaceuticals in their discharge28. In recent years, an effort has been made to use wastewater-based epidemiology as a tool to map community opioid use. Opioid monitoring efforts have detected concentrations as high as 1.27 µg/L in wastewater effluent and 0.7 µg/L in surface waters29,30. Recent toxicity studies assessing the effect of opioid exposure on fish have reported the development of addictive behaviors and adverse immunological effects (e.g., higher infection rates, downregulation of immune genes)31,32,33. Overall, these studies suggest there is potential for adverse environmental opioid exposures and highlight the importance of understanding the risk these chemicals pose to aquatic species. Given the range of species that may encounter these compounds in the environment, identifying potentially susceptible species using SeqAPASS may be important for the prioritization of testing or monitoring efforts.
The MOR constitutes the main opioid target for the management of pain and is responsible for the powerful analgesic and addictive properties of opiate alkaloids in humans34,35. Due to the importance of this receptor to human health, MOR ligands are well known, and high-quality X-ray crystallography studies are available, making this target ideal for SeqAPASS analysis8,9,13. Using the NCBI accession for human µ opioid receptor, ACM90349.1, a Level 1 analysis was conducted using default settings. The susceptibility cutoff was established at 55% for Level 1, with percent similarities for mammals (Mammalia), birds (Aves), reptiles (Testudines, Lepidosauria, Crocodylia), amphibians (Amphibia), and most fish species (Actinopteri, Coelacanthimorpha, Cladista, Chondrichthyes) falling above this cutoff; therefore, species from these taxonomic groups resulted in a susceptibility prediction of "Y" (i.e., yes), indicating they would likely be susceptible to chemicals known to interact with human MOR (Figure 7 and Supplemental File 1). Using the NCBI Conserved Domain Database, 7tmA_Mu_opioid_R was identified (accession cd15090) as a functional domain comprising all seven helices of the MOR protein from residues 133 to 411, including a putative ligand binding site. Compared to Level 1, the Level 2 results identified a higher susceptibility cutoff of 88% similarity, with mammals, birds, reptiles, amphibians, and most fish species found above this cutoff and resulting in a susceptibility prediction of "Y" (i.e., yes) (Figure 8). Overall, results from Level 1 and Level 2 analyses suggest that most vertebrate species share conservation of the MOR and are likely to be susceptible to chemicals known to interact with human MOR.
Figure 7: SeqAPASS Level 1 analysis of µ opioid receptor conservation across taxonomic groups with available sequence information relative to the human protein. Percent similarity of the protein amino acid sequence is displayed on the Y-axis; taxonomic group is displayed on the X-axis. Open circles (○) indicate the query sequence, and closed circles (●) indicate the species within the taxonomic group with the highest percent similarity. Within the plot, the top and bottom of each box represent the 75th and 25th percentiles, the whiskers extend to 1.5 times the interquartile range, and the mean and median values are represented by horizontal black lines on the box. The dashed line indicates the cutoff for susceptibility predictions. Abbreviations: MOR = mu-opioid receptor; SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility.
Figure 8: SeqAPASS Level 2 analysis of the µ opioid receptor ligand-binding domain conservation relative to the domain in the human protein. Percent similarity of the ligand-binding domain amino acid sequence is displayed on the Y-axis; taxonomic group is displayed on the X-axis. Open circles (○) indicate the query sequence, and closed circles (●) indicate the species within the taxonomic group with the highest percent similarity. Within the plot, the top and bottom of each box represent the 75th and 25th percentiles, the whiskers extend to 1.5 times the interquartile range, and the mean and median values are represented by horizontal black lines on the box. The dashed line indicates the cutoff for susceptibility predictions. Abbreviations: MOR = mu-opioid receptor; SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility.
Through analysis of molecular modeling and protein crystallography studies, amino acids were identified in the binding region of MOR that are predicted to interact with known ligands. Although the diverse set of ligands that bind well to opioid receptors results in complex pharmacology, some consistent ligand-protein interactions are observed36,37. Based on molecular docking to various MOR crystal structures36,38, both morphine and fentanyl, high-affinity MOR agonists, interact with D147, Y148, M151, W293, I296, H297, V300, I322, and Y326. Residues D147, Y148, M151, and H297 are also implicated in crystal structures of MOR bound to the morphinan agonist BU72, while D147, M151, H297, and Y326 are also critical in binding to the irreversible morphinan antagonist β-funaltrexamine37. Considering these lines of evidence, nine residues were selected (D147, Y148, M151, W293, I296, H297, V300, I322, Y326) for Level 3 evaluation using human MOR as a template sequence and excluding partial, predicted, hypothetical, and low-quality sequences. Importantly, amino acid positions reported in the literature exclude a 64-amino-acid segment relative to the NCBI protein accession; for this reason, the positions submitted in Level 3 were adjusted to those that align with the template sequence.
In the Level 3 analysis of the human MOR, 284 vertebrate species were assessed (mammals, birds, amphibians, reptiles, and fish). Across all species evaluated, each of the nine amino acids was either a total match or a partial match based on side-chain classification and molecular weight; consequently, all species assessed resulted in a susceptibility prediction of "Y" for yes (Table 3 and Supplemental File 1). As these amino acids are important in the binding of both strong MOR agonists and strong antagonists, these data suggest that opioid compounds targeting human µ opioid receptors may interact similarly with receptors across vertebrate species. Although few empirical data are currently available within the ECOTOX knowledgebase for opioid compounds, several studies suggest that fish are likely susceptible31,32,33. Overall, results from SeqAPASS point to the potential for broader environmental impacts of MOR-modulating chemicals across species, indicating that more research and perhaps monitoring may be valuable. Data for all analyses are available in Supplemental File 1.
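The logic of a residue-by-residue comparison based on side-chain classification and molecular weight can be sketched as follows. This is not the SeqAPASS implementation. The side-chain class labels, the 30 g/mol tolerance, and the rule that a partial match meets exactly one of the two criteria are assumptions made for illustration, as are all function and variable names. Only the overall outcome reflects the text above: a species receives "Y" when every residue is a total or partial match.

```python
# Side-chain class (illustrative grouping) and molecular weight in g/mol
# for the residues used in this example.
AMINO_ACIDS = {
    "D": ("acidic", 133.10),
    "E": ("acidic", 147.13),
    "Y": ("aromatic", 181.19),
    "F": ("aromatic", 165.19),
    "W": ("aromatic", 204.23),
    "M": ("sulfur-containing", 149.21),
    "I": ("aliphatic", 131.17),
    "L": ("aliphatic", 131.17),
    "V": ("aliphatic", 117.15),
    "G": ("aliphatic", 75.07),
    "H": ("basic", 155.16),
}

ASSUMED_MW_TOLERANCE = 30.0  # g/mol; hypothetical threshold for this sketch


def classify_residue(template_aa, query_aa):
    """Return 'total', 'partial', or 'none' for one aligned residue pair."""
    template_class, template_mw = AMINO_ACIDS[template_aa]
    query_class, query_mw = AMINO_ACIDS[query_aa]
    same_class = template_class == query_class
    similar_size = abs(template_mw - query_mw) <= ASSUMED_MW_TOLERANCE
    if same_class and similar_size:
        return "total"
    if same_class or similar_size:
        return "partial"  # assumed rule: exactly one criterion is met
    return "none"


def predict_susceptibility(template_residues, query_residues):
    """Return 'Y' if no aligned residue pair is classified as 'none'."""
    if len(template_residues) != len(query_residues):
        raise ValueError("Residue lists must be the same length")
    calls = [classify_residue(t, q) for t, q in zip(template_residues, query_residues)]
    return "N" if "none" in calls else "Y"


# The nine human MOR residues, in the order listed in the text.
HUMAN_MOR = "DYMWIHVIY"
```

For example, a hypothetical query species carrying leucine in place of isoleucine at one position would still be classified as a total match at that position under these assumed rules, whereas glycine in place of aspartate would not match on either criterion.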
Table 3: SeqAPASS Level 3 analysis of the conservation of amino acid residues important for chemical binding to the µ opioid receptor. Summary table displaying the number of species with available sequence data across all taxonomic groups, the number of species predicted to be similarly susceptible (Y), and the number of species predicted to not be similarly susceptible (N), as well as full, partial, and non-matching amino acids. Abbreviation: SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility.
Supplemental Table S1: Species with available ECOTOX data for the four chemicals of interest known to bind to the human transthyretin protein. Data available for each chemical are aligned with SeqAPASS predictions of similar susceptibility across Levels 1, 2, and 3. All SeqAPASS predictions are relative to the human transthyretin sequence. Abbreviations: SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility; DES = diethylstilbestrol; PFHxA = perfluorohexanoic acid; PFOS = perfluorooctane sulfonic acid; TBBPA = tetrabromobisphenol A.
Supplemental Table S2: Total number of species with available data across SeqAPASS and ECOTOX evaluations for select chemicals known to bind to the human transthyretin protein. SeqAPASS provides a means for predicting susceptibility across many species for which empirical toxicity data are unavailable.
Supplemental File 1: SeqAPASS and ECOTOX data for all representative results. The file contains a ToC followed by these data sheets: Tab 1-hTTR SeqAPASS Results Level 1, Tab 2-hTTR SeqAPASS Results Level 2, Tab 3-hTTR SeqAPASS Results Level 3, Tab 4-EcoTox data for DES, Tab 5-EcoTox data for PFOS, Tab 6-EcoTox data for PFHxA, Tab 7-EcoTox data for TBBPA, Tab 8-EcoTox Group Mean Calculations, Tab 9-SeqAPASS EcoTox Data Comparisons, Tab 10-hMOR SeqAPASS Results Level 1, Tab 11-hMOR SeqAPASS Results Level 2, and Tab 12-µ-hMOR SeqAPASS Results Level 3. Abbreviations: SeqAPASS = Sequence Alignment to Predict Across Species Susceptibility; ToC = Table of Contents; hTTR = human transthyretin; ECOTOX = Ecotoxicology knowledgebase; DES = diethylstilbestrol; PFOS = perfluorooctane sulfonic acid; PFHxA = perfluorohexanoic acid; TBBPA = tetrabromobisphenol A; hMOR = human mu-opioid receptor.
There is widespread recognition that it is not feasible to empirically test enough species to capture the genomic, phenotypic, physiological, and behavioral diversity of living organisms that may be exposed to chemicals of toxicological interest. The goal of SeqAPASS is to maximize the use of existing and continuously expanding protein sequence and structural data to aid and inform the extrapolation of chemical toxicity data/knowledge from tested organisms to hundreds or thousands of other species through molecular-level comparisons. The SeqAPASS tool was designed to reduce the complexity of protein sequence comparisons for scientists, risk assessors, and regulators through a streamlined and rapid analysis that includes transparently generated and downloadable summary tables, interactive data visualizations, and the easy identification of threatened and endangered species, as well as common model organisms. Protocols are described here to run SeqAPASS Levels 1, 2, and 3 for the evaluation of primary amino acid sequence similarity, functional domain conservation, and critical amino acids involved in chemical-protein and protein-protein interactions. Lines of evidence gathered from each Level of SeqAPASS analysis predict chemical susceptibility across species, providing consistent and readily interpretable data. To date, this tool has been used across a broad range of applications, including the identification of chemicals that bind to certain receptors and assessing read-across potential for vertebrate ecological receptors with mammalian systems. Additionally, two case studies focused on the thyroid hormone distribution protein, TTR, and the MOR are described here to demonstrate the new features and functionality of SeqAPASS v2 through v6.
As with any computational approach, the ability to generate predictions of species susceptibility within the SeqAPASS tool is highly dependent on the input of appropriate parameters8,9,13. It is, therefore, critical that, prior to conducting an analysis, a problem formulation step is conducted to survey existing data and literature for the intended target. Starting an analysis with knowledge of the protein target allows the user to identify appropriate protein accession numbers and high-quality sequences. Similarly, knowledge of a sensitive or targeted species or of model organisms used in assays or AOP development ensures the selection of an appropriate query species to which all other species are compared. Selecting functional domains for Level 2 and critical amino acid residues for Level 3 are also critical steps that require the user to identify appropriate input parameters to generate predictions. Due to this need for pre-existing knowledge of a chemical-protein interaction, recent version releases of the SeqAPASS tool integrate user-friendly resources designed to help guide users to relevant information for the initiation of a query (e.g., links to other tools) (Table 2 and Figure 2). Additionally, pop-up information messages and alerts have been integrated into the tool to guide the user through the analysis and help inform the user of any errors that need to be resolved.
The complexity of chemical-biological interactions presents a limitation of the SeqAPASS tool. When extrapolating toxicity data across species, the conservation of the molecular target is one of many factors to consider. The absorption, distribution, metabolism, and excretion (ADME) of a chemical is crucial when considering chemical toxicity, as chemicals can either be activated or detoxified by these processes39,40. Other factors, such as the route of chemical exposure, organism life-stage and life history, and diet, can also play important roles in determining chemical sensitivity across species41,42. To address this limitation, it is important to understand the main question SeqAPASS asks when predicting chemical susceptibility: is a chemical's protein target likely to be present in another species for the chemical to act upon? This question is addressed by identifying ortholog candidates and considering the conservation of that target across species relative to a known sensitive or targeted species. This information can be used as a line of evidence for cross-species extrapolation and integrated into other evidence streams (e.g., the potential for exposure) to better understand species susceptibility to chemical stressors. Updates to SeqAPASS have incorporated integrated links to external tools, including the US EPA ECOTOX knowledgebase43 and the US Fish & Wildlife Service Environmental Conservation Online System (ECOS)44. Connection to these databases provides SeqAPASS users easy access to empirical chemical toxicity data for comparison to sequence-based predictions and a means to identify species that may have protected status.
The SeqAPASS tool provides a scientifically based platform for computational predictions of intrinsic susceptibility that are supported by concepts in evolutionary biology and case examples that compare predictions to available empirical results. In addition, SeqAPASS is free and publicly available on a well-supported web-based platform that is widely accessible (https://seqapass.epa.gov/seqapass/). As this tool leverages sequence data and protein information from existing databases, its ability to predict chemical susceptibility in a broader diversity of species is constantly improving as sequencing technology advances and the genomes of new species are sequenced and annotated. Although this offers distinct advantages regarding data availability, it also presents a limitation in that publicly available sequence information can be subject to inconsistent quality, poor annotation, and incompleteness of protein sequences for some species. However, it is promising that omics technologies and methods in bioinformatics are advancing rapidly and, therefore, sequence curation and quality are likely to continue to improve over time.
One major goal of the SeqAPASS tool is transparency, providing access in the form of links to all data sources and tools that are integrated in the backend. Such transparency allows the user rapid access to the original sources of the sequence or taxonomic information from NCBI. The domain of applicability for this tool is defined by the information needed to conduct a meaningful analysis. Since knowledge of a chemical-protein or protein-protein interaction in a known sensitive or targeted species is a key element for beginning a query, it must be acknowledged that queries conducted without this information are not meaningful. Additionally, chemicals that have multiple undefined biological targets or interact with different targets with differing degrees of potency also present a challenge and limitation of the tool in its current form. It is anticipated that, with improved bioinformatics, computational modeling, and cell-based, high-throughput screening and transcriptomics, greater knowledge across the diversity of chemical space regarding interactions with specific proteins will continue to be elucidated. It is expected that the ability to apply SeqAPASS to broader challenges of species extrapolation, relative to understanding the potential for adverse chemical effects across the diversity of species, will continue to improve.
In conclusion, the SeqAPASS tool is an accessible platform that readily applies molecular information to address the sizeable challenge of cross-species extrapolation in chemical safety evaluations. Although the examples highlighted here are focused on generating predictions of chemical susceptibility, the results can also aid in understanding the overall conservation of biological pathways. Bringing together different lines of evidence and facilitating access to multiple platforms and databases, this tool helps to build transparent cases for the prioritization of chemical testing and allocation of resources. With the continued development of scientific and bioinformatic capabilities, the power and utility of the tool will continue to grow and improve to meet the needs of research and regulatory communities while reducing the resources required for cross-species assessments.
The authors have nothing to disclose.
The authors thank Dr. Daniel L. Villeneuve (U.S. EPA, Center for Computational Toxicology and Exposure) and Dr. Jon A. Doering (Department of Environmental Sciences, Louisiana State University) for providing comments on an earlier draft of the manuscript. This work was supported by the U.S. Environmental Protection Agency. The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency, nor does the mention of trade names or commercial products indicate endorsement by the federal government.
Spreadsheet program | N/A | N/A | Any program that can be used to view and work with csv files (e.g., Microsoft Excel, OpenOffice Calc, Google Docs) can be used to access data export files. |
Basic computing setup and internet access | N/A | N/A | SeqAPASS is a free, online tool that can be easily used via an internet connection. No software downloads are required. |
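Because the data export files listed above are plain csv files, they can also be summarized with a short script instead of a spreadsheet program. The sketch below is a hedged illustration: the column names `Taxonomic Group` and `Susceptibility Prediction`, the function name `tally_predictions`, and the example rows are all hypothetical and should be replaced with the headers found in an actual export file.

```python
import csv
import io
from collections import Counter


def tally_predictions(csv_file,
                      group_col="Taxonomic Group",
                      call_col="Susceptibility Prediction"):
    """Count Y/N predictions per taxonomic group from an open csv file.

    The default column names are placeholders, not confirmed SeqAPASS headers.
    """
    counts = {}
    for row in csv.DictReader(csv_file):
        counts.setdefault(row[group_col], Counter())[row[call_col]] += 1
    return counts


# Made-up rows standing in for an export file.
EXAMPLE_EXPORT = """Taxonomic Group,Species,Susceptibility Prediction
Mammalia,Species A,Y
Mammalia,Species B,Y
Aves,Species C,Y
Insecta,Species D,N
"""

example_counts = tally_predictions(io.StringIO(EXAMPLE_EXPORT))
```

For a real export, the same function can be called on a file opened with `open(path, newline="")`, passing the actual column names as arguments.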