Investigating Protein Sequence-structure-dynamics Relationships with Bio3D-web

JoVE Journal, Biochemistry

NOTE: A typical Bio3D-web session proceeds through five consecutive and dependent steps (see Figure 1 for a schematic representation). Each step is implemented as a separate navigation tab of the web application, namely SEARCH, ALIGN, FIT, PCA, and eNMA.

1. Structure Search and Selection (SEARCH)

  1. Input structure
    1. Obtain the PDB ID of adenylate kinase (Adk), e.g. by searching the PDB [http://www.rcsb.org/pdb]. Alternatively, obtain the protein amino acid sequence of interest, e.g. from UniProt [http://uniprot.org].
    2. Enter the four-character PDB ID for Adk (e.g. 1AKE), or paste a protein sequence, into the text box in the "Input structure or sequence" panel.
  2. Hit selection
    1. Click the blue "Next" (Hit selection) button in the first panel or simply scroll down to panel B) "Hit selection" for further analysis.
    2. Make sure the "Limit total number of included structures" slider is set to its maximum value to include all structures above the cutoff.
    3. Lower the "Adjust the inclusion BitScore cutoff" slider to include more distantly related hits, or raise it to exclude them.
  3. Optional hit filtering
    1. Click the blue "Next" button in the "Hit selection" panel or simply scroll down to panel C) "Optional filtering of related structures for further analysis".
    2. Make sure the selected hits represent relevant structures by inspecting details of the table, e.g. PDB name, species, and bound ligands.
    3. Manually refine the selected subset of structures if needed by clicking the rows of the table.
      NOTE: Rows highlighted in blue indicate the PDB IDs selected for further analysis in subsequent tabs.

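The effect of the two sliders in the "Hit selection" panel can be pictured with a short Python sketch. This is not the server's code: the `Hit` record, the `select_hits` helper, the inclusive `>=` comparison, and the scores are all assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    pdb_id: str      # identifier of the hit (hypothetical names below)
    bitscore: float  # alignment bit score reported by the search


def select_hits(hits, cutoff, max_hits):
    """Keep hits scoring at or above the cutoff, best first, up to max_hits."""
    passing = [h for h in hits if h.bitscore >= cutoff]
    passing.sort(key=lambda h: h.bitscore, reverse=True)
    return passing[:max_hits]


# Made-up identifiers and scores; not real search output.
example_hits = [
    Hit("hitA", 410.0),
    Hit("hitB", 435.5),
    Hit("hitC", 95.0),
    Hit("hitD", 250.0),
]

# A lower cutoff admits more distantly related hits.
strict = select_hits(example_hits, cutoff=200.0, max_hits=10)
relaxed = select_hits(example_hits, cutoff=50.0, max_hits=10)
print([h.pdb_id for h in strict])
print([h.pdb_id for h in relaxed])
```

Setting `max_hits` to a large value corresponds to moving the "Limit total number of included structures" slider to its maximum, so that only the cutoff decides which hits are kept.
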
2. Multiple Sequence Alignment Analysis (ALIGN)

  1. Click the ALIGN tab to perform sequence alignment of the selected structures from the SEARCH tab.
  2. Alignment summary
    1. Review the alignment summary in panel A) "Alignment summary". Make sure that the regions of interest are aligned and not masked by gaps in one or more structures.
    2. If needed, toggle the "Display alignment editing options" and remove unwanted PDB IDs, e.g. PDBs with missing residues.
  3. Sequence alignment analysis
    1. Click the blue "Next" (Analysis) button to perform sequence-based clustering analysis of the collected structures.
    2. Select the plot option "Dendrogram". Adjust the "Cluster into K groups" slider to partition the structures into k groups.
    3. Optionally, change the clustering method by toggling the "More clustering and output options" checkbox.
  4. Residue conservation analysis
    1. Click the blue "Next" (Conservation) button to calculate the column-wise residue conservation.
    2. Select "Aligned structure sets" to generate a plot of the residue conservation at each alignment position.
    3. Select "Structures aligned with PFAM seed alignment" to show conservation calculated with respect to the associated PFAM seed alignment containing representative members of the family.
  5. Sequence alignment display
    1. Click the blue "Next" (Alignment) button to show the full sequence alignment with the in-browser alignment visualization tool.
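
As a rough illustration of the quantities behind the sequence clustering and conservation panels, the following Python sketch computes a pairwise identity and a simple identity-based conservation score per alignment column. The scoring scheme, the gap handling, the function names, and the toy alignment are assumptions; the scores reported by Bio3D-web may be defined differently.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ('-') count toward the column total but never as the consensus.
    """
    n_seq = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n_seq)
    return scores


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical residues over columns where neither has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy alignment invented for illustration (rows have equal length).
toy_alignment = [
    "MRIIL-G",
    "MRIILLG",
    "MKIVLLG",
]

conservation = column_conservation(toy_alignment)
identity_01 = pairwise_identity(toy_alignment[0], toy_alignment[1])
identity_12 = pairwise_identity(toy_alignment[1], toy_alignment[2])
print(conservation)
print(identity_01, identity_12)
```

A distance such as `1 - identity` could then serve as input to a clustering step, although whether the server uses exactly this distance is not stated in the protocol.
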

3. Structure Fitting and Analysis (FIT)

  1. Perform structure superimposition by entering the FIT tab.
  2. Structure superposition
    1. Toggle the "Show PDBs" checkbox to visualize the aligned protein structures in-browser.
    2. Make sure by visual inspection that the protein structures are superimposed on corresponding and relevant regions. Click and drag the mouse over the structures to rotate, and scroll to zoom.
    3. Adjust the coloring of the structures by clicking on the "Color options". Coloring options include alignment position, structural variability per position, RMSD cluster groups, sequence cluster groups, aligned regions and secondary structure.
    4. Download the superposed structures as either conventional PDB files or as a single PyMOL session file for visualization in a specialized molecular viewer program.
  3. Structure analysis
    1. Click the blue "Next" (Analysis) button to perform structure-based clustering of the collected PDB structures.
    2. Select "RMSD Heatmap" in the "Plot options" drop down menu.
    3. Adjust the clustering options, including the clustering method itself, through toggling the "More clustering and output options" checkbox.
      NOTE: Pairwise RMSD data can also be visualized as a dendrogram, a histogram or a heat map.
  4. Residue fluctuations
    1. Click the blue "Next" (RMSF) button to view the structural variability of each residue (shown as an RMSF plot) with major secondary structure elements shown in the marginal regions of the x-axis.
    2. Toggle the "Show B-factors" checkbox to overlay crystallographic B-factors of the reference structure onto the RMSF plot.
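
The RMSD and RMSF values shown in the FIT tab can be understood from a small Python sketch. It assumes the structures are already superposed and contain the same atoms in the same order; the function names, the division by the number of structures (rather than one fewer), and the toy coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two already-superposed structures."""
    squared = [
        sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rmsf(ensemble):
    """Per-atom root-mean-square fluctuation about the ensemble mean position."""
    n_struct = len(ensemble)
    n_atoms = len(ensemble[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(s[i][k] for s in ensemble) / n_struct for k in range(3)]
        msf = sum(
            sum((s[i][k] - mean[k]) ** 2 for k in range(3)) for s in ensemble
        ) / n_struct
        result.append(math.sqrt(msf))
    return result


# Toy ensemble: three "structures" of two atoms each (coordinates in Å, invented).
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 4.0)],
]

pair_rmsd = rmsd(toy_ensemble[0], toy_ensemble[1])
fluctuations = rmsf(toy_ensemble)
print(pair_rmsd)
print(fluctuations)
```

In this toy case the first atom never moves and so has zero RMSF, while the second atom carries all of the variability, which is the kind of contrast the RMSF plot makes visible between rigid and flexible regions.
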

4. Principal Component Analysis (PCA)

  1. Perform principal component analysis by entering the "PCA" tab.
  2. Visualization of the principal components
    1. Toggle the "Show PC Trajectory" checkbox to visualize motions described by the PCs with the in-browser visualization tool.
    2. Make sure "Principal Component 1" is chosen from the first drop down menu.
    3. To visualize the motions described by other PCs, choose the desired PC from the "Choose Principal Component" drop down menu.
    4. Change the coloring of the trajectory from the "Color options" drop down menu.
    5. Choose "Variability per position" from the "Color options" to color by displacement magnitude.
    6. Click the "Download PDB trajectory" button in the "Principal Component Visualization" panel to obtain a trajectory view of the motion described by the PCs.
    7. Click the "Download PyMOL session file" button to generate a PyMOL session file giving the motions as a vector field.
  3. Conformer analysis
    1. Project the individual structures onto two selected PCs by clicking on the blue "Next" (Plot) button.
    2. Make sure "PC on X-axis" is set to 1, and "PC on Y-axis" to 2. To project the structures onto other PCs, adjust the PC numbering accordingly.
    3. Choose "Cluster by PC Subspace" to color the structures in the plot by PC-based clustering, "RMSD" to color by RMSD-based clustering, or "Sequence" to color by sequence-based clustering.
    4. Click on any individual points in the plot to label the structures. Alternatively, highlight one or more structures in the table "PCA conformer plot annotation" below the plot.
    5. Adjust the "PCs in subspace" slider to include more or fewer PCs in the clustering algorithm.
  4. Residue contributions
    1. Calculate the residue contributions to the individual PCs by clicking the blue "Next" (Residue contributions) button.
    2. Plot the contributions for additional PCs by entering the PC number in the "Choose Principal Component" text box.
    3. Toggle the "Spread lines" checkbox to avoid plotting the residue contributions on top of each other.
    4. Toggle off the "Multiline plot" checkbox to plot the residue contributions in separate plots.
    5. Toggle the "Show RMSF" checkbox to include the RMSF values (from the FIT tab).
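
To make the PCA step more concrete, here is a minimal Python sketch that finds the first principal component of a set of superposed coordinate vectors, the fraction of total variance it captures, and the projection of each structure onto it. Power iteration is used here only to keep the example dependency-free; it is a stand-in, not a description of how Bio3D-web computes its PCs. The function name and toy data are assumptions.

```python
import math


def pca_first_component(rows, n_iter=200):
    """Return (pc1, fraction_of_variance, projections).

    rows: one flattened coordinate vector per superposed structure.
    The leading eigenvector of the covariance matrix is found by power iteration.
    """
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    total_var = sum(cov[i][i] for i in range(dim))
    if total_var == 0.0:
        raise ValueError("all structures are identical; nothing to decompose")

    vec = [float(j + 1) for j in range(dim)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("start vector has no component along any PC")
        vec = [x / norm for x in nxt]

    cov_vec = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
    eigval = sum(v * cv for v, cv in zip(vec, cov_vec))
    projections = [sum(c[j] * vec[j] for j in range(dim)) for c in centered]
    return vec, eigval / total_var, projections


# Toy data: four "structures" described by three coordinates (invented values).
toy_rows = [
    [0.0, 0.0, 0.0],
    [2.0, 0.1, 0.0],
    [4.0, -0.1, 0.0],
    [6.0, 0.0, 0.0],
]

pc1, fraction, projections = pca_first_component(toy_rows)
print(pc1)
print(fraction)
print(projections)
```

The returned fraction is the eigenvalue of PC1 divided by the sum of all eigenvalues, and the projections correspond to the x-axis positions of the points in a conformer plot when PC1 is chosen for that axis.
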

5. Ensemble Normal Mode Analysis (eNMA)

  1. Click the eNMA tab to initiate normal modes (NMs) calculation.
  2. Filter structure
    1. Adjust the number of structures by lowering or increasing the "Cutoff" for structure inclusion/exclusion.
    2. Click the green "Run Ensemble NMA" button to start the NMA calculation.
  3. Normal modes visualization
    1. Scroll down to the second panel of the eNMA tab (Normal Modes Visualization) for visualization of the NMs.
      NOTE: By default, the NM with the highest overlap (similarity) to PC-1 is displayed in the visualization window.
    2. To visualize the motions described by other NMs or other PDB structures, choose the desired NM and structure from the "Choose Mode" and "Show NMs for structure" drop down menus, respectively.
  4. Residue fluctuations
    1. Click the blue "Next" (Fluctuations) button to calculate the residue-wise fluctuations of structures selected for eNMA.
    2. Toggle the "Cluster by RMSD" option to color the fluctuation profiles by RMSD-based clustering.
    3. Toggle the "Cluster by RMSIP" option to color the fluctuation profiles by RMSIP-based clustering.
    4. Toggle the "Spread lines" checkbox to plot the grouped fluctuation profiles apart from each other.
  5. Comparing NMA and PCA
    1. Click the blue "Next" (PCA-vs-NMA) button to calculate the similarity between the individual NMs and PCs.
    2. Select a PDB ID from the "Compare NMs of structure" drop down menu to calculate the similarity between the NMs of this structure and the PCs calculated in the PCA tab.
  6. Overlap analysis
    1. Click the blue "Next" (Overlap analysis) button to calculate the overlap between calculated NMs and the structure difference vector between two selected structures.
    2. Select a 'reference' PDB ID from the "Compare NMs of structure" drop down menu and one or more PDB IDs in the structure table for the pairwise comparison with the reference PDB.
  7. Clustering analysis
    1. Click the blue "Next" (Clustering) button to perform structure clustering based on pair-wise NM similarity (RMSIP).
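
The similarity measures used in the eNMA tab can be sketched in a few lines of Python. The overlap is taken here as the absolute cosine between a mode and a displacement (or PC) vector, and the RMSIP as the root mean square inner product between two equally sized sets of orthonormal vectors. These are the commonly used definitions; the helper names and toy vectors are assumptions, and the server's exact normalization is not confirmed by the protocol.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def overlap(mode, reference):
    """Absolute cosine of the angle between a mode and a reference vector."""
    denom = math.sqrt(dot(mode, mode)) * math.sqrt(dot(reference, reference))
    return abs(dot(mode, reference)) / denom


def rmsip(modes_a, modes_b):
    """Root mean square inner product of two sets of orthonormal vectors."""
    total = sum(dot(u, v) ** 2 for u in modes_a for v in modes_b)
    return math.sqrt(total / len(modes_a))


def best_matching_mode(modes, reference):
    """Index of the mode with the highest overlap to the reference vector."""
    return max(range(len(modes)), key=lambda i: overlap(modes[i], reference))


# Toy orthonormal "modes" in three dimensions (invented for illustration).
modes_x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
modes_y = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
modes_z = [[0.0, 0.0, 1.0]]
toy_pc1 = [0.2, 0.9, 0.0]

print(overlap(modes_x[0], [1.0, 1.0, 0.0]))
print(rmsip(modes_x, modes_y))
print(best_matching_mode(modes_x, toy_pc1))
```

Both measures range from 0 (orthogonal motions or subspaces) to 1 (identical), so values can be compared directly between structures.
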

Learning Objectives

Adenylate kinase (Adk) is a ubiquitous enzyme that maintains the equilibrium between cytoplasmic nucleotides essential for many cellular processes. Adk operates by catalyzing the reversible transfer of a phosphoryl group from ATP to AMP. This reaction is accompanied by well-studied rate-limiting conformational transitions 3,21. Here we analyze all currently available Adk structures with Bio3D-web to reveal detailed features and mechanistic principles of these essential transitions.

We can begin our Bio3D-web analysis of Adk by entering the RCSB PDB code of any known Adk structure. For example, entering the PDB ID 1AKE in panel A of the SEARCH tab returns 167 sequence-similar structures, from which the top 26 are automatically selected for further analysis (see panel B). The annotation presented in panel C indicates that these selected structures are all from E. coli, were solved by X-ray diffraction in a range of space groups, have resolutions ranging from 1.63 to 2.8 Å, and were co-crystallized with a range of different ligands (including no ligands, AMP, ADP, MG and the inhibitor AP5). Note that additional annotation details can be displayed by clicking the "Show/Hide Columns" option in panel C.

Multiple sequence alignment is performed upon entering the ALIGN tab. The first panel of the ALIGN tab displays a summary of the alignment, giving the number of sequence rows (equivalent to the number of PDB structures) and the number of positions (i.e. alignment columns). The summary also reports the number of gap-containing and gap-free columns. The figure on the right-hand side of the first row provides a schematic representation of the sequence alignment. Here the grey areas represent non-gap positions, while white areas in the alignment correspond to gaps. A representation of the sequence conservation is shown above the alignment, with red areas indicating well-conserved positions and white indicating less conserved positions. Note that the sequences in this figure are ordered by their similarity, as given by the clustering dendrogram on the left-hand side. The second panel of this tab further facilitates clustering of the selected PDBs based on their pairwise sequence similarity, which can be visualized either as a dendrogram or a heat map. By default, a dendrogram (or tree diagram) representing the arrangement of clusters is shown. The y-axis of the dendrogram represents the distance (in terms of sequence identity) between the clusters.
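
Partitioning a dendrogram into k groups can be illustrated with a small agglomerative clustering sketch in Python. Average linkage is chosen here purely as an example (the default method used by the server is not stated in this text), and the distance matrix is invented; a distance such as one minus the pairwise sequence identity would be one plausible input.

```python
def cluster_into_k(dist, k):
    """Average-linkage agglomerative clustering, stopped when k groups remain.

    dist: symmetric matrix of pairwise distances.
    Returns one integer group label per item.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all members of the two clusters.
                d = sum(
                    dist[i][j] for i in clusters[a] for j in clusters[b]
                ) / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Invented distances: items 0 and 1 are similar, as are items 2 and 3.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]

print(cluster_into_k(toy_dist, 2))
```

The same idea applies wherever the application offers a "Cluster into K groups" slider: only the distance measure changes (sequence identity, RMSD, PC-space distance, or RMSIP).
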

Structure superposition is performed automatically upon entering the FIT tab. The superimposed structures, displayed interactively in panel A, indicate the presence of a relatively rigid core region (encompassing residues 1-29, 68-117, and 161-214; see the 'optional core and RMSD details' panel at the bottom of the FIT tab for details). Two more variable nucleotide-binding regions (residues 30-67 and 118-167) are also clearly visible (Figure 2). RMSD-based clustering groups these structures into two distinct conformations.

Clicking on the PCA tab more clearly shows the relationship between the structures in terms of the displacements of these regions, which effectively close over the bound nucleotide species in related structures (Figure 2B and 2C). The majority of structures are in the 'closed' form (blue in Figure 2C) and are associated with a bound ligand or inhibitor. In contrast, the more 'open' conformations are nucleotide- and inhibitor-free. This is consistent with the extensive body of research on Adk structure and dynamics indicating that an open configuration of these regions is required for nucleotide binding and a closed conformation for efficient phosphoryl transfer and suppression of detrimental hydrolysis events. It is notable that a single PC captures 97% of the total mean square displacement in this Adk structure set and provides a clear and compelling description of the open-to-closed transition along with the individual residue contributions to this functional displacement (panel C of the app and Figure 2).

Visiting the eNMA tab and increasing the number of structures considered for calculation (by decreasing the cutoff for filtering similar structures) indicates that open state structures display enhanced local and global dynamics in comparison to the closed form structures (Figure 2D and panel C of the app). Comparing PCA and NMA results for individual structures (panel D) indicates that the first mode of all open form structures displays a relatively high overlap with PC1 (with a mean value of 0.37 ± 0.04). In contrast, closed form structures display lower values (with a mean of 0.30 ± 0.01). RMSIP values for open form structures (0.62 ± 0.003) are also higher than those of closed structures (0.56 ± 0.008). In addition, overlap analysis shows that the first modes of the open state are in good agreement with the conformational change that describes the difference between the open and closed states (panel E). Clustering based on RMSIP values again displays a consistent partitioning of open and closed state structures (panel F).

Collectively these results indicate the existence of two major distinct conformational states for Adk. These differ by a collective low frequency displacement of two nucleotide-binding site regions that display distinct flexibilities upon nucleotide binding.

Figure 1: Bio3D-web overview with screen shots of the PCA and NMA tabs. (1) Bio3D-web takes a user-provided protein structure or sequence as input in the SEARCH tab. The server provides a list of related structures, which can be selected for further analysis. (2) The ALIGN tab provides sequence alignment and analysis of the structures selected in the SEARCH tab. (3) In the FIT tab all structures are superimposed and visualized in 3D together with the results of conventional pair-wise structure analysis. (4) Principal component analysis of the structure set is performed in the PCA tab to characterize inter-conformer relationships. (5) Normal mode analysis on each structure can be carried out in the eNMA tab to explore dynamic trends for the available structural states.

Figure 2: Results of Bio3D-web analysis of adenylate kinase. (A) Available PDB structures of adenylate kinase superimposed on the identified invariant core. Structures are colored according to RMSD-based clustering provided in the FIT tab. (B) Visualization of the principal components is available from the PCA tab to characterize the major conformational variations in the data set. Here, the trajectory corresponding to the first principal component is shown in tube representation showing the large-scale closing motion of the protein. (C) Structures are projected onto their first two principal components in a conformer plot showing a low-dimensional representation of the conformational variability. Each dot (or structure) is colored according to user-specified criteria, in this case PCA-based clustering results. (D) Normal mode analysis in the eNMA tab suggests enhanced local and global dynamics for structures in the open state (red) in comparison to the closed form (blue) structures.

List of Materials

Bio3D-web
Web site: http://thegrantlab.org/bio3d-web/
Requirements: Web browser

Lab Prep

We demonstrate the usage of Bio3D-web for the interactive analysis of biomolecular structure data. The Bio3D-web application provides online functionality for: (1) the identification of related protein structure sets to user-specified thresholds of similarity; (2) their multiple alignment and structure superposition; (3) sequence and structure conservation analysis; (4) inter-conformer relationship mapping with principal component analysis; and (5) comparison of predicted internal dynamics via ensemble normal mode analysis. This integrated functionality provides a complete online workflow for investigating sequence-structure-dynamics relationships within protein families and superfamilies.
