A key skill in biomolecular modeling is displaying and annotating active sites in proteins. This technique is demonstrated using four popular free programs for macromolecular visualization: iCn3D, Jmol, PyMOL, and UCSF ChimeraX.
Biomolecular visualization skills are paramount to understanding key concepts in the biological sciences, such as structure-function relationships and molecular interactions. Various programs allow a learner to manipulate 3D structures, and biomolecular modeling promotes active learning, builds computational skills, and bridges the gap between two-dimensional textbook images and the three dimensions of life. A critical skill in this area is to model a protein active site, displaying parts of the macromolecule that can interact with a small molecule, or ligand, in a way that shows binding interactions. In this protocol, we describe this process using four freely available macromolecular modeling programs: iCn3D, Jmol/JSmol, PyMOL, and UCSF ChimeraX. This guide is intended for students seeking to learn the basics of a specific program, as well as instructors incorporating biomolecular modeling into their curriculum. The protocol enables the user to model an active site using a specific visualization program, or to sample several of the free programs available. The model chosen for this protocol is human glucokinase, an isoform of the enzyme hexokinase, which catalyzes the first step of glycolysis. The enzyme is bound to one of its substrates, as well as a non-reactive substrate analog, which allows the user to analyze interactions in the catalytic complex.
Understanding representations of the molecular world is critical to becoming an expert in the biomolecular sciences1, because the interpretation of such images is key to understanding biological function2. A learner's introduction to macromolecules usually comes in the form of two-dimensional textbook images of cell membranes, organelles, macromolecules, etc., but the biological reality is that these are three-dimensional structures, and an understanding of their properties requires ways to visualize and extract meaning from 3D models.
Accordingly, the development of biomolecular visual literacy in upper division molecular life science courses has gained attention, with a number of articles reporting on the importance and difficulties of teaching and assessing visualization skills1,3,4,5,6,7,8,9. The response to these articles has been an increase in the number of classroom interventions, typically within a semester in a single institution, wherein molecular visualization programs and models are used to target difficult concepts2,10,11,12,13,14,15. Additionally, researchers have sought to characterize how students use biomolecular visualization programs and/or models to approach a specific topic16,17,18,19. Our own group, BioMolViz, has described a Framework that subdivides overarching themes in visual literacy into learning goals and objectives to guide such interventions20,21, and we lead workshops that train faculty to use the Framework in the backward design of assessments to measure visual literacy skills22.
At the center of all of this work is a critical skill: the ability to manipulate structures of macromolecules using programs for biomolecular visualization. These tools were developed independently using a variety of platforms; therefore, they can differ considerably in their operation and use. This necessitates program-specific instructions, and identifying a program that a user is comfortable with is important to facilitate continued implementation.
Beyond the very basics of manipulating structures in 3D (rotating, selecting, and altering the model), a major goal is to model the active site of a protein. This process allows a learner to develop their understanding in three overarching themes described by the BioMolViz Framework: molecular interactions, ligands / modifications, and structure-function relationships20,21.
Four popular programs for biomolecular visualization are Jmol/JSmol23, iCn3D24, PyMOL25, and UCSF Chimera26,27. We encourage those new to Chimera to use UCSF ChimeraX, the next generation of the Chimera molecular visualization program, which is the currently supported version of the program.
In this protocol, we demonstrate how to use each of these four programs to model the active site of human glucokinase with a bound substrate analog complex (PDB ID: 3FGU), and to display measurements to illustrate specific binding interactions28. The model represents a catalytic complex of the enzyme. To capture the active site in the pre-catalysis state, a non-hydrolyzable analog of ATP was bound to the glucokinase active site. This phosphoaminophosphonic acid-adenylate ester (ANP) contains a phosphorus-nitrogen bond instead of the usual phosphorus-oxygen linkage at this position. The active site also contains glucose (denoted BGC in the model) and magnesium (denoted MG). Additionally, there is a potassium ion (K) in the structure, resulting from potassium chloride used in the crystallization solvent. This ion is not critical for biological function and is located outside of the active site.
Figure 1: ATP/ANP structures. Adenosine triphosphate (ATP) structure compared to the phosphoaminophosphonic acid-adenylate ester (ANP).
The protocol demonstrates the selection of the bound ligands of the substrate analog complex and the identification of active-site residues within 5 Å of the bound complex, which captures amino acids and water molecules capable of making relevant molecular interactions, including hydrophobic and van der Waals interactions.
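The logic behind the 5 Å selection can be illustrated outside of any particular program. The following is a minimal Python sketch, not the implementation used by iCn3D, Jmol, PyMOL, or ChimeraX: a residue is kept if any one of its atoms lies within the cutoff of any ligand atom. The function name, residue labels, and coordinates are hypothetical and were invented for illustration; they are not taken from 3FGU.

```python
import math

def residues_within(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return residue labels with at least one atom within `cutoff` Å of a ligand atom."""
    selected = []
    for residue, coord in protein_atoms:
        near = any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms)
        if near and residue not in selected:
            selected.append(residue)  # keep each residue once, in input order
    return selected

# Hypothetical coordinates in Å (illustration only, not from 3FGU).
ligand_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein_atoms = [
    ("RES_A", (3.0, 0.0, 0.0)),  # 1.5 Å from the second ligand atom
    ("RES_A", (9.0, 0.0, 0.0)),  # distant atom of the same residue
    ("RES_B", (0.0, 4.9, 0.0)),  # 4.9 Å from the first ligand atom
    ("RES_C", (0.0, 0.0, 8.0)),  # more than 5 Å from both ligand atoms
]

active_site = residues_within(protein_atoms, ligand_atoms)
print(active_site)
```

Note that the whole residue is selected even when only one of its atoms falls inside the cutoff, which is why a displayed side chain can extend beyond 5 Å from the ligand.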
The display is initially manipulated to show the majority of the protein in a cartoon representation, with the active site amino acid residues in stick representation to show the relevant atoms of the protein and highlight the molecular interactions. After the Adjust the Representation step (step 2 or 3 of the protocol, depending on the program), these representations have been applied and the view of the protein is similar across programs (Figure 2). At the end of the protocol, the protein cartoon is hidden to simplify the view and focus on the active site.
Figure 2: Structure comparison across programs. Comparison of the structure of 3FGU in each program following the Adjust the Representation step (step 2 or 3 of each protocol).
CPK coloring is applied to the active site amino acids and bound ligands29,30. This coloring scheme distinguishes atoms of different chemical elements in molecular models shown in line, stick, ball and stick, and space-filling representations. Hydrogen is white, nitrogen is blue, oxygen is red, sulfur is yellow, and phosphorus is orange in the CPK coloring scheme. Traditionally, black is used for carbon, although in modern use, carbon coloring may vary.
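The color assignments listed above amount to a simple lookup table. The sketch below expresses them as a Python dictionary; the function name and the gray fallback for elements not named in the text are assumptions made for illustration, and individual programs may use different defaults.

```python
# CPK colors as listed in the text; carbon is traditionally black,
# although modern programs often vary the carbon color.
CPK_COLORS = {
    "H": "white",
    "C": "black",
    "N": "blue",
    "O": "red",
    "S": "yellow",
    "P": "orange",
}

def cpk_color(element, default="gray"):
    """Look up the CPK color for an element symbol.

    The `default` for unlisted elements (e.g., metal ions) is an
    assumption for this sketch, not part of the scheme described above.
    """
    return CPK_COLORS.get(element.strip().upper(), default)

for symbol in ("O", "N", "P", "Mg"):
    print(symbol, cpk_color(symbol))
```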
Hydrogen atoms are not visible in crystal structures, although each of these programs is capable of predicting their location. Adding the hydrogen atoms to a large macromolecular structure can obscure the view, so they are not displayed in this protocol. Accordingly, hydrogen bonds will be shown by measuring between the centers of two heteroatoms (e.g., oxygen to oxygen, oxygen to nitrogen) in these structures.
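Because hydrogens are omitted, a hydrogen bond is inferred from the heteroatom-to-heteroatom distance alone. A minimal Python sketch of that check follows. The 3.5 Å cutoff is an assumption chosen for illustration (each program applies its own criteria, which may also include geometry), and the function name and coordinates are hypothetical.

```python
import math

# Heteroatoms considered as hydrogen bond donors/acceptors in this sketch.
HBOND_ELEMENTS = {"N", "O"}

def is_hbond_candidate(atom_a, atom_b, max_distance=3.5):
    """Flag a possible hydrogen bond from heteroatom centers only.

    Each atom is an (element, (x, y, z)) pair with coordinates in Å.
    `max_distance` is an assumed cutoff, not a value from the protocol.
    """
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    if elem_a not in HBOND_ELEMENTS or elem_b not in HBOND_ELEMENTS:
        return False
    return math.dist(xyz_a, xyz_b) <= max_distance

# Hypothetical atoms (illustration only).
ligand_oxygen = ("O", (0.0, 0.0, 0.0))
side_chain_nitrogen = ("N", (2.9, 0.0, 0.0))
side_chain_carbon = ("C", (2.9, 0.0, 0.0))
distant_oxygen = ("O", (4.2, 0.0, 0.0))

print(is_hbond_candidate(ligand_oxygen, side_chain_nitrogen))
print(is_hbond_candidate(ligand_oxygen, side_chain_carbon))
print(is_hbond_candidate(ligand_oxygen, distant_oxygen))
```

A distance-only test of this kind will occasionally flag pairs that are close but poorly oriented, so flagged interactions are best confirmed by inspecting the model.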
Program Overviews
Downloadable Graphical User Interfaces (GUIs): PyMOL (Version 2.4.1), ChimeraX (Version 1.2.5), and Jmol (Version 1.8.0_301) are GUI-based molecular modeling tools. These three interfaces feature command lines to input typed code; many of the same capabilities are available through menus and buttons in the GUI. A common feature in the command line of these programs is that the user may load and re-execute previous commands using the up and down arrow keys on the keyboard.
Web-based GUIs: iCn3D (I-see-in-3D) is a WebGL-based viewer for interactive viewing of three-dimensional macromolecular structures and chemicals on the Web, without the need to install a separate application. It does not use a command line, though the full web version features an editable command log. JSmol is a JavaScript or HTML5 version of Jmol for use on a website or in a web browser window, and is very similar in operation to Jmol. JSmol can be used to create online tutorials, including animations.
Proteopedia31,32, FirstGlance in Jmol33, and the JSmol web interface (JUDE) at the Milwaukee School of Engineering Center for BioMolecular Modeling are examples of such Jmol-based online design environments34. The Proteopedia wiki is a teaching tool that allows the user to model a macromolecule structure and create pages featuring these models within the website35. The Proteopedia scene authoring tool, built using JSmol, integrates a GUI with additional features not available in the Jmol GUI.
Jmol is based on the Java programming language; iCn3D is JavaScript-based and runs in the browser through WebGL; JSmol uses either Java or HTML5; and PyMOL and ChimeraX are based on the Python programming language. Each of these programs loads protein data bank files, which can be downloaded from the RCSB Protein Data Bank under a 4-character alphanumeric PDB ID36,37. The most common file types are Protein Data Bank (PDB) files containing the .pdb extension and Crystallographic Information File (CIF or mmCIF) containing the .cif extension. CIF has superseded PDB as the default file type for the Protein Data Bank, but both file formats function in these programs. There can be slight differences in the way the sequence/structure is displayed when using CIF as opposed to PDB files; however, the files function similarly and the differences will not be addressed in detail here. The Molecular Modeling Database (MMDB), a product of the National Center for Biotechnology Information (NCBI), is a subset of PDB structures to which categorical information has been associated (e.g., biological features, conserved protein domains)38. iCn3D, a product of the NCBI, is capable of loading PDB files containing the MMDB data.
To view a model, the user can download the desired file from the dedicated Protein Data Bank page for the structure (e.g., https://www.rcsb.org/structure/3FGU), and then use the dropdown File menu of the program to open the structure. All of the programs are also capable of loading a structure file directly through the interface, and that method is detailed within the protocols.
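For users who script their downloads, the addresses for a given PDB ID can be assembled programmatically. The Python sketch below builds the structure page address shown above together with direct file addresses. The files.rcsb.org download pattern is stated here as an assumption that should be verified against current RCSB documentation, and the function name is hypothetical. No network request is made.

```python
import re

def rcsb_urls(pdb_id):
    """Build RCSB addresses for a 4-character alphanumeric PDB ID.

    The 'pdb' and 'cif' download patterns are assumptions for this
    sketch; confirm them against the RCSB file download documentation.
    """
    pdb_id = pdb_id.strip().upper()
    if not re.fullmatch(r"[0-9A-Z]{4}", pdb_id):
        raise ValueError(f"not a 4-character alphanumeric PDB ID: {pdb_id!r}")
    return {
        "page": f"https://www.rcsb.org/structure/{pdb_id}",
        "pdb": f"https://files.rcsb.org/download/{pdb_id}.pdb",
        "cif": f"https://files.rcsb.org/download/{pdb_id}.cif",
    }

urls = rcsb_urls("3fgu")
for kind, url in urls.items():
    print(kind, url)
```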
The ChimeraX, Jmol, and PyMOL GUIs each contain one or more windows of the console that may be resized by dragging the corner. iCn3D and JSmol are entirely contained in a web browser. When using iCn3D, the user may need to scroll within the pop-up windows to reveal all menu items, depending on screen size and resolution.
The protocols detailed here provide a simple method to display the active site of the enzyme using each program. It should be noted that there are multiple ways to execute the steps in each program. For example, in ChimeraX, the same task may be executed using dropdown menus, the toolbar at the top, or the command line. Users interested in learning a specific program in detail are encouraged to explore the online tutorials, manuals, and Wikis available for these programs39,40,41,42,43,44,45,46.
Existing manuals and tutorials for these programs present the items in this protocol as discrete tasks. To display an active site, the user must synthesize the required operations from the various manuals and tutorials. This manuscript augments existing tutorials available by presenting a linear protocol for modeling a labeled active site with molecular interactions, providing the user with a logic for active site modeling that can be applied to other models and programs.
Figure 3: ChimeraX GUI. ChimeraX GUI with the dropdown menus, toolbar, structure viewer, and command line labeled.
Figure 4: iCn3D GUI. iCn3D GUI with the dropdown menus, toolbar, structure viewer, command log, select sets pop-up, and sequence and annotations pop-up menus labeled.
Figure 5: Jmol GUI. Jmol GUI with the dropdown menus, toolbar, structure viewer, pop-up menu, and console/command line labeled.
Figure 6: PyMOL GUI. PyMOL GUI with the dropdown menus, structure viewer, names/object panel, mouse controls menu, and command line labeled.
NOTE: The protocol for each program is outlined in ten overarching steps: (1) Loading the structure into the program; (2) Identifying the ligands in the active site; (3) Adjusting the representation; (4) Selecting residues within 5 Å to define an active site; (5) Showing the interactions of the enzyme with the active site ligands; (6) Displaying the side chains as sticks and showing/adjusting the active site water molecules; (7) Simplifying the structure; (8) Labeling ligands and hydrogen-bonded side chains; (9) Saving the rendering at any point to return to work on it or share with others; (10) Saving an image for embedding or printing. Steps 1, 4, and 7-10 are identical for each protocol; however, due to the unique operation of each program, some protocols are more efficiently executed when steps 2/3 and 5/6 are interchanged.
1. UCSF ChimeraX protocol
NOTE: Trackpad and Mouse Controls: To rotate, click and drag or use two-finger drag (mouse: left click and drag). To zoom, pinch and spread (Mac) or control + two-finger movement (PC) (mouse: scroll wheel). To translate (i.e., move the entire structure), press option + click and drag (Mac) or shift + click and drag (PC) (mouse: right click and drag). To re-center, use the dropdown menus at the top of the interface to click Actions > View.
2. iCn3D protocol
NOTE: Trackpad and Mouse Controls: To rotate, click and drag (mouse: left click and drag). To zoom, pinch and spread (mouse: rotate the scroll wheel). To translate (i.e., move the entire structure), click and drag with two fingers (mouse: right click and drag). To re-center, hover over View in the top dropdown menus, and then click on Center Selection.
3. Jmol protocol
NOTE: Trackpad and Mouse Controls: To rotate, click and drag (mouse: left click and drag). To zoom, scroll vertically using two fingers (mouse: shift + left click + drag vertically). To translate (i.e., move the entire structure), control + alt + click and drag (PC) or control + option + click and drag (Mac). To re-center, shift + double click in the empty space of the structure viewer window.
4. PyMOL protocol
NOTE: Trackpad and Mouse Controls: To rotate, click and drag (mouse: left click and drag). To zoom, pinch and spread (mouse: right click and drag). To translate (i.e., move the entire structure), control + click and drag (mouse: command + left click and drag). To re-center, go to the right-hand object panel and click on A > Orient or Center.
A successfully executed protocol for each of the programs will result in a molecular model zoomed in on the active site, with active site residues and ligands shown as sticks, the protein cartoon hidden, and ligands displayed with a contrasting color scheme. Interacting amino acid residues should be labeled with their identifiers, and hydrogen bonding and ionic interactions shown with lines. The presence of these features can be determined by visual inspection of the model.
To facilitate this inspection and enable the user to determine whether they have performed the steps of the protocol correctly, we have provided animated figures that present an image of the structure following each step. For ChimeraX, iCn3D, Jmol, and PyMOL, this is illustrated in Figures 7-10, respectively.
Figure 7: ChimeraX protocol output. Animated figure illustrating steps 1.1-1.8 of the ChimeraX protocol.
Figure 8: iCn3D protocol output. Animated figure illustrating steps 2.1-2.8 of the iCn3D protocol.
Figure 9: Jmol protocol output. Animated figure illustrating steps 3.1-3.8 of the Jmol protocol.
Figure 10: PyMOL protocol output. Animated figure illustrating steps 4.1-4.8 of the PyMOL protocol.
The most common error that can influence the outcome of these protocols is erroneous selection, resulting in part of the structure being displayed in an undesired rendering. This is typically a result of mis-clicking, either on the structure itself, or in one of the display menu buttons. An example of a suboptimal result would be a model containing residues outside of the active site displayed as sticks. The user can begin to determine whether this error has occurred by visually inspecting the residues displayed as sticks and ensuring that they are in the proximity of the active site ligands. An advanced method to evaluate whether or not the displayed residues are within 5 Å of the active site ligands is to use the measurement tools built into each program to measure the distance between a nearby ligand and the active site residue. The measurement tools are beyond the scope of this manuscript; however, we encourage interested users to explore the many online tutorials detailing this type of analysis.
We present a specific example of a suboptimal execution of this protocol, resulting from a mis-click on the names/objects panel in PyMOL. This error displays the entire protein as sticks, rather than showing only the active site using this representation, as illustrated in Figure 11.
Figure 11: Negative result. Mis-selecting the full model in PyMOL displays the entire protein as sticks.
To troubleshoot, the user will need to hide the sticks for the entire model (labeled 3FGU in the names/object panel), and then show the stick representation for only the selection named "active," using the hide and show buttons/commands in PyMOL. Recovering the model from this type of error is relatively straightforward once the user is able to create appropriate selections for different parts of the model and display and hide them effectively. It is tempting to restart the protocol and work through the steps another time; however, we encourage the user to not be afraid to go "off script" and experiment with the model. In our experience, working through display errors facilitates progress in understanding the modeling program.
A side-by-side display of the final output from a successfully executed protocol for each program is shown in Figure 12. The views are oriented similarly to allow the user to compare the appearance of the models created in different programs.
Figure 12: Final structure comparison across programs. Comparison of the structure of each active site rendering at the end of the protocol. A: ChimeraX, B: iCn3D, C: Jmol, D: PyMOL. The PyMOL active site label includes all active site residues and the ligands. The other outputs have only hydrogen bonded side chains labeled.
This protocol outlines a ten-step process for the modeling of an enzyme active site, applied to four popular programs for biomolecular modeling. The critical steps of the protocol are: identifying the ligands in the active site, selecting residues within 5 Å to define an active site, and showing the interactions of the enzyme with the active site ligands. Distinguishing the ligands relevant to the biological function is paramount, as this allows the user to define the amino acid residues within 5 Å that can play a role in binding the ligands. Finally, using the program to display molecular interactions allows the user to develop the skills necessary to understand the molecular interactions that promote binding.
A limitation of computer-based molecular modeling protocols is the dependence on specific commands and syntax. While biochemical protocols may be tolerant of small changes in procedure, computer-based investigations may yield wildly different final products if the procedure is not closely adhered to. This is particularly important when using command-line interfaces where program-specific syntax is required to achieve a certain output, and a seemingly insignificant change in punctuation or capitalization can cause a command to fail. There are various Wikis and manuals for each program, where a user can find and troubleshoot command-line inputs; the user should pay careful attention to the details of command syntax. Although most molecular visualization programs include undo commands, due to the complexity of the interfaces, the undo command does not always faithfully reverse the last executed step. Therefore, saving the current working state frequently is encouraged, especially for new users.
Further limitations can arise from the data used to create the model itself. While the standards inherent in the Protein Data Bank ensure a certain level of consistency, users of molecular visualization programs will often encounter unexpected effects in a protein rendering. First, most structures are determined using X-ray crystallography, which provides a single model of the protein; however, NMR structures are often composed of multiple models that can be visualized one at a time. Second, structures determined from crystallography or cryogenic electron microscopy experiments may contain atoms whose positions cannot be elucidated, which appear as gaps in certain representations of the protein. Third, protein structures may have alternate conformations of side chains, which, when displayed in stick rendering, appear as two groups protruding off of the same amino acid backbone. Even short sections of backbone may have such alternative conformations, and sometimes ligands are superimposed in the active site in more than one binding conformation.
For a crystal structure, the deposited 3D coordinates include all components of the asymmetric unit, which provides enough information to reproduce the repeating unit of a protein crystal. Sometimes, this structure will contain additional protein chains compared to the biologically active form of the protein (e.g., fetal hemoglobin mutant, PDB ID: 4MQK). Conversely, some programs may not automatically load all chains of the biologically active unit. For example, the SARS-CoV2 main protease (PDB ID: 6Y2E) loads half of the biologically active dimer (made up of two protein chains) when fetched using the commands described in this protocol in ChimeraX, PyMOL, and Jmol. Though slight modification of the command will load the biologically active dimer, this consideration may not be straightforward for the novice modeling program user. A different issue that can arise is in the identification of the active site or substrate itself. Crystallographic experiments are carried out using a variety of molecules, which may be modeled into the final structure. For example, sulfate molecules may bind phosphate binding sites in the active site, or they may bind other regions that are not relevant to the mechanism. These molecules may obscure the correct identification of the active site itself and may even suggest to the student that they are part of the mechanism.
Presumably, the user will wish to apply this procedure to other active/binding sites. To apply this protocol in future work involving the analysis of new protein active sites, the user will need to identify which of the bound ligands are relevant to function. Some ligands are not associated with protein function, and instead are a result of the solvent or crystallization conditions used to conduct the experiment (e.g., the potassium ion present in the 3FGU model). The key ligands should be identified by consulting the original manuscript. With practice and, where applicable, an understanding of the command-line syntax, a user will be able to apply the protocol for the desired modeling program to any enzyme active site, and to model other macromolecules of their choice.
Identifying and analyzing bound substrates and ligands is central to the elucidation of molecular mechanisms and structure-based drug design efforts, which have directly led to improvements in treatments for disease, including acquired immunodeficiency syndrome (AIDS) and COVID-19 (refs. 47,48,49,50,51,52). While individual molecular visualization programs offer different interfaces and user experiences, most offer comparable features. It is important for the development of biomolecular visualization literacy that upper-level biochemistry students become familiar with structure visualization and the tools to generate such images4,20,53. This allows students to move beyond the interpretation of two-dimensional images in textbooks and journal articles and to more easily develop their own hypotheses from structural data54, which will prepare developing scientists to address future public health issues and improve the understanding of biochemical processes.
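The filtering step described above can be sketched in a few lines of Python. The list of non-polymer components reflects the 3FGU ligands named in this protocol (with BGC used as the component identifier for glucose); the exclusion set must be supplied by the user after consulting the original manuscript, and the function name is hypothetical.

```python
def functional_ligands(het_components, not_functional):
    """Keep the bound components judged relevant to protein function.

    `not_functional` is a user-curated set (e.g., solvent or
    crystallization additives); it is not derived automatically.
    """
    return [name for name in het_components if name not in not_functional]

# Non-polymer components of 3FGU as described in this protocol.
components_3fgu = ["ANP", "BGC", "MG", "K"]

# The potassium ion comes from the crystallization solvent and lies
# outside of the active site, so it is excluded here.
crystallization_artifacts = {"K"}

ligands = functional_ligands(components_3fgu, crystallization_artifacts)
print(ligands)
```

For a new structure, only the two input collections change; the judgment about which components belong in the exclusion set remains with the user.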
In summary, this protocol details active site modeling using four leading free macromolecular modeling programs. Our community, BioMolViz, adopts a non-software-specific approach to biomolecular modeling. We specifically avoided a critique or comparison of program features, although a user sampling each program will likely find that they prefer certain aspects of macromolecular modeling in one program versus another. We invite readers to utilize the BioMolViz Framework, which details the biomolecular visualization-based learning goals and objectives targeted in this protocol, and explore resources for teaching and learning biomolecular visualization through the BioMolViz community website at http://biomolviz.org.
The authors have nothing to disclose.
Funding for this work has been provided by the National Science Foundation:
Improving Undergraduate STEM Education Grant (Award #1712268)
Research Coordination Networks in Undergraduate Biology Education (Award #1920270)
We are grateful to Karsten Theis, PhD, Westfield University, for helpful discussions about Jmol.
ChimeraX (Version 1.2.5) | https://www.rbvi.ucsf.edu/chimerax/
Computer | Any
iCn3D (web-based only) | https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html
Java (for Jmol) | https://java.com/en/download/
Jmol (Version 1.8.0_301) | http://jmol.sourceforge.net/
Mouse (optional) | Any
PyMOL (Version 2.4.1, educational) | https://pymol.org/2; educational-use-only version: https://pymol.org/edu/?q=educational