StickWRLD has been used previously to detect interpositional dependencies (IPDs) between residues in both DNA [3] and protein [15-17] alignments. These co-evolving residues, while often distal from one another in the sequence alignment, are often proximal to one another in the folded protein. StickWRLD allows rapid discovery of residue-specific co-occurrence at such sites, e.g., an alanine at position “x” is strongly correlated with a threonine at position “y”. Such correlations can indicate provable structural relationships, and typically mark sites that, by necessity, co-evolve. StickWRLD is able to detect these relationships even when more “traditional” approaches that use HMMs to describe motifs fail. For example, analysis of the PFAM alignment of the ADK lid domain using StickWRLD reveals a strong positive correlation between cysteines (C) at positions 4 and 8 and a coordinated pair of C at positions 35 and 38. At the same time, StickWRLD shows a similarly strong positive relationship between histidine (H) and serine (S) at positions 4 and 8, a strong negative relationship between these and the C quartet at 4, 8, 35, and 38, and a strong positive relationship with aspartic acid (D) and threonine (T) at positions 35 and 38, respectively. Additional IPDs exist between the H,S,D,T motif and a T and G at positions 10 and 29 in B. subtilis, highlighting the conditional nature of these IPDs: the tetracysteine motif does not 'care' about the identities at these two positions, while the hydrophilic H,S,D,T tetrad requires specific residues in these positions almost absolutely. These two completely different position-dependent residue motifs can fulfill the same role in the ADK lid. As can be seen in Figure 6, a large cluster of IPDs, including a 3-node association between G (glycine) at position 132, Y (tyrosine) at position 135, and a P (proline) at position 141, is visible in the foreground (Figure 6A).
In Figure 6B, the view has been skewed to position the user slightly above the cylinder, revealing an IPD between an H (histidine) at position 136 and an M (methionine) at position 29, 107 residues distant. A PFAM HMM-derived motif of the same domain (Figure 2), meanwhile, not only fails to detect these as specifically co-occurring motif variants, but also defines the overall groupings in a biologically unsupported scheme [16].
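
The kind of residue-specific co-occurrence described above can be illustrated with a small sketch. It assumes that the strength of an association is the observed joint frequency of two residues minus the frequency expected if their columns were independent. The function name `cooccurrence_residuals`, the toy alignment, and the 0-based column indices are hypothetical and are not taken from StickWRLD itself.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_residuals(alignment):
    """Return {((i, a), (j, b)): observed - expected} for residues a and b
    in columns i < j (0-based) of an equal-length sequence alignment.

    Assumption: 'expected' is the product of the two per-column residue
    frequencies, i.e. what independence between columns would predict.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    # Residue counts for each alignment column.
    col_counts = [Counter(seq[i] for seq in alignment) for i in range(n_cols)]
    residuals = {}
    for i, j in combinations(range(n_cols), 2):
        pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
        # Include pairs that never co-occur so negative associations appear.
        for a in col_counts[i]:
            for b in col_counts[j]:
                observed = pair_counts[(a, b)] / n_seqs
                expected = (col_counts[i][a] / n_seqs) * (col_counts[j][b] / n_seqs)
                residuals[((i, a), (j, b))] = observed - expected
    return residuals


# Toy alignment (invented): columns 0 and 2 co-vary, column 1 is unrelated.
toy_alignment = ["CAC", "CGC", "HAS", "HGS"]
toy_residuals = cooccurrence_residuals(toy_alignment)
```

In this toy case, C in column 0 and C in column 2 receive a positive residual, C in column 0 and S in column 2 a negative one, and pairs involving column 1 a residual of zero. This mirrors the positive, negative, and absent relationships discussed for the lid domain motifs.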

Figure 1. “Subway Map” representation of the B. subtilis Adenylate Kinase (ADK) Lid domain structure. Arrows indicate IPDs identified in the PFAM alignment of the ADK Lid domain by StickWRLD. StickWRLD is able to correctly identify IPDs within a cluster of residues that are in close proximity in the folded protein. Of particular interest are the T and G pair at positions 9 and 29, which only form an IPD when the tetrad of residues at 4, 7, 24, and 27 is not C,C,C,C. Residue numbers displayed represent B. subtilis positions, not PFAM alignment positions.

Figure 2. Skylign [18] Hidden Markov Model (HMM) Sequence Logo for the ADK lid domain. While HMMs are powerful tools for determining probabilities at each position as well as the contribution of each site to the overall model, the positional independence of HMMs makes them unsuitable for detecting IPDs. This model does not suggest any of the dependencies seen in the StickWRLD representations (Figure 6).
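
The positional independence noted in this legend can be written out explicitly. As a sketch, assume $e_i(a)$ denotes the model's emission probability for residue $a$ at position $i$, and $f_i(a)$ and $f_{ij}(a,b)$ denote the single-column and paired-column frequencies observed in the alignment. These symbols are introduced here for illustration only. A position-independent model scores any pair of positions as a product:

$$P(x_i = a,\; x_j = b) = e_i(a)\, e_j(b)$$

An IPD is a departure from that product form, which such a model has no term to represent:

$$r_{ij}(a,b) = f_{ij}(a,b) - f_i(a)\, f_j(b) \neq 0$$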

Figure 3. The StickWRLD Data Loader. Users can choose from existing demo data or load their own data in the form of DNA or protein sequence alignments.

Figure 4. The StickWRLD Control window. The Control pane allows the user to change various view properties and to regulate the thresholds controlling the display of edge lines indicating relationships between residues (IPDs). Circled in red are the defaults that typically need to be adjusted for best viewing of any dataset. The Residual value sets the threshold of (observed-expected) above which connector/association lines are drawn. The Column and Ball label controls determine whether or not the column position and residue values (e.g., “A” for alanine) are displayed. The Column Edge Line control toggles on and off the display of edge lines connecting columns; for dense data sets this is better turned off. The Column Thickness controls whether or not the column itself is displayed; setting this to a very small value (e.g., 0.1) will draw a line through the spheres in the column, making it easy to distinguish the columns from one another.
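
As a rough illustration of how a Residual threshold thins the display, the sketch below keeps only residue pairs whose (observed - expected) value meets a cutoff. The residual values and residue labels are invented for illustration, and the function name `edges_to_draw` is hypothetical. Comparing the absolute value, so that strongly negative associations are also kept, is an assumption rather than documented StickWRLD behavior.

```python
def edges_to_draw(residuals, threshold):
    """Keep residue pairs whose |observed - expected| meets the threshold.

    Assumption: the magnitude is compared, so strong negative
    associations survive the filter alongside strong positive ones.
    """
    return {pair: r for pair, r in residuals.items() if abs(r) >= threshold}


# Invented residuals keyed by ((residue, column), (residue, column)).
example_residuals = {
    (("A", 1), ("T", 5)): 0.05,
    (("C", 2), ("C", 7)): 0.25,
    (("C", 2), ("S", 7)): -0.30,
    (("G", 3), ("P", 9)): 0.10,
}

low_cutoff = edges_to_draw(example_residuals, 0.05)   # all four pairs kept
high_cutoff = edges_to_draw(example_residuals, 0.2)   # only the two strongest
```

Raising the cutoff leaves fewer edges, and those that remain correspond to the strongest associations.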

Figure 5. Initial view of the StickWRLD OpenGL window with the Adenylate Kinase lid domain protein data set loaded. The initial perspective looks “down” through the cylinder composed of the sequence alignment positions. The user can rotate the cylinder using left-mouse-click-drag, and zoom in/out using right-mouse-click-drag. The initial view is quite dense because the default display shows even small rates of co-evolution. For many proteins, at this setting, distinct modules can be detected, but even in densely co-evolving proteins the display can be rapidly and interactively simplified to find the most important IPDs using the StickWRLD interface.

Figure 6. Closeup view of a StickWRLD visualization of the Adenylate Kinase lid domain protein. Here we have changed the default Residual to 0.2. This increases the threshold for display of inter-residue edges, showing fewer edges. The edges that remain indicate strongly associated IPDs. In addition, the view has been rotated and zoomed to allow for easier viewing of the edges. (A) A large cluster of IPDs is visible in the foreground, including a 3-node association between G (glycine) at position 132, Y (tyrosine) at position 135, and a P (proline) at position 141. (B) The view has been skewed to position the user slightly above the cylinder, revealing an IPD between an H (histidine) at position 136 and an M (methionine) at position 29, 107 residues distant.

Figure 7. StickWRLD Control window lower-right information view. CTRL+left-clicking on an object (e.g., a sphere or edge) in the OpenGL window displays the information for that object in the lower right of the StickWRLD Control window. Here we see the information for an IPD edge between a methionine at position 29 and a histidine at position 136.