Methods of protein structure determination based on NMR chemical shifts are becoming increasingly common. The most widely used approaches adopt the molecular fragment replacement strategy, in which structural fragments are repeatedly reassembled into different complete conformations in molecular simulations. Although these approaches are effective in generating individual structures consistent with the chemical shift data, they do not enable the sampling of the conformational space of proteins with correct statistical weights. Here, we present a method of molecular fragment replacement that makes it possible to perform equilibrium simulations of proteins, and hence to determine their free energy landscapes. This strategy is based on the encoding of the chemical shift information in a probabilistic model in Markov chain Monte Carlo simulations. First, we demonstrate that with this approach it is possible to fold proteins to their native states starting from extended structures. Second, we show that the method satisfies the detailed balance condition and hence it can be used to carry out an equilibrium sampling from the Boltzmann distribution corresponding to the force field used in the simulations. Third, by comparing the results of simulations carried out with and without chemical shift restraints we describe quantitatively the effects that these restraints have on the free energy landscapes of proteins. Taken together, these results demonstrate that the molecular fragment replacement strategy can be used in combination with chemical shift information to characterize not only the native structures of proteins but also their conformational fluctuations.
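The detailed balance claim above can be illustrated on a toy system. The following is a minimal sketch, not the authors' implementation: it assumes a three-state system with made-up energies and a deliberately biased, state-independent proposal distribution (standing in for fragment proposals informed by data), and checks that the Metropolis-Hastings acceptance rule still leaves the Boltzmann distribution stationary. All names and numbers are hypothetical.

```python
import math


def boltzmann(energies, kT=1.0):
    """Normalized Boltzmann weights for a list of state energies."""
    e_min = min(energies)  # shift energies for numerical stability
    w = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(w)
    return [x / z for x in w]


def transition_matrix(target, proposal):
    """Metropolis-Hastings kernel for a state-independent proposal.

    A move i -> j is proposed with probability proposal[j] and accepted with
    probability min(1, target[j] * proposal[i] / (target[i] * proposal[j])).
    """
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                ratio = (target[j] * proposal[i]) / (target[i] * proposal[j])
                P[i][j] = proposal[j] * min(1.0, ratio)
        P[i][i] = 1.0 - sum(P[i])  # rejected moves remain in state i
    return P


def max_detailed_balance_violation(target, P):
    """Largest |pi_i P_ij - pi_j P_ji| over all pairs of states."""
    n = len(target)
    return max(
        abs(target[i] * P[i][j] - target[j] * P[j][i])
        for i in range(n)
        for j in range(n)
    )


energies = [0.0, 1.0, 2.5]   # hypothetical force-field energies (units of kT)
proposal = [0.1, 0.7, 0.2]   # hypothetical biased proposal probabilities
pi = boltzmann(energies)
P = transition_matrix(pi, proposal)
print(max_detailed_balance_violation(pi, P))
```

The point of the sketch is that the bias in the proposal is cancelled by the acceptance ratio, so the sampled distribution is set by the energies alone.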
A key component of computational biology is to compare the results of computer modelling with experimental measurements. Despite substantial progress in the models and algorithms used in many areas of computational biology, such comparisons sometimes reveal that the computations are not in quantitative agreement with experimental data. The principle of maximum entropy is a general procedure for constructing probability distributions in the light of new data, making it a natural tool in cases when an initial model provides results that are at odds with experiments. The number of maximum entropy applications in our field has grown steadily in recent years, in areas as diverse as sequence analysis, structural modelling, and neurobiology. In this Perspectives article, we give a broad introduction to the method, in an attempt to encourage its further adoption. The general procedure is explained in the context of a simple example, after which we proceed with a real-world application in the field of molecular simulations, where the maximum entropy procedure has recently provided new insight. Given the limited accuracy of force fields, macromolecular simulations sometimes produce results that are not in complete and quantitative accordance with experiments. A common solution to this problem is to explicitly ensure agreement between the two by perturbing the potential energy function towards the experimental data. So far, a general consensus on how such perturbations should be implemented has been lacking. Three very recent papers have explored this problem using the maximum entropy approach, providing new theoretical and practical insights into the problem. We highlight each of these contributions in turn and conclude with a discussion of remaining challenges.
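As a rough illustration of the maximum entropy procedure, the sketch below reweights a discrete prior so that the average of an observable matches a target value while staying as close as possible to the prior. It uses the textbook loaded-die example, which is an assumption and not necessarily the example used in the article. The maximum entropy solution has the form w_i ∝ prior_i · exp(−λ·f_i), and the single multiplier λ is found here by bisection. Function names are hypothetical.

```python
import math


def maxent_weights(values, lam, prior=None):
    """Weights proportional to prior_i * exp(-lam * value_i), normalized."""
    n = len(values)
    if prior is None:
        prior = [1.0 / n] * n  # uniform prior by default
    exps = [-lam * v for v in values]
    shift = max(exps)  # subtract the largest exponent to avoid overflow
    w = [p * math.exp(e - shift) for p, e in zip(prior, exps)]
    z = sum(w)
    return [x / z for x in w]


def average(values, weights):
    """Weighted average of the observable."""
    return sum(w * v for w, v in zip(weights, values))


def fit_multiplier(values, target, prior=None, lo=-50.0, hi=50.0, iterations=200):
    """Find lam by bisection so that the reweighted average equals target.

    The average decreases monotonically as lam increases, so bisection works
    whenever the target lies strictly between the smallest and largest value.
    """
    if not min(values) < target < max(values):
        raise ValueError("target must lie strictly inside the range of values")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if average(values, maxent_weights(values, mid, prior)) > target:
            lo = mid  # average still too high: increase lam
        else:
            hi = mid
    return 0.5 * (lo + hi)


faces = [1, 2, 3, 4, 5, 6]
lam = fit_multiplier(faces, target=4.5)   # a fair die would average 3.5
weights = maxent_weights(faces, lam)
print(lam, weights)
```

In a molecular simulation setting the same exponential form corresponds to adding a term linear in the observable to the potential energy, which is the kind of perturbation discussed above.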
We propose a kinetic model for the activation of the las regulon in the opportunistic pathogen Pseudomonas aeruginosa. The model is based on in vitro data and accounts for the LasR dimerization and consecutive activation by binding of two OdDHL signal molecules. Experimentally, the production of the active LasR quorum-sensing regulator was studied in an Escherichia coli background as a function of signal molecule concentration. The functional activity of the regulator was monitored via a GFP reporter fusion to lasB expressed from the native lasB promoter. The new data show that the active form of the LasR dimer binds two signal molecules cooperatively and that the timescale for reaching saturation is independent of the signal molecule concentration. This favors a picture where the dimerized regulator is protected against proteases and remains protected as it is activated through binding of two successive signal molecules. In the absence of signal molecules, the dimerized regulator can dissociate and degrade through proteolytic turnover of the monomer. This resolves the apparent contradiction between our data and recent reports that the fully protected dimer is able to "degrade" when the induction of LasR ceases.
We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extensible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users to quickly set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.
Conventional methods for protein structure determination from NMR data rely on the ad hoc combination of physical forcefields and experimental data, along with heuristic determination of free parameters such as weight of experimental data relative to a physical forcefield. Recently, a theoretically rigorous approach was developed which treats structure determination as a problem of Bayesian inference. In this case, the forcefields are brought in as a prior distribution in the form of a Boltzmann factor. Due to high computational cost, the approach has been only sparsely applied in practice. Here, we demonstrate that the use of generative probabilistic models instead of physical forcefields in the Bayesian formalism is not only conceptually attractive, but also improves precision and efficiency. Our results open new vistas for the use of sophisticated probabilistic models of biomolecular structure in structure determination from experimental data.
Motor cortical points are linked by intrinsic horizontal connections having a recurrent network topology. However, it is not known whether neural activity can propagate over the area covered by these intrinsic connections and whether there are spatial anisotropies of synaptic strength, as opposed to synaptic density. Moreover, the mechanisms by which activity spreads have yet to be determined. To address these issues, an 8 × 8 microelectrode array was inserted in the forelimb area of the cat motor cortex (MCx). The centre of the array had a laser etched hole ~500 µm in diameter. A microiontophoretic pipette, with a tip diameter of 2-3 µm, containing bicuculline methiodide (BIC) was inserted in the hole and driven to a depth of 1200-1400 µm from the cortical surface. BIC was ejected for ~2 min from the tip of the micropipette with positive direct current ranging between 20 and 40 nA in different experiments. This produced spontaneous nearly periodic bursts (0.2-1.0 Hz) of multi-unit activity in a radius of about 400 µm from the tip of the micropipette. The bursts of neural activity spread at a velocity of 0.11-0.24 m s⁻¹ (mean = 0.14 mm ms⁻¹, SD = 0.05) with decreasing amplitude. The area activated was on average 7.22 mm² (SD = 0.91 mm²), or ~92% of the area covered by the recording array. The mode of propagation was determined to occur by progressive recruitment of cortical territory, driven by a central locus of activity of some 400 µm in radius. Thus, activity did not propagate as a wave. Transection of the connections between the thalamus and MCx did not significantly alter the propagation velocity or the size of the recruited area, demonstrating that the bursts spread along the routes of intrinsic cortical connectivity. These experiments demonstrate that neural activity initiated within a small motor cortical locus (~400 µm in radius) can recruit a relatively large neighbourhood in which a variety of muscles acting at several forelimb joints are represented. These results support the hypothesis that the MCx controls the forelimb musculature in an integrated and anticipatory manner based on a recurrent network topology.
Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances--so-called "potentials of mean force" (PMFs)--have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state--a necessary component of these potentials--is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities "reference ratio distributions" deriving from the application of the "reference ratio method." This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
Many human diseases are associated with protein aggregation and fibrillation. We present experiments on in vitro glucagon fibrillation using total internal reflection fluorescence microscopy, providing real-time measurements of single-fibril growth. We find that amyloid fibrils grow in an intermittent fashion, with periods of growth followed by long pauses. The observed exponential distributions of stop and growth times support a Markovian model, in which fibrils shift between the two states with specific rates. Even if the individual rates vary considerably, we observe that the probability of being in the growing (stopping) state is very close to 1/4 (3/4) in all experiments.
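The two-state Markovian picture above can be sketched in a few lines. In a two-state model with exponentially distributed dwell times, the long-run probability of being in the growing state is k_start / (k_start + k_stop), where k_stop is the rate of leaving the growing state and k_start the rate of leaving the paused state. The rate values below are hypothetical and chosen only so that the ratio reproduces the reported 1/4; they are not measured values from the experiments.

```python
import random


def growing_probability(k_stop, k_start):
    """Stationary probability of the growing state in a two-state Markov model."""
    return k_start / (k_start + k_stop)


def simulate_fraction_growing(k_stop, k_start, n_cycles=20000, seed=0):
    """Fraction of total time spent growing over many growth/pause cycles."""
    rng = random.Random(seed)
    t_grow = 0.0
    t_pause = 0.0
    for _ in range(n_cycles):
        t_grow += rng.expovariate(k_stop)    # dwell time in the growing state
        t_pause += rng.expovariate(k_start)  # dwell time in the paused state
    return t_grow / (t_grow + t_pause)


# Hypothetical rates (arbitrary time units): pauses last three times longer
# than growth periods on average, giving a growing probability of 1/4.
k_stop, k_start = 3.0, 1.0
print(growing_probability(k_stop, k_start))
print(simulate_fraction_growing(k_stop, k_start))
```

Only the ratio of the two rates enters the stationary probability, which is consistent with the observation that the individual rates can vary while the probability stays close to 1/4.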
Genome sequencing projects have expanded the gap between the amount of known protein sequences and structures. The limitations of current high resolution structure determination methods make it unlikely that this gap will disappear in the near future. Small angle X-ray scattering (SAXS) is an established low resolution method for routinely determining the structure of proteins in solution. The purpose of this study is to develop a method for the efficient calculation of accurate SAXS curves from coarse-grained protein models. Such a method can for example be used to construct a likelihood function, which is paramount for structure determination based on statistical inference.
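One standard way to compute a scattering curve from a set of scattering bodies is the Debye formula, I(q) = Σ_i Σ_j F_i F_j sin(q r_ij) / (q r_ij). The abstract does not state which formula the study uses, so the sketch below is only an illustration under that assumption. It also simplifies by treating the form factors F_i as constants, whereas in practice they depend on q. Coordinates and form factor values are made up.

```python
import math


def debye_intensity(coords, form_factors, q):
    """Scattering intensity at momentum transfer q from the Debye formula.

    coords: list of (x, y, z) positions of the scattering bodies.
    form_factors: one (here q-independent) form factor per body.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(n):
            x = q * math.dist(coords[i], coords[j])
            # sin(x)/x tends to 1 as x -> 0 (self terms and q = 0)
            sinc = 1.0 if x < 1e-12 else math.sin(x) / x
            total += form_factors[i] * form_factors[j] * sinc
    return total


# Hypothetical coarse-grained model: three bodies on a line, 3.8 apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
form_factors = [1.0, 1.0, 1.0]
curve = [(q, debye_intensity(coords, form_factors, q)) for q in (0.0, 0.1, 0.2, 0.3)]
print(curve)
```

The double sum scales quadratically with the number of bodies, which is one reason a coarse-grained representation makes repeated curve evaluation cheaper.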
We present detailed results on the C4-HSL-mediated quorum sensing (QS) regulatory system of the opportunistic Gram-negative bacterium Aeromonas hydrophila. This bacterium contains a particularly simple QS system that allows for a detailed modeling of kinetics. In a model system (i.e., the Escherichia coli monitor strain MH205), the C4-HSL production of A. hydrophila is interrupted by fusion of gfp(ASV). In the present in vitro study, we measure the response of the QS regulatory ahyRI locus in the monitor strain to predetermined concentrations of C4-HSL signal molecules. A minimal kinetic model describes the data well. It can be solved analytically, providing substantial insight into the QS mechanism: at high concentrations of signal molecules, a slow decay of the activated regulator sets the timescale for the QS regulation loop. Slow saturation ensures that, in an A. hydrophila cell, the QS system is activated only by signal molecules produced by other A. hydrophila cells. Separate information on the ahyR and ahyI loci can be extracted, thus allowing the probe to be used in identifying the target when testing QS inhibitors.
Intra-cellular information exchange, propelled by cascades of interacting signalling proteins, is essential for the proper functioning and survival of cells. Now that the interactome of several organisms is being mapped and several structural mechanisms of cooperativity at the molecular level in proteins have been elucidated, the formalization of this fundamental quantity, i.e. information, in these very diverse biological contexts becomes feasible.
Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement.
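To make the idea of comparing estimated distributions concrete, the sketch below histograms a one-dimensional structural feature for two ensembles and compares the histograms with the Jensen-Shannon divergence. The abstract does not say which density estimator or which comparison measure the three approaches use, so both choices here are assumptions, and the sample values are made up.

```python
import math


def histogram(samples, lo, hi, n_bins):
    """Normalized histogram of samples on [lo, hi]; outliers go to the edge bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        k = int((s - lo) / width)
        k = max(0, min(k, n_bins - 1))
        counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded between 0 and 1."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical values of one feature (e.g. a distance) in two ensembles.
ensemble_a = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 6.9, 7.1]
ensemble_b = [5.0, 5.2, 6.8, 7.0, 7.2, 6.9, 7.1, 7.3]
p = histogram(ensemble_a, 4.0, 8.0, 4)
q = histogram(ensemble_b, 4.0, 8.0, 4)
print(js_divergence(p, q))
```

A value of zero means the two estimated distributions are identical, and a value of one means they do not overlap at all, which gives a scale loosely analogous to the way a root mean square deviation is read for a pair of structures.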
Protein dynamics play a crucial role in function, catalytic activity, and pathogenesis. Consequently, there is great interest in computational methods that probe the conformational fluctuations of a protein. However, molecular dynamics simulations are computationally costly and therefore are often limited to comparatively short timescales. TYPHON is a probabilistic method to explore the conformational space of proteins under the guidance of a sophisticated probabilistic model of local structure and a given set of restraints that represent nonlocal interactions, such as hydrogen bonds or disulfide bridges. The choice of the restraints themselves is heuristic, but the resulting probabilistic model is well-defined and rigorous. Conceptually, TYPHON constitutes a null model of conformational fluctuations under a given set of restraints. We demonstrate that TYPHON can provide information on conformational fluctuations that is in correspondence with experimental measurements. TYPHON provides a flexible, yet computationally efficient, method to explore possible conformational fluctuations in proteins.
Ensembles of bacteria are able to coordinate their phenotypic behavior in accordance with the size, density, and growth state of the ensemble. This is achieved through production and exchange of diffusible signal molecules in a cell-cell regulatory system termed quorum sensing. In the generic quorum sensor a positive feedback in the production of signal molecules defines the conditions at which the collective behavior switches on. In spite of its conceptual simplicity, a proper measure of biofilm colony "size" appears to be lacking. We establish that the cell density multiplied by a geometric factor which incorporates the boundary conditions constitutes an appropriate size measure. The geometric factor is the square of the radius for a spherical colony or a hemisphere attached to a reflecting surface. If surrounded by a rapidly exchanged medium, the geometric factor is divided by three. For a disk-shaped biofilm the geometric factor is the horizontal dimension multiplied by the height, and the square of the height of the biofilm if there is significant flow above the biofilm. A remarkably simple factorized expression for the size is obtained, which separates the all-or-none ignition caused by the positive feedback from the smoother activation outside the switching region.
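The geometric factors listed above can be collected in a small helper. This is a minimal sketch that only encodes the cases stated in the abstract; the function names, argument names, and example numbers are hypothetical, and units are left to the caller.

```python
def geometric_factor(shape, radius=None, width=None, height=None,
                     rapid_exchange=False, flow_above=False):
    """Geometric factor for the colony size measure.

    shape: "sphere" or "hemisphere" (the latter attached to a reflecting
           surface), or "disk" for a disk-shaped biofilm.
    rapid_exchange: spherical case, colony surrounded by a rapidly exchanged medium.
    flow_above: disk case, significant flow above the biofilm.
    """
    if shape in ("sphere", "hemisphere"):
        factor = radius ** 2
        return factor / 3.0 if rapid_exchange else factor
    if shape == "disk":
        # horizontal dimension times height, or height squared with flow above
        return height ** 2 if flow_above else width * height
    raise ValueError(f"unknown shape: {shape!r}")


def colony_size(cell_density, factor):
    """Size measure: cell density multiplied by the geometric factor."""
    return cell_density * factor


# Hypothetical example values.
print(colony_size(2.0, geometric_factor("sphere", radius=3.0)))
print(colony_size(2.0, geometric_factor("sphere", radius=3.0, rapid_exchange=True)))
print(colony_size(2.0, geometric_factor("disk", width=10.0, height=0.5)))
print(colony_size(2.0, geometric_factor("disk", width=10.0, height=0.5, flow_above=True)))
```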