Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independent of the number of aligned sequences, residue separation or secondary structure type, but is largest for β-sheet containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state-of-the-art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, contact prediction methods have improved since the first studies. Here, we ask how much the final models are improved if improved contact predictions are used.
The translocon recognizes transmembrane helices with a sufficient level of hydrophobicity and inserts them into the membrane. However, sometimes less hydrophobic helices are also recognized. The positive inside rule, orientational preferences of neighboring helices and specific interactions with them have been shown to aid in the recognition of these helices, at least in artificial systems. To better understand how the translocon inserts marginally hydrophobic helices, we studied three naturally occurring marginally hydrophobic helices, which were previously shown to require the subsequent helix for efficient translocon recognition. We find no evidence for specific interactions when we scan all residues in the subsequent helices. Instead, we identify arginines located at the N-terminal part of the subsequent helices that are crucial for the recognition of the marginally hydrophobic transmembrane helices, indicating that the positive inside rule is important. However, in two of the constructs, these arginines do not aid in the recognition without the rest of the subsequent helix; that is, the positive inside rule alone is not sufficient. Instead, the improved recognition of marginally hydrophobic helices can here be explained as follows: the positive inside rule provides an orientational preference of the subsequent helix, which in turn allows the marginally hydrophobic helix to be inserted; that is, the effect of the positive inside rule is stronger if positively charged residues are followed by a transmembrane helix. Such a mechanism obviously cannot aid C-terminal helices, and consequently, we find that the terminal helices in multi-spanning membrane proteins are more hydrophobic than internal helices.
The recognition of transmembrane helices by the translocon is primarily guided by the average hydrophobicity of the potential transmembrane helix. However, the exact hydrophobicity of each amino acid can be identified in several different ways. The free energy of transfer for amino acid analogues between a hydrophobic medium (for example, octanol) and water can be measured or obtained from simulations; the hydrophobicity can also be estimated from statistical properties of known transmembrane segments; and finally, the contribution of each amino acid type to the probability of translocon recognition has recently been measured directly. Although these scales correlate quite well, there are clear differences between them, and it is not well understood which scale best represents the biology or what the differences are. Here, we try to provide some answers to this by studying the ability of different scales to recognize transmembrane helices and predict the topology of transmembrane proteins. From this analysis it is clear that the biological hydrophobicity scale, as well as scales created from statistical analysis of membrane helices, perform better than earlier experimental scales that are mainly based on measurements of amino acid analogs and not directly on transmembrane helix recognition. Using these results we identified the properties of the scales that perform better than other scales. We find, for instance, that the better performing scales consider proline more hydrophilic. This shows that transmembrane recognition is not only governed by pure hydrophobicity but also by the helix preferences for amino acids, as proline is a strong helix breaker.
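As a rough illustration of recognition by average hydrophobicity, the sketch below scans a sequence with a sliding window and reports windows whose mean scale value exceeds a cutoff. The Kyte-Doolittle hydropathy scale is used only as a familiar stand-in and is not necessarily one of the scales compared in the study; the 19-residue window and the cutoff of 1.6 are likewise assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydrophobicity(sequence, scale=KYTE_DOOLITTLE, window=19):
    """Return the mean scale value of every full window along the sequence."""
    values = [scale[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def candidate_helices(sequence, threshold=1.6, scale=KYTE_DOOLITTLE, window=19):
    """Return 0-based start positions of windows whose mean exceeds the threshold.

    The threshold is an illustrative assumption, not a value from the study.
    """
    means = window_hydrophobicity(sequence, scale=scale, window=window)
    return [start for start, mean in enumerate(means) if mean > threshold]
```

Passing a different dictionary as `scale` is enough to compare how two scales rank the same segment, for example one that scores proline as more hydrophilic.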
The folding of most integral membrane proteins follows a two-step process: initially, individual transmembrane helices are inserted into the membrane by the Sec translocon. Thereafter, these helices fold to shape the final conformation of the protein. However, for some proteins, including Aquaporin 1 (AQP1), the folding appears to follow a more complicated path. AQP1 has been reported to first insert as a four-helical intermediate, where helices 2 and 4 are not inserted into the membrane. In a second step, this intermediate is folded into a six-helical topology. During this process, the orientation of the third helix is inverted. Here, we propose a mechanism for how this reorientation could be initiated: first, helix 3 slides out from the membrane core so that the preceding loop enters the membrane. The final conformation could then be formed as helices 2, 3, and 4 are inserted into the membrane and the reentrant regions come together. We find support for the first step in this process by showing that the loop preceding helix 3 can insert into the membrane. Further, hydrophobicity curves, experimentally measured insertion efficiencies and MD simulations suggest that the barrier between these two hydrophobic regions is relatively low, supporting the idea that helix 3 can slide out of the membrane core, initiating the rearrangement process.
While early structural models of helix-bundle integral membrane proteins posited that the transmembrane α-helices [transmembrane helices (TMHs)] were orientated more or less perpendicular to the membrane plane, there is now ample evidence from high-resolution structures that many TMHs have significant tilt angles relative to the membrane. Here, we address the question whether the tilt is an intrinsic property of the TMH in question or if it is imparted on the TMH during folding of the protein. Using a glycosylation mapping technique, we show that four highly tilted helices found in multi-spanning membrane proteins all have much shorter membrane-embedded segments when inserted by themselves into the membrane than seen in the high-resolution structures. This suggests that tilting can be induced by tertiary packing interactions within the protein, subsequent to the initial membrane-insertion step.
The frequency of de novo creation of proteins has been debated. Early on, it was assumed that de novo creation should be extremely rare and that the vast majority of all protein coding genes were created in the early history of life. However, the early genomics era led to the insight that protein coding genes do appear to be lineage-specific. Today, with thousands of completely sequenced genomes, this impression remains. It has even been proposed that the creation of novel genes, a continuous process where most de novo genes are short-lived, is as frequent as gene duplications. There exist reports with strongly indicative evidence for de novo gene emergence in many organisms ranging from Bacteria, sometimes generated through bacteriophages, to humans, where orphans appear to be overexpressed in brain and testis. In contrast, research on protein evolution indicates that many very distantly related proteins appear to share partial homology. Here, we discuss recent results on de novo gene emergence, as well as important technical challenges limiting our ability to get a definite answer to the extent of de novo protein creation.
Proteins evolve not only through point mutations but also by insertion and deletion events, which affect the length of the protein. It is well known that such indel events most frequently occur in surface-exposed loops. However, detailed analysis of indel events in distantly related and fast-evolving proteins is hampered by the difficulty involved in correctly aligning such sequences. Here, we circumvent this problem by first only analyzing homologous proteins based on length variation rather than pairwise alignments. Using this approach, we find a surprisingly strong relationship between difference in length and difference in the number of intrinsically disordered residues, where up to three quarters of the length variation can be explained by changes in the number of intrinsically disordered residues. Further, we find that disorder is common in both insertions and deletions. A more detailed analysis reveals that indel events do not induce disorder but rather that already disordered regions accrue indels, suggesting that there is a lowered selective pressure for indels to occur within intrinsically disordered regions.
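The statement that a fraction of the length variation can be explained by changes in disorder can be read as a coefficient of determination. The sketch below computes R² for a simple linear fit between two paired lists, assuming (hypothetically) that `x` holds differences in the number of disordered residues and `y` holds differences in protein length for homologous pairs. The abstract does not specify the statistic actually used, so this is one plausible reading rather than the study's procedure.

```python
def r_squared(x, y):
    """Fraction of the variance in y explained by a least-squares line on x."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # No variation in one variable: nothing can be explained.
        return 0.0
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)
```

A value near 0.75 from such a fit would correspond to "three quarters of the length variation" in this reading.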
In this chapter, we first discuss protein localization in bacteria and evaluate some localization prediction tools on an independent dataset. Next, we focus on β-barrel outer membrane proteins (BOMPs), describing and evaluating new tools for BOMP detection and topology prediction. Finally, we apply general protein structure prediction methods on these proteins to show that the structure of most BOMPs in E. coli can be modeled reliably.
Clustering methods are often needed for accurately assessing the quality of modeled protein structures. Recent blind evaluation of quality assessment methods in CASP10 showed that there is little difference between many different methods as far as ranking models and selecting the best model are concerned. When comparing many models, the computational cost of the model comparison can become significant. Here, we present PconsD, a fast, stream-computing method for distance-driven model quality assessment that runs on consumer hardware. PconsD is at least one order of magnitude faster than other methods of comparable accuracy.
Recently, several new contact prediction methods have been published. They (i) use large sets of multiple aligned sequences and (ii) assume that correlations between columns in these alignments can be the result of indirect interactions. These methods are clearly superior to earlier methods when it comes to predicting contacts in proteins. Here, we demonstrate that combining predictions from two prediction methods, PSICOV and plmDCA, and two alignment methods, HHblits and jackhmmer at four different e-value cut-offs, provides a relative improvement of 20% in comparison with the best single method, exceeding 70% correct predictions for one contact prediction per residue.
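To make the evaluation criterion concrete, the sketch below combines several sets of contact scores and measures the fraction of correct predictions when one contact is predicted per residue (the top L pairs for a protein of length L). The combination here is a plain average, which is an assumption chosen for simplicity and not the combiner used in the study. The `min_separation` filter and all function names are also hypothetical.

```python
def combine_scores(predictions):
    """Average per-pair scores over predictors.

    predictions: list of dicts mapping (i, j) residue pairs to scores.
    A pair missing from a predictor contributes 0. Real scores from different
    methods live on different scales and would need normalizing first.
    """
    pairs = set().union(*predictions)
    return {
        pair: sum(pred.get(pair, 0.0) for pred in predictions) / len(predictions)
        for pair in pairs
    }


def top_l_precision(scores, true_contacts, length, min_separation=5):
    """Fraction of the `length` highest-scoring pairs that are true contacts."""
    eligible = [p for p in scores if abs(p[0] - p[1]) >= min_separation]
    ranked = sorted(eligible, key=lambda p: scores[p], reverse=True)[:length]
    if not ranked:
        return 0.0
    return sum(p in true_contacts for p in ranked) / len(ranked)
```

With eight inputs (two predictors times four e-value cut-offs, for instance), `combine_scores` would simply receive a list of eight dictionaries.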
Topology analysis of membrane proteins can be obtained by enzymatic shaving in combination with MS identification of peptides. Ideally, such analysis could provide quite detailed information about the membrane spanning regions. Here, we examine the ability of some shaving enzymes to provide large-scale analysis of membrane proteome topologies. To compare different shaving enzymes, we first analyzed the detected peptides from two over-expressed proteins. Second, we analyzed the peptides from non-over-expressed Escherichia coli membrane proteins with known structure to evaluate the shaving methods. Finally, the identified peptides were used to test the accuracy of a number of topology predictors. At the end we suggest that the usage of thermolysin, an enzyme working at the natural pH of the cell for membrane shaving, is superior because: (i) we detect a similar number of peptides and proteins using thermolysin and trypsin; (ii) thermolysin shaving can be run at a natural pH; (iii) the incubation time is quite short; and (iv) fewer detected peptides from thermolysin shaving originate from the transmembrane regions. Using thermolysin shaving we can also provide a clear separation between the best and the less accurate topology predictors, indicating that using data from shaving can provide valuable information when developing new topology predictors.
Many proteins are composed of protein domains, functional units of common descent. Multidomain forms are common in all eukaryotes making up more than half of the proteome and the evolution of novel domain architecture has been accelerated in metazoans. It is also becoming increasingly clear that alternative splicing is prevalent among vertebrates. Given that protein domains are defined as structurally, functionally and evolutionarily distinct units, one may speculate that some alternative splicing events may lead to clean excisions of protein domains, thus generating a number of different domain architectures from one gene template. However, recent findings indicate that smaller alternative splicing events, in particular in disordered regions, might be more prominent than domain architectural changes. The problem of identifying protein isoforms is, however, still not resolved. Clearly, many splice forms identified through detection of mRNA sequences appear to produce nonfunctional proteins, such as proteins with missing internal secondary structure elements. Here, we review the state of the art methods for identification of functional isoforms and present a summary of what is known, thus far, about alternative splicing with regard to protein domain architectures.
Proteins evolve through point mutations as well as by insertions and deletions (indels). During the last decade it has become apparent that protein regions that do not fold into three-dimensional structures, i.e. intrinsically disordered regions, are quite common. Here, we have studied the relationship between protein disorder and indels using HMM-HMM pairwise alignments in two sets of orthologous eukaryotic protein pairs. First, we show that disordered residues are much more frequent among indel residues than among aligned residues and are also more prevalent among indels than in coils. Second, we observed that disordered residues are particularly common in longer indels. Disordered indels of short-to-medium size are prevalent in the non-terminal regions of proteins while the longest indels, ordered and disordered alike, occur toward the termini of the proteins where new structural units are comparatively well tolerated. Finally, while disordered regions often evolve faster than ordered regions and disorder is common in indels, there are some previously recognized protein families where the disordered region is more conserved than the ordered region. We find that these rare proteins are often involved in information processes, such as RNA processing and translation. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
α-Helical membrane proteins are important for many biological functions. Due to physicochemical constraints, the structures of membrane proteins differ from the structures of soluble proteins. Historically, membrane protein structures were assumed to be more or less two-dimensional, consisting of long, straight, membrane-spanning parallel helices packed against each other. However, during the past decade, a number of new membrane protein structures have cast doubt on this notion. Today, it is evident that the structures of many membrane proteins are as complex as those of many soluble proteins. Here, we review this development and discuss the consequences for our understanding of membrane protein biogenesis, folding, evolution, and bioinformatics.
With synthetic gene services, molecular cloning is as easy as ordering a pizza. However, choosing the right RNA code for efficient protein production is less straightforward, more akin to deciding on the pizza toppings. The possibility to choose synonymous codons in the gene sequence has ignited a discussion that dates back 50 years: Does synonymous codon use matter? Recent studies indicate that replacement of particular codons with synonymous codons can improve expression in homologous or heterologous hosts; however, it is not always successful. Furthermore, it is increasingly apparent that membrane protein biogenesis can be codon-sensitive. Single synonymous codon substitutions can influence mRNA stability, mRNA structure, translational initiation, translational elongation and even protein folding. Synonymous codon substitutions therefore need to be carefully evaluated when membrane proteins are engineered for higher production levels, and further studies are needed to fully understand how to select the codons that are optimal for higher production. This article is part of a Special Issue entitled: Protein Folding in Membranes.
Kalign2 is one of the fastest and most accurate methods for multiple alignments. However, in contrast to other methods, Kalign2 does not allow externally supplied position-specific gap penalties. Here, we present a modification to Kalign2, KalignP, so that it accepts such penalties. Further, we show that KalignP using position-specific gap penalties obtained from predicted secondary structures consistently improves on Kalign2 when tested on Balibase 3.0 as well as on a dataset derived from Pfam-A seed alignments.
State-of-the-art methods for topology prediction of α-helical membrane proteins are based on the use of time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine if it is possible to use the consensus of topology prediction methods that are based on single sequences to obtain a similar accuracy as the more accurate multiple sequence-based methods. We show that TOPCONS-single performs better than any of the other single-sequence topology prediction methods tested, but ~6% worse than the best method that utilizes multiple sequence alignments.
CD46 is a C3b/C4b binding complement regulator and a receptor for several human pathogens. We examined the interaction between CD46 and Helicobacter pylori, a bacterium that colonizes the human gastric mucosa and causes gastritis, peptic ulcers, and cancer.
Multiple templates can often be used to build more accurate homology models than models built from a single template. Here we introduce PconsM, an automated protocol that uses multiple templates to build protein models. PconsM has been among the top-performing methods in the recent CASP experiments and consistently performs better than the single template models used in Pcons.net. In particular for the easier targets with many alternative templates with a high degree of sequence identity, quality is readily improved by a few percent over the highest ranked model built on a single template. PconsM is available as an additional pipeline within the Pcons.net protein structure prediction server.
Many α-helical membrane proteins contain internal symmetries, indicating that they might have evolved through a gene duplication and fusion event. Here, we have characterized internal duplications among membrane proteins of known structure and in three complete genomes. We found that the majority of large transmembrane (TM) proteins contain an internal duplication. The duplications found showed a large variability both in the number of TM-segments included and in their orientation. Surprisingly, an approximately equal number of antiparallel duplications and parallel duplications were found. However, of all 11 superfamilies with an internal duplication, only for one, the AcrB Multidrug Efflux Pump, could the duplicated unit be found in its nonduplicated form. An evolutionary analysis of the AcrB homologs indicates that several independent fusions have occurred, including the fusion of the SecD and SecF proteins into the 12-TM-protein SecDF in Brucella and Staphylococcus aureus. In one additional case, the Vitamin B12 transporter-like ABC transporters, the protein had undergone an additional fusion to form a protein with 20 TM-helices in several bacterial genomes. Finally, homologs to all human membrane proteins were used to detect the presence of duplicated and nonduplicated proteins. This confirmed that only in rare cases can homologs with different duplication status be found, although internal symmetry is frequent among these proteins. One possible explanation is that duplication and fusion events frequently happen simultaneously and that there is almost always a strong selective advantage for the fused form.
The orderly progression through mitosis is regulated by the Anaphase-Promoting Complex (APC), a large multiprotein E3 ubiquitin ligase that targets key cell-cycle regulators for destruction by the 26 S proteasome. The APC is composed of at least 11 subunits and associates with additional regulatory activators during mitosis and interphase cycles. Despite extensive research on APC and activator functions in the cell cycle, only a few components have been functionally characterized in plants.
In water-soluble proteins it is energetically favorable to bury hydrophobic residues and to expose polar and charged residues. In contrast to water-soluble proteins, transmembrane proteins face three distinct environments: a hydrophobic lipid environment inside the membrane, a hydrophilic water environment outside the membrane and an interface region rich in phospholipid head-groups. Therefore, it is energetically favorable for transmembrane proteins to expose different types of residues in the different regions.
Galanin is a neuropeptide found throughout the central and peripheral nervous systems of a wide range of species, ranging from human and mouse to frog and tuna. Galanin mediates its physiological roles through three receptors (GalR1-3), all members of the G-protein coupled receptor family. In mapping these roles, receptor subtype selective ligands are crucial tools. To facilitate the ligand design, data on receptor structure and interaction points are of great importance. The current study investigates the mechanism by which galanin interacts with GalR3. Mutated receptors were tested with competitive binding analysis in vitro. Our studies identify six mutagenic constructs that lost receptor affinity completely, despite being expressed at the cell surface. Mutations of the Tyr103(3.33) in transmembrane helix (TM) III, His251(6.51) in TM VI, Arg273(7.35) or His277(7.39) in TM VII, Phe263(6.63) or Tyr270(7.32) in the extracellular loop III all result in complete reduction of ligand binding. In addition, docking studies of an in silico model of GalR3 propose that four of the identified residues interact with pharmacophores situated within the galanin(2-6) sequence. This study provides novel insights into the interaction between ligands and GalR3 and highlights the requirement for correct design of targeting ligands.
Here, we present a study of polar residues within the membrane core of alpha-helical membrane proteins. As anticipated, polar residues are underrepresented in the membrane. Further, most of these residues are buried within the interior of the protein and are only rarely exposed to lipids. However, the polar groups often border internal water-filled cavities, even if the rest of the sidechain is buried. A survey of their functional roles in known structures showed that the polar residues are often directly involved in binding of small compounds, especially in channels and transporters, but other functions including proton transfer, catalysis, and selectivity have also been attributed to these proteins. Among the polar residues, histidines often interact with prosthetic groups in photosynthetic- and oxidoreductase-related proteins, whereas prolines often are required for conformational changes of the proteins. Indeed, the polar residues in the membrane core are more conserved than other residues in the core, as well as more conserved than polar residues outside the membrane. The reason is twofold: they are often (i) buried in the interior of the protein and (ii) directly involved in the function of the proteins. Finally, a method to identify which polar residues are present within the membrane core directly from protein sequences was developed. Applied to the set of all human membrane proteins, the method predicts that polar residues are most frequent among active transporter proteins and GPCRs, whereas they are infrequent in families with few transmembrane regions, such as non-GPCR receptors.
Protein domain repeats are common in proteins that are central to the organization of a cell, in particular in eukaryotes. They are known to evolve through internal tandem duplications. However, the understanding of the underlying mechanisms is incomplete. To shed light on repeat expansion mechanisms, we have studied the evolution of the muscle protein Nebulin, a protein that contains a large number of actin-binding nebulin domains. Nebulin proteins have evolved from an invertebrate precursor containing two nebulin domains. Repeat regions have expanded through duplications of single domains, as well as duplications of a super repeat (SR) consisting of seven nebulins. We show that the SR has evolved independently into large regions in at least three instances: twice in the invertebrate Branchiostoma floridae and once in vertebrates. In-depth analysis reveals several recent tandem duplications in the Nebulin gene. The events involve both single-domain and multidomain SR units or several SR units. There are single events, but frequently the same unit is duplicated multiple times. For instance, an ancestor of human and chimpanzee underwent two tandem duplications. The duplication junction coincides with an Alu transposon, thus suggesting duplication through Alu-mediated homologous recombination. Duplications in the SR region consistently involve multiples of seven domains. However, the exact unit that is duplicated varies both between species and within species. Thus, multiple tandem duplications of the same motif did not create the large Nebulin protein. Finally, analysis of segmental duplications in the human genome reveals that duplications are more common in genes containing domain repeats than in those coding for nonrepeated proteins. In fact, segmental duplications are found three to six times more often in long repeated genes than expected by chance.
We have determined the optimal placement of individual transmembrane helices in the Pyrococcus horikoshii Glt(Ph) glutamate transporter homolog in the membrane. The results are in close agreement with theoretical predictions based on hydrophobicity, but do not, in general, match the known three-dimensional structure, suggesting that transmembrane helices can be repositioned relative to the membrane during folding and oligomerization. Theoretical analysis of a database of membrane protein structures provides additional support for this idea. These observations raise new challenges for the structure prediction of membrane proteins and suggest that the classical two-stage model often used to describe membrane protein folding needs to be modified.
In mammalian cells, most integral membrane proteins are initially inserted into the endoplasmic reticulum membrane by the so-called Sec61 translocon. However, recent predictions suggest that many transmembrane helices (TMHs) in multispanning membrane proteins are not sufficiently hydrophobic to be recognized as such by the translocon. In this study, we have screened 16 marginally hydrophobic TMHs from membrane proteins of known three-dimensional structure. Indeed, most of these TMHs do not insert efficiently into the endoplasmic reticulum membrane by themselves. To test if loops or TMHs immediately upstream or downstream of a marginally hydrophobic helix might influence the insertion efficiency, insertion of marginally hydrophobic helices was also studied in the presence of their neighboring loops and helices. The results show that flanking loops and nearest-neighbor TMHs are sufficient to ensure the insertion of many marginally hydrophobic helices. However, for at least two of the marginally hydrophobic helices, the local interactions are not enough, indicating that post-insertional rearrangements are involved in the folding of these proteins.
Model Quality Assessment Programs (MQAPs) are programs developed to rank protein models. These methods can be trained to predict the overall global quality of a model or which local regions in a model are likely to be incorrect. In CASP8, we participated with two predictors that predict both global and local quality using either consensus information, Pcons, or purely structural information, ProQ. Consistent with results in previous CASPs, the best performance in CASP8 was obtained using the Pcons method. Furthermore, the results show that the modification introduced into Pcons for CASP8 improved the predictions against GDT_TS and now a correlation coefficient above 0.9 is achieved, whereas the correlation for ProQ is about 0.7. The correlation is better for the easier than for the harder targets, but it is not below 0.5 for a single target and below 0.7 only for three targets. The correlation coefficient for the best local quality MQAP is 0.68, showing that there is still clear room for improvement within this area. We also detect that Pcons still is not always able to identify the best model. However, we show that using a linear combination of Pcons and ProQ it is possible to select models that are better than the models from the best single server. In particular, the average quality over the hard targets increases by about 6% compared with using Pcons alone.
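Model selection by a linear combination of two quality scores can be sketched as follows. The weight of 0.8 on the consensus score is a placeholder assumption, since the abstract does not report the coefficients used, and the function names and input layout are hypothetical. Both scores are assumed to be on comparable scales.

```python
def combined_score(pcons, proq, weight=0.8):
    """Weighted sum of a consensus score and a structure-based score.

    `weight` is a hypothetical value; the abstract does not give the
    coefficients of the linear combination.
    """
    return weight * pcons + (1.0 - weight) * proq


def select_model(models, weight=0.8):
    """Return the name of the model with the highest combined score.

    models: dict mapping a model name to a (pcons_score, proq_score) tuple.
    """
    if not models:
        raise ValueError("no models to choose from")
    return max(
        models,
        key=lambda name: combined_score(*models[name], weight=weight),
    )
```

Setting `weight=1.0` reduces the selection to the consensus score alone, which gives a baseline for comparing the two selection strategies on the same set of models.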
Protein structures change during evolution in response to mutations. Here, we analyze the mapping between sequence and structure in a set of structurally aligned protein domains. To avoid artifacts, we restricted our attention only to the core components of these structures. We found that on average, using different measures of structural change, protein cores evolve linearly with evolutionary distance (amino acid substitutions per site). This is true irrespective of which measure of structural change we used, whether RMSD or discrete structural descriptors for secondary structure, accessibility, or contacts. This linear response allows us to quantify the claim that structure is more conserved than sequence. Using structural alphabets of similar cardinality to the sequence alphabet, structural cores evolve three to ten times slower than sequences. Although we observed an average linear response, we found a wide variance. Different domain families varied fivefold in structural response to evolution. An attempt to categorically analyze this variance among subgroups by structural and functional category revealed only one statistically significant trend. This trend can be explained by the fact that beta-sheets change faster than alpha-helices, most likely because they are shorter and because change occurs at the ends of the secondary structure elements.
TOPCONS (http://topcons.net/) is a web server for consensus prediction of membrane protein topology. The underlying algorithm combines an arbitrary number of topology predictions into one consensus prediction and quantifies the reliability of the prediction based on the level of agreement between the underlying methods, both on the protein level and on the level of individual TM regions. Benchmarking the method shows that overall performance levels match the best available topology prediction methods, and for sequences with high reliability scores, performance is increased by approximately 10 percentage points. The web interface allows for constraining parts of the sequence to a known inside/outside location, and detailed results are displayed both graphically and in text format.
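The idea of merging several topology predictions and scoring reliability by their agreement can be illustrated with a per-residue majority vote. This is a deliberately simplified stand-in and not the server's algorithm: a per-residue vote can, for example, yield segments that do not form a valid topology. The state letters (`i` inside, `M` membrane, `o` outside) and function name are assumptions.

```python
from collections import Counter


def consensus_topology(predictions):
    """Majority-vote consensus over equal-length topology strings.

    predictions: list of strings over the (assumed) alphabet 'i', 'M', 'o'.
    Returns (consensus_string, mean_agreement), where agreement is the
    fraction of predictors that voted for the winning state at each residue.
    Ties go to the state that appears first among the predictors.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    if length == 0:
        return "", 0.0
    consensus = []
    agreement = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        consensus.append(state)
        agreement.append(count / len(predictions))
    return "".join(consensus), sum(agreement) / length
```

Averaging the agreement over only the residues of one predicted TM region, rather than the whole sequence, would give a region-level score in the same spirit.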
For large regions of many proteins, and even entire proteins, no homology to known domains or proteins can be detected. These sequences are often referred to as orphans. Surprisingly, it has been reported that the large number of orphans is sustained in spite of a rapid increase of available genomic sequences. However, it is believed that de novo creation of coding sequences is rare in comparison to mechanisms such as domain shuffling and gene duplication; hence, most sequences should have homologs in other genomes. To investigate this, the sequences of 19 complete fungi genomes were compared. By using the phylogenetic relationship between these genomes, we could identify potentially de novo created orphans in Saccharomyces cerevisiae. We found that only a small fraction, <2%, of the S. cerevisiae proteome is orphan, which confirms that de novo creation of coding sequences is indeed rare. Furthermore, we found it necessary to compare the most closely related species to distinguish between de novo created sequences and rapidly evolving sequences where homologs are present but cannot be detected. Next, the orphan proteins (OPs) and orphan domains (ODs) were characterized. First, it was observed that both OPs and ODs are short. In addition, at least some of the OPs have been shown to be functional in experimental assays, showing that they are not pseudogenes. Furthermore, in contrast to what has been reported before and what is seen for older orphans, S. cerevisiae specific ODs and proteins are not more disordered than other proteins. This might indicate that many of the older, and earlier classified, orphans indeed are fast-evolving sequences. Finally, >90% of the detected ODs are located at the protein termini, which suggests that these orphans could have been created by mutations that have affected the start or stop codons.
The galanin receptor family comprises three members, GalR1, GalR2 and GalR3, all belonging to the G-protein-coupled receptor superfamily. All three receptors bind the peptide hormone galanin, but show distinctly different binding properties to other molecules and effects on intracellular signaling. To gain insight into the molecular basis of receptor subtype specificity, we have generated a three-dimensional model for each of the galanin receptors based on its homologs in the same family. We found significant differences in the organization of the binding pockets among the three types of receptors, which might be the key for specific molecular recognition of ligands. Through docking of fragments of the galanin peptide and a number of ligands, we investigated the involvement of transmembrane and loop residues in ligand interaction.
α-Helical hairpins, consisting of a pair of closely spaced transmembrane (TM) helices that are connected by a short interfacial turn, are the simplest structural motifs found in multi-spanning membrane proteins. In naturally occurring hairpins, the presence of polar residues is common and predicted to complicate membrane insertion. We postulate that the pre-packing process offsets any energetic cost of allocating polar and charged residues within the hydrophobic environment of biological membranes. Consistent with this idea, we provide here experimental evidence demonstrating that helical hairpin insertion into biological membranes can be driven by electrostatic interactions between closely separated, poorly hydrophobic sequences. Additionally, we observe that the integral hairpin can be stabilized by a short loop heavily populated by turn-promoting residues. We conclude that the combined effect of TM-TM electrostatic interactions and tight turns plays an important role in generating the functional architecture of membrane proteins and propose that helical hairpin motifs can be acquired within the context of the Sec61 translocon at the early stages of membrane protein biosynthesis. Taken together, these data further underline the potential complexities involved in accurately predicting TM domains from primary structures.
Transmembrane β-barrels exist in the outer membrane of Gram-negative bacteria as well as in chloroplasts and mitochondria. They are often involved in transport processes and are promising antimicrobial drug targets. Structures of only a few β-barrel protein families are known. Therefore, a method that could automatically generate such models would be valuable. The symmetrical arrangement of the barrels suggests that an approach based on idealized geometries may be successful.
For current state-of-the-art methods, the prediction of correct topology of membrane proteins has been reported to be above 80%. However, this performance has only been observed in small and possibly biased data sets obtained from protein structures or biochemical assays. Here, we test a number of topology predictors on an "unseen" set of proteins of known structure and also on four "genome-scale" data sets, including one recent large set of experimentally validated human membrane proteins with glycosylated sites. The set of glycosylated proteins is also used to examine the ability of prediction methods to separate membrane from nonmembrane proteins. The results show that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method that combines several of the other prediction methods. The best methods to distinguish membrane from nonmembrane proteins belong to the "Phobius" group of predictors. We further observe that the reported high accuracies in the smaller benchmark sets are not quite maintained in larger scale benchmarks. Instead, we estimate the performance of the best prediction methods for eukaryotic membrane proteins to be between 60% and 70%. The low agreement between predictions from different methods questions earlier estimates about the global properties of the membrane proteome. Finally, we suggest a pipeline to estimate these properties using a combination of the best predictors that could be applied in large-scale proteomics studies of membrane proteins.
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.
Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern where every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, may serve to avert aggregation between adjacent domains, or may be the result of functional constraints. In filamin, our studies confirm the presence of interspersed integrin binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin binding domains. The most notable case is leech filamin, which contains a 20 repeat expansion and exhibits unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin β3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrates.
Transmembrane β-barrel proteins (TMBs) are found in the outer membrane of Gram-negative bacteria, chloroplasts and mitochondria. They play a major role in the translocation machinery, pore formation, membrane anchoring and ion exchange. TMBs are also promising targets for antimicrobial drugs and vaccines. Given the difficulty in membrane protein structure determination, computational methods to identify TMBs and predict the topology of TMBs are important.
Functioning and processing of membrane proteins critically depend on the way their transmembrane segments are embedded in the membrane. Sphingolipids are structural components of membranes and can also act as intracellular second messengers. Not much is known of sphingolipids binding to transmembrane domains (TMDs) of proteins within the hydrophobic bilayer, and how this could affect protein function. Here we show a direct and highly specific interaction of exclusively one sphingomyelin species, SM 18, with the TMD of the COPI machinery protein p24 (ref. 2). Strikingly, the interaction depends on both the headgroup and the backbone of the sphingolipid, and on a signature sequence (VXXTLXXIY) within the TMD. Molecular dynamics simulations show a close interaction of SM 18 with the TMD. We suggest a role of SM 18 in regulating the equilibrium between an inactive monomeric and an active oligomeric state of the p24 protein, which in turn regulates COPI-dependent transport. Bioinformatic analyses predict that the signature sequence represents a conserved sphingolipid-binding cavity in a variety of mammalian membrane proteins. Thus, in addition to a function as second messengers, sphingolipids can act as cofactors to regulate the function of transmembrane proteins. Our discovery of an unprecedented specificity of interaction of a TMD with an individual sphingolipid species adds to our understanding of why biological membranes are assembled from such a large variety of different lipids.