Neurodegenerative diseases and other amyloidoses are linked to the formation of amyloid fibrils. It has been shown that the ability to form these fibrils is encoded by the amino acid sequence. Existing methods for the prediction of amyloidogenicity generate an unsatisfactorily high number of false positives when tested against sequences of the disease-related proteins.
The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request.
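The abstract above does not spell out the filter, so the following is only a minimal sketch of the general idea of comparing short strings between adjacent sequence segments. It assumes a Jaccard similarity over k-mer sets and compares composition only, not order. The function names, window sizes, k-mer length, and threshold are all hypothetical choices for illustration and are not taken from the published program.

```python
def kmers(segment, k):
    """Return the set of overlapping k-length substrings of a segment."""
    return {segment[i:i + k] for i in range(len(segment) - k + 1)}


def adjacent_window_similarity(seq, window, k=2):
    """Highest Jaccard similarity between the k-mer sets of two adjacent windows."""
    best = 0.0
    for start in range(len(seq) - 2 * window + 1):
        left = kmers(seq[start:start + window], k)
        right = kmers(seq[start + window:start + 2 * window], k)
        union = left | right
        if union:  # skip windows too short to contain any k-mer
            best = max(best, len(left & right) / len(union))
    return best


def may_contain_tandem_repeat(seq, windows=(5, 10, 20), k=2, threshold=0.5):
    """Keep a sequence if any window size shows similar adjacent segments.

    All default values are illustrative assumptions, not published settings.
    """
    return any(adjacent_window_similarity(seq, w, k) >= threshold for w in windows)
```

A sequence that passes such a pre-filter would then be handed to a slower, more sensitive repeat detector; one that fails would be discarded early.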
The established correlation between neurodegenerative disorders and intracerebral deposition of polyglutamine aggregates motivates attempts to better understand their fibrillar structure. We designed polyglutamines with a few lysines inserted to overcome the hindrance of extreme insolubility and two D-lysines to limit the lengths of β-strands. One is 33 amino acids long (PolyQKd-33) and the other has one fewer glutamine (PolyQKd-32). Both form well-dispersed fibrils suitable for analysis by electron microscopy. Electron diffraction confirmed cross-β structures in both fibrils. Remarkably, the deletion of just one glutamine residue from the middle of the peptide leads to substantially different amyloid structures. PolyQKd-32 fibrils are consistently 10-20% wider than PolyQKd-33, as measured by negative staining, cryo-electron microscopy, and scanning transmission electron microscopy. Scanning transmission electron microscopy analysis revealed that the PolyQKd-32 fibrils have 50% higher mass-per-length than PolyQKd-33. This distinction can be explained by a superpleated β-structure model for PolyQKd-33 and a model with two β-solenoid protofibrils for PolyQKd-32. These data provide evidence for β-arch-containing structures in polyglutamine fibrils and open future possibilities for structure-based drug design.
The R2TP is a recently identified Hsp90 co-chaperone, composed of four proteins: Pih1D1, RPAP3, and the AAA(+)-ATPases RUVBL1 and RUVBL2. In mammals, the R2TP is involved in the biogenesis of cellular machineries such as RNA polymerases, small nucleolar ribonucleoparticles and phosphatidylinositol 3-kinase-related kinases. Here, we characterize the spaghetti (spag) gene of Drosophila, the homolog of human RPAP3. This gene plays an essential function during Drosophila development. We show that Spag protein binds Drosophila orthologs of R2TP components and Hsp90, like its yeast counterpart. Unexpectedly, Spag also interacts with and stimulates the chaperone activity of Hsp70. Using null mutants and flies with inducible RNAi, we show that spaghetti is necessary for the stabilization of snoRNP core proteins and target of rapamycin activity and likely the assembly of RNA polymerase II. This work highlights the strong conservation of both the HSP90/R2TP system and its clients and further shows that Spag, unlike Saccharomyces cerevisiae Tah1, performs essential functions in metazoans. Interaction of Spag with both Hsp70 and Hsp90 suggests a model whereby R2TP would accompany clients from Hsp70 to Hsp90 to facilitate their assembly into macromolecular complexes.
Protein α-helical coiled coil structures that elicit antibody responses, which block critical functions of medically important microorganisms, represent a means for vaccine development. By using bioinformatics algorithms, a total of 50 antigens with α-helical coiled coil motifs orthologous to Plasmodium falciparum were identified in the P. vivax genome. The peptides identified in silico were chemically synthesized; circular dichroism studies indicated partial or high α-helical content. Antigenicity was evaluated using human sera samples from malaria-endemic areas of Colombia and Papua New Guinea. Eight of these fragments were selected and used to assess immunogenicity in BALB/c mice. ELISA assays indicated strong reactivity of serum samples from individuals residing in malaria-endemic regions and sera of immunized mice, with the α-helical coiled coil structures. In addition, ex vivo production of IFN-γ by murine mononuclear cells confirmed the immunogenicity of these structures and the presence of T-cell epitopes in the peptide sequences. Moreover, sera of mice immunized with four of the eight antigens recognized native proteins on blood-stage P. vivax parasites, and antigenic cross-reactivity with three of the peptides was observed when reacted with both the P. falciparum orthologous fragments and whole parasites. Results here point to the α-helical coiled coil peptides as possible P. vivax malaria vaccine candidates as were observed for P. falciparum. Fragments selected here warrant further study in humans and non-human primate models to assess their protective efficacy as single components or assembled as hybrid linear epitopes.
RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.
Recent studies have shown that Sup35p prion fibrils probably have a parallel in-register β-structure. However, the part(s) of the N-domain critical for fibril formation and maintenance of the [PSI(+)] phenotype remains unclear. Here we designed a set of five SUP35 mutant alleles (sup35(KK)) with lysine substitutions in each of five N-domain repeats, and investigated their effect on infectivity and ability of corresponding proteins to aggregate and coaggregate with wild type Sup35p in the [PSI(+)] strain. Alleles sup35-M1 (Y46K/Q47K) and sup35-M2 (Q61K/Q62K) led to prion loss, whereas sup35-M3 (Q70K/Q71K), sup35-M4 (Q80K/Q81K), and sup35-M5 (Q89K/Q90K) were able to maintain the [PSI(+)] prion. This suggests that the critical part of the parallel in-register β-structure for the studied [PSI(+)] prion variant lies in the first 63-69 residues. Our study also reveals an unexpected interplay between the wild type Sup35p and proteins expressed from the sup35(KK) alleles during prionization. Both Sup35-M1p and Sup35-M2p coaggregated with Sup35p, but only sup35-M2 led to prion loss in a dominant manner. We suggest that in the fibrils, Sup35p can bind to Sup35-M1p in the same conformation, whereas Sup35-M2p only allowed the Sup35p conformation that leads to the non-heritable fold. Mutations sup35-M4 and sup35-M5 influence the structure of the prion forming region to a lesser extent, and can lead to the formation of new prion variants.
Epithelial cell adhesion molecule (EpCAM) is a cell-surface protein highly expressed in embryonic tissues and in malignant carcinomas. We report that EpCAM acts as a potent inhibitor of novel protein kinase C (nPKC) in both embryos and cancer cells. We observed dramatic effects of loss of EpCAM on amphibian embryonic tissues, which include sequentially strong overstimulation of PKC activity and of the Erk pathway, leading to exacerbated myosin contractility, loss of cadherin-mediated adhesion, tissue dissociation, and, ultimately, cell death. We show that PKC inhibition is caused by a short segment of the EpCAM cytoplasmic tail. This motif resembles the pseudosubstrate inhibitory domains of PKCs and binds nPKCs with high affinity. A bioinformatics search reveals the existence of similar motifs in other plasma membrane proteins, most of which are cell-cell adhesion molecules. Thus, direct inhibition of PKC by EpCAM represents a general mode of regulation of signal transduction by cell-surface proteins.
Plasmodium vivax circumsporozoite (PvCS) protein is a major sporozoite surface antigen involved in parasite invasion of hepatocytes and is currently being considered as a vaccine candidate. PvCS contains a dimorphic central repetitive fragment flanked by conserved regions that contain functional domains.
The bioinformatics analysis of proteins containing tandem repeats requires special computer programs and databases, since the conventional approaches predominantly developed for globular domains have limited success. Here, I survey bioinformatics tools which have been developed recently for identification and proteome-wide analysis of protein repeats. The last few years have also been marked by an emergence of new 3D structures of these proteins. Appraisal of the known structures and their classification uncovers a straightforward relationship between their architecture and the length of the repetitive units. This relationship and the repetitive character of structural folds suggest rules for better prediction of the 3D structures of such proteins. Furthermore, bioinformatics approaches combined with low resolution structural data, from biophysical techniques, especially, the recently emerged cryo-electron microscopy, lead to reliable prediction of the protein repeat structures and their mode of binding with partners within molecular complexes. This hybrid approach can actively be used for structural and functional annotations of proteomes.
A new strategy for the rapid identification of new malaria antigens based on protein structural motifs was previously described. We identified and evaluated the malaria vaccine potential of fragments of several malaria antigens containing α-helical coiled coil protein motifs. By taking advantage of the relatively short size of these structural fragments, we constructed different poly-epitopes in which 3 or 4 of these segments were joined together via a non-immunogenic linker. Only peptides that are targets of human antibodies with anti-parasite in vitro biological activities were incorporated. One of the constructs, P181, was well recognized by sera and peripheral blood mononuclear cells (PBMC) of adults living in malaria-endemic areas. Affinity purified antigen-specific human antibodies and sera from P181-immunized mice recognised native proteins on malaria-infected erythrocytes in both immunofluorescence and western blot assays. In addition, specific antibodies inhibited parasite development in an antibody dependent cellular inhibition (ADCI) assay. Naturally induced antigen-specific human antibodies were at high titers and associated with clinical protection from malaria in longitudinal follow-up studies in Senegal.
Tropomodulin is a tropomyosin-dependent actin filament capping protein involved in the structural formation of thin filaments and in the regulation of their lengths through its localization at the pointed ends of actin filaments. The disordered N-terminal domain of tropomodulin contains three functional sites: two tropomyosin-binding and one tropomyosin-dependent actin-capping sites. The C-terminal half of tropomodulin consists of one compact domain containing a tropomyosin-independent actin-capping site. Here we determined the structural properties of tropomodulin-1 that affect its roles in cardiomyocytes. To explore the significance of individual tropomyosin-binding sites, GFP-tropomodulin-1 with single mutations that destroy each tropomyosin-binding site was expressed in cardiomyocytes. We demonstrated that both sites are necessary for the optimal localization of tropomodulin-1 at thin filament pointed ends, with site 2 acting as the major determinant. To investigate the functional properties of the tropomodulin C-terminal domain, truncated versions of GFP-tropomodulin-1 were expressed in cardiomyocytes. We discovered that the leucine-rich repeat (LRR) fold and the C-terminal helix are required for its proper targeting to the pointed ends. To investigate the structural significance of the LRR fold, we generated three mutations within the C-terminal domain (V232D, F263D, and L313D). Our results show that these mutations affect both tropomyosin-independent actin-capping activity and pointed end localization, most likely by changing local conformations of either loops or side chains of the surfaces involved in the interactions of the LRR domain. Studying the influence of these mutations individually, we concluded that, in addition to the tropomyosin-independent actin-capping site, there appears to be another regulatory site within the tropomodulin C-terminal domain.
Long synthetic peptides (LSPs) have a variety of important clinical uses as synthetic vaccines and drugs. Techniques for peptide synthesis were revolutionized in the 1960s and 1980s, after which efficient techniques for purification and characterization of the product were developed. These improved techniques allowed the stepwise synthesis of increasingly longer products at a faster rate, greater purity, and lower cost for clinical use. A synthetic peptide approach, coupled with bioinformatics analysis of genomes, can tremendously expand the search for clinically relevant products. In this Review, we discuss efforts to develop a malaria vaccine from LSPs, among other clinically directed work.
The vast majority of protein sequences are aperiodic; they do not have any strong bias in the amino acid composition, and they use a subtle mixture of all or most of the 20 amino acid residues to code a great number of various structures and functions. In this context, homorepeats, runs of a single amino acid residue, represent unusual, eye-catching motifs in proteins. Despite the sequence simplicity and relatively small size, the homorepeat runs have a strong potential for molecular interactions due to the excessively high local concentration of a certain physico-chemical property. Appearance of such runs within proteins may give them new structural and functional features. An increasing number of studies demonstrate the abundance of these motifs in proteins, their important roles in biological processes, and their link to a number of hereditary and age-related diseases. In this chapter, we summarize data on the distribution of homorepeats in proteomes and on their structural properties, evolution, and functions.
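Since a homorepeat is defined above as a run of a single amino acid residue, locating such runs in a sequence can be sketched with a back-referencing regular expression. This is an illustrative sketch only: the function name and the minimum run length of 5 are assumptions, because the abstract does not state a length cutoff.

```python
import re


def find_homorepeats(seq, min_len=5):
    """Return (residue, start, end) for each run of one residue of at least min_len.

    Coordinates are 0-based with an exclusive end. The min_len default is a
    hypothetical cutoff chosen for illustration.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([A-Z]) captures one residue; \1{n,} requires n or more further copies of it.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(seq)]
```

For example, `find_homorepeats("MKQQQQQQAAGSSSSSL")` reports the six-residue glutamine run and the five-residue serine run while ignoring the two-residue alanine run.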
The multisubunit Golgi-associated retrograde protein (GARP) complex is required for tethering and fusion of endosome-derived transport vesicles to the trans-Golgi network. Mutation of leucine-967 to glutamine in the Vps54 subunit of GARP is responsible for spinal muscular atrophy in the wobbler mouse, an animal model of amyotrophic lateral sclerosis. The crystal structure at 1.7 Å resolution of the mouse Vps54 C-terminal fragment harboring leucine-967, in conjunction with comparative sequence analysis, reveals that Vps54 has a continuous alpha-helical bundle organization similar to that of other multisubunit tethering complexes. The structure shows that leucine-967 is buried within the alpha-helical bundle through predominantly hydrophobic interactions that are critical for domain stability and folding in vitro. Mutation of this residue to glutamine does not prevent integration of Vps54 into the GARP complex but greatly reduces the half-life and levels of the protein in vivo. Severely reduced levels of mutant Vps54 and, consequently, of the whole GARP complex underlie the phenotype of the wobbler mouse.
We analysed the structural properties of protein regions containing arrays of perfect and nearly perfect tandem repeats. Naturally occurring proteins with perfect repeats are practically absent among the proteins with known 3D structures. The great majority of such regions in the Protein Data Bank are found in the proteins designed de novo. The abundance of natural structured proteins with tandem repeats is inversely correlated with the repeat perfection: the chance of finding natural structured proteins in the Protein Data Bank increases with a decrease in the level of repeat perfection. Prediction of intrinsic disorder within the tandem repeats in the SwissProt proteins supports the conclusion that the level of repeat perfection correlates with their tendency to be unstructured. This correlation is valid across the various species and subcellular localizations, although the level of disordered tandem repeats varies significantly between these datasets. On average, in prokaryotes, tandem repeats of cytoplasmic proteins were predicted to be the most structured, whereas in eukaryotes, the most structured portion of the repeats was found in the membrane proteins. Our study supports the hypothesis that, in general, the repeat perfection is a sign of recent evolutionary events rather than of exceptional structural and (or) functional importance of the repeat residues.
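The abstract above does not say how the level of repeat perfection was measured. As a hedged illustration, one plausible score is the fraction of residues that agree with the column-wise consensus of the aligned repeat units. The sketch below assumes the units have already been aligned to equal length without gaps; the function name and the scoring scheme are assumptions, not the authors' definition.

```python
from collections import Counter


def repeat_perfection(units):
    """Fraction of residues matching the most common residue in their column.

    Assumes equal-length, gap-free repeat units; 1.0 means a perfect repeat.
    This scoring scheme is an illustrative assumption.
    """
    if not units or not units[0] or len({len(u) for u in units}) != 1:
        raise ValueError("units must be non-empty strings of equal length")
    matches = 0
    for column in zip(*units):
        # Count how many units carry the consensus residue at this position.
        matches += Counter(column).most_common(1)[0][1]
    return matches / (len(units) * len(units[0]))
```

Under this score, three identical units give 1.0, and a single substitution in one of three 3-residue units lowers the value to 8/9.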
Amyloid fibrils are filamentous protein aggregates that accumulate in diseases such as Alzheimer's disease or type II diabetes. The amyloid-forming protein is disease specific. Amyloids may also be formed in vitro from many other proteins, after first denaturing them. Unlike the diverse native folds of these proteins, their amyloids are fundamentally similar in being rigid, smooth-sided, and cross-beta-structured, that is, with beta strands running perpendicular to the fibril axis. In the absence of high-resolution fibril structures, increasingly credible models are being derived by integrating data from a crossfire of experimental techniques. Most current models of disease-related amyloids invoke "beta arcades," columnar structures produced by in-register stacking of "beta arches." A beta arch is a strand-turn-strand motif in which the two beta strands interact via their side chains, not via the polypeptide backbone as in a conventional beta hairpin. Crystal structures of beta-solenoids, a class of proteins with amyloid-like properties, offer insight into the beta-arc turns found in beta arches. General conformational and thermodynamic considerations suggest that complexes of 2 or more beta arches may nucleate amyloid fibrillogenesis in vivo. The apparent prevalence of beta arches and their components has implications for identifying amyloidogenic sequences, elucidating fibril polymorphisms, predicting the locations and conformations of beta arcs within amyloid fibrils, and refining existing fibril models.
TLR2 is a pattern recognition receptor that functions in association with TLR1 or TLR6 to mediate innate immune responses to a variety of conserved microbial products. In the present study, the ectodomain of TLR2 was extensively mutated, and the mutants were assessed for their ability to bind and to mediate cellular responses to triacylated lipopeptide Pam(3)CSK(4). This analysis provides evidence that the recently published crystal structure of the TLR2-TLR1-Pam(3)CSK(4) complex represents a functional signal-inducing complex. Furthermore, we report that extended H-bond networks on the surface of TLR2 are critical for signaling in response to Pam(3)CSK(4) and to other di- and tri-acylated TLR2-TLR6 and TLR2-TLR1 ligands. Based on this finding, we suggest a dynamic model for TLR2-mediated recognition of these ligands in which TLR2 fluctuates between a conformation that is more suitable for binding of the fatty acyl moieties of the ligands and a conformation that favors, via a specific orientation of the ligand head group, formation of a signal-inducing ternary complex.
Over recent years, considerable evidence has accumulated for the high incidence of tandem repeats in proteins that carry out fundamental biological functions and are related to a number of human diseases. At the same time, protein repeats frequently degenerate strongly during evolution and, therefore, cannot be easily identified. To solve this problem, several computer programs based on different algorithms have been developed. Nevertheless, our tests showed that there is still room for improvement in methods for the accurate and rapid detection of tandem repeats in proteins.
Attachment to host tissues is a critical step in the pathogenesis of most bacterial infections. Enterotoxigenic Escherichia coli (ETEC) remains one of the principal causes of infectious diarrhea in humans. The recent identification of additional ETEC surface molecules suggests that new targets may be exploited in vaccine development. The EtpA protein identified in ETEC H10407 is a large glycosylated adhesin secreted via the two-partner secretion system. EtpA requires its putative partner EtpB for translocation across the outer membrane (OM). We investigated the biochemical and electrophysiological properties of purified EtpB. We showed that EtpB is a 65-kDa heat-modifiable protein localized to the OM. Electrophysiological experiments indicated that EtpB is able to form pores in planar lipid bilayer membranes with an asymmetric current, suggesting its functional asymmetry. The pore of EtpB frequently assumes an opened conformation and fluctuates between three well-defined conductance states. In silico analysis of the EtpB amino acid sequence and molecular modeling suggest that EtpB is similar to the well-known TpsB protein FhaC from Bordetella pertussis and has a C-terminal transmembrane beta-barrel domain that is occluded by an N-terminal alpha-helix, an extracellular loop, and two periplasmic polypeptide-transport-associated (POTRA) domains. Together, these data confirm that EtpB is a pore-forming protein mainly folded into a beta-barrel conformation and indicate that EtpB presents typical features of the OM TpsB proteins.
The availability of the P. falciparum genome has led to novel ways to identify potential vaccine candidates. A new approach for antigen discovery based on the bioinformatic selection of heptad repeat motifs corresponding to alpha-helical coiled coil structures yielded promising results. To clarify the relationship between the coiled coil motifs and their sequence conservation, we have assessed the extent of polymorphism in putative alpha-helical coiled coil domains in culture strains, in natural populations and in the single nucleotide polymorphism data available at PlasmoDB.
Proteins located on the surface of the pathogenic malaria parasite Plasmodium falciparum are objects of intensive studies due to their important role in the invasion of human cells and the accessibility to host antibodies thus making these proteins attractive vaccine candidates. One of these proteins, merozoite surface protein 3 (MSP3) represents a leading component among vaccine candidates; however, little is known about its structure and function. Our biophysical studies suggest that the 40 residue C-terminal domain of MSP3 protein self-assembles into a four-stranded alpha-helical coiled coil structure where alpha-helices are packed "side-by-side". A bioinformatics analysis provides an extended list of known and putative proteins from different species of Plasmodium which have such MSP3-like C-terminal domains. This finding allowed us to extend some conclusions of our studies to a larger group of the malaria surface proteins. Possible structural and functional roles of these highly conserved oligomerization domains in the intact merozoite surface proteins are discussed.
Numerous studies have shown that the ability to form amyloid fibrils is an inherent property of the polypeptide chain. This has led to the development of several computational approaches to predict amyloidogenicity from amino acid sequences. Here, we discuss the principles governing these methods, and evaluate them using several datasets. They deliver excellent performance in the tests made using short peptides (~6 residues). However, there is a general tendency towards a high number of false positives when tested against longer sequences. This shortcoming needs to be addressed as these longer sequences are linked to diseases. Recent structural studies have shown that the core element of the majority of disease-related amyloid fibrils is a β-strand-loop-β-strand motif called β-arch. This insight provides an opportunity to substantially improve the prediction of amyloids produced by natural proteins, ushering in an era of personalized medicine based on genome analysis.
In a genome-wide screen for alpha-helical coiled coil motifs aiming at structurally defined vaccine candidates we identified PFF0165c. This protein is exported in the trophozoite stage and was named accordingly Trophozoite exported protein 1 (Tex1). In an extensive preclinical evaluation of its coiled coil peptides, Tex1 was identified as a promising novel malaria vaccine candidate, providing the rationale for a comprehensive cell biological characterization of Tex1. Antibodies generated against an intrinsically unstructured N-terminal region of Tex1 and against a coiled coil domain were used to investigate cytological localization, solubility and expression profile. Co-localization experiments revealed that Tex1 is exported across the parasitophorous vacuole membrane and located to Maurer's clefts. Change in location is accompanied by a change in solubility: from a soluble state within the parasite to a membrane-associated state after export to Maurer's clefts. No classical export motifs such as PEXEL, signal sequence/anchor or transmembrane domain were identified for Tex1.
Tandem repeats (TRs) represent one of the most prevalent features of genomic sequences. Due to their abundance and functional significance, a plethora of detection tools has been devised over the last two decades. Despite the longstanding interest, TR detection is still not resolved. Our large-scale tests reveal that current detectors produce different, often nonoverlapping inferences, reflecting characteristics of the underlying algorithms rather than the true distribution of TRs in genomic data. Our simulations show that the power of detecting TRs depends on the degree of their divergence, and repeat characteristics such as the length of the minimal repeat unit and their number in tandem. To reconcile the diverse predictions of current algorithms, we propose and evaluate several statistical criteria for measuring the quality of predicted repeat units. In particular, we propose a model-based phylogenetic classifier, entailing a maximum-likelihood estimation of the repeat divergence. Applied in conjunction with state-of-the-art detectors, our statistical classification scheme for inferred repeats allows false-positive predictions to be filtered out. Since different algorithms appear to specialize in predicting TRs with certain properties, we advise applying multiple detectors with subsequent filtering to obtain the most complete set of genuine repeats.
Rapidly increasing genomic data present new challenges for scientists: making sense of millions of amino acid sequences requires a systematic approach and information about their 3D structure, function, and evolution. Over the last decade, numerous studies demonstrated the fundamental importance of protein tandem repeats and their involvement in human diseases. Bioinformatics analysis of these regions requires special computer programs and databases, since the conventional approaches predominantly developed for globular domains have limited success. To perform a global comparative analysis of protein tandem repeats, we developed the Protein Tandem Repeat DataBase (PRDB). PRDB is a curated database that includes the protein tandem repeats found in sequence databanks by the T-REKS program. The database is available at http://bioinfo.montp.cnrs.fr/?r=repeatDB.
The review covers the development of synthetic peptides as vaccine candidates for Plasmodium falciparum- and Plasmodium vivax-induced malaria from its beginning up to date and the concomitant progress of solid phase peptide synthesis (SPPS) that enables the production of long peptides in a routine fashion. The review also stresses the development of other complementary tools and actions in order to achieve the long sought goal of an efficacious malaria vaccine.