JoVE Visualize
Stop Reading. Start Watching.
Find video protocols related to scientific articles indexed in PubMed.
Structure and selectivity in bestrophin ion channels.
Science
PUBLISHED: 09-25-2014
Human bestrophin-1 (hBest1) is a calcium-activated chloride channel from the retinal pigment epithelium, where mutations are associated with vitelliform macular degeneration, or Best disease. We describe the structure of a bacterial homolog (KpBest) of hBest1 and functional characterizations of both channels. KpBest is a pentamer that forms a five-helix transmembrane pore, closed by three rings of conserved hydrophobic residues, and has a cytoplasmic cavern with a restricted exit. From electrophysiological analysis of structure-inspired mutations in KpBest and hBest1, we find a sensitive control of ion selectivity in the bestrophins, including reversal of anion/cation selectivity, and dramatic activation by mutations at the cytoplasmic exit. A homology model of hBest1 shows the locations of disease-causing mutations and suggests possible roles in regulation.
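Selectivity changes like those described above are typically read out as shifts in reversal potential under an ion gradient. As an illustration only (the paper's recording conditions are not given here), the sketch below applies the Goldman-Hodgkin-Katz (GHK) voltage equation to a NaCl dilution gradient; the concentrations, temperature, and permeability ratios are assumed values, not data from the study.

```python
import math

# Physical constants (SI units)
R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def ghk_reversal_mV(p_na_over_p_cl, na_out, na_in, cl_out, cl_in, temp_c=22.0):
    """Reversal potential (mV) for a membrane permeable to Na+ and Cl-.

    GHK voltage equation; the Cl- concentrations are swapped because
    of its negative charge:
        E = (RT/F) * ln((P_Na[Na]o + P_Cl[Cl]i) / (P_Na[Na]i + P_Cl[Cl]o))
    Only the ratio P_Na/P_Cl matters, so P_Cl is set to 1.
    """
    r = p_na_over_p_cl
    rt_over_f = R * (temp_c + 273.15) / F  # in volts
    numerator = r * na_out + cl_in
    denominator = r * na_in + cl_out
    return 1000.0 * rt_over_f * math.log(numerator / denominator)


# Hypothetical dilution experiment: 150 mM NaCl inside, 15 mM NaCl outside
for ratio in (0.05, 1.0, 20.0):
    e = ghk_reversal_mV(ratio, na_out=15, na_in=150, cl_out=15, cl_in=150)
    print(f"P_Na/P_Cl = {ratio:>5}: E_rev = {e:+.1f} mV")
```

Under this 10-fold gradient a positive E_rev indicates anion preference and a negative one cation preference, so a mutation that flips the sign corresponds to the kind of anion/cation selectivity reversal reported for the bestrophins.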
Taking structure searches to the next dimension.
Structure
PUBLISHED: 07-10-2014
Structure comparisons are now the first step when a new experimental high-resolution protein structure has been determined. In this issue of Structure, Wiederstein and colleagues describe their latest tool for comparing structures, which gives us unprecedented power to discover crucial structural connections between whole complexes of proteins in the full structural database in real time.
Structural basis for a pH-sensitive calcium leak across membranes.
Science
PUBLISHED: 06-07-2014
Calcium homeostasis balances passive calcium leak and active calcium uptake. Human Bax inhibitor-1 (hBI-1) is an antiapoptotic protein that mediates a calcium leak and is representative of a highly conserved and widely distributed family, the transmembrane Bax inhibitor motif (TMBIM) proteins. Here, we present crystal structures of a bacterial homolog and characterize its calcium leak activity. The structure has a seven-transmembrane-helix fold that features two triple-helix sandwiches wrapped around a central C-terminal helix. Structures obtained in closed and open conformations are reversibly interconvertible by change of pH. A hydrogen-bonded pair of conserved aspartate residues with a perturbed pKa (where Ka is the acid dissociation constant) explains the pH dependence of this transition, and biochemical studies show that pH regulates calcium influx in proteoliposomes. Homology models for hBI-1 provide insights into TMBIM-mediated calcium leak and cytoprotective activity.
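To see why a perturbed pKa can make a conformational switch pH-dependent, the Henderson-Hasselbalch relation gives the fraction of a titratable group that is protonated at a given pH. This is a generic sketch: the upshifted pKa of 6.5 is a hypothetical value chosen for illustration, not the value reported for the aspartate pair, and 3.9 is only a typical textbook pKa for an Asp side chain.

```python
def fraction_protonated(ph, pka):
    """Fraction of an acidic group (e.g. an Asp side chain) in the protonated form.

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa),
    so [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Typical Asp pKa (~3.9) vs. a hypothetical upshifted pKa of 6.5
for pka in (3.9, 6.5):
    row = [f"{fraction_protonated(ph, pka):.2f}" for ph in (5.0, 6.0, 7.0, 8.0)]
    print(f"pKa {pka}: protonated fraction at pH 5, 6, 7, 8 = {row}")
```

With the normal pKa the group is almost fully deprotonated across pH 5 to 8, whereas an upshifted pKa places the protonation switch inside the physiological range, which is the kind of behavior that could couple pH to channel opening.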
LocTree3 prediction of localization.
Nucleic Acids Res.
PUBLISHED: 05-21-2014
The prediction of protein sub-cellular localization is an important step toward elucidating protein function. For each query protein sequence, LocTree2 applies machine learning (profile kernel SVM) to predict the native sub-cellular localization in 18 classes for eukaryotes, in six for bacteria and in three for archaea. The method outputs a score that reflects the reliability of each prediction. LocTree2 has performed on par with or better than any other state-of-the-art method. Here, we report the availability of LocTree3 as a public web server. The server includes the machine learning-based LocTree2 and improves over it through the addition of homology-based inference. Assessed on sequence-unique data, LocTree3 reached an 18-state accuracy Q18=80±3% for eukaryotes and a six-state accuracy Q6=89±4% for bacteria. The server accepts submissions ranging from single protein sequences to entire proteomes. Response time of the unloaded server is about 90 s for a 300-residue eukaryotic protein and a few hours for an entire eukaryotic proteome, not counting the time needed to generate the alignments. For over 1000 entirely sequenced organisms, the predictions are directly available as downloads. The web server is available at http://www.rostlab.org/services/loctree3.
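Q18 and Q6 are multi-state accuracies: the percentage of proteins whose predicted class matches the experimentally annotated one. A minimal sketch of that computation, with a bootstrap estimate of its spread, is below. The toy labels are hypothetical, and the ± values in the abstract may have been derived differently (for example as a standard error over the test set).

```python
import random
import statistics


def q_accuracy(observed, predicted):
    """Q_n: percentage of proteins predicted in their observed class."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def bootstrap_std(observed, predicted, n_resamples=1000, seed=0):
    """Standard deviation of Q_n over bootstrap resamples of the proteins."""
    rng = random.Random(seed)
    pairs = list(zip(observed, predicted))
    scores = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]  # resample with replacement
        obs, pred = zip(*sample)
        scores.append(q_accuracy(obs, pred))
    return statistics.stdev(scores)


observed = ["cytoplasm", "nucleus", "secreted", "nucleus", "mitochondrion"]
predicted = ["cytoplasm", "nucleus", "nucleus", "nucleus", "mitochondrion"]
print(f"Q = {q_accuracy(observed, predicted):.0f}% "
      f"+/- {bootstrap_std(observed, predicted):.0f}")
```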
PredictProtein--an open resource for online prediction of protein structural and functional features.
Nucleic Acids Res.
PUBLISHED: 05-05-2014
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence, it returns multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein-protein binding sites (ISIS2), protein-polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org.
ISCB: past-present perspective for the International Society for Computational Biology.
Bioinformatics
PUBLISHED: 04-16-2014
Since its establishment in 1997, the International Society for Computational Biology (ISCB) has contributed substantially to advancing the understanding of living systems through computation. The ISCB represents nearly 3000 members working in >70 countries. It has doubled the number of members since 2007. At the same time, the number of meetings organized by the ISCB has increased from two in 2007 to eight in 2013, and the society has cemented many lasting alliances with regional societies and specialist groups. ISCB is ready to grow into a challenging and promising future. The progress over the past 7 years has resulted from the vision and, possibly more importantly, the passion and hard-working dedication of many individuals.
FreeContact: fast and free software for protein contact prediction from residue co-evolution.
BMC Bioinformatics
PUBLISHED: 03-18-2014
Twenty years of improved technology and a growing number of sequences now make residue-residue contact constraints, derived from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software.
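EVfold-mfDCA and PSICOV both start from a multiple sequence alignment and score pairs of columns for correlated mutation. The sketch below uses a simpler, older score as a stand-in: mutual information (MI) with the average product correction (APC). It is not mfDCA, PSICOV, or FreeContact's implementation, all of which instead infer direct couplings from a regularized inverse covariance matrix; the toy alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j):
    """Mutual information (nats) between alignment columns i and j."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def mi_apc_scores(msa):
    """MI for every column pair, minus the average product correction."""
    length = len(msa[0])
    mi = {}
    for i, j in combinations(range(length), 2):
        mi[(i, j)] = mi[(j, i)] = column_mi(msa, i, j)
    # Mean MI of each column with all other columns, and the overall mean
    col_mean = [sum(mi[(i, k)] for k in range(length) if k != i) / (length - 1)
                for i in range(length)]
    overall = sum(mi[(i, j)] for i, j in combinations(range(length), 2))
    overall /= length * (length - 1) / 2
    scores = {}
    for i, j in combinations(range(length), 2):
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[(i, j)] = mi[(i, j)] - apc
    return scores


# Columns 0 and 3 co-vary perfectly (K...E <-> E...K); column 1 is conserved
toy_msa = ["KAGE", "KALE", "EAGK", "EALK", "KALE", "EAGK"]
scores = mi_apc_scores(toy_msa)
print(max(scores, key=scores.get))  # → (0, 3)
```

The highest-scoring column pairs are the candidate residue-residue contacts; dedicated methods such as PSICOV exist precisely because raw MI also picks up indirect (transitive) correlations that APC only partly removes.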
Co-expression and co-localization of hub proteins and their partners are encoded in protein sequence.
Mol Biosyst
PUBLISHED: 01-23-2014
Spatiotemporal coordination is a critical factor in biological processes. Some hubs in protein-protein interaction networks tend to be co-expressed and co-localized with their partners more strongly than others, a difference which is arguably related to functional differences between the hubs. Based on numerous analyses of yeast hubs, it has been suggested that differences in co-expression and co-localization are reflected in the structural and molecular characteristics of the hubs. We hypothesized that if differences in co-expression and co-localization are indeed encoded in the molecular characteristics of the protein, it may be possible to predict the tendency for co-expression and co-localization of human hubs based on features learned from systematically characterized yeast hubs. Thus, we trained a prediction algorithm on hubs from yeast that were classified as either strongly or weakly co-expressed and co-localized with their partners, and applied the trained model to 800 human hub proteins. We found that the algorithm significantly distinguishes between human hubs that are co-expressed and co-localized with their partners and hubs that are not. The prediction is based on sequence-derived features: "stickiness" (the existence of multiple putative binding sites that enable multiple simultaneous interactions), "plasticity" (the existence of predicted structural disorder, which conjecturally allows multiple consecutive interactions with the same binding site), and predicted subcellular localization. These results suggest that spatiotemporal dynamics are encoded, at least in part, in the amino acid sequence of the protein and that this encoding is similar in yeast and in human.
HeatMapViewer: interactive display of 2D data in biology.
F1000Res
PUBLISHED: 01-01-2014
The HeatMapViewer is a BioJS component that lays out and renders two-dimensional (2D) plots or heat maps that are ideally suited to visualize matrix-formatted data in biology, such as microarray experiments, the outcome of mutational studies, and SNP-like sequence variants. It can be easily integrated into documents and provides a powerful, interactive way to visualize heat maps in web applications. The software uses a scalable graphics technology that adapts the visualization component to any required resolution, a useful feature for a presentation with many different data points. The component can be applied to present various biological data types. Here, we present two such cases: showing gene expression data and visualizing mutability landscape analysis.
tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.
Database (Oxford)
PUBLISHED: 01-01-2014
The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog's named-entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.
ISCB: past-present perspective for the International Society for Computational Biology.
Bioinformatics
PUBLISHED: 11-14-2013
Since its establishment in 1997, the International Society for Computational Biology (ISCB) has contributed substantially to advancing the understanding of living systems through computation. The ISCB represents nearly 3000 members working in >70 countries. It has doubled the number of members since 2007. At the same time, the number of meetings organized by the ISCB has increased from two in 2007 to eight in 2013, and the society has cemented many lasting alliances with regional societies and specialist groups. ISCB is ready to grow into a challenging and promising future. The progress over the past 7 years has resulted from the vision and, possibly more importantly, the passion and hard-working dedication of many individuals.
Neutral and weakly nonneutral sequence variants may define individuality.
Proc. Natl. Acad. Sci. U.S.A.
PUBLISHED: 08-12-2013
Large-scale computational analyses of the growing wealth of genome-variation data consistently tell two distinct stories. The first is expected: coding variants reported in disease-related databases significantly alter the function of affected proteins. The second is surprising: the genomes of healthy individuals appear to carry many variants that are predicted to have some effect on function. As long as the complete experimental analysis of all human genome variants remains impossible, computational methods, such as PolyPhen, SNAP, and SIFT, might provide important insights. These methods capture the effects of particular variants very well and can highlight trends in populations of variants. Diseases are, arguably, extreme phenotypic variations and are often attributable to one or a few severely functionally disruptive variants. Our findings suggest a genomic basis of the different nondisease phenotypes. Prediction methods indicate that variants in seemingly healthy individuals tend to be neutral or weakly disruptive for protein molecular function. Most of these variant effects are predicted to be either experimentally undetectable or not significant enough to be published. This may suggest that nondisease phenotypes arise through combinations of many variants whose effects are weakly nonneutral (damaging or enhancing) to the molecular protein function but fall within the wild-type range of overall physiological function.
News from the protein mutability landscape.
J. Mol. Biol.
PUBLISHED: 04-30-2013
Some mutations of protein residues matter more than others, and these are often conserved evolutionarily. The explosion of deep sequencing and genotyping increasingly requires distinguishing variants that have an effect from neutral ones. The simplest approach predicts all mutations of conserved residues to have an effect; however, this works poorly, at best. Many computational tools that are optimized to predict the impact of point mutations provide more detail. Here, we expand the perspective from the view of single variants to the level of sketching the entire mutability landscape. This landscape is defined by the impact of substituting every residue at each position in a protein by each of the 19 non-native amino acids. We review some of the powerful conclusions about protein function, stability and their robustness to mutation that can be drawn from such an analysis. Large-scale experimental and computational mutagenesis experiments are increasingly furthering our understanding of protein function and of genotype-phenotype associations. We also discuss how these can be used to improve predictions of protein function and pathogenicity of missense variants.
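The landscape defined here has one cell per (position, substitution) pair, so a protein of length L has 19 × L single amino acid variants. A minimal sketch that enumerates them in the common wild type/position/mutant notation (e.g. `M1A`) and fills a 20 × L matrix; `score_variant` is a hypothetical callback standing in for whatever predictor or experimental readout supplies the values, and the sequence is assumed to contain only the 20 standard residues.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def all_single_substitutions(sequence):
    """Yield (position, wild_type, mutant, label) for the 19 non-native residues per site."""
    for pos, wt in enumerate(sequence, start=1):
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield pos, wt, mut, f"{wt}{pos}{mut}"


def mutability_landscape(sequence, score_variant):
    """Return a 20 x L landscape as {mutant residue: [score per position]}.

    Cells for the native residue stay None. score_variant(label) is a
    hypothetical callback, e.g. wrapping a predictor or measured effects.
    """
    landscape = {aa: [None] * len(sequence) for aa in AMINO_ACIDS}
    for pos, _wt, mut, label in all_single_substitutions(sequence):
        landscape[mut][pos - 1] = score_variant(label)
    return landscape


seq = "MKTAYIAK"
variants = list(all_single_substitutions(seq))
print(len(variants))  # 19 * 8 = 152
toy = mutability_landscape(seq, score_variant=lambda label: 0.0)  # dummy scores
```

A matrix in this shape is exactly what a heat-map display needs: rows are the substituting residues, columns are sequence positions.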
An estimated 5% of new protein structures solved today represent a new Pfam family.
Acta Crystallogr. D Biol. Crystallogr.
PUBLISHED: 04-17-2013
High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago, for every 100 new PDB entries released, an estimated 20 Pfam families acquired their first structure. By 2012, this decreased to only about five families per 100 structures. The reasons behind the slower pace at which previously uncharacterized families are being structurally covered were investigated. It was found that although more than 50% of current Pfam families are still without a structural representative, this set is enriched in families that are small, functionally uncharacterized or rich in problem features such as intrinsically disordered and transmembrane regions. While these are important constraints, the reasons why it may not yet be time to give up the pursuit of a targeted but more comprehensive structural coverage of the protein-sequence space are discussed.
Homology-based inference sets the bar high for protein function prediction.
BMC Bioinformatics
PUBLISHED: 02-28-2013
Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference.
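Homology-based inference (HBI), the baseline named here, transfers annotations from the most similar experimentally annotated protein. A minimal sketch under stated assumptions: the hit list (which might come from a BLAST search), the 40% identity cutoff, and the GO identifiers are all hypothetical, and real pipelines typically use E-values or more refined similarity thresholds.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    subject_id: str
    percent_identity: float
    go_terms: set = field(default_factory=set)  # experimental annotations of the hit


def transfer_by_homology(hits, min_identity=40.0):
    """Return GO terms of the best annotated hit at or above min_identity, else an empty set."""
    eligible = [h for h in hits if h.percent_identity >= min_identity and h.go_terms]
    if not eligible:
        return set()  # nothing transferred; a de novo method would have to fill the gap
    best = max(eligible, key=lambda h: h.percent_identity)
    return set(best.go_terms)


hits = [
    Hit("P1", 35.0, {"GO:0005524"}),
    Hit("P2", 62.5, {"GO:0004672", "GO:0005524"}),
]
print(sorted(transfer_by_homology(hits)))
```

A de novo method is only worth its cost on the proteins where this simple transfer either returns nothing or is measurably worse.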
A large-scale evaluation of computational protein function prediction.
Predrag Radivojac, Wyatt T Clark, Tal Ronnen Oron, Alexandra M Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M Yunes, Ameet S Talwalkar, Susanna Repo, Michael L Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel W A Buchan, Kevin Bryson, David T Jones, Bhakti Limaye, Harshal Inamdar, Avik Datta, Sunitha K Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas M Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kassner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N Wass, Michael J E Sternberg, Nives Skunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis A I Kourmpetis, Aalt D J van Dijk, Cajo J F ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Bairoch, Michal Linial, Patricia C Babbitt, Steven E Brenner, Christine Orengo, Burkhard Rost, Sean D Mooney, Iddo Friedberg.
Nat. Methods
PUBLISHED: 01-27-2013
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
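Among CAFA's main metrics was a protein-centric maximum F-measure (F_max) taken over all score thresholds. The sketch below follows that general idea under simplifying assumptions: predictions and true annotations are flat sets of GO-like terms (no propagation up the ontology), precision is averaged over proteins with at least one prediction at or above the threshold, and recall over all benchmark proteins. It is not the official CAFA evaluation code, and the toy data are hypothetical.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric F_max.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: set of true terms} for every benchmark protein
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {term for term, s in predictions.get(protein, {}).items()
                         if s >= tau}
            tp = len(predicted & true_terms)
            recalls.append(tp / len(true_terms) if true_terms else 0.0)
            if predicted:  # precision only for proteins with >= 1 prediction
                precisions.append(tp / len(predicted))
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


truth = {"A": {"t1", "t2"}, "B": {"t3"}}
predictions = {"A": {"t1": 0.9, "t2": 0.4, "t9": 0.8}, "B": {"t3": 0.7}}
print(round(f_max(predictions, truth), 3))
```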
Cloud prediction of protein structure and function with PredictProtein for Debian.
Biomed Res Int
PUBLISHED: 01-04-2013
We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.
Accelerating the Original Profile Kernel.
PLoS ONE
PUBLISHED: 01-01-2013
One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements render it too slow for practical applications of large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a speed-up of up to 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster, possibly making the kernel the top contender when speed and performance are weighed together. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at https://rostlab.org/owiki/index.php/Fast_Profile_Kernel. Bugs and other issues may be reported at https://rostlab.org/bugzilla3/enter_bug.cgi?product=fastprofkernel.
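For readers unfamiliar with the profile kernel, a simplified sketch of the idea follows: each length-k window of a sequence profile counts every k-mer whose summed profile cost falls below a threshold σ, and the kernel is the dot product of these count vectors. Assumptions: the per-position cost is a negative log probability (so costs are non-negative and the enumeration can be pruned), the two-letter alphabet and toy profiles are hypothetical, and this is neither the Leslie group's code nor the accelerated implementation described above.

```python
import math
from collections import Counter


def profile_features(profile, alphabet, k, sigma):
    """Map a profile to counts of k-mers within cost sigma of some window.

    profile: list of dicts, one per position, {residue: probability}.
    A k-mer beta is counted for window i if
        sum_j -log(profile[i + j][beta_j]) < sigma.
    """
    costs = [{a: -math.log(max(col.get(a, 0.0), 1e-12)) for a in alphabet}
             for col in profile]
    features = Counter()

    def extend(i, depth, prefix, cost):
        if cost >= sigma:  # costs are non-negative, so this branch can be pruned
            return
        if depth == k:
            features[prefix] += 1
            return
        for a in alphabet:
            extend(i, depth + 1, prefix + a, cost + costs[i + depth][a])

    for i in range(len(profile) - k + 1):
        extend(i, 0, "", 0.0)
    return features


def profile_kernel(f1, f2):
    """Unnormalized kernel value: dot product of two sparse feature vectors."""
    return sum(count * f2[kmer] for kmer, count in f1.items())


x = [{"A": 0.9, "B": 0.1}, {"A": 0.9, "B": 0.1}, {"A": 0.1, "B": 0.9}]
y = [{"A": 0.9, "B": 0.1}] * 3
fx = profile_features(x, "AB", k=2, sigma=1.0)
fy = profile_features(y, "AB", k=2, sigma=1.0)
print(dict(fx), dict(fy), profile_kernel(fx, fy))
```

With 20 amino acids and k around 5 there are 3.2 million possible k-mers per window, which is why pruning the enumeration (and, in practice, further engineering) dominates the run time.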
SNPdbe: constructing an nsSNP functional impacts database.
Bioinformatics
PUBLISHED: 12-30-2011
Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, with human being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs.
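Whether a coding SNP is non-synonymous comes down to translating the reference and variant codons. A minimal sketch using the standard genetic code; the function name, the 1-based coding-sequence coordinates, the category strings and the toy sequence are illustrative choices, not SNPdbe's schema.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds, position, alt):
    """Classify a single-nucleotide variant in a coding sequence.

    cds: coding DNA starting at the start codon (uppercase A/C/G/T)
    position: 1-based position of the variant within cds
    alt: alternative base
    Returns (category, substitution label), e.g. ("missense ...", "D620N").
    """
    idx = position - 1
    codon_start = idx - idx % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:idx % 3] + alt + ref_codon[idx % 3 + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = f"{ref_aa}{codon_start // 3 + 1}{alt_aa}"
    if ref_aa == alt_aa:
        return "synonymous", label
    if alt_aa == "*":
        return "nonsense", label
    return "missense (nsSNP / SAAS)", label


toy_cds = "ATGGATTGGTAA"  # hypothetical: Met-Asp-Trp-stop
print(classify_snv(toy_cds, 4, "A"))
```

Only the missense case yields an SAAS whose functional impact a predictor such as SNAP or SIFT would then be asked to score.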
Towards big data science in the decade ahead from ten years of InCoB and the 1st ISCB-Asia Joint Conference.
BMC Bioinformatics
PUBLISHED: 11-30-2011
The 2011 International Conference on Bioinformatics (InCoB), the annual scientific conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted in Kuala Lumpur, Malaysia, and is co-organized with the first ISCB-Asia conference of the International Society for Computational Biology (ISCB). InCoB and the sequencing of the human genome are both celebrating their tenth anniversaries, and InCoB's goalposts for the next decade, implementing standards in bioinformatics and globally distributed computational networks, will be discussed and adopted at this conference. Of the 49 manuscripts (selected from 104 submissions) accepted to BMC Genomics and BMC Bioinformatics conference supplements, 24 are featured in this issue, covering software tools, genome/proteome analysis, systems biology (networks, pathways, bioimaging) and drug discovery and design.
InCoB celebrates its tenth anniversary as first joint conference with ISCB-Asia.
BMC Genomics
PUBLISHED: 11-30-2011
In 2009 the International Society for Computational Biology (ISCB) started to roll out regional bioinformatics conferences in Africa, Latin America and Asia. The open and competitive bid for the first meeting in Asia (ISCB-Asia) was awarded to Asia-Pacific Bioinformatics Network (APBioNet) which has been running the International Conference on Bioinformatics (InCoB) in the Asia-Pacific region since 2002. InCoB/ISCB-Asia 2011 is held from November 30 to December 2, 2011 in Kuala Lumpur, Malaysia. Of 104 manuscripts submitted to BMC Genomics and BMC Bioinformatics conference supplements, 49 (47.1%) were accepted. The strong showing of Asia among submissions (82.7%) and acceptances (81.6%) signals the success of this tenth InCoB anniversary meeting, and bodes well for the future of ISCB-Asia.
Comparison of a molecular dynamics model with the X-ray structure of the N370S acid-beta-glucosidase mutant that causes Gaucher disease.
Protein Eng. Des. Sel.
PUBLISHED: 07-01-2011
Recently, two studies were published that examined the structure of the acid-β-glucosidase N370S mutant, the most common mutant that causes Gaucher disease. One study used the experimental tool of X-ray crystallography, and the other utilized molecular dynamics (MD). The two studies reinforced each other through the similarities in their findings, but each approach also added some unique information. Both studies report that the conformation of active site loop 3 changes, due to an altered hydrogen bonding network; however, the MD study produced additional data concerning the flexibility of loop 1 and the catalytic residues that are not observed in the other study.
A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease.
Am. J. Hum. Genet.
PUBLISHED: 05-08-2011
To identify rare causal variants in late-onset Parkinson disease (PD), we investigated an Austrian family with 16 affected individuals by exome sequencing. We found a missense mutation, c.1858G>A (p.Asp620Asn), in the VPS35 gene in all seven affected family members who are alive. By screening additional PD cases, we saw the same variant cosegregating with the disease in an autosomal-dominant mode with high but incomplete penetrance in two further families with five and ten affected members, respectively. The mean age of onset in the affected individuals was 53 years. Genotyping showed that the shared haplotype extends across 65 kilobases around VPS35. Screening the entire VPS35 coding sequence in an additional 860 cases and 1014 controls revealed six further nonsynonymous missense variants. Three were only present in cases, two were only present in controls, and one was present in cases and controls. The familial mutation p.Asp620Asn and a further variant, c.1570C>T (p.Arg524Trp), detected in a sporadic PD case were predicted to be damaging by sequence-based and molecular-dynamics analyses. VPS35 is a component of the retromer complex and mediates retrograde transport between endosomes and the trans-Golgi network, and it has recently been found to be involved in Alzheimer disease.
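HGVS coding (c.) coordinates count from the A of the start codon, so the affected codon and the base within it follow from integer arithmetic. This worked check is not part of the study; it only shows that both reported changes hit the first base of codons 620 and 524. That placement is consistent with Asp (GAT/GAC) to Asn (AAT/AAC) and with Arg to Trp, which by the genetic code requires the Arg codon CGG to become TGG; these codons are inferred from the genetic code, not read from the VPS35 reference sequence. The helper handles positions inside the coding sequence only.

```python
def codon_of_cdna_position(c_pos):
    """Return (codon number, base within codon) for a 1-based c. position."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


for c_pos, protein_change in ((1858, "p.Asp620Asn"), (1570, "p.Arg524Trp")):
    codon, offset = codon_of_cdna_position(c_pos)
    print(f"c.{c_pos}: codon {codon}, base {offset} of codon -> {protein_change}")
```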
Characterization of metalloproteins by high-throughput X-ray absorption spectroscopy.
Genome Res.
PUBLISHED: 04-11-2011
High-throughput X-ray absorption spectroscopy (HT-XAS) was used to measure transition metal content, based on quantitative detection of X-ray fluorescence signals, for 3879 purified proteins from several hundred different protein families generated by the New York SGX Research Center for Structural Genomics. Approximately 9% of the proteins analyzed showed the presence of transition metal atoms (Zn, Cu, Ni, Co, Fe, or Mn) in stoichiometric amounts. The method is highly automated and, judging by comparison of the results with crystal structure data derived from the same protein set, highly reliable. To leverage the experimental metalloprotein annotations, we used a sequence-based de novo prediction method, MetalDetector, to identify Cys and His residues that bind to transition metals in the redundancy-reduced subset of 2411 sequences sharing <70% sequence identity and having at least one His or Cys. As HT-XAS identifies metal type and protein binding, while the bioinformatics analysis identifies metal-binding residues, the results were combined to identify putative metal-binding sites in the proteins and their associated families. We explored the combination of these data with homology models to generate detailed structure models of metal-binding sites for representative proteins. Finally, we used extended X-ray absorption fine structure data from two of the purified Zn metalloproteins to validate predicted metalloprotein binding site structures. This combination of experimental and bioinformatics approaches provides comprehensive active site analysis on the genome scale for metalloproteins as a class, revealing new insights into metalloprotein structure and function.
Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli.
Microb Inform Exp
PUBLISHED: 03-17-2011
The biochemical and physical factors controlling protein expression level and solubility in vivo remain incompletely characterized. To gain insight into the primary sequence features influencing these outcomes, we performed statistical analyses of results from the high-throughput protein-production pipeline of the Northeast Structural Genomics Consortium. Proteins expressed in E. coli and consistently purified were scored independently for expression and solubility levels. These parameters nonetheless show a very strong positive correlation. We used logistic regressions to determine whether they are systematically influenced by fractional amino acid composition or several bulk sequence parameters including hydrophobicity, sidechain entropy, electrostatic charge, and predicted backbone disorder. Decreasing hydrophobicity correlates with higher expression and solubility levels, but this correlation apparently derives solely from the beneficial effect of three charged amino acids, at least for bacterial proteins. In fact, the three most hydrophobic residues showed very different correlations with solubility level. Leu showed the strongest negative correlation among amino acids, while Ile showed a slightly positive correlation in most data segments. Several other amino acids also had unexpected effects. Notably, Arg correlated with decreased expression and, most surprisingly, solubility of bacterial proteins, an effect only partially attributable to rare codons. However, rare codons did significantly reduce expression despite use of a codon-enhanced strain. Additional analyses suggest that positively but not negatively charged amino acids may reduce translation efficiency in E. coli irrespective of codon usage. While some observed effects may reflect indirect evolutionary correlations, others may reflect basic physicochemical phenomena. 
We used these results to construct and validate predictors of expression and solubility levels and overall protein usability, and we propose new strategies to be explored for engineering improved protein expression and solubility.
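The regression step can be illustrated with a minimal sketch. It assumes a binary outcome (soluble vs. insoluble) rather than the graded expression and solubility scores used in the study, and it uses hypothetical toy sequences. Each protein is reduced to its fractional amino acid composition, and a logistic model is fit by plain batch gradient descent. The fitted per-residue coefficients are what one would inspect for effects such as the negative Leu correlation. This is not the consortium's analysis code.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fractional amino acid composition: one value per standard residue type."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: 1 = "soluble", 0 = "insoluble".
seqs = ["KEKEDKRE", "EEKDKEEK", "LLLLVLLA", "LLVLLFLL"]
labels = [1, 1, 0, 0]
X = [composition(s) for s in seqs]
w, b = fit_logistic(X, labels)
coef = dict(zip(AMINO_ACIDS, w))
print(coef["L"] < 0, coef["E"] > 0)  # → True True
```

With real data, one would also hold out proteins for validation and include the bulk parameters (hydrophobicity, charge, predicted disorder) as additional columns.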
Protein disorder--a breakthrough invention of evolution?
Curr. Opin. Struct. Biol.
PUBLISHED: 02-14-2011
As an operational definition, we refer to regions in proteins that do not adopt regular three-dimensional structures in isolation as disordered regions. The antipode of disorder would be well-structured rather than ordered. Here, we argue for the following three hypotheses. Firstly, it is more useful to picture disorder as a distinct phenomenon in structural biology than as an extreme example of protein flexibility. Secondly, there are many very different flavors of protein disorder; nevertheless, it seems advantageous to portray the universe of all possible proteins in terms of two main types: well-structured and disordered. There might be a third type, but so far we have no positive evidence for it. Thirdly, nature uses protein disorder as a tool to adapt to different environments. Protein disorder is evolutionarily conserved, and this maintenance of disorder is highly nontrivial. Increasingly integrating protein disorder into the toolbox of a living cell was a crucial step in the evolution from simple bacteria to complex eukaryotes. We need new, advanced computational methods to study this milestone in the advance of protein biology.
Crystal structure of a potassium ion transporter, TrkH.
Nature
PUBLISHED: 02-13-2011
The TrkH/TrkG/KtrB proteins mediate K(+) uptake in bacteria and probably evolved from simple K(+) channels by multiple gene duplications or fusions. Here we present the crystal structure of a TrkH from Vibrio parahaemolyticus. TrkH is a homodimer, and each protomer contains an ion permeation pathway. A selectivity filter, similar in architecture to those of K(+) channels but significantly shorter, is lined by backbone and side-chain oxygen atoms. Functional studies showed that TrkH is selective for permeation of K(+) and Rb(+) over smaller ions such as Na(+) or Li(+). Immediately intracellular to the selectivity filter are an intramembrane loop and an arginine residue, both highly conserved, which constrict the permeation pathway. Substituting the arginine with an alanine significantly increases the rate of K(+) flux. These results reveal the molecular basis of K(+) selectivity and suggest a novel gating mechanism for this large and important family of membrane transport proteins.
Crystal structure of a phosphorylation-coupled saccharide transporter.
Nature
PUBLISHED: 02-11-2011
Saccharides have a central role in the nutrition of all living organisms. Whereas several saccharide uptake systems are shared between the different phylogenetic kingdoms, the phosphoenolpyruvate-dependent phosphotransferase system exists almost exclusively in bacteria. This multi-component system includes an integral membrane protein EIIC that transports saccharides and assists in their phosphorylation. Here we present the crystal structure of an EIIC from Bacillus cereus that transports diacetylchitobiose. The EIIC is a homodimer, with an expansive interface formed between the amino-terminal halves of the two protomers. The carboxy-terminal half of each protomer has a large binding pocket that contains a diacetylchitobiose, which is occluded from both sides of the membrane with its site of phosphorylation near the conserved His250 and Glu334 residues. The structure shows the architecture of this important class of transporters, identifies the determinants of substrate binding and phosphorylation, and provides a framework for understanding the mechanism of sugar translocation.
LocDB: experimental annotations of localization for Homo sapiens and Arabidopsis thaliana.
Nucleic Acids Res.
PUBLISHED: 11-11-2010
LocDB is a manually curated database with experimental annotations for the subcellular localizations of proteins in Homo sapiens (HS, human) and Arabidopsis thaliana (AT, thale cress). Currently, it contains entries for 19,604 UniProt proteins (HS: 13,342; AT: 6262). Each database entry contains the experimentally derived localization in Gene Ontology (GO) terminology, the experimental annotation of localization, localization predictions by state-of-the-art methods and, where available, the type of experimental information. LocDB is searchable by keyword, protein name and subcellular compartment, as well as by identifiers from UniProt, Ensembl and TAIR resources. In comparison to other public databases, LocDB adds about 10,000 experimental localization annotations for HS proteins and approximately 900 for AT proteins. Over 40% of the proteins in LocDB have multiple localization annotations, providing a better platform for the development of new multiple-localization prediction methods with higher coverage and accuracy. Links to all referenced databases are provided. LocDB will be updated regularly by our group (available at: http://www.rostlab.org/services/locDB).
MuD: an interactive web server for the prediction of non-neutral substitutions using protein structural data.
Nucleic Acids Res.
PUBLISHED: 06-11-2010
The discrimination between functionally neutral amino acid substitutions and non-neutral mutations that affect protein function is very important for our understanding of diseases. The rapidly growing amount of experimental data enables the development of computational tools to facilitate the annotation of these substitutions. Here, we describe a Random Forests-based classifier, named Mutation Detector (MuD), that uses structural and sequence-derived features to assess the impact of a given substitution on protein function. In its automatic mode, MuD performs comparably to alternative tools. What makes MuD unique, however, is that user-reported protein-specific structural and functional information can be added at run-time, further enhancing prediction accuracy. The MuD server, available at http://mud.tau.ac.il, assigns a reliability score to every prediction, thus offering a useful tool for prioritizing substitutions in proteins with an available 3D structure.
Bioinformatics predictions of localization and targeting.
Methods Mol. Biol.
PUBLISHED: 04-27-2010
In the post-genomic era, with hundreds of genomes sequenced, one of the major challenges is the annotation of protein structure and function. Computational predictions of subcellular localization are an important step toward this end. The development of computational tools that predict targeting and localization has, therefore, been a very active area of research, particularly since the first release of the groundbreaking program PSORT in 1991. The most reliable means of annotating protein structure and function remains homology-based inference, i.e. the transfer of experimental annotations from one protein to its homologs. However, localization annotation demonstrates how much can be gained from advanced machine learning: more proteins can be annotated more reliably. Contemporary computational tools for the annotation of protein targeting include automatic methods that mine textual information from the biological literature and molecular biology databases. Some machine learning-based methods that accurately predict features of sorting signals and that use sequence-derived features to predict localization have reached remarkable levels of performance. Sustained prediction accuracy has increased by more than 30 percentage points over the last decade. Here, we review some of the most recent methods for the prediction of subcellular localization and protein targeting that contributed to this breakthrough.
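Homology-based inference, as described above, amounts to transferring an experimental annotation from a sufficiently similar protein. A minimal sketch follows, assuming the database search has already been run and each hit carries a sequence identity. The `Hit` type, the accessions, and the 40% identity cutoff are hypothetical choices for illustration, not values taken from the review.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hit:
    """One database hit for a query protein (values assumed precomputed)."""
    accession: str      # hypothetical identifier of an annotated homolog
    identity: float     # pairwise sequence identity over the alignment, 0..1
    localization: str   # experimentally determined compartment of the homolog

def infer_localization(hits: list[Hit], min_identity: float = 0.4) -> Optional[str]:
    """Transfer the annotation of the most similar homolog above a threshold.

    Returns None when no hit is similar enough, i.e. when a de novo
    (e.g. machine-learning) predictor would have to take over.
    """
    confident = [h for h in hits if h.identity >= min_identity]
    if not confident:
        return None
    best = max(confident, key=lambda h: h.identity)
    return best.localization

hits = [
    Hit("HYP_0001", 0.62, "nucleus"),
    Hit("HYP_0002", 0.45, "cytoplasm"),
    Hit("HYP_0003", 0.21, "mitochondrion"),
]
print(infer_localization(hits))      # → nucleus
print(infer_localization(hits[2:]))  # → None
```

The `None` branch marks exactly the proteins for which the machine-learning methods discussed in the review become necessary.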
Homologue structure of the SLAC1 anion channel for closing stomata in leaves.
Nature
PUBLISHED: 03-28-2010
The plant SLAC1 anion channel controls turgor pressure in the aperture-defining guard cells of plant stomata, thereby regulating the exchange of water vapour and photosynthetic gases in response to environmental signals such as drought or high levels of carbon dioxide. Here we determine the crystal structure of a bacterial homologue (Haemophilus influenzae) of SLAC1 at 1.20 Å resolution, and use structure-inspired mutagenesis to analyse the conductance properties of SLAC1 channels. SLAC1 is a symmetrical trimer composed of quasi-symmetrical subunits, each having ten transmembrane helices arranged from helical hairpin pairs to form a central five-helix transmembrane pore that is gated by an extremely conserved phenylalanine residue. Conformational features indicate a mechanism for control of gating by kinase activation, and electrostatic features of the pore coupled with electrophysiological characteristics indicate that selectivity among different anions is largely a function of the energetic cost of ion dehydration.
The New York Consortium on Membrane Protein Structure (NYCOMPS): a high-throughput platform for structural genomics of integral membrane proteins.
J. Struct. Funct. Genomics
PUBLISHED: 03-26-2010
The New York Consortium on Membrane Protein Structure (NYCOMPS) was formed to accelerate the acquisition of structural information on membrane proteins by applying a structural genomics approach. NYCOMPS comprises a bioinformatics group, a centralized facility operating a high-throughput cloning and screening pipeline, a set of associated wet labs that perform high-level protein production and structure determination by x-ray crystallography and NMR, and a set of investigators focused on methods development. In its first three years of operation, the NYCOMPS pipeline produced and screened 7,250 expression constructs for 8,045 target proteins. Approximately 600 of these verified targets were scaled up to the levels required for structural studies, so far yielding 24 membrane protein crystals. Here we describe the overall structure of NYCOMPS and provide details on the high-throughput pipeline.
Structural basis of O6-alkylguanine recognition by a bacterial alkyltransferase-like DNA repair protein.
J. Biol. Chem.
PUBLISHED: 03-08-2010
Alkyltransferase-like proteins (ATLs) are a novel class of DNA repair proteins related to O(6)-alkylguanine-DNA alkyltransferases (AGTs) that tightly bind alkylated DNA and shunt the damaged DNA into the nucleotide excision repair pathway. Here, we present the first structure of a bacterial ATL, from Vibrio parahaemolyticus (vpAtl). We demonstrate that vpAtl adopts an AGT-like fold and that the protein is capable of tightly binding to O(6)-methylguanine-containing DNA and disrupting its repair by human AGT, a hallmark of ATLs. Mutation of the highly conserved residues Tyr(23) and Arg(37) demonstrates their critical roles in a conserved mechanism of ATL binding to alkylated DNA. NMR relaxation data reveal a role for conformational plasticity in the guanine-lesion recognition cavity. Our results provide further evidence for the conserved role of ATLs in this primordial mechanism of DNA repair.
Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be.
Bioinformatics
PUBLISHED: 01-16-2010
The mutation of amino acids often impacts protein function and structure. Mutations without negative effects can persist under evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature, or is it difficult to maintain? To tackle these questions, we mutated native protein sequences in silico into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder.
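The in silico evolution protocol can be sketched as a random walk of point substitutions away from a native sequence. The sketch below assumes uniform substitution probabilities over the other 19 amino acids and uses a hypothetical example sequence. The secondary-structure and disorder predictors the study monitored are not reproduced; they would be applied to each intermediate sequence.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq: str, n_mutations: int, rng: random.Random) -> str:
    """Apply n random point substitutions, each to a different residue type."""
    residues = list(seq)
    for _ in range(n_mutations):
        i = rng.randrange(len(residues))
        residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[i]])
    return "".join(residues)

def identity(a: str, b: str) -> float:
    """Fraction of aligned positions with the same residue (equal lengths assumed)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def evolve(native: str, steps: int, per_step: int, seed: int = 0):
    """Yield (total substitutions, identity to native, sequence) along the walk."""
    rng = random.Random(seed)
    current = native
    yield 0, 1.0, current
    for k in range(1, steps + 1):
        current = mutate(current, per_step, rng)
        yield k * per_step, identity(native, current), current

native = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical example sequence
for n_subs, ident, seq in evolve(native, steps=5, per_step=10):
    # A real study would run a secondary-structure / disorder predictor on seq here.
    print(n_subs, round(ident, 2))
```

Because later substitutions can hit already-mutated positions, identity to the native sequence decays toward the random-sequence background rather than linearly.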
Critical assessment of methods of protein structure prediction - Round VIII.
Proteins
PUBLISHED: 09-24-2009
This article is an introduction to the special issue of the journal Proteins, dedicated to the eighth CASP experiment to assess the state of the art in protein structure prediction. The article describes the conduct of the experiment and the categories of prediction included, and outlines the evaluation and assessment procedures. Highlights are the first blind assessment of model refinement methods, showing that under some circumstances substantial model improvements are possible; improvements in the performance of methods for determining the accuracy of a model; and some progress in the accuracy of comparative models in regions not present in a principal template. These advances must be weighed against the fact that there is no detectable progress in model quality compared with CASP7 in either template-based or template-free modeling, using the established CASP measures.
Evaluation of template-based models in CASP8 with standard measures.
Proteins
PUBLISHED: 09-05-2009
The strategy for evaluating template-based models submitted to CASP has continuously evolved from CASP1 to CASP5, leading to a standard procedure that has been used in all subsequent editions. The established approach includes methods for calculating the quality of each individual model, for assigning scores based on the distribution of the results for each target and for computing the statistical significance of the differences in scores between prediction methods. These data are made available to the assessor of the template-based modeling category, who uses them as a starting point for further evaluations and analyses. This article describes the detailed workflow of the procedure, provides justifications for a number of choices that are customarily made for CASP data evaluation, and reports the results of the analysis of template-based predictions at CASP8.
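Scores based on the per-target distribution of results are commonly computed as z-scores. The sketch below assumes a frequently described two-pass scheme: compute z-scores, drop models below z = -2, recompute on the remainder, and sum only positive z-scores per group across targets. The exact measure, cutoff, and standard-deviation convention used for CASP8 may differ, and the group names and scores are hypothetical.

```python
from statistics import mean, pstdev

def target_z_scores(scores: dict[str, float], outlier_cutoff: float = -2.0) -> dict[str, float]:
    """Two-pass z-scores for all models submitted on one target."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {m: 0.0 for m in scores}
    # Second pass: recompute mean and SD without the low outliers.
    kept = [s for s in values if (s - mu) / sd >= outlier_cutoff]
    mu2, sd2 = mean(kept), pstdev(kept)
    if sd2 == 0:
        return {m: 0.0 for m in scores}
    return {m: (s - mu2) / sd2 for m, s in scores.items()}

def group_ranking(per_target: list[dict[str, float]]) -> dict[str, float]:
    """Sum of z-scores per group over all targets, counting negative z as 0."""
    totals: dict[str, float] = {}
    for scores in per_target:
        for group, z in target_z_scores(scores).items():
            totals[group] = totals.get(group, 0.0) + max(z, 0.0)
    return totals

# Hypothetical GDT_TS-like scores of six groups on one target.
target_1 = {"g1": 50, "g2": 52, "g3": 48, "g4": 51, "g5": 49, "g6": 5}
print(group_ranking([target_1]))
```

Removing the outlier before the second pass keeps one very poor model from inflating everyone else's z-score, and clipping negative values keeps a single failed target from dominating a group's total.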
Correlating protein function and stability through the analysis of single amino acid substitutions.
BMC Bioinformatics
PUBLISHED: 08-27-2009
Mutations resulting in the disruption of protein function are the underlying causes of many genetic diseases. Some mutations affect the number of expressed proteins, while others alter the activity on a per-molecule basis. Single amino acid substitutions, such as those caused by non-synonymous single nucleotide polymorphisms (nsSNPs), often disrupt function by altering protein structure and/or stability, but can also wreak havoc by directly impacting functional binding sites. Given the experimental three-dimensional (3D) structure of a protein, we can try to differentiate between the "effect on structure/stability" and the "effect on binding". However, experimental 3D structures are available for only 1% of all known proteins; the magnitude of stability change caused by a given mutation is more widely available.
NMR and X-RAY structures of human E2-like ubiquitin-fold modifier conjugating enzyme 1 (UFC1) reveal structural and functional conservation in the metazoan UFM1-UBA5-UFC1 ubiquination pathway.
J. Struct. Funct. Genomics
PUBLISHED: 07-10-2009
For cell regulation, E2-like ubiquitin-fold modifier conjugating enzyme 1 (Ufc1) is involved in the transfer of ubiquitin-fold modifier 1 (Ufm1), a ubiquitin-like protein that is activated by the E1-like enzyme Uba5, to various target proteins. Ufc1 thereby participates in the recently discovered Ufm1-Uba5-Ufc1 ubiquitination pathway, which is found in metazoan organisms. The structure of human Ufc1 was solved using both NMR spectroscopy and X-ray crystallography. The complementary insights obtained with the two techniques provided a unique basis for understanding the function of Ufc1 at atomic resolution. The Ufc1 structure consists of the catalytic core domain conserved in all E2-like enzymes and an additional N-terminal helix. The active-site Cys(116), which forms a thioester bond with Ufm1, is located in a flexible loop that is highly solvent accessible. Based on the Ufc1 and Ufm1 NMR structures, a model could be derived for the Ufc1-Ufm1 complex in which the C-terminal Gly(83) of Ufm1 may well form the expected thioester with Cys(116), suggesting that Ufm1-Ufc1 functions as described for other E1-E2-E3 machineries. Alpha-helix 1 of Ufc1 adopts different conformations in the crystal and in solution, suggesting that this helix plays a key role in mediating specificity.
Using genetic algorithms to select most predictive protein features.
Proteins
PUBLISHED: 05-13-2009
Many important characteristics of proteins, such as biochemical activity and subcellular localization, present a challenge to machine-learning methods: it is often difficult to encode the appropriate input features at the residue level for the purpose of making a prediction for the entire protein. The problem is usually that the biophysics of the connection between a machine-learning method's input (sequence feature) and its output (observed phenomenon to be predicted) remains unknown; in other words, we may only know that a certain protein is an enzyme (output) without knowing which region may contain the active-site residues (input). The goal then becomes to dissect a protein into a vast set of sequence-derived features and to correlate those features with the desired output. We introduce a framework that begins with a set of global sequence features and then vastly expands the feature space by generically encoding the coexistence of residue-based features. It is this combination of individual features (that is, the step from the fraction of serine and the fraction of buried residues, an input space of 20 + 2, to the fraction of buried serine, an input space of 20 × 2) that implicitly shifts the search space from global feature inputs to features that can capture very local evidence, such as the individual residues of a catalytic triad. The vast feature space created is explored by a genetic algorithm (GA) paired with neural networks and support vector machines. We find that the GA is critical for selecting combinations of features that are neither too general, which results in poor performance, nor too specific, which leads to overtraining. The final framework manages to effectively sample a feature space that is far too large for exhaustive enumeration. We demonstrate the power of the concept by applying it to the prediction of protein enzymatic activity.
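The step from a 20 + 2 to a 20 × 2 input space can be made concrete with a short sketch. It assumes each residue carries a two-state accessibility label (`b` for buried, `e` for exposed; hypothetical encodings) and that sequences contain only the 20 standard amino acids. The global encoding records amino acid fractions and state fractions separately, whereas the combined encoding records the fraction of residues that are simultaneously a given amino acid and in a given state (e.g. buried serine). The genetic-algorithm search over such features is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "be"  # hypothetical two-state accessibility labels: b = buried, e = exposed

def global_features(seq: str, acc: str) -> list[float]:
    """20 amino acid fractions followed by 2 state fractions (20 + 2 inputs)."""
    if len(seq) != len(acc):
        raise ValueError("sequence and accessibility string must have equal length")
    n = len(seq)
    aa_fractions = [seq.count(a) / n for a in AMINO_ACIDS]
    state_fractions = [acc.count(s) / n for s in STATES]
    return aa_fractions + state_fractions

def combined_features(seq: str, acc: str) -> list[float]:
    """Fraction of residues that are amino acid a AND in state s (20 x 2 inputs)."""
    if len(seq) != len(acc):
        raise ValueError("sequence and accessibility string must have equal length")
    n = len(seq)
    counts = {(a, s): 0 for a in AMINO_ACIDS for s in STATES}
    for a, s in zip(seq, acc):
        counts[(a, s)] += 1
    # Ordered as (A,b), (A,e), (C,b), (C,e), ...
    return [counts[(a, s)] / n for a in AMINO_ACIDS for s in STATES]

seq, acc = "SSA", "bee"
print(len(global_features(seq, acc)), len(combined_features(seq, acc)))  # → 22 40
```

Crossing more feature types multiplies the dimensionality in the same way, which is why a search strategy such as a GA is needed rather than exhaustive enumeration.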
In silico mutagenesis: a case study of the melanocortin 4 receptor.
FASEB J.
PUBLISHED: 05-05-2009
The melanocortin 4 receptor (MC4R) is a G-protein-coupled receptor (GPCR) and a key molecule in the regulation of energy homeostasis. At least 159 substitutions in the coding region of human MC4R (hMC4R) have been described experimentally; over 80 of those occur naturally, and many have been implicated in obesity. However, assessment of the presumably functionally essential residues remains incomplete. Here we have performed a complete in silico mutagenesis analysis to assess the functional essentiality of all possible nonnative point mutants in the entire hMC4R protein (332 residues). We applied SNAP, which is a method for quantifying functional consequences of single amino acid (AA) substitutions, to calculate the effects of all possible substitutions at each position in the hMC4R AA sequence. We compiled a mutability score that reflects the degree to which a particular residue is likely to be functionally important. We performed the same experiment for a paralogue human melanocortin receptor (hMC1R) and a mouse orthologue (mMC4R) in order to compare computational evaluations of highly related sequences. Three results are most salient: 1) our predictions largely agree with the available experimental annotations; 2) this analysis identified several AAs that are likely to be functionally critical, but have not yet been studied experimentally; and 3) the differential analysis of the receptors implicates a number of residues as specifically important to MC4Rs vs. other GPCRs, such as hMC1R.
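A complete in silico mutagenesis enumerates every non-native substitution at every position: 19 per residue, so 332 × 19 = 6,308 variants for hMC4R. The sketch below assumes a sequence of the 20 standard amino acids and a hypothetical predictor callback `is_non_neutral` standing in for SNAP, which is not called here. The mutability score is taken, as an assumption, to be the fraction of substitutions at a position that are predicted to be non-neutral.

```python
from typing import Callable, Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def all_point_mutants(seq: str) -> Iterator[str]:
    """Yield every non-native single substitution in 'WtPosMut' notation (1-based)."""
    for pos, wt in enumerate(seq, start=1):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{pos}{aa}"

def mutability(seq: str, is_non_neutral: Callable[[str, int, str], bool]) -> list[float]:
    """Per-position fraction of the 19 substitutions predicted to be non-neutral."""
    scores = []
    for i, wt in enumerate(seq):
        subs = [aa for aa in AMINO_ACIDS if aa != wt]
        hits = sum(is_non_neutral(seq, i, aa) for aa in subs)
        scores.append(hits / len(subs))
    return scores

def toy_predictor(seq: str, i: int, aa: str) -> bool:
    """Hypothetical stand-in predictor: any change of a cysteine is non-neutral."""
    return seq[i] == "C"

print(sum(1 for _ in all_point_mutants("A" * 332)))  # → 6308
print(mutability("ACG", toy_predictor))             # → [0.0, 1.0, 0.0]
```

Running the same loop over a paralogue and an orthologue, as the study did with hMC1R and mMC4R, yields comparable per-position profiles for a differential analysis.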
New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the Deep Web.
Curr Opin Drug Discov Devel
PUBLISHED: 04-28-2009
The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence, will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the World Wide Web, from any browser, with the Deep Web (i.e., proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
Understanding the physical properties that control protein crystallization by analysis of large-scale experimental data.
Nat. Biotechnol.
PUBLISHED: 04-28-2009
Crystallization is the most serious bottleneck in high-throughput protein-structure determination by diffraction methods. We have used data mining of the large-scale experimental results of the Northeast Structural Genomics Consortium and experimental folding studies to characterize the biophysical properties that control protein crystallization. This analysis leads to the conclusion that crystallization propensity depends primarily on the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. We identify specific sequence features that correlate with crystallization propensity and that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the amino acid-sequence properties of human versus eubacterial proteins, which likely reflect differences in biophysical properties, including crystallization propensity. Our thermodynamic measurements do not generally support previous claims regarding correlations between sequence properties and protein stability.
Structural genomics target selection for the New York consortium on membrane protein structure.
J. Struct. Funct. Genomics
PUBLISHED: 04-22-2009
The New York Consortium on Membrane Protein Structure (NYCOMPS), a part of the Protein Structure Initiative (PSI) in the USA, has as its mission to establish a high-throughput pipeline for determination of novel integral membrane protein structures. Here we describe our current target selection protocol, which applies structural genomics approaches informed by the collective experience of our team of investigators. We first extract all annotated proteins from our reagent genomes, i.e. the 96 fully sequenced prokaryotic genomes from which we clone DNA. We filter this initial pool of sequences and obtain a list of valid targets. NYCOMPS defines valid targets as those that, among other features, have at least two predicted transmembrane helices, no predicted long disordered regions and, except for community nominated targets, no significant sequence similarity in the predicted transmembrane region to any known protein structure. Proteins that feed our experimental pipeline are selected by defining a protein seed and searching the set of all valid targets for proteins that are likely to have a transmembrane region structurally similar to that of the seed. We require sequence similarity aligning at least half of the predicted transmembrane region of seed and target. Seeds are selected according to their feasibility and/or biological interest, and they include both centrally selected targets and community nominated targets. As of December 2008, over 6,000 targets have been selected and are currently being processed by the experimental pipeline. We discuss how our target list may impact structural coverage of the membrane protein space.
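The filtering rules can be expressed as simple predicates. The sketch below encodes only the criteria named above. The field names are hypothetical, the 30-residue cutoff for a "long" disordered region is an assumed value (the abstract does not give one), and the upstream predictions (transmembrane helices, disorder, similarity to known structures) are assumed to be precomputed.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    tm_helices: int                # number of predicted transmembrane helices
    longest_disorder: int          # residues in the longest predicted disordered region
    tm_hits_known_structure: bool  # significant TM-region similarity to a known structure
    community_nominated: bool = False

def is_valid_target(c: Candidate, max_disorder: int = 30) -> bool:
    """Subset of the NYCOMPS validity criteria (other features omitted)."""
    if c.tm_helices < 2:
        return False
    if c.longest_disorder > max_disorder:  # assumed definition of "long"
        return False
    if c.tm_hits_known_structure and not c.community_nominated:
        return False
    return True

def covers_half_of_tm(aligned_tm_seed: int, tm_len_seed: int,
                      aligned_tm_target: int, tm_len_target: int) -> bool:
    """True if the seed-target alignment spans at least half of BOTH TM regions."""
    return (aligned_tm_seed >= 0.5 * tm_len_seed
            and aligned_tm_target >= 0.5 * tm_len_target)

pool = [
    Candidate("HYP_A", 4, 10, False),
    Candidate("HYP_B", 1, 5, False),
    Candidate("HYP_C", 7, 12, True, community_nominated=True),
]
print([c.name for c in pool if is_valid_target(c)])  # → ['HYP_A', 'HYP_C']
```

Seed-based selection would then keep only valid targets whose alignment to a chosen seed passes `covers_half_of_tm`.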
PSI-2: structural genomics to cover protein domain family space.
Structure
PUBLISHED: 03-18-2009
One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centers, targets representatives from large, structurally uncharacterized protein domain families, and from structurally uncharacterized subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly overrepresented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first 3 years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space.
Large-scale analysis of thermostable, mammalian proteins provides insights into the intrinsically disordered proteome.
J. Proteome Res.
PUBLISHED: 02-28-2009
Intrinsically disordered proteins are predicted to be highly abundant and play broad biological roles in eukaryotic cells. In particular, by virtue of their structural malleability and propensity to interact with multiple binding partners, disordered proteins are thought to be specialized for roles in signaling and regulation. However, these concepts are based on in silico analyses of translated whole genome sequences, not on large-scale analyses of proteins expressed in living cells. Therefore, whether these concepts broadly apply to expressed proteins is currently unknown. Previous studies have shown that heat treatment of cell extracts leads to partial enrichment of soluble, disordered proteins. On the basis of this observation, we sought to address the current dearth of knowledge about expressed, disordered proteins by performing a large-scale proteomics study of thermostable proteins isolated from mouse fibroblast cells. With the use of novel multidimensional chromatography methods and mass spectrometry, we identified a total of 1320 thermostable proteins from these cells. Further, we used a variety of bioinformatics methods to analyze the structural and biological properties of these proteins. Interestingly, more than 900 of these expressed proteins were predicted to be substantially disordered. These were divided into two categories, with 514 predicted to be predominantly disordered and 395 predicted to exhibit both disordered and ordered/folded features. In addition, 411 of the thermostable proteins were predicted to be folded. Despite the use of heat treatment (60 min at 98 °C) to partially enrich for disordered proteins, which might have been expected to select for small proteins, the sequences of these proteins exhibited a wide range of lengths (mean ± standard deviation: 622 ± 555 residues for disordered proteins and 569 ± 598 residues for folded proteins).
Computational structural analyses revealed several unexpected features of the thermostable proteins: (1) disordered domains and coiled-coil domains occurred together in a large number of disordered proteins, suggesting functional interplay between these domains; and (2) more than 170 proteins contained lengthy domains (>300 residues) known to be folded. Reference to Gene Ontology Consortium functional annotations revealed that, while disordered proteins play diverse biological roles in mouse fibroblasts, they do exhibit heightened involvement in several functional categories, including cytoskeletal structure and cell movement, metabolic and biosynthetic processes, organelle structure, cell division, gene transcription, and ribonucleoprotein complexes. We believe that these results reflect the general properties of the mouse intrinsically disordered proteome (IDP-ome), although they also reflect the specialized physiology of fibroblast cells. Large-scale identification of expressed, thermostable proteins from other cell types in the future, grown under varied physiological conditions, will dramatically expand our understanding of the structural and biological properties of disordered eukaryotic proteins.
Cell cycle kinases predicted from conserved biophysical properties.
Proteins
PUBLISHED: 02-14-2009
Machine-learning techniques can classify functionally related proteins where homology-transfer as well as sequence and structure motifs fail. Here, we present a method that aims to complement homology-transfer in the identification of cell cycle control kinases from sequence alone. First, we identified functionally significant residues in cell cycle proteins through their high sequence conservation and biophysical properties. We then incorporated these residues and their features into support vector machines (SVM) to identify new kinases and, more specifically, to differentiate cell cycle kinases from other kinases and other proteins. As expected, the most informative residues tend to be highly conserved and tend to localize in the ATP binding regions of the kinases. Another observation confirmed that ATP binding regions are typically not found on the surface but in partially buried sites, and that this fact is correctly captured by accessibility predictions. Using these highly conserved, semi-buried residues and their biophysical properties, we could distinguish cell cycle S/T kinases from other kinase families with around 70-80% accuracy and 62-81% coverage. An application to the entire human proteome predicted at least 97 human proteins with limited previous annotations to be candidates for cell cycle kinases.
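The classification step can be sketched with a linear support vector machine trained by Pegasos-style subgradient descent, a swapped-in simple trainer rather than the authors' SVM setup. The two features per protein are hypothetical (e.g. standardized conservation and burial of selected ATP-site residues), and the toy data are invented for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: feature vectors; y: labels in {-1, +1}. A constant 1.0 feature is
    appended so that the bias is learned (and regularized) like any weight.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            violates = label * dot(w, x) < 1.0  # margin test uses the old w
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violates:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1

# Hypothetical standardized features: +1 = cell cycle kinase, -1 = other kinase.
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [-1.0, -1.0], [-1.2, -0.8], [-0.8, -1.2]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [0.9, 1.1]), predict(w, [-0.9, -1.1]))  # → 1 -1
```

In practice, accuracy and coverage figures like those quoted above would come from cross-validation on held-out proteins, not from training-set predictions.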
Improved disorder prediction by combination of orthogonal approaches.
PLoS ONE
PUBLISHED: 02-11-2009
Disordered proteins are highly abundant in regulatory processes such as transcription and cell signaling. Different methods have been developed to predict protein disorder, often focusing on different types of disordered regions. Here, we present MD, a novel META-Disorder prediction method that combines various sources of information, predominantly obtained from orthogonal prediction methods, to significantly improve performance over its constituents. In sustained cross-validation, MD not only outperforms its constituent methods but also compares favorably to other state-of-the-art prediction methods in a variety of tests that we applied. Availability: http://www.rostlab.org/services/md/
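The abstract does not say how MD combines its inputs (the published method may well use a trained model rather than a fixed rule). As a rough illustration of the general idea of a consensus predictor, the sketch below averages per-residue disorder scores from several predictors with fixed weights and applies a threshold. The predictor names, weights, and 0.5 cut-off are all hypothetical.

```python
from typing import Dict, List


def meta_disorder(scores: Dict[str, List[float]],
                  weights: Dict[str, float],
                  threshold: float = 0.5) -> List[bool]:
    """Combine per-residue disorder scores from several predictors.

    scores:  predictor name -> per-residue score in [0, 1] (assumed normalized)
    weights: predictor name -> non-negative weight (hypothetical values,
             e.g. tuned by cross-validation)
    Returns True for each residue predicted to be disordered.
    """
    names = list(scores)
    length = len(scores[names[0]])
    if any(len(scores[n]) != length for n in names):
        raise ValueError("all predictors must score the same residues")
    total_weight = sum(weights[n] for n in names)
    combined = []
    for i in range(length):
        # Weighted average of the individual predictors at residue i.
        avg = sum(weights[n] * scores[n][i] for n in names) / total_weight
        combined.append(avg >= threshold)
    return combined


# Hypothetical predictors "A", "B", "C" scoring a 4-residue stretch.
scores = {
    "A": [0.9, 0.2, 0.6, 0.1],
    "B": [0.8, 0.4, 0.3, 0.2],
    "C": [0.7, 0.9, 0.7, 0.0],
}
weights = {"A": 2.0, "B": 1.0, "C": 1.0}
print(meta_disorder(scores, weights))  # prints [True, False, True, False]
```

Combining predictors helps most when they err on different residues, which is why the abstract stresses that the inputs come from orthogonal methods.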
Structural genomics is the largest contributor of novel structural leverage.
J. Struct. Funct. Genomics
PUBLISHED: 02-05-2009
The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005-now) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today's UniProt (v12.8) increased linearly from 1992 to 2008; structural genomics has contributed significantly to maintaining this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849-851, 2007) has resulted from systematic targeting of large families. The PSI's per-structure contribution to novel leverage was over 4-fold higher than that of non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may take only about another 15 years to cover most sequences in the current UniProt database.
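The headline comparison (PSI-2 contributing more than 8% of all PDB structures but over 20% of novel ones) can be sketched as two ratios over a chronological list of deposits. In this minimal Python sketch, a deposit counts as novel when no earlier deposit covers the same sequence family; the `family` field is a hypothetical simplification of "no structural representative in the PDB on the date of deposition", and the data are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Deposit:
    date: str       # ISO 8601 date, so string order equals chronological order
    family: str     # hypothetical ID of the sequence family/cluster covered
    from_psi: bool  # True if deposited by a PSI center


def psi_shares(deposits: List[Deposit]) -> Tuple[float, float]:
    """Return (PSI share of all structures, PSI share of novel structures).

    A deposit counts as novel if no earlier deposit covers the same family.
    """
    if not deposits:
        raise ValueError("no deposits supplied")
    seen = set()
    total = novel = psi_total = psi_novel = 0
    for dep in sorted(deposits, key=lambda item: item.date):
        is_novel = dep.family not in seen
        seen.add(dep.family)
        total += 1
        novel += is_novel
        if dep.from_psi:
            psi_total += 1
            psi_novel += is_novel
    return psi_total / total, psi_novel / novel


deposits = [
    Deposit("2005-03-01", "fam1", False),
    Deposit("2006-01-15", "fam2", True),
    Deposit("2006-06-30", "fam1", False),
    Deposit("2007-02-10", "fam2", True),
    Deposit("2007-09-05", "fam3", False),
]
print(psi_shares(deposits))
```

In this toy set the PSI supplies 2 of 5 structures (40%) but only 1 of 3 novel ones; the abstract's point is that for the real PDB the novel share was the larger of the two.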
Structural genomics reveals EVE as a new ASCH/PUA-related domain.
Proteins
PUBLISHED: 02-05-2009
We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences, but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links.
Online tools for predicting integral membrane proteins.
Methods Mol. Biol.
PUBLISHED: 01-21-2009
We identify and describe a set of tools readily available for integral membrane protein prediction. These tools address two problems: finding potential transmembrane proteins in a pool of new sequences, and identifying their transmembrane regions. All methods involve comparing the query protein against one or more target models. In the simplest of these, the target "model" is another protein sequence, while the more elaborate methods group together the entire set of transmembrane helical or transmembrane beta-barrel proteins. In general, prediction accuracy, either in identifying new integral membrane proteins or in identifying transmembrane regions of known integral membrane proteins, depends strongly on how closely the query fits the model. Because of this, the best approach is an opportunistic one: submit the protein of interest to all methods and choose the results with the highest confidence scores.
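The opportunistic strategy described above amounts to running every method and keeping the most confident answer. A minimal Python sketch follows; the tool names and scores are placeholders, and it assumes each tool's confidence has already been rescaled to a common 0-1 range, since raw scores from different predictors are not generally comparable.

```python
from typing import Dict, Tuple


def pick_most_confident(results: Dict[str, Tuple[str, float]]) -> Tuple[str, str, float]:
    """Return (method, prediction, confidence) with the highest confidence.

    results: method name -> (prediction, confidence in [0, 1]).
    Assumes confidences were rescaled to a common range beforehand.
    """
    if not results:
        raise ValueError("no predictions supplied")
    method, (prediction, confidence) = max(results.items(), key=lambda item: item[1][1])
    return method, prediction, confidence


# Placeholder outputs from three hypothetical prediction tools.
results = {
    "tool_A": ("2 TM helices", 0.62),
    "tool_B": ("3 TM helices", 0.91),
    "tool_C": ("not membrane", 0.40),
}
print(pick_most_confident(results))  # prints ('tool_B', '3 TM helices', 0.91)
```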
LocTree2 predicts localization for all domains of life.
Bioinformatics
Subcellular localization is one aspect of protein function. Despite advances in high-throughput imaging, localization maps remain incomplete. Several methods accurately predict localization, but many challenges remain to be tackled.
Alternative protein-protein interfaces are frequent exceptions.
PLoS Comput. Biol.
The intricate molecular details of protein-protein interactions (PPIs) are crucial for function. Therefore, when the same interacting protein pair is measured again, we expect the same result. This work measured the similarity in the molecular details of interaction for the same and for homologous protein pairs between different experiments. All scores analyzed suggested that different experiments often find exceptions in the interfaces of similar PPIs: up to 22% of all comparisons revealed some differences even for sequence-identical pairs of proteins. The corresponding number for pairs of close homologs reached 68%. Conversely, the interfaces differed entirely for 12-29% of all comparisons. All these estimates were calculated after redundancy reduction. The magnitude of interface differences ranged from subtle to extreme, as illustrated by a few examples. An extreme case was a change of the interacting domains between two observations of the same biological interaction. One reason for different interfaces was the number of copies of an interaction in the same complex: the probability of observing alternative binding modes increases with the number of copies. Even after removing the special cases with alternative hetero-interfaces to the same homomer, substantial variability remained. Our results strongly support the surprising notion that there are many alternative solutions to make the intricate molecular details of PPIs crucial for function.
Paving the future: finding suitable ISMB venues.
Bioinformatics
The International Society for Computational Biology, ISCB, organizes the largest event in the field of computational biology and bioinformatics, namely the annual international conference on Intelligent Systems for Molecular Biology, the ISMB. This year at ISMB 2012 in Long Beach, ISCB celebrated the 20th anniversary of its flagship meeting. ISCB is a young, lean and efficient society that aspires to make a significant impact with only limited resources. Many constraints make the choice of venues for ISMB a tough challenge. Here, we describe those challenges and invite the contribution of ideas for solutions.
Predict impact of single amino acid change upon protein structure.
BMC Genomics
Amino acid point mutations (nsSNPs) may change protein structure and function. However, no method directly predicts the impact of mutations on structure. Here, we compare pairs of pentamers (five consecutive residues) that locally change protein three-dimensional structure (3D, RMSD>0.4Å) to those that do not alter structure (RMSD<0.2Å). Mutations that alter structure locally can be distinguished from those that do not through a machine-learning (logistic regression) method.
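The two classes are defined by local RMSD cut-offs. The Python sketch below covers only that labeling step, not the logistic regression. It assumes one representative atom per residue (the abstract does not say which atoms were used) and pentamers that have already been superimposed; a full analysis would first apply an optimal superposition such as the Kabsch algorithm. Pairs falling between the two cut-offs belong to neither class as stated, so the sketch returns None for them.

```python
import math
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: List[Coord], b: List[Coord]) -> float:
    """RMSD between two equally long, already superimposed coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def label_pentamer_pair(value: float) -> Optional[int]:
    """1 = structure changed (RMSD > 0.4 Å), 0 = unchanged (RMSD < 0.2 Å),
    None = between the cut-offs, i.e. in neither class."""
    if value > 0.4:
        return 1
    if value < 0.2:
        return 0
    return None


# Toy pentamer: five atoms on a line; the "mutant" is shifted by 0.5 Å in x.
wild_type = [(float(i), 0.0, 0.0) for i in range(5)]
shifted = [(x + 0.5, y, z) for x, y, z in wild_type]
print(label_pentamer_pair(rmsd(wild_type, shifted)))  # prints 1
```

Leaving a gap between 0.2 Å and 0.4 Å keeps borderline cases out of both classes, which makes the training labels cleaner at the cost of discarding some data.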
Disease-related mutations predicted to impact protein function.
BMC Genomics
Non-synonymous single nucleotide polymorphisms (nsSNPs) alter the protein sequence and can cause disease. The impact has been described by reliable experiments for relatively few mutations. Here, we study predictions for functional impact of disease-annotated mutations from OMIM, PMD and Swiss-Prot and of variants not linked to disease.
NMR structure of lipoprotein YxeF from Bacillus subtilis reveals a calycin fold and distant homology with the lipocalin Blc from Escherichia coli.
PLoS ONE
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that, in all lipocalins with known structure, packs against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named slim lipocalins, with YxeF and the other members of Pfam family PF11631, to which YxeF belongs, constituting the first representatives. The results presented here exemplify the impact of structural genomics on enhancing our understanding of biology and on generating new biological hypotheses.
Structural genomics plucks high-hanging membrane proteins.
Curr. Opin. Struct. Biol.
Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful, both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the setup of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in the absence of homology.
Three-dimensional structures of membrane proteins from genomic sequencing.
Cell
We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.
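EVfold_membrane infers covariation with a maximum entropy model, which is too involved for a short example. As a simpler illustration of what "covariation in pairs of sequence positions" means, the Python sketch below computes mutual information between two alignment columns. This is a different, older score: mutual information does not separate direct from indirect couplings, which is the problem maximum entropy approaches were designed to address. The alignment is a made-up toy example.

```python
import math
from collections import Counter
from typing import List


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (natural log) between alignment columns i and j.

    alignment: equally long aligned sequences (one string per family member).
    """
    n = len(alignment)
    fi = Counter(seq[i] for seq in alignment)            # column i counts
    fj = Counter(seq[j] for seq in alignment)            # column j counts
    fij = Counter((seq[i], seq[j]) for seq in alignment)  # pair counts
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Columns 0 and 1 co-vary perfectly (A pairs with C, G pairs with T).
covarying = ["AC", "AC", "GT", "GT"]
# Here every combination occurs equally often, so the columns are independent.
independent = ["AC", "AT", "GC", "GT"]
print(mutual_information(covarying, 0, 1))    # log(2), about 0.693
print(mutual_information(independent, 0, 1))  # 0.0
```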
Solution NMR structure of the ribosomal protein RP-L35Ae from Pyrococcus furiosus.
Proteins
The ribosome consists of small and large subunits each composed of dozens of proteins and RNA molecules. However, the functions of many of the individual protomers within the ribosome are still unknown. In this article, we describe the solution NMR structure of the ribosomal protein RP-L35Ae from the archaeon Pyrococcus furiosus. RP-L35Ae is buried within the large subunit of the ribosome and belongs to Pfam protein domain family PF01247, which is highly conserved in eukaryotes, present in a few archaeal genomes, but absent in bacteria. The protein adopts a six-stranded anti-parallel β-barrel analogous to the "tRNA binding motif" fold. The structure of the P. furiosus RP-L35Ae presented in this article constitutes the first structural representative from this protein domain family.

What is Visualize?

JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.

How does it work?

We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.
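The matching algorithm itself is not described on this page. Purely as an illustration of how abstracts could be ranked against a video library, the Python sketch below scores each video description against an abstract with TF-IDF weights and cosine similarity, a standard text-similarity baseline, and returns the top matches. The video names, descriptions, and the choice of TF-IDF are assumptions, not JoVE's method.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Term frequency times smoothed inverse document frequency."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    return {
        name: {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_videos(abstract: str, videos: Dict[str, str], top_k: int = 10) -> List[Tuple[str, float]]:
    query_key = "__abstract__"  # assumes no video uses this name
    vectors = tfidf_vectors({**videos, query_key: abstract})
    query = vectors.pop(query_key)
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical video descriptions and abstract.
videos = {
    "western_blot": "western blot protein detection antibody membrane",
    "patch_clamp": "patch clamp electrophysiology ion channel recording",
    "crystallography": "protein crystallization x-ray crystallography structure",
}
abstract = "crystal structure of a chloride ion channel and electrophysiology"
for name, score in rank_videos(abstract, videos, top_k=3):
    print(f"{name}: {score:.3f}")
```

A purely lexical score like this also illustrates the limitation noted below: "crystal" and "crystallization" count as different words, so related content can be missed when wording differs.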

Video X seems to be unrelated to Abstract Y...

In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases, the language used in a PubMed abstract makes matching its content to a JoVE video difficult. In other cases, our video library simply contains no content relevant to the topic of a given abstract. In these cases, our algorithms try their best to display videos with relevant content, which can sometimes result in matched videos that are only slightly related.