Hepatitis C Virus (HCV) infects 200 million individuals worldwide. Although several FDA-approved drugs targeting the HCV serine protease and polymerase have shown promising results, there is a need for better drugs that are effective in treating a broader range of HCV genotypes and subtypes without being used in combination with interferon and/or ribavirin. Recently, two crystal structures of the core of the HCV E2 protein (E2c) have been determined, providing structural information that can now be used to target the E2 protein and develop drugs that disrupt the early stages of HCV infection by blocking E2's interaction with different host factors. Using the E2c structure as a template, we have created a structural model of the E2 protein core (residues 421-645) that contains the three amino acid segments that are not present in either structure. Computational docking of a diverse library of 1,715 small molecules to this model led to the identification of a set of 34 ligands predicted to bind near conserved amino acid residues involved in the HCV E2:CD81 interaction. Surface plasmon resonance detection was used to screen the ligand set for binding to recombinant E2 protein, and the best binders were subsequently tested to identify compounds that inhibit the infection of Huh-7 cells by HCV. One compound, 281816, blocked E2 binding to CD81 and inhibited HCV infection in a genotype-independent manner, with IC50 values ranging from 2.2 µM to 4.6 µM. 281816 blocked the early and late steps of cell-free HCV entry and also abrogated the cell-to-cell transmission of HCV. Collectively, the results obtained with this new structural model of E2c suggest that the development of small molecule inhibitors such as 281816 that target E2 and disrupt its interaction with CD81 may provide a new paradigm for HCV treatment.
A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. 32 novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. The predictive capabilities of the developed system were illustrated on dengue virus, an RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism.
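The accuracy figures quoted in the GeneSV abstract follow directly from the reported counts. A minimal Python sketch of that arithmetic is given here; the helper name `percent_correct` is illustrative only and is not part of GeneSV.

```python
def percent_correct(correct: int, total: int) -> float:
    """Return the percentage of correct predictions out of all tested predictions."""
    if total <= 0:
        raise ValueError("total must be a positive number of tested predictions")
    return 100.0 * correct / total


# Counts taken from the abstract above.
single_substitution = percent_correct(26, 32)  # single amino acid substitutions
double_substitution = percent_correct(4, 5)    # structurally proximal double substitutions

print(f"single: {single_substitution:.2f}%")
print(f"double: {double_substitution:.2f}%")
```

The single-substitution value is 81.25%, which the abstract rounds to 81%.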
We describe here a strain of Yersinia pestis, G1670A, which exhibits a baseline mutation rate elevated 250-fold over wild-type Y. pestis. The responsible mutation, a C to T substitution in the mutS gene, results in the transition of a highly conserved leucine at position 689 to arginine (mutS(L689R)). When the MutS(L689R) protein of G1670A was expressed in a ΔmutS derivative of Y. pestis strain EV76, mutation rates observed were equivalent to those observed in G1670A, consistent with a causal association between the mutS mutation and the mutator phenotype. The observation of a mutator allele in Yersinia pestis has potential implications for the study of evolution of this and other especially dangerous pathogens.
The high mutation rate of RNA viruses enables a diverse genetic population of viral genotypes to exist within a single infected host. In-host genetic diversity could better position the virus population to respond and adapt to a diverse array of selective pressures such as host-switching events. Multiple new coronaviruses, including SARS, have been identified in human samples just within the last ten years, demonstrating the potential of coronaviruses as emergent human pathogens. Deep sequencing was used to characterize genomic changes in coronavirus quasispecies during simulated host-switching. Three bovine nasal samples infected with bovine coronavirus were used to infect human and bovine macrophage and lung cell lines. The virus reproduced relatively well in macrophages, but the lung cell lines were not infected efficiently enough to allow passage of non-lab-adapted samples. Approximately 12 kb of the genome was amplified before and after passage and sequenced at average coverages of nearly 950× (454 sequencing) and 38,000× (Illumina). The consensus sequence of many of the passaged samples had a 12-nucleotide insert in the spike gene, and multiple point mutations were associated with the presence of the insert. Deep sequencing revealed that the insert was present but very rare in the unpassaged samples and could quickly shift to dominate the population when placed in a different environment. The insert coded for three arginine residues, occurred in a region associated with fusion entry into host cells, and may allow infection of new cell types via heparan sulfate binding. Analysis of the deep sequencing data indicated that two distinct genotypes circulated at different frequency levels in each sample, and supported the hypothesis that the mutations present in passaged strains were "selected" from a pre-existing pool rather than arising through de novo mutation and subsequent population fixation.
In the future, we may be faced with the need to provide treatment for an emergent biological threat against which existing vaccines and drugs have limited efficacy or availability. To prepare for this eventuality, our objective was to use a metabolic network-based approach to rapidly identify potential drug targets and prospectively screen and validate novel small-molecule antimicrobials. Our target organism was the fully virulent Francisella tularensis subspecies tularensis Schu S4 strain, a highly infectious intracellular pathogen that is the causative agent of tularemia and is classified as a category A biological agent by the Centers for Disease Control and Prevention. We proceeded with a staggered computational and experimental workflow that used a strain-specific metabolic network model, homology modeling and X-ray crystallography of protein targets, and ligand- and structure-based drug design. Selected compounds were subsequently filtered based on physiological-based pharmacokinetic modeling, and we selected a final set of 40 compounds for experimental validation of antimicrobial activity. We began screening these compounds in whole bacterial cell-based assays in biosafety level 3 facilities in the 20th week of the study and completed the screens within 12 weeks. Six compounds showed significant growth inhibition of F. tularensis, and we determined their respective minimum inhibitory concentrations and mammalian cell cytotoxicities. The most promising compound had a low molecular weight, was non-toxic, and abolished bacterial growth at 13 µM, with putative activity against pantetheine-phosphate adenylyltransferase, an enzyme involved in the biosynthesis of coenzyme A, encoded by gene coaD. The novel antimicrobial compounds identified in this study serve as starting points for lead optimization, animal testing, and drug development against tularemia. 
Our integrated in silico/in vitro approach had an overall 15% success rate in terms of active versus tested compounds over an elapsed time period of 32 weeks, from pathogen strain identification to selection and validation of novel antimicrobial compounds.
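The 15% success rate and the 32-week elapsed time quoted above can be reproduced from the figures in the abstract. The sketch below is a simple consistency check, assuming that "the 20th week" can be treated as 20 elapsed weeks before screening began; the constant names are illustrative.

```python
# Figures taken from the abstract above.
ACTIVE_COMPOUNDS = 6         # compounds showing significant growth inhibition
TESTED_COMPOUNDS = 40        # final set selected for experimental validation
SCREEN_START_WEEK = 20       # assumption: screening began after about 20 elapsed weeks
SCREEN_DURATION_WEEKS = 12   # screens were completed within 12 weeks

success_rate = ACTIVE_COMPOUNDS / TESTED_COMPOUNDS
elapsed_weeks = SCREEN_START_WEEK + SCREEN_DURATION_WEEKS

print(f"success rate: {success_rate:.0%}")
print(f"elapsed time: {elapsed_weeks} weeks")
```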
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory, but they still cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures.
During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application.
Here we present a perspective on a range of practical uses of structural genomics for mutagen research. Structural genomics is an overloaded term and requires some definition to bound the discussion; we give a brief description of public and private structural genomics endeavors, along with some of their objectives, their activities, their capabilities, and their limitations. We discuss how structural genomics might impact mutagen research in three different scenarios: at a structural genomics center, at a lab with modest resources that also conducts structural biology research, and at a lab that is conducting mutagen research without in-house experimental structural biology. Applications range from functional annotation of single genes or SNPs, to constructing gene networks and pathways, to an integrated systems biology approach. Structural genomics centers can take advantage of systems biology models to select high-value targets for structure determination and in turn extend systems models to better understand systems biology diseases or phenomena. Individual investigator-run structural biology laboratories can collaborate with structural genomics centers, but can also take advantage of technical advances and tools developed by structural genomics centers and can employ a structural genomics approach to advancing biological understanding. Individual investigator-run non-structural biology laboratories can also collaborate with structural genomics centers, possibly influencing targeting decisions, but can also use structure-based annotation tools enabled by the growing coverage of protein fold space provided by structural genomics. Better functional annotation can inform pathway and systems biology models.
For template-based modeling in the CASP8 Critical Assessment of Techniques for Protein Structure Prediction, this work develops and applies six new full-model metrics. They are designed to complement and add value to the traditional template-based assessment by the global distance test (GDT) and related scores (based on multiple superpositions of Calpha atoms between target structure and predictions labeled "Model 1"). The new metrics evaluate each predictor group on each target, using all atoms of their best model with above-average GDT. Two metrics evaluate how "protein-like" the predicted model is: the MolProbity score used for validating experimental structures, and a mainchain reality score using all-atom steric clashes, bond length and angle outliers, and backbone dihedrals. Four other new metrics evaluate match of model to target for mainchain and sidechain hydrogen bonds, sidechain end positioning, and sidechain rotamers. Group-average Z-score across the six full-model measures is averaged with group-average GDT Z-score to produce the overall ranking for full-model, high-accuracy performance. Separate assessments are reported for specific aspects of predictor-group performance, such as robustness of approximately correct template or fold identification, and self-scoring ability at identifying the best of their models. Fold identification is distinct from but correlated with group-average GDT Z-score if target difficulty is taken into account, whereas self-scoring is done best by servers and is uncorrelated with GDT performance. Outstanding individual models on specific targets are identified and discussed. Predictor groups excelled at different aspects, highlighting the diversity of current methodologies. However, good full-model scores correlate robustly with high Calpha accuracy.
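The ranking step described above (averaging Z-scores across the full-model measures, then averaging that result with the GDT Z-score) can be sketched as follows. This is a simplified illustration, not the assessors' actual procedure: it assumes one already-averaged raw score per group per metric, uses the population standard deviation, applies no outlier trimming, and assumes every metric is oriented so that higher is better. The function names, metric names, and numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw per-group scores to Z-scores (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All groups scored identically, so no group stands out on this metric.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def overall_scores(full_model, gdt):
    """Average the full-model Z-scores per group, then average with the GDT Z-score.

    full_model: dict mapping metric name -> list of per-group scores
    gdt: list of per-group GDT scores, in the same group order
    """
    metric_z = [z_scores(vals) for vals in full_model.values()]
    gdt_z = z_scores(gdt)
    combined = []
    for i in range(len(gdt)):
        full_avg = mean(z[i] for z in metric_z)
        combined.append((full_avg + gdt_z[i]) / 2.0)
    return combined


# Hypothetical scores for three predictor groups on two illustrative metrics.
example_full_model = {
    "metric_a": [1.0, 2.0, 3.0],
    "metric_b": [10.0, 20.0, 30.0],
}
example_gdt = [50.0, 60.0, 70.0]

scores = overall_scores(example_full_model, example_gdt)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # group indices, best first
```

A metric for which lower values are better would need its sign flipped before being passed in.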
The LcrV protein is a multifunctional virulence factor and protective antigen of the plague bacterium and is generally conserved between the epidemic strains of Yersinia pestis. We investigated the diversity in the LcrV sequences among non-epidemic Y. pestis strains which have a limited virulence in selected animal models and for humans. Sequencing of lcrV genes from 19 Y. pestis strains belonging to different phylogenetic groups (subspecies) showed that the LcrV proteins possess four major variable hotspots at positions 18, 72, 273, and 324-326. These major variations, together with other minor substitutions in amino acid sequences, allowed us to classify the LcrV alleles into five sequence types (A-E). We observed that the strains of different Y. pestis "subspecies" can have the same type of LcrV, including that conserved in epidemic strains, and different types of LcrV can exist within the same natural plague focus. Therefore, the phenomenon of "selective virulence" characteristic of the strains of the microtus biovar is unlikely to be the result of polymorphism of the V antigen. The LcrV polymorphisms were structurally analyzed by comparing the modeled structures of LcrV from all available strains. All changes except one occurred either in flexible regions or on the surface of the protein, but local chemical properties (i.e., those of a hydrophobic, hydrophilic, amphipathic, or charged nature) were conserved across all of the strains. Polymorphisms in flexible and surface regions are likely subject to less selective pressure, and have a limited impact on the structure. In contrast, the substitution of tryptophan at position 113 with either glutamic acid or glycine likely has a serious influence on the regional structure of the protein, and these mutations might have an effect on the function of LcrV. The polymorphisms at positions 18, 72 and 273 accounted for differences in the oligomerization of LcrV.
Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms. There are now nearly 1,000 completed bacterial and archaeal genomes available, most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, we have sequenced and analysed the genomes of 56 culturable species of Bacteria and Archaea selected to maximize phylogenetic coverage. Analysis of these genomes demonstrated pronounced benefits (compared to an equivalent set of genomes randomly selected from the existing database) in diverse areas including the reconstruction of phylogenetic history, the discovery of new protein families and biological properties, and the prediction of functions for known genes from other organisms. Our results strongly support the need for systematic phylogenomic efforts to compile a phylogeny-driven Genomic Encyclopedia of Bacteria and Archaea in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come.
We analyzed near-complete population (composite) genomic sequences for coexisting acidophilic iron-oxidizing Leptospirillum group II and III bacteria (phylum Nitrospirae) and an extrachromosomal plasmid from a Richmond Mine, Iron Mountain, CA, acid mine drainage biofilm. Community proteomic analysis of the genomically characterized sample and two other biofilms identified 64.6% and 44.9% of the predicted proteins of Leptospirillum groups II and III, respectively, and 20% of the predicted plasmid proteins. The bacteria share 92% 16S rRNA gene sequence identity and >60% of their genes, including integrated plasmid-like regions. The extrachromosomal plasmid carries conjugation genes with detectable sequence similarity to genes in the integrated conjugative plasmid, but only those on the extrachromosomal element were identified by proteomics. Both bacterial groups have genes for community-essential functions, including carbon fixation and biosynthesis of vitamins, fatty acids, and biopolymers (including cellulose); proteomic analyses reveal these activities. Both Leptospirillum types have multiple pathways for osmotic protection. Although both are motile, signal transduction and methyl-accepting chemotaxis proteins are more abundant in Leptospirillum group III, consistent with its distribution in gradients within biofilms. Interestingly, Leptospirillum group II uses a methyl-dependent and Leptospirillum group III a methyl-independent response pathway. Although only Leptospirillum group III can fix nitrogen, these proteins were not identified by proteomics. The abundances of core proteins are similar in all communities, but the abundance levels of unique and shared proteins of unknown function vary. Some proteins unique to one organism were highly expressed and may be key to the functional and ecological differentiation of Leptospirillum groups II and III.
Here we introduce a quantitative structure-driven computational domain-fusion method, which we used to predict the structures of proteins believed to be involved in regulation of the subtilin pathway in Bacillus subtilis and to predict a protein-protein complex formed by interaction between those proteins. Homology modeling of SpaK and SpaR yielded preliminary structural models based on a best template for SpaK comprising a dimer of a histidine kinase, and for SpaR a response regulator protein. Our LGA code was used to identify multi-domain proteins with structure homology to both modeled structures, yielding a set of domain-fusion templates then used to model a hypothetical SpaK/SpaR complex. The models were used to identify putative functional residues and residues at the protein-protein interface, and bioinformatics was used to compare functionally and structurally relevant residues in corresponding positions among proteins with structural homology to the templates. Models of the complex were evaluated in light of known properties of the functional residues within two-component systems involving His-Asp phosphorelays. Based on this analysis, a phosphotransferase complexed with a beryllofluoride was selected as the optimal template for modeling a SpaK/SpaR complex conformation. In vitro phosphorylation studies performed using wild type and site-directed SpaK mutant proteins validated the predictions derived from application of the structure-driven domain-fusion method: SpaK was phosphorylated in the presence of (32)P-ATP and the phosphate moiety was subsequently transferred to SpaR. This supports the hypothesis that SpaK and SpaR function as sensor and response regulator, respectively, in a two-component signal transduction system, and furthermore suggests that the structure-driven domain-fusion approach correctly predicted a physical interaction between SpaK and SpaR.
Our domain-fusion algorithm leverages quantitative structure information and provides a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods.
Genes conferring antibiotic resistance to groups of bacterial pathogens are cause for considerable concern, as many once-reliable antibiotics continue to see a reduction in efficacy. The recent discovery of the metallo-β-lactamase (MBL) blaNDM-1 gene, which appears to grant antibiotic resistance to a variety of Enterobacteriaceae via a mobile plasmid, is one example of this distressing trend. The following work describes a computational analysis of pathogen-borne MBLs that focuses on the structural aspects of characterized proteins.