Articles by Thomas Lengauer in JoVE
Prediction of HIV-1 Coreceptor Usage (Tropism) by Sequence Analysis using a Genotypic Approach
Saleta Sierra1, Rolf Kaiser1, Nadine Lübke1, Alexander Thielen2, Eugen Schuelter1, Eva Heger1, Martin Däumer3, Stefan Reuter4, Stefan Esser5, Gerd Fätkenheuer6, Herbert Pfister1, Mark Oette7, Thomas Lengauer2
1Institute of Virology, University of Cologne, 2Max Planck Institute for Informatics, 3Institute for Immune genetics, 4Department of Gastroenterology, Hepatology and Infectiology, University of Duesseldorf, 5Department of Dermatology, University of Essen, 6Department of Internal Medicine, University of Cologne, 7Augustinerinnen Hospital
Predicting the coreceptor usage of HIV-1 is required before administering a new class of antiretroviral drugs, the coreceptor antagonists. It can be performed by sequence analysis of the env gene followed by interpretation through an internet-based system (geno2pheno[coreceptor]).
Other articles by Thomas Lengauer on PubMed
ProML--the Protein Markup Language for Specification of Protein Sequences, Structures and Families
In Silico Biology. 2002 | Pubmed ID: 12542416
We propose a specification language, ProML, for protein sequences, structures, and families based on the open XML standard. The language allows for portable, system-independent, machine-parsable and human-readable representation of essential features of proteins. The language is of immediate use for several bioinformatics applications: we discuss clustering of proteins into families and the representation of the specific shared features of the respective clusters. Moreover, we use ProML for specification of data used in fold recognition benchmarks exploiting experimentally derived distance constraints.
Improving Fold Recognition of Protein Threading by Experimental Distance Constraints
In Silico Biology. 2002 | Pubmed ID: 12542417
We present a comprehensive analysis of methods for improving the fold recognition rate of the threading approach to protein structure prediction by the utilization of few additional distance constraints. The distance constraints between protein residues may be obtained by experiments such as mass spectrometry or NMR spectroscopy. We applied a post-filtering step with new scoring functions incorporating measures of constraint satisfaction to ranking lists of 123D threading alignments. The detailed analysis of the results on a small representative benchmark set shows that the fold recognition rate can be improved significantly by up to 30% from about 54%-65% to 77%-84%, approaching the maximal attainable performance of 90% estimated by structural superposition alignments. This gain in performance adds about 10% to the recognition rate already achieved in our previous study with cross-link constraints only. Additional recent results on a larger benchmark set involving a confidence function for threading predictions also indicate notable improvements by our combined approach, which should be particularly valuable for rapid structure determination and validation of protein models.
A Hypergraph-based Method for Unification of Existing Protein Structure- and Sequence-families
In Silico Biology. 2002 | Pubmed ID: 12542418
Classification of proteins is a major challenge in bioinformatics. Here an approach is presented that unifies different existing classifications of protein structures and sequences. Protein structural domains are represented as nodes in a hypergraph. Shared memberships in sequence families result in hyperedges in the graph. The presented method partitions the hypergraph into clusters of structural domains. Each computed cluster is based on a set of shared sequence family memberships. Thus, the clusters put existing protein sequence families into the context of structural family hierarchies. Conversely, structural domains are related to their sequence family memberships, which can be used to gain further knowledge about the respective structural families.
Diversity and Complexity of HIV-1 Drug Resistance: a Bioinformatics Approach to Predicting Phenotype from Genotype
Proceedings of the National Academy of Sciences of the United States of America. Jun, 2002 | Pubmed ID: 12060770
Drug resistance testing has been shown to be beneficial for clinical management of HIV type 1 infected patients. Whereas phenotypic assays directly measure drug resistance, the commonly used genotypic assays provide only indirect evidence of drug resistance, the major challenge being the interpretation of the sequence information. We analyzed the significance of sequence variations in the protease and reverse transcriptase genes for drug resistance and derived models that predict phenotypic resistance from genotypes. For 14 antiretroviral drugs, both genotypic and phenotypic resistance data from 471 clinical isolates were analyzed with a machine learning approach. Information profiles were obtained that quantify the statistical significance of each sequence position for drug resistance. For the different drugs, patterns of varying complexity were observed, including between one and nine sequence positions with substantial information content. Based on these information profiles, decision tree classifiers were generated to identify genotypic patterns characteristic of resistance or susceptibility to the different drugs. We obtained concise and easily interpretable models to predict drug resistance from sequence information. The prediction quality of the models was assessed in leave-one-out experiments in terms of the prediction error. We found prediction errors of 9.6-15.5% for all drugs except for zalcitabine, didanosine, and stavudine, with prediction errors between 25.4% and 32.0%. A prediction service is freely available at http://cartan.gmd.de/geno2pheno.html.
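The idea of an information profile can be illustrated with a small sketch. It scores each alignment position by the mutual information between the residue observed there and a binary resistance label. This is only one plausible way to quantify per-position information content; the abstract does not state the exact measure used, and the function names and toy data below are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(residues, labels):
    """Mutual information (in bits) between one alignment column and the labels."""
    n = len(residues)
    joint = Counter(zip(residues, labels))
    residue_counts = Counter(residues)
    label_counts = Counter(labels)
    total = 0.0
    for (residue, label), count in joint.items():
        p_joint = count / n
        p_indep = (residue_counts[residue] / n) * (label_counts[label] / n)
        total += p_joint * log2(p_joint / p_indep)
    return total

def information_profile(sequences, labels):
    """One score per sequence position; sequences must be aligned (equal length)."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must all have the same length")
    return [mutual_information(column, labels) for column in zip(*sequences)]

# Toy data (hypothetical): position 0 separates the classes, position 1 does not.
toy_sequences = ["AV", "AI", "GV", "GI"]
toy_labels = ["resistant", "resistant", "susceptible", "susceptible"]
profile = information_profile(toy_sequences, toy_labels)
```

Positions with high scores would be natural candidates for the split variables of a decision tree, which is how the abstract describes the models being built on top of the profiles.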
Confidence Measures for Protein Fold Recognition
Bioinformatics (Oxford, England). Jun, 2002 | Pubmed ID: 12075015
We present an extensive evaluation of different methods and criteria to detect remote homologs of a given protein sequence. We investigate two associated problems: first, to develop a sensitive searching method to identify possible candidates and, second, to assign a confidence to the putative candidates in order to select the best one. For searching methods where the score distributions are known, p-values are used as a confidence measure with great success. For the cases where such theoretical backing is absent, we propose empirical approximations to p-values for searching procedures.
Co-clustering of Biological Networks and Gene Expression Data
Bioinformatics (Oxford, England). 2002 | Pubmed ID: 12169542
MOTIVATION: Large scale gene expression data are often analysed by clustering genes based on gene expression data alone, though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably. RESULTS: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure are computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.
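A combined distance of the kind described in this abstract might be sketched as follows. The weighting scheme (a convex combination with parameter `alpha`), the hop cap `max_hops`, and all function names are assumptions made for illustration; the abstract does not give the actual formula.

```python
from collections import deque
from math import sqrt

def graph_distances(adj, source):
    """Breadth-first search: number of edges from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def pearson(x, y):
    """Pearson correlation; assumes both profiles have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def combined_distance(g1, g2, adj, expr, alpha=0.5, max_hops=4):
    """Weighted sum of a capped network distance and a correlation distance.

    alpha and max_hops are illustrative parameters, not taken from the paper.
    Unreachable pairs are treated as being max_hops apart.
    """
    hops = graph_distances(adj, g1).get(g2, max_hops)
    d_network = min(hops, max_hops) / max_hops            # scaled to [0, 1]
    d_expression = (1 - pearson(expr[g1], expr[g2])) / 2  # scaled to [0, 1]
    return alpha * d_network + (1 - alpha) * d_expression

# Toy network and expression profiles (hypothetical).
toy_adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
toy_expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1]}
d_ab = combined_distance("A", "B", toy_adj, toy_expr)
d_ac = combined_distance("A", "C", toy_adj, toy_expr)
```

The resulting pairwise distances could then be passed to any standard hierarchical clustering routine. Scaling both terms to the same range keeps `alpha` interpretable as the relative weight of network versus expression information.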
Structural Modeling of Ataxin-3 Reveals Distant Homology to Adaptins
Proteins. Feb, 2003 | Pubmed ID: 12486728
Spinocerebellar ataxia type 3 (SCA3) is a polyglutamine disorder caused by a CAG repeat expansion in the coding region of a gene encoding ataxin-3, a protein of yet unknown function. Based on a comprehensive computational analysis, we propose a structural model and structure-based functions for ataxin-3. Our predictive strategy comprises the compilation of multiple sequence and structure alignments of carefully selected proteins related to ataxin-3. These alignments are consistent with additional information on sequence motifs, secondary structure, and domain architectures. The application of complementary methods revealed the homology of ataxin-3 to ENTH and VHS domain proteins involved in membrane trafficking and regulatory adaptor functions. We modeled the structure of ataxin-3 using the adaptin AP180 as a template and assessed the reliability of the model by comparison with known sequence and structural features. We could further infer potential functions of ataxin-3 in agreement with known experimental data. Our database searches also identified an as yet uncharacterized family of proteins, which we named josephins because of their pronounced homology to the Josephin domain of ataxin-3.
Identification of Mammalian Orthologs Associates PYPAF5 with Distinct Functional Roles
FEBS Letters. Mar, 2003 | Pubmed ID: 12633874
PYRIN- and CARD-containing proteins belong to a recently identified protein family involved in the regulation of apoptosis and inflammatory processes. Variations in the gene products of the family members PYPAF1 and NOD2/CARD15 have been associated with several autoinflammatory diseases. We identified the mouse orthologs of PYPAF1, PYPAF5, NOD1, NOD2 and the rat ortholog of PYPAF5. Intriguingly, we found that PYPAF5 has been reported previously not only as a regulator of NF-kappaB and caspase-1, but also as an angiotensin II and vasopressin receptor. In particular, based on a comprehensive sequence analysis, we propose a structural model for this hormone receptor that is different from the model suggested previously.
Geno2pheno: Estimating Phenotypic Drug Resistance from HIV-1 Genotypes
Nucleic Acids Research. Jul, 2003 | Pubmed ID: 12824435
Therapeutic success of anti-HIV therapies is limited by the development of drug resistant viruses. These genetic variants display complex mutational patterns in their pol gene, which codes for protease and reverse transcriptase, the molecular targets of current antiretroviral therapy. Genotypic resistance testing depends on the ability to interpret such sequence data, whereas phenotypic resistance testing directly measures relative in vitro susceptibility to a drug. From a set of 650 matched genotype-phenotype pairs we construct regression models for the prediction of phenotypic drug resistance from genotypes. Since the range of resistance factors varies considerably between different drugs, two scoring functions are derived from different sets of predicted phenotypes. Firstly, we compare predicted values to those of samples derived from 178 treatment-naive patients and report the relative deviance. Secondly, estimation of the probability density of 2000 predicted phenotypes gives rise to an intrinsic definition of a susceptible and a resistant subpopulation. Thus, for a predicted phenotype, we calculate the probability of membership in the resistant subpopulation. Both scores provide standardized measures of resistance that can be calculated from the genotype and are comparable between drugs. The geno2pheno system makes these genotype interpretations available via the Internet (http://www.genafor.org/).
Pyranose Oxidase Identified As a Member of the GMC Oxidoreductase Family
Bioinformatics (Oxford, England). Jul, 2003 | Pubmed ID: 12835264
Fungal pyranose oxidase is a flavoenzyme whose preferred substrate among several monosaccharides is D-glucose. After a comprehensive analysis of conserved features in a structure-based multiple sequence alignment of homologous proteins, we could classify this enzyme into the GMC oxidoreductase family. The identified homology also suggests a three-dimensional protein structure similar to the functionally related glucose oxidase.
Methods for Optimizing Antiviral Combination Therapies
Bioinformatics (Oxford, England). 2003 | Pubmed ID: 12855433
MOTIVATION: Despite some progress with antiretroviral combination therapies, therapeutic success in the management of HIV-infected patients is limited. The evolution of drug-resistant genetic variants in response to therapy plays a key role in treatment failure and finding a new potent drug combination after therapy failure is considered challenging. RESULTS: To estimate the activity of a drug combination against a particular viral strain, we develop a scoring function whose independent variables describe a set of antiviral agents and viral DNA sequences coding for the molecular targets of the respective drugs. The construction of this activity score involves (1) predicting phenotypic drug resistance from genotypes for each drug individually, (2) probabilistic modeling of predicted resistance values and integration into a score for drug combinations, and (3) searching through the mutational neighborhood of the considered strain in order to estimate activity on nearby mutants. For a clinical data set, we determine the optimal search depth and show that the scoring scheme is predictive of therapeutic outcome. Properties of the activity score and applications are discussed.
Simple Consensus Procedures Are Effective and Sufficient in Secondary Structure Prediction
Protein Engineering. Jul, 2003 | Pubmed ID: 12915722
We have analyzed the performance of majority voting on minimal combination sets of three state-of-the-art secondary structure prediction methods in order to obtain a consensus prediction. Using three large benchmark sets from the EVA server, our results show a significant improvement in the average Q3 prediction accuracy of up to 1.5 percentage points by consensus formation. The application of an additional trivial filtering procedure for predicted secondary structure elements that are too short does not significantly affect the prediction accuracy. Our analysis also provides valuable insight into the similarity of the results of the prediction methods that we combine as well as the higher confidence in consistently predicted secondary structure.
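Per-residue majority voting over three predictions, together with the Q3 accuracy it is evaluated by, can be sketched in a few lines. The three-state alphabet (H, E, C), the tie-breaking rule, and the function names are assumptions for illustration; the abstract does not specify how ties were resolved.

```python
from collections import Counter

def consensus(predictions):
    """Per-residue majority vote over equal-length strings of H/E/C states.

    If no state has a strict majority (e.g. a three-way tie), the call of the
    first method is used. That tie-break is an assumption of this sketch.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        state, votes = Counter(column).most_common(1)[0]
        result.append(state if 2 * votes > len(column) else column[0])
    return "".join(result)

def q3(predicted, reference):
    """Fraction of residues whose predicted state matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Toy predictions from three hypothetical methods.
toy_consensus = consensus(["HHHEC", "HHCEC", "CHHEE"])
toy_accuracy = q3(toy_consensus, "HHHEE")
```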
Microarrays: How Many Do You Need?
Journal of Computational Biology : a Journal of Computational Molecular Cell Biology. 2003 | Pubmed ID: 12935350
We estimate the number of microarrays that is required in order to gain reliable results from a common type of study: the pairwise comparison of different classes of samples. We show that current knowledge allows for the construction of models that look realistic with respect to searches for individual differentially expressed genes and derive prototypical parameters from real data sets. Such models allow investigation of the dependence of the required number of samples on the relevant parameters: the biological variability of the samples within each class, the fold changes in expression that are desired to be detected, the detection sensitivity of the microarrays, and the acceptable error rates of the results. We supply experimentalists with general conclusions as well as a freely accessible Java applet at www.scai.fhg.de/special/bio/howmanyarrays/ for fine tuning simulations to their particular settings.
Tenofovir Resistance and Resensitization
Antimicrobial Agents and Chemotherapy. Nov, 2003 | Pubmed ID: 14576105
Human immunodeficiency viruses in 321 samples from tenofovir-naïve patients were retrospectively evaluated for resistance to this nucleotide analogue. All virus strains with insertions between amino acids 67 and 70 of the reverse transcriptase (n = 6) were highly resistant. Virus strains with the Q151M mutation were divided into susceptible (n = 12) and highly resistant (n = 8) viruses. This difference was due to the absence or presence of the K65R mutation, which was confirmed by site-directed mutagenesis. Viral clones with various combinations of the mutations M41L, K70R, L210W, and T215F or T215Y were analyzed for cross-resistance induced by thymidine analogue mutations (TAMs). The levels of increased resistance induced by single, double, and triple mutations at the indicated positions could be ranked as follows: for mutants with single mutations, mutations at positions 41 > 215 > 70; for mutants with double mutations, mutations at positions 41 and 215 > 70 and 215 = 210 and 215 > 41 and 70; for mutants with triple mutations, mutations at positions 41, 210, and 215 > 41, 70, and 215. Viral clones with M184V or M184I exhibited slightly increased susceptibilities to tenofovir (0.7-fold). Almost all clones with TAM-induced resistance were resensitized when M184V was present (P < 0.001). Among the viruses in the clinical samples, the rate of tenofovir resistance significantly increased with the number of TAMs both in the samples with 184M and in those with 184V (P = 0.005 and P = 0.003, respectively). A resensitizing effect of M184V was confirmed for all samples exhibiting at least one TAM (P = 0.03). However, accumulation of at least two TAMs resulted in more than 2.0-fold reduced susceptibility to tenofovir, irrespective of the presence of M184V. Decision tree building, a classical machine learning technique, was used to generate models for the interpretation of mutations with respect to tenofovir resistance. 
The application of previously proposed cutoffs for a reduced response to therapy and treatment failure demonstrated the central roles of positions 215 and 65 for 1.5- and 4.0-fold reduced susceptibilities, respectively. Thus, clinically relevant resistance may be conferred by the accumulation of TAMs, and the resensitizing effect of M184V should be considered only minor.
Structural Localization of Disease-associated Sequence Variations in the NACHT and LRR Domains of PYPAF1 and NOD2
FEBS Letters. Nov, 2003 | Pubmed ID: 14623123
Several autoinflammatory diseases with distinct clinical manifestations have been associated with sequence variations in the gene products PYPAF1/CIAS1 and NOD2/CARD15. Both proteins belong to the PYD/CARD-containing family of apoptosis regulators and activators of pro-inflammatory caspases. To gain insight into the dysfunctional role of sequence alterations, we assembled a structure-based multiple sequence alignment of family members and related proteins. This allowed us to analyze the putative effect of the alterations on the function of nucleotide-binding (NACHT) and leucine-rich repeat (LRR) domains shared by the family members. In support of this analysis, we carefully selected template structures for the NACHT and LRR domains and mapped the genetic variations onto 3D domain models. Additionally, we propose a model of the NACHT and LRR domain complex. Our study revealed that many of the disease-associated sequence variants are located close to highly conserved sequence regions of functional relevance and are spatially adjacent in the predicted 3D structure. The implications on the domain functions such as NTP-hydrolysis or oligomerization are discussed.
Disease-associated Variants in PYPAF1 and NOD2 Result in Similar Alterations of Conserved Sequence
Bioinformatics (Oxford, England). Nov, 2003 | Pubmed ID: 14630645
Sequence variations in the gene products PYPAF1/CIAS1 and NOD2/CARD15 have been associated with several autoinflammatory diseases that, although clinically different, share a similar inflammatory pathophysiology. A multiple sequence alignment of homologous proteins demonstrates that some of the missense variants are located in highly conserved regions of the NTPase domain and possibly impair NTP-hydrolysis. Intriguingly, one of the variations, which is found identically in PYPAF1 and NOD2, is located at the same alignment position. Our findings suggest that evolutionary gene duplication can give rise to disease families because variants affect conserved sequence in a similar fashion.
Software Tool for Automated Processing of 13C Labeling Data from Mass Spectrometric Spectra
BioTechniques. Dec, 2003 | Pubmed ID: 14682055
Protein Function from Sequence and Structure Data
Applied Bioinformatics. 2003 | Pubmed ID: 15130830
With the large amount of genomics and proteomics data that we are confronted with, computational support for the elucidation of protein function becomes more and more pressing. Many different kinds of biological data harbour signals of protein function, but these signals are often concealed. Computational methods that use protein sequence and structure data can be used for discovering these signals. They provide information that can substantially speed up experimental function elucidation. In this review we concentrate on such methods.
Novel Sm-like Proteins with Long C-terminal Tails and Associated Methyltransferases
FEBS Letters. Jul, 2004 | Pubmed ID: 15225602
Sm and Sm-like proteins of the Lsm (like Sm) domain family are generally involved in essential RNA-processing tasks. While recent research has focused on the function and structure of small family members, little is known about Lsm domain proteins carrying additional domains. Using an integrative bioinformatics approach, we discovered five novel groups of Lsm domain proteins (Lsm12-16) with long C-terminal tails and investigated their functions. All of them are evolutionarily conserved in eukaryotes with an N-terminal Lsm domain to bind nucleic acids followed by as yet uncharacterized C-terminal domains and sequence motifs. Based on known yeast interaction partners, Lsm12-16 may play important roles in RNA metabolism. Particularly, Lsm12 is possibly involved in mRNA degradation or tRNA splicing, and Lsm13-16 in the regulation of the mitotic G2/M phase. Lsm16 proteins have an additional C-terminal YjeF_N domain of as yet unknown function. The identification of an additional methyltransferase domain at the C-terminus of one of the Lsm12 proteins also led to the recognition of three new groups of methyltransferases, presumably dependent on S-adenosyl-l-methionine. Further computational analyses revealed that some methyltransferases contain putative RNA-binding helix-turn-helix domains and zinc fingers.
Structural and Functional Analysis of Ataxin-2 and Ataxin-3
European Journal of Biochemistry / FEBS. Aug, 2004 | Pubmed ID: 15265035
Spinocerebellar ataxia types 2 (SCA2) and 3 (SCA3) are autosomal-dominantly inherited, neurodegenerative diseases caused by CAG repeat expansions in the coding regions of the genes encoding ataxin-2 and ataxin-3, respectively. To provide a rationale for further functional experiments, we explored the protein architectures of ataxin-2 and ataxin-3. Using structure-based multiple sequence alignments of homologous proteins, we investigated domains, sequence motifs, and interaction partners. Our analyses focused on presumably functional amino acids and the construction of tertiary structure models of the RNA-binding Lsm domain of ataxin-2 and the deubiquitinating Josephin domain of ataxin-3. We also speculate about distant evolutionary relationships of ubiquitin-binding UIM, GAT, UBA and CUE domains and helical ANTH and UBX domain extensions.
Association of HCV-related Mixed Cryoglobulinemia with Specific Mutational Pattern of the HCV E2 Protein and CD81 Expression on Peripheral B Lymphocytes
Blood. Aug, 2004 | Pubmed ID: 15294858
Automated Clustering of Ensembles of Alternative Models in Protein Structure Databases
Protein Engineering, Design & Selection : PEDS. Jun, 2004 | Pubmed ID: 15319469
Experimentally determined protein structures have been classified in different public databases according to their structural and evolutionary relationships. Frequently, alternative structural models, determined using X-ray crystallography or NMR spectroscopy, are available for a protein. These models can present significant structural dissimilarity. Currently there is no classification available for these alternative structures. In order to classify them, we developed STRuster, an automated method for clustering ensembles of structural models according to their backbone structure. The method is based on the calculation of carbon alpha (Calpha) distance matrices. Two filters are applied in the calculation of the dissimilarity measure in order to identify both large and small (but significant) backbone conformational changes. The resulting dissimilarity value is used for hierarchical clustering and partitioning around medoids (PAM). Hierarchical clustering reflects the hierarchy of similarities between all pairs of models, while PAM groups the models into the 'optimal' number of clusters. The method has been applied to cluster the structures in each SCOP species level and can be easily applied to any other sets of conformers. The results are available at: http://bioinf.mpi-sb.mpg.de/projects/struster/.
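A dissimilarity based on Calpha distance matrices can be sketched as below. This version simply averages the absolute differences between corresponding intra-model distances; the two filters mentioned in the abstract are not specified there and are deliberately left out. Function names and coordinates are hypothetical.

```python
from math import dist

def ca_distance_matrix(coords):
    """All pairwise Euclidean distances between Calpha coordinates."""
    return [[dist(a, b) for b in coords] for a in coords]

def dissimilarity(model_a, model_b):
    """Mean absolute difference between the two models' distance matrices.

    Because only intra-model distances are compared, the value does not
    depend on how either model is rotated or translated, so no structural
    superposition is needed.
    """
    if len(model_a) != len(model_b) or len(model_a) < 2:
        raise ValueError("models need the same number (>= 2) of residues")
    da = ca_distance_matrix(model_a)
    db = ca_distance_matrix(model_b)
    n = len(da)
    diffs = [abs(da[i][j] - db[i][j]) for i in range(n) for j in range(i + 1, n)]
    return sum(diffs) / len(diffs)

# Two toy three-residue "models" (hypothetical coordinates).
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
toy_dissimilarity = dissimilarity(extended, bent)
```

A matrix of such pairwise values between all models of a protein is the kind of input that hierarchical clustering or partitioning around medoids operates on.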
Ensemble Methods for Classification in Cheminformatics
Journal of Chemical Information and Computer Sciences. Nov-Dec, 2004 | Pubmed ID: 15554666
We describe the application of ensemble methods to binary classification problems on two pharmaceutical compound data sets. Several variants of single and ensemble models of k-nearest neighbors classifiers, support vector machines (SVMs), and single ridge regression models are compared. All methods exhibit robust classification even when more features are given than observations. On two data sets dealing with specific properties of drug-like substances (cytochrome P450 inhibition and "Frequent Hitters", i.e., unspecific protein inhibition), we achieve classification rates above 90%. We are able to reduce the cross-validated misclassification rate for the Frequent Hitters problem by a factor of 2 compared to previous results obtained for the same data set with different modeling techniques.
Calculating the Statistical Significance of Changes in Pathway Activity from Gene Expression Data
Statistical Applications in Genetics and Molecular Biology. 2004 | Pubmed ID: 16646794
We present a statistical approach to scoring changes in activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways with corresponding statistical significance. Based on gene expression data alone, only local structures of genetic networks can be recovered. Instead of inferring such a network, we propose a hypothesis-based approach. We use given knowledge about biological networks to improve sensitivity and interpretability of findings from microarray experiments. Recently introduced methods test if members of predefined gene sets are enriched in a list of top-ranked genes in a microarray study. We improve this approach by defining scores that depend on all members of the gene set and that also take pairwise co-regulation of these genes into account. We calculate the significance of co-regulation of gene sets with a nonparametric permutation test. On two data sets the method is validated and its biological relevance is discussed. It turns out that useful measures for co-regulation of genes in a pathway can be identified adaptively. We refine our method in two aspects specific to pathways. First, to overcome the ambiguity of enzyme-to-gene mappings for a fixed pathway, we introduce algorithms for selecting the best fitting gene for a specific enzyme in a specific condition. In selected cases, functional assignment of genes to pathways is feasible. Second, the sensitivity of detecting relevant pathways is improved by integrating information about pathway topology. The distance of two enzymes is measured by the number of reactions needed to connect them, and enzyme pairs with a smaller distance receive a higher weight in the score calculation.
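The permutation test for gene set co-regulation can be illustrated with a minimal sketch. Here the score is the mean pairwise Pearson correlation within the set, and the null distribution comes from randomly drawn gene sets of the same size. Both choices, along with the function names and parameters, are assumptions for illustration; the abstract does not specify the score or the permutation scheme, and the topology weighting it mentions is omitted.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation; assumes both profiles have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def coregulation_score(genes, expr):
    """Mean pairwise correlation over all gene pairs in the set (needs >= 2 genes)."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)

def permutation_p_value(gene_set, expr, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set scores at least as high."""
    rng = random.Random(seed)
    observed = coregulation_score(gene_set, expr)
    all_genes = sorted(expr)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(all_genes, len(gene_set))
        if coregulation_score(random_set, expr) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A call such as `permutation_p_value(["geneA", "geneB", "geneC"], expr)` expects `expr` to map every gene name to its list of expression values across conditions.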
Novel Technologies for Virtual Screening
Drug Discovery Today. Jan, 2004 | Pubmed ID: 14761803
There are several methods for virtual screening of databases of small organic compounds to find tight binders to a given protein target. Recent reviews in Drug Discovery Today have concentrated on screening by docking and by pharmacophore searching. Here, we complement these reviews by focusing on virtual screening methods that are based on analyzing ligand similarity on a structural level. Specifically, we concentrate on methods that exploit structural properties of the complete ligand molecules, as opposed to using just partial structural templates, such as pharmacophores. The in silico procedure of virtual screening (VS) and its relationship to the experimental procedure of high-throughput screening (HTS) are discussed, new developments in the field are summarized, and perspectives on future research are offered.
Survey on the PABC Recognition Motif PAM2
Biochemical and Biophysical Research Communications. Mar, 2004 | Pubmed ID: 15003521
The PABP-interacting motif PAM2 has been identified in various eukaryotic proteins as an important binding site for the PABC domain. This domain is contained in homologs of the poly(A)-binding protein PABP and the ubiquitin-protein ligase HYD. Despite the importance of the PAM2 motif, a comprehensive analysis of its occurrence in different proteins has been missing. Using iterated sequence profile searches, we obtained an extensive list of proteins carrying the PAM2 motif. We discuss their functional context and domain architecture, which often consists of RNA-binding domains. Our list of PAM2 motif proteins includes eukaryotic homologs of eRF3/GSPT1/2, PAIP1/2, Tob1/2, Ataxin-2, RBP37, RBP1, Blackjack, HELZ, TPRD, USP10, ERD15, C1D4.14, and the viral protease P29. The identification of the PAM2 motif in as yet uncharacterized proteins can give valuable hints with respect to their cellular function and potential interaction partners and suggests further experimentation. It is also striking that the PAM2 motif appears to occur solely outside globular protein domains.
Arby: Automatic Protein Structure Prediction Using Profile-profile Alignment and Confidence Measures
Bioinformatics (Oxford, England). Sep, 2004 | Pubmed ID: 15059818
Arby is a new server for protein structure prediction that combines several homology-based methods for predicting the three-dimensional structure of a protein, given its sequence. The methods used include a threading approach, which makes use of structural information, and a profile-profile alignment approach that incorporates secondary structure predictions. The combination of the different methods with the help of empirically derived confidence measures affords reliable template selection.
Genetic Variation in DLG5 is Associated with Inflammatory Bowel Disease
Nature Genetics. May, 2004 | Pubmed ID: 15107852
Crohn disease and ulcerative colitis are two subphenotypes of inflammatory bowel disease (IBD), a complex disorder resulting from gene-environment interaction. We refined our previously defined linkage region for IBD on chromosome 10q23 and used positional cloning to identify genetic variants in DLG5 associated with IBD. DLG5 encodes a scaffolding protein involved in the maintenance of epithelial integrity. We identified two distinct haplotypes with a replicable distortion in transmission (P = 0.000023 and P = 0.004 for association with IBD, P = 0.00012 and P = 0.04 for association with Crohn disease). One of the risk-associated DLG5 haplotypes is distinguished from the common haplotype by a nonsynonymous single-nucleotide polymorphism 113G-->A, resulting in the amino acid substitution R30Q in the DUF622 domain of DLG5. This mutation probably impedes scaffolding of DLG5. We stratified the study sample according to the presence of risk-associated CARD15 variants to study potential gene-gene interaction. We found a significant difference in association of the 113A DLG5 variant with Crohn disease in affected individuals carrying the risk-associated CARD15 alleles versus those carrying non-risk-associated CARD15 alleles. This is suggestive of a complex pattern of gene-gene interaction between DLG5 and CARD15, reflecting the complex nature of polygenic diseases. Further functional studies will evaluate the biological significance of DLG5 variants.
The HIN Domain of IFI-200 Proteins Consists of Two OB Folds
Biochemical and Biophysical Research Communications. Feb, 2005 | Pubmed ID: 15649401
The interferon-inducible p200 (IFI-200/HIN-200) family of proteins regulates cell growth and differentiation, and confers resistance to the development of tumors and virus infections. IFI-200 family members are thought to exert their biological effects by modulation of the transcriptional activities of numerous factors and interaction with other proteins through the C-terminal HIN domains. However, the HIN domain structure and function have remained obscure. Therefore, we performed a comprehensive bioinformatics analysis and assembled a structure-based multiple sequence alignment of IFI-200 proteins. The application of fold recognition methods revealed that the HIN domain consists of two consecutive OB domains. Our structural models of DNA-binding HIN domains afford the long-sought interpretations for many previous experimental observations. Our results also raise the possibility of as yet unexplored functional roles of IFI-200 proteins as transcriptional regulators and as interaction partners of proteins involved in immunomodulatory and apoptotic processes.
Mtreemix: a Software Package for Learning and Using Mixture Models of Mutagenetic Trees
Bioinformatics (Oxford, England). May, 2005 | Pubmed ID: 15657098
SUMMARY: Mixture models of mutagenetic trees constitute a class of probabilistic models for describing evolutionary processes that are characterized by the accumulation of permanent genetic changes. They have been applied to model the accumulation of chromosomal gains and losses in tumor development and the development of drug resistance-associated mutations in the HIV genome. Mtreemix is a software package for estimating mutagenetic tree mixture models from observed cross-sectional data and for using these models for predictions. We provide programs for model fitting, model selection, simulation, likelihood computation and waiting time estimation. AVAILABILITY: Mtreemix, including source code, documentation, sample data files and precompiled Solaris and Linux binaries, is freely available for non-commercial users at http://mtreemix.bioinf.mpi-sb.mpg.de/
An Integrative Approach to Gain Insights into the Cellular Function of Human Ataxin-2
Journal of Molecular Biology. Feb, 2005 | Pubmed ID: 15663938
Spinocerebellar ataxia type 2 (SCA2) is a hereditary neurodegenerative disorder caused by a trinucleotide expansion in the SCA2 gene, encoding a polyglutamine stretch in the gene product ataxin-2 (ATX2), whose cellular function is unknown. However, ATX2 interacts with A2BP1, a protein containing an RNA-recognition motif, and the existence of an interaction motif for the C-terminal domain of the poly(A)-binding protein (PABC) as well as an Lsm (Like Sm) domain in ATX2 suggests that ATX2, like its yeast homolog Pbp1, might be involved in RNA metabolism. Here, we show that, similar to Pbp1, ATX2 suppresses the petite (pet-) phenotype of Δmrs2 yeast strains lacking mitochondrial group II introns. This finding points to a close functional relationship between the two homologs. To gain insight into potential functions of ATX2, we also generated a comprehensive protein interaction network for Pbp1 from publicly available databases, which implicates Pbp1 in diverse RNA-processing pathways. The functional relationship of ATX2 and Pbp1 is further corroborated by the experimental confirmation of the predicted interaction of ATX2 with the cytoplasmic poly(A)-binding protein 1 (PABP) using yeast-2-hybrid analysis as well as co-immunoprecipitation experiments. Immunofluorescence studies revealed that ATX2 and PABP co-localize in mammalian cells, remarkably, even under conditions in which PABP accumulates in distinct cytoplasmic foci representing sites of mRNA triage.
Estimating Cancer Survival and Clinical Outcome Based on Genetic Tumor Progression Scores
Bioinformatics (Oxford, England). May, 2005 | Pubmed ID: 15705654
In cancer research, prediction of time to death or relapse is important for a meaningful tumor classification and selecting appropriate therapies. Survival prognosis is typically based on clinical and histological parameters. There is increasing interest in identifying genetic markers that better capture the status of a tumor in order to improve on existing predictions. The accumulation of genetic alterations during tumor progression can be used for the assessment of the genetic status of the tumor. For modeling dependences between the genetic events, evolutionary tree models have been applied.
Sarcoidosis is Associated with a Truncating Splice Site Mutation in BTNL2
Nature Genetics. Apr, 2005 | Pubmed ID: 15735647
Sarcoidosis is a polygenic immune disorder with predominant manifestation in the lung. Genome-wide linkage analysis previously indicated that the extended major histocompatibility locus on chromosome 6p was linked to susceptibility to sarcoidosis. Here, we carried out a systematic three-stage SNP scan of 16.4 Mb on chromosome 6p21 in as many as 947 independent cases of familial and sporadic sarcoidosis and found that a 15-kb segment of the gene butyrophilin-like 2 (BTNL2) was associated with the disease. The primary disease-associated variant (rs2076530; P(TDT) = 3 × 10^-6, P(case-control) = 1.1 × 10^-8; replication P(TDT) = 0.0018, P(case-control) = 1.8 × 10^-6) represents a risk factor that is independent of variation in HLA-DRB1. BTNL2 is a member of the immunoglobulin superfamily and has been implicated as a costimulatory molecule involved in T-cell activation on the basis of its homology to B7-1. The G --> A transition constituting rs2076530 leads to the use of a cryptic splice site located 4 bp upstream of the affected wild-type donor site. Transcripts of the risk-associated allele have a premature stop in the spliced mRNA. The resulting protein lacks the C-terminal IgC domain and transmembrane helix, thereby disrupting the membrane localization of the protein, as shown in experiments using green fluorescent protein and V5 fusion proteins.
Synthesis and Evaluation of (pyridylmethylene)tetrahydronaphthalenes/-indanes and Structurally Modified Derivatives: Potent and Selective Inhibitors of Aldosterone Synthase
Journal of Medicinal Chemistry. Mar, 2005 | Pubmed ID: 15743198
Elevated aldosterone levels are key effectors for the development and progression of congestive heart failure and myocardial fibrosis. Recently, we proposed inhibition of aldosterone synthase (CYP11B2) as an innovative strategy for the treatment of these diseases. In this study, the synthesis and biological evaluation of E- and Z-(pyridylmethylene)tetrahydronaphthalenes and -indanes (1a,b-38a) is described. The activity of the compounds was determined using human CYP11B2, and the selectivity was evaluated toward the human steroidogenic enzymes CYP11B1, CYP19, and CYP17. The biological results revealed a few rather selective inhibitors of CYP11B1, some compounds inhibiting both CYP11B1 and CYP11B2, and a large number of highly selective inhibitors of CYP11B2. The most active inhibitor was the 3-pyridyl compound 5a (IC(50) = 7 nM). The pyrimidyl-substituted derivative 28a was found to be the most selective CYP11B2 inhibitor (IC(50) = 27 nM) in this series, showing 120-fold selectivity over CYP11B1 (IC(50) = 3179 nM). Molecular modeling, i.e., examination of the electronic and steric features of selected compounds and homology modeling and docking, was used to understand the structure-activity/-selectivity relationships.
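As a rough check of the figures for compound 28a, and assuming that the selectivity factor is the ratio of the two IC(50) values (the abstract does not define it explicitly), the arithmetic is:

```latex
\[
\text{selectivity factor}
  = \frac{\mathrm{IC}_{50}(\text{CYP11B1})}{\mathrm{IC}_{50}(\text{CYP11B2})}
  = \frac{3179\ \text{nM}}{27\ \text{nM}}
  \approx 118
\]
```

Under that assumption the result is consistent with the reported value of about 120-fold.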
Synthesis and Evaluation of Imidazolylmethylenetetrahydronaphthalenes and Imidazolylmethyleneindanes: Potent Inhibitors of Aldosterone Synthase
Journal of Medicinal Chemistry. Mar, 2005 | Pubmed ID: 15771425
Elevated plasma aldosterone levels play a detrimental role in certain forms of congestive heart failure and myocardial fibrosis. We proposed aldosterone synthase (CYP11B2) as a novel target for the treatment of these diseases. In this study, the synthesis and biological evaluation of substituted E- and Z-imidazolylmethylenetetrahydronaphthalenes and E- and Z-imidazolylmethyleneindanes (compounds 1a,b-9a,b) is described. The compounds were prepared by a Wittig-like reaction. They were tested for activity using bovine CYP11B and human CYP11B2 expressed in fission yeast and V79 MZh cells. Selectivity was determined toward human CYP11B1, CYP19, and CYP17. Especially in the case of CYP11B1 (steroid 11beta-hydroxylase), selectivity is a crucial issue, since sequence homology between this enzyme and the target enzyme is very high (93%). On the basis of the X-ray structure of human CYP2C9, a protein model of CYP11B2 was developed and docking experiments with the title compounds were performed. The biological results revealed highly potent inhibitors of CYP11B2 (IC(50) = 4-93 nM). The Z-isomers usually were more active than the corresponding E-isomers. Different inhibitory profiles could be observed: rather selective inhibitors of CYP11B1, dual inhibitors of both enzymes, and rather selective inhibitors of CYP11B2. The chloro derivative 8b was found to be a highly potent CYP11B2 inhibitor (IC(50) = 4 nM) showing 5-fold selectivity over CYP11B1 (IC(50) = 20 nM). This compound could be an interesting lead for further optimization as a therapeutic agent. Like the CYP11B1-selective compounds, it could also be used as a pharmacological tool.
A New CARD15 Mutation in Blau Syndrome
European Journal of Human Genetics : EJHG. Jun, 2005 | Pubmed ID: 15812565
The caspase recruitment domain gene CARD15/NOD2, encoding a cellular receptor involved in an NF-kappaB-mediated pathway of innate immunity, was first identified as a major susceptibility gene for Crohn's disease (CD), and more recently, as responsible for Blau syndrome (BS), a rare autosomal-dominant trait characterized by arthritis, uveitis, skin rash and granulomatous inflammation. While CARD15 variants associated with CD are located within or near the C-terminal leucine-rich repeat domain and cause decreased NF-kappaB activation, BS mutations affect the central nucleotide-binding NACHT domain and result in increased NF-kappaB activation. In an Italian family with BS, we detected a novel mutation E383K, whose pathogenicity is strongly supported by cosegregation with the disease in the family and absence in controls, and by the evolutionary conservation and structural role of the affected glutamate close to the Walker B motif of the nucleotide-binding site in the NACHT domain. Interestingly, substitutions at corresponding positions in another NACHT family member cause similar autoinflammatory phenotypes.
Estimating HIV Evolutionary Pathways and the Genetic Barrier to Drug Resistance
The Journal of Infectious Diseases. Jun, 2005 | Pubmed ID: 15871130
The evolution of drug-resistant viruses challenges the management of human immunodeficiency virus (HIV) infections. Understanding this evolutionary process is important for the design of effective therapeutic strategies.
Confirmation of Human Protein Interaction Data by Human Expression Data
BMC Bioinformatics. 2005 | Pubmed ID: 15877815
With microarray technology the expression of thousands of genes can be measured simultaneously. It is well known that the expression levels of genes of interacting proteins are correlated significantly more strongly in Saccharomyces cerevisiae than those of proteins that are not interacting. The objective of this work is to investigate whether this observation extends to the human genome.
Structural and Functional Analysis of a Novel Mutation of CYP21B in a Heterozygote Carrier of 21-hydroxylase Deficiency
Human Genetics. Oct, 2005 | Pubmed ID: 16028060
Congenital adrenal hyperplasia (CAH) due to 21-hydroxylase deficiency is one of the most common autosomal recessive disorders and occurs in its non-classical form in up to 6% of hirsute women. We report on a young woman with the clinical diagnosis of non-classical CAH and a novel, heterozygous missense mutation CTG-->GTG in exon 8, codon 317, of the steroid 21-hydroxylase CYP21B and complete loss of pseudogenes. Protein sequences of closely related P450 cytochromes and a homology-based 3D model of CYP21B were used for further functional analyses. We found that the mutated residue is part of a large cluster of hydrophobic residues. This cluster has three important features: (1) it is located directly next to the binding pocket, in close vicinity of the heme-cofactor, (2) all amino acids of the cluster are directly connected to two important binding regions, and (3) the packing within the cluster is very dense. Due to the tight packing in the cluster and its direct connection to the binding pocket region, any changes induced by the mutation of residue 317 can be expected to lead to structural shifts within the binding pocket and can explain the clinically observed impairment of 21-hydroxylase activity. In conclusion, the novel mutation L317V of the steroid 21-hydroxylase gene is associated with reduced steroid 21-hydroxylase activity probably due to structural shifts within the binding pocket and a mild phenotype of steroid 21-hydroxylase deficiency. In addition, the results support previous findings in which heterozygous CYP21 mutations are associated with symptoms of hyperandrogenism in susceptible individuals.
ROCR: Visualizing Classifier Performance in R
Bioinformatics (Oxford, England). Oct, 2005 | Pubmed ID: 16096348
ROCR is a package for evaluating and visualizing the performance of scoring classifiers in the statistical language R. It features over 25 performance measures that can be freely combined to create two-dimensional performance curves. Standard methods for investigating trade-offs between specific performance measures are available within a uniform framework, including receiver operating characteristic (ROC) graphs, precision/recall plots, lift charts and cost curves. ROCR integrates tightly with R's powerful graphics capabilities, thus allowing for highly adjustable plots. Being equipped with only three commands and reasonable default values for optional parameters, ROCR combines flexibility with ease of usage. AVAILABILITY: http://rocr.bioinf.mpi-sb.mpg.de. ROCR can be used under the terms of the GNU General Public License. Running within R, it is platform-independent. CONTACT: tobias.sing@mpi-sb.mpg.de.
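To make concrete what a receiver operating characteristic (ROC) graph contains, the following Python sketch sweeps a threshold over classifier scores and records the resulting false and true positive rates. It is an independent illustration, not ROCR's code or API; the function names `roc_points` and `auc` are hypothetical, and labels are assumed to be coded as 1 (positive) and 0 (negative).

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists obtained by lowering a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Visit examples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after a whole group of tied scores has been counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))
```

Other two-dimensional performance curves, such as precision/recall plots, follow the same pattern: the threshold sweep stays the same and only the pair of measures recorded at each step changes.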
Dissection of the Inflammatory Bowel Disease Transcriptome Using Genome-wide CDNA Microarrays
PLoS Medicine. Aug, 2005 | Pubmed ID: 16107186
The differential pathophysiologic mechanisms that trigger and maintain the two forms of inflammatory bowel disease (IBD), Crohn disease (CD), and ulcerative colitis (UC) are only partially understood. cDNA microarrays can be used to decipher gene regulation events at a genome-wide level and to identify novel unknown genes that might be involved in perpetuating inflammatory disease progression.
Learning Multiple Evolutionary Pathways from Cross-sectional Data
Journal of Computational Biology : a Journal of Computational Molecular Cell Biology. Jul-Aug, 2005 | Pubmed ID: 16108705
We introduce a mixture model of trees to describe evolutionary processes that are characterized by the ordered accumulation of permanent genetic changes. The basic building block of the model is a directed weighted tree that generates a probability distribution on the set of all patterns of genetic events. We present an EM-like algorithm for learning a mixture model of K trees and show how to determine K with a maximum likelihood approach. As a case study, we consider the accumulation of mutations in the HIV-1 reverse transcriptase that are associated with drug resistance. The fitted model is statistically validated as a density estimator, and the stability of the model topology is analyzed. We obtain a generative probabilistic model for the development of drug resistance in HIV that agrees with biological knowledge. Further applications and extensions of the model are discussed.
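The way a weighted tree induces a probability distribution over patterns of genetic events can be sketched as follows. The sketch assumes the usual reading of such tree models, which the abstract does not spell out: each edge weight is the conditional probability that an event occurs given that its parent event has occurred, and an event cannot occur without its parent. The names `tree_pattern_probability` and `mixture_probability` and the root label "wt" are hypothetical.

```python
def tree_pattern_probability(parent, edge_prob, pattern):
    """Probability of observing exactly the events in `pattern` under one tree.

    parent:    dict mapping each event to its parent event ("wt" marks the root)
    edge_prob: dict mapping each event to P(event | parent event occurred)
    pattern:   set of events that were observed
    """
    prob = 1.0
    for event, par in parent.items():
        parent_present = par == "wt" or par in pattern
        if event in pattern:
            if not parent_present:
                return 0.0  # pattern is incompatible with the tree order
            prob *= edge_prob[event]
        elif parent_present:
            prob *= 1.0 - edge_prob[event]
        # If the parent is absent and the event is absent, the factor is 1.
    return prob

def mixture_probability(components, weights, pattern):
    """Weighted sum of tree probabilities; components is a list of (parent, edge_prob)."""
    return sum(w * tree_pattern_probability(p, e, pattern)
               for w, (p, e) in zip(weights, components))
```

A single tree assigns probability zero to every pattern that violates its ordering, which is one motivation for mixing K trees: patterns that are impossible under one component can still be explained by another.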
Ataxin-2 and Huntingtin Interact with Endophilin-A Complexes to Function in Plastin-associated Pathways
Human Molecular Genetics. Oct, 2005 | Pubmed ID: 16115810
Spinocerebellar ataxia type 2 is an inherited neurodegenerative disorder that is caused by an expanded trinucleotide repeat in the SCA2 gene, encoding a polyglutamine stretch in the gene product ataxin-2. Although evidence has been provided that ataxin-2 is involved in RNA metabolism, the physiological function of ataxin-2 remains unclear. Here, we demonstrate that ataxin-2 interacts with two members of the endophilin family, endophilin-A1 and endophilin-A3. To elucidate the physiological implications of these interactions, we exploited yeast as a model system and discovered that expression of ataxin-2 as well as both endophilin proteins is toxic for yeast lacking the SAC6 gene product fimbrin, a protein involved in actin filament organization and endocytotic processes. Intriguingly, expression of huntingtin, another polyglutamine protein interacting with endophilin-A3, was also toxic in Δsac6 yeast. These effects can be suppressed by simultaneous expression of one of the two human fimbrin orthologs, L- or T-plastin. Moreover, we have discovered that ataxin-2 associates with L- and T-plastin and that overexpression of ataxin-2 leads to accumulation of T-plastin in mammalian cells. Thus, our findings suggest an interplay between ataxin-2, endophilin proteins and huntingtin in plastin-associated cellular pathways.
BiQ Analyzer: Visualization and Quality Control for DNA Methylation Data from Bisulfite Sequencing
Bioinformatics (Oxford, England). Nov, 2005 | Pubmed ID: 16141249
SUMMARY: Manual processing of DNA methylation data from bisulfite sequencing is a tedious and error-prone task. Here we present an interactive software tool that provides start-to-end support for this process. In an easy-to-use manner, the tool helps the user to import the sequence files from the sequencer, to align them, to exclude or correct critical sequences, to document the experiment, to perform basic statistics and to produce publication-quality diagrams. Emphasis is put on quality control: The program automatically assesses data quality and provides warnings and suggestions for dealing with critical sequences. The BiQ Analyzer program is implemented in the Java programming language and runs on any platform for which a recent Java virtual machine is available. AVAILABILITY: The program is available without charge for non-commercial users and can be downloaded from http://biq-analyzer.bioinf.mpi-inf.mpg.de/
Computational Methods for the Design of Effective Therapies Against Drug Resistant HIV Strains
Bioinformatics (Oxford, England). Nov, 2005 | Pubmed ID: 16144807
The development of drug resistance is a major obstacle to successful treatment of HIV infection. The extraordinary replication dynamics of HIV facilitates its escape from selective pressure exerted by the human immune system and by combination drug therapy. We have developed several computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genomic data.
Automatic Generation of Complementary Descriptors with Molecular Graph Networks
Journal of Chemical Information and Modeling. Sep-Oct, 2005 | Pubmed ID: 16180893
We describe a method for the automatic generation of weakly correlated descriptors for molecular data sets. The method can be regarded as a statistical learning procedure that turns the molecular graph, representing the 2D formula of the compound, into an adaptive whole molecule composite descriptor. By translating the molecular graph structure into a dynamical system, the algorithm can compute an output value that is highly sensitive to the molecular topology. This system can be trained by gradient descent techniques, which rely on the efficient calculation of the gradient by back-propagation. We present computational experiments concerning the classification of the Developmental Therapeutics Program AIDS antiviral screen data set on which the performance of the method compares with that of approaches based on substructure comparison.
POEM: Parameter Optimization Using Ensemble Methods: Application to Target Specific Scoring Functions
Journal of Chemical Information and Modeling. Sep-Oct, 2005 | Pubmed ID: 16180906
In computational biology, processes such as docking, binding, and folding are often described by simplified, empirical models. These models are fitted to physical properties of the process by adjustable parameters. An appropriate choice of these parameters is crucial for the quality of the models. Locating the best choices for the parameters is often a difficult task, depending on the complexity of the model. We describe a new method and program, POEM (Parameter Optimization using Ensemble Methods), for this task. In POEM we combine the DOE (Design Of Experiment) procedure with ensembles of different regression methods. We apply the method to the optimization of target specific scoring functions in molecular docking. The method consists of an iterative procedure that uses alternate evaluation and prediction steps. During each cycle of optimization we fit an approximate function to a defined loss function landscape and improve the quality of this fit from cycle to cycle by constantly augmenting our data set. As test applications we fitted the FlexX and Screenscore scoring functions to the kinase and ATPase protein classes. The results are promising: Starting from random parameters we are able to locate parameter sets which show superior performance compared to the original values. The POEM approach converges quickly and the approximated loss function landscapes are smooth, thus making the approach a suitable method for optimizations on rugged landscapes.
Decomposing Protein Networks into Domain-domain Interactions
Bioinformatics (Oxford, England). Sep, 2005 | Pubmed ID: 16204107
The application of novel experimental techniques has generated large networks of protein-protein interactions. Frequently, important information on the structure and cellular function of protein-protein interactions can be gained from the domains of interacting proteins. We have designed a Cytoscape plugin that decomposes interacting proteins into their respective domains and computes a putative network of corresponding domain-domain interactions. To this end, the network graph of proteins has been extended by additional node and edge types for domain interactions, including different node and edge shapes and coloring schemes used for visualization. An additional plugin provides supplementary web links to Internet resources on domain function and structure. AVAILABILITY: Both Cytoscape plugins can be downloaded from http://www.cytoscape.org
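The decomposition step can be pictured with a small sketch. It is not the plugin's code: it assumes the simplest possible rule, in which every domain of one protein is paired with every domain of its interaction partner, and the function name and data layout are hypothetical.

```python
from itertools import product

def domain_interactions(protein_edges, protein_domains):
    """Map protein-protein edges to putative domain-domain pairs.

    protein_edges:   iterable of (protein_a, protein_b) tuples
    protein_domains: dict mapping a protein to a list of its domain names
    Returns a dict: (domain_x, domain_y) -> set of supporting protein pairs.
    """
    putative = {}
    for a, b in protein_edges:
        # Proteins without annotated domains contribute no domain pairs.
        for da, db in product(protein_domains.get(a, []), protein_domains.get(b, [])):
            pair = tuple(sorted((da, db)))  # undirected: order does not matter
            putative.setdefault(pair, set()).add(tuple(sorted((a, b))))
    return putative
```

Keeping the supporting protein pairs for each domain pair makes it possible to rank putative domain-domain interactions by how many protein-level interactions they could explain.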
Multiple-ligand-based Virtual Screening: Methods and Applications of the MTree Approach
Journal of Medicinal Chemistry. Oct, 2005 | Pubmed ID: 16220974
We present a novel approach for ligand-based virtual screening by combining query molecules into a multiple feature tree model called MTree. All molecules are described by the established feature tree descriptor, which is derived from a topological molecular graph. A new pairwise alignment algorithm leads to a consistent topological molecular alignment based on chemically reasonable matching of corresponding functional groups. These multiple feature tree models find application in ligand-based virtual screening to identify new lead structures for chemical optimization. Retrospective virtual screening with MTree models generated for angiotensin-converting enzyme and the alpha1a receptor on a large candidate database yielded enrichment factors up to 71 for the first 1% of the screened database. MTree models outperformed database searches using single feature trees in terms of hit rates and quality and additionally identified alternative molecular scaffolds not included in any of the query molecules. Furthermore, relevant molecular features, which are known to be important for affinity to the target, are identified by this new methodology.
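For readers unfamiliar with enrichment factors, the sketch below uses a common definition: the hit rate among the top-ranked fraction of the database divided by the hit rate in the whole database. Whether the paper uses exactly this definition is an assumption, and the function name is hypothetical.

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_is_active: list of 0/1 flags, best-scored compound first
    """
    n = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n * fraction))          # size of the inspected subset
    hits_top = sum(ranked_is_active[:n_top])     # actives found in that subset
    return (hits_top / n_top) / (total_actives / n)
```

With this definition, a database of 1000 compounds containing 10 actives, 5 of which rank among the first 10, gives an enrichment factor of 50 for the first 1%; the maximum attainable value in that setting is 100.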
Clinical Significance of in Vitro Replication-enhancing Mutations of the Hepatitis C Virus (HCV) Replicon in Patients with Chronic HCV Infection
The Journal of Infectious Diseases. Nov, 2005 | Pubmed ID: 16235168
Mutations in nonstructural (NS) hepatitis C virus (HCV) proteins enhance replication in HCV-1a/b replicons. The prevalence of such mutations and their clinical significance in vivo are unknown.
Patients with High-grade Gliomas Harboring Deletions of Chromosomes 9p and 10q Benefit from Temozolomide Treatment
Neoplasia (New York, N.Y.). Oct, 2005 | Pubmed ID: 16242071
Surgical cure of glioblastomas is virtually impossible and their clinical course is mainly determined by the biologic behavior of the tumor cells and their response to radiation and chemotherapy. We investigated whether response to temozolomide (TMZ) chemotherapy differs in subsets of malignant glioblastomas defined by genetic lesions. Eighty patients with newly diagnosed glioblastoma were analyzed with comparative genomic hybridization and loss of heterozygosity. All patients underwent radical resection. Fifty patients received TMZ after radiotherapy (TMZ group) and 30 patients received radiotherapy alone (RT group). The most common aberrations detected were gains of parts of chromosome 7 and losses of 10q, 9p, or 13q. The spectrum of genetic aberrations did not differ between the TMZ and RT groups. Patients treated with TMZ showed significantly better survival than patients treated with radiotherapy alone (19.5 vs 9.3 months). Genomic deletions on chromosomes 9 and 10 are typical for glioblastoma and associated with poor prognosis. However, patients with these aberrations benefited significantly from TMZ in univariate analysis. In multivariate analysis, this effect was pronounced for 9p deletion and for elderly patients with 10q deletions, respectively. This study demonstrates that molecular genetic and cytogenetic analyses potentially predict responses to chemotherapy in patients with newly diagnosed glioblastomas.
Diversity and Functional Plasticity of Eukaryotic Selenoproteins: Identification and Characterization of the SelJ Family
Proceedings of the National Academy of Sciences of the United States of America. Nov, 2005 | Pubmed ID: 16260744
Selenoproteins are a diverse group of proteins that contain selenocysteine (Sec), the 21st amino acid. In the genetic code, UGA serves as a termination signal and a Sec codon. This dual role has precluded the automatic annotation of selenoproteins. Recent advances in the computational identification of selenoprotein genes have provided a first glimpse of the size, functions, and phylogenetic diversity of eukaryotic selenoproteomes. Here, we describe the identification of a selenoprotein family named SelJ. In contrast to known selenoproteins, SelJ appears to be restricted to actinopterygian fishes and sea urchin, with Cys homologues only found in cnidarians. SelJ shows significant similarity to the jellyfish J1-crystallins and with them constitutes a distinct subfamily within the large family of ADP-ribosylation enzymes. Consistent with its potential role as a structural crystallin, SelJ has preferential and homogeneous expression in the eye lens in early stages of zebrafish development. A structural role for SelJ would be in contrast to the majority of known selenoenzymes. The unusually highly restricted phylogenetic distribution of SelJ, its specialization, and the comparative analysis of eukaryotic selenoproteomes reveal the diversity and functional plasticity of selenoproteins and point to a mosaic evolution of the use of Sec in proteins.
Local Protein Structure Prediction Using Discriminative Models
BMC Bioinformatics. 2006 | Pubmed ID: 16405736
In recent years protein structure prediction methods using local structure information have shown promising improvements. The quality of new fold predictions has risen significantly and in fold recognition incorporation of local structure predictions led to improvements in the accuracy of results. We developed a local structure prediction method to be integrated into either fold recognition or new fold prediction methods. For each local sequence window of a protein sequence the method predicts probability estimates for the sequence to attain particular local structures from a set of predefined local structure candidates. The first step is to define a set of local structure representatives based on clustering recurrent local structures. In the second step a discriminative model is trained to predict the local structure representative given local sequence information.
NOXclass: Prediction of Protein-protein Interaction Types
BMC Bioinformatics. 2006 | Pubmed ID: 16423290
Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available.
A Fully Computational Model for Predicting Percutaneous Drug Absorption
Journal of Chemical Information and Modeling. Jan-Feb, 2006 | Pubmed ID: 16426076
The prediction of transdermal absorption for arbitrary penetrant structures has several important applications in the pharmaceutical industry. We propose a new data-driven, predictive model for skin permeability coefficients k(p) based on an ensemble model using k-nearest-neighbor models and ridge regression. The model was trained and validated with a newly assembled data set containing experimental data and structures for 110 compounds. On the basis of three purely computational descriptors (molecular weight, calculated octanol/water partition coefficient, and solvation free energy), we have developed a model allowing for the reliable, purely computational prediction of skin permeability coefficients. The model is both accurate and robust, as we showed in an extensive validation (correlation coefficient for leave-one-out cross validation: Q = 0.948, mean standard error: 0.2 for log k(p)).
Recco: Recombination Analysis Using Cost Optimization
Bioinformatics (Oxford, England). May, 2006 | Pubmed ID: 16488909
MOTIVATION: Recombination plays an important role in the evolution of many pathogens, such as HIV or malaria. Despite substantial prior work, there is still a pressing need for efficient and effective methods of detecting recombination and analyzing recombinant sequences. RESULTS: We introduce Recco, a novel fast method that, given a multiple sequence alignment, scores the cost of obtaining one of the sequences from the others by mutation and recombination. The algorithm comes with an illustrative visualization tool for locating recombination breakpoints. We analyze the sequence alignment with respect to all choices of the parameter alpha weighting recombination cost against mutation cost. The analysis of the resulting cost curve yields additional information as to which sequence might be recombinant. On random genealogies Recco is comparable in its power of detecting recombination with the algorithm Geneconv (Sawyer, 1989). For specific relevant recombination scenarios Recco significantly outperforms Geneconv.
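The cost idea in this abstract can be illustrated with a small dynamic program. This is only a sketch under simplifying assumptions, not the published Recco algorithm: the function names are hypothetical, each mismatch is charged (1 - alpha), each switch of source sequence is charged alpha, and gaps are treated as ordinary characters.

```python
def recombination_cost(query, references, alpha):
    """Minimum cost of explaining `query` as a mosaic of `references`.

    Assumed cost model (illustrative only): a mismatch against the current
    source sequence costs (1 - alpha); switching source costs alpha.
    """
    if not references:
        raise ValueError("need at least one reference sequence")
    if any(len(ref) != len(query) for ref in references):
        raise ValueError("all sequences must come from the same alignment")
    if not query:
        return 0.0
    mutation, switch = 1.0 - alpha, alpha
    # cost[k]: cheapest explanation of the columns seen so far that ends on reference k
    cost = [mutation * (ref[0] != query[0]) for ref in references]
    for i in range(1, len(query)):
        cheapest = min(cost)
        cost = [
            # either stay on reference k, or jump to it from the cheapest source
            min(cost[k], cheapest + switch) + mutation * (ref[i] != query[i])
            for k, ref in enumerate(references)
        ]
    return min(cost)


def cost_curve(query, references, alphas):
    """Evaluate the minimum cost for several values of alpha."""
    return [(a, recombination_cost(query, references, a)) for a in alphas]
```

With a low alpha the cheapest explanation of `"AAAATTTT"` from `"AAAAAAAA"` and `"TTTTTTTT"` uses one switch; with a high alpha it uses four mismatches instead, which is the kind of trade-off a cost curve over alpha makes visible.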
Mutations in the NB-ARC Domain of I-2 That Impair ATP Hydrolysis Cause Autoactivation
Plant Physiology. Apr, 2006 | Pubmed ID: 16489136
Resistance (R) proteins in plants confer specificity to the innate immune system. Most R proteins have a centrally located NB-ARC (nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4) domain. For two tomato (Lycopersicon esculentum) R proteins, I-2 and Mi-1, we have previously shown that this domain acts as an ATPase module that can hydrolyze ATP in vitro. To investigate the role of nucleotide binding and hydrolysis for the function of I-2 in planta, specific mutations were introduced in conserved motifs of the NB-ARC domain. Two mutations resulted in autoactivating proteins that induce a pathogen-independent hypersensitive response upon expression in planta. These mutant forms of I-2 were found to be impaired in ATP hydrolysis, but not in ATP binding, suggesting that the ATP- rather than the ADP-bound state of I-2 is the active form that triggers defense signaling. In addition, upon ADP binding, the protein displayed an increased affinity for ADP suggestive of a change of conformation. Based on these data, we propose that the NB-ARC domain of I-2, and likely of related R proteins, functions as a molecular switch whose state (on/off) depends on the nucleotide bound (ATP/ADP).
CpG Island Methylation in Human Lymphocytes is Highly Correlated with DNA Sequence, Repeats, and Predicted DNA Structure
PLoS Genetics. Mar, 2006 | Pubmed ID: 16520826
CpG island methylation plays an important role in epigenetic gene control during mammalian development and is frequently altered in disease situations such as cancer. The majority of CpG islands are normally unmethylated, but a sizeable fraction is prone to become methylated in various cell types and pathological situations. The goal of this study is to show that a computational epigenetics approach can discriminate CpG islands that are prone to methylation from those that remain unmethylated. We develop a bioinformatics scoring and prediction method on the basis of a set of 1,184 DNA attributes, which refer to sequence, repeats, predicted structure, CpG islands, genes, predicted binding sites, conservation, and single nucleotide polymorphisms. These attributes are scored on 132 CpG islands across the entire human Chromosome 21, whose methylation status was previously established for normal human lymphocytes. Our results show that three groups of DNA attributes, namely certain sequence patterns, specific DNA repeats, and a particular DNA structure, are each highly correlated with CpG island methylation (correlation coefficients of 0.64, 0.66, and 0.49, respectively). We predicted, and subsequently examined experimentally, 12 CpG islands from human Chromosome 21 with unknown methylation patterns and found more than 90% of our predictions to be correct. In addition, we applied our prediction method to analyzing Human Epigenome Project methylation data on human Chromosome 6 and again observed high prediction accuracy. In summary, our results suggest that DNA composition of CpG islands (sequence, repeats, and structure) plays a significant role in predisposing CpG islands for DNA methylation. This finding may have a strong impact on our understanding of changes in CpG island methylation in development and disease.
Fully Automated Flexible Docking of Ligands into Flexible Synthetic Receptors Using Forward and Inverse Docking Strategies
Journal of Chemical Information and Modeling. Mar-Apr, 2006 | Pubmed ID: 16563022
The prediction of the structure of host-guest complexes is one of the most challenging problems in supramolecular chemistry. Usual procedures for docking of ligands into receptors do not take full conformational freedom of the host molecule into account. We describe and apply a new docking approach which performs a conformational sampling of the host and then sequentially docks the ligand into all receptor conformers using the incremental construction technique of the FlexX software platform. The applicability of this approach is validated on a set of host-guest complexes with known crystal structure. Moreover, we demonstrate that due to the interchangeability of the roles of host and guest, the docking process can be inverted. In this inverse docking mode, the receptor molecule is docked around its ligand. For all investigated test cases, the predicted structures are in good agreement with the experiment for both normal (forward) and inverse docking. Since the ligand is often smaller than the receptor and, thus, its conformational space is more restricted, the inverse docking approach leads in most cases to considerable speed-up. By having the choice between two alternative docking directions, the application range of the method is significantly extended. Finally, an important result of this study is the suitability of the simple energy function used here for structure prediction of complexes in organic media.
Computational Recognition of Potassium Channel Sequences
Bioinformatics (Oxford, England). Jul, 2006 | Pubmed ID: 16595554
MOTIVATION: Potassium channels are mainly known for their role in regulating and maintaining the membrane potential. Since this is one of the key mechanisms of signal transduction, malfunction of these potassium channels leads to a wide variety of severe diseases. Thus potassium channels are priority targets of research for new drugs, despite the fact that this protein family is highly variable and closely related to other channels, which makes it very difficult to identify new types of potassium channel sequences. RESULTS: Here we present a new method for identifying potassium channel sequences (PSM, Property Signature Method), which, in contrast to the known methods for protein classification, is directly based on physicochemical properties of amino acids rather than on the amino acids themselves. A signature for the pore region including the selectivity filter has been created, representing the most common physicochemical properties of known potassium channels. This string enables genome-wide screening for sequences with similar features despite a very low degree of amino acid similarity within a protein family.
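The idea of matching on physicochemical properties rather than residues can be sketched as follows. The property alphabet and the signature string below are hypothetical stand-ins chosen for illustration; they are not the classes or the signature used by PSM.

```python
# Hypothetical property alphabet (an assumption, not the PSM definition).
PROPERTY_CLASS = {
    **dict.fromkeys("AVLIMC", "h"),  # aliphatic / hydrophobic
    **dict.fromkeys("FWY", "a"),     # aromatic
    **dict.fromkeys("STNQ", "p"),    # polar, uncharged
    **dict.fromkeys("KRH", "+"),     # basic
    **dict.fromkeys("DE", "-"),      # acidic
    **dict.fromkeys("GP", "s"),      # small / special backbone
}


def to_property_string(sequence):
    """Translate an amino acid sequence into property-class symbols."""
    return "".join(PROPERTY_CLASS.get(res, "?") for res in sequence.upper())


def find_signature(sequence, signature):
    """Return all 0-based start positions where the property signature occurs."""
    props = to_property_string(sequence)
    hits = []
    start = props.find(signature)
    while start != -1:
        hits.append(start)
        start = props.find(signature, start + 1)
    return hits
```

Under this toy alphabet the selectivity-filter motif `TVGYG` and the variant `TIGFG` both translate to `phsas`, so a single signature matches both even though the residues differ.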
Improved Scoring of Functional Groups from Gene Expression Data by Decorrelating GO Graph Structure
Bioinformatics (Oxford, England). Jul, 2006 | Pubmed ID: 16606683
MOTIVATION: The result of a typical microarray experiment is a long list of genes with corresponding expression measurements. This list is only the starting point for a meaningful biological interpretation. Modern methods identify relevant biological processes or functions from gene expression data by scoring the statistical significance of predefined functional gene groups, e.g. based on Gene Ontology (GO). We develop methods that increase the explanatory power of this approach by integrating knowledge about relationships between the GO terms into the calculation of the statistical significance. RESULTS: We present two novel algorithms that improve GO group scoring using the underlying GO graph topology. The algorithms are evaluated on real and simulated gene expression data. We show that both methods eliminate local dependencies between GO terms and point to relevant areas in the GO graph that remain undetected with state-of-the-art algorithms for scoring functional terms. A simulation study demonstrates that the new methods exhibit a higher level of detecting relevant biological terms than competing methods.
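For orientation, the conventional term-by-term scoring that such methods build on is often a one-sided hypergeometric (Fisher) test for each GO term. The sketch below shows only that baseline, with hypothetical parameter names; it does not implement the two graph-aware algorithms described in the abstract.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_genes:    genes on the array (the gene universe)
    n_in_term:  genes annotated to the GO term
    n_selected: genes called significant in the experiment
    n_overlap:  significant genes that are annotated to the term
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_genes - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Because child and parent terms share genes, p-values computed this way for neighbouring terms are dependent, which is the local dependency the abstract refers to.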
A New Measure for Functional Similarity of Gene Products Based on Gene Ontology
BMC Bioinformatics. 2006 | Pubmed ID: 16776819
Gene Ontology (GO) is a standard vocabulary of functional terms and allows for coherent annotation of gene products. These annotations provide a basis for new methods that compare gene products regarding their molecular function and biological role.
Involvement of Novel Human Immunodeficiency Virus Type 1 Reverse Transcriptase Mutations in the Regulation of Resistance to Nucleoside Inhibitors
Journal of Virology. Jul, 2006 | Pubmed ID: 16809324
We characterized 16 additional mutations in human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) whose role in drug resistance is still unknown by analyzing 1,906 plasma-derived HIV-1 subtype B pol sequences from 551 drug-naïve patients and 1,355 nucleoside RT inhibitor (NRTI)-treated patients. Twelve mutations positively associated with NRTI treatment strongly correlated both in pairs and in clusters with known NRTI resistance mutations on divergent evolutionary pathways. In particular, T39A, K43E/Q, K122E, E203K, and H208Y clustered with the nucleoside analogue mutation 1 cluster (NAM1; M41L+L210W+T215Y). Their copresence in this cluster was associated with an increase in thymidine analogue resistance. Moreover, treatment failure in the presence of K43E, K122E, or H208Y was significantly associated with higher viremia and lower CD4 cell count. D218E, by comparison, clustered with the NAM2 pathway (D67N+K70R+K219Q+T215F), and its presence in this cluster determined an increase in zidovudine resistance. In contrast, three mutations (V35I, I50V, and R83K) negatively associated with NRTI treatment showed negative correlations with NRTI resistance mutations and were associated with increased susceptibility to specific NRTIs. In particular, I50V negatively correlated with the lamivudine-selected mutation M184V and was associated with a decrease in M184V/lamivudine resistance, whereas R83K negatively correlated with both NAM1 and NAM2 clusters and was associated with a decrease in thymidine analogue resistance. Finally, the association pattern of the F214L polymorphism revealed its propensity for the NAM2 pathway and its strong negative association with the NAM1 pathway. Our study provides evidence of novel RT mutational patterns that positively and/or negatively regulate NRTI resistance and strongly suggests that other mutations beyond those currently known to confer resistance should be considered for improved prediction of clinical response to antiretroviral drugs.
Amino Acid Variations in Hepatitis C Virus P7 and Sensitivity to Antiviral Combination Therapy with Amantadine in Chronic Hepatitis C
Antiviral Therapy. 2006 | Pubmed ID: 16856625
Formation of transmembrane ion channels by hepatitis C virus (HCV) p7 and abrogation of channel function by amantadine was demonstrated in vitro. The relevance of HCV p7 amino acid (aa) variations for response to antiviral therapy with amantadine is unknown.
Flexible Docking of Ligands into Synthetic Receptors Using a Two-sided Incremental Construction Algorithm
Journal of Chemical Information and Modeling. Jul-Aug, 2006 | Pubmed ID: 16859301
We present a new algorithm for the fast and reliable structure prediction of synthetic receptor-ligand complexes. Our method is based on the protein-ligand docking program FlexX and extends our recently introduced docking technique for synthetic receptors, which has been implemented in the program FlexR. To handle the flexibility of the relevant molecules, we apply a novel docking strategy that uses an adaptive two-sided incremental construction algorithm which incorporates the structural flexibility of both the ligand and synthetic receptor. We follow an adaptive strategy, in which one molecule is expanded by attaching its next fragment in all possible torsion angles, whereas the other (partially assembled) molecule serves as a rigid binding partner. Then the roles of the molecules are exchanged. Geometric filters are used to discard partial conformations that cannot realize a targeted interaction pattern derived in a graph-based precomputation phase. The process is repeated until the entire complex is built up. Our algorithm produces promising results on a test data set comprising 10 complexes of synthetic receptors and ligands. The method generated near-native solutions compared to crystal structures in all but one case. It is able to generate solutions within a couple of minutes and has the potential of being used as a virtual screening tool for searching for suitable guest molecules for a given synthetic receptor in large databases of guests and vice versa.
Improving the Quality of Protein Structure Models by Selecting from Alignment Alternatives
BMC Bioinformatics. 2006 | Pubmed ID: 16872519
In the area of protein structure prediction, recently a lot of effort has gone into the development of Model Quality Assessment Programs (MQAPs). MQAPs distinguish high quality protein structure models from inferior models. Here, we propose a new method to use an MQAP to improve the quality of models. With a given target sequence and template structure, we construct a number of different alignments and corresponding models for the sequence. The quality of these models is scored with an MQAP and used to choose the most promising model. An SVM-based selection scheme is suggested for combining MQAP partial potentials, in order to optimize for improved model selection.
DynaPred: a Structure and Sequence Based Method for the Prediction of MHC Class I Binding Peptide Sequences and Conformations
Bioinformatics (Oxford, England). Jul, 2006 | Pubmed ID: 16873467
MOTIVATION: The binding of endogenous antigenic peptides to MHC class I molecules is an important step during the immunologic response of a host against a pathogen. Thus, various sequence- and structure-based prediction methods have been proposed for this purpose. The sequence-based methods are computationally efficient, but are hampered by the need for sufficient experimental data and do not provide a structural interpretation of their results. The structural methods are data-independent, but are quite time-consuming and thus not suited for screening of whole genomes. Here, we present a new method, which performs sequence-based prediction by incorporating information obtained from molecular modeling. This allows us to screen large databases and to provide structural information on the results. RESULTS: We developed an SVM-trained, quantitative matrix-based method for the prediction of MHC class I binding peptides, in which the features of the scoring matrix are energy terms retrieved from molecular dynamics simulations. At the same time we used the equilibrated structures obtained from the same simulations in a simple and efficient docking procedure. Our method consists of two steps: First, we predict potential binders from sequence data alone and second, we construct protein-peptide complexes for the predicted binders. So far, we tested our approach on the HLA-A0201 allele. We constructed two prediction models, using local, position-dependent (DynaPred(POS)) and global, position-independent (DynaPred) features. The former model outperformed the two sequence-based methods used in our evaluation; the latter shows a much higher generalizability towards other alleles than the position-dependent models. The constructed peptide structures can be refined within seconds to structures with an average backbone RMSD of 1.53 Å from the corresponding experimental structures.
Bioinformatics-assisted Anti-HIV Therapy
Nature Reviews. Microbiology. Oct, 2006 | Pubmed ID: 16980939
Highly active antiretroviral therapy (HAART), in which three or more drugs are given in combination, has substantially improved the clinical management of HIV-1 infection. Still, the emergence of drug-resistant variants eventually leads to therapy failure in most patients. In such a scenario, the high diversity of resistance-associated mutational patterns complicates the choice of an optimal follow-up regimen. To support physicians in this task, a range of bioinformatics tools for predicting drug resistance or response to combination therapy from the viral genotype have been developed. With several free and commercial software services available, computational advice is rapidly gaining acceptance as an important element of rational decision-making in the treatment of HIV infection.
Stable Coreceptor Usage of HIV in Patients with Ongoing Treatment Failure on HAART
Journal of Clinical Virology : the Official Publication of the Pan American Society for Clinical Virology. Dec, 2006 | Pubmed ID: 17005445
Disease progression in HIV infection has been associated with switch of viral coreceptor usage from CCR5 to CXCR4.
Model Selection for Mixtures of Mutagenetic Trees
Statistical Applications in Genetics and Molecular Biology. 2006 | Pubmed ID: 17049028
The evolution of drug resistance in HIV is characterized by the accumulation of resistance-associated mutations in the HIV genome. Mutagenetic trees, a family of restricted Bayesian tree models, have been applied to infer the order and rate of occurrence of these mutations. Understanding and predicting this evolutionary process is an important prerequisite for the rational design of antiretroviral therapies. In practice, mixture models of K mutagenetic trees provide more flexibility and are often more appropriate for modelling observed mutational patterns. Here, we investigate the model selection problem for K-mutagenetic trees mixture models. We evaluate several classical model selection criteria including cross-validation, the Bayesian Information Criterion (BIC), and the Akaike Information Criterion. We also use the empirical Bayes method by constructing a prior probability distribution for the parameters of a mutagenetic trees mixture model and deriving the posterior probability of the model. In addition to the model dimension, we consider the redundancy of a mixture model, which is measured by comparing the topologies of trees within a mixture model. Based on the redundancy, we propose a new model selection criterion, which is a modification of the BIC. Experimental results on simulated and on real HIV data show that the classical criteria tend to select models with far too many tree components. Only cross-validation and the modified BIC recover the correct number of trees and the tree topologies most of the time. At the same optimal performance, the runtime of the new BIC modification is about one order of magnitude lower. Thus, this model selection criterion can also be used for large data sets for which cross-validation becomes computationally infeasible.
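As a reminder of what the classical criterion computes, the standard BIC is -2 log L + k log n, and the candidate with the lowest value is selected. The sketch below covers only this standard form; the redundancy-based modification proposed in the paper is not reproduced, and the function names and example numbers are made up for illustration.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Standard Bayesian Information Criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_by_bic(candidates, n_samples):
    """Pick the number of components K with the smallest BIC.

    candidates maps K -> (maximized log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_samples))


# Hypothetical fits of mixtures with K = 1, 2, 3 trees to 200 samples.
fits = {1: (-520.0, 10), 2: (-480.0, 21), 3: (-478.0, 32)}
best_k = select_by_bic(fits, n_samples=200)
```

Here the third component improves the likelihood only slightly, so the penalty term outweighs the gain and K = 2 is chosen.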
Mutations in the MutSalpha Interaction Interface of MLH1 Can Abolish DNA Mismatch Repair
Nucleic Acids Research. 2006 | Pubmed ID: 17135187
MutLalpha, a heterodimer of MLH1 and PMS2, plays a central role in human DNA mismatch repair. It interacts ATP-dependently with the mismatch detector MutSalpha and assembles and controls further repair enzymes. We tested if the interaction of MutLalpha with DNA-bound MutSalpha is impaired by cancer-associated mutations in MLH1, and identified one mutation (Ala128Pro) which abolished interaction as well as mismatch repair activity. Further examinations revealed three more residues whose mutation interfered with interaction. Homology modelling of MLH1 showed that all residues clustered in a small accessible surface patch, suggesting that the major interaction interface of MutLalpha for MutSalpha is located on the edge of an extensive beta-sheet that backs the MLH1 ATP binding pocket. Bioinformatic analysis confirmed that this patch corresponds to a conserved potential protein-protein interaction interface which is present in both human MLH1 and its E. coli homologue MutL. MutL could be site-specifically crosslinked to MutS from this patch, confirming that the bacterial MutL-MutS complex is established by the corresponding interface in MutL. This is the first study that identifies the conserved major MutLalpha-MutSalpha interaction interface in MLH1 and demonstrates that mutations in this interface can affect interaction and mismatch repair, and thereby can also contribute to cancer development.
A Genome-wide Association Scan of Nonsynonymous SNPs Identifies a Susceptibility Variant for Crohn Disease in ATG16L1
Nature Genetics. Feb, 2007 | Pubmed ID: 17200669
We performed a genome-wide association study of 19,779 nonsynonymous SNPs in 735 individuals with Crohn disease and 368 controls. A total of 7,159 of these SNPs were informative. We followed up on all 72 SNPs with P
Factor Interaction Analysis for Chromosome 8 and DNA Methylation Alterations Highlights Innate Immune Response Suppression and Cytoskeletal Changes in Prostate Cancer
Molecular Cancer. 2007 | Pubmed ID: 17280610
Alterations of chromosome 8 and hypomethylation of LINE-1 retrotransposons are common alterations in advanced prostate carcinoma. In a former study including many metastatic cases, they strongly correlated with each other. To elucidate a possible interaction between the two alterations, we investigated their relationship in less advanced prostate cancers.
GOTax: Investigating Biological Processes and Biochemical Activities Along the Taxonomic Tree
Genome Biology. 2007 | Pubmed ID: 17346342
We describe GOTax, a comparative genomics platform that integrates protein annotation with protein family classification and taxonomy. User-defined sets of proteins, protein families, annotation terms or taxonomic groups can be selected and compared, allowing for the analysis of distribution of biological processes and molecular activities over different taxonomic groups. In particular, a measure of functional similarity is available for comparing proteins and protein families, establishing functional relationships independent of evolution.
The Implications of Alternative Splicing in the ENCODE Protein Complement
Proceedings of the National Academy of Sciences of the United States of America. Mar, 2007 | Pubmed ID: 17372197
Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.
Structural Descriptors of Gp120 V3 Loop for the Prediction of HIV-1 Coreceptor Usage
PLoS Computational Biology. Mar, 2007 | Pubmed ID: 17397254
HIV-1 cell entry commonly uses, in addition to CD4, one of the chemokine receptors CCR5 or CXCR4 as coreceptor. Knowledge of coreceptor usage is critical for monitoring disease progression as well as for supporting therapy with the novel drug class of coreceptor antagonists. Predictive methods for inferring coreceptor usage based on the third hypervariable (V3) loop region of the viral gene coding for the envelope protein gp120 can provide us with these monitoring facilities while avoiding expensive phenotypic tests. All simple heuristics (such as the 11/25 rule) as well as statistical learning methods proposed to date predict coreceptor usage based on sequence features of the V3 loop exclusively. Here, we show, based on a recently resolved structure of gp120 with an untruncated V3 loop, that using structural information on the V3 loop in combination with sequence features of V3 variants improves prediction of coreceptor usage. In particular, we propose a distance-based descriptor of the spatial arrangement of physicochemical properties that increases discriminative performance. For a fixed specificity of 0.95, a sensitivity of 0.77 was achieved, improving further to 0.80 when combined with a sequence-based representation using amino acid indicators. This compares favorably with the sensitivities of 0.62 for the traditional 11/25 rule and 0.73 for a prediction based on sequence information as input to a support vector machine and constitutes a statistically significant improvement. A detailed analysis and interpretation of structural features important for classification shows the relevance of several specific hydrogen-bond donor sites and aliphatic side chains to coreceptor specificity towards CCR5 or CXCR4. Furthermore, an analysis of side chain orientation of the specificity-determining residues suggests a major role of one side of the V3 loop in the selection of the coreceptor. The proposed method constitutes the first approach to an improved prediction of coreceptor usage based on an original integration of structural bioinformatics methods with statistical learning.
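The 11/25 rule used as a baseline above is commonly stated as: predict CXCR4 usage when V3 position 11 or 25 carries a positively charged residue (arginine or lysine). The sketch below assumes an ungapped V3 sequence in the usual 35-residue numbering; the function names are hypothetical, and the two example sequences are illustrative inputs, not data from the study.

```python
BASIC_RESIDUES = {"R", "K"}


def predicts_cxcr4_11_25(v3):
    """Apply the 11/25 rule to an ungapped V3 loop sequence (1-based positions)."""
    if len(v3) < 25:
        raise ValueError("V3 sequence too short for the 11/25 rule")
    return v3[10] in BASIC_RESIDUES or v3[24] in BASIC_RESIDUES


def sensitivity_specificity(predicted, actual):
    """Sensitivity and specificity for boolean CXCR4 calls.

    Assumes both classes occur at least once in `actual`.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative sequences: serine vs. arginine at position 11.
example_r5 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
example_x4 = "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"
```

Comparing methods at a fixed specificity, as the abstract does, means choosing each classifier's threshold so that specificity equals the target and then reporting the resulting sensitivity; the rule itself has no threshold and yields a single operating point.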
Functional Evaluation of Domain-domain Interactions and Human Protein Interaction Networks
Bioinformatics (Oxford, England). Apr, 2007 | Pubmed ID: 17456608
Large amounts of protein and domain interaction data are being produced by experimental high-throughput techniques and computational approaches. To gain insight into the value of the provided data, we used our new similarity measure based on the Gene Ontology (GO) to evaluate the molecular functions and biological processes of interacting proteins or domains. The applied measure particularly addresses the frequent annotation of proteins or domains with multiple GO terms.
Improved Prediction of Response to Antiretroviral Combination Therapy Using the Genetic Barrier to Drug Resistance
Antiviral Therapy. 2007 | Pubmed ID: 17503659
The outcome of antiretroviral combination therapy depends on many factors involving host, virus, and drugs. We investigate prediction of treatment response from the applied drug combination and the genetic constellation of the virus population at baseline. The virus's evolutionary potential for escaping from drug pressure is explored as an additional predictor.
Structural and Functional Comparison of the Non-structural Protein 4B in Flaviviridae
Journal of Molecular Graphics & Modelling. Sep, 2007 | Pubmed ID: 17507273
Flaviviridae are evolutionarily related viruses, comprising the hepatitis C virus (HCV), with the non-structural protein 4B (NS4B) as one of the least characterized proteins. NS4B is located in the endoplasmic reticulum membrane and is assumed to be a multifunctional protein. However, detailed structure information is missing. The hydrophobic nature of NS4B is a major difficulty for many experimental techniques. We applied bioinformatics methods to analyse structural and functional properties of NS4B in different viruses. We distinguish a central non-globular membrane portion with four to five transmembrane regions from an N- and C-terminal part with non-transmembrane helical elements. We demonstrate high similarity in sequence and structure for the C-terminal part within the Flaviviridae family. A palmitoylation site contained in the C-terminal part of HCV is equally conserved in GB virus B. Furthermore, we identify and characterize an N-terminal basic leucine zipper (bZIP) motif in HCV, which is suggestive of a functionally important interaction site. In addition, we model the interaction of the bZIP region with the recently identified interaction partner CREB-RP/ATF6beta, a human activating transcription factor involved in ER-stress. In conclusion, the versatile structure, together with functional sites and motifs, possibly enables NS4B to adopt a role as protein hub in the membranous web interaction network of virus and host proteins. Important structural and functional properties of NS4B are predicted with implications for ER-stress response, altered gene expression and replication efficacy.
Expression Pattern Analysis of Transcribed HERV Sequences is Complicated by Ex Vivo Recombination
Retrovirology. 2007 | Pubmed ID: 17550625
The human genome comprises numerous human endogenous retroviruses (HERVs) that formed millions of years ago in ancestral species. A number of loci of the HERV-K(HML-2) family are evolutionarily much younger. A recent study suggested an infectious HERV-K(HML-2) variant in humans and other primates. Isolating such a variant from human individuals would be a significant finding for human biology.
Application of Oncogenetic Trees Mixtures As a Biostatistical Model of the Clonal Cytogenetic Evolution of Meningiomas
International Journal of Cancer. Journal International Du Cancer. Oct, 2007 | Pubmed ID: 17557299
Meningiomas are mostly benign tumors that originate from the coverings of brain and spinal cord. Typically, they reveal a normal karyotype or monosomy for chromosome 22. Rare clinical progression of meningiomas is associated with a nonrandom pattern of secondary losses of other autosomes. Deletion of the short arm of one chromosome 1 appears to be a decisive step for anaplastic growth in meningiomas. We calculated an oncogenetic tree model that estimates the most likely cytogenetic pathways of 661 meningioma patients in terms of accumulation of somatic chromosome changes in tumor cells. The genetic progression score (GPS) estimates the genetic status of a tumor as progression in the corresponding tumor cells along this model. Large GPS values are highly correlated with early recurrence of meningiomas (p < 10^-4). This correlation holds even if patients are stratified by WHO grade. We show that tumor location also has an impact on genetic progression. Clinical relevance of the GPS is thus demonstrated with respect to origin, WHO grade and recurrence of the tumor. As a quantitative measure the GPS allows a more precise assessment of the prognosis of meningiomas than categorical cytogenetic markers based on single chromosomal aberrations.
CpG Island Mapping by Epigenome Prediction
PLoS Computational Biology. Jun, 2007 | Pubmed ID: 17559301
CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur significant disadvantages: (1) reliance on arbitrary threshold parameters that bear little biological justification, (2) failure to account for widespread heterogeneity among CpG islands, and (3) apparent lack of specificity when applied to the human genome. This study is driven by the idea that a quantitative score of "CpG island strength" that incorporates epigenetic and functional aspects can help resolve these issues. We construct an epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility. By training support vector machines on epigenetic data for CpG islands on human Chromosomes 21 and 22, we identify informative DNA attributes that correlate with open versus compact chromatin structures. These DNA attributes are used to predict the epigenetic states of all CpG islands genome-wide. Combining predictions for multiple epigenetic features, we estimate the inherent CpG island strength for each CpG island in the human genome, i.e., its inherent tendency to exhibit an open and transcriptionally competent chromatin structure. We extensively validate our results on independent datasets, showing that the CpG island strength predictions are applicable and informative across different tissues and cell types, and we derive improved maps of predicted "bona fide" CpG islands. 
The mapping of CpG islands by epigenome prediction is conceptually superior to identifying CpG islands by widely used sequence criteria since it links CpG island detection to their characteristic epigenetic and functional states. And it is superior to purely experimental epigenome mapping for CpG island detection since it abstracts from specific properties that are limited to a single cell type or tissue. In addition, using computational epigenetics methods we could identify high correlation between the epigenome and characteristics of the DNA sequence, a finding which emphasizes the need for a better understanding of the mechanistic links between genome and epigenome.
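The "simple DNA sequence criteria" that this abstract argues against can be illustrated with a minimal Python sketch. The thresholds used here (length of at least 200 bp, GC content of at least 0.5, observed/expected CpG ratio of at least 0.6) are the commonly cited Gardiner-Garden and Frommer values and are an assumption; the abstract does not say which criteria were evaluated, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = c * g / n   # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed sequence thresholds to a candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

The three default arguments are exactly the kind of fixed threshold parameters the abstract describes as arbitrary; a quantitative "CpG island strength" score replaces this yes/no decision with a graded one.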
IRECS: a New Algorithm for the Selection of Most Probable Ensembles of Side-chain Conformations in Protein Models
Protein Science : a Publication of the Protein Society. Jul, 2007 | Pubmed ID: 17567749
We introduce a new algorithm, IRECS (Iterative REduction of Conformational Space), for identifying ensembles of most probable side-chain conformations for homology modeling. On the basis of a given rotamer library, IRECS ranks all side-chain rotamers of a protein according to the probability with which each side chain adopts the respective rotamer conformation. This ranking enables the user to select small rotamer sets that are most likely to contain a near-native rotamer for each side chain. IRECS can therefore act as a fast heuristic alternative to the Dead-End-Elimination algorithm (DEE). In contrast to DEE, IRECS allows for the selection of rotamer subsets of arbitrary size, thus being able to define structure ensembles for a protein. We show that the selection of more than one rotamer per side chain is generally meaningful, since the selected rotamers represent the conformational space of flexible side chains. A knowledge-based statistical potential ROTA was constructed for the IRECS algorithm. The potential was optimized to discriminate between side-chain conformations of native and rotameric decoys of protein structures. By restricting the number of rotamers per side chain to one, IRECS can optimize side chains for a single conformation model. The average accuracy of IRECS for the chi1 and chi1+2 dihedral angles amounts to 84.7% and 71.6%, respectively, using a 40-degree cutoff. When we compared IRECS with SCWRL and SCAP, the performance of IRECS was comparable to that of both methods. IRECS and the ROTA potential are available for download from http://irecs.bioinf.mpi-inf.mpg.de.
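The accuracy figures quoted above count a side chain as correct when its predicted dihedral angles fall within a cutoff of the native ones. A minimal Python sketch of such a measure follows; it is not the IRECS evaluation code. The function names are hypothetical, the inclusive cutoff is an assumption, and the sketch ignores the symmetry of residues whose terminal chi angle is equivalent after a 180-degree flip.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, native, cutoff=40.0):
    """Fraction of residues whose predicted chi angles are all within the cutoff.

    predicted, native: lists with one tuple of chi angles (degrees) per residue.
    Pass 1-tuples for chi1 accuracy and 2-tuples for chi1+2 accuracy.
    """
    if not predicted:
        return 0.0
    correct = 0
    for pred_angles, native_angles in zip(predicted, native):
        if all(angle_diff(p, n) <= cutoff for p, n in zip(pred_angles, native_angles)):
            correct += 1
    return correct / len(predicted)
```

Wrapping the difference into the range 0 to 180 matters because dihedral angles are periodic: 170 and -170 degrees are only 20 degrees apart.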
Improved Cyclodextrin-based Receptors for Camptothecin by Inverse Virtual Screening
Chemistry (Weinheim an Der Bergstrasse, Germany). 2007 | Pubmed ID: 17610225
We report the computer-aided optimization of a synthetic receptor for a given guest molecule, based on inverse virtual screening of receptor libraries. As an example, a virtual set of beta-cyclodextrin (beta-CD) derivatives was generated as receptor candidates for the anticancer drug camptothecin. We applied the two docking tools AutoDock and GlamDock to generate camptothecin complexes of every candidate receptor. Scoring functions were used to rank all generated complexes. From the top-ranking 10% of candidates, nine were selected for experimental validation. They were synthesized by reaction of heptakis-[6-deoxy-6-iodo]-beta-CD with a thiol compound to form the hepta-substituted beta-CDs. The stabilities of the camptothecin complexes obtained from solubility measurements of five of the nine CD derivatives were significantly higher than for any other CD derivative known from literature. The remaining four CD derivatives were insoluble in water. In addition, corresponding mono-substituted CD derivatives were synthesized that also showed improved binding constants. Among them the 9-H-purine derivative was the best, being comparable to the investigated hepta-substituted beta-CDs. Since the measured binding free energies correlated satisfactorily with the calculated scores, the applied scoring functions appeared to be appropriate for the selection of promising candidates for receptor synthesis.
Computational Analysis of Human Protein Interaction Networks
Proteomics. Aug, 2007 | Pubmed ID: 17647236
Large amounts of human protein interaction data have been produced by experiments and prediction methods. However, the experimental coverage of the human interactome is still low in contrast to predicted data. To gain insight into the value of publicly available human protein network data, we compared predicted datasets, high-throughput results from yeast two-hybrid screens, and literature-curated protein-protein interactions. This evaluation is not only important for further methodological improvements, but also for increasing the confidence in functional hypotheses derived from predictions. Therefore, we assessed the quality and the potential bias of the different datasets using functional similarity based on the Gene Ontology, structural iPfam domain-domain interactions, likelihood ratios, and topological network parameters. This analysis revealed major differences between predicted datasets, but some of them also scored at least as high as the experimental ones regarding multiple quality measures. Therefore, since only small pairwise overlap between most datasets is observed, they may be combined to enlarge the available human interactome data. For this purpose, we additionally studied the influence of protein length on data quality and the number of disease proteins covered by each dataset. We could further demonstrate that protein interactions predicted by more than one method achieve an elevated reliability.
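The pairwise overlap between interaction datasets mentioned above can be measured in several ways. The following Python sketch treats interactions as unordered protein pairs and reports the shared count and the Jaccard index for every pair of datasets; the choice of the Jaccard index, the function names and the dataset names in the test data are assumptions, not details taken from the abstract.

```python
from itertools import combinations


def normalize(pairs):
    """Turn (a, b) tuples into unordered pairs so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def pairwise_overlap(datasets):
    """Map each pair of dataset names to (shared interactions, Jaccard index).

    datasets: dict from dataset name to an iterable of (protein, protein) tuples.
    """
    normalized = {name: normalize(pairs) for name, pairs in datasets.items()}
    result = {}
    for first, second in combinations(sorted(normalized), 2):
        shared = len(normalized[first] & normalized[second])
        union = len(normalized[first] | normalized[second])
        result[(first, second)] = (shared, shared / union if union else 0.0)
    return result
```

An interaction supported by more than one dataset is simply one that appears in the intersection computed here, which is the kind of agreement the abstract associates with elevated reliability.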
Characterization and Structural Analysis of Novel Mutations in Human Immunodeficiency Virus Type 1 Reverse Transcriptase Involved in the Regulation of Resistance to Nonnucleoside Inhibitors
Journal of Virology. Oct, 2007 | Pubmed ID: 17686836
Resistance to antivirals is a complex and dynamic phenomenon that involves more mutations than are currently known. Here, we characterize 10 additional mutations (L74V, K101Q, I135M/T, V179I, H221Y, K223E/Q, and L228H/R) in human immunodeficiency virus type 1 (HIV-1) reverse transcriptase which are involved in the regulation of resistance to nonnucleoside reverse transcriptase inhibitors (NNRTIs). These mutations are strongly associated with NNRTI failure and strongly correlate with the classical NNRTI resistance mutations in a data set of 1,904 HIV-1 B-subtype pol sequences from 758 drug-naïve patients, 592 nucleoside reverse transcriptase inhibitor (NRTI)-treated but NNRTI-naïve patients, and 554 patients treated with both NRTIs and NNRTIs. In particular, L74V and H221Y, positively correlated with Y181C, were associated with an increase in Y181C-mediated resistance to nevirapine, while I135M/T mutations, positively correlated with K103N, were associated with an increase in K103N-mediated resistance to efavirenz. In addition, the presence of the I135T polymorphism in NNRTI-naïve patients significantly correlated with the appearance of K103N in cases of NNRTI failure, suggesting that I135T may represent a crucial determinant of NNRTI resistance evolution. Molecular dynamics simulations show that I135T can contribute to the stabilization of the K103N-induced closure of the NNRTI binding pocket by reducing the distance and increasing the number of hydrogen bonds between 103N and 188Y. H221Y also showed negative correlations with type 2 thymidine analogue mutations (TAM2s); its copresence with the TAM2s was associated with a higher level of zidovudine susceptibility. Our study reinforces the complexity of NNRTI resistance and the significant interplay between NRTI- and NNRTI-selected mutations. 
Mutations beyond those currently known to confer resistance should be considered for a better prediction of clinical response to reverse transcriptase inhibitors and for the development of more efficient new-generation NNRTIs.
Selective Pressures of HLA Genotypes and Antiviral Therapy on Human Immunodeficiency Virus Type 1 Sequence Mutation at a Population Level
Clinical and Vaccine Immunology : CVI. Oct, 2007 | Pubmed ID: 17715334
The objective of this study was a comprehensive analysis of the immune-driven evolution of viruses of human immunodeficiency virus type 1 (HIV-1) clade B in a large patient cohort treated at a single hospital in Germany and its implications for antiretroviral therapy. We examined the association of the HLA-A, HLA-B, and HLA-DRB1 alleles with the emergence of mutations in the complete protease gene and the first 330 codons of the reverse transcriptase (RT) gene of HIV-1, studying their distribution and persistence and their impact on antiviral drug therapy. The clinical data for 179 HIV-infected patients, the results of HLA genotyping, and virus sequences were analyzed using a variety of statistical approaches. We describe new HLA-associated mutations in both viral protease and RT, several of which are associated with HLA-DRB1. The mutations reported are remarkably persistent within our cohort, developing more slowly in a minority of patients. Interestingly, several HLA-associated mutations occur at the same positions as drug resistance mutations in patient viruses, where the viral sequence was acquired before exposure to these drugs. The influence of HLA on thymidine analogue mutation pathways was not observed. We were able to confirm immune-driven selection pressure by major histocompatibility complex (MHC) class I and II alleles through the identification of HLA-associated mutations. HLA-B alleles were involved in more associations (68%) than either HLA-A (23%) or HLA-DRB1 (9%). As several of the HLA-associated mutations lie at positions associated with drug resistance, our results indicate possible negative effects of HLA genotypes on the development of HIV-1 drug resistance.
Fifteen Years of Env C2V3C3 Evolution in Six Individuals Infected Clonally with Human Immunodeficiency Virus Type 1
Journal of Medical Virology. Nov, 2007 | Pubmed ID: 17854039
The study of the evolution of human immunodeficiency virus type 1 (HIV-1) requires blood samples collected longitudinally and data on the approximate time point of infection. Although these requirements were fulfilled in several previous studies, the infectious sources were either unknown or heterogeneous genetically. In the present study, HIV-1 env C2V3C3 (nt 7029-7315) evolution was examined retrospectively in a cohort of hemophiliacs. Compared to other cohorts, the area of interest here was the infection of six hemophiliacs by the same virus strain, that is, the infecting viruses shared an identical genome. As expected, divergence from the founder sequence as well as interpatient divergence of the predominant virus strains increased significantly over time. Based on the V3 nucleotide sequences, CCR5 usage was predicted exclusively throughout the whole period of infection in all patients. Interestingly, common patterns of viral evolution were detected in the patients of the cohort. Four amino acid substitutions within the V3 loop emerged and persisted subsequently in five (positions 305 and 308 of the HXB2 gp120 reference sequence) and six patients (positions 325 and 328 in HXB2 gp120), respectively. These common changes within the V3 loop are likely to be enforced by HIV-1 specific immune response.
Conformational Analysis of Alternative Protein Structures
Bioinformatics (Oxford, England). Dec, 2007 | Pubmed ID: 17933849
Alternative structural models determined experimentally are available for an increasing number of proteins. Structural and functional studies of these proteins need to take these models into consideration as they can present considerable structural differences. The characterization of the structural differences and similarities between these models is a fundamental task in structural biology requiring appropriate methods.
Identification of PatL1, a Human Homolog to Yeast P Body Component Pat1
Biochimica Et Biophysica Acta. Dec, 2007 | Pubmed ID: 17936923
In yeast, the activators of mRNA decapping, Pat1, Lsm1 and Dhh1, accumulate in processing bodies (P bodies) together with other proteins of the 5'-3'-deadenylation-dependent mRNA decay pathway. The Pat1 protein is of particular interest because it functions in the opposing processes of mRNA translation and mRNA degradation, thus suggesting an important regulatory role. In contrast to other components of this mRNA decay pathway, the human homolog of the yeast Pat1 protein was unknown. Here we describe the identification of two human PAT1 genes and show that one of them, PATL1, codes for an ORF with features similar to those of yeast PAT1. As expected for a protein with a fundamental role in translation control, PATL1 mRNA was ubiquitously expressed in all human tissues, as were the mRNAs of LSM1 and RCK, the human homologs of yeast LSM1 and DHH1, respectively. Furthermore, fluorescence-tagged PatL1 protein accumulated in distinct foci that correspond to P bodies, as they co-localized with the P body components Lsm1, Rck/p54 and the decapping enzyme Dcp1. In addition, as for its yeast counterpart, PatL1 expression was required for P body formation. Taken together, these data emphasize the conservation of important P body components from yeast to human cells.
Moment Invariants As Shape Recognition Technique for Comparing Protein Binding Sites
Bioinformatics (Oxford, England). Dec, 2007 | Pubmed ID: 17977888
An approach for identifying similarities of protein-protein binding sites is presented. The geometric shape of a binding site is described by computing a feature vector based on moment invariants. In order to search for similarities, feature vectors of binding sites are compared. Similar feature vectors indicate binding sites with similar shapes.
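The idea of describing a shape by a feature vector of moment invariants can be sketched in Python as follows. This is a simplified illustration, not the published method: it uses only the three classic invariants of the second-order central moments of a 3D point set (the coefficients of the characteristic polynomial of the covariance matrix), and the function names are hypothetical. The abstract does not state which moments or which distance measure the authors used.

```python
import math


def second_order_invariants(points):
    """Return three rotation- and translation-invariant shape descriptors.

    points: non-empty list of (x, y, z) coordinates, e.g. binding-site atoms.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    # Second-order central moments (entries of the covariance matrix).
    mxx = sum((p[0] - cx) ** 2 for p in points) / n
    myy = sum((p[1] - cy) ** 2 for p in points) / n
    mzz = sum((p[2] - cz) ** 2 for p in points) / n
    mxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    mxz = sum((p[0] - cx) * (p[2] - cz) for p in points) / n
    myz = sum((p[1] - cy) * (p[2] - cz) for p in points) / n
    j1 = mxx + myy + mzz                                    # trace
    j2 = (mxx * myy + myy * mzz + mxx * mzz
          - mxy ** 2 - myz ** 2 - mxz ** 2)                 # sum of principal minors
    j3 = (mxx * myy * mzz + 2 * mxy * mxz * myz
          - mxx * myz ** 2 - myy * mxz ** 2 - mzz * mxy ** 2)  # determinant
    return (j1, j2, j3)


def shape_distance(points_a, points_b):
    """Euclidean distance between the two feature vectors (smaller = more similar)."""
    return math.dist(second_order_invariants(points_a),
                     second_order_invariants(points_b))
```

Because the descriptors do not change when a point set is rotated or translated, two binding sites can be compared without first superimposing them.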
Bioinformatics Prediction of HIV Coreceptor Usage
Nature Biotechnology. Dec, 2007 | Pubmed ID: 18066037
Predicting HIV Coreceptor Usage on the Basis of Genetic and Clinical Covariates
Antiviral Therapy. 2007 | Pubmed ID: 18018768
We compared several statistical learning methods for the prediction of HIV coreceptor use from clonal HIV third hypervariable (V3) loop sequences, and evaluated and improved their effectiveness on clinical samples.
Computational Epigenetics
Bioinformatics (Oxford, England). Jan, 2008 | Pubmed ID: 18024971
Epigenetic research aims to understand heritable gene regulation that is not directly encoded in the DNA sequence. Epigenetic mechanisms such as DNA methylation and histone modifications modulate the packaging of the DNA in the nucleus and thereby influence gene expression. Patterns of epigenetic information are faithfully propagated over multiple cell divisions, which makes epigenetic regulation a key mechanism for cellular differentiation and cell fate decisions. In addition, incomplete erasure of epigenetic information can lead to complex patterns of non-Mendelian inheritance. Stochastic and environment-induced epigenetic defects are known to play a major role in cancer and ageing, and they may also contribute to mental disorders and autoimmune diseases. Recent technical advances such as ChIP-on-chip and ChIP-seq have started to convert epigenetic research into a high-throughput endeavor, to which bioinformatics is expected to make significant contributions. Here, we review pioneering computational studies that have contributed to epigenetic research. In addition, we give a brief introduction into epigenetics, targeted at bioinformaticians who are new to the field, and we outline future challenges in computational epigenetics.
Integrative Visual Analysis of the Effects of Alternative Splicing on Protein Domain Interaction Networks
Journal of Integrative Bioinformatics. 2008 | Pubmed ID: 20134061
Proteins and their interactions are essential for the functioning of all organisms and for understanding biological processes. Alternative splicing is an important molecular mechanism for increasing the protein diversity in eukaryotic cells. Splicing events that alter the protein structure and the domain composition can be responsible for the regulation of protein interactions and the functional diversity of different tissues. Discovering the occurrence of splicing events and studying protein isoforms have become feasible using Affymetrix Exon Arrays. Therefore, we have developed the versatile Cytoscape plugin DomainGraph that allows for the visual analysis of protein domain interaction networks and their integration with exon expression data. Protein domains affected by alternative splicing are highlighted and splicing patterns can be compared.
Molecular Basis of Telaprevir Resistance Due to V36 and T54 Mutations in the NS3-4A Protease of the Hepatitis C Virus
Genome Biology. 2008 | Pubmed ID: 18215275
The inhibitor telaprevir (VX-950) of the hepatitis C virus (HCV) protease NS3-4A has been tested in a recent phase 1b clinical trial in patients infected with HCV genotype 1. This trial revealed residue mutations that confer varying degrees of drug resistance. In particular, two protease positions with the mutations V36A/G/L/M and T54A/S were associated with low to medium levels of drug resistance during viral breakthrough, together with only an intermediate reduction of viral replication fitness. These mutations are located in the protein interior and far away from the ligand binding pocket.
Stability Analysis of Mixtures of Mutagenetic Trees
BMC Bioinformatics. 2008 | Pubmed ID: 18366778
Mixture models of mutagenetic trees are evolutionary models that capture several pathways of ordered accumulation of genetic events observed in different subsets of patients. They were used to model HIV progression by accumulation of resistance mutations in the viral genome under drug pressure and cancer progression by accumulation of chromosomal aberrations in tumor cells. From the mixture models a genetic progression score (GPS) can be derived that estimates the genetic status of single patients according to the corresponding progression along the tree models. GPS values were shown to have predictive power for estimating drug resistance in HIV or the survival time in cancer. Still, the reliability of the exact values of such complex markers derived from graphical models can be questioned.
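How a mutagenetic tree assigns a probability to an observed pattern of genetic events can be sketched in Python. The sketch assumes the usual formulation in which each edge carries the conditional probability that the child event occurs given that its parent has occurred; the data layout and function names are hypothetical and do not reflect any particular software package.

```python
def pattern_probability(tree, pattern, root="wt"):
    """Probability of observing exactly the events in `pattern` under one tree.

    tree: dict mapping each event to (parent, conditional probability).
    pattern: set of events observed in one sample.
    """
    present = set(pattern) | {root}
    prob = 1.0
    for child, (parent, p) in tree.items():
        if child in present:
            if parent not in present:
                return 0.0       # event without its predecessor: incompatible
            prob *= p            # edge was traversed
        elif parent in present:
            prob *= 1.0 - p      # edge was available but not traversed
    return prob


def mixture_probability(trees, weights, pattern, root="wt"):
    """Weighted sum of the pattern probability over the tree components."""
    return sum(w * pattern_probability(t, pattern, root)
               for t, w in zip(trees, weights))
```

For a tree with edges wt→A (0.5), A→B (0.4) and wt→C (0.2), the pattern {A} has probability 0.5 × 0.6 × 0.8 = 0.24, while {B} alone has probability 0 because B cannot occur before A.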
Alignment of Non-covalent Interactions at Protein-protein Interfaces
PLoS One. 2008 | Pubmed ID: 18382693
The study and comparison of protein-protein interfaces is essential for the understanding of the mechanisms of interaction between proteins. While there are many methods for comparing protein structures and protein binding sites, so far no methods have been reported for comparing the geometry of non-covalent interactions occurring at protein-protein interfaces.
Inter-individual Variation of DNA Methylation and Its Implications for Large-scale Epigenome Mapping
Nucleic Acids Research. Jun, 2008 | Pubmed ID: 18413340
Genomic DNA methylation profiles exhibit substantial variation within the human population, with important functional implications for gene regulation. So far little is known about the characteristics and determinants of DNA methylation variation among healthy individuals. We performed bioinformatic analysis of high-resolution methylation profiles from multiple individuals, uncovering complex patterns of inter-individual variation that are strongly correlated with the local DNA sequence. CpG-rich regions exhibit low and relatively similar levels of DNA methylation in all individuals, but the sequential order of the (few) methylated among the (many) unmethylated CpGs differs randomly across individuals. In contrast, CpG-poor regions exhibit substantially elevated levels of inter-individual variation, but also significant conservation of specific DNA methylation patterns between unrelated individuals. This observation has important implications for experimental analysis of DNA methylation, e.g. in the context of epigenome projects. First, DNA methylation mapping at single-CpG resolution is expected to uncover informative DNA methylation patterns for the CpG-poor bulk of the human genome. Second, for CpG-rich regions it will be sufficient to measure average methylation levels rather than assaying every single CpG. We substantiate these conclusions by an in silico benchmarking study of six widely used methods for DNA methylation mapping. Based on our findings, we propose a cost-optimized two-track strategy for mammalian methylome projects.
Selecting Anti-HIV Therapies Based on a Variety of Genomic and Clinical Factors
Bioinformatics (Oxford, England). Jul, 2008 | Pubmed ID: 18586740
Optimizing HIV therapies is crucial since the virus rapidly develops mutations to evade drug pressure. Recent studies have shown that genotypic information might not be sufficient for the design of therapies and that other clinical and demographical factors may play a role in therapy failure. This study is designed to assess the improvement in prediction achieved when such information is taken into account. We use these factors to generate a prediction engine using a variety of machine learning methods and to determine which clinical conditions are most misleading in terms of predicting the outcome of a therapy.
Local Function Conservation in Sequence and Structure Space
PLoS Computational Biology. 2008 | Pubmed ID: 18604264
We assess the variability of protein function in protein sequence and structure space. Various regions in this space exhibit considerable difference in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de).
An Integrative Approach for Predicting Interactions of Protein Regions
Bioinformatics (Oxford, England). Aug, 2008 | Pubmed ID: 18689837
Protein-protein interactions are commonly mediated by the physical contact of distinct protein regions. Computational identification of interacting protein regions aids in the detailed understanding of protein networks and supports the prediction of novel protein interactions and the reconstruction of protein complexes.
Integrating Expression Data with Domain Interaction Networks
Bioinformatics (Oxford, England). Nov, 2008 | Pubmed ID: 18710874
Recent studies have revealed that alternative splicing plays an important role in the observed protein and interaction diversity. Special microarrays allow for measuring gene expression at the exon level and thus for studying alternative transcripts and their corresponding protein domain architecture. We have developed the Cytoscape plugin DomainGraph that enables the visualization and detailed study of domain-domain interactions forming protein interaction networks. In addition, the integration of exon expression data supports the analysis of alternative splicing events and the characterization of their effects on the protein and domain interaction network. Different expression patterns between human tissues or cells can be identified by comparing the generated domain graphs. AVAILABILITY: The plugin DomainGraph and the online documentation are available at http://domaingraph.bioinf.mpi-inf.mpg.de.
Rtreemix: an R Package for Estimating Evolutionary Pathways and Genetic Progression Scores
Bioinformatics (Oxford, England). Oct, 2008 | Pubmed ID: 18718947
In genetics, many evolutionary pathways can be modeled by the ordered accumulation of permanent changes. Mixture models of mutagenetic trees have been used to describe disease progression in cancer and in HIV. In cancer, progression is modeled by the accumulation of chromosomal gains and losses in tumor cells; in HIV, the accumulation of drug resistance-associated mutations in the viral genome is known to be associated with disease progression. From such evolutionary models, genetic progression scores can be derived that assign measures for the disease state to single patients. Rtreemix is an R package for estimating mixture models of evolutionary pathways from observed cross-sectional data and for estimating associated genetic progression scores. The package also provides extended functionality for estimating confidence intervals for estimated model parameters and for evaluating the stability of the estimated evolutionary mixture models.
Comparison of Classifier Fusion Methods for Predicting Response to Anti HIV-1 Therapy
PLoS One. 2008 | Pubmed ID: 18941628
Analysis of the viral genome for drug resistance mutations is state-of-the-art for guiding treatment selection for human immunodeficiency virus type 1 (HIV-1)-infected patients. These mutations alter the structure of viral target proteins and reduce or in the worst case completely inhibit the effect of antiretroviral compounds while maintaining the ability for effective replication. Modern anti-HIV-1 regimens comprise multiple drugs in order to prevent or at least delay the development of resistance mutations. However, commonly used HIV-1 genotype interpretation systems provide only classifications for single drugs. The EuResist initiative has collected data from about 18,500 patients to train three classifiers for predicting response to combination antiretroviral therapy, given the viral genotype and further information. In this work we compare different classifier fusion methods for combining the individual classifiers.
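Classifier fusion combines the outputs of several individual classifiers into one prediction. The Python sketch below shows two generic fusion rules, averaging the predicted probabilities and taking a majority vote. These are illustrative choices only; the abstract does not list which fusion methods were compared, and the function names and the 0.5 threshold are assumptions.

```python
def mean_fusion(probabilities):
    """Average the success probabilities predicted by the individual classifiers."""
    return sum(probabilities) / len(probabilities)


def majority_vote(probabilities, threshold=0.5):
    """Predict success if more than half of the classifiers predict success."""
    votes = sum(1 for p in probabilities if p >= threshold)
    return votes * 2 > len(probabilities)
```

The two rules can disagree. With predicted probabilities 0.9, 0.45 and 0.45, the mean is 0.6 and would be read as a predicted success, whereas only one of the three classifiers votes for success.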
Computing Topological Parameters of Biological Networks
Bioinformatics (Oxford, England). Jan, 2008 | Pubmed ID: 18006545
Rapidly increasing amounts of molecular interaction data are being produced by various experimental techniques and computational prediction methods. In order to gain insight into the organization and structure of the resultant large complex networks formed by the interacting molecules, we have developed the versatile Cytoscape plugin NetworkAnalyzer. It computes and displays a comprehensive set of topological parameters, which includes the number of nodes, edges, and connected components, the network diameter, radius, density, centralization, heterogeneity, and clustering coefficient, the characteristic path length, and the distributions of node degrees, neighborhood connectivities, average clustering coefficients, and shortest path lengths. NetworkAnalyzer can be applied to both directed and undirected networks and also contains extra functionality to construct the intersection or union of two networks. It is an interactive and highly customizable application that requires no expert knowledge in graph theory from the user. AVAILABILITY: NetworkAnalyzer can be downloaded via the Cytoscape web site: http://www.cytoscape.org
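A few of the topological parameters listed above can be computed with a short Python sketch for a simple undirected network given as an edge list. This is not the NetworkAnalyzer implementation: the function names are hypothetical, self-loops are ignored, and assigning a clustering coefficient of 0 to nodes with fewer than two neighbors is an assumption.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


def topology_summary(edges):
    """Return node and edge counts, density, mean clustering, degree distribution."""
    adjacency = build_adjacency(edges)
    n = len(adjacency)
    m = sum(len(nb) for nb in adjacency.values()) // 2
    return {
        "nodes": n,
        "edges": m,
        "density": 2.0 * m / (n * (n - 1)) if n > 1 else 0.0,
        "avg_clustering": (sum(clustering_coefficient(adjacency, v) for v in adjacency) / n
                           if n else 0.0),
        "degree_distribution": Counter(len(nb) for nb in adjacency.values()),
    }
```

Because the network is built from an edge list, isolated nodes are not represented; a fuller treatment would accept an explicit node list as well.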
Clinical Relevance of the 2'-5'-oligoadenylate Synthetase/RNase L System for Treatment Response in Chronic Hepatitis C
Journal of Hepatology. Jan, 2009 | Pubmed ID: 19022516
Interferon-alpha induces 2'-5'-oligoadenylate synthetase which activates RNase L. Viral RNA is cleaved by RNase L at UU/UA dinucleotides. The clinical relevance of RNase L cleavage for response to an interferon-alpha-based therapy in chronic hepatitis C is unknown.
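Since RNase L cleaves at UU and UA dinucleotides, the number of potential cleavage sites in a viral sequence can be counted directly. The following Python sketch does this; the function name is hypothetical, and counting overlapping dinucleotides and accepting DNA-style input with T in place of U are assumptions rather than details from the abstract.

```python
def count_rnase_l_sites(sequence):
    """Count UU and UA dinucleotides, including overlapping occurrences."""
    rna = sequence.upper().replace("T", "U")  # accept cDNA-style input
    return sum(1 for i in range(len(rna) - 1) if rna[i:i + 2] in ("UU", "UA"))
```

For example, the sequence AUUUAG contains the dinucleotides AU, UU, UU, UA and AG, so the function counts three potential sites.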
Dynamics of NRTI Resistance Mutations During Therapy Interruption
AIDS Research and Human Retroviruses. Jan, 2009 | Pubmed ID: 19182921
To date, very little information is available regarding the evolution of drug resistance mutations during treatment interruption (TI). Using a survival analysis approach, we investigated the dynamics of mutations associated with resistance to nucleoside analogue reverse transcriptase inhibitors (NRTIs) during TI. Analyzing 132 patients having at least two consecutive genotypes, one at last NRTI-containing regimen failure, and at least one during TI, we observed that the NRTI resistance mutations disappear at different rates during TI and are lost independently of each other in the majority of patients. The disappearance of the K65R and M184I/V mutations occurred in the majority of patients, was rapid, and was associated with the reemergence of wild-type virus, thus showing their negative impact on viral fitness. Overall, it seems that the loss of NRTI drug resistance mutations during TI is not an ordered process, and in the majority of patients occurs without specific interaction among mutations.
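A survival analysis of mutation loss treats the disappearance of a mutation as the event and the time since treatment interruption as the duration; patients in whom the mutation is still present at the last available genotype are censored. The abstract does not name the estimator used, so the Kaplan-Meier estimator in this Python sketch is an assumption, as are the function name and the data layout.

```python
def kaplan_meier(durations, events):
    """Return [(time, probability the mutation is still present)] at each event time.

    durations: follow-up time per patient.
    events: True if the mutation was lost at that time, False if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        lost = 0
        removed = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                lost += 1
            removed += 1
            i += 1
        if lost:
            survival *= 1.0 - lost / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Comparing such curves between mutations is one way to express the observation that different NRTI resistance mutations disappear at different rates.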
EpiGRAPH: User-friendly Software for Statistical Analysis and Prediction of (epi)genomic Data
Genome Biology. 2009 | Pubmed ID: 19208250
The EpiGRAPH web service http://epigraph.mpi-inf.mpg.de/ enables biologists to uncover hidden associations in vertebrate genome and epigenome datasets. Users can upload sets of genomic regions and EpiGRAPH will test multiple attributes (including DNA sequence, chromatin structure, epigenetic modifications and evolutionary conservation) for enrichment or depletion among these regions. Furthermore, EpiGRAPH learns to predictively identify similar genomic regions. This paper demonstrates EpiGRAPH's practical utility in a case study on monoallelic gene expression and describes its novel approach to reproducible bioinformatic analysis.
Predicting the Response to Combination Antiretroviral Therapy: Retrospective Validation of Geno2pheno-THEO on a Large Clinical Database
The Journal of Infectious Diseases. Apr, 2009 | Pubmed ID: 19239365
Expert-based genotypic interpretation systems are standard methods for guiding treatment selection for patients infected with human immunodeficiency virus type 1. We previously introduced the software pipeline geno2pheno-THEO (g2p-THEO), which on the basis of viral sequence predicts the response to treatment with a combination of antiretroviral compounds by applying methods from statistical learning and the estimated potential of the virus to escape from drug pressure.
DASMI: Exchanging, Annotating and Assessing Molecular Interaction Data
Bioinformatics (Oxford, England). May, 2009 | Pubmed ID: 19420069
Ever increasing amounts of biological interaction data are being accumulated worldwide, but they are currently not readily accessible to the biologist at a single site. New techniques are required for retrieving, sharing and presenting data spread over the Internet.
Advantages of Predicted Phenotypes and Statistical Learning Models in Inferring Virological Response to Antiretroviral Therapy from HIV Genotype
Antiviral Therapy. 2009 | Pubmed ID: 19430102
Inferring response to antiretroviral therapy from the viral genotype alone is challenging. The utility of an intermediate step of predicting in vitro drug susceptibility is currently controversial. Here, we provide a retrospective comparison of approaches using either genotype or predicted phenotypes alone, or in combination.
MethMarker: User-friendly Design and Optimization of Gene-specific DNA Methylation Assays
Genome Biology. 2009 | Pubmed ID: 19804638
DNA methylation is a key mechanism of epigenetic regulation that is frequently altered in diseases such as cancer. To confirm the biological or clinical relevance of such changes, gene-specific DNA methylation changes need to be validated in multiple samples. We have developed the MethMarker http://methmarker.mpi-inf.mpg.de/ software to help design robust and cost-efficient DNA methylation assays for six widely used methods. Furthermore, MethMarker implements a bioinformatic workflow for transforming disease-specific differentially methylated genomic regions into robust clinical biomarkers.
V3 Loop Sequence Space Analysis Suggests Different Evolutionary Patterns of CCR5- and CXCR4-tropic HIV
PloS One. 2009 | Pubmed ID: 19816596
The V3 loop of human immunodeficiency virus type 1 (HIV-1) is critical for coreceptor binding and is the main determinant of which of the cellular coreceptors, CCR5 or CXCR4, the virus uses for cell entry. The aim of this study is to provide a large-scale data driven analysis of HIV-1 coreceptor usage with respect to the V3 loop evolution and to characterize CCR5- and CXCR4-tropic viral phenotypes previously studied in small- and medium-scale settings. We use different sequence similarity measures, phylogenetic and clustering methods in order to analyze the distribution in sequence space of roughly 1000 V3 loop sequences and their tropism phenotypes. This analysis affords a means of characterizing those sequences that are misclassified by several sequence-based coreceptor prediction methods, as well as predicting the coreceptor using the location of the sequence in sequence space and of relating this location to the CD4(+) T-cell count of the patient. We support previous findings that the usage of CCR5 is correlated with relatively high sequence conservation whereas CXCR4-tropic viruses spread over larger regions in sequence space. The incorrectly predicted sequences are mostly located in regions in which their phenotype represents the minority or in close vicinity of regions dominated by the opposite phenotype. Nevertheless, the location of the sequence in sequence space can be used to improve the accuracy of the prediction of the coreceptor usage. Sequences from patients with high CD4(+) T-cell counts are relatively highly conserved as compared to those of immunosuppressed patients. Our study thus supports hypotheses of an association of immune system depletion with an increase in V3 loop sequence variability and with the escape of the viral sequence to distant parts of the sequence space.
Docking and Scoring with Alternative Side-chain Conformations
Proteins. Feb, 2009 | Pubmed ID: 18704939
We describe a scoring and modeling procedure for docking ligands into protein models that have either modeled or flexible side-chain conformations. Our methodical contribution comprises a procedure for generating new potentials of mean force for the ROTA scoring function which we have introduced previously for optimizing side-chain conformations with the tool IRECS. The ROTA potentials are specially trained to tolerate small-scale positional errors of atoms that are characteristic of (i) side-chain conformations that are modeled using a sparse rotamer library and (ii) ligand conformations that are generated using a docking program. We generated both rigid and flexible protein models with our side-chain prediction tool IRECS and docked ligands to proteins using the scoring function ROTA and the docking programs FlexX (for rigid side chains) and FlexE (for flexible side chains). We validated our approach on the forty screening targets of the DUD database. The validation shows that the ROTA potentials are especially well suited for estimating the binding affinity of ligands to proteins. The results also show that our procedure can compensate for the performance decrease in screening that occurs when using protein models with side chains modeled with a rotamer library instead of using X-ray structures. The average runtime per ligand of our method is 168 seconds on an Opteron V20z, which is fast enough to allow virtual screening of compound libraries for drug candidates.
Only Slight Impact of Predicted Replicative Capacity for Therapy Response Prediction
PloS One. 2010 | Pubmed ID: 20140263
Replication capacity (RC) of specific HIV isolates is occasionally blamed for unexpected treatment responses. However, the role of viral RC in response to antiretroviral therapy is not yet fully understood.
Predicting MHC Class I Epitopes in Large Datasets
BMC Bioinformatics. 2010 | Pubmed ID: 20163709
Experimental screening of large sets of peptides with respect to their MHC binding capabilities is still very demanding due to the large number of possible peptide sequences and the extensive polymorphism of the MHC proteins. Therefore, there is significant interest in the development of computational methods for predicting the binding capability of peptides to MHC molecules, as a first step towards selecting peptides for actual screening.
Web-based Analysis of (Epi-) Genome Data Using EpiGRAPH and Galaxy
Methods in Molecular Biology (Clifton, N.J.). 2010 | Pubmed ID: 20238087
Modern life sciences are becoming increasingly data intensive, posing a significant challenge for most researchers and shifting the bottleneck of scientific discovery from data generation to data analysis. As a result, progress in genome research is increasingly impeded by bioinformatic hurdles. A new generation of powerful and easy-to-use genome analysis tools has been developed to address this issue, enabling biologists to perform complex bioinformatic analyses online - without having to learn a programming language or downloading and manually processing large datasets. In this tutorial paper, we describe the use of EpiGRAPH (http://epigraph.mpi-inf.mpg.de/) and Galaxy (http://galaxyproject.org/) for genome and epigenome analysis, and we illustrate how these two web services work together to identify epigenetic modifications that are characteristics of highly polymorphic (SNP-rich) promoters. This paper is supplemented with video tutorials (http://tinyurl.com/yc5xkqq), which provide a step-by-step guide through each example analysis.
Short Communication: Selection of Thymidine Analogue Resistance Mutational Patterns in Children Infected from a Common HIV Type 1 Subtype G Source
AIDS Research and Human Retroviruses. Mar, 2010 | Pubmed ID: 20334563
In HIV-1, thymidine analogue mutations (TAMs) cluster in one of two groups (215Y, 41L, 210W, or 215F, 219E/Q), representing two independent mutational patterns (T215Y and T215F cluster, respectively). The mechanisms by which these pathways are selected are not fully understood. To investigate possible factors driving the selection of the TAMs, we analyzed the TAM patterns with regard to the respective treatment, viral load, and HLA in 18 children all infected from a common source of HIV-1 clade G virus and initially treated with zidovudine. The HIV reverse transcriptase sequences of 14/18 children carried at least one TAM. At first sampling date, the T215Y-linked pattern was observed in five cases and the T215F cluster was seen in nine. During the follow-up period, three patients changed their patterns. Children treated with identical NRTI combinations at the first sampling date developed different pathways. Under AZT/d4T therapies, an association was found between the HLA B*13 (in combination with HLA DRB1*0701) and the mutation T215Y. The mutation T215Y reverted in three out of four patients who discontinued AZT/d4T treatment. We speculate that in the context of these subtype G viruses, the development of the T215Y mutation may be strongly disfavored whereas the presence of HLA B*13 may counteract this effect and permit its development.
Permutation Importance: a Corrected Feature Importance Measure
Bioinformatics (Oxford, England). May, 2010 | Pubmed ID: 20385727
In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred.
Dimerization of the Hepatitis C Virus Nonstructural Protein 4B Depends on the Integrity of an Aminoterminal Basic Leucine Zipper
Protein Science : a Publication of the Protein Society. Jul, 2010 | Pubmed ID: 20506268
The hepatitis C virus (HCV) nonstructural (NS) protein 4B is known for protein-protein interactions with virus and host cell factors. Only little is known about the corresponding protein binding sites and underlying molecular mechanisms. Recently, we have predicted a putative basic leucine zipper (bZIP) motif within the aminoterminal part of NS4B. The aim of this study was to investigate the importance of this NS4B bZIP motif for specific protein-protein interactions. We applied in silico approaches for 3D-structure modeling of NS4B-homodimerization via the bZIP motif and identified crucial amino acid positions by multiple sequence analysis. The selected sites were used for site-directed mutagenesis within the NS4B bZIP motif and subsequent co-immunoprecipitation of wild-type and mutant NS4B molecules. Respective interaction energies were calculated for wild-type and mutant structural models. NS4B-homodimerization with a gradual alleviation of dimer interaction from wild-type towards the mutant-dimers was observed. The putative bZIP motif was confirmed by a co-immunoprecipitation assay and western blot analysis. NS4B-NS4B interaction depends on the integrity of the bZIP hydrophobic core and can be abolished due to changes of crucial residues within NS4B. In conclusion, our data indicate NS4B-homodimerization and that this interaction is facilitated by the aminoterminal part containing a bZIP motif.
AltAnalyze and DomainGraph: Analyzing and Visualizing Exon Expression Data
Nucleic Acids Research. Jul, 2010 | Pubmed ID: 20513647
Alternative splicing is an important mechanism for increasing protein diversity. However, its functional effects are largely unknown. Here, we present our new software workflow composed of the open-source application AltAnalyze and the Cytoscape plugin DomainGraph. Both programs provide an intuitive and comprehensive end-to-end solution for the analysis and visualization of alternative splicing data from Affymetrix Exon and Gene Arrays at the level of proteins, domains, microRNA binding sites, molecular interactions and pathways. Our software tools include easy-to-use graphical user interfaces, rigorous statistical methods (FIRMA, MiDAS and DABG filtering) and do not require prior knowledge of exon array analysis or programming. They provide new methods for automatic interpretation and visualization of the effects of alternative exon inclusion on protein domain composition and microRNA binding sites. These data can be visualized together with affected pathways and gene or protein interaction networks, allowing a straightforward identification of potential biological effects due to alternative splicing at different levels of granularity. Our programs are available at http://www.altanalyze.org and http://www.domaingraph.de. These websites also include extensive documentation, tutorials and sample data.
Positive Selection of HIV Host Factors and the Evolution of Lentivirus Genes
BMC Evolutionary Biology. 2010 | Pubmed ID: 20565842
Positive selection of host proteins that interact with pathogens can indicate factors relevant for infection and potentially be a measure of pathogen driven evolution.
Dealing with Sparse Data in Predicting Outcomes of HIV Combination Therapies
Bioinformatics (Oxford, England). Sep, 2010 | Pubmed ID: 20624779
As there exists no cure or vaccine for the infection with human immunodeficiency virus (HIV), the standard approach to treating HIV patients is to repeatedly administer different combinations of several antiretroviral drugs. Because of the large number of possible drug combinations, manually finding a successful regimen becomes practically impossible. This presents a major challenge for HIV treatment. The application of machine learning methods for predicting virological responses to potential therapies is a possible approach to solving this problem. However, due to evolving trends in treating HIV patients the available clinical datasets have a highly unbalanced representation, which might negatively affect the usefulness of derived statistical models.
HLA Class I Allele Associations with HCV Genetic Variants in Patients with Chronic HCV Genotypes 1a or 1b Infection
Journal of Hepatology. Dec, 2010 | Pubmed ID: 20800922
The adaptive immune response against hepatitis C virus (HCV) is significantly shaped by the host's composition of HLA-alleles with the consequence that the HLA phenotype is a critical determinant of viral evolution during adaptive immune pressure. In the present study, we aimed to identify associations of HLA class I alleles with HCV subtypes 1a and 1b genetic variants.
Improving Disease Gene Prioritization Using the Semantic Similarity of Gene Ontology Terms
Bioinformatics (Oxford, England). Sep, 2010 | Pubmed ID: 20823322
Many hereditary human diseases are polygenic, resulting from sequence alterations in multiple genes. Genomic linkage and association studies are commonly performed for identifying disease-related genes. Such studies often yield lists of up to several hundred candidate genes, which have to be prioritized and validated further. Recent studies discovered that genes involved in phenotypically similar diseases are often functionally related on the molecular level.
Improved Prediction of HIV‐1 Coreceptor Usage with Sequence Information from the Second Hypervariable Loop of Gp120
The Journal of Infectious Diseases. Nov, 2010 | Pubmed ID: 20874088
Human immunodeficiency virus type 1 (HIV‐1) uses the CD4 receptor and a coreceptor to gain cell entry. Coreceptor usage is mainly determined by the V3 loop of gp120. Therefore, coreceptor usage is currently inferred from the genotype on the basis of V3 alone. However, several mutations outside V3 have been repeatedly reported to influence coreceptor usage. In this study, the impact of the V2 loop on coreceptor usage prediction was analyzed.
Reconstructing the Ancestral Germ Line Methylation State of Young Repeats
Molecular Biology and Evolution. Jun, 2011 | Pubmed ID: 21212152
One of the key objectives of comparative genomics is the characterization of the forces that shape genomes over the course of evolution. In the last decades, evidence has been accumulated that for vertebrate genomes also epigenetic modifications have to be considered in this context. Especially, the elevated mutation frequency of 5-methylcytosine (5mC) is assumed to facilitate the depletion of CpG dinucleotides in species that exhibit global DNA methylation. For instance, the underrepresentation of CpG dinucleotides in many mammalian genomes is attributed to this effect, which is only neutralized in so-called CpG islands (CGIs) that are preferentially unmethylated and thus partially protected from rapid CpG decay. For primate-specific CpG-rich transposable elements from the ALU family, it is unclear whether their elevated CpG frequency is caused by their small age or by the absence of DNA methylation. In consequence, these elements are often misclassified in CGI annotations. We present a method for the estimation of germ line methylation from pairwise ancestral-descendant alignments. The approach is validated in a simulation study and tested on DNA repeats from the AluSx family. We conclude that a predicted unmethylated state in the germ line is highly correlated with epigenetic activity of the respective genomic region. Thus, CpG-rich repeats can be facilitated as in silico probes for the epigenetic potential of their genomic neighborhood.
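The CpG depletion described above is commonly quantified with the observed/expected CpG ratio, (number of CpG × sequence length) / (number of C × number of G). The Python sketch below computes that ratio; it is not the estimation method presented in the paper, which works on ancestral-descendant alignments, and the function name cpg_obs_exp and the example sequences are hypothetical.

```python
def cpg_obs_exp(sequence):
    """Observed/expected CpG ratio of a DNA sequence.

    ratio = (#CpG * N) / (#C * #G), where N is the sequence length.
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = sequence.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    cpg_count = seq.count("CG")
    return cpg_count * n / (c_count * g_count)


# Hypothetical toy sequences: one CpG-rich, one without any CpG.
rich = cpg_obs_exp("CGCGCGAT")
depleted = cpg_obs_exp("CAGTCAGT")
```

A ratio well below 1 indicates fewer CpG dinucleotides than the base composition alone would predict, which is the pattern attributed to 5mC decay in methylated regions.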
Recruitment and Activation of a Lipid Kinase by Hepatitis C Virus NS5A is Essential for Integrity of the Membranous Replication Compartment
Cell Host & Microbe. Jan, 2011 | Pubmed ID: 21238945
Hepatitis C virus (HCV) is a major causative agent of chronic liver disease in humans. To gain insight into host factor requirements for HCV replication, we performed a siRNA screen of the human kinome and identified 13 different kinases, including phosphatidylinositol-4 kinase III alpha (PI4KIIIα), as being required for HCV replication. Consistent with elevated levels of the PI4KIIIα product phosphatidylinositol-4-phosphate (PI4P) detected in HCV-infected cultured hepatocytes and liver tissue from chronic hepatitis C patients, the enzymatic activity of PI4KIIIα was critical for HCV replication. Viral nonstructural protein 5A (NS5A) was found to interact with PI4KIIIα and stimulate its kinase activity. The absence of PI4KIIIα activity induced a dramatic change in the ultrastructural morphology of the membranous HCV replication complex. Our analysis suggests that the direct activation of a lipid kinase by HCV NS5A contributes critically to the integrity of the membranous viral replication complex.
Learning from Past Treatments and Their Outcome Improves Prediction of in Vivo Response to Anti-HIV Therapy
Statistical Applications in Genetics and Molecular Biology. 2011 | Pubmed ID: 21291416
Infections with the human immunodeficiency virus type 1 (HIV-1) are treated with combinations of drugs. Unfortunately, HIV responds to the treatment by developing resistance mutations. Consequently, the genome of the viral target proteins is sequenced and inspected for resistance mutations as part of routine diagnostic procedures for ensuring an effective treatment. For predicting response to a combination therapy, currently available computer-based methods rely on the genotype of the virus and the composition of the regimen as input. However, no available tool takes full advantage of the knowledge about the order of and the response to previously prescribed regimens. The resulting high-dimensional feature space makes existing methods difficult to apply in a straightforward fashion. The machine learning system proposed in this work, sequence boosting, is tailored to exploiting such high-dimensional information, i.e. the extraction of longitudinal features, by utilizing the recent advancements in data mining and boosting. When applied to predicting the latest treatment outcome for 3,759 treatment-experienced patients from the EuResist integrated database, sequence boosting achieved superior performance compared to SVMs with RBF kernels. Moreover, sequence boosting allows an easy access to the discriminative treatment information. Analysis of feature importance values provided by our model confirmed known facts regarding HIV treatment. For instance, application of potent and recently licensed drugs was beneficial for patients, and, conversely, the patient group that was subject to NRTI mono-therapies in the past had poor treatment perspectives today. Furthermore, our model revealed novel biological insights. More precisely, the combination of previously used drugs with their in vivo response is more informative than the information of previously used drugs alone. Using this information improves the performance of systems for predicting therapy outcome.
HIV Prevalence and Route of Transmission in Turkish Immigrants Living in North-Rhine Westphalia, Germany
Medical Microbiology and Immunology. Nov, 2011 | Pubmed ID: 21461764
The high number of Turkish immigrants in the German state North-Rhine Westphalia (NRW) compelled us to look for HIV-infected patients with Turkish nationality. In the AREVIR database, we found 127 (107 men, 20 women) Turkish HIV patients living in NRW. In order to investigate transmission clusters and their correlation to gender, nationality and self-reported transmission mode, a phylogenetic analysis including pol gene sequences was performed. Subtype distribution and the number of HIV drug resistance mutations in the Turkish patient group were found to be similar to the proportion in the non-Turkish patients. Great differences were observed in self-reported mode of transmission in the heterosexual Turkish male subgroup. Neighbour-joining tree of pol gene sequences gave indication that 59% of these reported heterosexual transmissions cluster with those of men having sex with men in the database. This is the first study analysing HIV type distribution, drug resistance mutations and transmission mode in a Turkish immigrant population.
Mutations in Gp41 Are Correlated with Coreceptor Tropism but Do Not Improve Prediction Methods Substantially
Antiviral Therapy. 2011 | Pubmed ID: 21555814
The main determinants of HIV-1 coreceptor usage are located in the V3-loop of gp120, although mutations in V2 and gp41 are also known. Incorporation of V2 is known to improve prediction algorithms; however, this has not been confirmed for gp41 mutations.
BiQ Analyzer HT: Locus-specific Analysis of DNA Methylation by High-throughput Bisulfite Sequencing
Nucleic Acids Research. Jul, 2011 | Pubmed ID: 21565797
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data.
Genotypic Tropism Testing by Massively Parallel Sequencing: Qualitative and Quantitative Analysis
BMC Medical Informatics and Decision Making. 2011 | Pubmed ID: 21569501
Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage.
Prevalence and Characteristics of Hepatitis B and C Virus Infections in Treatment-naïve HIV-infected Patients
Medical Microbiology and Immunology. Feb, 2011 | Pubmed ID: 20853118
In HIV-infected treatment-naïve patients, we analyzed risk factors for either chronic hepatitis B (HBV) infection, occult HBV infection (OHBV) or a positive hepatitis C (HCV) serostatus. A total of 918 patients of the RESINA-cohort in Germany were included in this study. Before initiating antiretroviral therapy, clinical parameters were collected and blood samples were analyzed for antibodies against HIV, HBV and HCV, HBs antigen and viral nucleic acids for HIV and HBV. Present or past HBV infection (i.e. HBsAg and/or anti-HBc) was found in 43.4% of patients. HBsAg was detected in 4.5% (41/918) and HBV DNA in 6.1% (34/554), resulting in OHBV infection in 2.9% (16/554) of patients. OHBV infection could not be ruled out by the presence of anti-HBs (50.1%) or the absence of all HBV seromarkers (25%). A HCV-positive serostatus was associated with the IVDU transmission route, non-African ethnicity, elevated liver parameters (ASL or GGT) and low HIV viral load. Replicative HBV infection and HCV-positive serostatus both correlated with HIV resistance mutations (P = 0.001 and P = 0.028). HBV and HCV infection are frequent co-infections in HIV treatment-naive patients. These co-infections influence viral evolution, clinical parameters and serological markers. Consequently, HIV patients should routinely be tested for HBV and HCV infection before initiating HIV treatment. OHBV infection constituted almost half of all HBV infections with detectable HBV DNA. Due to a lack of risk factors indicating OHBV infection, HBV diagnosis should not only include serological markers but also the detection of HBV DNA.
Classification with Correlated Features: Unreliability of Feature Ranking and Solutions
Bioinformatics (Oxford, England). Jul, 2011 | Pubmed ID: 21576180
Classification and feature selection of genomics or transcriptomics data is often hampered by the large number of features as compared with the small number of samples available. Moreover, features represented by probes that either have similar molecular functions (gene expression analysis) or genomic locations (DNA copy number analysis) are highly correlated. Classical model selection methods such as penalized logistic regression or random forest become unstable in the presence of high feature correlations. Sophisticated penalties such as group Lasso or fused Lasso can force the models to assign similar weights to correlated features and thus improve model stability and interpretability. In this article, we show that the measures of feature relevance corresponding to the above-mentioned methods are biased such that the weights of the features belonging to groups of correlated features decrease as the sizes of the groups increase, which leads to incorrect model interpretation and misleading feature ranking.
Preserving Charge and Oxidation State of Au(III) Ions in an Agent-functionalized Nanocrystal Model System
ACS Nano. Aug, 2011 | Pubmed ID: 21736315
Supporting functional molecules on crystal facets is an established technique in nanotechnology. To preserve the original activity of ionic metallorganic agents on a supporting template, conservation of the charge and oxidation state of the active center is indispensable. We present a model system of a metallorganic agent that, indeed, fulfills this design criterion on a technologically relevant metal support with potential impact on Au(III)-porphyrin-functionalized nanoparticles for an improved anticancer-drug delivery. Employing scanning tunneling microscopy and -spectroscopy in combination with photoemission spectroscopy, we clarify at the single-molecule level the underlying mechanisms of this exceptional adsorption mode. It is based on the balance between a high-energy oxidation state and an electrostatic screening-response of the surface (image charge). Modeling with first principles methods reveals submolecular details of the metal-ligand bonding interaction and completes the study by providing an illustrative electrostatic model relevant for ionic metalorganic agent molecules, in general.
HIV-1 Mutational Pathways Under Multidrug Therapy
AIDS Research and Therapy. 2011 | Pubmed ID: 21794106
Peptidomimetic Escape Mechanisms Arise Via Genetic Diversity in the Ligand-Binding Site of the Hepatitis C Virus NS3/4A Serine Protease
Gastroenterology. Dec, 2011 | Pubmed ID: 22155364
BACKGROUND & AIMS: It is a challenge to develop direct-acting antiviral agents that target the nonstructural protein 3/4A protease of hepatitis C virus because resistant variants develop. Ketoamide compounds, designed to mimic the natural protease substrate, have been developed as inhibitors. However, clinical trials have revealed rapid selection of resistant mutants, most of which are considered to be pre-existing variants. METHODS: We identified residues near the ketoamide-binding site in x-ray structures of the genotype 1a protease, co-crystallized with boceprevir or a telaprevir-like ligand, and then identified variants at these positions in 219 genotype-1 sequences from a public database. We used side-chain modeling to assess the potential effects of these variants on the interaction between ketoamide and the protease, and compared these results with the phenotypic effects on ketoamide resistance, RNA replication capacity, and infectious virus yields in a cell culture model of infection. RESULTS: Thirteen natural binding-site variants with potential for ketoamide resistance were identified at 10 residues in the protease, near the ketoamide binding site. Rotamer analysis of amino acid side-chain conformations indicated that 2 variants (R155K and D168G) could affect binding of telaprevir more than boceprevir. Measurements of antiviral susceptibility in cell-culture studies were consistent with this observation. Four variants (ie, Q41H, I132V, R155K, and D168G) caused low-to-moderate levels of ketoamide resistance; 3 of these were highly fit (Q41H, I132V, and R155K). CONCLUSIONS: Using a comprehensive sequence and structure-based analysis, we showed how natural variation in the hepatitis C virus protease nonstructural protein 3/4A sequences might affect susceptibility to first-generation direct-acting antiviral agents. These findings increase our understanding of the molecular basis of ketoamide resistance among naturally existing viral variants.
Genomic Distribution and Inter-sample Variation of Non-CpG Methylation Across Human Cell Types
PLoS Genetics. Dec, 2011 | Pubmed ID: 22174693
DNA methylation plays an important role in development and disease. The primary sites of DNA methylation in vertebrates are cytosines in the CpG dinucleotide context, which account for roughly three quarters of the total DNA methylation content in human and mouse cells. While the genomic distribution, inter-individual stability, and functional role of CpG methylation are reasonably well understood, little is known about DNA methylation targeting CpA, CpT, and CpC (non-CpG) dinucleotides. Here we report a comprehensive analysis of non-CpG methylation in 76 genome-scale DNA methylation maps across pluripotent and differentiated human cell types. We confirm non-CpG methylation to be predominantly present in pluripotent cell types and observe a decrease upon differentiation and near complete absence in various somatic cell types. Although no function has been assigned to it in pluripotency, our data highlight that non-CpG methylation patterns reappear upon iPS cell reprogramming. Intriguingly, the patterns are highly variable and show little conservation between different pluripotent cell lines. We find a strong correlation of non-CpG methylation and DNMT3 expression levels while showing statistical independence of non-CpG methylation from pluripotency associated gene expression. In line with these findings, we show that knockdown of DNMT3A and DNMT3B in hESCs results in a global reduction of non-CpG methylation. Finally, non-CpG methylation appears to be spatially correlated with CpG methylation. In summary, these results contribute further to our understanding of cytosine methylation patterns in human cells using a large representative sample set.
Endogenous or Exogenous Spreading of HIV-1 in Nordrhein-Westfalen, Germany, Investigated by Phylodynamic Analysis of the RESINA Study Cohort
Medical Microbiology and Immunology. Jan, 2012 | Pubmed ID: 22262052
HIV's genetic instability means that sequence similarity can illuminate the underlying transmission network. Previous application of such methods to samples from the United Kingdom has suggested that as many as 86% of UK infections arose outside of the country, a conclusion contrary to usual patterns of disease spread. We investigated transmission networks in the RESINA cohort, a 2,747-member sample from Nordrhein-Westfalen, Germany, sequenced at therapy start. Transmission networks were determined by thresholding the pairwise genetic distance in the pol gene at 96.8% identity. At first glance, the results concurred with the UK studies. Closer examination revealed four large and growing transmission networks that encompassed all major transmission groups. One of these formed a supercluster containing 71% of the men who have sex with men (MSM) subjects when the network was thresholded at levels roughly equivalent to those used in the UK studies, though methodological differences suggest that this threshold may be too generous in the current data. Examination of the endo- versus exogenesis hypothesis by testing whether infections that were exogenous to Cologne or to Dusseldorf were endogenous to the greater region supported endogenous spread in MSM subjects and exogenous spread in the endemic transmission group. In subjects from the intravenous drug using group, the result depended on viral strain, with subtype B sequences appearing to have an origin exogenous to the RESINA data, while non-B sequences (primarily subtype A) were almost completely endogenous to their local community. These results suggest that, at least in Germany, the question of endogenous versus exogenous linkages depends on subject group.
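The thresholding step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes pre-aligned sequences of equal length, measures identity as the plain fraction of matching positions, and treats each connected component of the resulting graph as one putative transmission network. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def transmission_clusters(seqs, threshold=96.8):
    """Link every pair with identity >= threshold and return the connected
    components, largest first. `seqs` maps a sample name to its sequence."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Toy data (not real pol sequences): s1 and s2 differ at 2 of 100 positions.
toy = {"s1": "A" * 100, "s2": "A" * 98 + "CC", "s3": "G" * 100}
clusters = transmission_clusters(toy)
```

Lowering the threshold merges more samples into the same component, which is why the abstract cautions that a generous threshold can produce a single supercluster.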
Bioinformatical Assistance of Selecting Anti-HIV Therapies: Where Do We Stand?
Intervirology. 2012 | Pubmed ID: 22286878
In this opinion statement, we give a critical synopsis of the state-of-the-art of bioinformatic HIV resistance analysis and point out what we consider to be challenges and perspectives.
Geno2pheno[454]: A Web Server for the Prediction of HIV-1 Coreceptor Usage from Next-Generation Sequencing Data
Intervirology. 2012 | Pubmed ID: 22286879
Inferring HIV-1 coreceptor usage from a genotype is becoming more and more important for the appropriate treatment of long-term patients. While results are already encouraging where standard bulk-nucleic acid sequencing methods are used, they are limited with respect to the detection of minor variants. In contrast, next-generation sequencing methods (ultradeep sequencing, pyrosequencing) are capable of sequencing virus quasispecies at very low quantities. However, as well as being very expensive, these methods generate vast amounts of data such that sequence analysis has to be automated by computer assistance. Here, we describe the geno2pheno[454] system which handles all processing and prediction steps involved in the prediction of coreceptor usage from massively parallel sequencing data. The system is split into a JAVA preprocessor which is run locally on the client side and a Web server which generates the prediction results. Predictions are based on the same prediction method as used in the geno2pheno[coreceptor] tool.
Predicting Response to Antiretroviral Treatment by Machine Learning: The EuResist Project
Intervirology. 2012 | Pubmed ID: 22286881
For a long time, the clinical management of antiretroviral drug resistance was based on sequence analysis of the HIV genome followed by estimating drug susceptibility from the mutational pattern that was detected. The large number of anti-HIV drugs and HIV drug resistance mutations has prompted the development of computer-aided genotype interpretation systems, typically comprising rules handcrafted by experts via careful examination of in vitro and in vivo resistance data. More recently, machine learning approaches have been applied to establish data-driven engines able to indicate the most effective treatments for any patient and virus combination. Systems of this kind, currently including the Resistance Response Database Initiative and the EuResist engine, must learn from the large data sets of patient histories and can provide an objective and accurate estimate of the virological response to different antiretroviral regimens. The EuResist engine was developed by a European consortium of HIV and bioinformatics experts and compares favorably with the most commonly used genotype interpretation systems and HIV drug resistance experts. Next-generation treatment response prediction engines may valuably assist the HIV specialist in the challenging task of establishing effective regimens for patients harboring drug-resistant virus strains. The extensive collection and accurate processing of increasingly large patient data sets are eagerly awaited to further train and translate these systems from prototype engines into real-life treatment decision support tools.
Risk Factors Associated with Older Age in Treatment-Naive HIV-Positive Patients
Intervirology. 2012 | Pubmed ID: 22286885
Background: Older HIV patients are defined as aged 50 years and older. This group is a growing population in developed countries. In order to improve care for older HIV patients, we intended to gain insight into the specific features of transmission, epidemiology, immunology and antiretroviral treatment (ART) of this population. Patients and Methods: All patients from the RESINA cohort were analyzed, comprising 2,085 individuals at the beginning of 2010. RESINA is an ongoing study analyzing epidemiological and immunological data, resistance patterns and therapeutic data in treatment-naive HIV-positive patients from North Rhine-Westphalia, Germany. Patients are included in the RESINA cohort at the time of the intended start of ART. For statistical evaluation, we used χ² and Mann-Whitney U tests. Results: A total of 14.6% of patients in our cohort were above 50 years. Men were significantly more prevalent among older patients (86.8 vs. 78.6%; p < 0.001). The proportion of older patients was significantly higher in the heterosexual group (30%) as compared to bisexual (20%), homosexual (13%) and intravenous drug user (4%) modes of transmission (p < 0.001). When comparing ethnic groups, older patients were most often found among Caucasians (17 vs. 4% in other groups, p < 0.001). No significant difference for transmitted drug resistance patterns was found. The proportion of older patients with CDC stage A was significantly lower than with stages B or C (10 vs. 21 vs. 21%, p < 0.001). In older patients, changes of ART regimens were more frequent (p = 0.015) and the median CD4 cell count at the start of treatment was lower (176 vs. 200/μl, p = 0.017). After 72 weeks of ART, the relative increase of CD4 cells was significantly lower in older as compared to younger patients (200 vs. 231/μl, p = 0.017). Conclusions: Our results provide insight into the epidemiology of HIV in the elderly. In our cohort, the typical older patient was a Caucasian male who had acquired HIV through heterosexual contact. The prognosis in older patients is worsened as a result of several unfavorable circumstances, such as delayed start of ART, more frequent treatment changes and diminished immune reconstitution. As a consequence, better strategies for more frequent HIV testing in patients at risk for HIV are needed, and ART should be offered to older patients at earlier time points and higher CD4 cell counts.
Epidemiology of Transmitted Drug Resistance in Chronically HIV-Infected Patients in Germany: The RESINA Study 2001-2009
Intervirology. 2012 | Pubmed ID: 22286886
Objectives: Transmitted HIV drug resistance may impair treatment efficacy of combination antiretroviral therapy (ART). This study describes the epidemiology of transmitted resistance in chronically infected patients. Methods: In a prospective multicenter trial in Nordrhein-Westfalen, Germany, transmitted drug resistance was determined by genotypic resistance testing in patients on initiation of first-line ART. Results: From 2001 to 2009, 2,078 patients were enrolled in the study. 79.9% were male, 81.2% were Caucasians, and a homosexual transmission mode was found in 51.3%. Of these patients, 41.5% were at the stage of AIDS, median CD4 cell count was 230/μl, and median viral load was 64,466 copies/ml. Transmitted drug resistance mutations were seen in 9.2% (95% CI, 7.9-10.4). Resistance in the nucleoside reverse transcriptase inhibitor class was found in 5.8% (4.8-6.8), in the nonnucleoside reverse transcriptase inhibitor class in 2.8% (2.1-3.6), and in the protease inhibitor class in 2.7% (2.0-3.4). After a continuous increase to a level above 10% in the years 2006 and 2007, a decline of drug resistance prevalence followed in 2008 and 2009. Conclusions: Transmitted HIV drug resistance was found in around 10% of chronically infected patients in Germany who started their ART. We showed a moderate decline of the prevalence of mutant virus strains in recent years. Further surveillance of this phenomenon is mandatory.
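For readers who want to see how an interval of this kind arises from a prevalence and a sample size, the sketch below applies the normal-approximation (Wald) formula to the rounded figures quoted in the abstract above (9.2% of 2,078 patients). This is an assumption made for illustration: the abstract does not state which interval method the authors used, and the exact number of patients with resistance is not given.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    p_hat: observed proportion (0..1), n: sample size,
    z: standard normal quantile (1.96 for a two-sided 95% interval).
    """
    if not 0.0 <= p_hat <= 1.0 or n <= 0:
        raise ValueError("p_hat must lie in [0, 1] and n must be positive")
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se


# Rounded values taken from the abstract; the method itself is an assumption.
low, high = wald_interval(0.092, 2078)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

With these rounded inputs the interval comes out at roughly 8.0% to 10.4%, close to the reported 7.9-10.4; the small difference at the lower bound is what one would expect from rounding of the inputs or from a different interval method.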
On the Applicability of Elastic Network Normal Modes in Small-Molecule Docking
Journal of Chemical Information and Modeling. Feb, 2012 | Pubmed ID: 22320151
Incorporating backbone flexibility into protein-ligand docking is still a challenging problem. In protein-protein docking, normal mode analysis (NMA) has become increasingly popular as it can be used to describe the collective motions of a biological system, but the question whether NMA can also be useful in predicting the conformational changes observed upon small-molecule binding has only been addressed in a few case studies. Here, we describe a large-scale study on the applicability of NMA for protein-ligand docking using 433 apo/holo pairs of the Astex data sets. Based on sets of the first normal modes from the apo structure, we first generated for each paired holo structure a set of conformations that optimally reproduce its Cα trace w.r.t. the underlying normal mode subspace. Using AutoDock, GOLD, and FlexX we then docked the original ligands into these conformations to assess how the docking performance depends on the number of modes used to reproduce the holo structure. The results of our study indicate that, even for such a best-case scenario, the use of normal mode analysis in small-molecule docking is restricted, and that a general rule on how many modes to use does not seem to exist or at least is not easy to find.
A DNA Methylation Fingerprint of 1628 Human Samples
Genome Research. Feb, 2012 | Pubmed ID: 21613409
Most of the studies characterizing DNA methylation patterns have been restricted to particular genomic loci in a limited number of human samples and pathological conditions. Herein, we present a compromise between an extremely comprehensive study of a human sample population and an intermediate level of resolution of CpGs at the genomic level. We obtained a DNA methylation fingerprint of 1628 human samples in which we interrogated 1505 CpG sites. The DNA methylation patterns revealed show this epigenetic mark to be critical in tissue-type definition and stemness, particularly around transcription start sites that are not within a CpG island. For disease, the generated DNA methylation fingerprints show that, during tumorigenesis, human cancer cells underwent a progressive gain of promoter CpG-island hypermethylation and a loss of CpG methylation in non-CpG-island promoters. Although transformed cells are those in which DNA methylation disruption is more obvious, we observed that other common human diseases, such as neurological and autoimmune disorders, had their own distinct DNA methylation profiles. Most importantly, we provide proof of principle that the DNA methylation fingerprints obtained might be useful for translational purposes by showing that we are able to identify the tumor type origin of cancers of unknown primary origin (CUPs). Thus, the DNA methylation patterns identified across the largest spectrum of samples, tissues, and diseases reported to date constitute a baseline for developing higher-resolution DNA methylation maps and provide important clues concerning the contribution of CpG methylation to tissue identity and its changes in the most prevalent human diseases.
APOBEC3G/F As One Possible Driving Force for Co-receptor Switch of the Human Immunodeficiency Virus-1
Medical Microbiology and Immunology. Feb, 2012 | Pubmed ID: 21573951
Human immunodeficiency virus-1 tropism correlates strongly with the amino acid (aa) composition of the third hypervariable region (V3) of gp120. A shift towards more positively charged aa is seen in viruses that bind CXCR4 rather than CCR5 (X4 vs. R5 strains), especially at positions 11 and 25, where positively charged residues predict X4 viruses (11/25 rule). At the nucleotide level, negatively charged or uncharged aa, e.g., aspartic acid, glutamic acid, and glycine, which are encoded by the triplets GAN (guanine-adenine-any nucleotide) or GGN, are found more often in R5 strains. Positively charged aa such as arginine and lysine, encoded by AAR or AGR (CGN) (R means A or G), are seen more frequently in X4 strains, which led to our hypothesis that a switch from R5 to X4 strains occurs via G-to-A mutations. A total of 1527 V3 sequences from three independent data sets of X4 and R5 strains were analysed with respect to their triplet composition. A higher number of G-containing triplets was found in R5 viruses, whereas X4 strains displayed a higher content of A-containing triplets. These findings support our hypothesis that G-to-A mutations lead to the co-receptor switch from R5 to X4 strains. Causative agents for G-to-A mutations are the deaminases APOBEC3F and APOBEC3G. We therefore hypothesize that these proteins are one driving force facilitating the appearance of X4 variants. G-to-A mutations can lead to a switch from negatively to positively charged aa and a corresponding alteration of the net charge of gp120, resulting in a change of co-receptor usage.
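The 11/25 rule and the G-to-A argument in the abstract above can be made concrete with a small sketch. It rests on several assumptions that are not part of the abstract: positions are counted from 1 on an ungapped V3 amino acid sequence, only arginine (R) and lysine (K) are treated as positively charged, and the function names and toy inputs are hypothetical. The codon assignments used in the comments are from the standard genetic code.

```python
POSITIVE = {"R", "K"}  # arginine and lysine, the residues named in the abstract


def rule_11_25(v3):
    """Classify a V3 amino acid sequence with the 11/25 rule.

    Returns "X4" if position 11 or 25 (1-based) carries a positively
    charged residue, otherwise "R5". Assumes an ungapped sequence.
    """
    if len(v3) < 25:
        raise ValueError("sequence too short to contain position 25")
    return "X4" if v3[10] in POSITIVE or v3[24] in POSITIVE else "R5"


def g_to_a_variants(codon):
    """All codons reachable from `codon` by a single G-to-A substitution."""
    return [codon[:i] + "A" + codon[i + 1:]
            for i, base in enumerate(codon) if base == "G"]


# Toy 35-residue strings, not real V3 loops.
call_x4 = rule_11_25("C" * 10 + "R" + "C" * 24)
call_r5 = rule_11_25("C" * 35)

# GGA (glycine) -> AGA (arginine) or GAA (glutamic acid);
# GAA (glutamic acid) -> AAA (lysine).
from_gga = g_to_a_variants("GGA")
from_gaa = g_to_a_variants("GAA")
```

The codon example shows the direction of the charge change the abstract describes: a single G-to-A substitution can turn an uncharged or negatively charged residue into a positively charged one.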
