Recently discovered broadly neutralizing antibodies have revitalized hopes of developing a universal vaccine against HIV-1. New infections are mainly caused by variants that use only CCR5 for cell entry, whereas CXCR4-using variants can become dominant in later stages of infection.
Recent data suggest important biological roles for oxidative modifications of methylated cytosines, specifically hydroxymethylation, formylation and carboxylation. Several assays are now available for profiling these DNA modifications genome-wide as well as in targeted, locus-specific settings. Here we present BiQ Analyzer HiMod, a user-friendly software tool for sequence alignment, quality control and initial analysis of locus-specific DNA modification data. The software supports four different assay types and guides the user from raw sequence reads to DNA modification statistics and publication-quality plots. BiQ Analyzer HiMod combines the well-established graphical user interface of its predecessor, BiQ Analyzer HT, with new and extended analysis modes. It also includes an updated analysis workspace, an intuitive interface, a custom vector graphics engine and support for additional input and output data formats. The tool is freely available as a stand-alone installation package from http://biq-analyzer-himod.bioinf.mpi-inf.mpg.de/.
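The abstract does not name the four supported assay types. As an assumption for illustration only, one widely used pair is conventional bisulfite sequencing (BS), in which both 5mC and 5hmC read as C, and oxidative bisulfite sequencing (oxBS), in which only 5mC reads as C. The hypothetical function below sketches how per-CpG levels from such a pair can be combined; it is not BiQ Analyzer HiMod's procedure.

```python
def estimate_5hmc(bs_c: int, bs_total: int, oxbs_c: int, oxbs_total: int) -> dict:
    """Naive per-CpG 5mC/5hmC estimate from paired BS and oxBS read counts.

    bs_c, oxbs_c: reads showing an unconverted C at the CpG in each assay.
    bs_total, oxbs_total: reads covering the CpG in each assay.
    """
    if bs_total <= 0 or oxbs_total <= 0:
        raise ValueError("both assays need coverage at this CpG")
    bs_level = bs_c / bs_total        # BS: 5mC and 5hmC both read as C
    mc_level = oxbs_c / oxbs_total    # oxBS: only 5mC reads as C
    # The difference estimates 5hmC; sampling noise can push it below zero, so clip.
    hmc_level = max(0.0, bs_level - mc_level)
    return {"5mC": mc_level, "5hmC": hmc_level}


levels = estimate_5hmc(bs_c=80, bs_total=100, oxbs_c=60, oxbs_total=100)
```

Clipping at zero is a simple design choice; a real analysis would also account for coverage-dependent uncertainty in both measurements.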
The therapy of HIV patients is characterized both by the high genomic diversity of the virus population harbored by the patient and by a large number of therapy options. The virus population is unique to each patient and time point. The large number of therapy options makes it difficult to select an optimal or near-optimal therapy, especially for therapy-experienced patients. In the past decade, computer-based support for therapy selection, which assesses the level of viral resistance to drugs, has become a mainstay in the care of HIV patients. We discuss the properties of available systems and the perspectives of the field.
RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de/). Supported assays include whole-genome bisulfite sequencing, reduced representation bisulfite sequencing, Infinium microarrays and any other protocol that produces high-resolution DNA methylation data. Notable applications of RnBeads include the analysis of epigenome-wide association studies and epigenetic biomarker discovery in cancer cohorts.
Rules-based HIV-1 drug-resistance interpretation (DRI) systems disregard many amino-acid positions of the drug's target protein. The aims of this study are (1) to develop a DRI system based on HIV-1 sequences from clinical practice rather than on hard-to-obtain phenotypes, and (2) to assess the benefit of taking all available amino-acid positions into account for DRI.
More than 12% of human cancers have a viral cofactor. Epidemiological studies of idiopathic human cancers indicate that additional tumor viruses remain to be discovered. Recent advances in sequencing technology have enabled systematic screening of human tumor transcriptomes for viral transcripts. However, technical problems such as the low abundance of viral transcripts in large volumes of sequencing data, viral sequence divergence, and homology between viral and human sequences significantly confound the identification of tumor viruses. We have developed a novel computational approach for detecting viral transcripts in human cancers that takes these confounding factors into account and is applicable to a wide variety of viruses and tumors. We apply the approach to conduct the first systematic search for viruses in neuroblastoma, the most common cancer in infancy. The diverse clinical progression of this disease and related epidemiological and virological findings are highly suggestive of a pathogenic cofactor. However, a viral etiology of neuroblastoma is currently contested. We mapped 14 neuroblastoma transcriptomes, as well as positive and negative controls, to the human genome and all known viral genomes in order to detect both known and unknown viruses. Analysis of controls, comparisons with related methods, and statistical estimates demonstrate the high sensitivity of our approach. Detailed investigation of putative viral transcripts within the neuroblastoma samples did not provide evidence for the presence of any known human virus. Likewise, de novo assembly and analysis of chimeric transcripts did not reveal expression signatures associated with novel human pathogens. Although confounding factors such as sample dilution or viral clearance in progressed tumors may mask viral cofactors in the data, the high sensitivity of our approach and the number of biological replicates analyzed make this less likely in principle. Therefore, our results suggest that frequent viral cofactors of metastatic neuroblastoma are unlikely.
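The core idea of screening tumor transcriptomes, discarding host-derived reads and assigning the remainder to viral references, can be sketched with exact k-mer matching. This is a simplified, hypothetical stand-in, not the authors' mapping-based pipeline: exact k-mers do not tolerate the viral sequence divergence the abstract lists as a confounder, and the names `screen_reads`, `k` and `min_hits` are illustrative.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads, human_ref: str, viral_refs: dict, k: int = 8, min_hits: int = 3) -> dict:
    """Toy host-subtraction screen.

    Reads sharing any k-mer with the human reference are discarded; each
    remaining read is assigned to the viral reference with which it shares
    the most k-mers, provided that count reaches min_hits.
    """
    human = kmers(human_ref, k)
    viral = {name: kmers(seq, k) for name, seq in viral_refs.items()}
    counts = {name: 0 for name in viral_refs}
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers & human:
            continue  # host-derived or host-homologous read
        best_name, best_hits = None, 0
        for name, ref_kmers in viral.items():
            hits = len(read_kmers & ref_kmers)
            if hits > best_hits:
                best_name, best_hits = name, hits
        if best_name is not None and best_hits >= min_hits:
            counts[best_name] += 1
    return counts
```

Discarding any read with a human k-mer is deliberately conservative: it removes host-homologous viral reads too, which mirrors the trade-off between specificity and sensitivity the abstract describes.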
CCR5 and CXCR4 are the two membrane proteins that, along with CD4, mediate the entry of HIV particles into the host cell. HIV strains differ in their ability to use CCR5 or CXCR4, and this specificity, known as viral tropism, is largely determined by the sequence of the V3 loop of the viral envelope protein gp120.
Recurrent DNA breakpoints in cancer genomes indicate the presence of functional elements critical for tumor development. Identifying them can help determine new therapeutic targets. High-dimensional DNA microarray experiments such as arrayCGH allow DNA copy number breakpoints to be identified with high precision, offering a solid basis for the computational estimation of recurrent breakpoint locations.
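The abstract does not describe its estimator. A minimal sketch, assuming each sample has already been segmented into copy number calls, is to extract per-sample breakpoints and count how many samples have a breakpoint in each fixed-size genomic bin. The bin size and `min_samples` threshold are hypothetical parameters, and binning is a crude substitute for a statistical estimate of breakpoint location.

```python
from collections import Counter


def breakpoints(segments):
    """Breakpoints of one sample: positions where the called copy number changes.

    segments: list of (start, end, copy_number) tuples, sorted and non-overlapping.
    Returns the start of every segment whose copy number differs from the previous one.
    """
    return [seg[0] for prev, seg in zip(segments, segments[1:]) if seg[2] != prev[2]]


def recurrent_breakpoints(samples, bin_size=1_000_000, min_samples=2):
    """(bin_start, n_samples) for bins hit by breakpoints in >= min_samples samples."""
    counts = Counter()
    for segments in samples:
        # Count each bin at most once per sample.
        counts.update({bp // bin_size for bp in breakpoints(segments)})
    return sorted((b * bin_size, n) for b, n in counts.items() if n >= min_samples)
```

Fixed bins can split a recurrent breakpoint across a bin boundary; a kernel- or model-based estimate avoids that at the cost of more parameters.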
The primary mode of action of the broad-spectrum antiviral nucleoside ribavirin in the therapy of chronic hepatitis C is currently unresolved. Particularly contested are possible mutagenic effects of ribavirin that may lead to viral extinction by lethal mutagenesis of the hepatitis C virus (HCV) genome. We applied ultradeep sequencing to determine ribavirin-induced sequence changes in the HCV coding region (nucleotides [nt] 330 to 9351) of patients treated with 6-week ribavirin monotherapy (n = 6) in comparison to placebo (n = 6). The mean maximal decline of HCV RNA from baseline was 0.8 log10 IU/ml in ribavirin-treated patients versus 0.1 log10 IU/ml in placebo-treated patients. No general increase in nucleotide substitution rates was observed in ribavirin-treated patients. However, more HCV genome positions with high G-to-A and C-to-U transition rates were detected between baseline and treatment week 6 in ribavirin-treated patients than in placebo-treated patients (0.0041 versus 0.0022 transitions per base pair; P = 0.049). Similarly, sensitive detection of low-frequency minority variants by statistical filtering indicated significantly more positions with G-to-A and C-to-U transitions in ribavirin-treated patients than in placebo-treated patients (0.0331 versus 0.0186 transitions per G/C-containing position at baseline; P = 0.018). In contrast, non-ribavirin-associated A-to-G and U-to-C transitions were not enriched in the ribavirin group (P = 0.152). We conclude that ribavirin exerts a mutagenic effect on the virus in patients with chronic hepatitis C by facilitating G-to-A and C-to-U nucleotide transitions.
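The second rate above is normalized per G/C-containing position at baseline. A minimal sketch of that normalization, assuming two aligned consensus sequences (baseline and week 6) rather than the ultradeep, frequency-filtered variant calls used in the study; the function name is hypothetical.

```python
def ribavirin_type_transition_rate(baseline: str, week6: str) -> float:
    """G-to-A plus C-to-U changes per G/C position of the baseline sequence.

    Both inputs are aligned nucleotide strings of equal length; T is treated as U.
    """
    if len(baseline) != len(week6):
        raise ValueError("sequences must be aligned to equal length")
    before_seq = baseline.upper().replace("T", "U")
    after_seq = week6.upper().replace("T", "U")
    gc_positions = 0
    transitions = 0
    for before, after in zip(before_seq, after_seq):
        if before in "GC":
            gc_positions += 1
            if (before, after) in {("G", "A"), ("C", "U")}:
                transitions += 1
    return transitions / gc_positions if gc_positions else 0.0
```

The same loop with the pairs ("A", "G") and ("U", "C") would give the A-to-G/U-to-C control rate, normalized per A/U position instead.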
The efficacy of highly active antiretroviral therapy (HAART) in the treatment of HIV infection is influenced by factors such as the potency of the applied drugs, patient adherence, and resistance-associated mutations. To date, data on the impact of the therapeutic setting are insufficient.
The relationship between HIV tropism and disease progression, together with the recent development of CCR5-blocking drugs, underscores the importance of monitoring virus coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim at predicting virus tropism based on the sequence and structure of the V3 loop of the viral gp120 protein. Here we present a numerical descriptor of the V3 loop encoding its physicochemical and structural properties. The descriptor allows for structure-based prediction of HIV tropism and identification of properties of the V3 loop that are crucial for coreceptor usage. Using the proposed descriptor for prediction yields a statistically significant improvement over prediction based solely on the V3 sequence: 3 percentage points in AUC and 7 percentage points in sensitivity at the specificity of the 11/25 rule (95%). We additionally assessed the predictive power of the new method on clinically derived bulk sequence data and obtained a statistically significant improvement in AUC of 3 percentage points over sequence-based prediction. Furthermore, we demonstrated the capacity of our method to predict therapy outcome by applying it to 53 samples from patients undergoing Maraviroc therapy. Analysis of the structural features of the loop that are informative of tropism indicates the importance of two loop regions and their physicochemical properties. The regions are located on opposite strands of the loop stem, and the respective features are predominantly charge-, hydrophobicity- and structure-related. These regions are in close proximity in the bound conformation of the loop, potentially forming a determinant site for coreceptor binding. The method is available via a web server at http://structure.bioinf.mpi-inf.mpg.de/.
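The comparison above is reported as AUC and as sensitivity at a fixed specificity (95%, the specificity of the 11/25 rule). A generic sketch of both metrics follows, assuming higher scores mean more X4-like and that a sample is called X4 when its score is strictly above the threshold; the function names are illustrative.

```python
def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of ROC AUC; tied pairs count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_at_specificity(scores_pos, scores_neg, specificity=0.95):
    """Best sensitivity over thresholds whose specificity is >= the target.

    A sample is called positive (X4) when its score is strictly greater than
    the threshold, so every observed score is a candidate threshold.
    """
    best = 0.0
    for t in sorted(set(scores_pos) | set(scores_neg)):
        spec = sum(n <= t for n in scores_neg) / len(scores_neg)
        if spec >= specificity:
            sens = sum(p > t for p in scores_pos) / len(scores_pos)
            best = max(best, sens)
    return best
```

Fixing specificity at that of an established rule makes the comparison direct: any gain in sensitivity is obtained without calling more R5 samples X4 than the reference rule does.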
Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. Deriving linkage information from sequencing chromatograms is not straightforward, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants present in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on sequencing chromatograms of clonal variants and evaluated on two test sets of in vitro mixtures. The approach identified the mixture components with high accuracy: 97.4% on a test set in which the positions to be analyzed are only one base apart, and 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, minor variants with a relative frequency of 10% or less cannot be reliably detected. Second, the model cannot distinguish between mixtures of two or four clonal variants if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repeated sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip.
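To see why clonal chromatograms of every candidate variant are needed, consider a simplified model in which the signal of a mixture is a weighted sum of the signals of its clonal components. The sketch below assumes such a linear mixing model and estimates the weights by brute-force search over a grid on the probability simplex. It is an illustration, not the published computational model; `fit_mixture`, the signal vectors and the grid step are hypothetical.

```python
from itertools import product


def simplex_grid(n_components, step):
    """All weight vectors whose entries are multiples of step and sum to 1."""
    units = round(1 / step)
    for combo in product(range(units + 1), repeat=n_components - 1):
        rest = units - sum(combo)
        if rest >= 0:
            yield tuple(c / units for c in combo) + (rest / units,)


def fit_mixture(observed, clone_signals, step=0.05):
    """Least-squares mixture proportions of clonal signal profiles.

    observed: signal intensities (e.g. normalised peak heights) of the mixture.
    clone_signals: dict variant name -> signal list of the same length, taken
        from a clonal chromatogram of that variant.
    """
    names = list(clone_signals)
    best_err, best_w = float("inf"), None
    for w in simplex_grid(len(names), step):
        err = 0.0
        for k, y in enumerate(observed):
            pred = sum(wi * clone_signals[n][k] for wi, n in zip(w, names))
            err += (y - pred) ** 2
        if err < best_err:
            best_err, best_w = err, w
    return dict(zip(names, best_w))
```

Under this model, if two different mixtures produce the same summed signal (for example, if the signals of variants AC and GT add up to those of AT and GC), no fit can tell them apart, which is plausibly analogous to the linear-constraint limitation noted in the abstract.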
Predicting the transcription start sites (TSSs) of microRNAs (miRNAs) is important for understanding how these small RNA molecules, known to regulate the translation and stability of protein-coding genes, are regulated themselves. Previous approaches are primarily based on genetic features, are trained on TSSs of protein-coding genes, and have low prediction accuracy. Recently, a support vector machine-based technique has been proposed for miRNA TSS prediction that uses known miRNA TSSs to train the classifier, together with a set of existing and novel CpG island-based features. Current progress in epigenetics research has provided genome-wide and tissue-specific reports on various phenotypic traits. We hypothesize that incorporating epigenetic characteristics into statistical models may lead to better prediction of the primary transcripts of human miRNAs. In this paper, we tested our hypothesis on brain-specific miRNAs by using epigenetic as well as genetic features to predict the primary transcripts. For this, we used a sophisticated feature selection technique and a robust classification model. Our prediction model achieves an accuracy of more than 80% and establishes the potential of epigenetic analysis for in silico prediction of TSSs.
Maraviroc (MVC) is the first licensed antiretroviral drug from the class of coreceptor antagonists. It binds to the host coreceptor CCR5, which is used by the majority of HIV strains to infect human immune cells (Fig. 1). Other HIV isolates use a different coreceptor, CXCR4. Which coreceptor is used is determined by the viral Env protein (Fig. 2). Depending on the coreceptor used, the viruses are classified as R5 or X4, respectively. By binding to CCR5, MVC inhibits the entry of R5 viruses into the target cell. During the course of disease, X4 viruses may emerge and outgrow the R5 viruses. Determination of coreceptor usage (also called tropism) is therefore mandatory prior to administration of MVC, as required by the EMA and FDA. The MVC efficacy studies MOTIVATE, MERIT and 1029 were performed with the Trofile assay from Monogram (San Francisco, U.S.A.). This is a high-quality assay based on sophisticated recombinant tests. Acceptance of this test for routine use is rather low outside the U.S.A., since European physicians tend to work with decentralized expert laboratories that also provide concomitant resistance testing. These laboratories have undergone several quality assurance evaluations, the most recent presented in 2011. For several years now, we have performed tropism determinations based on sequence analysis of the HIV env V3 region (V3). This region carries enough information to perform a reliable prediction. Genotypic determination of coreceptor usage offers several advantages: a shorter turnaround time (equivalent to resistance testing), lower costs, the possibility to adapt the results to the patient's needs, and the possibility of analysing clinical samples with very low or even undetectable viral load (VL), which is particularly relevant because the number of samples analysed with VL < 1000 copies/ml has increased in recent years (Fig. 3). The main steps of tropism testing (Fig. 4) demonstrated in this video are: (1) collection of a blood sample; (2) isolation of HIV RNA from plasma and/or HIV proviral DNA from blood mononuclear cells; (3) amplification of the env region; (4) amplification of the V3 region; (5) sequencing reaction of the V3 amplicon; (6) purification of the sequencing samples; (7) sequencing of the purified samples; (8) sequence editing; and (9) interpretation of the sequencing data and tropism prediction.
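For the last step, the simplest sequence-based interpretation is the 11/25 rule: a V3 loop is predicted X4 if position 11 or 25 carries a positively charged residue (R or K). The sketch assumes the V3 amino-acid sequence has already been aligned to a 35-residue numbering from the first cysteine; real V3 loops vary in length, so this alignment step is a simplification. The net-charge helper uses a common approximation (R/K +1, D/E -1, histidine ignored) and is included for illustration only.

```python
POSITIVE = set("RK")
NEGATIVE = set("DE")


def v3_11_25_rule(v3: str) -> str:
    """Classify an aligned 35-residue V3 loop as 'X4' or 'R5' by the 11/25 rule."""
    if len(v3) != 35:
        raise ValueError("expected an aligned 35-residue V3 sequence")
    pos11, pos25 = v3[10], v3[24]  # 1-based positions 11 and 25
    return "X4" if pos11 in POSITIVE or pos25 in POSITIVE else "R5"


def v3_net_charge(v3: str) -> int:
    """Simplified net charge: +1 per R/K, -1 per D/E (histidine ignored)."""
    return sum(aa in POSITIVE for aa in v3) - sum(aa in NEGATIVE for aa in v3)
```

In routine practice, laboratories typically use dedicated bioinformatic predictors, which are more sensitive than this rule; the rule is shown because it makes the link between V3 charge and tropism explicit.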
Supporting functional molecules on crystal facets is an established technique in nanotechnology. To preserve the original activity of ionic metalorganic agents on a supporting template, conservation of the charge and oxidation state of the active center is indispensable. We present a model system of a metalorganic agent that fulfills this design criterion on a technologically relevant metal support, with potential impact on Au(III)-porphyrin-functionalized nanoparticles for improved anticancer-drug delivery. Employing scanning tunneling microscopy and spectroscopy in combination with photoemission spectroscopy, we clarify at the single-molecule level the mechanisms underlying this exceptional adsorption mode. It is based on the balance between a high-energy oxidation state and an electrostatic screening response of the surface (image charge). Modeling with first-principles methods reveals submolecular details of the metal-ligand bonding interaction and completes the study by providing an illustrative electrostatic model relevant to ionic metalorganic agent molecules in general.
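For orientation, the textbook image-charge picture gives the scale of the stabilization a metal surface can provide. A point charge q at distance z above an ideal planar conductor interacts with its image (-q at -z) with energy U = -q^2 / (16 pi eps0 z). The sketch evaluates this in eV; it ignores the finite screening length, the extended molecular charge distribution and all first-principles details, and is not the authors' electrostatic model.

```python
# e^2 / (4 * pi * eps0) in eV * Angstrom (approximately 14.3996).
COULOMB_EV_ANGSTROM = 14.399645


def image_charge_energy_ev(charge_e: float, distance_angstrom: float) -> float:
    """Classical image-charge interaction energy of a point charge near a metal.

    U = -q^2 / (16 * pi * eps0 * z) = -(q^2 / (4 * pi * eps0)) / (4 * z),
    with q in units of the elementary charge and z in Angstrom.
    """
    if distance_angstrom <= 0:
        raise ValueError("distance must be positive")
    return -COULOMB_EV_ANGSTROM * charge_e ** 2 / (4.0 * distance_angstrom)
```

The point-charge expression diverges as z approaches zero, which is one reason a quantitative treatment of adsorbed molecules requires first-principles modeling; the sketch only shows that image-charge energies near a surface are on the order of 1 eV per elementary charge at a few Angstrom.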
It is challenging to develop direct-acting antiviral agents that target the nonstructural protein 3/4A protease of hepatitis C virus because resistant variants emerge. Ketoamide compounds, designed to mimic the natural protease substrate, have been developed as inhibitors. However, clinical trials have revealed rapid selection of resistant mutants, most of which are considered to be pre-existing variants.
Most studies characterizing DNA methylation patterns have been restricted to particular genomic loci in a limited number of human samples and pathological conditions. Herein, we present a compromise between an extremely comprehensive study of a human sample population and an intermediate level of resolution of CpGs at the genomic level. We obtained a DNA methylation fingerprint of 1628 human samples, interrogating 1505 CpG sites. The DNA methylation patterns revealed show this epigenetic mark to be critical in tissue-type definition and stemness, particularly around transcription start sites that are not within a CpG island. Regarding disease, the generated DNA methylation fingerprints show that, during tumorigenesis, human cancer cells undergo a progressive gain of promoter CpG-island hypermethylation and a loss of CpG methylation in non-CpG-island promoters. Although DNA methylation disruption is most obvious in transformed cells, we observed that other common human diseases, such as neurological and autoimmune disorders, had their own distinct DNA methylation profiles. Most importantly, we provide proof of principle that the DNA methylation fingerprints obtained might be useful for translational purposes by showing that we are able to identify the tumor type of origin of cancers of unknown primary origin (CUPs). Thus, the DNA methylation patterns identified across the largest spectrum of samples, tissues, and diseases reported to date constitute a baseline for developing higher-resolution DNA methylation maps and provide important clues concerning the contribution of CpG methylation to tissue identity and its changes in the most prevalent human diseases.
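The abstract does not state which classifier was used to assign CUPs to a tumor type of origin. One simple, hypothetical approach that illustrates the idea is a nearest-centroid classifier on per-CpG methylation values (beta values between 0 and 1): average the profiles of each known tumor type, then assign an unknown sample to the closest average.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def centroids(profiles, labels):
    """Mean methylation profile (per-CpG beta values) for each tumour type."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_origin(profile, class_centroids):
    """Assign the tumour type whose centroid is closest in Euclidean distance."""
    return min(class_centroids, key=lambda lab: dist(profile, class_centroids[lab]))
```

Nearest-centroid classification is attractive with a modest panel of CpGs because it has no tuning parameters, but it assumes that each tumor type forms a single, compact methylation cluster.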
Classification and feature selection in genomics or transcriptomics data are often hampered by the large number of features compared with the small number of available samples. Moreover, features represented by probes that have either similar molecular functions (gene expression analysis) or similar genomic locations (DNA copy number analysis) are highly correlated. Classical model selection methods such as penalized logistic regression or random forest become unstable in the presence of high feature correlations. Sophisticated penalties such as group Lasso or fused Lasso can force the models to assign similar weights to correlated features and thus improve model stability and interpretability. In this article, we show that the measures of feature relevance corresponding to these methods are biased: the weights of features belonging to groups of correlated features decrease as the group size increases, which leads to incorrect model interpretation and misleading feature ranking.
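The dilution effect can be seen even in a simpler model than those analyzed in the article. As a swapped-in illustration (ridge regression is not among the methods named above), the sketch fits ridge by coordinate descent on data in which one informative feature is duplicated k times. For identical copies the per-copy weight is x.y / (k x.x + lambda), so each copy's weight falls as the group grows, although the underlying signal is unchanged. `weight_per_copy` and its toy data are hypothetical.

```python
def ridge_coordinate_descent(X, y, lam=1.0, n_iter=500):
    """Ridge regression (squared loss + lam * ||beta||^2) by exact coordinate updates.

    X: list of sample rows; y: list of targets. No intercept is fitted.
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    cols = [[row[j] for row in X] for j in range(n_features)]
    col_sq = [sum(v * v for v in col) for col in cols]
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(n_features):
            # Partial residual without feature j, then solve for beta_j exactly.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(cols[j], residual))
            new = rho / (col_sq[j] + lam)
            delta = new - beta[j]
            if delta:
                residual = [r - delta * c for r, c in zip(residual, cols[j])]
                beta[j] = new
    return beta


def weight_per_copy(group_size, lam=1.0):
    """Ridge weight of one copy when an informative feature appears group_size times."""
    x = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]
    y = [2.0 * v for v in x]           # target depends on the signal only
    X = [[v] * group_size for v in x]  # group_size perfectly correlated copies
    return ridge_coordinate_descent(X, y, lam)[0]
```

Ranking features by individual weight would therefore place each member of a large correlated group below a lone feature carrying the same signal, which is the kind of misleading ranking the article describes.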
HIV-1 tropism correlates strongly with the amino acid (aa) composition of the third hypervariable region (V3) of gp120. A shift towards more positively charged aa is seen in strains binding CXCR4 compared with CCR5 (X4 vs. R5 strains), especially at positions 11 and 25, where positively charged residues predict X4 viruses (11/25 rule). At the nucleotide level, negatively charged or uncharged aa such as aspartic acid, glutamic acid and glycine, which are encoded by the triplets GAN (guanine-adenine-any nucleotide) or GGN, are found more often in R5 strains. Positively charged aa such as lysine (AAR) and arginine (AGR or CGN), where R means A or G, are seen more frequently in X4 strains, which led to our hypothesis that a switch from R5 to X4 strains occurs via G-to-A mutations. A total of 1527 V3 sequences from three independent data sets of X4 and R5 strains were analysed with respect to their triplet composition. A higher number of G-containing triplets was found in R5 viruses, whereas X4 strains displayed a higher content of A-containing triplets. These findings also support our hypothesis that G-to-A mutations lead to the coreceptor switch from R5 to X4 strains. The deaminases APOBEC3F and APOBEC3G are causative agents of G-to-A mutations. We therefore hypothesize that these proteins are one driving force facilitating the appearance of X4 variants. G-to-A mutations can lead to a switch from negatively to positively charged aa and a corresponding alteration of the net charge of gp120, resulting in a change of coreceptor usage.
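The triplet analysis and the proposed mechanism can both be made concrete in a few lines. The sketch counts in-frame codons containing G or A, and shows how a G-to-A change at the first codon position turns acidic or neutral residues into basic or neutral ones (for example GAA Glu to AAA Lys, GGA Gly to AGA Arg). The codon table covers only the codons needed here, and the function names are illustrative.

```python
# Standard genetic code, restricted to G-initial codons with A or G second
# and their first-position G-to-A mutants (not a full table).
CODON_TO_AA = {
    "GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K",
    "AGT": "S", "AGC": "S", "AGA": "R", "AGG": "R",
}


def triplet_composition(nt_seq: str) -> dict:
    """Count in-frame codons that contain G and codons that contain A.

    A codon containing both bases is counted in both groups; a trailing
    partial codon is ignored.
    """
    seq = nt_seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return {
        "codons": len(codons),
        "G_containing": sum("G" in c for c in codons),
        "A_containing": sum("A" in c for c in codons),
    }


def first_position_g_to_a(codon: str):
    """(original aa, mutant aa) for a G-to-A mutation at codon position 1."""
    codon = codon.upper()
    if codon[0] != "G":
        raise ValueError("codon does not start with G")
    return CODON_TO_AA[codon], CODON_TO_AA["A" + codon[1:]]
```

Both GAN codons for acidic residues and GGN codons for glycine become AAN or AGN codons after this single change, which is consistent with the hypothesized link between G-to-A hypermutation and increased V3 net charge.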
Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. Although genotypic prediction is highly predictive on clonal samples, its sensitivity for CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk sequencing. Massively parallel sequencing (MPS) detects single clones and is therefore much more sensitive. Using this technology, we aimed to improve genotypic prediction of coreceptor usage.
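With MPS, each read can be classified individually and the sample called from the proportion of X4 reads, so that minority X4 populations missed by bulk sequencing become visible. A minimal sketch, assuming per-read calls are already available from some genotypic predictor; the 2% cutoff is a hypothetical placeholder, not a validated threshold.

```python
def x4_fraction(read_calls):
    """Fraction of X4 calls among reads classified as X4 or R5."""
    calls = [c for c in read_calls if c in ("X4", "R5")]
    if not calls:
        raise ValueError("no classified reads")
    return calls.count("X4") / len(calls)


def sample_tropism(read_calls, cutoff=0.02):
    """Call the sample X4-capable if the X4 read fraction reaches the cutoff.

    The default cutoff is a hypothetical placeholder for illustration.
    """
    return "X4" if x4_fraction(read_calls) >= cutoff else "R5"
```

The cutoff trades sensitivity for robustness against sequencing errors and misclassified reads; in practice it would be chosen against phenotypic or clinical outcome data.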
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data.
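The quantitative readout described above rests on a simple principle: after bisulfite conversion, unmethylated cytosines read as T while methylated cytosines remain C, so the methylation level at a CpG is C / (C + T) across covering reads. The sketch assumes forward-strand reads already aligned without indels and given as (offset, sequence) pairs; it omits alignment, quality filtering and conversion-rate checks and is not BiQ Analyzer HT's implementation.

```python
def cpg_methylation(reference: str, reads):
    """Per-CpG methylation levels from aligned forward-strand bisulfite reads.

    reference: unconverted genomic sequence of the locus.
    reads: (offset, sequence) pairs aligned to the reference without indels.
    Returns {reference position of the CpG cytosine: C / (C + T)}.
    """
    ref = reference.upper()
    cpgs = [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]
    counts = {i: [0, 0] for i in cpgs}  # position -> [methylated, unmethylated]
    for offset, seq in reads:
        for k, base in enumerate(seq.upper()):
            pos = offset + k
            if pos in counts:
                if base == "C":
                    counts[pos][0] += 1
                elif base == "T":
                    counts[pos][1] += 1
    return {pos: m / (m + u) for pos, (m, u) in counts.items() if m + u > 0}
```

Bases other than C or T at a CpG (for example sequencing errors or SNPs) are ignored rather than counted, which keeps the estimate a proportion over informative reads only.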
The main determinants of HIV-1 coreceptor usage are located in the V3 loop of gp120, although relevant mutations in V2 and gp41 have also been described. Incorporating V2 is known to improve prediction algorithms; however, this has not been confirmed for gp41 mutations.
DNA methylation plays an important role in development and disease. The primary sites of DNA methylation in vertebrates are cytosines in the CpG dinucleotide context, which account for roughly three quarters of the total DNA methylation content in human and mouse cells. While the genomic distribution, inter-individual stability, and functional role of CpG methylation are reasonably well understood, little is known about DNA methylation targeting CpA, CpT, and CpC (non-CpG) dinucleotides. Here we report a comprehensive analysis of non-CpG methylation in 76 genome-scale DNA methylation maps across pluripotent and differentiated human cell types. We confirm that non-CpG methylation is predominantly present in pluripotent cell types, and we observe a decrease upon differentiation and near-complete absence in various somatic cell types. Although no function has been assigned to it in pluripotency, our data highlight that non-CpG methylation patterns reappear upon iPS cell reprogramming. Intriguingly, the patterns are highly variable and show little conservation between different pluripotent cell lines. We find a strong correlation between non-CpG methylation and DNMT3 expression levels, while non-CpG methylation is statistically independent of pluripotency-associated gene expression. In line with these findings, we show that knockdown of DNMT3A and DNMT3B in hESCs results in a global reduction of non-CpG methylation. Finally, non-CpG methylation appears to be spatially correlated with CpG methylation. In summary, these results further our understanding of cytosine methylation patterns in human cells using a large, representative sample set.
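Comparing CpG with non-CpG methylation requires stratifying cytosines by the base that follows them. A minimal sketch, assuming forward-strand bisulfite reads aligned without indels as (offset, sequence) pairs: each reference cytosine is assigned to its CpG, CpA, CpC or CpT context, and a read base of C counts as methylated, T as unmethylated. The function name and input format are illustrative assumptions.

```python
def context_methylation(reference: str, reads):
    """Mean methylation per dinucleotide context (CG, CA, CC, CT).

    reference: unconverted sequence of the locus.
    reads: (offset, sequence) pairs aligned to the reference without indels.
    """
    ref = reference.upper()
    totals = {ctx: [0, 0] for ctx in ("CG", "CA", "CC", "CT")}  # [methylated, unmethylated]
    for offset, seq in reads:
        for k, base in enumerate(seq.upper()):
            pos = offset + k
            # Skip non-C reference positions and the final base (no context).
            if pos + 1 >= len(ref) or ref[pos] != "C":
                continue
            ctx = ref[pos:pos + 2]
            if ctx in totals and base in ("C", "T"):
                totals[ctx][0 if base == "C" else 1] += 1
    return {ctx: m / (m + u) for ctx, (m, u) in totals.items() if m + u}
```

Because non-CpG methylation levels are typically low, incomplete bisulfite conversion inflates them noticeably; a real analysis would first estimate and correct for the conversion rate.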
The high number of Turkish immigrants in the German state of North Rhine-Westphalia (NRW) prompted us to look for HIV-infected patients of Turkish nationality. In the AREVIR database, we found 127 Turkish HIV patients (107 men, 20 women) living in NRW. To investigate transmission clusters and their correlation with gender, nationality and self-reported transmission mode, we performed a phylogenetic analysis of pol gene sequences. Subtype distribution and the number of HIV drug resistance mutations in the Turkish patient group were similar to those in non-Turkish patients. Marked differences were observed in the self-reported mode of transmission in the heterosexual Turkish male subgroup. A neighbour-joining tree of pol gene sequences indicated that 59% of these reported heterosexual transmissions cluster with those of men who have sex with men in the database. This is the first study analysing HIV subtype distribution, drug resistance mutations and transmission mode in a Turkish immigrant population.
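The neighbour-joining step can be sketched from first principles: compute pairwise distances between aligned sequences, then repeatedly join the pair minimizing the Q-criterion until three nodes remain. This minimal version uses uncorrected p-distances and returns only the unrooted topology without branch lengths; production analyses typically use model-corrected distances, bootstrap support and dedicated phylogenetics software. Function names are illustrative.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Unrooted neighbour-joining topology as a Newick-like string (no branch lengths).

    names: taxon labels; dist: symmetric dict-of-dicts of pairwise distances.
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}  # working copy that gains new nodes
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][j] for j in nodes if j != i) for i in nodes}
        best = None
        for idx, i in enumerate(nodes):
            for j in nodes[idx + 1:]:
                q = (n - 2) * d[i][j] - r[i] - r[j]  # NJ Q-criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        new = f"({i},{j})"
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Three (or fewer) nodes remain: join them at a single central node.
    return "(" + ",".join(nodes) + ")"
```

Identifying which self-reported heterosexual transmissions cluster with MSM sequences would then require defining clusters on the tree (for example by support and distance thresholds), which this sketch does not attempt.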
Genotype-derived drug resistance profiles are a valuable asset in HIV-1 therapy decisions. Therapy decisions could be further improved, both in predicting the duration of current therapy success and in preserving follow-up therapy options, through better knowledge of mutational pathways, here defined as specific locations on the viral genome which, when mutant, alter the risk that additional specific mutations arise. We limit the search to locations in the reverse transcriptase region of the HIV-1 genome that host resistance mutations to nucleoside (NRTI) and non-nucleoside (NNRTI) reverse transcriptase inhibitors (as listed in the 2008 International AIDS Society report), or that were mutant at therapy start in 5% or more of the therapies studied.
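The definition of a mutational pathway above (a location whose mutant state alters the risk that another specific mutation arises) can be quantified most simply as a relative risk. The sketch assumes per-therapy records of mutations present at start and mutations arising during therapy; it is a crude illustration, not the statistical model used in the study, and the mutation labels in the usage example are illustrative data only.

```python
def relative_risk(records, a, b):
    """Relative risk that mutation b arises when mutation a is present at therapy start.

    records: iterable of (mutations_at_start, mutations_arising) set pairs,
        one per therapy. Therapies already carrying b at start are skipped.
    Returns P(b arises | a at start) / P(b arises | a absent at start).
    """
    with_a = [0, 0]     # [therapies in which b arose, therapies]
    without_a = [0, 0]
    for start, arising in records:
        if b in start:
            continue
        group = with_a if a in start else without_a
        group[1] += 1
        group[0] += b in arising
    if with_a[1] == 0 or without_a[1] == 0 or without_a[0] == 0:
        raise ValueError("insufficient data for a relative risk")
    return (with_a[0] / with_a[1]) / (without_a[0] / without_a[1])
```

A plain relative risk ignores therapy duration and the drugs administered; a time-to-event model adjusted for regimen would be the natural refinement.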
Hepatitis C virus (HCV) is a major causative agent of chronic liver disease in humans. To gain insight into host factor requirements for HCV replication, we performed a siRNA screen of the human kinome and identified 13 different kinases, including phosphatidylinositol-4 kinase III alpha (PI4KIIIα), as being required for HCV replication. Consistent with elevated levels of the PI4KIIIα product phosphatidylinositol-4-phosphate (PI4P) detected in HCV-infected cultured hepatocytes and liver tissue from chronic hepatitis C patients, the enzymatic activity of PI4KIIIα was critical for HCV replication. Viral nonstructural protein 5A (NS5A) was found to interact with PI4KIIIα and stimulate its kinase activity. The absence of PI4KIIIα activity induced a dramatic change in the ultrastructural morphology of the membranous HCV replication complex. Our analysis suggests that the direct activation of a lipid kinase by HCV NS5A contributes critically to the integrity of the membranous viral replication complex.
Infections with human immunodeficiency virus type 1 (HIV-1) are treated with combinations of drugs. Unfortunately, HIV responds to treatment by developing resistance mutations. Consequently, the genes encoding the viral target proteins are sequenced and inspected for resistance mutations as part of routine diagnostic procedures to ensure effective treatment. For predicting response to a combination therapy, currently available computer-based methods rely on the genotype of the virus and the composition of the regimen as input. However, no available tool takes full advantage of knowledge about the order of, and the response to, previously prescribed regimens. The resulting high-dimensional feature space makes existing methods difficult to apply in a straightforward fashion. The machine learning system proposed in this work, sequence boosting, is tailored to exploit such high-dimensional information, i.e. to extract longitudinal features, by utilizing recent advances in data mining and boosting. When applied to predicting the latest treatment outcome for 3,759 treatment-experienced patients from the EuResist integrated database, sequence boosting achieved superior performance compared to SVMs with RBF kernels. Moreover, sequence boosting allows easy access to the discriminative treatment information. Analysis of the feature importance values provided by our model confirmed known facts about HIV treatment. For instance, application of potent and recently licensed drugs was beneficial for patients, and, conversely, patients who had received NRTI monotherapies in the past had poor treatment prospects today. Furthermore, our model revealed novel biological insights. More precisely, the combination of previously used drugs with their in vivo response is more informative than the previously used drugs alone. Using this information improves the performance of systems for predicting therapy outcome.
One of the key objectives of comparative genomics is the characterization of the forces that shape genomes over the course of evolution. In recent decades, evidence has accumulated that epigenetic modifications also have to be considered in this context for vertebrate genomes. In particular, the elevated mutation frequency of 5-methylcytosine (5mC) is assumed to facilitate the depletion of CpG dinucleotides in species that exhibit global DNA methylation. For instance, the underrepresentation of CpG dinucleotides in many mammalian genomes is attributed to this effect, which is neutralized only in so-called CpG islands (CGIs) that are preferentially unmethylated and thus partially protected from rapid CpG decay. For primate-specific CpG-rich transposable elements of the Alu family, it is unclear whether their elevated CpG frequency is caused by their young age or by the absence of DNA methylation. As a consequence, these elements are often misclassified in CGI annotations. We present a method for estimating germ line methylation from pairwise ancestral-descendant alignments. The approach is validated in a simulation study and tested on DNA repeats from the AluSx family. We conclude that a predicted unmethylated state in the germ line is highly correlated with epigenetic activity of the respective genomic region. Thus, CpG-rich repeats can be used as in silico probes for the epigenetic potential of their genomic neighborhood.
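The misclassification problem arises because CGI annotations are usually based on sequence composition alone. As an assumption for illustration, the sketch uses the classic Gardiner-Garden and Frommer style thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6), where obs/exp = (#CpG x length) / (#C x #G). A young, CpG-rich Alu copy can pass such a test regardless of its methylation state. The germ-line methylation estimator from the abstract is not shown.

```python
def cpg_stats(seq: str) -> dict:
    """GC content and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return {
        "gc_content": (c + g) / n if n else 0.0,
        "cpg_obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


def looks_like_cgi(seq: str, min_len=200, min_gc=0.5, min_oe=0.6) -> bool:
    """Composition-only CpG island test (length, GC content, CpG obs/exp)."""
    st = cpg_stats(seq)
    return len(seq) >= min_len and st["gc_content"] > min_gc and st["cpg_obs_exp"] > min_oe
```

Because this test looks only at the current sequence, it cannot distinguish a region that is CpG-rich because it is unmethylated in the germ line from one that is CpG-rich merely because it has not yet had time to decay, which is the ambiguity the abstract addresses.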
Human immunodeficiency virus type 1 (HIV-1) uses the CD4 receptor and a coreceptor to gain cell entry. Coreceptor usage is mainly determined by the V3 loop of gp120. Therefore, coreceptor usage is currently inferred from the genotype on the basis of V3 alone. However, several mutations outside V3 have been repeatedly reported to influence coreceptor usage. In this study, the impact of the V2 loop on coreceptor usage prediction was analyzed.
Many hereditary human diseases are polygenic, resulting from sequence alterations in multiple genes. Genomic linkage and association studies are commonly performed for identifying disease-related genes. Such studies often yield lists of up to several hundred candidate genes, which have to be prioritized and validated further. Recent studies discovered that genes involved in phenotypically similar diseases are often functionally related on the molecular level.
As there is no cure or vaccine for infection with human immunodeficiency virus (HIV), the standard approach to treating HIV patients is to repeatedly administer different combinations of several antiretroviral drugs. Because of the large number of possible drug combinations, finding a successful regimen manually becomes practically impossible. This presents a major challenge for HIV treatment. Applying machine learning methods to predict virological responses to potential therapies is one possible approach to this problem. However, due to evolving trends in the treatment of HIV patients, the available clinical datasets are highly unbalanced, which might negatively affect the usefulness of derived statistical models.
In HIV-infected treatment-naïve patients, we analyzed risk factors for chronic hepatitis B (HBV) infection, occult HBV infection (OHBV) or a positive hepatitis C (HCV) serostatus. A total of 918 patients of the RESINA cohort in Germany were included in this study. Before initiating antiretroviral therapy, clinical parameters were collected, and blood samples were analyzed for antibodies against HIV, HBV and HCV, for HBs antigen, and for HIV and HBV nucleic acids. Present or past HBV infection (i.e. HBsAg and/or anti-HBc) was found in 43.4% of patients. HBsAg was detected in 4.5% (41/918) and HBV DNA in 6.1% (34/554) of patients, resulting in OHBV infection in 2.9% (16/554). OHBV infection could not be ruled out by the presence of anti-HBs (50.1%) or the absence of all HBV seromarkers (25%). An HCV-positive serostatus was associated with the IVDU transmission route, non-African ethnicity, elevated liver parameters (ASL or GGT) and low HIV viral load. Replicative HBV infection and HCV-positive serostatus both correlated with HIV resistance mutations (P = 0.001 and P = 0.028, respectively). HBV and HCV infections are frequent co-infections in treatment-naïve HIV patients. These co-infections influence viral evolution, clinical parameters and serological markers. Consequently, HIV patients should routinely be tested for HBV and HCV infection before initiating HIV treatment. OHBV infection constituted almost half of all HBV infections with detectable HBV DNA. Because no risk factors indicating OHBV infection were identified, HBV diagnosis should include not only serological markers but also the detection of HBV DNA.
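The reported proportions can be recomputed from the counts given above. The HBV DNA figures use the 554 patients with an HBV DNA result as denominator, as the "34/554" notation indicates, and the ratio 16/34 is what lies behind "almost half". The helper name `percent` is illustrative.

```python
def percent(part, whole):
    """Percentage rounded to one decimal, as reported in the abstract."""
    return round(100 * part / whole, 1)


# Counts taken from the text; labels are descriptive only.
reported = [
    ("HBsAg detected", 41, 918),
    ("HBV DNA detected", 34, 554),
    ("OHBV infection", 16, 554),
    ("OHBV among HBV DNA-positive patients", 16, 34),
]
for label, part, whole in reported:
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```

The last line makes the "almost half" statement explicit: occult cases are 16 of the 34 patients with detectable HBV DNA.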
Alternative splicing is an important mechanism for increasing protein diversity. However, its functional effects are largely unknown. Here, we present our new software workflow composed of the open-source application AltAnalyze and the Cytoscape plugin DomainGraph. Both programs provide an intuitive and comprehensive end-to-end solution for the analysis and visualization of alternative splicing data from Affymetrix Exon and Gene Arrays at the level of proteins, domains, microRNA binding sites, molecular interactions and pathways. Our software tools include easy-to-use graphical user interfaces and rigorous statistical methods (FIRMA, MiDAS and DABG filtering), and they do not require prior knowledge of exon array analysis or programming. They provide new methods for automatic interpretation and visualization of the effects of alternative exon inclusion on protein domain composition and microRNA binding sites. These data can be visualized together with affected pathways and gene or protein interaction networks, allowing a straightforward identification of potential biological effects of alternative splicing at different levels of granularity. Our programs are available at http://www.altanalyze.org and http://www.domaingraph.de. These websites also include extensive documentation, tutorials and sample data.
The hepatitis C virus (HCV) nonstructural (NS) protein 4B is known to engage in protein-protein interactions with viral and host cell factors. Little is known about the corresponding protein binding sites and underlying molecular mechanisms. We recently predicted a putative basic leucine zipper (bZIP) motif within the amino-terminal part of NS4B. The aim of this study was to investigate the importance of this NS4B bZIP motif for specific protein-protein interactions. We applied in silico approaches for 3D-structure modeling of NS4B homodimerization via the bZIP motif and identified crucial amino acid positions by multiple sequence analysis. The selected sites were used for site-directed mutagenesis within the NS4B bZIP motif and subsequent co-immunoprecipitation of wild-type and mutant NS4B molecules. Interaction energies were calculated for the wild-type and mutant structural models. We observed NS4B homodimerization, with dimer interaction gradually weakening from the wild-type towards the mutant dimers. The putative bZIP motif was confirmed by a co-immunoprecipitation assay and western blot analysis. NS4B-NS4B interaction depends on the integrity of the bZIP hydrophobic core and can be abolished by changes to crucial residues within NS4B. In conclusion, our data indicate that NS4B homodimerizes and that this interaction is facilitated by the amino-terminal part containing a bZIP motif.
In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in recent years, effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased toward categorical variables with a large number of categories.
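The bias described above can be reproduced in a few lines. The sketch below is a deliberately simplified illustration, not the authors' analysis: it splits a node once into one child per category (CART and RandomForest implementations use binary splits over category subsets instead) and measures the Gini impurity decrease produced by a feature that is pure noise. All sample sizes and the seed are arbitrary choices.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def multiway_gini_decrease(feature, labels):
    """Impurity decrease when splitting once into one child per category."""
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    n = len(labels)
    weighted_children = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted_children


def mean_decrease_for_noise(n_categories, n_samples=200, trials=200, seed=0):
    """Average decrease for a categorical feature independent of the labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        labels = [rng.randint(0, 1) for _ in range(n_samples)]
        feature = [rng.randrange(n_categories) for _ in range(n_samples)]
        total += multiway_gini_decrease(feature, labels)
    return total / trials


for k in (2, 10, 50):
    print(k, round(mean_decrease_for_noise(k), 4))
```

The average decrease grows with the number of categories although the feature carries no information, because more categories give more chances to separate the labels by accident.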
In HIV-1, thymidine analogue mutations (TAMs) cluster in one of two groups (215Y, 41L and 210W, or 215F and 219E/Q), representing two independent mutational patterns (the T215Y and T215F clusters, respectively). The mechanisms by which these pathways are selected are not fully understood. To investigate possible factors driving the selection of TAMs, we analyzed the TAM patterns with regard to treatment, viral load and HLA in 18 children, all infected from a common source of HIV-1 clade G virus and initially treated with zidovudine. The HIV reverse transcriptase sequences of 14 of the 18 children carried at least one TAM. At the first sampling date, the T215Y-linked pattern was observed in five cases and the T215F cluster in nine. During the follow-up period, three patients changed their patterns. Children treated with identical NRTI combinations at the first sampling date developed different pathways. Under AZT/d4T therapies, an association was found between HLA B*13 (in combination with HLA DRB1*0701) and the mutation T215Y. The T215Y mutation reverted in three of four patients who discontinued AZT/d4T treatment. We speculate that, in the context of these subtype G viruses, the development of the T215Y mutation may be strongly disfavored, whereas the presence of HLA B*13 may counteract this effect and permit its development.
Modern life sciences are becoming increasingly data-intensive, posing a significant challenge for most researchers and shifting the bottleneck of scientific discovery from data generation to data analysis. As a result, progress in genome research is increasingly impeded by bioinformatic hurdles. A new generation of powerful and easy-to-use genome analysis tools has been developed to address this issue, enabling biologists to perform complex bioinformatic analyses online, without having to learn a programming language or to download and manually process large datasets. In this tutorial paper, we describe the use of EpiGRAPH (http://epigraph.mpi-inf.mpg.de/) and Galaxy (http://galaxyproject.org/) for genome and epigenome analysis, and we illustrate how these two web services work together to identify epigenetic modifications that are characteristic of highly polymorphic (SNP-rich) promoters. This paper is supplemented with video tutorials (http://tinyurl.com/yc5xkqq), which provide a step-by-step guide through each example analysis.
Experimental screening of large sets of peptides with respect to their MHC binding capabilities is still very demanding due to the large number of possible peptide sequences and the extensive polymorphism of the MHC proteins. Therefore, there is significant interest in the development of computational methods for predicting the binding capability of peptides to MHC molecules, as a first step towards selecting peptides for actual screening.
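One classic family of such computational methods scores a peptide with a position-specific scoring matrix (PSSM): each position contributes a weight for the residue found there, and the weights are summed. The sketch below assumes 9-mer peptides and uses a toy matrix with two invented anchor weights; real matrices are estimated from binding data for a specific MHC allele, and this is not a method described in the text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def score_peptide(peptide, pssm):
    """Sum the position-specific weights of the residues in the peptide."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must equal the number of matrix columns")
    return sum(column.get(residue, 0.0) for column, residue in zip(pssm, peptide.upper()))


def rank_windows(protein, pssm, top=3):
    """Return the highest-scoring peptide windows of a protein sequence."""
    k = len(pssm)
    windows = [protein[i:i + k] for i in range(len(protein) - k + 1)]
    return sorted(windows, key=lambda p: score_peptide(p, pssm), reverse=True)[:top]


# Toy matrix (hypothetical weights): only positions 2 and 9 contribute.
toy_pssm = [dict.fromkeys(AMINO_ACIDS, 0.0) for _ in range(9)]
toy_pssm[1]["L"] = 2.0
toy_pssm[8]["V"] = 1.5

print(rank_windows("AAAAAAAAASLYNTVATV", toy_pssm, top=2))
```

Ranking all windows of a protein in this way is how such a score serves as a pre-filter: only the top candidates would go on to experimental screening.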
Replication capacity (RC) of specific HIV isolates is occasionally blamed for unexpected treatment responses. However, the role of viral RC in response to antiretroviral therapy is not yet fully understood.
The adaptive immune response against hepatitis C virus (HCV) is significantly shaped by the host's composition of HLA alleles; consequently, the HLA phenotype is a critical determinant of viral evolution under adaptive immune pressure. In the present study, we aimed to identify associations of HLA class I alleles with genetic variants of HCV subtypes 1a and 1b.
The V3 loop of human immunodeficiency virus type 1 (HIV-1) is critical for coreceptor binding and is the main determinant of which cellular coreceptor, CCR5 or CXCR4, the virus uses for cell entry. The aim of this study is to provide a large-scale, data-driven analysis of HIV-1 coreceptor usage with respect to V3 loop evolution and to characterize CCR5- and CXCR4-tropic viral phenotypes previously studied in small- and medium-scale settings. We use different sequence similarity measures, phylogenetic methods and clustering methods to analyze the distribution in sequence space of roughly 1000 V3 loop sequences and their tropism phenotypes. This analysis provides a means of characterizing the sequences that are misclassified by several sequence-based coreceptor prediction methods, of predicting the coreceptor from the location of a sequence in sequence space, and of relating this location to the CD4(+) T-cell count of the patient. We support previous findings that CCR5 usage is correlated with relatively high sequence conservation, whereas CXCR4-tropic viruses spread over larger regions of sequence space. The incorrectly predicted sequences are mostly located in regions where their phenotype is in the minority, or close to regions dominated by the opposite phenotype. Nevertheless, the location of a sequence in sequence space can be used to improve the accuracy of coreceptor usage prediction. Sequences from patients with high CD4(+) T-cell counts are relatively highly conserved compared with those of immunosuppressed patients. Our study thus supports hypotheses of an association of immune system depletion with an increase in V3 loop sequence variability and with the escape of the viral sequence to distant parts of sequence space.
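Using the location of a sequence in sequence space for prediction can be illustrated with a k-nearest-neighbour classifier over aligned sequences. This is a minimal sketch under stated assumptions: Hamming distance stands in for the study's several similarity measures, the sequences are already aligned to equal length, and the labelled fragments are invented toy strings rather than real V3 loops.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_tropism(query, labeled, k=3):
    """Majority label among the k labelled sequences closest to the query."""
    nearest = sorted(labeled, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy fragments with tropism labels; not real V3 sequences.
labeled = [
    ("CTRPNNNT", "R5"), ("CTRPNNNS", "R5"), ("CTRPSNNT", "R5"),
    ("CIRPRKKR", "X4"), ("CIRRRKKR", "X4"), ("CVRPRKRR", "X4"),
]
print(knn_tropism("CTRPNNKT", labeled), knn_tropism("CIRPRKKT", labeled))
```

A query that falls into a region dominated by the opposite phenotype is outvoted by its neighbours, which mirrors the finding that misclassified sequences tend to sit in such regions.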
Inferring response to antiretroviral therapy from the viral genotype alone is challenging. The utility of an intermediate step of predicting in vitro drug susceptibility is currently controversial. Here, we provide a retrospective comparison of approaches using either genotype or predicted phenotypes alone, or in combination.
Ever-increasing amounts of biological interaction data are being accumulated worldwide, but they are currently not readily accessible to the biologist at a single site. New techniques are required for retrieving, sharing and presenting data spread over the Internet.
DNA methylation is a key mechanism of epigenetic regulation that is frequently altered in diseases such as cancer. To confirm the biological or clinical relevance of such changes, gene-specific DNA methylation changes need to be validated in multiple samples. We have developed MethMarker (http://methmarker.mpi-inf.mpg.de/), software that helps design robust and cost-efficient DNA methylation assays for six widely used methods. Furthermore, MethMarker implements a bioinformatic workflow for transforming disease-specific differentially methylated genomic regions into robust clinical biomarkers.
Expert-based genotypic interpretation systems are standard methods for guiding treatment selection for patients infected with human immunodeficiency virus type 1. We previously introduced the software pipeline geno2pheno-THEO (g2p-THEO), which predicts, on the basis of the viral sequence, the response to treatment with a combination of antiretroviral compounds by applying methods from statistical learning and taking into account the estimated potential of the virus to escape from drug pressure.
Interferon-alpha induces 2'-5'-oligoadenylate synthetase, which activates RNase L. Viral RNA is cleaved by RNase L at UU/UA dinucleotides. The clinical relevance of RNase L cleavage for response to an interferon-alpha-based therapy in chronic hepatitis C is unknown.
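Since RNase L cleaves at UU and UA dinucleotides, a first descriptive step is to count these sites in a viral sequence. The sketch below counts overlapping occurrences and, as a convenience assumption, converts T to U so that DNA-encoded sequences are accepted; the per-kilobase normalization is an illustrative choice. Whether such counts relate to therapy response is exactly what the text describes as unknown, and this code does not address it.

```python
def rnase_l_sites(sequence):
    """Count overlapping UU and UA dinucleotides in an RNA (or DNA) sequence."""
    seq = sequence.upper().replace("T", "U")  # assumption: accept DNA input
    counts = {"UU": 0, "UA": 0}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return counts


def sites_per_kb(sequence):
    """UU plus UA sites per 1000 nucleotides, for comparing isolates of different length."""
    counts = rnase_l_sites(sequence)
    return 1000 * sum(counts.values()) / len(sequence) if sequence else 0.0


print(rnase_l_sites("AUUUAGUA"), sites_per_kb("AUUUAGUA"))
```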
We describe a scoring and modeling procedure for docking ligands into protein models that have either modeled or flexible side-chain conformations. Our methodological contribution is a procedure for generating new potentials of mean force for the ROTA scoring function, which we introduced previously for optimizing side-chain conformations with the tool IRECS. The ROTA potentials are specifically trained to tolerate small-scale positional errors of atoms that are characteristic of (i) side-chain conformations modeled using a sparse rotamer library and (ii) ligand conformations generated by a docking program. We generated both rigid and flexible protein models with our side-chain prediction tool IRECS and docked ligands to proteins using the scoring function ROTA and the docking programs FlexX (for rigid side chains) and FlexE (for flexible side chains). We validated our approach on the forty screening targets of the DUD database. The validation shows that the ROTA potentials are especially well suited for estimating the binding affinity of ligands to proteins. The results also show that our procedure can compensate for the drop in screening performance that occurs when protein models with side chains modeled from a rotamer library are used instead of X-ray structures. The average runtime per ligand of our method is 168 seconds on an Opteron V20z, which is fast enough to allow virtual screening of compound libraries for drug candidates.
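Potentials of mean force are commonly derived by inverting the Boltzmann relation, W(r) = -k_B T ln(p_obs(r) / p_ref(r)), where p_obs is the observed frequency of an atom-pair distance bin and p_ref a reference frequency. The sketch below shows that generic construction from binned counts, with a pseudocount for empty bins and an optional neighbour-averaging step as one simple way to make a potential less sensitive to small positional errors. The 300 K temperature, the pseudocount and the smoothing are assumptions; this is not the ROTA training procedure itself.

```python
import math

KT_KCAL_PER_MOL = 0.0019872 * 300.0  # k_B * T at an assumed 300 K


def smooth(counts, radius=1):
    """Average each bin with its neighbours within `radius` bins (assumed tolerance step)."""
    out = []
    for i in range(len(counts)):
        lo, hi = max(0, i - radius), min(len(counts), i + radius + 1)
        window = counts[lo:hi]
        out.append(sum(window) / len(window))
    return out


def pmf_from_counts(observed, reference, pseudocount=1.0, kt=KT_KCAL_PER_MOL):
    """Inverse-Boltzmann potential per distance bin (kcal/mol); negative = favourable."""
    if len(observed) != len(reference):
        raise ValueError("observed and reference histograms need the same bins")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    potential = []
    for o, r in zip(observed, reference):
        p_obs = (o + pseudocount) / obs_total
        p_ref = (r + pseudocount) / ref_total
        potential.append(-kt * math.log(p_obs / p_ref))
    return potential


# Hypothetical counts for four distance bins of one atom-pair type.
observed = [2, 30, 12, 6]
reference = [10, 10, 10, 10]
print(pmf_from_counts(smooth(observed), reference))
```

Bins that are populated more often than the reference expects receive negative (favourable) energies; smoothing spreads that preference to neighbouring bins, trading resolution for robustness.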
The EpiGRAPH web service http://epigraph.mpi-inf.mpg.de/ enables biologists to uncover hidden associations in vertebrate genome and epigenome datasets. Users can upload sets of genomic regions, and EpiGRAPH will test multiple attributes (including DNA sequence, chromatin structure, epigenetic modifications and evolutionary conservation) for enrichment or depletion among these regions. Furthermore, EpiGRAPH learns to predict similar genomic regions. This paper demonstrates EpiGRAPH's practical utility in a case study on monoallelic gene expression and describes its novel approach to reproducible bioinformatic analysis.
To date, very little information is available regarding the evolution of drug resistance mutations during treatment interruption (TI). Using a survival analysis approach, we investigated the dynamics of mutations associated with resistance to nucleoside analogue reverse transcriptase inhibitors (NRTIs) during TI. Analyzing 132 patients with at least two consecutive genotypes, one at failure of the last NRTI-containing regimen and at least one during TI, we observed that NRTI resistance mutations disappear at different rates during TI and, in the majority of patients, are lost independently of each other. The disappearance of the K65R and M184I/V mutations occurred in the majority of patients, was rapid, and was associated with the reemergence of wild-type virus, thus showing their negative impact on viral fitness. Overall, it seems that the loss of NRTI drug resistance mutations during TI is not an ordered process and, in the majority of patients, occurs without specific interaction among mutations.
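The survival analysis approach can be illustrated with the Kaplan-Meier estimator, where the "event" is the disappearance of a resistance mutation during TI and patients whose last genotype still carries the mutation are censored. The function and the example data below are hypothetical; the study's actual model and covariates are not described here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability that a mutation is still detectable.

    times:  time of disappearance (event) or of the last genotype (censored)
    events: 1 if the mutation disappeared at that time, 0 if censored
    Returns (time, survival) pairs at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, disappeared, leaving = data[i][0], 0, 0
        # Group tied times; censored patients still count as at risk at time t.
        while i < len(data) and data[i][0] == t:
            disappeared += data[i][1]
            leaving += 1
            i += 1
        if disappeared:
            survival *= 1 - disappeared / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical weeks of TI for five patients (1 = mutation lost, 0 = censored).
print(kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 0]))
```

Fitting one such curve per mutation is the natural way to compare disappearance rates, for example a steep curve for a mutation that is lost quickly.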
Aberrant DNA methylation often occurs in colorectal cancer (CRC). In this study, we applied a genome-wide DNA methylation analysis approach, MethylCap-seq, to map differentially methylated regions (DMRs) in 24 tumors and matched normal colon samples. In total, 2687 frequently hypermethylated and 468 frequently hypomethylated regions were identified, which include potential biomarkers for CRC diagnosis. Hypermethylation in the tumor samples was enriched at CpG islands and gene promoters, whereas hypomethylation was distributed throughout the genome. Using epigenetic data from human embryonic stem cells, we show that frequently hypermethylated regions coincide with bivalent loci in these cells. DNA methylation is commonly thought to lead to gene silencing; however, integration of publicly available gene expression data indicates that 75% of the frequently hypermethylated genes were most likely already expressed at low levels or not at all in normal tissue. Collectively, our study provides genome-wide DNA methylation maps of CRC and comprehensive lists of DMRs, and gives insights into the role of aberrant DNA methylation in CRC formation.
Epigenome mapping consortia are generating resources of tremendous value for studying epigenetic regulation. To maximize their utility and impact, new tools are needed that facilitate interactive analysis of epigenome datasets. Here we describe EpiExplorer, a web tool for exploring genome and epigenome data on a genomic scale. We demonstrate EpiExplorer's utility by describing a hypothesis-generating analysis of DNA hydroxymethylation in relation to public reference maps of the human epigenome. All EpiExplorer analyses are performed dynamically within seconds, using an efficient and versatile text indexing scheme that we introduce to bioinformatics. EpiExplorer is available at http://epiexplorer.mpi-inf.mpg.de.
Due to the high mutation rate of human immunodeficiency virus (HIV), drug-resistant variants emerge frequently. Therefore, researchers are constantly searching for new ways to attack the virus. One new class of anti-HIV drugs is the coreceptor antagonists, which block cell entry by occupying a coreceptor on CD4 cells. Such drugs only affect the subset of HIV variants that use the inhibited coreceptor. Accurately predicting whether the viral population inside a patient is susceptible to the treatment is therefore very important for therapy decisions and a prerequisite for administering the respective drug. The first prediction models were based on data from Sanger sequencing of the V3 loop of HIV. Recently, a method based on next-generation sequencing (NGS) data was introduced that predicts a label for each read separately and decides on the patient label through a percentage threshold for the resistant viral minority.
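The per-read decision rule described above reduces to a fraction and a cut-off. In this sketch each read has already been classified (True meaning the read is predicted to use a coreceptor the drug does not block); the function name and the threshold values in the example are hypothetical placeholders, not the published ones.

```python
def patient_call(read_is_resistant, threshold):
    """Label a patient from per-read predictions.

    read_is_resistant: one boolean per read, True if the read is predicted to
    use a coreceptor that the drug does not block.
    threshold: fraction of such reads above which the patient is called
    not susceptible (an assumption supplied by the caller).
    """
    if not read_is_resistant:
        raise ValueError("no reads to classify")
    fraction = sum(read_is_resistant) / len(read_is_resistant)
    label = "not susceptible" if fraction > threshold else "susceptible"
    return fraction, label


reads = [False] * 97 + [True] * 3           # 3% of reads predicted resistant
print(patient_call(reads, threshold=0.02))  # hypothetical 2% cut-off
print(patient_call(reads, threshold=0.05))  # hypothetical 5% cut-off
```

The same reads yield opposite patient labels under the two cut-offs, which is why the choice of threshold for the resistant minority matters clinically. Whether the comparison is strict or inclusive is a further design choice the sketch fixes arbitrarily.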
Entry of human immunodeficiency virus type 1 (HIV-1) into the host cell involves interactions between the viral envelope glycoproteins (Env) and the cellular receptor CD4 as well as a coreceptor molecule (most importantly CCR5 or CXCR4). Viral preference for a specific coreceptor (tropism) is determined in particular by the third variable loop (V3) of the Env glycoprotein gp120. The approval and use of a coreceptor antagonist for antiretroviral therapy make a detailed understanding of tropism, and its accurate prediction from patient-derived virus isolates, essential. The aim of the present study is to develop an extended description of the HIV entry phenotype that reflects its co-dependence on several key determinants, as the basis for more accurate prediction of the HIV-1 entry phenotype from genotypic data.
The International Society for Computational Biology, ISCB, organizes the largest event in the field of computational biology and bioinformatics, namely the annual international conference on Intelligent Systems for Molecular Biology, the ISMB. This year at ISMB 2012 in Long Beach, ISCB celebrated the 20th anniversary of its flagship meeting. ISCB is a young, lean and efficient society that aspires to make a significant impact with only limited resources. Many constraints make the choice of venues for ISMB a tough challenge. Here, we describe those challenges and invite the contribution of ideas for solutions.
The hepatitis B virus (HBV) is classified into distinct genotypes A-H that are characterized by different progression of hepatitis B and different sensitivity to interferon treatment. Previous computational genotyping methods are not sufficiently robust to HBV dual infections with different genotypes. The correct classification of HBV sequences into the genotypes present is impaired by multiple ambiguous sequence positions. We present a computational model that is able to identify and genotype inter- and intragenotype dual infections using population-based sequencing data. Model verification on synthetic data showed 100% accuracy for intergenotype dual infections and 36.4% sensitivity for intragenotype dual infections. Screening patient sera (n = 241) revealed eight putative cases of intergenotype dual infection (one A-D, six A-G and one D-G) and four putative cases of intragenotype dual infection (one A-A, two D-D and one E-E). Clonal experiments on the original patient material confirmed three out of three of our predictions. The method has been integrated into geno2pheno([hbv]), an established web service in clinical use for analysing HBV sequence data. It offers exact and detailed identification of HBV genotypes in patients with dual infections, which helps to optimize antiviral therapy regimens. geno2pheno([hbv]) is available at http://www.genafor.org/g2p_hbv/index.php.
Drug resistance is a common cause of treatment failure for HIV infection and cancer. The high mutation rate of HIV leads to genetic heterogeneity among viral populations and provides the seed from which drug-resistant clones emerge in response to therapy. Similarly, most cancers are characterized by extensive genetic, epigenetic, transcriptional and cellular diversity, and drug-resistant cancer cells outgrow their non-resistant peers in a process of somatic evolution. Patient-specific combination of antiviral drugs has emerged as a powerful approach for treating drug-resistant HIV infection, using genotype-based predictions to identify the best matched combination therapy among several hundred possible combinations of HIV drugs. In this Opinion article, we argue that HIV therapy provides a blueprint for designing and validating patient-specific combination therapies in cancer.
HIV patients are treated by administration of combinations of antiretroviral drugs. The very large number of such combinations makes the manual search for an effective therapy practically impossible, especially in advanced stages of the disease. Therapy selection can be supported by statistical methods that predict the outcomes of candidate therapies. However, these methods are based on clinical data sets that have highly unbalanced therapy representation. This paper presents a novel approach that considers each drug belonging to a target combination therapy as a separate task in a multi-task hierarchical Bayes setting. The drug-specific models take into account information on all therapies containing the drug, not just the target therapy. In this way, we can circumvent the problem of data sparseness pertaining to some target therapies. The computational validation shows that compared to the most commonly used approach that provides therapy information in the form of input features, our model has significantly higher predictive power for therapies with very few training samples and is at least as powerful for abundant therapies.
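Why sparse therapies benefit from sharing information can be seen in the simplest hierarchical Bayes setting: a Gaussian prior shared across tasks and a Gaussian likelihood with known variances. The posterior mean is a precision-weighted average that pulls a task with few samples toward the shared mean and leaves a data-rich task close to its own estimate. This scalar example only illustrates partial pooling; it is not the drug-specific multi-task model of the paper, and all numbers are invented.

```python
def pooled_estimate(task_mean, n, noise_var, prior_mean, prior_var):
    """Posterior mean of a task parameter under a Gaussian prior shared across tasks.

    task_mean: average observed outcome for this task (e.g. one therapy)
    n: number of training samples for the task
    noise_var: assumed known per-sample variance
    prior_mean, prior_var: shared prior, e.g. learned from related tasks
    """
    precision_data = n / noise_var
    precision_prior = 1.0 / prior_var
    return (precision_data * task_mean + precision_prior * prior_mean) / (
        precision_data + precision_prior
    )


# Same observed mean, very different sample sizes (hypothetical numbers).
few = pooled_estimate(task_mean=1.0, n=2, noise_var=0.25, prior_mean=0.6, prior_var=0.01)
many = pooled_estimate(task_mean=1.0, n=400, noise_var=0.25, prior_mean=0.6, prior_var=0.01)
print(round(few, 3), round(many, 3))
```

With two samples the estimate stays near the shared mean; with 400 samples it approaches the task's own mean, which matches the claim that such a model helps rare therapies without hurting abundant ones.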
MicroRNAs (miRNAs) are short (21-23 nt), non-coding regulators of protein-coding genes that are generally transcribed first into primary miRNA (pri-miR), followed by the generation of precursor miRNA (pre-miR), which finally leads to the production of the mature miRNA. A large amount of information is available on the pre- and mature miRNAs. However, very little is known about the pri-miRs, owing to a lack of knowledge about their transcription start sites (TSSs). Based on their genomic loci, miRNAs can be categorized into two types: intragenic (intra-miR) and intergenic (inter-miR). While it is established that intra-miRs are commonly transcribed in conjunction with their host genes, the transcription machinery of inter-miRs is poorly understood. Although miRNA promoters are assumed to be similar in structure to gene promoters, since both are transcribed by RNA polymerase II (Pol II), computational validations show that gene promoter prediction methods perform poorly on miRNAs. In this paper, we concentrate on the problem of TSS prediction for miRNAs. The present study begins with the identification of positive and negative promoter samples from recently published RNA-sequencing data. From these samples of experimentally validated miRNA TSSs, a number of standard sequence features are extracted. Furthermore, to account for potential footprints of promoter regulation by CpG-targeted DNA methylation, a number of novel features are defined. We develop a support vector machine (SVM) with an RBF kernel, trained on human miRNA promoters, for predicting miRNA TSSs. A novel feature reduction technique based on archived multi-objective simulated annealing (AMOSA) identifies the final set of features. The resulting model trained on miRNA promoters outperforms the one trained on protein-coding gene promoters in terms of classification accuracy, sensitivity and specificity.
Results are also reported for a completely independent, biologically validated test set. In part of the investigation, the proposed approach is used to predict protein-coding gene TSSs, where it shows significantly improved performance compared with previously published gene TSS prediction methods.
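Typical CpG-related sequence features for a promoter window, together with the RBF kernel named above, can be sketched as follows. GC content and the observed/expected CpG ratio, (CpG count × length) / (C count × G count), are standard measures, not necessarily the novel features defined in the paper; the example windows and the gamma value are hypothetical.

```python
import math


def gc_content(seq):
    """Fraction of G and C bases in a window."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cpg * len(s) / (c * g)


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have equal length")
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def features(window):
    return [gc_content(window), cpg_obs_exp(window)]


# Two invented candidate promoter windows and a hypothetical gamma.
a, b = "CGCGAATTCGCGGCTA", "ATATTTAAGCATTAAT"
print(features(a), features(b), rbf_kernel(features(a), features(b), gamma=0.5))
```

An SVM with this kernel compares windows by how close their feature vectors are, so CpG-rich and CpG-poor windows end up far apart in the induced similarity.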
This chapter describes bioinformatic tools for analyzing epigenome differences between species and in diseased versus normal cells. We illustrate the interplay of several Web-based tools in a case study of CpG island evolution between human and mouse. Starting from a list of orthologous genes, we use the Galaxy Web service to obtain gene coordinates for both species. These data are further analyzed in EpiGRAPH, a Web-based tool that identifies statistically significant epigenetic differences between genome region sets. Finally, we outline how the use of the statistical programming language R enables deeper insights into the epigenetics of human diseases, which are difficult to obtain without writing custom scripts. In summary, our tutorial describes how Web-based tools provide an easy entry into epigenome data analysis while also highlighting the benefits of learning a scripting language in order to unlock the vast potential of public epigenome datasets.
Incorporating backbone flexibility into protein-ligand docking is still a challenging problem. In protein-protein docking, normal mode analysis (NMA) has become increasingly popular as it can be used to describe the collective motions of a biological system, but the question of whether NMA can also be useful in predicting the conformational changes observed upon small-molecule binding has only been addressed in a few case studies. Here, we describe a large-scale study on the applicability of NMA for protein-ligand docking using 433 apo/holo pairs of the Astex data sets. On the basis of sets of the first normal modes from the apo structure, we first generated for each paired holo structure a set of conformations that optimally reproduce its Cα trace with respect to the underlying normal mode subspace. Using AutoDock, GOLD, and FlexX we then docked the original ligands into these conformations to assess how the docking performance depends on the number of modes used to reproduce the holo structure. The results of our study indicate that, even for such a best-case scenario, the use of normal mode analysis in small-molecule docking is restricted and that a general rule on how many modes to use does not seem to exist or at least is not easy to find.
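Reproducing a holo structure within the subspace of the first normal modes is, in the linear least-squares sense, an orthogonal projection: with orthonormal mode vectors, the best coefficients are dot products of the modes with the apo-to-holo displacement. The sketch assumes the two Cα traces are already superimposed and flattened to [x1, y1, z1, x2, ...], ignores mass weighting, and uses toy coordinates; it is not the study's exact procedure.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_onto_modes(apo, holo, modes):
    """Best fit of holo reachable from apo along orthonormal mode vectors.

    Returns the mode coefficients and the fitted flattened coordinates.
    """
    displacement = [h - a for a, h in zip(apo, holo)]
    coeffs = [dot(mode, displacement) for mode in modes]
    fitted = list(apo)
    for c, mode in zip(coeffs, modes):
        fitted = [f + c * m for f, m in zip(fitted, mode)]
    return coeffs, fitted


def rmsd(a, b):
    """RMSD between two flattened coordinate lists of the same atoms."""
    n_atoms = len(a) // 3
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n_atoms)


# Toy two-atom example with two hypothetical orthonormal modes.
apo = [0.0] * 6
holo = [1.0, 0.0, 0.0, 0.0, 2.0, 0.0]
e1 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
e2 = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
for modes in ([e1], [e1, e2]):
    coeffs, fitted = project_onto_modes(apo, holo, modes)
    print(len(modes), coeffs, rmsd(fitted, holo))
```

Adding modes can only lower the residual RMSD to the holo trace; the study's point is that a lower residual does not translate into a simple rule for how many modes help docking.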
Transmitted HIV drug resistance may impair treatment efficacy of combination antiretroviral therapy (ART). This study describes the epidemiology of transmitted resistance in chronically infected patients.
Older HIV patients are defined as those aged 50 years and older. This group is a growing population in developed countries. To improve care for older HIV patients, we aimed to gain insight into the specific features of transmission, epidemiology, immunology and antiretroviral treatment (ART) in this population.
For a long time, the clinical management of antiretroviral drug resistance was based on sequence analysis of the HIV genome followed by estimating drug susceptibility from the mutational pattern that was detected. The large number of anti-HIV drugs and HIV drug resistance mutations has prompted the development of computer-aided genotype interpretation systems, typically comprising rules handcrafted by experts via careful examination of in vitro and in vivo resistance data. More recently, machine learning approaches have been applied to establish data-driven engines able to indicate the most effective treatments for any patient and virus combination. Systems of this kind, currently including the Resistance Response Database Initiative and the EuResist engine, must learn from the large data sets of patient histories and can provide an objective and accurate estimate of the virological response to different antiretroviral regimens. The EuResist engine was developed by a European consortium of HIV and bioinformatics experts and compares favorably with the most commonly used genotype interpretation systems and HIV drug resistance experts. Next-generation treatment response prediction engines may valuably assist the HIV specialist in the challenging task of establishing effective regimens for patients harboring drug-resistant virus strains. The extensive collection and accurate processing of increasingly large patient data sets are eagerly awaited to further train and translate these systems from prototype engines into real-life treatment decision support tools.
Inferring HIV-1 coreceptor usage from a genotype is becoming increasingly important for the appropriate treatment of long-term patients. While results are already encouraging when standard bulk nucleic acid sequencing methods are used, these methods are limited with respect to the detection of minor variants. In contrast, next-generation sequencing methods (ultradeep sequencing, pyrosequencing) are capable of sequencing virus quasispecies present at very low quantities. However, besides being very expensive, these methods generate vast amounts of data, so sequence analysis has to be automated. Here, we describe the geno2pheno system, which handles all processing and prediction steps involved in predicting coreceptor usage from massively parallel sequencing data. The system is split into a Java preprocessor, which is run locally on the client side, and a Web server, which generates the prediction results. Predictions are based on the same prediction method as used in the geno2pheno[coreceptor] tool.
HIV's genetic instability means that sequence similarity can illuminate the underlying transmission network. Previous application of such methods to samples from the United Kingdom has suggested that as many as 86% of UK infections arose outside the country, a conclusion contrary to usual patterns of disease spread. We investigated transmission networks in the Resina cohort, a 2,747-member sample from Nordrhein-Westfalen, Germany, sequenced at therapy start. Transmission networks were determined by thresholding the pairwise genetic distance in the pol gene at 96.8% identity. At first glance, the results concurred with the UK studies. Closer examination revealed four large and growing transmission networks that encompassed all major transmission groups. One of these formed a supercluster containing 71% of the men who have sex with men (MSM) subjects when the network was thresholded at levels roughly equivalent to those used in the UK studies, although methodological differences suggest that this threshold may be too generous for the current data. We examined the endogenous versus exogenous hypothesis by testing whether infections that were exogenous to Cologne or to Dusseldorf were endogenous to the greater region; this supported endogenous spread among MSM subjects and exogenous spread in the endemic transmission group. Among subjects in the intravenous drug use group, the result depended on the viral strain: subtype B sequences appeared to originate outside the Resina data, whereas non-B sequences (primarily subtype A) were almost completely endogenous to their local community. These results suggest that, at least in Germany, the question of endogenous versus exogenous linkages depends on the subject group.
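Thresholding pairwise distances and taking connected components (single linkage) can be sketched with a union-find structure. The sketch uses the uncorrected p-distance on pre-aligned sequences and skips gap columns; the 96.8% identity default comes from the text, while the distance measure, the gap handling, the helper `mutate` and the toy sequences are assumptions for illustration.

```python
def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 1.0


def transmission_clusters(seqs, min_identity=0.968):
    """Connected components of the graph linking pairs with identity >= min_identity."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    max_distance = 1.0 - min_identity
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= max_distance:
                parent[find(a)] = find(b)
    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


def mutate(seq, positions):
    """Toy helper: replace the given positions with 'G'."""
    chars = list(seq)
    for p in positions:
        chars[p] = "G"
    return "".join(chars)


base = "A" * 100  # invented 100-site alignment
seqs = {
    "p1": base,
    "p2": mutate(base, [0, 1]),
    "p3": mutate(base, [0, 1, 2, 3]),
    "p4": mutate(base, range(50, 60)),
}
print(transmission_clusters(seqs))
```

Here p1 and p3 differ at 4% of sites, above the cut-off, yet end up in one cluster through p2. This chaining is inherent to threshold networks and is one reason why a generous threshold can merge many subjects into a single supercluster.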