Increased risk for autism spectrum disorders (ASD) is attributed to hundreds of genetic loci. The convergence of ASD variants has been investigated using various approaches, including protein interactions extracted from the published literature. However, these datasets are frequently incomplete, carry biases and are limited to interactions of a single splicing isoform, which may not be expressed in the disease-relevant tissue. Here we introduce a new interactome mapping approach by experimentally identifying interactions between brain-expressed alternatively spliced variants of ASD risk factors. The Autism Spliceform Interaction Network reveals that almost half of the detected interactions and about 30% of the newly identified interacting partners represent contributions from splicing variants, emphasizing the importance of isoform networks. Isoform interactions greatly contribute to establishing direct physical connections between proteins from the de novo autism CNVs. Our findings demonstrate the critical role of spliceform networks for translating genetic knowledge into a better understanding of human diseases.
Alternative splicing is an important gene regulatory mechanism that dramatically increases the complexity of the proteome. However, how alternative splicing is regulated and how transcription and splicing are coordinated are still poorly understood, and functions of transcript isoforms have been studied only in a few limited cases. Nowadays, RNA-seq technology provides an exceptional opportunity to study alternative splicing on a genome-wide scale and in an unbiased manner. With the rapid accumulation of data in public repositories, new challenges arise from the urgent need to effectively integrate many different RNA-seq datasets to study alternative splicing. This paper discusses a set of advanced computational methods that can integrate and analyze many RNA-seq datasets to systematically identify splicing modules, unravel the coupling of transcription and splicing, and predict the functions of splicing isoforms on a genome-wide scale.
Alternative transcript processing is an important mechanism for generating functional diversity in genes. However, little is known about the precise functions of individual isoforms. In fact, proteins (translated from transcript isoforms), not genes, are the function carriers. By integrating multiple human RNA-seq data sets, we carried out the first systematic prediction of isoform functions, enabling high-resolution functional annotation of the human transcriptome. Unlike gene function prediction, isoform function prediction faces a unique challenge: the lack of training data, since all known functional annotations are at the gene level. To address this challenge, we modelled the gene-isoform relationships as multiple instance data and developed a novel label propagation method to predict functions. Our method achieved an average area under the receiver operating characteristic curve of 0.67 and assigned functions to 15 572 isoforms. Interestingly, we observed that different functions have different sensitivities to alternative isoform processing, and that the function diversity of isoforms from the same gene is positively correlated with their tissue expression diversity. Finally, we surveyed the literature to validate our predictions for a number of apoptotic genes. Strikingly, for the famous TP53 gene, we not only accurately identified the apoptosis regulation function of its five isoforms, but also correctly predicted the precise direction of the regulation.
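The multiple-instance framing above can be illustrated with a minimal sketch. It assumes the standard multiple-instance convention that a gene (the "bag") carries a function if at least one of its isoforms (the "instances") does, so a gene-level score is taken as the maximum isoform score. That aggregation rule, the function names and the toy scores are illustrative assumptions and are not taken from the abstract, which does not describe the label propagation method in detail. The AUC helper uses the rank-based definition: the probability that a randomly chosen positive gene outscores a randomly chosen negative one.

```python
from typing import Dict, List


def gene_scores(isoform_scores: Dict[str, float],
                gene_to_isoforms: Dict[str, List[str]]) -> Dict[str, float]:
    """Aggregate isoform-level scores to gene level.

    Assumption (standard multiple-instance convention, not confirmed by the
    abstract): a gene is as likely to have a function as its best isoform.
    """
    return {
        gene: max(isoform_scores[iso] for iso in isoforms)
        for gene, isoforms in gene_to_isoforms.items()
    }


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three genes, five isoforms.
toy_isoform_scores = {"a": 0.9, "b": 0.1, "c": 0.2, "d": 0.4, "e": 0.3}
toy_gene_to_isoforms = {"G1": ["a", "b"], "G2": ["c"], "G3": ["d", "e"]}
toy_gene_labels = {"G1": 1, "G2": 0, "G3": 1}  # gene-level annotations only

toy_gene_scores = gene_scores(toy_isoform_scores, toy_gene_to_isoforms)
genes = sorted(toy_gene_scores)
toy_auc = auc([toy_gene_scores[g] for g in genes],
              [toy_gene_labels[g] for g in genes])
```

Evaluating at gene level in this way is one plausible reading of how a predictor trained without isoform-level labels could still be scored against gene-level annotations.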
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
Although accumulating evidence has provided insight into the various functions of long non-coding RNAs (lncRNAs), the exact functions of the majority of such transcripts are still unknown. Here, we report the first computational annotation of lncRNA functions based on public microarray expression profiles. A coding-non-coding gene co-expression (CNC) network was constructed from re-annotated Affymetrix Mouse Genome Array data. Probable functions for a total of 340 lncRNAs were predicted based on topological or other network characteristics, such as module sharing, association with network hubs and combinations of co-expression and genomic adjacency. The functions annotated to the lncRNAs mainly involve organ or tissue development (e.g. neuron, eye and muscle development), cellular transport (e.g. neuronal transport and sodium ion, acid or lipid transport) or metabolic processes (e.g. involving macromolecules, phosphocreatine and tyrosine).
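As a rough illustration of how a co-expression network of this kind can be built, the sketch below links two transcripts when the absolute Pearson correlation of their expression profiles reaches a cutoff. The correlation measure, the 0.9 threshold, the function names and the toy profiles are all assumptions made for illustration; the abstract does not state which similarity measure or cutoff the authors used.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(profiles: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Return (transcript, transcript, r) for pairs with |r| >= threshold."""
    edges = []
    for t1, t2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[t1], profiles[t2])
        if abs(r) >= threshold:
            edges.append((t1, t2, r))
    return edges


# Hypothetical expression profiles across four samples.
toy_profiles = {
    "lnc1": [1.0, 2.0, 3.0, 4.0],   # a non-coding transcript
    "geneA": [2.0, 4.0, 6.0, 8.0],  # coding gene, perfectly co-expressed
    "geneB": [4.0, 1.0, 3.0, 2.0],  # coding gene, weakly anti-correlated
}
toy_edges = coexpression_edges(toy_profiles)
```

In a network built this way, the annotated coding neighbours of an lncRNA (here, `geneA` for `lnc1`) would be the starting point for inferring its probable function.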
Current methods of predicting mutation-induced protein stability change are imprecise. Machine learning methods have been introduced for this prediction recently; however, the available experimental data used for training these predictors are biased. Abundant data are available for several frequently occurring amino acid substitutions, whereas only limited data have been accumulated for some other mutation types. Generally, current statistical models do not account for this bias toward the more common amino acids during the encoding process and are thus less effective in making predictions on less frequently occurring mutations. In this paper, we propose a method based on support vector machines and property encoding of amino acids. The predictor we constructed outperforms other methods on the same data sets and is more robust with poor training data. The prediction accuracy for mutations with no training data exceeded 80%. This advantage is critical for practical application, where the prediction could be applied to any type of mutation. Further analysis demonstrates that our model relies on biologically significant features to make predictions. To overcome the drawbacks of classifying mutations into stabilizing and destabilizing ones, a three-class classification of mutations was also discussed, where our method obtained an overall accuracy of 79.1%.
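The idea behind property encoding can be sketched as follows: instead of representing a substitution by residue identity (which gives a model nothing to work with for substitution types absent from training), each residue is mapped to physicochemical values, so an unseen substitution still lands near chemically similar ones in feature space. The two properties used here (the Kyte-Doolittle hydropathy index and a simplified side-chain charge at neutral pH) and the feature layout are illustrative assumptions; the abstract does not list the properties the authors actually encoded, and the resulting vectors would still need to be passed to a classifier such as a support vector machine.

```python
from typing import List

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH (histidine treated as neutral).
CHARGE = {aa: 0 for aa in HYDROPATHY}
CHARGE.update({"D": -1, "E": -1, "K": 1, "R": 1})


def encode_mutation(wild_type: str, mutant: str) -> List[float]:
    """Encode a substitution as property values and property changes.

    Hypothetical feature layout: [hydropathy of wild type, hydropathy of
    mutant, hydropathy change, charge change].
    """
    if wild_type not in HYDROPATHY or mutant not in HYDROPATHY:
        raise ValueError("expected one-letter codes of standard amino acids")
    h_wt, h_mut = HYDROPATHY[wild_type], HYDROPATHY[mutant]
    return [h_wt, h_mut, h_mut - h_wt,
            float(CHARGE[mutant] - CHARGE[wild_type])]


# Two example substitutions: alanine to valine, aspartate to lysine.
ala_to_val = encode_mutation("A", "V")
asp_to_lys = encode_mutation("D", "K")
```

Because every substitution is described by continuous properties, a rarely observed mutation type shares features with common ones, which is one way to read the robustness to sparse training data reported above.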
MicroRNAs (miRNAs), a growing class of small RNAs with crucial regulatory roles at the post-transcriptional level, are usually found to be clustered on chromosomes. However, with the exception of a few individual cases, so far little is known about the functional consequence of this conserved clustering of miRNA loci. In animal genomes such clusters often contain non-homologous miRNA genes. One hypothesis to explain this heterogeneity suggests that clustered miRNAs are functionally related by virtue of co-targeting downstream pathways.
MicroRNAs (miRNAs) are an important class of small noncoding RNAs capable of regulating the expression of other genes. Much progress has been made in computational target prediction of miRNAs in recent years. More than 10 miRNA target prediction programs have been established, yet the prediction of animal miRNA targets remains a challenging task. We have developed miRecords, an integrated resource for animal miRNA-target interactions. The Validated Targets component of this resource hosts a large, high-quality manually curated database of experimentally validated miRNA-target interactions with systematic documentation of experimental support for each interaction. The current release of this database includes 1135 records of validated miRNA-target interactions between 301 miRNAs and 902 target genes in seven animal species. The Predicted Targets component of miRecords stores predicted miRNA targets produced by 11 established miRNA target prediction programs. miRecords is expected to serve as a useful resource not only for experimental miRNA researchers, but also for informatics scientists developing the next-generation miRNA target prediction programs. miRecords is available at http://miRecords.umn.edu/miRecords.
De novo mutation plays an important role in autism spectrum disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes and may also include nucleotide-substitution hot spots. We investigated global patterns of germline mutation by whole-genome sequencing of monozygotic twins concordant for ASD and their parents. Mutation rates varied widely throughout the genome (by 100-fold) and could be explained by intrinsic characteristics of DNA sequence and chromatin structure. Dense clusters of mutations within individual genomes were attributable to compound mutation or gene conversion. Hypermutability was a characteristic of genes involved in ASD and other diseases. In addition, genes impacted by mutations in this study were associated with ASD in independent exome-sequencing data sets. Our findings suggest that regional hypermutation is a significant factor shaping patterns of genetic variation and disease risk in humans.