Exhaustive mapping of next-generation sequencing data to a set of relevant reference sequences becomes an important task in pathogen discovery and metagenomic classification. However, the runtime and memory usage increase as the number of reference sequences and the repeat content among these sequences increase. In many applications, read mapping time dominates the entire application. We developed CompMap, a reference-based compression program, to speed up this process. CompMap enables the generation of a non-redundant representative sequence for the input sequences. We have demonstrated that reads can be mapped to this representative sequence with a much reduced time and memory usage, and the mapping to the original reference sequences can be recovered with high accuracy. Availability and implementation: CompMap is implemented in C and freely available at http://csse.szu.edu.cn/staff/zhuzx/CompMap/.
Disease module is a group of molecular components that interact intensively in the disease specific biological network. Since the connectivity and activity of disease modules may shed light on the molecular mechanisms of pathogenesis and disease progression, their identification becomes one of the most important challenges in network medicine, an emerging paradigm to study complex human disease. This paper proposes a novel algorithm, DiME (Disease Module Extraction), to identify putative disease modules from biological networks. We have developed novel heuristics to optimise Community Extraction, a module criterion originally proposed for social network analysis, to extract topological core modules from biological networks as putative disease modules. In addition, we have incorporated a statistical significance measure, B-score, to evaluate the quality of extracted modules. As an application to complex diseases, we have employed DiME to investigate the molecular mechanisms that underpin the progression of glioma, the most common type of brain tumour. We have built low (grade II)--and high (GBM)--grade glioma co-expression networks from three independent datasets and then applied DiME to extract potential disease modules from both networks for comparison. Examination of the interconnectivity of the identified modules have revealed changes in topology and module activity (expression) between low- and high- grade tumours, which are characteristic of the major shifts in the constitution and physiology of tumour cells during glioma progression. Our results suggest that transcription factors E2F4, AR and ETS1 are potential key regulators in tumour progression. Our DiME compiled software, R/C++ source code, sample data and a tutorial are available at http://www.cs.bham.ac.uk/~szh/DiME.
Experimental MS(n) mass spectral libraries currently do not adequately cover chemical space. This limits the robust annotation of metabolites in metabolomics studies of complex biological samples. In-silico fragmentation libraries would improve the identification of compounds from experimental multi-stage fragmentation data when experimental reference data is unavailable. Here we present a freely-available software package to automatically control Mass Frontier software to construct in-silico mass spectral libraries, and to perform spectral matching. Based on two case studies we have demonstrated that HAMMER allows researchers to generate in-silico mass spectral libraries in an automated and high-throughput fashion with little or no human intervention required.
The exponential growth of high-throughput DNA sequence data has posed great challenges to genomic data storage, retrieval and transmission. Compression is a critical tool to address these challenges, where many methods have been developed to reduce the storage size of the genomes and sequencing data (reads, quality scores and metadata). However, genomic data are being generated faster than they could be meaningfully analyzed, leaving a large scope for developing novel compression algorithms that could directly facilitate data analysis beyond data transfer and storage. In this article, we categorize and provide a comprehensive review of the existing compression methods specialized for genomic data and present experimental results on compression ratio, memory usage, time for compression and decompression. We further present the remaining challenges and potential directions for future research.
Multiclass cancer classification on microarray data has provided the feasibility of cancer diagnosis across all of the common malignancies in parallel. Using multiclass cancer feature selection approaches, it is now possible to identify genes relevant to a set of cancer types. However, besides identifying the relevant genes for the set of all cancer types, it is deemed to be more informative to biologists if the relevance of each gene to specific cancer or subset of cancer types could be revealed or pinpointed. In this paper, we introduce two new definitions of multiclass relevancy features, i.e., full class relevant (FCR) and partial class relevant (PCR) features. Particularly, FCR denotes genes that serve as candidate biomarkers for discriminating all cancer types. PCR, on the other hand, are genes that distinguish subsets of cancer types. Subsequently, a Markov blanket embedded memetic algorithm is proposed for the simultaneous identification of both FCR and PCR genes. Results obtained on commonly used synthetic and real-world microarray data sets show that the proposed approach converges to valid FCR and PCR genes that would assist biologists in their research work. The identification of both FCR and PCR genes is found to generate improvement in classification accuracy on many microarray data sets. Further comparison study to existing state-of-the-art feature selection algorithms also reveals the effectiveness and efficiency of the proposed approach.
The GP2 peptide is derived from the Human Epidermal growth factor Receptor 2 (HER2/nue), a marker protein for breast cancer present in saliva. In this paper we study the temperature dependent behavior of hydrated GP2 at terahertz frequencies and find that the peptide undergoes a dynamic transition between 200 and 220 K. By fitting suitable molecular models to the frequency response we determine the molecular processes involved above and below the transition temperature (T(D)). In particular, we show that below T(D) the dynamic transition is dominated by a simple harmonic vibration with a slow and temperature dependent relaxation time constant and that above T(D), the dynamic behavior is governed by two oscillators, one of which has a fast and temperature independent relaxation time constant and the other of which is a heavily damped oscillator with a slow and temperature dependent time constant. Furthermore a red shifting of the characteristic frequency of the damped oscillator was observed, confirming the presence of a non-harmonic vibration potential. Our measurements and modeling of GP2 highlight the unique capabilities of THz spectroscopy for protein characterization.
Related JoVE Video
Journal of Visualized Experiments
What is Visualize?
JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.
How does it work?
We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.
Video X seems to be unrelated to Abstract Y...
In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.