JoVE Visualize
Find video protocols related to scientific articles indexed in PubMed.
DNA-LCEB: a high-capacity and mutation-resistant DNA data-hiding approach by employing encryption, error correcting codes, and hybrid twofold and fourfold codon-based strategy for synonymous substitution in amino acids.
Med Biol Eng Comput
PUBLISHED: 08-25-2014
Data-hiding in deoxyribonucleic acid (DNA) sequences can be used to develop an organic memory and to track parent genes in an offspring as well as in genetically modified organisms. However, the main concerns regarding data-hiding in DNA sequences are the survival of the organism and successful extraction of the watermark from DNA. This implies that the organism should live and reproduce without any functional disorder even in the presence of the embedded data. Consequently, performing synonymous substitution in amino acids for watermarking becomes a primary option. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in DNA of living organisms. By employing the different types of synonymous codons of amino acids, the data storage capacity has been significantly increased. It is further observed that the proposed DNA-LCEB employing a combination of synonymous substitution, lossless compression, encryption, and Bose-Chaudhuri-Hocquenghem coding is secure and performs better in terms of both capacity and robustness compared to existing DNA data-hiding schemes. The proposed DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in terms of mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at http://111.68.99.218/DNA-LCEB .
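To make the twofold/fourfold idea concrete, the following minimal Python sketch hides bits by swapping each codon for a synonymous one, so the encoded protein is unchanged: a fourfold family carries two bits per codon and a twofold family carries one. The codon families are a subset of the standard genetic code. The bit-to-codon assignment and the function names (embed, extract, protein) are assumptions made for illustration. Compression, encryption, and BCH coding are left out, so this is not the published DNA-LCEB scheme.

```python
# Subset of the standard genetic code. The index of a codon inside its
# family is the payload value it encodes (an assumption of this sketch).
AA_FAMILIES = {
    "A": ["GCT", "GCC", "GCA", "GCG"],  # alanine, fourfold -> 2 bits
    "G": ["GGT", "GGC", "GGA", "GGG"],  # glycine, fourfold
    "P": ["CCT", "CCC", "CCA", "CCG"],  # proline, fourfold
    "T": ["ACT", "ACC", "ACA", "ACG"],  # threonine, fourfold
    "V": ["GTT", "GTC", "GTA", "GTG"],  # valine, fourfold
    "D": ["GAT", "GAC"],                # aspartate, twofold -> 1 bit
    "E": ["GAA", "GAG"],                # glutamate, twofold
    "K": ["AAA", "AAG"],                # lysine, twofold
    "F": ["TTT", "TTC"],                # phenylalanine, twofold
    "Y": ["TAT", "TAC"],                # tyrosine, twofold
}
CODON_INFO = {c: (aa, fam) for aa, fam in AA_FAMILIES.items() for c in fam}


def _codons(seq):
    """Split a sequence into complete codons (a trailing partial codon is ignored)."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def _width(family):
    """Bits carried by one codon of this family."""
    return 2 if len(family) == 4 else 1


def embed(seq, bits):
    """Return a copy of seq whose synonymous codon choices encode the bit string."""
    out, pos = [], 0
    for codon in _codons(seq):
        info = CODON_INFO.get(codon)
        if info is not None:
            family = info[1]
            w = _width(family)
            if w <= len(bits) - pos:  # enough payload left for this codon
                codon = family[int(bits[pos:pos + w], 2)]
                pos += w
        out.append(codon)
    if pos < len(bits):
        raise ValueError("cover sequence too short for payload")
    return "".join(out) + seq[len(seq) - len(seq) % 3:]


def extract(seq, n_bits):
    """Read n_bits back, mirroring the skipping rule used by embed."""
    bits, got = [], 0
    for codon in _codons(seq):
        info = CODON_INFO.get(codon)
        if info is None:
            continue
        family = info[1]
        w = _width(family)
        if w <= n_bits - got:
            bits.append(format(family.index(codon), "0{}b".format(w)))
            got += w
    return "".join(bits)


def protein(seq):
    """Amino acid per codon; codons outside the table are returned unchanged."""
    return [CODON_INFO[c][0] if c in CODON_INFO else c for c in _codons(seq)]
```

In this sketch the extractor needs to know the payload length in advance; a real scheme would have to carry or agree on that length separately.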
IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids.
Amino Acids
PUBLISHED: 01-04-2014
Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis systems is one of the fast growing research areas of health sciences. Such a decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein folding, interactions, structures, and sequence-order effects. In particular, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in the case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
From genes to health - challenges and opportunities.
Front Pediatr
PUBLISHED: 01-01-2014
In genome science, advances in high-throughput sequencing technologies and bioinformatics analysis are facilitating a better understanding of Mendelian and complex trait inheritance. Charting the genetic basis of complex diseases, including pediatric cancer, and interpreting the huge amount of next-generation sequencing data are among the major technical challenges to be overcome in order to understand the molecular basis of various diseases and genetic disorders. In this review, we provide insights into some major challenges currently hindering a better understanding of Mendelian and complex trait inheritance, and thus impeding medical benefits to patients.
A recent survey on colon cancer detection techniques.
IEEE/ACM Trans Comput Biol Bioinform
PUBLISHED: 10-05-2013
Colon cancer causes the deaths of about half a million people every year. The common method of detection is histopathological tissue analysis, which, though it leads to a vital diagnosis, is significantly correlated with the tiredness, experience, and workload of the pathologist. Researchers have been working for decades to eliminate manual inspection and to develop trustworthy systems for detecting colon cancer. Several techniques, based on spectral/spatial analysis of colon biopsy images, and serum and gene analysis of colon samples, have been proposed in this regard. Due to the rapid evolution of colon cancer detection techniques, an up-to-date review of recent research in this field is highly desirable. The aim of this paper is to discuss various colon cancer detection techniques. In this survey, we categorize the techniques on the basis of the adopted methodology and underlying data set, and provide a detailed description of the techniques in each category. Additionally, this study provides an extensive comparison of various colon cancer detection categories, and of multiple techniques within each category. Further, most of the techniques have been evaluated on a similar data set to provide a fair performance comparison. Analysis reveals that none of the techniques is perfect; however, the research community is progressively inching toward the best possible solution.
A Recent Survey on Colon Cancer Detection Techniques.
IEEE/ACM Trans Comput Biol Bioinform
PUBLISHED: 07-31-2013
Colon cancer causes the deaths of about half a million people every year. The common method of detection is histopathological tissue analysis, which, though it leads to a vital diagnosis, is significantly correlated with the tiredness, experience, and workload of the pathologist. Researchers have been working for decades to eliminate manual inspection and to develop trustworthy systems for detecting colon cancer. Several techniques, based on spectral/spatial analysis of colon biopsy images, and serum and gene analysis of colon samples, have been proposed in this regard. Due to the rapid evolution of colon cancer detection techniques, an up-to-date review of recent research in this field is highly desirable. The aim of this paper is to discuss various colon cancer detection techniques. In this survey, we categorize the techniques on the basis of the adopted methodology and underlying dataset, and provide a detailed description of the techniques in each category. Additionally, this study provides an extensive comparison of various colon cancer detection categories, and of multiple techniques within each category. Furthermore, most of the techniques have been evaluated using a similar dataset to provide a fair performance comparison. Analysis reveals that none of the techniques is perfect; however, the research community is progressively inching towards the best possible solution.
Robust information gain based fuzzy c-means clustering and classification of carotid artery ultrasound images.
Comput Methods Programs Biomed
PUBLISHED: 07-11-2013
In this paper, a robust method is proposed for segmentation of medical images by exploiting the concept of information gain. Medical images contain inherent noise due to imaging equipment, operating environment and patient movement during image acquisition. A robust medical image segmentation technique is thus essential for accurate results in subsequent stages. The clustering technique proposed in this work updates fuzzy membership values and cluster centroids based on information gain computed from the local neighborhood of a pixel. The proposed approach is less sensitive to noise and produces homogeneous clustering. Experiments are performed on medical and non-medical images and results are compared with state-of-the-art segmentation approaches. Analysis of visual and quantitative results verifies that the proposed approach outperforms other techniques on both noisy and noise-free images. Furthermore, the proposed technique is used to segment a dataset of 300 real carotid artery ultrasound images. A decision system for plaque detection in the carotid artery is then proposed. Intima media thickness (IMT) is measured from the segmented images produced by the proposed approach. A feature vector based on IMT values is constructed for deciding about the presence of plaque in the carotid artery using a probabilistic neural network (PNN). The proposed decision system detects plaque in carotid artery images with high accuracy. Finally, the effect of the proposed segmentation technique has also been investigated on classification of carotid artery ultrasound images.
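For orientation, the alternating membership and centroid updates that fuzzy c-means is built on can be sketched as follows. This is the textbook algorithm applied to a flat list of pixel intensities. It does not include the information-gain weighting from the local neighborhood that the abstract describes, and the function names, the fuzzifier m = 2, and the fixed iteration count are assumptions of this illustration.

```python
def memberships(values, centers, m=2.0, eps=1e-9):
    """Standard FCM membership: u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    table = []
    for x in values:
        # eps avoids division by zero when a value coincides with a centroid
        dist = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((di / dj) ** power for dj in dist) for di in dist]
        table.append(row)
    return table


def fuzzy_c_means(values, initial_centers, m=2.0, iterations=50):
    """Alternate membership and centroid updates for a fixed number of rounds."""
    centers = list(initial_centers)
    for _ in range(iterations):
        u = memberships(values, centers, m)
        new_centers = []
        for i in range(len(centers)):
            weights = [u[k][i] ** m for k in range(len(values))]
            weighted_sum = sum(w * x for w, x in zip(weights, values))
            new_centers.append(weighted_sum / sum(weights))
        centers = new_centers
    # Recompute memberships so they match the returned centroids
    return centers, memberships(values, centers, m)
```

A pixel is usually assigned to the cluster in which its membership is largest; a noise-aware variant such as the one proposed in the abstract would change how the memberships and centroids are updated, not this overall loop.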
Protein subcellular localization in human and hamster cell lines: Employing local ternary patterns of fluorescence microscopy images.
J. Theor. Biol.
PUBLISHED: 05-03-2013
A discriminative feature extraction technique is required for the development of accurate and efficient prediction systems for protein subcellular localization so that effective drugs can be developed. In this work, we showed that Local Ternary Patterns (LTPs) effectively exploit small variations in pixel intensities present in fluorescence microscopy based protein images of human and hamster cell lines. Further, the Synthetic Minority Oversampling Technique is applied to balance the feature space for the classification stage. We observed that LTPs coupled with a data balancing technique could enable a classifier, in this case a support vector machine, to yield good performance. The proposed ensemble based prediction system, using 10-fold cross-validation, has yielded better performance compared to existing techniques in predicting various subcellular compartments for both 2D HeLa and CHO datasets. The proposed predictor is available online at: http://111.68.99.218/Protein_SubLoc/, which is freely accessible to the public.
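As a rough illustration of how a Local Ternary Pattern is computed, the sketch below codes each of the eight neighbors of a pixel as +1, 0, or -1 against a tolerance band of width t around the center value, then splits the ternary pattern into the usual "upper" and "lower" binary codes. The neighbor ordering, the threshold value, and the function names are assumptions made here; the abstract does not specify them.

```python
# Clockwise neighbor offsets starting at the top-left pixel (an assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_codes(img, r, c, t):
    """Return (upper, lower) 8-bit LTP codes for the pixel at row r, column c.

    A neighbor p sets a bit in the upper code when p >= center + t,
    sets a bit in the lower code when p <= center - t, and sets no bit otherwise.
    """
    center = img[r][c]
    upper = lower = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        p = img[r + dr][c + dc]
        if p >= center + t:
            upper |= 1 << bit
        elif p <= center - t:
            lower |= 1 << bit
    return upper, lower


def ltp_histograms(img, t):
    """256-bin histograms of upper and lower codes over all interior pixels."""
    upper_hist, lower_hist = [0] * 256, [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            up, lo = ltp_codes(img, r, c, t)
            upper_hist[up] += 1
            lower_hist[lo] += 1
    return upper_hist, lower_hist
```

The two histograms, concatenated, would form the kind of texture feature vector that a classifier such as an SVM can be trained on.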
MitProt-Pred: Predicting mitochondrial proteins of Plasmodium falciparum parasite using diverse physiochemical properties and ensemble classification.
Comput. Biol. Med.
PUBLISHED: 04-22-2013
Mitochondrial protein of Plasmodium falciparum is an important target for anti-malarial drugs. Experimental approaches for detecting mitochondrial proteins are costly and time consuming. Therefore, we developed MitProt-Pred, which utilizes Bi-profile Bayes, Pseudo Average Chemical Shift, Split Amino Acid Composition, and Pseudo Amino Acid Composition based features of the protein sequences. A hybrid feature space is also developed by combining different individual feature spaces. These feature spaces are learned and exploited through an SVM-based ensemble. MitProt-Pred achieved significantly improved prediction performance for two standard datasets. We also developed a score-level ensemble, which outperforms the feature-level ensemble.
Genome characterization of a novel Burkholderia cepacia complex genomovar isolated from dieback affected mango orchards.
World J. Microbiol. Biotechnol.
PUBLISHED: 02-19-2013
We characterized the genome of the antibiotic resistant, caseinolytic and non-hemolytic Burkholderia sp. strain TJI49, isolated from mango trees (Mangifera indica L.) with dieback disease. This isolate produced severe disease symptoms on the indicator plants. Next generation DNA sequencing and short-read assembly generated the 60X deep 7,631,934 nucleotide draft genome of Burkholderia sp. TJI49, which comprised three chromosomes and at least one mega plasmid. Genome annotation studies revealed a total of 8,992 genes, out of which 8,940 were protein coding genes. Comparative genomics and phylogenetics identified Burkholderia sp. TJI49 as a distinct species of the Burkholderia cepacia complex (BCC), closely related to B. multivorans ATCC17616. Genome-wide sequence alignment of this isolate with replicons of BCC members showed conservation of core function genes but considerable variations in accessory genes. Subsystem-based gene annotation identified the active presence of a widespread colonization island and a type VI secretion system in Burkholderia sp. TJI49. Sequence comparisons revealed (a) 28 novel ORFs that have no database matches and (b) 23 ORFs with orthologues in species other than Burkholderia, indicating horizontal gene transfer events. Fold recognition of novel ORFs identified genes encoding pertactin autotransporter-like proteins (a constituent of the type V secretion system) and Hap adhesion-like proteins (involved in cell-cell adhesion) in the genome of Burkholderia sp. TJI49. The genomic characterization of this isolate provided additional information related to the pan-genome of Burkholderia species.
Automatic active contour-based segmentation and classification of carotid artery ultrasound images.
J Digit Imaging
PUBLISHED: 02-19-2013
In this paper, we present an automatic image segmentation and classification technique for carotid artery ultrasound images based on an active contour approach. For early detection of plaque in the carotid artery to avoid serious brain strokes, active contour-based techniques have been applied successfully to segment carotid artery ultrasound images. Further, ultrasound images might be affected by rotation, scaling, or translation during the acquisition process. In view of these facts, image alignment is used as a preprocessing step to align the carotid artery ultrasound images. In our experimental study, we exploit intima-media thickness (IMT) measurement to detect the presence of plaque in the artery. Support vector machine (SVM) classification is employed using these segmented images to distinguish the normal and diseased artery images. IMT measurement is used to form the feature vector. Our proposed approach segments the carotid artery images in an automatic way and further classifies them using SVM. Experimental results show the learning capability of the SVM classifier and validate the usefulness of our proposed approach. Further, the proposed approach needs minimum interaction from a user for an early detection of plaque in the carotid artery. Regarding the usefulness of the proposed approach in healthcare, it can be effectively used in remote areas as a preliminary clinical step even in the absence of highly skilled radiologists.
Predicting G-protein-coupled receptors families using different physiochemical properties and pseudo amino acid composition.
Meth. Enzymol.
PUBLISHED: 02-05-2013
G-protein-coupled receptors (GPCRs) initiate signaling pathways via trimeric guanine nucleotide-binding proteins. GPCRs are classified based on their ligand-binding properties and molecular phylogenetic analyses. Nonetheless, these latter analyses are in most cases dependent on multiple sequence alignments, themselves dependent on human intervention and expertise. Alignment-free classifications of GPCR sequences, in addition to being unbiased, present many applications, from uncovering hidden physicochemical parameters shared among specific groups of receptors to being used in automated workflows for large-scale molecular modeling applications. Current alignment-free classification methods, however, do not reach full accuracy. This chapter discusses how GPCR amino acid sequences can be classified using pseudo amino acid composition and multiscale energy representation of different physiochemical properties of amino acids. A hybrid feature extraction strategy is shown to be suitable to represent GPCRs and to be able to exploit GPCR amino acid sequence discrimination capability in the spatial as well as the transform domain. Classification strategies such as support vector machine and probabilistic neural network are then discussed with regard to GPCR classification. The working of the GPCR-Hybrid web predictor is also discussed.
WRF-TMH: predicting transmembrane helix by fusing composition index and physicochemical properties of amino acids.
Amino Acids
PUBLISHED: 01-23-2013
Membrane protein is the prime constituent of a cell, which acts as a mediator between intra and extracellular processes. The prediction of transmembrane (TM) helix and its topology provides essential information regarding the function and structure of membrane proteins. However, prediction of TM helix and its topology is a challenging issue in bioinformatics and computational biology due to experimental complexities and the lack of established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose the WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using compositional index and physicochemical properties. The redundant and irrelevant features are eliminated through singular value decomposition. The selected features provided by these feature extraction strategies are then fused to develop a hybrid model. Weighted random forest is adopted as a classification approach. We have used two benchmark datasets including low and high-resolution datasets. Tenfold cross validation is employed to assess the performance of the WRF-TMH model at different levels including per protein, per segment, and per residue. The success rates of the WRF-TMH model are quite promising and are the best reported so far on the same datasets. It is observed that the WRF-TMH model might play a substantial role, and will provide essential information for further structural and functional studies on membrane proteins. The accompanying web predictor is accessible at http://111.68.99.218/WRF-TMH/ .
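The tenfold cross-validation protocol mentioned above can be sketched generically as follows: the samples are shuffled once, dealt into ten folds, and each fold serves as the test set exactly once while the remaining nine are used for training. The function name, the fixed seed, and the round-robin fold assignment are assumptions of this sketch, not details taken from the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # deal indices round-robin into k folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A per-protein, per-segment, or per-residue score would then be computed on each test fold and averaged over the ten folds.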
Complete genome sequencing and variant analysis of a Pakistani individual.
J. Hum. Genet.
PUBLISHED: 01-17-2013
We sequenced the genome of a Pakistani male at 25.5x coverage using massively parallel sequencing technology. More than 90% of the sequence reads were mapped to the human reference genome. In subsequent analysis, we identified 3,224,311 single-nucleotide polymorphisms (SNPs), of which 388,532 (12% of the total SNPs) had not been previously recorded in the single nucleotide polymorphism database (dbSNP) or the 1000 Genomes Project database. The 5991 non-synonymous coding variants were screened for deleterious or disease-associated SNPs. Analysis of genes with deleterious SNPs identified retinoic acid signaling and regulation of transcription as the enriched Gene Ontology terms. Scanning of non-synonymous SNPs against the OMIM revealed several disease and phenotype-associated variants in the Pakistani genome. Comparative analysis with an Indian genome sequence revealed >1.8 million shared SNPs, 32% of which were annotated in ~14,000 genes. Gene Ontology (GO) terms analysis of these genes identified response to jasmonic acid stimulus, aminoglycoside antibiotic metabolic process and glycoside metabolic process with considerable enrichment. A total of 59,558 small indels (1-5 bp) and 16,063 large structural variations were found, 54% of which were novel. The substantial number of novel structural variations discovered in the Pakistani genome reinforced previous inferences that (a) structural variations are a major type of variation in the genome and (b) compared with SNPs, they putatively exhibit equivalent or superior functional roles. This genome sequence information will be an important reference for population-wide genomics studies of the ethnically diverse South Asian subcontinent.
Protein subcellular localization of fluorescence imagery using spatial and transform domain features.
Bioinformatics
PUBLISHED: 11-15-2011
Subcellular localization of proteins is one of the most significant characteristics of living cells. Prediction of protein subcellular locations is crucial to the understanding of various protein functions. Therefore, an accurate, computationally efficient and reliable prediction system is required.
Vasodilator effect of Phlomis bracteosa constituents is mediated through dual endothelium-dependent and endothelium-independent pathways.
Clin. Exp. Hypertens.
PUBLISHED: 10-03-2011
This study describes the vasorelaxant potential of some pure compounds isolated from Phlomis bracteosa L.: marrubiin, phlomeoic acid, and two new constituents labeled as RA and RB. In rat thoracic aortic rings denuded of endothelium, marrubiin, phlomeoic acid, RA, and RB caused relaxation of high K(+) (80 mM) and phenylephrine (1 μM)-induced contractions at the concentration range of 1.0-1000 μg/mL. Marrubiin, phlomeoic acid, RA, and RB concentration dependently (3.0-10 μg/mL) shifted the Ca(++) curves obtained in Ca(++)-free medium to the right. The vasodilator effect of marrubiin, phlomeoic acid, RA, and RB was partially blocked by N(ω)-nitro-L-arginine methyl ester in endothelium-intact aorta preparations. These results reveal that the P. bracteosa constituents marrubiin, phlomeoic acid, RA, and RB exhibit vasodilator action that occurs via a combination of endothelium-independent Ca(++) antagonism and an endothelium-dependent N(ω)-nitro-L-arginine methyl ester-sensitive nitric oxide-modulating mechanism.
MemHyb: predicting membrane protein types by hybridizing SAAC and PSSM.
J. Theor. Biol.
PUBLISHED: 06-21-2011
About 50% of available drugs are targeted against membrane proteins. Knowledge of membrane protein structure and function has great importance in biological and pharmacological research. Therefore, an automated method that can help identify new membrane protein types based on their primary sequence is exceedingly advantageous. In this paper, we tackle the interesting problem of classifying membrane protein types using their sequence information. We consider both evolutionary and physicochemical features and provide them to our classification system based on support vector machine (SVM) with error correction code. We employ a powerful sequence encoding scheme by fusing position specific scoring matrix and split amino acid composition to effectively discriminate membrane protein types. Linear, polynomial, and RBF-based SVMs with Bose-Chaudhuri-Hocquenghem coding are trained and tested. The highest success rates of 91.1% and 93.4% on two datasets are obtained by RBF-SVM using leave-one-out cross-validation. Thus, our proposed approach is an effective tool for the discrimination of membrane protein types and might be helpful to researchers/academicians working in the field of Drug Discovery, Cell Biology, and Bioinformatics. The web server for the proposed MemHyb-SVM is accessible at http://111.68.99.218/MemHyb-SVM.
Prediction of membrane proteins using split amino acid and ensemble classification.
Amino Acids
PUBLISHED: 04-01-2011
Knowledge of the types of membrane protein provides useful clues in deducing the functions of uncharacterized membrane proteins. An automatic method for efficiently identifying uncharacterized proteins is thus highly desirable. In this work, we have developed a novel method for predicting membrane protein types by exploiting the discrimination capability of the difference in amino acid composition at the N and C terminus through split amino acid composition (SAAC). We also show that the ensemble classification can better exploit this discriminating capability of SAAC. In this study, membrane protein types are classified using three feature extraction and several classification strategies. An ensemble classifier Mem-EnsSAAC is then developed using the best feature extraction strategy. Pseudo amino acid (PseAA) composition, discrete wavelet analysis (DWT), SAAC, and a hybrid model are employed for feature extraction. The nearest neighbor, probabilistic neural network, support vector machine, random forest, and Adaboost are used as individual classifiers. The predicted results of the individual learners are combined using genetic algorithm to form an ensemble classifier, Mem-EnsSAAC yielding an accuracy of 92.4 and 92.2% for the Jackknife and independent dataset test, respectively. Performance measures such as MCC, sensitivity, specificity, F-measure, and Q-statistics show that SAAC-based prediction yields significantly higher performance compared to PseAA- and DWT-based systems, and is also the best reported so far. The proposed Mem-EnsSAAC is able to predict the membrane protein types with high accuracy and consequently, can be very helpful in drug discovery. It can be accessed at http://111.68.99.218/membrane.
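A minimal sketch of split amino acid composition (SAAC) is given below: the sequence is cut into an N-terminal segment, a C-terminal segment, and the region between them, and the 20-dimensional amino acid composition of each part is concatenated into a 60-dimensional vector. The segment length of 25 residues is a default assumed here for illustration, and the function names are hypothetical; the abstract does not state the lengths used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues in seq."""
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def saac(seq, n_term=25, c_term=25):
    """Concatenate the compositions of the N-terminal, middle, and C-terminal parts."""
    if len(seq) <= n_term + c_term:
        raise ValueError("sequence too short to leave a middle region")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return aac(head) + aac(middle) + aac(tail)
```

Because the termini are described separately, residues concentrated at either end of the chain are not averaged away as they would be in a single whole-sequence composition.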
GPCR-MPredictor: multi-level prediction of G protein-coupled receptors using genetic ensemble.
Amino Acids
PUBLISHED: 03-26-2011
G protein-coupled receptors (GPCRs) are transmembrane proteins, which transduce signals from extracellular ligands to intracellular G protein. Automatic classification of GPCRs can provide important information for the development of novel drugs in the pharmaceutical industry. In this paper, we propose an evolutionary approach, GPCR-MPredictor, which combines individual classifiers for predicting GPCRs. GPCR-MPredictor is a web predictor that can efficiently predict GPCRs at five levels. The first level determines whether a protein sequence is a GPCR or a non-GPCR. If the predicted sequence is a GPCR, then it is further classified into family, subfamily, sub-subfamily, and subtype levels. In this work, our aim is to analyze the discriminative power of different feature extraction and classification strategies in the case of GPCR prediction and then to use an evolutionary ensemble approach for enhanced prediction performance. Features are extracted using amino acid composition, pseudo amino acid composition, and dipeptide composition of protein sequences. Different classification approaches, such as k-nearest neighbor (KNN), support vector machine (SVM), probabilistic neural networks (PNN), J48, Adaboost, and Naive Bayes, have been used to classify GPCRs. The proposed hierarchical GA-based ensemble classifier exploits the prediction results of SVM, KNN, PNN, and J48 at each level. The GA-based ensemble yields an accuracy of 99.75, 92.45, 87.80, 83.57, and 96.17% at the five levels, on the first dataset. We further perform predictions on a dataset consisting of 8,000 GPCRs at the family, subfamily, and sub-subfamily level, and on two other datasets of 365 and 167 GPCRs at the second and fourth levels, respectively. In comparison with the existing methods, the results demonstrate the effectiveness of our proposed GPCR-MPredictor in classifying GPCR families. It is accessible at http://111.68.99.218/gpcr-mpredictor/.
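Of the three feature types listed, dipeptide composition is the simplest to show in code: it is the normalized frequency of each of the 400 ordered residue pairs that occur at adjacent positions. The sketch below is a generic illustration with hypothetical names; pairs containing non-standard residue symbols are simply not counted, which is a choice made here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies."""
    total = len(seq) - 1  # number of overlapping adjacent pairs
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard symbols
            counts[pair] += 1
    return [counts[d] / total if total > 0 else 0.0 for d in DIPEPTIDES]
```

Unlike plain amino acid composition, this vector retains some local sequence-order information, at the cost of a twentyfold larger feature space.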
Mito-GSAAC: mitochondria prediction using genetic ensemble classifier and split amino acid composition.
Amino Acids
PUBLISHED: 03-09-2011
Mitochondria are all-important organelles of eukaryotic cells since they are involved in processes associated with cellular mortality and human diseases. Therefore, trustworthy techniques are much needed for the identification of new mitochondrial proteins. We propose the Mito-GSAAC system for prediction of mitochondrial proteins. The aim of this work is to investigate an effective feature extraction strategy and to develop an ensemble approach that can better exploit the advantages of this feature extraction strategy for mitochondria classification. We investigate four kinds of protein representations for prediction of mitochondrial proteins: amino acid composition, dipeptide composition, pseudo amino acid composition, and split amino acid composition (SAAC). Individual classifiers such as support vector machine (SVM), k-nearest neighbor, multilayer perceptron, random forest, AdaBoost, and bagging are first trained. An ensemble classifier is then built using genetic programming (GP) for evolving a complex but effective decision space from the individual decision spaces of the trained classifiers. The highest prediction performance for the Jackknife test is 92.62% using the GP-based ensemble classifier on SAAC features, which is the highest accuracy reported so far on the Mitochondria dataset being used. On the Malaria Parasite Mitochondria dataset, the highest accuracy is obtained by SVM using SAAC and it is further enhanced to 93.21% using the GP-based ensemble. It is observed that SAAC has better discrimination power for mitochondria prediction than the rest of the feature extraction strategies. Thus, the improved prediction performance is largely due to the better capability of SAAC for discriminating between mitochondria and non-mitochondria proteins at the N and C terminus and the effective combination capability of GP. Mito-GSAAC can be accessed at http://111.68.99.218/Mito-GSAAC . It is expected that the novel approach and the accompanying predictor will have a major impact on Molecular Cell Biology, Proteomics, Bioinformatics, System Biology, and Drug Development.
CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition.
Comput Biol Chem
PUBLISHED: 02-23-2011
Precise information about protein locations in a cell facilitates the understanding of the function of a protein and its interaction in the cellular environment. This information further helps in the study of the specific metabolic pathways and other biological processes. We propose an ensemble approach called "CE-PLoc" for predicting subcellular locations based on fusion of individual classifiers. The proposed approach utilizes features obtained from both dipeptide composition (DC) and amphiphilic pseudo amino acid composition (PseAAC) based feature extraction strategies. Different feature spaces are obtained by varying the dimensionality using PseAAC for a selected base learner. The performance of the individual learning mechanisms, such as support vector machine, nearest neighbor, probabilistic neural network, and covariant discriminant, which are trained using PseAAC based features, is first analyzed. Classifiers are developed using the same learning mechanism but trained on PseAAC based feature spaces of varying dimensions. These classifiers are combined through a voting strategy and an improvement in prediction performance is achieved. Prediction performance is further enhanced by developing CE-PLoc through the combination of different learning mechanisms trained on both the DC based feature space and PseAAC based feature spaces of varying dimensions. The predictive performance of the proposed CE-PLoc is evaluated for two benchmark datasets of protein subcellular locations using accuracy, MCC, and Q-statistics. Using the jackknife test, prediction accuracies of 81.47 and 83.99% are obtained for the 12 and 14 subcellular locations datasets, respectively. In the case of the independent dataset test, prediction accuracies are 87.04 and 87.33% for the 12 and 14 class datasets, respectively.
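The voting step described above can be illustrated with a plain majority vote over the labels predicted by several classifiers. This is a generic sketch, not the CE-PLoc implementation: the function name is hypothetical, all votes are weighted equally, and ties go to the label that was voted for first in classifier order, which is a choice made here.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-classifier label lists into one label per sample.

    predictions[j][i] is the label that classifier j assigns to sample i.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    fused = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        # most_common orders equal counts by first occurrence (Python 3.7+)
        fused.append(votes.most_common(1)[0][0])
    return fused
```

For example, with three classifiers trained on feature spaces of different dimensions, a sample labeled "nucleus", "nucleus", and "cytoplasm" would be assigned "nucleus".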
G-protein-coupled receptor prediction using pseudo-amino-acid composition and multiscale energy representation of different physiochemical properties.
Anal. Biochem.
PUBLISHED: 01-26-2011
G-protein-coupled receptors (GPCRs) are the largest family of cell surface receptors that, via trimeric guanine nucleotide-binding proteins (G-proteins), initiate some signaling pathways in the eukaryotic cell. Many diseases involve malfunction of GPCRs, which makes their role in drug discovery evident. Thus, the automatic prediction of GPCRs can be very helpful in the pharmaceutical industry. However, prediction of GPCRs, their families, and their subfamilies is a challenging task. In this article, GPCRs are classified into families, subfamilies, and sub-subfamilies using pseudo-amino-acid composition and multiscale energy representation of different physiochemical properties of amino acids. The aim of the current research is to assess different feature extraction strategies and to develop a hybrid feature extraction strategy that can exploit the discrimination capability in both the spatial and transform domains for GPCR classification. Support vector machine, nearest neighbor, and probabilistic neural network are used for classification purposes. The overall performance of each classifier is computed individually for each feature extraction strategy. Using the jackknife test, the proposed GPCR-hybrid method is observed to provide the best results reported so far. The GPCR-hybrid web predictor, intended to help researchers working on GPCRs in the field of biochemistry and bioinformatics, is available at http://111.68.99.218/GPCR.
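The jackknife test referred to in these abstracts is commonly understood as leave-one-out cross-validation: each sample is held out in turn, the classifier is built from the rest, and the held-out sample is predicted. The sketch below illustrates this with a plain 1-nearest-neighbor classifier on Euclidean distance. It is an assumption-laden example, not the GPCR-hybrid pipeline, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_predict(train_X, train_y, query):
    """Return the label of the training vector closest to the query."""
    best = min(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return train_y[best]


def jackknife_accuracy(X, y):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # every sample except the i-th
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

Because every sample is tested exactly once against a model that never saw it, the result does not depend on a random train/test split.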
Prediction of GPCRs with pseudo amino acid composition: employing composite features and grey incidence degree based classification.
Protein Pept. Lett.
PUBLISHED: 01-21-2011
G-protein coupled receptors (GPCRs) form a protein family found only in eukaryotes. They interface the cell with the outside world and are involved in many physiological processes. Their role in drug development is evident; hence, the prediction of GPCRs is in high demand. Because 3D structures are unavailable for most GPCRs, statistical and machine learning based prediction is especially needed. In the proposed approach, GPCRs are classified at the family, subfamily, and sub-subfamily levels. We extract features using a hybrid combination of pseudo amino acid, fast Fourier transform, and split amino acid techniques. The overall feature vector is then reduced using principal component analysis. Most GPCRs are composed of two or more subunits. The arrangement and number of subunits forming a GPCR are referred to as its quaternary structure. The functions of GPCRs are closely related to their quaternary structure. Classification in the present research is performed using the grey incidence degree (GID) measure, which can efficiently analyze the numerical relation between various components of GPCRs. GID based classification has shown remarkable improvement in predicting GPCRs.
Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.
J. Theor. Biol.
PUBLISHED: 07-29-2010
Membrane proteins are a vital type of protein, serving as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of those types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural network based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence and includes seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. The highest success rate using the jackknife test, obtained through SVM, is 86.01%. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers also exhibit improved performance on other measures, such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor.
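The performance measures named in the abstract above (sensitivity, specificity, Matthews correlation coefficient, and F-measure) have standard definitions for a binary confusion matrix. The sketch below computes them from the four counts. For a multi-class problem such as membrane protein types, one common convention, assumed here and not confirmed by the source, is to compute these per class in a one-vs-rest fashion. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it more informative when class sizes are unbalanced.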
Predicting protein subcellular location: exploiting amino acid based sequence of feature spaces and fusion of diverse classifiers.
Amino Acids
PUBLISHED: 01-12-2009
A novel approach, CE-Ploc, is proposed for predicting protein subcellular locations by exploiting diversity in both feature and decision spaces. The diversity in a sequence of feature spaces is exploited using hydrophobicity and hydrophilicity of amphiphilic pseudo amino acid composition and a specific learning mechanism. Diversity in learning mechanisms is exploited by fusion of classifiers that are based on different learning mechanisms. Significant improvement in prediction performance is observed using jackknife and independent dataset tests.
Carotid artery image segmentation using modified spatial fuzzy c-means and ensemble clustering.
Comput Methods Programs Biomed
Disease diagnosis based on ultrasound imaging is popular because of its non-invasive nature. However, ultrasound imaging systems produce low quality images due to the presence of speckle noise and wave interference. This shortcoming requires considerable effort from experts to diagnose a disease from carotid artery ultrasound images. Image segmentation is one technique that can help in diagnosing a disease from these images efficiently. Most of the pixels in an image are highly correlated, so considering the spatial information of surrounding pixels during segmentation may further improve the results. When data is highly correlated, one pixel may belong to more than one cluster with different degrees of membership. In this paper, we present an image segmentation technique, namely improved spatial fuzzy c-means, and an ensemble clustering approach for carotid artery ultrasound images to identify the presence of plaque. Spatial, wavelet, and gray level co-occurrence matrix (GLCM) features are extracted from carotid artery ultrasound images. Redundant and less important features are removed from the feature set using a genetic search process. Finally, segmentation is performed on the optimal, reduced feature set. Ensemble clustering with the reduced feature set performs better in terms of both segmentation time and clustering accuracy. Intima-media thickness (IMT) is measured from the images segmented by the proposed approach. Based on the measured IMT values, a Multi-Layer Back-Propagation Neural Network (MLBPNN) is used to classify the images as normal or abnormal. Experimental results show the learning capability of the MLBPNN classifier and validate the effectiveness of our proposed technique. The proposed approach to segmentation and classification of carotid artery ultrasound images appears to be very useful for detecting plaque in the carotid artery.
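To make the idea of a pixel belonging to several clusters with different degrees of membership concrete, here is a minimal sketch of the standard fuzzy c-means update on scalar intensity values. It is not the modified spatial variant proposed in the paper, whose details are not given in the abstract. The function name, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions for illustration.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on scalar values (no spatial term).

    Returns the final centers and the memberships from the last iteration.
    """
    centers = list(centers)
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if any(d == 0.0 for d in dists):
                # A value sitting exactly on a center belongs fully to it
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d_k / d_j) ** power for d_j in dists)
                       for d_k in dists]
            memberships.append(row)
        # Center update: mean of the values weighted by u_ik^m
        for k in range(len(centers)):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total > 0:
                centers[k] = sum(w * x for w, x in zip(weights, values)) / total
    return centers, memberships
```

A spatial variant would typically adjust each membership using the memberships of neighboring pixels before the center update; that step is omitted here because the abstract does not specify it.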
Mem-PHybrid: hybrid features-based prediction system for classifying membrane protein types.
Anal. Biochem.
Membrane proteins are a major class of proteins and are encoded by approximately 20% to 30% of genes in most organisms. In this work, a novel two-layer membrane protein prediction system, called Mem-PHybrid, is proposed. In the first layer, it identifies the protein query as a membrane or nonmembrane protein. In the second layer, it further identifies the type of membrane protein. The proposed Mem-PHybrid prediction system is based on hybrid features, whereby a fusion of both the physicochemical and split amino acid composition-based features is performed. This enables the proposed Mem-PHybrid to exploit the discrimination capabilities of both types of feature extraction strategy. In addition, minimum redundancy and maximum relevance is applied to reduce the dimensionality of the feature vector. We employ random forest, evidence-theoretic K-nearest neighbor, and support vector machine (SVM) as classifiers and analyze their performance on two datasets. SVM using hybrid features yields the highest accuracy of 89.6% and 97.3% on dataset1 and 91.5% and 95.5% on dataset2 for jackknife and independent dataset tests, respectively. The enhanced prediction performance of Mem-PHybrid is largely attributed to the exploitation of the discrimination power of the hybrid features and of the learning capability of SVM. Mem-PHybrid is accessible at http://www.111.68.99.218/Mem-PHybrid.
Identifying GPCRs and their types with Chou's pseudo amino acid composition: an approach from multi-scale energy representation and position specific scoring matrix.
Protein Pept. Lett.
G-protein coupled receptors (GPCRs) are a membrane protein family that serves as an interface between the cell and the outside world. They are involved in various physiological processes and are the targets of more than 50% of marketed drugs. The function of GPCRs can be determined by conducting biological experiments. However, with the rapid increase of GPCR sequences entering databanks, it is very time consuming and expensive to determine their function based only on experimental techniques. Hence, the computational prediction of GPCRs is in high demand for both pharmaceutical and educational research. Feature extraction of GPCRs in the proposed research is performed using three techniques: pseudo amino acid composition, wavelet based multi-scale energy, and evolutionary information based feature extraction utilizing position specific scoring matrices. For classification, a majority voting based ensemble method is used, whose weights are optimized using a genetic algorithm. Four classifiers are used in the ensemble: nearest neighbor, probabilistic neural network, support vector machine, and grey incidence degree. The performance of the proposed method is assessed using the jackknife test on a number of datasets. First, the individual performance of each classifier is assessed for each dataset using the jackknife test. After that, the performance for each dataset is improved by using weighted ensemble classification. The ensemble weights are optimized over several runs of the genetic algorithm. We have compared our method with various other methods. The performance gain of the proposed method indicates that it is useful for GPCR classification.
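The weighted majority voting described above can be sketched as follows, assuming each classifier casts one vote for its predicted label and the vote counts in proportion to that classifier's weight. The weights in the usage example are made up for illustration; in the paper they are optimized with a genetic algorithm, which is not shown here. The function name `weighted_vote` is hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers.

    predictions: one predicted label per classifier
    weights:     one non-negative weight per classifier
    Ties go to the label that received a vote first.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Usage with illustrative weights for four classifiers
label = weighted_vote(["family_A", "family_B", "family_B", "family_A"],
                      [0.5, 0.2, 0.2, 0.4])
```

With equal weights this reduces to plain majority voting, so the weights only matter when the classifiers disagree and differ in reliability.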

What is Visualize?

JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.

How does it work?

We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.

Video X seems to be unrelated to Abstract Y...

In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.