Modern 3D electron microscopy approaches have recently allowed unprecedented insight into the 3D ultrastructural organization of cells and tissues, enabling the visualization of large macromolecular machines, such as adhesion complexes, as well as higher-order structures, such as the cytoskeleton and cellular organelles in their respective cell and tissue context. Given the inherent complexity of cellular volumes, it is essential to first extract the features of interest in order to allow visualization, quantification, and therefore comprehension of their 3D organization. Each data set is defined by distinct characteristics, e.g., signal-to-noise ratio, crispness (sharpness) of the data, heterogeneity of its features, crowdedness of features, presence or absence of characteristic shapes that allow for easy identification, and the percentage of the entire volume that a specific region of interest occupies. All these characteristics need to be considered when deciding on which approach to take for segmentation.
The six different 3D ultrastructural data sets presented were obtained by three different imaging approaches: resin embedded stained electron tomography, focused ion beam- and serial block face- scanning electron microscopy (FIB-SEM, SBF-SEM) of mildly stained and heavily stained samples, respectively. For these data sets, four different segmentation approaches have been applied: (1) fully manual model building followed solely by visualization of the model, (2) manual tracing segmentation of the data followed by surface rendering, (3) semi-automated approaches followed by surface rendering, or (4) automated custom-designed segmentation algorithms followed by surface rendering and quantitative analysis. Depending on the combination of data set characteristics, it was found that typically one of these four categorical approaches outperforms the others, but depending on the exact sequence of criteria, more than one approach may be successful. Based on these data, we propose a triage scheme that categorizes both objective data set characteristics and subjective personal criteria for the analysis of the different data sets.
26 Related JoVE Articles!
Microarray-based Identification of Individual HERV Loci Expression: Application to Biomarker Discovery in Prostate Cancer
Institutions: Joint Unit Hospices de Lyon-bioMérieux, BioMérieux, Hospices Civils de Lyon, Lyon 1 University, BioMérieux, Hospices Civils de Lyon, Hospices Civils de Lyon.
The prostate-specific antigen (PSA) is the main diagnostic biomarker for prostate cancer in clinical use, but it lacks specificity and sensitivity, particularly in low dosage values1. 'How to use PSA' remains a current issue, either for diagnosis, where a gray zone corresponding to a serum concentration of 2.5-10 ng/ml does not allow a clear differentiation to be made between cancer and noncancer2, or for patient follow-up, where analysis of post-operative PSA kinetic parameters can pose considerable challenges for practical application3,4. Alternatively, noncoding RNAs (ncRNAs) are emerging as key molecules in human cancer, with the potential to serve as novel markers of disease, e.g., PCA3 in prostate cancer5,6, and to reveal uncharacterized aspects of tumor biology. Moreover, data from the ENCODE project published in 2012 showed that different RNA types cover about 62% of the genome. It also appears that the number of transcriptional regulatory motifs is at least 4.5x higher than the number corresponding to protein-coding exons. Thus, long terminal repeats (LTRs) of human endogenous retroviruses (HERVs) constitute a wide range of putative/candidate transcriptional regulatory sequences, as this is their primary function in infectious retroviruses. HERVs, which are spread throughout the human genome, originate from ancestral and independent infections within the germ line, followed by copy-paste propagation processes, leading to multicopy families occupying 8% of the human genome (note that exons span 2% of our genome). Some HERV loci still express proteins that have been associated with several pathologies including cancer7-10. We have designed a high-density microarray, in Affymetrix format, aiming to optimally characterize individual HERV loci expression, in order to better understand whether they can be active, and whether they drive ncRNA transcription or modulate coding gene expression. This tool has been applied in the prostate cancer field (Figure 1).
Medicine, Issue 81, Cancer Biology, Genetics, Molecular Biology, Prostate, Retroviridae, Biomarkers, Pharmacological, Tumor Markers, Biological, Prostatectomy, Microarray Analysis, Gene Expression, Diagnosis, Human Endogenous Retroviruses, HERV, microarray, Transcriptome, prostate cancer, Affymetrix
A Method for Investigating Age-related Differences in the Functional Connectivity of Cognitive Control Networks Associated with Dimensional Change Card Sort Performance
Institutions: University of Western Ontario.
The ability to adjust behavior to sudden changes in the environment develops gradually in childhood and adolescence. For example, in the Dimensional Change Card Sort task, participants switch from sorting cards one way, such as shape, to sorting them a different way, such as color. Adjusting behavior in this way exacts a small performance cost, or switch cost, such that responses are typically slower and more error-prone on switch trials in which the sorting rule changes as compared to repeat trials in which the sorting rule remains the same. The ability to flexibly adjust behavior is often said to develop gradually, in part because behavioral costs such as switch costs typically decrease with increasing age. Why aspects of higher-order cognition, such as behavioral flexibility, develop so gradually remains an open question. One hypothesis is that these changes occur in association with functional changes in broad-scale cognitive control networks. On this view, complex mental operations, such as switching, involve rapid interactions between several distributed brain regions, including those that update and maintain task rules, re-orient attention, and select behaviors. With development, functional connections between these regions strengthen, leading to faster and more efficient switching operations. The current video describes a method of testing this hypothesis through the collection and multivariate analysis of fMRI data from participants of different ages.
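The switch cost described above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' analysis code: the trial record layout (`type`, `rt_ms`, `correct`) is hypothetical, and computing the response-time cost on correct trials only is an assumed convention.

```python
from statistics import mean

def switch_cost(trials):
    """Return (rt_cost_ms, error_cost) for a list of trial dicts.

    Each trial is assumed (hypothetical layout) to look like
    {"type": "switch" or "repeat", "rt_ms": float, "correct": bool}.
    """
    switch = [t for t in trials if t["type"] == "switch"]
    repeat = [t for t in trials if t["type"] == "repeat"]

    # Response-time cost: mean RT on switch trials minus mean RT on repeat
    # trials, using correct responses only (an assumed convention).
    rt_cost = (mean(t["rt_ms"] for t in switch if t["correct"])
               - mean(t["rt_ms"] for t in repeat if t["correct"]))

    # Error cost: difference in the proportion of incorrect responses.
    err_cost = (mean(0 if t["correct"] else 1 for t in switch)
                - mean(0 if t["correct"] else 1 for t in repeat))
    return rt_cost, err_cost

# Made-up example data, for illustration only.
example_trials = [
    {"type": "switch", "rt_ms": 900, "correct": True},
    {"type": "switch", "rt_ms": 1000, "correct": True},
    {"type": "switch", "rt_ms": 1100, "correct": False},
    {"type": "repeat", "rt_ms": 700, "correct": True},
    {"type": "repeat", "rt_ms": 800, "correct": True},
    {"type": "repeat", "rt_ms": 750, "correct": True},
]
rt_cost, err_cost = switch_cost(example_trials)
```

Positive values for both measures indicate slower and more error-prone responding on switch trials, which is the pattern the abstract describes.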
Behavior, Issue 87, Neurosciences, fMRI, Cognitive Control, Development, Functional Connectivity
Mapping Cortical Dynamics Using Simultaneous MEG/EEG and Anatomically-constrained Minimum-norm Estimates: an Auditory Attention Example
Institutions: University of Washington.
Magneto- and electroencephalography (MEG/EEG) are neuroimaging techniques that provide a high temporal resolution particularly suitable to investigate the cortical networks involved in dynamical perceptual and cognitive tasks, such as attending to different sounds in a cocktail party. Many past studies have employed data recorded at the sensor level only, i.e., the magnetic fields or the electric potentials recorded outside and on the scalp, and have usually focused on activity that is time-locked to the stimulus presentation. This type of event-related field / potential analysis is particularly useful when there are only a small number of distinct dipolar patterns that can be isolated and identified in space and time. Alternatively, by utilizing anatomical information, these distinct field patterns can be localized as current sources on the cortex. However, for a more sustained response that may not be time-locked to a specific stimulus (e.g., in preparation for listening to one of the two simultaneously presented spoken digits based on the cued auditory feature) or may be distributed across multiple spatial locations unknown a priori, the recruitment of a distributed cortical network may not be adequately captured by using a limited number of focal sources.
Here, we describe a procedure that employs individual anatomical MRI data to establish a relationship between the sensor information and the dipole activation on the cortex through the use of minimum-norm estimates (MNE). This inverse imaging approach provides us with a tool for distributed source analysis. For illustrative purposes, we will describe all procedures using FreeSurfer and MNE software, both freely available. We will summarize the MRI sequences and analysis steps required to produce a forward model that enables us to relate the expected field pattern caused by the dipoles distributed on the cortex to the M/EEG sensors. Next, we will step through the processes needed to denoise the sensor data from environmental and physiological contaminants. We will then outline the procedure for combining and mapping MEG/EEG sensor data onto the cortical space, thereby producing a family of time-series of cortical dipole activation on the brain surface (or "brain movies") related to each experimental condition. Finally, we will highlight a few statistical techniques that enable us to make scientific inference across a subject population (i.e., perform group-level analysis) based on a common cortical coordinate space.
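The relationship between the forward model and the minimum-norm inverse can be sketched numerically. The block below is an illustration only and does not use or reproduce the MNE software: it assumes identity noise and source covariance matrices, a random stand-in for the gain matrix, and a hypothetical function name `minimum_norm_estimate`. Under those assumptions the regularized L2 minimum-norm solution is x = Gᵀ(GGᵀ + λ²I)⁻¹y.

```python
import numpy as np

def minimum_norm_estimate(G, y, lam):
    """Regularized L2 minimum-norm inverse (illustrative sketch).

    G   : (n_sensors, n_sources) forward (gain) matrix
    y   : (n_sensors,) or (n_sensors, n_times) sensor data
    lam : regularization parameter
    Identity noise and source covariances are assumed here.
    """
    n_sensors = G.shape[0]
    gram = G @ G.T + (lam ** 2) * np.eye(n_sensors)
    # Solve the small sensor-space system instead of forming an explicit inverse.
    return G.T @ np.linalg.solve(gram, y)

# Toy problem: 5 sensors, 20 candidate sources, one active source.
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 20))      # random stand-in for a real forward model
x_true = np.zeros(20)
x_true[3] = 1.0
y = G @ x_true                        # noiseless simulated sensor data
x_hat = minimum_norm_estimate(G, y, lam=1e-6)
```

Because there are far more sources than sensors, many source patterns explain the data equally well; the estimate picks the one with the smallest overall amplitude, so `x_hat` reproduces `y` but is spread across sources rather than concentrated on the single active one.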
Neuroscience, Issue 68, Magnetoencephalography, MEG, Electroencephalography, EEG, audition, attention, inverse imaging
Automated, Quantitative Cognitive/Behavioral Screening of Mice: For Genetics, Pharmacology, Animal Cognition and Undergraduate Instruction
Institutions: Rutgers University, Koç University, New York University, Fairfield University.
We describe a high-throughput, high-volume, fully automated, live-in 24/7 behavioral testing system for assessing the effects of genetic and pharmacological manipulations on basic mechanisms of cognition and learning in mice. A standard polypropylene mouse housing tub is connected through an acrylic tube to a standard commercial mouse test box. The test box has 3 hoppers, 2 of which are connected to pellet feeders. All are internally illuminable with an LED and monitored for head entries by infrared (IR) beams. Mice live in the environment, which eliminates handling during screening. They obtain their food during two or more daily feeding periods by performing in operant (instrumental) and Pavlovian (classical) protocols, for which we have written protocol-control software and quasi-real-time data analysis and graphing software. The data analysis and graphing routines are written in a MATLAB-based language created to simplify greatly the analysis of large time-stamped behavioral and physiological event records and to preserve a full data trail from raw data through all intermediate analyses to the published graphs and statistics within a single data structure. The data-analysis code harvests the data several times a day and subjects it to statistical and graphical analyses, which are automatically stored in the "cloud" and on in-lab computers. Thus, the progress of individual mice is visualized and quantified daily. The data-analysis code talks to the protocol-control code, permitting the automated advance from protocol to protocol of individual subjects. The behavioral protocols implemented are matching, autoshaping, timed hopper-switching, risk assessment in timed hopper-switching, impulsivity measurement, and the circadian anticipation of food availability. Open-source protocol-control and data-analysis code makes the addition of new protocols simple. 
Eight test environments fit in a 48 in x 24 in x 78 in cabinet; two such cabinets (16 environments) may be controlled by one computer.
Behavior, Issue 84, genetics, cognitive mechanisms, behavioral screening, learning, memory, timing
Whole-cell MALDI-TOF Mass Spectrometry is an Accurate and Rapid Method to Analyze Different Modes of Macrophage Activation
Institutions: Aix Marseille Université, Hôpital de la Timone.
MALDI-TOF is an extensively used mass spectrometry technique in chemistry and biochemistry. It has also been applied in medicine to identify molecules and biomarkers. Recently, it has been used in microbiology for the routine identification of bacteria grown from clinical samples, without preparation or fractionation steps. We and others have applied this whole-cell MALDI-TOF mass spectrometry technique successfully to eukaryotic cells. Current applications range from cell type identification to quality control assessment of cell culture and diagnostic applications. Here, we describe its use to explore the various polarization phenotypes of macrophages in response to cytokines or heat-killed bacteria. It allowed the identification of macrophage-specific fingerprints that are representative of the diversity of proteomic responses of macrophages. This application illustrates the accuracy and simplicity of the method. The protocol we describe here may be useful for studying the immune host response in pathological conditions or may be extended to wider diagnostic applications.
Immunology, Issue 82, MALDI-TOF, mass spectrometry, fingerprint, Macrophages, activation, IFN-g, TNF, LPS, IL-4, bacterial pathogens
Cell Death Associated with Abnormal Mitosis Observed by Confocal Imaging in Live Cancer Cells
Institutions: Sheba Medical Center, Tel-Aviv University, Tel-Aviv University, Tel-Aviv University, Ecole Superieure de Biotechnologie Strasbourg, Tel-Aviv University.
Phenanthrene derivatives acting as potent PARP1 inhibitors prevented the bi-focal clustering of supernumerary centrosomes in multi-centrosomal human cancer cells in mitosis. The phenanthridine PJ-34 was the most potent molecule. Declustering of extra-centrosomes causes mitotic failure and cell death in multi-centrosomal cells. Most solid human cancers have a high occurrence of extra-centrosomes. The activity of PJ-34 was documented in real-time by confocal imaging of live human breast cancer MDA-MB-231 cells transfected with vectors encoding fluorescent γ-tubulin, which is highly abundant in the centrosomes, and fluorescent histone H2b, which is present in the chromosomes. Aberrant chromosome arrangements and de-clustered γ-tubulin foci representing declustered centrosomes were detected in the transfected MDA-MB-231 cells after treatment with PJ-34. Un-clustered extra-centrosomes in the two spindle poles preceded cell death. These results linked for the first time the recently detected exclusive cytotoxic activity of PJ-34 in human cancer cells with extra-centrosome de-clustering in mitosis and mitotic failure leading to cell death. According to previous findings observed by confocal imaging of fixed cells, PJ-34 exclusively eradicated cancer cells with multi-centrosomes without impairing normal cells undergoing mitosis with two centrosomes and bi-focal spindles. This cytotoxic activity of PJ-34 was not shared by other potent PARP1 inhibitors, and was observed in PARP1-deficient MEF harboring extra-centrosomes, suggesting its independence from PARP1 inhibition. Live confocal imaging offered a useful tool for identifying new molecules eradicating cells during mitosis.
Cancer Biology, Issue 78, Medicine, Cellular Biology, Molecular Biology, Biomedical Engineering, Anatomy, Physiology, Genetics, Neoplastic Processes, Pharmacologic Actions, Live confocal imaging, Extra-centrosomes clustering/de-clustering, Mitotic Catastrophe cell death, PJ-34, myocardial infarction, microscopy, imaging
A Protocol for Computer-Based Protein Structure and Function Prediction
Institutions: University of Michigan, University of Kansas.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score that estimates how accurate the predictions are without knowing the experimental data. To accommodate special requests from end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins during the structure assembly simulations. The structural information could be collected by the users based on experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was ranked as the best program for protein structure and function predictions in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.
Biochemistry, Issue 57, On-line server, I-TASSER, protein structure prediction, function prediction
Low Molecular Weight Protein Enrichment on Mesoporous Silica Thin Films for Biomarker Discovery
Institutions: The Methodist Hospital Research Institute, National Center for Nanoscience and Technology.
The identification of circulating biomarkers holds great potential for noninvasive approaches in early diagnosis and prognosis, as well as for the monitoring of therapeutic efficiency.1-3 The circulating low molecular weight proteome (LMWP), composed of small proteins shed from tissues and cells or peptide fragments derived from the proteolytic degradation of larger proteins, has been associated with the pathological condition in patients and likely reflects the state of disease.4,5 Despite these potential clinical applications, the use of mass spectrometry (MS) to profile the LMWP from biological fluids has proven to be very challenging due to the large dynamic range of protein and peptide concentrations in serum.6 Without sample pre-treatment, some of the more highly abundant proteins obscure the detection of low-abundance species in serum/plasma. Current proteomic-based approaches, such as two-dimensional polyacrylamide gel-electrophoresis (2D-PAGE) and shotgun proteomics methods, are labor-intensive, low throughput and offer limited suitability for clinical applications.7-9 Therefore, a more effective strategy is needed to isolate LMWP from blood and allow the high throughput screening of clinical samples.
Here, we present a fast, efficient and reliable multi-fractionation system based on mesoporous silica chips to specifically target and enrich LMWP.10,11 Mesoporous silica (MPS) thin films with tunable features at the nanoscale were fabricated using the triblock copolymer template pathway. Using different polymer templates and polymer concentrations in the precursor solution, various pore size distributions, pore structures, connectivity and surface properties were determined and applied for selective recovery of low mass proteins. The selective parsing of the enriched peptides into different subclasses according to their physicochemical properties will enhance the efficiency of recovery and detection of low abundance species. In combination with mass spectrometry and statistical analysis, we demonstrated the correlation between the nanophase characteristics of the mesoporous silica thin films and the specificity and efficacy of low mass proteome harvesting. The results presented herein reveal the potential of this nanotechnology-based approach to provide a powerful alternative to conventional methods for LMWP harvesting from complex biological fluids. Because of the ability to tune the material properties, the capability for low-cost production, the simplicity and rapidity of sample collection, and the greatly reduced sample requirements for analysis, this novel nanotechnology will substantially impact the field of proteomic biomarker research and clinical proteomic assessment.
Bioengineering, Issue 62, Nanoporous silica chip, Low molecular weight proteomics, Peptidomics, MALDI-TOF mass spectrometry, early diagnostics, proteomics
A Comparative Approach to Characterize the Landscape of Host-Pathogen Protein-Protein Interactions
Institutions: Institut Pasteur, Université Sorbonne Paris Cité, Dana Farber Cancer Institute.
Significant efforts have been made to generate large-scale comprehensive protein-protein interaction network maps. This is instrumental to understanding pathogen-host relationships and has essentially been performed by genetic screenings in yeast two-hybrid systems. The recent improvement of protein-protein interaction detection by a Gaussia luciferase-based fragment complementation assay now offers the opportunity to develop the integrative comparative interactomic approaches necessary to rigorously compare interaction profiles of proteins from different pathogen strain variants against a common set of cellular factors.
This paper specifically focuses on the utility of combining two orthogonal methods to generate protein-protein interaction datasets: yeast two-hybrid (Y2H) and a new assay, high-throughput Gaussia princeps protein complementation assay (HT-GPCA) performed in mammalian cells.
A large-scale identification of cellular partners of a pathogen protein is performed by mating-based yeast two-hybrid screenings of cDNA libraries using multiple pathogen strain variants. A subset of interacting partners selected on the basis of a high-confidence statistical scoring is further validated in mammalian cells for pair-wise interactions with the whole set of pathogen variant proteins using HT-GPCA. This combination of two complementary methods improves the robustness of the interaction dataset, and allows a stringent comparative interaction analysis to be performed. Such comparative interactomics constitutes a reliable and powerful strategy to decipher any pathogen-host interplay.
Immunology, Issue 77, Genetics, Microbiology, Biochemistry, Molecular Biology, Cellular Biology, Biomedical Engineering, Infection, Cancer Biology, Virology, Medicine, Host-Pathogen Interactions, Protein-protein interaction, High-throughput screening, Luminescence, Yeast two-hybrid, HT-GPCA, Network, protein, yeast, cell, culture
Using Informational Connectivity to Measure the Synchronous Emergence of fMRI Multi-voxel Information Across Time
Institutions: University of Pennsylvania.
It is now appreciated that condition-relevant information can be present within distributed patterns of functional magnetic resonance imaging (fMRI) brain activity, even for conditions with similar levels of univariate activation. Multi-voxel pattern (MVP) analysis has been used to decode this information with great success. fMRI investigators also often seek to understand how brain regions interact in interconnected networks, and use functional connectivity (FC) to identify regions that have correlated responses over time. Just as univariate analyses can be insensitive to information in MVPs, FC may not fully characterize the brain networks that process conditions with characteristic MVP signatures. The method described here, informational connectivity (IC), can identify regions with correlated changes in MVP-discriminability across time, revealing connectivity that is not accessible to FC. The method can be exploratory, using searchlights to identify seed-connected areas, or planned, between pre-selected regions-of-interest. The results can elucidate networks of regions that process MVP-related conditions, can break down MVPA searchlight maps into separate networks, or can be compared across tasks and patient groups.
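The core idea of correlating MVP-discriminability across time between two regions can be sketched as follows. This is a simplified illustration rather than the published implementation: it assumes that a discriminability value per time point has already been computed for each region, and the use of a rank (Spearman) correlation, like the function names, is an assumption made for this example.

```python
def ranks(values):
    """Return 1-based ranks of the values, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def informational_connectivity(disc_a, disc_b):
    """Rank correlation between two regions' discriminability time series."""
    return pearson(ranks(disc_a), ranks(disc_b))

# Made-up discriminability time series for three regions (illustration only).
seed = [0.10, 0.40, 0.20, 0.90, 0.35]
region_b = [1.0, 3.0, 2.0, 10.0, 2.5]   # rises and falls with the seed
region_c = [5.0, 2.0, 4.0, 1.0, 3.0]    # moves opposite to the seed
ic_seed_b = informational_connectivity(seed, region_b)
ic_seed_c = informational_connectivity(seed, region_c)
```

In a searchlight analysis, the same calculation would be repeated between a seed region and every searchlight location, giving a map of areas whose pattern discriminability fluctuates together with the seed.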
Neuroscience, Issue 89, fMRI, MVPA, connectivity, informational connectivity, functional connectivity, networks, multi-voxel pattern analysis, decoding, classification, method, multivariate
Trajectory Data Analyses for Pedestrian Space-time Activity Study
Institutions: Kean University, University of Wisconsin-Madison.
It is well recognized that human movement in the spatial and temporal dimensions has direct influence on disease transmission1-3. An infectious disease typically spreads via contact between infected and susceptible individuals in their overlapped activity spaces. Therefore, daily mobility-activity information can be used as an indicator to measure exposures to risk factors of infection. However, a major difficulty, and thus the reason for the paucity of studies of infectious disease transmission at the micro scale, arises from the lack of detailed individual mobility data. Previously, in transportation and tourism research, detailed space-time activity data often relied on the time-space diary technique, which requires subjects to actively record their activities in time and space. This is highly demanding for the participants, and collaboration from the participants greatly affects the quality of data4.
Modern technologies such as GPS and mobile communications have made possible the automatic collection of trajectory data. The data collected, however, is not ideal for modeling human space-time activities, limited by the accuracies of existing devices. There is also no readily available tool for efficient processing of the data for human behavior study. We present here a suite of methods and an integrated ArcGIS desktop-based visual interface for the pre-processing and spatiotemporal analyses of trajectory data. We provide examples of how such processing may be used to model human space-time activities, especially with error-rich pedestrian trajectory data, that could be useful in public health studies such as infectious disease transmission modeling.
The procedure presented includes pre-processing, trajectory segmentation, activity space characterization, density estimation and visualization, and a few other exploratory analysis methods. Pre-processing is the cleaning of noisy raw trajectory data. We introduce an interactive visual pre-processing interface as well as an automatic module. Trajectory segmentation5 involves the identification of indoor and outdoor parts from pre-processed space-time tracks. Again, both interactive visual segmentation and automatic segmentation are supported. Segmented space-time tracks are then analyzed to derive characteristics of one's activity space, such as activity radius. Density estimation and visualization are used to examine large amounts of trajectory data to model hot spots and interactions. We demonstrate both density surface mapping6 and density volume rendering7. We also include a couple of other exploratory data analysis (EDA) and visualization tools, such as Google Earth animation support and connection analysis. The suite of analytical as well as visual methods presented in this paper may be applied to any trajectory data for space-time activity studies.
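As an illustration of the activity space characterization step, the sketch below computes an activity radius from a cleaned track. It is not the ArcGIS-based tool described here: the definition used (the largest great-circle distance between an anchor point such as home and any track point), the spherical Earth radius, and the function names are all assumptions made for this example.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def activity_radius_m(anchor, track):
    """Largest distance from the anchor to any point of the track.

    anchor : (lat, lon) of a reference location such as home (assumed definition)
    track  : iterable of (lat, lon) points from a pre-processed trajectory
    """
    return max(haversine_m(anchor[0], anchor[1], lat, lon) for lat, lon in track)

# Made-up track along the equator, for illustration only.
home = (0.0, 0.0)
track = [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.0, 0.25)]
radius = activity_radius_m(home, track)
```

Other definitions, such as the distance from the track centroid or a percentile rather than the maximum, would be less sensitive to a single residual GPS error and may suit error-rich pedestrian data better.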
Environmental Sciences, Issue 72, Computer Science, Behavior, Infectious Diseases, Geography, Cartography, Data Display, Disease Outbreaks, cartography, human behavior, Trajectory data, space-time activity, GPS, GIS, ArcGIS, spatiotemporal analysis, visualization, segmentation, density surface, density volume, exploratory data analysis, modelling
A Cell-to-cell Macromolecular Transport Assay in Planta Utilizing Biolistic Bombardment
Institutions: State University of New York at Stony Brook, NED University of Engineering and Technology.
Here, we present a simple and rapid protocol to detect and assess the extent of cell-to-cell macromolecular transport in planta. In this protocol, a fluorescently tagged protein of interest is transiently expressed in plant tissue following biolistic delivery of its encoding DNA construct. The intra- and intercellular distribution of the tagged protein is then analyzed by confocal microscopy. We describe this technology in detail, providing step-by-step protocols to assay and evaluate the extent of symplastic protein transport in three plant species: Arabidopsis thaliana, Nicotiana benthamiana and N. tabacum.
Cellular Biology, Issue 42, Symplastic transport, transient expression, microbombardment, fluorescent protein, plant, confocal microscopy
In Situ SIMS and IR Spectroscopy of Well-defined Surfaces Prepared by Soft Landing of Mass-selected Ions
Institutions: Pacific Northwest National Laboratory.
Soft landing of mass-selected ions onto surfaces is a powerful approach for the highly-controlled preparation of materials that are inaccessible using conventional synthesis techniques. Coupling soft landing with in situ characterization using secondary ion mass spectrometry (SIMS) and infrared reflection absorption spectroscopy (IRRAS) enables analysis of well-defined surfaces under clean vacuum conditions. The capabilities of three soft-landing instruments constructed in our laboratory are illustrated for the representative system of surface-bound organometallics prepared by soft landing of mass-selected ruthenium tris(bipyridine) dications, [Ru(bpy)3]2+ (bpy = bipyridine), onto carboxylic acid terminated self-assembled monolayer surfaces on gold (COOH-SAMs). In situ time-of-flight (TOF)-SIMS provides insight into the reactivity of the soft-landed ions. In addition, the kinetics of charge reduction, neutralization and desorption occurring on the COOH-SAM both during and after ion soft landing are studied using in situ Fourier transform ion cyclotron resonance (FT-ICR)-SIMS measurements. In situ IRRAS experiments provide insight into how the structure of organic ligands surrounding metal centers is perturbed through immobilization of organometallic ions on COOH-SAM surfaces by soft landing. Collectively, the three instruments provide complementary information about the chemical composition, reactivity and structure of well-defined species supported on surfaces.
Chemistry, Issue 88, soft landing, mass selected ions, electrospray, secondary ion mass spectrometry, infrared spectroscopy, organometallic, catalysis
RNA-seq Analysis of Transcriptomes in Thrombin-treated and Control Human Pulmonary Microvascular Endothelial Cells
Institutions: Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri-Kansas City.
The characterization of gene expression in cells via measurement of mRNA levels is a useful tool in determining how the transcriptional machinery of the cell is affected by external signals (e.g., drug treatment), or how cells differ between a healthy state and a diseased state. With the advent and continuous refinement of next-generation DNA sequencing technology, RNA-sequencing (RNA-seq) has become an increasingly popular method of transcriptome analysis to catalog all species of transcripts, to determine the transcriptional structure of all expressed genes and to quantify the changing expression levels of the total set of transcripts in a given cell, tissue or organism1,2. RNA-seq is gradually replacing DNA microarrays as a preferred method for transcriptome analysis because it has the advantages of profiling a complete transcriptome, providing a digital type datum (copy number of any transcript) and not relying on any known genomic sequence3.
Here, we present a complete and detailed protocol to apply RNA-seq to profile transcriptomes in human pulmonary microvascular endothelial cells with or without thrombin treatment. This protocol is based on our recently published study entitled "RNA-seq Reveals Novel Transcriptome of Genes and Their Isoforms in Human Pulmonary Microvascular Endothelial Cells Treated with Thrombin,"4 in which we successfully performed the first complete transcriptome analysis of human pulmonary microvascular endothelial cells treated with thrombin using RNA-seq. It yielded unprecedented resources for further experimentation to gain insights into molecular mechanisms underlying thrombin-mediated endothelial dysfunction in the pathogenesis of inflammatory conditions, cancer, diabetes, and coronary heart disease, and provides potential new leads for therapeutic targets for those diseases.
The descriptive text of this protocol is divided into four parts. The first part describes the treatment of human pulmonary microvascular endothelial cells with thrombin and RNA isolation, quality analysis and quantification. The second part describes library construction and sequencing. The third part describes the data analysis. The fourth part describes an RT-PCR validation assay. Representative results of several key steps are displayed. Useful tips or precautions to boost success in key steps are provided in the Discussion section. Although this protocol uses human pulmonary microvascular endothelial cells treated with thrombin, it can be generalized to profile transcriptomes in both mammalian and non-mammalian cells and in tissues treated with different stimuli or inhibitors, or to compare transcriptomes in cells or tissues between a healthy state and a disease state.
Genetics, Issue 72, Molecular Biology, Immunology, Medicine, Genomics, Proteins, RNA-seq, Next Generation DNA Sequencing, Transcriptome, Transcription, Thrombin, Endothelial cells, high-throughput, DNA, genomic DNA, RT-PCR, PCR
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Institutions: Princeton University.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization sequence selection stage that aims at improving stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
Mapping Bacterial Functional Networks and Pathways in Escherichia Coli using Synthetic Genetic Arrays
Institutions: University of Toronto, University of Toronto, University of Regina.
Phenotypes are determined by a complex series of physical (e.g. protein-protein) and functional (e.g. gene-gene or genetic) interactions (GI)1. While physical interactions can indicate which bacterial proteins are associated as complexes, they do not necessarily reveal pathway-level functional relationships1. GI screens, in which the growth of double mutants bearing two deleted or inactivated genes is measured and compared to the corresponding single mutants, can illuminate epistatic dependencies between loci and hence provide a means to query and discover novel functional relationships2. Large-scale GI maps have been reported for eukaryotic organisms like yeast3-7, but GI information remains sparse for prokaryotes8, which hinders the functional annotation of bacterial genomes. To this end, we and others have developed high-throughput quantitative bacterial GI screening methods9, 10.
Here, we present the key steps required to perform a quantitative E. coli Synthetic Genetic Array (eSGA) screening procedure on a genome scale9, using natural bacterial conjugation and homologous recombination to systematically generate and measure the fitness of large numbers of double mutants in a colony array format.
Briefly, a robot is used to transfer, through conjugation, chloramphenicol (Cm)-marked mutant alleles from engineered Hfr (High frequency of recombination) 'donor strains' into an ordered array of kanamycin (Kan)-marked F- recipient strains. Typically, we use loss-of-function single mutants bearing non-essential gene deletions (e.g. the 'Keio' collection11) and essential gene hypomorphic mutations (i.e. alleles conferring reduced protein expression, stability, or activity9, 12, 13) to query the functional associations of non-essential and essential genes, respectively. After conjugation and ensuing genetic exchange mediated by homologous recombination, the resulting double mutants are selected on solid medium containing both antibiotics. After outgrowth, the plates are digitally imaged and colony sizes are quantitatively scored using an in-house automated image processing system14. GIs are revealed when the growth rate of a double mutant is either significantly better or worse than expected9. Aggravating (or negative) GIs often result between loss-of-function mutations in pairs of genes from compensatory pathways that impinge on the same essential process2. Here, the loss of a single gene is buffered, such that either single mutant is viable. However, the loss of both pathways is deleterious and results in synthetic lethality or sickness (i.e. slow growth). Conversely, alleviating (or positive) interactions can occur between genes in the same pathway or protein complex2 as the deletion of either gene alone is often sufficient to perturb the normal function of the pathway or complex such that additional perturbations do not reduce activity, and hence growth, further. Overall, systematically identifying and analyzing GI networks can provide unbiased, global maps of the functional relationships between large numbers of genes, from which pathway-level information missed by other approaches can be inferred9.
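The comparison of observed against expected double-mutant growth can be illustrated with a small sketch. This is not the eSGA scoring pipeline: it assumes a multiplicative null model (the expected double-mutant fitness is the product of the two single-mutant fitnesses), and the function names `gi_score` and `classify_gi`, the threshold of 0.1, and all fitness values are hypothetical.

```python
def gi_score(w_a, w_b, w_ab):
    """Deviation of the observed double-mutant fitness (w_ab) from the
    expectation under an assumed multiplicative model (w_a * w_b).
    Fitness values are relative to wild type (wild type = 1.0)."""
    return w_ab - w_a * w_b


def classify_gi(w_a, w_b, w_ab, threshold=0.1):
    """Label an interaction by the sign and size of the deviation.
    The threshold is an arbitrary illustrative cut-off, not a
    statistical significance test."""
    eps = gi_score(w_a, w_b, w_ab)
    if eps <= -threshold:
        return "aggravating"   # double mutant grows worse than expected
    if eps >= threshold:
        return "alleviating"   # double mutant grows better than expected
    return "neutral"


if __name__ == "__main__":
    # Two mildly sick single mutants whose combination is very sick
    print(classify_gi(0.9, 0.9, 0.2))
    # Two mutants in the same pathway: the second deletion adds no defect
    print(classify_gi(0.6, 0.6, 0.6))
```

In a real screen the deviation would be judged against replicate variance rather than a fixed cut-off.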
Genetics, Issue 69, Molecular Biology, Medicine, Biochemistry, Microbiology, Aggravating, alleviating, conjugation, double mutant, Escherichia coli, genetic interaction, Gram-negative bacteria, homologous recombination, network, synthetic lethality or sickness, suppression
Characterization of Complex Systems Using the Design of Experiments Approach: Transient Protein Expression in Tobacco as a Case Study
Institutions: RWTH Aachen University, Fraunhofer Gesellschaft.
Plants provide multiple benefits for the production of biopharmaceuticals including low costs, scalability, and safety. Transient expression offers the additional advantage of short development and production times, but expression levels can vary significantly between batches thus giving rise to regulatory concerns in the context of good manufacturing practice. We used a design of experiments (DoE) approach to determine the impact of major factors such as regulatory elements in the expression construct, plant growth and development parameters, and the incubation conditions during expression, on the variability of expression between batches. We tested plants expressing a model anti-HIV monoclonal antibody (2G12) and a fluorescent marker protein (DsRed). We discuss the rationale for selecting certain properties of the model and identify its potential limitations. The general approach can easily be transferred to other problems because the principles of the model are broadly applicable: knowledge-based parameter selection, complexity reduction by splitting the initial problem into smaller modules, software-guided setup of optimal experiment combinations and step-wise design augmentation. Therefore, the methodology is not only useful for characterizing protein expression in plants but also for the investigation of other complex systems lacking a mechanistic description. The predictive equations describing the interconnectivity between parameters can be used to establish mechanistic models for other complex systems.
Bioengineering, Issue 83, design of experiments (DoE), transient protein expression, plant-derived biopharmaceuticals, promoter, 5'UTR, fluorescent reporter protein, model building, incubation conditions, monoclonal antibody
Identification of Disease-related Spatial Covariance Patterns using Neuroimaging Data
Institutions: The Feinstein Institute for Medical Research.
The scaled subprofile model (SSM)1-4 is a multivariate PCA-based algorithm that identifies major sources of variation in patient and control group brain image data while rejecting lesser components (Figure 1). Applied directly to voxel-by-voxel covariance data of steady-state multimodality images, an entire group image set can be reduced to a few significant linearly independent covariance patterns and corresponding subject scores. Each pattern, termed a group invariant subprofile (GIS), is an orthogonal principal component that represents a spatially distributed network of functionally interrelated brain regions. Large global mean scalar effects that can obscure smaller network-specific contributions are removed by the inherent logarithmic conversion and mean centering of the data2,5,6. Subjects express each of these patterns to a variable degree represented by a simple scalar score that can correlate with independent clinical or psychometric descriptors7,8. Using logistic regression analysis of subject scores (i.e. pattern expression values), linear coefficients can be derived to combine multiple principal components into single disease-related spatial covariance patterns, i.e. composite networks with improved discrimination of patients from healthy control subjects5,6. Cross-validation within the derivation set can be performed using bootstrap resampling techniques9. Forward validation is easily confirmed by direct score evaluation of the derived patterns in prospective datasets10. Once validated, disease-related patterns can be used to score individual patients with respect to a fixed reference sample, often the set of healthy subjects that was used (with the disease group) in the original pattern derivation11. These standardized values can in turn be used to assist in differential diagnosis12,13 and to assess disease progression and treatment effects at the network level7,14-16. We present an example of the application of this methodology to FDG PET data of Parkinson's Disease patients and normal controls using our in-house software to derive a characteristic covariance pattern biomarker of disease.
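The logarithmic conversion and mean centering step described above can be sketched in a few lines. This is an illustration only, not the authors' in-house software: it assumes that "mean centering" means subtracting each subject's mean over voxels and then the group mean at each voxel, and the function name `subject_residual_profiles` and the toy voxel values are hypothetical.

```python
import math


def subject_residual_profiles(images):
    """images: list of subjects, each a list of positive voxel values
    (all subjects have the same number of voxels).
    Returns the log-transformed, double-centered data."""
    # 1) Logarithmic conversion turns a global scaling factor into an offset
    logged = [[math.log(v) for v in subject] for subject in images]

    # 2) Subtract each subject's own mean over voxels (removes that offset)
    row_centered = []
    for subject in logged:
        mean = sum(subject) / len(subject)
        row_centered.append([v - mean for v in subject])

    # 3) Subtract the group mean profile at every voxel
    n_subjects = len(row_centered)
    n_voxels = len(row_centered[0])
    group_mean = [
        sum(subject[j] for subject in row_centered) / n_subjects
        for j in range(n_voxels)
    ]
    return [
        [subject[j] - group_mean[j] for j in range(n_voxels)]
        for subject in row_centered
    ]


if __name__ == "__main__":
    # Subject 2 is subject 1 scaled by a global factor of 2
    toy = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [3.0, 3.0, 3.0]]
    for row in subject_residual_profiles(toy):
        print([round(v, 3) for v in row])
```

In the toy data the first two subjects differ only by a global factor, so their residual profiles come out identical; a principal component analysis would then be run on these residuals.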
Medicine, Issue 76, Neurobiology, Neuroscience, Anatomy, Physiology, Molecular Biology, Basal Ganglia Diseases, Parkinsonian Disorders, Parkinson Disease, Movement Disorders, Neurodegenerative Diseases, PCA, SSM, PET, imaging biomarkers, functional brain imaging, multivariate spatial covariance analysis, global normalization, differential diagnosis, PD, brain, imaging, clinical techniques
Diffusion Tensor Magnetic Resonance Imaging in the Analysis of Neurodegenerative Diseases
Institutions: University of Ulm.
Diffusion tensor imaging (DTI) techniques provide information on the microstructural processes of the cerebral white matter (WM) in vivo. The present applications are designed to investigate differences of WM involvement patterns in different brain diseases, especially neurodegenerative disorders, by use of different DTI analyses in comparison with matched controls.
DTI data analysis is performed in a varied fashion, i.e. voxelwise comparison of regional diffusion direction-based metrics such as fractional anisotropy (FA), together with fiber tracking (FT) accompanied by tractwise fractional anisotropy statistics (TFAS) at the group level in order to identify differences in FA along WM structures, aiming at the definition of regional patterns of WM alterations at the group level. Transformation into a stereotaxic standard space is a prerequisite for group studies and requires thorough data processing to preserve directional inter-dependencies. The present applications show optimized technical approaches for this preservation of quantitative and directional information during spatial normalization in data analyses at the group level. On this basis, FT techniques can be applied to group averaged data in order to quantify metrics information as defined by FT. Additionally, application of DTI methods, i.e. differences in FA-maps after stereotaxic alignment, in a longitudinal analysis at an individual subject basis reveals information about the progression of neurological disorders. Further quality improvement of DTI based results can be obtained during preprocessing by application of a controlled elimination of gradient directions with high noise levels.
In summary, DTI is used to define a distinct WM pathoanatomy of different brain diseases by the combination of whole brain-based and tract-based DTI analysis.
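For readers unfamiliar with the FA metric used throughout, the standard definition in terms of the three diffusion tensor eigenvalues can be sketched as below. The function name `fractional_anisotropy` and the example eigenvalues are illustrative, and the sketch assumes the eigenvalues have already been obtained from a fitted tensor.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Standard FA from the three eigenvalues of a diffusion tensor:
    sqrt(3/2) * sqrt(sum((l_i - mean)^2)) / sqrt(sum(l_i^2)).
    Returns 0.0 for an all-zero tensor to avoid division by zero."""
    mean = (l1 + l2 + l3) / 3.0
    numerator = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0.0:
        return 0.0
    return math.sqrt(1.5 * numerator / denominator)


if __name__ == "__main__":
    print(fractional_anisotropy(1.0, 1.0, 1.0))  # isotropic diffusion
    print(fractional_anisotropy(1.0, 0.0, 0.0))  # diffusion along one axis only
    # Hypothetical eigenvalues (in 10^-3 mm^2/s) for an elongated tensor
    print(round(fractional_anisotropy(1.7, 0.3, 0.2), 3))
```

FA ranges from 0 (isotropic diffusion) to 1 (diffusion along a single axis), which is why it serves as a voxelwise marker of directional organization in WM.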
Medicine, Issue 77, Neuroscience, Neurobiology, Molecular Biology, Biomedical Engineering, Anatomy, Physiology, Neurodegenerative Diseases, nuclear magnetic resonance, NMR, MR, MRI, diffusion tensor imaging, fiber tracking, group level comparison, neurodegenerative diseases, brain, imaging, clinical techniques
Analysis of Tubular Membrane Networks in Cardiac Myocytes from Atria and Ventricles
Institutions: Heart Research Center Goettingen, University Medical Center Goettingen, German Center for Cardiovascular Research (DZHK) partner site Goettingen, University of Maryland School of Medicine.
In cardiac myocytes a complex network of membrane tubules - the transverse-axial tubule system (TATS) - controls deep intracellular signaling functions. While the outer surface membrane and associated TATS membrane components appear to be continuous, there are substantial differences in lipid and protein content. In ventricular myocytes (VMs), certain TATS components are highly abundant contributing to rectilinear tubule networks and regular branching 3D architectures. It is thought that peripheral TATS components propagate action potentials from the cell surface to thousands of remote intracellular sarcoendoplasmic reticulum (SER) membrane contact domains, thereby activating intracellular Ca2+ release units (CRUs). In contrast to VMs, the organization and functional role of TATS membranes in atrial myocytes (AMs) is significantly different and much less understood. Taken together, quantitative structural characterization of TATS membrane networks in healthy and diseased myocytes is an essential prerequisite towards better understanding of functional plasticity and pathophysiological reorganization. Here, we present a strategic combination of protocols for direct quantitative analysis of TATS membrane networks in living VMs and AMs. For this, we accompany primary cell isolations of mouse VMs and/or AMs with critical quality control steps and direct membrane staining protocols for fluorescence imaging of TATS membranes. Using an optimized workflow for confocal or superresolution TATS image processing, binarized and skeletonized data are generated for quantitative analysis of the TATS network and its components. Unlike previously published indirect regional aggregate image analysis strategies, our protocols enable direct characterization of specific components and derive complex physiological properties of TATS membrane networks in living myocytes with high throughput and open access software tools. In summary, the combined protocol strategy can be readily applied for quantitative TATS network studies during physiological myocyte adaptation or disease changes, comparison of different cardiac or skeletal muscle cell types, phenotyping of transgenic models, and pharmacological or therapeutic interventions.
Bioengineering, Issue 92, cardiac myocyte, atria, ventricle, heart, primary cell isolation, fluorescence microscopy, membrane tubule, transverse-axial tubule system, image analysis, image processing, T-tubule, collagenase
Simultaneous Multicolor Imaging of Biological Structures with Fluorescence Photoactivation Localization Microscopy
Institutions: University of Maine.
Localization-based super resolution microscopy can be applied to obtain a spatial map (image) of the distribution of individual fluorescently labeled single molecules within a sample with a spatial resolution of tens of nanometers. Using either photoactivatable (PAFP) or photoswitchable (PSFP) fluorescent proteins fused to proteins of interest, or organic dyes conjugated to antibodies or other molecules of interest, fluorescence photoactivation localization microscopy (FPALM) can simultaneously image multiple species of molecules within single cells. By using the following approach, populations of large numbers (thousands to hundreds of thousands) of individual molecules are imaged in single cells and localized with a precision of ~10-30 nm. Data obtained can be applied to understanding the nanoscale spatial distributions of multiple protein types within a cell. One primary advantage of this technique is the dramatic increase in spatial resolution: while diffraction limits resolution to ~200-250 nm in conventional light microscopy, FPALM can image length scales more than an order of magnitude smaller. As many biological hypotheses concern the spatial relationships among different biomolecules, the improved resolution of FPALM can provide insight into questions of cellular organization which have previously been inaccessible to conventional fluorescence microscopy. In addition to detailing the methods for sample preparation and data acquisition, we here describe the optical setup for FPALM. One additional consideration for researchers wishing to do super-resolution microscopy is cost: in-house setups are significantly cheaper than most commercially available imaging machines. Limitations of this technique include the need for optimizing the labeling of molecules of interest within cell samples, and the need for post-processing software to visualize results. We here describe the use of PAFP and PSFP expression to image two protein species in fixed cells. 
Extension of the technique to living cells is also described.
Basic Protocol, Issue 82, Microscopy, Super-resolution imaging, Multicolor, single molecule, FPALM, Localization microscopy, fluorescent proteins
Monitoring Acupuncture Effects on Human Brain by fMRI
Institutions: Massachusetts General Hospital and Harvard Medical School, William Beaumont Hospital.
Functional MRI is used to study the effects of acupuncture on the BOLD response and the functional connectivity of the human brain. Results demonstrate that acupuncture mobilizes a limbic-paralimbic-neocortical network and its anti-correlated sensorimotor/paralimbic network at multiple levels of the brain and that the hemodynamic response is influenced by the psychophysical response. Physiological monitoring may be performed to explore the peripheral response of the autonomic nerve function. This video describes the studies performed at LI4 (hegu), ST36 (zusanli) and LV3 (taichong), classical acupoints that are commonly used for modulatory and pain-reducing actions. Some issues that require attention in the applications of fMRI to acupuncture investigation are noted.
Neuroscience, Issue 38, acupuncture, BOLD fMRI, limbic-paralimbic-neocortical system, psychophysical response, physiological monitoring
Using an Automated Cell Counter to Simplify Gene Expression Studies: siRNA Knockdown of IL-4 Dependent Gene Expression in Namalwa Cells
Institutions: Bio-Rad Laboratories.
siRNA-mediated gene knockdown continues to be an important tool in studies of gene expression. siRNA studies are being conducted not only to study the effects of downregulating single genes, but also to interrogate signaling pathways and other complex interaction networks. These pathway analyses require both the use of relevant cellular models and methods that cause less perturbation to the cellular physiology. Electroporation is increasingly being used as an effective way to introduce siRNA and other nucleic acids into difficult-to-transfect cell lines and primary cells without altering the signaling pathway under investigation. There are multiple critical steps to a successful siRNA experiment, and there are ways to simplify the work while improving the data quality at several experimental stages. To help you get started with your siRNA-mediated gene knockdown project, we will demonstrate how to perform a complete pathway study, from collecting and counting the cells prior to electroporation through post-transfection real-time PCR gene expression analysis. The following study investigates the role of the transcriptional activator STAT6 in IL-4 dependent gene expression of CCL17 in a Burkitt lymphoma cell line (Namalwa). The techniques demonstrated are useful for a wide range of siRNA-based experiments on both adherent and suspension cells. We will also show how to streamline cell counting with the TC10 automated cell counter, how to electroporate multiple samples simultaneously using the MXcell electroporation system, and how to simultaneously assess RNA quality and quantity with the Experion automated electrophoresis system.
Cellular Biology, Issue 38, Cell Counting, Gene Silencing, siRNA, Namalwa Cells, IL4, Gene Expression, Electroporation, Real Time PCR
Basics of Multivariate Analysis in Neuroimaging Data
Institutions: Columbia University.
Multivariate analysis techniques for neuroimaging data have recently received increasing attention as they have many attractive features that cannot be easily realized by the more commonly used univariate, voxel-wise, techniques1,5,6,7,8,9. Multivariate approaches evaluate correlation/covariance of activation across brain regions, rather than proceeding on a voxel-by-voxel basis. Thus, their results can be more easily interpreted as a signature of neural networks. Univariate approaches, on the other hand, cannot directly address interregional correlation in the brain. Multivariate approaches can also result in greater statistical power when compared with univariate techniques, which are forced to employ very stringent corrections for voxel-wise multiple comparisons. Further, multivariate techniques also lend themselves much better to prospective application of results from the analysis of one dataset to entirely new datasets. Multivariate techniques are thus well placed to provide information about mean differences and correlations with behavior, similarly to univariate approaches, with potentially greater statistical power and better reproducibility checks. In contrast to these advantages is the high barrier of entry to the use of multivariate approaches, preventing more widespread application in the community. To the neuroscientist becoming familiar with multivariate analysis techniques, an initial survey of the field might present a bewildering variety of approaches that, although algorithmically similar, are presented with different emphases, typically by people with mathematics backgrounds. We believe that multivariate analysis techniques have sufficient potential to warrant better dissemination. Researchers should be able to employ them in an informed and accessible manner. The current article is an attempt at a didactic introduction of multivariate techniques for the novice. A conceptual introduction is followed with a very simple application to a diagnostic data set from the Alzheimer's Disease Neuroimaging Initiative (ADNI), clearly demonstrating the superior performance of the multivariate approach.
JoVE Neuroscience, Issue 41, fMRI, PET, multivariate analysis, cognitive neuroscience, clinical neuroscience
Automated Quantification of Synaptic Fluorescence in C. elegans
Institutions: University of Toledo.
Synapse strength refers to the amplitude of postsynaptic responses to presynaptic neurotransmitter release events, and has a major impact on overall neural circuit function. Synapse strength critically depends on the abundance of neurotransmitter receptors clustered at synaptic sites on the postsynaptic membrane. Receptor levels are established developmentally, and can be altered by receptor trafficking between surface-localized, subsynaptic, and intracellular pools, representing important mechanisms of synaptic plasticity and neuromodulation. Rigorous methods to quantify synaptically-localized neurotransmitter receptor abundance are essential to study synaptic development and plasticity. Fluorescence microscopy is an optimal approach because it preserves spatial information, distinguishing synaptic from non-synaptic pools, and discriminating among receptor populations localized to different types of synapses. The genetic model organism Caenorhabditis elegans is particularly well suited for these studies due to the small size and relative simplicity of its nervous system, its transparency, and the availability of powerful genetic techniques, allowing examination of native synapses in intact animals.
Here we present a method for quantifying fluorescently-labeled synaptic neurotransmitter receptors in C. elegans. Its key feature is the automated identification and analysis of individual synapses in three dimensions in multi-plane confocal microscope output files, tabulating position, volume, fluorescence intensity, and total fluorescence for each synapse. This approach has two principal advantages over manual analysis of z-plane projections of confocal data. First, because every plane of the confocal data set is included, no data are lost through z-plane projection, typically based on pixel intensity averages or maxima. Second, identification of synapses is automated, but can be inspected by the experimenter as the data analysis proceeds, allowing fast and accurate extraction of data from large numbers of synapses. Hundreds to thousands of synapses per sample can easily be obtained, producing large data sets to maximize statistical power. Considerations for preparing C. elegans for analysis, and performing confocal imaging to minimize variability between animals within treatment groups are also discussed. Although developed to analyze C. elegans postsynaptic receptors, this method is generally useful for any type of synaptically-localized protein, or indeed, any fluorescence signal that is localized to discrete clusters, puncta, or organelles.
The procedure is performed in three steps: 1) preparation of samples, 2) confocal imaging, and 3) image analysis. Steps 1 and 2 are specific to C. elegans, while step 3 is generally applicable to any punctate fluorescence signal in confocal micrographs.
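The image-analysis step (identifying individual puncta in a multi-plane stack and tabulating position, volume, intensity, and total fluorescence) can be sketched generically as follows. This is not the software used in the protocol: it assumes puncta are defined by a fixed intensity threshold and 6-connected voxel neighbourhoods, and the function name `find_puncta`, the threshold, and the toy stack are hypothetical.

```python
from collections import deque


def find_puncta(stack, threshold):
    """stack[z][y][x] holds voxel intensities. Voxels at or above
    `threshold` that touch face-to-face (6-connectivity) are grouped
    into one punctum. Returns one record per punctum."""
    nz, ny, nx = len(stack), len(stack[0]), len(stack[0][0])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    puncta = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or stack[z][y][x] < threshold:
                    continue
                # Breadth-first flood fill from this seed voxel
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                voxels = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cz, cy, cx))
                    for dz, dy, dx in steps:
                        p = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= p[0] < nz and 0 <= p[1] < ny and 0 <= p[2] < nx
                        if inside and p not in seen and stack[p[0]][p[1]][p[2]] >= threshold:
                            seen.add(p)
                            queue.append(p)
                volume = len(voxels)
                total = sum(stack[vz][vy][vx] for vz, vy, vx in voxels)
                puncta.append({
                    "position": tuple(sum(v[i] for v in voxels) / volume for i in range(3)),
                    "volume": volume,                 # in voxels
                    "mean_intensity": total / volume,
                    "total_fluorescence": total,
                })
    return puncta


if __name__ == "__main__":
    toy = [
        [[0, 0, 0], [0, 5, 0], [0, 0, 0]],
        [[0, 0, 0], [0, 7, 0], [0, 0, 9]],
    ]
    for record in find_puncta(toy, threshold=4):
        print(record)
```

Because every plane is visited, a punctum spanning several z-planes is counted once with its full volume, which is the advantage over z-projection noted above.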
Neuroscience, Issue 66, Developmental Biology, Neurotransmitter receptors, quantification, confocal microscopy, immunostaining, fluorescence, Volocity, UNC-49 GABA receptors, C. elegans
Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes
Institutions: Dartmouth College.
SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference1. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data1. In this article, we utilize a web version of SCOPE2 to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs3,4 and has been used in other studies5-8.
The three algorithms that comprise SCOPE are BEAM9, which finds non-degenerate motifs (ACCGGT), PRISM10, which finds degenerate motifs (ASCGWT), and SPACER11, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well.
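The three motif types named above differ only in how strictly each position is specified, which a short sketch can make concrete. The code below is not part of SCOPE and does not score over-representation; it only translates IUPAC nucleotide codes into a regular expression and reports where a motif occurs. The function names `iupac_to_regex` and `find_motif` and the test sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate a motif such as 'ASCGWT' into a regex pattern string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of `motif` on the given strand of `sequence`."""
    # The lookahead lets overlapping occurrences be reported as well
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_motif("TTACCGGTAA", "ACCGGT"))               # non-degenerate
    print(find_motif("AGCGATACCGTT", "ASCGWT"))             # degenerate
    print(find_motif("GGACCAAAAAAAAGGTCC", "ACCnnnnnnnnGGT"))  # bipartite
```

A motif finder works in the opposite direction, searching for patterns that occur more often than expected; the sketch only shows what a match to each motif type looks like.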
Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor.
Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run.
SCOPE has a user-friendly interface that gives novice users access to the algorithm's full power without requiring expertise in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail1,2,9-11.
Genetics, Issue 51, gene regulation, computational biology, algorithm, promoter sequence motif