There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.
Differences in gene expression of human bone marrow stromal cells (hBMSCs) during culture in three-dimensional (3D) nanofiber scaffolds or on two-dimensional (2D) films were investigated via pathway analysis of microarray mRNA expression profiles. Previous work has shown that hBMSC culture in nanofiber scaffolds can induce osteogenic differentiation in the absence of osteogenic supplements (OS). Analysis using ontology databases revealed that nanofibers and OS regulated similar pathways and that both were enriched for TGF-β and cell-adhesion/ECM-receptor pathways. The most notable difference between the two was that nanofibers had stronger enrichment for cell-adhesion/ECM-receptor pathways. Comparison of nanofiber scaffolds with flat films yielded stronger differences in gene expression than comparison of nanofibers made from different polymers, suggesting that substrate structure had stronger effects on cell function than substrate polymer composition. These results demonstrate that physical (nanofibers) and biochemical (OS) signals regulate similar ontological pathways, suggesting that these cues use similar molecular mechanisms to control hBMSC differentiation.
Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
Stem cell response to a library of scaffolds with varied 3D structures was investigated. Microarray screening revealed that each type of scaffold structure induced a unique gene expression signature in primary human bone marrow stromal cells (hBMSCs). Hierarchical cluster analysis showed that treatments sorted by scaffold structure and not by polymer chemistry, suggesting that scaffold structure was more influential than scaffold composition. Further, the effects of scaffold structure on hBMSC function were mediated by cell shape. Of all the scaffolds tested, only scaffolds with a nanofibrous morphology were able to drive the hBMSCs down an osteogenic lineage in the absence of osteogenic supplements. Nanofiber scaffolds forced the hBMSCs to assume an elongated, highly branched morphology. This same morphology was seen in osteogenic controls where hBMSCs were cultured on flat polymer films in the presence of osteogenic supplements (OS). In contrast, hBMSCs cultured on flat polymer films in the absence of OS assumed a more rounded and less-branched morphology. These results indicate that cells are more sensitive to scaffold structure than previously appreciated and suggest that scaffold efficacy can be optimized by tailoring the scaffold structure to force cells into morphologies that direct them to differentiate down the desired lineage.
High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths and GC content, covering a 2^20 concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected under pure Poisson sampling errors. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances but one that can be partially corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
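The idea of a spike-in standard curve can be illustrated with a minimal sketch. It assumes the curve is a straight line fitted in log2-log2 space between known spike-in concentration and observed read density; the abstract does not state the fitting procedure, and the data values, `fit_line`, and `estimate_concentration` are hypothetical names and numbers chosen only for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical spike-in data (NOT from the study): known input concentrations
# spanning a 2^20 range and made-up, perfectly proportional read densities.
concentrations = [2.0 ** k for k in range(0, 21, 4)]
read_density = [0.5 * c for c in concentrations]

# Fit the standard curve in log2-log2 space.
log_conc = [math.log2(c) for c in concentrations]
log_reads = [math.log2(r) for r in read_density]
slope, intercept = fit_line(log_conc, log_reads)

def estimate_concentration(density):
    """Invert the fitted curve to estimate input concentration from read density."""
    return 2 ** ((math.log2(density) - intercept) / slope)
```

A slope close to 1 on this scale corresponds to the proportional (linear) relationship between read density and RNA input described above; real data would scatter around the line, and the abstract notes that the scatter exceeds what Poisson sampling alone would predict.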
While minimum information about a microarray experiment (MIAME) standards have helped to increase the value of the microarray data deposited into public databases like ArrayExpress and Gene Expression Omnibus (GEO), limited means have been available to assess the quality of this data or to identify the procedures used to normalize and transform raw data. The EMERALD FP6 Coordination Action was designed to deliver approaches to assess and enhance the overall quality of microarray data and to disseminate these approaches to the microarray community through an extensive series of workshops, tutorials, and symposia. Tools were developed for assessing data quality and used to demonstrate how the removal of poor-quality data could improve the power of statistical analyses and facilitate analysis of multiple joint microarray data sets. These quality metrics tools have been disseminated through publications and through the software package arrayQualityMetrics. Within the framework provided by the Ontology of Biomedical Investigations, ontology was developed to describe data transformations, and software ontology was developed for gene expression analysis software. In addition, the consortium has advocated for the development and use of external reference standards in microarray hybridizations and created the Molecular Methods (MolMeth) database, which provides a central source for methods and protocols focusing on microarray-based technologies.
The maturing of gene expression microarray technology and interest in the use of microarray-based applications for clinical and diagnostic applications calls for quantitative measures of quality. This manuscript presents a retrospective study characterizing several approaches to assess technical performance of microarray data measured on the Affymetrix GeneChip platform, including whole-array metrics and information from a standard mixture of external spike-in and endogenous internal controls. Spike-in controls were found to carry the same information about technical performance as whole-array metrics and endogenous "housekeeping" genes. These results support the use of spike-in controls as general tools for performance assessment across time, experimenters and array batches, suggesting that they have potential for comparison of microarray data generated across species using different technologies.
We describe a control system to automatically distribute antibody-functionalized beads to addressable assay chambers within a PDMS microfluidic device. The system used real-time image acquisition and processing to manage the valve states required to sort beads with unit precision. The image processing component of the control system correctly counted the number of beads in 99.81% of images (2689 of 2694), with only four instances of an incorrect number of beads being sorted to an assay chamber, and one instance of inaccurately counted beads being improperly delivered to waste. Post-experimental refinement of the counting script resulted in one counting error in 2694 images of beads (99.96% accuracy). We analyzed a range of operational variables (flow pressure, bead concentration, etc.) using a statistical model to characterize those that yielded optimal sorting speed and efficiency. The integrated device was able to capture, count, and deliver beads at a rate of approximately four per minute so that bead arrays could be assembled in 32 individually addressable assay chambers for eight analytical measurements in duplicate (512 beads total) within 2.5 hours. This functionality demonstrates the successful integration of a robust control system with precision bead handling that is the enabling technology for future development of a highly multiplexed bead-based analytical device.
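The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the sorting rate is treated as exactly four beads per minute although the abstract says "approximately".

```python
def percent(correct, total):
    """Share of correctly counted images, as a percentage."""
    return 100.0 * correct / total

# Counting accuracy before and after refinement of the counting script.
initial_accuracy = percent(2689, 2694)      # 5 miscounted images
refined_accuracy = percent(2694 - 1, 2694)  # 1 miscounted image

# Array size: 32 chambers x 8 measurements x 2 (duplicate) beads.
total_beads = 32 * 8 * 2

# Assembly time at an assumed rate of exactly four beads per minute.
assembly_hours = total_beads / 4 / 60

print(f"initial accuracy: {initial_accuracy:.2f}%")
print(f"refined accuracy: {refined_accuracy:.2f}%")
print(f"beads: {total_beads}, assembly time: {assembly_hours:.2f} h")
```

Under these assumptions the assembly time comes to a little over two hours, consistent with the stated 2.5-hour bound.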
The ability to demonstrate the reproducibility of gene expression microarray results is a critical consideration for the use of microarray technology in clinical applications. While studies have asserted that microarray data can be "highly reproducible" under given conditions, there is little ability to quantitatively compare amongst the various metrics and terminology used to characterize and express measurement performance. Use of standardized conceptual tools can greatly facilitate communication among the user, developer, and regulator stakeholders of the microarray community. While shaped by less highly multiplexed systems, measurement science (metrology) is devoted to establishing a coherent and internationally recognized vocabulary and quantitative practice for the characterization of measurement processes.
Using spike-in controls designed to mimic mammalian mRNA species, we used the quantitative reverse transcription polymerase chain reaction (RT-qPCR) to assess the performance of the in vitro transcription (IVT) amplification process for small samples. We focused especially on the confidence of the transcript level measurement, which is essential for differential gene expression analyses. IVT reproduced gene expression profiles down to approximately 100 absolute input copies. However, an RT-qPCR analysis of the antisense RNA showed a systematic bias against low copy number transcripts, regardless of sequence. Experiments also showed that noise increases with decreasing copy number. First-round IVT preserved the gene expression information within a sample down to the 100 copy level, regardless of total input sample amount. However, the amplification was nonlinear under low total RNA input/long IVT conditions. Variability of the amplification increased predictably with decreasing input copy number. For the small enrichments of interest in typical differential gene expression studies (e.g., twofold changes), the bias from IVT reactions is unlikely to affect the results. In limited cases, some transcript-specific differential gene expression values will need adjustment to reflect this bias. Proper experimental design with reasonable detection limits will yield differential gene expression capability even between low copy number transcripts.
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being "recalibrated" (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration.
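To put the Phred-scaled figures in context, the standard definition Q = -10 * log10(p) relates a quality score Q to an error probability p. The sketch below only illustrates that conversion; the function names are hypothetical, and it is not the GATK recalibration procedure itself.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred-scaled quality score implied by an error probability."""
    return -10 * math.log10(p)

def fold_change_in_error(delta_q):
    """Factor by which the implied error probability changes per delta_q units."""
    return 10 ** (delta_q / 10)

# The two shifts mentioned in the abstract, expressed as fold changes
# in implied error probability.
fold_for_5_units = fold_change_in_error(5)
fold_for_13_units = fold_change_in_error(13)
```

On this scale, a 5-unit shift corresponds to roughly a threefold change in implied error probability, and a 13-unit shift to roughly a twentyfold change.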
Gene dosage change is a mild perturbation that is a valuable tool for pathway reconstruction in Drosophila. While it is often assumed that reducing gene dose by half leads to two-fold less expression, there is partial autosomal dosage compensation in Drosophila, which may be mediated by feedback or buffering in expression networks.