Other articles by Simon Lin on PubMed

Energy Landscape Paving for X-ray Structure Determination of Organic Molecules

Acta Crystallographica. Section A, Foundations of Crystallography. May, 2002  |  Pubmed ID: 11961287

The efficiency of a recently proposed novel global optimization method, energy landscape paving (ELP), is evaluated with regard to the problem of crystal structure determination from simulated X-ray diffraction data comprising integrated diffraction intensities. The new approach has been tested using the example of 9-(methylamino)-1H-phenalen-1-one 1,4-dioxan-2-yl hydroperoxide solvate (C14H11NO.C4H8O4). The results indicate that, for this example, ELP outperforms standard techniques such as simulated annealing.

Evolution and Structure Formation of the Distribution of Partition Function Zeros: Triangular Type Ising Lattices with Cell Decoration

Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics. Jun, 2002  |  Pubmed ID: 12188800

The distribution of partition function zeros of the two-dimensional Ising model in the complex temperature plane is studied within the context of triangular decorated lattices and their triangle-star transformations. Exact recursion relations for the zeros are deduced for the description of the evolution of the distribution of the zeros subject to the change of decoration level. In the limit of infinite decoration level, the decorated lattices essentially possess the Sierpiński gasket or its triangle-star transformation as the inherent structure. The positions of the zeros for the infinite decorated lattices are shown to coincide with the ones for the Sierpiński gasket or its triangle-star transformation, and the distributions of zeros all appear to be a union of infinite scattered points and a Jordan curve, which is the limit of the scattered points.

Applications of Tree-Maps to Hierarchical Biological Data

Bioinformatics (Oxford, England). Sep, 2002  |  Pubmed ID: 12217926

A brief overview of Tree-Maps provides the basis for understanding two new implementations of Tree-Map methods. TreeMapClusterView provides a new way to view microarray gene expression data, and GenePlacer provides a view of gene ontology annotation data. We also discuss the benefits of Tree-Maps to visualize complex hierarchies in functional genomics.
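As a rough illustration of the Tree-Map idea summarized above (this is not code from TreeMapClusterView or GenePlacer), the following Python sketch lays out a small hierarchy with the classic slice-and-dice scheme: every leaf receives a rectangle whose area is proportional to its weight, and the split direction alternates with depth. The `Node` type, the function name, and the toy hierarchy are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Node:
    """Hypothetical tree node: a leaf carries a size, an inner node carries children."""
    name: str
    size: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def weight(self) -> float:
        # A leaf weighs its own size; an inner node weighs the sum of its children.
        if not self.children:
            return self.size
        return sum(child.weight() for child in self.children)


def slice_and_dice(node: Node, rect: Rect, depth: int = 0,
                   out: Optional[Dict[str, Rect]] = None) -> Dict[str, Rect]:
    """Assign each leaf a rectangle with area proportional to its weight."""
    if out is None:
        out = {}
    x, y, w, h = rect
    if not node.children:
        out[node.name] = rect
        return out
    total = node.weight()
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        frac = child.weight() / total if total else 0.0
        if depth % 2 == 0:   # even depth: split the width
            child_rect = (x + offset * w, y, frac * w, h)
        else:                # odd depth: split the height
            child_rect = (x, y + offset * h, w, frac * h)
        slice_and_dice(child, child_rect, depth + 1, out)
        offset += frac
    return out


# Toy hierarchy: two annotation categories containing genes of different sizes.
tree = Node("root", children=[
    Node("categoryA", children=[Node("gene1", 2.0), Node("gene2", 2.0)]),
    Node("categoryB", children=[Node("gene3", 4.0)]),
])
layout = slice_and_dice(tree, (0.0, 0.0, 8.0, 4.0))
for name, r in layout.items():
    print(name, r)
```

Slice-and-dice is only one of several layout rules; it is used here because it is short, not because the paper is known to rely on it.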

Microarray Analysis: a Comparative Approach

Molecular Cancer Therapeutics. Jan, 2002  |  Pubmed ID: 12467218

A Complexity Reduction Algorithm for Analysis and Annotation of Large Genomic Sequences

Genome Research. Feb, 2003  |  Pubmed ID: 12566410

DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexity reduction algorithm for sequence analysis (CRASA) that enables direct alignment of cDNA sequences to the genome. This method features a progressive data structure in hierarchical orders to facilitate a fast and efficient search mechanism. CRASA implementation was tested with already annotated genomic sequences in two benchmark data sets and compared with 15 annotation programs (10 ab initio and 5 homology-based approaches) against the EST database. By the use of layered noise filters, the complexity of CRASA-matched data was reduced exponentially. The results from the benchmark tests showed that CRASA annotation excelled in both the sensitivity and specificity categories. When CRASA was applied to the analysis of human Chromosomes 21 and 22, an additional 83 potential genes were identified. With its large-scale processing capability, CRASA can be used as a robust tool for genome annotation with high accuracy by matching the EST sequences precisely to the genomic sequences.

QA/QC As a Pressing Need for Microarray Analysis: Meeting Report from CAMDA'02

BioTechniques. Mar, 2003  |  Pubmed ID: 12664687

Transcriptional Profiling of the Sonic Hedgehog Response: a Critical Role for N-myc in Proliferation of Neuronal Precursors

Proceedings of the National Academy of Sciences of the United States of America. Jun, 2003  |  Pubmed ID: 12777630

Cerebellar granule cells are the most abundant neurons in the brain, and granule cell precursors (GCPs) are a common target of transformation in the pediatric brain tumor medulloblastoma. Proliferation of GCPs is regulated by the secreted signaling molecule Sonic hedgehog (Shh), but the mechanisms by which Shh controls proliferation of GCPs remain inadequately understood. We used DNA microarrays to identify targets of Shh in these cells and found that Shh activates a program of transcription that promotes cell cycle entry and DNA replication. Among the genes most robustly induced by Shh are cyclin D1 and N-myc. N-myc transcription is induced in the presence of the protein synthesis inhibitor cycloheximide, so it appears to be a direct target of Shh. Retroviral transduction of N-myc into GCPs induces expression of cyclin D1, E2F1, and E2F2, and promotes proliferation. Moreover, dominant-negative N-myc substantially reduces Shh-induced proliferation, indicating that N-myc is required for the Shh response. Finally, cyclin D1 and N-myc are overexpressed in murine medulloblastoma. These findings suggest that cyclin D1 and N-myc are important mediators of Shh-induced proliferation and tumorigenesis.

Data Mining Issues and Opportunities for Building Nursing Knowledge

Journal of Biomedical Informatics. Aug-Oct, 2003  |  Pubmed ID: 14643734

Health care information systems tend to capture data for nursing tasks, and have little basis in nursing knowledge. Opportunity lies in an important issue where the knowledge used by expert nurses (nursing knowledge workers) in caring for patients is undervalued in the health care system. The complexity of nursing's knowledge base remains poorly articulated and inadequately represented in contemporary information systems. There is opportunity for data mining methods to assist with discovering important linkages between clinical data, nursing interventions, and patient outcomes. Following a brief overview of relevant data mining techniques, a preterm risk prediction case study illustrates the opportunities and describes typical data mining issues in the nontrivial task of building knowledge. Building knowledge in nursing, using data mining or any other method, will make progress only if important data that capture expert nurses' contributions are available in clinical information systems configurations.

MedlineR: an Open Source Library in R for Medline Literature Data Mining

Bioinformatics (Oxford, England). Dec, 2004  |  Pubmed ID: 15284107

We describe an open source library written in the R programming language for Medline literature data mining. This MedlineR library includes programs to query Medline through the NCBI PubMed database; to construct the co-occurrence matrix; and to visualize the network topology of query terms. The open source nature of this library allows users to extend it freely in the statistical programming language of R. To demonstrate its utility, we have built an application to analyze term-association by using only 10 lines of code. We provide MedlineR as a library foundation for bioinformaticians and statisticians to build more sophisticated literature data mining applications.
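To make the co-occurrence step concrete, here is a minimal Python sketch. It is not MedlineR (which is written in R) and it does not query PubMed: the article IDs are made up and stand in for the result of one literature query per term. Under that assumption, the co-occurrence of two terms is taken to be the number of articles returned for both. The function name `cooccurrence_matrix` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence_matrix(hits: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count, for every pair of terms, the articles returned for both."""
    terms: List[str] = sorted(hits)
    matrix = {a: {b: 0 for b in terms} for a in terms}
    for term in terms:
        matrix[term][term] = len(hits[term])  # diagonal: articles per term
    for a, b in combinations(terms, 2):
        shared = len(hits[a] & hits[b])
        matrix[a][b] = shared
        matrix[b][a] = shared  # the matrix is symmetric
    return matrix


# Made-up article IDs standing in for the result of one query per term.
hits = {
    "N-myc": {"a1", "a2", "a3"},
    "cyclin D1": {"a2", "a3", "a4"},
    "sarin": {"a5"},
}
matrix = cooccurrence_matrix(hits)
print(matrix["N-myc"]["cyclin D1"])  # number of articles shared by the two terms
```

A matrix of this shape can be read as a weighted graph over the query terms, which is the kind of structure a network visualization would start from.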

Impaired Cell Adhesion and Apoptosis in a Novel CLN9 Batten Disease Variant

Annals of Neurology. Sep, 2004  |  Pubmed ID: 15349861

We describe the ninth variant of neuronal ceroid lipofuscinosis (NCL) or Batten disease, due to defects in a putative new gene, CLN9. We therefore refer to the new variant as CLN9-deficient. Two Serbian sisters and two German brothers are described. Their clinical history is characteristic for juvenile NCL. They show similar gene expression patterns. The existence of this variant is supported by the presence of curvilinear inclusions, fingerprint profiles, and granular osmiophilic deposits in neurons, lymphocytes, and conjunctival cells. Enzyme screening and sequencing of the coding regions of other NCL genes was negative. CLN9-deficient cells have a distinctive phenotype. They have rounded cell bodies, have prominent nucleoli, attach poorly to the culture dish, and are sensitive to apoptosis but have increased growth rates. Gene expression of proteins involved in cell adhesion and apoptosis is altered in these cells. Sphingolipid metabolism is also perturbed. They have decreased levels of ceramide, sphingomyelin, lactosylceramide, ceramide trihexoside, and globoside and increased activity of serine palmitoyl transferase.

RNA-binding Proteins to Assess Gene Expression States of Co-cultivated Cells in Response to Tumor Cells

Molecular Cancer. Sep, 2004  |  Pubmed ID: 15353001

Tumors and complex tissues consist of mixtures of communicating cells that differ significantly in their gene expression status. In order to understand how different cell types influence one another's gene expression, it will be necessary to monitor the mRNA profiles of each cell type independently and to dissect the mechanisms that regulate their gene expression outcomes.

Loss of Patched and Disruption of Granule Cell Development in a Pre-neoplastic Stage of Medulloblastoma

Development (Cambridge, England). May, 2005  |  Pubmed ID: 15843415

Medulloblastoma is the most common malignant brain tumor in children. It is thought to result from the transformation of granule cell precursors (GCPs) in the developing cerebellum, but little is known about the early stages of the disease. Here, we identify a pre-neoplastic stage of medulloblastoma in patched heterozygous mice, a model of the human disease. We show that pre-neoplastic cells are present in the majority of patched mutants, although only 16% of these mice develop tumors. Pre-neoplastic cells, like tumor cells, exhibit activation of the Sonic hedgehog pathway and constitutive proliferation. Importantly, they also lack expression of the wild-type patched allele, suggesting that loss of patched is an early event in tumorigenesis. Although pre-neoplastic cells resemble GCPs and tumor cells in many respects, they have a distinct molecular signature. Genes that mark the pre-neoplastic stage include regulators of migration, apoptosis and differentiation, processes crucial for normal development but previously unrecognized for their role in medulloblastoma. The identification and molecular characterization of pre-neoplastic cells provides insight into the early steps in medulloblastoma formation, and may yield important markers for early detection and therapy of this disease.

Differential Cardiac Gene Expression During Cardiopulmonary Bypass: Ischemia-independent Upregulation of Proinflammatory Genes

The Journal of Thoracic and Cardiovascular Surgery. Aug, 2005  |  Pubmed ID: 16077395

Cardiac surgery with cardiopulmonary bypass induces both systemic and local inflammatory responses implicated in the pathogenesis of myocardial dysfunction. Multifactorial perioperative sources of myocardial injury complicate understanding of the molecular mechanisms involved. By using microarray technology, this study examines myocardial gene expression responses to cardiopulmonary bypass in the absence of cardioplegic arrest and ischemia-reperfusion injury.

Heterogeneity of Flt3-expressing Multipotent Progenitors in Mouse Bone Marrow

Journal of Immunology (Baltimore, Md. : 1950). Oct, 2005  |  Pubmed ID: 16210604

Mechanisms of lymphoid and myeloid lineage choice by hemopoietic stem cells remain unclear. In this study we show that the multipotent progenitor (MPP) population, which is immediately downstream of hemopoietic stem cells, is heterogeneous and can be subdivided in terms of VCAM-1 expression. VCAM-1(+) MPPs were fully capable of differentiating into both lymphoid and myeloid lineages. In contrast, VCAM-1(-) MPPs gave rise to lymphocytes predominately in vivo. T and B cell development from VCAM-1(-) MPPs was 1 wk faster than that from VCAM-1(+) MPPs. Furthermore, VCAM-1(+) MPPs gave rise to common myeloid progenitors and VCAM-1(-) MPPs in vivo, indicating that VCAM-1(-) MPPs are progenies of VCAM-1(+) MPPs. VCAM-1(-) MPPs, in turn, developed into lymphoid lineage-restricted common lymphoid progenitors. These results establish a hierarchy of developmental relationship between MPP subsets and lymphoid and myeloid progenitors. In addition, VCAM-1(+) MPPs may represent the branching point between the lymphoid and myeloid lineages.

Irrational Exuberance in Clinical Proteomics

Clinical Cancer Research : an Official Journal of the American Association for Cancer Research. Nov, 2005  |  Pubmed ID: 16299224

What is mzXML Good For?

Expert Review of Proteomics. Dec, 2005  |  Pubmed ID: 16307524

mzXML (extensible markup language) is one of the pioneering data formats for mass spectrometry-based proteomics data collection. It is an open data format that has benefited and evolved as a result of the input of many groups, and it continues to evolve. Due to its dynamic history, its structure, purpose and applicability have all changed with time, meaning that groups that have looked at the standard at different points during its evolution have differing impressions of the usefulness of mzXML. In discussing mzXML, it is important to understand what mzXML is not. First, mzXML does not capture the raw data. Second, mzXML is not sufficient for regulatory submission. Third, mzXML is not optimized for computation and, finally, mzXML does not capture the experiment design. In general, it is the authors' opinion that XML is not a panacea for bioinformatics or a substitute for good data representation, and groups that want to use mzXML (or other XML-based representations) directly for data storage or computation will encounter performance and scalability problems. With these limitations in mind, the authors conclude that mzXML is, nonetheless, an indispensable data exchange format for proteomics.

Characterising Phase Variations in MALDI-TOF Data and Correcting Them by Peak Alignment

Cancer Informatics. 2005  |  Pubmed ID: 19305630

The use of MALDI-TOF mass spectrometry as a means of analyzing the proteome has been evaluated extensively in recent years. One of the limitations of this technique that has impeded the development of robust data analysis algorithms is the variability in the location of protein ion signals along the x-axis. We studied technical variations of MALDI-TOF measurements in the context of proteomics profiling. By acquiring a benchmark data set with five replicates, we estimated 76% to 85% of the total variance is due to phase variation. We devised a lobster plot, so named because of the resemblance to a lobster claw, to help detect the phase variation in replicates. We also investigated a peak alignment algorithm to remove the phase variation. This operation is analogous to the normalization step in microarray data analysis. Only after this critical step can features of biological interest be clearly revealed. With the help of principal component analysis, we demonstrated that after peak alignment, the differences among replicates are reduced. We compared this approach to peak alignment with a model-based calibration approach in which there was known information about peaks in common among all spectra. Finally, we examined the potential value at each point in an analysis pipeline of having a set of methods available that includes parametric, semiparametric and nonparametric methods; among such methods are those that benefit from the use of prior information.

Molecular Profile and Partial Functional Analysis of Novel Endothelial Cell-derived Growth Factors That Regulate Hematopoiesis

Stem Cells (Dayton, Ohio). May, 2006  |  Pubmed ID: 16373696

Recent progress has been made in the identification of the osteoblastic cellular niche for hematopoietic stem cells (HSCs) within the bone marrow (BM). Attempts to identify the soluble factors that regulate HSC self-renewal have been less successful. We have demonstrated that primary human brain endothelial cells (HUBECs) support the ex vivo amplification of primitive human BM and cord blood cells capable of repopulating non-obese diabetic/severe combined immunodeficient (SCID) mice (SCID repopulating cells [SRCs]). In this study, we sought to characterize the soluble hematopoietic activity produced by HUBECs and to identify the growth factors secreted by HUBECs that contribute to this HSC-supportive effect. Extended noncontact HUBEC cultures supported an eight-fold increase in SRCs when combined with thrombopoietin, stem cell factor, and Flt-3 ligand compared with input CD34(+) cells or cytokines alone. Gene expression analysis of HUBEC biological replicates identified 65 differentially expressed, nonredundant transcripts without annotated hematopoietic activity. Gene ontology studies of the HUBEC transcriptome revealed a high concentration of genes encoding extracellular proteins with cell-cell signaling function. Functional analyses demonstrated that adrenomedullin, a vasodilatory hormone, synergized with stem cell factor and Flt-3 ligand to induce the proliferation of primitive human CD34(+)CD38(-)lin(-) cells and promoted the expansion of CD34(+) progenitors in culture. These data demonstrate the potential of primary HUBECs as a reservoir for the discovery of novel secreted proteins that regulate human hematopoiesis.

Gene Expression Profiles of the Rat Brain Both Immediately and 3 Months Following Acute Sarin Exposure

Biochemical Pharmacology. Feb, 2006  |  Pubmed ID: 16376859

We have studied sarin-induced global gene expression patterns at an early time point (15 min; 0.5xLD50) and a later time point (3 months; 1xLD50) using Affymetrix Rat Neurobiology U34 chips in male Sprague-Dawley rats and have identified a total of 65 (early) and 38 (late) genes showing statistically significant alterations from control levels at 15 min and 3 months, respectively. At the early time point, genes classified as ion channel, cytoskeletal and cell adhesion molecules, in addition to neuropeptides and their receptors, predominated over all other groups. The other groups included: cholinergic signaling, calcium channel and binding proteins, transporters, chemokines, GABAergic, glutamatergic, aspartate, catecholaminergic, nitric oxide synthase, purinergic, and serotonergic signaling molecules. At the late time point, genes classified as calcium channel and binding proteins, cytoskeletal and cell adhesion molecules and GABAergic signaling molecules were most prominent. Seven molecules (Ania-9, Arrb-1, CX-3C, Gabab-1d, Nos-2a, Nrxn-1b, PDE2) were identified that showed persistently altered expression at both time points. Selected genes from each of these time points were further validated using semiquantitative RT-PCR approaches. Some of the genes identified in the present study have been shown, by other groups as well as ours, to be involved in organophosphate-induced neurotoxicity. Principal component analysis (PCA) of the expression data from both time points was used for comparative analysis of the gene expression, which indicated that the changes in gene expression were a function of dose and time of euthanasia after the treatment. Our model also predicts that besides dose and duration of post-treatment period, age and possibly other factors may be playing important roles in the regulation of pathways, leading to the neurotoxicity.

Improved Prediction of Treatment Response Using Microarrays and Existing Biological Knowledge

Pharmacogenomics. Apr, 2006  |  Pubmed ID: 16610959

A desired application for microarrays in the clinic is to predict treatment response from an often diverse patient population. We present a method for analyzing microarray data that is predicated on biological pathway and function knowledge as opposed to a purely data-driven initial analysis. From an analysis perspective, this methodology takes advantage of information that is available across genes on a single array, as well as differences in those patterns across measurements. By using biological knowledge in the initial analysis, the accuracy and robustness of microarray profile classification is enhanced, especially when low numbers of samples are available. For clinical studies, particularly Phase I or I/II studies, this technique is exceptionally advantageous.

Partition Functions and Finite-size Scalings of Ising Model on Helical Tori

Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics. May, 2006  |  Pubmed ID: 16802982

The exact closed forms of the partition functions of a two-dimensional Ising model on square lattices with twisted boundary conditions are given. The constructions of helical tori are unambiguously related to the twisted boundary conditions by virtue of the SL(2, Z) transforms. The numerical analyses on the deviations of the specific-heat peaks away from the bulk critical temperature reveal that the finite-size effect of helical tori is independent of the chirality.

Improved Peak Detection in Mass Spectrum by Incorporating Continuous Wavelet Transform-based Pattern Matching

Bioinformatics (Oxford, England). Sep, 2006  |  Pubmed ID: 16820428

A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods.

Grid-enabled High-throughput in Silico Screening Against Influenza A Neuraminidase

IEEE Transactions on Nanobioscience. Dec, 2006  |  Pubmed ID: 17181029

Encouraged by the success of the first EGEE biomedical data challenge against malaria (WISDOM), the second data challenge battling avian flu was kicked off in April 2006 to identify new drugs for the potential variants of the influenza A virus. Mobilizing thousands of CPUs on the Grid, the six-week-long high-throughput screening activity has fulfilled over 100 CPU years of computing power and produced around 600 gigabytes of results on the Grid for further biological analysis and testing. In the paper, we demonstrate the impact of a worldwide Grid infrastructure to efficiently deploy large-scale virtual screening to speed up the drug design process. Lessons learned through the data challenge activity are also discussed.

Master Equation Approach to Folding Kinetics of Lattice Polymers Based on Conformation Networks

The Journal of Chemical Physics. Apr, 2007  |  Pubmed ID: 17430067

Based on the master equation with the inherent structure of conformation network, the authors investigate some important issues in the folding kinetics of lattice polymers. First, the topologies of conformation networks are discussed. Moreover, a new scheme of implementing Metropolis algorithm, which fulfills the condition of detailed balance, is proposed. Then, upon incorporating this new scheme into the geometric structure of conformation network the authors provide a theorem which can be used to place an upper bound on relaxation time. To effectively identify the kinetic traps of folding, the authors also introduce a new quantity, which is employed from the continuous time Monte Carlo method, called rigidity factor. Throughout the discussions, the authors analyze the results for different move sets to demonstrate the methods and to study the features of the kinetics of folding.
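For readers unfamiliar with the detailed-balance condition mentioned above, the Python sketch below builds the transition matrix of the standard Metropolis rule on a toy three-state network and verifies that Boltzmann weights satisfy detailed balance. It is not the new scheme proposed in the paper. One common way to keep the proposal symmetric when states have different numbers of neighbors is assumed here: every state proposes among the same number of slots, and unused slots are null moves. The energies, the network, and the function name are hypothetical.

```python
import math
from typing import List


def metropolis_matrix(energies: List[float], neighbors: List[List[int]],
                      beta: float) -> List[List[float]]:
    """Transition matrix for Metropolis moves on a small state network.

    Each state proposes one of `max_degree` slots uniformly; unused slots are
    null moves, which keeps the proposal probability symmetric. The neighbor
    lists are assumed to be symmetric and free of self-loops.
    """
    n = len(energies)
    max_degree = max(len(nb) for nb in neighbors)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            accept = min(1.0, math.exp(-beta * (energies[j] - energies[i])))
            P[i][j] = accept / max_degree
        P[i][i] = 1.0 - sum(P[i])  # rejected and null moves stay put
    return P


# Toy chain of three conformations: 0 - 1 - 2, with increasing energy.
energies = [0.0, 1.0, 2.0]
neighbors = [[1], [0, 2], [1]]
beta = 1.0
P = metropolis_matrix(energies, neighbors, beta)
weights = [math.exp(-beta * e) for e in energies]  # unnormalized Boltzmann weights

# Detailed balance: w_i * P[i][j] == w_j * P[j][i] for every pair of states.
max_violation = max(
    abs(weights[i] * P[i][j] - weights[j] * P[j][i])
    for i in range(3) for j in range(3)
)
print(max_violation)
```

The printed value should be zero up to floating-point rounding, which is what detailed balance with respect to the Boltzmann distribution requires.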

Other Riffs on Cooperation Are Already Showing How Well a Wiki Could Work

Nature. Apr, 2007  |  Pubmed ID: 17443163

NuID: a Universal Naming Scheme of Oligonucleotides for Illumina, Affymetrix, and Other Microarrays

Biology Direct. May, 2007  |  Pubmed ID: 17540033

Oligonucleotide probes that are sequence identical may have different identifiers between manufacturers and even between different versions of the same company's microarray; and sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potentially mis-identification of the genes hybridizing to that probe.

The New York City Palliative Care Quality Improvement Collaborative

Joint Commission Journal on Quality and Patient Safety. Jun, 2007  |  Pubmed ID: 17566540

Care for persons living with fatal chronic conditions is expensive and challenging, and can be unreliable. A quality improvement collaborative was conducted to develop capacity among health care providers in a single geographic area, New York City, to apply quality improvement methodology to palliative care services.

Gene Expression Analysis of Frontotemporal Lobar Degeneration of the Motor Neuron Disease Type with Ubiquitinated Inclusions

Acta Neuropathologica. Jul, 2007  |  Pubmed ID: 17569064

Neurodegenerative disorders share a process of aggregation of insoluble protein. Frontotemporal lobar degeneration with ubiquitinated inclusions (FTLD-U) is characterized by the presence of ubiquitin and TDP-43 positive aggregates which are likely related to specific gene expression profiles. We carried out gene expression microarray analysis on post-mortem brain tissue from FTLD-U, FTLD-MND, and controls. Using total RNA from carefully dissected frontal cortical layer II, we obtained gene expression profiles showing that FTLD-U and controls differ in over 100 networks, including those involved in synapse formation, the ubiquitin-proteasome system, endosomal sorting, and apoptosis. We performed qRT-PCR validation for three genes, representative of three different networks. Dynein axonemal light intermediate chain 1 (DNALI1) (microtubule/cytoskeleton network associated) expression was 3-fold higher and myeloid differentiation primary response gene 88 (MYD88) (signal transduction network) was 3.3 times higher in FTLD-U than FTLD-MND and controls; annexin A2 (ANXA2) (endosomal sorting) expression was 11.3-fold higher in FTLD-U than FTLD-MND and 2.3-fold higher than controls. The identification of progranulin (PGRN) gene mutations and TDP-43 as the major protein component of the ubiquitinated inclusions, are two recent landmark discoveries in the field of FTLD-U. We found 1.5-fold increase in TDP-43 in both FTLD-MND and FTLD-U while progranulin showed no gene expression differences between controls and FTLD-MND. However, one of the FTLD-U cases tested by Affymetrix microarray showed "absence call" of this transcript, suggesting absent or decreased gene expression. Our findings point to specific gene-linked-pathways which may be influenced by neurodegenerative disease process and may be targeted for further exploration.

Interpreting Microarray Results with Gene Ontology and MeSH

Methods in Molecular Biology (Clifton, N.J.). 2007  |  Pubmed ID: 17634620

Methods are described to take a list of genes generated from a microarray experiment and interpret these results using various tools and ontologies. A workflow is described that details how to convert gene identifiers with SOURCE and MatchMiner and then use these converted gene lists to search the gene ontology (GO) and the medical subject headings (MeSH) ontology. Examples of searching GO with DAVID, EASE, and GOMiner are provided along with an interpretation of results. The mining of MeSH using high-density array pattern interpreter with a set of gene identifiers is also described.

Mining Biomedical Data Using MetaMap Transfer (MMtx) and the Unified Medical Language System (UMLS)

Methods in Molecular Biology (Clifton, N.J.). 2007  |  Pubmed ID: 18314582

Detailed instructions are given for mapping unstructured, free text data into common biomedical concepts (drugs, diseases, anatomy, and so on) found in the Unified Medical Language System using MetaMap Transfer (MMTx). MMTx can be used in applications including mining and inferring relationships between concepts in MEDLINE publications by transforming free text into computable concepts. MMTx is in general not designed to be an end-user program; therefore, a simple analysis is described using MMTx for users without any programming knowledge. In addition, two Java template files are provided for automated processing of the output from MMTx, and users can adopt these with minimal Java programming experience.

Model-based Variance-stabilizing Transformation for Illumina Microarray Data

Nucleic Acids Research. Feb, 2008  |  Pubmed ID: 18178591

Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and Variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at:

Self-similarity in the Classification of Finite-size Scaling Functions for Toroidal Boundary Conditions

Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics. Jan, 2008  |  Pubmed ID: 18351807

The conventional periodic boundary conditions in two dimensions are extended to general boundary conditions, prescribed by primitive vector pairs that may not coincide with the coordinate axes. This extension is shown to be unambiguously specified by the twisting scheme. Equivalent relations between different twist settings are constructed explicitly. The classification of finite-size scaling functions is discussed based on the equivalent relations. A self-similar pattern for distinct classes of finite-size scaling functions is shown to appear on the plane that parametrizes the toroidal geometry.

Lumi: a Pipeline for Processing Illumina Microarray

Bioinformatics (Oxford, England). Jul, 2008  |  Pubmed ID: 18467348

The Illumina microarray is becoming a popular microarray platform. The BeadArray technology from Illumina makes its preprocessing and quality control different from other microarray technologies. Unfortunately, most other analyses have not taken advantage of the unique properties of the BeadArray system, and have just incorporated preprocessing methods originally designed for Affymetrix microarrays. lumi is a Bioconductor package especially designed to process Illumina microarray data. It includes data input, quality control, variance stabilization, normalization and gene annotation portions. Specifically, the lumi package includes a variance-stabilizing transformation (VST) algorithm that takes advantage of the technical replicates available on every Illumina microarray. Different normalization method options and multiple quality control plots are provided in the package. To better annotate the Illumina data, a vendor-independent nucleotide universal identifier (nuID) was devised to identify the probes of Illumina microarrays. The nuID annotation packages and the output of lumi-processed results can be easily integrated with other Bioconductor packages to construct a statistical data analysis pipeline for Illumina data.
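The variance-stabilizing idea described above can be illustrated with a generic arcsinh transform. The Python sketch below is not the lumi implementation and does not use bead-level data. It assumes a noise model with an additive component `s` and a multiplicative component `c * u`, both hypothetical parameters, and uses the delta method to compare the approximate standard deviation after an arcsinh transform with that after log2.

```python
import math


def noise_sd(u: float, c: float, s: float) -> float:
    """Assumed noise model: additive (s) plus multiplicative (c * u) components."""
    return math.sqrt((c * u) ** 2 + s ** 2)


def arcsinh_transform(y: float, c: float, s: float) -> float:
    """h(y) = asinh(c * y / s) / c, whose slope is 1 / noise_sd(y, c, s)."""
    return math.asinh(c * y / s) / c


def transformed_sd(u: float, c: float, s: float, eps: float = 1e-3) -> float:
    """Delta-method estimate: the sd of h(Y) is about |h'(u)| * sd(Y)."""
    slope = (arcsinh_transform(u + eps, c, s)
             - arcsinh_transform(u - eps, c, s)) / (2 * eps)
    return slope * noise_sd(u, c, s)


def log2_sd(u: float, c: float, s: float) -> float:
    """Delta-method sd after a log2 transform, for comparison."""
    return noise_sd(u, c, s) / (u * math.log(2))


c, s = 0.1, 20.0  # hypothetical noise parameters
for u in (10.0, 100.0, 1000.0, 10000.0):
    print(u, round(transformed_sd(u, c, s), 4), round(log2_sd(u, c, s), 4))
```

Under this assumed model the arcsinh column stays close to 1 across the whole intensity range, whereas the log2 column grows large at low intensities, where the additive noise dominates.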

Susceptibility to Glaucoma: Differential Comparison of the Astrocyte Transcriptome from Glaucomatous African American and Caucasian American Donors

Genome Biology. 2008  |  Pubmed ID: 18613964

Epidemiological and genetic studies indicate that ethnic/genetic background plays an important role in susceptibility to primary open angle glaucoma (POAG). POAG is more prevalent among the African-descent population compared to the Caucasian population. Damage in POAG occurs at the level of the optic nerve head (ONH) and is mediated by astrocytes. Here we investigated differences in gene expression in primary cultures of ONH astrocytes obtained from age-matched normal and glaucomatous donors of Caucasian American (CA) and African American (AA) populations using oligonucleotide microarrays.

Gene Expression and Functional Studies of the Optic Nerve Head Astrocyte Transcriptome from Normal African Americans and Caucasian Americans Donors

PloS One. Aug, 2008  |  Pubmed ID: 18716680

To determine whether optic nerve head (ONH) astrocytes, a key cellular component of glaucomatous neuropathy, exhibit differential gene expression in primary cultures of astrocytes from normal African American (AA) donors compared to astrocytes from normal Caucasian American (CA) donors.

2009 and Beyond: the Decade of Personalised Medicine

International Journal of Computational Biology and Drug Design. 2008  |  Pubmed ID: 20063461

With a better understanding of the human genome, it is possible to tailor medical care, including prevention, diagnosis and treatment of disease, to an individual's needs. This revolution in medical care is called Personalised Medicine. This paper provides a brief review of recent progress in high-throughput measurement technologies, their effects on personalised medicine, milestone projects in human genome research and related challenges in bioinformatics and computational biology.

A Divide-and-conquer Strategy to Solve the Out-of-memory Problem of Processing Thousands of Affymetrix Microarrays

International Journal of Computational Biology and Drug Design. 2008  |  Pubmed ID: 20063464

Out-of-memory problems were frequently encountered when processing thousands of CEL files using Bioconductor. We propose a divide-and-conquer strategy combined with randomised resampling to solve this problem. The CAMDA 2007 META-analysis data set, which contains 5896 CEL files, was used to test the approach on a typical commodity computer cluster by running established pre-processing algorithms for Affymetrix arrays in the Bioconductor package. The results were validated against a gold standard obtained by using a supercomputer. In addition to the performance improvement, the general divide-and-conquer strategy can be applied to any other normalisation algorithm without modifying the underlying implementation.
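As a rough illustration of the divide-and-conquer idea described in this abstract, the following Python sketch shuffles a list of file names, splits it into batches small enough to fit in memory, and merges the per-batch results. It is only a sketch under stated assumptions: the function name process_in_random_batches, the process_batch callback and the batch size are hypothetical, and the abstract does not specify how the authors resample or combine batches.

```python
import random
from typing import Callable, Dict, List, Sequence


def process_in_random_batches(
    files: Sequence[str],
    batch_size: int,
    process_batch: Callable[[List[str]], Dict[str, object]],
    seed: int = 0,
) -> Dict[str, object]:
    """Process files in randomly assigned batches and merge the results.

    Hypothetical helper: `process_batch` stands in for any pre-processing
    routine that takes a list of file names and returns one result per file.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Random assignment keeps each batch a representative sample of the whole set.
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)

    merged: Dict[str, object] = {}
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        # Only one batch is held in memory at a time.
        merged.update(process_batch(batch))
    return merged
```

Because the underlying routine is passed in as a callback, the batching logic does not need to know anything about the normalisation algorithm it wraps, which mirrors the abstract's point that the strategy requires no change to the underlying implementation.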

Expression of Cyclophilin B is Associated with Malignant Progression and Regulation of Genes Implicated in the Pathogenesis of Breast Cancer

The American Journal of Pathology. Jan, 2009  |  Pubmed ID: 19056847

Cyclophilin B (CypB) is a 21-kDa protein with peptidyl-prolyl cis-trans isomerase activity that functions as a transcriptional inducer for Stat5 and as a ligand for CD147. To better understand the global function of CypB in breast cancer, T47D cells with a small interfering RNA-mediated knockdown of CypB were generated. Subsequent expression profiling analysis showed that 663 transcripts were regulated by CypB knockdown, and that many of these gene products contributed to cell proliferation, cell motility, and tumorigenesis. Real-time PCR confirmed that STMN3, S100A4, S100A6, c-Myb, estrogen receptor alpha, growth hormone receptor, and progesterone receptor were all down-regulated in si-CypB cells. A linkage analysis of these array data to protein networks resulted in the identification of 27 different protein networks that were impacted by CypB knockdown. Functional assays demonstrated that CypB knockdown also decreased cell growth, proliferation, and motility. Immunohistochemical and immunofluorescent analyses of a matched breast cancer progression tissue microarray that was labeled with an anti-CypB antibody demonstrated a highly significant increase in CypB protein levels as a function of breast cancer progression. Taken together, these results suggest that the enhanced expression of CypB in malignant breast epithelium may contribute to the pathogenesis of this disease through its regulation of the expression of hormone receptors and gene products that are involved in cell proliferation and motility.

Overexpression of RhoA Induces Preneoplastic Transformation of Primary Mammary Epithelial Cells

Cancer Research. Jan, 2009  |  Pubmed ID: 19147561

Rho family small GTPases serve as molecular switches in the regulation of diverse cellular functions, including actin cytoskeleton remodeling, cell migration, gene transcription, and cell proliferation. Importantly, Rho overexpression is frequently seen in many carcinomas. However, published studies have almost invariably used immortal or tumorigenic cell lines to study Rho GTPase functions and there are no studies on the potential of Rho small GTPase to overcome senescence checkpoints and induce preneoplastic transformation of human mammary epithelial cells (hMEC). We show here that ectopic expression of wild-type (WT) RhoA as well as a constitutively active RhoA mutant (G14V) in two independent primary hMEC strains led to their immortalization and preneoplastic transformation. These cells have continued to grow over 300 population doublings (PD) with no signs of senescence, whereas cells expressing the vector or dominant-negative RhoA mutant (T19N) senesced after 20 PDs. Significantly, RhoA-T37A mutant, known to be incapable of interacting with many well-known Rho effectors including Rho kinase, PKN, mDia1, and mDia2, was also capable of immortalizing hMECs. Notably, similar to parental normal cells, Rho-immortalized cells have WT p53 and intact G(1) cell cycle arrest on Adriamycin treatment. Rho-immortalized cells were anchorage dependent and were unable to form tumors when implanted in nude mice. Lastly, microarray expression profiling of Rho-immortalized versus parental cells showed altered expression of several genes previously implicated in immortalization and breast cancer progression. Taken together, these results show that RhoA can induce the preneoplastic transformation of hMECs by altering multiple pathways linked to cellular transformation and breast cancer.

Intragraft TNF Receptor Signaling Contributes to Activation of Innate and Adaptive Immunity in a Renal Allograft Model

Transplantation. Jan, 2009  |  Pubmed ID: 19155971

Increased levels of tumor necrosis factor (TNF) are a risk factor for allograft rejection. In vitro studies have shown that binding of TNF to its receptor activates signaling cascades that induce expression of many genes involved in inflammation. The role of intragraft TNF receptor (TNFR) signaling in activation of gene expression in allografts has not been studied.

BALB/c Mice Genetically Susceptible to Proteoglycan-induced Arthritis and Spondylitis Show Colony-dependent Differences in Disease Penetrance

Arthritis Research & Therapy. 2009  |  Pubmed ID: 19220900

The major histocompatibility complex (H-2d) and non-major histocompatibility complex genetic backgrounds make the BALB/c strain highly susceptible to inflammatory arthritis and spondylitis. Although different BALB/c colonies develop proteoglycan-induced arthritis and proteoglycan-induced spondylitis in response to immunization with human cartilage proteoglycan, they show significant differences in disease penetrance despite being maintained by the same vendor at either the same or a different location.

Genomics of Human Intracranial Aneurysm Wall

Stroke. Apr, 2009  |  Pubmed ID: 19228845

The pathogenesis of intracranial aneurysms (IAs) remains elusive. Most studies have focused on individual genes, or a few interrelated genes or products, at a time in human IA. However, a broad view of pathologic mechanisms has not been investigated by identifying pathogenic genes and their interaction in networks. Our study aimed to analyze global gene expression patterns in the IA wall.

From Disease Ontology to Disease-ontology Lite: Statistical Methods to Adapt a General-purpose Ontology for the Test of Gene-ontology Associations

Bioinformatics (Oxford, England). Jun, 2009  |  Pubmed ID: 19478018

Subjective methods have been reported to adapt a general-purpose ontology for a specific application. For example, Gene Ontology (GO) Slim was created from GO to generate a highly aggregated report of the human-genome annotation. We propose statistical methods to adapt the general purpose, OBO Foundry Disease Ontology (DO) for the identification of gene-disease associations. Thus, we need a simplified definition of disease categories derived from implicated genes. On the basis of the assumption that the DO terms having similar associated genes are closely related, we group the DO terms based on the similarity of gene-to-DO mapping profiles. Two types of binary distance metrics are defined to measure the overall and subset similarity between DO terms. A compactness-scalable fuzzy clustering method is then applied to group similar DO terms. To reduce false clustering, the semantic similarities between DO terms are also used to constrain clustering results. As such, the DO terms are aggregated and the redundant DO terms are largely removed. Using these methods, we constructed a simplified vocabulary list from the DO called Disease Ontology Lite (DOLite). We demonstrated that DOLite results in more interpretable results than DO for gene-disease association tests. The resultant DOLite has been used in the Functional Disease Ontology (FunDO) Web application at
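The abstract above groups ontology terms by the similarity of their associated gene sets, using one distance for overall similarity and one for subset similarity. The exact metrics are not given here, so the Python sketch below uses two standard stand-ins: the Jaccard distance for overall similarity and an overlap-coefficient distance for subset similarity. The function names and the example gene sets are assumptions for illustration only.

```python
from typing import AbstractSet, Hashable


def jaccard_distance(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Overall dissimilarity: 1 - |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def overlap_distance(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Subset dissimilarity: 1 - |A & B| / min(|A|, |B|).

    Equals 0.0 whenever one gene set is contained in the other.
    Returns 1.0 if either set is empty.
    """
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(a & b) / smaller


# Hypothetical gene sets mapped to two ontology terms.
term_a = {"TP53", "BRCA1"}
term_b = {"TP53", "BRCA1", "ATM", "CHEK2"}
overall = jaccard_distance(term_a, term_b)  # half of the union is shared
subset = overlap_distance(term_a, term_b)   # term_a is fully contained in term_b
```

A term whose genes are a strict subset of another term's genes scores as identical under the overlap distance but not under the Jaccard distance, which is one reason to keep both views when deciding which terms to merge.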

Annotating the Human Genome with Disease Ontology

BMC Genomics. Jul, 2009  |  Pubmed ID: 19594883

The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases.

GATA-2 Reinforces Megakaryocyte Development in the Absence of GATA-1

Molecular and Cellular Biology. Sep, 2009  |  Pubmed ID: 19620289

GATA-2 is an essential transcription factor that regulates multiple aspects of hematopoiesis. Dysregulation of GATA-2 is a hallmark of acute megakaryoblastic leukemia in children with Down syndrome, a malignancy that is defined by the combination of trisomy 21 and a GATA1 mutation. Here, we show that GATA-2 is required for normal megakaryocyte development as well as aberrant megakaryopoiesis in Gata1 mutant cells. Furthermore, we demonstrate that GATA-2 indirectly controls cell cycle progression in GATA-1-deficient megakaryocytes. Genome-wide microarray analysis and chromatin immunoprecipitation studies revealed that GATA-2 regulates a wide set of genes, including cell cycle regulators and megakaryocyte-specific genes. Surprisingly, GATA-2 also negatively regulates the expression of crucial myeloid transcription factors, such as Sfpi1 and Cebpa. In the absence of GATA-1, GATA-2 prevents induction of a latent myeloid gene expression program. Thus, GATA-2 contributes to cell cycle progression and the maintenance of megakaryocyte identity of GATA-1-deficient cells, including GATA-1s-expressing fetal megakaryocyte progenitors. Moreover, our data reveal that overexpression of GATA-2 facilitates aberrant megakaryopoiesis.

Using Free and Open-Source Bioconductor Packages to Analyze Array Comparative Genomics Hybridization (aCGH) Data

Current Genomics. Mar, 2009  |  Pubmed ID: 19721812

Whole-genome array Comparative Genomics Hybridization (aCGH) can be used to scan chromosomes for deletions and amplifications. Because of the increased accessibility of many commercial platforms, many cancer researchers have used aCGH to study tumorigenesis or to predict clinical outcomes. Each data set typically comprises several hundred thousand to one million rows of hybridization measurements. Thus, statistical analysis is key to unlocking the knowledge obtained from an aCGH study. We review several free and open-source packages in Bioconductor and provide example code to run the analysis. The analysis of aCGH data provides insights into the genomic abnormalities of cancers.

Visual Annotation of the Gene Database

Conference Proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference. 2009  |  Pubmed ID: 19964623

The genes in NCBI databases are currently annotated with itemized text (Gene Reference Into Function, or GeneRIF). Previous work suggests that visual presentation can be more effective when time and space are under heavy constraints. Here we report a novel annotation of genome information using Web 2.0 technologies: GeneGIF (Gene Graphics Into Function). Users can quickly scan through the important functions of each gene from a graph, and then go to detailed pages when they find interesting annotations. The modular implementation makes it easily pluggable into other widely used databases without reprogramming. Similar approaches are being developed to incorporate information into other types of genomics and proteomics databases.

A Collection of Bioconductor Methods to Visualize Gene-list Annotations

BMC Research Notes. Jan, 2010  |  Pubmed ID: 20180973

Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive methods for interrogating the intrinsic connections between biological categories and genes.

ChIPpeakAnno: a Bioconductor Package to Annotate ChIP-seq and ChIP-chip Data

BMC Bioinformatics. May, 2010  |  Pubmed ID: 20459804

Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome.

Suppression by L-methionine of Cell Cycle Progression in LNCaP and MCF-7 Cells but Not Benign Cells

Anticancer Research. Jun, 2010  |  Pubmed ID: 20651330

Methionine inhibits proliferation of breast and prostate cancer cells. This study aimed to determine cell cycle effects of methionine and selectivity for cancer cells.

The MicroArray Quality Control (MAQC)-II Study of Common Practices for the Development and Validation of Microarray-based Predictive Models

Nature Biotechnology. Aug, 2010  |  Pubmed ID: 20676074

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.

Visual Presentation As a Welcome Alternative to Textual Presentation of Gene Annotation Information

Advances in Experimental Medicine and Biology. 2010  |  Pubmed ID: 20865558

The functions of a gene are traditionally annotated textually using either free text (Gene Reference Into Function or GeneRIF) or controlled vocabularies (e.g., Gene Ontology or Disease Ontology). Inspired by the latest word cloud tools developed by the Information Visualization Group at IBM Research, we have prototyped a visual system for capturing gene annotations, which we named Gene Graph Into Function or GeneGIF. Fully developing the GeneGIF system would be a significant effort. To justify the necessity and to specify the design requirements of GeneGIF, we first surveyed the end-user preferences. From 53 responses, we found that a majority (64%, p < 0.05) of the users were either positive or neutral toward using GeneGIF in their daily work (acceptance); in terms of preference, a slight majority (51%, p > 0.05) of the users favored visual presentation of information (GeneGIF) compared to textual (GeneRIF) information. The results of this study indicate that a visual presentation tool, such as GeneGIF, can complement standard textual presentation of gene annotations. Moreover, the survey participants provided many constructive comments that will specify the development of a phase-two project to visually annotate each gene in the human genome.

Comparison of Beta-value and M-value Methods for Quantifying Methylation Levels by Microarray Analysis

BMC Bioinformatics. Nov, 2010  |  Pubmed ID: 21118553

High-throughput profiling of DNA methylation status of CpG islands is crucial to understand the epigenetic regulation of genes. The microarray-based Infinium methylation assay by Illumina is one platform for low-cost high-throughput methylation profiling. Both Beta-value and M-value statistics have been used as metrics to measure methylation levels. However, there are no detailed studies of their relations and their strengths and limitations.
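For readers unfamiliar with the two statistics named in this abstract, the Python sketch below shows how they are commonly computed from methylated and unmethylated probe intensities and how one converts to the other. The formulas and the default offsets (100 for the Beta-value, 1 for the M-value) are the commonly used definitions, stated here as assumptions since the abstract itself does not give them; the function names are illustrative.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset), bounded between 0 and 1."""
    meth, unmeth = max(meth, 0.0), max(unmeth, 0.0)
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, offset: float = 1.0) -> float:
    """M-value = log2((M + offset) / (U + offset)), unbounded."""
    meth, unmeth = max(meth, 0.0), max(unmeth, 0.0)
    return math.log2((meth + offset) / (unmeth + offset))


def beta_to_m(beta: float) -> float:
    """Logit (base 2) transform; valid for 0 < beta < 1, ignoring the offsets."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of beta_to_m."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

Ignoring the offsets, the two statistics are related by a logit transform, so a Beta-value of 0.5 corresponds to an M-value of 0, and values near 0 or 1 on the Beta scale are stretched out on the M scale.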

L-methionine-induced Alterations in Molecular Signatures in MCF-7 and LNCaP Cancer Cells

Journal of Cancer Research and Clinical Oncology. Mar, 2011  |  Pubmed ID: 20454975

Methionine inhibits proliferation of breast and prostate cancer cells. Here, we determined the influence of L-methionine on functional molecular signatures in these cell lines.

Epithelial Regulation of Mesenchymal Tissue Behavior

The Journal of Investigative Dermatology. Apr, 2011  |  Pubmed ID: 21228814

Fibroproliferative scars are an important clinical problem, and yet the mechanisms that regulate scar formation remain poorly understood. This study explored the hypothesis that the epithelium has a critical role in dictating scar formation, and that these interactions differ in skin and mucosa. Paired skin and vaginal mucosal wounds on New Zealand white (NZW) rabbits diverged significantly; the cutaneous epithelium exhibited a greater and prolonged response to injury when compared with the mucosa. Microarray analysis of the injured epithelium was performed, and numerous factors were identified that were more strongly upregulated in skin, including several proinflammatory cytokines and profibrotic growth factors. Analysis of the underlying mesenchymal tissue demonstrated a fibrotic response in the dermis of the skin but not the mucosal lamina propria, in the absence of a connective tissue injury. To determine if the proinflammatory factors produced by the epidermis may have a role in dermal fibrosis, an IL-1 receptor antagonist was administered locally to healing skin wounds. In the NZW rabbit model, blockade of IL-1 signaling was effective in preventing hypertrophic scar formation. These results support the idea that soluble factors produced by the epithelium in response to injury may influence fibroblast behavior and regulate scar formation in vivo.

Ultrastructural, Immunofluorescence, and RNA Evidence Support the Hypothesis of a "new" Virus Associated with Kawasaki Disease

The Journal of Infectious Diseases. Apr, 2011  |  Pubmed ID: 21402552

Intracytoplasmic inclusion bodies (ICI) have been identified in ciliated bronchial epithelium of Kawasaki disease (KD) patients using a synthetic antibody derived from acute KD arterial IgA plasma cells; ICI may derive from the KD etiologic agent.

The Early Growth Response Gene Egr2 (Alias Krox20) is a Novel Transcriptional Target of Transforming Growth Factor-β That is Up-regulated in Systemic Sclerosis and Mediates Profibrotic Responses

The American Journal of Pathology. May, 2011  |  Pubmed ID: 21514423

Although the early growth response-2 (Egr-2, alias Krox20) protein shows structural and functional similarities to Egr-1, these two related early-immediate transcription factors are nonredundant. Egr-2 plays essential roles in peripheral nerve myelination, adipogenesis, and immune tolerance; however, its regulation and role in tissue repair and fibrosis remain poorly understood. We show herein that transforming growth factor (TGF)-β induced a Smad3-dependent sustained stimulation of Egr2 gene expression in normal fibroblasts. Overexpression of Egr-2 was sufficient to stimulate collagen gene expression and myofibroblast differentiation, whereas these profibrotic TGF-β responses were attenuated in Egr-2-depleted fibroblasts. Genomewide transcriptional profiling revealed that multiple genes associated with tissue remodeling and wound healing were up-regulated by Egr-2, but the Egr-2-regulated gene expression profile overlapped only partially with the Egr-1-regulated gene profile. Levels of Egr-2 were elevated in lesional tissue from mice with bleomycin-induced scleroderma. Moreover, elevated Egr-2 was noted in biopsy specimens of skin and lung from patients with systemic sclerosis. These results provide the first evidence that Egr-2 is a functionally distinct transcription factor that is both necessary and sufficient for TGF-β-induced profibrotic responses and is aberrantly expressed in lesional tissue in systemic sclerosis and in a murine model of scleroderma. Together, these findings suggest that Egr-2 plays an important nonredundant role in the pathogenesis of fibrosis. Targeting Egr-2 might represent a novel therapeutic strategy to control fibrosis.

Gene Expression Variation Between African Americans and Whites is Associated with Coronary Artery Calcification: the Multiethnic Study of Atherosclerosis

Physiological Genomics. Jul, 2011  |  Pubmed ID: 21521779

Coronary artery calcium (CAC) is a strong indicator of total atherosclerosis burden. Epidemiological data have shown substantial differences in CAC prevalence and severity between African Americans and whites. However, little is known about the molecular mechanisms underlying initiation and progression of CAC. Microarray gene expression profiling of peripheral blood leucocytes was performed from 119 healthy women aged 50 yr or above in the Multi-Ethnic Study of Atherosclerosis cohort; 48 women had CAC score >100 and carotid intima-media thickness (IMT) >1 mm, while 71 had CAC <10 and IMT <0.65 mm. When 17 African Americans were compared with 41 whites in the low-CAC group, 409 differentially expressed genes (false discovery rate <5%) were identified. In addition, 316 differentially expressed genes were identified between the high- and low-CAC groups. A substantial overlap between these two gene lists was observed (148 genes, P < 10(-6)). Furthermore, genes expressed lower in African Americans also tend to express lower in individuals with low CAC (correlation 0.69, P = 0.002). Ontology analysis of the 409 race-associated genes revealed significant enrichment in mobilization of calcium and immune/inflammatory response (P < 10(-9)). Of note, 25 of 30 calcium mobilization genes were involved in immune/inflammatory response (P < 10(-10)). Our data suggest a connection between immune response and vascular calcification and the result provides a potential mechanistic explanation for the lower prevalence and severity of CAC in African Americans compared with whites.

Activated TLR Signaling in Atherosclerosis Among Women with Lower Framingham Risk Score: the Multi-ethnic Study of Atherosclerosis

PloS One. 2011  |  Pubmed ID: 21698167

Atherosclerosis is the leading cause of cardiovascular disease (CVD). Traditional risk factors can be used to identify individuals at high risk for developing CVD and are generally associated with the extent of atherosclerosis; however, substantial numbers of individuals at low or intermediate risk still develop atherosclerosis.

Gene Expression Changes in Retinal Müller (glial) Cells Exposed to Elevated Pressure

Current Eye Research. Aug, 2011  |  Pubmed ID: 21780925

Retinal Müller (glial) cells undergo "reactive gliosis", a stress response that is accompanied by changes in their morphology and upregulation of various cellular markers. Reactive gliosis is seen in many retinal diseases and conditions; however, it is not known whether it is a common, stereotypic response or the nature of the response varies with the type of retinal stress. To address this question, we have examined gene expression changes in Müller cells exposed to elevated pressure.

Egr-1 Induces a Profibrotic Injury/repair Gene Program Associated with Systemic Sclerosis

PloS One. 2011  |  Pubmed ID: 21931594

Transforming growth factor-β (TGF-β) signaling is implicated in the pathogenesis of fibrosis in scleroderma or systemic sclerosis (SSc), but the precise mechanisms are poorly understood. The immediate-early gene Egr-1 is an inducible transcription factor with key roles in mediating fibrotic TGF-β responses. To elucidate Egr-1 function in SSc-associated fibrosis, we examined changes in gene expression induced by Egr-1 in human fibroblasts at the genome-wide level. Using microarray expression analysis, we derived a fibroblast "Egr-1-responsive gene signature" comprising over 600 genes involved in cell proliferation, TGF-β signaling, wound healing, extracellular matrix synthesis and vascular development. The experimentally derived "Egr-1-responsive gene signature" was then evaluated in an expression microarray dataset comprising skin biopsies from 27 patients with localized and systemic forms of scleroderma and six healthy controls. We found that the "Egr-1-responsive gene signature" was substantially enriched in the "diffuse-proliferation" subset of skin biopsies, which consists exclusively of patients with diffuse cutaneous SSc (dcSSc). A number of Egr-1-regulated genes were also associated with the "inflammatory" intrinsic subset. Only a minority of Egr-1-regulated genes were concordantly regulated by TGF-β. These results indicate that Egr-1 induces a distinct profibrotic/wound healing gene expression program in fibroblasts that is associated with skin biopsies from SSc patients with diffuse cutaneous disease. These observations suggest that targeting Egr-1 expression or activity might be a novel therapeutic strategy to control fibrosis in specific SSc subsets.

Improved Statistical Analysis for Array CGH-Based DNA Copy Number Aberrations

Cancer Informatics. 2011  |  Pubmed ID: 22084565

Array-based comparative genomic hybridization (aCGH) allows DNA copy number to be measured at the whole-genome scale. In cancer studies, one may be interested in identifying DNA copy number aberrations (CNAs) associated with certain clinicopathological characteristics such as cancer metastasis. We proposed to define test regions based on copy number pattern profiles across multiple samples, using either smoothed log(2)-ratio or discrete data of copy number gain/loss calls. Association tests performed on the refined test regions instead of the individual probes have improved power because fewer tests are performed. We also compared three types of measurement of copy number levels (normalized log(2)-ratio, smoothed log(2)-ratio, and copy number gain or loss calls) in statistical hypothesis testing. The relative strengths and weaknesses of the proposed method were demonstrated using both simulation studies and real data analysis of a liver cancer study.

PLoS Computational Biology Conference Postcards from ISMB/ECCB 2011

PLoS Computational Biology. Nov, 2011  |  Pubmed ID: 22125481

Mining the Gene Wiki for Functional Genomic Knowledge

BMC Genomics. Dec, 2011  |  Pubmed ID: 22165947

Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki is comprised of more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology.

Genomic Sequencing in Clinical Trials

Journal of Translational Medicine. Dec, 2011  |  Pubmed ID: 22206293

Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to find its way into clinical trials both nationally and worldwide. We highlight the currently available types of genomic sequencing platforms, outline the advantages and disadvantages of each, and compare first- and next-generation techniques with respect to capabilities, quality, and cost. We describe the current geographical distributions and types of disease conditions in which these technologies are used, and how next-generation sequencing is strategically being incorporated into new and existing studies. Lastly, recent major breakthroughs and the ongoing challenges of using genomic sequencing in clinical research are discussed.

Using the Bioconductor GeneAnswers Package to Interpret Gene Lists

Methods in Molecular Biology (Clifton, N.J.). 2012  |  Pubmed ID: 22130876

Use of microarray data to generate expression profiles of genes associated with disease can aid in identification of markers of disease and potential therapeutic targets. Pathway analysis methods further extend expression profiling by creating inferred networks that provide an interpretable structure of the gene list and visualize gene interactions. This chapter describes GeneAnswers, a novel gene-concept network analysis tool available as an open source Bioconductor package. GeneAnswers creates a gene-concept network and also can be used to build protein-protein interaction networks. The package includes an example multiple myeloma cell line dataset and tutorial. Several network analysis methods are included in GeneAnswers, and the tutorial highlights the conditions under which each type of analysis is most beneficial and provides sample code.

Genome-wide DNA Methylation Indicates Silencing of Tumor Suppressor Genes in Uterine Leiomyoma

PloS One. 2012  |  Pubmed ID: 22428009

Uterine leiomyomas, or fibroids, represent the most common benign tumor of the female reproductive tract. Fibroids become symptomatic in 30% of all women and up to 70% of African American women of reproductive age. Epigenetic dysregulation of individual genes has been demonstrated in leiomyoma cells; however, the in vivo genome-wide distribution of such epigenetic abnormalities remains unknown.

Meiosis-induced Alterations in Transcript Architecture and Noncoding RNA Expression in S. Cerevisiae

RNA (New York, N.Y.). Jun, 2012  |  Pubmed ID: 22539527

Changes in transcript architecture can have powerful effects on protein expression. Regulation of the transcriptome is often dramatically revealed during dynamic conditions such as development. To examine changes in transcript architecture we analyzed the expression and transcript boundaries of protein-coding and noncoding RNAs over the developmental process of meiosis in Saccharomyces cerevisiae. Custom-designed, high-resolution tiling arrays were used to define the time-resolved transcriptome of cells undergoing meiosis and sporulation. These arrays were specifically designed for the S. cerevisiae strain SK1 that sporulates with high efficiency and synchrony. In addition, new methods were created to define transcript boundaries and to identify dynamic changes in transcript expression and architecture over time. Of 8407 total segments, 699 (8.3%) were identified by our algorithm as regions containing potential transcript architecture changes. Our analyses reveal extensive changes to both the coding and noncoding transcriptome, including altered 5' ends, 3' ends, and splice sites. Additionally, 3910 (46.5%) unannotated expressed segments were identified. Interestingly, subsets of unannotated RNAs are located across from introns (anti-introns) or across from the junction between two genes (anti-intergenic junctions). Many of these unannotated RNAs are abundant and exhibit sporulation-specific changes in expression patterns. All work, including heat maps of the tiling array, annotation for the SK1 strain, and phastCONS conservation analysis, is available at Our high-resolution transcriptome analyses reveal that coding and noncoding transcript architectures are exceptionally dynamic in S. cerevisiae and suggest a vast array of novel transcriptional and post-transcriptional control mechanisms that are activated upon meiosis and sporulation.

Opportunities in Systems Biology to Discover Mechanisms and Repurpose Drugs for CNS Diseases

Drug Discovery Today. Nov, 2012  |  Pubmed ID: 22750722

Therapies for central nervous system (CNS) diseases remain an unmet medical need. This is largely due to multiple unknown disease-modifying genes and pathways. Systems biology through network modeling has shown promise in discovering novel therapeutic targets, deciphering disease mechanisms, and suggesting drug repurposing opportunities. In this article we cover current progress in systems biology and its role, applications, and challenges in the pharmaceutical industry. We also outline a practical strategy to infer drug repositioning candidates for rare CNS diseases by describing Multiple Level Network Modeling (MLNM) analysis.

DNA Methylation Alterations in Response to Pesticide Exposure in Vitro

Environmental and Molecular Mutagenesis. Aug, 2012  |  Pubmed ID: 22847954

Although pesticides are subject to extensive carcinogenicity testing before regulatory approval, pesticide exposure has repeatedly been associated with various cancers. This suggests that pesticides may cause cancer via nonmutagenic mechanisms. The present study provides evidence to support the hypothesis that pesticide-induced cancer may be mediated in part by epigenetic mechanisms. We examined whether exposure to seven commonly used pesticides (i.e., fonofos, parathion, terbufos, chlorpyrifos, diazinon, malathion, and phorate) induces DNA methylation alterations in vitro. We conducted genome-wide DNA methylation analyses on DNA samples obtained from the human hematopoietic K562 cell line exposed to ethanol (control) and several organophosphate pesticides (OPs) using the Illumina Infinium HumanMethylation27 BeadChip. Bayesian-adjusted t-tests were used to identify differentially methylated gene promoter CpG sites. In this report, we present our results on three pesticides (fonofos, parathion, and terbufos) that clustered together based on principal component analysis and hierarchical clustering. These three pesticides induced similar methylation changes in the promoter regions of 712 genes, while also exhibiting their own OP-specific methylation alterations. Functional analysis of methylation changes specific to each OP, or common to all three OPs, revealed that differential methylation was associated with numerous genes that are involved in carcinogenesis-related processes. Our results provide experimental evidence that pesticides may modify gene promoter DNA methylation levels, suggesting that epigenetic mechanisms may contribute to pesticide-induced carcinogenesis. Further studies in other cell types and human samples are required, as well as determining the impact of these methylation changes on gene expression.

Genome-wide Study of DNA Methylation Alterations in Response to Diazinon Exposure in Vitro

Environmental Toxicology and Pharmacology. Nov, 2012  |  Pubmed ID: 22964155

Pesticide exposure has repeatedly been associated with cancers. However, molecular mechanisms are largely undetermined. In this study, we examined whether exposure to diazinon, a common organophosphate that has been associated with cancers, could induce DNA methylation alterations. We conducted genome-wide DNA methylation analyses on DNA samples obtained from human hematopoietic K562 cells exposed to diazinon and ethanol using the Illumina Infinium HumanMethylation27 BeadChip. Bayesian-adjusted t-tests were used to identify differentially methylated gene promoter CpG sites. We identified 1069 CpG sites in 984 genes with significant methylation changes in diazinon-treated cells. Gene ontology analysis demonstrated that some genes are tumor suppressor genes, such as TP53INP1 (3.0-fold, q-value <0.001) and PTEN (2.6-fold, q-value <0.001), some genes are in cancer-related pathways, such as HDAC3 (2.2-fold, q-value=0.002), and some remain functionally unknown. Our results provided direct experimental evidence that diazinon may modify gene promoter DNA methylation levels, which may play a pathological role in cancer development.

Technical Reproducibility of Genotyping SNP Arrays Used in Genome-wide Association Studies

PloS One. 2012  |  Pubmed ID: 22970228

During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of the six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low.
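
As an illustration of the concordance measure described above, the following is a minimal sketch of how pairwise genotype concordance between technical replicates might be computed. The genotype encoding ("AA", "AB", "BB"), the no-call marker "NC", and the replicate names are assumptions made for this example; they are not taken from the study.

```python
from itertools import combinations


def concordance(calls_a, calls_b, missing="NC"):
    """Fraction of SNPs with identical calls, ignoring SNPs where either
    replicate has a no-call (the "NC" marker is an assumed convention)."""
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs")
    compared = 0
    matches = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no SNPs were called in both replicates")
    return matches / compared


def pairwise_concordance(replicates):
    """Concordance for every pair of replicates, keyed by replicate names."""
    return {
        (i, j): concordance(replicates[i], replicates[j])
        for i, j in combinations(sorted(replicates), 2)
    }


# Hypothetical calls for five SNPs in three technical replicates.
replicates = {
    "rep1": ["AA", "AB", "BB", "AB", "NC"],
    "rep2": ["AA", "AB", "BB", "BB", "AA"],
    "rep3": ["AA", "AB", "BB", "AB", "AA"],
}
for pair, value in pairwise_concordance(replicates).items():
    print(pair, round(value, 4))
```

A replicate whose concordance with its siblings is markedly lower than the others would be a candidate low-quality array, which is the kind of QC signal the abstract attributes to technical replicates.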

A Statistical Framework for Accurate Taxonomic Assignment of Metagenomic Sequencing Reads

PloS One. 2012  |  Pubmed ID: 23049702

The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics which studies genetic material recovered directly from an environment. Characterization of genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads with known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby impacting on accurate estimates of relative abundance of multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at
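
The idea of estimating genome proportions first and then assigning ambiguous reads can be sketched with a generic expectation-maximization (EM) loop for a mixture model. This is not the TAMER implementation: the likelihood matrix, the function names, and the toy data are assumptions for illustration, and each read is assumed to hit at least one candidate genome.

```python
def responsibilities(pi, likelihoods):
    """E-step: posterior probability that each read came from each genome."""
    resp = []
    for row in likelihoods:
        weighted = [p * l for p, l in zip(pi, row)]
        total = sum(weighted)  # > 0 if the read hits at least one genome
        resp.append([w / total for w in weighted])
    return resp


def estimate_proportions(likelihoods, n_iter=500, tol=1e-12):
    """Estimate mixture proportions of candidate genomes by EM.

    likelihoods[i][j] is an assumed per-read likelihood of read i under
    genome j (for example derived from an alignment score); 0 means no hit.
    """
    n_reads = len(likelihoods)
    n_genomes = len(likelihoods[0])
    pi = [1.0 / n_genomes] * n_genomes
    for _ in range(n_iter):
        resp = responsibilities(pi, likelihoods)
        # M-step: new proportion = average responsibility over all reads.
        new_pi = [sum(r[j] for r in resp) / n_reads for j in range(n_genomes)]
        change = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if change < tol:
            break
    return pi, responsibilities(pi, likelihoods)


# Toy data: three reads unique to genome 0, one unique to genome 1,
# and one read that hits both genomes equally well.
likelihoods = [[1, 0], [1, 0], [1, 0], [0, 1], [1, 1]]
pi, resp = estimate_proportions(likelihoods)
print([round(p, 3) for p in pi])
print([round(r, 3) for r in resp[-1]])
```

In this toy case the ambiguous read is weighted toward the more abundant genome rather than being pushed up to a higher taxonomic rank, which is the behavior the abstract contrasts with read-by-read assignment.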

A Framework for Annotating Human Genome in Disease Context

PloS One. 2012  |  Pubmed ID: 23251346

Identification of gene-disease associations is crucial to understanding disease mechanisms. A rapid increase in the biomedical literature, driven by advances in genome-scale technologies, makes it challenging for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline to re-annotate the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment which enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.

The Disease and Gene Annotations (DGA): an Annotation Resource for Human Disease

Nucleic Acids Research. Jan, 2013  |  Pubmed ID: 23197658

The Disease and Gene Annotations database (DGA) is a collaborative effort aiming to provide a comprehensive and integrative annotation of human genes in a disease network context by integrating the computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction networks (MINs). DGA integrates these resources using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface system enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.

Inspired by Intricacy. Interviewed by Elizabeth Gardner

Health Data Management. Apr, 2013  |  Pubmed ID: 23638605

Linezolid Exerts Greater Bacterial Clearance but No Modification of Host Lung Gene Expression Profiling: A Mouse MRSA Pneumonia Model

PloS One. 2013  |  Pubmed ID: 23826353

Linezolid (LZD) is beneficial to patients with MRSA pneumonia, but whether and how LZD influences global host lung immune responses at the mRNA level during MRSA-mediated pneumonia is still unknown.

An RDF/OWL Knowledge Base for Query Answering and Decision Support in Clinical Pharmacogenetics

Studies in Health Technology and Informatics. 2013  |  Pubmed ID: 23920613

Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning as well as query answering as simple as possible, and the reasoning capabilities go beyond the capabilities of previously described ontologies.

Transcriptional Events During the Recovery from MRSA Lung Infection: a Mouse Pneumonia Model

PloS One. 2013  |  Pubmed ID: 23936388

Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is an emerging threat to human health throughout the world. Rodent MRSA pneumonia models mainly focus on the early innate immune responses to MRSA lung infection. However, the molecular pattern and mechanisms of recovery from MRSA lung infection are largely unknown. In this study, a sublethal mouse MRSA pneumonia model was employed to investigate late events during the recovery from MRSA lung infection. We compared lung bacterial clearance, bronchoalveolar lavage fluid (BALF) characterization, lung histology, lung cell proliferation, lung vascular permeability and lung gene expression profiling between days 1 and 3 post MRSA lung infection. Compared to day 1 post infection, bacterial colony counts, BALF total cell number and BALF protein concentration significantly decreased at day 3 post infection. Lung cDNA microarray analysis identified 47 significantly up-regulated and 35 down-regulated genes (p<0.01, 1.5 fold change [up and down]). The pattern of gene expression suggests that lung recovery is characterized by enhanced cell division, vascularization, wound healing and adjustment of host adaptive immune responses. Proliferation assay by PCNA staining further confirmed that at day 3 lungs have significantly higher cell proliferation than at day 1. Furthermore, at day 3 lungs displayed significantly lower levels of vascular permeability to albumin, compared to day 1. Collectively, these data help us elucidate the molecular mechanisms of the recovery after MRSA lung infection.

Some Experiences and Opportunities for Big Data in Translational Research

Genetics in Medicine : Official Journal of the American College of Medical Genetics. Oct, 2013  |  Pubmed ID: 24008998

Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.

In Vitro Evaluation of Novel Inhibitors Against the NS2B-NS3 Protease of Dengue Fever Virus Type 4

Molecules (Basel, Switzerland). Dec, 2013  |  Pubmed ID: 24352016

The discovery of potent therapeutic compounds against dengue virus is urgently needed. The NS2B-NS3 protease (NS2B-NS3pro) of dengue fever virus carries out all enzymatic activities needed for polyprotein processing and is considered to be amenable to antiviral inhibition by analogy. Virtual screening of 300,000 compounds using Autodock 3 on the GVSS platform was conducted to identify novel inhibitors against the NS2B-NS3pro. Thirty-six compounds were selected for in vitro assay against NS2B-NS3pro expressed in Pichia pastoris. Seven novel compounds were identified as inhibitors with IC50 values ranging from 3.9 ± 0.6 to 86.7 ± 3.6 μM. Three strong NS2B-NS3pro inhibitors were further confirmed as competitive inhibitors with Ki values of 4.0 ± 0.4, 4.9 ± 0.3, and 3.4 ± 0.1 μM, respectively. Hydrophobic and hydrogen bond interactions between amino acid residues in the NS3pro active site and the inhibitory compounds were also identified.

Rodriguez Syndrome with SF3B4 Mutation: a Severe Form of Nager Syndrome?

American Journal of Medical Genetics. Part A. Jul, 2014  |  Pubmed ID: 24715698

We report on the findings of a novel heterozygous de novo SF3B4 mutation in a long-surviving patient with clinical features of Rodriguez syndrome including severe acrofacial dysostosis, phocomelia with pre- and post-axial limb defects, fibular agenesis, rib, and shoulder girdle anomalies. Since SF3B4 mutations have been recently associated with Nager syndrome, this suggests that at least some cases of Rodriguez syndrome are either allelic to or represent unusually severe manifestations of Nager syndrome. Although clinical overlap is obvious, this is somewhat surprising given the presumed autosomal recessive inheritance of Rodriguez syndrome. Investigation of other Rodriguez syndrome patients is needed to clarify the genetic mechanism and possible heterogeneity in patients with clinical features of Rodriguez syndrome.

Patients Surviving Six Months in Hospice Care: Who Are They?

Journal of Palliative Medicine. Aug, 2014  |  Pubmed ID: 24933676

On January 1, 2011, the Centers for Medicare and Medicaid Services (CMS) began requiring U.S. hospices to conduct a "face-to-face" (F2F) assessment of eligibility for continued hospice care with patients entering their third certification period (180 days after initial enrollment). Understanding which patient populations require F2F assessment is important for evaluating the impact of the CMS regulation and gauging the appropriateness of the 6-month prognosis criteria for different patient groups.

Computational Evidence of NAGNAG Alternative Splicing in Human Large Intergenic Noncoding RNA

BioMed Research International. 2014  |  Pubmed ID: 24995327

NAGNAG alternative splicing plays an essential role in biological processes and represents a highly adaptable system for posttranslational regulation of gene function. NAGNAG alternative splicing impacts a myriad of biological processes. Previous studies of NAGNAG largely focused on messenger RNA. To the best of our knowledge, this is the first study testing the hypothesis that NAGNAG alternative splicing is also operative in large intergenic noncoding RNA (lincRNA). The RNA-seq data sets from recent deep sequencing studies were queried to test our hypothesis. NAGNAG alternative splicing of human lincRNA was identified while querying two independent RNA-seq data sets. Within these datasets, 31 NAGNAG alternative splicing sites were identified in lincRNA. Notably, most exons of lincRNA containing NAGNAG acceptors were longer than those from protein-coding genes. Furthermore, presence of CAG coding appeared to participate in the splice site selection. Finally, expression of the isoforms of NAGNAG lincRNA exhibited tissue specificity. Together, this study improves our understanding of the NAGNAG alternative splicing in lincRNA.

A Google Glass Application to Support Shoppers with Dietary Management of Diabetes

Journal of Diabetes Science and Technology. Nov, 2014  |  Pubmed ID: 25015954

Phenome-wide Association Studies Demonstrating Pleiotropy of Genetic Variants Within FTO with and Without Adjustment for Body Mass Index

Frontiers in Genetics. 2014  |  Pubmed ID: 25177340

Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% confidence interval [CI] = 1.11-1.24, p = 2.10 × 10⁻⁹) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08-1.21, p = 2.34 × 10⁻⁶). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07-1.22, p = 3.33 × 10⁻⁵); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74-0.91, p = 5.41 × 10⁻⁵) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
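
For readers unfamiliar with how an odds ratio and its 95% confidence interval relate, here is a minimal sketch using the simple unadjusted 2×2-table estimate with a Wald interval on the log scale. The study itself reports meta-analysis results, so this simplified calculation is an assumption for illustration only, and the counts below are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers with the phenotype      b: carriers without it
    c: non-carriers with the phenotype  d: non-carriers without it
    z defaults to the normal quantile for a 95% interval.
    All four counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for independent binomial counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

Because the interval is symmetric around log(OR), an interval built this way always contains its own point estimate, and an interval that excludes 1 corresponds to a nominally significant association.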

Assessing Technical Performance in Differential Gene Expression Experiments with External Spike-in RNA Control Ratio Mixtures

Nature Communications. 2014  |  Pubmed ID: 25254650

There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.

Enabling Online Studies of Conceptual Relationships Between Medical Terms: Developing an Efficient Web Platform

JMIR Medical Informatics. Oct, 2014  |  Pubmed ID: 25600290

The Unified Medical Language System (UMLS) contains many important ontologies in which terms are connected by semantic relations. For many studies on the relationships between biomedical concepts, the use of transitively associated information from ontologies and the UMLS has been shown to be effective. Although there are a few tools and methods available for extracting transitive relationships from the UMLS, they usually have major restrictions on the length of transitive relations or on the number of data sources.

A Fuzzy-match Search Engine for Physician Directories

JMIR Medical Informatics. Nov, 2014  |  Pubmed ID: 25601050

A search engine to find physicians' information is a basic but crucial function of a health care provider's website. Inefficient search engines, which return no results or incorrect results, can lead to patient frustration and potential customer loss. A search engine that can handle misspellings and spelling variations of names is needed, as the United States (US) has culturally, racially, and ethnically diverse names.
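
One common way to tolerate misspellings in name search is to rank candidates by Levenshtein edit distance. The sketch below shows that general technique; the summary above does not say which matching method the article uses, so the approach, the distance threshold, and the sample names are all assumptions for illustration.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def fuzzy_search(query, names, max_distance=2):
    """Return names within max_distance edits of the query, closest first."""
    q = query.lower()
    scored = [(edit_distance(q, name.lower()), name) for name in names]
    return [name for dist, name in sorted(scored) if dist <= max_distance]


# Hypothetical directory entries.
directory = ["Johnson", "Jonsson", "Jackson", "Smith"]
print(fuzzy_search("Jonson", directory))
```

A production directory search would likely combine such a distance with phonetic or token-based matching, since edit distance alone handles transliteration variants poorly.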

Efficiently Mining Adverse Event Reporting System for Multiple Drug Interactions

AMIA Joint Summits on Translational Science Proceedings. AMIA Joint Summits on Translational Science. 2014  |  Pubmed ID: 25717411

Efficiently mining multiple drug interactions and reactions from the Adverse Event Reporting System (AERS) is a challenging problem which has not been sufficiently addressed by existing methods. To tackle this challenge, we propose an FCI-filter approach which leverages the efforts of UMLS mapping, frequent closed itemset mining, and uninformative association identification and removal. By applying our method to AERS, we identified a large number of multiple drug interactions with reactions. By statistical analysis, we found that most of the identified associations have very small p-values, which suggests that they are statistically significant. Further analysis of the results shows that many multiple drug interactions and reactions are clinically interesting, and suggests that our method may be further improved by incorporating external knowledge.
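
To make the term "frequent closed itemset" concrete, the following brute-force sketch enumerates itemsets in a handful of toy reports and keeps those that are frequent and closed (no proper superset has the same support). It is meant only to illustrate the definition; it is not the authors' approach, it scales exponentially with the number of items, and the report contents are hypothetical.

```python
from itertools import combinations


def frequent_closed_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets that occur in at least
    min_support transactions and have no proper superset with equal support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                support[itemset] = count
    # A superset with equal support is itself frequent, so it is in `support`.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == support[other] for other in support)
    }


# Hypothetical adverse event reports: drugs and reactions as plain labels.
reports = [
    {"drugA", "drugB", "nausea"},
    {"drugA", "drugB", "nausea"},
    {"drugA", "rash"},
    {"drugB"},
]
for itemset, count in frequent_closed_itemsets(reports, min_support=2).items():
    print(sorted(itemset), count)
```

Here {drugA, drugB} is frequent but not closed, because every report containing both drugs also contains nausea; keeping only the closed set {drugA, drugB, nausea} avoids reporting redundant sub-patterns.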

Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing

Proceedings of the ... USENIX Security Symposium. UNIX Security Symposium. Aug, 2014  |  Pubmed ID: 27077138

We initiate the study of privacy in pharmacogenetics, wherein machine learning models are used to guide medical treatments based on a patient's genotype and background. Performing an in-depth case study on privacy in personalized warfarin dosing, we show that suggested models carry privacy risks, in particular because attackers can perform what we call model inversion: an attacker, given the model and some demographic information about a patient, can predict the patient's genetic markers. As differential privacy (DP) is an oft-proposed solution for medical settings such as this, we evaluate its effectiveness for building private versions of pharmacogenetic models. We show that DP mechanisms prevent our model inversion attacks when the privacy budget is carefully selected. We go on to analyze the impact on utility by performing simulated clinical trials with DP dosing models. We find that for privacy budgets effective at preventing attacks, patients would be exposed to increased risk of stroke, bleeding events, and mortality. We conclude that current DP mechanisms do not simultaneously improve genomic privacy while retaining desirable clinical efficacy, highlighting the need for new mechanisms that should be evaluated in situ using the general methodology introduced by our work.
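
The trade-off between privacy budget and utility can be seen in the simplest differentially private primitive, the Laplace mechanism applied to a bounded mean. This is a generic sketch, not one of the mechanisms evaluated in the paper; the bounds, the dose values, and the function names are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-differentially private mean of values clipped to [lower, upper].

    Changing one record moves the clipped mean by at most (upper - lower) / n,
    so Laplace noise with scale sensitivity / epsilon is added.
    Assumes the number of records n is public.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical weekly doses (mg); smaller epsilon means more noise.
doses = [21.0, 35.0, 28.0, 42.0, 31.5]
rng = random.Random(0)
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, round(private_mean(doses, 0.0, 70.0, epsilon, rng), 2))
```

Small privacy budgets give stronger protection but noisier outputs, which is the tension the abstract describes when it reports worse simulated clinical outcomes at budgets that prevent the attack.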

SeqHBase: a Big Data Toolset for Family Based Sequencing Data Analysis

Journal of Medical Genetics. Apr, 2015  |  Pubmed ID: 25587064

Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis.

The Cell Cycle Regulator Ecdysoneless Cooperates with H-Ras to Promote Oncogenic Transformation of Human Mammary Epithelial Cells

Cell Cycle (Georgetown, Tex.). 2015  |  Pubmed ID: 25616580

The mammalian ortholog of the Drosophila ecdysoneless (Ecd) gene product regulates Rb-E2F interaction and is required for cell cycle progression. Ecd is overexpressed in breast cancer and its overexpression predicts shorter survival in patients with ErbB2-positive tumors. Here, we demonstrate that Ecd knockdown (KD) in human mammary epithelial cells (hMECs) induces growth arrest, similar to the impact of Ecd knockout (KO) in mouse embryonic fibroblasts. Furthermore, whole-genome mRNA expression analysis of control vs. Ecd KD in hMECs demonstrated that several of the top 40 genes that were down-regulated were E2F target genes. To address the role of Ecd in mammary oncogenesis, we overexpressed Ecd and/or mutant H-Ras in hTERT-immortalized hMECs. Cell cycle analyses revealed that hMECs overexpressing Ecd+Ras showed incomplete arrest in G1 phase upon growth factor deprivation, and more rapid cell cycle progression in growth factor-containing medium. Analyses of cell migration, invasion, acinar structures in 3-D Matrigel and anchorage-independent growth demonstrated that Ecd+Ras-overexpressing cells exhibit a substantially more dramatic transformed phenotype as compared to cells expressing vector, Ras or Ecd. Under conditions of nutrient deprivation, Ecd+Ras-overexpressing hMECs exhibited better survival, with substantial upregulation of the autophagy marker LC3 both at the mRNA and protein levels. Significantly, while hMECs expressing Ecd or mutant Ras alone did not form tumors in NOD/SCID mice, Ecd+Ras-overexpressing hMECs formed tumors, clearly demonstrating oncogenic cooperation between Ecd and mutant Ras. Collectively, we demonstrate an important co-oncogenic role of Ecd in the progression of mammary oncogenesis through promoting cell survival.

Application of Clinical Text Data for Phenome-wide Association Studies (PheWASs)

Bioinformatics (Oxford, England). Jun, 2015  |  Pubmed ID: 25657332

Genome-wide association studies (GWASs) are effective for describing genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Disease version 9 (ICD9) codes are used frequently to define the phenome, but using ICD9 codes alone misses other clinically relevant information from the EHR that can be used for PheWAS analyses and discovery.

Opportunities for Drug Repositioning from Phenome-wide Association Studies

Nature Biotechnology. Apr, 2015  |  Pubmed ID: 25850054

Comparison of RNA-seq and Microarray-based Models for Clinical Endpoint Prediction

Genome Biology. Jun, 2015  |  Pubmed ID: 26109056

Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model.

Collecting and Analyzing Patient Experiences of Health Care From Social Media

JMIR Research Protocols. Jul, 2015  |  Pubmed ID: 26137885

Social media platforms such as Yelp provide rich information about consumer experience. Previous studies suggest that Yelp can serve as a new source for studying patient experience. However, the lack of a corpus of patient reviews is a major bottleneck for applying computational techniques.

Context-Sensitive Spelling Correction of Consumer-Generated Content on Health Care

JMIR Medical Informatics. Jul, 2015  |  Pubmed ID: 26232246

Consumer-generated content, such as postings on social media websites, can serve as an ideal source of information for studying health care from a consumer's perspective. However, consumer-generated content on health care topics often contains spelling errors, which, if not corrected, will be obstacles for downstream computer-based text analysis.

A Conceptual Model for Translating Omic Data into Clinical Action

Journal of Pathology Informatics. 2015  |  Pubmed ID: 26430534

Genomic, proteomic, epigenomic, and other "omic" data have the potential to enable precision medicine, also commonly referred to as personalized medicine. The volume and complexity of omic data are rapidly overwhelming human cognitive capacity, requiring innovative approaches to translate such data into patient care. Here, we outline a conceptual model for the application of omic data in the clinical context, called "the omic funnel." This model parallels the classic "Data, Information, Knowledge, Wisdom pyramid" and adds context for how to move between each successive layer. Its goal is to allow informaticians, researchers, and clinicians to approach the problem of translating omic data from bench to bedside, by using discrete steps with clearly defined needs. Such an approach can facilitate the development of modular and interoperable software that can bring precision medicine into widespread practice.

Practical Considerations in Genomic Decision Support: The EMERGE Experience

Journal of Pathology Informatics. 2015  |  Pubmed ID: 26605115

Genomic medicine has the potential to improve care by tailoring treatments to the individual. There is consensus in the literature that pharmacogenomics (PGx) may be an ideal starting point for real-world implementation, due to the presence of well-characterized drug-gene interactions. Clinical Decision Support (CDS) is an ideal avenue by which to implement PGx at the bedside. Previous literature has established theoretical models for PGx CDS implementation and discussed a number of anticipated real-world challenges. However, work detailing actual PGx CDS implementation experiences has been limited. Anticipated challenges include data storage and management, system integration, physician acceptance, and more.

WHATIF: An Open-source Desktop Application for Extraction and Management of the Incidental Findings from Next-generation Sequencing Variant Data

Computers in Biology and Medicine. Jan, 2016  |  Pubmed ID: 25890833

Identification and evaluation of incidental findings in patients following whole exome sequencing (WES) or whole genome sequencing (WGS) is challenging for both practicing physicians and researchers. The American College of Medical Genetics and Genomics (ACMG) recently recommended a list of reportable incidental genetic findings. However, no informatics tools are currently available to support evaluation of incidental findings in next-generation sequencing data.

A Practical Guide for Exploring Opportunities of Repurposing Drugs for CNS Diseases in Systems Biology

Methods in Molecular Biology (Clifton, N.J.). 2016  |  Pubmed ID: 26235090

Systems biology has shown its potential in facilitating pathway-focused therapy development for central nervous system (CNS) diseases. An integrated network can be utilized to explore the multiple disease mechanisms and to discover repositioning opportunities. This review covers current therapeutic gaps for CNS diseases and the role of systems biology in the pharmaceutical industry. We conclude with a Multiple Level Network Modeling (MLNM) example to illustrate the great potential of systems biology for CNS diseases. The system focuses on the benefit and practical applications in pathway-centric therapy and drug repositioning.

A Review on Genomics APIs

Computational and Structural Biotechnology Journal. 2016  |  Pubmed ID: 26702340

The constant improvement and falling prices of whole human genome Next Generation Sequencing (NGS) have resulted in rapid adoption of genomic information at both clinics and research institutions. Considered together, the complexity of genomics data, due to its large volume and diversity, along with the need for genomic data sharing, has resulted in the creation of Application Programming Interfaces (APIs) for secure, modular, interoperable access to genomic data from different applications, platforms, and even organizations. The Genomics APIs are a set of special protocols that assist software developers in dealing with multiple genomic data sources for building seamless, interoperable applications leading to the advancement of both genomic and clinical research. These APIs help define a standard for retrieval of genomic data from multiple sources as well as to better package genomic information for integration with Electronic Health Records. This review covers three currently available Genomics APIs: a) Google Genomics, b) SMART Genomics, and c) 23andMe. The functionalities, reference implementations (if available) and authentication protocols of each API are reviewed. A comparative analysis of the different features across the three APIs is provided in the Discussion section. Though Genomics APIs are still under active development and have yet to reach widespread adoption, they hold the promise to make building of complicated genomics applications easier with downstream constructive effects on healthcare.

Computerized "Learn-As-You-Go" Classification of Traumatic Brain Injuries Using NEISS Narrative Data

Accident; Analysis and Prevention. Apr, 2016  |  Pubmed ID: 26851618

One important routine task in injury research is to effectively classify injury circumstances into user-defined categories when using narrative text. However, traditional manual processes can be time consuming, and existing batch learning systems can be difficult for novice users to use. This study evaluates a "Learn-As-You-Go" machine-learning program. When using this program, the user trains classification models and interactively checks on accuracy until a desired threshold is reached. We examined the narrative text of traumatic brain injuries (TBIs) in the National Electronic Injury Surveillance System (NEISS) and classified TBIs into sport and non-sport categories. Our results suggest that the DUALIST "Learn-As-You-Go" program, which features a user-friendly online interface, is effective in injury narrative classification. In our study, the time frame to classify tens of thousands of narratives was reduced from a few days to minutes after approximately sixty minutes of training.

MD-CTS: An Integrated Terminology Reference of Clinical and Translational Medicine

Computational and Structural Biotechnology Journal. 2016  |  Pubmed ID: 27069559

New vocabularies are rapidly evolving in the literature relative to the practice of clinical medicine and translational research. To provide integrated access to new terms, we developed a mobile and desktop online reference, the Marshfield Dictionary of Clinical and Translational Science (MD-CTS). It is the first public resource that comprehensively integrates Wiktionary (word definition), BioPortal (ontology), Wiki (image reference), and Medline abstract (word usage) information. MD-CTS is accessible online. The website provides a broadened capacity for the wider clinical and translational science community to keep pace with newly emerging scientific vocabulary. An initial evaluation using 63 randomly selected biomedical words suggests that online references generally provided better coverage (73%-95%) than paper-based dictionaries (57%-71%).

'RE:fine Drugs': an Interactive Dashboard to Access Drug Repurposing Opportunities

Database : the Journal of Biological Databases and Curation. 2016  |  Pubmed ID: 27189611

The process of discovering new drugs has been extremely costly and slow in the last decades despite enormous investment in pharmaceutical research. Drug repurposing enables researchers to speed up the process of discovering other conditions that existing drugs can effectively treat, with low cost and fast FDA approval. Here, we introduce 'RE:fine Drugs', a freely available interactive website for integrated search and discovery of drug repurposing candidates from GWAS and PheWAS repurposing datasets constructed using previously reported methods in Nature Biotechnology. 'RE:fine Drugs' demonstrates the possibilities to identify and prioritize novelty of candidates for drug repurposing based on the theory of transitive Drug-Gene-Disease triads. This public website provides a starting point for research, industry, clinical and regulatory communities to accelerate the investigation and validation of new therapeutic use of old drugs.

The Health Care and Life Sciences Community Profile for Dataset Descriptions

PeerJ. 2016  |  Pubmed ID: 27602295

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

Associations of Blood Mercury, Inorganic Mercury, Methyl Mercury and Bisphenol A with Dental Surface Restorations in the U.S. Population, NHANES 2003-2004 and 2010-2012

Ecotoxicology and Environmental Safety. Dec, 2016  |  Pubmed ID: 27639196

The potential adverse health effects of mercury from amalgam and bisphenol A (BPA) from composite resin have been significant concerns. It is unclear whether dental restorative materials significantly contribute to mercury or BPA levels. The purpose of this study is to use NHANES data including 14,703 subjects (2003-2004: n=7514; 2011-2012: n=7189) to examine the association between dental surface restorations (DSRs) and blood total mercury (THg), inorganic mercury (IHg), methyl mercury (MeHg) and urinary BPA through the stratification of covariates and multivariate analysis. Subjects were divided into three groups based on the number of DSRs (0, 1-8, >8). Blood THg and IHg in 2003-2004 were significantly higher in the subjects with DSRs (geometric mean of 0.48, 0.69 and 1.17 μg/l for THg; 0.32, 0.33 and 0.39 μg/l for IHg with DSR 0, 1-8 and >8). Similarly, increases of THg, IHg and MeHg were also observed in 2011-2012 (geometric mean of 0.51, 0.69 and 0.99 μg/l for THg; 0.40, 0.49 and 0.66 μg/l for MeHg; 0.20, 0.22 and 0.29 μg/l for IHg with DSR 0, 1-8 and >8). Linear regression model analysis revealed blood THg and IHg in 2003-2004 and THg, IHg and MeHg in 2011-2012 were quantitatively associated with the number of DSRs. A dramatic decrease in urinary BPA from 2003-2004 to 2011-2012 was observed, but no significant increase with DSRs in either period of study. In conclusion, significant increases in blood THg, IHg, and MeHg in the subjects with DSRs are confirmed in a nationally representative population, a critical step in assessing the potential risk of adverse effects from dental restorative materials, but no association between dental fillings and urinary BPA was found.

MYPT1 Isoforms Expressed in HEK293T Cells Are Differentially Phosphorylated After GTPγS Treatment

Journal of Smooth Muscle Research = Nihon Heikatsukin Gakkai Kikanshi. 2016  |  Pubmed ID: 27725371

Agonist stimulation of smooth muscle is known to activate RhoA/Rho kinase signaling, and Rho kinase phosphorylates the myosin targeting subunit (MYPT1) of myosin light chain (MLC) phosphatase at Thr696 and Thr853, which inhibits the activity of MLC phosphatase to produce a Ca(2+) independent increase in MLC phosphorylation and force (Ca(2+) sensitization). Alternative mRNA splicing produces four MYPT1 isoforms, which differ by the presence or absence of a central insert (CI) and leucine zipper (LZ). This study was designed to determine if Rho kinase differentially phosphorylates MYPT1 isoforms. In HEK293T cells expressing each of the four MYPT1 isoforms, we could not detect a change in Thr853 MYPT1 phosphorylation following GTPγS treatment. However, there is differential phosphorylation of MYPT1 isoforms at Thr696; GTPγS treatment increases MYPT1 phosphorylation for the CI+LZ- and CI-LZ- MYPT1 isoforms, but not the CI+LZ+ or CI-LZ+ MYPT1 isoforms. These data could suggest that in smooth muscle Rho kinase differentially phosphorylates MYPT1 isoforms.

Patient-centered Design Criteria for Wearable Seizure Detection Devices

Epilepsy & Behavior : E&B. Nov, 2016  |  Pubmed ID: 27741462

Epilepsy is a common neurological condition. Seizure diary reports and patient- or caregiver-reported seizure counts are often inaccurate and underestimated. Many caregivers express stress and anxiety about the patient with epilepsy having seizures when they are not present. Therefore, a need exists for the ability to recognize and/or detect a seizure in the home setting. However, few studies have inquired about the detection device features that are important to patients and their caregivers.
