In JoVE (1)

Other Publications (19)

Articles by Nozomu Yachie in JoVE

 JoVE General

The Green Monster Process for the Generation of Yeast Strains Carrying Multiple Gene Deletions

1Department of Synthetic Biology and Bioenergy, J. Craig Venter Institute, 2Department of Microbial and Environmental Genomics, J. Craig Venter Institute, 3Donnelly Centre & Department of Molecular Genetics, University of Toronto, 4Lunenfeld Research Institute, Mt Sinai Hospital

JoVE 4072

The Green Monster method enables the rapid assembly of multiple deletions marked with a reporter gene encoding green fluorescent protein. This method is based on driving yeast strains through repeated cycles of sexual assortment of deletions and fluorescence-based enrichment of cells carrying more deletions.

Other articles by Nozomu Yachie on PubMed

Computational Analysis of MicroRNA Targets in Caenorhabditis elegans

MicroRNAs (miRNAs) are endogenous approximately 22-nucleotide (nt) non-coding RNAs that post-transcriptionally regulate the expression of target genes via hybridization to target mRNA. Using known pairs of miRNA and target mRNA in Caenorhabditis elegans, we first performed computational analysis for specific hybridization patterns between these two RNAs. We counted the numbers of perfectly complementary dinucleotide sequences and calculated the free energy within complementary base pairs of each dinucleotide, observed by sliding a 2-nt window along all nucleotides of the miRNA-mRNA duplex. We confirmed not only strong base pairing within the 5' region of miRNAs (nts 1-8) in C. elegans, but also the required mismatch within the central region (nt 9 or nt 10), and we found weak binding within the 3' region (nts 13-14). We also predicted 687 possible miRNA target transcripts, many of which are thought to be involved in C. elegans development, by combining the above mentioned hybridization tendency with the following analyses: (1) prediction of the miRNA-mRNA duplex with free-energy minimization; (2) identification of the complementary pattern within the miRNA-mRNA duplex; (3) conservation of target sites between C. elegans and C. briggsae, a related soil nematode; and (4) extraction of mRNA candidates with multiple target sites. Rigorous tests using shuffled miRNA controls supported these predictions. Our results suggest that miRNAs recognize their target mRNAs by their hybridization pattern and that many target mRNAs may be regulated through a combination of several specific miRNA target sites in C. elegans.
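
The 2-nt sliding-window count described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dinucleotide_matches` is hypothetical, the duplex is assumed to be already aligned without gaps, and only Watson-Crick pairs are counted as complementary (G:U wobble pairs are excluded by assumption). The free-energy calculation mentioned in the abstract is not reproduced here.

```python
# Hypothetical helper: flags perfectly complementary dinucleotide windows
# along an already-aligned, gap-free miRNA-mRNA duplex.
# Assumption: only Watson-Crick pairs count; G:U wobble is excluded.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def dinucleotide_matches(mirna, target_site):
    """Return one boolean per 2-nt window along the miRNA.

    mirna is written 5'->3'; target_site is the mRNA site written 3'->5',
    so that position i of one strand faces position i of the other.
    """
    if len(mirna) != len(target_site):
        raise ValueError("strands must be aligned to equal length")
    paired = [(a, b) in PAIRS
              for a, b in zip(mirna.upper(), target_site.upper())]
    # A window is a match only if both of its positions are base-paired.
    return [paired[i] and paired[i + 1] for i in range(len(paired) - 1)]
```

Summing the resulting booleans per window position over many known miRNA-target pairs would give a positional profile of the kind the abstract describes (strong 5' pairing, a central mismatch, weaker 3' pairing).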

Prediction of Non-coding and Antisense RNA Genes in Escherichia coli with Gapped Markov Model

A new mathematical index was developed to identify and characterize non-coding RNA (ncRNA) genes encoded within the Escherichia coli (E. coli) genome. It was designated the GMMI (Gapped Markov Model Index) and used to evaluate sequence patterns located at the separate positions of consensus sequences, codon biases and/or possible RNA structures on the basis of the Markov model. The GMMI was able to separate a set of known mRNA sequences from a mixture of ncRNAs including tRNAs and rRNAs. Consequently, the GMMI was employed to predict novel ncRNA candidates. First, possible transcription units were extracted from the E. coli genome using consensus sequences for the sigma70 promoter and the rho-independent terminator. Then, these units were evaluated using the GMMI. This identified 133 candidate ncRNAs, which include 29 previously annotated small RNA genes and 46 possible antisense ncRNAs. Furthermore, 12 transcripts (including five antisense RNAs) were confirmed by expression analysis. These data suggest that the expression of small antisense RNAs might be more common than previously thought in the E. coli genome.

HybGFS: a Hybrid Method for Genome-fingerprint Scanning

Protein identification based on mass spectrometry (MS) has previously been performed using peptide mass fingerprinting (PMF) or tandem MS (MS/MS) database searching. However, these methods cannot identify proteins that are not already listed in existing databases. Moreover, the alternative approach of de novo sequencing requires costly equipment and the interpretation of complex MS/MS spectra. Thus, there is a need for novel high-throughput protein-identification methods that are independent of existing predefined protein databases.

Prediction of Liquid Chromatographic Retention Times of Peptides Generated by Protease Digestion of the Escherichia coli Proteome Using Artificial Neural Networks

We developed a computational method to predict the retention times of peptides in HPLC using artificial neural networks (ANN). We performed stepwise multiple linear regression and selected, as ANN inputs, the amino acids that significantly affected the LC retention time. Unlike conventional linear models, the trained ANN accurately predicted the retention time of peptides containing up to 50 amino acid residues. In 834 peptides, there was a strong correlation (R² = 0.928) between measured and predicted retention times. We demonstrated the utility of our method by predicting the retention times of 121,273 peptides resulting from LysC digestion of the Escherichia coli proteome. Our approach is useful for the proteome-wide characterization of peptides and the identification of unknown peptide peaks obtained in proteome analysis.

On the Interplay of Gene Positioning and the Role of Rho-independent Terminators in Escherichia coli

The majority of intrinsic rho-independent terminator signals, reported to consist of stable hairpin structures followed by T-rich regions, possess the potential to operate bi-directionally and to induce transcription terminations on both strands of the DNA duplex in Escherichia coli. By using RNAMotif software, we investigated the distributions of termination motifs around the 3'-ends of overlapping and non-overlapping genes at the genomic level. We suggest that the positions of compactly encoded E. coli genes and rho-independent terminators are optimized to terminate the adjoining genes on their antisense strands efficiently, and not to mis-terminate overlapping transcripts, due to their bi-directional properties.

SPLITS: a New Program for Predicting Split and Intron-containing tRNA Genes at the Genome Level

In the archaea, some tRNA precursors contain intron(s) not only in the anticodon loop region but also in diverse sites of the gene (intron-containing tRNA or cis-spliced tRNA). The parasite Nanoarchaeum equitans, a member of the Nanoarchaeota kingdom, creates functional tRNA from separate genes, one encoding the 5'-half and the other the 3'-half (split tRNA or trans-spliced tRNA). Although recent genome projects have revealed a huge amount of nucleotide sequence data in the archaea, a comprehensive methodology for intron-containing and split tRNA searching is yet to be established. We therefore developed SPLITS, which is aimed at searching for any type of tRNA gene and is especially focused on intron-containing tRNAs or split tRNAs at the genome level. SPLITS initially predicts the bulge-helix-bulge splicing motif (a well-known, required structure in archaeal pre-tRNA introns) to determine and remove the intronic regions of tRNA genes. The intron-removed DNA sequences are automatically queried to tRNAscan-SE. SPLITS can predict known tRNAs with single introns located at unconventional sites on the genes (100%), tRNAs with double introns (85.7%), and known split tRNAs (100%). Our program will be very useful for identifying novel tRNA genes after completion of genome projects. The SPLITS source code is freely downloadable at

Alignment-based Approach for Durable Data Storage into Living Organisms

The practical realization of DNA data storage is a major scientific goal. Here we introduce a simple, flexible, and robust data storage and retrieval method based on sequence alignment of the genomic DNA of living organisms. Duplicated data encoded by different oligonucleotide sequences was inserted redundantly into multiple loci of the Bacillus subtilis genome. Multiple alignment of the bit data sequences decoded by B. subtilis genome sequences enabled the retrieval of stable and compact data without the need for template DNA, parity checks, or error-correcting algorithms. Combined with the computational simulation of data retrieval from mutated message DNA, a practical use of this alignment-based method is discussed.
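
The idea of recovering data from redundant copies without parity checks can be illustrated with a column-wise majority vote. This is a deliberately simplified sketch, not the published method: it assumes the decoded bit strings have already been aligned to equal length (a real multiple alignment would also handle insertions and deletions), and the function name `consensus_bits` is hypothetical.

```python
from collections import Counter


def consensus_bits(copies):
    """Column-wise majority vote over equal-length, pre-aligned bit strings.

    Assumption: the copies contain substitutions only; indels would first
    need a real multiple sequence alignment, which is not implemented here.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to equal length")
    # For each aligned column, keep the most frequent symbol.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))
```

With an odd number of copies of binary data there are no ties, so each column has a unique majority symbol; a point mutation in one copy is outvoted by the remaining copies.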

In Silico Screening of Archaeal tRNA-encoding Genes Having Multiple Introns with Bulge-helix-bulge Splicing Motifs

In archaeal species, several transfer RNA genes have been reported to contain endogenous introns. Although most of the introns are located at anticodon loop regions between nucleotide positions 37 and 38, a number of introns at noncanonical sites and six cases of tRNA genes containing two introns have also been documented. However, these tRNA genes are often missed by tRNAscan-SE, the software most widely used for the annotation of tRNA genes. We previously developed SPLITS, a computational tool to identify tRNA genes containing one intron at a noncanonical position on the basis of its discriminative splicing motif, but the software was limited in the detection of tRNA genes with multiple introns at noncanonical sites. In this study, we initially updated the system as SPLITSX in order to correctly predict known tRNA genes as well as novel ones with multiple introns. By a comprehensive search for tRNA genes in 29 archaeal genomes using SPLITSX, we listed 43 novel candidates that contain introns at noncanonical sites. As a result, 15 contained two introns and three contained three introns within the respective putative tRNA genes. Moreover, the candidates completely complemented all the codons of two archaeal species of uncultured methanogenic archaeon, RC-I and Thermofilum pendens Hrk 5, with novel candidates that were not detectable by tRNAscan-SE alone.

eXpanda: an Integrated Platform for Network Analysis and Visualization

Analysis and visualization of biological networks, such as protein-protein and protein-DNA interactions, are crucially important for obtaining a thorough understanding of living systems. Here, we present an integrative software platform, eXpanda, which enables analysis of a very broad range of biological networks, with a special focus on the extraction of characteristic topologies that potentially function as units in the networks. eXpanda is provided as a Perl library that gives fully automatic connections to various biological databases via a Perl programmable interface and can perform topological analysis based on graph theory. The results of these analyses can be visualized as vector graphics. eXpanda is released under the GNU General Public License. The software package, detailed documentation, source code, and some sample scripts are downloadable at

Bioinformatic Analysis of Post-transcriptional Regulation by uORF in Human and Mouse

RNA decay is thought to exert an important influence on gene expression by maintaining a steady-state level of transcripts and/or by eliminating aberrant transcripts. However, the sequence elements which control such processes have not been determined. Upstream open reading frames (uORFs) in the transcripts of several genes are reported to control translational initiation by stalling ribosomes and thereby promote RNA decay. We therefore performed bioinformatic analysis of the tissue-wide expression profiles and mRNA half-life of transcripts containing uORFs in humans and mice to assess the relationship between RNA decay and the presence of uORFs in transcripts. The expression levels of transcripts containing uORF were markedly lower than those not containing uORF. Moreover, the half-life of the uORF-containing transcripts was also shorter. These results suggest that uORFs are sequence elements that down-regulate RNA transcripts via RNA decay mechanisms.
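
As a rough illustration of how uORF-containing transcripts might be flagged, the sketch below scans a 5' UTR for an ATG followed by an in-frame stop codon within the UTR. The abstract does not state the exact uORF definition used, so this criterion, the function name `find_uorfs`, and the DNA-alphabet input are all assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_uorfs(five_prime_utr):
    """Return (start, end) index pairs of ATG...stop ORFs lying fully in the UTR.

    Assumed definition: a uORF starts at any ATG in the 5' UTR and ends at
    the first in-frame stop codon that also lies inside the UTR. The end
    index is exclusive and includes the stop codon.
    """
    seq = five_prime_utr.upper()
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon until a stop is found.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                uorfs.append((start, pos + 3))
                break
    return uorfs
```

Transcripts for which this function returns a non-empty list could then be compared with the rest in terms of expression level and mRNA half-life, which is the kind of comparison the abstract reports.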

Computational Analysis of MicroRNA-mediated Antiviral Defense in Humans

Recent studies have proposed the interesting perspective that viral gene expression is downregulated by host microRNAs (miRNAs), small non-coding RNAs well known as post-transcriptional gene regulators. We computationally predicted human miRNA target sites within 228 human-infecting and 348 invertebrate-infecting virus genomes, and we observed that human-infecting viruses were more likely than invertebrate-infecting ones to be targeted by human miRNAs. We listed 62 possible human miRNA-targeted viruses from 6 families, most of which consisted of single-stranded RNA viruses. These results suggest that miRNAs extensively mediate antiviral defenses in humans.

Permuted tRNA Genes Expressed Via a Circular RNA Intermediate in Cyanidioschyzon merolae

A computational analysis of the nuclear genome of a red alga, Cyanidioschyzon merolae, identified 11 transfer RNA (tRNA) genes in which the 3' half of the tRNA lies upstream of the 5' half in the genome. We verified that these genes are expressed and produce mature tRNAs that are aminoacylated. Analysis of tRNA-processing intermediates for these genes indicates an unusual processing pathway in which the termini of the tRNA precursor are ligated, resulting in formation of a characteristic circular RNA intermediate that is then processed at the acceptor stem to generate the correct termini.

Comprehensive Analysis of Archaeal tRNA Genes Reveals Rapid Increase of tRNA Introns in the Order Thermoproteales

The analysis of archaeal tRNA genes is becoming more important to evaluate the origin and evolution of tRNA molecule. Even with the recent accumulation of complete genomes of numerous archaeal species, several tRNA genes are still required for a full complement of the codon table. We conducted comprehensive screening of tRNA genes from 47 archaeal genomes by using a combination of different types of tRNA prediction programs and extracted a total of 2,143 reliable tRNA gene candidates including 437 intron-containing tRNA genes, which covered more than 99.9% of the codon tables in Archaea. Previously, the content of intron-containing tRNA genes in Archaea was estimated to be approximately 15% of the whole tRNA genes, and most of the introns were known to be located at canonical positions (nucleotide position between 37 and 38) of precursor tRNA (pre-tRNA). Surprisingly, we observed marked enrichment of tRNA introns in five species of the archaeal order Thermoproteales; about 70% of tRNA gene candidates were found to be intron-containing tRNA genes, half of which contained multiple introns, and the introns were located at various noncanonical positions. Sequence similarity analysis revealed that approximately half of the tRNA introns found at Thermoproteales-specific intron locations were highly conserved among several tRNA genes. Intriguingly, identical tRNA intron sequences were found within different types of tRNA genes that completely lacked exon sequence similarity, suggesting that the tRNA introns in Thermoproteales could have been gained via intron insertion events at a later stage of tRNA evolution. Moreover, although the CCA sequence at the 3' terminal of pre-tRNA is added by a CCA-adding enzyme after gene transcription in Archaea, most of the tRNA genes containing highly conserved introns already encode the CCA sequence at their 3' terminal. 
Based on these results, we propose possible models explaining the rapid increase of tRNA introns as a result of intron insertion events via retrotransposition of pre-tRNAs. The sequences and secondary structures of the tRNA genes and their bulge-helix-bulge motifs were registered in SPLITSdb, a novel and comprehensive database for archaeal tRNA genes.

Stabilizing Synthetic Data in the DNA of Living Organisms

Data-encoding synthetic DNA, inserted into the genome of a living organism, is thought to be more robust than the current media. Because the living genome is duplicated and copied into new generations, one of the merits of using DNA material is long-term data storage within heritable media. A disadvantage of this approach is that encoded data can be unexpectedly broken by mutation, deletion, and insertion of DNA, which occurs naturally during evolution and prolongation, or laboratory experiments. For this reason, several information theory-based approaches have been developed as an error check of broken DNA data in order to achieve data durability. These approaches cannot efficiently recover badly damaged data-encoding DNA. We recently developed a DNA data-storage approach based on the multiple sequence alignment method to achieve a high level of data durability. In this paper, we overview this technology and discuss strategies for optimal application of this approach.

In Silico Analysis of Phosphoproteome Data Suggests a Rich-get-richer Process of Phosphosite Accumulation over Evolution

Recent phosphoproteome analyses using mass spectrometry-based technologies have provided new insights into the extensive presence of protein phosphorylation in various species and have raised the interesting question of how this protein modification was gained evolutionarily on such a large scale. We investigated this issue by using human and mouse phosphoproteome data. We initially found that phosphoproteins followed a power-law distribution with regard to their number of phosphosites: most of the proteins included only a few phosphosites, but some included dozens of phosphosites. The power-law distribution, unlike more commonly observed distributions such as normal and log-normal distributions, is considered by the field of complex systems science to be produced by a specific rich-get-richer process called preferential attachment growth. Therefore, we explored the factors that may have promoted the rich-get-richer process during phosphosite evolution. We conducted a bioinformatics analysis to evaluate the relationship of amino acid sequences of phosphoproteins with the positions of phosphosites and found an overconcentration of phosphosites in specific regions of protein surfaces and implications that in many phosphoproteins these clusters of phosphosites are activated simultaneously. Multiple phosphosites concentrated in limited spaces on phosphoprotein surfaces may therefore function biologically as cooperative modules that are resistant to selective pressures during phosphoprotein evolution. We therefore proposed a hypothetical model by which the modularization of multiple phosphosites has been resistant to natural selection and has driven the rich-get-richer process of the evolutionary growth of phosphosite numbers.
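
The rich-get-richer mechanism referred to above can be illustrated with a small simulation in the style of Simon's growth model: at each step either a new protein with one phosphosite appears, or an existing protein gains a site with probability proportional to the number it already has. This is a generic sketch of preferential attachment growth, not the authors' analysis; the function name and the parameter `p_new` are hypothetical.

```python
import random


def simulate_rich_get_richer(n_steps, p_new=0.1, seed=0):
    """Simulate phosphosite counts per protein under preferential attachment.

    Assumptions (illustrative only): with probability p_new a new protein
    with a single phosphosite is added; otherwise one existing protein,
    chosen with probability proportional to its current phosphosite count,
    gains one more site.
    """
    rng = random.Random(seed)
    counts = [1]  # start with a single protein carrying one phosphosite
    for _ in range(n_steps):
        if rng.random() < p_new:
            counts.append(1)
        else:
            idx = rng.choices(range(len(counts)), weights=counts, k=1)[0]
            counts[idx] += 1
    return counts
```

Plotting a histogram of the returned counts on log-log axes is one way to inspect the heavy tail: most simulated proteins keep only a few sites while a small number accumulate many.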

Towards the Systematic Discovery of Signal Transduction Networks Using Phosphorylation Dynamics Data

Phosphorylation is a ubiquitous and fundamental regulatory mechanism that controls signal transduction in living cells. The number of identified phosphoproteins and their phosphosites is rapidly increasing as a result of recent mass spectrometry-based approaches.

Integrative Features of the Yeast Phosphoproteome and Protein-protein Interaction Map

Following recent advances in high-throughput mass spectrometry (MS)-based proteomics, the numbers of identified phosphoproteins and their phosphosites have greatly increased in a wide variety of organisms. Although a critical role of phosphorylation is control of protein signaling, our understanding of the phosphoproteome remains limited. Here, we report unexpected, large-scale connections revealed between the phosphoproteome and protein interactome by integrative data-mining of yeast multi-omics data. First, new phosphoproteome data on yeast cells were obtained by MS-based proteomics and unified with publicly available yeast phosphoproteome data. This revealed that nearly 60% of ∼6,000 yeast genes encode phosphoproteins. We mapped these unified phosphoproteome data on a yeast protein-protein interaction (PPI) network with other yeast multi-omics datasets containing information about proteome abundance, proteome disorders, literature-derived signaling reactomes, and in vitro substratomes of kinases. In the phospho-PPI, phosphoproteins had more interacting partners than nonphosphoproteins, implying that a large fraction of intracellular protein interaction patterns (including those of protein complex formation) is affected by reversible and alternative phosphorylation reactions. Although highly abundant or unstructured proteins have a high chance of both interacting with other proteins and being phosphorylated within cells, the difference between the number counts of interacting partners of phosphoproteins and nonphosphoproteins was significant independently of protein abundance and disorder level. Moreover, analysis of the phospho-PPI and yeast signaling reactome data suggested that co-phosphorylation of interacting proteins by single kinases is common within cells. These multi-omics analyses illuminate how wide-ranging intracellular phosphorylation events and the diversity of physical protein interactions are largely affected by each other.

Computational Analysis Suggests a Highly Bendable, Fragile Structure for Nucleosomal DNA

Eukaryotic chromosomal DNA coils around histones to form nucleosomes. Although histone affinity for DNA depends on DNA sequence patterns, how nucleosome positioning is determined by them remains unknown. Here, we show relationships between nucleosome positioning and two structural characteristics of DNA conferred by DNA sequence. Analysis of bendability and hydroxyl radical cleavage intensity of nucleosomal DNA sequences indicated that nucleosomal DNA is bendable and fragile and that nucleosome positional stability was correlated with characteristics of DNA. This result explains how histone positioning is partially determined by nucleosomal DNA structure, illuminating the optimization of chromosomal DNA packaging that controls cellular dynamics.

Tight Associations Between Transcription Promoter Type and Epigenetic Variation in Histone Positioning and Modification

Transcription promoters are fundamental genomic cis-elements controlling gene expression. They can be classified into two types by the degree of imprecision of their transcription start sites: peak promoters, which initiate transcription from a narrow genomic region; and broad promoters, which initiate transcription from a wide-ranging region. Eukaryotic transcription initiation is suggested to be associated with the genomic positions and modifications of nucleosomes. For instance, it has been recently shown that histone with H3K9 acetylation (H3K9ac) is more likely to be distributed around broad promoters rather than peak promoters; it can thus be inferred that there is an association between histone H3K9 and promoter architecture.
