Lysine methylation is an emerging post-translational modification that has been identified on several histone and non-histone proteins, where it plays crucial roles in cell development and many diseases. Approximately 5,000 lysine methylation sites have been identified on different proteins, and these are set by a few dozen protein lysine methyltransferases (PKMTs). This suggests that each PKMT methylates multiple proteins; however, until now only one or two substrates have been identified for several of these enzymes. To approach this problem, we have introduced peptide array based substrate specificity analyses of PKMTs. Peptide arrays are powerful tools to characterize the specificity of PKMTs because methylation of several substrates with different sequences can be tested on one array. We synthesized peptide arrays on cellulose membrane using an Intavis SPOT synthesizer and analyzed the specificity of various PKMTs. Based on the results, novel substrates could be identified for several of these enzymes. For example, by employing peptide arrays we showed that NSD1 methylates K44 of H4 instead of the reported H4K20 and that, in addition, H1.5K168 is highly preferred as a substrate over the previously known H3K36. Hence, peptide arrays are powerful tools to biochemically characterize PKMTs.
24 Related JoVE Articles!
Single Read and Paired End mRNA-Seq Illumina Libraries from 10 Nanograms Total RNA
Institutions: Morgridge Institute for Research, University of Wisconsin, University of California.
Whole transcriptome sequencing by mRNA-Seq is now used extensively to perform global gene expression, mutation, allele-specific expression and other genome-wide analyses. mRNA-Seq even opens the gate for gene expression analysis of non-sequenced genomes. mRNA-Seq offers high sensitivity, a large dynamic range and allows measurement of transcript copy numbers in a sample. Illumina’s genome analyzer performs sequencing of a large number (> 10^7) of relatively short sequence reads (< 150 bp). The "paired end" approach, wherein a single long read is sequenced at both its ends, allows for tracking alternate splice junctions, insertions and deletions, and is useful for de novo
One of the major challenges faced by researchers is a limited amount of starting material. For example, in experiments where cells are harvested by laser micro-dissection, available starting total RNA may measure in nanograms. Preparation of mRNA-Seq libraries from such samples has been described1, 2 but involves significant PCR amplification that may introduce bias. Other RNA-Seq library construction procedures with minimal PCR amplification have been published3, 4 but require microgram amounts of starting total RNA.
Here we describe a library preparation protocol for mRNA-Seq on the Illumina Genome Analyzer II platform that avoids significant PCR amplification and requires only 10 nanograms of total RNA. While this protocol has been described previously and validated for single-end sequencing5, where it was shown to produce directional libraries without introducing significant amplification bias, here we validate it further for use as a paired end protocol. We selectively amplify polyadenylated messenger RNAs from starting total RNA using the T7 based Eberwine linear amplification method, coined "T7LA" (T7 linear amplification). The amplified poly-A mRNAs are fragmented, reverse transcribed and adapter ligated to produce the final sequencing library. For both single read and paired end runs, sequences are mapped to the human transcriptome6 and normalized so that data from multiple runs can be compared. We report the gene expression measurement in units of transcripts per million (TPM), which is a superior measure to RPKM when comparing samples7.
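To make the difference between TPM and RPKM concrete, the following minimal Python sketch computes both from raw read counts and transcript lengths. It uses the standard textbook definitions of the two units; the function names and the example numbers are illustrative assumptions and are not taken from the protocol itself.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


# Hypothetical example: three transcripts with made-up counts and lengths.
example_counts = [100, 200, 300]
example_lengths = [1000, 2000, 3000]
example_tpm = tpm(example_counts, example_lengths)
example_rpkm = rpkm(example_counts, example_lengths)
```

TPM values always sum to one million within a sample, whereas the sum of RPKM values depends on the length distribution of the expressed transcripts. This is one commonly cited reason why TPM is easier to compare between samples.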
Molecular Biology, Issue 56, Genetics, mRNA-Seq, Illumina-Seq, gene expression profiling, high throughput sequencing
Chromatin Isolation by RNA Purification (ChIRP)
Institutions: Stanford University School of Medicine.
Long noncoding RNAs are key regulators of chromatin states for important biological processes such as dosage compensation, imprinting, and developmental gene expression 1,2,3,4,5,6,7. The recent discovery of thousands of lncRNAs in association with specific chromatin modification complexes, such as Polycomb Repressive Complex 2 (PRC2) that mediates histone H3 lysine 27 trimethylation (H3K27me3), suggests broad roles for numerous lncRNAs in managing chromatin states in a gene-specific fashion 8,9. While some lncRNAs are thought to work in cis on neighboring genes, other lncRNAs work in trans to regulate distantly located genes. For instance, Drosophila lncRNAs roX1 and roX2 bind numerous regions on the X chromosome of male cells, and are critical for dosage compensation 10,11. However, the exact locations of their binding sites are not known at high resolution. Similarly, human lncRNA HOTAIR can affect PRC2 occupancy on hundreds of genes genome-wide 3,12,13, but how specificity is achieved is unclear. LncRNAs can also serve as modular scaffolds to recruit the assembly of multiple protein complexes. The classic trans-acting RNA scaffold is the TERC RNA that serves as the template and scaffold for the telomerase complex 14; HOTAIR can also serve as a scaffold for PRC2 and a H3K4 demethylase complex 13.
Prior studies mapping RNA occupancy at chromatin have revealed substantial insights 15,16, but only at a single gene locus at a time. The occupancy sites of most lncRNAs are not known, and the roles of lncRNAs in chromatin regulation have been mostly inferred from the indirect effects of lncRNA perturbation. Just as chromatin immunoprecipitation followed by microarray or deep sequencing (ChIP-chip or ChIP-seq, respectively) has greatly improved our understanding of protein-DNA interactions on a genomic scale, here we illustrate a recently published strategy to map long RNA occupancy genome-wide at high resolution 17. This method, Chromatin Isolation by RNA Purification (ChIRP) (Figure 1), is based on affinity capture of target lncRNA:chromatin complex by tiling antisense-oligos, which then generates a map of genomic binding sites at a resolution of several hundred bases with high sensitivity and low background. ChIRP is applicable to many lncRNAs because the design of affinity-probes is straightforward given the RNA sequence and requires no knowledge of the RNA's structure or functional domains.
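The idea of tiling antisense oligos along a target RNA can be sketched in a few lines of Python. This is only an illustration of the principle: the probe length (20 nt), the spacing (one probe per 100 nt) and the function names are assumptions chosen for the example, not parameters stated in the abstract, and real probe design would also screen for repeats and off-target homology.

```python
# Map each RNA base to the complementary DNA base.
RNA_TO_COMPLEMENT_DNA = str.maketrans("ACGU", "TGCA")


def antisense_dna(rna_segment):
    """Return the antisense DNA oligo (5'->3') for an RNA segment (5'->3')."""
    return rna_segment.upper().translate(RNA_TO_COMPLEMENT_DNA)[::-1]


def tile_probes(rna, probe_len=20, step=100):
    """Place one antisense probe every `step` nt along the RNA.

    Returns (0-based start, probe sequence) pairs. `probe_len` and `step`
    are illustrative defaults, not values taken from the source.
    """
    rna = rna.upper().replace("T", "U")  # accept cDNA-style input as well
    probes = []
    for start in range(0, len(rna) - probe_len + 1, step):
        probes.append((start, antisense_dna(rna[start:start + probe_len])))
    return probes


# Hypothetical 250-nt poly(A) "RNA" used only to show the tiling positions.
example_probes = tile_probes("A" * 250)
```

Because the probes are derived purely from the sequence, no structural or functional annotation of the RNA is required, which is the point made in the paragraph above.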
Genetics, Issue 61, long noncoding RNA (lncRNA), genomics, chromatin binding, high-throughput sequencing, ChIRP
Transmembrane Domain Oligomerization Propensity determined by ToxR Assay
Institutions: University of Colorado at Boulder.
The oversimplified view of protein transmembrane domains as merely anchors in phospholipid bilayers has long since been disproven. In many cases membrane-spanning proteins have evolved highly sophisticated mechanisms of action.1-3 One way in which membrane proteins can modulate their structures and functions is by direct and specific contact of hydrophobic helices, forming structured transmembrane oligomers.4,5 Much recent work has focused on the distribution of amino acids preferentially found in the membrane environment in comparison to aqueous solution and the different intermolecular forces that drive protein association.6,7 Nevertheless, studies of molecular recognition at the transmembrane domain of proteins still lag behind those of water-soluble regions. A major hurdle remains: despite the remarkable specificity and affinity that transmembrane oligomerization can achieve,8 direct measurement of their association is challenging. Traditional methodologies applied to the study of integral membrane protein function can be hampered by the inherent insolubility of the sequences under examination. Biophysical insights gained from studying synthetic peptides representing transmembrane domains can provide useful structural insight. However, the biological relevance of the detergent micellar or liposome systems used in these studies to mimic cellular membranes is often questioned; do peptides adopt a native-like structure under these conditions and does their functional behaviour truly reflect the mode of action within a native membrane? In order to study the interactions of transmembrane sequences in natural phospholipid bilayers, the Langosch lab developed ToxR transcriptional reporter assays.9 The transmembrane domain of interest is expressed as a chimeric protein with maltose binding protein for location to the periplasm and ToxR to provide a report of the level of oligomerization (Figure 1).
In the last decade, several other groups (e.g. Engelman, DeGrado, Shai) further optimized and applied this ToxR reporter assay.10-13 The various ToxR assays have become a gold standard to test protein-protein interactions in cell membranes. We herein demonstrate a typical experimental operation conducted in our laboratory that primarily follows protocols developed by Langosch. This generally applicable method is useful for the analysis of transmembrane domain self-association in E. coli, where β-galactosidase production is used to assess the TMD oligomerization propensity. Upon TMD-induced dimerization, ToxR binds to the ctx promoter causing up-regulation of the LacZ gene for β-galactosidase. A colorimetric readout is obtained by addition of ONPG to lysed cells. Hydrolytic cleavage of ONPG by β-galactosidase results in the production of the light absorbing species o-nitrophenolate (ONP) (Figure 2).
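The abstract does not say how the ONP absorbance is normalized. One widely used convention for β-galactosidase reporter assays is the Miller unit, and the sketch below assumes that convention purely for illustration. The function name and the example readings are hypothetical.

```python
def miller_units(a420, a550, od600, minutes, volume_ml):
    """Classic Miller-unit normalization for an ONPG assay.

    a420      absorbance of o-nitrophenolate plus light scattering
    a550      light scattering from cell debris (scaled by 1.75 below)
    od600     cell density of the culture before lysis
    minutes   reaction time
    volume_ml volume of culture used in the assay
    """
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * od600)


# Hypothetical readings: a strongly dimerizing TMD versus a weak control.
strong = miller_units(a420=0.9, a550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
weak = miller_units(a420=0.1, a550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
```

Under this assumed normalization, a higher value indicates more β-galactosidase per cell and therefore a higher apparent oligomerization propensity of the inserted transmembrane domain.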
Cellular Biology, Issue 51, Transmembrane domain, oligomerization, transcriptional reporter, ToxR, latent membrane protein-1
RNA-Seq Analysis of Differential Gene Expression in Electroporated Chick Embryonic Spinal Cord
Institutions: Universidade de São Paulo.
In ovo electroporation of the chick neural tube is a fast and inexpensive method for identification of gene function during neural development. Genome wide analysis of differentially expressed transcripts after such an experimental manipulation has the potential to uncover an almost complete picture of the downstream effects caused by the transfected construct. This work describes a simple method for comparing transcriptomes from samples of transfected embryonic spinal cords comprising all steps between electroporation and identification of differentially expressed transcripts. The first stage consists of guidelines for electroporation and instructions for dissection of transfected spinal cord halves from HH23 embryos in a ribonuclease-free environment and extraction of high-quality RNA samples suitable for transcriptome sequencing. The next stage is that of bioinformatic analysis with general guidelines for filtering and comparison of RNA-Seq datasets in the Galaxy public server, which eliminates the need for a local computational structure for small to medium scale experiments. The representative results show that the dissection methods generate high quality RNA samples and that the transcriptomes obtained from two control samples are essentially the same, an important requirement for detection of differentially expressed genes in experimental samples. Furthermore, one example is provided where experimental overexpression of a DNA construct can be visually verified after comparison with control samples. The application of this method may be a powerful tool to facilitate new discoveries on the function of neural factors involved in early spinal cord development.
Developmental Biology, Issue 93, chicken embryo, in ovo electroporation, spinal cord, RNA-Seq, transcriptome profiling, Galaxy workflow
Split-and-pool Synthesis and Characterization of Peptide Tertiary Amide Library
Institutions: The Scripps Research Institute.
Peptidomimetics are great sources of protein ligands. The oligomeric nature of these compounds enables us to access large synthetic libraries on solid phase by using combinatorial chemistry. One of the most well studied classes of peptidomimetics is peptoids. Peptoids are easy to synthesize and have been shown to be proteolysis-resistant and cell-permeable. Over the past decade, many useful protein ligands have been identified through screening of peptoid libraries. However, most of the ligands identified from peptoid libraries do not display high affinity, with rare exceptions. This may be due, in part, to the lack of chiral centers and conformational constraints in peptoid molecules. Recently, we described a new synthetic route to access peptide tertiary amides (PTAs). PTAs are a superfamily of peptidomimetics that include but are not limited to peptides, peptoids and N-methylated peptides. With side chains on both α-carbon and main chain nitrogen atoms, the conformation of these molecules is greatly constrained by steric hindrance and allylic 1,3 strain (Figure 1). Our study suggests that these PTA molecules are highly structured in solution and can be used to identify protein ligands. We believe that these molecules can be a future source of high-affinity protein ligands. Here we describe the synthetic method combining the power of both split-and-pool and sub-monomer strategies to synthesize a sample one-bead one-compound (OBOC) library of PTAs.
Chemistry, Issue 88, Split-and-pool synthesis, peptide tertiary amide, PTA, peptoid, high-throughput screening, combinatorial library, solid phase, triphosgene (BTC), one-bead one-compound, OBOC
A High Throughput MHC II Binding Assay for Quantitative Analysis of Peptide Epitopes
Institutions: Dartmouth College, University of Rhode Island, Dartmouth College.
Biochemical assays with recombinant human MHC II molecules can provide rapid, quantitative insights into immunogenic epitope identification, deletion, or design1,2. Here, a peptide-MHC II binding assay is scaled to 384-well format. The scaled down protocol reduces reagent costs by 75% and is higher throughput than previously described 96-well protocols1,3-5. Specifically, the experimental design permits robust and reproducible analysis of up to 15 peptides against one MHC II allele per 384-well ELISA plate. Using a single liquid handling robot, this method allows one researcher to analyze approximately ninety test peptides in triplicate over a range of eight concentrations and four MHC II allele types in less than 48 hr. Others working in the fields of protein deimmunization or vaccine design and development may find the protocol to be useful in facilitating their own work. In particular, the step-by-step instructions and the visual format of JoVE should allow other users to quickly and easily establish this methodology in their own labs.
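The throughput figures quoted above can be checked with a short calculation. The sketch below assumes that every peptide, concentration and replicate combination occupies exactly one well. How the remaining wells on each plate are used is not stated in the abstract, so the sketch leaves them unassigned. Variable names are illustrative.

```python
import math

WELLS_PER_PLATE = 384
PEPTIDES_PER_PLATE = 15   # stated maximum per plate and allele
REPLICATES = 3            # triplicate measurements
CONCENTRATIONS = 8        # eight-point dilution series
TEST_PEPTIDES = 90        # "approximately ninety" in the text
ALLELES = 4

# Wells consumed by test peptides on one fully loaded plate.
test_wells_per_plate = PEPTIDES_PER_PLATE * REPLICATES * CONCENTRATIONS
spare_wells_per_plate = WELLS_PER_PLATE - test_wells_per_plate

# Plates needed for the whole campaign (one allele per plate).
plates_per_allele = math.ceil(TEST_PEPTIDES / PEPTIDES_PER_PLATE)
total_plates = plates_per_allele * ALLELES
total_test_wells = TEST_PEPTIDES * REPLICATES * CONCENTRATIONS * ALLELES

print(test_wells_per_plate, spare_wells_per_plate, total_plates, total_test_wells)
```

Under these assumptions a full plate carries 360 test wells, and the campaign described (ninety peptides against four alleles) corresponds to 24 plates and 8,640 individual binding measurements.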
Biochemistry, Issue 85, Immunoassay, Protein Immunogenicity, MHC II, T cell epitope, High Throughput Screen, Deimmunization, Vaccine Design
Protease- and Acid-catalyzed Labeling Workflows Employing 18O-enriched Water
Institutions: Boston Biomedical Research Institute.
Stable isotopes are essential tools in biological mass spectrometry. Historically, 18O-stable isotopes have been extensively used to study the catalytic mechanisms of proteolytic enzymes1-3. With the advent of mass spectrometry-based proteomics, the enzymatically-catalyzed incorporation of 18O-atoms from stable isotopically enriched water has become a popular method to quantitatively compare protein expression levels (reviewed by Fenselau and Yao4, Miyagi and Rao5 and Ye et al.6). 18O-labeling constitutes a simple and low-cost alternative to chemical (e.g. iTRAQ, ICAT) and metabolic (e.g. SILAC) labeling techniques7. Depending on the protease utilized, 18O-labeling can result in the incorporation of up to two 18O-atoms in the C-terminal carboxyl group of the cleavage product3. The labeling reaction can be subdivided into two independent processes, the peptide bond cleavage and the carboxyl oxygen exchange reaction8. In our PALeO (protease-assisted labeling employing 18O-enriched water) adaptation of enzymatic 18O-labeling, we utilized 50% 18O-enriched water to yield distinctive isotope signatures. In combination with high-resolution matrix-assisted laser desorption ionization time-of-flight tandem mass spectrometry (MALDI-TOF/TOF MS/MS), the characteristic isotope envelopes can be used to identify cleavage products with a high level of specificity. We previously have used the PALeO-methodology to detect and characterize endogenous proteases9 and monitor proteolytic reactions10-11. Since PALeO encodes the very essence of the proteolytic cleavage reaction, the experimental setup is simple and biochemical enrichment steps of cleavage products can be circumvented. The PALeO-method can easily be extended to (i) time course experiments that monitor the dynamics of proteolytic cleavage reactions and (ii) the analysis of proteolysis in complex biological samples that represent physiological conditions. PALeO-TimeCourse experiments help identify rate-limiting processing steps and reaction intermediates in complex proteolytic pathway reactions. Furthermore, the PALeO-reaction allows us to identify proteolytic enzymes such as the serine protease trypsin, which is capable of rebinding its cleavage products and catalyzing the incorporation of a second 18O-atom. Such "double-labeling" enzymes can be used for postdigestion 18O-labeling, in which peptides are exclusively labeled by the carboxyl oxygen exchange reaction. Our third strategy extends labeling employing 18O-enriched water beyond enzymes and uses acidic pH conditions to introduce 18O-stable isotope signatures into peptides.
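The "distinctive isotope signatures" produced by 50% 18O-enriched water follow from simple binomial statistics. The sketch below is a simplified model: it assumes that each exchangeable carboxyl oxygen independently has a 50% chance of being 18O, and it ignores the natural isotope envelope (for example 13C) that is superimposed in a real spectrum. The function names and the example peptide mass are illustrative.

```python
from math import comb

# Mass difference between 18O and 16O in daltons.
DELTA_18O = 17.999160 - 15.994915


def label_distribution(n_sites, enrichment=0.5):
    """Probability of incorporating 0..n_sites 18O atoms (binomial model)."""
    return [comb(n_sites, k) * enrichment ** k * (1 - enrichment) ** (n_sites - k)
            for k in range(n_sites + 1)]


def labeled_masses(monoisotopic_mass, n_sites):
    """Masses of the species carrying 0..n_sites 18O atoms."""
    return [monoisotopic_mass + k * DELTA_18O for k in range(n_sites + 1)]


single = label_distribution(1)   # cleavage only: one 18O at most
double = label_distribution(2)   # cleavage plus carboxyl oxygen exchange
example_masses = labeled_masses(1000.0, 2)  # hypothetical 1000 Da peptide
```

In this model a singly labeling protease gives a 1:1 doublet separated by about 2 Da, while a "double-labeling" enzyme such as trypsin, at complete exchange, gives a 1:2:1 triplet spanning about 4 Da.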
Biochemistry, Issue 72, Molecular Biology, Proteins, Proteomics, Chemistry, Physics, MALDI-TOF mass spectrometry, proteomics, proteolysis, quantification, stable isotope labeling, labeling, catalyst, peptides, 18-O enriched water
Production of Disulfide-stabilized Transmembrane Peptide Complexes for Structural Studies
Institutions: The Walter and Eliza Hall Institute of Medical Research, The University of Melbourne.
Physical interactions among the lipid-embedded alpha-helical domains of membrane proteins play a crucial role in folding and assembly of membrane protein complexes and in dynamic processes such as transmembrane (TM) signaling and regulation of cell-surface protein levels. Understanding the structural features driving the association of particular sequences requires sophisticated biophysical and biochemical analyses of TM peptide complexes. However, the extreme hydrophobicity of TM domains makes them very difficult to manipulate using standard peptide chemistry techniques, and production of suitable study material often proves prohibitively challenging. Identifying conditions under which peptides can adopt stable helical conformations and form complexes spontaneously adds a further level of difficulty. Here we present a procedure for the production of homo- or hetero-dimeric TM peptide complexes from materials that are expressed in E. coli, thus allowing incorporation of stable isotope labels for nuclear magnetic resonance (NMR) or non-natural amino acids for other applications relatively inexpensively. The key innovation in this method is that TM complexes are produced and purified as covalently associated (disulfide-crosslinked) assemblies that can form stable, stoichiometric and homogeneous structures when reconstituted into detergent, lipid or other membrane-mimetic materials. We also present carefully optimized procedures for expression and purification that are equally applicable whether producing single TM domains or crosslinked complexes and provide advice for adapting these methods to new TM sequences.
Biochemistry, Issue 73, Structural Biology, Chemistry, Chemical Engineering, Biophysics, Genetics, Molecular Biology, Membrane Proteins, Proteins, Molecular Structure, transmembrane domain, peptide chemistry, membrane protein structure, immune receptors, reversed-phase HPLC, HPLC, peptides, lipids, protein, cloning, TFA Elution, CNBr Digestion, NMR, expression, cell culture
Polymer Microarrays for High Throughput Discovery of Biomaterials
Institutions: University of Nottingham, University of Nottingham, Massachusetts Institute of Technology.
The discovery of novel biomaterials that are optimized for a specific biological application is readily achieved using polymer microarrays, which allow a combinatorial library of materials to be screened in a parallel, high throughput format1. Herein is described the formation and characterization of a polymer microarray using an on-chip photopolymerization technique 2. This involves mixing monomers at varied ratios to produce a library of monomer solutions, transferring the solution to a glass slide format using a robotic printing device and curing with UV irradiation. This format is readily amenable to many biological assays, including stem cell attachment and proliferation, cell sorting and low bacterial adhesion, allowing the ready identification of 'hit' materials that fulfill a specific biological criterion3-5. Furthermore, the use of high throughput surface characterization (HTSC) allows the biological performance to be correlated with physicochemical properties, hence elucidating the biological-material interaction6. HTSC makes use of water contact angle (WCA) measurements, atomic force microscopy (AFM), X-ray photoelectron spectroscopy (XPS) and time-of-flight secondary ion mass spectrometry (ToF-SIMS). In particular, ToF-SIMS provides a chemically rich analysis of the sample that can be used to correlate the cell response with a molecular moiety. In some cases, the biological performance can be predicted from the ToF-SIMS spectra, demonstrating the chemical dependence of a biological-material interaction, and informing the development of hit materials5,3.
Bioengineering, Issue 59, Materials discovery, Surface characterization, Polymer library, High throughput, Cell attachment
Microwave-assisted Functionalization of Poly(ethylene glycol) and On-resin Peptides for Use in Chain Polymerizations and Hydrogel Formation
Institutions: University of Rochester, University of Rochester, University of Rochester Medical Center.
One of the main benefits of using poly(ethylene glycol) (PEG) macromers in hydrogel formation is synthetic versatility. The ability to draw from a large variety of PEG molecular weights and configurations (arm number, arm length, and branching pattern) affords researchers tight control over resulting hydrogel structures and properties, including Young’s modulus and mesh size. This video will illustrate a rapid, efficient, solvent-free, microwave-assisted method to methacrylate PEG precursors into poly(ethylene glycol) dimethacrylate (PEGDM). This synthetic method provides much-needed starting materials for applications in drug delivery and regenerative medicine. The demonstrated method is superior to traditional methacrylation methods as it is significantly faster and simpler, as well as more economical and environmentally friendly, using smaller amounts of reagents and solvents. We will also demonstrate an adaptation of this technique for on-resin methacrylamide functionalization of peptides. This on-resin method allows the N-terminus of peptides to be functionalized with methacrylamide groups prior to deprotection and cleavage from resin. This allows for selective addition of methacrylamide groups to the N-termini of the peptides while amino acids with reactive side groups (e.g. primary amine of lysine, primary alcohol of serine, secondary alcohols of threonine, and phenol of tyrosine) remain protected, preventing functionalization at multiple sites. This article will detail common analytical methods (proton Nuclear Magnetic Resonance spectroscopy (1H-NMR) and Matrix Assisted Laser Desorption Ionization Time of Flight mass spectrometry (MALDI-ToF)) to assess the efficiency of the functionalizations. Common pitfalls and suggested troubleshooting methods will be addressed, as will modifications of the technique which can be used to further tune macromer functionality and resulting hydrogel physical and chemical properties. Use of synthesized products for the formation of hydrogels for drug delivery and cell-material interaction studies will be demonstrated, with particular attention paid to modifying hydrogel composition to affect mesh size, controlling hydrogel stiffness and drug release.
Chemistry, Issue 80, Poly(ethylene glycol), peptides, polymerization, polymers, methacrylation, peptide functionalization, 1H-NMR, MALDI-ToF, hydrogels, macromer synthesis
Generation of High Quality Chromatin Immunoprecipitation DNA Template for High-throughput Sequencing (ChIP-seq)
Institutions: Children's Hospital of Philadelphia Research Institute, University of Pennsylvania.
ChIP-sequencing (ChIP-seq) methods directly offer whole-genome coverage, where combining chromatin immunoprecipitation (ChIP) and massively parallel sequencing can be utilized to identify the repertoire of mammalian DNA sequences bound by transcription factors in vivo. "Next-generation" genome sequencing technologies provide 1-2 orders of magnitude increase in the amount of sequence that can be cost-effectively generated over older technologies thus allowing for ChIP-seq methods to directly provide whole-genome coverage for effective profiling of mammalian protein-DNA interactions.
For successful ChIP-seq approaches, one must generate high quality ChIP DNA template to obtain the best sequencing outcomes. The description is based around experience with the protein product of the gene most strongly implicated in the pathogenesis of type 2 diabetes, namely the transcription factor transcription factor 7-like 2 (TCF7L2). This factor has also been implicated in various cancers.
Outlined is how to generate high quality ChIP DNA template derived from the colorectal carcinoma cell line, HCT116, in order to build a high-resolution map through sequencing to determine the genes bound by TCF7L2, giving further insight into its key role in the pathogenesis of complex traits.
Molecular Biology, Issue 74, Genetics, Biochemistry, Microbiology, Medicine, Proteins, DNA-Binding Proteins, Transcription Factors, Chromatin Immunoprecipitation, Genes, chromatin, immunoprecipitation, ChIP, DNA, PCR, sequencing, antibody, cross-link, cell culture, assay
Identifying Protein-protein Interaction Sites Using Peptide Arrays
Institutions: The Hebrew University of Jerusalem.
Protein-protein interactions mediate most of the processes in the living cell and control homeostasis of the organism. Impaired protein interactions may result in disease, making protein interactions important drug targets. It is thus highly important to understand these interactions at the molecular level. Protein interactions are studied using a variety of techniques ranging from cellular and biochemical assays to quantitative biophysical assays, and these may be performed either with full-length proteins, with protein domains or with peptides. Peptides serve as excellent tools to study protein interactions since they can be easily synthesized and allow focusing on specific interaction sites. Peptide arrays enable the identification of the interaction sites between two proteins as well as screening for peptides that bind the target protein for therapeutic purposes. They also allow high throughput SAR studies. For identification of binding sites, a typical peptide array usually contains partly overlapping peptides of 10-20 residues derived from the full sequences of one or more partner proteins of the desired target protein. Screening the array for binding to the target protein reveals the binding peptides, corresponding to the binding sites in the partner proteins, in an easy and fast method that uses only a small amount of protein.
In this article we describe a protocol for screening peptide arrays for mapping the interaction sites between a target protein and its partners. The peptide array is designed based on the sequences of the partner proteins taking into account their secondary structures. The arrays used in this protocol were Celluspots arrays prepared by INTAVIS Bioanalytical Instruments. The array is blocked to prevent nonspecific binding and then incubated with the studied protein. Detection using an antibody reveals the binding peptides corresponding to the specific interaction sites between the proteins.
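The partly overlapping peptides that make up such an array can be generated with a simple sliding window. The Python sketch below is only an illustration: the peptide length and offset are free parameters chosen by the user, the function name is hypothetical, and the sketch does not take secondary structure into account as the protocol does.

```python
def overlapping_peptides(sequence, length=15, offset=5):
    """Cut a protein sequence into partly overlapping peptides.

    Returns (1-based start position, peptide) pairs. A final peptide is
    added when needed so that the C-terminus is always covered.
    """
    last_start = max(len(sequence) - length, 0)
    peptides = []
    for start in range(0, last_start + 1, offset):
        peptides.append((start + 1, sequence[start:start + length]))
    # Add a C-terminal peptide if the window did not land on the end.
    if peptides[-1][0] - 1 != last_start:
        peptides.append((last_start + 1, sequence[last_start:last_start + length]))
    return peptides


# Hypothetical 20-residue partner protein, 10-mers shifted by 4 residues.
example_array = overlapping_peptides("ACDEFGHIKLMNPQRSTVWY", length=10, offset=4)
```

A smaller offset gives a larger overlap and finer mapping of the binding site at the cost of more spots on the array.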
Molecular Biology, Issue 93, peptides, peptide arrays, protein-protein interactions, binding sites, peptide synthesis, micro-arrays
A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types
Institutions: Stony Brook University, Cold Spring Harbor Laboratory, University of Texas at Dallas.
ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by using next-generation sequencing of protein-bound DNA and aligning the short reads to a reference genome. Enriched regions are revealed as peaks, which often differ dramatically in shape, depending on the target protein1. For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment2. Reliably identifying these regions was the focus of our work.
Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics3-5 to more rigorous statistical models, e.g. Hidden Markov Models (HMMs)6-8. We sought a solution that minimized the necessity for difficult-to-define, ad hoc parameters that often compromise resolution and lessen the intuitive usability of the tool. With respect to HMM-based methods, we aimed to curtail parameter estimation procedures and simple, finite state classifications that are often utilized.
Additionally, conventional ChIPseq data analysis involves categorization of the expected read density profiles as either punctate or diffuse followed by subsequent application of the appropriate tool. We further aimed to replace the need for these two distinct models with a single, more versatile model, which can capably address the entire spectrum of data types.
To meet these objectives, we first constructed a statistical framework that naturally modeled ChIPseq data structures using a cutting edge advance in HMMs9, which utilizes only explicit formulas, an innovation crucial to its performance advantages. More sophisticated than heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identifying reasonable change points in read density, which further define segments of enrichment. Our analysis revealed how our Bayesian Change Point (BCP) algorithm had a reduced computational complexity, evidenced by an abridged run time and memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and limited user-defined parameters. This illustrated both its versatility and ease of use. Consequently, we believe it can be implemented readily across broad ranges of data types and end users in a manner that is easily compared and contrasted, making it a great tool for ChIPseq data analysis that can aid in collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor10,11 and epigenetic data12 to illustrate its usefulness.
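To illustrate what a Bayesian change point in read density means, the following Python sketch computes the posterior over the position of a single change point in a short series of read counts. It is a deliberately simplified toy and is not the BCP algorithm described above, which uses an infinite-state HMM. Here the counts in each segment are assumed to be Poisson with a Gamma(a, b) prior on the rate, and the prior over the change position is assumed to be uniform. All names and numbers are illustrative.

```python
from math import exp, lgamma, log


def log_marginal(counts, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts under a Gamma(a, b) rate prior."""
    n = len(counts)
    s = sum(counts)
    return (a * log(b) - lgamma(a)
            + lgamma(a + s) - (a + s) * log(b + n)
            - sum(lgamma(x + 1) for x in counts))


def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior probability that the rate changes just before index k."""
    scores = {k: log_marginal(counts[:k], a, b) + log_marginal(counts[k:], a, b)
              for k in range(1, len(counts))}
    top = max(scores.values())  # subtract the maximum for numerical stability
    norm = sum(exp(v - top) for v in scores.values())
    return {k: exp(v - top) / norm for k, v in scores.items()}


# Hypothetical binned read counts: background followed by an enriched segment.
example_counts = [1, 1, 1, 1, 10, 10, 10, 10]
posterior = change_point_posterior(example_counts)
best_split = max(posterior, key=posterior.get)
```

Because the Gamma prior is conjugate to the Poisson likelihood, every quantity has a closed form, which loosely echoes the reliance on explicit formulas mentioned in the abstract. Handling many change points and both punctate and diffuse profiles requires the full model.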
Genetics, Issue 70, Bioinformatics, Genomics, Molecular Biology, Cellular Biology, Immunology, Chromatin immunoprecipitation, ChIP-Seq, histone modifications, segmentation, Bayesian, Hidden Markov Models, epigenetics
A Protocol for Computer-Based Protein Structure and Function Prediction
Institutions: University of Michigan, University of Kansas.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score which estimates how accurate the predictions are in the absence of experimental data. To accommodate special requests of end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any proteins as template, or to exclude any template proteins during the structure assembly simulations. The structural information could be collected by the users based on experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was evaluated as the best program for protein structure and function predictions in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.
Biochemistry, Issue 57, On-line server, I-TASSER, protein structure prediction, function prediction
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Institutions: Princeton University.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (https://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization-based sequence selection stage that aims to improve stability by minimizing potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
The ChroP Approach Combines ChIP and Mass Spectrometry to Dissect Locus-specific Proteomic Landscapes of Chromatin
Institutions: European Institute of Oncology.
Chromatin is a highly dynamic nucleoprotein complex made of DNA and proteins that controls various DNA-dependent processes. Chromatin structure and function at specific regions are regulated by the local enrichment of histone post-translational modifications (hPTMs) and variants, chromatin-binding proteins, including transcription factors, and DNA methylation. The proteomic characterization of chromatin composition at distinct functional regions has so far been hampered by the lack of efficient protocols to enrich such domains at the purity and amount required for subsequent in-depth analysis by Mass Spectrometry (MS). We describe here a newly designed chromatin proteomics strategy, named ChroP (Chromatin Proteomics), whereby a preparative chromatin immunoprecipitation is used to isolate distinct chromatin regions whose features, in terms of hPTMs, variants and co-associated non-histone proteins, are analyzed by MS. We illustrate here the setting up of ChroP for the enrichment and analysis of transcriptionally silent heterochromatic regions, marked by the presence of tri-methylation of lysine 9 on histone H3. The results achieved demonstrate the potential of ChroP in thoroughly characterizing the heterochromatin proteome and establish it as a powerful analytical strategy for understanding how the distinct protein determinants of chromatin interact and synergize to establish locus-specific structural and functional configurations.
Biochemistry, Issue 86, chromatin, histone post-translational modifications (hPTMs), epigenetics, mass spectrometry, proteomics, SILAC, chromatin immunoprecipitation, histone variants, chromatome, hPTMs cross-talks
DNA-affinity-purified Chip (DAP-chip) Method to Determine Gene Targets for Bacterial Two component Regulatory Systems
Institutions: Lawrence Berkeley National Laboratory.
In vivo methods such as ChIP-chip are well-established techniques used to determine global gene targets for transcription factors. However, they are of limited use in exploring bacterial two component regulatory systems with uncharacterized activation conditions. Such systems regulate transcription only when activated in the presence of unique signals. Since these signals are often unknown, the in vitro microarray based method described in this video article can be used to determine gene targets and binding sites for response regulators. This DNA-affinity-purified-chip method may be used for any purified regulator in any organism with a sequenced genome. The protocol involves allowing the purified tagged protein to bind to sheared genomic DNA and then affinity purifying the protein-bound DNA, followed by fluorescent labeling of the DNA and hybridization to a custom tiling array. Preceding steps that may be used to optimize the assay for specific regulators are also described. The peaks generated by the array data analysis are used to predict binding site motifs, which are then experimentally validated. The motif predictions can be further used to determine gene targets of orthologous response regulators in closely related species. We demonstrate the applicability of this method by determining the gene targets and binding site motifs, and thus predicting the function, for a sigma54-dependent response regulator DVU3023 in the environmental bacterium Desulfovibrio vulgaris.
Genetics, Issue 89, DNA-Affinity-Purified-chip, response regulator, transcription factor binding site, two component system, signal transduction, Desulfovibrio, lactate utilization regulator, ChIP-chip
Analyzing Protein Dynamics Using Hydrogen Exchange Mass Spectrometry
Institutions: University of Heidelberg.
All cellular processes depend on the functionality of proteins. Although the functionality of a given protein is the direct consequence of its unique amino acid sequence, it is only realized by the folding of the polypeptide chain into a single defined three-dimensional arrangement or more commonly into an ensemble of interconverting conformations. Investigating the connection between protein conformation and its function is therefore essential for a complete understanding of how proteins are able to fulfill their great variety of tasks. One possibility to study conformational changes a protein undergoes while progressing through its functional cycle is hydrogen-¹H/²H-exchange in combination with high-resolution mass spectrometry (HX-MS). HX-MS is a versatile and robust method that adds a new dimension to structural information obtained by e.g. crystallography. It is used to study protein folding and unfolding, binding of small molecule ligands, protein-protein interactions, conformational changes linked to enzyme catalysis, and allostery. In addition, HX-MS is often used when the amount of protein is very limited or crystallization of the protein is not feasible. Here we provide a general protocol for studying protein dynamics with HX-MS and describe as an example how to reveal the interaction interface of two proteins in a complex.
Chemistry, Issue 81, Molecular Chaperones, mass spectrometers, Amino Acids, Peptides, Proteins, Enzymes, Coenzymes, Protein dynamics, conformational changes, allostery, protein folding, secondary structure, mass spectrometry
RNA-seq Analysis of Transcriptomes in Thrombin-treated and Control Human Pulmonary Microvascular Endothelial Cells
Institutions: Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri-Kansas City.
The characterization of gene expression in cells via measurement of mRNA levels is a useful tool in determining how the transcriptional machinery of the cell is affected by external signals (e.g. drug treatment), or how cells differ between a healthy state and a diseased state. With the advent and continuous refinement of next-generation DNA sequencing technology, RNA-sequencing (RNA-seq) has become an increasingly popular method of transcriptome analysis to catalog all species of transcripts, to determine the transcriptional structure of all expressed genes and to quantify the changing expression levels of the total set of transcripts in a given cell, tissue or organism [1,2]. RNA-seq is gradually replacing DNA microarrays as a preferred method for transcriptome analysis because it has the advantages of profiling a complete transcriptome, providing digital data (the copy number of any transcript) and not relying on any known genomic sequence [3].
Here, we present a complete and detailed protocol to apply RNA-seq to profile transcriptomes in human pulmonary microvascular endothelial cells with or without thrombin treatment. This protocol is based on our recently published study entitled "RNA-seq Reveals Novel Transcriptome of Genes and Their Isoforms in Human Pulmonary Microvascular Endothelial Cells Treated with Thrombin" [4], in which we successfully performed the first complete transcriptome analysis of human pulmonary microvascular endothelial cells treated with thrombin using RNA-seq. It yielded unprecedented resources for further experimentation to gain insights into molecular mechanisms underlying thrombin-mediated endothelial dysfunction in the pathogenesis of inflammatory conditions, cancer, diabetes, and coronary heart disease, and provides potential new leads for therapeutic targets in those diseases.
The descriptive text of this protocol is divided into four parts. The first part describes the treatment of human pulmonary microvascular endothelial cells with thrombin and RNA isolation, quality analysis and quantification. The second part describes library construction and sequencing. The third part describes the data analysis. The fourth part describes an RT-PCR validation assay. Representative results of several key steps are displayed. Useful tips or precautions to boost success in key steps are provided in the Discussion section. Although this protocol uses human pulmonary microvascular endothelial cells treated with thrombin, it can be generalized to profile transcriptomes in both mammalian and non-mammalian cells and in tissues treated with different stimuli or inhibitors, or to compare transcriptomes in cells or tissues between a healthy state and a disease state.
Genetics, Issue 72, Molecular Biology, Immunology, Medicine, Genomics, Proteins, RNA-seq, Next Generation DNA Sequencing, Transcriptome, Transcription, Thrombin, Endothelial cells, high-throughput, DNA, genomic DNA, RT-PCR, PCR
Identification of Key Factors Regulating Self-renewal and Differentiation in EML Hematopoietic Precursor Cells by RNA-sequencing Analysis
Institutions: The University of Texas Graduate School of Biomedical Sciences at Houston.
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSC self-renewal and differentiation is important for the application of HSCs in research and clinical uses. However, it is not possible to obtain large quantities of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study.
RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment.
In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.
Genetics, Issue 93, EML Cells, Self-renewal, Differentiation, Hematopoietic precursor cell, RNA-Sequencing, Data analysis
Peptide-based Identification of Functional Motifs and their Binding Partners
Institutions: Morehouse School of Medicine, Institute for Systems Biology, Universiti Sains Malaysia.
Specific short peptides derived from motifs found in full-length proteins, in our case HIV-1 Nef, not only retain their biological function, but can also competitively inhibit the function of the full-length protein. A set of 20 Nef scanning peptides, each 20 amino acids in length and overlapping its neighbor by 10 amino acids, was used to identify motifs in Nef responsible for its induction of apoptosis. Peptides containing these apoptotic motifs induced apoptosis at levels comparable to the full-length Nef protein. A second peptide, derived from the Secretion Modification Region (SMR) of Nef, retained the ability to interact with cellular proteins involved in Nef's secretion in exosomes (exNef). This SMRwt peptide was used as the "bait" protein in co-immunoprecipitation experiments to isolate cellular proteins that bind specifically to Nef's SMR motif. Protein transfection and antibody inhibition were used to physically disrupt the interaction between Nef and mortalin, one of the isolated SMR-binding proteins, and the effect was measured with a fluorescence-based exNef secretion assay. The SMRwt peptide's ability to outcompete full-length Nef for cellular proteins that bind the SMR motif makes it the first inhibitor of exNef secretion. Thus, by employing the techniques described here, which utilize the unique properties of specific short peptides derived from motifs found in full-length proteins, one may accelerate the identification of functional motifs in proteins and the development of peptide-based inhibitors of pathogenic functions.
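The scanning-peptide layout described in this abstract (20-residue peptides, each sharing 10 residues with its neighbor) can be sketched as a sliding window. This is only an illustration: the function name `scanning_peptides` and the placeholder sequence are hypothetical, and the sequence below is not the real Nef sequence.

```python
def scanning_peptides(sequence, length=20, overlap=10):
    """Split a protein sequence into overlapping scanning peptides.

    Hypothetical helper for illustration; not taken from the protocol.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap  # each window starts `step` residues after the last
    peptides = []
    for start in range(0, len(sequence), step):
        peptides.append(sequence[start:start + length])
        # Stop once a window has reached the end of the sequence.
        if start + length >= len(sequence):
            break
    return peptides


# Placeholder 210-residue sequence (NOT the real Nef sequence).
demo_sequence = ("ACDEFGHIKLMNPQRSTVWY" * 11)[:210]
peptides = scanning_peptides(demo_sequence)
```

With a window of 20 and an overlap of 10, a 210-residue placeholder yields exactly 20 peptides. For a sequence whose length is not a multiple of the step, the final peptide in this sketch is simply shorter than the others.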
Virology, Issue 76, Biochemistry, Immunology, Infection, Infectious Diseases, Molecular Biology, Medicine, Genetics, Microbiology, Genomics, Proteins, Exosomes, HIV, Peptides, Exocytosis, protein trafficking, secretion, HIV-1, Nef, Secretion Modification Region, SMR, peptide, AIDS, assay
Analysis of Nephron Composition and Function in the Adult Zebrafish Kidney
Institutions: University of Notre Dame.
The zebrafish model has emerged as a relevant system to study kidney development, regeneration and disease. Both the embryonic and adult zebrafish kidneys are composed of functional units known as nephrons, which are highly conserved with other vertebrates, including mammals. Research in zebrafish has recently demonstrated that two distinctive phenomena transpire after adult nephrons incur damage: first, there is robust regeneration within existing nephrons that replaces the destroyed tubule epithelial cells; second, entirely new nephrons are produced from renal progenitors in a process known as neonephrogenesis. In contrast, humans and other mammals seem to have only a limited ability for nephron epithelial regeneration. To date, the mechanisms responsible for these kidney regeneration phenomena remain poorly understood. Since adult zebrafish kidneys undergo both nephron epithelial regeneration and neonephrogenesis, they provide an outstanding experimental paradigm to study these events. Further, there is a wide range of genetic and pharmacological tools available in the zebrafish model that can be used to delineate the cellular and molecular mechanisms that regulate renal regeneration. One essential aspect of such research is the evaluation of nephron structure and function. This protocol describes a set of labeling techniques that can be used to gauge renal composition and test nephron functionality in the adult zebrafish kidney. Thus, these methods are widely applicable to the future phenotypic characterization of adult zebrafish kidney injury paradigms, which include but are not limited to, nephrotoxicant exposure regimes or genetic methods of targeted cell death such as the nitroreductase mediated cell ablation technique. Further, these methods could be used to study genetic perturbations in adult kidney formation and could also be applied to assess renal status during chronic disease modeling.
Cellular Biology, Issue 90, zebrafish, kidney, nephron, nephrology, renal, regeneration, proximal tubule, distal tubule, segment, mesonephros, physiology, acute kidney injury (AKI)
Genome-wide Analysis using ChIP to Identify Isoform-specific Gene Targets
Institutions: University of Illinois Chicago - UIC, Universitat Pompeu Fabra, Whitehead Institute for Biomedical Research.
Recruitment of transcriptional and epigenetic factors to their targets is a key step in their regulation. Prominently featured in recruitment are the protein domains that bind to specific histone modifications. One such domain is the plant homeodomain (PHD), found in several chromatin-binding proteins. The epigenetic factor RBP2 has multiple PHD domains; however, they have different functions (Figure 4). In particular, the C-terminal PHD domain, found in an RBP2 oncogenic fusion in human leukemia, binds to trimethylated lysine 4 in histone H3 (H3K4me3) [1]. The transcript corresponding to the RBP2 isoform containing the C-terminal PHD accumulates during differentiation of promonocytic, lymphoma-derived, U937 cells into monocytes [2]. Consistent with both sets of data, genome-wide analysis showed that in differentiated U937 cells, the RBP2 protein is localized to genomic regions highly enriched for H3K4me3 [3]. Localization of RBP2 to its targets correlates with a decrease in H3K4me3 due to RBP2 histone demethylase activity and a decrease in transcriptional activity. In contrast, two other PHDs of RBP2 are unable to bind H3K4me3. Notably, the C-terminal PHD domain of RBP2 is absent in the smaller RBP2 isoform [4]. It is conceivable that the small isoform of RBP2, which lacks interaction with H3K4me3, differs from the larger isoform in genomic location. The difference in genomic location of RBP2 isoforms may account for the observed diversity in RBP2 function. Specifically, RBP2 is a critical player in cellular differentiation mediated by the retinoblastoma protein (pRB). Consistent with these data, previous genome-wide analysis, without distinction between isoforms, identified two distinct groups of RBP2 target genes: 1) genes bound by RBP2 in a manner that is independent of differentiation; 2) genes bound by RBP2 in a differentiation-dependent manner.
To identify differences in localization between the isoforms, we performed genome-wide location analysis by ChIP-Seq. Using antibodies that detect both RBP2 isoforms, we located all RBP2 targets. Additionally, we have antibodies that bind only the large RBP2 isoform and not the small one (Figure 4). After identifying the large isoform targets, one can then subtract them from all RBP2 targets to reveal the targets of the small isoform. These data show the contribution of a chromatin-interacting domain to protein recruitment to its binding sites in the genome.
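The subtraction step described above amounts to a set difference at the level of target genes. The sketch below assumes targets have already been reduced to gene identifiers; the function name `small_isoform_targets` and the gene labels are hypothetical placeholders, not data from the study.

```python
def small_isoform_targets(all_targets, large_isoform_targets):
    """Return targets seen with the pan-isoform antibody but not with the
    large-isoform-specific antibody (hypothetical helper for illustration)."""
    return set(all_targets) - set(large_isoform_targets)


# Placeholder gene identifiers, invented for this example.
all_rbp2_targets = {"geneA", "geneB", "geneC", "geneD"}
large_only_chip = {"geneA", "geneC"}

small_targets = small_isoform_targets(all_rbp2_targets, large_only_chip)
```

Real ChIP-Seq output consists of genomic intervals rather than gene names, so in practice the comparison would be done on overlapping peak coordinates. A gene left over after subtraction is a candidate small-isoform target, not a confirmed one, because the pan-isoform signal cannot exclude large-isoform binding that fell below the detection threshold.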
Biochemistry, Issue 41, chromatin immunoprecipitation, ChIP-Seq, RBP2, JARID1A, KDM5A, isoform-specific recruitment
Profiling of Methyltransferases and Other S-adenosyl-L-homocysteine-binding Proteins by Capture Compound Mass Spectrometry (CCMS)
Institutions: caprotec bioanalytics GmbH, RWTH Aachen University.
There is a variety of approaches to reduce the complexity of the proteome on the basis of functional small molecule-protein interactions, such as affinity chromatography [1] or Activity Based Protein Profiling [2]. Trifunctional Capture Compounds (CCs, Figure 1A) [3] are the basis for a generic approach, in which the initial equilibrium-driven interaction between a small molecule probe (the selectivity function, here S-adenosyl-L-homocysteine, SAH, Figure 1A) and target proteins is irreversibly fixed upon photo-crosslinking between an independent photo-activatable reactivity function (here a phenylazide) of the CC and the surface of the target proteins. The sorting function (here biotin) serves to isolate the CC-protein conjugates from complex biological mixtures with the help of a solid phase (here streptavidin magnetic beads). Two configurations of the experiments are possible: "off-bead" [4] or the presently described "on-bead" configuration (Figure 1B). The selectivity function may be virtually any small molecule of interest (substrates, inhibitors, drug molecules).
S-Adenosyl-L-methionine (SAM, Figure 1A) is probably, second to ATP, the most widely used cofactor in nature [5, 6]. It is used as the major methyl group donor in all living organisms with the chemical reaction being catalyzed by SAM-dependent methyltransferases (MTases), which methylate DNA [7], RNA [8], proteins [9], or small molecules [10]. Given the crucial role of methylation reactions in diverse physiological scenarios (gene regulation, epigenetics, metabolism), the profiling of MTases can be expected to become of similar importance in functional proteomics as the profiling of kinases. Analytical tools for their profiling, however, have not been available. We recently introduced a CC with SAH as selectivity group to fill this technological gap (Figure 1A).
SAH, the product of SAM after methyl transfer, is a known general MTase product inhibitor [11]. For this reason, and because the natural cofactor SAM is chemically unstable [12] and is also used by other enzymes that transfer other parts of the cofactor or initiate radical reactions, SAH is an ideal selectivity function for a CC to target MTases. Here, we report the utility of the SAH-CC and CCMS by profiling MTases and other SAH-binding proteins from the strain DH5α of Escherichia coli (E. coli), one of the best-characterized prokaryotes, which has served as the preferred model organism in countless biochemical, biological, and biotechnological studies. Photo-activated crosslinking enhances yield and sensitivity of the experiment, and the specificity can be readily tested in competition experiments using an excess of free SAH.
Biochemistry, Issue 46, Capture Compound, photo-crosslink, small molecule-protein interaction, methyltransferase, S-adenosyl-l-homocysteine, SAH, S-adenosyl-l-methionine, SAM, functional proteomics, LC-MS/MS