This manuscript describes the use of DNA microarrays, which provide an overview of the transcriptomic changes that occur in bacteria under a specific condition. We also highlight the ease with which large amounts of data can be analyzed using convenient in-house developed software packages.
Gene expression and its regulation are central to understanding the behavior of cells under different conditions. Various techniques are currently used to study gene expression, but most are limited in their ability to provide an overall picture of the expression of the whole transcriptome. DNA microarrays offer a fast and economical research technology that gives a full overview of global gene expression. They have a vast number of applications, including the identification of novel genes and transcription factor binding sites and the characterization of the transcriptional activity of cells, and they allow thousands of genes to be analyzed in a single experiment. In the present study, the conditions for bacterial transcriptome analysis, from cell harvest to DNA microarray analysis, have been optimized. Taking into account the time, costs and accuracy of the experiments, this technology platform proves to be very useful and universally applicable for studying bacterial transcriptomes. Here, we perform DNA microarray analysis with Streptococcus pneumoniae as a case study by comparing the transcriptional responses of S. pneumoniae grown in the presence of varying L-serine concentrations in the medium. Total RNA was isolated by a Macaloid method combined with an RNA isolation kit, and the quality of the RNA was checked using an RNA quality check kit. cDNA was prepared using reverse transcriptase and the cDNA samples were labelled using one of two amine-reactive fluorescent dyes. Homemade DNA microarray slides were used for hybridization of the labelled cDNA samples, and microarray data were pre-processed using a cDNA microarray data pre-processing framework (MicroPrep). Finally, Cyber-T was used to analyze the MicroPrep output to identify statistically significant differentially expressed genes. Furthermore, in-house built software packages (PePPER, FIVA, DISCLOSE, PROSECUTOR, Genome2D) were used to analyze the data.
Transcriptomics is the study of the transcriptome, the complete set of mRNAs and their abundances encoded by the genome of a unicellular organism or a eukaryotic cell at a specific time or under a specific condition, including gene overexpression or knock-out. It shows to what extent genes are expressed under a particular condition at a given time point and how strongly they are expressed relative to a reference.
A microarray is a two-dimensional array on a solid substrate (usually a glass slide or silicon thin-film cell) that can be used to assay large quantities of biological material using high-throughput screening and miniaturized, multiplexed and parallel processing and detection methods. Microarrays come in various types, including DNA microarrays, protein microarrays, peptide microarrays, tissue microarrays, antibody microarrays, cellular microarrays and others. A DNA microarray is an assembly of microscopic DNA spots fixed to a solid surface, usually glass. DNA microarrays are used to measure the expression levels of a large number of genes simultaneously or to genotype multiple regions of a genome2,3. Each DNA spot contains picomoles (10⁻¹² moles) of a specific DNA sequence, known as a probe (or reporter). The labelled mRNA molecules from the samples are called ‘targets’. Probe-target hybridization is measured with fluorophores: detection of the fluorophore-labelled targets determines the relative abundance of nucleic acid sequences in the target. Because an array may contain tens of thousands of probes, a microarray experiment can accomplish many genetic tests in parallel. The layout of a simple microarray experiment is shown in Figure 1. Recently, it was established in our and other labs that these arrays are reusable, which makes this technique quite cost-effective.
Different RNA isolation and purification techniques have been developed over the years, including C-TAB, SDS and GT methods 4–8. Several commercial kits are also available. High-quality RNA is essential for gene expression analysis. Therefore, the RNA isolation method was modified to maximize the yield of RNA. Similarly, the number of steps for cDNA preparation and labelling was minimized. Normalization of the data after scanning is also performed efficiently using in-house built software packages and tools9.
Streptococcus pneumoniae is a Gram-positive human pathogen that colonizes the nasopharynx and is the cause of multiple infections such as pneumonia, sepsis, otitis media and meningitis10. The bacterium can utilize a wide variety of the nutrients required for growth and survival 11,12. A number of studies have been carried out on the pneumococcal nitrogen metabolism and regulation emphasizing the importance of amino acids and their role in virulence13,14. In this study, the transcriptomic response of S. pneumoniae to changing concentrations of L-serine, an amino acid abundantly present in the human blood plasma, is reported using DNA microarrays. The transcriptomic response of S. pneumoniae grown in a minimum concentration of L-serine (150 µM) was compared to that grown in a maximum concentration (10 mM) of serine. Chemically defined medium (CDM or minimal medium)15 was used for this study to control the concentration of serine. The focus of this study is to make this technique user-friendly and to provide different tools for data normalization and analysis. Therefore, a number of tools were developed for analysis and data interpretation. FIVA (Functional Information Viewer and Analyzer) provides a platform for processing information contained in clusters of genes having similar gene expression patterns and for constructing functional profiles16. PROSECUTOR is another software package that facilitates the identification of putative functions and annotations of genes 17. By making use of clustering methods, DISCLOSE provides a DNA binding site detection algorithm. Cis-regulatory motifs of genes can be projected by using this algorithm 18. Genome2D offers a Windows-based platform for visualization and analysis of transcriptome data by offering different color ranges to characterize the changes in gene expression levels on a genome map19. 
The PePPER webserver offers, in addition to the all-in-one analysis method, a toolbox for mining for regulons, promoters and transcription factor binding sites 20. Full annotation of intergenic regions in a bacterial genome can be achieved by using this package. Biologists can greatly benefit from PePPER as it offers them a platform for designing experiments so that the hypothesized information can be confirmed in vitro20. These software packages contribute significantly to the microarray analysis as most of them are freely available and make data normalization and analysis very reliable.
1. Preparation of Media, and Cell Culture
2. Isolation of Total RNA
3. RNA Cleanup
4. Analysis of RNA
5. cDNA Preparation and Labelling
NOTE: The following protocol was followed for cDNA preparation and labelling.
6. Degradation of mRNA and Purification of cDNA
7. Measurement of cDNA Concentration
8. Labelling of cDNA with Amine-reactive Dye and Purification
9. Measurement of Labelled cDNA
10. Mixing of Labelled cDNA Samples
11. Hybridization and Washing
12. Microarray Analysis
RNA, cDNA isolations and analysis
L-serine is an important amino acid, and its concentration in human blood plasma varies from 60-150 µM in children and adults. Its role in the biosynthesis of purines and pyrimidines highlights its importance in metabolism, and it is a precursor to several amino acids (glycine, cysteine and tryptophan). To study the impact of L-serine on the whole transcriptome of the S. pneumoniae D39 wild-type strain, microarray analysis was performed on the D39 strain grown in CDM with a minimum concentration (150 µM) of L-serine against the same strain grown in the same medium with a maximum concentration (10 mM). First, total RNA was isolated from cells grown under both concentrations. The concentrations of the RNA samples are given in Table 1. The quality of the total RNA was examined using the quality check assay. Before this assay, the RNA was treated with DNase I to remove any residual genomic DNA. Figure 2 shows the quality of the RNA; lane L represents the ladder, lanes 1 and 2 represent RNA from 150 µM serine and lanes 3 and 4 represent RNA from 10 mM serine. The presence of two clear bands, corresponding to the 23S and 16S rRNA, indicated that the RNA was of good quality and that the next step of the experiment could be performed.
After the quality of the RNA had been checked, cDNA was synthesized using random nonamers and reverse transcriptase. The concentrations of the cDNA samples are given in Table 1. This cDNA was labelled with amine-reactive dyes, and the concentrations of the labelled cDNA and of the dye labels are given in Table 1. After labelling, the samples were mixed according to the hybridization scheme (Table 1) and then hybridized. After washing, the slides were scanned using the scanner, and analysis was performed using GenePix Pro software. Figure 3 shows the scatter plot analysis of the ratio of the two amine-reactive dyes. After this initial analysis, the data were further processed using the MicroPrep software package (PrePrep, Prep, PostPrep)9 to reduce noise, and Cyber-T was used for the final analysis. Table 2 summarizes the results of the microarray studies after applying the criteria of ≥ 2.0-fold difference and p-value < 0.001. A number of genes were differentially expressed in the presence of minimum L-serine as compared to the maximum (Table 2).
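The selection criteria stated above (at least a 2.0-fold difference and a p-value below 0.001) can be sketched in a few lines of Python. This is only an illustration and is not part of Cyber-T or MicroPrep: the record type `GeneResult`, the function names and the example values are hypothetical, and the ratio is assumed to be an unsigned expression ratio, so that a 2-fold decrease corresponds to 0.5.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    """Hypothetical per-gene record; not the output format of any real tool."""
    locus_tag: str
    ratio: float    # unsigned expression ratio between the two conditions (> 0)
    p_value: float


def is_significant(gene: GeneResult, min_fold: float = 2.0, max_p: float = 0.001) -> bool:
    """Return True if the gene changes at least `min_fold` in either direction
    and its p-value is strictly below `max_p`."""
    # Treat up- and downregulation symmetrically: 0.5 counts as a 2-fold change.
    fold = max(gene.ratio, 1.0 / gene.ratio)
    return fold >= min_fold and gene.p_value < max_p


def filter_significant(results: List[GeneResult]) -> List[GeneResult]:
    """Keep only the genes that pass both criteria."""
    return [gene for gene in results if is_significant(gene)]


# Made-up example values, for illustration only.
demo = [
    GeneResult("geneA", 4.0, 0.0001),   # passes: 4-fold up, small p-value
    GeneResult("geneB", 0.4, 0.0005),   # passes: 2.5-fold down, small p-value
    GeneResult("geneC", 1.5, 0.00001),  # fails: fold change too small
    GeneResult("geneD", 3.0, 0.01),     # fails: p-value too large
]
selected = [gene.locus_tag for gene in filter_significant(demo)]
print(selected)
```

Applying both thresholds together means that a gene with a large fold change but a weak p-value (or the reverse) is excluded, which is why the list in Table 2 is much shorter than the number of genes on the array.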
Figure 1. An overview of DNA microarray technology. RNA is isolated from the control and the target samples and labelled cDNA is then hybridized.
Figure 2. Quality check of RNA isolated from S. pneumoniae D39 cells grown in the presence of minimum (150 µM) and maximum (10 mM) serine concentrations. Lane L represents the size ladder, whereas lanes 1 and 2 represent RNA samples isolated from cells grown in the presence of the minimum serine concentration. Similarly, lanes 3 and 4 represent RNA samples isolated from cells grown in the presence of the maximum serine concentration. The bands represent the 23S and 16S rRNA. The presence of only two bands shows that there is no gDNA contamination and that the RNA is of good quality.
Figure 3. Array comparison scatter plot of a sample mixture. Each spot in the plot represents the mean expression value (log2) of a gene in an experiment with dye 1 on the y-axis and dye 2 on the x-axis.
Sample | RNA sample | Description | RNA concentration (ng/µl) | cDNA concentration (ng/µl) | Labelled cDNA (ng/µl) | Dye, DyLight-550 or DyLight-650 (pmol/µl) | Hybridization scheme |
S1 | R1 | D39 wild-type grown in CDM + Minimum L-serine | 2057 | 255 | 225 | 0.8 | R1+R3 |
S1 | R2 | D39 wild-type grown in CDM + Minimum L-serine | 2566 | 201 | 179 | 1.2 | R2+R4 |
S2 | R3 | D39 wild-type grown in CDM + Maximum L-serine | 2831 | 292 | 276 | 2.3 | R1+R3 |
S2 | R4 | D39 wild-type grown in CDM + Maximum L-serine | 1867 | 172 | 150 | 1 | R2+R4 |
Table 1. Hybridization scheme of the samples used in the microarray analysis.
Gene (a) | Function (b) | Ratio (c) |
SPD0600 | Cell division protein DivIB | 4 |
SPD0445 | Phosphoglycerate kinase | 3.5 |
SPD0646 | Hypothetical protein | 3.4 |
SPD0873 | Hypothetical protein | 3.3 |
SPD1223 | Hypothetical protein | 3.3 |
SPD0980 | Ribose-phosphate pyrophosphokinase | 2.8 |
SPD1628 | Xanthine phosphoribosyltransferase | 2.7 |
SPD1011 | Glycerate kinase | 2.4 |
SPD0645 | Hypothetical protein | 2.3 |
SPD0564 | Hypothetical protein | 2.2 |
SPD0641 | Mannose-6-phosphate isomerase, class I, ManA | 2.1 |
SPD1333 | Hypothetical protein | 2.1 |
SPD1384 | Cation efflux family protein | 2.1 |
SPD1432 | UDP-glucose 4-epimerase, GalE-1 | 2.1 |
SPD1866 | N-acetylglucosamine-6-phosphate deacetylase, NagA | 2.1 |
SPD0104 | LysM domain protein | 2 |
SPD0140 | ABC transporter, ATP-binding protein | 2 |
SPD0261 | Aminopeptidase C, PepC | 2 |
SPD1350 | Hypothetical protein | 2 |
SPD1822 | Ribosomal large subunit pseudouridine synthase, RluD subfamily protein | 2 |
SPD0453 | Type I restriction-modification system, S subunit | -2 |
SPD0459 | Heat shock protein GrpE | -2 |
SPD1006 | Glucose-1-phosphate adenylyltransferase | -2 |
SPD1799 | Sensor histidine kinase, putative | -2.1 |
SPD0387 | Beta-hydroxyacyl-(acyl-carrier-protein) dehydratase FabZ | -2.2 |
SPD1494 | Sugar ABC transporter, permease protein | -2.2 |
SPD0974 | Class I glutamine amidotransferase, putative | -2.4 |
SPD1600 | Anthranilate phosphoribosyltransferase | -2.5 |
SPD1472 | Isoleucyl-tRNA synthetase | -2.6 |
SPD0681 | Hypothetical protein | -2.7 |
SPD0501 | Transcription antiterminator, LicT | -3.4 |
Table 2. List of genes regulated in the transcriptome comparison of S. pneumoniae strain D39 wild-type grown in CDM15 with minimum concentration of L-serine and CDM15 with maximum concentration of L-serine. (a) Gene numbers refer to D39 locus tags. (b) D39 annotation/TIGR4 annotation21. (c) Ratio represents the fold increase/decrease in the expression of genes in CDM-maximum as compared to CDM-minimum (minus sign indicates downregulation).
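Table 2 reports signed fold changes, whereas scatter plots such as Figure 3 use log2 values. The conversion between the two can be sketched as follows. This is a minimal sketch under one assumption that the caption does not state explicitly: a value of -2 is taken to mean a 2-fold lower expression (a ratio of 0.5). The function names are hypothetical.

```python
import math


def signed_fold_to_log2(signed_fold: float) -> float:
    """Convert a signed fold change (e.g. 4 or -2) to a log2 ratio.

    Assumption: a negative value -x means x-fold downregulation (ratio 1/x).
    """
    if abs(signed_fold) < 1.0:
        raise ValueError("signed fold changes must have a magnitude of at least 1")
    magnitude = math.log2(abs(signed_fold))
    return magnitude if signed_fold > 0 else -magnitude


def log2_to_signed_fold(log2_ratio: float) -> float:
    """Inverse conversion: a log2 ratio back to a signed fold change."""
    fold = 2.0 ** abs(log2_ratio)
    return fold if log2_ratio >= 0 else -fold


# Two values taken from Table 2: SPD0600 (4) and SPD0453 (-2).
print(signed_fold_to_log2(4))    # log2 ratio for a 4-fold increase
print(signed_fold_to_log2(-2))   # log2 ratio for a 2-fold decrease
```

On the log2 scale, up- and downregulation of the same magnitude are symmetric around zero, which is convenient when the values are plotted or averaged.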
We describe a user-friendly protocol that can be applied to perform whole transcriptome analysis of bacteria. The only part of the procedure that varies between experiments is the condition under which the cells are grown and harvested. After cell harvest and RNA isolation, the technique follows identical steps for all types of bacterial samples and can therefore be applied to any type of bacterial culture. The protocol is simple and convenient and starts with RNA isolation. Our RNA isolation protocol (using Macaloid and an RNA isolation kit) is time-effective compared to conventional phenol-chloroform and TRIzol methods. In the next step, cDNA is prepared with the SuperScript III reverse transcriptase enzyme. The cDNA is then labelled by first mixing it with amine-reactive dyes and then purifying the labelled cDNA. The labelling is also simple and does not take much time, as the samples need to be incubated for about 1 hr in the dark. All these steps can be performed in any standard laboratory, as they do not demand any specialized equipment; only a microarray scanner is needed to scan the slides. For slide scanning and analysis, the GenePix Pro program is used, which is a simple and user-friendly program22.
After the experiments, analysis was performed using MicroPrep and Cyber-T. The MicroPrep package consists of three modules, i.e., PrePreP, PreP and PostPreP9. This data pre-processing framework reduces the time needed for normalization of the data and also reduces the amount of discarded data. The ease with which the software can be used allows the researcher to gain an understanding of the DNA microarray data in minimal time. Once slide image analysis is done, it takes only a couple of minutes to convert the raw signal data into high-quality data for further processing. Further analysis of the pool of genes regulated in the microarray can be done using different in-house software packages, including PePPER 20, FIVA 16, DISCLOSE 18, PROSECUTOR 17 and Genome2D 19. These Windows-based tools and software packages are user-friendly and provide deep insight into the data during further investigation. They make this technology easy for researchers to use, as the data become much more meaningful and relevant.
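To illustrate what normalization of two-color data involves, the following Python sketch centers the log2 ratios of one slide on their median. This is a generic, simplified technique chosen for illustration and is not a description of the algorithms implemented in MicroPrep; the function names and intensity values are hypothetical. It assumes that most genes are not differentially expressed, so that the median log2 ratio should be close to zero once dye bias is removed.

```python
import math
from statistics import median
from typing import List, Sequence


def log2_ratios(channel_1: Sequence[float], channel_2: Sequence[float]) -> List[float]:
    """Per-spot log2 ratio of the two dye channels (intensities must be > 0)."""
    return [math.log2(a / b) for a, b in zip(channel_1, channel_2)]


def median_center(values: Sequence[float]) -> List[float]:
    """Subtract the median so that the typical spot has a log2 ratio of zero."""
    m = median(values)
    return [v - m for v in values]


# Made-up spot intensities in which channel 1 is systematically 2x brighter.
channel_1 = [200.0, 400.0, 800.0, 1600.0, 100.0]
channel_2 = [100.0, 200.0, 400.0, 200.0, 200.0]

raw = log2_ratios(channel_1, channel_2)
centered = median_center(raw)
print(raw)
print(centered)
```

In this made-up example the first three spots differ only by the dye bias and end up at zero after centering, while the last two spots retain a genuine difference in opposite directions.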
The dyes used in microarrays are known to be susceptible to an ozone effect: in the presence of ozone the dyes become unstable, and the signal can become too weak to be detected by the scanner. Therefore, amine-reactive dyes that are less sensitive to the ozone effect are used to label the cDNA. A complete solution to the ozone effect would improve this technology further.
A list was created of genes that were up- or downregulated in the presence of the minimum L-serine concentration as compared to the maximum (Table 2). The upregulated genes can be categorized according to the function of their products. Seven of them encode hypothetical proteins, four are involved in carbohydrate transport and metabolism, one encodes the cell division protein DivIB, and others are involved in amino acid transport and metabolism. Similarly, a number of genes are downregulated under the tested condition. These include the gene encoding the BglG-family transcriptional regulator LicT, some genes involved in carbohydrate metabolism, some amino acid-specific genes and the gene encoding the heat-shock protein GrpE. Thus, this study provides an overview of the genes differentially expressed under the tested conditions. After analysis of the results, verification is sometimes necessary. This can be done either by qPCR or by β-galactosidase assays using lacZ promoter fusions.
The authors have nothing to disclose.
We thank Anne de Jong and Siger Holsappel for help with the DNA microarray slide production. Anne de Jong’s support for bioinformatics analysis is also appreciated. We also thank Jelle Slager for reviewing the paper. Muhammad Afzal and Irfan Manzoor are supported by the GC University, Faisalabad, Pakistan under the faculty development program of HEC Pakistan.
Acid phenol | SigmaAldrich | P4682 | |
Roche RNA isolation kit | Roche Applied Science | 11828665001 | |
Glass beads | 105015 | ||
Chloroform | Boom | 92013505.1000. | |
IAA | 106630 | ||
Nanodrop | Nanodrop | ND-100 | |
Agilent BioAnalyser | Agilent | G2940CA | |
Superscript III | Life technologies Invitrogen | 18080044 | |
AA-dUTP | Life technologies Invitrogen | AM8439 | |
DTT | Life technologies Invitrogen | 18080044 | |
First Strand buffer | Life technologies Invitrogen | 18080044 | |
NaOH | SigmaAldrich | S8045-1KG | |
HEPES | SigmaAldrich | H4034-500G | |
DyLight-550 | Thermo Scientific | 62262 | |
DyLight-650 | Thermo Scientific | 62265 | |
SHY Buffer | SigmaAldrich | H7033-125ML | |
Speedvac cooler | Eppendorf | RUGNE3140 | Speedvac concentrator plus |
Hybridization oven | Grant Boekel | Iso-20 | |
Lifter-slips | Erie Scientific | 25x60I-M-5439 | |
Wipe | KIMTECH | ||
SDS | SigmaAldrich | L3771-100g | |
SSC | SigmaAldrich | W302600-1KG-K | |
GenePix autoloader 4200A1 | MSD analytical technologies | Microarray scanner | |
Sodium bicarbonate | SigmaAldrich | 104766 |