Immunostaining for DNA Modifications: Computational Analysis of Confocal Images

For several decades, 5-methylcytosine (5mC) was thought to be the only DNA modification with functional significance in metazoans. The discovery of enzymatic oxidation of 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), as well as the detection of N6-methyladenine (6mA) in the DNA of multicellular organisms, added further degrees of complexity to epigenetic research. According to a growing body of experimental evidence, these novel DNA modifications may play specific roles in different cellular and developmental processes. Importantly, as some of these marks (e.g., 5hmC, 5fC and 5caC) exhibit tissue- and developmental stage-specific occurrence in vertebrates, immunochemistry represents an important tool for assessing the spatial distribution of DNA modifications in different biological contexts. Here the methods for computational analysis of DNA modifications visualized by immunostaining followed by confocal microscopy are described. Specifically, the generation of 2.5-dimensional (2.5D) signal intensity plots, signal intensity profiles, quantification of staining intensity in multiple cells and determination of signal colocalization coefficients are shown. Collectively, these techniques may be useful for evaluating the levels and localization of these DNA modifications in the nucleus, contributing to elucidating their biological roles in metazoans.


Introduction
DNA methylation, a well-documented mechanism associated with transcriptional regulation, entails modification of cytosine residues in a 5'-cytosine-phosphate-guanine-3' (CpG) dinucleotide context via the addition of a methyl group (-CH3) to the 5-carbon atom of the cytosine pyrimidine ring to form 5-methylcytosine (5mC) 1 . In mammals, approximately 70% of CpGs are methylated, which constitutes only 1% of their genomes, as mammalian genomes are depleted of this palindromic sequence owing to the mutagenic propensity of 5mC to spontaneously deaminate to thymine 2 .
Presence of methyl groups on gene promoter sequences shows a strong correlation with transcriptional repression in vertebrates 3,4,5 . Addition of these methyl groups is catalyzed by the highly conserved DNA methyltransferase (DNMT) enzymes DNMT3a, 3b and 3L, and DNMT1, which modify CpG cytosines in de novo and maintenance methylation contexts, respectively 6 . DNMT3a/b expression is elevated during development in embryonic stem cells and the epiblast; however, expression diminishes following the commitment of pluripotent cell lineages to somatic fates during differentiation 8 . Whilst sharing functional redundancy, DNMT3a and 3b display tissue-specific expression patterns, with 3a detected uniformly in mouse embryos but 3b predominantly localized to neuroectoderm and chorionic ectoderm tissues 8 .
Methylation signatures can be inherited during mitosis and meiosis 10 . Maintenance methylation involves DNMT1 facilitated modification of CpG cytosine residues existing in hemi-methylated palindromes on double-stranded DNA 11 . DNMT1 binds DNA at replication forks 11 and consequently, genomic methylation levels peak during S phase of the cell cycle 12 . DNMT1 methylates unmodified cytosines thereby distinguishing newly synthesized DNA strands, promoting X-chromosome inactivation and maintaining transcriptional repression profiles 13 .
Through recognition of hemi-methylated CpG sequences, DNMT1 maintains the patterns of methylation established by de novo methylation, e.g., repression of Long Interspersed Nuclear Element 1 (LINE1) retrotransposon promoters to inhibit their potentially carcinogenic propagation 14 . Although it binds hemi-methylated DNA preferentially, DNMT1 can methylate unmethylated CpG island (CGI) sequences in DNMT3a/b -/- mutant cells, fulfilling an emergency de novo methylation role. The Ten-Eleven Translocation (TET) proteins, which belong to a conserved family of dioxygenases, are capable of iterative oxidation of methyl groups on CpG residues 19 . These Tet proteins, homologous to the J-Base binding proteins (JBP) discovered in Trypanosoma brucei, recognise and bind modified DNA bases, e.g., 5mC, and oxidize these residues to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) 20 . Tet protein-facilitated oxidation of 5mC results in a stepwise change from methyl to hydroxyl, carbonyl and carboxylate configurations; however, 5fC and 5caC modifications can be synthesized directly from 5mC oxidation 21,22,23,24 .
An informative indication of TET protein activity in understanding their regulation is studying the distribution and abundance of oxi-methylcytosine marks. Significant 5mC presence at CpG poor promoters is detectable in contrast to unmethylated CpG rich regions, the latter being characteristic of CpG islands 25 . Across tissues, highest tissue specific methylation is observed in brain, testis and blood whilst oral mucosa exhibit greatest hypomethylation, indicating a differential methylation pattern occurring at tissue specific promoters 26 .
Using sensitive anti-5mC and anti-5hmC antibodies in selective methyl/hydroxymethyl DNA immunoprecipitation (meDIP/hmeDIP) followed by high-throughput sequencing, Ficz et al. demonstrated high 5hmC occupancy at promoters, exons and LINE-1 retrotransposon sequences, which correlated with reduced 5mC levels at these locations in mouse embryonic stem cells 27 . Inversely, the greatest 5mC enrichment was observed at repetitive satellite sequences, where 5hmC presence was limited 28 . Studies performed on human frontal lobe brain tissue reveal highly significant 5hmC enrichment, four-fold higher than in mouse embryonic stem cells 28 . In concordance with previous observations, high-throughput sequencing of frontal lobe tissue showed that the majority (55-59%) of 5hmC signal localized at low-density CpG promoter regions, with 35-38% within gene bodies and approximately 6% at intergenic regions. In contrast, 5mC was enriched at intergenic regions (25-26%) and higher within gene bodies (52-55%) but reduced (22-24%) at promoter sequences 29 . These studies indicate an abundance of 5hmC in embryonic stem cells and somatic tissue, particularly the brain; however, investigations of 5fC and 5caC distributions are limited.
Interestingly, methylation of adenine residues at the position 6 nitrogen (N6) (6mA), recently discovered in eukaryotes, displays a genomic abundance profile inverse to that of 5mC 30 . Observations from liquid chromatography coupled to tandem mass spectrometry reveal that absolute 6mA levels are higher in zebrafish and porcine early embryos than in sperm, with its levels (0.003% of genomic adenines) increasing steadily upon fertilization, peaking at the morula developmental stage (33-fold higher than in sperm) and reaching steady-state somatic levels of 0.004% of genomic adenines 31 . Immunoprecipitation of 6mA-enriched DNA sequences has demonstrated predominant occupancy (approximately 80% enrichment) of this mark at repetitive element regions and transcriptional start sites 32 . These observations contextualize and validate the discovery that 6mA demethylase-null embryonic stem cells accumulate 6mA and exhibit 6mA-mediated LINE-1 retrotransposon silencing, whereas these elements are transcriptionally active in wild-type cells. These data suggest a transcriptional regulatory function for 6mA 33 .
Whilst conjugated biochemical tags coupled to subsequent DIP assays indicate presence or absence of oxidized methylcytosine derivatives (oxi-mCs), they cannot impart spatial distribution or quantifiable information of these marks 34,35 . A protocol for sensitive immunochemical detection of 5hmC and 5caC was recently developed 36 . This fluorophore conjugated secondary antibody-based immunostaining method coupled with utilizing scanning laser confocal microscopy possesses the unique advantage of providing visual localization of these DNA modifications within the cells, thus, emphasizing individual positively or negatively stained cells corresponding with the heterogeneous presence of these marks. The 5hmC and 5caC absolute signal intensities as amplified by the conjugated antibodies enable semi-quantitative interpretations to be proposed about the magnitude and positions of these marks within the nucleus e.g. heterochromatic and euchromatic regions 37,38 . Here a technique for the computational analysis of confocal microscopy images is described. The generation of 2.5D spatial distribution plots for displaying distinct 5hmC and 5caC signal peaks per pixel and their locations within nuclei is demonstrated. Histogram plots of 5hmC and 5caC signal intensity profiles can illustrate trends in abundance of these marks as the peaks and troughs are plotted as separate non-overlapping channels. Finally, by implementing the colocalization function, the degree of proximity of one signal to another can be determined and as a result of this, their respective genomic coordinates can be identified.

Generation of Confocal Images
1. In preparation for the analysis of modified forms of cytosine, perform immunohistochemical staining as described by Abakir et al. 34 .
2. Carry out imaging of the slides using a microscope and save the files in LSM format.
NOTE: When comparing intensity profiles between samples, the laser power and gain used for each channel must be kept constant throughout image acquisition to allow direct comparison at the image analysis stage.
Table' and 'Image'.
1. Select the third icon along the top of this tab ('Close Bezier' text will appear when the mouse hovers over the tool), then use this tool to encircle a single nucleus; values for this nucleus will then appear in the table.
NOTE: For each encircled nucleus, a scatter plot of red versus green fluorescence is produced and displayed in the left-hand panel of the screen. The axis is then moved in order to gate out cells depending on the threshold. As the nuclei of interest are selected manually in the images, adjusting the gating thresholds above zero is not required. All pixel intensity values observed within the nuclear perimeters are considered as positive signal.

Co-localization Analysis
6. Repeat this process by selecting the third icon in the tab and encircling subsequent nuclei until the dataset is completed.
7. To export this data, right click on the table, select save table and save to an appropriate location.
8. To save the colocalization image, select the File menu and select 'Export…', then choose 'Tagged Image File (Format)' and 'Contents of Image Window -single plane (Data)', followed by clicking 'Select file name and save…'; choose an appropriate location and click 'Save'.
9. Plot colocalization data in a spreadsheet followed by statistical analysis.
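The manual selection step above can be pictured as collecting every red/green pixel pair that falls inside the drawn outline, with no intensity threshold applied. The following is a minimal sketch in plain Python and is not the acquisition software's own routine; the function names (point_in_polygon, pixels_in_roi), the list-of-rows image layout and the even-odd ray-casting test are all assumptions made for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixels_in_roi(red, green, polygon):
    """Return (red, green) intensity pairs for pixels whose centre lies inside the outline.

    red and green are equally sized images stored as lists of rows.
    No threshold is applied: every pixel inside the outline is kept.
    """
    pairs = []
    for row in range(len(red)):
        for col in range(len(red[row])):
            # Pixel centre in (x, y) image coordinates (hypothetical convention).
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                pairs.append((red[row][col], green[row][col]))
    return pairs
```

The returned pairs correspond to the points of the red versus green scatter plot for one nucleus and can be passed on to a correlation or overlap calculation.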

Representative Results
To determine the spatial distribution of 5hmC and 5caC in differentiating hepatic progenitors, slides immunostained for these DNA modifications were imaged using a microscope.
Initial analysis of the spatial distribution of these oxi-mCs was carried out through the generation of 2.5D intensity plots (Figure 1A, 1B). The red and green peaks appeared to be well defined, with the limited presence of orange peaks indicating only a small overlap of the 5caC and 5hmC signals. In agreement with these results, the profiles for 5hmC and 5caC intensities in the corresponding cell do not strongly coincide with each other (Figure 1C). This illustrates that 5caC and 5hmC exhibit distinct patterns of nuclear distribution in hepatic progenitors, suggesting that Tet-dependent oxidation of 5mC leads to generation of different oxidized derivatives of this DNA modification in specific chromatin regions. To compare the levels of 5caC and 5hmC in Daoy medulloblastoma and BXD-1425EPN ependymoma cells, immunostaining for these oxi-mCs was carried out on both samples under identical experimental conditions. Intensities of 5caC and 5hmC signals were then compared in 20 profiles generated across 3-5 cells recorded for each cell line (Figure 2A-2D). The quantification of these results revealed that BXD-1425EPN cells exhibit significantly lower levels of both 5hmC and 5caC immunostaining compared to Daoy cells (Figure 2D).
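A signal intensity profile of the kind compared above can be thought of as the pixel values of one channel sampled along a straight line drawn across a nucleus. The sketch below illustrates this under stated assumptions: the image is a plain list of rows, sampling is nearest-neighbour, and the names line_profile and mean_profile_intensity are hypothetical. It is not the procedure implemented in the microscope software.

```python
def line_profile(channel, start, end, n_samples):
    """Sample one channel at n_samples evenly spaced points between start and end.

    channel is an image stored as a list of rows; start and end are (row, col)
    tuples. Nearest-neighbour sampling is used (an assumption of this sketch).
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    (r0, c0), (r1, c1) = start, end
    profile = []
    for i in range(n_samples):
        t = i / (n_samples - 1)  # fraction of the way along the line
        row = int(round(r0 + t * (r1 - r0)))
        col = int(round(c0 + t * (c1 - c0)))
        profile.append(channel[row][col])
    return profile


def mean_profile_intensity(profile):
    """Average intensity of one profile, usable as a per-profile summary value."""
    return sum(profile) / len(profile)
```

Sampling the same line in the red and green channels gives two profiles whose peaks and troughs can be compared directly, and the per-profile means from several lines can be pooled for each cell line, provided the acquisition settings were kept identical.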
Considering the two channels, red (5hmC) and green (5caC), which can be denoted as R and G respectively, the Pearson's correlation coefficient (PCC) describes the degree of spatial overlap and co-segregation of R and G, assuming that both marks exist in a linear relationship with each other, and is denoted by the R statistic 34 . Squaring the PCC value, denoted by R², gives a measure of the variation in green signal intensity that is influenced by red signal intensity 34 . This is superfluous for our 5caC vs 5hmC co-localization analysis.
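For reference, the PCC can be computed from paired red and green pixel intensities as in the following minimal sketch. The function name pearson_r and the use of flat Python lists are assumptions for illustration; the imaging software reports this value directly in its table.

```python
import math


def pearson_r(red, green):
    """Pearson's correlation coefficient for paired pixel intensities."""
    if len(red) != len(green) or len(red) < 2:
        raise ValueError("need two equally long sequences of at least 2 pixels")
    mean_r = sum(red) / len(red)
    mean_g = sum(green) / len(green)
    # Sums of products of deviations from the channel means.
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(red, green))
    var_r = sum((r - mean_r) ** 2 for r in red)
    var_g = sum((g - mean_g) ** 2 for g in green)
    if var_r == 0 or var_g == 0:
        raise ValueError("PCC is undefined when a channel is constant")
    return cov / math.sqrt(var_r * var_g)
```

Values close to 1 indicate that the two signals rise and fall together, values close to 0 indicate no linear relationship, and squaring the result gives the R² value mentioned above.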
To examine the spatial distribution of 5caC and 5hmC in undifferentiated (Undiff) hiPSCs and hepatic endoderm 24 hours (HE24) after induction, colocalization analysis of these DNA modifications was carried out on the same confocal images, with R values for 20 nuclei analysed for each cell line (Figure 3A-3C).
The quantification of these results demonstrated that the degree of colocalization is significantly higher (P < 0.05) in HE24 compared to Undiff (Figure 3C). This may be attributable to the reduced magnitude of 5caC presence and thus its reduced signal intensity in Undiff cells.
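The text does not state which statistical test produced the P value, so the following is only one possible way to compare per-nucleus R values between two conditions: a two-sided permutation test on the difference in means, written with the Python standard library. The function name permutation_p_value, the number of permutations and the fixed seed are assumptions of this sketch.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    group_a and group_b hold one colocalization value per nucleus.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding in the means.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

A permutation test makes no assumption about the distribution of R values, which can be useful with only 20 nuclei per group; a t-test or a rank-based test in a spreadsheet or statistics package would be an equally reasonable choice.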

Discussion
This protocol describes the stepwise process of generating visual representations of DNA modifications within nuclei. Whilst sensitive and impactful, these techniques do possess a number of limitations.
It is necessary to emphasize that the data generated from 2.5D plots, signal intensity profiles and colocalization analysis are semi-quantitative in nature. Owing to the point by point illumination of the sample during image acquisition on the laser scanning confocal microscope, absolute signal intensities of each channel are recorded. However, these are not absolute magnitudes of 5hmC and 5caC being detected and only represent indications of the presence, absence or scale of these epigenetic marks.
Due to the repetitive nature of signal amplifications, limitations consistent with the magnitude of machinery applied to enhance the signal inevitably confound approximations of true physical genomic occupancy of these marks 34 . The true genomic localization of these marks may be obscured by these bulky proteins, which physically occupy areas unresolvable even at optimal confocal resolution limits of 200-300 nm 34 . Moreover, the signal intensities of the individual channels for each mark cannot be compared to each other due to differences in antibody sensitivity and fluorophore conjugation 34 . However the information obtained with imaging sheds light on the spatial and temporal organization of the genomic marks.
Copyright © 2017 Creative Commons Attribution 3.0 License. September 2017 | 127 | e56318
For accurate quantitation of 5hmC and 5caC signals, nuclei are highlighted and encircled against the DAPI counterstain, clearly demarcating the region to be analyzed. This procedure dispenses with the requirement for calibrating gating thresholds, as background noise is disregarded through the manual process of selecting nuclei 1 . This process can be automated in the latest software, which allows nuclear segmentation to be performed. Whilst free alternative software packages such as FIJI are available for colocalization analysis, drawbacks such as the requirement to convert images to binary 8-bit formats, to mask images and to input size exclusion data to select regions of interest limit the user-accessibility of this program. Additionally, considering the overall limited levels of 5hmC and 5caC in the genome, coupled with their predominant location at euchromatic regions and the depletion of genomic CpG sequences in general, manual encircling of nuclei introduces user bias. Moreover, highlighting nuclei automatically assumes all events occurring within the demarcated region to be positive and disregards any background pixels which may be present. Thus, analyzing images based on pixel intensity may be more meaningful. These factors are important when considering the use of Manders' colocalization coefficient to analyse the degree of spatial overlap between two signals, irrespective of their signal intensities, as opposed to Pearson's correlation coefficient, which assumes a linear relationship between signals.
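To make the contrast with Pearson's coefficient concrete, the sketch below computes the two Manders' coefficients in their usual threshold-based form: M1 is the fraction of total red intensity found in pixels where green is above threshold, and M2 is the corresponding fraction for green. The function name manders_coefficients and the default thresholds of zero (matching the choice to treat every pixel inside the outlined nucleus as positive) are assumptions of this illustration, not settings taken from the protocol.

```python
def manders_coefficients(red, green, red_threshold=0, green_threshold=0):
    """Return (M1, M2) for paired pixel intensities.

    M1: share of summed red intensity in pixels where green > green_threshold.
    M2: share of summed green intensity in pixels where red > red_threshold.
    """
    if len(red) != len(green):
        raise ValueError("channels must contain the same number of pixels")
    total_r = sum(red)
    total_g = sum(green)
    if total_r == 0 or total_g == 0:
        raise ValueError("coefficients are undefined for an empty channel")
    m1 = sum(r for r, g in zip(red, green) if g > green_threshold) / total_r
    m2 = sum(g for r, g in zip(red, green) if r > red_threshold) / total_g
    return m1, m2
```

Because each coefficient depends only on whether the other channel is present at a pixel, the result does not require the two signals to scale linearly with one another, although it is sensitive to background and therefore to the thresholds chosen.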
Overall, the techniques outlined here carry the theme of visual representation of biological data in an intuitive format. Whilst semi-quantitative image analysis cannot supersede powerful techniques such as mass spectrometry in terms of sensitivity or single base resolution genome wide sequencing, it does provide complementary data allowing inferences and hypotheses to be conceived from analysis of images with precision.

Disclosures
No potential conflicts of interest were disclosed.