The example set of images generated using the reverse-transfection siRNA screening protocol was prepared for, and analyzed with, Cell Profiler software. In the resulting raw numerical data every cell is individually represented, traceable back to its image and well of origin, and measured for several fluorescence intensity parameters (Figure 6A). For each identified cell, the mean nuclear fluorescence intensity for the P-S780 RB1 antibody and the integrated DNA intensity are determined within the nuclear masks defined by the DNA dye. Mean GFP intensity values for the nuclear and cytoplasmic regions of each cell are also recorded, allowing calculation of the nuclear-to-cytoplasmic fluorescence ratio of the GFP-CDK2 reporter. Downstream of these algorithmic intensity measurements, the individual cell data are used to define gates for two assays: nuclear antibody staining and the GFP-CDK2 reporter. The cells are then annotated on the basis of assay outcome, and these labels are used to characterize specific subpopulations further by a third measurement, nuclear DNA content.
Histogram plots of the raw fluorescence intensity data gathered for each assay are an effective way of assessing how cell subpopulations behave under different conditions. The histograms in Figure 6B show the population distributions of individual cell data from triplicate wells for each RNAi knockdown condition. The data for nuclear antibody intensity are on the left and the corresponding data for the GFP-CDK2 reporter are on the right. The P-S780 RB1 antibody data reveal that the cells broadly exist in two populations with regard to this post-translational modification. Cells that have lost RB1 phosphorylated on S780 form a left-hand peak of nuclear intensity, which is enriched when CDK6 is knocked down by siRNA. The same left-hand peak is seen when RB1 itself is the RNAi target, reflecting outright removal of the protein and, with it, P-S780 RB1 staining. In contrast, the same cells under the same experimental conditions show a different pattern in the individual cell data when observed via the GFP-CDK2 reporter assay. The distribution is continuous, with only a single peak, but an siRNA that disturbs the cell cycle and causes accumulation in G1 phase (siCDK6) extends the right-hand shoulder of that distribution, indicating more cells with an increased nuclear/cytoplasmic GFP ratio (plotted on the X-axis).
Also shown on the histograms of Figure 6B are the gate values (vertical bars), which are chosen on the basis of the distributions of both sets of assay data. For the P-S780 RB1 antibody data, the rule is to place the gate at the half-maximum-height position on the left shoulder of the main (right) peak of the negative control cell data (non-targeting siRNA). Data highlighted in red are the cells with reduced or absent P-S780 RB1 identified by this gate. A similar gate, positioned on the opposite shoulder of the ratio value distribution, is used for the GFP-CDK2 reporter. The resulting high-ratio subpopulation, in which CDK2 activity is reduced or absent, is shown in green. To illustrate multiplexed analysis of the two assays, Figure 6C shows both gate values being applied with the 2_gate_classifier.pl Perl script to convert the raw data (Figure 6A) into an annotated file. This new file includes the original data alongside a new column of class labels for each cell and the two gate values used to distinguish them (in this case, gates of 0.004 for the antibody data and 1.5 for the GFP-CDK2 reporter).
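One possible reading of the half-height gating rule can be sketched in Python as follows. This is an illustrative sketch only, not the procedure used to produce Figure 6B: the function name `half_height_gate`, the number of bins and the toy counts are all assumptions, and the sketch presumes that the main peak is the tallest histogram bin in the negative control data.

```python
def half_height_gate(values, n_bins=50, shoulder="left"):
    """Return the histogram bin edge where the chosen shoulder of the
    tallest peak is last at or above half of the peak height.

    Hypothetical helper: the binning and the tie handling are assumptions.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be identical")
    width = (hi - lo) / n_bins

    # Build a simple fixed-width histogram.
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1

    # Tallest bin is taken as the main peak (assumption).
    peak = max(range(n_bins), key=counts.__getitem__)
    half = counts[peak] / 2

    # Walk away from the peak while the neighboring bin stays >= half height.
    step = -1 if shoulder == "left" else 1
    i = peak
    while 0 <= i + step < n_bins and counts[i + step] >= half:
        i += step

    # Report the outer edge of the last bin that is still >= half height.
    if shoulder == "left":
        return lo + i * width
    return lo + (i + 1) * width


# Toy bimodal data in arbitrary units: a small left peak and a main right peak.
toy_counts = [3, 5, 3, 1, 1, 2, 6, 10, 6, 2]
toy_values = [b + 0.5 for b, n in enumerate(toy_counts) for _ in range(n)]

antibody_gate = half_height_gate(toy_values, n_bins=10, shoulder="left")
print(round(antibody_gate, 3))
```

Passing `shoulder="right"` gives the mirror-image gate on the opposite shoulder, as described for the GFP-CDK2 reporter ratio. In practice a gate chosen this way should still be checked by eye against the plotted histogram.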
Having classified the individual cells from each knockdown condition on the basis of the two assays, it is now possible to use these class labels to annotate plots of the assay data. Figure 7 shows scatter plots of the individual cell data for the P-S780 RB1 and GFP-CDK2 assays from the example data set for all three RNAi conditions. The numbers annotating the quadrants of the scatter plots give each gated subpopulation as a percentage of all cells for that knockdown condition and are generated in R using the class labels described above. Compared to cells transfected with non-targeting siRNA (Figure 7B), cells transfected with siCDK6 (Figure 7C) show a net data distribution shifted both downward on the Y-axis (indicating absence of RB1 phosphorylation at serine 780) and to the right on the X-axis (indicating low CDK2 activity). Both shifts are expected for knockdown of this target. In contrast, the data from siRB1-transfected cells (Figure 7A) show a loss of antibody staining, in keeping with loss of the epitope, but little change in the data distribution for the CDK2 reporter compared to the non-targeting controls, suggesting that RB1 knockdown has no great effect on the GFP-CDK2 reporter.
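The two-gate labeling and the quadrant percentages can be illustrated with a short Python sketch. It is not the provided 2_gate_classifier.pl or the R script: the gate values (0.004 and 1.5) and the label elements (P-S780-, P-S780+, G1, Non-G1) come from the text, whereas the combined label format, the handling of values exactly at a gate, the function names and the toy cells are assumptions.

```python
ANTIBODY_GATE = 0.004  # P-S780 RB1 nuclear intensity gate quoted in the text
RATIO_GATE = 1.5       # GFP-CDK2 nuclear/cytoplasmic ratio gate quoted in the text

# Hypothetical combined label format: "<antibody label>/<reporter label>".
LABELS = ("P-S780-/G1", "P-S780-/Non-G1", "P-S780+/G1", "P-S780+/Non-G1")


def classify_cell(antibody, ratio,
                  antibody_gate=ANTIBODY_GATE, ratio_gate=RATIO_GATE):
    """Label one cell from its two assay values.

    Assumption: values exactly at a gate are treated as P-S780+ and Non-G1.
    """
    rb_label = "P-S780-" if antibody < antibody_gate else "P-S780+"
    phase_label = "G1" if ratio > ratio_gate else "Non-G1"
    return rb_label + "/" + phase_label


def quadrant_percentages(cells):
    """Return the percentage of cells in each of the four quadrants.

    `cells` is an iterable of (antibody intensity, GFP ratio) pairs.
    """
    counts = dict.fromkeys(LABELS, 0)
    total = 0
    for antibody, ratio in cells:
        counts[classify_cell(antibody, ratio)] += 1
        total += 1
    if total == 0:
        raise ValueError("no cells to classify")
    return {label: 100.0 * n / total for label, n in counts.items()}


# Four invented cells, for illustration only.
toy_cells = [(0.002, 1.8), (0.002, 1.2), (0.006, 1.2), (0.006, 1.1)]
percentages = quadrant_percentages(toy_cells)
for label in LABELS:
    print(label, percentages[label])
```

Keeping all four labels in the result, including empty quadrants, makes it straightforward to annotate every quadrant of a scatter plot even when a subpopulation is absent.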
To further explore the use of individual cell data, subpopulation classification and assay multiplexing, Figure 8 shows the scatter plot for the siCDK6 data from Figure 7C alongside paired histogram profiles for integrated DNA intensity. Each pair of histograms relates to opposing halves of the entire population, divided on the basis of either antibody intensity (right of the scatter plot) or GFP-CDK2 reporter ratio values (above the scatter plot). Quantification of nuclear DNA intensity for these populations shows two peaks characteristic of 2N and 4N DNA content, as the left and right peaks, respectively. The gates shown in Figures 6, 7 and 8 are intended to identify cells in G1 phase of the cell cycle, either as low for P-S780 RB1 (labeled: P-S780-) or as having a high ratio value from the GFP-CDK2 reporter (labeled: G1). Indeed, the DNA profile histograms for subpopulations identified with either of these assays predominantly contain cells with 2N DNA content. DNA profiles of the oppositely gated populations (labeled: P-S780+ or Non-G1) contain cells with distributions ranging from 2N to 4N, in keeping with such cells adopting a range of cell cycle positions after G1 phase.
Although the focus here has been the generation and analysis of individual cell data from fluorescently stained images, it is also useful to summarize each assay on a well-by-well basis, both to monitor variability between replicates and to assess the performance of all the wells for a given assay across a whole plate. Figure 9 shows the data from each siRNA treatment summarized as the mean values from triplicate wells for the percent cells within the gates applied to A) the P-S780 RB1 data and B) the GFP-CDK2 reporter data. The values plotted in A and B are produced by two additional Perl scripts provided with this manuscript: ‘antibody_fluorescence_summary.pl’ and ‘G1assay_summary.pl’, respectively. These scripts use the raw data created by Cell Profiler (Nuclei.csv) and report, per well, i) total cells measured, ii) number of cells within the gate, iii) percent cells within the gate and iv) the arithmetic mean of the measured raw data. This summary is included as an option suitable for looking across large sets of assay data before focusing on individual treatments using multiplexed assessment of individual cell data, as illustrated in Figures 7 and 8. The charts displayed here plot ‘iii) percent cells within the gate’ for both assays, which suits the non-normal data distributions seen for the P-S780 RB1 and GFP-CDK2 data in the histograms in Figure 6B. The arithmetic mean reported as iv) would instead suit data from homogeneous population responses that are normally distributed before and after experimental perturbation.
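The four per-well values, and the mean and standard error across triplicate wells, can be sketched in Python as below. This is an assumption-laden illustration rather than a port of the provided Perl scripts: the function names, the in-memory `(well, value)` input and the toy intensities are invented, and the column layout of Nuclei.csv is deliberately not modeled. The well addresses E5, F5 and G5 are those of the non-targeting siRNA in Table 1.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize_wells(cells, gate, below=True):
    """Summarize individual cell values per well.

    `cells` is an iterable of (well address, measured value) pairs.
    A cell is inside the gate if its value is below the gate
    (`below=True`, as for the antibody assay) or above it
    (`below=False`, as for the reporter ratio).
    """
    by_well = defaultdict(list)
    for well, value in cells:
        by_well[well].append(value)

    summary = {}
    for well, values in by_well.items():
        if below:
            in_gate = sum(1 for v in values if v < gate)
        else:
            in_gate = sum(1 for v in values if v > gate)
        summary[well] = {
            "total": len(values),                         # i) total cells
            "in_gate": in_gate,                           # ii) cells within gate
            "percent": 100.0 * in_gate / len(values),     # iii) percent within gate
            "mean": mean(values),                         # iv) arithmetic mean
        }
    return summary


def mean_and_sem(replicate_values):
    """Mean and standard error of the mean across replicate wells."""
    n = len(replicate_values)
    return mean(replicate_values), stdev(replicate_values) / sqrt(n)


# Invented antibody intensities for the three non-targeting wells.
toy_cells = (
    [("E5", v) for v in (0.002, 0.006, 0.006, 0.006)]
    + [("F5", v) for v in (0.002, 0.002, 0.006, 0.006)]
    + [("G5", v) for v in (0.002, 0.002, 0.002, 0.006)]
)
well_summary = summarize_wells(toy_cells, gate=0.004, below=True)
percents = [well_summary[w]["percent"] for w in ("E5", "F5", "G5")]
avg_percent, sem_percent = mean_and_sem(percents)
print(avg_percent, round(sem_percent, 2))
```

Reporting the percent within the gate alongside the arithmetic mean lets the same per-well table serve both gated, non-normal assays and assays in which the whole population shifts.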

Figure 1: Overview of the steps in the workflow to quantitatively analyze fluorescently labeled microscope image data. The workflow is represented here as four steps. (A) First it is necessary to experimentally prepare cells for fluorescent imaging. The example described here is that of a screen in which siRNA-treated adherent human tumor cells are grown for 48 hours, fixed and stained on a 96-well tissue culture plate. Different RNAi conditions are present in triplicate in separate wells within the plate. Cells are stained with a DNA dye and an antibody specific for RB1 phosphorylated on serine 780 (P-S780 RB1), a target site selective for CDK4 and CDK6; they also stably express a GFP-CDK2 reporter, which reports G1 cell cycle exit. Collectively these fluorescent probes constitute two assays separately assessed within the workflow. (B) Parallel microscope images for each fluorescent probe (channel) are generated and named so as to include details by which the image analysis software can organize the data. (C) The image files are loaded into the Cell Profiler software, which algorithmically identifies individual cells and the associated pairs of nuclei and cytoplasm before yielding intensity measurements for the three fluorescent probes detected in each. (D) Finally, a Perl script is used to organize the raw quantitative data produced. This step applies gates to the fluorescence intensity data for each cell, effectively binning the cells into subpopulations, which can be plotted, tracked and cross-examined.

Figure 2: Experimental data to be obtained by image analysis. The fixed, siRNA-treated, fluorescently labeled cells from the example data set were imaged and corresponding intensity measurements taken per cell. Representative image data are shown for each parameter recorded during image analysis. (A) Nuclear DNA intensity: The intensity of staining of nuclear DNA dye is used to yield a measure of DNA per nucleus. (B) Nuclear intensity of phospho-RB1: Immuno-staining specific for P-S780 RB1 using a primary antibody (black) and a fluorescently tagged secondary antibody (red) enables an intensity measurement of RB1 phosphorylation at S780 per nucleus. (C) GFP-CDK2 reporter: The cells used stably express a GFP-tagged reporter protein that translocates between the nucleus and cytoplasm in a set pattern with the cell cycle. Dual measurement of the paired nuclear and cytoplasmic GFP intensity for each cell allows calculation of a ratio per cell that can be used to distinguish G1 phase from the rest of the cell cycle. Three siRNA targets are used to illustrate the analysis: a non-targeting negative control siRNA; CDK6 siRNA as a positive control for perturbing RB1 phosphorylation and cell cycle progress; and RB1 siRNA to establish antibody specificity.

Figure 3: Organization of image files prior to image analysis. The images taken from the tissue culture plate are named systematically in order to allow the image analysis software to relate the image data back to the original experimental context. This information is placed within the filename for each image. (A) As each well on the experiment plate may correspond to different RNAi targets or treatments, the well address forms part of the filename. (B) The frame number is part of the filename as each well is imaged to collect multiple, non-overlapping frames. (C) Fluorescent probes from each frame are imaged separately; consequently the filenames also need to reflect which channel each image relates to. (D) Example filenames, relating to well (G12), Frame (2) with each image representing one of the channels (Blue, Red, Green). Dotted lines link the filename elements to the relevant schematic representations for Well, Frame and Channel, respectively.

Figure 4: Use of Cell Profiler to measure nuclear DNA and antibody staining. With the settings in the provided pipeline file (3_channels_pipeline.cppipe), the Cell Profiler image analysis software measures fluorescence intensity values for nuclear DNA and antibody-binding relating to individual cells. (A) Nuclei are identified in the ‘blue’ channel image of stained DNA. (B) The positions of the DNA stained nuclei are temporarily held in a ‘Nuclei mask’. The Nuclei mask is then overlaid onto (C) the blue and red channel images (DNA and antibody fluorescence data, respectively) and the fluorescence values from image segments that overlap with the mask are recorded against each identified cell. Successful identification of separate, neighboring nuclei can be visually assessed in the appearance of the Nuclei mask. For illustration, shown circled in this mask image, are examples where the chosen settings for the algorithm have mis-identified neighboring nuclei as a single nucleus. Adjusting the algorithm settings to minimize these events is introduced in the Discussion section.

Figure 5: Use of Cell Profiler to measure nuclear and cytoplasmic GFP intensities. The GFP-tagged CDK2 reporter translocates between the nucleus and cytoplasm in relation to the cell cycle position of the cells. At the same time that Cell Profiler calculates the DNA and antibody nuclear intensities per cell (Figure 4), it also calculates the nuclear to cytoplasm ratio of GFP intensities for each cell. (A) The DNA dye data for each image is used to generate a Nuclei mask. (B) Cell Profiler uses the Nuclei mask in conjunction with the GFP image from the GFP-CDK2 reporter to seed the position of each cell and then expands to each cell’s perimeter to estimate the whole footprint of each cell. This becomes a new, ‘Cell mask’. (C) The nuclei mask is subtracted from the Cell mask to yield a donut-like series of cytoplasm outlines, which become the ‘Cytoplasm mask’. (D) The nuclei mask and cytoplasm mask are used by Cell Profiler to measure pairs of nuclear and cytoplasmic GFP values. These paired values are then used by Cell Profiler to calculate ratios, which inform as to each cell’s position in the cell cycle.
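The mask subtraction and ratio calculation described in this legend can be illustrated with a toy Python sketch for a single cell. It is not Cell Profiler code and makes no claim about how Cell Profiler implements these steps: masks are modeled here as sets of (row, column) pixel coordinates, and the function names and the 4 x 4 image are assumptions.

```python
def mean_intensity(image, mask):
    """Mean pixel value of `image` (a list of rows) over the pixels in `mask`."""
    return sum(image[r][c] for r, c in mask) / len(mask)


def gfp_ratio(gfp_image, nuclei_mask, cell_mask):
    """Nuclear/cytoplasmic ratio of mean GFP intensity for one cell.

    The cytoplasm mask is the cell mask with the nucleus pixels removed,
    which leaves a ring (donut) of cytoplasm around the nucleus.
    """
    cytoplasm_mask = cell_mask - nuclei_mask
    if not nuclei_mask or not cytoplasm_mask:
        raise ValueError("both nucleus and cytoplasm must contain pixels")
    nuclear = mean_intensity(gfp_image, nuclei_mask)
    cytoplasmic = mean_intensity(gfp_image, cytoplasm_mask)
    return nuclear / cytoplasmic


# Toy 4 x 4 GFP image: bright 2 x 2 nucleus surrounded by dimmer cytoplasm.
gfp = [
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 6.0, 6.0, 2.0],
    [2.0, 6.0, 6.0, 2.0],
    [2.0, 2.0, 2.0, 2.0],
]
cell = {(r, c) for r in range(4) for c in range(4)}    # whole cell footprint
nucleus = {(r, c) for r in (1, 2) for c in (1, 2)}     # central nucleus

ratio = gfp_ratio(gfp, nucleus, cell)
print(ratio)  # 3.0
```

In this toy cell the reporter is concentrated in the nucleus, so the ratio is high, which corresponds to the G1-like pattern described for the GFP-CDK2 reporter.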

Figure 6: Data extraction - Processing raw individual cell data by imposing gates on assay values. Biological trends from the individual cell data for the antibody staining and GFP-CDK2 reporter assays are extracted using gated data. Histograms of the raw data enable identification of suitable gate values. These are then imposed with a Perl script. (A) The end products of analyzing the image files with the provided settings for Cell Profiler are comma-separated-value (.csv) files. These files contain individual cell data relating to each of the different sub-cellular segments. The file ‘Nuclei.csv’ contains all the selected measurements relating to the use of the Nuclei mask. These measurements include nuclear antibody intensity, nuclear DNA intensity and the GFP ratio (nucleus/cytoplasm). (B) Histograms of nuclear antibody intensity (left) and GFP-CDK2 reporter ratios (right) plotted from individual cell data for each siRNA knockdown condition. The bars on the displayed histograms show the desired gate positions for these assays. Colored data on the histograms indicate the gated subpopulations. (C) The gates for the two assays illustrated in B are applied to the raw data using the Perl script ‘2_gate_classifier.pl’. The script creates a modified copy of the original Cell Profiler output file (Nuclei.csv) to assist subsequent plotting. The two gate values are recorded in the new file (highlighted in color here) and a new ‘Label’ column is added. The labels bin each cell into one of four possible subgroups based on the two gated assay values for each cell. These labels are used in subsequent plots, which feature calculations of the contribution of each subpopulation as well as cross-referencing of additional parameters generated in Cell Profiler.

Figure 7: Scatter plots for each siRNA condition depicting raw data for individual cells and gate positions. Scatter plots of individual cell data from all images for the siRNA conditions indicated: (A) siRB1; (B) siNon-targeting negative control; (C) siCDK6. Plotted against the Y-axes are values of nuclear fluorescence from anti-P-S780 RB1 staining. Plotted against the X-axes are the corresponding ratio values calculated from the GFP-CDK2 reporter. The red and green bars indicate the positions of the P-S780 RB1 gate and the GFP-CDK2 reporter gate, respectively. The two gates divide the cells into four subpopulations, and the number over each resulting quadrant is the percentage of cells within it. Annotations around the axes for A indicate the four possible label-elements applied to each cell by the 2_gate_classifier.pl Perl script. These labels are shown in relation to their respective assay gate and are used in the R-script (analysis.r) to generate the plots in Figures 6, 7 and 8.

Figure 8: Cell subpopulations defined by the two G1 transit assays show 2N and 4N DNA profiles in keeping with assay outcome. The scatter plot of data for the siCDK6 cells is repeated from Figure 7C. Surrounding the scatter plot are histograms for integrated nuclear DNA intensity relating to subsets of the population. Those above the scatterplot relate to the GFP-CDK2 reporter assay. Those to the right of the scatterplot relate to nuclear phospho-RB1 antibody measurements alone. The colored gate lines are extended to show their relation to the histograms. Gate labels by which the cell data were selected for these additional plots are also shown. Cells with loss of RB1 phosphorylated on serine 780 (P-S780-) or those with a high GFP-CDK2 reporter nuclear to cytoplasmic ratio (indicating low CDK2 activity) show predominantly 2N-like DNA profiles, whereas their opposite counterparts for each respective assay show a distribution of 2N and 4N, characteristic of a mixed, post-G1 phase population of cells.

Figure 9: Summary plots of gated assay values for each siRNA condition. Summary data plots of gated (A) P-S780 RB1 data and (B) GFP-CDK2 data from triplicate wells for each siRNA knockdown condition. Values were calculated from the raw Cell Profiler output (Nuclei.csv) using the Perl scripts, ‘antibody_fluorescence_summary.pl’ (A) or ‘G1assay_summary.pl’ (B). The values plotted are means of the percent cells within the gate applied to each assay. Bars indicate standard errors calculated from triplicate wells. Unpaired, homoscedastic T-Test P values for each knockdown condition compared to non-targeting siRNA are shown above the plotted data where P < 0.001 (**) and P < 0.05 (*).
Figure S1. Setting up Cell Profiler software for image analysis. (A) Screenshot of Cell Profiler before any image analysis settings are entered. (B) Screenshot of Cell Profiler after the algorithm details contained in ‘3_channels_pipeline.cppipe’ have been loaded. The highlighted tab in the upper left corner indicates that this screen shows the parameters for the ‘LoadImages’ stage of the analysis. Clicking on the other parts of the list below this will reveal the details for the subsequent steps in the analysis. (C) Screenshot of Cell Profiler with details for Input Folder and Output Folder entered. (D) Screenshot of Cell Profiler after the ‘Analyze images’ button has been clicked to begin analysis. Superimposed are three new windows illustrating the algorithmically-produced masks generated by the software from the images under analysis. These windows are accessed by clicking the ‘eye’ icons to the open position next to the relevant steps in the analysis, in the upper left corner of the main Cell Profiler window. These views help the user to verify whether the settings generating the confetti-colored masks agree with the accompanying, original, greyscale data.
Figure S2. Use of Perl and RStudio to gate individual cell data and plot the resulting cell subpopulations. (A) The right panel shows the folder chosen to receive the output .csv files (green icons) from the Cell Profiler analysis. The Perl scripts provided with the manuscript (blue icons) are copied into this folder. Highlighted is the ‘2_gate_classifier.pl’ Perl script, which has been double-clicked to produce the dialogue box in the left panel. Shown are the prompts and the corresponding typed answers necessary to gate the individual cell data from the ‘Nuclei.csv’ file. (B) Screenshot of RStudio immediately after loading the ‘analysis.R’ script. Highlighted are the commands that load the gated data from A into the software prior to plotting (note that the details in lines 5 and 6 will need to be adjusted according to where the gated data are located on the computer used for analysis). (C) Screenshot of RStudio once the data have been loaded. (D) Screenshot of RStudio in which the block of code required to produce the plot shown in the lower right window is highlighted. The code for each plot is separated by blank lines and grouped by type of plot.
| siRNA target | Plate well addresses |
| --- | --- |
| Non-targeting (NT) | E5, F5, G5 |
| Retinoblastoma (RB) | E7, F7, G7 |
| Cyclin dependent kinase 6 (CDK6) | B2, C2, D2 |
Table 1: Well addresses and corresponding siRNA conditions used in the example data set.