Research Article
This report describes a method involving an R script in the open-source software RStudio to analyze large-scale datasets obtained from time series experiments.
Large datasets are increasingly common in the scientific field, and user-friendly tools are needed so that researchers can analyze them with ease. Here, we introduce a method involving an R script in the open-source software RStudio to analyze large-scale datasets obtained from time series experiments. This method requires minimal input from the user, so a beginner without prior R knowledge or programming experience can use it. The detailed instructions described here and in the R script further guide users on how to use the method. The input data and the output results are stored in the same folder on a local computer, making it possible to do the analysis anywhere and anytime. The output results are organized into folders for easy interpretation, and they can be conveniently processed to generate figures for publications. This method has been successfully used to analyze circadian clock data and reactive oxygen species burst data, both containing large-scale datasets from time series experiments in a 96-well-plate format. We believe that this method provides a facile and powerful solution for researchers analyzing similar large datasets obtained through time series experiments.
With the increased availability of large datasets in the scientific field, it is important to develop user-friendly tools that allow researchers to analyze these datasets quickly, accurately, and easily. One common type of large dataset comes from the use of the luciferase gene as a reporter, which has allowed easy, continuous, and noninvasive examination of gene expression in live cells and organisms. Automation in luminescence recording has transformed the measurement of luciferase luminescence and led to an expansion of data collection, in particular in the circadian clock field1,2. Using 96-well microplates and an automatic plate reader with a stacker, thousands of samples expressing the luciferase gene can be individually assayed in time series, sometimes at one-hour intervals for days, in one experiment. Such high-throughput experiments produce large datasets that traditional gene expression experiments, which rely on manual sample collection followed by RNA processing, could not possibly achieve. Analyzing such large datasets in a timely manner is important but can be challenging.
Although a plethora of tools exists to analyze data for rhythmicity, many of them analyze animal behavior-based assays rather than luminescence reporter expression3,4,5,6,7 (Supplemental Table S1). Some tools require researchers to have prior computer programming skills, such as Python skills, or access to MATLAB. Other tools require the purchase of software, which can be costly. Some free, workable solutions are available online. One such tool, BioDare2, offers a variety of methods to analyze rhythmicity data8. BioDare2 is a user-friendly online tool and requires minimal computational expertise. Users need to upload the input data online and download the output from the online interface for further processing.
Here, we present user-friendly R scripts with multiple capabilities for analyzing large-scale datasets with ease. We use the free, open-source software RStudio9, an interface for R and Python, to run the scripts. RStudio can be used on various computer systems, including Windows, Mac, and Linux. In this report, detailed stepwise instructions are provided to guide users on how to use the R scripts, specifically in protocol sections 1 and 2. This method requires minimal input from the user. A beginner without prior R knowledge or programming experience should be able to use the method to analyze large datasets from luciferase assays or other types of time series data. All input and output data are stored on a local computer; thus, once all the relevant R packages have been downloaded for the first time, an analysis can be done anywhere without the restriction of internet access. The output data are sorted into well-organized folders with results ready to be processed for publications. Statistical analyses are also included as part of the output to provide a quick assessment of the differences among the samples. Thus, the R method could provide a facile and powerful solution for researchers analyzing large datasets.
1. Luciferase-based circadian clock analysis
2. Luminol-based ROS Assay
Case study 1. The luminescence assay for circadian clock activity with Arabidopsis seedlings
We previously showed that the GLYCINE-RICH RNA-BINDING PROTEIN 7 (GRP7) gene was controlled by the master clock protein CIRCADIAN CLOCK-ASSOCIATED 1 (CCA1) and circadian expression of GRP7 is important for its role in plant defense, using transgenic Col-0 plants expressing the luciferase reporter under the control of the wildtype GRP7 promoter (pGRP7wt:LUC)21. We analyzed the circadian clock activities of these transgenic plants along with the control plant CCA1:LUC/Col-0, using the R script named LUC_2025.R (Supplemental File 1 in protocol section 1).
The input file NO7.csv (Supplemental File 2) contains luminescence readings for seven independent pGRP7wt:LUC lines and the CCA1:LUC/Col-0 control. After running the script, an output subfolder called NO7 output is generated in the same folder as the input file. The files in the NO7 output folder are described in Table 1 and can be conveniently viewed as a tree structure in Supplemental Figure S2. Values in the NO7 output folder were further processed to make Figure 3 and Figure 4. Figure 3 shows that the CCA1:LUC reporter displayed an amplitude of 3,000 RLU, a period of 23.5 h, and a phase of 3.5 h. These clock parameters are largely consistent with previous reports22,23. A different expression pattern was observed for the pGRP7wt:LUC lines. While all the pGRP7wt:LUC lines appeared to be similar in period and phase, their amplitude values differed, likely due to positional effects of the transgenes in the chromosomes. These observations were further confirmed when the period, amplitude, and phase parameters were computed via the R script (Figure 4). To validate this analysis, the same dataset was reanalyzed using BioDare2, a free online platform for circadian data analysis8. The results from the R analysis were comparable to those obtained with the BioDare2 FFT-NLLS (NLLS) algorithm8,24 (Figure 4).
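To make the three clock parameters concrete, the sketch below estimates period, amplitude, and phase from a single luminescence trace. It is only an illustration written in Python, not the authors' LUC_2025.R script. It uses a simple least-squares cosine fit scanned over candidate periods (a cosinor-style scan) in place of the ARSER algorithm that the script relies on. The function names, the candidate period range, the synthetic trace, and the definition of phase as the time of the first peak are all assumptions made for this example.

```python
import math

def solve_linear(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_cosine(times, values, period):
    """Least-squares fit of y = mesor + a*cos(w t) + b*sin(w t) for one fixed period."""
    w = 2 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    beta = solve_linear(xtx, xty)
    rss = sum((y - sum(bk * xk for bk, xk in zip(beta, r))) ** 2
              for r, y in zip(rows, values))
    return beta, rss

def estimate_clock_parameters(times, values, periods):
    """Return (period, amplitude, peak time) for the best-fitting candidate period."""
    best = None
    for period in periods:
        (mesor, a, b), rss = fit_cosine(times, values, period)
        if best is None or rss < best[0]:
            best = (rss, period, a, b)
    _, period, a, b = best
    amplitude = math.hypot(a, b)
    # a*cos(wt) + b*sin(wt) = A*cos(wt - phi), which peaks at t = phi / w
    peak_time = (math.atan2(b, a) * period / (2 * math.pi)) % period
    return period, amplitude, peak_time

# Synthetic hourly trace (hypothetical): 24 h period, 100 RLU amplitude, peak at 4 h
times = list(range(120))
values = [500 + 100 * math.cos(2 * math.pi * (t - 4) / 24) for t in times]
candidate_periods = [k / 10 for k in range(200, 281)]  # 20.0 h to 28.0 h
period, amplitude, phase = estimate_clock_parameters(times, values, candidate_periods)
```

On real data, noise, damping, and baseline drift make a single-cosine fit much less reliable than on this clean synthetic trace, which is one reason dedicated algorithms are used in practice.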
Case study 2. The luminescence assay for circadian clock activity with mammalian cells
The R script LUC_2025.R (Supplemental File 1) was further used to analyze the circadian clock activity displayed by mammalian cells25. The U2 OS cell line expressing a circadian clock reporter is a commonly used model cell line to gauge mammalian circadian clock activities26,27. We re-analyzed time series data generated with U2 OS cells expressing the Per2d:Luc reporter cultured in a 96-well plate. The cells were treated with siRNA molecules targeting specific genes. Figure 5 shows that the negative control cells, which were not treated with siRNA, showed a period of 23.3 h, a phase of 2.8 h, and an amplitude of 184.8 RLU. As expected, the siRNA targeting the CRY2 gene significantly dampened the amplitude and affected the period and phase of the reporter. The PSMD4 and PSMD7 genes encode proteins that are part of the 26S proteasome lid, which functions in protein degradation. Consistent with the previous report25, the R analysis shows that knockdown of PSMD4 or PSMD7 by their respective siRNAs does not affect the clock parameters. Thus, this R script is readily applicable to different experimental systems for circadian clock studies.
Case study 3. The ROS assay for a defense response
In addition to large datasets from luminescent circadian clock assays, the R script can be adapted to analyze other data types. Here we present one such application for quantifying reactive oxygen species (ROS). Plants are known to have evolved various strategies to fight against pathogen invasion. One of the strategies is to recognize non-self molecules from a pathogen and subsequently activate innate immune responses. One such early immune response is a ROS burst, which occurs within minutes of a host encountering a non-self molecule. A typical ROS assay was conducted with a 96-well plate containing 12 leaf discs per genotype per treatment (protocol section 2). Here, two common elicitor molecules, flg22, a 22-amino acid peptide derived from the conserved region of bacterial flagellin proteins28, and elf26, a 26-amino acid peptide from the elongation factor Tu protein29, were used to induce the ROS burst. The script, Supplemental File 3 (ROS_2025.R), was developed for ROS data analysis. Two CSV files from ROS assays, Supplemental File 4 (ROS_flg22.csv) and Supplemental File 5 (ROS_elf26.csv), which were converted to the format required for the R analysis, can be downloaded from the Supplementary Material section. After R analysis, the output folders are generated in the same folder as each input file on the user's computer and contain the ROS burst curves and the total ROS values during the assay time, along with statistical analyses (Supplemental Figure S4). The data were further processed to make Figure 6. The results shown here are similar to those published previously, which were processed manually30.
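The total ROS summary described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' ROS_2025.R script. It assumes that the total ROS value of a well is the sum of its luminescence readings over the assay time and that genotypes are then summarized as mean ± SEM. The function name, genotype labels, and readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def total_ros_summary(wells):
    """Summarize total luminescence per genotype.

    wells maps a genotype label to a list of per-well time series (RLU).
    Returns genotype -> (mean of per-well totals, SEM of per-well totals).
    """
    summary = {}
    for genotype, traces in wells.items():
        totals = [sum(trace) for trace in traces]  # assumed definition of total ROS
        sem = stdev(totals) / sqrt(len(totals)) if len(totals) > 1 else float("nan")
        summary[genotype] = (mean(totals), sem)
    return summary

# Hypothetical readings: three wells per genotype, three time points per well
example = {
    "Col-0": [[10, 50, 30], [12, 48, 32], [8, 52, 28]],
    "mutant": [[5, 10, 5], [6, 9, 6], [4, 11, 4]],
}
result = total_ros_summary(example)
```

In a real assay each genotype would have 12 wells and many more time points, but the calculation has the same shape.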

Figure 1: Flow chart of the luciferase assay for R analysis. Seedlings expressing a luciferase reporter driven by a clock promoter were sterilized and grown on 1/2 MS media in LD for 4 days. Seedlings were transferred to 96-well plates containing 180 µL of 1/2 MS medium containing D-luciferin. Each well contained one seedling. After 1 day in LD followed by 1 day in LL, luminescence was recorded with a plate reader. The seedlings on a plate were typically recorded for luminescence in LL at 1 h intervals for 5-7 days. After the recording, plates were photographed to assess seedling growth and the raw data were saved as a CSV file for R analysis. Abbreviations: LD = 12 h light/12 h dark; LL = constant light.

Figure 2: Flow chart of luminescence data acquisition and R analysis. (A) A five-step procedure is outlined for circadian clock analysis using the R script. Step 1. Set up experiments with either 8 or 12 seedlings per genotype and/or per treatment; Step 2. Record luminescence in LL at 1 h intervals for 5-7 days; Step 3. Obtain and format data in a CSV file; Step 4. Analyze data using R; and Step 5. View output data. Recording can start at any time. However, because the R script only takes integers (whole numbers), the recording intervals must be a whole number. (B) Screenshot of an input CSV file correctly formatted for the R script. The original input file, NO7.csv, can be found in Supplemental File 2.

Figure 3: Circadian expression of pGRP7wt:LUC in transgenic plants. Luminescence traces of pGRP7wt:LUC are shown. The bars under the x-axis indicate subjective day (open bars) and night (gray bars). The luminescence trace of each genotype is an average of 12 replicates. Error bars were not shown due to the large number of curves. Abbreviation: RLU = Relative luminescence units.

Figure 4: A comparison of output data from the R script and BioDare2. The same dataset shown in Figure 3 was analyzed by the R script and by BioDare2 for the circadian clock parameters (amplitude, period, and phase). Data represent mean ± SEM (n = 12). Different letters indicate significant differences among the samples (P < 0.05; one-way ANOVA with post hoc Tukey's HSD test).

Figure 5: Analysis of circadian clock activity with mammalian cells. Time series data generated with U2 OS cells expressing the Per2d:Luc reporter cultured on a 96-well plate in DD have been described previously25. siRNA molecules targeting CRY2, PSMD4, or PSMD7 were used to treat the cells. (A) Luminescence traces. (B) Amplitude, period, and phase of Per2d:Luc. Data represent mean ± SEM (n = 3). Different letters in panel (B) indicate a significant difference between the negative control and an siRNA-treated sample (P < 0.05; one-way ANOVA with post hoc Tukey's HSD test). Abbreviations: DD = constant darkness; RLU = relative luminescence units.

Figure 6: ROS burst analysis by R. Luminescence was recorded from seedlings immediately after treatment with 1 µM flg22 (left) or 1 µM elf26 (right). (A) Luminescence traces averaged from 12 seedlings per genotype (n = 12) in a time course post elicitation. The average values per genotype per treatment are part of the R output. (B) Averaged total luminescence counts for each genotype with either flg22 or elf26 treatment. Data represent mean ± SEM (n = 12). Different letters indicate significant differences among the samples (P < 0.05; one-way ANOVA with post hoc Tukey's HSD test). Abbreviation: RLU = relative luminescence units.
| Output item | Description |
| --- | --- |
| #1__Plate_NO7 Avg Per_Pha_Amp | This is the CSV file containing averages of the period, phase, and amplitude with SEM for each treatment. The treatment was defined as a genotype with or without a specified treatment. |
| #2__Plate_NO7 Graphs | This is a PDF file that contains the graphic output for period, phase, and amplitude. The graphs are presented in groups and individually for each treatment. This includes bar graphs and box plots for the period, phase, and amplitude of the ARS method, as well as luminescence curves. |
| #3__Plate_NO7 Averaged LUC Data | This is the CSV file where each treatment is averaged for each time point so that the user can easily make their own luminescence graphs to include or exclude any treatments they wish and possibly normalize the luminescence using their preferred method. |
| #4__Plate_NO7 Individual Wells | This folder contains values for individual wells. One such file is the CSV file where the period, phase, and amplitude of each individual sample (seedling) is located. This is particularly useful for looking at individual seedlings in case there are contaminated wells that the user wishes to later exclude after getting the data. These data are also organized in separate files for period, phase, and amplitude for convenience in using tools such as Prism to graph. There are also individual luminescence data in time series organized according to treatment for the user’s graphing convenience. NO7 96 Well Individual PerPhaAmp: average values for period, phase, and amplitude for each genotype and treatment. NO7 LUC Replicates: individual well-LUC values grouped by genotype and treatment. NO7 PrismAmplitude: average values for amplitude ready for Prism analysis. NO7 PrismPeriod: average values for period ready for Prism analysis. NO7 PrismPhase: average values for phase ready for Prism analysis. |
| #5__Plate_NO7 ANOVA | This folder holds files of the averaged period, phase, and amplitude merged with the p-values from an ANOVA. Files #1-8 show the p-values compared to one specific treatment, e.g., file #1 uses sample #1 as the baseline for a comparison. In addition, NO7 All ANOVA Results is a file that contains all of the ANOVA comparisons if the user wants a comprehensive view. NO7 DataForANOVA is a file that is set up with the data to run a new ANOVA in R, using our auxiliary script. This is in case the user wants to run their own statistics or graphs, as it is compatible with making boxplots in R, possibly after deleting contaminated wells. |
| #6__Plate_NO7 t-test | This folder holds files of the averaged period, phase, and amplitude merged with the p-values from a t-test. Files #1-8 show the p-values compared to one specific treatment, e.g., file #1 uses sample #1 as the baseline for a comparison. |
Table 1: A list of the output documents from R analysis. This is a list of the output documents generated by the LUC_2025.R script (Supplemental File 1) and the input file NO7.csv (Supplemental File 2).
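The per-time-point averaging behind the Averaged LUC Data file (output #3 in Table 1) can be pictured with the short sketch below. It is an illustration in Python and not the authors' R code. The data structures, function name, well names, and treatment labels are hypothetical.

```python
from statistics import mean

def average_by_treatment(readings, labels):
    """Average replicate wells of each treatment at every time point.

    readings maps a well name to its luminescence series over time.
    labels maps a well name to its treatment label.
    Returns treatment -> list of per-time-point means.
    """
    grouped = {}
    for well, series in readings.items():
        grouped.setdefault(labels[well], []).append(series)
    # zip(*series_list) walks through the replicate series one time point at a time
    return {
        treatment: [mean(values) for values in zip(*series_list)]
        for treatment, series_list in grouped.items()
    }

# Hypothetical example: two replicate wells of one line and one well of another
readings = {"A1": [1, 2, 3], "B1": [3, 4, 5], "A2": [10, 10, 10]}
labels = {"A1": "WT", "B1": "WT", "A2": "mut"}
averaged = average_by_treatment(readings, labels)
```

Dropping a contaminated well before averaging only requires removing its entry from `readings`.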
Supplemental Figure S1: Screenshots for User Input I and User Input II in protocol section 1. User Input I must be changed to tailor the analysis to a specific dataset on a local computer. Changes to User Input II are optional, depending on the experimental setting. It is important to note that the Supplemental File 1 (LUC_2025.R) script expects all wells to be present in the file and not just selected or used wells.
Supplemental Figure S2: Tree structure for the output documents. This output was generated by using the LUC_2025.R script (Supplemental File 1) and the input file NO7.csv (Supplemental File 2). The LUC_2025.R script generates an output folder based on the input file name. For more details about the output files, see Table 1. Boxes represent file folders.
Supplemental Figure S3: Screenshots for User Input I and User Input II in protocol section 2. The Supplemental File 3 (ROS_2025.R) script uses the same general input format as the Supplemental File 1 (LUC_2025.R) script. User Input I must be changed to tailor the analysis to a specific dataset on a local computer. Changes to User Input II are optional, depending on the experimental setting. It is important to note that the Supplemental File 3 (ROS_2025.R) script expects all wells to be present in the file and not just selected or used wells.
Supplemental Figure S4: Tree structure for the output documents. This output was generated by using the ROS_2025.R script (Supplemental File 3) and the input file ROS_flg22.csv (Supplemental File 4). The ROS_2025.R script generates an output folder based on the input file name. Within that folder is a file for Total ROS Counts and a file for graphs. There are also subfolders for PRISM and graphing data, the ANOVA test, and the t-tests. Boxes represent file folders.
Supplemental Table S1: A list of available bioinformatics tools for circadian data analysis.
Supplemental File 1: LUC_2025.R. This is the R script used for analyzing circadian clock data.
Supplemental File 2: NO7.csv. This is the input file containing an example of circadian clock data.
Supplemental File 3: ROS_2025.R. This is the R script used for analyzing ROS data.
Supplemental File 4: ROS_flg22.csv. This is the input file containing an example of ROS data. ROS was induced by 1 µM flg22 treatment.
Supplemental File 5: ROS_elf26.csv. This is the input file containing an example of ROS data. ROS was induced by 1 µM elf26 treatment.
Here, we present R scripts, run in RStudio, that provide a user-friendly method for analyzing large-scale data obtained from time series experiments in 96-well format. This method has allowed us to quickly and easily analyze luminescence recordings from thousands of samples in time series experiments to gauge circadian clock activity in systems ranging from plant seedlings to mammalian cell cultures, in addition to data from ROS assays.
BioDare2 is a free repository that has been quite commonly used for circadian data analysis. While BioDare2 is a well-made online tool, researchers would need to upload their data online and then download the results after each analysis. It also requires lengthy data labeling and manipulation to get the data into a usable format. The R method described here provides several functions that could improve user-friendliness. First, this method requires minimal input from a user. It only requires a 4-step, simple input to indicate the working directory, the input file, labels for the treatments, and the relative start time of a clock assay. Additional optional input can be added per the user's specific needs. Detailed instructions are provided here to better guide users on how to make the changes. Experiments on different plates can be run in back-to-back analyses without having to stop to type in repeat information. This script can be easily used to analyze experiments in 96-well format, either with 8 or 12 replicates per treatment, without the necessity to look up the labels of the plate wells. Second, the input data are from a local computer, and the output results are also saved in the same folder of the local computer. Therefore, the R analysis can be done anywhere and anytime and is not limited by access to the internet. Third, the output results are organized to allow easy interpretation and can be conveniently processed to generate figures for publications. Fourth, anyone with RStudio downloaded on their computer would be able to do the analysis without the necessity of prior computer programming skills.
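One way to picture how a short list of treatment labels can stand in for 96 individual well labels is sketched below. This is an illustration in Python, not the authors' R code, and the layout is an assumption: with 8 replicates each treatment is taken to occupy one plate column (rows A-H), and with 12 replicates one plate row (columns 1-12). The function name and the labels are hypothetical.

```python
ROWS = "ABCDEFGH"      # 8 plate rows
COLS = range(1, 13)    # 12 plate columns

def label_wells(treatments, replicates):
    """Map every well of a 96-well plate to a treatment label.

    Assumed layout: 8 replicates = one treatment per column (12 treatments);
    12 replicates = one treatment per row (8 treatments).
    """
    if replicates == 8:
        if len(treatments) != 12:
            raise ValueError("8 replicates per treatment requires 12 labels")
        return {f"{r}{c}": treatments[c - 1] for c in COLS for r in ROWS}
    if replicates == 12:
        if len(treatments) != 8:
            raise ValueError("12 replicates per treatment requires 8 labels")
        return {f"{r}{c}": treatments[i] for i, r in enumerate(ROWS) for c in COLS}
    raise ValueError("replicates must be 8 or 12")

# Hypothetical labels: 12 treatments with 8 replicate wells each
by_column = label_wells([f"T{i}" for i in range(1, 13)], 8)
```

The user only supplies the ordered list of labels; the well names are derived from the plate geometry.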
The most critical step in learning this method for circadian data analysis is to first use the example data provided here to set up RStudio on one's own computer (see protocol section 1). Users who can run the provided examples can then move on to their own data and only need to make the common modifications described in the Input I section, step 1.2.3, in protocol section 1. It is important to know that for circadian experiments, while a recording can begin at any time, the recording intervals must be a whole number because the R script only takes integers (whole numbers). The R method can be easily adapted to analyze other types of data besides circadian clock data; we have included one such example, the ROS assay data (see protocol section 2). This R analysis is highly recommended for data from experiments using 96-well plates. However, even if an experiment is not conducted with a 96-well plate, this R method is applicable as long as the data are arranged according to the same data format.
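Before running an analysis, the whole-number requirement on recording intervals can be checked with a few lines of code. The sketch below is written in Python for illustration and is not part of the authors' R scripts. It additionally assumes that the time points are evenly spaced, and the function name and tolerance are hypothetical.

```python
def check_whole_hour_intervals(times_h, tol=1e-9):
    """Return the common recording interval (in hours) if it is a positive whole number.

    Raises ValueError if the interval is fractional, non-positive, or uneven.
    Even spacing is an assumption of this sketch.
    """
    if len(times_h) < 2:
        raise ValueError("need at least two time points")
    steps = [later - earlier for earlier, later in zip(times_h, times_h[1:])]
    first = steps[0]
    if first <= 0 or abs(first - round(first)) > tol:
        raise ValueError(f"interval of {first} h is not a positive whole number")
    if any(abs(step - first) > tol for step in steps):
        raise ValueError("recording intervals are not evenly spaced")
    return int(round(first))

# A recording that starts at hour 26 and is read every 2 h passes the check
interval = check_whole_hour_intervals([26, 28, 30, 32])
```

A series such as 0, 0.5, 1.0 h would raise an error, signaling that the data need to be re-sampled or re-recorded at whole-hour intervals.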
The output generated by this R method contains limited statistical analyses intended for rapid exploratory analysis. For instance, this method does not provide a rhythmicity test to exclude non-rhythmic datasets, so readers should ensure that only rhythmic datasets are used in protocol section 1 for circadian analysis. Users should also assess whether their data meet the assumptions of parametric tests before drawing conclusions.
In conclusion, the method described here involving the R scripts in RStudio provides a user-friendly and convenient tool for researchers working with time series data in 96-well format. This method has been successfully used to analyze large datasets from circadian clock and ROS burst assays and can be easily adapted to analyze similar large datasets from time series experiments, preferably conducted with 96-well plates.
The authors have no conflicts of interest to disclose.
We thank the members of the Lu laboratory for their assistance in this work. We thank Min Gao and Matthew Fabian for the use of their unprocessed data and Benjamin Harris for assistance and/or guidance in making this R script. We thank John B. Hogenesch at Cincinnati Children's Hospital Medical Center for providing luminescence data from mammalian cells for Case study 2. We further thank John B. Hogenesch, Andrew Millar at The University of Edinburgh, and Mary Harrington at Smith College for helpful discussions during the development of this method. This work was partially supported by grants from the National Science Foundation, NSF 1456140 and NSF 2223886, to Hua Lu.
| Name | Company/Source | Catalog number/URL | Comments |
| --- | --- | --- | --- |
| R | The R Project | https://www.r-project.org/ | A free, open-source platform that can be downloaded online and used to code, especially for statistics. |
| RStudio | Posit Software | https://posit.co/download/rstudio-desktop/ | Free software that can be downloaded online for more user-friendly access to R. |
| MetaCycle | Gang Wu, Xavier Li, Matthew Carlucci, Ron Anafi, Michael Hughes, Karl Kornacker, and John Hogenesch | https://cran.r-project.org/web/packages/MetaCycle/vignettes/implementation.html | The ARSER algorithm of the MetaCycle package is used to evaluate clock parameters, period, phase and amplitude. |
| ggplot2 | Posit Software | https://cran.r-project.org/web/packages/ggplot2/index.html | Creates data visualizations, particularly for statistical graphics. |
| dplyr | Posit Software | https://cran.r-project.org/web/packages/dplyr/index.html | A fundamental R library for efficient data manipulation. |
| magrittr | Posit Software | https://cran.r-project.org/web/packages/magrittr/index.html | Provides a set of operators to enhance code readability and facilitate a more natural flow of data operations. |
| stringr | Posit Software | https://cran.r-project.org/web/packages/stringr/index.html | Provides a consistent, simple, and easy-to-use set of functions for working with character strings. |
| filesstrings | Rory Nolan, and Sergi Padilla-Parra | https://cran.r-project.org/web/packages/filesstrings/index.html | Provides convenient functions for manipulating files and strings, particularly those related to file names and paths. |
| circular | Ulric Lund, Claudio Agostinelli, Hiroyoshi Arai, Alessando Gagliardi, Eduardo García-Portugués, Dimitri Giunchi, Jean-Olivier Irisson, Matthew Pocernich, and Federico Rotolo | https://cran.r-project.org/web/packages/circular/index.html | Provides the statistical analysis and graphics representation of circular data. |
| AICcmodavg | Marc J. Mazerolle | https://cran.r-project.org/web/packages/AICcmodavg/index.html | Creates model selection tables based on Akaike's information criterion (AIC) and related information. |
| broom | Posit Software | https://cran.r-project.org/web/packages/broom/index.html | Converts the output of various statistical models and objects into "tidy" tibbles (a modern data frame format), making it easier to work with, analyze, and visualize model results. |
| Autoclave machine | Steris Amsco Eagle Century SG120 Scientific, Inc. | 8901400012 | Autoclave media |
| Chemical fume hood | Lab Design & Supply | | sterilize seeds |
| Omega Luminescence Reader | BMG LABTECH, Inc. | | plate reader |
| Laminar flow cabinet | NuAire Nu-408FM-400 | Class II/TypeA | transfer seedlings to 96-well plate |
| 96-well microplates | Perkin-Elmer | OptiPlate-96 | grow seedlings for luciferase assay |
| Flg22 | GenScript Inc. | RP19986 | An elicitor from bacterial flagellin. |
| Elf26 | Alpha Diagnostic Intl. Inc. | 2427 | An elicitor from bacterial translation Elongation Factor-Tu. |
| D-Luciferin Firefly, potassium salt | Biosynth Chemistry & Biology | L-8220 | luciferase substrate |
| L-012 (Luminol) | Fisher Scientific | NC0733364 | ROS assay reagent |