<em>In Silico</em> Identification and Characterization of circRNAs During Host-Pathogen Interactions

Mathanakumara Ealam Selvan; Kai Shen Lim; Chee How Teo; Yat-Yuen Lim

doi:10.3791/64565

JoVE Journal > Biology

Biology

In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions

Published: October 21, 2022

doi:

10.3791/64565

Mathanakumara Ealam Selvan*¹, Kai Shen Lim*¹, Chee How Teo, Yat-Yuen Lim

¹Institute of Biological Sciences, Faculty of Science,Universiti Malaya, ²Centre for Research in Biotechnology for Agriculture (CEBAR),Universiti Malaya

Summary

The protocol submitted here explains the complete in silico pipeline needed to predict and functionally characterize circRNAs from RNA-sequencing transcriptome data studying host-pathogen interactions.

Abstract

Circular RNAs (circRNAs) are a class of non-coding RNAs that are formed via back-splicing. These circRNAs are predominantly studied for their roles as regulators of various biological processes. Notably, emerging evidence demonstrates that host circRNAs can be differentially expressed (DE) upon infection with pathogens (e.g., influenza and coronaviruses), suggesting a role for circRNAs in regulating host innate immune responses. However, investigations on the role of circRNAs during pathogenic infections are limited by the knowledge and skills required to carry out the necessary bioinformatic analysis to identify DE circRNAs from RNA sequencing (RNA-seq) data. Bioinformatics prediction and identification of circRNAs is crucial before any verification, and functional studies using costly and time-consuming wet-lab techniques. To solve this issue, a step-by-step protocol of in silico prediction and characterization of circRNAs using RNA-seq data is provided in this manuscript. The protocol can be divided into four steps: 1) Prediction and quantification of DE circRNAs via the CIRIquant pipeline; 2) Annotation via circBase and characterization of DE circRNAs; 3) CircRNA-miRNA interaction prediction through Circr pipeline; 4) functional enrichment analysis of circRNA parental genes using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). This pipeline will be useful in driving future in vitro and in vivo research to further unravel the role of circRNAs in host-pathogen interactions.

Introduction

Host-pathogen interactions represent a complex interplay between the pathogens and host organisms, which triggers the hosts' innate immune responses that eventually result in the removal of invading pathogens¹^,². During pathogenic infections, a multitude of the host immune genes is regulated to inhibit the replication and release of pathogens. For example, common interferon-stimulated genes (ISGs) regulated upon pathogenic infections include ADAR1, IFIT1, IFIT2, IFIT3, ISG20, RIG-I, and OASL³^,⁴. Besides protein-coding genes, studies have also reported that non-coding RNAs such as long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and circular RNAs (circRNAs) also play a role and are regulated concurrently during pathogenic infections⁵^,⁶^,⁷. In contrast to protein-coding genes that mainly encode proteins as functional molecules, non-coding RNAs (ncRNAs) are known to function as regulators of genes at transcriptional and post-transcriptional levels. However, studies involving the participation of non-coding RNAs, particularly circRNAs, in regulating the hosts' immune genes are not well reported compared to the protein-coding genes.

CircRNAs are widely characterized by their covalently closed continuous loop structure, which is generated through a non-canonical splicing process called back-splicing⁸. The process of back-splicing, unlike the splicing process of cognate linear RNAs, involves the ligation of the downstream donor site to the upstream acceptor site, forming a circular-shaped structure. Currently, three different back-splicing mechanisms for the biogenesis of circRNAs have been proposed. These are RNA binding protein (RBP) mediated circularization⁹^,¹⁰, intron-pairing-driven circularization¹¹, and lariat-driven circularization¹²^,¹³^,¹⁴. Given that circRNAs are connected end-to-end in a circular structure, they tend to be naturally resistant to normal exonuclease digestions and, thus, are considered to be more stable than their linear counterparts¹⁵. Another common characteristic exhibited by circRNAs includes the cell or tissue type-specific expression in hosts¹⁶.

As implied by their unique structure and cell or tissue-specific expression, circRNAs have been discovered to play important biological functions in cells. To date, one of the prominent functions of circRNAs is their role as microRNA (miRNA) sponges¹⁷^,¹⁸. This regulatory role of circRNAs occurs through the complementary binding of circRNA nucleotides with the seed region of miRNAs. Such a circRNA-miRNA interaction inhibits the miRNAs' normal regulatory functions on target mRNAs, thus regulating the expression of genes¹⁹^,²⁰. Additionally, circRNAs are also known to regulate gene expression by interacting with RNA binding proteins (RBPs) and forming RNA-protein complexes²¹. Although circRNAs are classified as non-coding RNAs, there is also evidence that circRNAs can act as templates for protein translation²²^,²³^,²⁴.

Recently, circRNAs have been demonstrated to play pivotal roles in regulating the host-pathogen interactions, particularly between the hosts and viruses. Generally, host circRNAs are assumed to assist in regulating the host's immune responses to eliminate the invading pathogens. An example of circRNA that promotes host immune responses is circRNA_0082633, reported by Guo et al.²⁵. This circRNA enhances type I interferon (IFN) signaling within A549 cells, which helps to suppress influenza virus replication²⁵. Moreover, Qu et al. also reported a human intronic circRNA, called circRNA AIVR, that promotes immunity by regulating the expression of CREB-binding protein (CREBBP), a signal transducer of IFN-β²⁶^,²⁷. However, circRNAs that are known to promote the pathogenesis of disease upon infection also exist. For example, Yu et al. recently reported the role played by a circRNA spliced from the GATA zinc finger domain containing the 2A gene (circGATAD2A) in promoting the H1N1 virus replication through the inhibition of host cell autophagy²⁸.

To effectively study circRNAs, a genome-wide circRNA prediction algorithm is usually implemented, followed by an in silico characterization of the predicted circRNA candidates before any functional studies can be carried out. Such a bioinformatics approach to predict and characterize circRNAs is less costly and more time efficient. It helps to refine the number of candidates to be functionally studied and could potentially lead to novel findings. Here, we provide a detailed bioinformatics-based protocol for the in silico identification, characterization and functional annotation of circRNAs during the host-pathogen interactions. The protocol includes the identification and quantification of circRNAs from RNA-sequencing datasets, annotation via circBase, and the characterization of the circRNA candidates in terms of circRNA types, number of overlapping genes, and predicted circRNA-miRNA interactions. This study also provides the functional annotation of the circRNA parental genes through Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis.

Protocol

In this protocol, de-identified ribosomal RNA (rRNA)-depleted RNA-seq library datasets prepared from the influenza A virus-infected human macrophage cells were downloaded and used from the Gene Expression Omnibus (GEO) database. The entire bioinformatics pipeline from prediction to functional characterization of circRNAs is summarized in Figure 1. Each part of the pipeline is further explained in the sections below.

1. Preparation, download, and setup before data analysis

NOTE: All software packages used in this study are free and open-source.

Downloading the required tools on the Linux platform
1. Download and install the required software and tools listed in the Table of Materials on a Linux high-performance computer using the instructions provided by the developer.
  NOTE: Most of the tools and software have their own online GitHub pages or documentation containing instructions on installing and using their tools (refer to the Table of Materials).
2. Download the desired RNA-seq datasets for circRNA detection and analysis from sequence archive websites (e.g., European Nucleotides Archive and Gene Expression Omnibus).
3. Download the reference genome(s) (FASTA format) and annotation files (GTF/GFF3 format), compatible with the host from which the RNA-seq dataset was prepared. Host reference genome(s) and annotation files are usually found on online genome browsers such as the National Center for Biotechnology Information (NCBI), the University of California Santa Cruz (UCSC), and the Ensembl websites.
Quality checking of RNA-seq
1. Input the FASTQ files into the FASTQC program to determine the quality of RNA sequences. If the quality of the FASTQ files are low (e.g., <Q20) or there is a presence of adapter sequences, further trimming might be needed using tools such as Trimmomatic²⁹^,³⁰.

2. Prediction and differential expression analysis of circRNAs using CIRIquant

NOTE: A more detailed manual on installing and performing differential expression analysis can be found in the code availability section of the CIRIquant paper³¹. The supplementary data also includes some of the basic commands used in this protocol.

CircRNA predictions
1. Index the host's reference genome first using BWA and HISAT2 aligners. Then, on a Linux terminal, execute the commands bwa index³² and hisat2-build³³ in the directory of the host's reference genome to index it.
2. Next, prepare a YML config file containing the name of the file, the path of tools (BWA, HISAT2, stringtie³⁴, samtools³⁵), the path to the downloaded reference files (host's reference genome FASTA files, annotation files), and the path to the index files from step 2.1.1.
3. Execute the CIRIquant tool from terminal using either the default or manual parameters. The user can specify the library type (either stranded or non-stranded) of the RNA-seq data when executing the CIRIquant tool.
  NOTE: The library type of the RNA-seq data can be determined by knowing the type of library preparation kit used. If the identity of the library preparation kit is unknown, an RNA-seq control bioinformatic package called RSeQC³⁶ can be used to determine the strandedness of RNA-seq data.
Differential expression analysis
NOTE: CIRIquant package includes prep_CIRIquant, prepDE.py, and CIRI_DE_replicate; therefore, no additional downloads are needed for these three tools.
1. Prepare a text file (.lst) with a list of data containing the following:
  1^st column: IDs of the RNA-seq data used in step 2.1.3
  2^nd column: path to the GTF files outputted by CIRIquant
  3^rd column: grouping of the RNA-seq data, whether it is a control or treated group.
2. For an example, refer to Table 1 below.
  NOTE: It is not necessary to put in the headers as they are just for reference.
3. On the Linux terminal, run prep_CIRIquant with the text file (.lst) prepared in step 2.2.1 as an input. The run will generate a list of files: library_info.csv, circRNA_info.csv, circRNA_bsj.csv, and circRNA_ratio.csv.
4. Prepare a second text file with a list of data containing the RNA-seq IDs and the path to their respective StringTie output. The file layout must be similar to the text file in step 2.2.1 without the grouping column.
5. Run prepDE.py with the text file prepared in step 2.2.4 as an input to generate the gene count matrix files.
6. Execute CIRI_DE_replicate with the library_info.csv and circRNA_bsj.csv files from step 2.2.3 and the gene_count_matrix.csv file from step 2.2.5 as inputs to output the final circRNA_de.tsv file.
Filtering of DE circRNAs
1. Use R (in the computer terminal or RStudio) or any spreadsheet software (e.g., Microsoft Excel) to open the circRNA_de.tsv file generated from step 2.2.6 to filter and determine the number of differentially expressed (DE) circRNAs.
2. Filter the DE circRNAs according to the criteria LogFC > |2| and FDR < 0.05.
3. Create a file named DE_circRNAs.txt to store the information of DE circRNAs.

3. Characterization and annotation of predicted DE circRNAs

Annotation status of DE circRNAs
1. Load the file named DE_circRNAs.txt in RStudio , which consists of the list of DE circRNAs filtered from step 2.3.3. Include other information such as the genomic positions (Chr, Start, End), strand orientations (+ or -), gene name, and circRNA type. Before proceeding, convert the circRNA genomic start coordinates from CIRIquant to 0-based by subtracting 1 base pair.
  NOTE: The other information stated above can be obtained from the GTF files outputted by CIRIquant (Supplementary File 1).
2. Determine the annotation status of the predicted DE circRNAs by downloading a library containing the genomic positions of the circRNA-database (e.g., circBase) deposited circRNAs.
  NOTE: Assure that the genome version used to predict the circRNAs is identical to the circRNA database library before making the comparison. The circBase data file used here is freely available in the drive folder provided in Github (https://github.com/bicciatolab/Circr)³⁷.
3. Once both the files from step 3.1.1 and step 3.1.2 are prepared, run the R script given in Supplementary File 1. Chromosomal locations of DE circRNAs are queried to the library before assigning the status Annotated or Unannotated.
Characterization of DE circRNAs
1. Use R and other spreadsheet software to summarize the number of circRNAs according to the circRNA types (i.e., exon, intron, intergenic, and antisense) and the number of genes that the circRNAs span across (1 or >1) (Supplementary File 1).NOTE: CIRIquant can only detect four types of circRNAs (exon, intron, intergenic, and antisense). Exon-intron circRNAs, also known as ElciRNAs, cannot be detected by CIRIquant.

4. Predicting the circRNA-miRNA interaction using Circr

NOTE: A more detailed manual on how to install and use Circr for the circRNA-miRNA interaction analysis can be found at: https://github.com/bicciatolab/Circr³⁷.

Preparation of files
1. Unzip and extract the contents of the Circr.zip file after downloading it from the Circr GitHub page using the relevant software such as "WinRar" or "7-zip" into a new directory where the analysis will be conducted.
2. Install the prerequisite software applications (miRanda, RNAhybrid, Pybedtools, and samtools) before conducting the circRNA-miRNA analysis.
3. Reference genomes and annotation files for several organisms of interest, rRNA coordinates file, validated miRNA interaction file, and circBase circRNA files are provided by the Circr author in the Github page (https://github.com/bicciatolab/Circr)³⁷. Upon clicking on the support files in drive folder, select the folder for the organism of interest, miRNA folder, and the circBase text file and download it.
4. After downloading the necessary files in step 4.1.3, create a new directory named support_files in the directory mentioned in step 4.1.1. Then, unzip and extract the content into the support_files directory.
5. Index the reference genome file of the organism of interest using the samtools faidx command (Supplementary File 1).
6. Prepare an input file consisting of the coordinates of DE circRNAs of interest in a tab-delimited BED file, as shown in Table 2.
  NOTE: Because circRNAs predicted by CIRIquant are not 0-based, it is necessary to minus 1 bp at the starting coordinate (as mentioned in step 3.1.1) before converting them to the BED format. The headers shown in Table 2 are just for reference and are not needed in the BED files.
7. At this point, ensure that the expected folder tree structure for Circr analysis is as in Figure 2.
Running Circr.py
1. Execute Circr.py using Python 3, and as arguments, specify the circRNA input file, the FASTA genome of the organism of interest, the genome version of the selected organism, the number of threads, and the name of the output file in the command line.
2. If the organism of interest is not provided in the drive folder listed in step 4.1.3 or if the user prefers to have a custom set of files to run the analysis, additional commands specifying the location of these files need to be included when executing Circr.py.
3. After the Circr analysis is complete, the program outputs a circRNA-miRNA interaction file in the csv format.
4. Filter the circRNA-miRNA interaction results according to the user-specific preference. For this study, the predictions are filtered using Rstudio according to the criteria below:
  -Detected by all three software tools
  -Two or more binding sites reported by both Targetscan and miRanda
  -Identified in either the "AGO" or "validated" columns
  -Filter out no seed region interactions
5. Write the circRNAs that pass the filtered conditions from step 4.2.3 into a new text file named circRNA_miRNA.txt. Such filtering can increase the confidence of the predicted interactions.

5. Construction of the ceRNA network

NOTE: A detailed manual on how to use Cytoscape can be found at: http://manual.cytoscape.org/en/stable/ and https://github.com/cytoscape/cytoscape-tutorials/wiki#introduction

Download and preparation
1. Download the latest version of Cytoscape³⁸ from: https://cytoscape.org/download.html.
2. Execute the installer wizard downloaded in step 5.1.1 and select the file location for the Cytoscape software.
3. Prepare a tab-delimited file containing the circRNAs of interest and their target miRNA. The first column consists of the circRNA name; the second column specifies the type of RNA from the first column; the third column is the target miRNA; and the fourth column specifies the type of RNA from the third column. An example of the file is shown in Table 3.
Constructing the ceRNA network map
1. Open the Cytoscape software installed in step 5.1.2.
2. In Cytoscape, navigate to File > Import > Network from File. Select the file that has been prepared in step 5.1.3.
3. In the new tab, select the first and second column as "Source Node" and "Source Node Attribute" while select the third and fourth column as "Target Node" and "Target Node Attribute" respectively. Click OK and the network will show up on the upper right side of Cytoscape.
4. To change the visual style of the network, press the Style button on the left side of Cytoscape.
5. Press the arrow on the right side of Fill Color. Choose Type for the column and Discrete Mapping for the mapping type. Then, select the color desired for each of the RNA types.
6. After changing the color, change the shape of the nodes by navigating to Shape and following step 5.2.5.

6. Functional enrichment analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis for the parental gene of the circRNAs
1. Ensure the clusterProfiler³⁹^,⁴⁰ and org.Hs.eg.db⁴¹ packages have been installed in Rstudio. The org.Hs.eg.db⁴¹ package is a genome-wide annotation package only for humans. If the organism of interest is another species, refer to: https://bioconductor.org/packages/release/BiocViews.html#OrgDb
2. Import the DE_circRNA information from step 2.3.1 into the Rstudio workspace.
3. Use the parental gene of the circRNAs provided in this file for enrichment analysis in the upcoming steps. However, if the user wishes to convert the gene symbol to other formats, such as the Entrez ID, use a function such as "bitr".
4. By using the gene ID as an input, run the GO enrichment analysis using the enrichGO function within the clusterProfiler³⁹^,⁴⁰ package using default parameters.
5. By using the gene ID as an input, run the KEGG enrichment analysis using the enrichKEGG function within the clusterProfiler³⁹^,⁴⁰ package using the default parameters.

Representative Results

The protocol enlisted in the previous section was modified and configured to suit the Linux OS system. The main reason is that most module libraries and packages involved in the analysis of circRNAs can only work on the Linux platform. In this analysis, de-identified ribosomal RNA (rRNA)-depleted RNA-seq library datasets prepared from the Influenza A virus-infected human macrophage cells were downloaded from the GEO database⁴² and used to generate the representative results.

CircRNA prediction and quantification
In this analysis, ribosomal RNA (rRNA)-depleted RNA-seq library datasets prepared from the Influenza A virus infected human macrophage cells were used to carry out circRNA detection and functional analysis. As specified in the protocol section, CIRIquant was used to identify and carry out DE analysis of identified circRNAs using the RNA-seq library datasets as input. Reference files used are based on the latest human genome version (hg38). Table 4 shows an example of the final output from the CIRIquant analysis. The identification and filtering of DE circRNAs from CIRIquant output were executed through simple RStudio scripts (Supplementary File 1). CircRNAs are only classified as DE when the false-discovery rate (FDR) value is <0.05 and log fold change (LogFC) >|2|. Table 5 shows the total number of circRNAs and DE circRNAs detected. A total of 35,846 circRNAs were detected, with 306 being DE. The DE circRNAs detected in this output are entirely upregulated (LogFC > 2), with none being downregulated (LogFC < 2).

Annotation and characterization of DE circRNAs
Annotation status of DE circRNAs
DE circRNAs identified were cross-checked with an established circRNA database, circBase. However, since the circRNA coordinates deposited in circBase are based on a previous human genome version (hg19), the circRNA coordinates from circBase must be converted to the current human genome version (hg38) for cross-checking in this study. Additionally, the starting coordinate must be converted to 0-based from the 1-based output of CIRIquant. The hg38 version-converted circRNA coordinates of circBase is provided in a drive folder in Github (https://github.com/bicciatolab/Circr)³⁷. Then, the Rstudio scripts (Supplementary File 1) were used to assign the annotation status of circRNAs in a new data frame column. Table 6 shows an example of circRNAs with the annotation status.

Characterization of DE circRNAs
This part was entirely executed through R scripts in the RStudio software. R scripts ease the analytical processes, and only basic knowledge is required.

CircRNA types
In this step, DE circRNAs were characterized by their circRNA types (Antisense, Exonic, Intergenic, and Intronic) based on their genomic positions. Table 7 below displays the percentage breakdown of the different circRNA types encompassed by the identified DE circRNAs. Of the total 306 DE circRNAs, 263 circRNAs (85.95%) were identified to be exonic circRNAs, which is the most abundant circRNA type identified. Intronic circRNAs come in as the second most identified circRNA type comprising 17 DE circRNAs, making up to 5.56% of the total DE circRNAs. This is followed by intergenic circRNAs (16 DE circRNAs ~5.23%) and antisense circRNAs (10 DE circRNAs ~3.27%).

Number of genes spanned per circRNA
CircRNAs identified by CIRIquant can overlap across a number of genes. To date, most studies are focused on circRNAs that span one gene. Hence, in this protocol, the circRNA candidates spanning more than one gene are excluded from the downstream analysis. Table 8 below describes the number and percentage of DE circRNAs spanning one and more than one gene. In this table, intergenic circRNAs (16 DE circRNAs) are excluded since they do not overlap any host genes, while the rest of the circRNA types (290 DE circRNAs) are subjected to this analysis. Of the 290 DE circRNAs, the majority of the DE circRNAs (261 circRNAs ~90%) span only one gene, while the remaining 29 circRNAs (~10%) span more than one gene.

Construction of the ceRNA network
A ceRNA network is usually drawn to visualize the circRNA-miRNA interactions after it has been predicted. In Figure 3 below, only one DE circRNA was chosen as a representative result, which is the hsa_DE_58 circRNA. Based on Circr predictions, hsa_DE_58 can sponge up to nine different miRNAs. These nine miRNAs are identified after filtering through stringent criteria.

Functional enrichment analysis
GO and KEGG analysis of the circRNA parental genes
Figure 4 below depicts a bubble plot of the functional enrichment of DE circRNA parental genes through the GO analysis. Fundamentally, the GO analysis aims to unravel the biological processes, cellular locations, and molecular functions that are enriched or impacted in the condition studied, in this case, the virus-infected sample. The enrichment is considered statistically significant and plotted on the bubble plot only if the p-value is < 0.01. As shown in Figure 4, the top three enrichments for the biological processes (BP) include the ribonucleoprotein complex biogenesis, the response to virus, and the regulation of response to a biotic stimulus, while for the molecular functions (MF) only the catalytic activity acting on RNA and single-stranded RNA binding are statistically enriched. On the other hand, only the retromer complex is statistically enriched for the cellular components (CC).

Figure 5 shows the KEGG enrichment analysis of the DE circRNA parental genes in a bubble plot. Similar to the GO enrichment analysis, KEGG enrichment is only considered statistically significant and plotted on a bubble plot if the p-value is < 0.01. Only two KEGG terms were enriched in this case, which are the Influenza A and viral life cycle (HIV-1) pathways.

Figure 1: The pipeline for the prediction and functional characterization of circRNAs. The pipeline shows a simple overview of the key steps from start to end involving the installation of the necessary software packages, predicting and quantifying the circRNA expression, construction of the ceRNA network, and performing the circRNA parental gene functional enrichment. Please click here to view a larger version of this figure.

Figure 2: Folder tree structure for Circr. This folder tree structure is necessary to be established prior to running the Circr software in order to detect the required files for the analysis. Please click here to view a larger version of this figure.

Figure 3: ceRNA network consisting of the circRNA-miRNA interaction. The blue oval shape represents the circRNA, while the orange triangles represent the miRNAs. The solid lines connecting the circRNA to miRNAs describe the potential miRNA sponging function of the hsa_DE_58 circRNA. Please click here to view a larger version of this figure.

Figure 4: Bubble plot of GO enrichment analysis of DE circRNA parental genes. GeneRatio on the x-axis is the number of genes in the input list associated with the given GO term dividing the total number of input genes. The dot size in the plot is represented by the count value, which is the number of genes in the input list associated with the given GO term. The bigger the size of the dots, the larger the number of input genes associated with the term. Besides, the dots in the plot are color-coded based on the p-value. P-value is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance. The individual terms are considered enriched beyond a cut-off value (p-value < 0.01). The color gradient of p-value ranging from blue to red indicates increasing enrichment of the terms. Please click here to view a larger version of this figure.

Figure 5: KEGG enrichment analysis of DE circRNA parental genes. GeneRatio on the x-axis is the number of genes in the input list associated with the given KEGG term dividing the total number of input genes. The dot size in the plot is represented by the count value, which is the number of genes in the input list associated with the given KEGG term. The bigger the size of the dots, the larger the number of input genes associated with the term. Besides, the dots in the plot are color-coded based on the p-value. P-value is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance. Individual terms are considered enriched beyond a cut-off value (p-value < 0.01). The color gradient of p-value ranging from blue to red indicates increasing enrichment of terms. Please click here to view a larger version of this figure.

Sample name	Path to CIRIquant output GTF file	Grouping
Control 1	/path/to/CIRIquant/ctrl1.gtf	C
Control 2	/path/to/CIRIquant/ctrl2.gtf	C
Infected 1	/path/to/CIRIquant/infect1.gtf	T
Infected 2	/path/to/CIRIquant/infect2.gtf	T

Table 1: The .lst file preparation of CIRIquant. The destination paths of the control and treated samples from the CIRIquant output are written in a text file to compare the expressions of circRNA between the two types of samples.

Chr	Start	End	Name	.	Strand
chr2	137428930	137433876	hsa_circ_000076	.	–
chr2	154705868	154706632	hsa_circ_000105	.	–
chr2	159104273	159106793	hsa_circ_000118	.	–
chr2	159215701	159226125	hsa_circ_000119	.	–
chr4	39980067	39980129	hsa_circ_002584	.	–

Table 2: Example BED file for Circr. Six columns (Chr, Start, End, Name, Gene, and Strand) associated with the circRNAs are required to generate the BED file.

circRNA_name	Type	miRNA_name	Type
DE_circRNA_1	circRNA	miR-001	miRNA
DE_circRNA_1	circRNA	miR-002	miRNA
DE_circRNA_2	circRNA	miR-003	miRNA
DE_circRNA_2	circRNA	miR-004	miRNA

Table 3: Cytoscape input file. Four columns (circRNA_name, Type, miRNA_name, and Type) are required to be written into a text file.

CircRNA	logFC	logCPM	LR	Pvalue	DE	FDR
chr4:17595410\|17598558	8.167934481	-0.039318634	185.5341965	3.00E-42	1	1.08E-37
chr16:18834892\|18850467	-3.955083482	-4.397235736	2.982607619	0.08416358	0	0.282478158
chr14:73198031\|73211942	2.493964729	-4.448176684	2.736442046	0.09808293	0	0.282478158

Table 4: Part of the final output (.csv) file of CIRIquant. CIRIquant delivers information such as the LogFC, log counts per million (LogCPM), logistic regression (LR), p-value, differential expression, and FDR.

CIRIquant results
Total	DE	Up	Down
35846	306	306	0

Table 5: A summary of the number of total and differentially expressed (DE) circRNAs identified. A total of 35,846 circRNAs are detected, with 306 being DE circRNAs. All the 306 DE circRNAs are upregulated (with none being downregulated) in treated samples when compared to control samples.

Custom_Name	Annotation_Status
hsa_DE_22	Non-Annotated
hsa_DE_2	Annotated
hsa_DE_58	Non-Annotated
hsa_DE_3	Annotated

Table 6: Table of custom circRNA names with annotation status. CircRNAs are queried in a database of known deposited circRNAs (circBase). If the circRNA is present within the database, it is labeled to be annotated, while the absence of the circRNA is labeled as non-annotated.

CircRNA Type	Freq	Percentage
antisense	10	3.27%
exon	263	85.95%
intergenic	16	5.23%
intron	17	5.56%

Table 7: Types of circRNAs identified. CircRNAs can be further categorized into different types of circRNAs based on their sequence region, namely, exonic, intronic, antisense, and intergenic.

Number of Parental Genes	Freq	Percentage
1	261	90%
> 1	29	10%

Table 8: Percentage of circRNAs with the different number of genes spanned. CircRNAs are commonly encoded from exons of one gene, but circRNAs spanning more than one gene can also be detected by CIRIquant.

Supplementary File 1: Scripts used in the protocol. Please click here to download this File.

Discussion

To illustrate the utility of this protocol, RNA-seq from influenza A virus-infected human macrophage cells was used as an example. CircRNAs functioning as potential miRNA sponges in host-pathogen interactions and their GO and KEGG functional enrichment within a host were investigated. Although there are a variety of circRNA tools available online, each of them is a standalone package that does not interact with one another. Here, we put together few of the tools that are required for circRNA prediction and quantification, circRNA functional enrichment, circRNA-miRNA interaction prediction, and ceRNA network construction. This streamlined protocol is time-saving and can be applied to clinical samples to detect circRNA candidates with diagnostic and prognostic values.

Essentially, we employed CIRIquant³¹, a circRNA quantification tool pre-packaged with CIRI2, which can detect and carry out DE analysis of circRNAs. DE circRNAs are filtered based on a cut-off value of LogFC > |2| and FDR < 0.05, which helps to eliminate potential false positives in downstream analyses. Characterization of DE circRNAs in terms of annotation status, circRNA types, and the number of genes spanned aid in categorizing and further filtering circRNA candidates. Subsequently, Circr³⁷, a circRNA-miRNA prediction tool, is used to predict potential miRNA sponging candidates. After predicting potential miRNAs as targets of circRNAs, a ceRNA network is drawn. Finally, based on the parental genes of circRNAs, the R clusterProfiler package³⁹ is used for functional annotation via the GO and KEGG pathway enrichment analysis. Results from GO and KEGG may help to unravel the biological mechanisms influenced by circRNAs.

To date, several different circRNA prediction tools have been developed, including CIRI2⁴³, CIRCexplorer2⁴⁴, find_circ⁴⁵, MapSplice⁴⁶, and UROBORUS⁴⁷. In a study conducted by Hansen et al., CIRI2 is reported to have a high overall performance. It is among the few circRNA detection tools that can function well in terms of de novo prediction and the reduction of false positive identification⁴⁸. CIRIquant which utilizes CIRI2 for circRNA detection and quantification was therefore used in this study. CIRIquant was used to count the back splice junction (BSJ) reads, and the count data were normalized to the reads mapped to cognate linear RNAs transcribed from the same gene loci. This allows the quantification of circRNAs in a sample. To determine the differential expression of circRNAs across experimental conditions, CIRIquant implemented a generalized linear model in edgeR⁴⁹ for DE analysis, and the exact rate-ratio test was used as a statistical test to determine the significance of the difference in the circRNA junction ratio. Although other circRNA quantification tools such as CIRCexplorer3-CLEAR⁵⁰ can be used to quantify the expression level of circRNAs, this tool only allows circRNA quantification in a sample as it counts the BSJ reads in a sample and normalizes the count data against the cognate linear RNA counts from the same sample. CIRCexplorer3-CLEAR cannot compare circRNA expressions across experimental conditions. Furthermore, no statistical analysis tool is implemented in CIRCexplorer3-CLEAR to support the quantified expression level. Although the default circRNA prediction tool implemented within CIRIquant is CIRI2, the prediction results from other tools such as find_circ and CIRCexplorer2 can also be utilized for the quantification and DE analysis³¹. In this protocol, only one circRNA prediction tool (CIRI2) was used for prediction, which might still yield false-positive circRNA candidates. To reduce false positives, one can combine other circRNA prediction tools for analysis and select common circRNAs detected among the different circRNA prediction tools⁴⁸^,⁵¹. To further improve circRNA detection, it is ideal to use RNA sequencing datasets that are both rRNA-depleted and subjected to RNase R pre-treatment.

Depending on the objective of the study, de novo and annotated DE circRNAs can be identified separately based on the circBase database⁵². However, circRNAs spanning more than one gene often require manual examination on UCSC or any other genome browser to determine the authenticity of circRNAs and eliminate false positives. Nonetheless, circRNAs that span more than one gene, such as circRNAs derived from fusion genes, have also been reported recently⁵³^,⁵⁴.

Circr works by combining three different miRNA-mRNA predicting algorithms, namely, TargetScan⁵⁵, miRanda⁵⁶, and RNAhybrid⁵⁷ to predict the circRNA-miRNA binding sites. On top of that, the algorithm also incorporates information of AGO peaks and previously validated interactions in the circRNA-miRNA analysis. Here, stringent filtering criteria were applied to allow a more reliable circRNA-miRNA prediction to be obtained, thus, further reducing false positives. However, the stringency of this filtering step could be set higher or lower depending on user preference.

ClusterProfiler is a well-documented R package that can functionally annotate gene sets across diverse organisms. Besides the functions within the R clusterProfiler package mentioned in this protocol (enrichGO and enrichKEGG), which utilize over-representation analysis, there are also other functions such as gseGO and gseKEGG that can be used. If clusterProfiler is not a suitable choice for the workflow, there are also other tools and packages such as the "AllEnricher"⁵⁸ or the website-based tools such as "Metascape"⁵⁹ that can functionally annotate a set of genes. Lastly, although the pipeline provided above helps in predicting potential circRNAs and their functional annotations, wet-lab verification will be needed to provide solid evidence.

Disclosures

The authors have nothing to disclose.

Acknowledgements

The author would like to thank Tan Ke En and Dr. Cameron Bracken for their critical review of this manuscript. This work was supported by grants from Fundamental Research Grant Scheme (FRGS/1/2020/SKK0/UM/02/15) and University of Malaya High Impact Research Grant (UM.C/625/1/HIR/MOE/CHAN/02/07).

Materials

Bedtools	GitHub	https://github.com/arq5x/bedtools2/	Referring to section 4.1.2. Needed for Circr.
BWA	Burrows-Wheeler Aligner	http://bio-bwa.sourceforge.net/	Referring to section 2.1.1 and 2.1.2. Needed to run CIRIquant, and to index the genome
Circr	GitHub	https://github.com/bicciatolab/Circr	Referring to section 4. Use to predict the miRNA binding sites
CIRIquant	GitHub	https://github.com/bioinfo-biols/CIRIquant	Referring to section 2.1.3. To predict circRNAs
Clusterprofiler	GitHub	https://github.com/YuLab-SMU/clusterProfiler	Referring to section 7. For GO and KEGG functional enrichment
CPU	Intel	Intel(R) Xeon(R) CPU E5-2620 V2 @ 2.10 GHz Cores: 6-core CPU Memory: 65 GB Graphics card: NVIDIA GK107GL (QUADRO K2000)	Specifications used to run this entire protocol.
Cytoscape	Cytoscape	https://cytoscape.org/download.html	Referring to section 5.2. Needed to plot ceRNA network
FastQC	Babraham Bioinformatics	https://www.bioinformatics.babraham.ac.uk/projects/fastqc/	Referring to section 1.2.1. Quality checking on Fastq files
HISAT2		http://daehwankimlab.github.io/hisat2/	Referring to section 2.1.1 and 2.1.2. Needed to run CIRIquant, and to index the genome
Linux	Ubuntu 20.04.5 LTS (Focal Fossa)	https://releases.ubuntu.com/focal/	Needed to run the entire protocol. Other Ubuntu versions may still be valid to carry out the protocol.
miRanda		http://www.microrna.org/microrna/getDownloads.do	Referring to section 4.1.2. Needed for Circr
Pybedtools	pybedtools 0.8.2	https://pypi.org/project/pybedtools/	Needed for BED file genomic manipulation
Python	Python 2.7 and 3.6 or abover	https://www.python.org/downloads/	To run necessary library modules
R	The Comprehensive R Archive Network	https://cran.r-project.org/	To manipulate dataframes
RNAhybrid	BiBiServ	https://bibiserv.cebitec.uni-bielefeld.de/rnahybrid	Referring to section 4.1.2. Needed for Circr
RStudio	RStudio	https://www.rstudio.com/	A workspace to run R
samtools	SAMtools	http://www.htslib.org/	Referring to section 2.1.2. Needed to run CIRIquant
StringTie	Johns Hopkins University: Center for Computational Biology	http://ccb.jhu.edu/software/stringtie/index.shtml	Referring to section 2.1.2. Needed to run CIRIquant
TargetScan	GitHub	https://github.com/nsoranzo/targetscan	Referring to section 4.1.2. Needed for Circr

References

Raman, K., Bhat, A. G., Chandra, N. A systems perspective of host-pathogen interactions: predicting disease outcome in tuberculosis. Molecular BioSystems. 6 (3), 516-530 (2010).
Casadevall, A., Pirofski, L. A. Host-pathogen interactions: basic concepts of microbial commensalism, colonization, infection, and disease. Infection and Immunity. 68 (12), 6511-6518 (2000).
Yang, E., Li, M. M. H. All About the RNA: Interferon-stimulated genes that interfere with viral RNA processes. Frontiers in Immunology. 11, 605024 (2020).
Schneider, W. M., Chevillotte, M. D., Rice, C. M. Interferon-stimulated genes: A complex web of host defenses. Annual Review of Immunology. 32 (1), 513-545 (2014).
Shirahama, S., Miki, A., Kaburaki, T., Akimitsu, N. Long non-coding RNAs involved in pathogenic infection. Frontiers in Genetics. 11, 454 (2020).
Chandan, K., Gupta, M., Sarwat, M. Role of host and pathogen-derived microRNAs in immune regulation during infectious and inflammatory diseases. Frontiers in Immunology. 10, 3081 (2019).
Chen, X., et al. Circular RNAs in immune responses and immune diseases. Theranostics. 9 (2), 588-607 (2019).
Kristensen, L. S., et al. The biogenesis, biology and characterization of circular RNAs. Nature Reviews Genetics. 20 (11), 675-691 (2019).
Ashwal-Fluss, R., et al. circRNA biogenesis competes with pre-mRNA splicing. Molecular Cell. 56 (1), 55-66 (2014).
Conn, S. J., et al. The RNA binding protein quaking regulates formation of circRNAs. Cell. 160 (6), 1125-1134 (2015).
Zhang, X. O., et al. Complementary sequence-mediated exon circularization. Cell. 159 (1), 134-147 (2014).
Robic, A., Demars, J., Kuhn, C. In-depth analysis reveals production of circular RNAs from non-coding sequences. Cells. 9 (8), 1806 (2020).
Eger, N., Schoppe, L., Schuster, S., Laufs, U., Boeckel, J. N. Circular RNA splicing. Advances in Experimental Medicine and Biology. 1087, 41-52 (2018).
Barrett, S. P., Wang, P. L., Salzman, J. Circular RNA biogenesis can proceed through an exon-containing lariat precursor. eLife. 4, 07540 (2015).
Memczak, S., et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 495 (7441), 333-338 (2013).
Misir, S., Wu, N., Yang, B. B. Specific expression and functions of circular RNAs. Cell Death and Differentiation. 29 (3), 481-491 (2022).
Bai, S., et al. Construct a circRNA/miRNA/mRNA regulatory network to explore potential pathogenesis and therapy options of clear cell renal cell carcinoma. Scientific Reports. 10 (1), 13659 (2020).
Sakshi, S., Jayasuriya, R., Ganesan, K., Xu, B., Ramkumar, K. M. Role of circRNA-miRNA-mRNA interaction network in diabetes and its associated complications. Molecular Therapy – Nucleic Acids. 26, 1291-1302 (2021).
Hansen, T. B., et al. miRNA-dependent gene silencing involving Ago2-mediated cleavage of a circular antisense RNA. The EMBO Journal. 30 (21), 4414-4422 (2011).
Lu, M. Circular RNA: functions, applications, and prospects. ExRNA. 2 (1), 15 (2020).
Liu, K. S., Pan, F., Mao, X. D., Liu, C., Chen, Y. J. Biological functions of circular RNAs and their roles in occurrence of reproduction and gynecological diseases. American Journal of Translational Research. 11 (1), 1-15 (2019).
Pamudurti, N. R., et al. Translation of CircRNAs. Molecular Cell. 66 (1), 9-21 (2017).
Legnini, I., et al. Circ-ZNF609 Is a circular RNA that can be translated and functions in myogenesis. Molecular Cell. 66 (1), 22-37 (2017).
Weigelt, C. M., et al. An insulin-sensitive circular RNA that regulates lifespan in Drosophila. Molecular Cell. 79 (2), 268-279 (2020).
Guo, Y., et al. Identification and characterization of circular RNAs in the A549 cells following Influenza A virus infection. Veterinary Microbiology. 267, 109390 (2022).
Qu, Z., et al. A novel intronic circular RNA antagonizes influenza virus by absorbing a microRNA that degrades CREBBP and accelerating IFN-β production. mBio. 12 (4), 0101721 (2021).
Kawarada, Y., et al. TGF-β induces p53/Smads complex formation in the PAI-1 promoter to activate transcription. Scientific Reports. 6 (1), 35483 (2016).
Yu, T., et al. Circular RNA GATAD2A promotes H1N1 replication through inhibiting autophagy. Veterinary Microbiology. 231, 238-245 (2019).
FastQC: A quality control tool for high throughput sequence data. Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010)
Bolger, A. M., Lohse, M., Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30 (15), 2114-2120 (2014).
Zhang, J., Chen, S., Yang, J., Zhao, F. Accurate quantification of circular RNAs identifies extensive circular isoform switching events. Nature Communications. 11 (1), 90 (2020).
Li, H., Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 26 (5), 589-595 (2010).
Kim, D., Paggi, J. M., Park, C., Bennett, C., Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology. 37 (8), 907-915 (2019).
Pertea, M., et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology. 33 (3), 290-295 (2015).
Li, H., et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25 (16), 2078-2079 (2009).
Wang, L., Wang, S., Li, W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 28 (16), 2184-2185 (2012).
Dori, M., Caroli, J., Forcato, M. Circr, a computational tool to identify miRNA:circRNA associations. Frontiers in Bioinformatics. 2, 852834 (2022).
Shannon, P., et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 13 (11), 2498-2504 (2003).
Wu, T., et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2 (3), 100141 (2021).
Yu, G., Wang, L. G., Han, Y., He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 16 (5), 284-287 (2012).
. org.Hs.eg.db: Genome wide annotation for human. 2022. R package version 3.15.0 Available from: https://bioconductor.org/packages/release/data/annotation/html/org.Hs.eg.db.html (2022)
Barrett, T., et al. NCBI GEO: archive for functional genomics data sets-update. Nucleic Acids Research. 41, 991-995 (2012).
Gao, Y., Zhang, J., Zhao, F. Circular RNA identification based on multiple seed matching. Briefings in Bioinformatics. 19 (5), 803-810 (2018).
Zhang, X. O., et al. Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Research. 26 (9), 1277-1287 (2016).
Memczak, S., et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 495 (7441), 333-338 (2013).
Wang, K., et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Research. 38 (18), 178 (2010).
Song, X., et al. Circular RNA profile in gliomas revealed by identification tool UROBORUS. Nucleic Acids Research. 44 (9), 87 (2016).
Hansen, T. B. Improved circRNA identification by combining prediction algorithms. Frontiers in Cell and Developmental Biology. 6, 20 (2018).
Robinson, M. D., McCarthy, D. J., Smyth, G. K. edgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 26 (1), 139-140 (2010).
Ma, X. K., et al. CIRCexplorer3: A CLEAR pipeline for direct comparison of circular and linear RNA expression. Genomics Proteomics Bioinformatics. 17 (5), 511-521 (2019).
Gaffo, E., Buratin, A., Dal Molin, A., Bortoluzzi, S. Sensitive, reliable and robust circRNA detection from RNA-seq with CirComPara2. Briefings in Bioinformatics. 23 (1), (2022).
Glažar, P., Papavasileiou, P., Rajewsky, N. circBase: a database for circular RNAs. RNA. 20 (11), 1666-1670 (2014).
Tan, S., et al. Circular RNA F-circEA-2a derived from EML4-ALK fusion gene promotes cell migration and invasion in non-small cell lung cancer. Molecular Cancer. 17 (1), 138 (2018).
Guarnerio, J., et al. Oncogenic role of Fusion-circRNAs Derived from cancer-associated chromosomal translocations. Cell. 165 (2), 289-302 (2016).
McGeary, S. E., et al. The biochemical basis of microRNA targeting efficacy. Science. 366 (6472), (2019).
Enright, A. J., et al. MicroRNA targets in Drosophila. Genome Biology. 5 (1), 1 (2003).
Rehmsmeier, M., Steffen, P., Hochsmann, M., Giegerich, R. Fast and effective prediction of microRNA/target duplexes. RNA. 10 (10), 1507-1517 (2004).
Zhang, D., et al. AllEnricher: a comprehensive gene set function enrichment tool for both model and non-model species. BMC Bioinformatics. 21 (1), 106 (2020).
Zhou, Y., et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications. 10 (1), 1523 (2019).

Play Video

PDF

DOI

DOWNLOAD MATERIALS LIST

Cite This Article

Ealam Selvan, M., Lim, K. S., Teo, C. H., Lim, Y. In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions. J. Vis. Exp. (188), e64565, doi:10.3791/64565 (2022).