Research Article
Jooa Kwon1,2, George Z. He3,4, Mirana Ramialison1,2,3,4,5, Hieu T. Nim1,2,3,4
1Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, 2Australian Regenerative Medicine Institute, Monash University, 3Stem Cell Medicine Department, Murdoch Children's Research Institute, The Royal Children's Hospital, 4The Novo Nordisk Foundation Center for Stem Cell Medicine, reNEW Melbourne, Murdoch Children's Research Institute, 5Systems Biology Institute (SBI) Australia
Erratum Notice
Important: There has been an erratum issued for this article. View Erratum Notice
We present a coding-free workflow for biologists to identify tissue-specific gene enhancers using only browser-based tools. Our protocol leverages public H3K4me1/H3K27ac histone marks and Hi-C data, enabling researchers without programming expertise to access, analyse, and identify potential regulatory elements associated with their genes of interest.
Enhancers are DNA regions that regulate gene expression. Mutations within enhancers can result in abnormal gene regulation leading to disease. Therefore, identifying enhancers that regulate gene activity in specific tissues is crucial for understanding the genetic basis of disease. However, enhancers are difficult to identify as they do not encode proteins. While numerous enhancer repositories and identification tools are available, the complexity of these tools can present a challenge for biologists. To help biologists use these resources, we present a biologist-friendly protocol (https://github.com/Ramialison-Lab/EnhancerWorkflow) that leverages existing web-based genomics data, such as H3K4me1 and H3K27ac histone marks and chromatin conformation capture (Hi-C) data, to discover enhancers associated with a gene of interest (GoI) in a target tissue where the enhancer is active. This protocol is entirely web-based and does not require programming skills from end-users. We demonstrated the utility of this approach by characterising candidate enhancers regulating TBX5, a gene critical for heart development. This protocol facilitates the identification of enhancers associated with this gene in the left ventricle.
Enhancers are non-coding DNA regions that regulate gene transcription, development, and cellular differentiation1,2. Mutation of enhancers can lead to various diseases, including developmental disorders, cancers, and other genetic conditions3,4,5,6. Therefore, understanding enhancers is paramount to understanding gene expression, mutation, and diseases.
To understand how enhancers interact with their target genes, it is important to identify their locations within the genome. However, identifying the enhancer locations is not always straightforward as enhancers can be located both close to the transcriptional start site (TSS) and much further away, spanning tens to hundreds of kilobases2,7,8,9.
Despite their unpredictable genomic location, enhancers show distinct biochemical and structural signatures, allowing them to be systematically traced. Generally, enhancers tend to be enriched in intergenic and intronic regions, with a small number found within exons2,8. They are often marked by specific histone modifications and transcription factor binding, which define their regulatory roles and determine their spatio-temporal activity across different developmental stages and tissues10,11.
ChIP-seq is used to identify transcription factor binding sites (TFBSs) and histone modification marks, such as H3K4me1, the hallmark of enhancers; H3K27ac, which marks active enhancers; and H3K4me3, which is enriched at promoter regions1,12,13,14,15. Chromatin conformation capture (3C) techniques and their derivatives, such as 4C, 5C, Hi-C, and ChIA-PET, are used to map physical interactions between distant genomic regions. While 3C targets specific interactions in specific tissues, Hi-C offers a genome-wide view of chromatin architecture across cell types16,17.
In addition to the current methods, specialised approaches have been developed to characterise enhancers, including pooled enhancer databases, such as EnhancerAtlas or EnhancerFinder18,19. However, these tools often require researchers to integrate multiple datasets to investigate multiple enhancers across many tissues, which can be overwhelming for biologists without experience in bioinformatics and data mining.
Here, we describe a user-friendly protocol to select enhancers, entirely based on existing web tools. This allows researchers to query a gene of interest (GoI) and retrieve corresponding enhancers. The protocol selects enhancers based on a specific set of criteria: histone modifications, chromatin interactions, and tissue specificity1,12,13,14,15,16,17,20,21. Enhancers found within introns are more likely to show tissue-specific activity compared to intergenic enhancers, which are positioned in the genomic regions between genes22. To ensure comprehensive coverage of potential active enhancers, we defined the search range as the region between the two genes flanking the GoI, which increases the likelihood of capturing regulatory elements located outside of gene bodies. We used an enhancer-specific epigenetic hallmark, H3K4me1, and an active enhancer mark, H3K27ac, to list enhancer candidates. These candidates were then refined based on Hi-C data, retaining enhancers with physical interactions with the corresponding promoter. This protocol is designed to guide biologists through the process of enhancer identification using only publicly available web-based tools. By integrating epigenetic and chromatin interaction data, the approach described here offers a practical framework to generate hypotheses about potential enhancers for further experimental validation.
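Although the protocol itself requires no coding, the first selection criterion (regions carrying both H3K4me1 and H3K27ac) can be illustrated with a short sketch for readers who later wish to script it. The function name `intersect_marks` and the peak coordinates below are hypothetical placeholders, not data from this study, and taking the intersection of the two peaks is only one possible reading of "overlap"; the protocol performs this step visually in the genome browser.

```python
from typing import List, Tuple

# (start, end) on one chromosome, 1-based inclusive (assumed convention)
Interval = Tuple[int, int]

def intersect_marks(h3k4me1: List[Interval], h3k27ac: List[Interval]) -> List[Interval]:
    """Return the sub-regions covered by both an H3K4me1 and an H3K27ac peak."""
    shared = []
    for a_start, a_end in h3k4me1:
        for b_start, b_end in h3k27ac:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start <= end:  # the two peaks share at least one base
                shared.append((start, end))
    return sorted(shared)

# Hypothetical peak coordinates, for illustration only
h3k4me1_peaks = [(100, 500), (900, 1200), (2000, 2600)]
h3k27ac_peaks = [(300, 700), (1500, 1800), (2500, 3000)]

candidates = intersect_marks(h3k4me1_peaks, h3k27ac_peaks)
print(candidates)  # [(300, 500), (2500, 2600)]
```

The nested loop is quadratic in the number of peaks, which is adequate for a single locus with a few dozen peaks.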
NOTE: A step-by-step walkthrough is available at https://github.com/Ramialison-Lab/EnhancerWorkflow. Data used in the protocol is summarized in Table 1 and Table 2. Troubleshooting is available in Supplementary File 1.
1. Locating GoI (Figure 1)
2. Defining enhancer detection region (Figure 2)
3. Histone mark analysis (Figure 3)
4. Chromatin conformation capture (Hi-C) analysis (Figure 4)
To illustrate the use of the presented protocol, we investigated the TBX5 gene in the human heart, exploring TBX5-associated enhancers using the comprehensive workflow involving H3K4me1, H3K27ac, and Hi-C data. TBX5 is a gene that contributes to limb and heart development, including the formation of the four chambers and septum separation24. Mutation in this gene is a main cause of Holt-Oram syndrome (HOS), which causes limb abnormalities and congenital heart disease (CHD), including septum defects24. The mutation of TBX5-associated cardiac enhancers may critically influence CHD24. A previous study discovered three known TBX5 enhancers in human cardiac-specific tissue - namely "Enhancer 2", "Enhancer 9", and "Enhancer 16" (Supplementary File 2), which were demonstrated to have comparative phenotypes in transgenic mice23.
We investigated H3K4me1- and H3K27ac-enriched regions between RBM19 and TBX3, the two genes flanking TBX5 downstream and upstream in human, to retrieve putative enhancers at the TBX5 locus (Figure 1 and Figure 2). To identify heart-specific enhancers, cardiac muscle cells were chosen. The putative TBX5 cardiac enhancer regions were retrieved as coordinates (chr12: start-end), and 22 regions associated with both H3K4me1 and H3K27ac were identified (Figure 3 and Supplementary File 3). Putative TBX5 cardiac enhancers were retrieved from the EnsEMBL genomic database and cross-referenced against Hi-C data held in the 4DNucleome database (Figure 4). This was done to gauge possible interactions between potential enhancers and the TBX5 cardiac promoter. Following the protocol described here, 21 out of 22 genomic regions were confirmed to interact with the TBX5 promoter (chr12: 114400143-114410103) in cardiac muscle cells (Supplementary File 4). One region had no physical interaction with the promoter (Figure 4, step 4.8). Finally, we compared the output of this protocol with the biologically validated enhancers and with the current gold-standard database of heart enhancers, the VISTA Cardiac Enhancers Browser, which revealed additional enhancers not currently captured by the database25.
We performed a cross-comparison of the 21 TBX5 enhancers retrieved by the protocol presented here with existing databases. We retrieved 4 TBX5 enhancers from the VISTA Cardiac Enhancer Browser (Supplementary File 5)25. Of the 4 VISTA-identified cardiac enhancers, 3 enhancers (hs2329, mm1282, and mm370) overlapped with the regions identified by this web-based enhancer detection protocol (Figure 5). The predicted enhancers also shared genomic regions with two previously experimentally validated enhancers from Smemo et al.23, Enhancer 2 (chr12:114025907-114026275, GRCh38) and Enhancer 16 (chr12:114415466-114420433, GRCh38), while they did not overlap with Enhancer 9 (chr12:114263402-114266886, GRCh38). One of the VISTA-identified enhancers, hs498, did not overlap with any enhancer predicted by this protocol or with Smemo et al.'s experimentally validated enhancers23, even though the region showed partial overlap with H3K4me1 marks (Figure 5). Similarly, Enhancer 9 did not overlap with the enhancers predicted by this pipeline but was associated with H3K4me1 marks (Figure 5).
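The cross-comparison above amounts to an interval overlap test on chr12. As a minimal sketch, the three validated enhancer coordinates below are taken from the text (GRCh38), whereas the candidate names and coordinates (`cand_A`, `cand_B`, `cand_C`) are hypothetical placeholders chosen only to exercise the function; they are not the regions in Supplementary File 4.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) on chr12, inclusive

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

# Experimentally validated enhancers, coordinates as quoted in the text (GRCh38)
validated: Dict[str, Interval] = {
    "Enhancer 2": (114025907, 114026275),
    "Enhancer 9": (114263402, 114266886),
    "Enhancer 16": (114415466, 114420433),
}

# Hypothetical candidate regions (placeholders, not results from this study)
candidates: Dict[str, Interval] = {
    "cand_A": (114025000, 114027000),
    "cand_B": (114300000, 114301000),
    "cand_C": (114418000, 114421000),
}

# For each candidate, list the validated enhancers it overlaps
hits: Dict[str, List[str]] = {
    name: [v for v, iv in validated.items() if overlaps(region, iv)]
    for name, region in candidates.items()
}
for name, matched in hits.items():
    print(name, matched)
```

The same test can be run against the VISTA coordinates in Supplementary File 5 by replacing the `validated` dictionary.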

Figure 1: Step-by-step guide to locate the GoI in the EnsEMBL Genome Browser. The user first opens the EnsEMBL homepage (1.1), selects the species (Human), and enters the gene into the search bar (1.2-1.3). From the list of results, the appropriate gene ID is selected (1.4), which opens the gene summary page. The user then clicks the Region in Detail hyperlink (1.5) to visualize the genomic region surrounding the GoI, including neighboring elements and regulatory features. Please click here to view a larger version of this figure.

Figure 2: Defining the enhancer detection region around the GoI using the EnsEMBL Genome Browser. To define the enhancer detection region, identify the two neighboring genes flanking the GoI using the Basic Gene Annotations from GENCODE track, where genes are shown as dark yellow blocks labelled with merged EnsEMBL/Havana annotations. The transcriptional direction of each gene is indicated by arrowheads (< or >) next to the gene name (2.1). To select the intergenic region between the neighboring genes, click and drag across the region of interest, then choose Jump to region in the pop-up box to zoom in (2.2). To add regulatory or enhancer-related annotations, click Add/remove tracks (2.3). Please click here to view a larger version of this figure.

Figure 3: Configuration of histone modification tracks in the enhancer detection region using the EnsEMBL Genome Browser. In the left tool bar, click Configure this page (3.1) to access the track configuration panel and navigate to "Activity by Cell/Tissue" under the Regulation section (3.2). In the opened tab, select the "Experiments" section (3.3), and use the Cell/Tissue search bar to locate and select your tissue of interest (cardiac muscle cell) (3.4). In the histone mark panel (3.5), enable H3K4me1 and H3K27ac as active enhancer marks and H3K4me3 as a promoter mark, then click "Configure track display" (3.6). After confirming the track selections, click "View tracks" (3.7) to return to the genome viewer. Histone mark peaks are now shown in the detection region (3.8) as coloured blocks under the corresponding tissue label (yellow: H3K4me1, blue: H3K27ac, and orange: H3K4me3). A "Hists & Pols" pop-up appears after clicking on the coloured elements in the track. The pop-up contains the genomic coordinates of the region in base pairs (chr:start-end; e.g., chr12:11443450-114451611 for the promoter region), which can be copied and saved for downstream analysis (3.8). Likewise, to extract candidate enhancers, prioritize regions where H3K4me1 and H3K27ac peaks overlap, as shown by vertical alignment of peaks and boxes across tracks (3.9). Overlapping regions can be selected directly by clicking their boxes or by manually click-dragging across the aligned peaks to define a region (e.g., chr12:114400143-114410103 for an active candidate region). Coordinates shown in the pop-up should be saved in BED format for downstream validation or visualization. Please click here to view a larger version of this figure.
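For users who prefer to script the final step of Figure 3, the sketch below converts a copied `chr:start-end` string into a tab-separated BED line. It assumes the browser coordinates are 1-based and inclusive, so the start is shifted by one to match BED's 0-based, half-open convention; the function name `region_to_bed` and the label `candidate` are hypothetical.

```python
import re

def region_to_bed(region: str, name: str = ".") -> str:
    """Convert 'chr12:114400143-114410103' into a 4-column BED line.

    Assumes the input is 1-based inclusive; BED is 0-based, half-open,
    so only the start coordinate is decremented.
    """
    m = re.fullmatch(r"(?:chr)?(\w+):\s*([\d,]+)-([\d,]+)", region.strip())
    if m is None:
        raise ValueError(f"Unrecognised region string: {region!r}")
    chrom = "chr" + m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start < 1 or start > end:
        raise ValueError(f"Invalid coordinates in {region!r}")
    return f"{chrom}\t{start - 1}\t{end}\t{name}"

# Example using the candidate region quoted in the caption
bed_line = region_to_bed("chr12:114400143-114410103", name="candidate")
print(bed_line)
```

Writing one such line per region to a text file with a `.bed` extension gives a file that most genome browsers accept as a custom track.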

Figure 4: Visualisation of promoter-enhancer chromatin interactions using Hi-C heatmaps from the 4D Nucleome Data Portal. The 4D Nucleome Data Portal homepage displays a stacked bar chart summarizing available experiment types by organism. The "in situ Hi-C" dataset for human samples is selected by clicking the corresponding section of the bar (4.1). A filtered list of relevant datasets is displayed; a Hi-C dataset derived from H9 cells differentiated into cardiac myoblasts is selected (4.2). The selected dataset (4.3) is opened in the HiGlass browser via the Explore Data button (4.4). The genomic region of interest is entered in the coordinate box (4.5), and the contact matrix is rendered as a colour-scaled heatmap. Darker colours (deep red to black) indicate stronger chromatin contact frequency, while lighter colours (white to orange) represent weaker interactions. A horizontal rule is placed at the promoter coordinate, and vertical rules are drawn at the positions of three experimentally validated control enhancers (4.6). These intersections are used to define a strict interaction threshold, set by the strongest visible signal (darkest colour) among the promoter-enhancer contacts (4.7). Additional vertical rules are drawn at the locations of candidate H3K27ac and H3K4me1-marked enhancers (from step 3.8). Candidates whose promoter-enhancer intersections are equal to or darker than the threshold are retained, while those with weaker signals (lighter color squares) are excluded (4.8). Retained coordinates are extracted manually and saved in BED format for downstream analyses. (a. Enhancer 2, b. Enhancer 9, and c. Enhancer 16) Please click here to view a larger version of this figure.
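The retention rule in steps 4.7 and 4.8 is applied by eye on the HiGlass heatmap. If contact values for each promoter-enhancer intersection were available as numbers, the same rule could be sketched as below. This is an assumption-laden illustration: the function `filter_by_contact`, the region labels, and all contact values are hypothetical, and the protocol itself does not export or compute these values.

```python
from typing import Dict, Tuple

def filter_by_contact(
    control_contacts: Dict[str, float],
    candidate_contacts: Dict[str, float],
) -> Tuple[float, Dict[str, float]]:
    """Keep candidates whose promoter contact is at least as strong as the threshold.

    The threshold is the strongest contact observed among the validated
    control enhancers, mirroring the strict rule described in step 4.7.
    """
    threshold = max(control_contacts.values())
    kept = {name: v for name, v in candidate_contacts.items() if v >= threshold}
    return threshold, kept

# Hypothetical contact frequencies (arbitrary units), for illustration only
controls = {"Enhancer 2": 0.8, "Enhancer 9": 1.1, "Enhancer 16": 1.5}
candidate_values = {"region_1": 2.0, "region_2": 1.5, "region_3": 0.9}

threshold, retained = filter_by_contact(controls, candidate_values)
print(threshold, sorted(retained))
```

A less strict variant could use the weakest control contact as the threshold; which choice is appropriate depends on how many false positives the user is willing to carry into validation.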

Figure 5: Genomic browser view of predicted TBX5 enhancers compared with VISTA-validated cardiac enhancers and control enhancers. Genome browser snapshots display the enhancer searching range (STEP 2) comparing predicted enhancers retrieved by the web-based protocol (bottom) with VISTA-validated enhancers (top) and experimentally validated control enhancers (center). The main panel shows the full genomic locus with annotated regulatory elements, including cardiac muscle cell-specific H3K4me1 (yellow), H3K27ac (blue), and H3K4me3 peaks (orange). Three zoomed-in figures capture the alignment between the retrieved by the protocol, VISTA, and control enhancers. The overlap with control enhancers is outlined with red boxes. Coordinates for each sub-region are displayed in the lower browser panels. Please click here to view a larger version of this figure.
Table 1: Data used in the study. Please click here to download this Table.
Table 2: Web-based tools used in the study. Please click here to download this Table.
Supplementary File 1: Troubleshooting instructions for the EnsEMBL Genome Browser. Please click here to download this File.
Supplementary File 2: A BED file (GRCh38) of the experimentally validated TBX5 cardiac control enhancers23. Please click here to download this File.
Supplementary File 3: A BED file (GRCh38) of the TBX5 cardiac enhancers retrieved by STEP 3 from EnsEMBL. Please click here to download this File.
Supplementary File 4: A BED file (GRCh38) of the TBX5 cardiac enhancers retrieved by STEP 4 from EnsEMBL. Please click here to download this File.
Supplementary File 5: A BED file (GRCh38) of the TBX5 cardiac enhancers retrieved from the VISTA Cardiac Enhancer Browser25. Please click here to download this File.
The web-based protocol described here functions as a workflow for enhancer retrieval, rather than enhancer prediction. By leveraging publicly available tissue-specific datasets, histone modifications (H3K4me1 and H3K27ac), and Hi-C interaction data, it narrows down potential enhancers associated with the GoI. Unlike computational prediction tools that depend on machine learning or sequence-based models, our approach retrieves enhancers based only on experimental data. Troubleshooting instructions for the EnsEMBL genome browser are provided in Supplementary File 1.
This workflow integrates key elements such as histone marks and chromatin interaction data, similar to advanced methods like ChIA-PET and PLAC-seq, which map enhancer-promoter interactions with higher accuracy10,11. However, the workflow presented here is beneficial when such high-resolution experimental techniques are not feasible, as it is less resource-intensive and saves significant time.
The primary limitation of the approach presented above is its dependence on the availability and quality of existing datasets, which may affect the precision of the retrieved interactions. Investigating enhancer activity during cardiac development in the context of TBX5-associated genetic mutations requires tissue-specific resolution. For such analysis, embryonic cardiac tissue would be the most appropriate given its relevance to developmental regulation. However, no embryonic dataset that included histone modification data was publicly available at the time of analysis. To account for this, integrating alternative resources such as ENCODE, Enhancer Atlas 2.0, or EnhancerFinder can expand the pipeline's utility by providing additional datasets for enhancer identification and validation18,19.
In the TBX5 example, the histone mark analysis revealed 22 putative enhancers as a starting point for further investigation. The subsequent Hi-C analysis showed that 21 out of the 22 enhancer candidates identified from histone modification marks interacted with the TBX5 promoter in cardiac muscle cells (Figure 5). This supports the reliability of the histone modification mark-based approach in predicting regions of interest.
We chose not to prioritise sequence conservation across species in this protocol, although it is a common criterion for identifying enhancers, because it has previously proved less effective for tissue- or species-specific enhancers, many of which are not strongly conserved during evolution26. Given this, we opted to focus on chromatin-based markers that are more directly indicative of functional enhancer activity. However, sequence conservation may still hold value in specific contexts, such as studying enhancers with evolutionary significance. In this case, it could be added as an optional step for users interested in conserved regulatory elements.
The protocol proved effective in the retrieval of 21 heart-specific enhancers for the TBX5 gene from the EnsEMBL website, which had previously eluded identification by an existing platform, the VISTA Cardiac Enhancers Browser. Although the protocol described here could not retrieve one of the VISTA enhancers, hs498, which suggests a possible limitation in detection, the approach uncovered some enhancers not detected by the VISTA Cardiac Enhancers Browser. However, further validation of the retrieved enhancers is necessary, as the protocol yields a larger number of putative regions than the curated VISTA database. This higher count increases the risk of false positives, and a larger number of predicted enhancers does not necessarily indicate improved specificity or functional relevance. Incorporating additional experimental datasets or functional assays such as gene expression analysis, CRISPR perturbation, or reporter assays will be crucial to confirm the biological validity of these candidates, as performed in Smemo et al.'s study23.
The cross-comparison with three experimentally validated enhancers showed partial overlap with the predicted regions (Figure 5). The predicted enhancer "e1" was more broadly positioned than validated Enhancer 2, while "e18" showed partial overlap with Enhancer 16 (Figure 5). These results suggest that this approach successfully identifies regions with known regulatory activity, although the broader span of the predicted enhancers may reflect the flexibility in enhancer boundaries. Enhancers often function as modular elements, and their activity can depend on chromatin context, cell type, and developmental timing2,27. Therefore, predicted regions may include the core active sites together with adjacent sequences that contribute to regulatory function, although experimental validation is needed to determine which portions of these broader predictions are functionally active in a tissue-specific or developmental context. While the VISTA Cardiac Enhancer Browser identified four regions within the defined range, only one enhancer, mm370, partially overlapped with an experimentally validated enhancer, Enhancer 16, specifically in the region marked by H3K4me1 (Figure 5)23. The remaining portion of mm370, which does not overlap with Enhancer 16, may indicate a non-functional or inactive subregion of the enhancer28,29.
The authors report no competing interests.
We thank the members of the Ramialison lab (Transcriptomics and Bioinformatics, reNEW Bioinformatics Hub) for helpful discussions. MR and HTN are supported by an NHMRC Ideas Grant (APP1180905). We thank Richard Saffery for the support. MR is funded by a Heart Foundation Future Leader Fellowship (107328). Additional infrastructure funding to the Murdoch Children's Research Institute was provided by the Australian Government National Health and Medical Research Council Independent Research Institute Infrastructure Support Scheme. The Australian Regenerative Medicine Institute is supported by grants from the State Government of Victoria and the Australian Government. The Novo Nordisk Foundation Center for Stem Cell Medicine is supported by Novo Nordisk Foundation grants (NNF21CC0073729).
| Computer workstation | N/A | N/A | Web-browser-capable computer, Windows/Mac/Linux operating system |
| 4DN data portal | 4DN data portal | https://data.4dnucleome.org/ | |
| Galaxy | Galaxy | https://usegalaxy.org/published/history?id=aff5db4e07064445 | |
| Github | Github | https://github.com/Ramialison-Lab/EnhancerWorkflow | |
| VISTA | VISTA Cardiac Enhancer Browser | https://portal.nersc.gov/dna/RD/heart/ | |