A thorough understanding of the epigenome is required to realize the full potential of human genome sequencing in providing new biological insights [8]. Currently, online epigenomic datasets can be searched only by their description and title (i.e., metadata) [1], which severely limits the types of search that can be performed on epigenomic data. Pattern-based search tools are essential for exploring the relationships between different epigenomic marks, which may lead to new biological insights. GeNemo, which searches by the content of the data rather than by metadata, is the first service of its kind to compare patterns in epigenomic data from public repositories such as the ENCODE database with a user-generated or downloaded dataset [5]. This makes an epigenomic search tool widely accessible to researchers around the world, just as text-based sequence search tools became widely available in the 1990s. At present, GeNemo is the only online pattern-based search tool for epigenomic data.
One potential use of GeNemo is to search for histone modifications and other epigenetic marks that co-occur with the transcription factor E2F6 in human embryonic stem cells (an example E2F6 binding signal file is available at the ENCODE data portal or at https://sysbio.ucsd.edu/public/xcao3/ENCODESample/ENCFF001UBC.bed). When this file is used as the query against all ENCODE datasets in H1-hESC, GeNemo shows that the E2F6 binding signal is heavily enriched with H3K4me1, H3K4me2, H3K4me3, and H3K27me3, which agrees with existing research showing that E2F6 regulates some genes via methylation of H3K27 [9]. In addition, E2F6 binding sites appear to colocalize with those of CtBP2, which is known to interact with E2F7, a factor in the same family [10]. GeNemo makes it fairly easy to obtain such genome-wide results against the large number of epigenetic marks, transcription factor binding signals, and other signals included in ENCODE, and thereby to identify potential targets for further analysis.
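To make the notion of colocalization above concrete, the following minimal sketch computes the fraction of query intervals (for example, E2F6 peaks) that overlap at least one interval in a second track. It is only an illustration: the function name `fraction_overlapping` and the tuple layout are assumptions made here, and this naive overlap count is not GeNemo's search algorithm, which is not described in this section.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_overlapping(query, target):
    """Fraction of query intervals that overlap at least one target interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates,
    as in BED. This is an illustrative overlap count only.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in target:
        by_chrom[chrom].append((start, end))

    # Per chromosome: sorted starts plus the running maximum of the ends.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        running_max_end = []
        current = 0
        for _, e in intervals:
            current = max(current, e)
            running_max_end.append(current)
        index[chrom] = (starts, running_max_end)

    hits = 0
    for chrom, start, end in query:
        if chrom not in index:
            continue
        starts, running_max_end = index[chrom]
        # Target intervals that begin before the query interval ends.
        k = bisect_left(starts, end)
        # One of them overlaps if its end lies beyond the query start.
        if k > 0 and running_max_end[k - 1] > start:
            hits += 1
    return hits / len(query) if query else 0.0
```

A high fraction for a given pair of tracks would only suggest that the two signals tend to occupy the same regions; it says nothing about statistical significance.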
Since GeNemo was first published [5] as a web-based epigenomic data search tool, its Results section has been updated to match the appearance of the front page. The old Results section closely mirrored the results section of the UCSC genome browser and depended largely on the remote UCSC server for display. With the new interface, GeNemo is more user-friendly and no longer depends on the UCSC genome server for display (although data are still fetched remotely). This makes GeNemo more robust and less susceptible to problems caused by code changes at the UCSC server. Furthermore, the new, faster Polymer-based interface gives the user more tools to visualize and analyze patterns in the data.
Critical steps include providing an appropriate input file and selecting the data tracks to search against. Users are strongly encouraged to experiment with the various track selection functions to become familiar with the selection process and with how different commands can be combined to achieve the intended outcome. In particular, note that the "Add" function is required to add the selected tracks to the query, while "Filter" or "Exclude" can be used as the logic gate commands "AND" and "OR", respectively. The "Update" function is required to apply all selections before running the search. When no results are returned, the user may check the input data file, search more tracks, or increase the search range. Whenever an error occurs, a pop-up window describes it. Some error messages are ambiguous, however. For example, the message 'no file was uploaded' can mean either that no file was uploaded or that the uploaded file was not in an acceptable format and therefore could not be read correctly. Acceptable formats are BED and peaks-format files for both upload methods, and bigWig for upload by online link only. Zipped versions of these file formats are also acceptable.
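Because a malformed file can trigger the ambiguous 'no file was uploaded' message, it may help to sanity-check a BED file locally before uploading it. The sketch below is not part of GeNemo; the helper names `open_maybe_gzip` and `check_bed_lines` are hypothetical, the checks cover only the three mandatory BED columns, and "zipped" is assumed here to mean gzip compression.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_bed_lines(lines, max_report=5):
    """Return (line_number, message) pairs for problems in BED-like text.

    Only the first three columns (chrom, start, end) are checked.
    """
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, comments, and browser/track header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            problems.append((n, "fewer than 3 columns"))
        elif not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append((n, "start/end are not non-negative integers"))
        elif int(fields[1]) > int(fields[2]):
            problems.append((n, "start is greater than end"))
        if len(problems) >= max_report:
            break
    return problems
```

A typical use would be `with open_maybe_gzip("ENCFF001UBC.bed") as fh: print(check_bed_lines(fh))`, where an empty list means that none of these basic problems were found.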
Current limitations of this approach include the algorithms and functions employed in GeNemo, which have yet to be optimized. GeNemo cannot yet provide guidance on interpreting the datasets it returns. This task is left to the user and requires significant knowledge of, and expertise in, the biology of the genome and epigenome. In addition, users cannot currently change the sensitivity and noise level of the searches. We expect to continue improving and expanding GeNemo's pattern-searching capabilities and dataset collection in the future.