Long non-coding RNAs (lncRNAs) perform diverse functions in numerous important biological processes and are implicated in many human diseases. In this report we present lncRNAWiki (http://lncrna.big.ac.cn), a wiki-based platform that is open-content and publicly editable and aimed at community-based curation and collection of information on human lncRNAs. Current related databases depend primarily on curation by experts, making it laborious to annotate the exponentially accumulating information on lncRNAs, which inevitably requires collective efforts in community-based curation. Unlike existing databases, lncRNAWiki features comprehensive integration of information on human lncRNAs obtained from multiple resources and allows not only existing lncRNAs to be edited, updated and curated by different users but also the addition of newly identified lncRNAs by any user. It harnesses collective community knowledge in collecting, editing and annotating human lncRNAs and rewards community curation efforts by providing explicit authorship based on quantified contributions. LncRNAWiki relies on the underlying knowledge of the scientific community for collective and collaborative curation of human lncRNAs and thus has the potential to serve as an up-to-date and comprehensive knowledgebase for human lncRNAs.
Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD that we developed earlier, together with additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Users can also submit new associations to DDMGD. A comparative analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.
The increased prevalence of multi-drug resistant (MDR) pathogens heightens the need to design new antimicrobial agents. Antimicrobial peptides (AMPs) exhibit potent broad-spectrum activity against MDR pathogens and kill rapidly, which has led to their recognition as a potential substitute for conventional antibiotics. Designing new AMPs using current in-silico approaches is, however, challenging due to the absence of suitable models and the large number of design parameters, testing cycles, production time and cost. To date, AMPs have merely been categorized into families according to their primary sequences, structures and functions. The ability to computationally determine the properties that discriminate AMP families from each other could help in exploring the key characteristics of these families and facilitate the in-silico design of synthetic AMPs.
Transcription regulation in multicellular eukaryotes is orchestrated by a number of DNA functional elements located at gene regulatory regions. Some regulatory regions (e.g. enhancers) are located far away from the gene they affect. Identification of distal regulatory elements is a challenge for bioinformatics research. Although existing methodologies have increased the number of computationally predicted enhancers, the performance inconsistency of computational models across different cell lines, class imbalance within the learning sets and ad hoc rules for selecting enhancer candidates for supervised learning remain key issues that require further examination. In this study we developed DEEP, a novel ensemble prediction framework. DEEP integrates three components with diverse characteristics that streamline the analysis of enhancer properties in a great variety of cellular conditions. In our method we train many individual classification models that we combine to classify DNA regions as enhancers or non-enhancers. DEEP uses features derived from histone modification marks or attributes coming from sequence characteristics. Experimental results indicate that DEEP performs better than four state-of-the-art methods on the ENCODE data. We report the first computational enhancer prediction results on FANTOM5 data, where DEEP achieves 90.2% accuracy and 90% geometric mean (GM) of specificity and sensitivity across 36 different tissues. We further present results derived using in vivo-derived enhancer data from the VISTA database. DEEP-VISTA, when tested on an independent test set, achieved a GM of 80.1% and an accuracy of 89.64%. The DEEP framework is publicly available at http://cbrc.kaust.edu.sa/deep/.
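The geometric mean (GM) of specificity and sensitivity reported above can be illustrated with a short sketch. The confusion-matrix counts below are hypothetical and chosen only to show why GM is informative when the learning set is imbalanced; they are not results from DEEP, and the function names are assumptions introduced for this example.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true enhancers that are predicted as enhancers."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true non-enhancers that are predicted as non-enhancers."""
    return tn / (tn + fp)


def geometric_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """GM = sqrt(sensitivity * specificity)."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all regions that are classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced test set: 10 enhancers and 990 non-enhancers.
# A classifier that labels every region as a non-enhancer:
all_negative = dict(tp=0, fn=10, tn=990, fp=0)
# A classifier that recovers most enhancers at the cost of some false positives:
balanced = dict(tp=8, fn=2, tn=891, fp=99)

for name, counts in [("all_negative", all_negative), ("balanced", balanced)]:
    print(name, round(accuracy(**counts), 3), round(geometric_mean(**counts), 3))
```

In this toy setting the trivial classifier has the higher accuracy but a GM of zero, which is one reason both measures are usually reported side by side.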
Marine sponges are the most primitive metazoans and host symbiotic microorganisms. They are crucial components of the marine ecological system and play an essential role in pelagic processes. Copper pollution is currently a widespread problem and poses a threat to marine organisms. Here, we examined the effects of copper treatment on the composition of the sponge-associated bacterial community and the genetic features that facilitate the survival of enriched bacteria under copper stress. The 16S rRNA gene sequencing results showed that the sponge Haliclona cymaeformis harbored symbiotic sulfur-oxidizing Ectothiorhodospiraceae and photosynthetic Cyanobacteria as dominant species. However, these autotrophic bacteria decreased substantially after treatment with a high copper concentration, which enriched for a heterotrophic-bacterium-dominated community. Metagenomic comparison revealed a varied profile of functional genes and enriched functions, including bacterial motility and chemotaxis, extracellular polysaccharide and capsule synthesis, virulence-associated genes, and genes involved in cell signaling and regulation, suggesting short-period mechanisms of the enriched bacterial community for surviving copper stress in the microenvironment of the sponge. Microscopic observation and comparison revealed dynamic bacterial aggregation within the matrix and lysis of sponge cells. The bacteriophage community was also enriched, and the complete genome of a dominant phage was determined, implying that a lytic phage cycle was stimulated by the high copper concentration. This study demonstrated a copper-induced shift in the composition of functional genes of the sponge-associated bacterial community, revealing the selective effect of copper treatment on the functions of the bacterial community in the microenvironment of the sponge.
In spite of advances in invertebrate pest management, the agricultural industry is suffering from impeded pest control exacerbated by global climate changes that have altered rain patterns to favour opportunistic breeding. Thus, novel naturally derived chemical compounds toxic to both terrestrial and aquatic invertebrates are of interest, as potential pesticides. In this regard, marine cyanobacterium-derived metabolites that are toxic to both terrestrial and aquatic invertebrates continue to be a promising, but neglected, source of potential pesticides. A PubMed query combined with hand-curation of the information from retrieved articles allowed for the identification of 36 cyanobacteria-derived chemical compounds experimentally confirmed as being toxic to invertebrates. These compounds are discussed in this review.
The bottom of the Red Sea harbors over 25 deep hypersaline anoxic basins that are geochemically distinct and characterized by vertical gradients of extreme physicochemical conditions. Because of strong changes in density, particulate and microbial debris get entrapped in the brine-seawater interface (BSI), resulting in increased dissolved organic carbon, reduced dissolved oxygen toward the brines and enhanced microbial activities in the BSI. These features coupled with the deep-sea prevalence of ammonia-oxidizing archaea (AOA) in the global ocean make the BSI a suitable environment for studying the osmotic adaptations and ecology of these important players in the marine nitrogen cycle. Using phylogenomic-based approaches, we show that the local archaeal community of five different BSI habitats (with up to 18.2% salinity) is composed mostly of a single, highly abundant Nitrosopumilus-like phylotype that is phylogenetically distinct from the bathypelagic thaumarchaea; ammonia-oxidizing bacteria were absent. The composite genome of this novel Nitrosopumilus-like subpopulation (RSA3) co-assembled from multiple single-cell amplified genomes (SAGs) from one such BSI habitat further revealed that it shares ~54% of its predicted genomic inventory with sequenced Nitrosopumilus species. RSA3 also carries several, albeit variable, gene sets that further illuminate the phylogenetic diversity and metabolic plasticity of this genus. Specifically, it encodes a putative proline-glutamate 'switch' with a potential role in osmotolerance and indirect impact on carbon and energy flows. Metagenomic fragment recruitment analyses against the composite RSA3 genome, Nitrosopumilus maritimus, and SAGs of mesopelagic thaumarchaea also reiterate the divergence of the BSI genotypes from other AOA. The ISME Journal advance online publication, 8 August 2014; doi:10.1038/ismej.2014.137.
Sulfur-reducing bacteria (SRB) and sulfur-oxidizing bacteria (SOB) play essential roles in marine sponges. However, the detailed characteristics and physiology of the bacteria are largely unknown. Here, we present and analyse the first genome of sponge-associated SOB using a recently developed metagenomic binning strategy. The loss of transposase and virulence-associated genes and the maintenance of the ancient polyphosphate glucokinase gene suggested a stabilized SOB genome that might have coevolved with the ancient host during establishment of their association. Exclusive distribution in sponge, bacterial detoxification for the host (sulfide oxidation) and the enrichment for symbiotic characteristics (genes encoding ankyrin) in the SOB genome supported the bacterial role as an intercellular symbiont. Despite possessing complete autotrophic sulfur oxidation pathways, the bacterium developed a much more versatile capacity for carbohydrate uptake and metabolism in comparison with its closest relatives (Thioalkalivibrio) and with other representative autotrophs from the same order (Chromatiales). The ability to perform both autotrophic and heterotrophic metabolism likely results from the unstable supply of reduced sulfur in the sponge and is considered critical for the sponge-SOB consortium. Our study provides insights into SOB of a sponge-specific clade with thioautotrophic and versatile heterotrophic metabolism relevant to its roles in the micro-environment of the sponge body.
Abnormality and disease in sponges have been widely reported, yet how sponge-associated microbes respond remains inconclusive. Here, individuals of the sponge Carteriospongia foliascens in an abnormal state were collected from the Rabigh Bay along the Red Sea coast. Microbial communities in both healthy and abnormal sponge tissues and adjacent seawater were compared to assess the influence of these abnormalities on sponge-associated microbes. In healthy tissues, we revealed low microbial diversity with fewer than 100 operational taxonomic units (OTUs) per sample. Cyanobacteria, affiliated mainly with the sponge-specific species "Candidatus Synechococcus spongiarum," were the dominant bacteria, followed by Bacteroidetes and Proteobacteria. Intraspecies dynamics of microbial communities in healthy tissues were observed among sponge individuals, and potential anoxygenic phototrophic bacteria were found. In comparison with healthy tissues and the adjacent seawater, abnormal tissues showed a dramatic increase in microbial diversity and a decrease in the abundance of sponge-specific microbial clusters. The dominant cyanobacterial species "Candidatus Synechococcus spongiarum" decreased and shifted to unspecific cyanobacterial clades. OTUs that showed high similarity to sequences derived from diseased corals, such as Leptolyngbya sp., were found to be abundant in abnormal tissues. Heterotrophic Planctomycetes were also specifically enriched in abnormal tissues. Overall, we revealed the microbial communities of the cyanobacteria-rich sponge, C. foliascens, and their impressive shifts under abnormality.
"Candidatus Synechococcus spongiarum" is a cyanobacterial symbiont widely distributed in sponges, but its functions at the genome level remain unknown. Here, we obtained the draft genome (1.66 Mbp, 90% estimated genome recovery) of "Ca. Synechococcus spongiarum" strain SH4 inhabiting the Red Sea sponge Carteriospongia foliascens. Phylogenomic analysis revealed a high dissimilarity between SH4 and free-living cyanobacterial strains. Essential functions, such as photosynthesis, the citric acid cycle, and DNA replication, were detected in SH4. Eukaryotic-like domains that play important roles in sponge-symbiont interactions were identified exclusively in the symbiont. However, SH4 could not biosynthesize methionine and polyamines and had lost some of the genes encoding low-molecular-weight peptides of the photosynthesis complex, antioxidant enzymes, DNA repair enzymes, and proteins involved in resistance to environmental toxins and in biosynthesis of capsular and extracellular polysaccharides. These genetic modifications imply that "Ca. Synechococcus spongiarum" SH4 represents a low-light-adapted cyanobacterial symbiont and has undergone genome streamlining to adapt to the sponge's mild intercellular environment. IMPORTANCE Although the diversity of sponge-associated microbes has been widely studied, genome-level research on sponge symbionts and their symbiotic mechanisms is rare because they are unculturable. "Candidatus Synechococcus spongiarum" is a widely distributed uncultivated cyanobacterial sponge symbiont. The genome of this symbiont will help to characterize its evolutionary relationship and functional dissimilarity to closely related free-living cyanobacterial strains. Knowledge of its adaptive mechanism to the sponge host also depends on genome-level research. The data presented here provide an alternative strategy to obtain the draft genome of "Ca. Synechococcus spongiarum" strain SH4 and offer insight into its evolutionary and functional features.
DNA methylation in promoters is closely linked to downstream gene repression. However, whether DNA methylation is a cause or a consequence of gene repression remains an open question. If it is a cause, then DNA methylation may affect the affinity of transcription factors (TFs) for their binding sites (TFBSs). If it is a consequence, then gene repression caused by chromatin modification may be stabilized by DNA methylation. Until now, these two possibilities have been supported only by non-systematic evidence and they have not been tested on a wide range of TFs. An average promoter methylation is usually used in studies, whereas recent results suggested that methylation of individual cytosines can also be important.
Using dilution-to-extinction cultivation, we isolated a strain affiliated with the PS1 clade from surface waters of the Red Sea. Strain RS24 represents the second isolate of this group of marine Alphaproteobacteria after IMCC14465, which was isolated from the East (Japan) Sea. The PS1 clade is a sister group to the OCS116 clade, together forming a putatively novel order closely related to Rhizobiales. While most genomic features and most of the genetic content are conserved between RS24 and IMCC14465, their average nucleotide identity (ANI) is < 81%, suggesting two distinct species of the PS1 clade. In addition to encoding two different variants of proteorhodopsin genes, they also harbor several unique genomic islands that contain genes related to the degradation of aromatic compounds in IMCC14465 and to polymer degradation in RS24, possibly reflecting the physicochemical differences between the environments from which they were isolated. No clear differences in abundance of the genomic content of either strain could be found in fragment recruitment analyses using different metagenomic datasets, in which both genomes were detectable, albeit as a minor part of the communities. The comparative genomic analysis of both isolates of the PS1 clade and the fragment recruitment analysis provide first insights into the ecology of this group.
Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
Microorganisms are known to counteract salt stress through salt influx or by the accumulation of osmoprotectants (also called compatible solutes). Understanding the pathways that synthesize and/or break down these osmoprotectants is of interest to studies of crop halotolerance and to biotechnology applications that use microbes as cell factories for production of biomass or commercial chemicals. To facilitate the exploration of osmoprotectants, we have developed the first online resource, 'Dragon Explorer of Osmoprotection associated Pathways' (DEOP), that gathers and presents curated information about osmoprotectants, complemented by information about reactions and pathways that use or affect them. A combined total of 141 compounds were confirmed osmoprotectants, which were matched to 1883 reactions and 834 pathways. DEOP can also be used to map genes or microbial genomes to potential osmoprotection-associated pathways, and thus link genes and genomes to other associated osmoprotection information. Moreover, DEOP provides a text-mining utility to search deeper into the scientific literature for supporting evidence or for new associations of osmoprotectants to pathways, reactions, enzymes, genes or organisms. Two case studies are provided to demonstrate the usefulness of DEOP. Database URL: http://www.cbrc.kaust.edu.sa/deop/
The central rift of the Red Sea contains 25 brine pools with different physicochemical conditions, dictating the diversity and abundance of the microbial community. Three of these pools, the Atlantis II, Kebrit and Discovery Deeps, are uniquely characterized by a high concentration of hydrocarbons. The brine-seawater interface, described as an anoxic-oxic (brine-seawater) boundary, is characterized by a high methane concentration, thus favoring aerobic methane oxidation. The current study analyzed the aerobic free-living methane-oxidizing bacterial communities that potentially contribute to methane oxidation at the brine-seawater interfaces of the three aforementioned brine pools, using metagenomic pyrosequencing, 16S rRNA pyrotags and pmoA library constructs. The sequencing of 16S rRNA pyrotags revealed that these interfaces are characterized by high microbial community diversity. Signatures of aerobic methane-oxidizing bacteria were detected in the Atlantis II Interface (ATII-I) and the Kebrit Deep Upper (KB-U) and Lower (KB-L) brine-seawater interfaces. Through phylogenetic analysis of pmoA, we further demonstrated that the ATII-I aerobic methanotroph community is highly diverse. We propose four ATII-I pmoA clusters. Most importantly, cluster 2 groups with marine methane seep methanotrophs, and cluster 4 represents a unique lineage of an uncultured bacterium with divergent alkane monooxygenases. Moreover, non-metric multidimensional scaling (NMDS) based on the ordination of putative enzymes involved in methane metabolism showed that the Kebrit interface layers were distinct from the ATII-I and DD-I brine-seawater interfaces.
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
Metagenomics-based functional profiling analysis is an effective means of gaining deeper insight into the composition of marine microbial populations and developing a better understanding of the interplay between the functional genome content of microbial communities and abiotic factors. Here we present a comprehensive analysis of 24 datasets covering surface and depth-related environments at 11 sites around the world's oceans. The complete dataset comprises approximately 12 million sequences, totaling 5,358 Mb. Based on profiling patterns of Clusters of Orthologous Groups (COGs) of proteins, a core set of reference photic and aphotic depth-related COGs, and a collection of COGs that are associated with extreme oxygen limitation, were defined. Their inferred functions were utilized as indicators to characterize the distribution of light- and oxygen-related biological activities in marine environments. The results reveal that, while light level in the water column is a major determinant of phenotypic adaptation in marine microorganisms, oxygen concentration in the aphotic zone has a significant impact only in extremely hypoxic waters. Phylogenetic profiling of the reference photic/aphotic gene sets revealed a greater variety of source organisms in the aphotic zone, although the majority of individual photic and aphotic depth-related COGs are assigned to the same taxa across the different sites. This increase in phylogenetic and functional diversity of the core aphotic-related COGs most probably reflects selection for the utilization of a broad range of alternate energy sources in the absence of light.
Reliable functional annotation of genomic data is a key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and the poor homology of novel extremophile genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO-terms and/or consensus patterns) are present.
Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
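The reliability rule stated above (a hit is reliable if at least two relevant descriptors, GO-terms and/or consensus patterns, are present) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PPM scripts distributed through INDIGO: the function names, the data layout and the identifiers (`GO:A`, `PS:X`, and so on) are hypothetical placeholders.

```python
from typing import Iterable, Set


def count_relevant_descriptors(
    gene_go_terms: Iterable[str],
    gene_prosite_ids: Iterable[str],
    relevant_go_terms: Set[str],
    relevant_prosite_ids: Set[str],
) -> int:
    """Count descriptors of a gene that match the target enzyme family."""
    profile_hits = set(gene_go_terms) & relevant_go_terms        # profile filter
    pattern_hits = set(gene_prosite_ids) & relevant_prosite_ids  # pattern filter
    return len(profile_hits) + len(pattern_hits)


def is_reliable(
    gene_go_terms: Iterable[str],
    gene_prosite_ids: Iterable[str],
    relevant_go_terms: Set[str],
    relevant_prosite_ids: Set[str],
    min_descriptors: int = 2,
) -> bool:
    """Apply the 'at least two relevant descriptors' rule."""
    n = count_relevant_descriptors(
        gene_go_terms, gene_prosite_ids, relevant_go_terms, relevant_prosite_ids
    )
    return n >= min_descriptors


# Placeholder descriptors for one hypothetical enzyme family.
family_go = {"GO:A", "GO:B"}
family_prosite = {"PS:X"}

# Gene 1 matches one GO-term and one pattern: reliable.
print(is_reliable(["GO:A", "GO:Z"], ["PS:X"], family_go, family_prosite))
# Gene 2 matches a single GO-term only: not reliable.
print(is_reliable(["GO:B"], [], family_go, family_prosite))
```

Counting GO-terms and patterns in a single tally reflects the "and/or" wording of the rule; a stricter variant that requires one hit from each filter would be a different design choice.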
A unique combination of physicochemical conditions prevails in the lower convective layer (LCL) of the brine pool at Atlantis II (ATII) Deep in the Red Sea. With a maximum depth of over 2000 m, the pool is characterized by acidic pH (5.3), high temperature (68 °C) and salinity (26%), low light levels, anoxia, and high concentrations of heavy metals. We have established a metagenomic dataset derived from the microbial community in the LCL, and here we describe a gene for a novel mercuric reductase - a key component of the bacterial detoxification system for elemental mercury. The metagenome-derived gene and an ortholog from an uncultured soil bacterium were synthesized and expressed in E. coli. The properties of their products show that, in contrast to the soil enzyme, the ATII-LCL mercuric reductase is functional in high salt, stable at high temperature, resistant to high concentrations of Hg2+, and efficiently detoxifies Hg2+ in vivo. Interestingly, despite the marked functional differences between the orthologs, their amino acid sequences differ by less than 10%. Site-directed mutagenesis and kinetic analysis of the mutant enzymes, in conjunction with 3D modeling, have identified distinct structural features that contribute to extreme halophilicity, thermostability and high detoxification capacity respectively, suggesting that these were acquired independently during the evolution of this enzyme. Thus, our work provides fundamental structural insights into a novel protein that has undergone multiple biochemical and biophysical adaptations to promote the survival of microorganisms that reside in the extremely demanding environment of the ATII-LCL.
Cancer cells are often characterized by epigenetic changes, which include aberrant histone modifications. In particular, local or regional epigenetic silencing is a common mechanism in cancer for silencing expression of tumor suppressor genes. Though several tools have been created to enable detection of histone marks in ChIP-seq data from normal samples, it is unclear whether these tools can be efficiently applied to ChIP-seq data generated from cancer samples. Indeed, cancer genomes are often characterized by frequent copy number alterations: gains and losses of large regions of chromosomal material. Copy number alterations may create a substantial statistical bias in the evaluation of histone mark signal enrichment and result in underdetection of the signal in the regions of loss and overdetection of the signal in the regions of gain.
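The direction of the bias described above can be shown with a small worked example. The sketch assumes, purely for illustration, that ChIP-seq read depth scales linearly with local copy number; the function names and read counts are hypothetical, and the rescaling shown is a naive correction rather than the method of any particular tool.

```python
def expected_reads(reads_at_normal_ploidy: float, copy_number: int,
                   normal_ploidy: int = 2) -> float:
    """Reads expected in a region if depth scales linearly with copy number."""
    return reads_at_normal_ploidy * copy_number / normal_ploidy


def copy_number_corrected(observed_reads: float, copy_number: int,
                          normal_ploidy: int = 2) -> float:
    """Rescale an observed count to what it would be at normal ploidy."""
    if copy_number <= 0:
        raise ValueError("copy number must be positive to rescale the signal")
    return observed_reads * normal_ploidy / copy_number


# The same underlying histone mark density (100 reads at two copies)
# observed in a region of gain (4 copies) and a region of loss (1 copy).
gain = expected_reads(100, copy_number=4)   # inflated signal: overdetection
loss = expected_reads(100, copy_number=1)   # deflated signal: underdetection
print(gain, loss)
print(copy_number_corrected(gain, 4), copy_number_corrected(loss, 1))
```

Without the rescaling, a fixed enrichment threshold would call the amplified region more readily than the deleted one even though the underlying mark density is identical.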
Marine microorganisms are considered to be an important source of bioactive molecules against various diseases and have great potential to increase the number of lead molecules in clinical trials. Progress in novel microbial culturing techniques as well as greater accessibility to unique oceanic habitats has placed the marine environment as a new frontier in the field of natural product drug discovery.
Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge.
Information on Protein Interactions (PIs) is valuable for biomedical research, but often lies buried in the scientific literature and cannot be readily retrieved. While much progress has been made over the years in extracting PIs from the literature using computational methods, there is a lack of free, public, user-friendly tools for the discovery of PIs. We developed an online tool for the extraction of PI relationships from PubMed abstracts, which we name PIMiner. Protein pairs and the words that describe their interactions are reported by PIMiner so that new interactions can be easily detected within text. The interaction likelihood levels are reported too. The option to extract only specific types of interactions is also provided. The PIMiner server can be accessed through a web browser or remotely through a client's command line. PIMiner can process 50,000 PubMed abstracts in approximately 7 min and thus appears suitable for large-scale processing of biological/biomedical literature.
Technological improvements have resulted in increased discovery of new microRNAs (miRNAs) and refinement and enrichment of existing miRNA families. miRNA families are important because they suggest a common sequence or structure configuration in sets of genes that hints at a shared function. Exploratory tools to enhance investigation of characteristics of miRNA families and the functions of family-specific miRNA genes are lacking. We have developed miRNAVISA, a user-friendly web-based tool that allows customized interrogation and comparisons of miRNA families for hypothesis generation, and comparison of per-species chromosomal distribution of miRNA genes in different families. This study illustrates hypothesis generation using miRNAVISA in seven species. Our results unveil a subclass of miRNAs that may be regulated by genomic imprinting, and also suggest that some miRNA families may be species-specific, as well as chromosome- and/or strand-specific.
Long non-coding RNAs (lncRNAs) have been found to perform various functions in a wide variety of important biological processes. To ease interpretation of lncRNA functionality and to enable deep mining of these transcribed sequences, it is convenient to classify lncRNAs into different groups. Here, we summarize classification methods of lncRNAs according to their four major features, namely, genomic location and context, effect exerted on DNA sequences, mechanism of functioning and their targeting mechanism. In combination with the presently available function annotations, we explore potential relationships between different classification categories, and generalize and compare biological features of different lncRNAs within each category. Finally, we present our view on potential further studies. We believe that the classifications of lncRNAs as indicated above are of fundamental importance for lncRNA studies, helpful for further investigation of specific lncRNAs, for formulation of new hypotheses based on different features of lncRNAs and for exploration of the underlying lncRNA functional mechanisms.
Natural products are considered a rich source of new chemical structures that may lead to therapeutic agents in all major disease areas. About 50% of the drugs introduced in the market in the last 20 years were natural products/derivatives or natural product mimics, which clearly shows the influence of natural products in drug discovery.
High salinity and temperature, combined with the presence of heavy metals and low oxygen, render the deep-sea anoxic brines of the Red Sea one of the most extreme environments on Earth. The ability to adapt and survive in these extreme environments makes the inhabiting bacteria interesting candidates in the search for novel bioactive molecules.
To use proteomics to identify and characterize proteins in maternal serum from patients at high-risk for fetal trisomy 21, trisomy 18, and trisomy 13 on the basis of ultrasound and maternal serum triple tests.
Bacterial degradation of steroid compounds is of high ecological and biotechnological relevance. Pseudomonas sp. strain Chol1 is a model organism for studying the degradation of the steroid compound cholate. Its draft genome sequence is presented and reveals one gene cluster responsible for the metabolism of steroid compounds.
The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.
In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually.
A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
Initiation of transcription is essential for most of the cellular responses to environmental conditions and for cell and tissue specificity. This process is regulated through numerous proteins, their ligands and mutual interactions, as well as interactions with DNA. The key such regulatory proteins are transcription factors (TFs) and transcription co-factors (TcoFs). TcoFs are important since they modulate the transcription initiation process through interaction with TFs. In eukaryotes, transcription requires that TFs form different protein complexes with various nuclear proteins. To better understand transcription regulation, it is important to know the functional class of proteins interacting with TFs during transcription initiation. Such information is not fully available, since not all proteins that act as TFs or TcoFs are yet annotated as such, due to generally partial functional annotation of proteins. In this study we have developed a method to predict, using only sequence composition of the interacting proteins, the functional class of human TF binding partners to be (i) TF, (ii) TcoF, or (iii) other nuclear protein. This allows for complementing the annotation of the currently known pool of nuclear proteins. Since only the knowledge of protein sequences is required in addition to protein interaction, the method should be easily applicable to many species.
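As an illustration of what "sequence composition" features could look like, the following minimal sketch computes amino acid frequencies for a TF and its binding partner and concatenates them into one feature vector. The abstract does not specify the exact features or classifier used, so the choice of single-residue frequencies, the function names `composition` and `pair_features`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence."""
    seq = seq.upper()
    # Non-standard symbols (e.g. X) are ignored rather than counted.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(tf_seq, partner_seq):
    """Concatenate the compositions of a TF and its interacting protein."""
    return composition(tf_seq) + composition(partner_seq)


# Toy sequences for illustration only; they are not real proteins.
features = pair_features("MKTAYIAKQR", "MSSGGLLK")
```

A vector like `features` (40 values here) could then be passed to any standard three-class classifier to separate TF, TcoF and other nuclear protein partners.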
Sickle cell disease (SCD) is a fatal monogenic disorder with no effective cure and thus high rates of morbidity and sequelae. Efforts toward discovery of disease modifying drugs and curative strategies can be augmented by leveraging the plethora of information contained in available biomedical literature. To facilitate research in this direction we have developed a resource, Dragon Exploration System for Sickle Cell Disease (DESSCD) (http://cbrc.kaust.edu.sa/desscd/) that aims to promote the easy exploration of SCD-related data.
The demand for antimicrobial peptides (AMPs) is rising because of the increased occurrence of pathogens that are tolerant or resistant to conventional antibiotics. Since naturally occurring AMPs could serve as templates for the development of new anti-infectious agents to which pathogens are not resistant, a resource that contains relevant information on AMPs is of great interest. To that end, we developed the Dragon Antimicrobial Peptide Database (DAMPD, http://apps.sanbi.ac.za/dampd) that contains 1232 manually curated AMPs. DAMPD is an update and a replacement of the ANTIMIC database. In DAMPD, an integrated interface allows simple querying based on taxonomy, species, AMP family, citation, keywords and a combination of search terms and fields (Advanced Search). A number of tools such as Blast, ClustalW, HMMER, Hydrocalculator, SignalP and AMP predictor, as well as a number of other resources that provide additional information about the results, are also provided and integrated into DAMPD to augment biological analysis of AMPs.
Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for the 12 poly(A) motif variants.
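The first step of any such predictor, locating candidate motifs and collecting the surrounding sequence from which features are computed, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not list its 12 motif variants, so only the two widely known hexamers AATAAA and ATTAAA are used here, and the function name `find_candidates`, the flank length and the example sequence are assumptions.

```python
import re

# Assumption: two well-known poly(A) hexamers, for illustration only.
# The 12 variants used in the abstract are not listed there.
CANDIDATE_MOTIFS = ("AATAAA", "ATTAAA")


def find_candidates(dna, flank=10):
    """Return (position, motif, upstream, downstream) for each motif hit."""
    dna = dna.upper()
    hits = []
    for motif in CANDIDATE_MOTIFS:
        # A zero-width lookahead lets overlapping occurrences be reported.
        for match in re.finditer(f"(?={motif})", dna):
            start = match.start()
            end = start + len(motif)
            upstream = dna[max(0, start - flank):start]
            downstream = dna[end:end + flank]
            hits.append((start, motif, upstream, downstream))
    return sorted(hits)


# Made-up sequence containing one copy of each hexamer.
seq = "GGCTAATAAAGCTTGCATTAAACCG"
hits = find_candidates(seq, flank=5)
```

Each flanking window in `hits` would then be converted into numerical features (for example, the thermodynamic or statistical properties mentioned above) before being passed to a classifier that separates true motifs from false ones.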
Despite intense efforts to develop non-cytotoxic anticancer treatments, effective agents are still not available. Therefore, novel apoptosis-inducing drug leads that may be developed into effective targeted cancer therapies are of interest to the cancer research community. Targeted cancer therapies affect specific aberrant apoptotic pathways that characterize different cancer types and, for this reason, they are a more desirable type of therapy than chemotherapy or radiotherapy, as they are less harmful to normal cells. In this regard, marine sponge derived metabolites that induce apoptosis continue to be a promising source of new drug leads for cancer treatments. A PubMed query from 01/01/2005 to 31/01/2011 combined with hand-curation of the retrieved articles allowed for the identification of 39 recently confirmed apoptosis-inducing anticancer lead compounds isolated from marine sponges that are selectively discussed in this review.
MicroRNAs (miRNAs) are small non-coding RNA molecules that repress the translation of messenger RNAs (mRNAs) or degrade mRNAs. These functions of miRNAs allow them to control key cellular processes such as development, differentiation and apoptosis, and they have also been implicated in several cancers such as leukaemia, lung, pancreatic and ovarian cancer (OC). Unfortunately, the specific machinery of miRNA regulation, involving transcription factors (TFs) and transcription co-factors (TcoFs), is not well understood. In the present study we focus on computationally deciphering the underlying network of miRNAs, their targets, and their control mechanisms that have an influence on OC development.
We present the draft genome of Haloplasma contractile, isolated from a deep-sea brine and representing a new order between Firmicutes and Mollicutes. Its complex morphology with contractile protrusions might be strongly influenced by the presence of seven MreB/Mbl homologs, which appears to be the highest copy number ever reported.
We present the draft genome of Halorhabdus tiamatea, the first member of the Archaea ever isolated from a deep-sea anoxic brine. Genome comparison with Halorhabdus utahensis revealed some striking differences, including a marked increase in genes associated with transmembrane transport and putative genes for a trehalose synthase and a lactate dehydrogenase.
We present the genome of Salinisphaera shabanensis, isolated from a brine-seawater interface and representing a new order within the Gammaproteobacteria. Its adaptations to physicochemical and nutrient availability fluctuations include six genes encoding heavy metal-translocating P-type ATPases and multiple genes involved in iron uptake, siderophore production, and poly-β-hydroxybutyrate synthesis.
Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions as well as contribute to better understanding the complex machinery involved in gene regulation.
The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource which enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF we used data from the literature (including over 10500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing for a more in-depth exploration of information based on different gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomically centric perspectives. As a special feature, DESTAF allows for the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly.
It is essential to catalog characterized hepatitis C virus (HCV) protein-protein interaction (PPI) data and the associated plethora of vital functional information to augment the search for therapies, vaccines and diagnostic biomarkers. In furtherance of these goals, we have developed the hepatitis C virus protein interaction database (HCVpro) by integrating manually verified hepatitis C virus-virus and virus-human protein interactions curated from literature and databases. HCVpro is a comprehensive and integrated HCV-specific knowledgebase housing consolidated information on PPIs, functional genomics and molecular data obtained from a variety of virus databases (VirHostNet, VirusMint, HCVdb and euHCVdb), and from BIND and other relevant biology repositories. HCVpro is further populated with information on hepatocellular carcinoma (HCC) related genes that are mapped onto their encoded cellular proteins. Incorporated proteins have been mapped onto Gene Ontologies, canonical pathways, Online Mendelian Inheritance in Man (OMIM) and extensively cross-referenced to other essential annotations. The database is enriched with exhaustive reviews on structure and functions of HCV proteins, current state of drug and vaccine development and links to recommended journal articles. Users can query the database using specific protein identifiers (IDs), chromosomal locations of a gene, interaction detection methods, indexed PubMed sources as well as HCVpro, BIND and VirusMint IDs. The use of HCVpro is free and the resource can be accessed via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/.
The barnacle Balanus amphitrite is a globally distributed biofouler and a model species in intertidal ecology and larval settlement studies. However, a lack of genomic information has hindered the comprehensive elucidation of the molecular mechanisms coordinating its larval settlement. The pyrosequencing-based transcriptomic approach is thought to be useful to identify key molecular changes during larval settlement.
MicroRNAs (miRNAs) are short non-coding RNA molecules that act as post-transcriptional regulators and affect the regulation of protein-coding genes. Mostly transcribed by Pol II, miRNA genes are regulated at the transcriptional level similarly to protein-coding genes. In this study we focus on human miRNAs. These miRNAs are involved in a variety of pathways and can affect many diseases. Our interest is in possible deregulation of the transcription initiation of the miRNA-encoding genes, which is facilitated by variations in the genomic sequence of transcriptional control regions (promoters).
The initiation and regulation of transcription in eukaryotes is complex and involves a large number of transcription factors (TFs), which are known to bind to the regulatory regions of eukaryotic DNA. Apart from TF-DNA binding, protein-protein interaction involving TFs is an essential component of the machinery facilitating transcriptional regulation. Proteins that interact with TFs in the context of transcription regulation but do not bind to the DNA themselves, we consider transcription co-factors (TcoFs). The influence of TcoFs on transcriptional regulation and initiation, although indirect, has been shown to be significant with the functionality of TFs strongly influenced by the presence of TcoFs. While the role of TFs and their interaction with regulatory DNA regions has been well-studied, the association between TFs and TcoFs has so far been given less attention. Here, we present a resource that is comprised of a collection of human TFs and the TcoFs with which they interact. Other proteins that have a proven interaction with a TF, but are not considered TcoFs are also included. Our database contains 157 high-confidence TcoFs and additionally 379 hypothetical TcoFs. These have been identified and classified according to the type of available evidence for their involvement in transcriptional regulation and their presence in the cell nucleus. We have divided TcoFs into four groups, one of which contains high-confidence TcoFs and three others contain TcoFs which are hypothetical to different extents. We have developed the Dragon Database for Human Transcription Co-Factors and Transcription Factor Interacting Proteins (TcoF-DB). A web-based interface for this resource can be freely accessed at http://cbrc.kaust.edu.sa/tcof/ and http://apps.sanbi.ac.za/tcof/.
Prostate cancer (PC) is one of the most commonly diagnosed cancers in men. PC is relatively difficult to diagnose due to a lack of clear early symptoms. Extensive research of PC has led to the availability of a large amount of data on PC. Several hundred genes are implicated in different stages of PC, which may help in developing diagnostic methods or even cures. In spite of this accumulated information, effective diagnostics and treatments remain elusive. We have developed the Dragon Database of Genes associated with Prostate Cancer (DDPC) as an integrated knowledgebase of genes experimentally verified as implicated in PC. DDPC is distinct from other databases in that (i) it provides pre-compiled biomedical text-mining information on PC, which would otherwise require tedious computational analyses, (ii) it integrates data on molecular interactions, pathways, gene ontologies, gene regulation at molecular level, predicted transcription factor binding sites on promoters of PC implicated genes and transcription factors that correspond to these binding sites and (iii) it contains DrugBank data on drugs associated with PC. We believe this resource will serve as a source of useful information for research on PC. DDPC is freely accessible for academic and non-profit users via http://apps.sanbi.ac.za/ddpc/ and http://cbrc.kaust.edu.sa/ddpc/.
The R2R3-type OsMyb4 transcription factor of rice has been shown to play a role in the regulation of osmotic adjustment in heterologous overexpression studies. However, the exact composition and organization of its underlying transcriptional network has not been established to be a robust tool for stress tolerance enhancement by regulon engineering. The OsMyb4 network was dissected based on commonalities between the global chilling stress transcriptome and the transcriptome configured by OsMyb4 overexpression. OsMyb4 controls a hierarchical network comprised of several regulatory sub-clusters associated with cellular defense and rescue, metabolism and development. It regulates target genes either directly or indirectly through intermediary MYB, ERF, bZIP, NAC, ARF and CCAAT-HAP transcription factors. Regulatory sub-clusters have different combinations of MYB-like, GCC-box-like, ERD1-box-like, ABRE-like, G-box-like, as1/ocs/TGA-like, AuxRE-like, gibberellic acid response element (GARE)-like and JAre-like cis-elements. Cold-dependent network activity enhanced cellular antioxidant capacity through radical scavenging mechanisms and increased activities of phenylpropanoid and isoprenoid metabolic processes involving various abscisic acid (ABA), jasmonic acid (JA), salicylic acid (SA), ethylene and reactive oxygen species (ROS) responsive genes. The OsMyb4 network is independent of drought response element binding protein/C-repeat binding factor (DREB/CBF) and its sub-regulons operate with possible co-regulators including nuclear factor-Y. Because of its upstream position in the network hierarchy, OsMyb4 functions quantitatively and pleiotropically. Supra-optimal expression causes misexpression of alternative targets with costly trade-offs to panicle development.
The serotonin transporter promoter length polymorphism (5-hydroxytryptamine transporter length polymorphism; 5-HTTLPR) has long been implicated in autism and other psychiatric disorders. The use of selective serotonin reuptake inhibitors (SSRIs) has a positive effect on treating some symptoms of autism. The effects of these drugs vary in individuals because of the presence of the S or L allele of 5-HTTLPR. Studies performed on various autistic populations have found different allele frequencies for the L and S alleles. Allele frequencies and genotypes of the South African autistic populations (African, mixed, and Caucasian) were compared with matching South African ethnic control populations. The *S/*S genotype was found to be highly significantly associated with all the South African autistic ethnic populations. In the South African African population the *S/*S genotype was present in 7 (33%) of the autistic individuals but in none of the control subjects, yielding infinitely large odds of developing autism. The odds of developing autism with the *S/*S genotype compared to the *L/*L genotype increased 10.15-fold in the South African mixed group and 2.74-fold in the South African Caucasian population. The allele frequency of the South African autistic population was also compared with studies of other autistic populations around the world, and highly significant differences were found with the Japanese, Korean, and Indian population groups. The difference was not significant for the French, German, Israeli, Portuguese, and American groups. This is the first South African study of autistic individuals of different ethnic backgrounds that shows significant differences in allele and genotype frequencies of 5-HTTLPR. The results of this study open new avenues for investigating the role of transmission of the L and S alleles in families with autism in South Africa.
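The odds statements above follow from a standard 2x2 odds ratio, and the "infinitely large odds" arise because one cell of the table is zero. The sketch below shows the calculation; it is not the authors' analysis. Only the figure of 7 autistic individuals with the *S/*S genotype and 0 controls comes from the abstract; the remaining counts are hypothetical, as is the function name `odds_ratio`. The optional 0.5 continuity correction (Haldane-Anscombe) is a common way to obtain a finite estimate and is not mentioned in the source.

```python
import math


def odds_ratio(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed,
               correction=0.0):
    """Odds ratio (a*d)/(b*c) for a 2x2 table, with an optional correction
    added to every cell (0.5 gives the Haldane-Anscombe estimate)."""
    a = case_exposed + correction
    b = case_unexposed + correction
    c = ctrl_exposed + correction
    d = ctrl_unexposed + correction
    if b * c == 0:
        # A zero in the denominator makes the ratio unbounded (or undefined).
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


# 7 cases with *S/*S and 0 controls are from the abstract;
# 14 and 20 are hypothetical counts for the comparison genotype.
raw = odds_ratio(7, 14, 0, 20)
corrected = odds_ratio(7, 14, 0, 20, correction=0.5)
```

Here `raw` is infinite because no control carries the genotype, while `corrected` gives a finite value that can be reported with a confidence interval.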
Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTR, coding exons, etc.), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5′ completeness of expressed sequences, contribute to more successful experimental designs, as well as more accurate gene annotation.
Multiple factors underlie susceptibility to essential hypertension, including a significant genetic and ethnic component, and environmental effects. Blood pressure response of hypertensive individuals to salt is heterogeneous, but salt sensitivity appears more prevalent in people of indigenous African origin. The underlying genetics of salt-sensitive hypertension, however, are poorly understood. In this study, computational methods including text- and data-mining have been used to select and prioritize candidate aetiological genes for salt-sensitive hypertension. Additionally, we have compared allele frequencies and copy number variation for single nucleotide polymorphisms in candidate genes between indigenous Southern African and Caucasian populations, with the aim of identifying candidate genes with significant variability between the population groups: identifying genetic variability between population groups can exploit ethnic differences in disease prevalence to aid with prioritisation of good candidate genes. Our top-ranking candidate genes include parathyroid hormone precursor (PTH) and type-1 angiotensin II receptor (AGTR1). We propose that the candidate genes identified in this study warrant further investigation as potential aetiological genes for salt-sensitive hypertension.
Non-model organisms represent the majority of life forms on our planet. However, the lack of genetic information hinders our understanding of the unique biological phenomena in non-model organisms at the molecular level. In this study, we applied tandem transcriptome and proteome profiling to a non-model marine fouling organism, Bugula neritina. Using a 454 pyrosequencing platform with the updated titanium reagents, we generated a total of 48 Mbp of transcriptome data consisting of 131 450 high-quality reads. Of these, 122 650 reads (93%) were assembled to produce 6392 contigs with an average length of 538 bases and the remaining 8800 reads were singletons. Of the total 15 192 unigenes, 13 863 ORFs were predicted, of which 6917 were functionally annotated based on gene ontology and eukaryotic orthologous groups. Subsequent proteome analysis identified and quantified 882 proteins from B. neritina. These results provide fundamental and important information for subsequent studies of molecular mechanisms in larval biology, development and antifouling research. Furthermore, we demonstrated, for the first time, the combined use of two high-throughput technologies as a powerful approach for accelerating the studies of non-model but otherwise important species.
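The read and unigene counts reported above are internally consistent, which a few lines of arithmetic make explicit. The snippet below uses only the figures given in the abstract; the variable names are illustrative, and the reading of "unigenes" as contigs plus singletons is an inference from the numbers rather than a stated definition.

```python
# Figures taken from the abstract.
total_reads = 131_450
assembled_reads = 122_650
contigs = 6_392
predicted_orfs = 13_863
annotated_orfs = 6_917

# Reads that did not assemble into contigs are singletons.
singletons = total_reads - assembled_reads

# Inference: unigenes = contigs + singletons.
unigenes = contigs + singletons

# Percentage of reads that were assembled.
assembled_pct = 100 * assembled_reads / total_reads

# Share of unigenes with a predicted ORF, and of ORFs with an annotation.
orf_fraction = predicted_orfs / unigenes
annotated_fraction = annotated_orfs / predicted_orfs
```

The derived `singletons` (8800), `unigenes` (15 192) and `assembled_pct` (about 93%) match the values stated in the text.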
The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (hereafter, histone genes), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have a structure similar to that of promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii) identify in this way genes that have a high likelihood of being coregulated with the histone genes.
Even though hepatitis C virus (HCV) cDNA was characterized about 20 years ago, there is insufficient understanding of the molecular etiology underlying HCV infections. Current global rates of infection and its increasingly chronic character are causes of concern for health policy experts. The vast amount of data accumulated from biochemical, genomic, proteomic, and other biological analyses allows for novel insights into the HCV viral structure, life cycle and functions of its proteins. Biomedical text-mining is a useful approach for analyzing the increasing corpus of published scientific literature on HCV. We report here the first comprehensive HCV-customized, biomedical text-mining based online web resource, the dragon exploratory system on hepatitis C virus (DESHCV), a biomedical text-mining and relationship-exploring knowledge base developed by exploring the literature on HCV. The pre-compiled dictionaries existing in the dragon exploratory system (DES) were enriched with biomedical concepts pertaining to HCV proteins, their name variants and symbols to make it suitable for targeted information exploration and knowledge extraction focused on HCV. A set of 32,895 abstracts retrieved from the PubMed database using specific keyword searches related to HCV was processed based on concept recognition of terms from several dictionaries. The web query interface enables retrieval of information using specified concepts, keywords and phrases, generating text-derived association networks and hypotheses, which could be tested to identify potentially novel relationships between different concepts. Such an approach could also augment efforts in the search for diagnostic or even therapeutic targets. DESHCV thus represents an online literature-based discovery resource freely accessible for academic and non-profit users via http://apps.sanbi.ac.za/DESHCV/ and its mirror site http://cbrc.kaust.edu.sa/deshcv/.
Natural products have played a vital role in the drug discovery and development process for cancer. Diospyrin, a plant-based bisnaphthoquinonoid, has been used as a lead molecule in an effort to develop anti-cancer drugs. Several derivatives/analogues have been synthesized and screened for their pro-apoptotic/anti-cancer activities so far. Our review is focused on the pro-apoptotic/anti-cancer activities of diospyrin, its derivatives/analogues and the different mechanisms potentially involved in the bioactivity of these compounds. Particular focus has been placed on the different mechanisms (both chemical and molecular) thought to underlie the bioactivity of these compounds. A brief bioinformatics analysis at the end of the article provides novel insights into the new potential mechanisms and pathways by which these compounds might exert their effects and may lead to a better realization of the full therapeutic potential of these compounds as anti-cancer drugs.
An ever-increasing amount of transcriptomic data and analysis tools provide novel insight into complex responses of biological systems. Given these resources we have undertaken to review aspects of transcriptional regulation in response to the plant hormone gibberellic acid (GA) and its second messenger guanosine 3′,5′-cyclic monophosphate (cGMP) in Arabidopsis thaliana, both wild type and selected mutants. Evidence suggests enrichment of GA-responsive (GARE) elements in promoters of genes that are transcriptionally upregulated in response to cGMP but downregulated in a GA insensitive mutant (ga1-3). In contrast, in the genes upregulated in the mutant, no enrichment in the GARE is observed, suggesting that GARE motifs are diagnostic for GA-induced and cGMP-dependent transcriptional upregulation. Further, we review how expression studies of GA-dependent transcription factors and transcriptional networks based on common promoter signatures derived from ab initio analyses can contribute to our understanding of plant responses at the systems level.
We examined whether gonadotrophin-releasing hormone (GnRH) analogues [leuprolide acetate (LA) and ganirelix acetate (GA)] modulate gene expression in Ishikawa cells used as a surrogate for human endometrial epithelial cells in vitro. The specific aims were: (i) to study the modulatory effect of GnRH analogues by RT-PCR [in the absence and presence of E(2) and P4, and cyclic adenosine monophosphate (cAMP)] on mRNA expression of genes modulated during the window of implantation in GnRH analogues/rFSH-treated assisted reproductive technology cycles including OPTINEURIN (OPTN), CHROMATIN MODIFYING PROTEIN (CHMP1A), PROSAPOSIN (PSAP), IGFBP-5 and SORTING NEXIN 7 (SNX7), and (ii) to analyze the 5′-flanking regions of such genes for the presence of putative steroid-response elements [estrogen-response elements (EREs) and P4-response elements (PREs)]. Ishikawa cells were cytokeratin+/vimentin- and expressed ERalpha, ERbeta, PR and GnRH-R proteins. At 6 and 24 h, neither LA nor GA alone had an effect on gene expression. GnRH analogues alone or following E(2) and/or P4 co-incubation for 24 h also had no effect on gene expression, but P4 significantly increased expression of CHMP1A. E(2) + P4 treatment for 4 days, alone or followed by GA, had no effect, but E(2) + P4 treatment followed by LA significantly decreased IGFBP-5 expression. The addition of 8-Br cAMP did not modify gene expression, with the exception of IGFBP-5 that was significantly increased. The GnRH analogues did not modify intracellular cAMP levels. We identified conserved EREs for OPN, CHMP1A, SNX7 and PSAP and PREs for SNX7. We conclude that GnRH analogues appear not to have major direct effects on gene expression of human endometrial epithelial cells in vitro.
Ovarian epithelial cancer (OEC) usually presents in the later stages of the disease. Factors, especially those associated with cell-cycle genes, affecting the genesis and tumour progression for ovarian cancer are largely unknown. We hypothesized that over-expressed transcription factors (TFs), as well as those that are driving the expression of the OEC over-expressed genes, could be the key for OEC genesis and potentially useful tissue and serum markers for malignancy associated with OEC.
Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.
The transcriptional regulatory network involved in low temperature response leading to acclimation has been established in Arabidopsis. In japonica rice, which can only withstand transient exposure to milder cold stress (10 °C), an oxidative-mediated network has been proposed to play a key role in configuring early responses and short-term defenses. The components, hierarchical organization and physiological consequences of this network were further dissected by a systems-level approach.
Macrophages are immune cells involved in various biological processes including host defence, homeostasis, differentiation, and organogenesis. Disruption of macrophage biology has been linked to increased pathogen infection, inflammation and malignant diseases. Differential gene expression observed in monocytic differentiation is primarily regulated by interacting transcription factors (TFs). Current research suggests that microRNAs (miRNAs) degrade mRNA and repress its translation, but may also target genes involved in differentiation. We focus on gaining insight into the transcriptional circuitry regulating miRNA genes expressed during monocytic differentiation.
Esophageal cancer ranks eighth in order of cancer occurrence. Its lethality primarily stems from the inability to detect the disease during the early organ-confined stage and the lack of effective therapies for advanced-stage disease. Moreover, the understanding of molecular processes involved in esophageal cancer is not complete, hampering the development of efficient diagnostics and therapy. Efforts made by the scientific community to improve the survival rate of esophageal cancer have resulted in a wealth of scattered information that is difficult to find and not easily amenable to data-mining. To reduce this gap and to complement available cancer-related bioinformatic resources, we have developed a comprehensive database (Dragon Database of Genes Implicated in Esophageal Cancer) with esophageal cancer-related information, as an integrated knowledge database aimed at representing a gateway to esophageal cancer-related data.
Environmental exposures filtered through the genetic make-up of each individual alter the transcriptional repertoire in organs central to metabolic homeostasis, thereby affecting arterial lipid accumulation, inflammation, and the development of coronary artery disease (CAD). The primary aim of the Stockholm Atherosclerosis Gene Expression (STAGE) study was to determine whether there are functionally associated genes (rather than individual genes) important for CAD development. To this end, two-way clustering was used on 278 transcriptional profiles of liver, skeletal muscle, and visceral fat (n = 66/tissue) and atherosclerotic and unaffected arterial wall (n = 40/tissue) isolated from CAD patients during coronary artery bypass surgery. The first step, across all mRNA signals (n = 15,042/12,621 RefSeqs/genes) in each tissue, resulted in a total of 60 tissue clusters (n = 3958 genes). In the second step (performed within tissue clusters), one atherosclerotic lesion (n = 49/48) and one visceral fat (n = 59) cluster segregated the patients into two groups that differed in the extent of coronary stenosis (P = 0.008 and P = 0.00015). The associations of these clusters with coronary atherosclerosis were validated by analyzing carotid atherosclerosis expression profiles. Remarkably, in one cluster (n = 55/54) relating to carotid stenosis (P = 0.04), 27 genes in the two clusters relating to coronary stenosis were confirmed (n = 16/17, P < 10^-27 and P < 10^-30). Genes in the transendothelial migration of leukocytes (TEML) pathway were overrepresented in all three clusters, referred to as the atherosclerosis module (A-module). In a second validation step, using three independent cohorts, the A-module was found to be genetically enriched with CAD risk by 1.8-fold (P < 0.004). The transcription co-factor LIM domain binding 2 (LDB2) was identified as a potential high-hierarchy regulator of the A-module, a notion supported by subnetwork analysis, by cellular and lesion expression of LDB2, and by the expression of 13 TEML genes in Ldb2-deficient arterial wall. Thus, the A-module appears to be important for atherosclerosis development and, together with LDB2, merits further attention in CAD research.
Using deep sequencing (deepCAGE), the FANTOM4 study measured the genome-wide dynamics of transcription-start-site usage in the human monocytic cell line THP-1 throughout a time course of growth arrest and differentiation. Modeling the expression dynamics in terms of predicted cis-regulatory sites, we identified the key transcription regulators, their time-dependent activities and target genes. Systematic siRNA knockdown of 52 transcription factors confirmed the roles of individual factors in the regulatory network. Our results indicate that cellular states are constrained by complex networks involving both positive and negative regulatory interactions among substantial numbers of transcription factors and that no single transcription factor is both necessary and sufficient to drive the differentiation process.
Ovarian cancer (OC) is becoming the most common gynecological cancer in developed countries and the most lethal gynecological malignancy. It is also the fifth leading cause of all cancer-related deaths in women. The identification of diagnostic biomarkers and the development of early detection techniques for OC largely depend on understanding the complex functionality and regulation of genes involved in this disease. Unfortunately, information about these OC genes is scattered throughout the literature and various databases, making extraction of relevant functional information a complex task. To reduce this problem, we have developed a database dedicated to OC genes to support exploration of functional characterization and analysis of biological processes related to OC. The database contains general information about OC genes, enriched with the results of transcription regulation sequence analysis and with relevant text mining to provide insights into associations of the OC genes with other genes, metabolites, pathways and nuclear proteins. Overall, it enables exploration of relevant information for OC genes from multiple angles, making it a unique resource for OC that will serve as a useful complement to the existing public resources for those interested in OC genetics. Access is free for academic and non-profit users, and the database can be accessed at http://apps.sanbi.ac.za/ddoc/.
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in improved model quality, probably due to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.
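As a generic illustration of what a position weight matrix is, the following minimal sketch builds a log-odds matrix from a handful of aligned binding sites and uses it to scan a sequence. It is not ChIPMunk and does not reproduce the HOCOMOCO procedure; the function names, the example sites, the pseudocount and the uniform background of 0.25 are all assumptions made for this example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites.

    Assumptions: uniform background frequency and a fixed pseudocount per base.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency versus the background frequency.
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for a window as wide as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window.upper()))

def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned sites, invented for illustration only.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
example_pwm = build_pwm(example_sites)
print(best_hit(example_pwm, "GGGTGACTCAGGG"))
```

The pseudocount keeps log-odds finite for bases never observed at a position, which matters when a model is built from only a few sites.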
In higher eukaryotes, the identification of translation initiation sites (TISs) has been focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.) information, we developed a prediction tool for signals within genomic sequences of plants that correspond to TISs. Our tool requires only genome sequence, not expressed sequences. Its sensitivity/specificity is 90.75%/92.2% for A.t., 66.8%/94.4% for Vitis vinifera and 81.6%/94.4% for Populus trichocarpa, which suggests that our tool can be used in the annotation of different plant genomes. We provide a list of features used in our model. Further study of these features may improve our understanding of the mechanisms of translation initiation.
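For readers unfamiliar with the two measures reported above, the sketch below computes sensitivity and specificity from confusion-matrix counts using their common textbook definitions. The abstract does not state which definitions were used (some TIS-prediction work reports TP/(TP+FP) as "specificity"), so the definitions here are an assumption, and the counts are hypothetical values invented for the example.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real sites that the predictor recovers: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-sites correctly rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts, not taken from the study.
tp, fn, tn, fp = 90, 10, 940, 60
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```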