Nucleotide sequences that are conserved between many different species are called multi-species conserved sequences.
For example, humans, rats, and mice share thousands of DNA segments that have remained unchanged since these three species diverged from their common mammalian ancestor.
While a small proportion of these conserved sequences encode RNA or proteins, the majority are non-coding DNA sequences, also called conserved non-genic sequences or CNGs. For example, among the sequences conserved between the human and mouse genomes, only one-third produce a functional transcript, while the remaining two-thirds are CNGs.
CNGs are approximately 50 to 200 nucleotides in length and can be found in intergenic regions in the genome, introns within a gene, or untranslated regions of the RNA transcript.
Some of these sequences are extremely conserved across lineages and are therefore called ultraconserved sequences. There are more than 5000 ultraconserved sequences shared among the human, rat, and mouse genomes, each around 100 bases in length.
Conservation of sequences over millions of years across lineages suggests that multi-species conserved sequences are critical for survival. However, since most conserved sequences do not code for a protein, their function remains largely unknown.
Three possible functions have been proposed for such conserved sequences. First, these sequences may act as enhancers or silencers that are bound by transcription factors and control the level of gene expression.
Second, the conserved sequences may be transcribed into long non-coding RNAs that regulate pre-mRNA maturation and stability.
Third, these sequences might allow functional interaction between chromosomes and help define chromosome territories with distinct gene expression patterns within the nucleus.
Rare mutations in multi-species conserved sequences can represent a critical step in the evolution of new species.
For example, primate genomes contain specific highly conserved sequences near neural development genes. Around 6 million years ago, these conserved sequences underwent nucleotide changes at exceptional rates, contributing to the evolution of the human lineage.
These regions, called Human Accelerated Regions or HARs, are thought to be involved in critical stages of brain development and in improved cognitive function.
Next-generation sequencing technologies have produced large genomic databases covering a variety of animals and plants. Since the Human Genome Project was completed, scientists have studied the genomes of primates, other mammals, and phylogenetically distant organisms. Such large-scale studies have provided new insights into the evolutionary relationships between organisms.
Although genomes vary greatly from one species to another, a few sequences are highly conserved. Such DNA sequences conserved among two or more species are collectively known as “multi-species conserved sequences.” These sequences are less likely to mutate and remain mostly unchanged over long periods of time. For example, an exon and three intronic regions of the cystic fibrosis gene (CFTR) are highly conserved among humans and great apes such as chimpanzees and orangutans.
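To make the idea of a conserved sequence concrete, the following minimal Python sketch scans a set of already-aligned sequences and reports stretches where every species carries the same base. The function name `conserved_runs`, the `min_len` threshold, and the toy sequences are all illustrative assumptions; real studies rely on whole-genome alignments and statistical conservation scores rather than strict identity.

```python
def conserved_runs(aligned, min_len=5):
    """Return half-open (start, end) column ranges where all aligned
    sequences carry the same non-gap base for at least min_len columns."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None
    # Iterate one position past the end so a run touching the end is closed.
    for i in range(length + 1):
        identical = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy, made-up aligned sequences (hypothetical, not real genomic data).
toy_alignment = {
    "human": "ACGTACGTTAGC",
    "mouse": "ACGTACGTAAGC",
    "rat":   "ACGTACGTCAGC",
}

runs = conserved_runs(list(toy_alignment.values()), min_len=5)
print(runs)  # → [(0, 8)]
```

In this toy example, the first eight columns are identical in all three sequences, while the ninth column differs and breaks the run. The trailing three identical columns are shorter than `min_len` and are therefore not reported.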
Most multi-species conserved sequences do not code for proteins; instead, they are either transcribed into long non-coding RNAs or not transcribed at all. The resulting RNA elements may be involved in epigenetic regulation of gene expression, pre-mRNA processing, alternative splicing, maturation, and stability. The non-transcribed sequences are called conserved non-genic sequences (CNGs). A human-mouse genome comparison revealed approximately 327,000 CNGs in the human genome. Of these, 65% lie in intergenic regions and 35% lie in introns.
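As a rough worked example, the percentages above can be converted into approximate counts. The variable names are illustrative, and because the total of 327,000 is itself an approximation, the resulting numbers are estimates only.

```python
# Approximate figures quoted in the text for the human-mouse comparison.
total_cngs = 327_000
intergenic_pct = 65
intronic_pct = 35

# Integer arithmetic avoids floating-point rounding surprises.
intergenic_cngs = total_cngs * intergenic_pct // 100
intronic_cngs = total_cngs * intronic_pct // 100

print(intergenic_cngs)  # → 212550
print(intronic_cngs)    # → 114450
```

In other words, roughly 213,000 of the CNGs fall between genes and roughly 114,000 fall within introns.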
Ultraconserved sequences are DNA sequences that have undergone little to no change for millions of years. Many of the ultraconserved sequences are concentrated in clusters near the loci of transcription factor genes and developmental genes.
In addition, there are some universally conserved genes. These genes are so fundamental to life that they are conserved from bacteria to mammals. Examples include the genes encoding RNA polymerase, helicases, GTP-binding elongation factors, and ABC transporters.