13.5: Multi-species Conserved Sequences
Next-generation sequencing technologies have produced large genomic databases covering a wide variety of animals and plants. Since the Human Genome Project was completed, scientists have studied the genomes of primates, other mammals, and phylogenetically distant organisms. Such large-scale studies have provided new insights into the evolutionary relationships between organisms.
Although genomes vary greatly between species, a few sequences are highly conserved. Such DNA sequences, conserved among two or more species, are collectively known as “multi-species conserved sequences.” These sequences accumulate fewer mutations and remain mostly unchanged over long periods of time. For example, an exon and three intronic regions of the cystic fibrosis gene (CFTR) are highly conserved among humans and great apes such as chimpanzees and orangutans.
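The idea of a conserved sequence can be illustrated with a minimal Python sketch that scans a multiple sequence alignment for runs of columns that are identical in every species. The three sequences below are made up for illustration, and the function names and the `min_len` threshold are assumptions of this sketch, not an established definition. Real analyses use whole-genome alignments and statistical conservation scores rather than exact identity.

```python
def conserved_columns(aligned):
    """Return one boolean per alignment column.

    A column counts as conserved when every sequence carries the same
    base at that position and the position is not a gap ('-').
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*aligned)]


def conserved_blocks(aligned, min_len=5):
    """Return half-open (start, end) intervals of consecutive conserved
    columns that are at least min_len columns long."""
    flags = conserved_columns(aligned)
    blocks = []
    start = None
    # A trailing False closes any run that reaches the end of the alignment.
    for i, is_conserved in enumerate(flags + [False]):
        if is_conserved and start is None:
            start = i
        elif not is_conserved and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical, already-aligned sequences from three species (toy data).
toy_alignment = [
    "ACGTACGTTAGC",
    "ACGTACGATAGC",
    "ACGTACGTTCGC",
]

print(conserved_blocks(toy_alignment, min_len=5))  # [(0, 7)]
```

Here only the first seven columns are identical in all three sequences, so they form the single reported block. Lowering `min_len` would also report shorter runs.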
Most multi-species conserved sequences do not code for proteins; instead, they are either transcribed into long non-coding RNAs or not transcribed at all. The resulting RNAs may be involved in epigenetic regulation of gene expression, pre-mRNA processing, alternative splicing, maturation, and stability. The non-transcribed sequences are called conserved non-genic sequences (CNGs). A human-mouse genome comparison revealed approximately 327,000 CNGs in the human genome. Of these, 65% lie in intergenic regions and 35% lie in introns.
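As a rough worked example, the percentages above can be converted into approximate counts. The total of 327,000 is itself an approximation, so the results are estimates only; the variable names are chosen for this sketch.

```python
# Approximate figures quoted in the text for human CNGs.
total_cngs = 327_000
intergenic_percent = 65
intronic_percent = 35

# Integer arithmetic avoids floating-point rounding in the estimates.
intergenic_cngs = total_cngs * intergenic_percent // 100
intronic_cngs = total_cngs * intronic_percent // 100

print(intergenic_cngs)  # 212550
print(intronic_cngs)    # 114450
```

In other words, roughly 212,000 CNGs fall between genes and roughly 114,000 fall within introns.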
Ultraconserved sequences in the human genome
Ultraconserved sequences are DNA sequences that have undergone little to no change over millions of years. Many of them are concentrated in clusters near the loci of transcription factor genes and developmental genes.
In addition, some genes are universally conserved. These genes are so fundamental to life that they are conserved all the way from bacteria to mammals. Examples include the genes for RNA polymerase, helicases, GTP-binding elongation factors, and ABC transporters.