5.3: Organization of Genes
The genomes of eukaryotes can be divided into several functional categories. A strand of DNA is composed of genes and intergenic regions. Genes themselves consist of protein-coding exons and non-coding introns. Introns are excised from the pre-mRNA after the sequence has been transcribed, leaving only exons to code for proteins.
Eukaryotic Genes Are Separated by Intergenic Regions
In eukaryotic genomes, genes are separated by large stretches of DNA that do not code for proteins. However, these intergenic regions carry important elements that regulate gene activity: for instance, the promoter, where transcription starts, and enhancers and silencers, which fine-tune gene expression. Some of these regulatory elements are located far away from the gene they control.
Protein-Coding Exons Are Interspersed with Introns
As researchers investigated the process of gene transcription in eukaryotes, they realized that the final mRNA that codes for a protein is shorter than the DNA it is derived from. This difference in length is due to a process called splicing. During or shortly after the transcription of pre-mRNA from DNA in the nucleus, splicing removes the introns and joins the exons together. The result is a mature, protein-coding mRNA that moves to the cytoplasm and is translated into protein.
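The length difference described above can be illustrated with a minimal Python sketch. It assumes that exon positions are already known and given as 0-based, end-exclusive intervals; the function name `splice`, the toy sequence, and the coordinates are all made up for illustration and do not correspond to any real gene.

```python
def splice(pre_mrna, exons):
    """Join the exon intervals in order and drop everything between them.

    exons: list of (start, end) pairs, 0-based and end-exclusive (an assumption
    of this sketch, not a standard imposed by the text).
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Toy pre-mRNA: uppercase letters mark exons, lowercase letters mark introns.
pre_mrna = "AUGGCU" + "guaaguuuucag" + "GAAUUC" + "guaugacccuag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]

mature = splice(pre_mrna, exons)
print(len(pre_mrna), len(mature))  # the mature mRNA is shorter than the pre-mRNA
print(mature)
```

In this toy case the 42-nucleotide transcript shrinks to 18 nucleotides, because only the three exon intervals are kept.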
The Number of Introns per Gene Can Vary Significantly
One of the largest human genes, DMD, is over two million base pairs long. This gene encodes the muscle protein dystrophin. Mutations in DMD cause Duchenne and Becker muscular dystrophy, disorders characterized by progressive muscle deterioration. This gene contains 79 exons and, because each intron lies between two neighboring exons, 78 introns. On the other end of the spectrum lies the histone H1A gene. It is one of the smallest genes in the human genome at only 781 base pairs long, with one exon and no introns.
Introns Carry Important Functions
Are introns garbage DNA that needs to be removed? Not entirely: introns can carry elements that are important for gene regulation. Furthermore, cutting the initial transcript and re-joining its exons allows the exons to be combined in different ways. This process of mixing and matching exons is known as alternative splicing. It makes it possible to produce several protein variants from a single gene.
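To make the idea of mixing and matching exons concrete, here is a small Python sketch of one simplified form of alternative splicing, in which internal exons are either kept or skipped. It assumes that the first and last exons are always retained and that exon order never changes. The function name `exon_skipping_isoforms` and the exon labels are hypothetical, and in a real cell only some of these combinations are actually produced.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Yield every exon combination that keeps the first and last exon.

    Internal exons may be included or skipped, but their order is preserved.
    Requires at least two exons.
    """
    first, *internal, last = exons
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            yield (first, *kept, last)


exons = ["E1", "E2", "E3", "E4"]  # hypothetical exon labels
isoforms = list(exon_skipping_isoforms(exons))
for isoform in isoforms:
    print("-".join(isoform))
```

With two internal exons, the sketch lists four possible transcripts from one gene. The count doubles with every additional internal exon, which shows how a single gene can in principle give rise to many protein variants.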
The Vast Majority of the Human Genome Does Not Code for Proteins
Did you know that about 99% of your genome does not code for proteins? In the early days of genome research, biologists coined the catchy term ‘junk DNA’ for these seemingly non-functional sequences. Since then, we have learned that a large portion of non-coding DNA does carry important functions. At least 9% of the human genome is involved in gene regulation, which is roughly nine times the share taken up by protein-coding sequences.
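The percentages above can be turned into rough base-pair counts with a short Python sketch. The genome size of about 3.1 billion base pairs is an assumption that does not come from the text, and the 1% and 9% shares are the rounded figures quoted above, so the results are order-of-magnitude estimates only.

```python
# Assumed haploid human genome size (approximate; not stated in the text).
GENOME_BP = 3_100_000_000

CODING_PERCENT = 1      # share that codes for proteins (100% - 99%)
REGULATORY_PERCENT = 9  # lower bound for the share involved in gene regulation

coding_bp = GENOME_BP * CODING_PERCENT // 100
regulatory_bp = GENOME_BP * REGULATORY_PERCENT // 100
ratio = regulatory_bp // coding_bp

print(f"protein-coding: about {coding_bp:,} bp")
print(f"regulatory:     at least {regulatory_bp:,} bp")
print(f"regulatory DNA is about {ratio} times the protein-coding DNA")
```

Under these assumptions, roughly 31 million base pairs code for proteins, while at least 279 million base pairs take part in gene regulation.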