The way that mutations affect the survival of a cell depends greatly on the location of the genetic change. In regions of the genome that contain neither genes nor regulatory sequences, mutations may have little effect, and the cell can often continue as usual.
The overall effect is that non-coding sequences are free to change fairly rapidly in evolutionary terms, meaning these regions of a genome may be almost unrecognizable even in two closely related species.
However, in coding sequences, mutations are not so freely adopted. In very rare cases they may be beneficial, such as a mutation in an enzyme gene that improves binding affinity for the substrate, but the majority will be detrimental.
Let’s look at the 16S rRNA gene, for example. It encodes a structural RNA forming part of the ribosome. Some of the regions of this RNA are critical to ribosome function, and changes in these segments are exceptionally rare. These highly conserved regions change so slowly that they can be used to examine sequence homology across phyla, kingdoms, and even all living species, making them a valuable tool for studying relationships between even distantly related organisms.
However, there are still sections of the 16S rRNA sequence that are less critical to function and may evolve slightly faster. These “variable regions” can be useful for elucidating relationships between more closely related organisms, such as genera or even strains of bacteria.
Overall, this means that different genome regions may evolve at vastly different rates, even within a single gene.
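The contrast between conserved and variable regions can be made concrete with a small calculation. The following is a minimal sketch, assuming the sequences have already been aligned to equal length; the function names (`column_conservation`, `window_means`) and the toy alignment are invented for illustration and are not real 16S data. It scores each alignment column by the fraction of sequences sharing the most common base, then averages those scores over fixed-size windows.

```python
from collections import Counter


def column_conservation(alignment):
    """For each alignment column, return the fraction of sequences
    that carry the most common base at that position."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(alignment))
    return scores


def window_means(scores, window):
    """Average the per-column scores over consecutive, non-overlapping windows."""
    return [
        sum(scores[i:i + window]) / window
        for i in range(0, len(scores) - window + 1, window)
    ]


# Toy alignment (hypothetical): the first eight columns are identical
# in every sequence, the last four vary.
toy_alignment = [
    "ACGTACGTAAGC",
    "ACGTACGTTTGA",
    "ACGTACGTCAGT",
    "ACGTACGTGCTA",
]

scores = column_conservation(toy_alignment)
print(window_means(scores, 4))  # [1.0, 1.0, 0.5]
```

Windows scoring close to 1.0 behave like the conserved regions described above, while lower-scoring windows behave like variable regions. Real analyses would typically handle alignment gaps and use more sequences than this sketch does.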
The genomes of eukaryotes are punctuated by long stretches of sequence which do not code for proteins or RNAs. Although some of these regions do contain crucial regulatory sequences, the vast majority of this DNA serves no known function. These are typically the regions in which the fastest evolutionary change is observed, because little or no selection pressure acts to preserve their sequences.
In contrast, regions which code for a protein might experience high selection pressure, because any change in their sequence is likely to result in a protein which is less capable of performing its function optimally. Occasionally, however, a mutation in one of these regions will have a beneficial outcome that contributes to the overall fitness of the organism, and such mutations often persist and may even become fixed in populations. Compared with the relatively regular changes seen in non-coding sequences, such events are exceedingly rare, and so coding regions are generally considered to evolve slowly.
The level of sequence conservation also varies measurably within coding sequences, and this is seen across all organisms. Take the example of a receptor protein. Such proteins typically have distinct regions that perform functions such as ligand binding, intracellular signaling, or membrane integration. A mutation in the region involved in ligand binding may produce a protein that binds the ligand less efficiently, so selection pressure would likely be high on the nucleotides coding for this part of the protein. In the section of the protein which spans the membrane, however, an amino acid substitution may have less effect, and selection pressure may therefore be lower. Under these conditions, two regions of the same protein-coding gene might evolve at different rates.
Sequencing Genes or Genomic Regions to Build Phylogenies
This variation in the speed of genome evolution over different regions can be studied to answer questions about evolutionary relationships. Genes and gene regions can be selected and sequenced across groups of individuals to answer questions as narrow as “are these populations potentially different species?” or as broad as “how do these phyla place into the tree of life?”. For the former, selecting a gene with a relatively weakly conserved region would help to identify population-level differences. Conversely, to answer questions over groups as diverse as phyla, a highly conserved gene region may provide enough homology to produce a phylogeny of such groups. Commonly used regions for molecular phylogenetic analyses such as these include ribosomal RNA genes (such as 16S rRNA, 18S rRNA, or 28S rRNA), or genomic regions known as ITS (Internal Transcribed Spacers, ITS1 or ITS2), which sit between the ribosomal RNA genes.
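As a rough illustration of how sequenced regions are compared before a phylogeny is built, the sketch below computes a pairwise p-distance (the proportion of aligned sites that differ) between every pair of sequences. It assumes the sequences are already aligned; the function names, the sample labels, and the toy sequences are all hypothetical. The p-distance is used here only because it is the simplest distance measure, not because it is the method a particular study would choose.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_table(aligned):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Toy aligned sequences (hypothetical, for illustration only).
toy_sequences = {
    "pop_A": "ACGTACGTAC",
    "pop_B": "ACGTACGTAT",
    "outgroup": "ACGAACCTTT",
}

distances = distance_table(toy_sequences)
for pair, dist in distances.items():
    print(pair, round(dist, 2))
```

In this toy data the two populations differ at one site in ten, while each differs from the outgroup at three or four sites. A distance table like this is the kind of input that distance-based tree-building methods work from. In practice, dedicated phylogenetics software with explicit substitution models would normally be used instead of raw p-distances.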