21.12:

From DNA to Protein

JoVE Core
Chemistry

DNA contains genes, sequences of nucleotides, some of which are instructions that code for the series of amino acids in a protein. The flow of genetic information from DNA to RNA to protein is known as the central dogma. The first step of this process is transcription, in which an RNA polymerase enzyme synthesizes an RNA-based copy, or transcript, of the gene. The DNA serves as a template: each new RNA base added to the transcript is complementary to the corresponding base on the DNA template strand. Some transcripts, called messenger RNA or mRNA, code for proteins, while non-coding ones participate in other cellular processes. For example, ribosomal RNA (rRNA) and transfer RNA (tRNA) participate in protein synthesis.

The next step is translation, in which mRNA is decoded to synthesize a chain of amino acids. A set of instructions known as the genetic code is used to read the mRNA. Most organisms use the same universal code, composed of three-nucleotide groups called codons that translate to specific amino acids. There are 64 different nucleotide triplets but only 20 standard amino acids in proteins, making the code degenerate; that is, multiple codons can give the same instruction. Sixty-one codons code for amino acids, and three signal the stop of translation.

Translation occurs at the ribosome, a large complex of rRNAs and proteins, with the help of tRNA. tRNA has a structure with three hairpin loops. One loop contains a sequence called the anticodon, whose bases are complementary to the codon. The amino acid corresponding to this sequence is attached at the end of the tRNA, which transports it into the ribosome.

Proteins called initiation factors bring together the small ribosomal subunit, an initiator tRNA, and the mRNA. After the complex assembles, the ribosome glides along the mRNA in search of the translation start site. Here, the initiator tRNA anticodon binds to the complementary codon, the large ribosomal subunit binds to the assembly, and translation is initiated.

When the next tRNA comes in, the amino acid from the initiator tRNA is detached and transferred to the amino acid on the incoming tRNA, resulting in a growing polypeptide chain. The addition of amino acids continues until a stop codon is detected in the mRNA. The ribosome then releases the chain so that it can fold into a functional protein.
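
The base-pairing rule behind transcription can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the example template strand are made up here, and the template is assumed to be written 3' to 5' so that the resulting mRNA reads 5' to 3'.

```python
# Pairing used when RNA is built against a DNA template strand:
# A pairs with U (RNA has no T), T with A, G with C, and C with G.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') complementary to a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


# Hypothetical template strand, chosen only for illustration.
print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

A real RNA polymerase also needs a promoter and termination signals; the sketch covers only the complementarity rule.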

The flow of genetic information in cells from DNA to mRNA to protein is described by the central dogma, which states that genes specify the sequence of mRNAs, which in turn specify the sequence of amino acids making up all proteins. The decoding of one molecule to another is performed by specific proteins and RNAs. Because the information stored in DNA is so central to cellular function, it makes intuitive sense that the cell would make mRNA copies of this information for protein synthesis while keeping the DNA itself intact and protected. The copying of DNA to RNA is relatively straightforward, with one nucleotide being added to the mRNA strand for every nucleotide read in the DNA strand. The translation to protein is a bit more complex because three mRNA nucleotides correspond to one amino acid in the polypeptide sequence. However, the translation to protein is still systematic and collinear, such that nucleotides 1 to 3 correspond to amino acid 1, nucleotides 4 to 6 correspond to amino acid 2, and so on.
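
The collinear mapping described above (nucleotides 1 to 3 for amino acid 1, nucleotides 4 to 6 for amino acid 2, and so on) can be made concrete with a short Python sketch. The helper name `codon_positions` and the example sequence are illustrative assumptions, not part of the original text.

```python
def codon_positions(mrna: str):
    """Yield (amino_acid_number, first_nt, last_nt, codon) with 1-based positions.

    Trailing nucleotides that do not fill a complete triplet are ignored.
    """
    usable = len(mrna) - len(mrna) % 3
    for i in range(0, usable, 3):
        yield (i // 3 + 1, i + 1, i + 3, mrna[i:i + 3])


# Hypothetical coding sequence, chosen only for illustration.
for number, first, last, codon in codon_positions("AUGCCGGAU"):
    print(f"amino acid {number}: nucleotides {first}-{last} ({codon})")
```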

The Genetic Code Is Degenerate and Universal

Each amino acid is defined by a three-nucleotide sequence called the triplet codon. Given the different numbers of “letters” in the mRNA and protein “alphabets,” scientists theorized that single amino acids must be represented by combinations of nucleotides. Nucleotide doublets would not be sufficient to specify every amino acid because there are only 16 possible two-nucleotide combinations (4^2). In contrast, there are 64 possible nucleotide triplets (4^3), which is far more than the number of amino acids. Scientists theorized that amino acids were encoded by nucleotide triplets and that the genetic code was “degenerate.” In other words, a given amino acid could be encoded by more than one nucleotide triplet. This was later confirmed experimentally: Francis Crick and Sydney Brenner used the chemical mutagen proflavin to insert one, two, or three nucleotides into the gene of a virus. When one or two nucleotides were inserted, the normal proteins were not produced. When three nucleotides were inserted, the protein was synthesized and functional. This demonstrated that the amino acids must be specified by groups of three nucleotides. These nucleotide triplets are called codons. The insertion of one or two nucleotides completely changed the triplet reading frame, thereby altering the message for every subsequent amino acid. Though insertion of three nucleotides caused an extra amino acid to be inserted during translation, the integrity of the rest of the protein was maintained.
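
Both the counting argument and the reading-frame effect can be checked with a small Python sketch. The sequence, the insertion site, and the helper names (`count_words`, `codons`, `insert`) are hypothetical and serve only to illustrate the reasoning; this is not a model of the actual proflavin experiment.

```python
from itertools import product

BASES = "UCAG"


def count_words(length: int) -> int:
    """Count all nucleotide 'words' of the given length by enumerating them."""
    return sum(1 for _ in product(BASES, repeat=length))


def codons(seq: str) -> list:
    """Split a sequence into complete triplets, starting at its first base."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert(seq: str, position: int, extra: str) -> str:
    """Return seq with extra nucleotides inserted before the given 0-based index."""
    return seq[:position] + extra + seq[position:]


print(count_words(2), count_words(3))  # 16 64

original = "AUGGCUGAAUUCCGU"  # hypothetical message: AUG GCU GAA UUC CGU
for extra in ("C", "CC", "CCC"):
    print(len(extra), codons(insert(original, 3, extra)))
```

With one or two inserted nucleotides every downstream triplet changes, whereas three inserted nucleotides add a single new codon and leave the rest of the frame intact.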

In addition to codons that instruct the addition of a specific amino acid to a polypeptide chain, three of the 64 codons terminate protein synthesis and release the polypeptide from the translation machinery. These triplets are called nonsense codons or stop codons. Another codon, AUG, also has a special function. In addition to specifying the amino acid methionine, it also serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5' end of the mRNA. Following the start codon, the mRNA is read in groups of three until a stop codon is encountered.
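
The start and stop rules above can be sketched as a simplified translation routine in Python. Several things here are assumptions made for illustration: the function name `translate`, the example mRNA, the use of one-letter amino acid codes (with `*` marking a stop codon), and the shortcut of treating the first AUG in the sequence as the start codon. The codon table follows the standard genetic code.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes, ordered UUU, UUC, UUA, UUG, UCU, ...
# '*' marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG, reading triplets until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the chain is released
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: two leading bases, then AUG GCU GAA UUC UAA, then two trailing bases.
print(translate("GGAUGGCUGAAUUCUAAGG"))  # MAEF
```

The bases before the AUG and after the stop codon are never read, which is how the start codon fixes the reading frame.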

The specification of a single amino acid by multiple similar codons is called "degeneracy." Degeneracy is believed to be a cellular mechanism to reduce the negative impact of random mutations. Codons that specify the same amino acid typically differ by only one nucleotide. In addition, amino acids with chemically similar side chains are encoded by similar codons. For example, aspartate (Asp) and glutamate (Glu), which occupy the GA* block, are both negatively charged. Because of this feature of the genetic code, a single-nucleotide substitution will often either specify the same amino acid, and so have no effect, or specify a similar amino acid, which keeps the protein from being rendered completely nonfunctional.
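
Degeneracy can be explored directly from the codon table. The Python sketch below groups codons by the amino acid they specify and estimates how often a single-nucleotide substitution in a sense codon leaves the amino acid unchanged. The helper names and the one-letter amino acid codes (D for aspartate, E for glutamate, L for leucine, `*` for stop) are illustrative choices, and every substitution is assumed to be equally likely, which real mutation processes do not guarantee.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered UUU, UUC, UUA, UUG, UCU, ...; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they specify.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)


def single_substitutions(codon: str):
    """Yield the nine codons that differ from the given codon at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def synonymous_fraction() -> float:
    """Fraction of single-base changes in sense codons that keep the same amino acid."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        for mutant in single_substitutions(codon):
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


print(codons_for["L"])                   # the six leucine codons
print(codons_for["D"], codons_for["E"])  # the GA* block: aspartate and glutamate
print(round(synonymous_fraction(), 2))
```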

The genetic code is nearly universal. With a few minor exceptions, virtually all species use the same genetic code for protein synthesis. Conservation of codons means that a purified mRNA encoding the globin protein in horses could be transferred to a tulip cell, and the tulip would synthesize horse globin. That there is only one genetic code is powerful evidence that all of life on Earth shares a common origin, especially considering that there are about 10^84 possible combinations of 20 amino acids and 64 triplet codons.
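
The text does not say how the number of possible codes was counted. One way to reach a figure of that size, offered here purely as an assumption, is to let each of the 64 codons independently take one of 21 meanings (any of the 20 amino acids, or stop). A short Python check of that arithmetic:

```python
# Assumption (not stated in the text): each codon is assigned one of
# 21 meanings, namely the 20 standard amino acids or a stop signal.
MEANINGS = 20 + 1
CODONS = 4 ** 3  # 64 nucleotide triplets

assignments = MEANINGS ** CODONS  # exact integer; Python ints are arbitrary precision
print(f"{assignments:.1e}")

# Counting only the 20 amino acids gives a somewhat smaller figure.
print(f"{20 ** CODONS:.1e}")
```

Under either counting the result is astronomically large, which is the point of the argument: a single shared code is unlikely to have arisen independently many times.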

This text has been adapted from OpenStax, Biology 2e, Chapter 15.1: The Genetic Code.