Humans have been trying to classify living things since Aristotle made the first attempt in the 4th century BC. Aristotle’s system was improved upon during the Renaissance and later by Carolus Linnaeus in the mid-1700s. These more formal classification systems grouped species by their physical similarity to one another. For example, all vertebrates have a backbone, whereas invertebrates do not. Traits like the backbone are called synapomorphies: traits shared by a group of organisms, presumably because they were inherited from a common ancestor. As we will explore, classification based on physical traits alone has limitations and has more recently been amended to include genetic analysis. Scientists still construct trees called dendrograms to visualize how species are related to one another and share common ancestors. These dendrograms aid our understanding of the evolutionary processes that drive these relationships, and genetic comparisons have added an important tool for analyzing them.
A type of dendrogram, called a cladogram, depicts the hypothetical genealogical relationships between species. The tips (or leaves) of the tree each represent a species, and the branches show how the species are related to each other. A slightly more complicated type of tree, called a phylogram, differs from a cladogram in that the branches leading to the species have different lengths. The length of a branch in this type of tree represents the degree of change between species: the longer the branch, the more change has accumulated, which is often taken to indicate more time since the species diverged from a common ancestor. In both types of tree, the common ancestor of a group of species is indicated by a node, which is the point where branches meet. Species that are more closely related to each other (that is, species that most recently shared a common ancestor) are joined by a more recent node and so sit closer together on the tree. Two species that branch from the same node are called a sister group¹.
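To make the vocabulary of tips, nodes, and sister groups concrete, here is a minimal sketch that stores a cladogram as nested Python tuples. The representation, the function names `leaves` and `sister_pairs`, and the species labels "A" through "D" are all assumptions made for illustration; they do not come from any particular phylogenetics package.

```python
# A cladogram as nested tuples: each tuple is a node, each string is a tip.
# The labels "A".."D" are hypothetical placeholder species.
cladogram = ((("A", "B"), "C"), "D")


def leaves(tree):
    """Return the tip (species) names of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaves(child))
    return names


def sister_pairs(tree):
    """Return pairs of species that branch directly from the same node."""
    if isinstance(tree, str):
        return []
    pairs = []
    # A node whose two children are both tips joins two sister species.
    if len(tree) == 2 and all(isinstance(child, str) for child in tree):
        pairs.append(tuple(tree))
    for child in tree:
        pairs.extend(sister_pairs(child))
    return pairs


print(leaves(cladogram))
print(sister_pairs(cladogram))
```

In this toy tree, "A" and "B" share the most recent node, so they form the only sister pair of species; "C" joins them one node deeper, and "D" joins at the root.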
Historically, cladograms were constructed by comparing the morphology (physical structure) of organisms. This method is still practiced, but the techniques have been modernized to include comparison of DNA (deoxyribonucleic acid) sequences between species. Using DNA to build trees has several advantages over relying solely on morphology, including the ability to estimate how long ago different species shared a common ancestor¹. However, using DNA is not always feasible, especially when trees include extinct organisms. DNA is best recovered from soft tissues, which are not preserved during the fossilization process, so a DNA sample from an extinct species is rarely available.
DNA is passed on from parents to their offspring in hereditary units called genes. The nucleotide (A, G, C, and T) sequences of genes found in different species are frequently quite similar, likely because the genes descend from a common ancestor. This similarity allows researchers to align sequences from different species with one another to build the trees described above. Species with more similar nucleotide sequences are placed next to each other in a tree, and species with less sequence similarity are placed farther apart.
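The idea of placing species according to sequence similarity can be sketched in a few lines. The example below assumes the sequences have already been aligned to the same length, and it scores similarity as the simple fraction of matching positions. The sequences, the species labels, and the function names `percent_identity` and `most_similar_pair` are invented for illustration; real tree-building methods use more elaborate models of sequence change.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Fraction of positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def most_similar_pair(seqs):
    """Return the pair of names whose sequences share the highest identity."""
    return max(
        combinations(sorted(seqs), 2),
        key=lambda pair: percent_identity(seqs[pair[0]], seqs[pair[1]]),
    )


# Hypothetical, made-up aligned fragments; these are not real gene sequences.
aligned = {
    "species_A": "ATGCCGTAAC",
    "species_B": "ATGCCGTTAC",
    "species_C": "ATGACGATTC",
}

for first, second in combinations(sorted(aligned), 2):
    identity = percent_identity(aligned[first], aligned[second])
    print(first, second, identity)

print(most_similar_pair(aligned))
```

Here species_A and species_B differ at a single position, so they would be placed next to each other, with species_C farther away.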
Bioinformatics comprises the tools biologists use to analyze large datasets through a combination of computer science, mathematical modeling, and statistics. One such tool is BLAST (Basic Local Alignment Search Tool), which can quickly search the entire genome of any species available in the NCBI (National Center for Biotechnology Information) database². The NCBI database combines several different databases that hold different types of DNA sequence information. A BLAST search relies on complex computer algorithms, but in essence BLAST aligns the nucleotide bases of a submitted DNA sequence (known as the query sequence) with the sequences in the database that most closely match it. The DNA sequences that are found are listed in order of similarity to the query sequence, so the top matches tend to come from species closely related to the species containing the query gene. This comparison may or may not reflect the actual evolutionary relationship between species, because genes evolve at different rates. Additionally, genomes sometimes contain more than one instance of a similar sequence.
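The ranking step of such a search can be illustrated with a toy example. The sketch below is not how BLAST itself works: BLAST uses fast heuristics and statistical scoring, whereas this code substitutes a plain Smith-Waterman local alignment score, which is slower but easy to follow. The scoring values, the function names `local_alignment_score` and `rank_hits`, and the tiny "database" of sequences are all assumptions made for illustration.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences."""
    previous = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = previous[j - 1] + (match if q == s else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def rank_hits(query, database):
    """Score every database sequence against the query, best match first."""
    scored = [
        (name, local_alignment_score(query, sequence))
        for name, sequence in database.items()
    ]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Hypothetical query and database entries; not real NCBI records.
query = "ACGTACGT"
database = {
    "entry_unrelated": "GGGGGGGG",
    "entry_partial": "TTACGTTTTTTT",
    "entry_exact": "TTACGTACGTTT",
}

for name, score in rank_hits(query, database):
    print(name, score)
```

The entry containing the full query scores highest, the entry sharing only part of it comes next, and the unrelated entry comes last, mirroring how search results are listed in order of similarity to the query.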
Comparing the DNA sequences of genes is valuable beyond the study of evolutionary relationships. Frequently, genes are first identified in model organisms, such as the fruit fly, Drosophila melanogaster, or the mouse³. An integral part of studying a gene is identifying and analyzing the function of its product. If a researcher is interested in studying that function in a different organism (humans, for example), BLAST or other bioinformatic tools can be used to find candidate genes based on their similarity to genes of known function from model organisms.
Human genes can also be used as the starting point to find homologs in model organisms. In fact, human disease research depends heavily on this approach. Once a human gene of interest is identified, mice can be genetically manipulated to have the homologous gene disrupted, or “knocked out,” creating a model of the human disease that can be studied in order to understand and treat it. Many of these mouse strains are currently available. For example, there is a mouse model for human cystic fibrosis (CF) called the Cftr knockout mouse, and another that models atherosclerosis, called the Apoe knockout³.