5.4: Protein Families
Protein families are groups of homologous proteins, that is, proteins that share a common evolutionary origin and, as a result, have similar amino acid sequences and three-dimensional structures. Protein families usually arise through gene duplication, in which an additional copy of a gene is created in the genome of an organism. Mutations in one copy that change amino acids but still allow the protein to be properly synthesized and folded can give rise to new family members. If these new proteins retain similar amino acids at key positions, their protein domains, and possibly their overall three-dimensional structure, can remain similar. Proteins within a family can share as little as 30% amino acid sequence identity and still perform related functions.
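Percent identity, the measure behind the 30% figure, is the fraction of aligned positions at which two sequences carry the same amino acid. The Python sketch below computes it for two sequences that have already been aligned; producing the alignment itself is not shown. Skipping gap columns is an assumption, since tools differ in which denominator they use, and the two example sequences are made-up fragments rather than real proteins.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (one common
    convention; others divide by the full alignment length or by the
    length of the shorter sequence).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap column: not counted as match or mismatch
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy fragments, not real protein sequences.
    print(percent_identity("MKTAYIAKQR", "MKSAYLAEQR"))
```

The toy pair differs at 3 of its 10 positions, so it is 70% identical; family members near the 30% end of the range would differ at most positions yet still fold alike.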
Protein superfamilies are larger groups of proteins that have evolved from a more distant common ancestor. They generally have lower sequence identity than members of a single protein family but still share significant structural features. Each superfamily can contain several protein families whose members have more closely related structures and functions, and some larger families are further divided into subfamilies. Whether a given protein is assigned to a superfamily, family, or subfamily can vary between classification systems, and these assignments continue to change as the amount of protein sequence and structural data grows.
The immunoglobulin superfamily (IgSF) is one of the largest protein superfamilies; over 700 of its members are found in the human genome. All members of the superfamily contain one or more immunoglobulin (Ig) domains. This domain has a characteristic three-dimensional structure composed of a sandwich of two antiparallel beta-sheets, and most IgSF members are involved in cell adhesion or ligand binding. The IgSF contains many families, including antigen receptors, cell adhesion molecules (CAMs), cytoskeletal proteins, and several growth-factor and cytokine receptor groups. Several of the larger families are further divided into subfamilies. For example, the antigen receptor family includes the antibody (immunoglobulin) and T-cell receptor subfamilies, and the CAMs include the NCAM, ICAM, and CD2-related subfamilies.
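The superfamily, family, and subfamily levels form a tree. As a minimal sketch, the Python code below stores only the IgSF groups named in this section as nested dictionaries and walks the tree to report where a group sits. The group labels are paraphrased from the text, the helper name `lineage` is hypothetical, and the tree is not a complete or authoritative IgSF classification.

```python
# Only the groups named in the text; real classification systems may
# include many more groups and draw the boundaries differently.
IGSF = {
    "Immunoglobulin superfamily (IgSF)": {
        "Antigen receptors": {
            "Antibodies (immunoglobulins)": {},
            "T-cell receptors": {},
        },
        "Cell adhesion molecules (CAMs)": {
            "NCAM": {},
            "ICAM": {},
            "CD2-related": {},
        },
        "Cytoskeletal proteins": {},
        "Growth-factor and cytokine receptors": {},
    }
}

LEVELS = ["superfamily", "family", "subfamily"]


def lineage(tree, name, path=None):
    """Depth-first search: return the list of groups from the root down
    to `name`, or None if `name` is not in the tree."""
    path = path or []
    for group, children in tree.items():
        current = path + [group]
        if group == name:
            return current
        found = lineage(children, name, current)
        if found:
            return found
    return None


if __name__ == "__main__":
    for level, group in zip(LEVELS, lineage(IGSF, "ICAM")):
        print(f"{level}: {group}")
```

Run as a script, this prints the superfamily (IgSF), the family (CAMs), and the subfamily (ICAM) on three lines.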
Protein family classifications help scientists understand functional and evolutionary relationships between proteins. Several online resources can be used to search for known protein families or to classify newly discovered proteins. Pfam is one of several online databases in which a scientist can search for known proteins and their family members. A researcher can also submit the amino acid sequence of a newly discovered protein to see whether sequence similarity suggests it belongs to a known protein family. Because family members often have similar structures and functions, such a match provides a testable hypothesis about the possible role of the novel protein.
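Pfam itself assigns sequences to families using profile hidden Markov models built from multiple sequence alignments. The Python sketch below swaps in a far simpler idea to illustrate the principle of classification by sequence similarity: it scores a query against reference sequences by the overlap of their short substrings (k-mers) and reports the best-scoring family. The family names, reference sequences, k-mer length, and score threshold are all hypothetical values chosen for illustration.

```python
# Hypothetical reference "families"; the sequences are made up.
REFERENCES = {
    "family_A": ["MKTAYIAKQRQISFVKSHFSRQ"],
    "family_B": ["GSHMLEDPVAGVSWLLKQ"],
}


def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k (k-mers) in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_family(query, references, k=3, min_score=0.2):
    """Return (family, score) for the most similar reference sequence,
    or (None, score) if no score reaches the (arbitrary) min_score."""
    q = kmers(query, k)
    best_name, best_score = None, 0.0
    for family, ref_seqs in references.items():
        for ref in ref_seqs:
            score = jaccard(q, kmers(ref, k))
            if score > best_score:
                best_name, best_score = family, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    print(best_family("MKTAYIAKQRQ", REFERENCES))
```

Here the query shares 9 of the 20 distinct 3-mers in its union with the family_A reference (a score of 0.45) and none with family_B, so it is assigned to family_A. Real search tools also estimate how likely a match is to occur by chance, which this toy score ignores.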