We developed computational de novo protein design methods capable of tackling several important areas of protein design. To disseminate these methods we present Protein WISDOM, an online tool for protein design (http://www.proteinwisdom.org). Starting from a structural template, design of monomeric proteins for increased stability and complexes for increased binding affinity can be performed.
The aim of de novo protein design is to find amino acid sequences that will fold into a desired three-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and of protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and of complexes for increased binding affinity.
To disseminate these methods for broader use, we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization-based sequence selection stage that aims to improve stability by minimizing potential energy in sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with the relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of these methods.
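The staged workflow described above can be pictured as a small ranking pipeline. The sketch below is illustrative only and is not the Protein WISDOM implementation: the function `rank_design_stages`, the `keep` cutoff, and the score lookups are hypothetical, and the real stages compute energies, fold specificities, and approximate binding affinities from structural models rather than reading them from a table. It also assumes that lower potential energy and higher fold specificity or binding affinity scores are better.

```python
from typing import Callable, Dict, List


def rank_design_stages(
    potential_energy: Dict[str, float],
    fold_specificity: Callable[[str], float],
    binding_affinity: Callable[[str], float],
    keep: int,
) -> Dict[str, List[str]]:
    """Return one rank-ordered sequence list per stage (hypothetical sketch)."""
    # Stage 1: sequence selection. Lower potential energy is treated as better,
    # and only the `keep` lowest-energy sequences move on to later stages.
    selected = sorted(potential_energy, key=potential_energy.__getitem__)[:keep]
    # Stage 2: fold specificity. Higher scores are assumed to be better.
    by_fold = sorted(selected, key=fold_specificity, reverse=True)
    # Stage 3: approximate binding affinity. Higher scores are assumed to be better.
    by_affinity = sorted(selected, key=binding_affinity, reverse=True)
    return {
        "sequence_selection": selected,
        "fold_specificity": by_fold,
        "binding_affinity": by_affinity,
    }
```

Calling the function with a dictionary of per-sequence energies and two scoring callables returns three ranked lists, mirroring the rank-ordered output that the tool reports for each stage.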
De novo protein design is the identification of protein sequences that will yield a desired tertiary structure with improved properties or function. Since the native fold of a protein is the conformation which lies at the free energy minimum, de novo protein design seeks sequences that will have a free energy minimum in the target fold. This problem was first described by Drexler1 and Pabo2 and was referred to as the “inverse folding problem.” However, unlike the protein folding problem, where a sequence can yield only one folded structure solution, the de novo protein design problem exhibits degeneracy. Many different amino acid sequences can yield the same tertiary structure and function.
While protein design has traditionally been performed experimentally through rational design and directed evolution, computational methods have more recently been employed to overcome the limited search space inherent in experimental methods. A variety of computational methods have been used, including deterministic methods, stochastic methods, and probabilistic methods.3,4 Early computational methods used fixed-backbone templates to make the problem easier to solve.5-7 With the advent of faster processors, high performance computing, and more efficient algorithms, backbone flexibility has been incorporated by using an ensemble of fixed-backbone templates8-14 or by incorporating true backbone flexibility by expressing the template in terms of ranges of atom-to-atom distances and dihedral angles.15,16
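To make the idea of a flexible template expressed as ranges more concrete, the following sketch stores lower and upper bounds for a set of geometric quantities and checks whether a candidate conformation falls inside them. It is a simplified illustration under stated assumptions, not the representation used in the cited work: the helper names are hypothetical, the bounds are taken as the minimum and maximum observed over a small ensemble of structures, and dihedral periodicity (wrapping at ±180°) is ignored.

```python
from typing import Dict, Hashable, List, Tuple

Range = Tuple[float, float]


def bounds_from_ensemble(
    observations: List[Dict[Hashable, float]]
) -> Dict[Hashable, Range]:
    """Derive (min, max) bounds per quantity from an ensemble (assumed approach)."""
    keys = observations[0].keys()
    return {
        key: (
            min(obs[key] for obs in observations),
            max(obs[key] for obs in observations),
        )
        for key in keys
    }


def within_bounds(
    bounds: Dict[Hashable, Range], values: Dict[Hashable, float]
) -> bool:
    """True if every bounded quantity of a conformation lies inside its range."""
    return all(low <= values[key] <= high for key, (low, high) in bounds.items())
```

Here a key could be a residue pair such as `(1, 5)` for an atom-to-atom distance in ångströms, or a pair such as `(3, "phi")` for a dihedral angle in degrees; both conventions are illustrative choices.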
This paper describes Protein WISDOM in detail, an online tool made available to the academic community for applying our computational de novo protein design framework. The framework has been applied to the design of numerous proteins for therapeutic use, targeting HIV, cancer, complement diseases, and other autoimmune disorders. Many of the predicted peptides were validated experimentally, demonstrating the power of the method. Table 1 summarizes the designed proteins, including the length of each protein or peptide, the number of computational predictions, and the number of experimental validations.
Protein Design | Protein Length | # of Computational Predictions | # of Experimental Validations | Reference |
Full sequence design of human beta-defensin-2 | 41 | 340 | | (17) |
Compstatin inhibitors of human C3 | 13 | 28 | 3/3 | (18, 19) |
Compstatin analogues that bind to rat C3c | 13 | 5 | | (20) |
Compstatin analogues with di-serine extension | 15 | 8 | | |
Stabilizing structure of compstatin analog W4A9 | 13 | 18 | | |
C3a receptor agonists and antagonists | 77 | 20 | 4/7 | (21) |
C5a receptor agonists and antagonists | 74 | 61 | 2/61 | |
HIV-1 gp41 inhibitors | 12 | 6 | 4/5 | (22) |
HIV-1 gp120 inhibitors | 9 | 14 | | |
Bak inhibitors of Bcl-xL and Bcl-2 | 16-18 | 10 | 5/5 | (23) |
Inhibitors of ERK2 | 11 | 25 | | |
Inhibitors of EZH2 | 21 | 17 | 10/10 | (24) |
Inhibitors of LSD1 and LSD2 | 16 | 41 | 17/20 | |
Inhibitors of HLA-DR1 | 13 | 6 | | (25) |
Inhibitors of PNP | 5 | 13 | | |
Table 1. Summary of designed proteins and peptides using the de novo protein design framework. The # of computational predictions is presented as the number of favorable predictions (i.e. fold specificities above a certain cutoff or approximate binding affinities greater than the native sequence). The # of experimental validations gives two numbers: the first is the number of predictions that were experimentally validated while the second is the total number of predictions that were tested experimentally.
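As a reading aid for the "validated/tested" notation in Table 1, the short sketch below parses those entries and computes per-design and pooled validation fractions. The dictionary values are copied from the table; the function names are hypothetical, and pooling across unrelated design studies is shown purely as arithmetic, not as a statistic reported by the authors.

```python
from typing import Dict, Tuple

# "validated/tested" entries copied from Table 1 (rows without an entry omitted).
TABLE_1_VALIDATIONS: Dict[str, str] = {
    "Compstatin inhibitors of human C3": "3/3",
    "C3a receptor agonists and antagonists": "4/7",
    "C5a receptor agonists and antagonists": "2/61",
    "HIV-1 gp41 inhibitors": "4/5",
    "Bak inhibitors of Bcl-xL and Bcl-2": "5/5",
    "Inhibitors of EZH2": "10/10",
    "Inhibitors of LSD1 and LSD2": "17/20",
}


def parse_validation(entry: str) -> Tuple[int, int]:
    """Split 'validated/tested' into two integers, rejecting impossible values."""
    validated, tested = (int(part) for part in entry.split("/"))
    if tested <= 0 or not 0 <= validated <= tested:
        raise ValueError(f"invalid validation entry: {entry!r}")
    return validated, tested


def validation_fraction(entry: str) -> float:
    """Fraction of experimentally tested predictions that were validated."""
    validated, tested = parse_validation(entry)
    return validated / tested


def pooled_counts(entries: Dict[str, str]) -> Tuple[int, int]:
    """Sum validated and tested counts over all designs (illustrative only)."""
    pairs = [parse_validation(entry) for entry in entries.values()]
    return sum(v for v, _ in pairs), sum(t for _, t in pairs)
```

The C5a row illustrates why the two numbers should be read together: all 61 predictions were synthesized and tested, so its low fraction reflects exhaustive testing rather than a small, pre-filtered subset.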
Design of human beta-defensin-2 (hβD-2) was performed to enhance the peptide’s antimicrobial properties.17 For this design, we considered two cases: 1) up to 10 mutations along hβD-2 and 2) full sequence design of all hβD-2 residue positions except the cysteines (positions 8, 15, 20, 30, 37, and 38). Three different design templates and three different sequence selection models were utilized in the design. High levels of similarity in mutations were observed between the weighted average and distance bin models for both the 10-mutation design and the full sequence design. Additionally, a large number of sequences were found to have more favorable calculated fold specificity values than the native sequence.
Complement system inhibitors (of C3, C3a, and C5a) were designed to combat a number of immune diseases such as stroke, heart attack, Alzheimer’s disease, asthma, rheumatoid arthritis, rejection of xenotransplantation, adult respiratory disease, psoriasis, and Crohn’s disease. Three compstatin inhibitors of C3c predicted by the protein design framework plus three rationally designed sequences were experimentally validated to be better binders than the native compstatin.18,19
Further studies examined the loss of activity of compstatin against non-primate C3c and designed a number of candidate rat and mouse C3c inhibitors. Five sequences were shown to have more favorable association free energies with rat C3c than the W4A9 compstatin mutant known to inhibit C3c, an improvement attributed to the formation of a new salt bridge by Arg1.20 Eight sequences with an N-terminal extension were predicted to be better binders than W4A9 with a di-serine extension. Finally, 18 compstatin sequences were predicted to stabilize the bound conformation of W4A9, providing strong candidates for primate and non-primate C3c inhibitors.
In addition to C3c inhibitors, C3a and C5a receptor agonists and antagonists were designed based upon the structures of C3a and C5a. Seven C3a sequences predicted by the model were experimentally tested. Two of the sequences were potent agonists while two others were partial agonists.21 The two potent agonists showed a 58-fold improvement over a previously discovered “superagonist”. The design of C5a receptor agonists and antagonists provided a set of 61 sequences. All the sequences were synthesized and two were found to be novel C5a agonists.
Fusion inhibitors of HIV-1, the virus that causes AIDS, were designed to prevent HIV-1 from infecting cells. The first design targeted gp41, an envelope glycoprotein of HIV-1. The protein design framework predicted six sequences that were better binders than the native sequence. Four of these predicted sequences were experimentally validated to inhibit HIV-1 with the best sequence having an IC50 as low as 29 μM. This sequence showed a 3-15 fold improvement over the native sequence and had no loss of activity against an Enfuvirtide-resistant virus strain.22 The second design targeted gp120, another envelope glycoprotein of HIV-1. Fourteen sequences were predicted to be binders of gp120 and provide additional potential fusion inhibitors of HIV-1.
Numerous proteins linked to cancer provided promising targets for cancer therapeutics. Bcl-2 and Bcl-xL are anti-apoptotic proteins that prevent cell death. Inhibitors of these two proteins were designed to induce cell death in cancer cells. Ten sequences were predicted to be better binders than the native, and these results captured previous experimental and mutagenesis results.23 Another target protein, ERK2, is involved in signal-transduction cascades that make it a promising target for antiproliferative cancer therapies. Twenty-five sequences were predicted to be inhibitors of ERK2.
Histone methyltransferases and demethylases dynamically control histone methylation, which has been linked to many cancer types, including prostate, breast, lymphoma, myeloma, bladder, colon, skin, liver, endometrial, lung, and gastric cancers. The de novo protein design framework identified 17 inhibitors of EZH2 (a lysine methyltransferase), and all ten that were experimentally tested were found to inhibit EZH2.24 The most potent peptide had an IC50 of about 13 μM, was equally effective at elevated enzyme concentrations, and did not compete with the cofactor. These peptides were the first set of inhibitors of EZH2. The framework also predicted 53 inhibitors of LSD1 (a demethylase); of the 20 experimentally tested, 17 were inhibitors of LSD1 and 18 were inhibitors of LSD2. The best inhibitors had IC50 values below 1 μM, making them the most potent peptidic inhibitors discovered to date.
The final two protein systems provided targets for treating various autoimmune diseases such as Coeliac disease, diabetes mellitus type 1, systemic lupus erythematosus, Sjögren’s syndrome, Churg-Strauss syndrome, Hashimoto’s thyroiditis, Graves’ disease, idiopathic thrombocytopenic purpura, rheumatoid arthritis, and allergies. The framework predicted six sequences that bind to HLA-DR1 and 13 sequences that bind to PNP, although none of these potential inhibitors has yet been experimentally validated.
Table 2 summarizes the experimentally validated inhibitors and agonists predicted using the de novo protein design framework. The approximate binding affinity metric was used to predict nine of the sequences (inhibitors of human C3c, HIV-1 gp41, EZH2, LSD1, and LSD2), while the fold specificity metric was used to identify four of the sequences (agonists/antagonists of C3aR). These peptides highlight the success of the framework, particularly of the added approximate binding affinity metric. The framework is also versatile: designs for six different proteins linked to twenty-five different diseases have been experimentally validated.
Name | IC50 or EC50 | Protein Target | Applicable Diseases |
SQ027 | 0.94 μM | human C3c | stroke, heart attack, Alzheimer’s disease, asthma, rheumatoid arthritis, systemic lupus erythematosus, multiple sclerosis, psoriasis, diabetes type I, Crohn’s disease, pancreatitis, and cystic fibrosis |
SQ086 | 1.98 μM | human C3c | |
SQ059 | 4.73 μM | human C3c | |
SQ110-4 | 15.2 nM | C3aR | |
SQ060-4 | 36.4 nM | C3aR | |
SQ007-5 | 15.4 nM | C3aR | |
SQ002-5 | 26.1 nM | C3aR | |
SQ435 | 29 – 253 μM | HIV-1 gp41 | AIDS |
SQ037 | 13.57 μM | EZH2 | prostate, breast, lymphoma, myeloma, bladder, colon, skin, liver, endometrial, lung, and gastric cancers |
SQ011-1 | 0.521 μM | LSD1 | |
SQ016-1 | 0.249 μM | LSD1 | |
SQ026-1 | 2.51 μM | LSD2 | |
SQ015-1 | 1.332 μM | LSD2 | |
Table 2. Computationally predicted and experimentally validated peptides targeting various diseases.
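The counts quoted in the text accompanying Table 2 (nine sequences from the approximate binding affinity metric, four from the fold specificity metric, six protein targets) can be cross-checked with a short tally. The peptide names and targets below are copied from Table 2; the assignment of a metric to each target follows the text, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# (peptide name, protein target) pairs copied from Table 2.
TABLE_2_ROWS: List[Tuple[str, str]] = [
    ("SQ027", "human C3c"), ("SQ086", "human C3c"), ("SQ059", "human C3c"),
    ("SQ110-4", "C3aR"), ("SQ060-4", "C3aR"),
    ("SQ007-5", "C3aR"), ("SQ002-5", "C3aR"),
    ("SQ435", "HIV-1 gp41"),
    ("SQ037", "EZH2"),
    ("SQ011-1", "LSD1"), ("SQ016-1", "LSD1"),
    ("SQ026-1", "LSD2"), ("SQ015-1", "LSD2"),
]

# Per the text, only the C3aR peptides were identified by fold specificity.
FOLD_SPECIFICITY_TARGETS = {"C3aR"}


def metric_for(target: str) -> str:
    """Map a protein target to the metric the text says was used for it."""
    if target in FOLD_SPECIFICITY_TARGETS:
        return "fold specificity"
    return "approximate binding affinity"


def tally_by_metric(rows: List[Tuple[str, str]]) -> Counter:
    """Count validated peptides per prediction metric."""
    return Counter(metric_for(target) for _, target in rows)


def distinct_targets(rows: List[Tuple[str, str]]) -> List[str]:
    """Sorted list of unique protein targets in the table."""
    return sorted({target for _, target in rows})
```

Running `tally_by_metric(TABLE_2_ROWS)` and `distinct_targets(TABLE_2_ROWS)` reproduces the nine/four split and the six targets stated in the text.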
The de novo protein design framework consists of two stages, a sequence selection stage and a validation stage. The framework is robust enough to handle rigid and flexible design templates, and can be applied to single protein design or complex protein design. The framework has been successfully applied to numerous protein systems with applications to dozens of diseases. A number of the designs have been experimentally validated, providing the most potent inhibitors or agonists of some proteins discovered to dat…
The authors have nothing to disclose.
CAF gratefully acknowledges support from NSF, NIH (R01 GM52032; R24 GM069736), and the US Environmental Protection Agency, EPA (R 832721-010). A portion of this research was made possible with Government support by DoD, Air Force Office of Scientific Research. JS gratefully acknowledges support from NIH (P50GM071508-06). MLBP gratefully acknowledges support from a National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. GAK gratefully acknowledges support from a National Science Foundation Graduate Research Fellowship under grant number DGE-1148900.