We developed computational de novo protein design methods capable of tackling several important areas of protein design. To disseminate these methods we present Protein WISDOM, an online tool for protein design (http://www.proteinwisdom.org). Starting from a structural template, design of monomeric proteins for increased stability and complexes for increased binding affinity can be performed.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity.
To disseminate these methods for broader use we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization-based sequence selection stage that aims to improve stability by minimizing potential energy in sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each stage of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
De novo protein design is the identification of protein sequences that will yield a desired tertiary structure with improved properties or function. Since the native fold of a protein is the conformation which lies at the free energy minimum, de novo protein design seeks sequences that will have a free energy minimum in the target fold. This problem was first described by Drexler1 and Pabo2 and was referred to as the “inverse folding problem.” However, unlike the protein folding problem, where a sequence can yield only one folded structure solution, the de novo protein design problem exhibits degeneracy. Many different amino acid sequences can yield the same tertiary structure and function.
While protein design has traditionally been performed experimentally through rational design and directed evolution, computational methods have more recently been employed to overcome the limited search space inherent in experimental methods. A variety of computational methods have been used, including deterministic methods, stochastic methods, and probabilistic methods.3,4 Early computational methods used fixed-backbone templates to make the problem easier to solve.5-7 With the advent of faster processors, high performance computing, and more efficient algorithms, backbone flexibility has been incorporated by using an ensemble of fixed-backbone templates8-14 or by incorporating true backbone flexibility by expressing the template in terms of ranges of atom-to-atom distances and dihedral angles.15,16
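The fixed-backbone simplification and the degeneracy of the design problem can be illustrated with a toy calculation. The sketch below is not the Protein WISDOM method: the contact list, the four-letter alphabet, and the pairwise energies are all hypothetical values chosen for illustration, and the search is a brute-force enumeration that is only feasible for very short sequences.

```python
from itertools import product

# Hypothetical fixed-backbone template: index pairs of residues in contact.
# Position 4 makes no contacts, so its identity does not affect the energy.
CONTACTS = [(0, 1), (1, 2), (0, 3)]
LENGTH = 5

# Hypothetical reduced alphabet: H (hydrophobic), P (polar), K (+), E (-).
ALPHABET = "HPKE"

# Hypothetical pairwise contact energies (arbitrary units); unlisted pairs score 0.
PAIR_ENERGY = {
    frozenset("H"): -1.0,   # H-H hydrophobic contact
    frozenset("KE"): -1.0,  # K-E salt bridge
    frozenset("K"): 1.0,    # K-K like-charge repulsion
    frozenset("E"): 1.0,    # E-E like-charge repulsion
}


def energy(seq):
    """Sum pairwise energies over the template's fixed contacts."""
    return sum(
        PAIR_ENERGY.get(frozenset((seq[i], seq[j])), 0.0) for i, j in CONTACTS
    )


def rank_sequences():
    """Enumerate every sequence; return (energy, sequence) pairs, lowest energy first."""
    scored = []
    for letters in product(ALPHABET, repeat=LENGTH):
        seq = "".join(letters)
        scored.append((energy(seq), seq))
    return sorted(scored)


ranked = rank_sequences()
best_energy = ranked[0][0]
degenerate = [seq for e, seq in ranked if e == best_energy]
print(f"lowest energy: {best_energy}, sequences at the minimum: {len(degenerate)}")
```

Even in this small example several distinct sequences share the lowest energy on the same template, which is the degeneracy described above. Real design problems involve 20 amino acids and far longer sequences, so exhaustive enumeration is replaced by the deterministic, stochastic, or probabilistic search methods mentioned in the text.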
This paper describes in detail Protein WISDOM, an online tool that has been made available to the academic community to utilize our computational de novo protein design framework. This framework has been applied to the design of numerous proteins for therapeutic use, targeting conditions such as HIV infection, cancer, complement diseases, and other autoimmune disorders. Many of the predicted peptides were experimentally validated, demonstrating the power of the method. Table 1 provides a summary of the different proteins that have been designed, including the size of the protein or peptide, the number of predictions, and the experimental validation results.
Protein Design | Protein Length | # of Computational Predictions | # of Experimental Validations | Reference |
Full sequence design of human beta-defensin-2 | 41 | 340 | | (17) |
Compstatin inhibitors of human C3 | 13 | 28 | 3/3 | (18, 19) |
Compstatin analogues that bind to rat C3c | 13 | 5 | | (20) |
Compstatin analogues with di-serine extension | 15 | 8 | | |
Stabilizing structure of compstatin analog W4A9 | 13 | 18 | | |
C3a receptor agonists and antagonists | 77 | 20 | 4/7 | (21) |
C5a receptor agonists and antagonists | 74 | 61 | 2/61 | |
HIV-1 gp41 inhibitors | 12 | 6 | 4/5 | (22) |
HIV-1 gp120 inhibitors | 9 | 14 | | |
Bak inhibitors of Bcl-xL and Bcl-2 | 16-18 | 10 | 5/5 | (23) |
Inhibitors of ERK2 | 11 | 25 | | |
Inhibitors of EZH2 | 21 | 17 | 10/10 | (24) |
Inhibitors of LSD1 and LSD2 | 16 | 41 | 17/20 | |
Inhibitors of HLA-DR1 | 13 | 6 | | (25) |
Inhibitors of PNP | 5 | 13 | | |
Table 1. Summary of designed proteins and peptides using the de novo protein design framework. The # of computational predictions is presented as the number of favorable predictions (i.e. fold specificities above a certain cutoff or approximate binding affinities greater than the native sequence). The # of experimental validations gives two numbers: the first is the number of predictions that were experimentally validated while the second is the total number of predictions that were tested experimentally.
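As a reading aid for the validation column, the short sketch below converts each "validated/tested" entry from Table 1 into a hit rate and tallies the totals. The fractions are copied from the table; the dictionary keys are shorthand labels chosen for this example, and the pooled total is only a convenience figure, not a statistic reported by the authors.

```python
from fractions import Fraction

# Validation entries copied from Table 1 as "validated/tested" strings.
# The keys are shorthand labels chosen for this example.
VALIDATIONS = {
    "Compstatin (human C3)": "3/3",
    "C3a receptor": "4/7",
    "C5a receptor": "2/61",
    "HIV-1 gp41": "4/5",
    "Bcl-xL / Bcl-2": "5/5",
    "EZH2": "10/10",
    "LSD1 / LSD2": "17/20",
}


def parse_entry(entry):
    """Split 'validated/tested' into a pair of integers."""
    validated, tested = (int(part) for part in entry.split("/"))
    return validated, tested


def hit_rate(entry):
    """Return the exact fraction of tested predictions that were validated."""
    validated, tested = parse_entry(entry)
    return Fraction(validated, tested)


total_validated = sum(parse_entry(e)[0] for e in VALIDATIONS.values())
total_tested = sum(parse_entry(e)[1] for e in VALIDATIONS.values())

for name, entry in VALIDATIONS.items():
    print(f"{name}: {entry} = {float(hit_rate(entry)):.0%}")
print(f"overall: {total_validated}/{total_tested}")
```

The pooled figure is dominated by the C5a receptor study, in which all 61 predictions were synthesized and tested, so the per-system rates are more informative than the overall total.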
Design of human beta-defensin-2 (hβD-2) was performed to enhance the peptide’s antimicrobial activity.17 For this design, we considered two cases: 1) up to 10 mutations along hβD-2 and 2) full sequence design of all hβD-2 residue positions except the cysteines (8, 15, 20, 30, 37, and 38). Three different design templates and three different sequence selection models were utilized in the design. High levels of similarity in mutations were observed between the weighted average and distance bin models for both the 10-mutation design and the full sequence design. Additionally, a large number of sequences were found to have more favorable calculated fold specificity values than the native sequence.
Complement system inhibitors (of C3, C3a, and C5a) were designed to combat a number of immune diseases such as stroke, heart attack, Alzheimer’s disease, asthma, rheumatoid arthritis, rejection of xenotransplantation, adult respiratory disease, psoriasis, and Crohn’s disease. Three compstatin inhibitors of C3c predicted by the protein design framework plus three rationally designed sequences were experimentally validated to be better binders than the native compstatin.18,19
Further studies examined the loss of activity of compstatin against non-primate C3c and designed a number of candidate rat and mouse C3c inhibitors. Five sequences were shown to have more favorable association free energies with rat C3c than the W4A9 compstatin mutant known to inhibit C3c. This is due to a new salt bridge formation by Arg1.20 Eight sequences with an N-terminal extension were predicted to be better binders than W4A9 with a di-Serine extension. Finally, 18 compstatin sequences were predicted to stabilize the bound conformation of W4A9, providing strong candidates for primate and non-primate C3c inhibitors.
In addition to C3c inhibitors, C3a and C5a receptor agonists and antagonists were designed based upon the structures of C3a and C5a. Seven C3a sequences predicted by the model were experimentally tested. Two of the sequences were potent agonists while two others were partial agonists.21 The two potent agonists showed a 58-fold improvement over a previously discovered “superagonist”. The design of C5a receptor agonists and antagonists provided a set of 61 sequences. All the sequences were synthesized and two were found to be novel C5a agonists.
Fusion inhibitors of HIV-1, the virus that causes AIDS, were designed to prevent HIV-1 from infecting cells. The first design targeted gp41, an envelope glycoprotein of HIV-1. The protein design framework predicted six sequences that were better binders than the native sequence. Four of these predicted sequences were experimentally validated to inhibit HIV-1 with the best sequence having an IC50 as low as 29 μM. This sequence showed a 3-15 fold improvement over the native sequence and had no loss of activity against an Enfuvirtide-resistant virus strain.22 The second design targeted gp120, another envelope glycoprotein of HIV-1. Fourteen sequences were predicted to be binders of gp120 and provide additional potential fusion inhibitors of HIV-1.
Numerous proteins linked to cancer provided promising targets for cancer therapeutics. Bcl-2 and Bcl-xL are anti-apoptotic proteins that prevent cell death. Inhibitors of these two proteins were designed to induce cell death in cancer cells. Ten sequences were predicted to be better binders than the native sequence, and these predictions were consistent with previous experimental and mutagenesis results.23 Another target protein, ERK2, is involved in signal-transduction cascades, which makes it a promising target for antiproliferative cancer therapies. Twenty-five sequences were predicted to be inhibitors of ERK2.
Histone methyltransferases and demethylases dynamically control histone methylation, which has been linked to many cancer types including prostate, breast, lymphoma, myeloma, bladder, colon, skin, liver, endometrial, lung, and gastric. The de novo protein design framework identified 17 inhibitors of EZH2 (a lysine methyltransferase), and all ten of those tested experimentally were found to inhibit EZH2.24 The most potent peptide had an IC50 of about 13 μM, was equally effective with elevated enzyme concentrations, and did not compete with the cofactor. These peptides were the first set of inhibitors of EZH2. Fifty-three inhibitors of LSD1 (a demethylase) were predicted by the framework, and of the 20 experimentally tested, 17 were inhibitors of LSD1 and 18 were inhibitors of LSD2. The best inhibitors had IC50 values below 1 μM, making them the most potent peptidic inhibitors discovered to date.
The final two protein systems provided targets for treating various autoimmune diseases such as Coeliac disease, diabetes mellitus type 1, systemic lupus erythematosus, Sjögren’s syndrome, Churg-Strauss syndrome, Hashimoto’s thyroiditis, Graves’ disease, idiopathic thrombocytopenic purpura, rheumatoid arthritis, and allergies. The framework predicted six sequences that bind to HLA-DR1 and 13 sequences that bind to PNP; none of these potential inhibitors has been experimentally validated yet.
Table 2 summarizes experimentally validated inhibitors and agonists predicted using the de novo protein design framework. The approximate binding affinity metric was used to predict nine of the sequences (inhibitors of human C3c, HIV-1 gp41, EZH2, LSD1, and LSD2), while the fold specificity metric was used to identify four of the sequences (agonists/antagonists of C3aR). These peptides highlight the success of the de novo protein design framework, particularly the added approximate binding affinity metric. The framework is extremely versatile in its applicability. Six different proteins linked to twenty-five different diseases have been successfully designed and experimentally validated.
Name | IC50 / EC50 | Protein Target | Applicable Diseases |
SQ027 | 0.94 μM | human C3c | stroke, heart attack, Alzheimer’s disease, asthma, rheumatoid arthritis, systemic lupus erythematosus, multiple sclerosis, psoriasis, diabetes type I, Crohn’s disease, pancreatitis, and cystic fibrosis |
SQ086 | 1.98 μM | human C3c | |
SQ059 | 4.73 μM | human C3c | |
SQ110-4 | 15.2 nM | C3aR | |
SQ060-4 | 36.4 nM | C3aR | |
SQ007-5 | 15.4 nM | C3aR | |
SQ002-5 | 26.1 nM | C3aR | |
SQ435 | 29 – 253 μM | HIV-1 gp41 | AIDS |
SQ037 | 13.57 μM | EZH2 | prostate, breast, lymphoma, myeloma, bladder, colon, skin, liver, endometrial, lung, and gastric cancers |
SQ011-1 | 0.521 μM | LSD1 | |
SQ016-1 | 0.249 μM | LSD1 | |
SQ026-1 | 2.51 μM | LSD2 | |
SQ015-1 | 1.332 μM | LSD2 | |
Table 2. Computationally predicted and experimentally validated peptides targeting various diseases.
The de novo protein design framework consists of two stages, a sequence selection stage and a validation stage. The framework is robust enough to handle rigid and flexible design templates, and can be applied to single protein design or complex protein design. The framework has been successfully applied to numerous protein systems with applications to dozens of diseases. A number of the designs have been experimentally validated, providing the most potent inhibitors or agonists of some proteins discovered to dat…
The authors have nothing to disclose.
CAF gratefully acknowledges support from NSF, NIH (R01 GM52032; R24 GM069736), and the US Environmental Protection Agency, EPA (R 832721-010). A portion of this research was made possible with Government support by DoD, Air Force Office of Scientific Research. JS gratefully acknowledges support from NIH (P50GM071508-06). MLBP gratefully acknowledges support from a National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. GAK gratefully acknowledges support from a National Science Foundation Graduate Research Fellowship under grant number DGE-1148900.