This article describes the procedure for the identification and characterization of a gene family in grapevine applied to the family of Arabidopsis Tóxicos in Levadura (ATL) E3 ubiquitin ligases.
Classification and nomenclature of genes in a family can significantly contribute to the description of the diversity of encoded proteins and to the prediction of family functions based on several features, such as the presence of sequence motifs or of particular sites for post-translational modification, and the expression profile of family members in different conditions. This work describes a detailed protocol for gene family characterization. Here, the procedure is applied to the characterization of the Arabidopsis Tóxicos in Levadura (ATL) E3 ubiquitin ligase family in grapevine. The methods include the genome-wide identification of family members; the characterization of gene localization, structure, and duplication; the analysis of conserved protein motifs; the prediction of protein localization and phosphorylation sites; and gene expression profiling across the family in different datasets. Such a procedure, which could be extended to further analyses depending on experimental purposes, could be applied to any gene family in any plant species for which genomic data are available. It provides valuable information to identify interesting candidates for functional studies, giving insights into the molecular mechanisms of plant adaptation to the environment.
During the last decade, much research has been carried out in grapevine genomics. Grapevine is an economically relevant crop that has become a model for research on fruit development and on the responses of woody plants to biotic and abiotic stresses. In this context, the 2007 release of the Vitis vinifera cv. PN40024 genome1 and its 2011 update2 led to a rapid accumulation of "omics"-scale data and to a burst of high-throughput studies. Based on the published sequence data, the comprehensive analysis of a given gene family (generally composed of proteins sharing conserved motifs, structural and/or functional similarities, and evolutionary relationships) can now be performed to uncover its molecular functions, evolution, and gene expression profiles. These analyses can contribute to understanding how gene families control physiological processes at a genome-wide level.
Many aspects of the plant life cycle are regulated by ubiquitin-mediated degradation of key proteins, which require a fine-tuned turnover to ensure regular cellular processes. Important components of the ubiquitin-mediated degradation process are the E3 ubiquitin ligases, which are responsible for system flexibility, thanks to the recruitment of specific targets3. Accordingly, these enzymes represent a huge gene family, with around 1,400 E3 ligase-encoding genes predicted in the Arabidopsis thaliana genome4, each E3 ubiquitin ligase mediating the ubiquitination of specific target proteins. Despite the importance of substrate-specific ubiquitination in cellular regulation in plants, little is known about how the ubiquitination pathway is regulated, and target proteins have been identified only in a few cases. Deciphering such specificity and regulation mechanisms relies first on the identification and characterization of the different components of the system, in particular the E3 ligases. Among ubiquitin ligases, the ATL subfamily comprises 91 members identified in A. thaliana, all displaying a RING-H2 finger domain5,6, and some of them play a role in defense and hormone responses7.
The first crucial step to define the members of a new gene family is the precise definition of the family features, such as consensus motifs, key domains, and protein sequence characteristics. Indeed, the reliable retrieval of all gene family members based on BLAST analysis requires some mandatory sequence characteristics, in particular the protein domains responsible for protein function/activity, which serve as the protein signature. This can be facilitated by previous characterization of the same gene family in other plant species, or achieved by analyzing different genes putatively belonging to the same family in different plant species to isolate common sequences. The family members can then be individually named following common rules established by international consortia for a given plant species. In grapevine, for instance, such a procedure is subject to the recommendations of the Super-Nomenclature Committee for Grape Gene Annotation (sNCGGa), which call for the construction of a phylogenetic tree including V. vinifera and A. thaliana gene family members to allow gene annotation based on nucleotide sequences8.
Chromosome localization of family members and a gene duplication survey can highlight the presence of whole-genome or tandem duplicated genes. Such information is useful to unravel putative gene functions, since it might show functional redundancy or reveal different situations, i.e., non-functionalization, neo-functionalization, or sub-functionalization9. Both neo- and sub-functionalization are important events that create genetic novelty, providing new cellular components for plant adaptation to changing environments10. In particular, duplications of ancestral genes and production of new genes were very frequent during the evolution of the grapevine genome, and newly formed genes originating from proximal and tandem duplications in grapevine were more likely to produce new functions11.
Another key factor in deciphering gene family function is the transcriptomic profile. The availability of public databases giving access to a huge amount of transcriptomic data can thus be exploited to assign putative functions to gene family members using large-scale in silico expression analyses. Indeed, the peculiar expression of some genes in specific plant organs or in response to certain stresses can give some hints regarding the putative roles of the corresponding proteins in defined conditions, and support hypotheses about possible sub-functionalization of duplicated genes to respond to different challenges. For that purpose, it is important to consider several datasets: these can be already available gene expression matrices, such as the genome-wide transcriptomic atlas of grapevine organs and developmental stages12, or can be built ad hoc by retrieving transcriptomic datasets for the particular plant species subjected to defined stresses. Moreover, a simple approach using two matrices, one with pairwise similarity data and the other with pairwise co-expression coefficients, can be applied to evaluate the relationships between sequence similarity and expression patterns within a gene family.
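The two-matrix comparison mentioned above can be illustrated with a minimal Python sketch. It assumes that both matrices have already been computed and are stored as dictionaries keyed by gene pairs; the function names, gene labels, and values are hypothetical, and Pearson correlation is used here only as one possible way to relate the two sets of pairwise values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def matrix_agreement(similarity, coexpression):
    """Correlate pairwise sequence similarity with pairwise co-expression.

    Both arguments map a gene pair (geneA, geneB) to a number; only pairs
    present in both dictionaries are compared.
    """
    shared_pairs = sorted(set(similarity) & set(coexpression))
    sim_values = [similarity[pair] for pair in shared_pairs]
    coexp_values = [coexpression[pair] for pair in shared_pairs]
    return pearson(sim_values, coexp_values)


# Hypothetical toy data: percent identity and co-expression coefficient
# for three gene pairs (labels and numbers are invented for illustration).
toy_similarity = {("g1", "g2"): 90.0, ("g1", "g3"): 40.0, ("g2", "g3"): 35.0}
toy_coexpression = {("g1", "g2"): 0.8, ("g1", "g3"): 0.1, ("g2", "g3"): 0.2}

agreement = matrix_agreement(toy_similarity, toy_coexpression)
print(f"similarity vs. co-expression: r = {agreement:.2f}")
```

A coefficient close to 1 in such a comparison would suggest that highly similar paralogues tend to be co-expressed, whereas values near 0 would be compatible with expression divergence after duplication.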
The aim of this work is to provide a global approach, defining gene structure, conserved protein motifs, chromosomal location, gene duplications, and expression patterns, as well as the prediction of protein localization and phosphorylation sites, to attain an exhaustive characterization of a gene family in plants. Such a comprehensive approach is applied here to the characterization of the ATL E3 ubiquitin ligase family in grapevine. Given the emerging role of ATL subfamily members in regulating key cellular processes7, this work can assist the identification of strong candidates for functional studies, and eventually help unravel the molecular mechanisms governing the adaptation of this important crop to its environment.
1. Identification of Putative ATL Gene Family Member(s)
2. Manual Inspection of the PSI-BLAST-identified Family Members
3. Analysis of Protein Physical Parameters and Domains
4. Chromosomal Distribution, Duplications, and Exon-intron Organization
5. Phylogenetic Analysis and Nomenclature
6. Grapevine Organ and Stage Expression Profiling
7. Expression Profiling in Response to Biotic and Abiotic Stresses
8. Analysis of the Relationships Between Paralogous Sequence Divergence and Gene Co-expression
The VIT_05s0077g01970 gene, identified as the most similar to A. thaliana ATL2 (At3g16720) through a BLASTp search, was used as a probe to survey the ATL family members in the grapevine genome (V. vinifera cv. Pinot Noir PN40024). The PSI-BLAST analysis converged after a few cycles, revealing a list of putative genes belonging to the grapevine ATL gene family (Figure 1A). The presence of the canonical RING-H2 domain in each candidate was evaluated by visual inspection of the MUSCLE alignment of all the entries identified in the analysis (Figure 1B). Only those genes containing the correctly spaced conserved amino acids, the two histidine residues, and the proline residue before the third cysteine were considered as ATLs, according to the original ATL definition in Arabidopsis5. A total of 96 grapevine genes fulfilled the requirements and were considered for further characterization. Each ATL family member was analyzed to define the specific characteristics of the gene and the corresponding encoded protein, i.e., the presence of other known domain(s) in addition to the RING-H2, transmembrane or hydrophobic-rich regions, subcellular localization, and putative phosphorylation sites (Table 1 and Table 2).
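The manual inspection criteria above (conserved cysteines, two histidines, and a proline before the third cysteine) could be pre-screened programmatically before the visual check. The following Python sketch is only illustrative: the spacer lengths in the regular expression are an assumption loosely based on a generic RING-H2 consensus, not the exact spacing used in this work, and the test sequences are synthetic.

```python
import re

# ASSUMPTION: spacer lengths are illustrative and must be adjusted to the
# family definition actually adopted. The captured group is the proline
# motif preceding the third cysteine ("PxC" or "PC", as in Table 1).
RING_H2_PATTERN = re.compile(
    r"C.{2}C"       # cysteines 1 and 2
    r".{9,39}"      # variable spacer
    r"(P.?C)"       # proline before cysteine 3
    r".{1,3}H"      # first histidine
    r".{2,3}H"      # second histidine
    r".{2}C"        # cysteine 4
    r".{4,48}C"     # cysteine 5
    r".{2}C"        # cysteine 6
)


def ring_h2_motif_type(protein_sequence):
    """Return 'PxC' or 'PC' if an ATL-like RING-H2 is found, else None."""
    match = RING_H2_PATTERN.search(protein_sequence.upper())
    if match is None:
        return None
    return "PxC" if len(match.group(1)) == 3 else "PC"


# Synthetic sequences built only to exercise the pattern (not real proteins).
tail = "A" + "H" + "AA" + "H" + "AA" + "C" + "A" * 10 + "C" + "AA" + "C"
synthetic_pxc = "M" + "CAAC" + "A" * 12 + "PAC" + tail
synthetic_pc = "M" + "CAAC" + "A" * 12 + "PC" + tail
synthetic_no_proline = synthetic_pxc.replace("P", "A")

for name, seq in [("PxC-type", synthetic_pxc),
                  ("PC-type", synthetic_pc),
                  ("no proline", synthetic_no_proline)]:
    print(name, ring_h2_motif_type(seq))
```

Such a filter would not replace the inspection of the multiple alignment; it could simply flag candidates lacking the proline or one of the histidines for closer examination.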
Figure 1: PSI-BLAST survey and alignment of putative grapevine ATLs. (A) Screenshot of the top 10 hits of the first PSI-BLAST iteration search using the protein sequence VIT_05s0077g01970 as bait. (B) Portion of the alignment of the 96 selected grapevine putative ATLs showing their RING-H2 domain and the corresponding LOGO obtained using a suite of molecular biology tools (see Table of Materials). Reproduced from Ariani et al. licensed under a Creative Commons Attribution 4.0 International License42.
Name | Gene ID | Gene length (bp) | Intron number | UniProt ID | Protein length (aa) | RING-H2 motif | TM/H domain number | Other domains
VviATL3 | VIT_09s0002g00220 | 1245 | 0 | F6HXK6 | 304 | PxC | 1 |
VviATL4[VviRHX1A] | VIT_15s0021g00890 | 1827 | 3 | D7SM36 | 203 | PxC | 0 |
VviATL18 | VIT_11s0118g00780 | 1113 | 2 | F6HCI8 | 193 | PC | 0 |
VviATL23a | VIT_18s0001g01060 | 935 | 0 | F6H0E4 | 114 | PxC | 0.5 |
VviATL23b | VIT_18s0001g01050 | 399 | 0 | E0CQX3 | 132 | PxC | 1 |
VviATL24 | VIT_17s0000g06460 | 4466 | 4 | D7SI89 | 217 | PxC | 1 |
VviATL27 | VIT_00s0264g00020 | 2554 | 4 | D7T1R5 | 235 | PxC | 1 |
VviATL43 | VIT_11s0052g00530 | 1576 | 2 | D7SQD9 | 457 | PxC | 3 |
VviATL54a | VIT_18s0001g06640 | 3221 | 1 | F6H0Y5 | 405 | PxC | 1 |
VviATL54b | VIT_03s0017g00670 | 2774 | 1 | F6HTI0 | 427 | PxC | 1 |
VviATL55[VviRING1] | VIT_07s0191g00230 | 1844 | 0 | F6HRP9 | 372 | PxC | 1 |
VviATL63 | VIT_06s0004g06930 | 804 | 0 | D7SJU6 | 267 | PxC | 1 |
VviATL65 | VIT_03s0063g01890 | 2068 | 0 | F6HQI8 | 396 | PxC | 1 |
VviATL82 | VIT_01s0026g02540 | 820 | 0 | F6HPQ9 | 233 | PC | 0.5 |
VviATL83 | VIT_17s0000g08400 | 1887 | 0 | F6GSQ4 | 143 | PC | 0 |
VviATL84 | VIT_06s0004g00120 | 1853 | 0 | F6GUP5 | 368 | PC | 0.5 | zf-RING_3
VviATL85 | VIT_12s0034g01400 | 786 | 0 | F6H965 | 261 | PC | 0.5 |
VviATL86 | VIT_12s0034g01390 | 1434 | 1 | D7T016 | 451 | PC | 0.5 |
VviATL87 | VIT_18s0001g03270 | 1002 | 0 | F6H0T2 | 333 | PC | 0.5 | zf-RING_3
VviATL88 | VIT_08s0040g00590 | 1320 | 0 | F6HQR2 | 314 | PC | 0 | zf-RING_3
Table 1: First 20 VviATL genes and sequence characteristics of the corresponding proteins. TM: transmembrane; H: hydrophobic; 0.5 indicates the presence of one or more hydrophobic regions. Reproduced from Ariani et al. licensed under a Creative Commons Attribution 4.0 International License42.
Table 2: Details on the position of the first 20 VviATL genes in the V. vinifera genome, their duplication state, and the physico-chemical characteristics and location of the ATL proteins. (a) Number of phosphorylation sites predicted by Musite; (b) similar predictions obtained with at least two software tools are highlighted in bold; ngLOC was used with default settings, whereas TargetP v1.1 and Protein Prowler Subcellular Localization were used with a probability cut-off of 0.5. NUC, nucleus; MIT, mitochondria; CHL, chloroplast; PLA, plasma membrane; S, secretory pathway (presence of a signal peptide); M, mitochondria; C, chloroplast; O or -, other locations; nd, not determined (i.e., value below the threshold). Reproduced from Ariani et al. licensed under a Creative Commons Attribution 4.0 International License42.
A phylogenetic analysis including the nucleotide sequences of identified grapevine ATL-encoding genes together with the sequences of the reference A. thaliana ATL gene family was used for grapevine ATL nomenclature, according to the guidelines of the sNCGGa8. Ninety-six and 83 nucleotide sequences from V. vinifera and A. thaliana, respectively, were subjected to the Phylogeny.fr pipeline to obtain a reliable phylogenetic tree. The latter sequences were later used to annotate and name grapevine genes on the basis of solid relationships (Figure 2). Following this approach, 13 out of 96 grapevine ATLs received a specific identifier considering their one-to-one orthology with an A. thaliana ATL. The names of the other 83 genes were assigned based on the phylogenetic tree, with a progressive numbering from top to bottom, starting from an ATL gene number higher than the highest number used in A. thaliana.
Figure 2: Phylogenetic tree of V. vinifera and A. thaliana ATL E3 ubiquitin ligase-encoding genes. The unrooted tree was generated with the Phylogeny.fr suite from the ATL genes of V. vinifera (in green) and the 83 ATL genes of A. thaliana reported in the UniProt database (in yellow). Branch support values were obtained from 100 bootstrap replicates. The red stars indicate the presence of a BCA2 zinc finger (BZF) domain in the corresponding proteins. Reproduced from Ariani et al. licensed under a Creative Commons Attribution 4.0 International License42.
Mapping ATL-encoding genes to the grapevine chromosomes showed a wide distribution throughout the genome, suggesting whole-genome duplication as the major evolutionary force in the expansion of ATL gene family in grapevine. Indeed, 31 ATLs were found in homologous chromosomal regions potentially originating from segmental or whole genome duplication events. Moreover, the same analysis highlighted 13 tandemly duplicated genes, one proximal duplicate, and 51 dispersed duplicates (Figure 3). Considering the very large number of duplicated genes in the ATL family, we performed an enrichment test (Fisher's exact test) to check the preferential retention of the duplicated genes during the genome fractionation. With a p-value < 0.001, this test confirmed the hypothesis that duplicated ATL genes were retained more than randomly expected, suggesting a role for the ATL gene family during grapevine adaptation and evolution.
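The enrichment test described above can be sketched as a one-sided Fisher's exact test on a 2x2 contingency table (family vs. rest of the genome, duplicated vs. not duplicated). The Python sketch below uses only the standard library; the function name is arbitrary and the counts in the usage example are hypothetical placeholders, since the actual contingency table is not reported here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (enrichment) for the 2x2 table

                      duplicated   not duplicated
        family genes      a              b
        other genes       c              d

    Returns the probability of observing 'a' or more duplicated family
    genes under the hypergeometric null distribution.
    """
    family_total = a + b
    duplicated_total = a + c
    genome_total = a + b + c + d
    upper = min(family_total, duplicated_total)
    denominator = comb(genome_total, duplicated_total)
    numerator = sum(
        comb(family_total, k) * comb(genome_total - family_total,
                                     duplicated_total - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# HYPOTHETICAL counts, for illustration only (not the values of this study).
p_value = fisher_exact_greater(a=45, b=51, c=9000, d=20000)
print(f"one-sided p-value: {p_value:.3g}")
```

The definition of "duplicated" (for instance, whether dispersed duplicates are counted) determines how the table is filled and should be fixed before running the test.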
Figure 3: Grapevine ATL-encoding gene distribution on V. vinifera chromosomes and duplication state. The 96 grapevine ATL genes with exact chromosomal information available in the database were mapped to the 19 V. vinifera chromosomes. The colors indicate the original duplication event. Vertical black lines and red lines identify pairs derived from tandem duplications and whole genome duplications, respectively. Reproduced from Ariani et al. licensed under a Creative Commons Attribution 4.0 International License42.
To further investigate the putative biological functions of the ATLs in grapevine, a meta-analysis was carried out on the V. vinifera cv. Corvina global gene expression Atlas12. The dataset includes whole-genome expression values of 54 different grapevine organs and developmental stages and was used to perform a hierarchical bi-clustering analysis. Results not only confirmed that all the 96 ATLs were expressed in at least one of the 54 tissues/stages, but also pointed out the presence of five main clusters of expression profiles (Figure 4A). Briefly, clusters A and E showed opposite behaviors. In cluster A, ATL genes were generally downregulated in juvenile samples, including early berry stages, young leaf, tendrils, inflorescence, and most of the bud stages, whereas they were predominantly upregulated in mature samples, such as berries at ripening and post-harvest withering stages, woody tissues, and late stages of seed development. Genes in cluster C were mainly downregulated in most of the samples, while ATL genes in cluster D were often upregulated at late stages of berry development. Finally, cluster B did not show any relevant variation in the expression profiles.
A similar approach was applied to study the expression of grapevine ATL family members in response to biotic and abiotic stresses, using specific datasets built for this purpose. A huge amount of expression data deriving from microarray and RNA-seq experiments is available from public databases such as Gene Expression Omnibus (GEO) and ArrayExpress. Once collected and appropriately normalized, the information was exploited for further insights into the potential function of ATLs in the plant response to stresses. Analyzing the expression profiles of grapevine ATLs in response to biotic stresses revealed that 62 out of 96 transcripts showed a significant modulation (|log2 fold-change (FC)| > 0.5) in at least two conditions, with a false discovery rate (FDR) < 0.05 (Figure 4B). The number increases to 81 considering only the FDR threshold in a single condition. These results strongly suggest a direct involvement of the ATL gene family in the response to pathogens in grapevine as well. In particular, a group of 12 genes (VviATL3-27-54b-55-90-97-123-144-148-149-156) was strongly upregulated in response to most pathogens, including biotrophic and necrotrophic fungi and herbivores, and thus deserves attention for further functional analyses.
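The selection criterion above (absolute log2 FC above 0.5 and FDR below 0.05 in at least two conditions) can be expressed as a short filter. This Python sketch assumes that differential expression results are available as a list of (log2 FC, FDR) pairs per gene; the data layout, function name, gene labels, and values are hypothetical.

```python
def significantly_modulated(gene_results, fc_cutoff=0.5, fdr_cutoff=0.05,
                            min_conditions=2):
    """Return genes passing both thresholds in at least `min_conditions`.

    gene_results maps a gene name to a list of (log2_fc, fdr) tuples,
    one tuple per stress condition.
    """
    selected = []
    for gene, conditions in gene_results.items():
        passing = sum(
            1 for log2_fc, fdr in conditions
            if abs(log2_fc) > fc_cutoff and fdr < fdr_cutoff
        )
        if passing >= min_conditions:
            selected.append(gene)
    return sorted(selected)


# Hypothetical toy results for three genes in three conditions.
toy_results = {
    "geneA": [(1.2, 0.01), (-0.8, 0.03), (0.1, 0.50)],
    "geneB": [(0.9, 0.02), (0.3, 0.01), (0.7, 0.20)],
    "geneC": [(0.2, 0.60), (-0.1, 0.90), (0.4, 0.04)],
}

print(significantly_modulated(toy_results))
print(significantly_modulated(toy_results, min_conditions=1))
```

Taking the absolute value of the log2 FC keeps both up- and downregulated genes; restricting the filter to positive values would isolate upregulated candidates only.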
Figure 4: Hierarchical clustering of ATL gene expression in the grapevine Atlas and in the grapevine biotic stress-related dataset. (A) The log-transformed expression values of grapevine ATL genes in the grapevine Atlas12 were used for hierarchical cluster analysis based on Pearson's distance metric. The color scale represents higher (red) or lower (green) expression levels with respect to the median transcript abundance of each gene across all samples. Letters A to E on the right side indicate the different clusters identified. AB: after burst; B: burst; bud-W: winter bud; F: flowering; FB: flowering begins; FS: fruit set; G: green; MR: mid-ripening; PFS: post-fruit set; PHWI-II-III: post-harvest withering 1, 2 and 3 months; R: ripening; S: senescent; stem-W: woody stem; V: veraison; WD: well developed; Y: young. (B) The color scale represents increased (red) or decreased (blue) fold changes of grapevine ATL gene expression in infected samples compared to controls for each condition. Asterisks indicate the significant differential expression (FDR < 0.05) of each ATL under the corresponding conditions. Reproduced from Ariani et al. licensed under a Creative Commons Attribution 4.0 International License42.
Supplementary Table 1: ATL gene candidates for alternative splicing. (a) ATL gene ID according to the V1 grape gene prediction and annotation, (b) ATL gene ID according to the V2 grape gene prediction and annotation43, (c) number of putative ATL alternative splicing variants, (d) information on the coding sequence of each putative ATL variant.
Supplementary Table 2: Please click here to download this file.
Supplementary File 1: Please click here to download this file.
In the genomic era, many gene families have been deeply characterized in several plant species. This information is preliminary to functional studies and provides a framework to further investigate the role of different members in a family. In this context, there is also a need for a nomenclature system that uniquely identifies each member in a family, avoiding the redundancy and confusion that may arise when names are assigned independently to different genes by different research groups.
After thoughtful consideration, the grapevine scientific community agreed to name grapevine genes in a family based on similarities with Arabidopsis genes and established a series of rules that must be applied to describe new gene families in grapevine, starting from the phylogenetic comparison of nucleotide sequences between grapevine and Arabidopsis family members8. Therefore, only genes that are already annotated and properly named in Arabidopsis can be used in the grapevine nomenclature. The procedure described here for the identification of grapevine ATL orthologues in Arabidopsis was therefore carried out solely to fulfill the requirement of assigning the correct grapevine gene family nomenclature. Nevertheless, for other plant species, alternative approaches could be an option. For instance, orthology could be inferred using bidirectional BLAST hits (BBH), where orthologues are defined as pairs of genes in two species that are more similar (i.e., with the highest alignment score) to one another than to any other gene in the other species44. However, this method could miss many orthologues when the rate of gene duplication is high, as in plants and animals45. Moreover, in the case of ATL-encoding genes, BBH may retrieve genes lacking the precise ATL-type RING-H2 structure (including the proline residue) or genes that are not annotated and named as ATLs in Arabidopsis. Although such a search may be relevant from an evolutionary perspective, the retrieval of orthologues that are not annotated would not have served the purpose of grapevine ATL gene family annotation and nomenclature, since orthologues that are not annotated as ATLs cannot be used to name grapevine family members. Another possibility is to infer orthology based on amino acid instead of nucleotide sequences using InParanoid46, or the more recent Hieranoid 247, although such workflows are not expressly recommended by the scientific community.
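The BBH definition given above can be made concrete with a minimal Python sketch. It assumes that alignment scores from the two reciprocal searches have already been parsed into dictionaries keyed by (query, subject); the gene labels and scores are hypothetical, and ties between equally scoring hits are not handled.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores maps (query, subject) to an alignment score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (geneA, geneB) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_ab.items()
        if best_ba.get(gene_b) == gene_a
    )


# Hypothetical scores: two grapevine paralogues (Vv1, Vv2) both hit the
# same Arabidopsis gene (At1), which in turn hits Vv1 best.
vv_vs_at = {("Vv1", "At1"): 200, ("Vv1", "At2"): 80,
            ("Vv2", "At1"): 150, ("Vv2", "At2"): 60}
at_vs_vv = {("At1", "Vv1"): 200, ("At1", "Vv2"): 150,
            ("At2", "Vv1"): 80, ("At2", "Vv2"): 60}

print(bidirectional_best_hits(vv_vs_at, at_vs_vv))
```

In this toy case only the Vv1/At1 pair is reported, while the paralogue Vv2 is left without an orthologue, which illustrates why BBH can miss relationships in families with frequent duplications.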
Expression meta-analysis, which can be defined as a systematic approach to study and combine different publicly available repositories of expression data, can highlight shared and distinct molecular mechanisms in a variety of conditions. Thus, the integration of gene expression information from multiple large-scale transcriptomic experiments can improve the characterization of a gene family by defining the expression profiles of the family members across experiments, minimizing the impact of experiment-specific factors and supporting more robust hypotheses about putative gene function in particular processes. However, the use of microarray data requires the integration of expression data obtained with different platforms, each with its own limitations. For instance, in the grapevine Nimblegen microarray platform, a significant proportion of probesets for genes represented on the array (~ 13,000 genes) potentially have cross-hybridization issues48. In the case of the grapevine ATL family, 15 genes may be affected by this phenomenon. Nevertheless, as discussed by Cramer et al.48, the cross-identification of highly similar gene family members by the same probe could provide interesting information regarding the expression, in specific conditions, not only of a single gene but of two or more genes sharing high sequence similarity, and thus potentially sharing targets and functions. Another potential issue related to microarray datasets is the detection limit of microarray platforms, which are not very sensitive. To address both concerns, i.e., cross-hybridization and signal sensitivity, a possible solution could be to consider only RNA-seq expression datasets. However, the meta-analysis of very large RNA-seq datasets from many different studies can become highly time-consuming and may require substantial computational resources and expertise.
Though the approach presented here aims to be exhaustive, it can certainly be complemented with other analyses. First, to gain further insights into the molecular evolution and phylogenetic relationships among gene family members in plants, the phylogenetic analysis could be extended by building a phylogenetic tree from multiple sequence alignments of family members from several plant species. It is also possible to estimate the evolutionary time of family genes from their synonymous and non-synonymous substitution rates, by determining the values Ks (number of synonymous substitutions per synonymous site in a given period of time) and Ka (number of non-synonymous substitutions per non-synonymous site in the same period). The Ka/Ks ratio is used to infer the selective pressure acting on duplicated genes after divergence from their ancestors. A value of Ka/Ks = 1 suggests neutral selection, a Ka/Ks value of < 1 suggests purifying selection, and a Ka/Ks value of > 1 suggests positive selection49. Moreover, if gene structure analysis reveals the presence of introns, the gene family characterization can be further extended to the detection of alternative splicing variants. Indeed, based on a deep survey of RNA-seq data from different tissues, stress conditions, and genotypes43, 21 (of the 96) ATLs are strong candidates for alternative splicing events, with the potential number of isoforms ranging from 2 to 16 for these ATLs (see Supplementary Table 1). Alternative transcripts frequently produce protein isoforms that vary in amino acid sequence, and these changes may alter the cellular properties of proteins, with effects ranging from subtle modulation to loss of function of the gene product. For that reason, alternative splicing events have been implicated in important plant functions, including stress response, disease resistance, photosynthesis, and flowering50,51.
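The interpretation of the Ka/Ks ratio given above can be summarized in a few lines of Python. This is a minimal sketch that assumes Ka and Ks have already been estimated with a dedicated tool; the tolerance around 1 used to call neutrality is an arbitrary assumption, and the input values are hypothetical.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def selection_regime(ka, ks, tolerance=0.05):
    """Classify the selective pressure suggested by a Ka/Ks ratio.

    ASSUMPTION: ratios within `tolerance` of 1 are treated as neutral;
    the threshold is illustrative and not taken from this work.
    """
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return "undefined"
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Hypothetical (Ka, Ks) estimates for three duplicated gene pairs.
for ka, ks in [(0.10, 0.50), (0.30, 0.30), (0.60, 0.30)]:
    print(ka, ks, selection_regime(ka, ks))
```

In practice, a statistical test on the ratio would be preferable to a fixed tolerance, since Ka and Ks estimates carry their own uncertainty.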
The integration of ATL gene promoter information, including putative cis-regulatory elements52, or the identification of molecules (e.g., microRNA and long non-coding RNA) potentially targeting ATLs53 can also be added to provide system-level insights into the complex molecular regulation and interactions of grapevine ATLs.
In conclusion, the choice of the analyses to be performed, as well as of the procedures to be applied to characterize a new gene family in a plant species, is mainly driven by scientific community rules and by the scope of gene family identification. It is important to keep in mind the possible subsequent investigation steps that will exploit this set of information, including the study of gene evolution among plant species, the description of genome structure, or the selection of reliable candidates for functional studies.
The authors have nothing to disclose.
The work was supported by the University of Verona within the frame of Joint Project 2014 (Characterization of the ATL gene family in grapevine and of its involvement in resistance to Plasmopara viticola).
Personal computer | |||
Basic Local Alignment Search Tool (BLAST) | https://blast.ncbi.nlm.nih.gov/Blast.cgi | ||
Molecular Evolutionary Genetics Analysis (MEGA) | http://www.megasoftware.net/ | ||
Motif-based sequence analysis tools (MEME) | http://meme-suite.org/ | ||
Geneious | Biomatters Limited | http://www.geneious.com/ | |
ProtParam Tool | http://web.expasy.org/protparam/ | ||
ngLOC | http://genome.unmc.edu/ngLOC/index.html | ||
TargetP v1.1 Server | http://www.cbs.dtu.dk/services/TargetP/ | ||
Protein Prowler | http://bioinf.scmb.uq.edu.au:8080/pprowler_webapp_1-2/ | ||
MUsite | http://musite.sourceforge.net/ | ||
Pfam | http://pfam.xfam.org/ | ||
TMHMM Server v. 2.0 | http://www.cbs.dtu.dk/services/TMHMM/ | ||
ProtScale | http://web.expasy.org/protscale/ | ||
Grape Genome Database (CRIBI) | http://genomes.cribi.unipd.it/grape/ | ||
PhenoGram | http://visualization.ritchielab.psu.edu/phenograms/plot | ||
MCScanX | http://chibba.pgml.uga.edu/mcscan2/ | ||
Interactive Tree Of Life (iTOL) | http://itol.embl.de/ | ||
UniProt | http://www.uniprot.org/ | ||
Phylogeny.fr | http://www.phylogeny.fr/index.cgi | ||
MUSCLE | http://www.ebi.ac.uk/Tools/msa/muscle/ | ||
Gblocks Server | http://molevol.cmima.csic.es/castresana/Gblocks_server.html | ||
Vitis vinifera cv. Corvina gene expression Atlas datamatrix | https://www.researchgate.net/publication/273383414_54sample_datamatrix_geneIDs_Fasoli2012 | ||
Multi Experiment Viewer (MeV) | http://mev.tm4.org/#/welcome | ||
Sequence Read Archive (SRA) | https://www.ncbi.nlm.nih.gov/sra | ||
R | https://www.r-project.org/ | ||
EMBOSS Needle (EMBL-EBI) | http://www.ebi.ac.uk/Tools/psa/emboss_needle/ |