Go to DomPred

Background
Input sequence (single letter code)
DomPred lower sequence length limit
PSI-BLAST sequence alignment domain prediction
DomSSEA domain prediction
Include secondary structure profile plot
Attach results to email?
Format of Results
Input sequence (single letter code)

Type or cut and paste your query sequence into the form (no nucleic acid sequences please!). The sequence must be in amino acid single letter code, in either upper-case or lower-case letters. Spaces within the pasted sequence are permitted; note, however, that these will be stripped out prior to analysis. We recommend that you enter your sequence as a plain single-letter string like this:

ALGSNLNTPVEQLHAALKAISQLSNTHLVTTSSFYKSKPLGPQDQPDYVNAVAKIETELS

DomPred lower sequence length limit

Figure 1. ![]()

Of course it is possible for a chain of 120 residues or fewer to consist of two domains. If you have such suspicions about your query, it is worth considering that over 99% of domains in the CATH database are over 40 residues in length. It is therefore likely that the domain boundary in such a case will be located approximately midway between the amino and carboxyl termini of the query sequence.

PSI-BLAST sequence alignment domain prediction

The PSI-BLAST sequence alignment domain prediction searches the query sequence against a large database of sequences (nrdb90), including sequences from Pfam-A.

Pfam-A search
Query vs sequence database
Input E-value cut-off (default 0.01)
Input number of PSI-BLAST iterations (default 5)
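
The input handling described under "Input sequence" above can be illustrated with a short sketch. This is not the server's code: the function name `normalize_sequence`, the restriction to the 20 standard amino acids, and the nucleic acid heuristic are all assumptions made for illustration.

```python
import re

# Assumption: only the 20 standard amino acids are accepted. The real server
# may also tolerate ambiguity codes such as X, B or Z.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Assumption: a sequence made up only of these letters is probably nucleic acid.
NUCLEIC_LETTERS = set("ACGTUN")


def normalize_sequence(raw: str) -> str:
    """Strip whitespace, upper-case the input, and reject invalid sequences."""
    seq = re.sub(r"\s+", "", raw).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError("unexpected characters: " + "".join(bad))
    if set(seq) <= NUCLEIC_LETTERS:
        raise ValueError("sequence looks like nucleic acid, not protein")
    return seq


if __name__ == "__main__":
    print(normalize_sequence("algsn lntpv eqlha\nalkai"))
```

Cleaning a sequence locally in this way before pasting it into the form avoids surprises from stray characters, although the server performs its own parsing regardless.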
DomSSEA domain prediction

In cases where no significant sequence matches have been found to Pfam-A sequences, or no significant domain termini peaks have been found by the PSI-BLAST alignment algorithm, DomSSEA can be used to predict the domain content of your query sequence.

As outlined in Marsden et al. (In Press), DomSSEA is based on the idea that a crude fold recognition algorithm, based on the mapping of predicted secondary structures to observed secondary structure patterns in domains of known 3-D structure, might be reliable enough to parse a long target sequence into putative domains. This is often the way in which a human sequence analyst will attempt to parse a protein into domains when homology-based approaches have been unsuccessful. Secondary structure element alignment (SSEA) methods have been shown to provide a rapid prediction of the fold for given sequences with no detectable homology to any known structure, and have also been applied to the related problem of novel fold detection (McGuffin et al., 2001; McGuffin & Jones, 2002).

DomSSEA results table

Include secondary structure profile plot

Multi-domain proteins may contain regions of different secondary structure class. For example, a two-domain chain may contain an all-beta domain followed by an all-alpha domain. The transition between such regions may be enough to predict a putative domain boundary.

Secondary structure predictions are made for query sequences by PSIPRED. Using a smoothing window of 9 residues, a profile of the secondary structure can be shown on the plot at the top of the results page (Figure 2).

Show PSIPRED prediction
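
To make the idea of a smoothed secondary structure profile concrete, here is a minimal sketch. It assumes the prediction is available as a three-state string (H for helix, E for strand, C for coil) and that smoothing is a centred moving average truncated at the chain ends. The source states only the window size of 9, so the function `smoothed_profile` and the averaging scheme are illustrative assumptions rather than the server's implementation.

```python
def smoothed_profile(ss: str, state: str, window: int = 9) -> list[float]:
    """Fraction of residues in `state` within a centred window at each position.

    Near the chain ends the window is truncated, so the average is taken
    over fewer residues.
    """
    half = window // 2
    flags = [1.0 if c == state else 0.0 for c in ss]
    profile = []
    for i in range(len(flags)):
        lo = max(0, i - half)
        hi = min(len(flags), i + half + 1)
        profile.append(sum(flags[lo:hi]) / (hi - lo))
    return profile


if __name__ == "__main__":
    # Hypothetical prediction: an all-beta region, a short linker, an all-alpha region.
    ss = "E" * 12 + "C" * 4 + "H" * 12
    strand = smoothed_profile(ss, "E")
    helix = smoothed_profile(ss, "H")
    for pos, (e, h) in enumerate(zip(strand, helix), start=1):
        print(pos, round(e, 2), round(h, 2))
```

In the hypothetical example the strand fraction falls and the helix fraction rises across the linker, which is the kind of transition the plot is meant to reveal.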
Attach results to email?

All results data from the results form can be emailed. Note, however, that this may result in a large amount of data being sent to your email server.

Format of Results

The response email provides a link to a web page containing the DomPred results for the protein submitted. For example, if we submit the AXONIN-1 protein (PDB ID 1CS6) to the DomPred server, the following result page is generated:
The graph is derived from the N- and C-termini positions of PSI-BLAST local alignments. Large values indicate regions where sequence discontinuities occur, and thus indicate putative domain boundaries. Below the graph are given the predicted number of domains and the positions of the domain boundaries predicted from its peaks. In general, this graph should always be visually inspected to confirm the predicted number of domains and possible domain boundaries, since a large degree of variation is possible due to factors such as disorder and variation in the domain linker region, which may not always be handled accurately by the automatic peak detector. In this case 4 domains are predicted, with the domain boundaries at residue positions 102, 188 and 290.

Following the PSI-BLAST derived results are the DomSSEA results. PSIPRED is used to predict the secondary structure of the query sequence, and this secondary structure is then aligned against the DSSP-determined secondary structures of a complete fold library. The SSEA (Secondary Structure Element Alignment) scoring scheme is employed to carry out this matching by aligning complete secondary structure elements. The reasoning behind this is that secondary structure is more conserved than sequence, so more remote structural relationships can be detected. Once a fold is detected which matches the secondary structure of the query sequence, the domain boundaries of the matched fold are simply transferred to the query sequence. In this way, the method also predicts putative folds for the individual domains, given in terms of SCOP codes.

In this case the best secondary structure match is with itself, chain A of the PDB structure 1CS6. Clearly this is a completely accurate prediction! If the sequence were actually novel and not already in the PDB, the best match would be to chain B in PDB structure 1BIH, again predicting a 4-domain protein, with boundaries at residue positions 99, 207 and 293.
In this case the domains are all predicted to have the same fold, with SCOP code b.1.1.4 (Immunoglobulin-like beta-sandwich). The third prediction is only for two domains and is a false positive. The fourth prediction is again for 4 domains, with the domains having the fold given by SCOP code b.1.2.1. This is again an Immunoglobulin-like beta-sandwich, although in a different superfamily (Fibronectin type III). So there is a general consensus that the structure is a 4-domain structure with identical Immunoglobulin-like beta-sandwich domains. The following graphic shows that this is the correct prediction:

![]()
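
The graph of alignment termini described above can be approximated with a short sketch. The source does not specify how the server builds or smooths the profile, nor how its automatic peak detector works, so everything here is an assumption for illustration: the function names, the windowed sum, the score threshold, and the greedy peak picking. The 40-residue minimum spacing is borrowed from the CATH domain length remark in the length limit section.

```python
def termini_profile(alignments, length, window=15):
    """Windowed count of alignment start and end positions along the query.

    `alignments` holds (start, end) pairs in 1-based, inclusive query coordinates.
    """
    counts = [0] * length
    for start, end in alignments:
        counts[start - 1] += 1
        counts[end - 1] += 1
    half = window // 2
    return [sum(counts[max(0, i - half): i + half + 1]) for i in range(length)]


def pick_boundaries(profile, min_score, min_domain=40):
    """Greedily pick the highest-scoring positions as putative boundaries.

    Positions closer than `min_domain` residues to a chain end or to an
    already chosen boundary are skipped. Ties go to the lower position.
    """
    n = len(profile)
    candidates = sorted(
        (i for i, score in enumerate(profile) if score >= min_score),
        key=lambda i: profile[i],
        reverse=True,
    )
    chosen = []
    for i in candidates:
        pos = i + 1  # convert back to 1-based residue numbering
        if pos <= min_domain or pos > n - min_domain:
            continue
        if all(abs(pos - c) >= min_domain for c in chosen):
            chosen.append(pos)
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical hits: two covering each half of a 200-residue query, one full length.
    hits = [(1, 100), (1, 100), (101, 200), (101, 200), (1, 200)]
    profile = termini_profile(hits, length=200, window=5)
    print(pick_boundaries(profile, min_score=4))
```

A simple detector like this is easily misled by disordered or variable linker regions, which is why visual inspection of the graph is recommended.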