1. Absolute difference in length
The similarity of two domains was scored as the absolute difference between
the two sequence lengths, normalised by the longer of the two lengths. This
value was then subtracted from 1, so that scores nearer to 1 indicated more
closely similar domains.
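
As an illustration, the score described above can be written as a short
function. This is a minimal sketch only: the function name is hypothetical,
and the handling of two zero-length domains is an assumption not stated in
the text.

```python
def length_similarity(len_a: int, len_b: int) -> float:
    """Method 1 sketch: 1 - |len_a - len_b| / max(len_a, len_b)."""
    longest = max(len_a, len_b)
    if longest == 0:
        # Assumption: two empty domains are treated as identical.
        return 1.0
    return 1.0 - abs(len_a - len_b) / longest
```

For domains of 100 and 80 residues this gives 1 - 20/100 = 0.8.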
2. Absolute difference in number of secondary structure elements
A secondary structure
element was taken to be a continuous segment of strand (E) or helix (H).
The similarity of two domains was scored by the absolute difference in
total number of strand elements plus the absolute difference in the total
number of helical elements and was normalised by the total number of secondary
structure elements. This number was then subtracted from 1.
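
The element counting above can be sketched as follows. The function names
are hypothetical, and the normalising term is taken here to be the total
number of helix and strand elements summed over both domains, which is one
reading of the text rather than a confirmed detail.

```python
import re


def count_elements(ss: str):
    """Return (helix_count, strand_count), counting continuous runs of H or E."""
    helices = len(re.findall(r"H+", ss))
    strands = len(re.findall(r"E+", ss))
    return helices, strands


def element_count_similarity(ss_a: str, ss_b: str) -> float:
    """Method 2 sketch: 1 - (|dH| + |dE|) / total number of elements."""
    helix_a, strand_a = count_elements(ss_a)
    helix_b, strand_b = count_elements(ss_b)
    # Assumption: the total is summed over both domains.
    total = helix_a + strand_a + helix_b + strand_b
    if total == 0:
        return 1.0  # assumption: two all-coil domains are treated as identical
    difference = abs(helix_a - helix_b) + abs(strand_a - strand_b)
    return 1.0 - difference / total
```

Under these assumptions, a domain with two helices and one strand compared
with a domain with one helix and one strand scores 1 - 1/5 = 0.8.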
3. Simple alignment of secondary structure elements
Secondary structure
strings were shortened such that a single character 'H' represented a helical
element, the single character 'E' represented a strand element and the
single character 'C' represented a coil element. The initial and final
coil elements were ignored. For example, the secondary structure string
CCCCCHHHHHHCCCCEEEECCCCCHHHHHHCCCCC would have been shortened to HCECH.
These shortened strings were then pairwise aligned using a standard dynamic
programming algorithm. Alignment between two elements i and j was scored
as follows: matching elements (i=j) scored 2, mismatches (i≠j) scored -1
and gaps scored -1. The alignment score was normalised by twice the maximum
length of the two shortened secondary structure strings.
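
The shortening and alignment steps can be sketched as below. The text says
only that a standard dynamic programming algorithm was used, so the choice
of a global (Needleman-Wunsch style) alignment with penalised end gaps is
an assumption, as are the function names. Input strings are assumed to
contain only the characters H, E and C.

```python
from itertools import groupby


def shorten(ss: str) -> str:
    """Collapse each run of identical characters to one and drop terminal coil."""
    collapsed = "".join(key for key, _ in groupby(ss))
    return collapsed.strip("C")


def global_alignment_score(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Best global alignment score, keeping only two rows of the DP table."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def simple_sse_similarity(ss_a: str, ss_b: str) -> float:
    """Method 3 sketch: alignment score / (2 * length of the longer short string)."""
    a, b = shorten(ss_a), shorten(ss_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: nothing to align
    return global_alignment_score(a, b) / (2 * longest)
```

With these definitions, shorten("CCCCCHHHHHHCCCCEEEECCCCCHHHHHHCCCCC")
returns "HCECH", matching the example in the text, and a string aligned
against itself scores 1.0.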
4. Alignment of secondary structure elements (Przytycka et al., 1999)
This method emulates
the secondary structure alignment scoring method of Przytycka et al.
(1999).
Secondary structure
strings were shortened and aligned using a conventional dynamic programming
algorithm as in method 3 above. However, the alignment was scored according
to the scheme outlined by Przytycka et al. (1999). Information on
the length of each element was retained. Matching elements (i=j) were scored
by the minimum length of the two elements, min(l_i, l_j), where l_i and l_j
are the lengths in residues of elements i and j. Alignment of helix with
coil or strand with coil was scored by half of the minimum length of the
two elements, 0.5 * min(l_i, l_j). Alignment of helix with strand scored 0.
No explicit gap penalty was imposed.
An additional score
was given by allowing helix and strand elements to be split into two for
alignment with coil, and coil elements to be split into two or three smaller
coils for alignment with strands and helices. See Przytycka et al.
(1999) for more detail.
The total score was
normalised by the mean trimmed sequence length (sequence length minus initial
and final coil regions) of the two proteins.
5. Alignment of secondary structure elements without additional scoring
This method was essentially the same as method 4; however, the additional
scoring stage from splitting elements was not included. The standard score
was normalised by the mean trimmed sequence length of the two proteins.
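
The length-weighted scoring without element splitting can be sketched as
follows. This is an illustration under stated assumptions, not the original
implementation: the function names are hypothetical, a global alignment
with free gaps is used to reflect "no explicit gap penalty", and input
strings are assumed to contain only H, E and C. The splitting stage of
method 4 is deliberately not attempted here.

```python
from itertools import groupby


def elements(ss: str):
    """Return [(type, length), ...] after removing initial and final coil."""
    trimmed = ss.strip("C")
    return [(key, len(list(run))) for key, run in groupby(trimmed)]


def pair_score(elem_a, elem_b) -> float:
    """Score one aligned pair of elements by the scheme described in method 4."""
    (type_a, len_a), (type_b, len_b) = elem_a, elem_b
    if type_a == type_b:
        return float(min(len_a, len_b))
    if "C" in (type_a, type_b):
        return 0.5 * min(len_a, len_b)  # helix/coil or strand/coil
    return 0.0  # helix aligned with strand


def method5_similarity(ss_a: str, ss_b: str) -> float:
    """Alignment score normalised by the mean trimmed sequence length."""
    ea, eb = elements(ss_a), elements(ss_b)
    n, m = len(ea), len(eb)
    # dp[i][j]: best score aligning the first i elements of a with the first j of b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(ea[i - 1], eb[j - 1]),
                dp[i - 1][j],  # gap, no penalty (assumption)
                dp[i][j - 1],
            )
    mean_len = (sum(l for _, l in ea) + sum(l for _, l in eb)) / 2
    return dp[n][m] / mean_len if mean_len else 0.0
```

For example, "HHHHCCEEEE" against "HHHHHHCCEE" aligns helix with helix (4),
coil with coil (2) and strand with strand (2), giving 8 / 10 = 0.8.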
6. Alignment of secondary structure elements using DSSP as secondary structure assignment
This method was essentially the same as method 4; however, secondary
structure was assigned using the DSSP method (Kabsch and Sander, 1983).
7. Alignment of secondary structure elements with gap penalty
This method was essentially the same as method 5; however, a gap penalty
equal to the length in residues of the element opposite the gap was imposed.
For example, if a helix of 6 residues was aligned against a gap, the gap
penalty would equal -6. The score was normalised by the mean trimmed sequence
length of the two proteins.
8. Alignment of secondary structure elements with gap penalty for long elements
This method was essentially the same as method 7; however, a gap penalty
was only imposed for gaps inserted opposite helices longer than 6 residues
and strands longer than 4 residues. The gap penalty imposed was equal to
the length in residues of the element opposite the gap. The score was
normalised by the mean trimmed sequence length of the two proteins.
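
The two gap penalty rules (methods 7 and 8) differ only in which elements
are penalised, which the following sketch makes explicit. The function name
and the long_only flag are hypothetical. Treating coil elements as
unpenalised under method 8 follows from the text naming only helices and
strands, and is an interpretation.

```python
def gap_penalty(element_type: str, length: int, long_only: bool = False) -> float:
    """Score contribution for aligning one element against a gap.

    long_only=False sketches method 7: every element costs its length.
    long_only=True sketches method 8: only helices longer than 6 residues
    and strands longer than 4 residues are penalised.
    """
    if long_only:
        is_long_helix = element_type == "H" and length > 6
        is_long_strand = element_type == "E" and length > 4
        if not (is_long_helix or is_long_strand):
            return 0.0
    return -float(length)
```

A 6-residue helix opposite a gap therefore contributes -6 under method 7
and 0 under method 8, while a 7-residue helix contributes -7 under both.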
9. Alignment of secondary structure elements with absolute difference in length as scoring scheme
This method was essentially the same as method 5; however, matching elements
(i=j) were scored by the absolute difference in length of the two elements,
normalised by the maximum of the two lengths and subtracted from 1, i.e.
1 - (abs(l_i - l_j) / max(l_i, l_j)), where l_i and l_j are the lengths in
residues of elements i and j. Alignment of helix with coil or strand with
coil was scored by (1 - (abs(l_i - l_j) / max(l_i, l_j))) / 2. Alignment of
helix with strand scored 0. The score was normalised by the larger of the
two sequences' element counts.
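
The per-pair score of method 9 can be sketched as a single function. The
function name is hypothetical, and element lengths are assumed to be
positive integers (lengths in residues).

```python
def length_ratio_score(type_a: str, len_a: int, type_b: str, len_b: int) -> float:
    """Method 9 sketch: score for aligning one element with another."""
    base = 1.0 - abs(len_a - len_b) / max(len_a, len_b)
    if type_a == type_b:
        return base
    if "C" in (type_a, type_b):
        return base / 2  # helix/coil or strand/coil
    return 0.0  # helix aligned with strand
```

Two helices of 6 and 4 residues score 1 - 2/6, approximately 0.67, whereas
a 6-residue helix aligned with a 3-residue coil scores (1 - 3/6) / 2 = 0.25.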
10. Alignment of full length secondary structure strings
Full-length secondary structure strings were pairwise aligned using
a conventional dynamic programming algorithm. Alignment between two characters
i and j was scored as follows: matching characters (i=j) scored 2, mismatches
(i≠j) scored -1 and gaps scored -1. The similarity score between domains
was taken as the percentage identity of the alignment, i.e. the number of
matching characters divided by the length of the shorter sequence.
11. Alignment of primary sequence
Primary sequences were pairwise aligned using dynamic programming as in
method 10. Alignment between two characters i and j was scored as follows:
matching characters (i=j) scored 1, mismatches (i≠j) scored 0 and gaps
scored -4. The similarity score between domains was taken as the percentage
identity of the alignment.
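
The identity calculation shared by methods 10 and 11 can be sketched as
below. A global alignment with traceback is assumed, since the text does
not specify the algorithm variant. The function names are hypothetical, and
the identity is returned as a fraction rather than multiplied by 100. The
default scores are those of method 10; the method 11 scores can be passed
as keyword arguments.

```python
def align(a: str, b: str, match=2, mismatch=-1, gap=-1):
    """Global alignment with traceback; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str, **scores) -> float:
    """Matching aligned characters divided by the shorter sequence length."""
    aligned_a, aligned_b = align(a, b, **scores)
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b) if x != "-")
    shortest = min(len(a), len(b))
    return matches / shortest if shortest else 0.0
```

For instance, percent_identity("ACDEFG", "ACDQFG", match=1, mismatch=0,
gap=-4) uses the method 11 scores and gives 5/6, because the optimal
alignment has no gaps and five identical positions.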