Similarity scores were calculated between pairs of domains by each of the following methods. (Secondary structure has been interpreted from the 4th string unless otherwise stated. Methods are roughly ordered by increasing complexity and decreasing speed.)

 
 

1. Absolute difference in length
 

 

The similarity of two domains was scored as the absolute difference between the two sequence lengths, normalised by the longer of the two lengths. This value was then subtracted from 1, so that scores nearer to 1 indicate more similar domains.
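The calculation can be sketched in a few lines of Python. The function name and the handling of two empty sequences are assumptions made for illustration and are not taken from the original method.

```python
def length_similarity(len_a: int, len_b: int) -> float:
    """Return 1 - |len_a - len_b| / max(len_a, len_b)."""
    longest = max(len_a, len_b)
    if longest == 0:
        # Assumption: two empty sequences are treated as identical.
        return 1.0
    return 1.0 - abs(len_a - len_b) / longest


# Two domains of 100 and 75 residues differ by 25% of the longer length.
print(length_similarity(100, 75))  # 0.75
```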
 

 

2. Absolute difference in number of secondary structure elements
 

 

A secondary structure element was taken to be a continuous segment of strand (E) or helix (H). The similarity of two domains was scored as the absolute difference in the total number of strand elements plus the absolute difference in the total number of helical elements, normalised by the total number of secondary structure elements. This value was then subtracted from 1.
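A minimal Python sketch of this score follows. The text does not state whether the normalising total refers to one domain or both; the sketch assumes the combined number of helix and strand elements in both domains. Function names are hypothetical.

```python
from itertools import groupby


def count_elements(ss: str) -> dict:
    """Count continuous runs of helix (H) and strand (E) in a secondary structure string."""
    counts = {"H": 0, "E": 0}
    for state, _ in groupby(ss):
        if state in counts:
            counts[state] += 1
    return counts


def element_count_similarity(ss_a: str, ss_b: str) -> float:
    a, b = count_elements(ss_a), count_elements(ss_b)
    difference = abs(a["E"] - b["E"]) + abs(a["H"] - b["H"])
    # Assumption: normalise by the helix and strand elements of both domains combined.
    total = a["E"] + a["H"] + b["E"] + b["H"]
    if total == 0:
        return 1.0  # assumption: no elements in either domain counts as identical
    return 1.0 - difference / total
```

For example, a domain with two helices and one strand compared with a domain with one helix and one strand gives a difference of 1 over a total of 5 elements, and so a score of 0.8 under this assumption.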
 

 

3. Simple alignment of secondary structure elements
 

 

Secondary structure strings were shortened such that a single character 'H' represented a helical element, a single character 'E' represented a strand element and a single character 'C' represented a coil element. The initial and final coil elements were ignored. For example, the secondary structure string CCCCCHHHHHHCCCCEEEECCCCCHHHHHHCCCCC would have been shortened to HCECH. These shortened strings were then pairwise aligned using a standard dynamic programming algorithm. Alignment between two elements i and j was scored as follows: matching elements (i=j) scored 2, mismatches (i≠j) scored -1 and gaps scored -1. The alignment score was normalised by twice the length of the longer of the two shortened secondary structure strings.
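The shortening and alignment steps can be sketched as follows. The text says only that a standard dynamic programming algorithm was used; this sketch assumes a global (Needleman-Wunsch style) alignment in which end gaps are penalised, and assumes strings containing only H, E and C. Function names are hypothetical.

```python
from itertools import groupby


def shorten(ss: str) -> str:
    """Collapse each run of identical states to one character and drop terminal coil."""
    reduced = "".join(state for state, _ in groupby(ss))
    return reduced.strip("C")


def global_alignment_score(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Optimal global alignment score, keeping only one row of the DP table."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def simple_element_similarity(ss_a: str, ss_b: str) -> float:
    a, b = shorten(ss_a), shorten(ss_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: no elements to align
    return global_alignment_score(a, b) / (2 * longest)


print(shorten("CCCCCHHHHHHCCCCEEEECCCCCHHHHHHCCCCC"))  # HCECH
```

Dividing by twice the longer length means that two identical shortened strings score exactly 1, since every position then contributes the match score of 2.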
 

 

4. Alignment of secondary structure elements (Przytycka et al., 1999)
 

 

This method emulates the secondary structure alignment scoring method of Przytycka et al. (1999).
 

 

Secondary structure strings were shortened and aligned using a conventional dynamic programming algorithm as in method 3 above. However, the alignment was scored according to the scheme outlined by Przytycka et al. (1999). Information on the length of each element was retained. Matching elements (i=j) were scored by the minimum length of the two elements, min(l_i, l_j), where l_i and l_j are the lengths in residues of elements i and j. Alignment of helix with coil or strand with coil was scored by half of the minimum length of the two elements, 0.5 × min(l_i, l_j). Alignment of helix with strand scored 0. No explicit gap penalty was imposed.
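The pairwise element score described above can be written as a small function. This is a sketch of the rules as stated in this section only, not of the full scheme of Przytycka et al. (1999); the function name and argument order are hypothetical.

```python
def pair_score(type_i: str, len_i: int, type_j: str, len_j: int) -> float:
    """Score for aligning element i with element j; types are 'H', 'E' or 'C'."""
    shorter = min(len_i, len_j)
    if type_i == type_j:
        return float(shorter)   # matching elements: min(l_i, l_j)
    if "C" in (type_i, type_j):
        return 0.5 * shorter    # helix or strand against coil: half the minimum length
    return 0.0                  # helix against strand


print(pair_score("H", 6, "H", 4))  # 4.0
print(pair_score("H", 6, "C", 4))  # 2.0
print(pair_score("H", 6, "E", 4))  # 0.0
```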
 

 

An additional score was given by allowing helix and strand elements to be split into two for alignment with coil, and coil elements to be split into two or three smaller coils for alignment with strands and helices. See Przytycka et al. (1999) for more detail.
 

 

The total score was normalised by the mean trimmed sequence length (sequence length minus initial and final coil regions) of the two proteins.
 

 

5. Alignment of secondary structure elements without additional scoring
 

 

This method was essentially the same as method 4; however, the additional scoring stage from splitting elements was not included. The standard score was normalised by the mean trimmed sequence length of the two proteins.
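A compact sketch of this variant, assuming a global alignment over run-length encoded elements with no gap penalty. Function names and the treatment of empty input are assumptions for illustration.

```python
from itertools import groupby


def elements(ss: str) -> list:
    """Run-length encode a secondary structure string and drop terminal coil."""
    runs = [(state, len(list(group))) for state, group in groupby(ss)]
    if runs and runs[0][0] == "C":
        runs = runs[1:]
    if runs and runs[-1][0] == "C":
        runs = runs[:-1]
    return runs


def pair_score(x: tuple, y: tuple) -> float:
    shorter = min(x[1], y[1])
    if x[0] == y[0]:
        return float(shorter)
    if "C" in (x[0], y[0]):
        return 0.5 * shorter
    return 0.0


def alignment_score(xs: list, ys: list) -> float:
    """Best total pair score; skipping an element (a gap) costs nothing."""
    prev = [0.0] * (len(ys) + 1)
    for x in xs:
        curr = [0.0]
        for j, y in enumerate(ys, start=1):
            curr.append(max(prev[j - 1] + pair_score(x, y), prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def unsplit_similarity(ss_a: str, ss_b: str) -> float:
    xs, ys = elements(ss_a), elements(ss_b)
    # Trimmed length = residues remaining after removing initial and final coil.
    mean_trimmed = (sum(n for _, n in xs) + sum(n for _, n in ys)) / 2
    if mean_trimmed == 0:
        return 0.0  # assumption: nothing to align
    return alignment_score(xs, ys) / mean_trimmed
```

With this normalisation, a string aligned against itself scores 1, because every element contributes its full length and the total equals the trimmed length.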
 

 

6. Alignment of secondary structure elements using DSSP as secondary structure assignment
 

 

This method was essentially the same as method 4; however, secondary structure was assigned using the DSSP method (Kabsch and Sander, 1983).
 

 

7. Alignment of secondary structure elements with gap penalty
 

 

This method was essentially the same as method 5; however, a gap penalty equal to the length in residues of the element opposite the gap was imposed. For example, if a helix of 6 residues was aligned against a gap, the gap penalty would equal -6. The score was normalised by the mean trimmed sequence length of the two proteins.
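The effect of the length-dependent gap penalty can be sketched as below. Elements are given as (type, length) pairs with terminal coil already removed. The sketch assumes a global alignment in which gaps at the ends are penalised in the same way as internal gaps; the function names are hypothetical.

```python
def pair_score(x: tuple, y: tuple) -> float:
    shorter = min(x[1], y[1])
    if x[0] == y[0]:
        return float(shorter)
    if "C" in (x[0], y[0]):
        return 0.5 * shorter
    return 0.0


def gapped_alignment_score(xs: list, ys: list) -> float:
    """Global alignment where a gap costs the length of the element opposite it."""
    prev = [0.0]
    for _, length in ys:
        prev.append(prev[-1] - length)          # leading gaps opposite elements of ys
    for x in xs:
        curr = [prev[0] - x[1]]                 # leading gap opposite x
        for j, y in enumerate(ys, start=1):
            curr.append(max(
                prev[j - 1] + pair_score(x, y),  # align x with y
                prev[j] - x[1],                  # gap opposite x
                curr[j - 1] - y[1],              # gap opposite y
            ))
        prev = curr
    return prev[-1]


# A 6-residue helix aligned against a gap is penalised by 6.
print(gapped_alignment_score([("H", 6)], []))  # -6.0
```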
 

 

8. Alignment of secondary structure elements with gap penalty for long elements
 

 

This method was essentially the same as method 7; however, a gap penalty was only imposed for gaps inserted opposite helices longer than 6 residues and strands longer than 4 residues. The gap penalty imposed was equal to the length in residues of the element opposite the gap. The score was normalised by the mean trimmed sequence length of the two proteins.
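The thresholded penalty can be expressed as a small helper. Because the text names only helices and strands, the sketch assumes that gaps opposite coil elements are not penalised. The function name is hypothetical, and the returned value is the magnitude to subtract from the alignment score.

```python
def long_element_gap_penalty(kind: str, length: int) -> float:
    """Penalty for placing a gap opposite an element of the given type and length."""
    thresholds = {"H": 6, "E": 4}  # helices longer than 6, strands longer than 4
    if kind in thresholds and length > thresholds[kind]:
        return float(length)
    return 0.0  # short elements, and coil by assumption, incur no penalty


print(long_element_gap_penalty("H", 7))  # 7.0
print(long_element_gap_penalty("H", 6))  # 0.0
print(long_element_gap_penalty("E", 5))  # 5.0
```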
 

 

9. Alignment of secondary structure elements with absolute difference in length as scoring scheme
 

 

This method was essentially the same as method 5; however, matching elements (i=j) were scored by the absolute difference in length of the two elements, normalised by the maximum of the two lengths and subtracted from 1, i.e. 1 - abs(l_i - l_j)/max(l_i, l_j), where l_i and l_j are the lengths in residues of elements i and j. Alignment of helix with coil or strand with coil was scored by (1 - abs(l_i - l_j)/max(l_i, l_j))/2. Alignment of helix with strand scored 0. The score was normalised by the maximum number of elements in both sequences.
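The element score used here can be sketched as follows, assuming element lengths of at least one residue. The function name is hypothetical.

```python
def relative_length_score(type_i: str, len_i: int, type_j: str, len_j: int) -> float:
    """Score two aligned elements by how close their lengths are (types 'H', 'E', 'C')."""
    closeness = 1.0 - abs(len_i - len_j) / max(len_i, len_j)
    if type_i == type_j:
        return closeness        # matching elements
    if "C" in (type_i, type_j):
        return closeness / 2    # helix or strand against coil
    return 0.0                  # helix against strand


print(relative_length_score("H", 8, "H", 6))  # 0.75
print(relative_length_score("H", 8, "C", 6))  # 0.375
```

Each aligned pair contributes at most 1 under this scheme, which is consistent with normalising by a number of elements and not by a number of residues.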
 

 

10. Alignment of full length secondary structure strings
 

 

Full-length secondary structure strings were pairwise aligned using a conventional dynamic programming algorithm. Alignment between two characters i and j was scored as follows: matching characters (i=j) scored 2, mismatches (i≠j) scored -1 and gaps scored -1. The similarity score between domains was taken as the percentage identity of the alignment, i.e. the number of matching characters divided by the length of the shorter sequence.
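Computing the identity requires recovering one optimal alignment by traceback. The sketch below assumes a global alignment and, where several optimal alignments exist, follows the diagonal move first. It returns the identity as a fraction, which can be multiplied by 100 to give a percentage. The function name is hypothetical.

```python
def alignment_identity(a: str, b: str, match=2, mismatch=-1, gap=-1) -> float:
    """Matching positions in an optimal global alignment / length of the shorter string."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, counting identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    shorter = min(n, m)
    return identical / shorter if shorter else 0.0


print(alignment_identity("HHCC", "HHEE"))  # 0.5
```

Passing `match=1, mismatch=0, gap=-4` to the same function gives a scoring scheme suitable for a residue-level identity on primary sequences.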
 

 

11. Alignment of primary sequence
 

 

Primary sequences were pairwise aligned using dynamic programming as in method 10. Alignment between two characters i and j was scored as follows: matching residues (i=j) scored 1, mismatches (i≠j) scored 0 and gaps scored -4. The similarity score between domains was taken as the percentage identity of the alignment.