Representative set of 1087 protein domains with resolutions <=2.5 Angstroms, selected from the CATH v1.6 list of sequence family representatives (S-reps; ftp://ftp.biochem.ucl.ac.uk/pub/cathdata/v1.6/Sreps), from: McGuffin, L. J., Bryson, K. & Jones, D. T. (2001) What are the baselines for protein fold recognition? Bioinformatics, 17, 63-72.

Download files (WinZipped)
Download files (gzipped tar)



'Unique' and 'Known' domain sets from: McGuffin, L. J. & Jones, D. T. Targeting novel folds for structural genomics
 

Download 'Unique' set: WinZipped | gzipped tar
Download 'Known' set: WinZipped | gzipped tar


N.B. Each file in each archive is in a FASTA-style format containing four lines of information; e.g. the file 1atx00 contains the lines:

1. >1atx00
2. GAAaLbKSDGPNTRGNSMSGTIWVFGcPSGWNNbEGRAIIGYacKQ
3.   EEE TTS S  TTSSEEEEEESS   TT EEE  SSSSSEEEE
4. CEEEEEHHECEEEECCCECEEEECCCEECCEECEEECCEECEEEEC

1. CATH domain name (four-character PDB code followed by chain identifier and domain number)
2. DSSP amino acid sequence (lowercase letters are Cys residues; DSSP uses lowercase letters to mark disulfide-bonded cysteines)
3. DSSP assigned secondary structure (Kabsch and Sander, 1983)
4. Backbone dihedral angle assigned secondary structure (Przytycka et al., 1999)
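
The four-line layout described above could be read with a short parser such as the following. This is only a sketch: the names DomainRecord and parse_domain_record are hypothetical, and it assumes that the files hold the four lines without the "1." to "4." labels used on this page, and that trailing blanks on the DSSP line may have been trimmed (so that line is padded back to the sequence length).

```python
from dataclasses import dataclass


@dataclass
class DomainRecord:
    """One four-line domain file (hypothetical container, not part of the data set)."""
    name: str          # CATH domain name, e.g. "1atx00"
    sequence: str      # DSSP amino acid sequence (lowercase = Cys)
    dssp_ss: str       # DSSP secondary structure; a blank means no assignment
    dihedral_ss: str   # secondary structure assigned from backbone dihedral angles


def parse_domain_record(text: str) -> DomainRecord:
    """Parse the text of one file into a DomainRecord."""
    lines = text.splitlines()
    # Ignore blank lines at the end of the file.
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) != 4 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' header line followed by three data lines")

    name = lines[0][1:].strip()
    sequence = lines[1].strip()
    # Leading blanks on the DSSP line are significant, so only the right-hand
    # side is stripped; the line is then padded to the sequence length
    # (assumption: any missing trailing characters were blanks).
    dssp_ss = lines[2].rstrip().ljust(len(sequence))
    dihedral_ss = lines[3].strip()

    if not (len(sequence) == len(dssp_ss) == len(dihedral_ss)):
        raise ValueError("sequence and secondary structure lines differ in length")
    return DomainRecord(name, sequence, dssp_ss, dihedral_ss)


# The 1atx00 example from this page, without the "1." to "4." labels.
EXAMPLE = "\n".join([
    ">1atx00",
    "GAAaLbKSDGPNTRGNSMSGTIWVFGcPSGWNNbEGRAIIGYacKQ",
    "  EEE TTS S  TTSSEEEEEESS   TT EEE  SSSSSEEEE",
    "CEEEEEHHECEEEECCCECEEEECCCEECCEECEEECCEECEEEEC",
])

record = parse_domain_record(EXAMPLE)
print(record.name, len(record.sequence))
```

Keeping the three data lines at equal length means that position i in either secondary structure string refers to residue i of the sequence.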

When interpreting lines 3 and 4, it is recommended that a strand be defined as three or more consecutive Es and a helix as five or more consecutive Hs.
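
The recommended interpretation can be sketched as a search for runs of E and H of the minimum lengths. The function name find_segments and its parameters are hypothetical and not part of the data set; segments are reported as zero-based, half-open (start, end) positions in the string.

```python
import re


def find_segments(ss: str, min_strand: int = 3, min_helix: int = 5) -> dict:
    """Return strand and helix segments as (start, end) index pairs.

    A strand is a run of at least min_strand consecutive 'E' characters and
    a helix is a run of at least min_helix consecutive 'H' characters.
    Shorter runs are ignored.
    """
    strands = [(m.start(), m.end())
               for m in re.finditer("E{%d,}" % min_strand, ss)]
    helices = [(m.start(), m.end())
               for m in re.finditer("H{%d,}" % min_helix, ss)]
    return {"strand": strands, "helix": helices}


# Line 4 (dihedral angle assignment) of the 1atx00 example on this page.
dihedral_ss = "CEEEEEHHECEEEECCCECEEEECCCEECCEECEEECCEECEEEEC"
segments = find_segments(dihedral_ss)
print(segments["strand"])
print(segments["helix"])
```

In this example the two-residue run "HH" is shorter than five residues and so is not reported as a helix, and isolated or paired Es are likewise not counted as strands.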

Please see the following references for more details:

Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.

Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B. and Thornton, J.M. (1997) CATH - a hierarchic classification of protein domain structures. Structure, 5(8), 1093-1108.

Przytycka, T., Aurora, R. and Rose, G. (1999) A protein taxonomy based on secondary structure. Nature Struct. Biol., 6(7), 672-682.