The ASTRAL compendium provides a set of tools and databases designed to aid investigators in the analysis of protein structure, particularly through the use of sequence comparison. ASTRAL augments SCOP, a manual classification of protein domains according to structure, by providing a library of sequences, each of which corresponds to a structural domain classified in SCOP. To do so, the PDB entry for each SCOP domain is examined, and a mapping is constructed between the SEQRES information (which reflects the molecule studied) and the ATOM records (the atoms observed experimentally). Because the majority of the structures in the PDB are very similar to others, it is frequently helpful to reduce the redundancy by selecting high-quality representative subsets. To do this, we compare all extracted sequences using standard sequence comparison algorithms. This information is then combined with a quality score that provides a first-order estimate of the resolution and regularity of crystallographically determined protein structures. We are thus able to provide sequence subsets with both limited redundancy and high-quality structural information. The level of redundancy in these subsets is user-defined and is based on one of three criteria: percent sequence identity, BLAST E-value, or SCOP similarity. These sequence subsets are an ideal starting point for homology-based structure prediction, and have also proven useful for testing new sequence comparison methods and for structure analysis.

Several major improvements have been made to the ASTRAL compendium since its initial release two years ago. The number of protein domains included has doubled from 15,190 to 30,867, and additional databases have been added. The Rapid Access Format (RAF) database contains manually curated mappings linking the amino acid sequences of proteins in the PDB (SEQRES records in the database entry) to the atoms experimentally observed (ATOM records), in a format designed for rapid access by automated tools. This information is used to derive sequences for protein domains in the SCOP database. In cases where a SCOP domain spans several protein chains, all of which can be traced back to a single genetic source, a genetic domain sequence is created by concatenating the chain sequences in the order found in the original gene sequence. Both the standard library of SCOP sequences and a library including genetic domain sequences are available. Selected representative subsets derived from both libraries using the criteria described above are also included.
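
The selection of representative subsets described above can be illustrated with a minimal sketch. It assumes a simple greedy procedure: domains are ranked by quality score, and a domain is kept only if its similarity to every representative already chosen falls below the user-defined threshold. This is an illustration under stated assumptions, not necessarily the procedure ASTRAL itself uses. The function name, the domain identifiers, the scores, and the convention that a higher quality score is better are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def select_representatives(
    quality: Dict[str, float],
    identity: Dict[FrozenSet[str], float],
    max_identity: float,
) -> List[str]:
    """Greedy sketch: keep the best-scoring domain of each redundant group.

    quality      -- hypothetical quality score per domain (higher is assumed better)
    identity     -- pairwise percent sequence identity, keyed by unordered id pair
    max_identity -- user-defined threshold; pairs at or above it count as redundant
    """
    # Rank domains from highest to lowest quality; break ties by identifier
    # so the result is deterministic.
    ranked = sorted(quality, key=lambda dom: (-quality[dom], dom))

    chosen: List[str] = []
    for dom in ranked:
        # A pair missing from the table is treated as unrelated (0% identity).
        redundant = any(
            identity.get(frozenset((dom, rep)), 0.0) >= max_identity
            for rep in chosen
        )
        if not redundant:
            chosen.append(dom)
    return chosen


# Made-up example data: d1 and d2 are near-duplicates, d3 is distinct.
quality = {"d1": 0.9, "d2": 0.7, "d3": 0.8}
identity = {
    frozenset(("d1", "d2")): 95.0,
    frozenset(("d1", "d3")): 30.0,
    frozenset(("d2", "d3")): 35.0,
}

reps_40 = select_representatives(quality, identity, max_identity=40.0)
print(reps_40)
```

With a 40% threshold, the lower-scoring member of the near-duplicate pair is dropped. The same loop would work with BLAST E-values in place of percent identity, provided the comparison is reversed, since smaller E-values indicate closer similarity.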



