SUPFAM

During the course of evolution, protein sequences derived from a common ancestor diverge through mutations, insertions and deletions, gene duplication and recombination, giving rise to diverse families with no easily detectable sequence similarity. These relationships are often revealed only once protein structures become available and can be compared. SUPFAM [1-3] is a database of potential superfamily relationships derived by identifying distant evolutionary relationships between protein sequence families (Pfam families) and structural families (SCOP) using a rigorous profile-profile comparison method, AlignHUSH [4]. The methodology exploits the evolutionary information inherent in the SCOP classification to identify related Pfam families. The present SUPFAM database update (Release 6) has been derived using Pfam (version 27.0) [5] and the SCOP database (version 1.75) [6]. First, each Pfam family profile is searched against the SCOP family profiles using AlignHUSH to identify possible evolutionary relationships; this step identifies 5017 Pfam families that could be associated with SCOP superfamilies. Second, the remaining Pfam families are searched against a database of Pfam family profiles to identify Pfam families that could be indirectly related to a SCOP family. About 247 Pfam families were associated in this way with other Pfam families already mapped to a SCOP superfamily. Thus, the present database reports associations of 5295 Pfam families (out of 14831, about 36%) with a SCOP family. The SUPFAM database also contains clusters of Pfam families that could not be mapped to any structural superfamily but are found to be related to one another; these are grouped together as "Potentially New Superfamilies" (PNSFs). These PNSFs (126 in number) could provide an important resource for selecting targets for structural genomics initiatives.
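
The two-stage search described above can be illustrated with a minimal sketch. It is not the SUPFAM pipeline and does not call AlignHUSH: the `score` function, the threshold, the function names and all identifiers in the toy data are hypothetical stand-ins for a profile-profile comparison in which a higher score means a more confident match.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for a profile-profile comparison score (higher = better).
Score = Callable[[str, str], float]


def best_hit(query: str, targets: Iterable[str], score: Score,
             threshold: float) -> Optional[str]:
    """Return the best-scoring target at or above the threshold, or None."""
    hits = [(score(query, t), t) for t in targets]
    hits = [h for h in hits if h[0] >= threshold]
    return max(hits)[1] if hits else None


def two_stage_mapping(
    pfam: List[str],
    scop_superfamily_of: Dict[str, str],  # SCOP family -> SCOP superfamily
    score: Score,
    threshold: float,
) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    # Stage 1: search each Pfam family against the SCOP family profiles.
    direct: Dict[str, str] = {}
    for fam in pfam:
        hit = best_hit(fam, scop_superfamily_of, score, threshold)
        if hit is not None:
            direct[fam] = scop_superfamily_of[hit]

    # Stage 2: search the remaining Pfam families against Pfam families
    # that were already mapped, and inherit their superfamily.
    remaining = [fam for fam in pfam if fam not in direct]
    indirect: Dict[str, str] = {}
    for fam in remaining:
        hit = best_hit(fam, direct, score, threshold)
        if hit is not None:
            indirect[fam] = direct[hit]

    # Families left over are the candidates for clustering into new groups.
    unmapped = [fam for fam in remaining if fam not in indirect]
    return direct, indirect, unmapped


# Toy data (entirely made up): one SCOP family and three Pfam families.
toy_scores = {("PF_A", "scop_fam_1"): 0.9, ("PF_B", "PF_A"): 0.8}
toy_score = lambda q, t: toy_scores.get((q, t), 0.0)

direct, indirect, unmapped = two_stage_mapping(
    ["PF_A", "PF_B", "PF_C"], {"scop_fam_1": "SF_X"}, toy_score, threshold=0.5
)
```

In this toy run, `PF_A` is mapped directly, `PF_B` is mapped indirectly through `PF_A`, and `PF_C` stays unmapped. Unmapped families of this kind are what the database groups into PNSFs; the clustering step itself is not shown here.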

Webpage:
http://supfam.mbu.iisc.ernet.in/

Tags:

protein sequence, protein domains and classification
