The efficiency of protein remote homology detection methods depends on the dispersion of the protein sequence space and on the availability of intermediate sequences between two related protein families. In the absence of structural evidence and natural intermediate sequences, detecting distant evolutionary relationships is a challenging task. Large gaps in the sequence space between related families can be bridged by designing protein-like sequences [1, 2]. In our recent publication [1], we developed a computational algorithm to design protein-like intermediate sequences between related protein families. In total, 3,611,010 artificial sequences were designed between pairs of related protein families for 374 multi-membered SCOP folds (v1.75). When added to commonly employed databases, these computationally designed intermediate sequences enable the detection of remote relationships. Through the NrichD database resource, we provide the designed sequences plugged into commonly employed structure and sequence databases [3, 4] so that users can perform homology searches. The enriched databases (SCOP-NrichD and Pfam-NrichD), their respective natural sequence databases (SCOP-DB and Pfam-DB) and the dataset of artificial sequences (AS-DB) can be freely downloaded from the website. Users can also run jackhmmer [5] searches against the enriched databases through the web portal. Each search is additionally run against the corresponding natural sequence database to achieve maximum coverage. The intermediate sequences are annotated with their parent profiles, which makes iterative searches traceable and helps in fold recognition. Another useful feature of the web server is the generation of sequences for, or between, related families. Users can specify SCOP domain families or provide multiple sequence alignments of the protein families, and generate artificial sequences at different levels of divergence.
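For readers who download the databases and want to reproduce the two-database search locally, the following is a minimal sketch rather than the web portal's actual pipeline. It assembles jackhmmer command lines for an enriched database and its natural counterpart, then merges the target names found in the two `--tblout` tables. The file names (`query.fasta`, `SCOP-NrichD.fasta`, `SCOP-DB.fasta`), the iteration count, the E-value threshold and all function names are assumptions chosen for illustration.

```python
from typing import Iterable, List


def build_jackhmmer_cmd(query_fasta: str, target_db: str, out_prefix: str,
                        iterations: int = 5, evalue: float = 1e-3) -> List[str]:
    """Return a jackhmmer argument list; nothing is executed here."""
    return [
        "jackhmmer",
        "-N", str(iterations),             # maximum number of iterations
        "-E", str(evalue),                 # E-value threshold for reported hits
        "-o", f"{out_prefix}.out",         # main human-readable output
        "--tblout", f"{out_prefix}.tbl",   # per-target tabular output
        query_fasta,
        target_db,
    ]


def parse_tblout_targets(tbl_text: str) -> List[str]:
    """Extract target names (first column) from --tblout text, skipping comments."""
    targets = []
    for line in tbl_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        targets.append(line.split()[0])
    return targets


def merge_hits(*hit_lists: Iterable[str]) -> List[str]:
    """Union of hit names from several searches, keeping first-seen order."""
    seen, merged = set(), []
    for hits in hit_lists:
        for name in hits:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


if __name__ == "__main__":
    # Hypothetical local file names; substitute the files downloaded from the website.
    for db, prefix in [("SCOP-NrichD.fasta", "enriched"), ("SCOP-DB.fasta", "natural")]:
        print(" ".join(build_jackhmmer_cmd("query.fasta", db, prefix)))
```

Each printed command can be run in a shell where HMMER is installed. Passing the contents of the resulting `.tbl` files through `parse_tblout_targets` and `merge_hits` gives the combined hit list across the enriched and natural databases.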