Clustered protein sequences and multiple sequence alignments
protein sequence general sequence
Protein Research Foundation database of peptides: sequences, literature and unnatural amino acids
Protein sequences from model organisms, GO assignment and subcellular localization
PANDIT is a collection of multiple sequence alignments and phylogenetic trees covering many common protein domains. It contains the seed protein sequence alignments from the Pfam-A (curated families) ...
ColabFold databases are expandable MMseqs2 profile databases used to generate diverse multiple sequence alignments for protein structure prediction.
The Entrez Protein search and retrieval system contains protein entries that have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding re ...
The efficiency of protein remote homology detection methods depends on the dispersion of the protein sequence space and the availability of intermediate sequences between two related protein families. In ...
BAliBASE is a benchmark alignment database that includes enhancements for repeats, transmembrane sequences and circular permutations.
Related protein sequences (clusters) of Reference Sequence proteins encoded by complete genomes
Computational Bridge to Experiments
The UniProt archive (UniParc), part of the UniProt databases, is an archival protein sequence collection from all major publicly accessible resources. New and revised protein sequences are added daily ...
The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR has provided many protein databases and ...
Experimentally Characterized Protein annotations
Representation of multiple sequence alignments of protein families in terms of Position Specific Scoring Matrices (PSSMs) is commonly used in the detection of remote homologues. A PSSM is generated wi ...
The UniProt Reference Clusters are three separate datasets that compress sequence space at different resolutions, achieved by merging sequences and sub-sequences that are 100% (UniRef100), >=90% (UniR ...
Sequences and classification of predicted membrane transporters encoded in complete genomes
Clusters of predicted plant proteins
Evolution of protein-protein interfaces. InterEvol is a resource for researchers to investigate the structural interaction of protein molecules and sequences using a variety of tools and resources.
The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains structure-based sequence alignments and dendrograms based on information primarily derived from the structural ...
Families of protein-coding genes from five sequenced plant species
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam also generates higher-level groupings of related ent ...
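Several entries above describe representing a multiple sequence alignment as a Position Specific Scoring Matrix (PSSM). As a rough illustration of that general idea only, and not the procedure used by any database listed here, the sketch below builds log-odds scores per alignment column. The function names `build_pssm` and `score`, the uniform background frequency, the add-one pseudocount and the toy alignment are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column.

    Hypothetical helper: assumes a uniform background frequency and
    simple additive pseudocounts; gap characters are ignored.
    """
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*alignment):
        # Count only standard residues; '-' and other symbols are skipped.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum the column scores for an ungapped segment of standard residues."""
    return sum(col[aa] for col, aa in zip(pssm, segment))


# Toy alignment, invented for illustration only.
toy_alignment = ["ACDE", "ACDF", "ACEE", "GCDE"]
pssm = build_pssm(toy_alignment)
```

A segment that matches the conserved columns, such as `score(pssm, "ACDE")`, scores higher than an unrelated one, such as `score(pssm, "WWWW")`. Production tools typically use empirical background frequencies and sequence weighting rather than the uniform assumptions made here.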