UniRef

Other names: UniProt Reference Clusters, UniProt Reference Clusters (UniRef), uniref

The UniProt Reference Clusters are three separate datasets that compress sequence space at different resolutions by merging sequences and sub-sequences that are 100% (UniRef100), >=90% (UniRef90), or >=50% (UniRef50) identical, regardless of source organism. UniRef100 provides the most comprehensive non-redundant coverage of the known protein sequence space: it includes all of UniProtKB, splice variants that are not listed as separate entries in UniProtKB, and additional active sequences from UniParc. UniRef90 and UniRef50 provide a more even sampling of sequences by reducing the number of closely related sequences. This speeds up sequence similarity searches and makes their results more informative. Compressing UniRef100 into UniRef90 and UniRef50 reduces the size of the database by approximately 40% and 65%, respectively.
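To illustrate how a single set of sequences collapses into fewer clusters as the identity threshold is lowered, here is a minimal Python sketch. It is not the UniRef production pipeline: the function names (`identity`, `greedy_cluster`), the toy sequences, and the identity measure (ungapped, position-by-position comparison over the shorter sequence, aligned at the first residue) are all assumptions made for illustration. A real workflow would use a proper alignment-based clustering tool.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity (assumption): fraction of matching positions over the
    shorter sequence, compared ungapped from the first residue."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: List[str], threshold: float) -> Dict[str, List[str]]:
    """Greedy incremental clustering: visit sequences longest first and add
    each one to the first representative it matches at >= threshold,
    otherwise make it the representative of a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


# Hypothetical toy sequences, not real UniProt entries.
SEQS = [
    "MKTAYIAKQRQISFVKSHFS",  # full-length sequence
    "MKTAYIAKQR",            # fragment (prefix) of the first sequence
    "MKTAYIAKQRQISFVKSHFA",  # 95% identical to the first sequence
    "MKTAYLLKQRQISFVKAHFA",  # 80% identical to the first sequence
    "GGGGGGGGGGGGGGGGGGGG",  # unrelated sequence
]

if __name__ == "__main__":
    for label, threshold in [("100%", 1.0), ("90%", 0.9), ("50%", 0.5)]:
        clusters = greedy_cluster(SEQS, threshold)
        print(f"threshold {label}: {len(clusters)} clusters")
```

On this toy input the fragment merges with its parent at the 100% level, and the number of clusters falls from 4 to 3 to 2 as the threshold drops from 100% to 90% to 50%. This mirrors, at a very small scale, the size reductions described above.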

Webpage:
http://www.uniprot.org/uniref/

Licence:
Name: CC BY-ND 3.0
URL: https://creativecommons.org/licenses/by-nd/3.0/

Publications:

Tags:

protein sequence general sequence biological process cellular component coding sequence diversity disease protein sequences sequence clusters sequence analysis gene structure
