BioThesaurus

BioThesaurus is a web-based system that maps a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase (UniProtKB). Currently covering more than two million protein sequences, BioThesaurus consists of over 2.8 million names extracted from multiple molecular biology databases according to the database cross-references provided in iProClass (Wu et al., 2004). The BioThesaurus website supports two kinds of lookup: retrieving the synonymous names of a given protein entry, and identifying the protein entries that share a given name. The BioThesaurus dataset can also be used for automatic protein named entity recognition. It is updated monthly and can be freely downloaded at http://pir.georgetown.edu/iprolink/biothesaurus/data/thesaurus.
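
The two lookups described above (synonyms of an entry, and entries sharing a name) can be illustrated with a minimal Python sketch. It does not use the BioThesaurus file format or web interface, neither of which is described here. The records, accession strings, and function names below are made up for illustration, and the case-insensitive name matching is an assumption rather than documented BioThesaurus behavior.

```python
from collections import defaultdict

# Made-up (accession, name) pairs for illustration only; these are not
# real UniProtKB accessions and not the layout of the BioThesaurus download.
SAMPLE_RECORDS = [
    ("ACC0001", "p53"),
    ("ACC0001", "TP53"),
    ("ACC0001", "Cellular tumor antigen p53"),
    ("ACC0002", "P53"),
    ("ACC0002", "Tumor suppressor p53"),
    ("ACC0003", "BRCA1"),
]


def build_indexes(records):
    """Build entry -> names and (lowercased) name -> entries indexes."""
    names_by_entry = defaultdict(set)
    entries_by_name = defaultdict(set)
    for accession, name in records:
        names_by_entry[accession].add(name)
        # Assumption: names are matched case-insensitively.
        entries_by_name[name.lower()].add(accession)
    return names_by_entry, entries_by_name


def synonyms(names_by_entry, accession):
    """Return all names recorded for one entry, sorted."""
    return sorted(names_by_entry.get(accession, ()))


def entries_sharing_name(entries_by_name, name):
    """Return all entries that carry the given name, sorted."""
    return sorted(entries_by_name.get(name.lower(), ()))


names_by_entry, entries_by_name = build_indexes(SAMPLE_RECORDS)
print(synonyms(names_by_entry, "ACC0001"))
print(entries_sharing_name(entries_by_name, "p53"))
```

Keeping both indexes in memory makes each lookup a single dictionary access. A name that maps to more than one entry, as "p53" does in the toy data, is the ambiguity a named entity recognition step would still have to resolve.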

Webpage:
http://pir.georgetown.edu/iprolink/biothesaurus/

Publications:

Tags:

genomics; genome annotation; terms, ontologies and nomenclature
