SureChEMBL

SureChEMBL is a publicly available, large-scale resource of compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature daily by an automated text- and image-mining pipeline. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented by combined structure- and keyword-based search against the compound repository and the patent document corpus. Currently, the database contains 17 million compounds extracted from 14 million patent documents.

Webpage:
https://www.surechembl.org

Publications:

Tags:

chemical formula, chemical structure, patent, text mining, molecular biology, drug
