Tag: sequence analysis

Found 104 sources

Source	Match	ReputationScore*
Pfam The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pf ...		100%
GenBank GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. The complete release notes for the current version of GenBank are available on the NCBI ftp site. A new release is made every two months. G ...		94%
SILVA SILVA is a comprehensive, quality-controlled web resource for up-to-date aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains alongside supplementary online services. In addition to data products, SILVA provide ...		94%
Database of Single Nucleotide Polymorphism dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations an ...		87%
Sequence Read Archive The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms. Data submitted to SRA is organized using a metadata model consisting of six objects: study, sample, experiment, run, analysis and submissi ...		87%
Conserved Domain Database The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, including NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and p ...		84%
Reference Sequence Database The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.		78%
Database resources of the National Center for Biotechnology Information The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts publish ...		71%
PROSITE PROSITE is a database of protein families and domains. PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.		67%
NCBI Gene The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. Entrez can effic ...		66%
Information system for G protein-coupled receptors The GPCRDB is a molecular-class information system that collects, combines, validates and stores large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains data on sequences, ligand binding constants and mutations. ...		58%
UniRef The UniProt Reference Clusters are three separate datasets that compress sequence space at different resolutions, achieved by merging sequences and sub-sequences that are 100% (UniRef100), >=90% (UniRef90), or >=50% (UniRef50) identical, regardless o ...		56%
SwissRegulon The Swissregulon Database contains genome-wide annotations of regulatory sites. The predictions are based on Bayesian probabilistic analysis of a combination of input information including i) Experimentally determined binding sites reported in the li ...		53%
European Hepatitis C Virus database The euHCVdb is mainly oriented towards protein sequence, structure and function analyses and structural biology of Hepatitis C Virus.		53%
Gene3D Gene3D uses the information in CATH to predict the locations of structural domains on millions of protein sequences available in public databases. Sequence data from UniProtKB and Ensembl for domains with no experimentally determined structures are s ...		52%
Universal PBM Resource for Oligonucleotide Binding Evaluation The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data generated by universal protein binding microarray (PBM) technology on the in vitro DNA binding specificities of proteins.		52%
Ribosomal Database Project (RDP-II) The Ribosomal Database Project - II (RDP-II)(1) provides data, tools and services related to ribosomal RNA sequences to the research community. Through its website (http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data, anal ...		51%
Phospho.ELM Phospho.ELM is a manually curated database of eukaryotic phosphorylation sites. The resource includes data collected from published literature as well as high-throughput data sets.		50%
IMGT/LIGM-DB IMGT/LIGM-DB is the IMGT® comprehensive database of immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences, from human and other vertebrate species, with translation for fully annotated sequences, created in 1989 by LIGM (http://www.imgt.o ...		50%
Ensembl Mouse Genome Browser Analysis of finished and draft mouse genomic clone sequences.		50%
Candida Genome Database The Candida Genome Database (CGD) provides access to genomic sequence data and manually curated functional information about genes and proteins of the human pathogen Candida albicans. It collects gene names and aliases, and assigns gene ontology term ...		50%
PASS2 PASS2 contains alignments of structural motifs of protein superfamilies. PASS2 is an automatic version of the original superfamily alignment database, CAMPASS (CAMbridge database of Protein Alignments organised as Structural Superfamilies). PASS2 con ...		49%
Ensembl Compara Ensembl Compara provides cross-species resources and analyses, at both the sequence level and the gene level.		49%
GenomeNet Network of database and computational resources including KEGG (pathways, interactions, etc.) and DBGET/LinkDB (an integrated database retrieval system). It also hosts several web-based tools for sequence analysis (i.e. Blast, Motif, Clustal W).		48%
HAMAP database of microbial protein families HAMAP is a system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies: the HAMAP families. HAMAP is based on manually created family rules and is appli ...		48%
Codon Usage Database Find GC content and frequency of codon usage for any organism that has a sequence in GenBank.		48%
iProX iProX is a public platform for collecting and sharing raw data, analysis results and metadata obtained from proteomics experiments. The iProX repository employs a web-based proteome data submission process and open sharing of mass spectrometry-based ...		47%
Bacterial protein tYrosine Kinase database The Bacterial protein tYrosine Kinase database (BYKdb) contains computer-annotated BY-kinase sequences. The database web interface allows static and dynamic queries and provides integrated analysis tools including sequence annotation.		47%
HOGENOM HOGENOM is a phylogenomic database providing families of homologous genes and associated phylogenetic trees (and sequence alignments) for a wide set of sequenced organisms.		44%
PIR SuperFamily The PIR SuperFamily concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.		44%
EffectiveDB The Effective database contains pre-calculated predictions of bacterial secreted proteins and of functional secretion systems. Effective bundles various tools to recognize Type III secretion signals, conserved binding sites of Type III chaperones, eu ...		42%
APPRIS Annotates variants with biological data such as protein structural information, functionally important residues, conservation of functional domains and evidence of cross-species conservation.		42%
Movebank Data Repository This data repository allows users to publish animal tracking and animal-borne sensor datasets that have been uploaded to Movebank (https://www.movebank.org). Published datasets have gone through a submission and review process, and are associated wit ...		42%
Autophagy Database Proteins involved in self-digestion of eukaryotic cells		41%
Homologous Vertebrate Genes Database HOVERGEN is a database of homologous vertebrate genes that allows one to select sets of homologous genes among vertebrate species, and to visualize multiple alignments and phylogenetic trees.		41%
Entrez NCBI information retrieval system, including GenBank, MMDB (structures), genomes, population sets, OMIM, taxonomy and PubMed.		41%
ParameciumDB ParameciumDB is a new model organism database for Paramecium, built using components of the Generic Model Organism Database (http://www.gmod.org) construction set (Chado relational database schema, Turnkey generic web framework and Gbrowse). The data ...		40%
MAR databases The MAR databases is a collection of manually curated marine microbial contextual and sequence databases, based at the Marine Metagenomics Portal. This was developed as a part of the ELIXIR EXCELERATE project in 2017 and is maintained by The Center f ...		39%
GlycoPOST GlycoPOST is a mass spectrometry data repository for glycomics. Users can release their "raw/processed" data via this site with a unique identifier number for the paper publication. Submission conditions are in accordance with the Minimum Information ...		39%
tRNADB-CE tRNA Gene DataBase Curated by Experts		39%
Ebola and Hemorrhagic Fever Virus Database The Ebola and Hemorrhagic Fever Virus Database stems from the Hemorrhagic Fever Viruses (HFV) Database Project founded by Dr. Carla Kuiken in 2009 at the Los Alamos National Laboratory (LANL). The HFV Database was modeled on the Los Alamos HIV Databa ...		38%
CR-EST - Crop ESTs The crop EST database CR-EST (http://pgrc.ipk-gatersleben.de/cr-est/) is a publicly available online resource providing access to sequence, classification, clustering, and annotation data of crop EST projects at IPK Gatersleben, Germany. CR-EST curre ...		38%
ROdent Unidentified Gene-Encoded large proteins The ROUGE protein database is a sister database of HUGE protein database which has accumulated the results of comprehensive sequence analysis of human long cDNAs (KIAA cDNAs). The ROUGE protein database has been created to publicize the information o ...		38%
RNArchitecture RNArchitecture is a database that provides a comprehensive description of relationships between known families of structured ncRNAs, with focus on sequence and structure similarities. RNArchitecture also provides literature information and links to o ...		37%
Domain Interaction Graph Guided ExploreR DIGGER is an essential resource for studying the mechanistic consequences of alternative splicing such as isoform-specific interaction and consequence of exon skipping. The database integrates information of domain-domain and protein-protein interact ...		37%
Nematodes.org Wiki for coordinating nematode sequencing projects		37%
siRNAdb The siRNA database provides a gene-centric view of human siRNA experimental data, including siRNAs of known efficacy and siRNAs predicted to be of high efficacy by siSearch. Linked to these sequences is information including siRNA thermodynamic prope ...		37%
MatrisomeDB The ECM-protein knowledge database. Please follow MatrisomeDB. MatrisomeDB will be hosted at matrisomedb.org very soon.		36%
mESAdb microRNA Expression and Sequence Analysis Database		36%
Nucleotide Sequence Database Collaboration This database consists of a joint effort to collect and disseminate databases containing DNA and RNA sequences. It is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. It covers the spectrum of data raw reads, th ...		36%
SIMAP Protein sequences are of utmost importance for studying the function and evolution of genes and genomes. Therefore a rich collection of methods in computational biology relies on the analysis and comparison of protein sequences. Many of these intensi ...		36%
ElastoDB Repository for well-characterized elastin sequences to facilitate its study. The database has since expanded to include other non-elastin sequences that share elastic properties.		35%
Alias A tool for converting identifiers in which multiple aliases are used to refer to sequences. Also available as a stand-alone tool.		35%
EBI patent sequences Non-redundant databases of patent DNA and protein sequences		35%
HMS-ICS The Hyperlink Management System (HMS) automatically updates and maintains hyperlinks among major databases using various data IDs (e.g. HUGO Gene Symbols, IDs from PDB, UniProt). The ID Converter System (ICS) supports the conversion of data IDs using ...		34%
DescribePROT DescribePROT is a database containing annotations of 13 putative structural and functional properties at the amino acid level for ~1.4 million proteins from 83 popular/model organisms, to be extended to hundreds of additional organisms. Users can sear ...		34%
SARS-CoV-2 3D database This tool is for understanding the coronavirus proteome and evaluating possible drug targets.		34%
MassIVE.quant A community resource of quantitative mass spectrometry-based proteomics datasets. MassIVE.quant is an extension of the Mass Spectrometry Interactive Virtual Environment (MassIVE) to provide the opportunity for large-scale deposition of data from qua ...		34%
Kassiopeia A web application for the generation, storage, and presentation of genome-wide analyses of mutually exclusive exonomes.		33%
Mabellini A genome-wide database for understanding the structural proteome and evaluating prospective antimicrobial targets of the emerging pathogen Mycobacterium abscessus. An on-line source for Mycobacterium abscessus modeled structural proteome. MabeLLINI ...		33%
Tabloid Proteome An annotated database of protein associations. Tabloid Proteome is a database of protein association networks generated using publicly available mass spectrometry based experiments in PRIDE. These associations represent a broad scala of biological a ...		33%
dbPTM dbPTM is a database that accumulates the biological information related to protein post-translational modification (PTM), such as the catalytic sites, structural information, solvent accessibility of residues, protein secondary structures, protein ...		33%
dbAMP DBAMP 2.0: updated resource for antimicrobial peptides with an enhanced scanning method on genomic and proteomic data.		33%
Amordad Database engine for comparing metagenomic data at massive scale. It first obtains the sequence signature of metagenomes and organizes them as points in high dimensional space.		32%
CytomegaloVirusDb Multi-omics knowledge database for cytomegaloviruses.		32%
CrustyBase CrustyBase is an interactive online database for crustacean transcriptomes. CrustyBase provides an environment for navigating and visualising crustacean transcriptome datasets. Users can search existing transcriptomes or import new datasets of their ...		31%
2DE-pattern 2DE-pattern is a database containing data on proteins/isoforms/proteoforms profiles.		31%
SARSCOVIDB New Platform for the Analysis of the Molecular Impact of SARS-CoV-2 Viral Infection.		31%
MRMAssayDB A Comprehensive Resource for Targeted Proteomics Assays in the Community.		31%
UbiBrowser 2.0 A comprehensive resource for proteome-wide known and predicted ubiquitin ligase/deubiquitinase-substrate interactions in eukaryotic species.		31%
AlphaKnot AlphaKnot is a server that measures entanglement in AlphaFold-solved protein models while considering pLDDT confidence values.		31%
DBSAV database DBSAV database reports GTS scores of human genes and DeepSAV scores of SAVs in the human proteome, including pathogenic SAVs, benign SAVs, gnomAD SAVs observed in exome sequencing, and all possible SAVs by single nucleotide variations. Each human pro ...		31%
ILDGDB A manually curated database of genomics, transcriptomics, proteomics and drug information for interstitial lung diseases. ILDGDB is a manually curated database that provides comprehensive experimentally supported associations between genes and inter ...		31%
PIR - Protein Information Resource The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR has provided many protein databases and analysis tools to the scientific community, includ ...		30%
miROrtho Computational prediction of animal microRNA genes		30%
DIGIT DIGIT is a database of immunoglobulin variable domain sequences annotated with the type of antigen, the germline sequences and pairing information between light and heavy chains.		30%
UniGene This repository is no longer available. Although the web pages are no longer available, you will still be able to download the final UniGene builds as static content from the FTP site https://ftp.ncbi.nlm.nih.gov/repository/UniGen ...		30%
SDRDB Short-chain dehydrogenases/reductases database.		28%
FLAD Forensic loci allele database.		28%
Genome Trax A search tool for finding variants from specific chromosome coordinates. It is possible to integrate the results in NGS pipeline.		28%
RSpred A Rifin/Stevor prediction tool.		28%
DGD Provides a list of groups of co-located and duplicated genes.		28%
PlantMWpIDB A database for the molecular weight and isoelectric points of the plant proteomes.		28%
CavitySpace A database of potential ligand binding sites in the human proteome.		28%
qPTM An updated database for PTM dynamics in human, mouse, rat and yeast.		28%
ChemBioPort An integrative platform to navigate the biology, structure and chemical inhibition of human proteins.		28%
hCoronavirusesDB A genetic and proteomic database for SARS-CoV, MERS-CoV, and SARS-CoV-2.		28%
NCBI PopSet NCBI PopSet collects DNA sequences to analyze the ways that populations are related by evolution. Such sequences indicate if populations originate from different members of the same species or from organisms of different species entirely.		28%
HypDB Deep Proteome Profiling Enabled Functional Annotation and Data-Independent Quantification of Proline Hydroxylation Targets.		28%
ExVe ExVe is the knowledge base of orthologous proteins identified in fungal extracellular vesicles.		28%
DisEnrich DisEnrich—the database of human proteome IDRs that are significantly enriched in particular amino acids.		28%
GreeningDB A Database of Host-Pathogen Protein-Protein Interactions and Annotation Features of the Bacteria Causing Huanglongbing HLB Disease.		28%
BoMiProt BoMiProt is a manually curated, comprehensive repository of published information of bovine milk proteins.		28%
Entrez Protein Clusters A collection of related protein sequences (clusters) consisting of proteins derived from the annotations of whole genomes, organelles and plasmids. It is currently limited to Archaea, Bacteria, Plants, Fungi, Protozoans, and Viruses.		28%
HPREP A comprehensive database for human proteome repeats.		28%
Antar Predict miRNA targets for human and mouse using two predictive models: a model trained from microarray studies following transfection, and a model trained from PAR-CLIP datasets.		28%
CEDAR CEDAR (The ComplexomE profiling DAta Resource) facilitates the storage and sharing of complexome profiling data, compliant with the MIACE standard, with the goal of enabling and simplifying their reuse.		28%
AniProtDB The Animal Proteome Database (AniProtDB) is a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. In addition to providing open access to this collection of high-quality metazoan proteomes, information on predicted protei ...		28%
PSINDB The Postsynaptic Interaction Database is a comprehensive resource of the human postsynaptic (PS) binary protein-protein interactions. It contains experimental and computational evidence about interactions, along with structural and disease-related in ...		28%
virusMS virusMS is a database for synthetic peptides of viruses with mass spectrometry. It is a tool for resourcing, annotating, and analysing synthetic peptides of SARS-CoV-2 for immunopeptidomics and other immunological studies.		28%
A3D Database A3D Database is a database of structure-based protein aggregation predictions for the human proteome.		28%
MS-Decipher A user-friendly proteome database search software with an emphasis on deciphering the spectra of O-linked glycopeptides.		28%
COVID-ONE-humoral immune The One-stop Database for COVID-19-specific Antibody Responses and Clinical Parameters.		28%
FABRIC Cancer Portal FABRIC Cancer Portal is a comprehensive catalogue of human coding genes in cancer based on the FABRIC framework. FABRIC quantifies the selection of genes in tumor and weighs their evidence for being cancer drivers.		28%

*ReputationScore indicates how established a given datasource is.
