exome sequence | BiŌkeanós

Found 8 tags

rna sequence (304); protein sequence (209); nucleic acid sequence (185); sequence analysis (104); dna sequences (97); sequence assembly (49); sequence sites, features and motifs (35); sequence annotation (31)

Found 345 sources

Source	Match	ReputationScore*
Exome Aggregation Consortium Browser The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. ...	1	38%
Genebass Genebass is a resource of exome-based association statistics, made available to the public. The dataset encompasses 3,817 phenotypes with gene-based and single-variant testing across 281,852 individuals with exome sequence data from the UK Biobank.	2	26%
ESP NHLBI Exome Sequencing Project (ESP): Exome Variant Server (EVS) for browsing single nucleotide variation data from exome sequencing experiments mainly focused on heart, lung and blood disorders.	3	27%
Wellcome Sanger Institute: Whole Exome Sequencing There is a substantial overlap between the NIHR IBD BioResource and the IBD UK Genetics Consortium (IBDGC). The NIHR BioResource provides some DNA samples. IBDGC data is being provided by the Wellcome Sanger Institute, who are performing the sequenci ...	4	23%
dbMTS dbMTS is a comprehensive database of putative human microRNA target site (MTS) SNVs and their functional predictions. dbMTS collects all potential SNVs in microRNA target seed regions in human 3’UTRs and provides their functional predictions and annotat ...	5	25%
Practical Haplotype Graph Platform for storing and using pangenomes for imputation.	6	22%
Sequence Ontology SO is a collaborative ontology project for the definition of sequence features used in biological sequence annotation. The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence. SO i ...	7	49%
SomaMutDB A database of somatic mutations in normal human tissues.	8	23%
CanVaS CanVaS is a Greek cancer patient genetic variation resource.	9	22%
Gene4Denovo an integrated database and analytic platform for de novo mutations in humans. De novo mutations (DNMs) significantly contribute to sporadic diseases, particularly in neuropsychiatric disorders. Whole-exome sequencing (WES) and whole-genome sequencin ...	10	26%
KRGDB The large-scale variant database of 1722 Koreans based on whole genome sequencing.	11	26%
DDBJ Sequence Read Archive DDBJ Sequence Read Archive (DRA) is an archive database for output data generated by next-generation sequencing machines including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD® System, and others. DRA is a member of the I ...	12	23%
DDBJ Trace Archive DDBJ Trace Archive (DTA) is a permanent repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects. DTA is a member of the International Nucleotide Sequence ...	13	22%
COGVIC COGVIC (Catalogue Of Germline Variants In Cancer). A comprehensive database of germline pathogenic variants in East Asian pan-cancer patients.	14	22%
CNVIntegrate Multi-ethnic database for identifying copy number variations associated with cancer. View gene-centric CNV profile collected from healthy individuals and multiple cancer types.	15	22%
Sequence Read Archive The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms. Data submitted to SRA is organized using a metadata model consisting of six objects: study, sample, experiment, run, analysis and submissi ...	16	66%
TMC-SNPdb 2.0 An ethnic-specific database of Indian germline variants.	17	22%
GenBank Sequence Format GenBank Sequence Format (GenBank Flat File Format) consists of an annotation section and a sequence section. The start of the annotation section is marked by a line beginning with the word "LOCUS". The start of sequence section is marked by a line be ...	18	24%
AstraZeneca PheWAS Portal The AstraZeneca PheWAS Portal is a public repository of gene-phenotype associations for phenotypes derived from electronic health records, questionnaire data, and continuous traits. These data were generated using exome sequencing and phenotype data ...	19	26%
Uniclust Clustered protein sequences and multiple sequence alignments	20	22%
gnomAD Genome Aggregation Database (gnomAD) - browser that aggregates exome and whole-genome sequencing data from a wide variety of large-scale sequencing projects. It enables search of genetic variation information by gene, variant or region.	21	53%
Sequence Alignment Map The Sequence Alignment/Map (SAM) format is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section.	22	68%
MPS6 Review and classification of published variants in the ARSB gene. The purpose of this database is to help researchers and clinicians understand structural changes in arylsulfatase B (ASB) caused by Mucopolysaccharidosis type VI (MPS6) mutations ...	23	27%
Database of Sequence Tagged Sites dbSTS is an NCBI resource that contains sequence data for short genomic landmark sequences or Sequence Tagged Sites.	24	38%
Genome Variation Format The Genome Variation Format (GVF) is a very simple file format for describing sequence alteration features at nucleotide resolution relative to a reference genome.	25	24%
UK Biobank UK Biobank is a large-scale biomedical database and research resource that provides researchers access to detailed longitudinal phenotype, medical and genetic data from 500,000 volunteer participants.	26	33%
SIMAP Protein sequences are of utmost importance for studying the function and evolution of genes and genomes. Therefore a rich collection of methods in computational biology relies on the analysis and comparison of protein sequences. Many of these intensi ...	27	27%
Human Genetic Variation Database The Human Genetic Variation Database (HGVD) aims to provide a central resource to archive and display Japanese genetic variation and association between the variation and transcription level of genes. The database currently contains genetic variation ...	28	35%
FASTQ Sequence and Sequence Quality Format FASTQ is a text-based file format for sharing sequencing data combining both the sequence and an associated per base quality score.	29	44%
DBSAV database DBSAV database reports GTS scores of human genes and DeepSAV scores of SAVs in the human proteome, including pathogenic SAVs, benign SAVs, gnomAD SAVs observed in exome sequencing, and all possible SAVs by single nucleotide variations. Each human pro ...	30	23%
dbNSFP Database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) and splice-site variants (ssSNVs) in the human genome. It also facilitates the steps of filtering and prioritizing SNVs fr ...	31	36%
GenBank GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. The complete release notes for the current version of GenBank are available on the NCBI ftp site. A new release is made every two months. G ...	32	72%
CMPD CMPD is designed to provide a comprehensive, integrated and well-annotated resource, focusing on protein sequence-altering variations originating from both germline and cancer-associated somatic variations. The mutated protein sequence pool was base ...	33	22%
Feature Annotation Location Description Ontology The Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences for data resources represented in RDF and/or OWL. FALDO can be used to describe nucleotide features in sequ ...	34	28%
INSD sequence record XML The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional ...	35	26%
ATAV ATAV is a comprehensive platform for population-scale genomic analyses. ATAV stores variant and per site coverage data for all samples in a centralized database, which is efficiently queried by ATAV to support diagnostic analyses for trios and single ...	36	22%
PhenomeCentral Repository for clinicians and scientists working in the rare disorder community. It enables secure sharing of case records by clinicians and rare disease scientists and helps the user to find additional cases of the same unnamed disorder. The reposit ...	37	29%
EBI patent sequences Non-redundant databases of patent DNA and protein sequences	38	26%
openSNP A crowdsourced collection of personal genomics data. Includes SNP genotyping, exome sequencing data, phenotypic annotation and quantified self tracking data.	39	29%
Minimum Information about any (x) Sequence The minimum information about any (x) sequence (MIxS) is an overarching framework of sequence metadata, that includes technology-specific checklists from the previous MIGS and MIMS standards, provides a way of introducing additional checklists such a ...	40	39%
NCBI Trace Archives The Trace Archives includes the following archives: The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw ...	41	24%
Reference Sequence Annotation An ontology for sequence annotations and how to preserve them with reference sequences.	42	26%
UniParc The UniProt archive (UniParc), part of the UniProt databases, is an archival protein sequence collection from all major publicly accessible resources. New and revised protein sequences are added daily into UniParc while not deleting the previous vers ...	43	25%
ENA Sequence Flat File Format ENA Sequence Flat File Format is a standardised plain text format for nucleotide sequences. This format was previously called the EMBL Sequence Flat File Format.	44	24%
CGGA The Chinese Glioma Genome Atlas (CGGA) is a user-friendly web application for data storage and analysis to explore brain tumors datasets. This database includes the whole-exome sequencing, DNA methylation, mRNA sequencing, mRNA microarray and microRN ...	45	22%
UniRef The UniProt Reference Clusters are three separate datasets that compress sequence space at different resolutions, achieved by merging sequences and sub-sequences that are 100% (UniRef100), >=90% (UniRef90), or >=50% (UniRef50) identical, regardless o ...	46	43%
DNA Data Bank of Japan An annotated collection of all publicly available nucleotide and protein sequences. DDBJ collects sequence data mainly from Japanese researchers, as well as researchers in other countries. DDBJ is part of the International Nucleotide Sequence Databas ...	47	40%
UCSC Genome Browser database Genome assemblies and aligned annotations for a wide range of vertebrates and model organisms, along with an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic datasets.	48	88%
Berkeley Drosophila Genome Project EST database The goals of the Drosophila Genome Center are to finish the sequence of the euchromatic genome of Drosophila melanogaster to high quality and to generate and maintain biological annotations of this sequence.	49	23%
Pfam The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pf ...	50	76%
FASTA Sequence Format FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede th ...	51	53%
European Nucleotide Archive The European Nucleotide Archive (ENA) is a globally comprehensive data resource for nucleotide sequence, spanning raw data, alignments and assemblies, functional and taxonomic annotation and rich contextual data relating to sequenced samples and expe ...	52	52%
Mitochondrial Disease Sequence Data Resource The Mitochondrial Disease Sequence Data Resource (MSeqDR) is a centralized genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phen ...	53	32%
NCBI Third Party Annotation TPA is a database that contains sequences built from the existing primary sequence data in GenBank. TPA records are retrieved through the Nucleotide Database and feature information on the sequence, how it was cataloged, and proper way to cite the se ...	54	22%
SYSTERS The integration of SYSTERS, GeneNest and SpliceNest into one framework facilitates the over-all exploration of the whole sequence space covering protein, mRNA and EST sequences, as well as genomic DNA. The SYSTERS protein sequence cluster set provide ...	55	22%
Dfam The Dfam database is an open collection of DNA Transposable Element sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations. Dfam represents a collection of multiple sequence alignments, each containing a set of r ...	56	39%
cis-Regulatory Element Database The cisRED database holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations. Sequence inputs include low-coverage genome sequence data and ENCODE data.	57	34%
Binary sequence information Format A .2bit file stores multiple DNA sequences (up to 4 Gb total) in a compact randomly-accessible format. The file contains masking information as well as the DNA itself. The DNA sequence is represented as two bits per base with associated list of regi ...	58	22%
SuperSite Dictionary of binding sites in proteins	59	22%
PANDIT PANDIT is a collection of multiple sequence alignments and phylogenetic trees covering many common protein domains. It contains the seed protein sequence alignments from the Pfam-A (curated families) database; nucleotide sequence alignments derived f ...	60	23%
mESAdb microRNA Expression and Sequence Analysis Database	61	28%
Genome Warehouse The Genome Warehouse (GWH) is a public archival resource housing genome-scale data for a wide range of species. GWH accepts a variety of data types, including whole genome, chloroplast, mitochondrion and plasmid. For each collected genome assembly, G ...	62	36%
SEVENS Seven-transmembrane-helix receptors (7-TMR), known as G-protein-coupled receptors [1], are important genes that work as the gateway of signal transduction induced by ligand binding. Recent progress in determination of human draft sequences [2,3] acce ...	63	22%
Expressed Sequence Tags database The dbEST contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms. NCBI is in the process of merging EST and GSS records into the Nucleotide database, and the process is e ...	64	42%
GISSD Group I Intron Sequence and Structure Database	65	22%
YeTFaSCo Yeast Transcription Factor binding Site sequence Collection	66	22%
cpnDB Chaperonins are a diverse family of molecular chaperones present in the plastids, mitochondria, and cytoplasm of eukaryotes, and in bacteria and archaea. The family is divided into group I (CPN60, also known as Hsp60 or GroEL, found in bacteria, some ...	67	31%
ENA Sequence XML Schema ENA Sequence XML Schema is a standardised XML schema for nucleotide sequences. All assembled and annotated sequences must conform to this schema.	68	24%
CoPS Comprehensive peptide signature database	69	22%
O-GLYCBASE O-GLYCBASE is a database of glycoproteins with O-linked and C-linked glycosylation sites. Entries with at least one experimentally verified glycosylation site have been compiled from protein sequence databases and literature. Each entry contains info ...	70	22%
OryGenesDB: an interactive tool for rice reverse genetics The aim of this Oryza sativa database was first to display sequence information such as the T-DNA and Ds flanking sequence tags (FSTs) produced in the framework of the French genomics initiative Genoplante and the EU consortium Cereal Gene Tags. This ...	71	27%
NRichD Efficiency of protein remote homology detection methods depends on the dispersion of the protein sequence space and the availability of intermediate sequences between two related protein families. In the absence of any structural evidence and natural ...	72	22%
PRF Protein research foundation database of peptides: sequences, literature and unnatural amino acids	73	22%
Peptaibol The Peptaibol Database is a sequence and structure resource for the unusual class of peptides known as peptaibols. The database includes sequence, biological source, and bibliographical data for the naturally-occurring peptaibols. Information is also ...	74	22%
Progenetix - genomic copy number aberrations in cancer The Progenetix database provides an overview of copy number abnormalities in human cancer from Comparative Genomic Hybridization (CGH) experiments. With 30817 cases from 1016 publications (Oct 2013), Progenetix is the largest curated database for who ...	75	40%
resiDB ResiDB is a user-friendly sequence similarity-dependent database manager for bacteria, fungi, viruses, protozoa, invertebrate, plants, archaea, environmental and whole genome shotgun sequence data.	76	23%
Genome Sequence Archive GSA is a data repository specialized for archiving raw sequence reads. It supports data generated from a variety of sequencing platforms ranging from Sanger sequencing machines to single-cell sequencing machines and provides data storing and sharing ...	77	39%
ProTeus Signature sequences at the protein N- and C-termini	78	22%
The UCSC Archaeal Genome Browser The UCSC Archaeal Genome Browser is a window on the biology of more than 100 microbial species from the domain Archaea. Basic gene annotation is derived from NCBI GenBank/RefSeq entries, with overlays of sequence conservation across multiple species, ...	79	56%
HMMER Profile File Format The profile hidden Markov Model (HMM) calculated from multiple sequence alignment data in this service is stored in Profile HMM save format (usually with ".hmm" extension). It is an ASCII file containing a lot of header and descriptive records follow ...	80	36%
Major Intrinsic Proteins Modification Database This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. The MIPs have been identified from the completed genome sequence of organisms available at NCBI.	81	35%
Ocean Gene Atlas The Ocean Gene Atlas service provides data mining access to three complementary data objects: gene sequence catalogs (ENA), sample environmental context (PANGAEA), and gene abundances estimates in samples (computed by mapping sequence reads onto gene ...	82	28%
Proteomics Standards Initiative Extended Fasta Format The PSI Extended Fasta Format (PEFF) is a unified format for protein and nucleotide sequence databases to be used by sequence search engines and other associated tools (spectra library search tools, sequence alignment software, data repositories, etc ...	83	28%
DTU Bioinformatics CBS offers Comprehensive public databases of DNA- and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling, have been established to optimise scientific exploitation of the explosi ...	84	22%
DriverDBv2 DriverDB, a database that incorporates >9500 cancer-related RNA-seq datasets and >7000 more exome-seq datasets, in addition to annotation databases and published bioinformatics algorithms dedicated to driver gene/mutation identification. Seven additi ...	85	30%
Universal PBM Resource for Oligonucleotide Binding Evaluation The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data generated by universal protein binding microarray (PBM) technology on the in vitro DNA binding specificities of proteins.	86	39%
miRBase The miRBase database is a searchable database of published miRNA sequences and annotation. Each entry in miRBase represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), with information on the location and sequence ...	87	73%
DDBJ/ENA/GenBank Feature Table The GenBank, EMBL, and DDBJ nucleic acid sequence data banks have from their inception used tables of sites and features to describe the roles and locations of higher order sequence domains and elements within the genome of an organism. In February, ...	88	31%
CATH The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34,287 domain structures classified into 1,383 superfamilies and 3,285 sequence families. Each structural family is expanded with domain seq ...	89	22%
SCOPe The ASTRAL compendium provides a set of tools and databases designed to aid investigators in the analysis of protein structure, particularly through the use of sequence comparison. Astral augments SCOP, a manual classification of protein domains acco ...	90	35%
PolyQ Polyglutamine Repeats in Proteins	91	28%
ConoServer ConoServer is a database specializing in sequences and structures of peptides expressed by marine cone snails. The database gives access to protein sequences, nucleic acid sequences and structural information on conopeptides. ConoServer's data are fi ...	92	46%
Minimum Information about a MARKer gene Sequence MIMARKS is the metadata reporting standard of the Genomic Standards Consortium that covers marker gene sequences from environmental surveys or individual organisms	93	42%
NCBI Viral Genomes Resource NCBI Viral Genomes Resource is a collection of virus genomic sequences that provides curated sequence data, related information and tools. It includes all complete viral genome sequences deposited in the International Nucleotide Sequence Database Col ...	94	38%
Nucleotide Sequence Database Collaboration This database consists of a joint effort to collect and disseminate databases containing DNA and RNA sequences. It is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. It covers the spectrum of data raw reads, th ...	95	27%
Rfam The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs). The families in Rfam break down into three broad functional classes: non-coding RNA genes ...	96	45%
Protein Clusters Related protein sequences (clusters) of Reference Sequence proteins encoded by complete genomes	97	22%
PDBSite 3D structure of protein functional sites	98	22%
Insertion Sequence Finder This database provides a list of insertion sequences (IS) isolated from bacteria and archaea. It is organized into individual files containing their general features (name, size, origin, family, ...) as well as their DNA and potential protein sequence ...	99	46%
Enzyme Structure Function Ontology The ESFO provides a new paradigm for organizing enzyme sequence, structure, and function information, whereby specific elements of enzyme sequence and structure are mapped to specific conserved aspects of function, thus facilitating the functional an ...	100	28%
PA-GOSUB Protein sequences from model organisms, GO assignment and subcellular localization	101	22%
Reference Sequence Database The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.	102	60%
Conserved Domain Database The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, including NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and p ...	103	64%
PAZAR PAZAR is a software framework for the construction and maintenance of regulatory sequence data annotations; a framework which allows multiple boutique databases to function independently within a larger system (or information mall). The goal of PAZAR ...	104	34%
UniSave The UniProtKB Sequence/Annotation Version database (UniSave) is a comprehensive archive of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entry versions. All changed Swiss-Prot and TrEMBL entries are loaded into the UniSave as part of the public UniProtK ...	105	23%
Minimotif Miner Search tools for short functional motifs involved in posttranslational modifications, binding to other proteins, nucleic acids, or small molecules	106	22%
Lipase Engineering Database The Lipase Engineering Database (http://www.led.uni-stuttgart.de) integrates information on sequence, structure, and function of lipases, esterases, and related proteins. Sequence data on 806 protein entries are assigned to 38 homologous families, wh ...	107	27%
FireDB fireDB is a database of Protein Data Bank structures, ligands and annotated functional site residues. The database can be accessed by PDB codes or UniProt accession numbers as well as keywords.	108	30%
RNAcentral RNAcentral is a free, public resource that offers integrated access to a comprehensive and up-to-date set of non-coding RNA sequences provided by a collaborating group of databases representing a broad range of organisms and RNA types.	109	36%
Locus Reference Genomic sequences Each LRG is a stable genomic DNA sequence for a region of the human genome	110	22%
Deep Sequence and Shape Motif (DESSO) DESSO is a deep learning-based framework that can be used to accurately identify both sequence and shape regulatory motifs from the human genome.	111	25%
NucleaRDB Families of nuclear hormone receptors	112	32%
al MENA The Middle East and North Africa (MENA) region encompasses unique populations with a rich history and characteristic ethnic, linguistic and genetic diversity. The genetic diversity of the MENA region has been largely unknown. The recent availability ...	113	22%
Gramene: A curated, open-source, integrated data resource for comparative functional genomics in plants Gramene's purpose is to provide added value to plant genomics data sets available within the public sector, which will facilitate researchers' ability to understand the plant genomes and take advantage of genomic sequence known in one species for ide ...	114	52%
Visual Database for Organelle Genome VDOG, the Visual Database for Organelle Genome, is a database of genome information in organelles. Most of the data in VDOG are originally extracted from GenBank, re-organized and represented.	115	22%
Minimal Metagenome Sequence Analysis Standard A proposed set of minimal standard analyses necessary for proper interpretation of meta-omic data and to allow comparative metagenomics and metatranscriptomics. Please note: We cannot find an up-to-date website for this resource. As such, we have mar ...	116	32%
Sequence-Structural Templates of Single-member Superfamilies SSToSS is a database which provides sequence-structural templates of single member protein domain superfamilies like PASS2. Sequence-structural templates are recognized by considering the content and overlap of sequence similarity and structural para ...	117	28%
NCBI Trace Archive The NCBI Trace Archive is a permanent repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects. The Trace Archive serves as the repository of sequencing da ...	118	22%
.ACE format The ACE file format is a specification for storing data about genomic contigs. The original ACE format was developed for use with Consed, a program for viewing, editing, and finishing DNA sequence assemblies. ACE files are generated by various assemb ...	119	22%
EcoliWiki: A Wiki-based community resource for Escherichia coli EcoliWiki is a community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.	120	27%
CompoDynamics Sequence composition dynamics of genes and genomes.	121	23%
MulPSSM Representation of multiple sequence alignments of protein families in terms of Position Specific Scoring Matrices (PSSMs) is commonly used in the detection of remote homologues. A PSSM is generated with respect to one of the sequences involved in the ...	122	22%
siRNAdb The siRNA database provides a gene-centric view of human siRNA experimental data, including siRNAs of known efficacy and siRNAs predicted to be of high efficacy by siSearch. Linked to these sequences is information including siRNA thermodynamic prope ...	123	28%
Hits High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent ...	124	22%
Stanford HIV Drug Resistance Database The Stanford HIV Drug Resistance Database (HIVDB) is an essential resource for public health officials monitoring ADR and TDR, for scientists developing new ARV drugs, and for HIV care providers managing patients with HIVDR.	125	43%
PASS2 PASS2 contains alignments of structural motifs of protein superfamilies. PASS2 is an automatic version of the original superfamily alignment database, CAMPASS (CAMbridge database of Protein Alignments organised as Structural Superfamilies). PASS2 con ...	126	38%
The Chromosome 7 Annotation Project The objective of this project is to generate the most comprehensive description of human chromosome 7 to facilitate biological discovery, disease gene research and medical genetic applications.	127	34%
Bio-Mirror A world bioinformatic public service for high-speed access to up-to-date DNA & protein biological sequence databanks.	128	25%
SEQanswers Wiki on all aspects of next-generation genomics	129	27%
TESS TESS (Transcription Element Search System, http://www.cbil.upenn.edu/tess) is a web-based service that searches DNA sequence for transcription factor binding sites. It integrates three databases of transcription factors and binding site models, and p ...	130	22%
PHOSIDA Phosphorylation sites in various species identified by mass spectrometry	131	34%
Regulatory Element Database for Drosophila REDfly is a curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs). REDfly seeks to include all experimentally verified fly regulatory elements along with their DNA sequence ...	132	36%
BMC Caller A webtool to identify and analyze bacterial microcompartment types in sequence data.	133	22%
ASC - Active Sequence Collection ASC (Active Sequences Collection) is a database of short amino acid sequences with known biological activity. The current version is substantially improved as compared to the previous release; it now includes more than 1300 different active short pro ...	134	22%
PIR - Protein Information Resource The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR has provided many protein databases and analysis tools to the scientific community, includ ...	135	23%
Minimal Information about any Sequence Ontology An OWL representation of the Minimum Information for any (x) Standard (MIxS), managed by the Genomic Standards Consortium.	136	22%
Protein kinase resource The Protein Kinase Resource (PKR) is a curated information source which provides an integrated view of sequence and structure data combined with biochemical and genetic function data focused on a single family of proteins, the protein kinases. In add ...	137	22%
PIR SuperFamily The PIR SuperFamily concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.	138	34%
RNArchitecture RNArchitecture is a database that provides a comprehensive description of relationships between known families of structured ncRNAs, with focus on sequence and structure similarities. RNArchitecture also provides literature information and links to o ...	139	29%
Ensembl Zebrafish Genome Browser This Ensembl website features the zebrafish whole genome shotgun assembly sequence.	140	38%
NCBI Virus NCBI Virus is a community portal for viral sequence data from RefSeq, GenBank and other NCBI repositories.	141	42%
fRNAdb Functional RNA Database (fRNAdb) is a database service that hosts a large collection of non-coding transcripts including annotated/un-annotated sequences from H-inv database, NONCODE, and RNAdb. A set of computational sequence analyses are performed ...	142	22%
ParameciumDB ParameciumDB is a new model organism database for Paramecium, built using components of the Generic Model Organism Database (http://www.gmod.org) construction set (Chado relational database schema, Turnkey generic web framework and Gbrowse). The data ...	143	30%
Chicken Variation Database The chicken Variation Database (ChickVD) is an integrated information system for storage, retrieval, visualization and analysis of chicken variation data.	144	28%
PSSRdb Polymorphic Simple Sequence Repeats Database	145	26%
PGDBj Ortholog Database The PGDBj Ortholog Database, created under the auspices of the Plant Genome Database Japan (PGDBj), contains information about orthologous genes in plants based on their corresponding amino acid sequence similarity. By placing PGDBj Ortholog Database ...	146	26%
GPCR-SSFE GPCR-Sequence-Structure-Feature-Extractor (SSFE). Provides template suggestions and homology models of Class A GPCRs. Identifies key sequence and structural motifs in Class A GPCRs to guide template selection and build homology models.	147	22%
Genomic Contextual Data Markup Language The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that is a reference implementation of the Minimum Information about a Genome Sequence (MIGS/MIMS/MIMARKS), and the extensions the Minimum Inf ...	148	33%
Organelle Genome Resource The organelle genomes are part of the NCBI Reference Sequence (RefSeq) project that provides curated sequence data and related information for the community to use as a standard.	149	22%
Distributed Sequence Annotation System The Distributed Annotation System (DAS) defines a communication protocol used to exchange annotations on genomic or protein sequences.	150	32%
NCBI Gene The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. Entrez can effic ...	151	50%
Codon Usage Database Find GC content and frequency of codon usage for any organism that has a sequence in GenBank.	152	36%
GenomeTraFaC GenomeTraFaC is a database of conserved regulatory elements obtained by systematically analyzing the orthologous set of human and mouse genes. It mainly focuses on all of the high-quality mRNA entries of mouse and human genes in the Reference Sequenc ...	153	28%
Integrated resource of protein families, domains and functional sites InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as si ...	154	67%
PolyA_DB	155	22%
msRepDB A comprehensive repetitive sequence database of over 80 000 species.	156	22%
ASTD AltSplice and AltExtron provide information on alternative intron/exons, alternative splice events, and isoform splice patterns. AEdb contains: AEdb-Sequence (sequence and properties of alternatively splice exons), AEdb-Function (data on functional a ...	157	22%
UniProt Knowledgebase Universal Protein resource. A database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the re ...	158	100%
HOGENOM HOGENOM is a phylogenomic database providing families of homologous genes and associated phylogenetic trees (and sequence alignments) for a wide set of sequenced organisms.	159	34%
SitEx Projections of protein functional Sites on Exons	160	26%
Ribosomal Database Project (RDP-II) The Ribosomal Database Project - II (RDP-II)(1) provides data, tools and services related to ribosomal RNA sequences to the research community. Through its website (http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data, anal ...	161	39%
Ensembl Compara Ensembl Compara provides cross-species resources and analyses, at both the sequence level and the gene level.	162	38%
Gene3D Gene3D uses the information in CATH to predict the locations of structural domains on millions of protein sequences available in public databases. Sequence data from UniProtKB and Ensembl for domains with no experimentally determined structures are s ...	163	39%
Alias A tool for converting identifiers in which multiple aliases are used to refer to sequences. Also available as a stand-alone tool.	164	26%
Pig Genomic Informatics System The Pig Genomic Informatics System (PigGIS) presents accurate pig gene annotations in all sequenced genomic regions. It integrates various available pig sequence data, including 3.84 million whole-genome-shotgun (WGS) reads and 0.7 million Expressed ...	165	26%
Network of Cancer Genes The Network of Cancer Genes (NCG) contains information on duplicability, evolution, protein-protein and microRNA-gene interaction, function, expression and essentiality of cancer genes from manually curated publications. NCG also provides informatio ...	166	34%
CoxBase CoxBase is an online platform for epidemiological surveillance, visualization, analysis and typing of Coxiella burnetii genomic sequence.	167	22%
lncRNASNP2	168	22%
NCBI Nucleotide The NCBI Nucleotide database collects sequences from such sources as GenBank, RefSeq, TPA, and PDB. Sequences collected relate to genome, gene, and transcript sequence data, and provide a foundation for research related to the biomedical field.	169	22%
DARNED Database of RNA Editing	170	22%
CR-EST - Crop ESTs The crop EST database CR-EST (http://pgrc.ipk-gatersleben.de/cr-est/) is a publicly available online resource providing access to sequence, classification, clustering, and annotation data of crop EST projects at IPK Gatersleben, Germany. CR-EST curre ...	171	29%
Amordad Database engine for comparing metagenomic data at massive scale. It first obtains the sequence signature of metagenomes and organizes them as points in high dimensional space.	172	25%
CORG - A database for COmparative Regulatory Genomics Sequence conservation in non-coding, upstream regions of orthologous genes from man and mouse is likely to reflect common regulatory DNA sites. Motivated by this assumption we have delineated a catalogue of conserved non-coding sequence blocks and pr ...	173	22%
ProTherm ProThermDB is a database for proteins and mutants with data on protein stability, an increase of 84% from the previous version. It contains several thermodynamic parameters such as melting temperature, free energy obtained with thermal and denaturant ...	174	22%
Rice Genome Annotation Project This website provides genome sequence from the Nipponbare subspecies of rice and annotation of the 12 rice chromosomes. These data are available through search pages and the Genome Browser that provides an integrated display of annotation data.	175	37%
PREX PeroxiRedoxin classification indEX	176	30%
VIRsiRNAdb VIRsiRNAdb contains information on experimentally validated Viral siRNA/shRNA which target viral genome regions. It provides efficacy information where available, as well as the siRNA sequence, viral target and subtype, as well as the target genomic ...	177	30%
Hardwood Genomics Project The Hardwood Genomics Project is a database for expressed genes, genetic markers, genetic linkage maps, and reference populations. It provides lasting genomic and biological resources for the discovery and conservation of genes in hardwood trees for ...	178	29%
APPRIS Annotates variants with biological data such as protein structural information, functionally important residues, conservation of functional domains and evidence of cross-species conservation.	179	32%
PRODORIC2	180	22%
PROSITE PROSITE is a database of protein families and domains. PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.	181	51%
Fungal and Oomycete genomics resource FungiDB is an integrated genomic and functional genomic database for the kingdom Fungi. The database integrates whole genome sequence and annotation and also includes experimental and environmental isolate sequence data. The database includes compara ...	182	35%
ColabFold ColabFold databases are MMseqs2 expandable profile databases to generate diverse multiple sequence alignments to predict protein structures.	183	22%
GenomeNet Network of database and computational resources including KEGG (pathways, interactions, etc.) and DBGET/LinkDB (an integrated database retrieval system). It also hosts several web-based tools for sequence analysis (i.e. Blast, Motif, Clustal W).	184	36%
MEROPS The MEROPS database is an information resource for peptidases (also termed proteases, proteinases and proteolytic enzymes) and the proteins that inhibit them.	185	53%
ElastoDB Repository for well-characterized elastin sequences to facilitate its study. The database has since expanded to include other non-elastin sequences that share elastic properties.	186	27%
Berkeley Drosophila Genome Project insitu In early 2010 we updated the site to facilitate more rapid transfer of our data to the public database and focus our efforts on the core mission of providing expression pattern images to the research community. The original database https://www.fruit ...	187	22%
PseudoBase++ PseudoBase is a database containing structural, functional and sequence data related to RNA pseudoknots. It can be reached by its central page at http://pseudobaseplusplus.utep.edu. From here one can retrieve pseudoknot data as well as submit data fo ...	188	22%
abYsis abYsis is a web-based antibody research system that includes an integrated database of antibody sequence and structure data. The publicly available version includes pre-analyzed sequence data from the European Molecular Biology Laboratory European Nu ...	189	33%
NCBI PopSet NCBI PopSet collects DNA sequences to analyze the ways that populations are related by evolution. Such sequences indicate if populations originate from different members of the same species or from organisms of different species entirely.	190	22%
ValidNESs	191	22%
FlyBase Genetic, genomic and molecular information pertaining to the model organism Drosophila melanogaster and related sequences. This database also contains information relating to human disease models in Drosophila, the use of transgenic constructs contai ...	192	56%
Minimal Information about any Sequence (MIxS) Controlled Vocabularies Controlled vocabularies for the MIxS family of metadata checklists. See http://gensc.org/gc_wiki/index.php/MIxS for details on the MIxS checklists.	193	22%
mutLBSgeneDB Mutations in Ligand Binding Sites gene DataBase	194	27%
MAR databases The MAR databases is a collection of manually curated marine microbial contextual and sequence databases, based at the Marine Metagenomics Portal. This was developed as a part of the ELIXIR EXCELERATE project in 2017 and is maintained by The Center f ...	195	30%
UTRdb/UTRsite The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb, a specialized database of 5 ...	196	23%
Connectivity Table file format A CT (Connectivity Table) file contains secondary structure information for a RNA sequence.	197	24%
TIGR Plant Transcript Assembly database The TIGR Plant Transcript Assemblies (TA) database (http://plantta.tigr.org) uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include expressed Sequenc ...	198	22%
PDBselect PDBselect (http://bioinfo.tg.fh-giessen.de/pdbselect/) is a list of representative protein chains with low mutual sequence identity selected from the protein data bank (PDB) to enable unbiased statistics. The list increased from 155 chains in 1992 to ...	199	30%
Database of small human non-coding RNAs Integrated annotation and sequencing-based expression data for all major classes of human small non-coding RNAs (sncRNAs) for both full sncRNA transcripts and mature sncRNA products derived from these larger RNAs.	200	32%
Information system for G protein-coupled receptors The GPCRDB is a molecular-class information system that collects, combines, validates and stores large amounts of heterogenous data on G protein-coupled receptors (GPCRs). The GPCRDB contains data on sequences, ligand binding constants and mutations. ...	201	44%
ForestTreeDB ForestTreeDB is intended as a resource that centralizes large-scale EST sequencing results from several tree species (http://foresttree.org/ftdb). Our group at the Center for Computational Genomics and Bioinformatics (University of Minnesota) aims to ...	202	22%
NEMBASE Nematode sequence and functional data database	203	22%
alkaligrass A high-quality genome sequence of alkaligrass provides insights into halophyte stress tolerance. A high-quality chromosome-level genome sequence of alkaligrass assembled from Illumina, PacBio and 10× Genomics reads combined with genome-wide chromosom ...	204	25%
eSLDB - eukaryotic Subcellular Localization database eSLDB (eukaryotic Subcellular Localization DataBase) collects the annotations of subcellular localization of eukaryotic proteomes. For each sequence, the database lists localization obtained adopting three different approaches: 1) experimentally dete ...	205	22%
Saccharomyces Genome Database The Saccharomyces Genome Database (SGD) provides comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae along with search and analysis tools to explore these data, enabling the discovery of functional relations ...	206	54%
RADAR A Rigorously Annotated Database of A-to-I RNA editing	207	22%
RPFdb Ribosome profiling database	208	22%
Spliceosome Database	209	22%
eF-site - Electrostatic surface of Functional site Electrostatic potentials and hydrophobic properties of the active sites	210	22%
Colorectal Cancer Atlas Colorectal Cancer Atlas is a web-based resource which integrates genomic and proteomic data pertaining to colorectal cancer cell lines and tissues. Data catalogued includes quantitative and non-quantitative protein expression, sequence variations, cell ...	211	29%
Placental Genetic Variance Includes variations of DNA sequence, chromosomal structure and copy number, as well as RNA and translational variation. The Genetic Variation ontology expands on work done for Variation Ontology (VariO) and Sequence Types and Features Ontology (SO) w ...	212	22%
iPfam A database of Pfam domain interactions	213	22%
Interrupted coding sequences ICDS database is a database containing ICDS detected by a similarity-based approach. The definition of each interrupted gene is provided as well as the ICDS genomic localisation with the surrounding sequence.	214	29%
MitoProteome MitoProteome is a mitochondrial protein sequence database and annotation system. The initial release contains 847 human mitochondrial protein sequences, derived from public sequence databases and mass spectrometric analysis of highly purified human h ...	215	30%
DescribePROT DescribePROT is a database containing annotations of 13 putative structural and functional properties at the amino acid level for ~1.4 million proteins from 83 popular/model organisms, to be extended to hundreds of additional organisms. Users can sear ...	216	26%
Cnidarian Evolutionary Genomics Database CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians.	217	25%
CRISPRCasdb CRISPRCasdb acts as a gateway to a publicly accessible database and software to enable the easy detection of CRISPR sequences in locally-produced data and the consultation of CRISPR sequence data present in the database. It also gives information on ...	218	46%
BPS Database of RNA Base-Pair Structures	219	22%
Multiple Alignment Format The Multiple Alignment Format stores DNA level multiple alignments in an easily readable format between entire genomes. Unlike previous formats this resource can cope with forward and reverse strand directions, multiple pieces to the alignment, and s ...	220	24%
TrSDB Transcription factor database	221	22%
Cacao Genome Database The Cacao Genome Database (CGD) is a database storing information on the genome of Theobroma cacao. The release of the cacao genome sequence provides researchers with access to the latest genomic tools, enabling more efficient research and accelerati ...	222	24%
SoyBase SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains genetic, physical and genomic sequence maps integrated with qualitati ...	223	40%
DESSO-DB A web database for sequence and shape motif analyses and identification.	224	22%
piRNAclusterDB Clusters of piRNAs	225	22%
NCBI Genome Data Viewer The NCBI Genome Data Viewer (GDV) is a genome browser supporting the exploration and analysis of annotated eukaryotic genome assemblies. The GDV browser can visualize different types of molecular data in a whole genome context, including gene annotat ...	226	26%
Therapeutic Structural Antibody Database The Therapeutic Structural Antibody Database tracks all antibody- and nanobody-related therapeutics recognized by the World Health Organisation (WHO), and identifies any corresponding structures in the Structural Antibody Database (SAbDab) with near- ...	227	31%
ARAMEMNON ARAMEMNON is a curated database for Arabidopsis thaliana transmembrane (TM) proteins and transporters. The database compiles topology and signal sequence predictions and displays the results in a directly comparable graphical output format for presen ...	228	35%
UNITE database UNITE is a database and sequence management environment centered on the eukaryotic nuclear ribosomal ITS region. All eukaryotic ITS sequences from the International Nucleotide Sequence Database Collaboration are clustered to approximately the species ...	229	22%
Hollywood Exon annotation database	230	22%
TOPPR The Online Protein Processing Resource	231	22%
sRNAMap small regulatory RNA in microbial genomes	232	22%
CloneDB Clones and libraries: sequence data, map positions and distributor information	233	22%
CLUSTAL-W Alignment Format CLUSTAL-W Alignment Format is a simple text-based format, often with a *.aln file extension, used for the input and output of DNA or protein sequences into the Clustal suite of multiple alignment programs.	234	72%
LOX-DB Due to their involvement in several diseases like cancer, inflammation, fever or arthritis, a lot of research is done on lipoxygenases yielding information about sequence, structure and function of these proteins. The LipOXygenases-DataBase (LOX-DB) ...	235	22%
Expansin Engineering Database Expansin Engineering Database integrates information on sequence, structure and function of expansins.	236	23%
GABI-Kat SimpleSearch T-DNA insertions in Arabidopsis and their flanking sequence tags.	237	42%
miRNEST miRNEST is an integrative collection of animal, plant and virus microRNA data. miRNEST is being gradually developed to create an integrative resource of miRNA-associated data. The data comes from our computational predictions (new miRNAs, targets, mi ...	238	34%
INTERVAL The INTERVAL bioresource comprises 50,000 English blood donors, on whom deep molecular phenotypes (e.g. genomics, proteomics, metabolomics, lipidomics) have been generated. In over 100 years of blood donation practice, INTERVAL is the first randomise ...	239	23%
Membranome A database of single-pass membrane proteins	240	22%
Pharmacogenomics Ontology The PharmGKB Ontology imports genetic sequence data, collected in relational format, into the OWL, and aims to automate the process of updating the links between the ontology and data acquisition when the ontology changes. They have linked PharmGKB w ...	241	31%
SILVA SILVA is a comprehensive, quality-controlled web resource for up-to-date aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains alongside supplementary online services. In addition to data products, SILVA provide ...	242	72%
BAliBASE BAliBASE; a benchmark alignment database, including enhancements for repeats, transmembrane sequences and circular permutations.	243	35%
EbolaID Provides a complete, quality checked and regularly updated list of oligonucleotides for the Ebola virus. The database describes the genetic diversity across the Ebola genome to facilitate the design of accurate diagnostic methods and therapeutic appr ...	244	24%
CIS-BP The Catalog of Inferred Sequence Binding Preferences (CIS-BP) is a library of transcription factor (TF) DNA binding motifs and specificities. The data are organized in a user friendly manner for ease of searching, browsing, and downloading. CIS-BP al ...	245	22%
eProS Energy profiles of protein structures	246	22%
WDSPdb WD40 domain structure predictions	247	22%
DoBISCUIT Database Of BIoSynthesis clusters CUrated and InTegrated	248	22%
Molecular Modeling Database The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more.	249	39%
PomBase PomBase is a model organism database that provides organization of and access to scientific data for the fission yeast Schizosaccharomyces pombe. PomBase supports genomic sequence and features, genome-wide datasets and manual literature curation as w ...	250	44%
DBD DBD provides transcription factor predictions for more than 150 completely sequenced genomes available for browsing and download. Predictions are based on presence of sequence specific DNA binding domain assignments using hidden Markov models from th ...	251	23%
Genome Reviews The goal of the Genome Reviews project is to provide an up-to-date, standardised and comprehensively annotated view of the genomic sequence of organisms with completely deciphered genomes. Genome Reviews are curated versions of EMBL/GenBank/DDBJ dat ...	252	23%
REDIportal A-to-I RNA editing events in human	253	22%
SelenoDB A database of selenoprotein genes, proteins and SECIS elements	254	22%
SomamiR Somatic mutations that impact microRNA targeting in cancer	255	22%
DAnCER Disease-Annotated Chromatin Epigenetics Resource	256	22%
National Omics Data Encyclopedia The National Omics Data Encyclopedia (NODE) is a big data library with complete and integrative data storage, safe and efficiency-guaranteed data management as well as comprehensive and user-friendly data service functions. NODE stores raw sequence dat ...	257	23%
Bacterial protein tYrosine Kinase database The Bacterial protein tYrosine Kinase database (BYKdb) contains computer-annotated BY-kinase sequences. The database web interface allows static and dynamic queries and provides integrated analysis tools including sequence annotation.	258	36%
GlycoCT sequence format for carbohydrates. GlycoCT format is devised to describe the carbohydrate sequences, with a controlled vocabulary to name monosaccharides, adopting IUPAC rules to generate a consistent, machine-readable nomenclature, based on a connection table approach, instead of a l ...	259	29%
SINEBase A database of short interspersed elements (SINEs)	260	22%
ChromDB Chromatin-associated proteins in a broad range of organisms	261	22%
Database of Rice Transcription Factors DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families, identified by computational prediction and manual curation. It includes detailed annotations of each TF incl ...	262	31%
Factorbook Human transcription factor binding data from ChIP-seq	263	22%
Annotated regulatory Binding Sites from Orthologous Promoters ABS: A database of Annotated regulatory Binding Sites from known binding sites identified in promoters of orthologous vertebrate genes.	264	30%
Ebola and Hemorrhagic Fever Virus Database The Ebola and Hemorrhagic Fever Virus Database stems from the Hemorrhagic Fever Viruses (HFV) Database Project founded by Dr. Carla Kuiken in 2009 at the Los Alamos National Laboratory (LANL). The HFV Database was modeled on the Los Alamos HIV Databa ...	265	29%
POSTAR Post-transcriptional regulation by RNA-binding proteins	266	22%
UniGene This repository is no longer available. Although the web pages are no longer available, you will still be able to download the final UniGene builds as static content from the FTP site https://ftp.ncbi.nlm.nih.gov/repository/UniGen ...	267	23%
YM500 smRNA-seq database for miRNA research	268	22%
RAID Human RNA-RNA and RNA-protein interactions	269	22%
tRNAdb Compilation of tRNA sequences and tRNA genes	270	22%
COMBREX Computational Bridge to Experiments	271	29%
L1Base Functional annotation and prediction of LINE-1 elements	272	22%
ARED-Plus	273	22%
Candida Genome Database The Candida Genome Database (CGD) provides access to genomic sequence data and manually curated functional information about genes and proteins of the human pathogen Candida albicans. It collects gene names and aliases, and assigns gene ontology term ...	274	38%
EchinoDB EchinoDB is a database consisting of amino acid sequence orthoclusters from 42 echinoderm transcriptomes. We sampled taxa to span the deepest divergences within each of the 5 extant echinoderm classes. Data can be searched by keywords such as annotati ...	275	22%
IMGT/LIGM-DB IMGT/LIGM-DB is the IMGT® comprehensive database of immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences, from human and other vertebrate species, with translation for fully annotated sequences, created in 1989 by LIGM (http://www.imgt.o ...	276	38%
Databases of Orthologous Promoters DoOP is a database of eukaryotic promoter sequences (upstream regions), aiming to facilitate the recognition of regulatory sites conserved between species. Based on the Arabidopsis thaliana and Homo sapiens genome annotation, this resource is also a ...	277	28%
LenVarDB Database of length variation in protein domains	278	22%
Short Read Archive eXtensible Markup Language The SRA data model contains the following objects: Study: information about the sequencing project Sample: information about the sequenced samples Experiment: information about the libraries, platform; associated with study, sample(s) and run(s) Run: ...	279	30%
UUCD Ubiquitin and ubiquitin-like conjugation database	280	22%
ECgene Genome annotation for alternative splicing	281	22%
AniProtDB The Animal Proteome Database (AniProtDB) is a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. In addition to providing open access to this collection of high-quality metazoan proteomes, information on predicted protei ...	282	22%
PLPMDB Pyridoxal-5'-phosphate dependent enzymes mutations	283	22%
eBLOCKS Classifying proteins into families and super-families allows identification of functionally important conserved domains. The motifs and scoring matrices derived from such conserved regions provide computational tools to recognize similar patterns in n ...	284	22%
miRNAMap microRNA precursors and their mapping to targets in vertebrate genomes	285	22%
MAPPER-2 This resource provides information primarily on the upstream non-coding sequence data of genes in 3 genomes, which gives insight into transcription factor binding sites (TFBSs). For each transcript, the region scanned extends from 10,000bp upstre ...	286	34%
Database resources of the National Center for Biotechnology Information The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts publish ...	287	54%
RetrOryza With the availability of the complete genomic sequence of rice, the identification and annotation of LTR-Retrotransposons has become a necessity as they comprise an important part of plant genomes (1). RetrOryza is a database that aims at providing t ...	288	22%
PlantAligDB Web-based platform of nucleotide sequence alignments of plants.	289	22%
TTSMI Triplex Target DNA Sites in the human genome	290	22%
Binary Alignment Map Format BAM is the compressed binary version of the Sequence Alignment/Map (SAM) format, a compact and indexable representation of nucleotide sequence alignments. Many next-generation sequencing and analysis tools work with SAM/BAM. For custom track display, ...	291	33%
MeT-DB RNA MEthylation by SEquencing databaSe	292	22%
Epitome Epitome is a database of all known antigenic residues and the antibodies that interact with them, including a detailed description of the residues involved in the interaction and their sequence/structure environments. Each entry in the database descr ...	293	22%
3DBIONOTES Web based application designed to integrate protein structure, protein sequence and protein annotations in a unique graphical environment. The current version of the application offers a unified, enriched and interactive view of EMDB volumes, PDB str ...	294	22%
China National GeneBank DataBase The China National GeneBank database (CNGBdb) is a unified platform for biological big data sharing and application services. At present, CNGBdb has integrated a large amount of internal and external biological data from resources such as CNGB, NCBI, ...	295	26%
INTEGRALL INTEGRALL is a web-based platform dedicated to compile information on integrons and designed to organize all the data available for these genetic structures. INTEGRALL provides a public genetic repository for sequence data and nomenclature and offers ...	296	22%
microRNA.org microRNA target predictions and expression profiles	297	22%
PPT-DB Protein Property Prediction and Testing Database	298	22%
ADDA - A Domain Database ADDA is a global clustering of protein sequences into protein domains and protein domain families. The database currently contains domains for 1.5 Mio sequences from UniProt, ENSEMBL, and other sequence databases. The domains are grouped into 123,000 ...	299	22%
RNA Ontology RNAO is a controlled vocabulary pertaining to RNA function and based on RNA sequences, secondary and three-dimensional structures. The central aim of the RNA Ontology Consortium (ROC) is to develop an ontology to capture all aspects of RNA - from pri ...	300	34%
Ribonuclease P Database RNase P sequences, alignments, and structures	301	22%
Generic Feature Format Version 3 The Generic Feature Format Version 3 (GFF3) format was developed after earlier formats, although widely used, became fragmented into multiple incompatible dialects. The GFF3 format addresses the most common extensions to GFF, while preserving backwar ...	302	33%
TIGRFAMs TIGRFAMs is a collection of manually curated protein families focusing primarily on prokaryotic sequences. It consists of hidden Markov models (HMMs), multiple sequence alignments, Gene Ontology (GO) terminology, Enzyme Commission (EC) numbers, gene s ...	303	40%
MachiBase Drosophila melanogaster 5' mRNA transcription start site database	304	22%
DoriC DoriC regions in bacterial and archaeal genomes	305	22%
SNP2TFBS Regulatory SNPs affecting predicted transcription factor binding sites	306	22%
PALI The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains structure-based sequence alignments and dendrograms based on information primarily derived from the structural alignments at domain level [1,2]. Protein domain d ...	307	22%
KIDFamMap Kinase-inhibitor-disease family map	308	22%
PHYTOPROT Clusters of predicted plant proteins	309	22%
Ontology for Genetic Interval Using BFO (Basic Formal Ontology) as its upper-level ontology, the Ontology for Genetic Interval (OGI) represents a gene as an entity with its 3D shape, topography, and primary DNA sequence as the foundation for its 3D structure. There is no official h ...	310	23%
ACTIVITY ACTIVITY, a database on DNA site sequences with known activity magnitudes, measurement systems and sequence-activity relationships under fixed experimental conditions is additionally adapted to applications to the phylogenetic footprints of known sit ...	311	22%
MimoDB Mimotope database, active site-mimicking peptides selected from phage-display libraries	312	30%
NBDB The NBDB database provides profiles of Elementary Functional Loops (EFLs) involved in binding of nucleotide-containing ligands. Each EFL, in the form of a PSSM (position-specific scoring matrix) profile, is complemented with the information on SCOP entities, s ...	313	22%
SilkDB The SilkDB is an open-access database for genome biology of the silkworm (Bombyx mori). SilkDB contains the genomic data, including genome assembly, gene annotation, chromosomal mapping, orthologous relationship and experiment data, such as microarra ...	314	31%
LNCediting RNA editing sites in lncRNAs from human, monkey, mouse and fly	315	22%
Kinomer Classification of protein kinases encoded in various eukaryotic species	316	22%
MegaMotifbase Structural motifs in protein families and superfamilies	317	22%
Transcription Factor Class TFClass is a resource that classifies eukaryotic transcription factors (TFs) according to their DNA-binding domains. Combining information from different resources, manually checking the retrieved mammalian TF sequences and applying extensive phyloge ...	318	32%
PyIgClassify Clusters of conformations of antibody CDRs	319	22%
ZiFDB Zinc Finger DataBase	320	22%
WERAM Writers, Erasers and Readers of Histone Acetylation and Methylation	321	22%
NRED Noncoding RNA Expression Database	322	22%
MALISAM Manual alignments for structurally analogous motifs in proteins	323	22%
SpliceNest A tool for visualizing splicing of genes from EST data	324	22%
BeetleBase Genome database of the beetle Tribolium castaneum	325	33%
Synthetic Gene Database The Synthetic Gene Database (http://www.evolvingcode.net/codon/sgdb/index.php) is a resource that has collected together sequence information on synthetic genes (i.e. genes that were designed conceptually, rather than built from an initial, physical ...	326	22%
RepTar Predicted targets of host and viral miRNAs	327	28%
OPTIC Orthologous and Paralogous Transcripts in Clades	328	22%
JuncDB Exon-exon Junction database	329	22%
GELBANK GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel images of proteomes from organisms with known genome information (available at http://gelbank.anl.gov). GELBANK serves as a database for those proteomics labs t ...	330	27%
The Arabidopsis Information Resource The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR includes the complete genome sequence along with gene structure, gene pro ...	331	50%
RaftProt Lipid raft associated proteins in mammals	332	22%
Nematodes.org Wiki for coordinating nematode sequencing projects	333	28%
Ensembl Fungi Ensembl Fungi is a browser for fungal genomes. A majority of these are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at the EBI, GenBank at the NCBI, and the DNA Database of ...	334	40%
miRGator microRNA target prediction, functional analysis, and gene expression data	335	22%
BIOZON Biozon is a platform that allows for the storage, management, and analysis of interrelated proteins, genes, interactions, protein families, cellular pathways and more. These heterogeneous data types and the relations between them are locally warehous ...	336	22%
OnTheFly DNA-binding specificities of transcription factors in Drosophila	337	22%
EnteroBase Global genomic population structure of Clostridioides difficile	338	22%
Cyanolyase Sequences and motifs of the phycobilin lyase protein family	339	23%
TransportDB Sequences and classification of predicted membrane transporters encoded in complete genomes	340	22%
Secreted Protein Database Secreted proteins from human, mouse and rat	341	22%
tRFdb Short (14-32 nt) tRNA-related fragments	342	22%
CharProtDB Experimentally Characterized Protein annotations	343	28%
SuperCAT A database for multilocus sequence typing analysis of the Bacillus cereus group of bacteria	344	22%
Animal Toxin Database Database of animal toxins	345	22%

*ReputationScore indicates how established a given datasource is.