Source | ReputationScore* |
---|---|
Exome Aggregation Consortium Browser
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.
...
|
|
Genebass
Genebass is a resource of exome-based association statistics, made available to the public. The dataset encompasses 3,817 phenotypes with gene-based and single-variant testing across 281,852 individuals with exome sequence data from the UK Biobank.
|
|
ESP
NHLBI Exome Sequencing Project (ESP): Exome Variant Server (EVS) for browsing single nucleotide variation data from exome sequencing experiments mainly focused on heart, lung and blood disorders.
|
|
Wellcome Sanger Institute: Whole Exome Sequencing
There is a substantial overlap between the NIHR IBD BioResource and the IBD UK Genetics Consortium (IBDGC). The NIHR BioResource provides some DNA samples. IBDGC data is being provided by the Wellcome Sanger Institute, who are performing the sequenci
...
|
|
dbMTS
dbMTS is a comprehensive database of putative human microRNA target site (MTS) SNVs and their functional predictions. dbMTS collects all potential SNVs in microRNA target seed regions in human 3’UTRs and provides their functional predictions and annotat
...
|
|
Practical Haplotype Graph
Platform for storing and using pangenomes for imputation.
|
|
Sequence Ontology
SO is a collaborative ontology project for the definition of sequence features used in biological sequence annotation. The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence. SO i
...
|
|
SomaMutDB
A database of somatic mutations in normal human tissues.
|
|
CanVaS
CanVaS is a Greek cancer patient genetic variation resource.
|
|
Gene4Denovo
An integrated database and analytic platform for de novo mutations in humans.
De novo mutations (DNMs) significantly contribute to sporadic diseases, particularly in neuropsychiatric disorders. Whole-exome sequencing (WES) and whole-genome sequencin
...
|
|
KRGDB
The large-scale variant database of 1722 Koreans based on whole genome sequencing.
|
|
DDBJ Sequence Read Archive
DDBJ Sequence Read Archive (DRA) is an archive database for output data generated by next-generation sequencing machines including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD® System, and others. DRA is a member of the I
...
|
|
DDBJ Trace Archive
DDBJ Trace Archive (DTA) is a permanent repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects. DTA is a member of the International Nucleotide Sequence
...
|
|
COGVIC
COGVIC (Catalogue Of Germline Variants In Cancer) is a comprehensive database of germline pathogenic variants in East Asian pan-cancer patients.
|
|
CNVIntegrate
Multi-ethnic database for identifying copy number variations associated with cancer. View gene-centric CNV profile collected from healthy individuals and multiple cancer types.
|
|
Sequence Read Archive
The Sequence Read Archive (SRA) stores raw sequencing data from next-generation sequencing platforms. Data submitted to SRA is organized using a metadata model consisting of six objects: study, sample, experiment, run, analysis and submissi
...
|
|
TMC-SNPdb 2.0
An ethnic-specific database of Indian germline variants.
|
|
GenBank Sequence Format
GenBank Sequence Format (GenBank Flat File Format) consists of an annotation section and a sequence section. The start of the annotation section is marked by a line beginning with the word "LOCUS". The start of sequence section is marked by a line be
...
|
|
AstraZeneca PheWAS Portal
The AstraZeneca PheWAS Portal is a public repository of gene-phenotype associations for phenotypes derived from electronic health records, questionnaire data, and continuous traits. These data were generated using exome sequencing and phenotype data
...
|
|
Uniclust
Clustered protein sequences and multiple sequence alignments
|
|
gnomAD
Genome Aggregation Database (gnomAD) - browser that aggregates exome and whole-genome sequencing data from a wide variety of large-scale sequencing projects. It enables search of genetic variation information by gene, variant or region.
|
|
Sequence Alignment Map
The Sequence Alignment/Map (SAM) format is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section.
|
|
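The two-part SAM layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a validating parser: the function name `split_sam` and the example records are hypothetical, and the `@` header prefix and the 11 mandatory column names are taken from the SAM specification rather than from this listing.

```python
from typing import Dict, List, Tuple

# The 11 mandatory alignment columns, as named in the SAM specification
# (an assumption brought in from the spec, not stated in the entry above).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def split_sam(text: str) -> Tuple[List[str], List[Dict[str, object]]]:
    """Split SAM text into (header lines, alignment records)."""
    header: List[str] = []
    alignments: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are optional and start with '@'.
            header.append(line)
            continue
        cols = line.split("\t")  # the alignment section is TAB-delimited
        if len(cols) < len(SAM_FIELDS):
            raise ValueError(f"expected at least 11 columns, got {len(cols)}")
        record: Dict[str, object] = dict(zip(SAM_FIELDS, cols))
        record["OPTIONAL"] = cols[len(SAM_FIELDS):]  # any trailing tag fields
        alignments.append(record)
    return header, alignments


# Hypothetical two-line header and a single alignment record.
EXAMPLE = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t7\t30\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n"
)

if __name__ == "__main__":
    hdr, alns = split_sam(EXAMPLE)
    print(len(hdr), alns[0]["RNAME"], alns[0]["CIGAR"])
```

Numeric columns such as FLAG and POS are left as strings here; a real reader would convert and validate them.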
MPS6
Review and classification of published variants in the ARSB gene.
The purpose of this database is to help researchers and clinicians understand the structural changes in arylsulfatase B (ASB) caused by Mucopolysaccharidosis type VI (MPS6) mutations
...
|
|
Database of Sequence Tagged Sites
dbSTS is an NCBI resource that contains sequence data for short genomic landmark sequences or Sequence Tagged Sites.
|
|
Genome Variation Format
The Genome Variation Format (GVF) is a very simple file format for describing sequence alteration features at nucleotide resolution relative to a reference genome.
|
|
UK Biobank
UK Biobank is a large-scale biomedical database and research resource that provides researchers access to detailed longitudinal phenotype, medical and genetic data from 500,000 volunteer participants.
|
|
SIMAP
Protein sequences are of utmost importance for studying the function and evolution of genes and genomes. Therefore a rich collection of methods in computational biology relies on the analysis and comparison of protein sequences. Many of these intensi
...
|
|
Human Genetic Variation Database
The Human Genetic Variation Database (HGVD) aims to provide a central resource to archive and display Japanese genetic variation and association between the variation and transcription level of genes. The database currently contains genetic variation
...
|
|
FASTQ Sequence and Sequence Quality Format
FASTQ is a text-based file format for sharing sequencing data combining both the sequence and an associated per base quality score.
|
|
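As a rough illustration of how FASTQ pairs each base with a quality score, the sketch below reads four-line records and decodes the quality string. It assumes the common four-line layout (`@name`, sequence, `+`, quality) and the Phred+33 ASCII encoding, neither of which is stated in the entry above; `parse_fastq` is a hypothetical helper name.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str, offset: int = 33) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, per-base quality scores) from FASTQ text.

    Assumes unwrapped four-line records and Phred+offset quality encoding.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        name_line, seq, plus, qual = lines[i:i + 4]
        if not name_line.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        # One quality character per base: score = ASCII code minus offset.
        yield name_line[1:], seq, [ord(c) - offset for c in qual]


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nII5!\n"  # hypothetical record
    for name, seq, quals in parse_fastq(example):
        print(name, seq, quals)
```

Older files may use a different offset (for example 64), which is why `offset` is exposed as a parameter.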
DBSAV database
DBSAV database reports GTS scores of human genes and DeepSAV scores of SAVs in the human proteome, including pathogenic SAVs, benign SAVs, gnomAD SAVs observed in exome sequencing, and all possible SAVs by single nucleotide variations. Each human pro
...
|
|
dbNSFP
Database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) and splice-site variants (ssSNVs) in the human genome. It also facilitates the steps of filtering and prioritizing SNVs fr
...
|
|
GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. The complete release notes for the current version of GenBank are available on the NCBI ftp site. A new release is made every two months. G
...
|
|
CMPD
CMPD is designed to provide a comprehensive, integrated and well-annotated resource, focusing on protein sequence-altering variations originating from both germline and cancer-associated somatic variations. The mutated protein sequence pool was base
...
|
|
Feature Annotation Location Description Ontology
The Feature Annotation Location Description Ontology (FALDO) describes the positions of annotated features on linear and circular sequences for data resources represented in RDF and/or OWL. FALDO can be used to describe nucleotide features in sequ
...
|
|
INSD sequence record XML
The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional
...
|
|
ATAV
ATAV is a comprehensive platform for population-scale genomic analyses.
ATAV stores variant and per site coverage data for all samples in a centralized database, which is efficiently queried by ATAV to support diagnostic analyses for trios and single
...
|
|
PhenomeCentral
Repository for clinicians and scientists working in the rare disorder community. It enables secure sharing of case records by clinicians and rare disease scientists and helps the user to find additional cases of the same unnamed disorder. The reposit
...
|
|
EBI patent sequences
Non-redundant databases of patent DNA and protein sequences
|
|
openSNP
A crowdsourced collection of personal genomics data. Includes SNP genotyping, exome sequencing data, phenotypic annotation and quantified self tracking data.
|
|
Minimum Information about any (x) Sequence
The minimum information about any (x) sequence (MIxS) is an overarching framework of sequence metadata, that includes technology-specific checklists from the previous MIGS and MIMS standards, provides a way of introducing additional checklists such a
...
|
|
NCBI Trace Archives
The Trace Archives includes the following archives: The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw
...
|
|
Reference Sequence Annotation
An ontology for sequence annotations and how to preserve them with reference sequences.
|
|
UniParc
The UniProt archive (UniParc), part of the UniProt databases, is an archival protein sequence collection from all major publicly accessible resources. New and revised protein sequences are added daily into UniParc while not deleting the previous vers
...
|
|
ENA Sequence Flat File Format
ENA Sequence Flat File Format is a standardised plain text format for nucleotide sequences. This format was previously called the EMBL Sequence Flat File Format.
|
|
CGGA
The Chinese Glioma Genome Atlas (CGGA) is a user-friendly web application for data storage and analysis to explore brain tumor datasets. This database includes the whole-exome sequencing, DNA methylation, mRNA sequencing, mRNA microarray and microRN
...
|
|
UniRef
The UniProt Reference Clusters are three separate datasets that compress sequence space at different resolutions, achieved by merging sequences and sub-sequences that are 100% (UniRef100), >=90% (UniRef90), or >=50% (UniRef50) identical, regardless o
...
|
|
DNA Data Bank of Japan
An annotated collection of all publicly available nucleotide and protein sequences. DDBJ collects sequence data mainly from Japanese researchers, as well as researchers in other countries. DDBJ is part of the International Nucleotide Sequence Databas
...
|
|
UCSC Genome Browser database
Genome assemblies and aligned annotations for a wide range of vertebrates and model organisms, along with an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic datasets.
|
|
Berkeley Drosophila Genome Project EST database
The goals of the Drosophila Genome Center are to finish the sequence of the euchromatic genome of Drosophila melanogaster to high quality and to generate and maintain biological annotations of this sequence.
|
|
Pfam
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pf
...
|
|
FASTA Sequence Format
FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede th
...
|
|
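The FASTA layout described above (a name and optional comment line, followed by single-letter sequence codes) can be read with a short Python sketch. It assumes the usual convention that each record's description line starts with `>` and that sequence lines may be wrapped; `parse_fasta` and the example records are hypothetical.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str, str]]:
    """Return (name, comment, sequence) for each record in FASTA text."""
    records: List[Tuple[str, str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # First word after '>' is the name, the rest is a free-text comment.
            name, _, comment = line[1:].partition(" ")
            records.append((name, comment.strip(), []))
        elif records:
            # Sequence lines may be wrapped; collect and join them later.
            records[-1][2].append(line)
        else:
            raise ValueError("sequence data found before the first '>' line")
    return [(name, comment, "".join(chunks)) for name, comment, chunks in records]


if __name__ == "__main__":
    example = ">seq1 demo peptide\nMKV\nLAA\n>seq2\nACGT\n"
    for name, comment, seq in parse_fasta(example):
        print(name, repr(comment), seq)
```

The same function handles nucleotide and peptide records, since it does not interpret the letters.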
European Nucleotide Archive
The European Nucleotide Archive (ENA) is a globally comprehensive data resource for nucleotide sequence, spanning raw data, alignments and assemblies, functional and taxonomic annotation and rich contextual data relating to sequenced samples and expe
...
|
|
Mitochondrial Disease Sequence Data Resource
The Mitochondrial Disease Sequence Data Resource (MSeqDR) is a centralized genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phen
...
|
|
NCBI Third Party Annotation
TPA is a database that contains sequences built from the existing primary sequence data in GenBank. TPA records are retrieved through the Nucleotide Database and feature information on the sequence, how it was cataloged, and proper way to cite the se
...
|
|
SYSTERS
The integration of SYSTERS, GeneNest and SpliceNest into one framework facilitates the over-all exploration of the whole sequence space covering protein, mRNA and EST sequences, as well as genomic DNA. The SYSTERS protein sequence cluster set provide
...
|
|
Dfam
The Dfam database is an open collection of DNA Transposable Element sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations. Dfam represents a collection of multiple sequence alignments, each containing a set of r
...
|
|
cis-Regulatory Element Database
The cisRED database holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations. Sequence inputs include low-coverage genome sequence data and ENCODE data.
|
|
Binary sequence information Format
A .2bit file stores multiple DNA sequences (up to 4 Gb total) in a compact randomly-accessible format. The file contains masking information as well as the DNA itself. The DNA sequence is represented as two bits per base with an associated list of regi
...
|
|
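To make the two-bit packing idea concrete, the sketch below packs four bases into each byte and unpacks them again. It illustrates the encoding only and does not read or write real .2bit files: the file header, sequence index and masking information are not modelled. The bit assignment (T=00, C=01, A=10, G=11, first base in the most significant bits) follows the UCSC format description and is an assumption relative to this listing; `pack_dna` and `unpack_dna` are hypothetical names.

```python
# Assumed base-to-bits mapping: T=00, C=01, A=10, G=11.
CODE = {"T": 0, "C": 1, "A": 2, "G": 3}
BASES = "TCAG"  # index in this string == two-bit code


def pack_dna(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].upper()
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # KeyError on N or other symbols
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_dna(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack_dna("ACGTA")
    print(packed.hex(), unpack_dna(packed, 5))
```

Because only four symbols fit in two bits, ambiguous bases and soft-masked (lowercase) regions have to be recorded separately, which this sketch leaves out.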
SuperSite
Dictionary of binding sites in proteins
|
|
PANDIT
PANDIT is a collection of multiple sequence alignments and phylogenetic trees covering many common protein domains. It contains the seed protein sequence alignments from the Pfam-A (curated families) database; nucleotide sequence alignments derived f
...
|
|
mESAdb
microRNA Expression and Sequence Analysis Database
|
|
Genome Warehouse
The Genome Warehouse (GWH) is a public archival resource housing genome-scale data for a wide range of species. GWH accepts a variety of data types, including whole genome, chloroplast, mitochondrion and plasmid. For each collected genome assembly, G
...
|
|
SEVENS
Seven-transmembrane-helix receptors (7-TMR), known as G-protein-coupled receptors [1], are important genes that work as the gateway of signal transduction induced by ligand binding. Recent progress in determination of human draft sequences [2,3] acce
...
|
|
Expressed Sequence Tags database
The dbEST contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms. NCBI is in the process of merging EST and GSS records into the Nucleotide database, and the process is e
...
|
|
GISSD
Group I Intron Sequence and Structure Database
|
|
YeTFaSCo
Yeast Transcription Factor binding Site sequence Collection
|
|
cpnDB
Chaperonins are a diverse family of molecular chaperones present in the plastids, mitochondria, and cytoplasm of eukaryotes, and in bacteria and archaea. The family is divided into group I (CPN60, also known as Hsp60 or GroEL, found in bacteria, some
...
|
|
ENA Sequence XML Schema
ENA Sequence XML Schema is a standardised XML schema for nucleotide sequences. All assembled and annotated sequences must conform to this schema.
|
|
CoPS
Comprehensive peptide signature database
|
|
O-GLYCBASE
O-GLYCBASE is a database of glycoproteins with O-linked and C-linked glycosylation sites. Entries with at least one experimentally verified glycosylation site have been compiled from protein sequence databases and literature. Each entry contains info
...
|
|
OryGenesDB: an interactive tool for rice reverse genetics
The aim of this Oryza sativa database was first to display sequence information such as the T-DNA and Ds flanking sequence tags (FSTs) produced in the framework of the French genomics initiative Genoplante and the EU consortium Cereal Gene Tags. This
...
|
|
NRichD
Efficiency of protein remote homology detection methods depends on the dispersion of the protein sequence space and the availability of intermediate sequences between two related protein families. In the absence of any structural evidence and natural
...
|
|
PRF
Protein research foundation database of peptides: sequences, literature and unnatural amino acids
|
|
Peptaibol
The Peptaibol Database is a sequence and structure resource for the unusual class of peptides known as peptaibols. The database includes sequence, biological source, and bibliographical data for the naturally-occurring peptaibols. Information is also
...
|
|
Progenetix - genomic copy number aberrations in cancer
The Progenetix database provides an overview of copy number abnormalities in human cancer from Comparative Genomic Hybridization (CGH) experiments. With 30817 cases from 1016 publications (Oct 2013), Progenetix is the largest curated database for who
...
|
|
resiDB
ResiDB is a user-friendly sequence similarity-dependent database manager for bacteria, fungi, viruses, protozoa, invertebrate, plants, archaea, environmental and whole genome shotgun sequence data.
|
|
Genome Sequence Archive
GSA is a data repository specialized for archiving raw sequence reads. It supports data generated from a variety of sequencing platforms ranging from Sanger sequencing machines to single-cell sequencing machines and provides data storing and sharing
...
|
|
ProTeus
Signature sequences at the protein N- and C-termini
|
|
The UCSC Archaeal Genome Browser
The UCSC Archaeal Genome Browser is a window on the biology of more than 100 microbial species from the domain Archaea. Basic gene annotation is derived from NCBI Genbank/RefSeq entries, with overlays of sequence conservation across multiple species,
...
|
|
HMMER Profile File Format
The profile hidden Markov Model (HMM) calculated from multiple sequence alignment data in this service is stored in Profile HMM save format (usually with ".hmm" extension). It is an ASCII file containing a lot of header and descriptive records follow
...
|
|
Major Intrinsic Proteins Modification Database
This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. The MIPs have been identified from the completed genome sequence of organisms available at NCBI.
|
|
Ocean Gene Atlas
The Ocean Gene Atlas service provides data mining access to three complementary data objects: gene sequence catalogs (ENA), sample environmental context (PANGAEA), and gene abundances estimates in samples (computed by mapping sequence reads onto gene
...
|
|
Proteomics Standards Initiative Extended Fasta Format
The PSI Extended Fasta Format (PEFF) is a unified format for protein and nucleotide sequence databases to be used by sequence search engines and other associated tools (spectra library search tools, sequence alignment software, data repositories, etc
...
|
|
DTU Bioinformatics
CBS offers comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling, established to optimise scientific exploitation of the explosi
...
|
|
DriverDBv2
DriverDB is a database that incorporates >9500 cancer-related RNA-seq datasets and >7000 more exome-seq datasets, in addition to annotation databases and published bioinformatics algorithms dedicated to driver gene/mutation identification. Seven additi
...
|
|
Universal PBM Resource for Oligonucleotide Binding Evaluation
The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data generated by universal protein binding microarray (PBM) technology on the in vitro DNA binding specificities of proteins.
|
|
miRBase
The miRBase database is a searchable database of published miRNA sequences and annotation. Each entry in miRBase represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), with information on the location and sequence
...
|
|
DDBJ/ENA/GenBank Feature Table
The GenBank, EMBL, and DDBJ nucleic acid sequence data banks have from their inception used tables of sites and features to describe the roles and locations of higher order sequence domains and elements within the genome of an organism. In February,
...
|
|
CATH
The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34,287 domain structures classified into 1,383 superfamilies and 3,285 sequence families. Each structural family is expanded with domain seq
...
|
|
SCOPe
The ASTRAL compendium provides a set of tools and databases designed to aid investigators in the analysis of protein structure, particularly through the use of sequence comparison. Astral augments SCOP, a manual classification of protein domains acco
...
|
|
PolyQ
Polyglutamine Repeats in Proteins
|
|
ConoServer
ConoServer is a database specializing in sequences and structures of peptides expressed by marine cone snails. The database gives access to protein sequences, nucleic acid sequences and structural information on conopeptides. ConoServer's data are fi
...
|
|
Minimum Information about a MARKer gene Sequence
MIMARKS is the metadata reporting standard of the Genomic Standards Consortium that covers marker gene sequences from environmental surveys or individual organisms
|
|
NCBI Viral Genomes Resource
NCBI Viral Genomes Resource is a collection of virus genomic sequences that provides curated sequence data, related information and tools. It includes all complete viral genome sequences deposited in the International Nucleotide Sequence Database Col
...
|
|
Nucleotide Sequence Database Collaboration
This database consists of a joint effort to collect and disseminate databases containing DNA and RNA sequences. It is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. It covers the spectrum of data from raw reads, th
...
|
|
Rfam
The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs). The families in Rfam break down into three broad functional classes: non-coding RNA genes
...
|
|
Protein Clusters
Related protein sequences (clusters) of Reference Sequence proteins encoded by complete genomes
|
|
PDBSite
3D structure of protein functional sites
|
|
Insertion Sequence Finder
This database provides a list of insertion sequences (IS) isolated from bacteria and archaea. It is organized into individual files containing their general features (name, size, origin, family, ...) as well as their DNA and potential protein sequence
...
|
|
Enzyme Structure Function Ontology
The ESFO provides a new paradigm for organizing enzyme sequence, structure, and function information, whereby specific elements of enzyme sequence and structure are mapped to specific conserved aspects of function, thus facilitating the functional an
...
|
|
PA-GOSUB
Protein sequences from model organisms, GO assignment and subcellular localization
|
|
Reference Sequence Database
The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.
|
|
Conserved Domain Database
The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, including NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and p
...
|
|
PAZAR
PAZAR is a software framework for the construction and maintenance of regulatory sequence data annotations; a framework which allows multiple boutique databases to function independently within a larger system (or information mall). The goal of PAZAR
...
|
|
UniSave
The UniProtKB Sequence/Annotation Version database (UniSave) is a comprehensive archive of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entry versions. All changed Swiss-Prot and TrEMBL entries are loaded into the UniSave as part of the public UniProtK
...
|
|
Minimotif Miner
Search tools for short functional motifs involved in posttranslational modifications, binding to other proteins, nucleic acids, or small molecules
|
|
Lipase Engineering Database
The Lipase Engineering Database (http://www.led.uni-stuttgart.de) integrates information on sequence, structure, and function of lipases, esterases, and related proteins. Sequence data on 806 protein entries are assigned to 38 homologous families, wh
...
|
|
FireDB
FireDB is a database of Protein Data Bank structures, ligands and annotated functional site residues. The database can be accessed by PDB codes or UniProt accession numbers as well as keywords.
|
|
RNAcentral
RNAcentral is a free, public resource that offers integrated access to a comprehensive and up-to-date set of non-coding RNA sequences provided by a collaborating group of databases representing a broad range of organisms and RNA types.
|
|
Locus Reference Genomic sequences
Each LRG is a stable genomic DNA sequence for a region of the human genome.
|
|
Deep Sequence and Shape Motif (DESSO)
DESSO is a deep learning-based framework that can be used to accurately identify both sequence and shape regulatory motifs from the human genome.
|
|
NucleaRDB
Families of nuclear hormone receptors
|
|
al MENA
The Middle East and North Africa (MENA) region encompasses unique populations with a rich history and characteristic ethnic, linguistic and genetic diversity. The genetic diversity of the MENA region has been largely unknown. The recent availability
...
|
|
Gramene: A curated, open-source, integrated data resource for comparative functional genomics in plants
Gramene's purpose is to provide added value to plant genomics data sets available within the public sector, which will facilitate researchers' ability to understand the plant genomes and take advantage of genomic sequence known in one species for ide
...
|
|
Visual Database for Organelle Genome
VDOG (Visual Database for Organelle Genome) is an innovative database of genome information in organelles. Most of the data in VDOG are originally extracted from GenBank, re-organized and represented.
|
|
Minimal Metagenome Sequence Analysis Standard
A proposed set of minimal standard analyses necessary for proper interpretation of meta-omic data and to allow comparative metagenomics and metatranscriptomics. Please note: We cannot find an up-to-date website for this resource. As such, we have mar
...
|
|
Sequence-Structural Templates of Single-member Superfamilies
SSToSS is a database which provides sequence-structural templates of single member protein domain superfamilies like PASS2. Sequence-structural templates are recognized by considering the content and overlap of sequence similarity and structural para
...
|
|
NCBI Trace Archive
The NCBI Trace Archive is a permanent repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects. The Trace Archive serves as the repository of sequencing da
...
|
|
.ACE format
The ACE file format is a specification for storing data about genomic contigs. The original ACE format was developed for use with Consed, a program for viewing, editing, and finishing DNA sequence assemblies. ACE files are generated by various assemb
...
|
|
EcoliWiki: A Wiki-based community resource for Escherichia coli
EcoliWiki is a community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.
|
|
CompoDynamics
Sequence composition dynamics of genes and genomes.
|
|
MulPSSM
Representation of multiple sequence alignments of protein families in terms of Position Specific Scoring Matrices (PSSMs) is commonly used in the detection of remote homologues. A PSSM is generated with respect to one of the sequences involved in the
...
|
|
siRNAdb
The siRNA database provides a gene-centric view of human siRNA experimental data, including siRNAs of known efficacy and siRNAs predicted to be of high efficacy by siSearch. Linked to these sequences is information including siRNA thermodynamic prope
...
|
|
Hits
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent
...
|
|
Stanford HIV Drug Resistance Database
The Stanford HIV Drug Resistance Database (HIVDB) is an essential resource for public health officials monitoring ADR and TDR, for scientists developing new ARV drugs, and for HIV care providers managing patients with HIVDR.
|
|
PASS2
PASS2 contains alignments of structural motifs of protein superfamilies. PASS2 is an automatic version of the original superfamily alignment database, CAMPASS (CAMbridge database of Protein Alignments organised as Structural Superfamilies). PASS2 con
...
|
|
The Chromosome 7 Annotation Project
The objective of this project is to generate the most comprehensive description of human chromosome 7 to facilitate biological discovery, disease gene research and medical genetic applications.
|
|
Bio-Mirror
A world bioinformatic public service for high-speed access to up-to-date DNA & protein biological sequence databanks.
|
|
SEQanswers
Wiki on all aspects of next-generation genomics
|
|
TESS
TESS (Transcription Element Search System, http://www.cbil.upenn.edu/tess) is a web-based service that searches DNA sequence for transcription factor binding sites. It integrates three databases of transcription factors and binding site models, and p
...
|
|
PHOSIDA
Phosphorylation sites in various species identified by mass spectrometry
|
|
Regulatory Element Database for Drosophila
REDfly is a curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs). REDfly seeks to include all experimentally verified fly regulatory elements along with their DNA sequence
...
|
|
BMC Caller
A webtool to identify and analyze bacterial microcompartment types in sequence data.
|
|
ASC - Active Sequence Collection
ASC (Active Sequences Collection) is a database of short amino acid sequences with known biological activity. The current version is substantially improved as compared to the previous release; it now includes more than 1300 different active short pro
...
|
|
PIR - Protein Information Resource
The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR has provided many protein databases and analysis tools to the scientific community, includ
...
|
|
Minimal Information about any Sequence Ontology
An OWL representation of the Minimum Information for any (x) Standard (MIxS), managed by the Genomic Standards Consortium.
|
|
Protein kinase resource
The Protein Kinase Resource (PKR) is a curated information source which provides an integrated view of sequence and structure data combined with biochemical and genetic function data focused on a single family of proteins, the protein kinases. In add
...
|
|
PIR SuperFamily
The PIR SuperFamily concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.
|
|
RNArchitecture
RNArchitecture is a database that provides a comprehensive description of relationships between known families of structured ncRNAs, with focus on sequence and structure similarities. RNArchitecture also provides literature information and links to o
...
|
|
Ensembl Zebrafish Genome Browser
This Ensembl website features the zebrafish whole genome shotgun assembly sequence.
|
|
NCBI Virus
NCBI Virus is a community portal for viral sequence data from RefSeq, GenBank and other NCBI repositories.
|
|
fRNAdb
Functional RNA Database (fRNAdb) is a database service that hosts a large collection of non-coding transcripts including annotated/un-annotated sequences from H-inv database, NONCODE, and RNAdb. A set of computational sequence analyses are performed
...
|
|
ParameciumDB
ParameciumDB is a new model organism database for Paramecium, built using components of the Generic Model Organism Database (http://www.gmod.org) construction set (Chado relational database schema, Turnkey generic web framework and Gbrowse). The data
...
|
|
Chicken Variation Database
The Chicken Variation Database (ChickVD) is an integrated information system for storage, retrieval, visualization and analysis of chicken variation data.
|
|
PSSRdb
Polymorphic Simple Sequence Repeats Database
|
|
PGDBj Ortholog Database
The PGDBj Ortholog Database, created under the auspices of the Plant Genome Database Japan (PGDBj), contains information about orthologous genes in plants based on their corresponding amino acid sequence similarity. By placing PGDBj Ortholog Database
...
|
|
GPCR-SSFE
GPCR-Sequence-Structure-Feature-Extractor (SSFE). Provides template suggestions and homology models of Class A GPCRs. Identifies key sequence and structural motifs in Class A GPCRs to guide template selection and build homology models.
|
|
Genomic Contextual Data Markup Language
The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that is a reference implementation of the Minimum Information about a Genome Sequence (MIGS/MIMS/MIMARKS), and the extensions the Minimum Inf
...
|
|
Organelle Genome Resource
The organelle genomes are part of the NCBI Reference Sequence (RefSeq) project that provides curated sequence data and related information for the community to use as a standard.
|
|
Distributed Sequence Annotation System
The Distributed Annotation System (DAS) defines a communication protocol used to exchange annotations on genomic or protein sequences.
|
|
NCBI Gene
The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. Entrez can effic
...
|
|
Codon Usage Database
Find GC content and frequency of codon usage for any organism that has a sequence in GenBank.
|
|
GenomeTraFaC
GenomeTraFaC is a database of conserved regulatory elements obtained by systematically analyzing the orthologous set of human and mouse genes. It mainly focuses on all of the high-quality mRNA entries of mouse and human genes in the Reference Sequenc
...
|
|
Integrated resource of protein families, domains and functional sites
InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as si
...
|
|
PolyA_DB |
|
msRepDB
A comprehensive repetitive sequence database of over 80 000 species.
|
|
ASTD
AltSplice and AltExtron provide information on alternative intron/exons, alternative splice events, and isoform splice patterns. AEdb contains: AEdb-Sequence (sequence and properties of alternatively splice exons), AEdb-Function (data on functional a
...
|
|
UniProt Knowledgebase
Universal Protein resource. A database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the re
...
|
|
HOGENOM
HOGENOM is a phylogenomic database providing families of homologous genes and associated phylogenetic trees (and sequence alignments) for a wide set of sequenced organisms.
|
|
SitEx
Projections of protein functional Sites on Exons
|
|
Ribosomal Database Project (RDP-II)
The Ribosomal Database Project - II (RDP-II)(1) provides data, tools and services related to ribosomal RNA sequences to the research community. Through its website (http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data, anal
...
|
|
Ensembl Compara
Ensembl Compara provides cross-species resources and analyses, at both the sequence level and the gene level.
|
|
Gene3D
Gene3D uses the information in CATH to predict the locations of structural domains on millions of protein sequences available in public databases. Sequence data from UniProtKB and Ensembl for domains with no experimentally determined structures are s
...
|
|
Alias
A tool for converting identifiers in which multiple aliases are used to refer to sequences. Also available as a stand-alone tool.
|
|
Pig Genomic Informatics System
The Pig Genomic Informatics System (PigGIS) presents accurate pig gene annotations in all sequenced genomic regions. It integrates various available pig sequence data, including 3.84 million whole-genome-shotgun (WGS) reads and 0.7 million Expressed
...
|
|
Network of Cancer Genes
The Network of Cancer Genes (NCG) contains information on duplicability, evolution, protein-protein and microRNA-gene interaction, function, expression and essentiality of cancer genes from manually curated publications. NCG also provides informatio
...
|
|
CoxBase
CoxBase is an online platform for epidemiological surveillance, visualization, analysis and typing of Coxiella burnetii genomic sequence.
|
|
lncRNASNP2 |
|
NCBI Nucleotide
The NCBI Nucleotide database collects sequences from such sources as GenBank, RefSeq, TPA, and PDB. Sequences collected relate to genome, gene, and transcript sequence data, and provide a foundation for research related to the biomedical field.
|
|
DARNED
Database of RNA Editing
|
|
CR-EST - Crop ESTs
The crop EST database CR-EST (http://pgrc.ipk-gatersleben.de/cr-est/) is a publicly available online resource providing access to sequence, classification, clustering, and annotation data of crop EST projects at IPK Gatersleben, Germany. CR-EST curre
...
|
|
Amordad
Database engine for comparing metagenomic data at massive scale. It first obtains the sequence signature of metagenomes and organizes them as points in high dimensional space.
|
|
CORG - A database for COmparative Regulatory Genomics
Sequence conservation in non-coding, upstream regions of orthologous genes from man and mouse is likely to reflect common regulatory DNA sites. Motivated by this assumption we have delineated a catalogue of conserved non-coding sequence blocks and pr
...
|
|
ProTherm
ProThermDB is a database for proteins and mutants with data on protein stability, an increase of 84% from the previous version. It contains several thermodynamic parameters such as melting temperature, free energy obtained with thermal and denaturant
...
|
|
Rice Genome Annotation Project
This website provides genome sequence from the Nipponbare subspecies of rice and annotation of the 12 rice chromosomes. These data are available through search pages and the Genome Browser that provides an integrated display of annotation data.
|
|
PREX
PeroxiRedoxin classification indEX
|
|
VIRsiRNAdb
VIRsiRNAdb contains information on experimentally validated Viral siRNA/shRNA which target viral genome regions. It provides efficacy information where available, as well as the siRNA sequence, viral target and subtype, as well as the target genomic
...
|
|
Hardwood Genomics Project
The Hardwood Genomics Project is a database for expressed genes, genetic markers, genetic linkage maps, and reference populations. It provides lasting genomic and biological resources for the discovery and conservation of genes in hardwood trees for
...
|
|
APPRIS
Annotates variants with biological data such as protein structural information, functionally important residues, conservation of functional domains and evidence of cross-species conservation.
|
|
PRODORIC2 |
|
PROSITE
PROSITE is a database of protein families and domains. PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.
|
|
Fungal and Oomycete genomics resource
FungiDB is an integrated genomic and functional genomic database for the kingdom Fungi. The database integrates whole genome sequence and annotation and also includes experimental and environmental isolate sequence data. The database includes compara
...
|
|
ColabFold
ColabFold databases are MMseqs2 expandable profile databases to generate diverse multiple sequence alignments to predict protein structures.
|
|
GenomeNet
Network of database and computational resources including KEGG (pathways, interactions, etc.) and DBGET/LinkDB (an integrated database retrieval system). It also hosts several web-based tools for sequence analysis (e.g. BLAST, MOTIF, ClustalW).
|
|
MEROPS
The MEROPS database is an information resource for peptidases (also termed proteases, proteinases and proteolytic enzymes) and the proteins that inhibit them.
|
|
ElastoDB
Repository for well-characterized elastin sequences to facilitate its study. The database has since expanded to include other non-elastin sequences that share elastic properties.
|
|
Berkeley Drosophila Genome Project insitu
In early 2010 we updated the site to facilitate more rapid transfer of our data to the public database and focus our efforts on the core mission of providing expression pattern images to the research community. The original database https://www.fruit
...
|
|
PseudoBase++
PseudoBase is a database containing structural, functional and sequence data related to RNA pseudoknots. It can be reached by its central page at http://pseudobaseplusplus.utep.edu. From here one can retrieve pseudoknot data as well as submit data fo
...
|
|
abYsis
abYsis is a web-based antibody research system that includes an integrated database of antibody sequence and structure data. The publicly available version includes pre-analyzed sequence data from the European Molecular Biology Laboratory European Nu
...
|
|
NCBI PopSet
NCBI PopSet collects DNA sequences to analyze the ways that populations are related by evolution. Such sequences indicate if populations originate from different members of the same species or from organisms of different species entirely.
|
|
ValidNESs |
|
FlyBase
Genetic, genomic and molecular information pertaining to the model organism Drosophila melanogaster and related sequences. This database also contains information relating to human disease models in Drosophila, the use of transgenic constructs contai
...
|
|
Minimal Information about any Sequence (MIxS) Controlled Vocabularies
Controlled vocabularies for the MIxS family of metadata checklists. See http://gensc.org/gc_wiki/index.php/MIxS for details on the MIxS checklists.
|
|
mutLBSgeneDB
Mutations in Ligand Binding Sites gene DataBase
|
|
MAR databases
The MAR databases is a collection of manually curated marine microbial contextual and sequence databases, based at the Marine Metagenomics Portal. This was developed as a part of the ELIXIR EXCELERATE project in 2017 and is maintained by The Center f
...
|
|
UTRdb/UTRsite
The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb, a specialized database of 5
...
|
|
Connectivity Table file format
A CT (Connectivity Table) file contains secondary structure information for a RNA sequence.
|
|
TIGR Plant Transcript Assembly database
The TIGR Plant Transcript Assemblies (TA) database (http://plantta.tigr.org) uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include expressed Sequenc
...
|
|
PDBselect
PDBselect (http://bioinfo.tg.fh-giessen.de/pdbselect/) is a list of representative protein chains with low mutual sequence identity selected from the protein data bank (PDB) to enable unbiased statistics. The list increased from 155 chains in 1992 to
...
|
|
Database of small human non-coding RNAs
Integrated annotation and sequencing-based expression data for all major classes of human small non-coding RNAs (sncRNAs) for both full sncRNA transcripts and mature sncRNA products derived from these larger RNAs.
|
|
Information system for G protein-coupled receptors
The GPCRDB is a molecular-class information system that collects, combines, validates and stores large amounts of heterogenous data on G protein-coupled receptors (GPCRs). The GPCRDB contains data on sequences, ligand binding constants and mutations.
...
|
|
ForestTreeDB
ForestTreeDB is intended as a resource that centralizes large-scale EST sequencing results from several tree species (http://foresttree.org/ftdb). Our group at the Center for Computational Genomics and Bioinformatics (University of Minnesota) aims to
...
|
|
NEMBASE
Nematode sequence and functional data database
|
|
alkaligrass
A high-quality genome sequence of alkaligrass provides insights into halophyte stress tolerance.
A high-quality chromosome-level genome sequence of alkaligrass assembled from Illumina, PacBio and 10× Genomics reads combined with genome-wide chromosom
...
|
|
eSLDB - eukaryotic Subcellular Localization database
eSLDB (eukaryotic Subcellular Localization DataBase) collects the annotations of subcellular localization of eukaryotic proteomes. For each sequence, the database lists localization obtained adopting three different approaches: 1) experimentally dete
...
|
|
Saccharomyces Genome Database
The Saccharomyces Genome Database (SGD) provides comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae along with search and analysis tools to explore these data, enabling the discovery of functional relations
...
|
|
RADAR
A Rigorously Annotated Database of A-to-I RNA editing
|
|
RPFdb
Ribosome profiling database
|
|
Spliceosome Database |
|
eF-site - Electrostatic surface of Functional site
Electrostatic potentials and hydrophobic properties of the active sites
|
|
Colorectal Cancer Atlas
Colorectal Cancer Atlas is a web-based resource which integrates genomic and proteomic data pertaining to colorectal cancer cell lines and tissues. Data catalogued includes quantitative and non-quantitative protein expression, sequence variations, cell
...
|
|
Placental Genetic Variance
Includes variations of DNA sequence, chromosomal structure and copy number, as well as RNA and translational variation. The Genetic Variation ontology expands on work done for Variation Ontology (VariO) and Sequence Types and Features Ontology (SO) w
...
|
|
iPfam
A database of Pfam domain interactions
|
|
Interrupted coding sequences
ICDS database is a database containing ICDS detected by a similarity-based approach. The definition of each interrupted gene is provided as well as the ICDS genomic localisation with the surrounding sequence.
|
|
MitoProteome
MitoProteome is a mitochondrial protein sequence database and annotation system. The initial release contains 847 human mitochondrial protein sequences, derived from public sequence databases and mass spectrometric analysis of highly purified human h
...
|
|
DescribePROT
DescribePROT is a database containing annotations of 13 putative structural and functional properties at the amino acid level for ~1.4 million proteins from 83 popular/model organisms, to be extended to hundreds of additional organisms. Users can sear
...
|
|
Cnidarian Evolutionary Genomics Database
CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians.
|
|
CRISPRCasdb
CRISPRCasdb acts as a gateway to a publicly accessible database and software to enable the easy detection of CRISPR sequences in locally-produced data and the consultation of CRISPR sequence data present in the database. It also gives information on
...
|
|
BPS
Database of RNA Base-Pair Structures
|
|
Multiple Alignment Format
The Multiple Alignment Format stores DNA level multiple alignments in an easily readable format between entire genomes. Unlike previous formats this resource can cope with forward and reverse strand directions, multiple pieces to the alignment, and s
...
|
|
TrSDB
Transcription factor database
|
|
Cacao Genome Database
The Cacao Genome Database (CGD) is a database storing information on the genome of Theobroma cacao. The release of the cacao genome sequence provides researchers with access to the latest genomic tools, enabling more efficient research and accelerati
...
|
|
SoyBase
SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains genetic, physical and genomic sequence maps integrated with qualitati
...
|
|
DESSO-DB
A web database for sequence and shape motif analyses and identification.
|
|
piRNAclusterDB
Clusters of piRNAs
|
|
NCBI Genome Data Viewer
The NCBI Genome Data Viewer (GDV) is a genome browser supporting the exploration and analysis of annotated eukaryotic genome assemblies. The GDV browser can visualize different types of molecular data in a whole genome context, including gene annotat
...
|
|
Therapeutic Structural Antibody Database
The Therapeutic Structural Antibody Database tracks all antibody- and nanobody-related therapeutics recognized by the World Health Organisation (WHO), and identifies any corresponding structures in the Structural Antibody Database (SAbDab) with near-
...
|
|
ARAMEMNON
ARAMEMNON is a curated database for Arabidopsis thaliana transmembrane (TM) proteins and transporters. The database compiles topology and signal sequence predictions and displays the results in a directly comparable graphical output format for presen
...
|
|
UNITE database
UNITE is a database and sequence management environment centered on the eukaryotic nuclear ribosomal ITS region. All eukaryotic ITS sequences from the International Nucleotide Sequence Database Collaboration are clustered to approximately the species
...
|
|
Hollywood
Exon annotation database
|
|
TOPPR
The Online Protein Processing Resource
|
|
sRNAMap
small regulatory RNA in microbial genomes
|
|
CloneDB
Clones and libraries: sequence data, map positions and distributor information
|
|
CLUSTAL-W Alignment Format
CLUSTAL-W Alignment Format is a simple text-based format, often with a *.aln file extension, used for the input and output of DNA or protein sequences into the Clustal suite of multiple alignment programs.
|
|
LOX-DB
Due to their involvement in several diseases like cancer, inflammation, fever or arthritis, a lot of research is done on lipoxygenases yielding information about sequence, structure and function of these proteins. The LipOXygenases-DataBase (LOX-DB)
...
|
|
Expansin Engineering Database
Expansin Engineering Database integrates information on sequence, structure and function of expansins.
|
|
GABI-Kat SimpleSearch
T-DNA insertions in Arabidopsis and their flanking sequence tags.
|
|
miRNEST
miRNEST is an integrative collection of animal, plant and virus microRNA data. miRNEST is being gradually developed to create an integrative resource of miRNA-associated data. The data comes from our computational predictions (new miRNAs, targets, mi
...
|
|
INTERVAL
The INTERVAL bioresource comprises 50,000 English blood donors, on whom deep molecular phenotypes (e.g. genomics, proteomics, metabolomics, lipidomics) have been generated. In over 100 years of blood donation practice, INTERVAL is the first randomise
...
|
|
Membranome
A database of single-pass membrane proteins
|
|
Pharmacogenomics Ontology
The PharmGKB Ontology imports genetic sequence data, collected in relational format, into the OWL, and aims to automate the process of updating the links between the ontology and data acquisition when the ontology changes. They have linked PharmGKB w
...
|
|
SILVA
SILVA is a comprehensive, quality-controlled web resource for up-to-date aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains alongside supplementary online services. In addition to data products, SILVA provide
...
|
|
BAliBASE
BAliBASE; a benchmark alignment database, including enhancements for repeats, transmembrane sequences and circular permutations.
|
|
EbolaID
Provides a complete, quality checked and regularly updated list of oligonucleotides for the Ebola virus. The database describes the genetic diversity across the Ebola genome to facilitate the design of accurate diagnostic methods and therapeutic appr
...
|
|
CIS-BP
The Catalog of Inferred Sequence Binding Preferences (CIS-BP) is a library of transcription factor (TF) DNA binding motifs and specificities. The data are organized in a user friendly manner for ease of searching, browsing, and downloading. CIS-BP al
...
|
|
eProS
Energy profiles of protein structures
|
|
WDSPdb
WD40 domain structure predictions
|
|
DoBISCUIT
Database Of BIoSynthesis clusters CUrated and InTegrated
|
|
Molecular Modeling Database
The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more.
|
|
PomBase
PomBase is a model organism database that provides organization of and access to scientific data for the fission yeast Schizosaccharomyces pombe. PomBase supports genomic sequence and features, genome-wide datasets and manual literature curation as w
...
|
|
DBD
DBD provides transcription factor predictions for more than 150 completely sequenced genomes available for browsing and download. Predictions are based on presence of sequence specific DNA binding domain assignments using hidden Markov models from th
...
|
|
Genome Reviews
The goal of the Genome Reviews project is to provide an up-to-date, standardised and comprehensively annotated view of the genomic sequence of organisms with completely deciphered genomes. Genome Reviews are curated versions of EMBL/GenBank/DDBJ dat
...
|
|
REDIportal
A-to-I RNA editing events in human
|
|
SelenoDB
A database of selenoprotein genes, proteins and SECIS elements
|
|
SomamiR
Somatic mutations that impact microRNA targeting in cancer
|
|
DAnCER
Disease-Annotated Chromatin Epigenetics Resource
|
|
National Omics Data Encyclopedia
The National Omics Data Encyclopedia (NODE) is a big data library with complete and integrative data storage, safe and efficiency-guaranteed data management as well as comprehensive and user-friendly data service functions. NODE stores raw sequence dat
...
|
|
Bacterial protein tYrosine Kinase database
The Bacterial protein tYrosine Kinase database (BYKdb) contains computer-annotated BY-kinase sequences. The database web interface allows static and dynamic queries and provides integrated analysis tools including sequence annotation.
|
|
GlycoCT sequence format for carbohydrates.
GlycoCT format is devised to describe the carbohydrate sequences, with a controlled vocabulary to name monosaccharides, adopting IUPAC rules to generate a consistent, machine-readable nomenclature, based on a connection table approach, instead of a l
...
|
|
SINEBase
A database of short interspersed elements (SINEs)
|
|
ChromDB
Chromatin-associated proteins in a broad range of organisms
|
|
Database of Rice Transcription Factors
DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families, identified by computational prediction and manual curation. It includes detailed annotations of each TF incl
...
|
|
Factorbook
Human transcription factor binding data from ChIP-seq
|
|
Annotated regulatory Binding Sites from Orthologous Promoters
ABS: A database of Annotated regulatory Binding Sites from known binding sites identified in promoters of orthologous vertebrate genes.
|
|
Ebola and Hemorrhagic Fever Virus Database
The Ebola and Hemorrhagic Fever Virus Database stems from the Hemorrhagic Fever Viruses (HFV) Database Project founded by Dr. Carla Kuiken in 2009 at the Los Alamos National Laboratory (LANL). The HFV Database was modeled on the Los Alamos HIV Databa
...
|
|
POSTAR
Post-transcriptional regulation by RNA-binding proteins
|
|
UniGene
This repository is no longer available. Although the web pages are no longer available, you will still be able to download the final UniGene builds as static content from the FTP site https://ftp.ncbi.nlm.nih.gov/repository/UniGen
...
|
|
YM500
smRNA-seq database for miRNA research
|
|
RAID
Human RNA-RNA and RNA-protein interactions
|
|
tRNAdb
Compilation of tRNA sequences and tRNA genes
|
|
COMBREX
Computational Bridge to Experiments
|
|
L1Base
Functional annotation and prediction of LINE-1 elements
|
|
ARED-Plus |
|
Candida Genome Database
The Candida Genome Database (CGD) provides access to genomic sequence data and manually curated functional information about genes and proteins of the human pathogen Candida albicans. It collects gene names and aliases, and assigns gene ontology term
...
|
|
EchinoDB
EchinoDB is a database consisting of amino acid sequence orthoclusters from 42 echinoderm transcriptomes. We sampled taxa to span the deepest divergences within each of the 5 extant echinoderm classes. Data can be searched by keywords such as annotati
...
|
|
IMGT/LIGM-DB
IMGT/LIGM-DB is the IMGT® comprehensive database of immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences, from human and other vertebrate species, with translation for fully annotated sequences, created in 1989 by LIGM (http://www.imgt.o
...
|
|
Databases of Orthologous Promoters
DoOP is a database of eukaryotic promoter sequences (upstream regions), aiming to facilitate the recognition of regulatory sites conserved between species. Based on the Arabidopsis thaliana and Homo sapiens genome annotation, this resource is also a
...
|
|
LenVarDB
Database of length variation in protein domains
|
|
Short Read Archive eXtensible Markup Language
The SRA data model contains the following objects: Study: information about the sequencing project Sample: information about the sequenced samples Experiment: information about the libraries, platform; associated with study, sample(s) and run(s) Run:
...
|
|
UUCD
Ubiquitin and ubiquitin-like conjugation database
|
|
ECgene
Genome annotation for alternative splicing
|
|
AniProtDB
The Animal Proteome Database (AniProtDB) is a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. In addition to providing open access to this collection of high-quality metazoan proteomes, information on predicted protei
...
|
|
PLPMDB
Pyridoxal-5'-phosphate dependent enzymes mutations
|
|
eBLOCKS
Classifying proteins into families and super-families allows identification of functionally important conserved domains. The motifs and scoring matrices derived from such conserved regions provide computational tools to recognize similar patterns in n
...
|
|
miRNAMap
microRNA precursors and their mapping to targets in vertebrate genomes
|
|
MAPPER-2
This resource provides information primarily on the upstream non-coding sequence data of genes in 3 genomes which gives insight into the transcription factor binding sites (TFBSs). For each transcript, the region scanned extends from 10,000 bp upstre
...
|
|
Database resources of the National Center for Biotechnology Information
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts publish
...
|
|
RetrOryza
With the availability of the complete genomic sequence of rice, the identification and annotation of LTR-Retrotransposons has become a necessity as they comprise an important part of plant genomes (1). RetrOryza is a database that aims at providing t
...
|
|
PlantAligDB
Web-based platform of nucleotide sequence alignments of plants.
|
|
TTSMI
Triplex Target DNA Sites in the human genome
|
|
Binary Alignment Map Format
BAM is the compressed binary version of the Sequence Alignment/Map (SAM) format, a compact and indexable representation of nucleotide sequence alignments. Many next-generation sequencing and analysis tools work with SAM/BAM. For custom track display,
...
|
|
MeT-DB
RNA MEthylation by SEquencing databaSe
|
|
Epitome
Epitome is a database of all known antigenic residues and the antibodies that interact with them, including a detailed description of the residues involved in the interaction and their sequence/structure environments. Each entry in the database descr
...
|
|
3DBIONOTES
Web based application designed to integrate protein structure, protein sequence and protein annotations in a unique graphical environment. The current version of the application offers a unified, enriched and interactive view of EMDB volumes, PDB str
...
|
|
China National GeneBank DataBase
The China National GeneBank database (CNGBdb) is a unified platform for biological big data sharing and application services. At present, CNGBdb has integrated a large amount of internal and external biological data from resources such as CNGB, NCBI,
...
|
|
INTEGRALL
INTEGRALL is a web-based platform dedicated to compiling information on integrons and designed to organize all the data available for these genetic structures. INTEGRALL provides a public genetic repository for sequence data and nomenclature and offers
...
|
|
microRNA.org
microRNA target predictions and expression profiles
|
|
PPT-DB
Protein Property Prediction and Testing Database
|
|
ADDA - A Domain Database
ADDA is a global clustering of protein sequences into protein domains and protein domain families. The database currently contains domains for 1.5 million sequences from UniProt, ENSEMBL, and other sequence databases. The domains are grouped into 123,000
...
|
|
RNA Ontology
RNAO is a controlled vocabulary pertaining to RNA function and based on RNA sequences, secondary and three-dimensional structures. The central aim of the RNA Ontology Consortium (ROC) is to develop an ontology to capture all aspects of RNA - from pri
...
|
|
Ribonuclease P Database
RNase P sequences, alignments, and structures
|
|
Generic Feature Format Version 3
The Generic Feature Format Version 3 (GFF3) format was developed after earlier formats, although widely used, became fragmented into multiple incompatible dialects. The GFF3 format addresses the most common extensions to GFF, while preserving backwar
...
|
|
TIGRFAMs
TIGRFAMs is a collection of manually curated protein families focusing primarily on prokaryotic sequences. It consists of hidden Markov models (HMMs), multiple sequence alignments, Gene Ontology (GO) terminology, Enzyme Commission (EC) numbers, gene s
...
|
|
MachiBase
Drosophila melanogaster 5' mRNA transcription start site database
|
|
DoriC
oriC regions in bacterial and archaeal genomes
|
|
SNP2TFBS
Regulatory SNPs affecting predicted transcription factor binding sites
|
|
PALI
The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains structure-based sequence alignments and dendrograms based on information primarily derived from the structural alignments at domain level [1,2]. Protein domain d
...
|
|
KIDFamMap
Kinase-inhibitor-disease family map
|
|
PHYTOPROT
Clusters of predicted plant proteins
|
|
Ontology for Genetic Interval
Using BFO (Basic Formal Ontology) as its upper-level ontology, the Ontology for Genetic Interval (OGI) represents a gene as an entity with its 3D shape, topography, and primary DNA sequence as the foundation for its 3D structure. There is no official h
...
|
|
ACTIVITY
ACTIVITY, a database of DNA site sequences with known activity magnitudes, measurement systems and sequence-activity relationships under fixed experimental conditions, is additionally adapted to applications to the phylogenetic footprints of known sit
...
|
|
MimoDB
Mimotope database, active site-mimicking peptides selected from phage-display libraries
|
|
NBDB
The NBDB database provides profiles of Elementary Functional Loops (EFLs) involved in binding of nucleotide-containing ligands. Each EFL, in the form of a PSSM (position-specific scoring matrix) profile, is complemented with information on SCOP entities, s
...
|
|
SilkDB
SilkDB is an open-access database for genome biology of the silkworm (Bombyx mori). SilkDB contains genomic data, including genome assembly, gene annotation, chromosomal mapping, orthologous relationships and experimental data, such as microarra
...
|
|
LNCediting
RNA editing sites in lncRNAs from human, monkey, mouse and fly
|
|
Kinomer
Classification of protein kinases encoded in various eukaryotic species
|
|
MegaMotifbase
Structural motifs in protein families and superfamilies
|
|
Transcription Factor Class
TFClass is a resource that classifies eukaryotic transcription factors (TFs) according to their DNA-binding domains. Combining information from different resources, manually checking the retrieved mammalian TF sequences and applying extensive phyloge
...
|
|
PyIgClassify
Clusters of conformations of antibody CDRs
|
|
ZiFDB
Zinc Finger DataBase
|
|
WERAM
Writers, Erasers and Readers of Histone Acetylation and Methylation
|
|
NRED
Noncoding RNA Expression Database
|
|
MALISAM
Manual alignments for structurally analogous motifs in proteins
|
|
SpliceNest
A tool for visualizing splicing of genes from EST data
|
|
BeetleBase
Genome database of the beetle Tribolium castaneum
|
|
Synthetic Gene Database
The Synthetic Gene Database (http://www.evolvingcode.net/codon/sgdb/index.php) is a resource that collects sequence information on synthetic genes (i.e. genes that were designed conceptually, rather than built from an initial, physical
...
|
|
RepTar
Predicted targets of host and viral miRNAs
|
|
OPTIC
Orthologous and Paralogous Transcripts in Clades
|
|
JuncDB
Exon-exon Junction database
|
|
GELBANK
GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel images of proteomes from organisms with known genome information (available at http://gelbank.anl.gov). GELBANK serves as a database for those proteomics labs t
...
|
|
The Arabidopsis Information Resource
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR includes the complete genome sequence along with gene structure, gene pro
...
|
|
RaftProt
Lipid raft associated proteins in mammals
|
|
Nematodes.org
Wiki for coordinating nematode sequencing projects
|
|
Ensembl Fungi
Ensembl Fungi is a browser for fungal genomes. A majority of these are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at the EBI, GenBank at the NCBI, and the DNA Database of
...
|
|
miRGator
microRNA target prediction, functional analysis, and gene expression data
|
|
BIOZON
Biozon is a platform that allows for the storage, management, and analysis of interrelated proteins, genes, interactions, protein families, cellular pathways and more. These heterogeneous data types and the relations between them are locally warehous
...
|
|
OnTheFly
DNA-binding specificities of transcription factors in Drosophila
|
|
EnteroBase
Global genomic population structure of Clostridioides difficile
|
|
Cyanolyase
Sequences and motifs of the phycobilin lyase protein family
|
|
TransportDB
Sequences and classification of predicted membrane transporters encoded in complete genomes
|
|
Secreted Protein Database
Secreted proteins from human, mouse and rat
|
|
tRFdb
Short (14-32 nt) tRNA-related fragments
|
|
CharProtDB
Experimentally Characterized Protein annotations
|
|
SuperCAT
A database for multilocus sequence typing analysis of the Bacillus cereus group of bacteria
|
|
Animal Toxin Database
Database of animal toxins
|
|
*ReputationScore indicates how established a given datasource is.