UCSC Genome Browser database

Genome assemblies and aligned annotations for a wide range of vertebrates and model organisms, along with an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic datasets.

CLUSTAL-W Alignment Format

CLUSTAL-W Alignment Format is a simple text-based format, often with a *.aln file extension, used for the input and output of DNA or protein sequences into the Clustal suite of multiple alignment programs.

Sequence Alignment Map

The Sequence Alignment/Map (SAM) format is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section.
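
The header/alignment split described above can be sketched in a few lines of Python. This is a minimal illustration, not a full SAM parser: the function name `split_sam` and the sample record are hypothetical, and the sketch relies on the SAM convention that header lines begin with `@`.

```python
def split_sam(text):
    """Split SAM text into header lines and TAB-separated alignment records."""
    header, alignments = [], []
    for line in text.splitlines():
        if not line:
            continue  # ignore empty lines
        if line.startswith("@"):
            header.append(line)  # optional header section
        else:
            alignments.append(line.split("\t"))  # one alignment per line
    return header, alignments


# Hypothetical two-line header followed by a single alignment record.
sample = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
)
header, alignments = split_sam(sample)
print(len(header), alignments[0][0], alignments[0][3])
```

Each alignment record here carries the 11 mandatory fields; optional tags, if present, would simply appear as further TAB-separated columns.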

Sequence Read Archive

The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms. Data submitted to SRA is organized using a metadata model consisting of six objects: study, sample, experiment, run, analysis and submissi ...

Mouse Genome Database - a Mouse Genome Informatics (MGI) Resource

MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. Data includes gene characterization, nomenclature, mapping, gene homo ...

Gramene: A curated, open-source, integrated data resource for comparative functional genomics in plants

Gramene's purpose is to provide added value to plant genomics data sets available within the public sector, which will facilitate researchers' ability to understand the plant genomes and take advantage of genomic sequence known in one species for ide ...

Sequence Ontology

SO is a collaborative ontology project for the definition of sequence features used in biological sequence annotation. The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence. SO i ...

Insertion Sequence Finder

This database provides a list of insertion sequences (IS) isolated from bacteria and archaea. It is organized into individual files containing their general features (name, size, origin, family, etc.) as well as their DNA and potential protein sequence ...


CRISPRCasdb acts as a gateway to a publicly accessible database and software to enable the easy detection of CRISPR sequences in locally-produced data and the consultation of CRISPR sequence data present in the database. It also gives information on ...

Genomes OnLine Database

The Genomes Online Database provides access to information regarding genome and metagenome sequencing projects, and their associated metadata, around the world. Information in GOLD is organized into four levels: Study, Biosample/Organism, Sequencing ...

Gene Disruption Project Database

The GDP Database provides a public resource of gene disruptions of Drosophila genes using a single transposable element.

NCBI BioSample

The NCBI BioSample database stores submitter-supplied descriptive information, or metadata, about the biological materials from which data stored in NCBI’s primary data archives are derived. NCBI’s archives host data from diverse types of samples fro ...

Aspergillus Genome Database

The Aspergillus Genome Database is a resource for genomic sequence data as well as gene and protein information for Aspergilli. This publicly available repository is a central point of access to genome, transcriptome and polymorphism data for the fun ...


GeneDB is a genome database for prokaryotic and eukaryotic organisms and provides a portal through which data generated by the "Pathogen Genomics" group at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be accessed ...

The Vertebrate Genome Annotation Database

The Vertebrate Genome Annotation (VEGA) database is a central repository for high quality manual annotation of vertebrate finished genome sequence.


BAliBASE is a benchmark alignment database that includes enhancements for repeats, transmembrane sequences and circular permutations.


HOGENOM is a phylogenomic database providing families of homologous genes and associated phylogenetic trees (and sequence alignments) for a wide set of sequenced organisms.

Minimotif Miner 3.0

A database of short functional motifs involved in posttranslational modifications, binding to other proteins, nucleic acids, or small molecules.

Binary Alignment Map Format

BAM is the compressed binary version of the Sequence Alignment/Map (SAM) format, a compact and indexable representation of nucleotide sequence alignments. Many next-generation sequencing and analysis tools work with SAM/BAM. For custom track display, ...


Peroxibase provides access to peroxidase sequences from all kingdoms of life, and provides a series of bioinformatics tools and facilities suitable for analysing these sequences.

Mitochondrial Disease Sequence Data Resource

The Mitochondrial Disease Sequence Data Resource (MSeqDR) is a centralized genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phen ...

Saccharomyces cerevisiae Transcription Factor Database

ScerTF is a database of position weight matrices (PWMs) for transcription factors in Saccharomyces species. It identifies a single matrix for each TF that best predicts in vivo data, providing metrics related to the performance of that matrix in accu ...

MycoBrowser tuberculosis

Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmat ...

MycoBrowser leprae

Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmat ...

MycoBrowser smegmatis

Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmat ...

MycoBrowser marinum

Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmat ...


TargetTrack, a target registration database, provides information on the experimental progress and status of targets selected for structure determination.

SitEx database of eukaryotic protein functional sites

SitEx is a database containing information on eukaryotic protein functional sites. It stores the amino acid sequence positions in the functional site, in relation to the exon structure of the encoding gene. This can be used to detect the exons involved in ...


SCPortalen is a single-cell database created to facilitate and enable researchers to access and explore published single-cell datasets. It integrates human and mouse single-cell transcriptomics datasets, single-cell metadata, cell images and sequence ...


CentrosomeDB is a collection of human and Drosophila centrosomal genes that were reported in the literature and other sources. The database offers the possibility to study the evolution, function, and structure of the centrosome. They have compiled i ...

Short Read Archive eXtensible Markup Language

The SRA data model contains the following objects: Study: information about the sequencing project Sample: information about the sequenced samples Experiment: information about the libraries, platform; associated with study, sample(s) and run(s) Run: ...

Prokaryotic Glycoproteins Database

ProGlycProt (Prokaryotic Glycoproteins) is a manually curated, comprehensive repository of experimentally characterized eubacterial and archaeal glycoproteins, generated from an exhaustive literature search. This is the focused beginning of an effort ...

Global Initiative on Sharing Avian Influenza Data

The GISAID Initiative promotes the international sharing of all influenza virus sequences, related clinical and epidemiological data associated with human viruses, and geographical as well as species-specific data associated with avian and other anim ...

Structural and functional annotation of Arabidopsis thaliana gene and protein families

GeneFarm is a database whose purpose is to store traceable annotations for Arabidopsis nuclear genes and gene products.

Functional Coverage of the Proteome

FCP is a publicly accessible web tool dedicated to analysing the current state and trends on the population of available structures along the classification schemes of enzymes and nuclear receptors, offering both graphical and quantitative data on th ...

INSD sequence record XML

The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI. INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional ...

Patent Data Resources

Patent data resources at the EBI contain patent abstracts, patent chemical compounds, patent sequences and patent equivalents. Multiple sets of patent sequences are available at EBI. Patent proteins cover sequences of EPO (European Patent Office) pro ...

New Hampshire eXtended Format

NHX is based on the New Hampshire (NH) standard (also called "Newick tree format").

Newick tree Format

The Newick Standard for representing trees in computer-readable form makes use of the correspondence between trees and nested parentheses, noticed in 1857 by the famous English mathematician Arthur Cayley.
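
The correspondence between trees and nested parentheses can be illustrated with a small recursive parser. This is a sketch under simplifying assumptions: the name `parse_newick` is hypothetical, quoted labels and comments are not handled, and any branch length (`:0.1`) is simply kept as part of the label text.

```python
def parse_newick(s):
    """Parse a Newick string into nested (label, children) tuples."""
    s = s.strip()
    if not s.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume '('
            children.append(node())
            while s[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            if s[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ')'
        start = pos
        while s[pos] not in ",();":
            pos += 1  # read the (possibly empty) label
        return (s[start:pos], children)

    tree = node()
    if s[pos] != ";":
        raise ValueError("unexpected text after the root node")
    return tree


def count_leaves(tree):
    """Count nodes that have no children."""
    label, children = tree
    return 1 if not children else sum(count_leaves(c) for c in children)


tree = parse_newick("(A,B,(C,D)E)F;")
print(tree[0], count_leaves(tree))
```

Each pair of parentheses becomes one internal node, so the nesting depth of the text mirrors the depth of the tree.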

Multiple Alignment Format

The Multiple Alignment Format stores DNA-level multiple alignments between entire genomes in an easily readable format. Unlike previous formats this resource can cope with forward and reverse strand directions, multiple pieces to the alignment, and s ...

Stockholm Multiple Alignment Format

The "Stockholm" format is a system for marking up features in a multiple alignment. These mark-up annotations are preceded by a 'magic' label, of which there are four types. The Stockholm format is used by HMMER, Pfam, and Belvu.
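
As a rough illustration of the four mark-up types, the sketch below counts annotation lines by their leading label: `#=GF` (per file), `#=GC` (per column), `#=GS` (per sequence) and `#=GR` (per residue). The function name `count_markup` and the sample alignment are hypothetical, and the sketch only inspects the first four characters of each line.

```python
from collections import Counter

# The four mark-up labels used in Stockholm files.
MARKUP_TYPES = ("#=GF", "#=GC", "#=GS", "#=GR")


def count_markup(text):
    """Count Stockholm mark-up lines by label type."""
    counts = Counter()
    for line in text.splitlines():
        tag = line[:4]
        if tag in MARKUP_TYPES:
            counts[tag] += 1
    return counts


# Hypothetical one-sequence alignment with one line of each mark-up type.
sample = (
    "# STOCKHOLM 1.0\n"
    "#=GF ID example\n"
    "#=GS seq1 AC P00001\n"
    "seq1 ACDE\n"
    "#=GR seq1 SS HHHH\n"
    "#=GC SS_cons HHHH\n"
    "//\n"
)
print(dict(count_markup(sample)))
```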

ENA Sequence Flat File Format

ENA Sequence Flat File Format is a standardised plain text format for nucleotide sequences. This format was previously called the EMBL Sequence Flat File Format.

NCBI Trace Archives

The Trace Archives includes the following archives: The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw ...

Access to Biological Collection Data DNA extension

ABCDDNA is a theme specific extension for ABCD (Access to Biological Collections Data) created to facilitate storage and exchange of data related to DNA collection units, such as DNA extraction specifics, DNA quality parameters, and data characterisi ...

GeoSpecies Ontology

This ontology was designed to help integrate species concepts with species occurrences, gene sequences, images, references and geographical information.


Discovery of Broadly Neutralizing Antibodies (bNAbs) has given a great boost to HIV vaccine research. Study of bNAbs capable of neutralizing a broad array of different HIV strains is important for a number of reasons: (i) structures of antigens co-cr ...

Nucleotide Information Binary Format

The .nib format pre-dates the .2bit format and is less compact. It describes a DNA sequence by packing two bases into each byte.
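
Packing two bases into each byte means giving every base a 4-bit code. The sketch below shows the idea only; the code table (T=0, C=1, A=2, G=3, N=4), the choice of putting the first base in the high-order four bits, and the function names are assumptions made for illustration. A real .nib file also begins with a header, which this sketch omits.

```python
# Assumed 4-bit codes, used here for illustration only.
BASE_TO_CODE = {"T": 0, "C": 1, "A": 2, "G": 3, "N": 4}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def pack_bases(seq):
    """Pack a DNA string into bytes, two bases (4 bits each) per byte."""
    codes = [BASE_TO_CODE[b] for b in seq.upper()]
    if len(codes) % 2:
        codes.append(0)  # pad the final low nibble for odd-length input
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_bases(data, n_bases):
    """Recover the first n_bases bases from packed bytes."""
    bases = []
    for byte in data:
        bases.append(CODE_TO_BASE[byte >> 4])    # high nibble: first base
        bases.append(CODE_TO_BASE[byte & 0x0F])  # low nibble: second base
    return "".join(bases[:n_bases])


packed = pack_bases("ACGTN")
print(packed.hex(), unpack_bases(packed, 5))
```

Because each base occupies four bits rather than two, this layout takes twice the space of a 2-bit encoding, which is consistent with the format being less compact than .2bit.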

Protein Information Resource Format

This PIR Database File Structure and Format Specification describes the files comprising the PIR-International Protein Sequence Database and the format of each. The format has been enhanced significantly for Release 39.00 to what is referred to as "e ...

Axt Alignment Format

Axt Alignment files are produced from Blastz, an alignment tool available from Webb Miller's lab at Penn State University. The axtNet and axtChain alignments are produced by processing the alignment files with additional utilities written by Jim Kent ...

Chain Format for pairwise alignment

The chain format describes a pairwise alignment that allows gaps in both sequences simultaneously. Each set of chain alignments starts with a header line, contains one or more alignment data lines, and terminates with a blank line. The format is delib ...
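
The header line / data lines / blank line layout can be read with a short loop. This is a minimal sketch: the function name `parse_chains` and the sample record are hypothetical, and it assumes that header lines begin with the word `chain` and that data lines hold whitespace-separated integers, without interpreting the individual fields.

```python
def parse_chains(text):
    """Group chain-format text into records of header fields and data lines."""
    chains = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("chain"):
            # Header line: start a new chain, keeping the fields after 'chain'.
            current = {"header": line.split()[1:], "blocks": []}
            chains.append(current)
        elif line and current is not None:
            # Alignment data line belonging to the current chain.
            current["blocks"].append([int(x) for x in line.split()])
        else:
            # A blank line terminates the current chain.
            current = None
    return chains


# Hypothetical single chain with three data lines.
sample = (
    "chain 1000 chr1 500 + 0 30 chr2 400 + 10 42 1\n"
    "9 1 0\n"
    "10 0 3\n"
    "10\n"
    "\n"
)
for chain in parse_chains(sample):
    print(chain["header"][1], chain["blocks"])
```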

Gene Prediction File Format

Gene Prediction File Format (genePred) is a table format commonly used for gene prediction tracks in the Genome Browser. Variations of genePred include standard format, extended format and a format which includes RefSeq genes with gene names.

Standard Flowgram Format

Standard flowgram format (SFF) is a binary file format used to encode results of pyrosequencing from the 454 Life Sciences platform for high-throughput sequencing. SFF files can be viewed, edited and converted with DNA Baser SFF Workbench (graphic to ...

