CRAM

CRAM is a sequencing read file format that achieves high space efficiency through reference-based compression of sequence data, and it offers both lossless and lossy compression modes. Building on an early proof of principle for reference-based compression (Hsi-Yang Fritz et al. (2011), Genome Res. 21:734-740), the CRAM format balances usability with compression efficiency.
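The core idea of reference-based compression can be illustrated with a small sketch. It is not the CRAM codec, whose real encoding of read features, containers and entropy coders is far more elaborate. It only shows why storing an aligned read as "position plus differences from the reference" is compact and still lossless for the bases, and how a lossy option might coarsen quality scores. It assumes an ungapped alignment at a known 0-based position. The names `EncodedRead`, `encode_read`, `decode_read` and `bin_qualities`, and the bin edges and representative values, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class EncodedRead:
    """Hypothetical compact record: where the read aligns plus its differences."""
    position: int                          # 0-based start on the reference
    length: int                            # number of bases in the read
    substitutions: List[Tuple[int, str]]   # (offset within read, read base)


def encode_read(reference: str, read: str, position: int) -> EncodedRead:
    """Keep only bases that differ from the reference (ungapped alignment assumed)."""
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    subs = [(i, base) for i, (ref_base, base) in enumerate(zip(window, read))
            if ref_base != base]
    return EncodedRead(position, len(read), subs)


def decode_read(reference: str, enc: EncodedRead) -> str:
    """Rebuild the original bases exactly from the reference and the differences."""
    bases = list(reference[enc.position:enc.position + enc.length])
    for offset, base in enc.substitutions:
        bases[offset] = base
    return "".join(bases)


def bin_qualities(quals: Sequence[int],
                  edges: Sequence[int] = (10, 20, 30),
                  reps: Sequence[int] = (6, 15, 25, 37)) -> List[int]:
    """Illustrative lossy step: replace each Phred score with its bin's representative."""
    # The bin index is the number of edges the score meets or exceeds.
    return [reps[sum(q >= e for e in edges)] for q in quals]


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGT"
    read = "GTACCTAC"                      # one mismatch against reference[2:10]
    enc = encode_read(reference, read, 2)
    print(enc)                             # EncodedRead(position=2, length=8, substitutions=[(4, 'C')])
    print(decode_read(reference, enc) == read)
    print(bin_qualities([5, 10, 25, 40]))  # [6, 15, 25, 37]
```

A read that matches the reference closely costs little more than its position and length, which is the main source of the space savings; the base reconstruction stays exact, while quality binning discards information and is therefore only appropriate where lossy storage is acceptable.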

Webpage:

https://www.sanger.ac.uk/science/tools/cram

Publications:

Efficient storage of high throughput DNA sequencing data using reference-based compression
PMID: 21245279
metasource: Fairsharing.org

The Scramble conversion tool
PMID: 24930138
metasource: Fairsharing.org

123 articles citing: Efficient storage of high throughput DNA sequencing data using reference-based compression

Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. PMID:35199087
GA4GH: International policies and standards for data sharing across genomic research and healthcare. PMID:35072136
CRAM 3.1: Advances in the CRAM File Format. PMID:34999766
SamQL: a structured query language and filtering tool for the SAM/BAM file format. PMID:34600480
PIGG defines the Emm blood group system. PMID:34535746
Hamming-shifting graph of genomic short reads: Efficient construction and its application for compression. PMID:34280186
Refget: standardised access to reference sequences. PMID:34260694
FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy. PMID:33752596
Megadepth: efficient coverage quantification for BigWigs and BAMs. PMID:33693500
HTSlib: C library for reading/writing high-throughput sequencing data. PMID:33594436
Twelve years of SAMtools and BCFtools. PMID:33590861
Sharp Second-Order Pointwise Asymptotics for Lossless Compression with Side Information. PMID:33286477
Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes. PMID:33265483
Efficient DNA sequence compression with neural networks. PMID:33179040
Revealing Prognosis-Related Pathways at the Individual Level by a Comprehensive Analysis of Different Cancer Transcription Data. PMID:33138076
Practical guide for managing large-scale human genome data in research. PMID:33097812
IonCRAM: a reference-based compression tool for ion torrent sequence files. PMID:32907531
Chromatin binding of FOXA1 is promoted by LSD1-mediated demethylation in prostate cancer. PMID:32868907
Towards standardization guidelines for in silico approaches in personalized medicine. PMID:32827396
A systematic comparison of pharmacogene star allele calling bioinformatics algorithms: a focus on CYP2D6 genotyping. PMID:32789024
Practical estimation of cloud storage costs for clinical genomic data. PMID:32529017
Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review. PMID:32453750
Genomic Sequencing Capacity, Data Retention, and Personal Access to Raw Data in Europe. PMID:32435258
How Can Law and Policy Advance Quality in Genomic Analysis and Interpretation for Clinical Care? PMID:32342785
Tximeta: Reference sequence checksums for provenance identification in RNA-seq. PMID:32097405
GABAC: an arithmetic coding solution for genomic data. PMID:31830243
svtools: population-scale analysis of structural variation. PMID:31218349
Mind the gap: resources required to receive, process and interpret research-returned whole genome data. PMID:31161416
Cram-JS: reference-based decompression in node and the browser. PMID:31099383
Genomic Analysis in the Age of Human Genome Sequencing. PMID:30901550
Tackling the Challenges of FASTQ Referential Compression. PMID:30792576
Alfred: interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing. PMID:30520945
TRCMGene: A two-step referential compression method for the efficient storage of genetic data. PMID:30395579
BdBG: a bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs. PMID:30364599
Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. PMID:30279509
Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application. PMID:30084865
The Terabase Search Engine: a large-scale relational database of short-read sequences. PMID:30052772
Crumble: reference free lossy compression of sequence quality values. PMID:29992288
Genomic big data hitting the storage bottleneck. PMID:29782620
Diversity of fungi associated with roots of Calanthe orchid species in Korea. PMID:29299843
CALQ: compression of quality values of aligned sequencing data. PMID:29186284
Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. PMID:29092050
GeneComp, a new reference-based compressor for SAM files. PMID:29046896
Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification. PMID:28825060
Alignment of 1000 Genomes Project reads to reference assembly GRCh38. PMID:28531267
Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. PMID:28396521
The RNASeq-er API-a gateway to systematically updated analysis of public RNA-seq data. PMID:28369191
LW-FQZip 2: a parallelized reference-based compression of FASTQ files. PMID:28320326
Using reference-free compressed data structures to analyze sequencing reads from thousands of human genomes. PMID:27986821
ascatNgs: Identifying Somatically Acquired Copy-Number Alterations from Whole-Genome Sequencing Data. PMID:27930809
cgpCaVEManWrapper: Simple Execution of CaVEMan in Order to Detect Somatic Single Nucleotide Variants in NGS Data. PMID:27930805
A privacy-preserving solution for compressed storage and selective retrieval of genomic data. PMID:27789525
Comparison of high-throughput sequencing data compression tools. PMID:27776113
A new algorithm for "the LCS problem" with application in compressing genome resequencing data. PMID:27556803
Towards precision medicine. PMID:27528417
Fourth Generation of Next-Generation Sequencing Technologies: Promise and Consequences. PMID:27406789
Boiler: lossy compression of RNA-seq alignments using coverage vectors. PMID:27298258
Recommendations on e-infrastructures for next-generation sequencing. PMID:27267963
VariantBam: filtering and profiling of next-generational sequencing data using region-specific rules. PMID:27153727
The challenges of big data. PMID:27147249
CARGO: effective format-free compressed storage of genomic information. PMID:27131376
Novel bioinformatic developments for exome sequencing. PMID:27075447
Compressive mapping for next-generation sequencing. PMID:27054987
The real cost of sequencing: scaling computation to keep pace with data generation. PMID:27009100
Effect of lossy compression of quality scores on variant calling. PMID:26966283
MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression. PMID:26895947
The European Bioinformatics Institute in 2016: Data growth and integration. PMID:26673705
The International Nucleotide Sequence Database Collaboration. PMID:26657633
Biological data sciences in genome research. PMID:26430150
Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. PMID:26370285
elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling. PMID:26182406
Big Data: Astronomical or Genomical? PMID:26151137
ERGC: an efficient referential genome compression algorithm. PMID:26139636
Compression of Large genomic datasets using COMRAD on Parallel Computing Platform. PMID:26124572
GDC 2: Compression of large collections of genomes. PMID:26108279
LFQC: a lossless compression algorithm for FASTQ files. PMID:26093148
Light-weight reference-based compression of FASTQ data. PMID:26051252
QVZ: lossy compression of quality values. PMID:26026138
Data-dependent bucketing improves reference-free compression of sequencing reads. PMID:25910696
Quality score compression improves genotyping accuracy. PMID:25748910
Extending reference assembly models. PMID:25651527
Reference-based compression of short-read sequences using path encoding. PMID:25649622
Streamlined Genome Sequence Compression using Distributed Source Coding. PMID:25520552
The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module. PMID:25414846
The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. PMID:25410596
Aligned genomic data compression via improved modeling. PMID:25395305
Fast lossless compression via cascading Bloom filters. PMID:25252952
Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. PMID:24958926
The Scramble conversion tool. PMID:24930138
Advances in genome studies in plants and animals. PMID:24626952
ENZYMAP: exploiting protein annotation for modeling and predicting EC number changes in UniProt/Swiss-Prot. PMID:24586563
XS: a FASTQ read simulator. PMID:24433564
HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. PMID:24368726
SRComp: short read sequence compression using burstsort and Elias omega coding. PMID:24349065
DNA-COMPACT: DNA COMpression based on a pattern-aware contextual modeling technique. PMID:24282536
The European Bioinformatics Institute's data resources 2014. PMID:24271396
Compression of structured high-throughput sequencing data. PMID:24260313
Data compression for sequencing data. PMID:24252160
Human neuroimaging as a "Big Data" science. PMID:24113873
Short read alignment with populations of genomes. PMID:23813006
QualComp: a new lossy compressor for quality scores based on rate distortion theory. PMID:23758828
Using Genome Query Language to uncover genetic variation. PMID:23751181
Sequence squeeze: an open contest for sequence compression. PMID:23596984
Computational solutions for omics data. PMID:23594911
Existing and emerging technologies for tumor genomic profiling. PMID:23589546
The future of DNA sequence archiving. PMID:23587147
Compression of FASTQ and SAM format sequencing data. PMID:23533605
Facing growth in the European Nucleotide Archive. PMID:23203883
Adaptive efficient compression of genomes. PMID:23146997
NGC: lossless and lossy compression of aligned high-throughput sequencing data. PMID:23066097
SCALCE: boosting sequence compression algorithms using locally consistent encoding. PMID:23047557
Compression of next-generation sequencing reads aided by highly efficient de novo assembly. PMID:22904078
Compressive genomics. PMID:22781691
Metagenomics - a guide from sampling to data analysis. PMID:22587947
Genomics and privacy: implications of the new reality of closed data for the field. PMID:22144881
GReEn: a tool for efficient compression of genome resequencing data. PMID:22139935
Next-generation sequencing technologies and applications for human genetic history and forensics. PMID:22115430
Major submissions tool developments at the European Nucleotide Archive. PMID:22080548
The Sequence Read Archive: explosive growth of sequencing data. PMID:22009675
ReCoil - an algorithm for compression of extremely large datasets of dna data. PMID:21988957
Developing and implementing an institute-wide data sharing policy. PMID:21955348
The real cost of sequencing: higher than you think! PMID:21867570
SEED: efficient clustering of next-generation sequences. PMID:21810899

16 articles citing: The Scramble conversion tool

CRAM 3.1: Advances in the CRAM File Format. PMID:34999766
HTSlib: C library for reading/writing high-throughput sequencing data. PMID:33594436
Genozip - A Universal Extensible Genomic Data Compressor. PMID:33585897
IonCRAM: a reference-based compression tool for ion torrent sequence files. PMID:32907531
Genomic Sequencing Capacity, Data Retention, and Personal Access to Raw Data in Europe. PMID:32435258
GABAC: an arithmetic coding solution for genomic data. PMID:31830243
Mind the gap: resources required to receive, process and interpret research-returned whole genome data. PMID:31161416
Cram-JS: reference-based decompression in node and the browser. PMID:31099383
CALQ: compression of quality values of aligned sequencing data. PMID:29186284
GeneComp, a new reference-based compressor for SAM files. PMID:29046896
Comparison of high-throughput sequencing data compression tools. PMID:27776113
CARGO: effective format-free compressed storage of genomic information. PMID:27131376
Novel bioinformatic developments for exome sequencing. PMID:27075447
Sambamba: fast processing of NGS alignment formats. PMID:25697820
Reference-based compression of short-read sequences using path encoding. PMID:25649622
DeeZ: reference-based compression by local assembly. PMID:25357237

Tags:

dna sequence data
metasource: Fairsharing.org

high throughput screening
metasource: Fairsharing.org

next generation dna sequencing
metasource: Fairsharing.org

rna sequencing
metasource: Fairsharing.org

More to explore:

Minimal Information about a high throughput SEQuencing Experiment

PhenoDigm

Portable Network Graphics

European Nucleotide Archive

DDBJ Sequence Read Archive

Wellcome Trust Sanger Institute, Scientific resources

Sequence Read Archive

Minimum Information for Reporting Next Generation Sequencing Genotyping

Pre-Clustering File Format

National Omics Data Encyclopedia

Joint Photographic Experts Group Format

BioXpress

EKPD

WGE

Genome Variation Format

ArrayExpress

Wiggle Track Format

The SEQanswers wiki

International HLA and Immunogenetics Workshop XML

Bioconductor