Nanopore signal compression benchmark data in BLOW5 format
Data files
Oct 14, 2024 version files 246.36 GB
- nanopore_sigpress_data.tar (246.36 GB)
- README.md (1.88 KB)
Abstract
This dataset is a collection of HG001 (NA12878 DNA), HG002 (NA24385 DNA) and Universal Human Reference RNA (UHR) reference samples sequenced on different combinations of Oxford Nanopore Technologies devices and flowcells. The intended use case is, but is not limited to, benchmarking different nanopore signal compression methods. The original data, in FAST5/POD5 format, was converted to BLOW5 format using slow5tools.
Description of the data and file structure
Extract the .tar file using the following command:
tar xf nanopore_sigpress_data.tar
You should see the following BLOW5 files:
# Reads belonging to chr22 of an HG001 sample sequenced on a PromethION using an R9.4.1 flowcell
nanopore_sigpress_data/hg001_dna_r9.4.1_prom_chr22/na12878_prom_merged_r9.4.1_chr22.blow5
# An HG002 sample sequenced on a MinION using an R10.4.1 flowcell (5 kHz sampling rate)
nanopore_sigpress_data/hg002_dna_r10.4.1_min_5khz/MGXZXX230413_reads.blow5
# Reads belonging to chr22 of an HG002 sample sequenced on a PromethION using an R10.4.1 flowcell (4 kHz sampling rate)
nanopore_sigpress_data/hg002_dna_r10.4.1_prom_4khz_chr22/PGXX22394_reads_chr22.blow5
# Reads belonging to chr22 of an HG002 sample sequenced on a PromethION using an R10.4.1 flowcell (5 kHz sampling rate)
nanopore_sigpress_data/hg002_dna_r10.4.1_prom_5khz_chr22/PGXXXX230339_reads_chr22.blow5
# A UHR sample sequenced on a PromethION using an R9.4.1 flowcell
nanopore_sigpress_data/uhr_rna_r9.4.1_prom/PRPN119035_reads.blow5
# A subset of 500,000 reads from a UHR sample sequenced on a PromethION using an RP4 (RNA004) flowcell
nanopore_sigpress_data/uhr_rna_rna004_prom_500k_reads/PNXRXX240011_reads_500k.blow5
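After extracting an archive of this size, it can be worth confirming that all six files listed above are actually present before starting a benchmark. The following is a minimal sketch, not part of the dataset: the helper names `check_blow5_files` and `quickcheck_blow5_files` are hypothetical, and the second helper assumes that slow5tools is installed, on your PATH, and provides the `quickcheck` subcommand.

```shell
#!/bin/sh
# Relative paths of the six BLOW5 files listed above.
EXPECTED_BLOW5='hg001_dna_r9.4.1_prom_chr22/na12878_prom_merged_r9.4.1_chr22.blow5
hg002_dna_r10.4.1_min_5khz/MGXZXX230413_reads.blow5
hg002_dna_r10.4.1_prom_4khz_chr22/PGXX22394_reads_chr22.blow5
hg002_dna_r10.4.1_prom_5khz_chr22/PGXXXX230339_reads_chr22.blow5
uhr_rna_r9.4.1_prom/PRPN119035_reads.blow5
uhr_rna_rna004_prom_500k_reads/PNXRXX240011_reads_500k.blow5'

# check_blow5_files ROOT  (hypothetical helper)
# Prints OK or MISSING for each expected file under ROOT and
# returns non-zero if at least one file is missing.
check_blow5_files() {
    root=${1:-nanopore_sigpress_data}
    missing=0
    for rel in $EXPECTED_BLOW5; do
        if [ -f "$root/$rel" ]; then
            echo "OK       $root/$rel"
        else
            echo "MISSING  $root/$rel"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# quickcheck_blow5_files ROOT  (hypothetical helper)
# Assumption: slow5tools is installed and on PATH. Runs its quickcheck
# subcommand on each file and stops at the first failure.
quickcheck_blow5_files() {
    root=${1:-nanopore_sigpress_data}
    for rel in $EXPECTED_BLOW5; do
        slow5tools quickcheck "$root/$rel" || return 1
    done
}
```

From the directory where the archive was extracted, a typical call would be `check_blow5_files nanopore_sigpress_data && quickcheck_blow5_files nanopore_sigpress_data`. The presence check only tests that the files exist; it does not validate their contents, which is why the integrity check is kept as a separate, optional step.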
