CONSULT: accurate contamination removal using locality-sensitive hashing
Data files
Mar 29, 2024 version files (231.12 GB)
Abstract
A fundamental question appears in many bioinformatics applications: Does a sequencing read belong to a large dataset of genomes from some broad taxonomic group, even when the closest match in the set is evolutionarily divergent from the query? For example, low-coverage genome sequencing (skimming) projects either assemble the organelle genome or compute genomic distances directly from unassembled reads. Using unassembled reads requires contamination detection because samples often include reads from unintended groups of species. Similarly, assembling the organelle genome requires distinguishing organelle reads from nuclear reads. While k-mer-based methods have shown promise in read-matching, prior studies have shown that existing methods are insufficiently sensitive for contamination detection. Here, we introduce a new read-matching tool called CONSULT that tests whether k-mers from a query fall within a user-specified distance of the reference dataset using locality-sensitive hashing. Taking advantage of the large-memory machines available nowadays, CONSULT libraries accommodate tens of thousands of microbial species. Our results show that CONSULT has higher true-positive and lower false-positive rates of contamination detection than leading methods such as Kraken-II and improves distance calculation from genome skims. We also demonstrate that CONSULT can distinguish organelle reads from nuclear reads, leading to dramatic improvements in skim-based mitochondrial assemblies.
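The abstract describes testing whether query k-mers fall within a user-specified distance of a reference using locality-sensitive hashing (LSH). As a rough illustration of that general technique, and not CONSULT's actual implementation, the Python sketch below builds a bit-sampling LSH index for Hamming distance (Hamming distance is assumed here). The class and function names, the one-k-mer read rule, and the parameter values k, h, n_tables and p are hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length k-mers differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Generic bit-sampling LSH index for Hamming distance (illustrative only).

    Each of `n_tables` tables keys a k-mer by the characters at `h` randomly
    chosen positions, so k-mers at small Hamming distance are more likely to
    share a bucket than distant k-mers. Bucket hits are then verified exactly.
    """

    def __init__(self, k: int, h: int, n_tables: int, p: int, seed: int = 0):
        rng = random.Random(seed)
        self.k, self.p = k, p  # p: hypothetical maximum Hamming distance
        self.positions = [sorted(rng.sample(range(k), h)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, kmer: str, table: int) -> str:
        # The hash key is the k-mer restricted to this table's sampled positions.
        return "".join(kmer[i] for i in self.positions[table])

    def add(self, kmer: str) -> None:
        for t in range(len(self.tables)):
            self.tables[t][self._key(kmer, t)].append(kmer)

    def matches(self, kmer: str) -> bool:
        """True if a reference k-mer sharing a bucket is within distance p."""
        for t in range(len(self.tables)):
            for candidate in self.tables[t].get(self._key(kmer, t), []):
                if hamming(kmer, candidate) <= self.p:
                    return True
        return False


def read_matches(read: str, index: BitSamplingLSH) -> bool:
    """Assumed rule: a read matches if any one of its k-mers matches."""
    k = index.k
    return any(index.matches(read[i:i + k]) for i in range(len(read) - k + 1))


if __name__ == "__main__":
    index = BitSamplingLSH(k=32, h=15, n_tables=2, p=3)
    index.add("ACGT" * 8)
    print(index.matches("ACGT" * 8))  # True: identical k-mer
    print(index.matches("T" * 32))    # False: 24 mismatches exceed p = 3
```

Identical k-mers always collide in every table, while a near match is found only if at least one table's sampled positions avoid all its mismatches; adding tables raises sensitivity at the cost of memory.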
README: Access to the data used for CONSULT benchmarking
Data belonging to the following paper:
- Rachtman, E., Bafna, V., & Mirarab, S. (2021). CONSULT: accurate contamination removal using locality-sensitive hashing. NAR Genomics and Bioinformatics. doi:10.1093/nargab/lqab071
Description of the data and file structure
Drosophila data
Genomes and genome skims used for the real Drosophila data analysis are provided.
Before clean-up
Dros_fastq_af_bbmerge.tar
This file contains deduplicated reads for Drosophila species before clean-up.
It contains the following Drosophila species in fq format:
- sub_Drosophila_ananassae_2.fq.gz: Drosophila ananassae
- sub_Drosophila_biarmipes_2.fq.gz: Drosophila biarmipes
- sub_Drosophila_bipectinata_2.fq.gz: Drosophila bipectinata
- sub_Drosophila_erecta_2.fq.gz: Drosophila erecta
- sub_Drosophila_eugracilis_2.fq.gz: Drosophila eugracilis
- sub_Drosophila_mauritiana_2.fq.gz: Drosophila mauritiana
- sub_Drosophila_mojavensis_2.fq.gz: Drosophila mojavensis
- sub_Drosophila_persimilis_2.fq.gz: Drosophila persimilis
- sub_Drosophila_pseudoobscura_2.fq.gz: Drosophila pseudoobscura
- sub_Drosophila_sechellia_2.fq.gz: Drosophila sechellia
- sub_Drosophila_simulans_2.fq.gz: Drosophila simulans
- sub_Drosophila_virilis_2.fq.gz: Drosophila virilis
- sub_Drosophila_willistoni_2.fq.gz: Drosophila willistoni
- sub_Drosophila_yakuba_2.fq.gz: Drosophila yakuba
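The same species appears in several archives under progressively longer file names (sub_..., ucseq_sub_..., ucseq_ucseq_sub_...), so it can help to map each file back to its species before comparing archives. The sketch below assumes only the naming pattern visible in this README; the regular expression and function name are hypothetical.

```python
import re

# Assumed pattern, inferred from the file names listed in this README:
# zero or more "ucseq_" prefixes, then "sub_", the species, and "_2.fq" or "_2.fq.gz".
_SKIM_NAME = re.compile(r"^(?:ucseq_)*sub_(Drosophila_[a-z]+)_2\.fq(?:\.gz)?$")


def species_from_filename(filename: str) -> str:
    """Return a species name such as 'Drosophila ananassae' for a skim file."""
    match = _SKIM_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename}")
    return match.group(1).replace("_", " ")


print(species_from_filename("sub_Drosophila_ananassae_2.fq.gz"))       # Drosophila ananassae
print(species_from_filename("ucseq_ucseq_sub_Drosophila_yakuba_2.fq"))  # Drosophila yakuba
```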
Dros_fastq_af_human_removed.tar
This file contains reads for Drosophila species before clean-up but after the removal of human reads. It contains the following Drosophila species in fq format:
- ucseq_sub_Drosophila_ananassae_2.fq.gz: Drosophila ananassae
- ucseq_sub_Drosophila_biarmipes_2.fq.gz: Drosophila biarmipes
- ucseq_sub_Drosophila_bipectinata_2.fq.gz: Drosophila bipectinata
- ucseq_sub_Drosophila_erecta_2.fq.gz: Drosophila erecta
- ucseq_sub_Drosophila_eugracilis_2.fq.gz: Drosophila eugracilis
- ucseq_sub_Drosophila_mauritiana_2.fq.gz: Drosophila mauritiana
- ucseq_sub_Drosophila_mojavensis_2.fq.gz: Drosophila mojavensis
- ucseq_sub_Drosophila_persimilis_2.fq.gz: Drosophila persimilis
- ucseq_sub_Drosophila_pseudoobscura_2.fq.gz: Drosophila pseudoobscura
- ucseq_sub_Drosophila_sechellia_2.fq.gz: Drosophila sechellia
- ucseq_sub_Drosophila_simulans_2.fq.gz: Drosophila simulans
- ucseq_sub_Drosophila_virilis_2.fq.gz: Drosophila virilis
- ucseq_sub_Drosophila_willistoni_2.fq.gz: Drosophila willistoni
- ucseq_sub_Drosophila_yakuba_2.fq.gz: Drosophila yakuba
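To check how many reads a clean-up step removed, one can count FASTQ records before and after it. A minimal sketch, assuming the common four-lines-per-record FASTQ layout (no wrapped sequence lines) and handling both gzipped .fq.gz files like those above and uncompressed .fq files; the function names are hypothetical.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file for text reading, decompressing it if it ends in .gz."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(lines) -> int:
    """Count FASTQ records in an iterable of lines (four lines per record)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def count_reads(path) -> int:
    """Count the reads in one .fq or .fq.gz file."""
    with open_fastq(path) as handle:
        return count_records(handle)
```

For example, after extracting an archive, count_reads("ucseq_sub_Drosophila_erecta_2.fq.gz") returns the number of reads left once human reads were removed.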
After filtering
Dros_fastq_af_consult_filt.tar
This file contains Drosophila reads (FASTQ) after filtering with CONSULT:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
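Given per-species read counts before and after a filtering step (for example, counted from the human-removed .fq.gz files and the CONSULT-filtered .fq files), the share of reads removed is 100 * (before - after) / before. A small sketch of that arithmetic; the function name is hypothetical and the counts in the usage line are made up for illustration.

```python
def removal_summary(before: dict, after: dict) -> dict:
    """Percent of reads removed per species: 100 * (before - after) / before."""
    summary = {}
    for species, n_before in sorted(before.items()):
        if n_before <= 0:
            raise ValueError(f"{species}: no reads before filtering")
        n_after = after[species]  # raises KeyError if a species has no filtered count
        if n_after > n_before:
            raise ValueError(f"{species}: more reads after filtering than before")
        summary[species] = 100.0 * (n_before - n_after) / n_before
    return summary


# Hypothetical counts, for illustration only.
print(removal_summary({"Drosophila erecta": 1000}, {"Drosophila erecta": 950}))
# {'Drosophila erecta': 5.0}
```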
Dros_fastq_af_consult_GTDBfilt_p3c1.tar.gz
This file contains Drosophila reads filtered with CONSULT against GTDB with settings p = 3, c = 1:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_consult_GTDBfilt_p3c2.tar.gz
This file contains Drosophila reads filtered with CONSULT against GTDB with settings p = 3, c = 2:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_consult_GTDBfilt_p4c2.tar
This file contains Drosophila reads filtered with CONSULT against GTDB with settings p = 4, c = 2:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_kraken_filt.tar
This file contains Drosophila reads (FASTQ) after filtering with Kraken:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_kraken_GTDBfilt_c0.00.tar
This file contains Drosophila reads after filtering with Kraken against GTDB with confidence 0.00, along with a distance matrix:
- dimtrx_kraken_gtdbcusTax_Drosfilt_conf0.0.txt: Distance matrix
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
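The archive above also includes a distance matrix (dimtrx_kraken_gtdbcusTax_Drosfilt_conf0.0.txt). Its exact layout is not documented here; the sketch below assumes a square, tab-separated table whose first row lists sample names and whose later rows each start with a sample name followed by distances. Check the file before relying on this parser; the function names are hypothetical.

```python
import csv


def parse_rows(rows) -> dict:
    """Turn header + labelled rows into {row_name: {column_name: distance}}."""
    header = rows[0][1:]  # skip the corner cell
    matrix = {}
    for row in rows[1:]:
        if not row:
            continue  # ignore blank lines
        name, values = row[0], row[1:]
        if len(values) != len(header):
            raise ValueError(f"row {name!r} has {len(values)} values, expected {len(header)}")
        matrix[name] = {col: float(v) for col, v in zip(header, values)}
    return matrix


def read_distance_matrix(path) -> dict:
    """Load an (assumed) tab-separated square distance matrix from disk."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    return parse_rows(rows)
```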
Dros_fastq_af_kraken_GTDBfilt_c0.04.tar.gz
This file contains Drosophila reads after filtering with Kraken against GTDB with confidence 0.04:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_kraken_GTDBfilt_c0.05.tar
This file contains Drosophila reads after filtering with Kraken against GTDB with confidence 0.05:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
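Some of the Drosophila archives are plain .tar files and others are .tar.gz. Python's standard tarfile module can read both with mode "r:*", which detects compression automatically. A minimal sketch for listing FASTQ members and streaming one member without extracting the archive; the helper names are hypothetical, and the streaming helper only suits uncompressed .fq members.

```python
import tarfile


def is_fastq_name(name: str) -> bool:
    """True for member names ending in .fq or .fq.gz."""
    return name.endswith((".fq", ".fq.gz"))


def list_fastq_members(archive_path: str) -> list:
    """List FASTQ file members of a .tar or .tar.gz archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        return [m.name for m in tar.getmembers() if m.isfile() and is_fastq_name(m.name)]


def iter_member_lines(archive_path: str, member_name: str):
    """Yield text lines of one uncompressed member without writing it to disk."""
    with tarfile.open(archive_path, "r:*") as tar:
        handle = tar.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file")
        for raw in handle:
            yield raw.decode()
```

For example, list_fastq_members("Dros_fastq_af_kraken_GTDBfilt_c0.05.tar") and list_fastq_members("Dros_fastq_af_kraken_GTDBfilt_c0.04.tar.gz") use the same call despite the different compression.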
GORG Dataset
We provide query summary reports (Kraken output) for GORG samples searched with Kraken against the TOL, GTDB, and Bact/Arch Kraken libraries:
- gorg_conf0.00_kraken.tar.gz: conf = 0.00
- gorg_conf0.02_kraken.tar.gz: conf = 0.02
- gorg_conf0.05_kraken.tar.gz: conf = 0.05
- gorg_conf0.04_krakenGTDB.tar.gz: conf = 0.04. As shown in the paper, this threshold was run only for the GTDB library, which did not have enough coverage at the other thresholds; thus, only GTDB results are included in this archive.
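If these reports follow the default Kraken 2 report layout (six tab-separated columns: percentage of reads in the clade, reads in the clade, reads assigned directly, rank code, taxonomy ID, and indented name), the fraction of a query's reads that Kraken classified can be read from the unclassified row (taxonomy ID 0) and the root row (taxonomy ID 1). A sketch under that assumption; the function names are hypothetical, and libraries built with a custom taxonomy or reports with extra columns may need adjustment.

```python
def classified_fraction(report_lines) -> float:
    """Fraction of reads classified, read from a Kraken 2-style report.

    Assumes six tab-separated columns per row: percent, clade reads,
    direct reads, rank code, taxonomy ID, name. The unclassified row is
    taken to have taxonomy ID 0 and the root row taxonomy ID 1.
    """
    unclassified = classified = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or unexpected lines
        clade_reads, taxid = int(fields[1]), fields[4].strip()
        if taxid == "0":
            unclassified = clade_reads
        elif taxid == "1":
            classified = clade_reads
    total = classified + unclassified
    return classified / total if total else 0.0


def classified_fraction_from_file(path) -> float:
    """Apply classified_fraction to a report file on disk."""
    with open(path) as handle:
        return classified_fraction(handle)
```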
We also provide the query sets used during testing:
- gorg_all_queries.tar.gz: Contains the FASTQ files of the GORG query set. Each .fq file is one query from GORG.
Mitochondrial Dataset:
Original sequencing data:
- filt_fastq.tar.gz: Contains filtered reads. Each FASTQ file is one query. Reads were filtered as described in the paper.
- unfiltered_fastq.tar.gz: Contains unfiltered reads.
Chloroplast Data:
Original sequencing data:
- fastq_affilt.tar.gz: Contains filtered reads.
Chloroplast assemblies using various assembly methods:
- filtered_spades: SPAdes applied to filtered reads
- seed: seed-and-extend method
- base directory: GetOrganelle
Each nested folder of the following archives includes GetOrganelle results (log files, assemblies in FASTA format, etc.). Note: refer to https://github.com/Kinggerm/GetOrganelle for a description of the files included in GetOrganelle results.
- getorganelle_afterfilt.tar.gz: obtained from filtered reads using GetOrganelle
- getorganelle_beforefilt.tar.gz: obtained from unfiltered reads using GetOrganelle
Chloroplast annotations:
- annotations.tar.gz: outputs of the annotation software GeSeq.
  - For each of the eight samples that failed to be assembled fully without filtering (ERR2114804, ERR2114804, SRR2531285, SRR5500897, SRR7685402, SRR2531285, SRR5500897, SRR7685402), we show results of both filtered (_filt) and unfiltered (_unfilt) annotations.
  - *.fa files show assemblies, *.gb files show annotation results, and *.jpg files are drawings of the annotations.
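One quick way to compare the filtered and unfiltered assemblies in the .fa files is to count contigs and total assembled length; fewer, longer sequences generally indicate a more complete assembly. A minimal FASTA-reading sketch using only the standard library; the function names are hypothetical, and duplicate record IDs are not handled (a later record with the same ID overwrites an earlier one).

```python
def fasta_lengths(lines) -> dict:
    """Map each FASTA record ID (first word of the header) to its sequence length."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequence lines may be wrapped
    return lengths


def summarize_assembly(path) -> dict:
    """Contig count, total length and longest contig for one .fa file."""
    with open(path) as handle:
        lengths = fasta_lengths(handle)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths.values()),
        "longest_bp": max(lengths.values(), default=0),
    }
```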
Bacterial Simulated queries
excluded_fna_fq_downSmpl10M.tar
This file contains the query samples of the TOL query set used in the study:
- 10x_Cca.fq: Plant Carya cathayensis
- 10x_Cil.fq: Plant Carya illinoinensis
- 10x_Oryza_sativa.fq: Plant Oryza sativa
- 10x_Prunus_persica.fq: Plant Prunus persica
- 15x_Arabidopsis_lyrata.fq: Plant Arabidopsis lyrata
- 20x_Arabidopsis_thaliana.fq: Plant Arabidopsis thaliana
- 250x_Bathycoccus_prasinos.fq: Plant Bathycoccus prasinos
- 2x_Nicotiana_sylvestris.fq: Plant Nicotiana sylvestris
- 2x_Zea_mays.fq: Plant Zea mays
- 5x_Coffee_arabica.fq: Plant Coffea arabica
- G000007005.fq: Bacterial/Archaeal species
- G000007185.fq: Bacterial/Archaeal species
- G000009965.fq: Bacterial/Archaeal species
- G000011125.fq: Bacterial/Archaeal species
- G000016385.fq: Bacterial/Archaeal species
- G000016525.fq: Bacterial/Archaeal species
- G000017185.fq: Bacterial/Archaeal species
- G000018365.fq: Bacterial/Archaeal species
- G000019605.fq: Bacterial/Archaeal species
- G000022365.fq: Bacterial/Archaeal species
- G000024305.fq: Bacterial/Archaeal species
- G000091665.fq: Bacterial/Archaeal species
- G000145295.fq: Bacterial/Archaeal species
- G000151105.fq: Bacterial/Archaeal species
- G000166095.fq: Bacterial/Archaeal species
- G000173675.fq: Bacterial/Archaeal species
- G000186365.fq: Bacterial/Archaeal species
- G000189555.fq: Bacterial/Archaeal species
- G000190155.fq: Bacterial/Archaeal species
- G000195935.fq: Bacterial/Archaeal species
- G000204585.fq: Bacterial/Archaeal species
- G000215995.fq: Bacterial/Archaeal species
- G000220645.fq: Bacterial/Archaeal species
- G000221185.fq: Bacterial/Archaeal species
- G000223395.fq: Bacterial/Archaeal species
- G000231015.fq: Bacterial/Archaeal species
- G000242875.fq: Bacterial/Archaeal species
- G000243455.fq: Bacterial/Archaeal species
- G000245135.fq: Bacterial/Archaeal species
- G000253055.fq: Bacterial/Archaeal species
- G000264495.fq: Bacterial/Archaeal species
- G000302455.fq: Bacterial/Archaeal species
- G000307305.fq: Bacterial/Archaeal species
- G000317795.fq: Bacterial/Archaeal species
- G000363885.fq: Bacterial/Archaeal species
- G000375685.fq: Bacterial/Archaeal species
- G000389735.fq: Bacterial/Archaeal species
- G000399765.fq: Bacterial/Archaeal species
- G000402095.fq: Bacterial/Archaeal species
- G000421185.fq: Bacterial/Archaeal species
- G000422285.fq: Bacterial/Archaeal species
- G000437835.fq: Bacterial/Archaeal species
- G000446015.fq: Bacterial/Archaeal species
- G000495715.fq: Bacterial/Archaeal species
- G000730285.fq: Bacterial/Archaeal species
- G000746745.fq: Bacterial/Archaeal species
- G000770635.fq: Bacterial/Archaeal species
- G000816105.fq: Bacterial/Archaeal species
- G000830275.fq: Bacterial/Archaeal species
- G000830295.fq: Bacterial/Archaeal species
- G000875775.fq: Bacterial/Archaeal species
- G000955905.fq: Bacterial/Archaeal species
- G000966265.fq: Bacterial/Archaeal species
- G001004105.fq: Bacterial/Archaeal species
- G001189275.fq: Bacterial/Archaeal species
- G001315825.fq: Bacterial/Archaeal species
- G001316025.fq: Bacterial/Archaeal species
- G001316045.fq: Bacterial/Archaeal species
- G001316145.fq: Bacterial/Archaeal species
- G001316265.fq: Bacterial/Archaeal species
- G001317345.fq: Bacterial/Archaeal species
- G001399695.fq: Bacterial/Archaeal species
- G001399795.fq: Bacterial/Archaeal species
- G001402855.fq: Bacterial/Archaeal species
- G001412615.fq: Bacterial/Archaeal species
- G001438895.fq: Bacterial/Archaeal species
- G001481595.fq: Bacterial/Archaeal species
- G001484685.fq: Bacterial/Archaeal species
- G001507935.fq: Bacterial/Archaeal species
- G001508175.fq: Bacterial/Archaeal species
- G001510225.fq: Bacterial/Archaeal species
- G001510275.fq: Bacterial/Archaeal species
- G001510295.fq: Bacterial/Archaeal species
- G001515215.fq: Bacterial/Archaeal species
- G001516665.fq: Bacterial/Archaeal species
- G001516725.fq: Bacterial/Archaeal species
- G001516745.fq: Bacterial/Archaeal species
- G001560165.fq: Bacterial/Archaeal species
- G001560565.fq: Bacterial/Archaeal species
- G001563335.fq: Bacterial/Archaeal species
- G001577775.fq: Bacterial/Archaeal species
- G001587655.fq: Bacterial/Archaeal species
- G001593925.fq: Bacterial/Archaeal species
- G001595885.fq: Bacterial/Archaeal species
- G001627075.fq: Bacterial/Archaeal species
- G001628455.fq: Bacterial/Archaeal species
- G001628475.fq: Bacterial/Archaeal species
- G001674955.fq: Bacterial/Archaeal species
- G001679155.fq: Bacterial/Archaeal species
- G001685465.fq: Bacterial/Archaeal species
- G001717005.fq: Bacterial/Archaeal species
- G001723845.fq: Bacterial/Archaeal species
- G001729285.fq: Bacterial/Archaeal species
- G001776015.fq: Bacterial/Archaeal species
- G001856825.fq: Bacterial/Archaeal species
- G001870125.fq: Bacterial/Archaeal species
- G001887595.fq: Bacterial/Archaeal species
- G001914405.fq: Bacterial/Archaeal species
- G001918455.fq: Bacterial/Archaeal species
- G001918475.fq: Bacterial/Archaeal species
- G001919175.fq: Bacterial/Archaeal species
- G001920575.fq: Bacterial/Archaeal species
- G001940645.fq: Bacterial/Archaeal species
- G001940655.fq: Bacterial/Archaeal species
- G001940665.fq: Bacterial/Archaeal species
- G002009975.fq: Bacterial/Archaeal species
- G002011035.fq: Bacterial/Archaeal species
- G002011075.fq: Bacterial/Archaeal species
- G900109425.fq: Bacterial/Archaeal species
- G900156635.fq: Bacterial/Archaeal species
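The query file names above follow two visible patterns: plant queries carry a numeric prefix such as 10x_ (presumably the simulated coverage), and bacterial/archaeal queries are named G followed by nine digits. A small sketch that sorts files into these two groups, assuming only those patterns; the regular expressions and function name are hypothetical.

```python
import re

# Assumed patterns, inferred only from the file names listed in this README.
_PLANT = re.compile(r"^(\d+)x_(.+)\.fq$")
_MICROBE = re.compile(r"^G\d{9}\.fq$")


def classify_query(filename: str) -> tuple:
    """Return ('plant', coverage, label) or ('bacterial/archaeal', None, id)."""
    match = _PLANT.match(filename)
    if match:
        return "plant", int(match.group(1)), match.group(2)
    if _MICROBE.match(filename):
        return "bacterial/archaeal", None, filename[: -len(".fq")]
    raise ValueError(f"unrecognized query file name: {filename}")


print(classify_query("250x_Bathycoccus_prasinos.fq"))  # ('plant', 250, 'Bathycoccus_prasinos')
print(classify_query("G900156635.fq"))                 # ('bacterial/archaeal', None, 'G900156635')
```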
Reference Libraries
Custom Kraken libraries constructed using different genomic reference sets are provided:
- kraken_db_gtdb_genomes_reps_r95_k35l31s7_cp.tar.gz: GTDB dataset with default Kraken taxonomy; this file is too big to be included here and is instead available at https://skmer.ucsd.edu/data/consult/kraken/
- tree_of_life_noViral_unmasked_k35_l31_s7_cp.tar.gz: TOL with default Kraken taxonomy
- tree_of_life_noViral_unmasked_k35_l31_s7_customtax_cp.tar.gz: TOL with custom taxonomy
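The library names appear to encode Kraken build parameters (k35, l31, s7). Assuming these correspond to Kraken 2's k-mer length, minimizer length, and number of minimizer spaces (the kraken2-build options --kmer-len, --minimizer-len and --minimizer-spaces), a small parser can recover them from an archive name. This interpretation is an assumption, and the function name is hypothetical.

```python
import re

# Assumption: "k<N>", "l<N>", "s<N>" (optionally separated by "_") encode the
# k-mer length, minimizer length and minimizer spaces used to build the library.
_PARAMS = re.compile(r"k(\d+)_?l(\d+)_?s(\d+)")


def kraken_params(archive_name: str) -> dict:
    """Extract the assumed k/l/s build parameters from a library archive name."""
    match = _PARAMS.search(archive_name)
    if match is None:
        raise ValueError(f"no k/l/s tag in {archive_name}")
    k, l, s = (int(g) for g in match.groups())
    return {"kmer_len": k, "minimizer_len": l, "minimizer_spaces": s}


print(kraken_params("kraken_db_gtdb_genomes_reps_r95_k35l31s7_cp.tar.gz"))
# {'kmer_len': 35, 'minimizer_len': 31, 'minimizer_spaces': 7}
```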
Sharing/Access information
See more on