CONSULT: accurate contamination removal using locality-sensitive hashing
Data files
Mar 29, 2024 version files: 231.12 GB
- annotations.tar.gz: 1.89 MB
- Dros_fastq_af_bbmerge.tar: 471.71 MB
- Dros_fastq_af_consult_filt.tar: 495 MB
- Dros_fastq_af_consult_GTDBfilt_p3c1.tar.gz: 495.39 MB
- Dros_fastq_af_consult_GTDBfilt_p3c2.tar: 516.16 MB
- Dros_fastq_af_consult_GTDBfilt_p4c2.tar: 512.38 MB
- Dros_fastq_af_human_removed.tar: 528.38 MB
- Dros_fastq_af_kraken_filt.tar: 461.68 MB
- Dros_fastq_af_kraken_GTDBfilt_c0.00.tar: 461.07 MB
- Dros_fastq_af_kraken_GTDBfilt_c0.04.tar: 504.72 MB
- Dros_fastq_af_kraken_GTDBfilt_c0.04.tar.gz: 505.12 MB
- Dros_fastq_af_kraken_GTDBfilt_c0.05.tar: 509.46 MB
- excluded_fna_fq_downSmpl10M.tar: 810.43 MB
- fastq_affilt.tar.gz: 49.84 GB
- filt_fastq.tar.gz: 2.72 GB
- getorganelle_afterfilt.tar.gz: 9.33 GB
- getorganelle_beforefilt.tar.gz: 26.27 GB
- gorg_all_queries.tar.gz: 4.95 GB
- gorg_conf0.00_kraken.tar.gz: 351.36 MB
- gorg_conf0.02_kraken.tar.gz: 264.48 MB
- gorg_conf0.04_krakenGTDB.tar.gz: 68.25 MB
- gorg_conf0.05_kraken.tar.gz: 44.44 MB
- README.md: 20.74 KB
- tree_of_life_noViral_unmasked_k35_l31_s7_cp.tar.gz: 37.03 GB
- tree_of_life_noViral_unmasked_k35_l31_s7_customtax_cp.tar.gz: 37.41 GB
- unfiltered_fastq.tar.gz: 56.57 GB
Abstract
A fundamental question appears in many bioinformatics applications: Does a sequencing read belong to a large dataset of genomes from some broad taxonomic group, even when the closest match in the set is evolutionarily divergent from the query? For example, low-coverage genome sequencing (skimming) projects either assemble the organelle genome or compute genomic distances directly from unassembled reads. Using unassembled reads needs contamination detection because samples often include reads from unintended groups of species. Similarly, assembling the organelle genome needs distinguishing organelle and nuclear reads. While k-mer-based methods have shown promise in read-matching, prior studies have shown that existing methods are insufficiently sensitive for contamination detection. Here, we introduce a new read-matching tool called CONSULT that tests whether k-mers from a query fall within a user-specified distance of the reference dataset using locality-sensitive hashing. Taking advantage of large memory machines available nowadays, CONSULT libraries accommodate tens of thousands of microbial species. Our results show that CONSULT has higher true-positive and lower false-positive rates of contamination detection than leading methods such as Kraken-II and improves distance calculation from genome skims. We also demonstrate that CONSULT can distinguish organelle reads from nuclear reads, leading to dramatic improvements in skim-based mitochondrial assemblies.
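As a rough illustration of the idea described in the abstract, the sketch below indexes reference k-mers with a simple position-sampling locality-sensitive hash, then calls a read a match when at least c of its k-mers lie within distance p of some reference k-mer. This is not the CONSULT implementation: the class and function names, the default parameters, and the use of Hamming distance are assumptions made for illustration, and reverse complements are ignored. Reading p as a distance threshold and c as a minimum k-mer count (as in archive names such as p3c1) is likewise an assumption here.

```python
import random


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class KmerLSH:
    """Toy LSH index: each table hashes a k-mer by a random subset of positions.

    Hypothetical helper for illustration only; not CONSULT's data structure.
    """

    def __init__(self, k, n_tables=2, n_positions=6, seed=0):
        rng = random.Random(seed)
        self.k = k
        # One random set of sampled positions per hash table.
        self.masks = [sorted(rng.sample(range(k), n_positions))
                      for _ in range(n_tables)]
        self.tables = [dict() for _ in range(n_tables)]

    @staticmethod
    def _key(kmer, mask):
        return "".join(kmer[i] for i in mask)

    def add(self, kmer):
        for mask, table in zip(self.masks, self.tables):
            table.setdefault(self._key(kmer, mask), []).append(kmer)

    def has_match(self, kmer, p):
        """True if a k-mer sharing a bucket is within Hamming distance p."""
        for mask, table in zip(self.masks, self.tables):
            for candidate in table.get(self._key(kmer, mask), ()):
                if hamming(candidate, kmer) <= p:
                    return True
        return False


def build_index(reference, k, **kwargs):
    """Index every k-mer of a reference sequence."""
    index = KmerLSH(k, **kwargs)
    for i in range(len(reference) - k + 1):
        index.add(reference[i:i + k])
    return index


def read_matches(read, index, p, c):
    """Call the read a match once c of its k-mers have a neighbour within p."""
    hits = 0
    for i in range(len(read) - index.k + 1):
        if index.has_match(read[i:i + index.k], p):
            hits += 1
            if hits >= c:
                return True
    return False
```

Because candidates found in a bucket are verified with the exact distance, this sketch never reports a k-mer farther than p; it can, however, miss a true neighbour whose sampled positions differ in every table, which is the usual trade-off that more tables or fewer sampled positions reduce.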
Data belonging to the following paper:
- Rachtman, E., Bafna, V., & Mirarab, S. (2021). CONSULT: accurate contamination removal using locality-sensitive hashing. NAR Genomics and Bioinformatics. doi:10.1093/nargab/lqab071
Description of the data and file structure
Drosophila data
Genome and genome skims used for real Drosophila data analysis are provided.
Before clean-up
Dros_fastq_af_bbmerge.tar
This file contains deduplicated reads for Drosophila species before clean-up. It contains the following Drosophila species in fq format:
- sub_Drosophila_ananassae_2.fq.gz: Drosophila ananassae
- sub_Drosophila_biarmipes_2.fq.gz: Drosophila biarmipes
- sub_Drosophila_bipectinata_2.fq.gz: Drosophila bipectinata
- sub_Drosophila_erecta_2.fq.gz: Drosophila erecta
- sub_Drosophila_eugracilis_2.fq.gz: Drosophila eugracilis
- sub_Drosophila_mauritiana_2.fq.gz: Drosophila mauritiana
- sub_Drosophila_mojavensis_2.fq.gz: Drosophila mojavensis
- sub_Drosophila_persimilis_2.fq.gz: Drosophila persimilis
- sub_Drosophila_pseudoobscura_2.fq.gz: Drosophila pseudoobscura
- sub_Drosophila_sechellia_2.fq.gz: Drosophila sechellia
- sub_Drosophila_simulans_2.fq.gz: Drosophila simulans
- sub_Drosophila_virilis_2.fq.gz: Drosophila virilis
- sub_Drosophila_willistoni_2.fq.gz: Drosophila willistoni
- sub_Drosophila_yakuba_2.fq.gz: Drosophila yakuba
Dros_fastq_af_human_removed.tar
This file contains reads for Drosophila species before clean-up but after the removal of human reads. It contains the following Drosophila species in fq format:
- ucseq_sub_Drosophila_ananassae_2.fq.gz: Drosophila ananassae
- ucseq_sub_Drosophila_biarmipes_2.fq.gz: Drosophila biarmipes
- ucseq_sub_Drosophila_bipectinata_2.fq.gz: Drosophila bipectinata
- ucseq_sub_Drosophila_erecta_2.fq.gz: Drosophila erecta
- ucseq_sub_Drosophila_eugracilis_2.fq.gz: Drosophila eugracilis
- ucseq_sub_Drosophila_mauritiana_2.fq.gz: Drosophila mauritiana
- ucseq_sub_Drosophila_mojavensis_2.fq.gz: Drosophila mojavensis
- ucseq_sub_Drosophila_persimilis_2.fq.gz: Drosophila persimilis
- ucseq_sub_Drosophila_pseudoobscura_2.fq.gz: Drosophila pseudoobscura
- ucseq_sub_Drosophila_sechellia_2.fq.gz: Drosophila sechellia
- ucseq_sub_Drosophila_simulans_2.fq.gz: Drosophila simulans
- ucseq_sub_Drosophila_virilis_2.fq.gz: Drosophila virilis
- ucseq_sub_Drosophila_willistoni_2.fq.gz: Drosophila willistoni
- ucseq_sub_Drosophila_yakuba_2.fq.gz: Drosophila yakuba
After filtering
Dros_fastq_af_consult_filt.tar
This file contains Drosophila fastq files after filtering with CONSULT:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_consult_GTDBfilt_p3c1.tar.gz
This file contains Drosophila species filtered with CONSULT against GTDB with settings p = 3, c = 1:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_consult_GTDBfilt_p3c2.tar
This file contains Drosophila species filtered with CONSULT against GTDB with settings p = 3, c = 2:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_consult_GTDBfilt_p4c2.tar
This file contains Drosophila species filtered with CONSULT against GTDB with settings p = 4, c = 2:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_kraken_filt.tar
This file contains Drosophila fastq files after filtering with Kraken:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_kraken_GTDBfilt_c0.00.tar
This file contains Drosophila reads after filtering with Kraken against GTDB with confidence 0.0:
- dimtrx_kraken_gtdbcusTax_Drosfilt_conf0.0.txt: Distance matrix
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_kraken_GTDBfilt_c0.04.tar.gz
This file contains Drosophila reads after filtering with Kraken against GTDB with confidence 0.04:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
Dros_fastq_af_kraken_GTDBfilt_c0.05.tar
This file contains Drosophila reads after filtering with Kraken against GTDB with confidence 0.05:
- ucseq_ucseq_sub_Drosophila_ananassae_2.fq: Drosophila ananassae
- ucseq_ucseq_sub_Drosophila_biarmipes_2.fq: Drosophila biarmipes
- ucseq_ucseq_sub_Drosophila_bipectinata_2.fq: Drosophila bipectinata
- ucseq_ucseq_sub_Drosophila_erecta_2.fq: Drosophila erecta
- ucseq_ucseq_sub_Drosophila_eugracilis_2.fq: Drosophila eugracilis
- ucseq_ucseq_sub_Drosophila_mauritiana_2.fq: Drosophila mauritiana
- ucseq_ucseq_sub_Drosophila_mojavensis_2.fq: Drosophila mojavensis
- ucseq_ucseq_sub_Drosophila_persimilis_2.fq: Drosophila persimilis
- ucseq_ucseq_sub_Drosophila_pseudoobscura_2.fq: Drosophila pseudoobscura
- ucseq_ucseq_sub_Drosophila_sechellia_2.fq: Drosophila sechellia
- ucseq_ucseq_sub_Drosophila_simulans_2.fq: Drosophila simulans
- ucseq_ucseq_sub_Drosophila_virilis_2.fq: Drosophila virilis
- ucseq_ucseq_sub_Drosophila_willistoni_2.fq: Drosophila willistoni
- ucseq_ucseq_sub_Drosophila_yakuba_2.fq: Drosophila yakuba
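The Drosophila archives listed above share one naming pattern: zero or more ucseq_ prefixes, then sub_Drosophila_<species>_2.fq, optionally gzipped. A minimal sketch for recovering the species name from a member name, and for listing the members of a downloaded archive, could look as follows. The function names are hypothetical, and the regular expression is inferred only from the file names shown in this description.

```python
import re
import tarfile

# Pattern inferred from the listed names, e.g.
# "ucseq_ucseq_sub_Drosophila_yakuba_2.fq" or "sub_Drosophila_erecta_2.fq.gz".
_NAME_RE = re.compile(r"^(?:ucseq_)*sub_(Drosophila)_([a-z]+)_2\.fq(?:\.gz)?$")


def species_from_name(filename):
    """Return e.g. 'Drosophila yakuba', or None if the name does not fit."""
    base = filename.rsplit("/", 1)[-1]  # drop any directory prefix
    match = _NAME_RE.match(base)
    return f"{match.group(1)} {match.group(2)}" if match else None


def list_species(tar_path):
    """Map each matching member of a .tar or .tar.gz archive to its species."""
    species = {}
    # Mode "r:*" lets tarfile detect compression (plain tar, gzip, ...).
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive.getmembers():
            name = species_from_name(member.name)
            if member.isfile() and name is not None:
                species[member.name] = name
    return species
```

For example, list_species("Dros_fastq_af_consult_filt.tar") would be expected to return one entry per species file, while other members such as the distance matrix text file are skipped.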
GORG Dataset
We provide query summary reports (Kraken output) for GORG samples searched against the TOL, GTDB, and Bact/Arch Kraken libraries using Kraken:
- gorg_conf0.00_kraken.tar.gz: conf = 0.00
- gorg_conf0.02_kraken.tar.gz: conf = 0.02
- gorg_conf0.05_kraken.tar.gz: conf = 0.05
- gorg_conf0.04_krakenGTDB.tar.gz: conf = 0.04; note that, as shown in the paper, this threshold was only run for the GTDB dataset, which did not have enough coverage with other thresholds. Thus, only GTDB is included in this directory.
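If the summary reports in these archives follow the standard six-column, tab-separated Kraken 2 report layout (percentage, reads in clade, reads assigned directly, rank code, taxonomy ID, name), they could be read with a sketch like the one below. That the files use this layout is an assumption, not something stated in this description, and the function names are hypothetical.

```python
def parse_kraken_report(lines):
    """Parse lines of a six-column Kraken 2 style report into dictionaries."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append({
            "percent": float(fields[0]),
            "clade_reads": int(fields[1]),
            "taxon_reads": int(fields[2]),
            "rank": fields[3].strip(),
            # Kept as a string because custom taxonomies may not use NCBI IDs.
            "taxid": fields[4].strip(),
            # Names are indented by depth in the report; strip the padding.
            "name": fields[5].strip(),
        })
    return rows


def classified_fraction(rows):
    """Fraction of reads under the root (rank 'R') versus unclassified ('U')."""
    unclassified = sum(r["clade_reads"] for r in rows if r["rank"] == "U")
    classified = sum(r["clade_reads"] for r in rows if r["rank"] == "R")
    total = unclassified + classified
    return classified / total if total else 0.0
```

Passing an open file handle works directly, since iterating over a text file yields its lines.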
We also provide the query sets used during testing:
- gorg_all_queries.tar.gz: Contains the fastq files including the GORG query set. Each .fq file is one query from GORG.
Mitochondrial Dataset:
Original sequencing data:
- filt_fastq.tar.gz: Contains filtered reads. Each fastq file is one query. These are the reads after filtering, as described in the paper.
- unfiltered_fastq.tar.gz: Contains unfiltered reads.
Chloroplast Data:
Original sequencing data:
- fastq_affilt.tar.gz: Contains filtered reads.
Chloroplast assemblies using various assembly methods:
- filtered_spades: SPAdes applied to filtered reads
- seed: Seed and extend method
- base directory: GetOrganelle
Each nested folder of the following directories includes results of GetOrganelle (log files, assemblies in fasta format, etc.). Note: refer to https://github.com/Kinggerm/GetOrganelle for the description of files included in GetOrganelle results.
- getorganelle_afterfilt.tar.gz: obtained from filtered reads using GetOrganelle
- getorganelle_beforefilt.tar.gz: obtained from unfiltered reads using GetOrganelle
Chloroplast annotations:
- annotations.tar.gz: outputs of the annotation software GeSeq. For each of the eight samples that failed to be assembled fully without filtering (ERR2114804, ERR2114804, SRR2531285, SRR5500897, SRR7685402, SRR2531285, SRR5500897, SRR7685402), we show results of both filtered (filt) and unfiltered (_unfilt) annotations. *.fa files show assemblies, .gb files show annotation results, and .jpg files are drawings of the annotations.
Bacterial Simulated queries
excluded_fna_fq_downSmpl10M.tar
This file contains query samples for the TOL query set used in the study:
- 10x_Cca.fq: Carya cathayensis
- 10x_Cil.fq: Carya illinoinensis
- 10x_Oryza_sativa.fq: Plant Oryza sativa
- 10x_Prunus_persica.fq: Plant Prunus persica
- 15x_Arabidopsis_lyrata.fq: Plant Arabidopsis lyrata
- 20x_Arabidopsis_thaliana.fq: Plant Arabidopsis thaliana
- 250x_Bathycoccus_prasinos.fq: Plant Bathycoccus prasinos
- 2x_Nicotiana_sylvestris.fq: Plant Nicotiana sylvestris
- 2x_Zea_mays.fq: Plant Zea mays
- 5x_Coffee_arabica.fq: Plant Coffea arabica
- G000007005.fq: Bacterial/Archaeal species
- G000007185.fq: Bacterial/Archaeal species
- G000009965.fq: Bacterial/Archaeal species
- G000011125.fq: Bacterial/Archaeal species
- G000016385.fq: Bacterial/Archaeal species
- G000016525.fq: Bacterial/Archaeal species
- G000017185.fq: Bacterial/Archaeal species
- G000018365.fq: Bacterial/Archaeal species
- G000019605.fq: Bacterial/Archaeal species
- G000022365.fq: Bacterial/Archaeal species
- G000024305.fq: Bacterial/Archaeal species
- G000091665.fq: Bacterial/Archaeal species
- G000145295.fq: Bacterial/Archaeal species
- G000151105.fq: Bacterial/Archaeal species
- G000166095.fq: Bacterial/Archaeal species
- G000173675.fq: Bacterial/Archaeal species
- G000186365.fq: Bacterial/Archaeal species
- G000189555.fq: Bacterial/Archaeal species
- G000190155.fq: Bacterial/Archaeal species
- G000195935.fq: Bacterial/Archaeal species
- G000204585.fq: Bacterial/Archaeal species
- G000215995.fq: Bacterial/Archaeal species
- G000220645.fq: Bacterial/Archaeal species
- G000221185.fq: Bacterial/Archaeal species
- G000223395.fq: Bacterial/Archaeal species
- G000231015.fq: Bacterial/Archaeal species
- G000242875.fq: Bacterial/Archaeal species
- G000243455.fq: Bacterial/Archaeal species
- G000245135.fq: Bacterial/Archaeal species
- G000253055.fq: Bacterial/Archaeal species
- G000264495.fq: Bacterial/Archaeal species
- G000302455.fq: Bacterial/Archaeal species
- G000307305.fq: Bacterial/Archaeal species
- G000317795.fq: Bacterial/Archaeal species
- G000363885.fq: Bacterial/Archaeal species
- G000375685.fq: Bacterial/Archaeal species
- G000389735.fq: Bacterial/Archaeal species
- G000399765.fq: Bacterial/Archaeal species
- G000402095.fq: Bacterial/Archaeal species
- G000421185.fq: Bacterial/Archaeal species
- G000422285.fq: Bacterial/Archaeal species
- G000437835.fq: Bacterial/Archaeal species
- G000446015.fq: Bacterial/Archaeal species
- G000495715.fq: Bacterial/Archaeal species
- G000730285.fq: Bacterial/Archaeal species
- G000746745.fq: Bacterial/Archaeal species
- G000770635.fq: Bacterial/Archaeal species
- G000816105.fq: Bacterial/Archaeal species
- G000830275.fq: Bacterial/Archaeal species
- G000830295.fq: Bacterial/Archaeal species
- G000875775.fq: Bacterial/Archaeal species
- G000955905.fq: Bacterial/Archaeal species
- G000966265.fq: Bacterial/Archaeal species
- G001004105.fq: Bacterial/Archaeal species
- G001189275.fq: Bacterial/Archaeal species
- G001315825.fq: Bacterial/Archaeal species
- G001316025.fq: Bacterial/Archaeal species
- G001316045.fq: Bacterial/Archaeal species
- G001316145.fq: Bacterial/Archaeal species
- G001316265.fq: Bacterial/Archaeal species
- G001317345.fq: Bacterial/Archaeal species
- G001399695.fq: Bacterial/Archaeal species
- G001399795.fq: Bacterial/Archaeal species
- G001402855.fq: Bacterial/Archaeal species
- G001412615.fq: Bacterial/Archaeal species
- G001438895.fq: Bacterial/Archaeal species
- G001481595.fq: Bacterial/Archaeal species
- G001484685.fq: Bacterial/Archaeal species
- G001507935.fq: Bacterial/Archaeal species
- G001508175.fq: Bacterial/Archaeal species
- G001510225.fq: Bacterial/Archaeal species
- G001510275.fq: Bacterial/Archaeal species
- G001510295.fq: Bacterial/Archaeal species
- G001515215.fq: Bacterial/Archaeal species
- G001516665.fq: Bacterial/Archaeal species
- G001516725.fq: Bacterial/Archaeal species
- G001516745.fq: Bacterial/Archaeal species
- G001560165.fq: Bacterial/Archaeal species
- G001560565.fq: Bacterial/Archaeal species
- G001563335.fq: Bacterial/Archaeal species
- G001577775.fq: Bacterial/Archaeal species
- G001587655.fq: Bacterial/Archaeal species
- G001593925.fq: Bacterial/Archaeal species
- G001595885.fq: Bacterial/Archaeal species
- G001627075.fq: Bacterial/Archaeal species
- G001628455.fq: Bacterial/Archaeal species
- G001628475.fq: Bacterial/Archaeal species
- G001674955.fq: Bacterial/Archaeal species
- G001679155.fq: Bacterial/Archaeal species
- G001685465.fq: Bacterial/Archaeal species
- G001717005.fq: Bacterial/Archaeal species
- G001723845.fq: Bacterial/Archaeal species
- G001729285.fq: Bacterial/Archaeal species
- G001776015.fq: Bacterial/Archaeal species
- G001856825.fq: Bacterial/Archaeal species
- G001870125.fq: Bacterial/Archaeal species
- G001887595.fq: Bacterial/Archaeal species
- G001914405.fq: Bacterial/Archaeal species
- G001918455.fq: Bacterial/Archaeal species
- G001918475.fq: Bacterial/Archaeal species
- G001919175.fq: Bacterial/Archaeal species
- G001920575.fq: Bacterial/Archaeal species
- G001940645.fq: Bacterial/Archaeal species
- G001940655.fq: Bacterial/Archaeal species
- G001940665.fq: Bacterial/Archaeal species
- G002009975.fq: Bacterial/Archaeal species
- G002011035.fq: Bacterial/Archaeal species
- G002011075.fq: Bacterial/Archaeal species
- G900109425.fq: Bacterial/Archaeal species
- G900156635.fq: Bacterial/Archaeal species
Reference Libraries
Custom Kraken libraries constructed using different genomic reference sets are provided:
- kraken_db_gtdb_genomes_reps_r95_k35l31s7_cp.tar.gz: GTDB dataset with default Kraken taxonomy; this file is too big to be included here and is instead made available at https://skmer.ucsd.edu/data/consult/kraken/
- tree_of_life_noViral_unmasked_k35_l31_s7_cp.tar.gz: TOL with default Kraken taxonomy
- tree_of_life_noViral_unmasked_k35_l31_s7_customtax_cp.tar.gz: TOL with custom taxonomy
Sharing/Access information
See more on