Dryad

Data from: krepp: A k-mer-based maximum pseudo-likelihood method for estimating read distances and genome-wide phylogenetic placement

Data files

Feb 03, 2026 version: 95.52 GB of files

Abstract

Comparing each sequencing read in a sample to a reference database is a fundamental step in wide-ranging applications. The results of these comparisons can facilitate phylogenetic characterization. However, phylogenetic placement is currently only possible at scale for marker genes, a small fraction of the genome. We introduce krepp, an alignment-free k-mer-based method that enables placing reads from anywhere on the genome on an ultra-large reference phylogeny (e.g., 123,853 leaves). This repository contains data from benchmarking experiments in which we show the scalability and accuracy of krepp. We also demonstrate the ability of our method to compare and characterize real metagenomic samples.