k-mer matrix Aegilops tauschii diversity panel (Open wild wheat consortium phase II) Part 2/3

Cavalet-Giorsa, Emile et al. (2024). k-mer matrix Aegilops tauschii diversity panel (Open wild wheat consortium phase II) Part 2/3 [Dataset]. Dryad.

Wild wheat relatives of bread wheat represent genetic diversity that can be used for wheat crop improvement. We generated a k-mer presence/absence matrix of over 920 accessions of the wild wheat Aegilops tauschii, the donor of the bread wheat D genome. This dataset was generated under the aegis of Phase II of the Open Wild Wheat Consortium (

README: Population genomics of the wild wheat Aegilops tauschii (Open Wild Wheat Consortium) - k-mer matrix Part 2/3

Description of the data and file structure

k-mer presence/absence matrix for the Aegilops tauschii diversity panel of over 920 accessions, including genetically redundant accessions.

Resequencing of the Ae. tauschii accessions. We generated short-read whole-genome sequencing data for 350 Ae. tauschii accessions. PCR-free paired-end libraries were constructed and sequenced on an Illumina NovaSeq 6000 instrument, yielding a median coverage of 8.3-fold per sample (range 5.87- to 16.86-fold).

k-mer matrix generation. We developed and implemented an optimised k-mer matrix workflow to generate a presence/absence k-mer matrix for large diversity panels ( We counted k-mers (k = 51) in the raw sequencing data of 350 accessions generated in this study, 306 accessions published by Gaurav et al. (2022), 275 accessions published by Zhou et al. (2021) and 24 accessions published by Zhao et al. (2023). Accessions with lower than 5-fold sequencing coverage were discarded so that insufficient coverage would not distort the k-mer counts. k-mers were retained only if they occurred in at least six accessions and in at most N-6 accessions, where N is the total number of accessions. Detailed information about this k-mer matrix is provided in the related preprint by Cavalet-Giorsa, Gonzalez-Munoz and Athiyannan et al. (2023) (
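
The occurrence filter described above can be expressed as a check on each row of presence/absence values. The following is a minimal sketch, assuming "occurrence" means the number of accessions in which a k-mer is present; the function names keep_kmer and filter_rows are hypothetical and are not part of the published workflow.

```python
from typing import Iterable, Iterator, Sequence, Tuple

MIN_OCC = 6  # minimum number of accessions that must carry (and lack) the k-mer


def keep_kmer(presence: Sequence[int], min_occ: int = MIN_OCC) -> bool:
    """Return True if a k-mer passes the occurrence filter (hypothetical helper).

    `presence` holds one 0/1 value per accession, so N = len(presence).
    The k-mer is kept when it is present in at least `min_occ` accessions
    and in at most N - min_occ accessions.
    """
    n = len(presence)
    occurrence = sum(presence)
    return min_occ <= occurrence <= n - min_occ


def filter_rows(
    rows: Iterable[Tuple[str, Sequence[int]]], min_occ: int = MIN_OCC
) -> Iterator[Tuple[str, Sequence[int]]]:
    """Yield only the (k-mer, presence) rows that pass the occurrence filter."""
    for kmer, presence in rows:
        if keep_kmer(presence, min_occ):
            yield kmer, presence
```

The upper bound removes k-mers shared by nearly all accessions, which carry almost no information for distinguishing accessions, just as the lower bound removes very rare k-mers.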

The full k-mer matrix has a total size of ~729 GB and has been divided into 25 gzipped files. The matrix is tab-delimited: the first column lists the 10,078,115,665 k-mers (one k-mer per row), and the remaining columns record the presence (1) or absence (0) of each k-mer in each accession. The accessions are listed, in column order, in the matrix_acc945_samples_list.txt file found in Part 1/3 (doi:10.5061/dryad.p5hqbzkvx).
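
Given the size of each part, one way to inspect a file is to stream it line by line instead of loading it into memory. The sketch below assumes that each gzipped part has no header row, that matrix_acc945_samples_list.txt contains one accession name per line, and that the functions read_sample_names, iter_matrix and accessions_with_kmer (all hypothetical) are used with local copies of the files.

```python
import gzip
from typing import Iterator, List, Tuple


def read_sample_names(path: str) -> List[str]:
    """Read accession names in matrix column order (assumes one name per line)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def iter_matrix(path: str) -> Iterator[Tuple[str, List[int]]]:
    """Stream (k-mer, presence vector) rows from one gzipped, tab-delimited part.

    Assumes there is no header row; a header line would make int() fail.
    """
    with gzip.open(path, mode="rt", encoding="ascii") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            fields = line.split("\t")
            yield fields[0], [int(x) for x in fields[1:]]


def accessions_with_kmer(presence: List[int], samples: List[str]) -> List[str]:
    """Map a presence vector back to the names of the accessions carrying the k-mer."""
    if len(presence) != len(samples):
        raise ValueError(
            f"row has {len(presence)} columns but {len(samples)} samples were listed"
        )
    return [name for name, bit in zip(samples, presence) if bit == 1]
```

The length check in accessions_with_kmer guards against pairing a matrix part with the wrong sample list, since the columns carry no accession names of their own.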

This dataset contains gzipped files 9 through 17 of the full k-mer matrix, as follows:

The first and third parts of the k-mer matrix are available in related Dryad datasets:

The full methods are available in the related publication.