
k-mer matrix Aegilops tauschii diversity panel (Open wild wheat consortium phase) Part 3/3

Cite this dataset

Cavalet-Giorsa, Emile et al. (2024). k-mer matrix Aegilops tauschii diversity panel (Open wild wheat consortium phase) Part 3/3 [Dataset]. Dryad. https://doi.org/10.5061/dryad.wm37pvmvd

Abstract

Wild relatives of bread wheat represent genetic diversity that can be used for wheat crop improvement. We generated a k-mer presence/absence matrix of over 920 accessions of the wild wheat Aegilops tauschii, the donor of the bread wheat D genome. This dataset was generated under the aegis of Phase II of the Open Wild Wheat Consortium (www.openwildwheat.org).

README: k-mer matrix Aegilops tauschii diversity panel (Open Wild Wheat Consortium Phase II) PART 3/3

https://doi.org/10.5061/dryad.wm37pvmvd

Description of the data and file structure

k-mer presence/absence matrix for the Aegilops tauschii diversity panel of over 920 accessions, including genetically redundant accessions.

Resequencing of the Ae. tauschii accessions: We generated short-read whole-genome sequencing data for 350 Ae. tauschii accessions. PCR-free paired-end libraries were constructed and sequenced on an Illumina NovaSeq 6000 instrument, yielding a median coverage of 8.3-fold per sample (range: 5.87- to 16.86-fold).

k-mer matrix generation: We developed and implemented an optimised k-mer matrix workflow to generate a presence/absence k-mer matrix for large diversity panels (https://github.com/githubcbrc/KGWASMatrix). We counted k-mers (k = 51) in raw sequencing data for 350 accessions generated in this study, 306 accessions published by Gaurav et al. (2022), 275 accessions published by Zhou et al. (2021) and 24 accessions published by Zhao et al. (2023). Accessions with less than 5-fold sequencing coverage were discarded to avoid distorting the k-mer counts. k-mers were retained only if they occurred in at least six and at most N-6 accessions, where N is the total number of accessions. Detailed information about this k-mer matrix can be found in the related preprint by Cavalet-Giorsa, Gonzalez-Munoz and Athiyannan et al. (2023) (https://doi.org/10.1101/2023.11.29.568958).
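The occurrence filter described above can be illustrated with a minimal sketch. This is not the KGWASMatrix implementation; the function name `keep_kmer` and the choice to treat both bounds as inclusive are assumptions made for illustration only.

```python
def keep_kmer(presence, min_count=6):
    """Return True if a k-mer passes the occurrence filter.

    presence  -- sequence of 0/1 flags, one per accession (hypothetical input)
    min_count -- minimum number of accessions carrying the k-mer; the same
                 margin is applied at the upper end (at most N - min_count)

    Assumption: both bounds are inclusive, i.e. a k-mer present in exactly
    6 or exactly N - 6 accessions is kept.
    """
    n = len(presence)            # N, the total number of accessions
    occurrences = sum(presence)  # number of accessions carrying the k-mer
    return min_count <= occurrences <= n - min_count
```

Under this reading, k-mers that are nearly absent from or nearly fixed in the panel are dropped, since they carry little presence/absence variation.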

The full k-mer matrix has a total size of ~729G and has been divided into 25 gzipped files. The matrix is tab-delimited: the first column contains the 10,078,115,665 k-mers, and the remaining columns show the presence (1) or absence (0) of each k-mer across the accessions, which are listed in column order in the matrix_acc945_samples_list.txt file available in part 1/3 (doi:10.5061/dryad.p5hqbzkvx).
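Because each part is far too large to load into memory, the files are best read as a stream. The following is a minimal sketch of reading one gzipped part line by line. The function names are hypothetical, and the sketch assumes that the files have no header row, that each accession occupies its own tab-separated column, and that the sample list holds one accession name per line; check these assumptions against the actual files before relying on it.

```python
import gzip


def iter_kmer_rows(path):
    """Yield (kmer, flags) pairs from one gzipped matrix part.

    Assumes no header row and one tab-separated 0/1 column per accession.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            yield fields[0], [int(flag) for flag in fields[1:]]


def load_sample_names(path):
    """Read accession names, assuming one name per line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Usage sketch (file names taken from this dataset description):
# samples = load_sample_names("matrix_acc945_samples_list.txt")
# for kmer, flags in iter_kmer_rows("18_m.tsv.gz"):
#     carriers = [name for name, flag in zip(samples, flags) if flag]
```

Streaming with a generator keeps memory use proportional to a single row rather than to the whole file.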

This dataset contains gzipped files 18 through 25 of the full k-mer matrix, as follows:

18_m.tsv.gz

19_m.tsv.gz

20_m.tsv.gz

21_m.tsv.gz

22_m.tsv.gz

23_m.tsv.gz

24_m.tsv.gz

25_m.tsv.gz

The first and second parts of the k-mer matrix are available in two related Dryad datasets:

doi:10.5061/dryad.p5hqbzkvx

doi:10.5061/dryad.wpzgmsbvm

Methods

The full methods are available in the related publication.

Funding

King Abdullah University of Science and Technology