k-mer matrix Aegilops tauschii diversity panel (Open wild wheat consortium phase II) Part 1/3
Data files
Jun 13, 2024 version files: 271.57 GB in total
- 0_m.tsv.gz (29.75 GB)
- 1_m.tsv.gz (30.21 GB)
- 2_m.tsv.gz (30.29 GB)
- 3_m.tsv.gz (30.18 GB)
- 4_m.tsv.gz (30.26 GB)
- 5_m.tsv.gz (30.17 GB)
- 6_m.tsv.gz (30.30 GB)
- 7_m.tsv.gz (30.18 GB)
- 8_m.tsv.gz (30.23 GB)
- matrix_acc945_samples_list.txt (7.26 KB)
- README.md (2.47 KB)
Abstract
Wild relatives of bread wheat harbour genetic diversity that can be used for wheat crop improvement. We generated a k-mer presence/absence matrix for over 920 accessions of the wild wheat Aegilops tauschii, the donor of the bread wheat D genome. This dataset was generated under the aegis of Phase II of the Open Wild Wheat Consortium (www.openwildwheat.org).
https://doi.org/10.5061/dryad.p5hqbzkvx
Description of the data and file structure
k-mer presence/absence matrix for the Aegilops tauschii diversity panel of over 920 accessions, including genetically redundant accessions.
Resequencing of the Ae. tauschii accessions: We generated short-read whole-genome sequencing data for 350 *Ae. tauschii* accessions. PCR-free paired-end libraries were constructed and sequenced on an Illumina NovaSeq 6000 instrument, yielding a median coverage of 8.3-fold per sample (range: 5.87- to 16.86-fold).
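Per-sample coverage figures such as these are usually estimated as total sequenced bases divided by genome size. A minimal Python sketch of that arithmetic; the function name `fold_coverage` is hypothetical, and the read count, read length and genome size in the example are illustrative values, not properties of this dataset:

```python
def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate sequencing depth as (number of reads x read length) / genome size.

    For paired-end data, count each mate as a separate read.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


# Illustrative numbers only: 200 million 150-bp reads over a 4-Gb genome.
print(fold_coverage(200_000_000, 150, 4_000_000_000))  # 7.5
```

Under this estimate, an accession would fall below the 5-fold cut-off used for the matrix whenever its total sequenced bases are less than five times the genome size.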
k-mer matrix generation: We developed and implemented an optimised workflow to generate presence/absence k-mer matrices for large diversity panels (https://github.com/githubcbrc/KGWASMatrix). We counted k-mers (k = 51) in the raw sequencing data of 350 accessions generated in this study, 306 accessions published by Gaurav et al. (2022), 275 accessions published by Zhou et al. (2021) and 24 accessions published by Zhao et al. (2023). Accessions with less than 5-fold sequencing coverage were discarded so that low coverage would not distort the k-mer counts. k-mers were retained only if they were present in at least six accessions and in at most N − 6 accessions, where N is the total number of accessions. Detailed information about this k-mer matrix can be found in the related preprint by Cavalet-Giorsa, Gonzalez-Munoz, Athiyannan et al. (2023) (https://doi.org/10.1101/2023.11.29.568958).
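The filtering rules above (drop accessions below 5-fold coverage, then keep k-mers present in at least six and at most N − 6 accessions) can be illustrated on toy data. This is a minimal in-memory sketch, not the KGWASMatrix implementation: the function names are hypothetical, it uses forward-strand k-mers only (whether the workflow collapses k-mers to canonical form is not stated here), and it treats a k-mer as present if it appears in any read of an accession, ignoring any per-sample count threshold the real pipeline may apply.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Distinct forward-strand k-mers of length k in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def keep_accessions(coverage: dict[str, float], min_cov: float = 5.0) -> list[str]:
    """Accessions whose estimated fold coverage is at least min_cov."""
    return [acc for acc, cov in coverage.items() if cov >= min_cov]


def presence_absence(
    reads_by_acc: dict[str, list[str]], k: int = 51, min_occ: int = 6
) -> tuple[list[str], dict[str, list[int]]]:
    """Build a filtered k-mer presence/absence matrix (toy, in-memory)."""
    accessions = list(reads_by_acc)
    n = len(accessions)
    # Set of k-mers observed in any read of each accession.
    present = {
        acc: set().union(*(kmers(r, k) for r in reads))
        for acc, reads in reads_by_acc.items()
    }
    # Number of accessions in which each k-mer occurs.
    occurrence = Counter(km for s in present.values() for km in s)
    # Keep k-mers present in at least min_occ and at most n - min_occ accessions.
    matrix = {
        km: [int(km in present[acc]) for acc in accessions]
        for km, c in sorted(occurrence.items())
        if min_occ <= c <= n - min_occ
    }
    return accessions, matrix


# Toy example with k = 3 and min_occ = 1, so the window is 1..(4 - 1) accessions.
accs, m = presence_absence(
    {"A": ["ACGT"], "B": ["ACGA"], "C": ["ACGTT"], "D": ["ACG"]}, k=3, min_occ=1
)
print(m)  # {'CGA': [0, 1, 0, 0], 'CGT': [1, 0, 1, 0], 'GTT': [0, 0, 1, 0]}
```

In the toy run, ACG occurs in all four accessions and is dropped by the upper bound, mirroring how the N − 6 limit removes k-mers that are near-universal across the panel and therefore uninformative for presence/absence comparisons.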
The full k-mer matrix has a total size of ~729 GB and has been divided into 25 gzipped files. It is tab-delimited: each row corresponds to one k-mer, with the k-mer sequence in the first column (10,078,115,665 k-mers in total) and the presence (1) or absence (0) of that k-mer in each accession in the remaining columns. The accession columns follow the order listed in the matrix_acc945_samples_list.txt file.
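A part of the matrix can be streamed without decompressing it to disk. A minimal Python sketch using only the standard library; the helper names are hypothetical, and it assumes (not stated in this description) that the matrix parts have no header line and that the samples list contains one accession ID per line:

```python
import gzip
from typing import Iterator


def load_samples(path: str) -> list[str]:
    """Read accession IDs, assuming one ID per line in matrix column order."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def iter_rows(path: str) -> Iterator[tuple[str, list[int]]]:
    """Stream (k-mer, presence/absence vector) pairs from a gzipped matrix part.

    Assumes no header line: column 1 is the k-mer, the rest are 0/1 values.
    """
    with gzip.open(path, "rt") as fh:
        for line in fh:
            kmer, *bits = line.rstrip("\n").split("\t")
            yield kmer, [int(b) for b in bits]


def carriers(bits: list[int], samples: list[str]) -> list[str]:
    """Return the accessions in which a k-mer is present."""
    if len(bits) != len(samples):
        raise ValueError("row width does not match the samples list")
    return [s for s, b in zip(samples, bits) if b == 1]
```

For example, `samples = load_samples("matrix_acc945_samples_list.txt")` followed by a loop over `iter_rows("0_m.tsv.gz")` and `carriers(bits, samples)` lists the accessions carrying each k-mer. Each part is roughly 30 GB compressed, so pure-Python parsing of every row will be slow; this approach suits spot checks rather than panel-wide association analyses.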
This dataset contains the accession list and the first 9 of the 25 gzipped files of the full k-mer matrix, as follows:
matrix_acc945_samples_list.txt: ordered list of Aegilops tauschii accession IDs, in the same order as the accession columns of the matrix
0_m.tsv.gz
1_m.tsv.gz
2_m.tsv.gz
3_m.tsv.gz
4_m.tsv.gz
5_m.tsv.gz
6_m.tsv.gz
7_m.tsv.gz
8_m.tsv.gz
The second and third parts of the k-mer matrix are available in related Dryad datasets:
doi:10.5061/dryad.wpzgmsbvm
doi:10.5061/dryad.wm37pvmvd
The full methods are available in the related publication.