SNP alleles for pooled DNA samples of Vicia villosa targeting QTL for pod dehiscence and seed dormancy
Data files
Feb 25, 2024 version files 683.25 MB
- README.md
- VetchPoolDNA_Dosage.csv
- VetchPoolDNA_MarkerInformation.csv
- VetchPoolDNA_Performance.csv
Abstract
Hairy vetch (Vicia villosa Roth) is a promising legume cover crop, but its use is limited by high rates of pod dehiscence and seed dormancy. We used phenotypically contrasting pooled DNA samples (n=24 with 29-74 individuals per sample) from an ongoing cover crop breeding program across four environments (site-year combinations: Maryland 2020, Maryland 2022, Wisconsin 2021, Wisconsin 2022) to find genetic associations and genomic prediction accuracies for pod dehiscence and seed dormancy. We also combined pooled DNA sample genetic association results with the results of a prior genome-wide association study. Genomic prediction resulted in positive predictive abilities for both traits between environments and with an independent dataset (0.34-0.50), but reduced predictive ability for DNA pools with divergent seed dormancy in the Maryland environments (0.07-0.15). The pooled DNA samples found six significant (false discovery rate q-value<0.01) quantitative trait loci (QTL) for seed dormancy and four significant QTL for pod dehiscence. Unfortunately, the minor alleles of the pod dehiscence QTL increased the rate of pod dehiscence and are not useful for marker-assisted selection. When combined with a prior association study, sixteen seed dormancy QTL and zero pod dehiscence QTL were significant. Combining the association studies did not increase the detection of useful QTL.
README: SNP alleles for pooled DNA samples of Vicia villosa targeting QTL for pod dehiscence and seed dormancy
https://doi.org/10.5061/dryad.s4mw6m9d6
Description of the data and file structure
VetchPoolDNA_MarkerInformation.csv contains columns indicating the reference and alternate alleles for each SNP. It also gives the percentage of individuals with missing data for that SNP in the individual sequencing dataset, and a column with the mean read depth for that SNP in the sequencing of pooled DNA.
VetchPoolDNA_Dosage.csv contains allele dosages, with columns representing sites and rows representing individuals or DNA pools (ID column). A value of 99 indicates missing data. Values for DNA pools are mean allele read frequencies rather than dosages, so the two sample types are on different scales (individuals 0-2 and DNA pools 0-1).
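Because individuals and pools share one table but use different scales, a reader will usually want to recode the missing value and put both sample types on a common 0-2 scale before analysis. The sketch below is one way to do that with pandas. The `ID` column name and the missing code 99 come from this README; the function name `harmonize_dosage`, the toy marker names, and the idea of passing an explicit list of pool IDs are assumptions made for illustration, since the README does not state how pool rows are labeled.

```python
import numpy as np
import pandas as pd

MISSING_CODE = 99  # missing-data code stated in the README


def harmonize_dosage(df, pool_ids, id_col="ID"):
    """Return markers on a common 0-2 scale with NaN for missing values.

    pool_ids is a hypothetical, user-supplied list of the IDs that are DNA
    pools; all other rows are treated as individuals (already on 0-2).
    """
    markers = df.set_index(id_col).astype(float)
    # Recode 99 to NaN first so that it is never rescaled as if it were data.
    markers = markers.mask(markers == MISSING_CODE)
    is_pool = markers.index.isin(list(pool_ids))
    # Pool values are allele frequencies (0-1); doubling gives a 0-2 scale.
    markers.loc[is_pool] = markers.loc[is_pool] * 2.0
    return markers


# Toy table standing in for VetchPoolDNA_Dosage.csv (values are made up).
toy = pd.DataFrame({
    "ID": ["ind_1", "ind_2", "pool_A"],
    "snp_1": [0, 2, 0.25],
    "snp_2": [1, 99, 0.5],
    "snp_3": [2, 1, 99],
})
harmonized = harmonize_dosage(toy, pool_ids=["pool_A"])
```

For the real file, `toy` would be replaced by `pd.read_csv("VetchPoolDNA_Dosage.csv")`, and `pool_ids` would need to be built from whatever labels the ID column actually uses for pools.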
VetchPoolDNA_Performance.csv contains the pod dehiscence rates, seed dormancy rates, and number of individuals with phenotypic information included in each pool.
Sharing/Access information
Individuals in this file are identical to those measured in the prior published GWAS. DOI: 10.5061/dryad.z34tmpgm1
Methods
This study sampled individuals evaluated during ongoing breeding efforts of the Cover Crop Breeding Network. Four environments were selected which had sufficient population size and contained meaningful variability in pod dehiscence and seed dormancy. For the remainder of this study, each environment is referred to by a location abbreviation preceded by the final two digits of the harvest year (Beltsville, MD 2020 and 2022: 20MD and 22MD; Prairie du Sac, WI 2021 and 2022: 21WI and 22WI). The 20MD breeding nursery was planted on Sept 27th 2019 (39°01′50″ N, 76°55′59″ W, Russett–Christiana complex soil) into a tilled field which was broadcast seeded with turf red fescue (Festuca rubra L.; 340 kg ha−1). On November 5th 2019, 20MD was sprayed with Raptor (ammonium salt of imazamox; 0.37 L ha−1) to control winter weeds. For 20MD, pods were collected from June 23rd 2020 to June 30th 2020 (harvest date varied with pod maturity of individual genotypes, see below). The 22MD nursery was planted at the same location as 20MD on October 7th, 2021 and pods were collected from June 29th to July 5th 2022. The 22MD nursery used black plastic covered raised beds to control weed pressure. The 21WI nursery was planted on September 23rd 2020 (43°20′55″ N, 89°45′18″ W, Richwood silt loam soil) into landscape fabric. The 22WI nursery was planted in early October at the same location as 21WI into landscape fabric. For 22WI, pods were collected between July 20th and August 5th 2022. Prior to establishment, all environments were supplemented with lime, K and P based on soil test results. Additional details on the breeding program goals and methods can be found in prior publications (Kucek et al., 2019; Kissing Kucek et al., 2020b; Tilhou et al., 2023).
Breeding site-years consisted of direct-seeded spaced individual hairy vetch plants (20MD: n=3696; 21WI: n=1200; 22MD: n=3648; 22WI: n=1200) which were visually evaluated for fall vigor, spring vigor, and plant maturity (Kalu and Fick, 1981). Since hairy vetch is out-crossing, one round of selection based on vigor occurs in late spring prior to cross pollination. Only selected individuals are allowed to cross pollinate and are then evaluated for seed production characteristics, which include pod dehiscence and seed dormancy. Selection intensity prior to flowering varied from 3.3 to 47% in each environment, allowing the following number of individuals to cross pollinate: 20MD: n=124, 21WI: n=560, 22MD: n=155, and 22WI: n=363.
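The reported range can be checked directly from the counts above, reading "selection intensity" as the percentage of planted individuals that were allowed to cross pollinate. The short calculation below uses only the numbers in this paragraph; the dictionary names are illustrative, and the keys use the environment abbreviations defined earlier in the Methods.

```python
# Planted and retained (cross-pollinating) individuals per environment,
# copied from the paragraph above.
planted = {"20MD": 3696, "21WI": 1200, "22MD": 3648, "22WI": 1200}
crossed = {"20MD": 124, "21WI": 560, "22MD": 155, "22WI": 363}

# Percentage of planted individuals retained for cross pollination.
selected_pct = {env: 100.0 * crossed[env] / planted[env] for env in planted}

lowest = min(selected_pct.values())   # the 20MD nursery
highest = max(selected_pct.values())  # the 21WI nursery

for env, pct in selected_pct.items():
    print(f"{env}: {pct:.1f}% retained")
```

The lowest and highest values land at roughly 3.4% and 47%, which agrees with the stated range to within rounding.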
Pods were collected for dormancy and dehiscence evaluations at ripe seed pod stage according to Kalu and Fick (1981). Pod dehiscence was measured using a visual score with one measurement assigned to each pod (targeting a minimum of 50 pods per individual; Kissing Kucek et al., 2020b). During 2020, visual scores were on a 0-3 scale (described in Kissing Kucek et al., 2020b) while 2021 and 2022 used a 0 or 1 scale (0: pod is closed enough that a seed could not fall out, 1: pod is open enough that a seed could fall out). Green, flat, or immature pods were discarded prior to scoring. Seed dormancy was measured as a proportion of 25 seeds which imbibed water after 7 d per individual plant, with three replicates per individual (detailed methods in Kissing Kucek et al., 2020a). Seeds which did not imbibe water after 7 d were scarified and observed after an additional 7 d to determine seed viability for environments collected in 2020. For 2021 and 2022 environments, hard seeds were determined to be viable and not scarified. Dead seeds were not included in the dormant seed proportion.
Subsequent field selection for disease resistance and seed production resulted in smaller population sizes available for pod dehiscence ratings, seed dormancy ratings, and DNA pool construction (20MD: n=115; 21WI: n=206; 22MD: n=109; 22WI: n=287). Hairy vetch tissue from these individuals was collected for sequencing during active vegetative growth in mid-summer. Each leaf sample was placed into a labeled coin envelope and immediately placed on ice before transport to a laboratory freezer (-20°C), where it was stored until freeze drying.
For each environment, trait-based pooled DNA samples were constructed from stored leaf samples using the best-performing 25% and worst-performing 25% of individuals for pod dehiscence and for seed dormancy. In addition, one random pooled DNA sample was constructed for each environment from a random sample of 25% of the population size. Individuals included in a trait-based pooled DNA sample were not included in the random sample. Pool construction was achieved by combining equal-sized leaf tissue samples from each individual prior to pulverizing (Craig et al., 2009). A subsample of each homogenized sample was then used to extract DNA. To help validate this method, four technical replicates were created from four randomly selected tissue samples.
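The pool assignment rule can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function name `assign_pools`, the column names `dehiscence` and `dormancy`, the rounding of 25% to a whole number of individuals, and the fallback when fewer than 25% of individuals remain for the random pool are all hypothetical. It also assumes no missing phenotypes and no ties.

```python
import numpy as np
import pandas as pd


def assign_pools(pheno, traits=("dehiscence", "dormancy"), frac=0.25, seed=0):
    """Assign individuals to high/low trait pools plus one random pool."""
    k = int(round(frac * len(pheno)))  # pool size: 25% of the population
    pools = {}
    for trait in traits:
        ranked = pheno[trait].sort_values()  # ascending trait value
        pools[f"low_{trait}"] = list(ranked.index[:k])
        pools[f"high_{trait}"] = list(ranked.index[-k:])
    # Individuals already in any trait-based pool are excluded from the
    # random pool, as described in the text.
    used = set().union(*pools.values())
    remaining = [i for i in pheno.index if i not in used]
    rng = np.random.default_rng(seed)
    k_random = min(k, len(remaining))  # assumed fallback if too few remain
    pools["random"] = list(rng.choice(remaining, size=k_random, replace=False))
    return pools


# Deterministic toy phenotypes for 40 hypothetical individuals.
ids = [f"plant_{i:02d}" for i in range(40)]
pheno = pd.DataFrame(
    {
        "dehiscence": [i / 39 for i in range(40)],
        "dormancy": [((7 * i) % 40) / 39 for i in range(40)],
    },
    index=ids,
)
pools = assign_pools(pheno)
```

An individual can fall into a pool for each trait, so the trait-based pools overlap, and the number of individuals left over for the random pool depends on how strongly the two traits are correlated.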
Sequencing and SNP filtering
In total, 24 DNA pools were submitted for sequencing (five pools [high pod dehiscence, low pod dehiscence, random, high seed dormancy, and low seed dormancy] by four environments [20MD, 22MD, 21WI, and 22WI], plus four technical replicates). The University of Wisconsin Biotechnology Center prepared libraries for genotyping-by-sequencing using an NsiI-BfaI double restriction enzyme digestion. Fragments were then ligated to barcoded adaptors prior to polymerase chain reaction amplification and sequencing on an Illumina sequencer (NovaSeq 6000) targeting 20 million reads per sample (mean of approximately 15x coverage).
Bioinformatics processing was completed at the University of Wisconsin Biotechnology Center using the TASSEL analysis platform (Glaubitz et al., 2014) in parallel with re-calling single nucleotide polymorphisms (SNPs) from the Tilhou et al. 2023 GWAS panel, which consists of hairy vetch individuals evaluated in Oregon and Texas in 2019 (n=869). Barcoded sequence read outputs were collapsed into a set of unique sequence tags with counts. These tags were aligned to the reference genome (V. villosa v1.1; Fuller et al., 2023). Each tag was assigned to a position with the best unique alignment, and the occupancies of tags for each sample were observed from barcode data. For pooled DNA samples, allele states were analyzed as two-times alternate allele frequencies (continuous 0-2) and individual DNA samples were analyzed as alternate allele dosages (0, 1, or 2). Overall, 2,877,384 SNPs were present prior to filtering. Of these, 2,105,338 SNPs were mapped to the seven main chromosome fragments, and 165,057 of the mapped SNPs had a read depth >30 within the pooled DNA samples. This cut-off was based on the accuracy among technical replicates at varying read depths. At this point, 14.8% of SNP values were missing; these were imputed to the mean value for the marker. Last, SNPs with a minor allele frequency <0.025 were removed, resulting in 122,801 SNPs.
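The last three filtering steps (read depth, mean imputation, minor allele frequency) can be sketched with NumPy as below. This is a minimal illustration, not the pipeline that was run: the function name `filter_snps`, the use of one depth value per SNP, and the calculation of allele frequency as the mean 0-2 value divided by two across all samples are assumptions. The thresholds (>30 and 0.025) are the ones stated in the text.

```python
import numpy as np


def filter_snps(dosage, pool_depth, min_depth=30.0, min_maf=0.025):
    """Depth filter, per-marker mean imputation, then MAF filter.

    dosage:     samples x SNPs array on a 0-2 scale, NaN for missing.
    pool_depth: one read depth value per SNP from the pooled samples.
    Returns the filtered matrix and the original column indices retained.
    """
    keep_depth = pool_depth > min_depth
    kept = dosage[:, keep_depth].copy()

    # Impute each missing value with the mean of its marker (column).
    col_means = np.nanmean(kept, axis=0)
    missing = np.isnan(kept)
    kept[missing] = np.take(col_means, np.nonzero(missing)[1])

    # Alternate allele frequency from the 0-2 scale, then minor allele freq.
    alt_freq = kept.mean(axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep_maf = maf >= min_maf

    kept_idx = np.flatnonzero(keep_depth)[keep_maf]
    return kept[:, keep_maf], kept_idx


# Toy data: 4 samples x 4 SNPs (values are made up).
dosage = np.array([
    [0.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 2.0, 2.0],
    [2.0, 0.0, 1.0, 2.0],
    [np.nan, 0.0, 0.0, 1.0],
])
pool_depth = np.array([50.0, 45.0, 10.0, 31.0])
filtered, kept_idx = filter_snps(dosage, pool_depth)
```

In the toy data the third SNP fails the depth filter and the second is monomorphic, so only the first and fourth SNPs are retained, and the missing value in the first SNP is replaced by that marker's mean.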