Genomic characterisation and dissection of the onset of resistance to acetyl CoA carboxylase-inhibiting herbicides in a large collection of Digitaria insularis from Brazil
Cite this dataset
Sehgal, Deepmala et al. (2024). Genomic characterisation and dissection of the onset of resistance to acetyl CoA carboxylase-inhibiting herbicides in a large collection of Digitaria insularis from Brazil [Dataset]. Dryad. https://doi.org/10.5061/dryad.9s4mw6mpt
Abstract
An in-depth genotypic characterisation of a diverse collection of Digitaria insularis was undertaken to explore the neutral genetic variation across the natural expansion range of this weed species in Brazil. With the exception of Minas Gerais, populations from all other states showed high estimates of expected heterozygosity (HE > 0.60) and genetic diversity. There was a lack of population structure based on geographic origin and a low population differentiation between populations across the landscape as evidenced by an average Fst value of 0.02. On combining haloxyfop [acetyl CoA carboxylase (ACCase)-inhibiting herbicide] efficacy data with neutral genetic variation, we found evidence of the presence of two scenarios of resistance evolution in this weed species. Whilst populations originating from north-eastern region demonstrated an active role of gene flow, populations from the mid-western region displayed multiple, independent resistance evolution as the major evolutionary mechanism. A target-site mutation (Trp2027Cys) in the ACCase gene, observed in less than 1% of resistant populations, could not explain the reduced sensitivity of 15% of the populations to haloxyfop. The genetic architecture of resistance to ACCase-inhibiting herbicides was dissected using a genome-wide association study (GWAS) approach. GWAS revealed the association of three SNPs with reduced sensitivity to haloxyfop and clethodim. In silico analysis of these SNPs revealed important non-target site genes belonging to families involved in herbicide detoxification, including UDPGT91C1 and GT2, and genes involved in the vacuolar sequestration-based degradation pathway. Exploration of five genomic prediction models revealed that the highest prediction power (≥ 0.80) was achieved with the models Bayes A and RKHS, incorporating SNPs with additive effects and epistatic interactions, respectively.
README: Genomic characterization and GWAS of Digitaria insularis
https://doi.org/10.5061/dryad.9s4mw6mpt
An in-depth genotypic characterisation of a diverse collection of Digitaria insularis was undertaken to explore the neutral genetic variation across the natural expansion range of this weed species in Brazil. With the exception of Minas Gerais, populations from all other states showed high estimates of expected heterozygosity (HE > 0.60) and genetic diversity. There was a lack of population structure based on geographic origin and a low population differentiation between populations across the landscape as evidenced by average Fst value of 0.02. On combining haloxyfop [acetyl CoA carboxylase (ACCase)-inhibiting herbicide] efficacy data with neutral genetic variation, we found evidence of presence of two scenarios of resistance evolution in this weed species. Whilst populations originating from north-eastern region demonstrated an active role of gene flow, populations from the mid-western region displayed multiple, independent resistance evolution as the major evolutionary mechanism. A target-site mutation (Trp2027Cys) in the ACCase gene, observed in less than 1% of resistant populations, could not explain the reduced sensitivity of 15% of the populations to haloxyfop. Tthe genetic architecture of resistance to acetyl CoA carboxylase (ACCase)-inhibiting herbicides was dissected using a genome wide association study (GWAS) approach. GWAS revealed association of three SNPs with reduced sensitivity to haloxyfop and clethodim. In silico analysis of these SNPs revealed important non-target site genes belonging to families involved in herbicide detoxification, including UDPGT91C1 and GT2, and genes involved in vacuolar sequestration-based degradation pathway. Exploration of five genomic prediction models revealed that the highest prediction power (≥ 0.80) was achieved with the models Bayes A and RKHS, incorporating SNPs with additive effects and epistatic interactions, respectively.
Plant material
A total of 205 D. insularis weed populations were used in the study. The populations were collected from the most relevant soybean- (and corn) growing regions across Brazil inhabiting crop fields and crop margins in the southern, mid-western, south-eastern and north-eastern regions. Briefly, these 205 populations originated from seven states including Mato Grosso (77), Mato Grosso do Sul (10), Minas Gerais (19), Bahia (24), Goiás (20), Paraná (PR) and Rio Grande do Sul (18). The detail information on the geographic origin and geographic coordinates are provided in the supplementary table (Table S1). Some collection sites (1/5th of the collection) had a history of multiple herbicide treatments on the crop including glyphosate, ACCase inhibitors such as haloxyfop or clethodim alone or in mixtures and/or ALS inhibitors (Table S1). The populations were collected in 2020 and 2021 and were investigated for their sensitivity to ACCase inhibitors haloxyfop and clethodim at multiple dose rates by Weed Research team at Brazil Resistance Management Laboratory in Uberlândia, Brazil, as part of resistance monitoring studies.
Herbicide resistance screening of D. insularis collection
The rates used in the population screening were initially determined in a pilot study using ten D. insularis sensitive populations sampled in Brazil urban areas with no history of herbicide application. Informative herbicide rates were determined as 7.8; 15.0, 27.0 and 64.0 g a.i ha-1 for haloxyfop and 27.0, 54.0 and 108.0 g a.i ha-1 for clethodim.
Seeds from all D. insularis populations were germinated and planted in 1 L pots filled with a commercial substrate to produce around 15 plants per pot. Each combination of population and herbicide dose was replicated 3 times (45 plants tested per treatment) in a completely randomized design. The herbicide treatments were sprayed on plants at the 4 leaf-stage, in a spray chamber equipped with flat fan nozzles calibrated to deliver 200 L ha-1 at 200 kPa pressure. Plant control was evaluated 21 days after treatment, using a visual scale of 0 to 100%; 0% represents healthy plants and the absence of symptoms, and 100% represents the death of the plant.
DNA extraction, Genotyping-by-sequencing (GBS) library preparation and sequencing
Four pinches of seeds were sown in a punnet of size 18 cm x 6 cm filled with peat keeping 3cm between pinches. For each population, two such punnets were sown. The punnets were watered manually and put on trolleys in a glass house with controlled conditions (day temperature, 240C; night temperature, 180C; light, 16 hrs; humidity -65%). The punnets were watered daily for three weeks. After three weeks, a 1cm x 2cm of leaf sample was cut from 25 individual plants for each population and pooled into a 14ml falcon tube. The falcon tubes were stored in a −80 °C freezer until further manipulation.
The leaf samples were dried in a freeze dryer for three consecutive days and nights by keeping the shelves at a contact temperature of 1.0 0C and freezer at -60 0C. After freeze drying, the samples were shipped to LGC Genomics GmbH, Germany for DNA extraction, reduced representation library preparation and sequencing.
Genomic DNA extraction was performed from the pooled samples using the sbeadex™ maxi plant kit (LGC) on KingFisher Flex (after lysis step) followed by a spectrophotometric quantification step using Nano Drop 8000 (Thermo Fisher Scientific). Reduced representation library preparation was done by the standardized ddRAD protocol at LGC Genomics. Briefly, 100 ng of genomic DNA were digested with 2 units each of Apek I and Pst I enzymes (NEB) in 1 times NEB buffer 3.1 in 20μl volume for one hour at 37°C. The restriction enzymes were heat inactivated by incubation at 75°C for 60 min. The detailed protocol for the ligation reaction, library purification, amplification and normalization were performed according to the standardized ddRAD protocol at LGC Genomics, GmbH. The library was size selected on a LMP-Agarose gel, removing fragments smaller than 300 bp and those larger than 500 bp. Sequencing was done on an Illumina NovaSeq 6000 (150bp paired-end read).
Genotypes and SNP filtering
Demultiplexing of all libraries for each sequencing lane was done using the Illumina bcl2fastq v2.20 software. Demultiplexing of library groups into samples was done according to their inline barcodes and verification of restriction site. No mismatches or Ns were allowed in the inline barcodes, but Ns were allowed in the restriction site. Reads with final length < 20 bases were rejected and reads with 5’ ends not matching the restriction enzyme site were also discarded. The reads were quality trimmed at 3’-end to get a minimum average Phred quality score of 20 over a window of ten bases. The mapping of quality trimmed reads on the D. insularis reference genome v01.0 (available at Weedpedia, https://weedpedia.weedgenomics.org/) was done using BWA-MEM v0.7.12. One combined alignment file of all samples in the BAM format was used for variant discovery and genotyping of samples with Freebayes v1.2.0. Filtering of variants was done using the following GBS-specific rule set;
- The read count for a locus must exceed 8 reads
- Genotypes must have been observed in at least 66% of samples
- Minimum allele frequency across all samples must exceed 5%.
Genetic diversity and population differentiation
The genetic diversity indices expected heterozygosity (HE) and inbreeding coefficients (FIS) were calculated using the R packages ‘adegenet’ and ‘hierfstat’ (Jerome Goudet, 2005). The polymorphic information content (PIC) was calculated using an in-house R package. The two- and three-dimensional principal component analysis (PCA) was conducted using the R packages ‘stats’ and ‘rgl’. A Bayesian clustering approach implemented in the program STRUCTURE version 2.3.4 (Pritchard et al., 2000) was used to analyse population genetic structure by setting replication number to 10,000 for the burn-in and Markov Chain Monte Carlo (MCMC) iterations each and using options of admixture model and correlated allele frequencies. The number of subpopulations i.e., K was set from 1 to 7 and three independent runs were performed for each K. The Structure Harvester (https://taylor0.biology.ucla.edu/structureHarvester/) was used to analyze the results from the STRUCTURE software, which constructs a deltaK vs K plot using the method of Evanno et al. (2005). The weighted neighbour joining (NJ) was constructed in DARwin 6.0. (Perrier and Jacquemoud-Collet, 2006).
Target site mutations in ACCase
Two primer pairs were used to amplify the ACCase gene sequence in D. insularis: FE35332 Forward (5′-ATGTCCACTCCTGAATTCCCA-3′), FE35333 Reverse (5′-CATTCTGAGGGAAGTATCAT-3′). PCR was performed in 25 μL reaction volume containing 5.0 µL of GoTaq Buffer, 0.5 µL of 10 mM dNTPs, 1.5 µL of 25 mM MgCl2, 0.5 µL of 10 µM of each forward and reverse primers, 0.2 µL of GoTaq G2 Hot Start Polymerase (Promega), and 14.8 µL of ultrapure nuclease-free water (Sigma). PCR cycling conditions were: one cycle at 95 °C for 2 min, 35 cycles at 94 °C for 30 s, 58 °C for 30 s, 72 °C for 90 s, and final extension at 72 °C for 10 min. PCR product was run on 1.0% agarose gel to verify the amplicon size of 1.5kb. The amplified samples were purified and sequenced in a Genetic Analyzer 3500 instrument (Applied Biosystems, Thermo Fisher) following manufacturer’s instructions. Four individuals per population were sequenced using the original amplification primers and the following three internal sequencing forward primers: FE35334 Sequencing Forward 1 (5’- TGGGAGAGCAAAGCTTGGGGT-3’), FE35407 Sequencing Forward 2 (5’- GAAGTGCTGCTATTGCCAGTGC -3’) and FE35408 Sequencing Forward 3 (5’- GACCCACCAGACAGACCTGTTA -3’). The chromatograms were manually read using Bioedit version 7.2.5 software (Hall, 1999) to screen the seven known target-site mutations.
Genome wide association study (GWAS) and in silico analysis of significant SNPs
The normality of the resistance data scores was checked in PAST3 program (Hammer et al., 2001). The resistance scores data of haloxyfop at dose rate of 7.8 g a.i ha-1 and clethodim at dose rate of 27.0 g a.i ha-1 showed normal and near-normal distribution, respectively and hence were used in GWAS. Both general linear model (GLM) and mixed linear model (MLM) were used for GWAS in TASSEL software ver 5.0 (Bradbury et al., 2007). The kinship matrix was calculated using VanRaden algorithm (VanRaden, 2008) in the GAPIT package 2.0 (Lipka et al., 2012). In the GLM, PCA was used as a fixed variate and in the MLM, PCA and kinship matrices were used as fixed and random variates, respectively. The threshold to declare significant marker-trait associations (MTA) was ≥10−3 (log10p) after applying a correction for a false discovery rate (FDR) at p < 0.05.
The VCF file was annotated with SnpEff version 5.1 using the IWGC Digitaria insularis reference genome and annotation v01.0. In addition, the in-silico analysis of the significant SNPs was conducted using nucleotide Basic Local Alignment Search Tool (BLAST) in the EnsemblPlants database (https://plants.ensembl.org/index.html). The EnsemblPlants database has cDNA/transcript sequences of more than 80 monocots and dicots and of model plants, which were used to find the homologies. The genes found in the overlapping region and within 1.0 Mb upstream and downstream of the matched regions were selected as candidate genes. To determine their molecular functions, the protein sequences of the candidate genes were downloaded from EnsemblPlants database and used in protein BLAST analysis in NCBI server (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and their molecular functions were determined after ascertaining their homologies with known proteins in grasses.
Genomic prediction models
Four parametric models (Ridge regression, Bayes A, Bayes B, Bayes C) and a non-parametric model (RKHS) were used and all these models are implemented in the ‘BGLR’ package (De Los Campos et al., 2022) in R. Ridge regression (RR) method considers common variance for all markers and shrinks the marker effects toward zero (Meuwissen et al., 2001). All Bayesian models do not consider the common variance of markers and incorporate additive genetic effects. The RKHS model uses a kernel function and captures non-additive effects (epistatic interactions). Either all available SNP markers were used or different sets of SNP markers were employed according to preliminary GWAS results. The SNP markers were ranked according to increasing p-values in GWAS analyses. Prediction accuracy of all models was calculated through “Pearson correlation coefficient” between observed and predicted values based on 100 iterations and 10-fold cross-validations.
Results
GBS marker distribution
A total of 105,699 single nucleotide polymorphisms (SNPs) were generated across 205 populations using the criterion of a minimum coverage to call a SNP. Of these, 5,238 SNPs were retained for genetic analysis after filtering for 5% minor allele frequency (MAF) and missing data 30% and removing markers that could not be mapped on the currently available reference genome of D. insularis on Weedpedia (https://weedpedia.weedgenomics.org/). The polymorphic information content (PIC) of the filtered SNPs varied from 0.10 to 0.38. Overall, SNPs showed good distribution with chromosomes S02 and S06 showing the highest (842) and the least (344) number of SNPs, respectively (Fig. 1). On an average, 624Mb of the physical genome was encompassed by these SNPs with the maximum genome coverage achieved on chromosome S01 (93.7Mb) and minimum (51.9Mb) on chromosome S09 (Table S2).
Within populations genetic diversity and population differentiation
The expected heterozygosity (HE) and polymorphic information content (PIC) values of the populations ranged from 0.56 to 0.64 and 0.20 to 0.37, with an average of 0.62 and 0.37, respectively (Table 1). Both parameters, HE and PIC, showed that Minas Gerais (MG) population was moderately diverse while most other populations showed high level of genetic diversity (HE > 0.60). The inbreeding coefficient (FIS) for all the seven populations was negative (Table 1) suggesting highly outcrossing nature of the populations under local environmental conditions.
The average Fst across all samples s was 0.020, indicating a low genetic differentiation among populations. The pairwise Fst values ranged from 0.003 (between populations MS and PR, GO and PR, MT and RS and PR and RS) to 0.064 (between MG and BA) (Table 2). As expected, a high gene flow (Nm) was detected among populations, varying from 3.65 to 83.08, with an overall average of 12.25. Interestingly, the lowest Nm estimates were obtained between MG and five populations (MT, MS, PR, RS and BA) (Table 2).
Principal component analysis (PCA) and STRUCTURE analysis
The PCA plot with genome wide 5,238 SNPs revealed two important features; i) all populations were scattered in the PCA plot and ii) populations did not show clusters based on their geographical origin (Fig. 2). The population structure explored by Bayesian STRUCTURE analysis revealed two subpopulations at the best K (K = 2) (Fig. S1a, b). Both subpopulations comprised of individuals from all seven populations (Fig. S1b). Only populations from MG had a defined cluster (cluster II) based on a cluster membership threshold of 0.80 (Fig. S1b), whereas all remaining populations were distributed between the two clusters. The weighted neighbour joining (NJ) tree confirmed that two groups comprised individuals from seven populations clustered randomly irrespective of their geographic origin (Fig. S2).
With the Fst outlier analysis, we detected 63 SNPs that showed Fst values higher than the average Fst (Fig. 3a). We used these 63 SNPs to redraw PCA to investigate if any geographical differentiation could be observed (Fig. 3b). Interestingly, using a subset of 63 SNPs (Fst > 0.03), MT population could be moderately differentiated from rest of the populations while the remaining six populations remained mixed (Fig. 3b). To understand the functional significance of this subtle geographical differentiation, we conducted an in-silico analysis of the sequences of the top fifteen outlier hits (Fst > 0.04) to identify candidate genes (Table S3). It was revealed that orthologs of the genes involved in growth and development and/or stress pathways showed hits with these sequences.
Resistance to ACCase inhibitors and population structure of resistant and sensitive populations
As part of the resistance monitoring program, the D. insularis populations were tested with, haloxyfop at 7.8, 15.0, 27.0 and 64.0 g a.i ha-1 ai/ha and clethodim at 27, 54 and 108 g a.i ha-1 ai/ha. A total of 29 (15.0%), 29 (15.0%), 7 (3.6%) and 6 (3.1%) populations were found to show less than 90% control with haloxyfop at 7.8, 15.0, 27.0 and 64.0 g a.i ha-1, respectively. For clethodim, 16 (8.2%), 8 (4.1%) and 4 (2.0%) populations were found to be controlled at less than 90% at doses 27.0, 54.0 and 108.0 g a.i ha-1, respectively. There were no significant differences in the sensitivity scores for the two lower doses of haloxyfop i.e., 7.8 and 15.0 g a.i ha-1, hence sensitivity scores from the dose 7.8 g a.i ha-1 were used in all further analysis. The state wise distribution showed that more than 50% of populations with reduced sensitivity to the haloxyfop dose rate 7.8 g a.i ha-1 were from MG state followed by MT (31%) (Fig S3a). For higher doses of haloxyfop and for all the three doses of clethodim, populations with reduced sensitivity were from PR and MS states (Fig. S3b). The efficacy scores at 7.8 g a.i ha-1 dose of haloxyfop and 27.0 g a.i ha-1 of clethodim showed a normal and near-normal distribution, respectively (Fig. S3 c,d).
Sequencing of the ACCase gene in all samples revealed that only two populations (BR-20-Din-144 and BR-21-Din-046) showed the presence of Trp2027Cys mutation. Whilst both populations showed reduced sensitivity to all the four doses of haloxyfop (7.8, 15.0, 27.0 and 64.0 g a.i ha-1), only one exhibited reduced sensitivity to the two doses (27.0 and 54.0 g a.i ha-1) of clethodim. The four populations (BR-21-Din-417, BR-21-Din-419, BR-21-Din-421 and BR-21-Din-423), that showed reduced sensitivity to the higher doses of haloxyfop (27.0 and 64.0 g a.i ha-1) and to all the three doses of clethodim (27.0, 54.0 and 108.0 g a.i ha-1), did not show the Trp2027Cys mutation. None of the other known mutations in this gene were identified in the resistant populations.
To understand whether the high gene flow observed in the species is shaping the genetic structure of resistance, we investigated the population genetic structure of a subset of 65 populations, contrasting for resistance or reduced sensitivity to 7.8 g a.i ha-1 of haloxyfop regardless of geographic origin. Only 7.8 g a.i ha-1 dose rate was selected for these analyses as this dose rate showed a good size of populations with reduced sensitivity (29) which could be compared with sensitive populations (36). The other dose rates produced too few populations showing reduced sensitivity (<8) and hence were not analysed further. The PCA plot showed that resistant and sensitive populations were interspersed, and no clear groups were observed (Fig. 4a), which led us to suggest that the resistance has evolved independently across the landscape. We then drew weighted NJ trees of populations from MG and MT regions separately (Fig. 4b, c). For MG, the tree showed distinction between resistant and sensitive populations but for MT, resistant and sensitive populations were interspersed.
Genome wide association study (GWAS) and genomic prediction models
Since efficacy scores at 7.8 g a.i ha-1 dose of haloxyfop and 27.0 g a.i ha-1 of clethodim showed a normal and near-normal distribution, respectively (Fig S3 c,d), characteristics of a quantitative trait, a pilot GWAS study was conducted to identify candidate genes associated with reduced efficacy to haloxyfop and clethodim. Three and two SNPs were found associated with reduced efficacy with haloxyfop in GLM and MLM models, respectively (Fig. 5). Two SNPs, S02_45439268 and S03_5213977, were common in both models (Table 3). S02_45439268 on chromosome 2 explained the highest percentage variation (R2) of 26.6% (p value = 3.13 E-06), while S03_5213977 on chromosome 3 explained 14.6% R2 (p value = 9.71 E-04). In silico analysis (BLAST analysis using reference genomes/cDNA sequences of D. exilis or model species) of these SNPs revealed that the SNP S02_45439268 was located 0.1Mb upstream of a D. exilis gene Dexi2A01G0009650. The protein BLAST analysis of the protein sequence of this gene showed homologies with UDP-glycosyltransferase 91C1-like protein (UDPGT91C1) of Panicum virgatum, Sorghum bicolor, Oryza glaberrima and Setaria italica. The SNP S03_5213977 showed homologies with the galactinol--sucrose galactosyltransferase (GT) 2 in Hordeum vulgare, Aegilops tauschii, Triticum urartu and Lolium arundinaceum.
For efficacy shift to clethodim, four and one SNP were identified with GLM and MLM, respectively (Fig. 6). The four SNPs with the GLM model explained 11.3 to 16.6% of R2, while the SNP identified with the MLM explained 24.0% of R2 (Table 3). Interestingly, two SNPs were located within 0.1Mb of potential NTSR genes. The SNP S05_47637569 was located within 0.1Mb of D. exilis gene Dexi1A01G0019350. The BLASTP analysis of protein sequence of this gene showed homologies with vacuolar protein sorting (VPS)-associated protein 20 of Panicum virgatum, Setaria italica, Zea mays and Lolium rigidum. The SNP S08_15270772 is located within 0.1Mb of D. exilis gene Dexi9A01G0037640, the protein sequence of which showed homologies with CMP-sialic acid transporter 2 of Setaria italica, Panicum virgatum and Sorghum bicolor.
We further investigated the potential of genomic prediction (GP) models to predict NTSR resistance in the panel using resistance data scores at haloxyfop dose of 7.8 g a.i ha-1. We tested five models, Ridge Regression, Bayes A, Bayes B, Bayes C and RKHS, on different subsets of SNPs selected either based on GWAS hits or complete set of 5,238 SNPs (Fig. 7). Although predictions based on models having 50 SNPs from GWAS showed more than 50% prediction accuracy, the highest accuracy was achieved with complete set of 5,238 SNPs with better prediction accuracy from Bayes A or RKHS models.
Description of the data and file structure
Title of Dataset: Genomic characterisation and dissection of the onset of resistance to acetyl CoA carboxylase-inhibiting herbicides in a large collection of Digitaria insularis from Brazil
Date of data collection: 2022-11-25 to 2023-03-15
Geographic location of data collection: Jealott’s Hill International Research Centre, Bracknell, United Kingdom
Information about funding sources that supported the collection of the data: No external funding received for this project
DATA & FILE OVERVIEW
File List: Phenotype_Data, have the phenotypic values of 193 genotypes for reduced sensitivity to haloxyfop (7.8 g a.i./ha) and clethodim (27.0 g a.i./ha)
Are there multiple versions of the dataset?
If yes, name of file(s) that was updated: Nil
METHODOLOGICAL INFORMATION
Description of methods used for generation of data: Seeds from all D. insularis populations were sown in 1 L pots filled with a commercial substrate and after germination 15 plants per pot were kept. Each combination of population and herbicide dose was replicated 3 times (45 plants tested per treatment) in a completely randomized design. Plants were sprayed at the 4 leaf-stage and were evaluated 21 days after treatment, using a visual scale of 0%–100%;
0% represents healthy plants and the absence of symptoms, and 100% represents the death of the plant. For genotyping, 25 leaves from 25 individuals of a population were pooled and this was done for all populations investigated. The populations were genotyped by genotyping-by-sequencing (LGC Genomics Ltd., Germany) generating 105,699 genome-wide SNPs. Of these, 5,238 SNPs were retained for genetic analysis after filtering for 5% minor allele frequency (MAF)
and missing data 30% and removing markers that could not be mapped on the currently available reference genome of D. insularis on Weedpedia (https://weedpedia.weedgenomics.org/). The normality of the resistance data scores was checked in PAST3 program. Principal component analysis and population structure analysis were done using R package 'rgl' and STRUCTURE program, respectively. Trait Analysis by aSSociation Evolution and Linkage (TASSEL) version 5.0 was used for GWAS.
The kinship matrix was calculated using VanRaden algorithm in the GAPIT(Genome Association and Prediction Integrated Tool)package 3.0 in the R environment.
Describe any quality-assurance procedures performed on the data: Genotyping data was checked for missing data and heterozygosity and only the best high quality SNPs were used for population genetics and GWAS. Populations producing more than 40% missing data were removed from the final analysis.
People involved with sample collection, processing, analysis and/or submission: Deepmala Sehgal, Claudia Oliveira, Sandra Mathioni, Stephanie Widdison.
DATA-SPECIFIC INFORMATION FOR: [Phenotypic data]
Number of variables: 2
Number of cases/rows: 193
Variable List: reduced sensitivity to haloxyfop (7.8 g a.i./ha) and clethodim (27.0 g a.i./ha)
Missing data codes: NA
Specialized formats or other abbreviations used: Haloxyfop-7.8g, Clethodim-27.0 g
DATA-SPECIFIC INFORMATION FOR: [Geno-5.2K]
Number of variables: 5238
Number of cases/rows: 193
Variable List: 5238 SNPs
Missing data codes: N
Specialized formats or other abbreviations used: A - Adinine, G - Guanine, C - Cytosine, T - Thymine, N - Missing, R = A or G, K = G or T, S = G or C, Y = C or T, M = A or C, W = A or T
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: Nil
Links to other publicly accessible locations of the data: NA
Links/relationships to ancillary data sets: Nil
Was data derived from another source?
If yes, list source(s): Nil
Links to publications that cite or use the data: https://www.frontiersin.org/articles/10.3389/fgene.2024.1340852/
Recommended citation for this dataset: Sehgal et al.(2024), Data from: Genomic characterisation and dissection of the onset of resistance to acetyl CoA carboxylase-inhibiting herbicides in a large collection of Digitaria insularis from Brazil
Code/Software
The normality of the resistance data scores was checked in PAST3 program. Principal component analysis and population structure analysis were done using R package 'rgl' and STRUCTURE program, respectively. Trait Analysis by aSSociation Evolution and Linkage (TASSEL) version 5.0 was used for GWAS.
The kinship matrix was calculated using VanRaden algorithm in the GAPIT(Genome Association and Prediction Integrated Tool)package 3.0 in the R environment.
Methods
Plant material
A total of 205 D. insularis weed populations were used in the study. The populations were collected from the most relevant soybean- (and corn) growing regions across Brazil inhabiting crop fields and crop margins in the southern, mid-western, south-eastern, and north-eastern regions. Briefly, these 205 populations originated from seven states including Mato Grosso (77), Mato Grosso do Sul (10), Minas Gerais (19), Bahia (24), Goiás (20), Paraná (PR) and Rio Grande do Sul (18). The detailed information on the geographic origin and geographic coordinates are provided in the supplementary table (Table S1). Some collection sites (1/5th of the collection) had a history of multiple herbicide treatments on the crop including glyphosate, ACCase inhibitors such as haloxyfop or clethodim alone, or in mixtures and/or ALS inhibitors (Table S1). The populations were collected in 2020 and 2021 and were investigated for their sensitivity to ACCase inhibitors haloxyfop and clethodim at multiple dose rates by the Weed Research team at Brazil Resistance Management Laboratory in Uberlândia, Brazil, as part of resistance monitoring studies.
Herbicide resistance screening of D. insularis collection
The rates used in the population screening were initially determined in a pilot study using ten D. insularis-sensitive populations sampled in Brazil's urban areas with no history of herbicide application. Informative herbicide rates were determined as 7.8; 15.0, 27.0, and 64.0 g a.i ha-1 for haloxyfop and 27.0, 54.0, and 108.0 g a.i ha-1 for clethodim.
Seeds from all D. insularis populations were germinated and planted in 1 L pots filled with a commercial substrate to produce around 15 plants per pot. Each combination of population and herbicide dose was replicated 3 times (45 plants tested per treatment) in a completely randomized design. The herbicide treatments were sprayed on plants at the 4 leaf-stage, in a spray chamber equipped with flat fan nozzles calibrated to deliver 200 L ha-1 at 200 kPa pressure. Plant control was evaluated 21 days after treatment, using a visual scale of 0 to 100%; 0% represents healthy plants and the absence of symptoms, and 100% represents the death of the plant.
DNA extraction, Genotyping-by-sequencing (GBS) library preparation and sequencing
Four pinches of seeds were sown in a punnet of size 18 cm x 6 cm filled with peat keeping 3cm between pinches. For each population, two such punnets were sown. The punnets were watered manually and put on trolleys in a glass house with controlled conditions (day temperature, 240C; night temperature, 180C; light, 16 hrs; humidity -65%). The punnets were watered daily for three weeks. After three weeks, a 1cm x 2cm of leaf sample was cut from 25 individual plants for each population and pooled into a 14ml falcon tube. The falcon tubes were stored in a −80 °C freezer until further manipulation.
The leaf samples were dried in a freeze dryer for three consecutive days and nights by keeping the shelves at a contact temperature of 1.0 0C and the freezer at -60 0C. After freeze drying, the samples were shipped to LGC Genomics GmbH, Germany for DNA extraction, reduced representation library preparation, and sequencing.
Genomic DNA extraction was performed from the pooled samples using the sbeadex™ maxi plant kit (LGC) on KingFisher Flex (after the lysis step) followed by a spectrophotometric quantification step using Nano Drop 8000 (Thermo Fisher Scientific). Reduced representation library preparation was done by the standardized ddRAD protocol at LGC Genomics. Briefly, 100 ng of genomic DNA were digested with 2 units each of Apek I and Pst I enzymes (NEB) in 1 times NEB buffer 3.1 in 20μl volume for one hour at 37°C. The restriction enzymes were heat-inactivated by incubation at 75°C for 60 min. The detailed protocol for the ligation reaction, library purification, amplification, and normalization were performed according to the standardized ddRAD protocol at LGC Genomics, GmbH. The library was size selected on a LMP-Agarose gel, removing fragments smaller than 300 bp and those larger than 500 bp. Sequencing was done on an Illumina NovaSeq 6000 (150bp paired-end read).
Genotypes and SNP filtering
Demultiplexing of all libraries for each sequencing lane was done using the Illumina bcl2fastq v2.20 software. Demultiplexing of library groups into samples was done according to their inline barcodes and verification of the restriction site. No mismatches or Ns were allowed in the inline barcodes, but Ns were allowed in the restriction site. Reads with final length < 20 bases were rejected and reads with 5’ ends not matching the restriction enzyme site were also discarded. The reads were quality trimmed at 3’-end to get a minimum average Phred quality score of 20 over a window of ten bases. The mapping of quality trimmed reads on the D. insularis reference genome v01.0 (available at Weedpedia, https://weedpedia.weedgenomics.org/) was done using BWA-MEM v0.7.12. One combined alignment file of all samples in the BAM format was used for variant discovery and genotyping of samples with Freebayes v1.2.0. Filtering of variants was done using the following GBS-specific rule set;
1. The read count for a locus must exceed 8 reads
2. Genotypes must have been observed in at least 66% of samples
3. Minimum allele frequency across all samples must exceed 5%.
Genetic diversity and population differentiation
The genetic diversity indices expected heterozygosity (HE) and inbreeding coefficients (FIS) were calculated using the R packages ‘adegenet’ and ‘hierfstat’ (Jerome Goudet, 2005). The polymorphic information content (PIC) was calculated using an in-house R package. The two- and three-dimensional principal component analysis (PCA) was conducted using the R packages ‘stats’ and ‘rgl’. A Bayesian clustering approach implemented in the program STRUCTURE version 2.3.4 (Pritchard et al., 2000) was used to analyse population genetic structure by setting the replication number to 10,000 for the burn-in and Markov Chain Monte Carlo (MCMC) iterations each and using options of admixture model and correlated allele frequencies. The number of subpopulations i.e., K was set from 1 to 7 and three independent runs were performed for each K. The Structure Harvester (https://taylor0.biology.ucla.edu/structureHarvester/) was used to analyze the results from the STRUCTURE software, which constructs a deltaK vs K plot using the method of Evanno et al. (2005). The weighted neighbour joining (NJ) was constructed in DARwin 6.0. (Perrier and Jacquemoud-Collet, 2006).
Target site mutations in ACCase
Two primer pairs were used to amplify the ACCase gene sequence in D. insularis: FE35332 Forward (5′-ATGTCCACTCCTGAATTCCCA-3′), FE35333 Reverse (5′-CATTCTGAGGGAAGTATCAT-3′). PCR was performed in 25 μL reaction volume containing 5.0 µL of GoTaq Buffer, 0.5 µL of 10 mM dNTPs, 1.5 µL of 25 mM MgCl2, 0.5 µL of 10 µM of each forward and reverse primers, 0.2 µL of GoTaq G2 Hot Start Polymerase (Promega), and 14.8 µL of ultrapure nuclease-free water (Sigma). PCR cycling conditions were: one cycle at 95 °C for 2 min, 35 cycles at 94 °C for 30 s, 58 °C for 30 s, 72 °C for 90 s, and final extension at 72 °C for 10 min. PCR product was run on 1.0% agarose gel to verify the amplicon size of 1.5kb. The amplified samples were purified and sequenced in a Genetic Analyzer 3500 instrument (Applied Biosystems, Thermo Fisher) following the manufacturer’s instructions. Four individuals per population were sequenced using the original amplification primers and the following three internal sequencing forward primers: FE35334 Sequencing Forward 1 (5’- TGGGAGAGCAAAGCTTGGGGT-3’), FE35407 Sequencing Forward 2 (5’- GAAGTGCTGCTATTGCCAGTGC -3’) and FE35408 Sequencing Forward 3 (5’- GACCCACCAGACAGACCTGTTA -3’). The chromatograms were manually read using Bioedit version 7.2.5 software (Hall, 1999) to screen the seven known target-site mutations.
Genome-wide association study (GWAS) and in silico analysis of significant SNPs
The normality of the resistance data scores was checked in PAST3 program (Hammer et al., 2001). The resistance scores data of haloxyfop at a dose rate of 7.8 g a.i ha-1 and clethodim at a dose rate of 27.0 g a.i ha-1 showed normal and near-normal distribution, respectively, and hence were used in GWAS. Both the general linear model (GLM) and mixed linear model (MLM) were used for GWAS in TASSEL software ver 5.0 (Bradbury et al., 2007). The kinship matrix was calculated using VanRaden algorithm (VanRaden, 2008) in the GAPIT package 2.0 (Lipka et al., 2012). In the GLM, PCA was used as a fixed variate and in the MLM, PCA and kinship matrices were used as fixed and random variates, respectively. The threshold to declare significant marker-trait associations (MTA) was ≥10−3 (log10p) after applying a correction for a false discovery rate (FDR) at p < 0.05.
The VCF file was annotated with SnpEff version 5.1 using the IWGC Digitaria insularis reference genome and annotation v01.0. In addition, the in-silico analysis of the significant SNPs was conducted using nucleotide Basic Local Alignment Search Tool (BLAST) in the EnsemblPlants database (https://plants.ensembl.org/index.html). The EnsemblPlants database has cDNA/transcript sequences of more than 80 monocots and dicots and of model plants, which were used to find the homologies. The genes found in the overlapping region and within 1.0 Mb upstream and downstream of the matched regions were selected as candidate genes. To determine their molecular functions, the protein sequences of the candidate genes were downloaded from EnsemblPlants database and used in protein BLAST analysis in NCBI server (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and their molecular functions were determined after ascertaining their homologies with known proteins in grasses.
Genomic prediction models
Four parametric models (Ridge regression, Bayes A, Bayes B, Bayes C) and a non-parametric model (RKHS) were used and all these models are implemented in the ‘BGLR’ package (De Los Campos et al., 2022) in R. Ridge regression (RR) method considers common variance for all markers and shrinks the marker effects toward zero (Meuwissen et al., 2001). All Bayesian models do not consider the common variance of markers and incorporate additive genetic effects. The RKHS model uses a kernel function and captures non-additive effects (epistatic interactions). Either all available SNP markers were used or different sets of SNP markers were employed according to preliminary GWAS results. The SNP markers were ranked according to increasing p-values in GWAS analyses. The prediction accuracy of all models was calculated through “Pearson correlation coefficient” between observed and predicted values based on 100 iterations and 10-fold cross-validations.