Landscape genomics reveals genetic signals of environmental adaptation of African wild eggplant
Data files
Sep 18, 2023 version files 45.35 MB
- README.md
- Wild_African_eggplant_sample_details_and_environmental_data.csv
- Wild_African_eggplant_SNP_dataset.vcf
Abstract
Crop wild relatives possess desirable traits that confer resilience to various environmental stresses. We applied landscape genomics, which associates environmental variables with genomic variation, to understand the genetic basis of their adaptation.
In this study, we applied landscape genomics to examine the differences in allele frequency of 15,416 Single Nucleotide Polymorphisms (SNPs) among 153 accessions of wild eggplant relatives from Africa, the principal hotspot of these wild relatives. Further, we explored the correlation between the genetic variations and the bio-climatic and soil conditions at their collection sites.
Our results showed that, after controlling for population structure, the environment has a greater impact on genetic variation in the eggplant wild relative populations than the geographical distance between collection sites. These findings indicate the relevance of the environment in shaping genetic variation in eggplant relatives over time. We also detected candidate SNPs associated with ten environmental factors. Some of these SNPs signal genes involved in pathways that help with adaptation to environmental stresses such as drought, heat, cold, salinity, pests, and diseases.
README: Landscape genomics reveals genetic signals of environmental adaptation of African wild eggplant
Description of the data and file structure
File list:
- Wild African eggplant SNP dataset.vcf
- Wild African eggplant sample details and environmental data.csv
Data specific information
1. Wild African eggplant SNP dataset.vcf
The data lines in the .vcf file include
- CHROM - chromosome: An identifier from the reference genome
- POS - position: The reference position, with the 1st base having position 1. Positions are sorted numerically, in increasing order, within each reference sequence CHROM
- ID - identifier: Colon-separated list of unique identifiers for the SNPs
- REF - reference base(s)
- ALT - alternate base(s): A list of alternate non-reference alleles.
- QUAL - quality: Phred-scaled quality score for the assertion made in ALT
- FILTER - filter status: PASS if this position has passed all filters, i.e., a call is made at this position
- INFO - additional information
- The genotype names, e.g. aethiopicum1, are the accessions used in the analysis. The names are derived from the species name followed by a number denoting the position of the genotype within that species. Genotypes, even within the same species, may be from different populations. Further information about the genotypes is found in the following file.
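As an illustration of the column layout described above, the following minimal Python sketch splits one VCF data line into its fixed fields and a per-accession genotype mapping. The helper name `parse_vcf_line` and the example line (chromosome name, position, ID, and genotype values) are made up for illustration and are not taken from the dataset; the sketch also assumes that each sample column carries a `GT` entry, as is usual for VCF files.

```python
# Hypothetical helper, not part of the dataset or of any library.
FIXED_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line, sample_names):
    """Split one tab-separated VCF data line into a dictionary."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_FIELDS, parts[:8]))
    record["POS"] = int(record["POS"])        # 1-based reference position
    record["ALT"] = record["ALT"].split(",")  # list of alternate alleles
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = parts[8].split(":").index("GT")
    record["genotypes"] = {
        name: column.split(":")[gt_index]
        for name, column in zip(sample_names, parts[9:])
    }
    return record


# Made-up header and data line, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\taethiopicum1\taethiopicum2"
line = "1\t1042\t17:25:+\tA\tG\t.\tPASS\tNS=2\tGT:DP\t0/1:12\t1/1:8"

samples = header.split("\t")[9:]
rec = parse_vcf_line(line, samples)
print(rec["CHROM"], rec["POS"], rec["genotypes"])
```

For whole-file work, an established VCF reader is likely a better choice than hand-rolled parsing; the sketch is only meant to make the field order concrete.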
2. Wild African eggplant sample details and environmental data.csv
- Number of genotypes (rows): 153
- Genotype details (columns 1-7):
Genotype: sequencing code;
VINO: variety identification number at the WorldVeg genebank in Taiwan;
Taxon: genotype name derived from the species name and a sequential number in the list (same as the genotype names on the vcf file);
lon: longitude;
lat: latitude;
Country: Country ISO codes for the country of origin of the accessions.
The country codes include: TZA- Tanzania, KEN- Kenya, UGA- Uganda, SDN- Sudan, NGA- Nigeria, and GHA- Ghana.
- Number of environmental variables: 10
PDrM_14: Precipitation of the driest month (mm);
PWaQ_18: Precipitation of the warmest quarter (mm);
PCoQ_19: Precipitation of the coldest quarter (mm);
MTWeQ_8: Mean temperature of the wettest quarter (°C);
srad: solar radiation (kJ m⁻² day⁻¹);
phh2o: pH of the soil water (pH * 10);
nitrogen: soil nitrogen content (cg/kg);
clay: soil clay content (g/kg);
silt: soil silt content (g/kg);
ocd: organic carbon density (hg/m³).
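Several of the soil variables are stored in scaled units (for example pH * 10 and cg/kg). The sketch below shows one way to convert a row to more conventional units. It assumes that the CSV column headers match the variable names listed above; the function name `to_conventional_units`, the output keys, and the example values are hypothetical.

```python
import csv
import io


def to_conventional_units(row):
    """Convert the scaled soil values of one CSV row (hypothetical helper)."""
    return {
        "pH": float(row["phh2o"]) / 10.0,                      # pH * 10 -> pH
        "nitrogen_g_per_kg": float(row["nitrogen"]) / 100.0,   # cg/kg -> g/kg
        "clay_percent": float(row["clay"]) / 10.0,             # g/kg -> %
        "silt_percent": float(row["silt"]) / 10.0,             # g/kg -> %
        "ocd_kg_per_m3": float(row["ocd"]) / 10.0,             # hg/m3 -> kg/m3
    }


# Made-up two-line CSV standing in for the real file; values are illustrative.
example_csv = io.StringIO(
    "Taxon,phh2o,nitrogen,clay,silt,ocd\n"
    "aethiopicum1,62,150,350,200,180\n"
)
rows = [to_conventional_units(r) for r in csv.DictReader(example_csv)]
print(rows[0])
```

To apply this to the real file, replace the in-memory example with `open(...)` on the CSV and check the header names first.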
Sharing/Access information
Data was derived from the following sources:
- WorldClim 2.1 climate data for 1970-2000 (Fick & Hijmans, 2017) at a resolution of 2.5 minutes (https://www.worldclim.org/data/worldclim21.html)
- SoilGrids database released in 2016 (https://soilgrids.org/) through ISRIC—WDC Soils (Hengl et al., 2017) at 250-meter resolution and at a depth of 15-30 cm
Code/Software
Methods
SNP dataset
We isolated genomic DNA from fresh leaves of five seedlings per accession using the FavorPrep Plant Genomic DNA Extraction Mini Kit (FAVORGEN), following the manufacturer's instructions. We then constructed the sequencing library following the approach of Elshire et al. (2011). Genomic DNA was quantified by Qubit and normalized to 100 ng in 96-well plates. We digested the DNA samples with the restriction enzyme ApeKI and ligated them with two adapters for sequencing, then amplified the target DNA fragments by polymerase chain reaction to complete the library preparation. A service provider performed the sequencing on the Illumina HiSeq X platform in a paired-end 150 bp run.
For SNP calling, we mainly followed the manual of the Stacks software (Catchen et al., 2013). In short, we filtered the raw reads by quality and demultiplexed them using the process_radtags program. We then mapped the retained reads to the eggplant reference genome (Eggplant_V4.1.fa) (Barchi et al., 2021) using the Burrows-Wheeler Aligner (BWA) version 0.7.17 (Li & Durbin, 2009). We sorted and indexed the mapped reads using Samtools version 1.15.1 (Li et al., 2009), after which we performed variant calling using the gstacks and populations programs in Stacks. We further filtered the SNPs and the accessions, retaining those with less than 20% missing data and, for SNPs, a Minor Allele Frequency (MAF) > 0.05, giving the final high-quality SNP dataset comprising 15,146 SNPs.
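The missing-data and MAF thresholds described above can be illustrated per SNP with a short Python sketch. This is not the authors' pipeline and does not reproduce how Stacks applies its filters; the function names `snp_stats` and `keep_snp`, the assumption of biallelic diploid genotype strings such as `0/1` with `./.` for missing calls, and the example genotypes are all illustrative.

```python
def snp_stats(genotypes):
    """Return (missing_rate, maf) for one biallelic SNP.

    `genotypes` is a list of GT strings such as '0/1', '1|1' or './.'.
    """
    alleles = []
    missing = 0
    for gt in genotypes:
        calls = gt.replace("|", "/").split("/")
        if "." in calls:
            missing += 1          # count the whole genotype as missing
        else:
            alleles.extend(calls)
    missing_rate = missing / len(genotypes)
    if not alleles:
        return missing_rate, 0.0
    alt_freq = sum(a != "0" for a in alleles) / len(alleles)
    return missing_rate, min(alt_freq, 1.0 - alt_freq)


def keep_snp(genotypes, max_missing=0.20, min_maf=0.05):
    """Apply the thresholds stated in the text to a single SNP."""
    missing_rate, maf = snp_stats(genotypes)
    return missing_rate < max_missing and maf > min_maf


# Made-up genotype vectors for three SNPs across ten accessions.
polymorphic = ["0/0", "0/1", "1/1", "./."] + ["0/0"] * 6
monomorphic = ["0/0"] * 10
mostly_missing = ["./."] * 3 + ["0/1"] * 7

print(keep_snp(polymorphic), keep_snp(monomorphic), keep_snp(mostly_missing))
```

The same missing-data logic could be applied per accession by iterating over columns instead of rows.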
Environmental variable dataset
We downloaded the grids for 19 bioclimatic variables, solar radiation, wind speed, and vapor pressure derived from WorldClim 2.1 (Fick & Hijmans, 2017) at a resolution of 2.5 minutes. The 19 bioclimatic variables were each downloaded as annual averages for the period 1970-2000. We averaged the monthly solar radiation, wind speed, and vapor pressure rasters to obtain annual value rasters for this period. We downloaded the soil data from the SoilGrids database released in 2016 (https://soilgrids.org/) through ISRIC—WDC Soils (Hengl et al., 2017) at 250-meter resolution and at a depth of 15-30 cm, approximately the depth at which eggplant roots can grow. Soil variables included nitrogen, soil organic carbon, organic carbon density, organic carbon stock, cation exchange capacity, pH, and clay, sand, and silt content. The soil dataset resolution was aggregated to match that of the climate data using the resample and extent functions of the raster package in R (Hijmans, 2023), ensuring the layers are consistent in both resolution and extent. We extracted the environmental variables for each accession with the extract function of the R raster package (Hijmans, 2023), using the GIS coordinates of the sampling points, to obtain a full dataset of all the climate and soil variables. For our modeling, we selected the environmental variables based on Variance Inflation Factors (VIFs), retaining variables with a VIF less than 5.
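For reference, the standard definition of the Variance Inflation Factor is given below. The symbols are introduced here for illustration: $R_j^2$ denotes the coefficient of determination obtained when environmental variable $j$ is regressed on all the other candidate variables. The text does not state how the VIFs were computed, so this is the textbook definition rather than a description of the exact procedure used.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}},
\qquad
\mathrm{VIF}_j < 5 \iff R_j^{2} < 0.8
```

Under this definition, the threshold of 5 retains a variable only if less than 80% of its variance is explained by the remaining variables.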