Data for isolation-by-environment and its consequences for range shifts with global change: Landscape genomics of the invasive common tansy
Data files
Jun 05, 2024 version files 15.83 MB
Abstract
Invasive species are a growing global economic and ecological problem. However, it is not well understood how environmental factors mediate invasive range expansion. In this study, we investigated the recent and rapid range expansion of common tansy across environmental gradients in Minnesota, U.S.A. We densely sampled individuals across the expanding range and performed reduced representation sequencing to generate a dataset of 3071 polymorphic loci for 176 individuals. The dataset also includes samples from the native range in Finland that were not used in the downstream analysis but are contributed for completeness. It contains the genotype calls for all individuals sampled and sequenced. The genotype file was generated with Stacks 2.59 by running the de novo pipeline and then the populations program, keeping loci that were present in at least 70% of populations and had a minor allele frequency of at least 1%. We used non-spatial and spatially explicit analyses to determine the relative influences of geographic distance and environmental variation on patterns of genomic variation. We found no evidence for isolation-by-distance (IBD) but strong evidence for isolation-by-environment (IBE), indicating that environmental factors may have modulated patterns of range expansion.
README: Data for Isolation-by-Environment and its consequences for range shifts with global change: landscape genomics of the invasive common tansy
https://doi.org/10.5061/dryad.h70rxwdsk
Author Information
Author Contact: Ryan Briscoe Runquist (rbriscoe@umn.edu)
Principal Investigator Contact Information
Name: Ryan Briscoe Runquist
Institution: University of Minnesota
Address: Department of Plant and Microbial Biology, 1479 Gortner Ave, 140 Gortner Laboratory, St. Paul, MN 55108
Email: rbriscoe@umn.edu
ORCID: https://orcid.org/0000-0001-7160-9110
Associate or Co-investigator Contact Information
Name: David A. Moeller
Institution: University of Minnesota
Address: Department of Plant and Microbial Biology, 1479 Gortner Ave, 140 Gortner Laboratory, St. Paul, MN 55108
Email: moeller@umn.edu
ORCID: https://orcid.org/0000-0002-6202-9912
Description of the data and file structure
File List
a. Filename: populations.snps.vcf
Description: This file contains the SNP genotype calls for 3071 loci from 178 populations of common tansy (one individual sampled per population). Of these samples, 176 are from the invaded range in or near Minnesota, USA (174 in Minnesota and 2 in Wisconsin); the remaining 2 are from the native range in Finland. Only the 176 invaded-range samples were used in the paper. Genotypes were generated from GBS Illumina sequencing. Raw reads are available at SRA (BioProject PRJNA1099706 SUB14375599). Genotypes were called using the Stacks 2.59 de novo pipeline with the parameters M=2, n=2, m=3. To filter and write out loci for population genetic analysis, we ran the populations program in Stacks. We kept one random SNP per locus to maintain locus independence in downstream analyses. Loci were included if they were present in at least 70% of individuals (equivalent to 70% of populations, since we have 1 individual per population), had a minor allele frequency (MAF) of at least 1%, and had a maximum heterozygosity of <=95%.
b. Filename: tansy_pop_info_allsamples.csv
Description: CSV file containing metadata about populations from the vcf file used in landscape genomics analyses.
Relationship between files:
The csv has descriptive geographic information about the samples in the genotype file.
DATA-SPECIFIC INFORMATION FOR: tansy_pop_info_allsamples_3.csv
- Number of variables: 15
- Number of cases/rows: 178
- Missing data codes:
- Variable List
genID: Name of the sample in the vcf file
pop: Unique identifying number used during data handling and analyses
name: Population name
geo_grp: Geographic grouping structure - level not used in final analyses
lat: Latitude of population (WGS84)
lon: Longitude of population (WGS84)
PROVNAME: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Province
ECS_PROV: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Province Number
SECNAME: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Section
ECS_SEC: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Section Number
SUBSECNAME: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Subsection
ECS_SUBSECTION: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Subsection Number
ECS_fac: Numeric designation of ECS subsection used for analyses
ECS_fac2: Numeric designation of ECS subsection used for analyses re-leveled to be continuous
MN_quad: Geographic quadrant of MN (NE, NW, SE, SW) in which the population occurs
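Because genID holds the sample name used in the vcf file, the two files can be joined on that column. The following is a minimal Python sketch of that join, not the authors' code: the function names are hypothetical, and the sample names and coordinates in the example are made up for illustration.

```python
import csv
import io

def vcf_sample_names(vcf_lines):
    """Return the sample names from the '#CHROM' header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")

def metadata_by_sample(csv_text, id_column="genID"):
    """Index the metadata rows by the genID column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row for row in reader}

def match_samples(vcf_lines, csv_text):
    """Return (metadata rows in VCF order, VCF samples lacking a metadata row)."""
    meta = metadata_by_sample(csv_text)
    names = vcf_sample_names(vcf_lines)
    matched = [meta[n] for n in names if n in meta]
    missing = [n for n in names if n not in meta]
    return matched, missing

# Made-up example data (not taken from the real files).
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
]
example_csv = "genID,pop,lat,lon\nS1,1,46.8,-92.1\nS2,2,44.9,-93.2\n"
matched, missing = match_samples(example_vcf, example_csv)
print([row["pop"] for row in matched], missing)
```

With the real files, the same functions could be given the lines of the vcf file and the text of the csv file; a non-empty list of missing names would indicate a mismatch between the two.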
Sharing/Access information
Links to other publicly accessible locations of the data:
- Data also available from UMN DRUM: https://doi.org/10.13020/nx0n-f098
- Additional information and scripts for analyses on github: https://github.com/rdbrunquist/common-tansy-landscape-genomics
Data was derived from the following sources:
- Raw reads available through SRA: BioProject PRJNA1099706 SUB14375599
METHODOLOGICAL INFORMATION
1. Description of methods used for collection/generation of data:
Samples were processed from leaf tissue that was collected fresh at the sampling locality, immediately placed in a labeled bag of silica gel, and dried. Locality information for each sample was recorded using a handheld GPS unit. Genotypes were generated from GBS Illumina sequencing at the University of Minnesota Genomics Center (UMGC). Raw reads are available at SRA (BioProject PRJNA1099706 SUB14375599).
2. Methods for processing the data:
Genotypes were called using the Stacks 2.59 de novo pipeline with the parameters M=2, n=2, m=3. To filter and write out loci for population genetic analysis, we ran the populations program in Stacks. We kept one random SNP per locus to maintain locus independence in downstream analyses. Loci were included if they were present in at least 70% of individuals (equivalent to 70% of populations, since we have 1 individual per population), had a minor allele frequency (MAF) of at least 1%, and had a maximum heterozygosity of <=95%.
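The three inclusion criteria can be illustrated with a short Python sketch. This is not the Stacks implementation, and Stacks may compute these quantities differently in detail; the function name `locus_passes` and the genotype encoding (alternate-allele counts of 0, 1, or 2, with None for a missing call at a biallelic SNP) are assumptions made for the example.

```python
def locus_passes(genotypes, min_present=0.70, min_maf=0.01, max_het=0.95):
    """Apply the three filters described above to one biallelic locus.

    genotypes: per-individual alternate-allele counts (0, 1, 2),
               or None where the individual has no call.
    """
    called = [g for g in genotypes if g is not None]
    # Presence: the locus must be genotyped in at least 70% of individuals.
    if not genotypes or len(called) / len(genotypes) < min_present:
        return False
    # Minor allele frequency among called (diploid) genotypes.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    # Observed heterozygosity: fraction of called individuals that are 0/1.
    obs_het = sum(1 for g in called if g == 1) / len(called)
    return maf >= min_maf and obs_het <= max_het

# Made-up loci illustrating each criterion.
print(locus_passes([0, 0, 1, 2, None, 0, 1, 0, 0, 0]))  # passes all filters
print(locus_passes([0, None, None, None, 1]))           # called in only 40%
print(locus_passes([1] * 10))                           # every call heterozygous
print(locus_passes([0] * 10))                           # monomorphic, MAF = 0
```

Only the first locus is retained; the other three each fail exactly one of the criteria.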
3. Instrument- or software-specific information needed to interpret the data:
The VCF file should be readable by any genetics software (e.g. Genodive) or statistical software (e.g. R) that can handle genotype files.
The CSV file can be opened with Excel or any text editor.
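For readers without dedicated genetics software, the genotype calls can also be read with a few lines of plain Python. The sketch below assumes a standard tab-delimited VCF with diploid calls in which GT is the first sub-field of each sample column; the function names are hypothetical and the example lines are made up.

```python
def gt_to_dosage(sample_field):
    """Convert a VCF sample field such as '0/1:12' to a non-reference allele count."""
    gt = sample_field.split(":")[0].replace("|", "/")
    alleles = gt.split("/")
    if "." in alleles or len(alleles) != 2:
        return None  # missing or non-diploid call
    # For a biallelic SNP this is the number of alternate alleles (0, 1, or 2).
    return sum(1 for a in alleles if a != "0")

def read_dosages(vcf_lines):
    """Return (sample_names, {"CHROM:POS": [dosage per sample]})."""
    samples, dosages = [], {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        locus = f"{fields[0]}:{fields[1]}"
        dosages[locus] = [gt_to_dosage(f) for f in fields[9:]]
    return samples, dosages

# Made-up example lines (not taken from the real file).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
    "1\t57\t1:57\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:10\t./.:0\t1/1:8\n",
]
samples, dosages = read_dosages(example)
print(samples, dosages)
```

With the real file, the lines of the vcf file can be passed to `read_dosages` directly, for example by opening the file and passing the file object.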
4. Standards and calibration information, if appropriate:
NA
5. Environmental/experimental conditions:
Collected from natural field conditions.
6. Describe any quality-assurance procedures performed on the data:
Samples were processed at the University of Minnesota Genomics Center (https://genomics.umn.edu/services/gbs) with Illumina NextSeq sequencing, using the following protocol. UMGC created dual-indexed GBS libraries using the enzyme combination BamHI + NsiI. The enzyme combination was chosen based on a small pilot study that assessed which combination would produce approximately 5000-10000 loci at an average read depth of approximately 1 million reads per individual. Briefly, extracted DNA was quantified using PicoGreen® (Thermo Fisher Scientific, MA, USA) and normalized to 10 ng/µl. A total of 100 ng of DNA per sample was digested with 10 units of BamHI and NsiI restriction enzymes (New England Biolabs®, Inc., MA, USA), incubated at 37 °C for 2 hours, and then heat inactivated at 80 °C for 20 minutes. The DNA samples were then ligated with 200 units of T4 ligase (New England Biolabs®, Inc., MA, USA) and phased adaptors with -GATC and -TGCA overhangs at 22 °C for 1 hour and heat killed. The ligated samples were purified with solid phase reversible immobilization (SPRI) beads and then amplified for 18 cycles with 2X NEB Taq Master Mix to add the barcodes. Libraries were SPRI purified, quantified, and pooled. Fragments in the 300-744 bp size range were selected and diluted to 2 nM for sequencing on the Illumina NextSeq 2000 (Illumina, CA, USA) using single-end 1×150 bp reads. UMGC generated ≈320M pass-filter reads during sequencing. Once the run was completed, UMGC performed quality control analysis and determined that all expected barcodes and samples were detected, reads were well balanced, and mean quality scores were ≥Q30 for all libraries.
7. People involved with sample collection, processing, analysis and/or submission:
Collection: RB Runquist, Thomas A. Lake
DNA Extractions: RB Runquist
Sequencing: UMGC
Data processing: RB Runquist
Code/Software
Analyses were mainly conducted in R/RStudio. Scripts and markdown documents are available at https://github.com/rdbrunquist/common-tansy-landscape-genomics