Data and code from: Population genomics of a cosmopolitan weed provides insights into its local adaptation and recent demographic history
Data files
May 08, 2026 version files 309.45 MB
- Admixture_Erigeron_canadensis.R (5.65 KB)
- AMOVA_Erigeron_canadensis.R (5 KB)
- cpDNA_loci_Erigeron_canadensis.R (3.99 KB)
- divMigrate_Erigeron_canadensis.R (4.47 KB)
- divMigrate-27366003.log (12.08 KB)
- Erigeron_canadensis_metadata_cleaned.csv (229.03 KB)
- Erigeron.canadensis.SNPs.g66mac3minDP3birl15rm75.F4.meanDP10maxDP1000maf05thin.recode.vcf (307.68 MB)
- Erigeron.dist.out.nex (1.47 MB)
- FEEMs_Erigeron_canadensis.R (1.82 KB)
- GeneticDiversity_Erigeron_canadensis.R (2.78 KB)
- Mantel_Test_IBD_Erigeron_canadensis.R (6.83 KB)
- Metadata_description.md (6.51 KB)
- Offset_field_Erigeron_canadensis.R (2.17 KB)
- PCA_bio1_to_19_Erigeron_canadensis.R (1.77 KB)
- PCA_NeiDist_Fst_Erigeron_canadensis.R (3.96 KB)
- RDA_Erigeron_canadensis.R (4.32 KB)
- RDA_Offset_Erigeron_canadensis.R (4.06 KB)
- README.md (7.48 KB)
- Trees_Erigeron_canadensis.R (2.72 KB)
Abstract
This dataset accompanies the manuscript "Population genomics of a cosmopolitan weed provides insights into its local adaptation and recent demographic history" and contains the genomic, environmental, and analytical resources used to investigate the evolutionary dynamics of Erigeron canadensis across its native and non-native ranges. The dataset includes VCF genotype files derived from ddRADseq of 640 individuals across 280 populations, environmental data (climatic and anthropogenic variables), and R code for data filtering, population genetic structure analyses, landscape genomic modeling, and genomic offset predictions. These resources support analyses of population structure, gene flow, environmental associations, and potential maladaptation in non-native ranges. The data are intended to facilitate further research on invasive species genomics, genotype-environment associations, and the role of selfing and multiple introductions in plant invasion success.
Dataset DOI: 10.5061/dryad.8w9ghx40k
Description of the data and file structure
This dataset accompanies the manuscript “Population genomics of a cosmopolitan weed provides insights into its local adaptation and recent demographic history”, published in Molecular Ecology. It includes genomic data, associated environmental variables, and R scripts used for data processing and analysis.
1. Genotype and Metadata Files
- Erigeron.canadensis.SNPs.g66mac3minDP3birl15rm75.F4.meanDP10maxDP1000maf05thin.recode.vcf – Final SNP dataset in VCF format, containing 11,501 filtered SNPs across 640 individuals from 280 populations. SNPs were filtered for quality, minor allele frequency, missing data, and linkage (one SNP per contig).
- Erigeron_canadensis_metadata_cleaned.csv – Metadata file providing population and environmental information (coordinates, range, region, climate variables, anthropogenic index, and performance data such as biomass and number of capitula). Variable definitions, units, and data sources are documented in the accompanying Metadata_description.md file.
- Metadata_description.md – Data dictionary for Erigeron_canadensis_metadata_cleaned.csv. Describes each column, including the full meaning of abbreviations, units of measurement, and references for the climate, remote-sensing, and anthropogenic data sources.
2. R Scripts
These scripts reproduce the analyses from the manuscript, including population structure, genetic diversity, environmental associations, and genomic offset. Each script is commented and organized for reproducibility.
- Admixture_Erigeron_canadensis.R – Runs ADMIXTURE for ancestry inference
- AMOVA_Erigeron_canadensis.R – Performs analysis of molecular variance
- Offset_field_Erigeron_canadensis.R – Tests genomic offset against field performance
- RDA_Erigeron_canadensis.R – Performs dbRDA to explore genotype–environment associations
- divMigrate_Erigeron_canadensis.R – Estimates directional gene flow
- divMigrate-27366003.log – Output log from gene flow analysis
- FEEMs_Erigeron_canadensis.R – Fast Estimation of Effective Migration Surfaces
- PCA_bio1_to_19_Erigeron_canadensis.R – PCA of 19 BioClim variables
- Erigeron.dist.out.nex – Nexus-format genetic distance matrix
- Trees_Erigeron_canadensis.R – Constructs neighbor-joining trees
- Mantel_Test_IBD_Erigeron_canadensis.R – Partial Mantel tests for IBD/IBE
- GeneticDiversity_Erigeron_canadensis.R – Calculates Ho, He, Ar, FIS
- RDA_Offset_Erigeron_canadensis.R – Genomic offset estimation
- cpDNA_loci_Erigeron_canadensis.R – Placeholder for cpDNA analysis
- PCA_NeiDist_Fst_Erigeron_canadensis.R – PCA of Nei distance and Fst values
3. Software
All scripts were developed in R (version ≥ 4.0). The following packages are required: adegenet, dartR, poppr, vegan, hierfstat, vcfR, phangorn, ape, ggtree, ggplot2, and others as specified in each script.
Files and variables
File: Erigeron.canadensis.SNPs.g66mac3minDP3birl15rm75.F4.meanDP10maxDP1000maf05thin.recode.vcf
Description:
This is the final filtered genotype dataset in Variant Call Format (VCF), containing 11,501 biallelic SNPs from 640 individuals across 280 populations. SNPs were filtered for quality, missingness, linkage (1 SNP per contig), and minor allele frequency. Standard VCF fields are included (CHROM, POS, REF, ALT, QUAL, FILTER, INFO), along with genotype data for each sample.
Variables:
Standard VCF fields, plus sample genotypes in GT format.
Missing data are represented as ./.
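As a minimal illustration of the layout described above, the following Python sketch computes the fraction of missing genotypes at each site. It is not taken from the archived R scripts. The toy VCF text and the sample names S1 to S4 are invented for the example, and the code assumes uncompressed, tab-delimited input in which GT is the first FORMAT subfield (as the VCF specification requires whenever GT is present).

```python
def site_missingness(vcf_lines):
    """Yield (chrom, pos, fraction_missing) for each variant line."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information lines and the column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        # Sample columns start at index 9; GT is the first ':'-separated subfield.
        genotypes = [sample.split(":")[0] for sample in fields[9:]]
        # A genotype such as './.' counts as missing.
        missing = sum(1 for gt in genotypes if "." in gt)
        yield chrom, pos, missing / len(genotypes)


# Invented toy data: two sites, four samples (not from the real file).
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n"
    "contig1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t./.\t1/1\n"
    "contig2\t25\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/1:8\t./.:0\t./.:0\t0/0:12\n"
)

result = list(site_missingness(TOY_VCF.splitlines()))
print(result)
```

To run the same function on the archived file, the list of lines could be replaced by an open file handle, since the function only iterates over lines.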
File: Erigeron_canadensis_metadata_cleaned.csv
Description:
Tabular metadata for all sampled individuals and populations. Includes geographic coordinates, environmental variables, population classification (native/non-native), and field performance metrics (biomass, capitula count). Variables are described in detail in the associated manuscript. Missing values are denoted as NA.
Key Variables:
- Population_ID: Unique population identifier
- Sample_ID: Individual sample identifier
- Latitude, Longitude: Geographic location (decimal degrees)
- Range, Region, Country: Geopolitical and ecological groupings
- CWD, Annual_Precip, Temp_Seasonality, etc.: Climate-related variables
- Human_Footprint: Anthropogenic pressure index
- Biomass, Capitula: Field performance traits (log-transformed)
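The following Python sketch shows one way to load a table of this shape while treating NA as missing. It is an illustration only and is not part of the archived R code. The column names are those listed under Key Variables, but the rows are invented, and the `numeric` argument is a hypothetical helper parameter that would need to match the real header in Erigeron_canadensis_metadata_cleaned.csv.

```python
import csv
import io


def read_metadata(handle, numeric=("Latitude", "Longitude", "Biomass")):
    """Read CSV rows into dicts, mapping 'NA' to None and parsing numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        clean = {}
        for key, value in row.items():
            if value == "NA":
                clean[key] = None          # missing value
            elif key in numeric:
                clean[key] = float(value)  # numeric column
            else:
                clean[key] = value         # keep text as-is
        rows.append(clean)
    return rows


# Invented toy rows (hypothetical identifiers and values).
TOY_CSV = (
    "Population_ID,Sample_ID,Latitude,Longitude,Range,Biomass\n"
    "POP_A,POP_A_1,47.5,8.25,native,1.5\n"
    "POP_B,POP_B_1,-33.0,151.0,non-native,NA\n"
)

rows = read_metadata(io.StringIO(TOY_CSV))
print(rows[1]["Biomass"])  # the NA entry becomes None
```

For the real file, `io.StringIO(TOY_CSV)` would be replaced by a handle from `open(..., newline="")`.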
Code
File: Admixture_Erigeron_canadensis.R
Description:
Runs ADMIXTURE analysis using the filtered SNP dataset. Includes code for cross-validation, optimal K selection, and plotting of ancestry proportions.
File: AMOVA_Erigeron_canadensis.R
Description:
Performs analysis of molecular variance (AMOVA) to partition genetic variation among predefined hierarchical groups (e.g., regions, ranges).
File: Offset_field_Erigeron_canadensis.R
Description:
Tests for a correlation between genomic offset values and observed fitness-related traits (biomass, capitula).
File: RDA_Erigeron_canadensis.R
Description:
Runs redundancy analysis (RDA) to assess genotype–environment associations using climate and human footprint variables.
File: divMigrate_Erigeron_canadensis.R
Description:
Estimates directional gene flow among regions using genetic distances and outputs a migration matrix for visualization.
File: divMigrate-27366003.log
Description:
Log file containing output from the divMigrate analysis run in the previous script.
File: FEEMs_Erigeron_canadensis.R
Description:
Runs the Fast Estimation of Effective Migration Surfaces (FEEMS) analysis.
File: PCA_bio1_to_19_Erigeron_canadensis.R
Description:
Conducts PCA on 19 BioClim variables to reduce dimensionality before environmental association analysis.
File: Erigeron.dist.out.nex
Description:
Nexus-format genetic distance matrix for phylogenetic or clustering analyses (e.g., neighbor-joining trees).
File: Trees_Erigeron_canadensis.R
Description:
Builds and visualizes phylogenetic trees based on Nei distances among populations.
File: Mantel_Test_IBD_Erigeron_canadensis.R
Description:
Performs partial Mantel tests to evaluate isolation by distance and environment.
File: GeneticDiversity_Erigeron_canadensis.R
Description:
Calculates population-level genetic diversity statistics: observed/expected heterozygosity, allelic richness, and inbreeding coefficient.
File: RDA_Offset_Erigeron_canadensis.R
Description:
Estimates genomic offset using RDA-based projections of genotypes in non-native environments.
File: cpDNA_loci_Erigeron_canadensis.R
Description:
Placeholder script for analysis of chloroplast loci (no cpDNA sequences are included in this version).
File: PCA_NeiDist_Fst_Erigeron_canadensis.R
Description:
Performs PCA using pairwise Nei genetic distances and FST values between populations or regions.
Software
Any program that can open a spreadsheet, such as Excel, is recommended for viewing the CSV file.
RStudio (or any other software capable of running R) is used for statistical analysis and data visualization.
This dataset includes genomic, environmental, and analytical data for Erigeron canadensis sampled from 280 populations across 19 regions (103 native, 177 non-native). Field-collected seeds were germinated in a greenhouse, and leaf tissue was collected from offspring. DNA was extracted using the peqGOLD Plant DNA Mini Kit (VWR, Avantor®).
Double-digest RAD sequencing (ddRADseq) libraries were constructed using two restriction enzymes. Custom-barcoded adapters were ligated, and pools of 96 indexed individuals were purified and size-selected (350–450 bp). PCR amplification added external indices, and libraries were sequenced on an Illumina NovaSeq 6000 platform with a 10% PhiX spike-in.
Raw reads were demultiplexed and processed using a pipeline that included trimming, de novo assembly, and SNP calling. SNP filtering retained only high-quality biallelic loci with a minimum allele count of 3, minimum depth of 3, mean sequence quality of at least 30, missing data per site of no more than 34%, and a minor allele frequency of at least 0.05. From an initial set of over 800,000 SNPs across 41,149 contigs, 11,501 SNPs in 5,080 contigs were retained. One SNP per contig was selected to reduce linkage.
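The site-level thresholds listed above can be sketched in Python as follows. This is an assumption-laden illustration, not the authors' pipeline: it covers only the minor allele count, minor allele frequency, and missing-data criteria (depth and sequence-quality filters are omitted), it encodes each genotype as the number of alternate alleles (0, 1, 2, or None when missing), and it thins by keeping the first passing SNP on each contig, which is only one possible selection rule. All function names and toy data are hypothetical.

```python
def passes_filters(genotypes, min_mac=3, min_maf=0.05, max_missing=0.34):
    """Return True if a biallelic site meets the MAC, MAF, and missingness thresholds."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_fraction = 1 - len(called) / len(genotypes)
    alt_count = sum(called)          # number of alternate alleles
    total = 2 * len(called)          # diploid allele copies among called samples
    minor = min(alt_count, total - alt_count)
    return (missing_fraction <= max_missing
            and minor >= min_mac
            and minor / total >= min_maf)


def thin_one_per_contig(sites):
    """Keep the first passing site per contig; sites are (contig, pos, genotypes)."""
    seen, kept = set(), []
    for contig, pos, genotypes in sites:
        if contig in seen or not passes_filters(genotypes):
            continue
        seen.add(contig)
        kept.append((contig, pos))
    return kept


# Invented toy sites with ten samples each.
toy_sites = [
    ("contigA", 12, [0, 1, 1, 2, 0, 1, None, 0, 0, 1]),           # passes
    ("contigA", 40, [1, 1, 0, 0, 2, 1, 0, 0, 1, 0]),              # same contig, thinned out
    ("contigB", 7,  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]),              # minor allele count of 1
    ("contigC", 3,  [1, None, None, None, None, 2, 1, 0, 0, 1]),  # 40% missing
    ("contigC", 88, [2, 2, 1, 2, 2, 1, 2, 2, 2, 1]),              # passes
]

kept = thin_one_per_contig(toy_sites)
print(kept)
```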
Genotype data were analyzed in R to characterize genetic diversity, population structure, and spatial patterns. Analyses included estimates of heterozygosity and inbreeding coefficients, hierarchical AMOVA, clustering, and distance-based redundancy analysis. Environmental data included climate variables, climatic water deficit, and human footprint indices. Genomic offset was estimated by comparing observed and predicted genotypes across environmental gradients.
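For readers unfamiliar with the heterozygosity and inbreeding statistics named in the preceding paragraph, the Python sketch below works through the textbook single-locus definitions for a biallelic SNP: observed heterozygosity Ho is the share of heterozygous genotypes, expected heterozygosity He is 2p(1 - p), and FIS is 1 - Ho/He. It is an assumption that these uncorrected formulas are adequate for illustration; the archived R scripts may use estimators with sample-size corrections. Genotypes are encoded as alternate-allele counts, and the data are invented.

```python
def locus_stats(genotypes):
    """Return (Ho, He, FIS) for one biallelic locus; None marks a missing genotype."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)                      # alternate allele frequency
    ho = sum(1 for g in called if g == 1) / n      # observed heterozygosity
    he = 2 * p * (1 - p)                           # expected under Hardy-Weinberg
    fis = 1 - ho / he if he > 0 else float("nan")  # undefined for monomorphic loci
    return ho, he, fis


# Invented example: eight called genotypes and one missing.
ho, he, fis = locus_stats([0, 0, 2, 2, 1, 1, 0, 2, None])
print(ho, he, fis)
```

A positive FIS, as in this toy locus, indicates fewer heterozygotes than expected under random mating, which is the pattern produced by selfing.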
All variant call sets (VCF genotype files), environmental data, and R code are included in this repository.
