Data from: Drift happens: molecular genetic diversity and differentiation among populations of jewelweed (Impatiens capensis Meerb.) reflect fragmentation of floodplain forests

Toczydlowski, Rachel H.; Waller, Donald M.

Published Feb 26, 2019 on Dryad. https://doi.org/10.5061/dryad.v3t16d8

Data files

Feb 26, 2019 version files (156.57 MB total)

Dryad-allsamples-longreads-optimalparams_indel8depth8-ip7-filtered.vcf (21.54 MB)
Dryad-populations.snps-stacks2.1.m3max3M4n4-filtered.vcf (135 MB)
sample_name_key.csv (21.58 KB)

Abstract

Landscape features often shape patterns of gene flow and genetic differentiation in plant species. Populations that are small and isolated enough also become subject to genetic drift. We examined patterns of gene flow and differentiation among 12 floodplain populations of the selfing annual jewelweed (Impatiens capensis Meerb.) nested within four river systems and two major watersheds in Wisconsin, USA. Floodplain forests and marshes provide a model system for assessing the effects of habitat fragmentation within agricultural/urban landscapes and for testing whether rivers act to genetically connect dispersed populations. We generated a panel of 12,856 single nucleotide polymorphisms and assessed genetic diversity, differentiation, gene flow, and drift. Clustering methods revealed strong population genetic structure with limited admixture and highly differentiated populations (mean multilocus FST = 0.32, FST’ = 0.33). No signals of isolation by geographic distance or environment emerged, but alleles may flow along rivers given that genetic differentiation increased with river distance. Differentiation also increased in populations with fewer private alleles (R2 = 0.51) and higher local inbreeding (R2 = 0.22). Populations varied greatly in levels of local inbreeding (FIS = 0.2 to 0.9) and FIS increased in smaller, more isolated populations. These results suggest that genetic drift dominates other forces in structuring these Impatiens populations. In rapidly changing environments, species must migrate or genetically adapt. Habitat fragmentation limits both processes, potentially compromising the ability of species to persist in fragmented landscapes.

final_filtered_vcf_file_Stacks

This is the final cleaned and filtered VCF file used for downstream population genetic and landscape genetic analyses. There are 315 samples and 12,856 single nucleotide polymorphisms generated using genotyping by sequencing (GBS; Illumina). This file was generated by assembling GBS data using the Stacks (2.1) pipeline on a high-throughput computing cluster (HTCondor). Samples 295, 297, 298, 299, 300, 301, 303, 305, 306, 308, 309, 312, 313, and 314 were technical replicates and were removed prior to final analyses. Samples 34, 36, 40, and 43 were randomly selected from population 4033 and removed to balance sample sizes across populations. See the Methods section and Table S2 of the published paper (Toczydlowski and Waller, Drift happens: molecular genetic diversity and differentiation among populations of jewelweed (Impatiens capensis Meerb.) reflect fragmentation of floodplain forests) for more details about bioinformatic filtering and Stacks assembly parameters.

Dryad-populations.snps-stacks2.1.m3max3M4n4-filtered.vcf

GitHub: https://github.com/toczydlowski/stacks-on-htcondor

We ran the program Stacks (http://catchenlab.life.illinois.edu/stacks/; J. Catchen, P. Hohenlohe, S. Bassham, A. Amores, and W. Cresko. Stacks: an analysis tool set for population genomics. Molecular Ecology. 2013.) to assemble genetic data generated by genotyping by sequencing. We tested many Stacks assembly parameters in parallel on a high-throughput computing network (HTCondor). Our scripts for running this pipeline are housed on GitHub at: https://github.com/toczydlowski/stacks-on-htcondor. These scripts will be made publicly available soon, and we plan to publish a short note officially releasing them and describing how to use them.

final_filtered_vcf_file_ipyrad

This is the final cleaned and filtered VCF file output from the program ipyrad for our data (https://ipyrad.readthedocs.io/; Eaton, D. A. (2014). PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, 30(13), 1844-1849.). There are 315 samples and 4,215 single nucleotide polymorphisms generated using genotyping by sequencing (GBS; Illumina). This file was generated by assembling GBS data using the ipyrad (0.7.13) pipeline on a high-throughput computing cluster (HTCondor). Samples 295, 297, 298, 299, 300, 301, 303, 305, 306, 308, 309, 312, 313, and 314 were technical replicates and were removed prior to final analyses. Samples 34, 36, 40, and 43 were randomly selected from population 4033 and removed to balance sample sizes across populations. See the Methods section and Table S2 of the published paper (Toczydlowski and Waller, Drift happens: molecular genetic diversity and differentiation among populations of jewelweed (Impatiens capensis Meerb.) reflect fragmentation of floodplain forests) for more details about bioinformatic filtering and Stacks assembly parameters. We assembled the GBS data in two different programs and compared the outputs to test the robustness of our de novo assembly (see the Supplemental materials for the published paper). We used the Stacks output for all population genetic and landscape genetic analyses published in the paper.

Dryad-allsamples-longreads-optimalparams_indel8depth8-ip7-filtered.vcf
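Either VCF file can be checked against the sample and SNP counts stated above with a few lines of standard-library Python. The following is a minimal sketch, not the authors' pipeline: the function names `summarize_vcf` and `drop_samples` are hypothetical, the tiny inline VCF is made-up demo data, and it assumes the sample columns in the real files are named with the bare sample identifiers (check the #CHROM header line and sample_name_key.csv before relying on that).

```python
import io


def summarize_vcf(handle):
    """Return (sample_names, n_records) for a VCF read from an open text handle."""
    samples = []
    n_records = 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM ... FORMAT) come first, then one column per sample.
            samples = line.rstrip("\n").split("\t")[9:]
        elif line.strip():
            n_records += 1  # every remaining non-blank line is one variant record
    return samples, n_records


def drop_samples(handle, exclude):
    """Yield VCF lines with the genotype columns of the samples in `exclude` removed.

    INFO fields are passed through unchanged, so any per-site summaries stored
    there are NOT recomputed for the reduced sample set.
    """
    keep = None
    for line in handle:
        if line.startswith("##"):
            yield line
            continue
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            keep = [i for i, name in enumerate(fields) if i < 9 or name not in exclude]
        if keep is None:
            raise ValueError("variant record found before the #CHROM header line")
        yield "\t".join(fields[i] for i in keep) + "\n"


# Made-up three-sample VCF used only to exercise the functions (hypothetical data).
DEMO_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t1\t2\t295\n"
    "locus1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t0/0\n"
    "locus2\t25\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t./.\t1/1\n"
)

samples, n_records = summarize_vcf(io.StringIO(DEMO_VCF))
filtered_text = "".join(drop_samples(io.StringIO(DEMO_VCF), {"295"}))
filtered_samples, filtered_records = summarize_vcf(io.StringIO(filtered_text))
```

To run this on a real file, replace `io.StringIO(DEMO_VCF)` with `open("Dryad-populations.snps-stacks2.1.m3max3M4n4-filtered.vcf")` and compare the returned counts with the numbers given in the file description.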

Sample_name_key

This is a key linking the sample names in the VCF files to additional metadata (e.g. population and latitude/longitude).

sample_name_key.csv
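One way to use the key is to look up each VCF sample name and tally samples per population. The sketch below uses only the Python standard library. The column names `sample_name` and `population` are assumptions, not taken from the file (inspect the actual header row of sample_name_key.csv and adjust), and the inline CSV and sample list are made-up demo data; only the population label 4033 comes from the descriptions above.

```python
import csv
import io
from collections import Counter


def load_key(handle, name_column="sample_name"):
    """Read the key CSV into a dict mapping sample name -> full metadata row."""
    return {row[name_column]: row for row in csv.DictReader(handle)}


def samples_per_population(vcf_samples, key, pop_column="population"):
    """Count VCF samples per population; also return names missing from the key."""
    counts = Counter()
    missing = []
    for name in vcf_samples:
        row = key.get(name)
        if row is None:
            missing.append(name)
        else:
            counts[row[pop_column]] += 1
    return counts, missing


# Hypothetical key contents; the real file may use different column names.
DEMO_KEY = (
    "sample_name,population\n"
    "34,4033\n"
    "35,4033\n"
    "101,demo_pop\n"
)

key = load_key(io.StringIO(DEMO_KEY))
counts, missing = samples_per_population(["34", "35", "101", "999"], key)
```

Names reported in `missing` are a quick check that the sample names in a VCF header and in the key follow the same naming scheme.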

FASTQ_genetic_sequence_data_on_GenBank

Original FASTQ files of genetic sequence data that went into the Stacks and ipyrad genetic assembly pipelines to build the single nucleotide polymorphism dataset used in this paper. One file per sample (including technical replicates, N = 315). GenBank accession number PRJNA524160.