Data from: Levels and partitioning of genetic variation of northeastern populations of diamondback terrapin (Malaclemys terrapin)
Data files
Oct 16, 2025 version: 61.45 MB total
- admixture_plots_081525.R (23.24 KB)
- admixture_refiltered_23.sh (566 B)
- admixture_rerun_README2.txt (750 B)
- allelic_richness_081525.R (16.53 KB)
- landscape_genetics_081525.R (22.26 KB)
- plink_ped2bed_script_23.sh (513 B)
- README.md (3.19 KB)
- recode_plink_23.sh (244 B)
- remove_25pct_missing_data_from_genepop.R (2.08 KB)
- stackspopulations_popmap.txt (2.80 KB)
- summary_stats_and_pca_cleaned_dataset_081525.R (23.89 KB)
- terrapin_116_cleaned.gen (61.36 MB)
Abstract
The diamondback terrapin (Malaclemys terrapin) is a mid-sized turtle that serves as a keystone predator in salt marsh ecosystems of eastern North America. The terrapin has historically faced population declines due to habitat loss and overharvesting, which has resulted in its listing under multiple jurisdictions across the northern part of its range. To characterize levels and partitioning of terrapin genetic variation throughout the northeast region, we used restriction site-associated DNA sequencing (RADseq). We analyzed genetic variation among 116 individuals sampled across 18 sites. Within-population genetic diversity was relatively low (He = 0.080–0.122), and we observed a strong negative correlation between diversity and latitude. Furthermore, levels of genetic differentiation were moderate (pairwise FST = 0.00–0.19), with the mean pairwise FST of each population exhibiting a strong positive correlation with latitude. Together, these results are consistent with a model of serial colonization from a Pleistocene refugium in the mid-Atlantic. Spatial genetic variation was best explained by a landscape model that considered migration to be limited to coastal habitats, where northern range-edge populations maintained comparatively low genetic diversity and were more genetically distinct than populations to the south—consistent with their greater geographic isolation. Admixture analyses revealed weak genetic clustering, with the distribution of genetic clusters reflecting the combined historical effects of isolation-by-distance and human-mediated translocations. Regional efforts to restore terrapin habitat or reintroduce captive individuals should consider patterns of historic gene flow, cognizant of the relatively distinct and isolated populations at the northeastern range edge.
Dataset DOI: 10.5061/dryad.n2z34tnbb
Description of the data and file structure
The genotype file (terrapin_116_cleaned.gen) contains the final SNP genotypes for all diamondback terrapin individuals analyzed in the paper. It is formatted as a GENEPOP file, converted from an R object of class 'genind' for use with the package 'graph4lg'. The first line lists the locus names in the order in which they appear for each genotyped individual on the lines below. Each multilocus genotype is then given on a single line, beginning with the sample identifier and followed by the alleles recorded at each locus.
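To illustrate the layout, the following Python sketch reads GENEPOP-style text into locus names and per-individual genotypes. It assumes the standard GENEPOP conventions: an optional title line, locus names either comma-separated or one per line, "Pop" lines separating populations, and "ID , genotype genotype ..." individual lines with 4- or 6-digit genotype codes in which all zeros mean missing. Set has_title=False if the first line holds the locus names, as described above. The function name parse_genepop is hypothetical and is not part of graph4lg or any other package.

```python
def parse_genepop(text, has_title=True):
    """Return (loci, individuals) parsed from GENEPOP-formatted text (sketch).

    individuals is a list of (population_index, sample_id, genotypes), where
    genotypes is a list of (allele1, allele2) integer pairs and 0 marks missing.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 1 if has_title else 0  # optionally skip the free-text title line
    loci = []
    # Locus names run until the first "Pop" separator line.
    while i < len(lines) and lines[i].lower() != "pop":
        loci.extend(name.strip() for name in lines[i].split(",") if name.strip())
        i += 1
    individuals = []
    pop_index = -1
    for line in lines[i:]:
        if line.lower() == "pop":
            pop_index += 1  # start of a new population block
            continue
        sample_id, _, geno_field = line.partition(",")
        genotypes = []
        for code in geno_field.split():
            half = len(code) // 2  # 2 digits per allele for 4-digit codes, 3 for 6-digit
            genotypes.append((int(code[:half]), int(code[half:])))
        if len(genotypes) != len(loci):
            raise ValueError(f"{sample_id.strip()}: {len(genotypes)} genotypes for {len(loci)} loci")
        individuals.append((pop_index, sample_id.strip(), genotypes))
    return loci, individuals
```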
Files and variables
File: admixture_plots_081525.R
Description: R code used to generate figures based on the ADMIXTURE results.
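The ADMIXTURE output read by plotting code like this is a .Q file with one row per individual and K ancestry proportions per row, in the same order as the .fam file used for the run. The minimal Python sketch below, with hypothetical helper names, reads a .Q file and labels each individual with its highest-proportion cluster, a common way to order bars in an admixture barplot. It is not a reimplementation of admixture_plots_081525.R.

```python
def read_q_matrix(text):
    """Parse .Q text: one row per individual, K whitespace-separated ancestry proportions."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def dominant_clusters(q_rows, sample_ids):
    """Pair each sample with its highest-proportion cluster (0-based index) and that proportion.

    Assumes sample_ids follow the .fam row order, which ADMIXTURE preserves in the .Q file.
    Ties go to the lowest cluster index.
    """
    if len(q_rows) != len(sample_ids):
        raise ValueError("row count differs between .Q data and sample list")
    result = []
    for sample_id, row in zip(sample_ids, q_rows):
        best = max(range(len(row)), key=row.__getitem__)
        result.append((sample_id, best, row[best]))
    return result
```

Sorting the result by cluster and then by descending proportion gives a typical plotting order.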
File: admixture_refiltered_23.sh
Description: Slurm script used to run the ADMIXTURE analysis on Brown University's cluster.
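ADMIXTURE is run once per value of K, and with its --cv option it prints a cross-validation line of the form "CV error (K=2): <value>" that is commonly used to choose K. The Python sketch below builds one command per K and picks the K with the lowest reported CV error from captured log text. The input file name, the K range, and the 5-fold setting are assumptions for illustration; admixture_refiltered_23.sh may use different options.

```python
import re

# Matches ADMIXTURE's cross-validation report, e.g. "CV error (K=3): 0.51"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def admixture_commands(bed_file, k_values, folds=5):
    """Build one ADMIXTURE command (as an argument list) per K, with folds-fold cross-validation."""
    return [["admixture", f"--cv={folds}", bed_file, str(k)] for k in k_values]


def best_k(log_text):
    """Return (K, CV error) for the lowest cross-validation error found in the log text."""
    errors = {int(k): float(e) for k, e in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found in log text")
    k = min(errors, key=errors.get)
    return k, errors[k]
```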
File: admixture_rerun_README2.txt
Description: Readme file describing how to run the ADMIXTURE analysis.
File: allelic_richness_081525.R
Description: R script used to calculate allelic richness.
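Allelic richness is usually compared across populations after rarefaction to a common sample size, so that larger samples do not appear more diverse simply because more gene copies were scored. The Python sketch below applies the standard rarefaction formula, r(g) = sum over alleles i of [1 - C(N - N_i, g) / C(N, g)], to one locus in one population, where N is the number of gene copies sampled, N_i the count of allele i, and g the rarefaction size. Whether allelic_richness_081525.R uses exactly this estimator is an assumption.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: counts of each allele at one locus in one population (missing data excluded).
    g: rarefaction size in gene copies, typically the smallest sample size across populations.
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("rarefaction size exceeds the number of gene copies")
    total = comb(n, g)
    # An allele is missed only if all g draws come from the other n - c copies;
    # math.comb returns 0 when g > n - c, so such alleles always count as 1.
    return sum(1 - comb(n - c, g) / total for c in allele_counts if c > 0)
```

Averaging this value across loci gives a per-population allelic richness that can be compared at equal g.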
File: landscape_genetics_081525.R
Description: R script used to generate and compare hypotheses about terrapin genetic variation in a landscape genetics framework.
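A common first comparison in landscape genetics is a Mantel test between a genetic distance matrix (for example, pairwise FST) and the distance matrix implied by a landscape hypothesis, such as straight-line distance or distance along the coast. The Python sketch below implements a basic permutation Mantel test. It is a stand-in chosen for illustration only; the models compared in landscape_genetics_081525.R may be evaluated with different methods, and the function names here are hypothetical.

```python
import random
from math import sqrt


def _upper_triangle(matrix, order=None):
    """Off-diagonal upper-triangle entries, optionally after permuting rows and columns together."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(gen_dist, land_dist, permutations=999, seed=1):
    """Return (r, one-sided p) for symmetric matrices gen_dist and land_dist.

    The null distribution comes from jointly permuting the rows and columns of land_dist.
    """
    x = _upper_triangle(gen_dist)
    r_obs = _pearson(x, _upper_triangle(land_dist))
    rng = random.Random(seed)
    order = list(range(len(land_dist)))
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if _pearson(x, _upper_triangle(land_dist, order)) >= r_obs:
            as_extreme += 1
    return r_obs, (as_extreme + 1) / (permutations + 1)
```

Competing landscape hypotheses can then be compared by their correlation with the same genetic distance matrix.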
File: plink_ped2bed_script_23.sh
Description: This script converts the PLINK text files (.ped and .map) to PLINK binary format (.bed, .bim, and .fam), which is the input format required for the ADMIXTURE analysis.
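For reference, this conversion corresponds to PLINK 1.9's --file (read .ped/.map), --make-bed, and --out options. The Python sketch below only assembles that command and lists the expected output files; the prefix is hypothetical, and the flags actually used in plink_ped2bed_script_23.sh may differ. For example, --allow-extra-chr is often needed when chromosome names are non-numeric scaffold IDs.

```python
def plink_make_bed_command(prefix, out_prefix=None, allow_extra_chr=False):
    """Argument list for a PLINK 1.9 run that reads prefix.ped/prefix.map and writes binary files."""
    out_prefix = out_prefix or prefix
    cmd = ["plink", "--file", prefix, "--make-bed", "--out", out_prefix]
    if allow_extra_chr:
        cmd.append("--allow-extra-chr")  # accept non-standard chromosome names
    return cmd


def binary_outputs(out_prefix):
    """File names PLINK writes for --make-bed: genotypes, variant map, and sample list."""
    return [out_prefix + ext for ext in (".bed", ".bim", ".fam")]
```

If PLINK is on the PATH, subprocess.run(plink_make_bed_command("terrapin"), check=True) would execute the conversion.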
File: recode_plink_23.sh
Description: This script recodes the data file so that alleles are coded as 1 and 2 and missing genotypes are always coded as 0.
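In PED files the first six columns describe each sample, and each SNP then occupies two allele columns. The Python sketch below mimics a 1/2 recoding on such rows: at each SNP the less frequent allele becomes 1, the more frequent becomes 2, and missing alleles stay 0. This only illustrates the idea; recode_plink_23.sh relies on PLINK, whose handling of ties and monomorphic sites may differ from the arbitrary choices made here.

```python
from collections import Counter


def recode_12(ped_rows):
    """Recode PED allele columns to '1'/'2', keeping '0' as missing (sketch).

    ped_rows: lists of strings, six leading sample columns then two allele columns per SNP.
    """
    n_snps = (len(ped_rows[0]) - 6) // 2
    recoded = [list(row[:6]) for row in ped_rows]
    for s in range(n_snps):
        cols = (6 + 2 * s, 7 + 2 * s)
        counts = Counter(row[c] for row in ped_rows for c in cols if row[c] != "0")
        if len(counts) > 2:
            raise ValueError(f"SNP {s} has more than two alleles")
        # Least to most frequent; ties broken alphabetically (an arbitrary choice).
        ordered = sorted(counts, key=lambda allele: (counts[allele], allele))
        if len(ordered) == 2:
            code = {ordered[0]: "1", ordered[1]: "2"}
        else:
            code = {allele: "2" for allele in ordered}  # monomorphic: treat as the major allele
        code["0"] = "0"  # missing stays missing
        for out, row in zip(recoded, ped_rows):
            out.extend(code[row[c]] for c in cols)
    return recoded
```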
File: stackspopulations_popmap.txt
Description: Mapping file used in STACKS to identify which population each sample is assigned to.
Variables
- The first column refers to the sample identifiers (e.g., TK0005920).
- The second column refers to the population identifiers (e.g., Barrington_RI).
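A population map like this is simple to parse. The Python sketch below, with hypothetical function names, reads the two columns into a dictionary and counts samples per population. It assumes whitespace-separated fields (STACKS expects tabs) and that neither identifier contains spaces.

```python
from collections import Counter


def read_popmap(text):
    """Map sample ID -> population ID from population-map text (first two fields per line)."""
    mapping = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected a sample ID and a population ID")
        sample, population = fields[0], fields[1]
        if sample in mapping:
            raise ValueError(f"line {line_no}: duplicate sample ID {sample}")
        mapping[sample] = population
    return mapping


def samples_per_population(mapping):
    """Number of samples assigned to each population."""
    return dict(Counter(mapping.values()))
```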
File: remove_25pct_missing_data_from_genepop.R
Description: We used this R script to reduce the potential effects of missing data on our results by filtering the data file to remove samples with more than 25% missing data.
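The filtering step can be sketched as follows: compute each sample's fraction of missing genotypes and keep samples at or below the threshold. Genotypes are represented here as allele pairs with 0 marking a missing allele (a hypothetical encoding), and whether a sample with exactly 25% missing data is kept or removed by the original script is an assumption.

```python
def missing_fraction(genotypes):
    """Fraction of loci with a missing genotype; a pair containing 0 counts as missing."""
    if not genotypes:
        return 1.0
    missing = sum(1 for a1, a2 in genotypes if a1 == 0 or a2 == 0)
    return missing / len(genotypes)


def filter_by_missingness(samples, max_missing=0.25):
    """Split sample IDs into (kept, dropped) lists by their missing-data fraction.

    samples maps sample ID -> list of (allele1, allele2) pairs.
    """
    kept, dropped = [], []
    for sample_id, genotypes in samples.items():
        (kept if missing_fraction(genotypes) <= max_missing else dropped).append(sample_id)
    return kept, dropped
```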
File: summary_stats_and_pca_cleaned_dataset_081525.R
Description: R script used to calculate population genetic summary statistics and conduct principal coordinates analysis using the input genotype file.
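Two of the summary statistics reported in the abstract are expected heterozygosity (He) and FST. The Python sketch below computes He = 1 - sum(p_i^2) for one locus and a simple Nei-style FST = (HT - HS) / HT, where HS is the mean within-population He and HT is He from the pooled allele frequencies. These are textbook forms shown for illustration; summary_stats_and_pca_cleaned_dataset_081525.R may use other estimators (for example, small-sample corrections or Weir and Cockerham's FST), so values will not necessarily match the paper.

```python
def expected_heterozygosity(allele_counts):
    """He = 1 - sum(p_i^2) at one locus, without a small-sample correction."""
    n = sum(allele_counts)
    return 1.0 - sum((c / n) ** 2 for c in allele_counts)


def nei_fst(pop_counts):
    """Nei-style FST = (HT - HS) / HT at one locus, weighting populations equally.

    pop_counts: one list of allele counts per population, alleles in the same order.
    """
    hs = sum(expected_heterozygosity(c) for c in pop_counts) / len(pop_counts)
    freqs = [[c / sum(counts) for c in counts] for counts in pop_counts]
    mean_p = [sum(col) / len(freqs) for col in zip(*freqs)]  # pooled allele frequencies
    ht = 1.0 - sum(p ** 2 for p in mean_p)
    return 0.0 if ht == 0 else (ht - hs) / ht
```

Multilocus values are typically obtained by averaging He across loci and combining HS and HT across loci before taking the FST ratio.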
File: terrapin_116_cleaned.gen
Description: The data file containing the 116 multilocus diamondback terrapin genotypes used for all genetic analyses reported in the paper.
Code/software
We provide the R scripts, shell scripts with Slurm headers for running jobs on the Brown University cluster, and readme files used to process STACKS output files, perform population genetic analyses, and visualize results.
Access information
Other publicly accessible locations of the data:
- The raw Illumina data are available from the NCBI Sequence Read Archive (SRA), accession PRJNA1333380.
