Going with the flow? Relative importance of riverine hydrologic connectivity versus tidal influence for spatial structure of genetic diversity and relatedness in a foundational submersed aquatic plant
Data files
May 06, 2025 version (104.33 MB total)
cost_surfaces.zip: 94.82 MB
data.zip: 43.04 KB
processed_rasters.zip: 3.71 MB
raw_rasters.zip: 363.12 KB
README.md: 8.07 KB
shapefiles.zip: 5.39 MB
Abstract
Genetic connectivity in rivers is generally high, and levels of genotypic and genetic diversity of riverine species are expected to accumulate in downstream locations. Genetic structure of marine and estuarine species is less predictable, even though hydrologic connectivity is expected to be relatively high. These observations have been generated across different species and locations, so our understanding of the effects of hydrologic connectivity relative to tidal versus non-tidal environments within the same river remains incomplete. To control for species and location, we quantified diversity in 941 samples of Vallisneria americana Michx. (Hydrocharitaceae) collected from 36 sites along the species’ entire distribution in the tidal and non-tidal Potomac River of Maryland, Virginia, and the District of Columbia. Using ten microsatellite loci, we found 508 unique multilocus genotypes (MLGs), 36 of which were found multiple times across the riverscape, accounting for over 53% of the genotyped shoots. We found some evidence supporting connectivity throughout the river and stronger evidence that tidal regime drives genotypic and genetic structure within V. americana. Extensive clonality, including two MLGs spanning 230 and 152 river km, limits diversity in the non-tidal reaches and contrasts with very little evidence of clonal expansion (i.e., asexual reproduction) in tidal reaches. Genetic differentiation, structure, and pairwise relatedness of sampled shoots and MLGs also differed by tidal regime, with the non-tidal Potomac having higher levels of relatedness and lower levels of genetic diversity. The differences in spatial distribution of genetic diversity suggest very different outlooks for V. americana adaptation and acclimation to current and future perturbations across tidal and non-tidal regions of the Potomac, which lead to different recommendations for restoration of the same species in the same river.
These data, collected in 2007, 2011, and 2013, were used to understand the relative roles of hydrologic connectivity and tidal regime in shaping the genetic diversity of Vallisneria americana in the Potomac River in Maryland, Washington, DC, and Virginia, USA.
Description of the data and file structure
The zip files all contain data folders that were used to complete the analyses for the manuscript titled, Going with the flow? Relative importance of riverine hydrologic connectivity versus tidal influence for spatial structure of genetic diversity & relatedness in a foundational submersed aquatic plant.
The files are needed for the R project Potomac.Manuscript.2024.RProject-v0.2 (doi:10.5281/zenodo.14787930) to work. The zip files should be unzipped and the directories in those files should be placed at the top level of the project directory.
data.zip
The data.zip file contains three csv files.
The csv file allpotomac.microsatellite.data.csv includes the microsatellite genotype data and associated information about each sample that form the basis of the analyses. These data were generated from samples collected in the field.
IDName: The identifier code for the sample
NewPop: The code for the sampling location
OrderPop: A numeric code for sites with higher values downstream that is used to order sites when plotting.
Full.Population.Name: The full name of the sampling location.
Year.Collected: The year the sample was collected from the field.
Clone.ID.2018: The identifier code for the multilocus genotype of each sample.
Tide: Indicates whether the sample came from a tidal or nontidal environment.
TideCode: A group code required by the R package related: AA = nontidal and AB = tidal samples.
Collector: Names of the individuals involved in collecting material from the field.
Longitude and Latitude: Sampling coordinates collected in the field with a handheld Garmin ETrex GPS unit.
X and Y: Sampling coordinates as Universal Transverse Mercator (UTM) eastings and northings in UTM Zone 18.
Columns aagx030 through m16 are the microsatellite loci, formatted with the two alleles at each locus separated by a colon. NA values in those columns represent missing data where allele calls could not be made. This allele format is used by adegenet to create the genind objects that are the basis of most of the analyses.
The remaining columns, aagx030.1 through m16.2, provide the same genotype data, but with the two alleles in separate columns. This data format is required by the R package related.
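Because the two genotype layouts carry the same information, the two-column form can be derived from the colon-separated form. The following is a minimal Python sketch, assuming each locus cell holds either "allele:allele" or the string NA. The helper names split_genotype and to_two_column and the allele sizes in the example row are hypothetical, and the way missing data are coded in the .1/.2 columns of the real file should be checked against the file itself.

```python
def split_genotype(call):
    """Split an adegenet-style 'allele:allele' call into two allele strings.

    Missing calls (NA or empty) become a pair of NA values, one for each
    of the two-column fields. Assumes NA is the missing-data code.
    """
    if call is None or call.strip() in ("", "NA"):
        return ("NA", "NA")
    first, second = call.split(":")
    return (first.strip(), second.strip())


def to_two_column(row, loci):
    """Expand one sample's colon-formatted loci into locus.1 / locus.2 fields."""
    out = {}
    for locus in loci:
        a1, a2 = split_genotype(row[locus])
        out[f"{locus}.1"] = a1
        out[f"{locus}.2"] = a2
    return out


# Hypothetical example row; the allele sizes are invented for illustration.
row = {"IDName": "S001", "aagx030": "180:184", "m16": "NA"}
print(to_two_column(row, ["aagx030", "m16"]))
```

Splitting on the colon keeps allele labels as strings, so no size information is lost or reformatted during the conversion.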
The data in the remaining csv files were derived from the observations and data in allpotomac.microsatellite.data.csv.
The mlg.count.and.freq.by.pop.csv file contains, for each site, the number of samples identified as each of eight large multilocus genotypes (MLGs).
NewPop: The code for the sampling location
Columns starting with MLG_: Counts of samples assigned to that MLG
Number_of_Stems: The total number of samples collected from the sampling location.
Columns starting with FR_: Relative frequency of each MLG (count/number of stems).
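The FR_ columns follow directly from the MLG_ counts and Number_of_Stems. The minimal Python sketch below shows that calculation for one site, assuming each FR_ column shares its suffix with the matching MLG_ column. The function name mlg_frequencies, the MLG labels, and the counts are hypothetical.

```python
def mlg_frequencies(counts, number_of_stems):
    """Return FR_ values: each MLG count divided by the site's stem count.

    Assumes FR_<label> corresponds to MLG_<label> (hypothetical naming).
    """
    if number_of_stems <= 0:
        raise ValueError("Number_of_Stems must be positive")
    return {
        "FR_" + name[len("MLG_"):]: count / number_of_stems
        for name, count in counts.items()
    }


# Hypothetical site with 40 stems; MLG labels and counts are invented.
site_counts = {"MLG_A": 10, "MLG_B": 4}
print(mlg_frequencies(site_counts, 40))  # {'FR_A': 0.25, 'FR_B': 0.1}
```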
The csv file PR.Pop.Centroids.UTMs.csv includes the centroid of each sampling location, calculated as the mean latitude and longitude and the mean UTM coordinates of all samples at each site.
NewPop: The code for the sampling location.
OrderPop: A numeric code for sites with higher values downstream that is used to order sites when plotting.
Tide: Indicates whether the site is in a tidal or nontidal environment.
Year.Collected: The year the site was sampled.
Longitude and Latitude: Centroid of the sampling coordinates for all samples at each site. If the centroid fell outside of water it was moved to the same position along the river but within water.
X and Y: The Universal Transverse Mercator (UTM) eastings and northings of the site centroid in UTM Zone 18.
Source: Indicates if the site was included in Lloyd et al. 2011 or if the data are new in this analysis.
RiverKm: Distance in kilometers from the mouth of the Potomac River. This information is used to assess correlation of genetic diversity with position along the course of the river.
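The centroid described above is the per-site mean of the sample coordinates. The following minimal Python sketch computes it from UTM eastings and northings, assuming samples are given as dictionaries with NewPop, X, and Y keys. The function name site_centroids and the coordinates are hypothetical, and the manual adjustment that moved centroids falling on land back into water is not reproduced here.

```python
from collections import defaultdict


def site_centroids(samples):
    """Mean X and Y (UTM eastings/northings) of all samples at each site."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running [sum_x, sum_y, n]
    for s in samples:
        acc = sums[s["NewPop"]]
        acc[0] += s["X"]
        acc[1] += s["Y"]
        acc[2] += 1
    return {site: (x / n, y / n) for site, (x, y, n) in sums.items()}


# Invented coordinates for two hypothetical sites.
samples = [
    {"NewPop": "SiteA", "X": 300000.0, "Y": 4300000.0},
    {"NewPop": "SiteA", "X": 300020.0, "Y": 4300010.0},
    {"NewPop": "SiteB", "X": 310000.0, "Y": 4290000.0},
]
print(site_centroids(samples))
# {'SiteA': (300010.0, 4300005.0), 'SiteB': (310000.0, 4290000.0)}
```

Averaging projected coordinates within a single UTM zone is straightforward over distances as short as a sampling site.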
shapefiles.zip
CB.PR.Digitized.MN.2024.shp is the polygon representation of water extent for the Chesapeake Bay and Potomac River that was hand digitized based on aerial imagery. The variable Water has the following values that denote different tributaries of major rivers or the main stem of the Chesapeake Bay: "Chesapeake", "Shenandoah", "Cacapon", "South Branch", "Patterson Creek", and "Potomac". The data are in the geodetic coordinate reference system (CRS), NAD83 (EPSG 4269).
raw_rasters.zip
potomac10m.tif was created in and exported from ArcMap 10.2. The raster was created from the shapefile CB.PR.Digitized.MN.2024.shp by selecting only the Potomac River portion of the shapefile and projecting it into UTM Zone 18, NAD83, before converting it to a raster.
processed_rasters.zip
processed_rasters.zip contains two tif files that are raster representations of the Potomac River with a 10 m cell size, projected in UTM Zone 18, NAD83.
Potomac10m_trimmed.tif was treated to ensure that the tif created by ArcMap had no extraneous rows or columns beyond the area of interest.
Potomac10m_corrected.tif is a modified version of Potomac10m_trimmed.tif in which all sample and site centroids fall within cells coded as water. Because converting polygons to raster cells loses precision, points close to shorelines are sometimes not coded as being in water. This correction is needed to keep the cost distance analysis from failing.
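The kind of correction described above can be illustrated on a toy grid. This is a minimal Python sketch under stated assumptions, not the procedure actually used to build Potomac10m_corrected.tif: the raster is held as a list of rows with 1 = water and 0 = land, its top-left corner is at (xmin, ymax), and any cell containing a sample point is simply recoded as water. The function names cell_index and force_points_to_water and all coordinates are hypothetical.

```python
def cell_index(x, y, xmin, ymax, cell_size, nrow, ncol):
    """Row/column of the raster cell containing a projected point."""
    col = int((x - xmin) // cell_size)
    row = int((ymax - y) // cell_size)
    if not (0 <= row < nrow and 0 <= col < ncol):
        raise ValueError(f"point ({x}, {y}) lies outside the raster")
    return row, col


def force_points_to_water(grid, points, xmin, ymax, cell_size, water=1):
    """Recode any cell that contains a sample point as water.

    Returns a corrected copy of the grid and the list of recoded cells;
    the input grid is left unchanged.
    """
    fixed = [list(r) for r in grid]
    recoded = []
    for x, y in points:
        row, col = cell_index(x, y, xmin, ymax, cell_size, len(fixed), len(fixed[0]))
        if fixed[row][col] != water:
            fixed[row][col] = water
            recoded.append((row, col))
    return fixed, recoded


# Toy 3 x 3 raster with 10 m cells; 1 = water, 0 = land.
grid = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
xmin, ymax = 300000.0, 4300030.0
points = [(300015.0, 4300025.0), (300025.0, 4300015.0)]
fixed, recoded = force_points_to_water(grid, points, xmin, ymax, 10.0)
print(recoded)  # [(1, 2)]
```

The first point already falls in a water cell; the second lands on a shoreline cell coded as land, which is recoded so that a cost distance calculation can reach it.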
cost_surfaces.zip
cost_surfaces.zip contains two transition matrices created using the R package gdistance and saved as rds files that are readable in the R statistical environment (tr1.Potomac.Transition.10m.rds and tr1c.Potomac.Transition.10m.rds). Both transition matrices are based on the rasters of the extent of the Potomac River. The tr1c.Potomac.Transition.10m.rds version is corrected for diagonal movement and north-south distortion.
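Why a diagonal correction matters can be shown with a small least-cost search. The analyses themselves used gdistance in R, where this kind of correction is typically applied with geoCorrection() before calling costDistance(); the following is only a conceptual Python sketch, not the gdistance implementation. It assumes a grid with 1 = water and 0 = land, movement to the 8 neighbouring water cells, and step costs equal to the distance between cell centres. The function name least_cost_distance and the toy grid are hypothetical.

```python
import heapq
import math


def least_cost_distance(grid, start, goal, cell_size=10.0, water=1):
    """Shortest within-water path length (m) between two cells (Dijkstra).

    Orthogonal steps cost one cell width and diagonal steps cost sqrt(2)
    cell widths. Without this correction, diagonal steps would be counted
    as if they were as short as orthogonal ones.
    """
    nrow, ncol = len(grid), len(grid[0])
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return dist
        if dist > best[(r, c)]:
            continue  # stale queue entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrow and 0 <= nc < ncol and grid[nr][nc] == water:
                step = cell_size * (math.sqrt(2) if dr and dc else 1.0)
                nd = dist + step
                if nd < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = nd
                    heapq.heappush(queue, (nd, (nr, nc)))
    return math.inf  # goal not reachable through water


# Toy 3 x 3 "river" with 10 m cells; 1 = water, 0 = land.
river = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
print(least_cost_distance(river, (0, 0), (2, 2)))  # about 28.28 (two diagonal steps)
```

The shortest route takes two diagonal steps of 10 * sqrt(2) m each; an uncorrected surface would report 20 m for the same route.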
Sharing/Access information
The data can be accessed in the R Project for this manuscript that is archived on Zenodo (doi:10.5281/zenodo.14787930) along with the R code that uses all these data sets.
Code/Software
The data were analyzed using various packages within the R statistical environment. The R code is organized to run within a project that is available at doi:10.5281/zenodo.14787930.
Scripts are located within the code folder of the project and are numbered in the order they need to be run. Each script contains information on dependencies on other scripts and data sources, and installs and loads all required packages.
Usage notes
The csv files can be opened in any text editor and in most other programs and coding languages used for data analysis.
Shapefiles are a common format for vector-based geographic information system (GIS) data developed by the company Esri. They can be opened in any GIS software and in R or Python. A shapefile consists of multiple files beyond the .shp file (specifically, .cpg, .dbf, .prj, .sbn, and .sbx). The user interacts directly only with the .shp file, but the other files must be in the same directory.
Raster files represent geographic information as a grid of cells with values representing features of interest. The rasters in this repository are provided as tif files. They can be read by any image processor, GIS software, R, or Python.
The transition matrices are .rds files in a specific format that is used within the R statistical environment by the R package gdistance.
Leaves were collected in the field and DNA was extracted. Microsatellites were amplified using PCR and detected on an Applied Biosystems Incorporated 3730 DNA analyzer. Microsatellite alleles were called using GeneMapper and checked manually for accuracy. Multilocus genotypes were identified using the program Genodive and the poppr package in the R statistical environment as described in the publication.
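The core idea behind identifying multilocus genotypes is grouping samples whose allele calls match at every locus. The following minimal Python sketch does only exact matching, assuming colon-separated calls with NA for missing data. Genodive and poppr offer more elaborate options (for example, handling missing data and distance thresholds), so this does not reproduce the published MLG assignments. The function name group_identical_genotypes and the example genotypes are hypothetical.

```python
from collections import defaultdict


def group_identical_genotypes(genotypes):
    """Group samples whose allele calls match exactly at every locus.

    Samples with any missing locus are set aside, because exact matching
    cannot decide which group they belong to.
    """
    groups = defaultdict(list)
    incomplete = []
    for sample_id, loci in genotypes.items():
        if any(call == "NA" for call in loci):
            incomplete.append(sample_id)
            continue
        # Sort the two alleles so "180:184" and "184:180" give the same key.
        key = tuple(tuple(sorted(call.split(":"))) for call in loci)
        groups[key].append(sample_id)
    return sorted(groups.values()), incomplete


# Invented two-locus genotypes for four hypothetical samples.
genotypes = {
    "S1": ["180:184", "210:214"],
    "S2": ["184:180", "210:214"],
    "S3": ["176:180", "210:214"],
    "S4": ["180:184", "NA"],
}
mlgs, incomplete = group_identical_genotypes(genotypes)
print(mlgs)        # [['S1', 'S2'], ['S3']]
print(incomplete)  # ['S4']
```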
