A lack of genetic diversity and minimal adaptive evolutionary divergence in introduced Mysis shrimp after 50 years
Data files
Version files (Jan 09, 2024): 65.99 MB
- blast_fasta_pcadapt.fasta
- blast_fasta_RDA.fasta
- mysis_adaptive_snps.vcf
- mysis_clean_10_for_neutral_popstats.vcf
- mysis_clean_imputed_names_fix.vcf
- mysis_clean_imputed_neutral_names_fix.vcf
- mysis_clean_imputed.vcf
- mysis_clean_non_imputed.vcf
- mysis_env_final.csv
- mysis_neutral_snps.vcf
- mysis_popmap_final
- README.md
- stacks_tests.xlsx
Abstract
The successes of introduced populations in novel habitats often provide powerful examples of evolution and adaptation. In the 1950s, fisheries managers transported opossum shrimp (Mysis diluviana) from Clearwater Lake in Minnesota, USA, and introduced them to Twin Lakes in Colorado, USA, to supplement food sources for trout. Shrimp were subsequently introduced from Twin Lakes into numerous lakes throughout Colorado. Because managers kept detailed records of the timing of the introductions, we had the opportunity to test for evolutionary divergence within a known time interval. Here, we used reduced representation genomic data to investigate patterns of genetic diversity, to test for genetic divergence between populations, and to look for evidence of adaptive evolution within the introduced populations in Colorado. We found very low levels of genetic diversity across all populations, with evidence for some genetic divergence between the Minnesota source population and the introduced populations in Colorado. With the exception of the population from Gross Reservoir, Colorado, there was also little differentiation among the Colorado populations, consistent with their known provenance from a single founding population. Demographic modeling suggests that the population in Gross Reservoir is of hybrid origin, with an earlier founding population from an unknown source later supplemented from another population. Despite the overall low genetic diversity we observed, FST outlier and environmental association analyses identified multiple loci exhibiting signatures of selection and adaptive variation related to elevation and lake depth. The success of introduced species is thought to be limited by genetic variation, but our results imply that populations with limited genetic variation can become established in a wide range of novel environments.
README: A lack of genetic diversity and minimal adaptive evolutionary divergence in introduced Mysis shrimp after 50 years
https://doi.org/10.5061/dryad.8cz8w9gwx
Description of the data and file structure
blast_fasta_RDA.fasta 350 bp consensus sequences around each candidate SNP flagged by redundancy analysis (RDA).
blast_fasta_pcadapt.fasta 350 bp consensus sequences around each candidate SNP flagged by PCAdapt. Note that some of the reads were flagged as truncated by samtools.
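Because some PCAdapt windows were reported as truncated, a quick length check can show which consensus sequences fall short of 350 bp. The sketch below is a minimal FASTA reader, assuming one record per `>` header and keying records on the first whitespace-separated token of the header; the header names in the toy input are hypothetical, not taken from the files.

```python
# Minimal FASTA reader and length check (sketch; header format is assumed).
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping the first header token to its sequence."""
    records = {}
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def short_records(records, expected_len=350):
    """Return headers whose sequence is shorter than the expected window length."""
    return sorted(h for h, seq in records.items() if len(seq) < expected_len)


# Toy input standing in for blast_fasta_pcadapt.fasta; header names are hypothetical.
toy = [">locus_1", "A" * 350, ">locus_2", "C" * 200]
print(short_records(read_fasta(toy)))  # ['locus_2']
```

On the real file, passing an open handle works the same way, e.g. `with open("blast_fasta_pcadapt.fasta") as fh: recs = read_fasta(fh)`.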
mysis_clean_10_for_neutral_popstats.vcf 3,803 SNPs generated following the recommendations of Schmidt et al. 2021 for population genetic statistics. None of these SNPs were found in the adaptive SNP lists from PCAdapt or RDA, so they are unlikely to be biased by strong FST outliers or selection.
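The claim that none of these SNPs appear in the adaptive lists can be re-checked by intersecting site identifiers from two VCFs. This sketch assumes a site is identified by its CHROM and POS columns and that both files use the same locus naming; if locus names differ between files, the keys will not match and the check is meaningless. The toy records are invented for illustration.

```python
def snp_keys(lines):
    """Collect (CHROM, POS) pairs from the data lines of a VCF, skipping headers."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        keys.add((chrom, int(pos)))
    return keys


# Toy stand-ins for the neutral-popstats VCF and the adaptive SNP VCF.
neutral_toy = ["#CHROM\tPOS\tID\tREF\tALT", "1\t10\t.\tA\tG", "2\t55\t.\tC\tT"]
adaptive_toy = ["#CHROM\tPOS\tID\tREF\tALT", "3\t20\t.\tG\tA"]

overlap = snp_keys(neutral_toy) & snp_keys(adaptive_toy)
print(len(overlap))  # 0
```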
mysis_clean_imputed_names_fix.vcf 18,441 SNPs imputed in LinkImpute that were used for redundancy analysis, PCAdapt, and sNMF.
mysis_clean_imputed_neutral_names_fix.vcf 18,220 putatively neutral SNPs, obtained by removing the SNPs flagged as FST outliers by PCAdapt. The dataset was imputed with LinkImpute before this filtering step.
mysis_clean_non_imputed.vcf 18,441 SNPs prior to imputation in LinkImpute, provided so that users can impose stricter filtering based on missingness or replicate the ADMIXTURE analyses.
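As an example of a stricter missingness filter on the non-imputed file, the sketch below computes, for each site, the fraction of samples whose GT call is missing (`.` or `./.`, phased or unphased) and keeps sites at or below a threshold. The 0.2 cutoff is a hypothetical value chosen for illustration, not the threshold used in the study, and the toy VCF rows are invented.

```python
def site_missingness(line):
    """Fraction of samples with a missing GT call at one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    gt_idx = fields[8].split(":").index("GT")  # position of GT within FORMAT
    samples = fields[9:]
    missing = 0
    for sample in samples:
        parts = sample.split(":")
        gt = parts[gt_idx] if gt_idx < len(parts) else "."
        if gt.replace("|", "/") in {".", "./."}:
            missing += 1
    return missing / len(samples)


def filter_by_missingness(lines, max_missing=0.2):
    """Keep header lines and sites whose missingness is at most max_missing."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
        elif line.strip() and site_missingness(line) <= max_missing:
            kept.append(line)
    return kept


toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\ts4",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t0/1:10\t./.:0\t0/0:8",
]
kept = filter_by_missingness(toy_vcf, max_missing=0.2)
print(len(kept))  # 2 (header + the fully called site)
```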
mysis_env_final.csv Environmental data for all sampled lakes. Missing values are either blank or NA. Variables measured by DS and BJ for all lakes apart from Clearwater Lake include:
state: The U.S. state in which the sampled lake is located, either CO (Colorado) or MI (Minnesota)
county: County in which the sampled lake is located
water_name: Name of the sampled lake or reservoir
code: Short code for sampled lakes. CAR = Carter Lake, CLER = Clearwater Lake, DIL = Dillon Reservoir, GDL = Grand Lake, GRO = Gross Reservoir, JEF = Jefferson Reservoir, TWL = Twin Lakes, RUE = Ruedi Reservoir.
date_sampled: Date (m/d/y) on which the genotyped shrimp were sampled.
area-hectares: surface area of sampled lake in hectares
depth-meters: Maximum water depth of the sampled lake in meters
elevation-meters_above_sea_level: elevation of sampled lake in meters above sea level
cond-microSiemens: Conductivity of the sampled lake in microsiemens
secchi-meters: Secchi depth of the sampled lake in meters
turbidity-Nephelometric_Turbidity_Units: turbidity of sampled lake in nephelometric turbidity units
min_Dissolved_Oxygen_on_bottom-mg_per_L: Minimum dissolved oxygen at the bottom of the sampled lake in mg per L
mean_august_surface_temp_C: Mean August surface temperature of the sampled lake in °C
num_yrs_Aug_temp: Number of years of data used to calculate the mean August surface temperature
mysis_detected: Whether Mysis have been found in the sampled lake, yes (Y) or no (N). Only used for invaded (Colorado) lakes.
number_stocking_events: Number of times Mysis are known to have been stocked in the sampled lake, based on historical records.
stocked: Whether Mysis are known to have been stocked in the sampled lake, yes (Y) or no (N)
connected_to_stocked_lake: Whether the sampled lake is known to be connected to a lake that was stocked with Mysis, yes (Y) or no (N)
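Since missing entries in this table may be either blank or NA, a loader should treat both the same way. The sketch below uses Python's `csv.DictReader`, maps blank and `NA` to `None`, and converts a chosen set of columns to floats. It assumes the CSV header names match the variable names listed above exactly; the toy rows (lake codes `AAA` and `BBB` and their values) are invented and are not data from the study.

```python
import csv
import io

MISSING = {"", "NA"}  # the README states missing values are blank or NA
NUMERIC = {"area-hectares", "depth-meters", "elevation-meters_above_sea_level"}


def load_env(handle, numeric_cols=NUMERIC):
    """Read the environment CSV, mapping blank/NA to None and casting numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        clean = {}
        for key, value in row.items():
            value = (value or "").strip()
            if value in MISSING:
                clean[key] = None
            elif key in numeric_cols:
                clean[key] = float(value)
            else:
                clean[key] = value
        rows.append(clean)
    return rows


# Toy CSV with invented values; real use: open("mysis_env_final.csv", newline="").
toy = io.StringIO(
    "code,depth-meters,elevation-meters_above_sea_level\n"
    "AAA,12.5,NA\n"
    "BBB,,1500\n"
)
env = load_env(toy)
print(env[0]["elevation-meters_above_sea_level"], env[1]["depth-meters"])  # None None
```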
mysis_popmap_final.txt Population map (popmap) used for running populations in Stacks. Individuals are grouped by lake.
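A Stacks popmap pairs each sample name with a population label, one sample per line. The sketch below reads such a file and counts individuals per lake. It splits on any whitespace so it tolerates tabs or spaces, and it skips blank lines and lines starting with `#` defensively. The sample names in the toy input are hypothetical; only the lake codes come from this README.

```python
from collections import Counter


def read_popmap(lines):
    """Map sample name -> population label from popmap-style lines."""
    pops = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        pops[fields[0]] = fields[1]
    return pops


# Hypothetical sample names; lake codes follow the README.
toy_popmap = ["TWL_01\tTWL", "TWL_02\tTWL", "CLER_01\tCLER"]
counts = Counter(read_popmap(toy_popmap).values())
print(counts["TWL"], counts["CLER"])  # 2 1
```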
mysis_adaptive_snps.vcf 221 putatively adaptive SNPs identified by PCAdapt and redundancy analysis
mysis_clean_imputed.vcf 18,441 SNPs imputed in LinkImpute
mysis_neutral_snps.vcf 18,220 neutral SNPs (candidate SNPs removed) for PCA and DAPC analyses
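The SNP counts stated for these files are mutually consistent: 18,220 neutral SNPs plus 221 adaptive SNPs gives the 18,441 SNPs in the full imputed set. The sketch below records that arithmetic and adds a small helper for counting data records in a VCF so the counts can be re-checked against the files. Matching counts alone do not prove that the neutral and adaptive sets are exact complements; a site-level comparison would be needed for that.

```python
# SNP counts as stated in this README.
TOTAL_SNPS = 18_441     # mysis_clean_imputed.vcf
NEUTRAL_SNPS = 18_220   # mysis_neutral_snps.vcf
ADAPTIVE_SNPS = 221     # mysis_adaptive_snps.vcf


def count_vcf_records(lines):
    """Count non-header, non-empty lines (one per site) in a VCF."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


assert NEUTRAL_SNPS + ADAPTIVE_SNPS == TOTAL_SNPS
print(TOTAL_SNPS - NEUTRAL_SNPS)  # 221
```

For example, `with open("mysis_neutral_snps.vcf") as fh: print(count_vcf_records(fh))` should report 18,220 if the file matches its description.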
Code/Software
Scripts
All scripts are R Markdown files with annotations and some personal commentary to help interpret the data.
001- Bioinformatics Pipeline Final.Rmd Workflow used to demultiplex and filter the raw ddRAD data from Admera Health.
stacks_tests.xlsx Results of multiple runs of Stacks denovo_map.pl used to optimize the parameters for final genotype calling.
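A parameter search of this kind is typically run as a series of denovo_map.pl calls that vary the mismatch parameters. The sketch below only builds the command strings for a sweep over `-M` with `-n` set equal to `-M`, a commonly used convention; it does not run them. The directory names, thread count, and range of values are hypothetical, and the parameter combinations actually tested are those recorded in stacks_tests.xlsx.

```python
import shlex


def denovo_map_commands(samples_dir, popmap, out_root, m_values, threads=8):
    """Build one denovo_map.pl command per M value, setting n equal to M."""
    commands = []
    for m in m_values:
        args = [
            "denovo_map.pl",
            "--samples", samples_dir,
            "--popmap", popmap,
            "-o", f"{out_root}/M{m}_n{m}",
            "-M", str(m),
            "-n", str(m),
            "-T", str(threads),
        ]
        commands.append(shlex.join(args))  # shell-safe quoting where needed
    return commands


# Hypothetical directory names and parameter range, for illustration only.
cmds = denovo_map_commands("samples", "mysis_popmap_final", "runs", range(1, 4))
for cmd in cmds:
    print(cmd)
```

Keeping each run in its own output directory makes it straightforward to compare loci and SNP counts across settings afterwards.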
002- Population_Genetics_final.Rmd Workflow to analyse neutral population structure.
003-Identification_of_Population_Structure_Associated_with_Habitat_final.Rmd Workflow to generate the list of FST outlier SNPs that were filtered out of the dataset for "neutral" population genetics, plus code for redundancy analysis and adaptive PCA.
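Outlier scans such as PCAdapt return one p-value per SNP, and a multiple-testing correction is then applied before calling outliers. The README does not state which correction or cutoff was used, so the sketch below shows one common option, the Benjamini-Hochberg procedure, with a hypothetical cutoff. The p-values in the example are made up.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def flag_outliers(pvalues, alpha=0.1):
    """Indices of SNPs whose BH-adjusted p-value falls below alpha (hypothetical cutoff)."""
    q = benjamini_hochberg(pvalues)
    return [i for i, qi in enumerate(q) if qi < alpha]


toy_p = [0.001, 0.01, 0.03, 0.5]  # made-up p-values
print(flag_outliers(toy_p, alpha=0.05))  # [0, 1, 2]
```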