Selection over small and large spatial scales in the face of high gene flow
Data files
Dec 12, 2024 version files (48.35 MB)
- Dryad_Urchin_CCGP.zip (48.34 MB)
- README.md (3.56 KB)
Abstract
Local adaptation represents the balance of selection and gene flow. Increasingly, studies find that adaptation can occur on spatial scales much smaller than the scale of dispersal, resulting in balanced polymorphisms within populations. However, in many cases we lack information on how this microgeographic adaptation might facilitate or hinder adaptation to larger-scale environmental heterogeneity, for example across latitude. Marine systems present a special case, as many marine species have high dispersal capacity, so that dispersal ‘neighborhoods’ may encompass environmental heterogeneity over both extremely small and extremely large spatial scales. Here, we leverage fine-scale sampling across the California range of the Pacific purple urchin (Strongylocentrotus purpuratus), a species with previous evidence of both local adaptation and extremely high gene flow. We find that despite a complete absence of neutral population structure, satellite-based sea surface temperature and tidal zone drive genetic differences among populations, suggesting that balanced polymorphisms can lead to adaptation across both large (latitudinal) and small (subtidal v. intertidal) spatial scales. In fact, some of the same genetic variants differentiate populations at both spatial scales, potentially because both environmental parameters are related to temperature. Further, we find that genes expressed in a single tissue or at a single life history stage are more divergent than expected across both latitudinal and tidal zone comparisons, suggesting that these genes have specific functions that might generate phenotypic variation important for local adaptation. Together, these results suggest that even in populations with little population structure, genetic variation can be sorted across small spatial scales, potentially resulting in local adaptation across a complex environmental mosaic.
README: Selection Over Small and Large Spatial Scales in the Face of High Gene Flow
This repository contains the files needed to rerun the analyses discussed in the paper.
Description of the data and file structure
Directories and their contents. Any NAs represent missing data.
data
6.filtered_goodinds.recode.vcf: vcf file for 114 individuals and 991,002 SNPs
6.filtered_goodinds_thin.recode.vcf: thinned to 19,081 SNPs
CORRECTED_SNPS_qvalues_114ind.csv: data for OutFLANK
* CHROM= chromosome
* POS= position
* FST_site= FST values associated with site
* FST_tidal= FST values associated with tidal zone (intertidal v. subtidal)
* FST_NS= FST values associated with northern v. southern sites
* NS_qvalues= qvalues associated with northern v. southern sites
* Tidal_qvalues= qvalues associated with tidal zone (intertidal v. subtidal)
* Site_qvalues= qvalues associated with site.
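As a rough illustration of how the q-value columns above might be used, the sketch below lists SNPs whose q-values fall under a threshold for both the north v. south and the tidal zone comparisons. It is not the authors' method (their OutFLANK analysis is in outflank_urchin.Rmd). It assumes the CSV header uses the column names exactly as listed above. The 0.05 threshold, the function names, and the example rows are all hypothetical.

```python
import csv
import io


def parse_float(value):
    """Return a float, or None for empty or "NA" cells (NA marks missing data)."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return float(value)


def shared_outliers(handle, alpha=0.05):
    """List (CHROM, POS) for SNPs with NS_qvalues and Tidal_qvalues both below alpha.

    alpha=0.05 is an illustrative default, not a value taken from the paper.
    """
    hits = []
    for row in csv.DictReader(handle):
        q_ns = parse_float(row["NS_qvalues"])
        q_tidal = parse_float(row["Tidal_qvalues"])
        if q_ns is None or q_tidal is None:
            continue  # skip SNPs with missing q-values
        if q_ns < alpha and q_tidal < alpha:
            hits.append((row["CHROM"], int(row["POS"])))
    return hits


# Made-up rows for illustration only; these are not values from the dataset.
example = io.StringIO(
    "CHROM,POS,FST_site,FST_tidal,FST_NS,NS_qvalues,Tidal_qvalues,Site_qvalues\n"
    "chrA,101,0.01,0.20,0.18,0.001,0.020,0.40\n"
    "chrA,250,0.00,0.01,0.15,0.010,0.900,0.80\n"
    "chrB,77,0.02,NA,0.03,0.600,NA,0.70\n"
)
print(shared_outliers(example))  # [('chrA', 101)]
```

To try this on the real table, pass an open file handle for CORRECTED_SNPS_qvalues_114ind.csv instead of the in-memory example; the path to that file inside the archive is not given here and would need to be checked.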
metadata
sites_data.csv: site information
* site= site name
* site_code= 2-3 letter code for each site
* long, lat= longitude and latitude for each site
* tide= tidal zone
* color, color2= colors assigned to sites for figures
* site_name= how site name will be shown in figures
new_sites_data.csv: same columns as sites_data.csv, plus a new column "shapes" for differentiating between intertidal and subtidal sites
Urchin_metadata_114inds_SORTED.csv: sample information; any field with "NA" represents data that was not provided by the collector(s) of that sample
* Number= number for sample
* CCGP_code= code for each site
* M_Number, other, stuff, id= additional numbers for differentiating between samples and sample sites
* Lat, Long= lat/long coordinates for sample sites
* Individuals.per.site..erase.this.column.after.corroborating.information.= number of individuals collected
* General.notes= more site name information
* FieldNumber= acronym for site if applicable
* Instant.ID= species ID
* Collector= who collected the samples
* DateCollected= date samples were collected
* Phylum, Class, Order, Family, Genus, Species= Echinodermata, Echinoidea, Echinoida, Strongylocentrotidae, Strongylocentrotus, purpuratus for all samples
* Depth= depth of sampling site
* Habitat= general description of habitat where samples were collected
* Habitat_tidal= categorization of sites as intertidal v. subtidal (sub)
* north_south= categorization of sites as north v. south
* site_name= concise full name of site
* site_code= 2-3 letter code for each site
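Both metadata tables share a site_code column, so sample records can be linked to site-level information through it. The sketch below shows one way to count samples per tidal zone with such a lookup. It assumes the CSV headers match the column names listed above. The function name, the example site codes, and the tide values ("intertidal", "subtidal") are hypothetical, not taken from the files.

```python
import csv
import io
from collections import Counter


def individuals_per_zone(samples_handle, sites_handle):
    """Count samples per tidal zone by looking up each sample's site_code in the site table."""
    # Map each site code to its tidal zone (the "tide" column of sites_data.csv).
    tide_by_code = {
        row["site_code"]: row["tide"] for row in csv.DictReader(sites_handle)
    }
    counts = Counter()
    for row in csv.DictReader(samples_handle):
        # Samples whose site code is not in the site table are counted under "NA".
        counts[tide_by_code.get(row["site_code"], "NA")] += 1
    return dict(counts)


# Made-up rows for illustration only; these are not values from the dataset.
sites = io.StringIO(
    "site,site_code,long,lat,tide\n"
    "Site One,S1,-120.0,34.0,intertidal\n"
    "Site Two,S2,-123.0,39.0,subtidal\n"
)
samples = io.StringIO("Number,site_code\n1,S1\n2,S1\n3,S2\n4,XX\n")
print(individuals_per_zone(samples, sites))
# {'intertidal': 2, 'subtidal': 1, 'NA': 1}
```

A plain dictionary lookup keeps the example dependency-free; with the real files, a data-frame join on site_code would serve the same purpose.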
Rmarkdown
- Combine_Cands_Urch114.Rmd: candidate genes from RDA and LFMM (the input data for this script are created in RDA_Urch114_final.Rmd and LFMM_Urch114_final.Rmd, respectively)
- ConStruct.Rmd: conStruct analyses for K = 1-5 and figure generation for K = 2
- GE_comparison.R: Gene expression comparison and figure generation
- LFMM_Urch114_final.Rmd: LFMM analyses
- outflank_urchin.Rmd: OutFLANK analyses and figure generation
- pca_urchin.Rmd: PCA analyses
- RDA_Urch114_final.Rmd: RDA analyses
- Site Map.R: Site map for figure 1
- TopGO.R: Top Gene Ontology values
Shellscripts
- Dups.sh: this script removes PCR duplicates
- gatk1.sh: step one of gatk, GenomicsDBImport
- gatk2.sh: step two of gatk, GenotypeGVCFs
- gatk3.sh: step three of gatk, GatherVcfs, SelectVariants, VariantFiltration, filtering with vcftools
- Gvcf_array.sh: slurm array script that combines gvcf files with HaplotypeCaller in gatk
- mapping.sh: slurm array script that maps each sample's set of reads to the reference genome
- rg.sh: adds read groups to bam files
Sharing/Access information
Data files listed above