Data from: Impact of a putative riverine barrier on genomic population structure and gene flow in the presence of sexual selection
Data files
Jul 17, 2025 version files, 20.19 MB total:
for-dryad.zip (20.18 MB)
README.md (4.64 KB)
Abstract
Gene flow connects populations and facilitates the exchange of alleles, impacting speciation and adaptation. In western Panama, lekking golden-collared and white-collared manakins (Manacus vitellinus and M. candei, respectively) interbreed in a narrow hybrid zone across which males' brilliant yellow collar plumage, principally controlled by the carotenoid metabolism gene BCO2, has introgressed from vitellinus into candei under sexual selection. Plumage introgression is sharply limited across the lower reaches of the widest river in the region, but both color morphs occur on both riverbanks at its headwaters. Previous authors have speculated that the river may be a strong barrier to gene flow that prevents further introgression of plumage color, but this hypothesis has never been tested. In this study, we used approximately 10,000 to 14,000 single nucleotide polymorphisms (SNPs) to test this hypothesis by assessing genetic differentiation and estimating gene flow across the river. The data associated with this study include VCF files containing genetic variants, information on demographic modeling of cross-river populations, and scripts used to run the analyses.
Dataset DOI: 10.5061/dryad.6hdr7src2
Description of the data and file structure
Together with the methods described in the paper and the sequence reads deposited under NCBI BioProject PRJNA951544, the data in this repository are sufficient to reproduce all analyses in the related paper, published in Evolution under the same title as this dataset.
Code
Data processing
raw-to-gstacks.sh - Shell commands to demultiplex raw reads (not needed if downloading reads from NCBI), align them to the M. vitellinus reference genome (NCBI assembly ASM171598v3), merge BAM files for individuals that were sequenced multiple times, and then assemble contigs with gstacks.
snp-filtering.sh - Shell commands to call SNPs with Stacks populations then filter with VCFtools, creating two VCF files, one for population genetic structure analyses and one for gene flow analyses.
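As a rough illustration of one kind of filter applied at this step, the Python sketch below keeps only VCF sites that are genotyped in at least a chosen fraction of samples, similar in spirit to the VCFtools --max-missing option. It is not the filtering in snp-filtering.sh: the threshold and the toy VCF are hypothetical, and it assumes GT is the first FORMAT subfield, as the VCF specification requires when genotypes are present.

```python
import io


def call_rate(genotype_fields):
    """Fraction of samples whose GT (first FORMAT subfield) is fully called."""
    called = 0
    for field in genotype_fields:
        gt = field.split(":", 1)[0]
        # Missing genotypes appear as ".", "./." or ".|."; partial calls count as missing.
        alleles = gt.replace("|", "/").split("/")
        if all(a != "." for a in alleles):
            called += 1
    return called / len(genotype_fields)


def filter_vcf(lines, min_call_rate):
    """Yield header lines unchanged and data lines whose call rate passes."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        # Columns 0-8 are CHROM..FORMAT; per-sample genotypes start at column 9.
        if call_rate(cols[9:]) >= min_call_rate:
            yield line


# Hypothetical toy VCF with four samples and two sites.
toy = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4
chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\t0/1
chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t./.\t0/0\t./.\t0/1
"""
kept = [l for l in filter_vcf(io.StringIO(toy), 0.75) if not l.startswith("#")]
print(len(kept))  # 1
```

The second site is genotyped in only 2 of 4 samples (call rate 0.5), so it is dropped at the hypothetical 0.75 threshold.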
Population structure
ibd.R - R script to run the isolation-by-distance (IBD) analysis for males only and for both sexes combined.
pca-source.R - Code that sets up PCA, DAPC, and AMOVA analyses in R. Called at the top of the pca-dapc-amova.R script.
pca-dapc-amova.R - R script that runs the PCA, DAPC, and AMOVA analyses, both for males-only and both-sexes datasets.
snmf.R - R script that runs the sNMF analysis (i.e., STRUCTURE-like ancestry estimation).
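Isolation by distance is commonly tested with a Mantel test between genetic and geographic distance matrices. The Python sketch below assumes that approach (ibd.R may use a different implementation or statistic): it computes great-circle distances from latitude/longitude pairs, such as those listed in pops_gps.txt, correlates them with a genetic distance matrix, and obtains a one-sided permutation p-value. The site coordinates and genetic distances in the example are hypothetical.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance (km) between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def upper_triangle(matrix, order):
    """Flatten the upper triangle of a square matrix after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, permutations=999, seed=1):
    """Mantel r between two distance matrices and a one-sided permutation
    p-value, permuting rows and columns of the genetic matrix together."""
    n = len(gen)
    identity = list(range(n))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(upper_triangle(gen, identity), geo_vec)
    rng = random.Random(seed)
    at_least = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(upper_triangle(gen, perm), geo_vec) >= r_obs:
            at_least += 1
    return r_obs, (at_least + 1) / (permutations + 1)


# Hypothetical sampling sites (lat, lon) and a genetic distance that is
# exactly proportional to geographic distance, for illustration only.
sites = [(9.0, -82.5), (9.1, -82.3), (8.9, -82.0), (9.2, -81.8)]
geo = [[haversine_km(*a, *b) for b in sites] for a in sites]
gen = [[0.001 * d for d in row] for row in geo]
r, p = mantel(gen, geo, permutations=199)
print(round(r, 6), p)
```

With only four sites there are just 24 possible permutations, so real analyses need more sampling locations for the permutation p-value to be informative.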
Gene flow
All python scripts were obtained and modified from https://github.com/dportik/dadi_pipeline as cited in the manuscript.
eu.wu_find.best.projection_example.py - Example Python script, run for each comparison, to find the best projection sizes to use.
e.w_no.mig_example.py - Example Python script that can be edited to change the demographic scenario evaluated. We used this script to evaluate demographic scenarios for all east and all west samples before subsetting the dataset (below).
e.w.noupper_initial.param.values_example.py - Results from the above analysis showed that continuous symmetric migration was the best fit scenario. This Python code runs that scenario for the east-excluding-upper vs. west-excluding-upper groups of samples to get initial parameter values used in the next step.
eu.wu_initial.param.values_example.py - Same as the above for east-upper vs. west-upper.
e.w.noupper_final.param.values_example.py - Example Python script that produces replicate parameter estimates (it was run many times and the results were combined to increase the number of replicates). This version excludes upriver populations.
eu.wu_final.param.values_example.py - Same as above for east-upper vs. west-upper.
dadti-bootstrap.R - R script that takes the output from the final.param.values scripts above and creates bootstrap replicates for confidence intervals.
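The bootstrap step can be pictured with a percentile bootstrap: resample the replicate estimates of a parameter with replacement, take the mean of each resample, and read the confidence limits off the sorted bootstrap means. The Python sketch below assumes that scheme; dadti-bootstrap.R may resample or summarize differently, and the migration-rate values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=42):
    """Point estimate (mean) and percentile bootstrap CI for replicate estimates."""
    rng = random.Random(seed)
    # Mean of each resample drawn with replacement, sorted for percentile lookup.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return statistics.fmean(values), means[lo_idx], means[hi_idx]


# Hypothetical replicate estimates of a symmetric migration parameter.
m_values = [0.82, 0.95, 1.10, 0.88, 1.02, 0.97, 0.91, 1.05]
est, lo, hi = bootstrap_ci(m_values)
print(f"m = {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

A fixed seed makes the interval reproducible between runs, which is convenient when reporting results.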
Data
Data processing
pops_full.txt - Population map (sample-to-population assignments) used in gstacks and Stacks populations.
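Stacks population maps list one sample per line, with the sample name and its population separated by a tab. The Python sketch below assumes pops_full.txt (and the pops3/4/5 files) follow that layout and groups sample IDs by population; the sample and population names in the example are hypothetical.

```python
import io
from collections import defaultdict


def read_popmap(handle):
    """Group sample IDs by population from a tab-separated population map."""
    pops = defaultdict(list)
    for line in handle:
        line = line.strip()
        # Skip blank lines and any comment lines.
        if not line or line.startswith("#"):
            continue
        sample, pop = line.split("\t")[:2]
        pops[pop].append(sample)
    return dict(pops)


# Hypothetical three-sample map.
pops = read_popmap(io.StringIO("s1\teast\ns2\teast\ns3\twest\n"))
print(pops)  # {'east': ['s1', 's2'], 'west': ['s3']}
```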
Population structure
pca-snmf.vcf, pca-snmf.males.vcf - Output of the SNP filtering script, without and with subsetting to males only, respectively.
pca-snmf.raw, pca-snmf.males.raw - The above files converted to .raw using PLINK. Used in PCA script.
pops_gps.txt - Lat/lon of sampling sites. Used in IBD script.
sampleinfo_filtered3.txt - Sex, age, and color info on each sample. M means male, F means female, U means unknown. A means adult, I means immature, U means unknown. Y means yellow, G means green, W means white. Biorepository numbers refer to the accession number of the tissue sample in the Smithsonian's National Museum of Natural History Biorepository.
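PLINK's --recode A output (.raw) is whitespace-delimited: six leading columns (FID, IID, PAT, MAT, SEX, PHENOTYPE) followed by one column per SNP holding the count of the named allele (0, 1, or 2) or NA when missing. The Python sketch below parses such a file and fills missing genotypes with the per-SNP mean, one common choice before a PCA; whether the PCA script handles missing data this way is an assumption, and the toy file is hypothetical.

```python
import io
import statistics


def read_raw(handle):
    """Parse a PLINK .raw file into sample IDs, SNP names, and allele counts
    (None for missing genotypes)."""
    header = handle.readline().split()
    snps = header[6:]  # first six columns: FID IID PAT MAT SEX PHENOTYPE
    ids, rows = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        ids.append(fields[1])  # IID column
        rows.append([None if v == "NA" else int(v) for v in fields[6:]])
    return ids, snps, rows


def mean_impute(rows):
    """Replace missing allele counts with the mean of the observed counts per SNP."""
    n_snps = len(rows[0])
    means = []
    for j in range(n_snps):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else float(r[j]) for j in range(n_snps)] for r in rows]


# Hypothetical three-sample, two-SNP .raw file.
toy = (
    "FID IID PAT MAT SEX PHENOTYPE snp1_A snp2_G\n"
    "E1 s1 0 0 1 -9 0 2\n"
    "E1 s2 0 0 2 -9 NA 1\n"
    "W1 s3 0 0 1 -9 2 NA\n"
)
ids, snps, rows = read_raw(io.StringIO(toy))
X = mean_impute(rows)
print(X)  # [[0.0, 2.0], [1.0, 1.0], [2.0, 1.5]]
```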
Gene flow
dad.input.vcf - Output of the SNP filtering script, used for the gene flow analyses.
pops3.txt - Used in gene flow scripts for all east vs. all west comparisons.
pops4.txt - Used in gene flow scripts for upriver-excluded comparisons.
pops5.txt - Used in gene flow scripts for upriver-only comparisons.
dadi_results_tables.xlsx - Summary sheets showing all dadi results, including the initial run comparing demographic scenarios (Sheets 1 and 2, E-W summary AIC table and E-W full results), initial parameter value estimation for upriver-excluded (Sheet 3, exclude_upriver) and upriver-only (Sheet 4, upriver_only) datasets, and final parameter value estimation for each of the above (Sheet 5, parameter_replicates). In Sheets 1-4, the column names are the same as the dadi output. In Sheet 5, the separate runs for each dataset are combined in one table, with the Dataset column indicating dataset, the Set column indicating a separate run, and the Sim column indicating the replicate from each run.
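The AIC tables rank demographic scenarios by AIC = 2k - 2 ln L, where k is the number of model parameters and ln L the log-likelihood. The Python sketch below computes AIC, the difference from the best model (delta AIC), and Akaike weights for a set of scenarios. The scenario names, log-likelihoods, and parameter counts are hypothetical and are not the values in dadi_results_tables.xlsx.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(models):
    """models maps name -> (log_likelihood, n_params).
    Returns (name, AIC, delta_AIC, Akaike weight) tuples sorted by AIC."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    # Relative likelihood of each model given the data, then normalized to weights.
    rel = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(rel.values())
    return sorted(
        ((name, scores[name], scores[name] - best, rel[name] / total) for name in scores),
        key=lambda row: row[1],
    )


# Hypothetical scenarios: (log-likelihood, number of parameters).
models = {
    "no_mig": (-1500.0, 3),
    "sym_mig": (-1450.0, 4),
    "asym_mig": (-1449.5, 5),
}
ranked = rank_models(models)
for name, score, delta, weight in ranked:
    print(f"{name}: AIC={score:.1f} dAIC={delta:.1f} w={weight:.3f}")
print(ranked[0][0])  # sym_mig
```

In this toy example the asymmetric model fits slightly better but pays for its extra parameter, so the symmetric-migration model ranks first.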