Out with the old, introgression with the new: Signals of ancient and recent admixture in hybridizing Mesoamerican crocodiles (Crocodylus acutus x Crocodylus moreletii)
Data files
Oct 07, 2024 version files (549.83 MB total)
- AncestryPaint_abbababa.zip (18.29 KB)
- Filtering.zip (111.52 MB)
- FSC2.zip (7.18 MB)
- ipyrad.zip (86.41 MB)
- LD.zip (344.57 MB)
- noreponly_metadata_fixed.csv (63.49 KB)
- PopulationStats_Admixture.zip (60.69 KB)
- README.md (6.42 KB)
Abstract
A central aim of conservation is to preserve existing biodiversity and understand the ecological and evolutionary processes that support it. Inter- and intra-specific hybridization in wildlife has been recognized as a common and naturally occurring phenomenon that facilitates species adaptation and evolution. However, hybridization still constitutes one of the most challenging problems for legal protection and species management due to its perceived biological risk, lack of regulatory oversight, and different case-by-case impacts. When considering rare or threatened hybridizing species with unequal legal protection, management strategies risk being inaccurate or unsuccessful unless contextualized with an informed understanding of the species' genetic and evolutionary backgrounds. We investigated hybridization dynamics and genetic diversity of American crocodiles (Crocodylus acutus) and Morelet’s crocodiles (Crocodylus moreletii) from Belize to ascertain whether genetic exchange through admixture displayed signs of evolutionary significance. Using genomic reduced representation (3RAD) datasets from 242 wild crocodile samples, we found evidence of population structure among C. acutus, as well as ancient bidirectional gene flow between C. acutus and C. moreletii. Notably, we also found evidence of high levels of recent admixture along the coastal Crocodylus populations in areas with extensive habitat modification due to human impact. These findings, together with a disconnect we found between the morphological and genetic species assignments used to identify populations, have implications for conservation management practices and suggest a range of additional genetic investigations to understand the natural and anthropogenic role of hybridization in large, long-lived tropical predators that span marine and terrestrial ecosystems.
https://doi.org/10.5061/dryad.3bk3j9kt9
Scripts for running the analyses and their respective input/output files. All scripts are provided as is and must be edited to match your own data directories and specific needs. Analysis scripts can also be found on GitHub (see Related Works section).
NOTE: In all scripts and data files, the ‘Sampling localities’ of the main manuscript are listed as “Monitoring unit”.
Description of the data and file structure
- ipyrad
- ipyradpipeline.md - instructions on running ipyrad pipeline
- params-noreponly_v2.txt - input parameter file used for SNP calling and genotyping in ipyrad
- ipyrad_output - folder containing output files from the ipyrad pipeline
- noreponly.* - output files (.geno, .ugeno, .str, .ustr, .vcf, and .gz) from ipyrad pipeline. Note: DS1 = noreponly_v2.vcf.gz
- Filtering
- filteredVCF - folder including filtered VCF files generated from SNPfiltR_filtering.R
- SNPfiltR_filtering.R - R script for filtering ipyrad vcf file to working datasets and creating PCA & tSNE plots in Supplementary Figures S3-S4
- LDpruning_Dataconversion.R - R script for LD pruning filtered vcf files and creating subsetted VCF files
- PopulationStats_Admixture
- ADMIXTURE.md - instructions for the pipeline & scripts for running ADMIXTURE analyses
- admixture_k3_pops - folder of .txt files for the ADMIXTURE population groups assignments containing the respective Sample IDs in each group
- Popstruct_Plots.R - R script for generating plots from the ADMIXTURE results.
- popmap.csv - condensed population map & metadata for only the 242 working samples used in the ADMIXTURE analysis created from the Popstruct_Plots.R script.
- PCAscript.R - R script to create PCA plots
- PopGenStats.md - instructions for the pipeline & scripts to run VCFtools for generating population genetic summary stats
- PopGenStats.R - R script for generating population diversity statistics
- FSC2
- FSC2.md - instructions for the pipeline & scripts to run Fastsimcoal2 analyses (need additional scripts to run, descriptions in text)
- FS_FixRootTime_K3_Mods.slurm
- FSC_croc_boot.slurm
- fsc-selectbestrun.sh
- Get_AIC_across_mods.R
- Get_best_FSCacross_mods.sh
- Get_best_FSCacross_boots.sh
- Get_pars_across_bootreps_crocs.R
- Prep_FSC_reps.sh
- FSC2.R - R script for processing FSC2 results and generating plots
- Ad_k3_pop90 - Folder containing subfolder for FSC2 input and output
- Models_16.2y - folder containing subfolders for each of the 16 tested models and .obs file generated via easySFS. Model folders each contain an .est and .tpl input parameter file for FSC2
- noreponly_v2.75.renamed.LDpruned.popmap90.txt - input popmap file for running FSC2
- best_L_all_Mods_16.2yr - folder containing:
- *.bestlhoods files from all tested models
- .csv of best-fit model AIC scores and parameter estimates
- best_L_allMods_boot_16.2yr - folder containing:
- *.bestlhoods files from best-fit model
- boot_ranges.csv - output of boot ranges
- LD
- PlinkLD.md - instructions for the pipeline and scripts to generate the LD files using Plink
- plinkfiles_10Mb - folder of generated Plink files for each ADMIXTURE population group
- LDscript.R - R script for generating LD decay plots
- AncestryPaint_abbababa
- AncestryPainting.md - instructions for the pipeline and script used to generate Ancestry painting plots
- snpRabbababa.R - R script for ABBA-BABA analyses
- noreponly_metadata_fixed.csv - metadata file for all 273 samples
- Definition of columns for metadata in noreponly_metadata_fixed.csv:
- Sample - Unique Sample ID used for analyses
- Seq_ID - Unique Sequencer ID
- CrocID - Unique individual ID combining Mark Code and Morph_Species
- Longitude - EDITED Longitude value**
- Latitude - EDITED Latitude value**
- Capture_Date - Capture date (day-month-year)
- Monitoring.Unit - Sampling locality
- Abbrv_Monitoring.Unit - Abbreviation for Monitoring.Unit
- Subdivision - District in Belize
- Water Temp (C) - Water temperature (Celsius)
- Air Temp (C) - Air temperature (Celsius)
- Salinity (ppt) - Water Salinity (parts per thousand)
- pH - water pH
- Mark Code - Unique mark code for sampled individual
- Morph_Species - Morphological species group
- Size Class - Size class (Adult, Subadult, Juvenile, Hatchling)
- Sex - Male/Female
- HL (cm) - Head length (centimeters)
- SL (cm) - Snout length (centimeters)
- CW (cm) - Cranial width (centimeters)
- MAX W (cm) - Maxillary Width (centimeters)
- PMax W (cm) - Premaxillary width (centimeters)
- POb L (cm) - Preorbital length (centimeters)
- POb W (cm) - Preorbital width (centimeters)
- TL (cm) - Total length (centimeters)
- SVL (cm) - Snout vent length (centimeters)
- TW (cm) - Tail width (centimeters)
- HF (cm) - Right hind foot length (centimeters)
- Weight (kg) - Weight (kilograms)
- Paratrichosoma - Presence/Absence of Paratrichosoma
- Tail Condition
- Skin Condition
- Musculature Condition
- Teeth Condition
- Skeletal Structure
- Nuchal Photo
- Post-Occipital
- Nuchal Scutes
- Post-Occipital/Nuchal Pattern
- Transverse Rows
- Double Whorls
- Single Whorls
- Notes
- Wounds
- NOTE: To protect sensitive occurrence data for threatened/at-risk species, we generalized the geographic coordinates (Lat/Long) to a precision of 0.1 decimal degrees, as recommended by Guide to Best Practices for Generalising Sensitive Species Occurrence Data [Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-5jp4-5g10.]. To request the original, full-precision coordinates, please contact the corresponding author Helen Sung (hwsung@hawaii.edu).
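For readers who want to apply the same kind of generalization to their own occurrence records, the following is a minimal Python sketch, not the authors' procedure. It assumes the generalization was done by rounding (rather than truncating) to one decimal place. The function name `generalize_coordinate` and the example coordinates are hypothetical and are not taken from the dataset.

```python
from decimal import Decimal, ROUND_HALF_UP


def generalize_coordinate(value: float, places: int = 1) -> float:
    """Round a decimal-degree coordinate to a fixed number of decimal places.

    Assumption: rounding (half away from zero) is used; the README only
    states that coordinates were reduced to 0.1 decimal degrees.
    """
    quantum = Decimal(1).scaleb(-places)  # Decimal("0.1") when places == 1
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(rounded)


# Hypothetical coordinates for illustration only
latitude, longitude = 17.49876, -88.18653
print(generalize_coordinate(latitude), generalize_coordinate(longitude))
```

At 0.1 decimal degrees, a coordinate is resolved only to roughly 11 km in latitude, which is why the generalized values are not suitable for fine-scale spatial analyses.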
We used a reduced representation sequencing approach (3RAD) to collect loci from 273 wild crocodiles (Crocodylus acutus and Crocodylus moreletii) sampled throughout Belize from 2014 to 2021. We used population genomic approaches to evaluate hybridization dynamics and genetic diversity and quantified the age of admixture for evaluating conservation implications.
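As a starting point for exploring the sample metadata, the sketch below tallies samples per morphological species and per sampling locality. It uses only the column names documented above (`Sample`, `Monitoring.Unit`, `Morph_Species`). The helper names `load_rows` and `count_by` and the three demo rows are hypothetical, and the values actually used in `Morph_Species` and `Monitoring.Unit` are not shown here, so check the file before relying on any specific label.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read the metadata CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count rows per distinct value of one metadata column."""
    return Counter(row[column] for row in rows)


# Hypothetical stand-in rows so the sketch runs without the download.
# For the real file, replace `demo` with:
#   open("noreponly_metadata_fixed.csv", newline="", encoding="utf-8")
demo = io.StringIO(
    "Sample,Monitoring.Unit,Morph_Species\n"
    "S001,Unit_A,species_1\n"
    "S002,Unit_A,species_2\n"
    "S003,Unit_B,species_1\n"
)

rows = load_rows(demo)
print(len(rows))                          # number of sample records
print(count_by(rows, "Morph_Species"))    # samples per morphological group
print(count_by(rows, "Monitoring.Unit"))  # samples per sampling locality
```

Run against the real file, the row count should match the 273 samples described for noreponly_metadata_fixed.csv; the 242 working samples are listed separately in popmap.csv.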