Data from: Ring species dynamics in California mygalomorph spiders (Nemesiidae, Calisoga)
Data files
Mar 12, 2024 version files 40.61 MB
Abstract
Idealized ring species, with approximately continuous gene flow around a geographic barrier but singular reproductive isolation at a ring terminus, are rare in nature. A broken ring species model preserves the geographic setting and fundamental features of an idealized model but accommodates varying degrees of gene flow restriction over complex landscapes through evolutionary time. Here we examine broken ring species dynamics in Calisoga spiders, which, like classic Ensatina salamanders, are distributed around the Central Valley of California. Using nuclear and mitogenomic data, we test key predictions of common ancestry, ring-like biogeography, biogeographic timing, population connectivity and terminal overlap. We show that a ring complex of populations shares a single common ancestor, and from an ancestral area in the Sierra Nevada mountains, two distributional and phylogenomic arms encircle the Central Valley. Isolation by distance occurs along these distributional arms, although gene flow restriction is also evident. Where divergent lineages meet in the South Coast Ranges we find rare lineage sympatry, without evidence for nuclear gene flow, and with clear evidence for morphological and ecological divergence. We discuss general insights provided by broken ring species, and how such a model could be explored and extended in other systems and future studies.
README: Ring species dynamics in California mygalomorph spiders (Nemesiidae, Calisoga)
Citation:
Monjaraz-Ruedas, R., Starrett, J., Leavitt, D., Hedin, M. 2024. Broken ring speciation in California mygalomorph spiders (Nemesiidae, Calisoga). The American Naturalist.
Authors:
Rodrigo Monjaraz-Ruedas, James Starrett, Dean Leavitt, Marshal Hedin
Overview:
In this manuscript we examine broken ring species dynamics in Calisoga spiders, which are distributed around the Central Valley of California. Using nuclear and mitogenomic data, we test key predictions of common ancestry, ring-like biogeography, biogeographic timing, population connectivity and terminal overlap. This repository contains the input data, output files, alignments, SNP data and scripts used for the analyses described in detail in the Methods section of the manuscript.
Contact: For questions or issues related to code, data and analyses:
Rodrigo Monjaraz-Ruedas - monroderik@gmail.com
Folder structure
├── BPP
│   ├── BPP_North
│   └── BPP_South
├── Data
│   ├── SNPS
│   └── UCE_Aligmnets
│       ├── ContactZone-80p
│       ├── UCEs_InGroup-80p
│       ├── UCEs_Outgroup-80p
│       └── mtDNA
├── Extended_Methods.docx
├── IBD
├── Morpho
├── README.md
├── Samples_Metadata.csv
├── SNAPPER
│   └── Runs_Results
├── Scripts
│   └── FUSe
└── Trees
Description
Data
Raw UCE data are deposited under BioProject PRJNA1073270 in the GenBank SRA repository. This directory contains the UCE alignments and SNPs used for analyses throughout the manuscript. For a reference to the input files used in each analysis, see Table S2 in the manuscript's Supplementary Material.
Trees
This folder contains multiple tree files resulting from a variety of analyses described in detail in the article's Supplementary Methods and Table S2. The associated input data are located in the Data/ folder, as described below.
- In-group analyses using as input data the folder UCEs_InGroup-80p:
  - InGroup-80p_IQTree.tre - Concatenated IQTree analysis.
  - InGroup_80p_Concordance_IQTree.cf.tree - Concordance factors analysis in IQTree.
  - Ingroup_80p_loci_Trees_IQTree.tre - Loci/gene trees computed for the ASTRAL and concordance factors analyses.
- In-group analyses using as input data the folder ContactZone-80p:
  - ContactZone_IQtree_80p.tre - Concatenated IQTree analysis.
- Outgroup analyses using as input data the folder UCEs_Outgroup-80p:
  - Outgroup_IQTree_Concordance.tree - Concordance factors analysis in IQTree.
  - Outgroup_Astral.tre - Summary multispecies coalescent analysis using ASTRAL, support in posterior probabilities.
  - Outgroup_Astral_qs.tre - Summary multispecies coalescent analysis using ASTRAL, support in quartet scores.
  - Outgroup_loci_Trees.tre - Loci/gene trees computed for the ASTRAL and concordance factors analyses.
- Outgroup analyses using as input data the folder mtDNA:
  - Outgroup-mtDNA_IQTree.tre - Mitogenome concatenated analysis using IQTree.
BPP
Folder with input, output and configuration files for replicating the BPP analyses described in Extended Methods. Briefly, the data set was subdivided into two sections, North and South, and we ran these two datasets independently (BPP_North and BPP_South directories). These two folders contain input files (InGroupN.txt, InGroupS.txt), configuration files (casoga.InGpN.ctl and casoga.InGpS.ctl) and sample-to-species maps (imapN.txt, imapS.txt). Output files are labeled *outout_R1.txt or *outout_R2.txt for each independent run on each dataset.
Analyses were run in bpp v4.4.0_linux_x86_64 on a machine with 126 GB RAM and 24 cores.
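As a quick sanity check before rerunning BPP, it can help to confirm how many samples are assigned to each species in a sample-to-species map. The sketch below assumes the usual two-column BPP Imap layout (sample label, then species label, separated by whitespace). The labels in the toy text are hypothetical and are not taken from imapN.txt or imapS.txt.

```python
from collections import Counter


def read_imap(text):
    """Parse a two-column Imap text into {sample_label: species_label}."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        mapping[parts[0]] = parts[1]
    return mapping


# Hypothetical toy content; replace with open("imapN.txt").read() in practice.
toy_imap = """\
ind1 North1
ind2 North1
ind3 North2
"""

imap = read_imap(toy_imap)
samples_per_species = Counter(imap.values())
print(samples_per_species)
```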
IBD
Data and R scripts used for the analysis of isolation by distance (IBD). For a detailed description of the methods, see Extended Methods. Briefly, a conductance raster created in QGIS v3.16 (Elevation_Clipped.tif) was used for calculating "corrected" geographic distances. IBD_Master.R contains all commands used for calculating IBD for genetic clusters and major clades (North and South) using both Euclidean and corrected geographic distances. Input files for this script are a VCF file with all SNPs (InGp_QC_IndMiss_DP_LocMiss.recode.vcf, located in the Data folder) and a population file (pops.txt).
The script was run under R version 4.2.2 (2022-10-31). Details about package versions and dependencies are listed in IBD_Master.R, including links to documentation for the required R packages.
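For readers unfamiliar with IBD tests, the general idea can be illustrated with a simple Mantel test: correlate pairwise geographic and genetic distances, then permute sample labels to assess significance. This is a minimal standard-library sketch, not a translation of IBD_Master.R. Whether the R script uses this exact test is not stated here. The coordinates and genetic distances are hypothetical toy values, and great-circle distance stands in for the Euclidean and conductance-corrected distances used in the manuscript.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle of a square matrix after reordering rows/cols."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(geo, gen, permutations=999, seed=1):
    """Return (r, one-sided p) for the correlation between two distance matrices."""
    ids = list(range(len(geo)))
    geo_flat = upper_triangle(geo, ids)
    observed = pearson(geo_flat, upper_triangle(gen, ids))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = ids[:]
        rng.shuffle(perm)
        if pearson(geo_flat, upper_triangle(gen, perm)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical sampling sites (lat, lon); not taken from Samples_Metadata.csv.
sites = [(37.0, -119.5), (38.5, -120.5), (39.8, -122.0), (36.0, -121.0), (35.0, -119.0)]
geo = [[haversine_km(a, b) for b in sites] for a in sites]
# Toy genetic distances that scale linearly with geography (perfect IBD).
gen = [[0.0005 * d for d in row] for row in geo]

r, p = mantel(geo, gen)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With real data, the genetic distance matrix would come from the SNP VCF file and the sample order would have to match the order of the coordinates.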
Morpho
Data and R script used for the MDS, PCA and morphological distance analyses. Calisoga_measures.csv is the input file needed for running the script morphoPCA.R. This input file is a subset of Table S3 in the article's Supplementary Material. The subset is indicated with color codes in Table S3 as follows:
- Red - Samples excluded from analysis
- Light Green - Character variables used in analysis
- Yellow - Samples from the contact zone
- Empty cells in Calisoga_measures.csv denote missing data
The script runs under R version 4.2.2 (2022-10-31). Details about package versions and dependencies are listed in morphoPCA.R, including links to documentation for the required R packages.
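Because empty cells denote missing data, any reanalysis outside morphoPCA.R needs to handle them explicitly. The following sketch shows one way to read such a table with the Python standard library, turning empty cells into None and keeping only complete cases. The column names (Sample, CL, CW) and values are hypothetical and do not reflect the actual header of Calisoga_measures.csv.

```python
import csv
import io


def read_measurements(handle, id_column):
    """Read a measurement table; empty cells become None, others floats."""
    rows = {}
    for record in csv.DictReader(handle):
        sample = record.pop(id_column)
        rows[sample] = {
            name: (float(value) if value and value.strip() else None)
            for name, value in record.items()
        }
    return rows


def complete_cases(rows):
    """Keep only samples with no missing measurements."""
    return {
        sample: values
        for sample, values in rows.items()
        if all(v is not None for v in values.values())
    }


# Hypothetical toy table; in practice pass open("Calisoga_measures.csv").
toy_csv = """\
Sample,CL,CW
S1,10.2,8.1
S2,9.8,
S3,11.0,8.9
"""

measurements = read_measurements(io.StringIO(toy_csv), id_column="Sample")
usable = complete_cases(measurements)
print(sorted(usable))
```

Dropping incomplete samples is only one option. How morphoPCA.R treats missing values is documented in the script itself.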
SNAPPER
Folder with input and output data, as well as configuration files, for replicating the SNAPPER analysis described in the Extended Methods. Briefly, Calisoga_unlinked_noMissingData.phy is the input phylip file with 413 unlinked biallelic SNPs (see Table S2). constrains.txt and species_list.txt are configuration files required by the publicly available script snapp_prep.rb. Casoga_Dated_snapper.xml is the configuration file produced by snapp_prep.rb and needed for the BEAST analysis. The Runs_Results directory holds all output files (.log and .trees) for each independent BEAST run. Calisoga_snapper_DATED_TREE.tre is the final dated tree discussed in the manuscript.
For a more detailed tutorial on how to estimate divergence times from SNP data, see the divergence_time_estimation_with_snp_data tutorial by Michael Matschiner.
Analyses were run in BEAST v2.6.4.
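Before building an XML file from a SNP alignment, one may want to confirm that every site is biallelic and free of missing data, as the input file name implies. The sketch below assumes a sequential (non-interleaved) PHYLIP file in which heterozygous genotypes are written as two-state IUPAC codes. Both assumptions should be checked against the actual file. The toy alignment and its labels are hypothetical.

```python
# Two-state IUPAC codes expanded to their nucleotides; other codes are ignored.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}
MISSING = set("N?-")


def read_phylip(text):
    """Parse a sequential PHYLIP alignment into {label: sequence}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])
    seqs = {}
    for ln in lines[1:n_taxa + 1]:
        name, seq = ln.split(None, 1)
        seqs[name] = "".join(seq.split()).upper()
    if len(seqs) != n_taxa or any(len(s) != n_sites for s in seqs.values()):
        raise ValueError("alignment does not match the PHYLIP header")
    return seqs


def check_sites(seqs):
    """Return (sites with missing data, sites that are not biallelic), 0-based."""
    columns = list(zip(*seqs.values()))
    has_missing = [i for i, col in enumerate(columns) if MISSING & set(col)]
    not_biallelic = []
    for i, col in enumerate(columns):
        alleles = set()
        for state in col:
            alleles.update(IUPAC.get(state, ""))
        if len(alleles) != 2:
            not_biallelic.append(i)
    return has_missing, not_biallelic


# Hypothetical toy alignment; in practice read the .phy file from disk.
toy_phylip = """\
3 4
sp1_a AGCT
sp1_b RGCT
sp2_a GGTN
"""

alignment = read_phylip(toy_phylip)
missing_sites, non_biallelic_sites = check_sites(alignment)
print(missing_sites, non_biallelic_sites)
```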
Scripts
Scripts used for UCE and SNP processing.
- FUSe (Filtering UCE Sequences) - A custom pipeline we wrote to automate and simplify the handling of UCE sequences for phylogenomics. The pipeline can be used for aligning, trimming and filtering sequences and is publicly available on GitHub (and temporarily on Gitfront). For detailed information on usage, installation and examples, see the README file included in the FUSe directory.
  - FUSe.py - Python script.
  - README.md - Documentation for installation and usage of FUSe.py.
- snps_calling.sh - Bash script for automating the process of calling SNPs from UCE data. This script creates a reference file from UCE alignments using CIAlign, maps reads to it using BWA, calls variants using bcftools and finally extracts biallelic SNPs using vcftools. Input files for this script are a folder of alignments from which a reference will be created and a folder of fastq files that will be mapped to the reference. Output is a VCF file with biallelic SNPs. See Supplemental Methods for details.
- mtdna_byCatch.sh - Bash script for extracting mitogenomes from UCE sequences. It uses BWA to map reads to the provided mitochondrial reference and then uses the Samtools v1.15 (or above) function consensus to get a final consensus sequence of the requested reference. This script requires phyluce 1.7 for aligning and processing sequences. Input files are a mitochondrial reference in fasta format and a folder of fastq files. Output is a folder with alignments for every locus in the reference. See Extended Methods for details.
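The mapping and SNP-calling steps described for snps_calling.sh can be sketched as a sequence of shell commands. The function below only builds the command strings and does not run them. The flags shown are common defaults and are assumptions, not the options used in snps_calling.sh. The CIAlign reference-building step is omitted, so the reference fasta is assumed to exist already. All file names are hypothetical.

```python
def snp_calling_commands(reference, fastq_r1, fastq_r2, sample, outdir="snps"):
    """Build shell commands for mapping reads and extracting biallelic SNPs.

    Flags are illustrative defaults, not those of snps_calling.sh.
    """
    bam = f"{outdir}/{sample}.sorted.bam"
    raw_vcf = f"{outdir}/{sample}.raw.vcf"
    prefix = f"{outdir}/{sample}.biallelic"
    return [
        # Index the reference so BWA can map against it.
        f"bwa index {reference}",
        # Map paired reads and sort the alignments.
        f"bwa mem {reference} {fastq_r1} {fastq_r2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Pile up reads and call variant sites only.
        f"bcftools mpileup -f {reference} {bam} | bcftools call -mv -Ov -o {raw_vcf}",
        # Keep biallelic SNPs; vcftools writes <prefix>.recode.vcf.
        f"vcftools --vcf {raw_vcf} --remove-indels --min-alleles 2 --max-alleles 2 "
        f"--recode --recode-INFO-all --out {prefix}",
    ]


# Hypothetical file names for illustration only.
commands = snp_calling_commands("ref.fasta", "a_R1.fastq", "a_R2.fastq", "a")
for command in commands:
    print(command)
```

Each string could be passed to a shell once the tools are installed. Consult snps_calling.sh itself for the exact parameters used in the study.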
Metadata
- Extended_Methods - File containing detailed information on the methods, analyses and parameters used for testing our hypotheses of common ancestry, ring-like biogeography, biogeographic timing, population connectivity and terminal overlap.
- Samples_Metadata - Sample metadata, including voucher collection data and codes, geographic coordinates and elevation.