Data from: Ring species dynamics in California mygalomorph spiders (Nemesiidae, Calisoga)

Monjaraz-Ruedas, Rodrigo; Starrett, James; Leavitt, Dean; Hedin, Marshal

Published Mar 12, 2024 on Dryad. https://doi.org/10.5061/dryad.70rxwdc47

Data files

Mar 12, 2024 version files (40.61 MB):
- Calisoga_ring_species_data.zip (40.61 MB)
- README.md (8.83 KB)

Abstract

Idealized ring species, with approximately continuous gene flow around a geographic barrier but singular reproductive isolation at a ring terminus, are rare in nature. A broken ring species model preserves the geographic setting and fundamental features of an idealized model but accommodates varying degrees of gene flow restriction over complex landscapes through evolutionary time. Here we examine broken ring species dynamics in Calisoga spiders, which, like classic Ensatina salamanders, are distributed around the Central Valley of California. Using nuclear and mitogenomic data we test key predictions of common ancestry, ring-like biogeography, biogeographic timing, population connectivity and terminal overlap. We show that a ring complex of populations shares a single common ancestor, and from an ancestral area in the Sierra Nevada mountains, two distributional and phylogenomic arms encircle the Central Valley. Isolation by distance occurs along these distributional arms, although gene flow restriction is also evident. Where divergent lineages meet in the South Coast Ranges we find rare lineage sympatry, without evidence for nuclear gene flow, and with clear evidence for morphological and ecological divergence. We discuss general insights provided by broken ring species, and how such a model could be explored and extended in other systems and future studies.

Citation:
Monjaraz-Ruedas, R., Starrett, J., Leavitt, D., Hedin, M. 2024. Broken ring speciation in California mygalomorph spiders (Nemesiidae, Calisoga). The American Naturalist.

Authors:
Rodrigo Monjaraz-Ruedas, James Starrett, Dean Leavitt, Marshal Hedin

Overview:
In this manuscript we examine broken ring species dynamics in Calisoga spiders, which are distributed around the Central Valley of California. Using nuclear and mitogenomic data we test key predictions of common ancestry, ring-like biogeography, biogeographic timing, population connectivity and terminal overlap. This repository contains the input data, output files, alignments, SNP data and scripts used for the analyses described in detail in the Methods section of the manuscript.

Contact: For questions or issues related to code, data and analyses:
Rodrigo Monjaraz-Ruedas - monroderik@gmail.com

Folder structure

├── BPP
│   ├── BPP_North
│   └── BPP_South
├── Data
│   ├── SNPS
│   └── UCE_Aligmnets
│       ├── ContactZone-80p
│       ├── UCEs_InGroup-80p
│       ├── UCEs_Outgroup-80p
│       └── mtDNA
├── Extended_Methods.docx
├── IBD
├── Morpho
├── README.md
├── Samples_Metadata.csv
├── SNAPPER
│   └── Runs_Results
├── Scripts
│   └── FUSe
└── Trees

Description

Data

Raw UCE data are deposited under BioProject PRJNA1073270 on the GenBank SRA repository. This directory contains the UCE alignments and SNPs used for analyses throughout the manuscript. For the input files used in each analysis, see Table S2 in the manuscript's Supplementary Material.

Trees

This folder contains multiple tree files resulting from a variety of analyses described in detail in the article's Supplementary Methods and Table S2. The associated input data is located in the Data/ folder, as described below.

In-group analyses using folder UCEs_InGroup-80p as input data:
- InGroup-80p_IQTree.tre - Concatenated IQTree analysis.
- InGroup_80p_Concordance_IQTree.cf.tree - Concordance factors analysis in IQTree.
- Ingroup_80p_loci_Trees_IQTree.tre - Loci/gene trees computed for the ASTRAL and concordance factors analyses.

In-group analyses using folder ContactZone-80p as input data:
- ContactZone_IQtree_80p.tre - Concatenated IQTree analysis.

Outgroup analyses using folder UCEs_Outgroup-80p as input data:
- Outgroup_IQTree_Concordance.tree - Concordance factors analysis in IQTree.
- Outgroup_Astral.tre - Summary multispecies coalescent analysis using ASTRAL, with support given as posterior probabilities.
- Outgroup_Astral_qs.tre - Summary multispecies coalescent analysis using ASTRAL, with support given as quartet scores.
- Outgroup_loci_Trees.tre - Loci/gene trees computed for the ASTRAL and concordance factors analyses.

Outgroup analyses using folder mtDNA as input data:
- Outgroup-mtDNA_IQTree.tre - Concatenated mitogenome analysis using IQTree.

BPP

Folder with input, output, and configuration files for replicating the BPP analysis described in the Extended Methods. Briefly, the data set was subdivided into two sections, North and South, and the two datasets were run independently (BPP_North and BPP_South directories). These two folders contain input files (InGroupN.txt, InGroupS.txt), configuration files (casoga.InGpN.ctl and casoga.InGpS.ctl), and sample-to-species maps (imapN.txt, imapS.txt). Output files are labeled *outout_R1.txt or *outout_R2.txt for the independent runs on each dataset.

Analyses were run in bpp v4.4.0_linux_x86_64 on a machine with 126 GB RAM and 24 cores.
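
For readers unfamiliar with sample-to-species maps, the Python sketch below shows one way such a file could be read and summarized. It assumes the usual BPP Imap layout of two whitespace-separated columns (individual label, then species or population label). The function name and the example labels are hypothetical and are not taken from imapN.txt or imapS.txt.

```python
from collections import defaultdict


def parse_imap(text):
    """Group individual labels by species from Imap-style text.

    Assumes two whitespace-separated columns per line:
    individual label, then species/population label.
    """
    species_to_samples = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got: {line!r}")
        individual, species = fields
        species_to_samples[species].append(individual)
    return dict(species_to_samples)


# Hypothetical labels, for illustration only.
example = """\
ind01 popA
ind02 popA
ind03 popB
"""
groups = parse_imap(example)
for species, samples in sorted(groups.items()):
    print(species, len(samples), samples)
```

A quick count per species like this can help confirm that a map file matches the species listed in the corresponding control file before starting a long run.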

IBD

Data and R scripts used for the analysis of isolation by distance (IBD). For a detailed description of the methods, see the Extended Methods. Briefly, a conductance raster created in QGIS v3.16 (Elevation_Clipped.tif) was used for calculating "corrected" geographic distances. IBD_Master.R contains all commands used for calculating IBD for genetic clusters and major clades (North and South) using both Euclidean and corrected geographic distances. Input files for this script are a VCF file with all SNPs (InGp_QC_IndMiss_DP_LocMiss.recode.vcf, located in the Data folder) and a population file (pops.txt).

The script was run under R version 4.2.2 (2022-10-31). Package versions and dependencies are listed in IBD_Master.R, along with links to documentation for the required R packages.
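
At its core, an isolation-by-distance test asks whether pairwise genetic distances increase with pairwise geographic distances, commonly assessed with a permutation-based matrix correlation (a Mantel test). The Python sketch below illustrates that general idea only. It is not a translation of IBD_Master.R, which remains the reference for the analyses actually performed, and the function names and toy distance matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix, order):
    """Return off-diagonal upper-triangle values with rows/columns reordered."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mantel(genetic, geographic, permutations=999, seed=1):
    """One-sided Mantel test: observed r and permutation p-value."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)


# Toy symmetric matrices for four hypothetical populations.
geographic = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
genetic = [
    [0.00, 0.10, 0.25, 0.30],
    [0.10, 0.00, 0.12, 0.20],
    [0.25, 0.12, 0.00, 0.09],
    [0.30, 0.20, 0.09, 0.00],
]
r, p = mantel(genetic, geographic)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

The same function could be called once with a Euclidean distance matrix and once with a "corrected" one to compare how strongly each predicts genetic distance.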

Morpho

Data and R script used for the MDS, PCA and morphological distance analyses. Calisoga_measures.csv is the input file needed for running the script morphoPCA.R. This input file is a subset of Table S3 in the article's Supplementary Material; the subset is indicated with color codes in Table S3 as follows:

- Red - Samples excluded from analysis
- Light Green - Character variables used in analysis
- Yellow - Samples from the Contact Zone

Empty cells in Calisoga_measures.csv denote missing data.

The script runs under R version 4.2.2 (2022-10-31). Package versions and dependencies are listed in morphoPCA.R, along with links to documentation for the required R packages.
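
Because empty cells in Calisoga_measures.csv denote missing data, anything that reads the file outside of morphoPCA.R needs to handle them explicitly. The Python sketch below shows one possible approach, assuming a header row and numeric measurement columns. The column names and values are hypothetical, and dropping incomplete samples is only one option, not necessarily what morphoPCA.R does.

```python
import csv
import io


def read_measurements(handle, id_column):
    """Read a measurements table, mapping empty cells to None."""
    table = {}
    for row in csv.DictReader(handle):
        sample = row.pop(id_column)
        table[sample] = {
            name: float(value) if value and value.strip() else None
            for name, value in row.items()
        }
    return table


def complete_cases(table):
    """Keep only samples with no missing measurements."""
    return {
        sample: values
        for sample, values in table.items()
        if all(v is not None for v in values.values())
    }


# Hypothetical column names and values, for illustration only.
example = io.StringIO(
    "sample,carapace_length,leg_I_length\n"
    "S1,10.2,31.5\n"
    "S2,,29.8\n"
    "S3,9.7,30.1\n"
)
table = read_measurements(example, id_column="sample")
print(sorted(complete_cases(table)))
```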

SNAPPER

Folder with input and output data, as well as configuration files, for replicating the SNAPPER analysis described in the Extended Methods. Briefly, Calisoga_unlinked_noMissingData.phy is the input phylip file with 413 unlinked biallelic SNPs (see Table S2); constrains.txt and species_list.txt are configuration files required by snapp_prep.rb. Casoga_Dated_snapper.xml is the configuration file produced by snapp_prep.rb and needed for the Beast analysis. The Runs_Results directory holds all output files (.log and .trees) for each independent Beast run; Calisoga_snapper_DATED_TREE.tre is the final dated tree discussed in the manuscript.

For a more detailed tutorial on how to estimate divergence times from SNP data, see the divergence_time_estimation_with_snp_data tutorial by Michael Matschiner.

Analyses were run in Beast v2.6.4.

Scripts

Scripts used for UCEs and SNPs processing.

FUSe (Filtering UCE Sequences) - A custom pipeline that we wrote to automate and simplify the handling of UCE sequences for phylogenomics. The pipeline can be used for aligning, trimming, and filtering sequences, and is publicly available on GitHub. For detailed information on usage, installation, and examples, see the README file included in the FUSe directory, which is also temporarily available on GitFront.
- FUSe.py - Python script.
- README.md - Documentation for installation and usage of FUSe.py

snps_calling.sh - Bash script for automating the process of calling SNPs from UCE data. The script creates a reference file from UCE alignments using CIAlign, maps reads to it using BWA, calls variants using bcftools, and finally extracts biallelic SNPs using vcftools. Input files are a folder of alignments, from which the reference is created, and a folder of fastq files, which are mapped to the reference. Output is a VCF file with biallelic SNPs. See Supplemental Methods for details.

mtdna_byCatch.sh - Bash script for extracting mitogenomes from UCE sequences. It uses BWA to map reads to the provided mitochondrial reference and then uses the consensus function of Samtools v1.15 (or above) to obtain a final consensus sequence for the requested reference. This script requires phyluce 1.7 for aligning and processing sequences. Input files are a mitochondrial reference in fasta format and a folder of fastq files. Output is a folder with alignments for every locus in the reference. See Extended Methods for details.
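
The last step of snps_calling.sh keeps only biallelic SNPs. As a rough illustration of what that filter means at the level of individual VCF records, the Python sketch below keeps data lines whose REF allele and single ALT allele are each one base. It is a simplification, not a substitute for vcftools or a description of the script's exact settings, and the function names and example records are invented.

```python
def is_biallelic_snp(vcf_line):
    """True if a VCF data line has a one-base REF and exactly one one-base ALT."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]  # VCF columns 4 (REF) and 5 (ALT)
    bases = {"A", "C", "G", "T"}
    # Multi-allelic sites ("G,T"), indels ("AT") and missing ALT (".") all fail.
    return ref in bases and alt in bases


def filter_biallelic(lines):
    """Yield header lines unchanged and only biallelic SNP records."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line


# Invented records, for illustration only.
records = [
    "##fileformat=VCFv4.2",
    "uce-1\t10\t.\tA\tG\t50\tPASS\t.",
    "uce-1\t25\t.\tA\tG,T\t50\tPASS\t.",
    "uce-2\t7\t.\tAT\tA\t50\tPASS\t.",
]
kept = list(filter_biallelic(records))
print(len(kept), "of", len(records), "lines kept")
```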

Metadata

Extended_Methods - File containing detailed information on the methods, analyses and parameters used for testing our hypotheses of common ancestry, ring-like biogeography, biogeographic timing, population connectivity and terminal overlap.
Samples_Metadata - Sample metadata, including voucher collection data and codes, geographic coordinates, and elevation.