Data from: Genomic insights from natural history collections reveal cryptic speciation in coral-guard crabs (family: Trapeziidae)

Pollard, Kenzie (1); Leiva, Carlos (2); Rouze, Heloise (2); Lemer, Sarah (3)

Research facility: University of Guam

Published Jan 16, 2026 on Dryad. https://doi.org/10.5061/dryad.x0k6djhtr

Data files

Jan 16, 2026 version files: 62.78 GB

populations_allsp.vcf (46.46 MB)
populations_biden-neutral.vcf (13.43 MB)
populations_biden-selection.vcf (1.86 MB)
README.md (3.70 KB)
trapezia_metadata.xlsx (13.22 KB)
trapezia_raw.zip (62.72 GB)

Abstract

Mutualistic relationships such as the one between Trapezia crabs and coral colonies are common in reef organisms and play a crucial role in coral resilience and resistance to climate-induced stressors, yet very little is known about the taxonomic diversity and evolutionary history of the species involved. Although Trapezia crabs are essential actors on coral reefs and are threatened by the ongoing degradation of their habitat, little genetic information is available for them, including the exact number of species and their relationships. To overcome this limitation, we sampled natural history collections, an important and underutilized source of genomic data. Using a novel approach optimized for degraded DNA, we generated high-quality genomic data from a combination of 166 museum tissues and freshly collected samples and recovered a strongly supported phylogeny of the genus Trapezia, clarifying species relationships for a majority of taxa and suggesting the potential division of Trapezia into two genera. We then focused on the most widespread species, T. bidentata, and identified four distinct genetic clusters, suggesting high divergence and cryptic speciation in the Indian Ocean and the Marquesas Islands. Populations of the Central and West Pacific showed signs of admixture across a heterogeneous seascape, attributable to a potentially long pelagic dispersal phase and an expansive gene pool. Our results highlight the need to further explore the genetic diversity within other Trapezia species and other coral-associated organisms, as they are likely to exhibit more complex genetic patterns than previously understood.

Data description

SNP Matrices - created in STACKS v2.60 (Catchen et al. 2011, 2013)

Short reads were demultiplexed, adaptors were removed, and sequences were trimmed to 100 bp using process_shortreads. To optimize STACKS parameters, five random samplings of three individuals each were used to trial M = 1-9, as in Jeffries et al. 2016. Once M was set, all samples were included to trial n = 3-5, allowing only for SNPs present in 80% of samples (Paris et al. 2017). The remaining parameters were kept at default settings (m = 3, r = 0.5, and min-maf = 0.05). This assessment was repeated for both the phylogeny and phylogeography datasets.
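
The 80% presence criterion used during parameter optimization can be illustrated with a minimal Python sketch. This is not the STACKS implementation: the function name `loci_passing_presence`, the toy genotype table, and the use of `None` for a missing genotype are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def loci_passing_presence(
    genotypes: Dict[str, List[Optional[str]]],
    min_fraction: float = 0.8,
) -> List[str]:
    """Return the IDs of loci genotyped in at least `min_fraction` of samples.

    `genotypes` maps a locus ID to one call per sample; None marks a
    missing call (an assumption of this sketch, not a STACKS convention).
    """
    kept = []
    for locus, calls in genotypes.items():
        if not calls:
            continue  # no samples recorded for this locus
        present = sum(call is not None for call in calls)
        if present / len(calls) >= min_fraction:
            kept.append(locus)
    return kept


# Hypothetical toy data: three loci scored in five samples.
toy = {
    "locus_1": ["0/0", "0/1", "1/1", "0/0", "0/1"],  # present in 5 of 5
    "locus_2": ["0/0", None, "0/1", "0/0", "0/0"],   # present in 4 of 5
    "locus_3": [None, None, "0/1", "0/0", "0/0"],    # present in 3 of 5
}
print(loci_passing_presence(toy))
```

Lowering `min_fraction` to 0.5 in this sketch mirrors the r = 0.5 setting reported for the final datasets.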

populations_allsp.vcf:

SNP matrix of all samples included in phylogenetic analyses (species include Trapezia lutea, T. guttata, T. serenei, T. rufopunctata, T. flavopunctata, T. punctimanus, T. speciosa, T. digitalis, T. formosa, T. cymodoce, T. tigrina, and T. bidentata). Loci assembly parameters were set to m=3, M=4, and n=5, and loci present in 50% of individuals (r = 0.5) were retained, with an average coverage of 12X. The minimum minor allele frequency was set to 5%, and putative SNPs under selection (14 loci and 107 SNPs) were identified with BayeScan v.2.1 (Foll and Gaggiotti 2008) using default parameters (5000 iterations, 20 pilot runs) and removed. After filtering, a total of 164 samples represented by 3,386 loci and 11,478 SNPs were retained for this dataset.
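
As a rough illustration of the 5% minor allele frequency threshold, the following Python sketch computes the frequency for one biallelic SNP from VCF-style genotype strings. It is a simplified stand-in, not the calculation performed by STACKS; the function name `minor_allele_frequency` and the example genotypes are hypothetical.

```python
from typing import List, Optional


def minor_allele_frequency(genotypes: List[str]) -> Optional[float]:
    """Minor allele frequency of a biallelic SNP from GT strings.

    Genotypes look like "0/0", "0/1", or "1|1"; "." alleles are treated
    as missing and ignored. Returns None if no allele was called.
    """
    ref = alt = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1
    total = ref + alt
    if total == 0:
        return None
    return min(ref, alt) / total


# Hypothetical SNP: 19 homozygous-reference samples and one heterozygote.
rare = ["0/0"] * 19 + ["0/1"]
maf = minor_allele_frequency(rare)
print(maf, "kept" if maf is not None and maf >= 0.05 else "dropped")
```

In this toy case one alternate allele among 40 called alleles falls below the 5% cutoff, so the SNP would be dropped.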

populations_biden-neutral.vcf:

SNP matrix of all neutral SNPs within samples of Trapezia bidentata. Loci assembly parameters were set to m=3, M=4, and n=5 to retain loci that were present in 50% of individuals (r = 0.5), with an average coverage of 16X. Putative loci under selection (602 loci and 1027 SNPs total) were identified with BayeScan v.2.1 and removed. A single SNP per locus was selected in STACKS using a maximum-likelihood model to minimize the effect of linkage disequilibrium. After filtering, a total of 119 samples represented by 4,484 loci and 4,481 SNPs were retained for this dataset.

populations_biden-selection.vcf:

SNP matrix comprising the putative SNPs under selection that were filtered out of the neutral Trapezia bidentata SNP matrix above: 119 samples represented by 602 loci and 1027 SNPs.
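
The sample, locus, and SNP counts reported for each matrix can be checked against the downloaded files with a short script. The Python sketch below is one possible approach using only the standard library. It assumes that the CHROM column identifies the locus, which should be confirmed against the actual files; the function name `summarize_vcf` and the toy records are hypothetical.

```python
from typing import Iterable, Tuple


def summarize_vcf(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (n_samples, n_loci, n_snps) for an iterable of VCF lines.

    Assumes each distinct CHROM value is one locus (an assumption of
    this sketch) and that every data line is one SNP record.
    """
    n_samples = 0
    loci = set()
    n_snps = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM..FORMAT) precede the sample columns.
            n_samples = max(len(fields) - 9, 0)
            continue
        loci.add(fields[0])
        n_snps += 1
    return n_samples, len(loci), n_snps


# Hypothetical miniature VCF: two samples, three SNPs on two loci.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB",
    "1\t10\t1:10:+\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1",
    "1\t25\t1:25:+\tC\tT\t.\tPASS\t.\tGT\t0/1\t1/1",
    "7\t3\t7:3:+\tG\tA\t.\tPASS\t.\tGT\t0/0\t./.",
]
print(summarize_vcf(toy_vcf))
```

To summarize a real file, an open file handle can be passed in place of the toy list, for example `summarize_vcf(open("populations_biden-selection.vcf"))`.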

Raw Data

trapezia_raw.zip:

The zip file contains raw reads as received from the UC Davis Genome Center after sequencing. The folder contains compressed FASTQ files of raw reads from GRAS-Di sequencing (GRAS-Di®, Enoki & Takeuchi 2019) for 180 samples from 12 Trapezia species (listed above), 360 files in total (R1 and R2 for each sample). The naming scheme for sample files can be found in trapezia_metadata.xlsx.
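
After extraction, it may be useful to confirm that every sample has both read files (180 samples x 2 reads = 360 files). The Python sketch below shows one way to do this. The filename pattern (a `_R1` or `_R2` token followed by `.fastq.gz` or `.fq.gz`) is an assumption and should be adjusted to the naming scheme documented in trapezia_metadata.xlsx; the function name `unpaired_samples` and the example filenames are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Assumed pattern: <sample>_R1 or <sample>_R2, an optional numeric chunk
# such as _001, then .fastq.gz or .fq.gz. Adjust to the real naming scheme.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$"
)


def unpaired_samples(filenames: Iterable[str]) -> List[str]:
    """Return sample names that lack either the R1 or the R2 file."""
    reads: Dict[str, Set[str]] = defaultdict(set)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match:  # files that do not fit the assumed pattern are ignored
            reads[match.group("sample")].add(match.group("read"))
    return sorted(s for s, found in reads.items() if found != {"1", "2"})


# Hypothetical filenames: crabB is missing its R2 file.
example = ["crabA_R1.fastq.gz", "crabA_R2.fastq.gz", "crabB_R1.fastq.gz"]
print(unpaired_samples(example))
```

An empty result together with a total of 360 matching files would be consistent with the description above.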

Metadata

trapezia_metadata.xlsx:

Table including sample name (as found in sequencing data), species identity, and broad locality.

Scripts (Zenodo - Software)

trapezia_scripts.zip

Scripts for the analyses included in the publication, covering data filtering, PCA, phylogeny construction, general population genetic statistics, SNP yields (inc. data used to construct plots: SNP_filtering_analysis.csv), and STRUCTURE (inc. the files needed for the analysis: extraparams, mainparams, structure.exe, seed.txt, and the script command_Trapezia.sh).

Code/Software

Scripts for this work can be found under this submission (trapezia_scripts.zip) and are also available at: https://github.com/kenziepollard/trapezia