Data from: Phylogenomics of Amazonian squirrel monkeys (Saimiri: Primates, Cebidae)
Data files
Nov 12, 2025 version, files total 78.46 MB
- Eastern.str (4.10 MB)
- macrodon.str (4.36 MB)
- Main_RAXML.nex (33.40 MB)
- README.md (3.29 KB)
- Roman.str (236.63 KB)
- strMain_cleaned.str (26.96 MB)
- TimeTree.nex (4.85 MB)
- U_SNP_75_Coverage_50Samps.nex (2.14 MB)
- U_SNP_75_Coverage_56Samps.nex (1.97 MB)
- ustus.str (441.68 KB)
Abstract
Phylogenetic relationships among squirrel monkeys (genus Saimiri) are still poorly resolved. Here, we gathered the first phylogenomic dataset for Saimiri using double-digest restriction-site associated DNA sequencing (ddRADseq) to construct a phylogeny for squirrel monkeys in Amazonia. All the phylogenomic analyses strongly support the division of the genus into two main clades, corresponding to the Gothic and Roman arch groups based on morphology, and they also strongly support five major lineages. Structure analyses showed evidence for population clusters based on geography within these lineages, but also gene flow/hybridization across clusters. Our time-calibrated tree confirmed that the diversification of Amazonian Saimiri occurred during the Pleistocene.
Dataset DOI: 10.5061/dryad.djh9w0wc0
Files and variables
File: Eastern.str
Description: For all STRUCTURE files, data were converted from .fasta into STRUCTURE (.str) format, where the first column gives the sample name, the second gives the population, and the remaining columns give genotypes in standard STRUCTURE notation. Thus, the number of columns minus 2 gives the number of sites represented.
Subset of data from the Main dataset, filtered to only include individuals identified morphologically as part of the Eastern clade (n = 25; 70,099 sites).
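The column layout described above can be checked with a short script. The following is a minimal sketch in Python, assuming the rows are whitespace-delimited and that there is no header row (neither is stated here); the function name `summarize_str` and the demo rows are hypothetical and not part of the dataset.

```python
from typing import Iterable, List, Tuple


def summarize_str(lines: Iterable[str]) -> Tuple[List[str], List[str], int]:
    """Return sample names, population labels, and the site count.

    Assumes whitespace-delimited rows with no header row:
    column 1 = sample name, column 2 = population, remaining columns = genotypes.
    """
    samples: List[str] = []
    pops: List[str] = []
    n_sites = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete rows
        samples.append(fields[0])
        pops.append(fields[1])
        sites = len(fields) - 2  # number of columns minus 2
        if n_sites is None:
            n_sites = sites
        elif sites != n_sites:
            raise ValueError(f"row for {fields[0]} has {sites} sites, expected {n_sites}")
    return samples, pops, n_sites or 0


# Made-up rows for illustration only (not taken from the dataset).
demo = ["indA 1 1 2 -9 4", "indB 2 3 3 1 2"]
names, populations, site_count = summarize_str(demo)
print(names, populations, site_count)
```

To apply it to a file, pass an open file handle, for example `summarize_str(open("Roman.str"))`. If a file stores more than one row per individual, sample names will repeat in the returned list.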
File: Roman.str
Description: Subset of data from the Main dataset, filtered to only include individuals identified morphologically as part of the Roman clade (n = 5; 20,512 sites).
File: macrodon.str
Description: Subset of data from the Main dataset, filtered to only include individuals identified morphologically as part of the macrodon clade (n = 15; 113,220 sites).
File: Main_RAXML.nex
Description: NEXUS alignment file of all SNPs for the 56 sequences used in the main ML-based analysis (the basis of Figure 2). A site was retained as a SNP if it varied across samples and that variation was not caused by N or ? (missing data). IUPAC ambiguity characters do count as variation. Used for RAxML analysis. N = 56; 518,558 sites.
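The SNP rule in this description can be illustrated with a small example. This is a sketch of the stated rule only, not the pipeline used to build the file; the function names are hypothetical, and how gap characters were handled is not stated, so gaps are treated as ordinary states here.

```python
from typing import Iterable, List

MISSING = {"N", "?"}  # missing-data symbols named in the description


def is_variable_site(column: Iterable[str]) -> bool:
    """True if a column has more than one state once N and ? are ignored.

    IUPAC ambiguity characters (R, Y, S, ...) count as states of their own.
    """
    states = {c.upper() for c in column} - MISSING
    return len(states) > 1


def variable_sites(seqs: List[str]) -> List[int]:
    """Return 0-based indices of variable columns in equal-length sequences."""
    return [i for i, col in enumerate(zip(*seqs)) if is_variable_site(col)]


# Made-up sequences for illustration only.
print(variable_sites(["AAC", "ANC", "ART"]))
```

In the made-up input, the first column is invariant, the second varies only because of the ambiguity character R (the N is ignored), and the third varies between C and T.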
File: TimeTree.nex
Description: NEXUS alignment file of SNPs for select Saimiri samples, as well as a Sapajus sample as outgroup. This dataset was filtered to only include sites at which at least 85% of samples were represented. Used for BEAST2 analysis. N = 13 + Outgroup; 297,748 sites.
File: ustus.str
Description: Subset of data from the Main dataset, filtered to only include individuals identified morphologically as part of the ustus clade (n = 6; 32,138 sites).
File: strMain_cleaned.str
Description: strMain_cleaned.str contains all SNPs from Main_RAXML.nex, filtered to exclude any ambiguous sites (IUPAC ambiguity symbols).
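As an illustration of this kind of filter, the sketch below removes every alignment column that contains an IUPAC ambiguity symbol. It is not the script used to produce the file. The function name is hypothetical, and it assumes that N and ? (missing data) are not treated as ambiguity symbols, which the description does not specify.

```python
from typing import List

# IUPAC two- and three-base ambiguity codes. N is deliberately left out here
# (assumption: missing data is handled separately from ambiguity).
AMBIGUITY = set("RYSWKMBDHV")


def drop_ambiguous_sites(seqs: List[str]) -> List[str]:
    """Remove columns in which any sequence carries an ambiguity symbol."""
    keep = [
        i
        for i, col in enumerate(zip(*seqs))
        if not any(c.upper() in AMBIGUITY for c in col)
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


# Made-up sequences for illustration only.
print(drop_ambiguous_sites(["ACRT", "ACGT", "ANGT"]))
```

In the made-up input, only the third column contains an ambiguity symbol (R), so it is the only column removed.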
File: U_SNP_75_Coverage_56Samps.nex
Description: The U SNP file includes only the "best" (highest-coverage) SNP per read, making it a more representative dataset for a given genome than Main_RAXML.nex, which includes every SNP even when more than one falls on the same read. This alignment was filtered from a larger U SNP file to include only sites with at least 75% coverage across samples.
File: U_SNP_75_Coverage_50Samps.nex
Description: This alignment first removed the 6 samples with poorer coverage from the previous dataset (U_SNP_75_Coverage_56Samps.nex), and then retained only sites with at least 75% coverage across the remaining samples.
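The order of operations matters here, because removing low-coverage samples changes the per-site coverage of the samples that remain. The sketch below shows the idea under stated assumptions: a sample "covers" a site when its character is not N or ?, the function and sample names are hypothetical, and the code is not the authors' pipeline.

```python
from typing import Dict, Iterable

MISSING = {"N", "?"}  # assumption: these symbols mark an uncovered site


def filter_by_coverage(
    alignment: Dict[str, str],
    min_coverage: float = 0.75,
    drop: Iterable[str] = (),
) -> Dict[str, str]:
    """Drop the named samples first, then keep sites meeting the coverage threshold."""
    dropped = set(drop)
    kept = {name: seq for name, seq in alignment.items() if name not in dropped}
    seqs = list(kept.values())
    n = len(seqs)
    keep_cols = [
        i
        for i, col in enumerate(zip(*seqs))
        if sum(c.upper() not in MISSING for c in col) / n >= min_coverage
    ]
    return {name: "".join(seq[i] for i in keep_cols) for name, seq in kept.items()}


# Made-up alignment for illustration only; "d" plays the poorly covered sample.
demo = {"a": "AC", "b": "AC", "c": "AC", "e": "NC", "d": "NN"}
print(filter_by_coverage(demo, 0.75, drop=["d"]))  # both sites pass (3/4 and 4/4)
print(filter_by_coverage(demo, 0.75))              # first site fails (3/5)
```

With the poorly covered sample removed first, the first site reaches 75% coverage and is kept; with all five samples present, the same site falls to 60% and is removed.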
Code/software
UGENE (ugene.net) is a free bioinformatics tool that one may use to view NEXUS files (ending in .nex), available for all major operating systems.
STRUCTURE files (ending in .str) may be viewed with a standard table viewer (such as R, using the "read.table" command).
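For a quick check of a NEXUS file against the sample and site counts listed above, without opening a viewer, the header can be read directly. This is a minimal sketch in Python that assumes each file declares its size in a standard `DIMENSIONS NTAX=... NCHAR=...;` statement, which has not been verified for these particular files; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple


def nexus_dimensions(text: str) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Return (ntax, nchar) from the first DIMENSIONS statement, or None if absent."""
    match = re.search(r"dimensions\b([^;]*);", text, flags=re.IGNORECASE)
    if match is None:
        return None
    body = match.group(1)
    ntax = re.search(r"ntax\s*=\s*(\d+)", body, flags=re.IGNORECASE)
    nchar = re.search(r"nchar\s*=\s*(\d+)", body, flags=re.IGNORECASE)
    return (
        int(ntax.group(1)) if ntax else None,
        int(nchar.group(1)) if nchar else None,
    )


# Made-up header for illustration only.
header = "#NEXUS\nBEGIN DATA;\n  DIMENSIONS NTAX=3 NCHAR=8;\n  FORMAT DATATYPE=DNA MISSING=?;\n"
print(nexus_dimensions(header))
```

Because the alignment files are large, reading only the start of a file is usually enough, for example `nexus_dimensions(open("TimeTree.nex").read(4096))`, assuming the statement appears within the first few kilobytes.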
Access information
Other publicly accessible locations of the data:
- None
Data was derived from the following sources:
- This does not apply to our datasets.
