Data from: Genomic footprint of cladogenesis revealed through RADseq and Sanger sequencing demonstrates congruent patterns in the velvet worm Peripatopsis sedgwicki species complex (Onychophora: Peripatopsidae)
Data files
Mar 06, 2024 version files 26.22 GB
Abstract
In the present study, first-generation DNA sequencing (mitochondrial cytochrome c oxidase subunit one, COI) and reduced-representation genomic RADseq data were used to understand the patterns and processes of diversification of the velvet worm Peripatopsis sedgwicki species complex across its distribution range in South Africa. For the RADseq data, three datasets (two primary and one supplementary) were generated, corresponding to 1,259 to 11,468 SNPs, in order to assess the species diversity and phylogeography of the species complex. Tree topologies for the two primary datasets were inferred using maximum likelihood and Bayesian inference methods. Phylogenetic analyses using the COI datasets retrieved four distinct, statistically well-supported clades within the species complex. Five species delimitation methods applied to the COI data (ASAP, bPTP, bGMYC, STACEY, and iBPP) all showed support for the distinction of the Fort Fordyce Nature Reserve specimens. In the main P. sedgwicki species complex, the species delimitation methods revealed a variable number of operational taxonomic units and overestimated the number of putative taxa. Divergence time estimates, coupled with the geographic exclusivity of species and the phylogeographic results, suggest recent cladogenesis during the Plio/Pleistocene. The RADseq data were subjected to a principal components analysis and a discriminant analysis of principal components, under a maximum-likelihood framework. The latter results corroborate the four main clades observed using the COI data; however, applying additional filtering revealed additional diversity. The high overall congruence observed between the RADseq and COI data suggests that first-generation sequence data remain a cheap and effective method for evolutionary studies, although RADseq does provide far greater resolution of contemporary temporo-spatial patterns.
README: Peripatopsis sedgwicki genomic and mitochondrial sequence data
https://doi.org/10.5061/dryad.z08kprrmw
This dataset consists of genomic and mitochondrial sequence data for the Peripatopsis sedgwicki species complex, a forest-dwelling endemic velvet worm distributed from the Tsitsikamma Afrotemperate forests in the southern Cape of South Africa to Gqeberha and Makanda in the Eastern Cape, and into the Baviaanskloof. It comprises two files: genomic restriction site-associated DNA sequencing (RADseq) data (output.fastq.gz) in gzip-compressed Illumina FASTQ format, and mitochondrial cytochrome c oxidase subunit 1 (COI) data (Psedgwicki_COI_newout.nex) in Nexus format. The RADseq dataset was generated using the restriction enzyme SbfI and is based on 93 samples. The COI dataset was constructed from 195 samples, and the sequences are 648 base pairs in length.
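As a quick integrity check after download, the number of reads in the RADseq file can be counted with a few lines of Python. This is a minimal sketch, not part of the published workflow: the function name `count_fastq_reads` is hypothetical, and it assumes the file follows the common four-lines-per-record FASTQ layout (no wrapped sequence lines).

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzip-compressed FASTQ file.

    Assumes the standard layout of exactly four lines per record
    (header, sequence, '+', qualities); wrapped FASTQ is not handled.
    """
    n_lines = 0
    # Stream the file line by line so a multi-gigabyte archive
    # never has to fit in memory.
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(
            f"{path}: {n_lines} lines is not a multiple of 4; "
            "the file may be truncated or use wrapped records"
        )
    return n_lines // 4


# Example call (file name taken from this README):
# print(count_fastq_reads("output.fastq.gz"))
```

Because the archive is large, a full pass over it may take a while; the function only reads, so it is safe to run on the original download.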
Description of the data and file structure
The raw RADseq reads can be cleaned, processed and filtered using Stacks v2.62. Output files include sequence data in variant call format (VCF), for principal components analysis (PCA) and discriminant analysis of principal components (DAPC); PLINK .bed, .bim and .fam data files, for ADMIXTURE (version 1.3.0) analysis; and PHYLIP sequence data, for phylogenetic reconstruction in a program such as BEAST2, as well as for Bayesian Phylogenetics and Phylogeography (BPP) and Bayes factor delimitation (BFD) analyses.
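Analyses such as PCA typically start from a matrix of alternate-allele counts per sample and SNP. The sketch below shows one way to derive such a matrix from the genotype (GT) fields of a VCF file using only the Python standard library. It is an illustration under stated assumptions, not the procedure used in the study: the function name `genotype_dosages` is hypothetical, and any non-reference allele is counted as "alternate", which is a simplification for multi-allelic sites.

```python
def genotype_dosages(vcf_lines):
    """Convert VCF genotype calls into alternate-allele counts.

    Returns (samples, rows): `samples` is the list of sample names from
    the #CHROM header line, and each entry of `rows` holds one value per
    sample for one variant (0, 1 or 2 for diploids; None if missing).
    """
    samples = []
    rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        # Locate the GT sub-field within the FORMAT column.
        gt_index = fields[8].split(":").index("GT")
        row = []
        for entry in fields[9:]:
            # Treat phased (|) and unphased (/) calls the same way.
            alleles = entry.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                row.append(None)  # missing genotype
            else:
                row.append(sum(1 for allele in alleles if allele != "0"))
        rows.append(row)
    return samples, rows
```

A call such as `genotype_dosages(open("populations.snps.vcf"))` would return the sample names and one row per SNP; the file name here is only a placeholder for whatever VCF the filtering step produces.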
The COI mtDNA data can be processed and analysed using standard protocols. In this study, the IQ-Tree web server was used to select the optimal DNA substitution model, with the best-fit model chosen using the Akaike information criterion (AIC), for maximum likelihood (ML) phylogenetic reconstruction. PAUP* was used to compute the uncorrected sequence divergence. For the Bayesian inference (BI) analyses, the AIC was used to select the optimal DNA substitution model in jModelTest2 on XSEDE through CIPRES. The ML tree inference was performed using the IQ-Tree web server. Bayesian analyses were conducted on the CIPRES Science Gateway. Divergence times were estimated in a Bayesian framework using BEAST2. A haplotype network was constructed using TCS (version 1.21). Population genetic structure was estimated in ARLEQUIN (ver. 3.5.1.2). Species delimitation methods applied to this dataset included Assemble Species by Automatic Partitioning (ASAP), a Bayesian implementation of the Poisson tree processes model (bPTP), a Bayesian implementation of the GMYC model using the R package bGMYC, and a multilocus coalescent model using STACEY (ver. 1.2.1).
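Two of the quantities mentioned above have simple definitions that can be written out directly. The following Python sketch illustrates the uncorrected (p) distance between two aligned sequences and the AIC score used to rank substitution models. The function names are hypothetical, and the handling of gaps and ambiguity codes (such sites are skipped) is an assumption that may differ from the defaults in PAUP*, IQ-Tree or jModelTest2.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of differing sites.

    Only sites where both sequences carry an unambiguous base
    (A, C, G or T) are compared; gaps and ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L).

    Lower values indicate a better trade-off between fit and complexity.
    """
    return 2 * n_params - 2 * log_likelihood
```

For example, two 648 bp COI sequences that differ at 13 fully resolved sites would have a p-distance of 13/648, or about 2%. When comparing candidate substitution models fitted to the same alignment, the model with the lowest AIC is preferred.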