Data from: Adaptive radiation of the Callicarpa genus in the Bonin Islands revealed through double-digest restriction site–associated DNA sequencing analysis
Data files (version of Aug 26, 2024; 164.78 MB)
Abstract
The Bonin Islands, which comprise the Mukojima, Chichijima, and Hahajima Islands, are known for their isolated and distinctive habitats, hosting a diverse array of endemic flora and fauna. In these islands, adaptive radiation has played a remarkable role in speciation. This is particularly evident in the genus Callicarpa, which is represented by three species: Callicarpa parvifolia and C. glabra, both restricted to the Chichijima Islands, and Callicarpa subpubescens, which is distributed across the entire Bonin Islands. Notably, C. subpubescens exhibits multiple ecotypes that differ in leaf hair density, flowering time, and tree size. In this study, we aimed to investigate species and ecotype diversification patterns, estimate divergence times, and explore cryptic species within Callicarpa in the Bonin Islands using phenotypic and genetic data (double-digest restriction site–associated DNA sequencing). Genetic analysis revealed that C. parvifolia and C. glabra each formed a single, distinct genetic group. Conversely, C. subpubescens consisted of six genetic groups corresponding to different ecotypes and regions, plus a hybrid group resulting from hybridization between two of these genetic groups. Demographic analysis of six species/ecotypes from the Chichijima and Hahajima Islands indicated that all of them except one ecotype diverged simultaneously around 73–77 kya. The star-shaped neighbor-net tree also suggests simultaneous divergence of the species and ecotypes. The species and ecotypes that diverged simultaneously have adapted to dry environments and forest understories, suggesting that aridification may have contributed to this adaptive radiation. Moreover, leaf morphology, flowering time, and genetic analyses suggested the presence of two cryptic species and one hybrid species within C. subpubescens.
README: Data from: Adaptive radiation of the Callicarpa genus in the Bonin Islands revealed through double-digest restriction site–associated DNA sequencing analysis
https://doi.org/10.5061/dryad.05qfttfc1
This study investigated species and ecotype diversification patterns, estimated divergence times, and explored cryptic species within Callicarpa in the Bonin Islands using phenotypic and genetic data. It revealed simultaneous divergence driven by aridification and identified two cryptic species and one hybrid species within C. subpubescens. The dataset comprises SNPs for 96 individuals genotyped by ddRAD-seq.
Description of the data and file structure
The MS Excel file Individual_information.xlsx lists the individual ID (column A), population ID (column B), and species/ecotype (column C) of the 96 individuals used in this study. The species and ecotype abbreviations are as follows:
P: C. parvifolia;
G: C. glabra;
S: ecotype of C. subpubescens in the Chichijima Islands;
SG: glabrescent ecotype of C. subpubescens in the Hahajima Islands;
ST: tall ecotype of C. subpubescens in the Hahajima Islands;
SD: dwarf ecotype of C. subpubescens in the Hahajima Islands;
SH: hybrid ecotype of C. subpubescens in the Hahajima Islands;
STm: ecotype of C. subpubescens similar to ecotype ST in the Mukojima Islands;
Sm: ecotype of C. subpubescens similar to ecotype S in the Mukojima Islands.
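As a convenience, the abbreviations above can be used to tally individuals per species/ecotype. The following is a minimal sketch. It assumes the sheet has first been exported to CSV with a single header row and columns in the order A, B, C. The function names and the example IDs are hypothetical. Codes not defined in this README are reported as such, because they are not listed here (for example, any codes used for the outgroups).

```python
import csv
import io
from collections import Counter

# Species/ecotype abbreviations as defined in this README.
ECOTYPES = {
    "P": "C. parvifolia",
    "G": "C. glabra",
    "S": "C. subpubescens, ecotype of the Chichijima Islands",
    "SG": "C. subpubescens, glabrescent ecotype (Hahajima Islands)",
    "ST": "C. subpubescens, tall ecotype (Hahajima Islands)",
    "SD": "C. subpubescens, dwarf ecotype (Hahajima Islands)",
    "SH": "C. subpubescens, hybrid ecotype (Hahajima Islands)",
    "STm": "C. subpubescens, ST-like ecotype (Mukojima Islands)",
    "Sm": "C. subpubescens, S-like ecotype (Mukojima Islands)",
}


def describe(code):
    """Return the README description of an abbreviation, if any."""
    return ECOTYPES.get(code, "not listed in README")


def tally_ecotypes(csv_text):
    """Count individuals per species/ecotype code (column C).

    Hypothetical helper: assumes a CSV export of Individual_information.xlsx
    with one header row and columns ordered as A, B, C.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    counts = Counter()
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or incomplete rows
        counts[row[2].strip()] += 1
    return counts
```

For example, `tally_ecotypes(open("Individual_information.csv").read())` would return a `Counter` keyed by code, and `describe("SG")` turns a code back into its label. The CSV file name is hypothetical.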
Denovo_ADMIXTURE.vcf is the SNP dataset from the denovo dataset that was used in the ADMIXTURE analysis. The VCF file contains SNP information for each individual at each locus; individual IDs correspond to those in Individual_information.xlsx.
Referenced_SplitsTree_with_SH.vcf and Referenced_SplitsTree_without_SH.vcf are SNP datasets derived from the referenced dataset and used in SplitsTree. The former includes the SH ecotype, whereas the latter excludes it. The VCF files contain SNP information for each individual at each locus; individual IDs correspond to those in Individual_information.xlsx.
Referenced_30_RAxML-ng_without_SH.vcf, Referenced_50_RAxML-ng_without_SH.vcf, and Referenced_80_RAxML-ng_without_SH.vcf are SNP datasets used in the RAxML-ng analysis. Each was derived from the referenced dataset and filtered at a genotyping-rate threshold of 30%, 50%, and 80%, respectively. The VCF files contain SNP information for each individual at each locus; individual IDs correspond to those in Individual_information.xlsx.
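The three RAxML-ng files differ only in the genotyping-rate threshold. The sketch below illustrates that kind of filter. It assumes that "genotyping rate" means the fraction of individuals with a non-missing call at a SNP. The exact filtering conditions are given in Table S1, and the helper names are hypothetical.

```python
def site_genotyping_rate(genotypes):
    """Fraction of individuals with a called genotype at one SNP.

    genotypes: GT strings such as "0/1", "1|1" or "./.". Any GT containing
    "." (including partially missing calls like "0/.") counts as missing.
    """
    called = sum(1 for gt in genotypes if "." not in gt)
    return called / len(genotypes)


def filter_sites(sites, min_rate):
    """Keep SNPs whose genotyping rate is at least min_rate (e.g. 0.3, 0.5, 0.8)."""
    return [site for site in sites if site_genotyping_rate(site["gts"]) >= min_rate]
```

When the same input is filtered this way, the 80% set is a subset of the 50% set, which is in turn a subset of the 30% set. Lower thresholds retain more loci at the cost of more missing data per locus.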
Demography_model_a.vcf, Demography_model_b.vcf, Demography_model_c.vcf, Demography_model_d.vcf, Demography_model_e.vcf, Demography_model_f.vcf, and Demography_model_g.vcf contain the SNP datasets used in the fastsimcoal2 demographic analyses of models a, b, c, d, e, f, and g, respectively. The VCF files contain SNP information for each individual at each locus; individual IDs correspond to those in Individual_information.xlsx.
Usage Notes
Excel can be used to access the .xlsx file. All .vcf files can be viewed using TASSEL or a standard text editor.
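Because VCF files are plain text, they can also be inspected programmatically. The minimal sketch below uses only the Python standard library. It reads the sample IDs from the `#CHROM` header line, where sample columns start at column 10 in the VCF format, and counts the data records. The IDs can then be matched against column A of Individual_information.xlsx. The function name is hypothetical, and the sketch does not handle compressed (.vcf.gz) input.

```python
def read_vcf(lines):
    """Return (sample_ids, number_of_records) from VCF text lines."""
    samples = []
    n_records = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM ... FORMAT), then one column per sample.
            samples = line.split("\t")[9:]
            continue
        if line:
            n_records += 1  # one SNP record per non-empty data line
    return samples, n_records
```

Usage: `with open("Denovo_ADMIXTURE.vcf") as fh: ids, n_snps = read_vcf(fh)`.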
Sharing/access Information
Links to publications that cite or use the data:
Setsuko S, Narita S, Tamaki I, Sugai K, Nagano AJ, Ihara-Udino T, Kato H, Isagi Y (2024) Adaptive radiation of the Callicarpa genus in the Bonin Islands revealed through double-digest restriction site–associated DNA sequencing analysis. Ecology and Evolution
Links to other publicly accessible locations of the data: No
Links/relationships to ancillary data sets: No
Was data derived from another source? No
If yes, list source(s): NA
Recommended citation for this dataset:
Setsuko S, Narita S, Tamaki I, Sugai K, Nagano AJ, Ihara-Udino T, Kato H, Isagi Y (2024) Adaptive radiation of the Callicarpa genus in the Bonin Islands revealed through double-digest restriction site–associated DNA sequencing analysis. Dryad Digital Repository.
Methods
Study species and sampling
Callicarpa parvifolia grows in sunny, dry dwarf scrub on rocky ground in the Chichijima Islands, whereas C. glabra grows in the understory of dry scrub in the Chichijima Islands. In contrast, C. subpubescens is not listed as a threatened species and is widely distributed in the Bonin Islands and in the Volcano Islands, which lie approximately 150 km southwest of the Bonin Islands. Callicarpa subpubescens exhibits different ecotypes, each with a distinct habitat and some with different flowering peaks. For example, the Chichijima Islands ecotype (S) inhabits the forest edge of mesic forests and flowers most in June. The Hahajima Islands have four ecotypes: the glabrescent ecotype (SG), the tall ecotype (ST), the dwarf ecotype (SD), and the hybrid ecotype (SH). Ecotype SG inhabits the understory of mesic forests. Ecotype ST forms the canopy of tall mesic forests. Ecotype SD forms the canopy of dry scrub. Ecotype SH forms the canopy of mesic scrub (cloud forests) or inhabits the forest edge of mesic forests. The Mukojima Islands have two ecotypes (STm and Sm), which are genetically close to ecotype ST in the Hahajima Islands and ecotype S in the Chichijima Islands, respectively.
To cover the species and ecotypes of each island group in the Bonin Islands, leaf samples for DNA extraction were collected from 94 individuals across 14 populations (Table 1; Fig. S1). These samples included two populations (ecotypes STm and Sm) from the Mukojima Islands and six populations from the Chichijima Islands. The Chichijima populations represent three species/ecotypes (P, G, and S) on two islands, Anijima and Chichijima. Six further populations came from the Hahajima Islands and comprised four ecotypes: SG and SD from two islands, Hahajima and Imoutojima, and ST and SH from Hahajima Island only. As outgroups, one individual each of Callicarpa japonica and Callicarpa mollis was collected in Kyoto Prefecture, mainland Japan (Fig. S1). Leaf samples for DNA extraction were dried immediately with silica gel.
ddRAD genotyping and SNP filtering
DNA was extracted using a modified CTAB method (Milligan, 1992). The DNA samples were quantified using a Qubit 2.0 Fluorometer (Invitrogen, MA, USA) and diluted with TE buffer to 12.6 ng/μl. Sequencing libraries were prepared following a modified version of Peterson's ddRAD-seq protocol (Peterson et al., 2012); detailed library preparation methods are given in Appendix 1. The libraries were sequenced on a HiSeq 2000 platform (Illumina, CA, USA) with 51 bp single-end reads at BGI Japan (Kobe, Japan).
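Normalising each sample to 12.6 ng/μl is a standard C1·V1 = C2·V2 dilution. The sketch below illustrates the calculation. The stock concentration and final volume in the example are hypothetical, and only the 12.6 ng/μl target comes from the Methods.

```python
TARGET_NG_PER_UL = 12.6  # working concentration stated in the Methods


def dilution_volumes(stock_ng_per_ul, final_volume_ul, target=TARGET_NG_PER_UL):
    """Return (DNA volume, TE volume) in µl so that the mix is at `target` ng/µl.

    Uses C1 * V1 = C2 * V2, i.e. V1 = C2 * V2 / C1.
    """
    if stock_ng_per_ul < target:
        raise ValueError("stock is already below the target concentration")
    dna_ul = target * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul
```

For instance, a hypothetical 50.4 ng/μl stock brought to 20 μl needs 5 μl of DNA plus 15 μl of TE buffer.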
To ensure appropriate data resolution and accuracy for each analysis, three datasets were created: the denovo, referenced, and demography datasets. SNPs were detected using dDocent (Puritz et al., 2014) and Stacks version 2.60 (Catchen et al., 2013; Catchen et al., 2011). The detection conditions and the number of SNPs used in each analysis are summarized in Table S1. In all datasets, five individuals with low individual-level genotyping rates were excluded from SNP detection. For the referenced and denovo datasets, SNPs were detected using dDocent, following its tutorial. The denovo dataset was created without the C. subpubescens reference genome and without the two outgroup individuals; it is therefore suited to detecting population genetic structure free of reference bias. In contrast, the referenced dataset was created using the reference genome and includes the two outgroup individuals; it provides more accurate SNP calling for phylogenetic analysis. The raw SNPs generated by dDocent were filtered with VCFtools v0.1.14 according to the conditions in Table S1.
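Excluding individuals with low genotyping rates amounts to computing, for each individual, the fraction of SNPs with a called genotype. VCFtools can report the complementary missingness per individual with its `--missing-indv` option. The sketch below shows the same calculation in plain Python. The threshold and helper names are hypothetical, because the actual cut-off is not stated here.

```python
def individual_genotyping_rates(genotype_matrix, sample_ids):
    """Per-individual fraction of SNPs with a non-missing call.

    genotype_matrix: list of SNP rows; each row holds one GT string per
    individual, in the same order as sample_ids. GTs containing "." are missing.
    """
    n_snps = len(genotype_matrix)
    rates = {}
    for j, sample in enumerate(sample_ids):
        called = sum(1 for row in genotype_matrix if "." not in row[j])
        rates[sample] = called / n_snps
    return rates


def low_rate_individuals(rates, min_rate):
    """IDs of individuals whose genotyping rate falls below min_rate."""
    return sorted(sample for sample, rate in rates.items() if rate < min_rate)
```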
For the demography dataset, SNPs were re-extracted from the .bam files created by dDocent for the referenced dataset. First, gstacks from Stacks was used to build a catalog of variable sites (Rochette et al., 2019). Then the populations program from Stacks was used to extract SNPs with the following options: -r 0.8 -p X --min-mac 1 --max-obs-het 0.5 --vcf (where X is the number of species/ecotypes in each dataset). The pairwise two-dimensional minor allele site frequency spectrum (2D-mSFS) was calculated from the .vcf file using the R script 2D-msfs-R (https://github.com/garageit46/2D-msfs-R). Missing data were handled by bootstrapping within the same ecotype. This dataset includes non-variable sites and low-frequency SNPs, making it suitable for inferring evolutionary processes and population history.
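To make the 2D-mSFS concrete, here is a simplified Python sketch of how a folded (minor allele) joint SFS can be tallied for two populations. It is not a port of 2D-msfs-R. It assumes diploid genotypes coded as ALT allele counts (0, 1, 2) with no missing data, which the original workflow instead handled by bootstrapping within ecotypes. It also defines the minor allele over both populations pooled, counting the ALT allele on exact ties. Monomorphic sites fall into cell [0][0].

```python
def joint_minor_sfs(snps, n1, n2):
    """Folded (minor allele) 2D SFS for two diploid populations.

    snps: iterable of (pop1_alt_counts, pop2_alt_counts); each is a list of
    per-individual ALT allele counts (0, 1 or 2) with no missing data.
    Returns a (2*n1 + 1) x (2*n2 + 1) matrix; entry [i][j] counts sites with
    i minor allele copies in population 1 and j in population 2.
    """
    sfs = [[0] * (2 * n2 + 1) for _ in range(2 * n1 + 1)]
    total_copies = 2 * (n1 + n2)
    for pop1, pop2 in snps:
        if len(pop1) != n1 or len(pop2) != n2:
            raise ValueError("sample sizes do not match n1/n2")
        a1, a2 = sum(pop1), sum(pop2)
        # Fold: if ALT is the major allele in the pooled sample, count REF
        # copies instead so that the minor allele is tallied.
        if a1 + a2 > total_copies / 2:
            a1, a2 = 2 * n1 - a1, 2 * n2 - a2
        sfs[a1][a2] += 1
    return sfs
```

Folding avoids needing to know the ancestral allele, which suits a dataset like this one where no outgroup-polarised SFS is described.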