Data from: Recent range expansion and genomic admixture in a kleptoparasitic spider, Argyrodes lanyuensis: A case of adaptive introgression on isolated small island of the Taiwan-Philippine transition zone?
Data files
Nov 18, 2024 version files 3.64 GB
- AccessionReport.tsv (208 B)
- COI_alignment.fasta (7.92 KB)
- COI_consensus_tree.txt (9.44 KB)
- COI.trees.txt (43.55 MB)
- DIYABC_Lan2_snps_filtered.snp (688.48 KB)
- flatfile.txt (9.78 KB)
- Lan1_snps.vcf (114.05 MB)
- Lan2_snps.vcf (15.91 MB)
- Lan3_snps.vcf (216.05 MB)
- RAD_seq_Coalescent_rep1.trees.txt (674.56 MB)
- RAD_seq_Coalescent_rep2.trees.txt (674.54 MB)
- RAD_seq_Coalescent_rep3.trees.txt (674.44 MB)
- RAD_seq_Yule_rep1.trees.txt (286.62 MB)
- RAD_seq_Yule_rep2.trees.txt (246.01 MB)
- RAD_seq_Yule_rep3.trees.txt (673.58 MB)
- RASP_result.txt (16.16 KB)
- README.md (2.70 KB)
- SNAPP_fossil_lower_limit.trees (5.77 MB)
- SNAPP_fossil_upper_limit.trees (5.76 MB)
- SNAPP_taiwan_age_combined.trees (6.21 MB)
Abstract
Adaptive introgression involves the acquisition of advantageous genetic variants through hybridization, which subsequently are favored by natural selection due to their association with beneficial traits. These post-introgression adaptive alleles inherited from related species may allow the hybrid lineage to adapt to new environmental changes or exploit novel ecological niches. Here, we analyzed speciation patterns of the kleptoparasitic spider Argyrodes lanyuensis through genomic analyses and tested for genetic evidence of adaptive introgression at the Taiwan-Philippines transition zone. Using highly polymorphic SNPs, our study demonstrated that speciation occurred when the Hualien (on Taiwan Island) and Philippine (including the Orchid Island) lineages separated during the early to mid-Pleistocene. The best colonization model suggested by Approximate Bayesian Computation and Random Forests supported an inference of a bottleneck during speciation, an interpretation reinforced by observation of lower FST values and reduced genetic diversity of the Philippine+Orchid Island lineage. We found the highest support for the occurrence of introgression on the youngest island (Green Island) of the Taiwan-Philippines transition zone based on the ABBA-BABA test. Additionally, we have identified a putative adaptive locus under balancing selection on Green Island, suggesting evolution by adaptive introgression in a newly formed niche (or novel geographical context). Our study highlights a possible rare case of introgression at the Taiwan-Philippines transition zone under balancing selection, which could be an evolutionary response to a unique climatological zone lying between the tropical climate of the Philippines and the subtropical climate of Hualien, Taiwan.
COI Dataset:
"COI_alignment.fasta" is a FASTA alignment of the ingroup species, *Argyrodes lanyuensis* (two Hualien samples and two Green Island samples extracted from RAD markers, plus four previously published mt-COI haplotype sequences of Responte et al. (2021) available in GenBank: MN881069.1, MN881070.1, MN881071.1, MN881072.1), and one outgroup species, *Argyrodes rainbowi* (MW549752.1).
"COI.trees.txt" and "COI_consensus_tree.txt" are the phylogenetic trees sampled by BEAST under the coalescent model and the resulting consensus tree. We used the uncorrelated lognormal relaxed clock and the mitochondrial substitution rate estimates in spiders as priors to calibrate divergence times.
"flatfile.txt" and "AccessionReport.tsv" were returned by the NCBI submission system and contain the Argyrodes lanyuensis haplotypes that we collected from Green Island and Hualien for this article.
RAD-seq Dataset:
"Lan1_snps.vcf" and "Lan2_snps.vcf" contain a total of 130 Argyrodes lanyuensis individuals from three separate landmasses of Taiwan and from 14 localities (6 islands) in the Philippines. The RAD-seq assembly exported loci that are shared by at least 50% of the total samples. At this 50% coverage threshold, we kept a total of 23,989 loci in the Lan1 dataset. Lan1 includes all SNPs per locus, which produced 59,080 SNPs. For the Lan2 dataset, we assembled a more conservative matrix by retaining only one unlinked SNP per locus, which produced 9,187 SNPs. The "Lan3_snps.vcf" dataset includes all SNPs for the Argyrodes lanyuensis samples and the outgroup, which produced a total of 69,411 SNPs.
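The difference between an "all SNPs per locus" matrix and a "one SNP per locus" matrix can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the README does not say which SNP was retained per locus, so the sketch simply keeps the first one it sees. It also assumes that the VCF ID column has the form `locus:column:strand`, as in Stacks output; the function names and the toy records are hypothetical.

```python
def thin_one_snp_per_locus(vcf_lines):
    """Keep header lines and the first SNP record seen for each locus.

    Assumption: the ID column (3rd field) looks like "locus:column:strand",
    so the text before the first ":" identifies the RAD locus.
    """
    seen = set()
    kept = []
    for line in vcf_lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            kept.append(line)  # header lines are always retained
            continue
        fields = line.rstrip("\n").split("\t")
        locus = fields[2].split(":")[0]
        if locus not in seen:
            seen.add(locus)
            kept.append(line)
    return kept


def count_records(vcf_lines):
    """Count non-header, non-blank records (one per SNP)."""
    return sum(1 for line in vcf_lines
               if line.strip() and not line.startswith("#"))


# Toy data (invented for illustration): two SNPs on locus 7, one on locus 9.
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t101\t7:12:+\tA\tG\n",
    "chr1\t130\t7:41:+\tC\tT\n",
    "chr2\t55\t9:5:-\tG\tA\n",
]
thinned = thin_one_snp_per_locus(toy_vcf)
print(count_records(toy_vcf), count_records(thinned))
```

Applied to a real file, the same two functions could be fed the lines of an opened VCF to compare its record count before and after thinning.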
"RAD_seq_Coalescent_rep[1-3].trees.txt" and "RAD_seq_Yule_rep[1-3].trees.txt" are the tree files from the three replicate BEAST runs under each tree model (coalescent and Yule process).
"SNAPP_fossil_lower_limit.trees", "SNAPP_fossil_upper_limit.trees" and "SNAPP_taiwan_age_combined.trees" are the trees from the three divergence time calibration schemes tested in SNAPP using the multispecies coalescent model.
"RASP_result.txt" is the standard output from Reconstruct Ancestral State in Phylogenies (RASP).
"DIYABC_Group1_scenarios_headerRF.txt" and "DIYABC_Group2_scenarios_headerRF.txt" describe the two groups of population scenarios tested by Approximate Bayesian Computation (ABC) with the supervised machine learning method Random Forest (DIYABC).
"DIYABC_Lan2_snps_filtered.snp" is the SNP dataset filtered to meet the DIYABC input requirement of at least one genotyped individual per population for each locus. In total, 2,616 SNPs were retained.
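The filtering rule described above (each locus must have at least one genotyped individual in every population) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used to produce the file: the `MISSING` marker, the function names, the population labels and the toy matrix are all hypothetical.

```python
MISSING = None  # hypothetical marker for an individual with no genotype call


def keep_locus(genotypes, populations):
    """Return True if every population has >= 1 genotyped individual.

    genotypes   -- one entry per individual at a single locus
    populations -- population label of each individual, in the same order
    """
    genotyped_pops = {pop for g, pop in zip(genotypes, populations)
                      if g is not MISSING}
    return genotyped_pops == set(populations)


def filter_loci(matrix, populations):
    """Keep only the loci (rows) that pass keep_locus."""
    return [row for row in matrix if keep_locus(row, populations)]


# Toy example (invented): four individuals from two populations, three loci.
populations = ["Hualien", "Hualien", "Green", "Green"]
matrix = [
    [0, MISSING, 1, 2],              # both populations genotyped: kept
    [0, 1, MISSING, MISSING],        # no call in "Green": dropped
    [MISSING, MISSING, MISSING, 1],  # no call in "Hualien": dropped
]
print(len(filter_loci(matrix, populations)))
```

Comparing label sets rather than counting calls keeps the rule independent of sample size per population, which matches the "at least one" wording.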
"CLUMPAK_STRUCTURE_summary.pdf" is the summary of the K clusters produced by the CLUMPAK website using the Delta K approach.
mt-COI dataset:
We extracted the mt-COI gene sequences of two Hualien samples and two Green Island samples from our RAD markers using minimap2 (Li, 2018). We aligned these sequences with four previously published mt-COI haplotype sequences of Responte et al. (2021) available in GenBank (MN881069.1, MN881070.1, MN881071.1, MN881072.1) and one outgroup species, Argyrodes rainbowi (MW549752.1). In total, we aligned eight ingroup mt-COI sequences and one outgroup sequence using MAFFT (Katoh & Standley, 2013). We implemented the coalescent tree model (Kingman, 1982) and the uncorrelated lognormal relaxed clock model (Drummond et al., 2006) as priors in BEAST. We set ucld.mean to 0.0112 site⁻¹, derived from the mitochondrial substitution rate estimates in spiders (Bidegaray-Batista & Arnedo, 2011; Kuntner et al., 2013), with a standard deviation of ucld.stdev = 0.01. We ran three replicate MCMC chains of 5 × 10⁸ generations each, sampling trees every 1 × 10⁴ generations. Burn-in was assessed in Tracer v.1.7.1 (Rambaut et al., 2018), and the first 10 percent of the trees were discarded. The maximum clade credibility (MCC) tree was summarized in TreeAnnotator (Drummond et al., 2012) and visualized using FigTree (Rambaut, 2014).
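As a quick check of the MCMC bookkeeping, the following Python sketch works out how many trees each replicate would contain and how many remain after burn-in. It assumes a chain length of 5 × 10⁸ generations, a sampling interval of 1 × 10⁴ generations and a 10 percent burn-in, and it ignores the initial state that BEAST may also write to the tree file. The function name is hypothetical.

```python
def mcmc_tree_counts(chain_length, sample_every, burnin_percent):
    """Return (sampled, discarded, retained) tree counts for one replicate.

    Integer arithmetic is used throughout to avoid floating-point rounding.
    The tree logged at state 0, if any, is not counted.
    """
    sampled = chain_length // sample_every
    discarded = sampled * burnin_percent // 100
    return sampled, discarded, sampled - discarded


sampled, discarded, retained = mcmc_tree_counts(5 * 10**8, 10**4, 10)
print(sampled, discarded, retained)  # 50000 5000 45000
```

Under these assumptions, each replicate contributes 45,000 post-burn-in trees to the MCC summary.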
RAD-seq dataset:
We used Multiplex Shotgun Genotyping (MSG; Andolfatto et al., 2011), with modifications from our in-house protocol, to prepare the high-throughput sequencing libraries. After size selection using a Pippin Prep with a 1.5% agarose gel cassette, the size-selected fragments (300-450 bp) were amplified using the Phusion High-Fidelity PCR kit (NEB, USA). We then purified the DNA libraries using AMPure XP magnetic beads (Beckman, USA) following the manufacturer’s instructions. The 130 samples were multiplexed and sequenced on an Illumina NovaSeq 6000 at a sequencing facility (Genomics Co. Ltd., Taiwan). We demultiplexed the raw Illumina reads, and cleaned and trimmed the sequences, using the process_radtags program of Stacks version 2.4 (Catchen et al., 2013). Forward and reverse reads merged with PEAR version 0.9.6 (Zhang et al., 2014) were aligned to the A. miniaceus reference genome using the Burrows-Wheeler Aligner (BWA; Li & Durbin, 2009).
