Arthropod mtDNA paraphyly: a case study of introgressive origin
Data files
Nov 25, 2024 version files 13.30 MB
- a_PCA_uHe_Fst.zip (930.49 KB)
- b_Structure.zip (931.40 KB)
- c_Pi.zip (7.41 MB)
- d_SNAPP.zip (96.33 KB)
- e_Fastsimcoal.zip (3.29 MB)
- f_Centrifuge.zip (632.82 KB)
- README.md (11.62 KB)
Abstract
Mitochondrial paraphyly between arthropod species is not uncommon, and has been speculated to largely be the result of incomplete lineage sorting (ILS) of ancestral variation within the common ancestor of both species, with hybridisation playing only a minor role. However, in the absence of comparable nuclear genetic data, the relative roles of ILS and hybridisation in explaining mitochondrial DNA (mtDNA) paraphyly remain unclear. Hybridisation itself is a multifaceted gateway to paraphyly, which may lead to paraphyly across both the nuclear and mitochondrial genomes, or paraphyly that is largely restricted to the mitochondrial genome. These different outcomes will depend upon the frequency of hybridisation, its demographic context, and the extent to which mtDNA is subject to direct selection, indirect selection, or neutral processes. Here we describe extensive mtDNA paraphyly between two species of iron-clad beetle (Zopheridae) and evaluate competing explanations for its origin. We first test between hypotheses of ILS and hybridisation, revealing strong nuclear genetic differentiation between species, but with the complete replacement of Tarphius simplex mtDNA through the introgression of at least five mtDNA haplotypes from T. canariensis. We then contrast explanations of direct selection, indirect selection, or genetic drift for observed patterns of mtDNA introgression. Our results highlight how introgression can lead to complex patterns of mtDNA paraphyly across arthropod species, while simultaneously revealing the challenges for understanding the selective or neutral drivers that underpin such patterns.
README: Arthropod mtDNA paraphyly: a case study of introgressive origin
https://doi.org/10.5061/dryad.866t1g219
Description of the data and file structure
Specimens of Tarphius canariensis and T. simplex were sampled from 19 sites along the dorsal ridge of the Anaga peninsula (Figure 1), yielding a total of 108 T. canariensis and 81 T. simplex, with individuals of both species sampled together at 16 sites (Table S1). See Supplementary Methods S1 for further details on sampling.
Genomic DNA was extracted from each individual using the Biosprint DNA Blood Kit (Qiagen) on a Thermo KingFisher Flex automated extraction instrument. The barcode region of the mitochondrial DNA (mtDNA) cytochrome c oxidase subunit I (COI) was amplified using the primers Fol-degen-for and Fol-degen-rev (Yu et al., 2012). PCR conditions are described in Table S2. PCR products were purified with the enzymes ExoI and rSAP (New England Biolabs, Ipswich, MA, USA), Sanger sequenced (Macrogen, Madrid, Spain), edited with Geneious Prime 2021.1.1, and aligned using MAFFT (FFT-NS-i method; Katoh & Standley, 2013).
A double-digestion restriction-site associated DNA sequencing (ddRADseq) protocol, as described by Salces-Castellano et al. (2020), was applied. In brief, individual DNA extracts were digested with the restriction enzymes MseI and EcoRI (New England Biolabs), genomic libraries were pooled at equimolar ratios, size selected for fragments between 200-300 base pairs (bp), and then paired-end sequenced (150 bp) on an Illumina NovaSeq6000 (Novogene, Cambridge, UK).
Files and variables
This README file was generated on 2024-11-18 by Víctor Noguerales.
GENERAL INFORMATION
1. Title of Dataset: Arthropod mtDNA paraphyly: a case study of introgressive origin
2. Author Information
Authors and institution for correspondence:
Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), San Cristóbal de La Laguna, Canary Islands, Spain
Víctor Noguerales, email: victor.noguerales@csic.es, https://orcid.org/0000-0003-3185-778X
Brent C. Emerson, email: bemerson@ipna.csic.es, https://orcid.org/0000-0003-4067-9858
3. Date of data collection (single date, range, approximate date): 2016-2021
4. Geographic location of data collection: Tenerife, Canary Islands, Spain
5. Information about funding sources that supported the collection of the data: This work was supported by project CGL2017‐85718‐P financed by MCIN/AEI/10.13039/501100011033 and cofinanced by FEDER, and project PID2020-116788GB-I00 financed by MCIN/AEI/10.13039/501100011033. VN was supported by a Juan de la Cierva-Formación postdoctoral fellowship (FJC2018-035611-I) funded by MCIN/AEI/10.13039/501100011033.
SHARING/ACCESS INFORMATION
1. Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
2. Links to publications that cite or use the data: Noguerales, V. & B.C. Emerson (2024). Arthropod mtDNA paraphyly: a case study of introgressive origin. Journal of Evolutionary Biology. https://doi.org/10.5061/dryad.866t1g219
3. Links to other publicly accessible locations of the data: None
4. Links/relationships to ancillary data sets: None
5. Was data derived from another source? No
A. If yes, list source(s): NA
6. Recommended citation for this dataset: Noguerales, V. & B.C. Emerson (2024). Arthropod mtDNA paraphyly: a case study of introgressive origin. Journal of Evolutionary Biology. Dryad Digital Repository.
DATA & FILE OVERVIEW
1. File List:
1) a_PCA_uHe_Fst.zip
2) b_Structure.zip
3) c_Pi.zip
4) d_SNAPP.zip
5) e_Fastsimcoal.zip
6) f_Centrifuge.zip
2. Relationship between files, if important: None
3. Additional related data collected that was not included in the current data package: None
4. Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated: NA
i. Why was the file updated? NA
ii. When was the file updated? NA
#############################################################################
DATA-SPECIFIC INFORMATION FOR: a_PCA_uHe_Fst.zip
The ZIP folder "a_PCA_uHe_Fst.zip" contains three files in STRUCTURE format (.str) used for running PCA and for calculating the uHe and Fst estimators, as detailed in Material and Methods in the manuscript.
File “PCA_Tarphiussimplex_Tarphiuscanariensis_n189.str” contains the full dataset of 189 individuals of the two species, including 2357 unlinked SNPs. Individuals are coded in rows (2 rows per individual), with each column representing an unlinked SNP. Missing data is coded as “-9”.
File “PCA_uHe_Fst_Tarphiuscanariensis_n108.str” contains the full dataset of 108 individuals of Tarphius canariensis, including 3161 unlinked SNPs. Individuals are coded in rows (2 rows per individual), with each column representing an unlinked SNP. Missing data is coded as “-9”.
File “PCA_uHe_Fst_Tarphiussimplex_n81.str” contains the full dataset of 81 individuals of Tarphius simplex, including 7930 unlinked SNPs. Individuals are coded in rows (2 rows per individual), with each column representing an unlinked SNP. Missing data is coded as “-9”.
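The two-rows-per-individual layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' pipeline: it assumes whitespace-delimited fields, no header row, and the individual ID in the first column (none of which the README states), and the individual names in the example are hypothetical. Only the "-9" missing-data code comes from the file descriptions.

```python
MISSING = -9  # missing-data code given in the file descriptions


def parse_structure(lines, n_meta_cols=1):
    """Parse STRUCTURE-format text with two rows per diploid individual.

    Assumptions (not stated in the README): fields are whitespace-delimited,
    there is no header row, and the first n_meta_cols columns hold the
    individual ID followed by any label columns.
    Returns {individual_id: [(allele_a, allele_b), ...]}, one tuple per SNP.
    """
    rows = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        alleles = [int(x) for x in fields[n_meta_cols:]]
        rows.setdefault(fields[0], []).append(alleles)

    genotypes = {}
    for ind, pair in rows.items():
        if len(pair) != 2 or len(pair[0]) != len(pair[1]):
            raise ValueError(f"{ind}: expected two rows of equal length")
        genotypes[ind] = list(zip(pair[0], pair[1]))
    return genotypes


def missing_fraction(genotype):
    """Fraction of SNPs at which either allele is coded as missing."""
    if not genotype:
        return 0.0
    n_missing = sum(1 for a, b in genotype if MISSING in (a, b))
    return n_missing / len(genotype)


# Hypothetical toy input: two individuals, three SNPs each.
example = """\
ind1 1 3 -9
ind1 1 4 -9
ind2 2 3 1
ind2 2 3 4
"""
genos = parse_structure(example.splitlines())
```

To apply it to one of the files, pass the lines of the opened .str file instead of the toy string, adjusting `n_meta_cols` if the file carries extra label columns.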
#############################################################################
DATA-SPECIFIC INFORMATION FOR: b_Structure.zip
The ZIP folder "b_Structure.zip" contains three files in STRUCTURE format (.str) used for running Structure.
File “PCA_Tarphiussimplex_Tarphiuscanariensis_n189.str” contains the full dataset of 189 individuals of the two species, including 2357 unlinked SNPs. Individuals are coded in rows (2 rows per individual), with each column representing an unlinked SNP. Missing data is coded as “-9”. Second column includes population label for each of the sample individuals.
File “PCA_uHe_Fst_Tarphiuscanariensis_n108.str” contains the full dataset of 108 individuals of Tarphius canariensis, including 3161 unlinked SNPs. Individuals are coded in rows (2 rows per individual), with each column representing an unlinked SNP. Missing data is coded as “-9”. Second column includes population label for each of the sample individuals.
File “PCA_uHe_Fst_Tarphiussimplex_n81.str” contains the full dataset of 81 individuals of Tarphius simplex, including 7930 unlinked SNPs. Individuals are coded in rows (2 rows per individual), with each column representing an unlinked SNP. Missing data is coded as “-9”. Second column includes population label for each of the sample individuals.
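Because these files carry a population label in the second column, a quick sanity check is to count individuals per population before running STRUCTURE. The sketch below assumes whitespace-delimited fields with the individual ID in the first column (an assumption, not stated in the README); the IDs and labels in the example are hypothetical.

```python
from collections import Counter


def individuals_per_population(lines):
    """Count individuals per population label.

    Assumes column 1 is the individual ID and column 2 the population
    label, with two rows per individual that must carry the same label.
    """
    pop_of = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ind, pop = fields[0], fields[1]
        # setdefault stores the label on first sight, then returns the stored one
        if pop_of.setdefault(ind, pop) != pop:
            raise ValueError(f"{ind}: conflicting population labels")
    return Counter(pop_of.values())


# Hypothetical toy input: three individuals in two populations, two SNPs each.
example = """\
ind1 1 2 3
ind1 1 2 4
ind2 1 -9 3
ind2 1 -9 3
ind3 2 2 3
ind3 2 1 3
"""
counts = individuals_per_population(example.splitlines())
```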
#############################################################################
DATA-SPECIFIC INFORMATION FOR: c_Pi.zip
The ZIP folder "c_Pi.zip" contains two DNA sequence files in IPYRAD format (.alleles files) which are used to calculate nucleotide diversity per sampling site using DNASP, as detailed in Material and Methods in the manuscript.
File “Pi_Tarphiuscanariensis_n108.alleles” contains the full dataset of 108 individuals of Tarphius canariensis. Individuals are coded in rows (2 rows per individual), as done for .alleles files derived from IPYRAD. The file contains phased data from the polymorphic and non-polymorphic loci contained in the .alleles file from IPYRAD.
File “Pi_Tarphiussimplex_n81.alleles” contains the full dataset of 81 individuals of Tarphius simplex. Individuals are coded in rows (2 rows per individual), as done for .alleles files derived from IPYRAD. The file contains phased data from the polymorphic and non-polymorphic loci contained in the .alleles file from IPYRAD.
#############################################################################
DATA-SPECIFIC INFORMATION FOR: d_SNAPP.zip
The ZIP folder "d_SNAPP.zip" contains the input files for SNAPP, which include a subset of 41 individuals and a total of 5420 unlinked SNPs. These input files are used to reconstruct phylogenetic relationships among the two main genetic groups of each species in SNAPP.
File "SNAPP_Tarphiussimplex_Tarphiuscanariensis_n41.nex" is a NEXUS file that contains the dataset of 41 individuals of both species, including 5420 unlinked SNPs. This file is used to prepare the following XML file in BEAUti. Individuals are coded in rows (1 row per individual), with each subsequent column representing an unlinked SNP. Missing data is coded as “?”.
File "SNAPP_Tarphiussimplex_Tarphiuscanariensis_n41.xml" contains the input in BEAST format including the subset of 41 individuals, for a total of 5420 unlinked SNPs. This input file is used to reconstruct phylogenetic relationships among the two main genetic groups of each species in SNAPP. Individuals are coded in rows (1 row per individual), with each subsequent column representing an unlinked SNP. Missing data is coded as “?”.
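The one-row-per-individual NEXUS matrix can be inspected before it is loaded into BEAUti, for example to count missing sites per individual. The following is a minimal sketch that assumes a simple, non-interleaved NEXUS file whose data sit between a `matrix` line and a closing `;`. The taxon names and the 0/1/2 genotype coding in the example are hypothetical, since the README does not specify them; only the "?" missing-data code is taken from the description above.

```python
def read_nexus_matrix(text):
    """Return {taxon: sequence} from the MATRIX block of a NEXUS file.

    Assumes a non-interleaved matrix: one row per taxon, the taxon name
    first, then the character states, terminated by a ';'.
    """
    matrix = {}
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_matrix:
            in_matrix = line.lower() == "matrix"
            continue
        done = line.endswith(";")  # the ';' may close the last data row
        line = line.rstrip(";").strip()
        if line:
            name, *chunks = line.split()
            matrix[name] = "".join(chunks)
        if done:
            break
    return matrix


def missing_per_taxon(matrix, missing="?"):
    """Count missing sites in each taxon's sequence."""
    return {name: seq.count(missing) for name, seq in matrix.items()}


# Hypothetical toy file: two taxa, five SNPs.
example = """\
#NEXUS
begin data;
  dimensions ntax=2 nchar=5;
  format datatype=integerdata symbols="012" missing=?;
  matrix
  Tca_01 012?0
  Tsi_01 2?1?0
  ;
end;
"""
snps = read_nexus_matrix(example)
n_missing = missing_per_taxon(snps)
```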
#############################################################################
DATA-SPECIFIC INFORMATION FOR: e_Fastsimcoal.zip
The ZIP folder "e_Fastsimcoal.zip" contains the input files for demographic analyses in FASTSIMCOAL2. For each of the alternative models, two different files are provided. The .tpl files contain the model specification in terms of migration matrices and historical events. The .est files contain the priors and rules for each of the parameters specified in the respective .tpl file. The SFS used for all models is contained in the “Fastsimcoal_Tarphiussimplex_Tarphiuscanariensis_n41_MSFS” file. The VCF used for preparing this SFS is contained in the "Fastsimcoal_Tarphiussimplex_Tarphiuscanariensis_n41.vcf" file.
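Since each model is described as having one .tpl and one .est file, it can be useful to confirm that every model in the archive has both before launching runs. The sketch below pairs files by base name. The model and file names in the example are hypothetical, and the assumption that the two files of a model share a base name is not confirmed by the README.

```python
from pathlib import PurePosixPath

REQUIRED = {".tpl", ".est"}


def pair_model_files(names):
    """Group .tpl/.est file names by base name.

    Returns (complete, incomplete): sorted lists of model names that have
    both files and of those missing one. Files with other extensions are
    ignored. Assumes base names are unique across sub-folders.
    """
    found = {}
    for name in names:
        path = PurePosixPath(name)
        if path.suffix in REQUIRED:
            found.setdefault(path.stem, set()).add(path.suffix)
    complete = sorted(m for m, s in found.items() if s == REQUIRED)
    incomplete = sorted(m for m, s in found.items() if s != REQUIRED)
    return complete, incomplete


# Hypothetical archive listing; with the real archive one could pass
# zipfile.ZipFile("e_Fastsimcoal.zip").namelist() instead.
listing = [
    "models/modelA.tpl",
    "models/modelA.est",
    "models/modelB.tpl",
    "data_MSFS.obs",
]
complete, incomplete = pair_model_files(listing)
```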
#############################################################################
DATA-SPECIFIC INFORMATION FOR: f_Centrifuge.zip
The ZIP folder "f_Centrifuge.zip" contains the raw reads that were assigned to Wolbachia, once these were quality filtered and curated, as detailed in Material and Methods in the manuscript.
The folder "01_rawreads_assignedtoWb_postfiltering" contains files in FASTA format for those individuals with reads assigned to Wolbachia. These reads have been quality filtered and curated, as detailed in Material and Methods in the manuscript.
The folder "02_contigs_verifiedasWb_assembled_curated" contains the aforementioned sequences once they were assembled into contigs and verified as belonging to Wolbachia, as detailed in Material and Methods in the manuscript. Contigs are organized into folders according to whether they are composed of sequences from a single species ("Contigs_only1spp") or of sequences from both Tarphius species ("Contigs_with_TcaTsi"). Within these two folders, contigs are further organized according to whether they are variable or monomorphic.
#############################################################################
Code/software
For handling mtDNA data:
Geneious Prime 2021.1.1
MAFFT (FFT-NS-i method)
FABOX 1.61 (Villesen, 2007)
POPART 1.7
DNASP 5.10.1
For handling ddRADseq data:
IPYRAD 0.9.81
STRUCTURE 2.3.3
‘gl.pcoa’, ‘gl.fst.pop’ and ‘gl.report.heterozygosity’ functions as implemented in the package dartR in R 4.2.2
Packages vegan and geodist in R 4.2.2
DNASP 6.12.03
FASTSIMCOAL 2.5.2.21
EASYSFS 0.0.1
For Wolbachia detection from ddRADseq data:
CENTRIFUGE 1.0.4
FASTQC 0.11.7
TRIMMOMATIC 0.39
CUTADAPT 3.5
SEQTK 1.3
GENEIOUS PRIME 2021.1.1
blast+ 2.15.0