Data from: Repeated mitochondrial capture with limited genomic introgression in a lizard group
Data files
Apr 22, 2025 version files 37.66 MB
BPP.zip: 3.59 MB
Exons.zip: 8.80 MB
Genetrees.zip: 1.06 MB
mtDNA.zip: 38.19 KB
Phylos.zip: 19.76 KB
R_data.zip: 24.14 MB
README.md: 5.77 KB
Abstract
Mitochondrial introgression is common among animals and is often first identified through mitonuclear discordance — discrepancies between evolutionary relationships inferred from mitochondrial DNA (mtDNA) and nuclear DNA (nuDNA). Over recent decades, genomic data have also revealed extensive nuclear introgression in many animal groups, with implications for genetic and phenotypic diversity. However, the extent to which mtDNA introgression corresponds to nuDNA introgression varies. Here, we investigated historical and recent introgression in the Gehyra nana-occidentalis clade, a complex group of Australian geckos with documented cases of mitonuclear discordance suggestive of repeated mtDNA introgression. We hypothesised that mitonuclear discordance in this clade reflects mtDNA introgression with substantial nuclear introgression. Despite evidence of repeated mtDNA introgression, however, we found little to no evidence of historical nuDNA introgression using exon capture and genome-wide single nucleotide polymorphism (SNP) data. We also found no evidence of gene flow at modern contact zones and detected only a single early generation hybrid. Unsurprising given these results, we found no evidence of transgressive, intermediate, or more variable morphological phenotypes in taxa with introgressed mtDNA. These findings suggest that hybridisation in this system has, at least in some cases, resulted in repeated mitochondrial introgression with little or no nuclear introgression. This pattern aligns with other studies showing limited nuDNA introgression in taxa with mitonuclear discordance, highlighting a potentially broader trend in animal radiations.
Description of the data and file structure
Six folders are included here; the contents of each are outlined below. The R script is available via Zenodo.
Files and variables
File: mtDNA.zip
Description: This folder contains two ND2 mitochondrial DNA sequence alignments in FASTA format, along with their partition file (nd2_partitions.txt). The two alignments are: 1) nd2-600.fasta, which contains all 600 samples; and 2) nd2-subset-78.fasta, a representative subset of 78 of these 600 samples, used to build a phylogeny small enough to fit in a multi-panel figure.
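As a quick sanity check after unzipping, an alignment can be read and its sample count and alignment length confirmed. The following is a minimal sketch using only the Python standard library; the sample names and sequences are hypothetical stand-ins, not taken from nd2-subset-78.fasta, and the helper `read_fasta` is an illustrative name.

```python
import io

def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence ID to sequence string."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}

# Hypothetical three-sample stand-in for a real alignment file
example = io.StringIO(
    ">sampleA\nATGAC-\nTTA\n"
    ">sampleB\nATGACC\nTTA\n"
    ">sampleC\nATGACG\nTTN\n"
)
alignment = read_fasta(example)
n_samples = len(alignment)
# In an alignment, every sequence should have the same length
is_aligned = len({len(seq) for seq in alignment.values()}) == 1
print(n_samples, is_aligned)
```

To use this on the real data, replace the `io.StringIO` object with `open("nd2-subset-78.fasta")`; the expected sample count would then be 78.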
File: Exons.zip
Description: This folder contains three files. The concat.all.fasta file is a concatenated sequence alignment of 1478 exonic loci, with concat.partitions.txt as the associated partition file. The all-100lc.phy file contains the 1000 longest exonic loci in Phylip format, with loci already partitioned.
File: Phylos.zip
Description: This folder contains four phylogeny files in Newick format. The two mtDNA trees inferred with IQ-TREE are nd2-600.tree and nd2-subset-78.tree, which correspond to the respective alignments mentioned above. The Astral_thresh30.tree file is the ASTRAL-III phylogeny inferred from 1478 gene trees in which branches with bootstrap support less than 30 were collapsed (see below for the respective gene tree file). The bpp-A00-1000lc.tree file is the output of phylogenetic analysis using BPP across all lineages.
File: BPP.zip
Description: There are four subfolders within this folder. Each contains additional subfolders (explained below), each with the three files needed to run the respective BPP analysis: a sequence alignment (1000 exons in Phylip format), an iMap file (matching samples to lineages; txt format), and a control (.ctl) file that specifies the settings for the respective BPP analysis.
1. all-lineages-phylo contains one subfolder to run the A01 analysis (phylogenetic inference) and one to run the A00 analysis (inference of branch lengths [tau] and population sizes [theta]). Each contains the necessary Phylip alignment, iMap file, and control file.
2. MSCi_multi-NM contains three subfolders to run the MSCi analyses testing unidirectional introgression from multiporosa to nanamulti (m-nm-uni), unidirectional introgression from nanamulti to multiporosa (nm-m-uni), and bidirectional introgression between these two (m-nm-bd). Each contains the necessary Phylip alignment, iMap file, and control file.
3. MSCi_multi-OR contains three subfolders to run the MSCi analyses testing unidirectional introgression from multiporosa to occiOR (m-OR-uni), unidirectional introgression from occiOR to multiporosa (OR-m-uni), and bidirectional introgression between these two (m-OR-bd). Each contains the necessary Phylip alignment, iMap file, and control file.
4. MSCi_YI-KL contains three subfolders to run the MSCi analyses testing unidirectional introgression from occiYI to occiKL (YI-KL-uni), unidirectional introgression from occiKL to occiYI (KL-YI-uni), and bidirectional introgression between these two (YI-KL-bd). Each contains the necessary Phylip alignment, iMap file, and control file.
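Before launching a BPP run, it can help to confirm how many samples are assigned to each lineage. The sketch below assumes the iMap files use the usual two-column, whitespace-separated layout (sample ID, then lineage name); that layout is an assumption here, and the sample IDs are hypothetical. Only the lineage names multiporosa and nanamulti come from the descriptions above.

```python
import io
from collections import defaultdict

def read_imap(handle):
    """Group sample IDs by lineage from a two-column, whitespace-separated file."""
    lineage_to_samples = defaultdict(list)
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        sample, lineage = parts[0], parts[1]
        lineage_to_samples[lineage].append(sample)
    return dict(lineage_to_samples)

# Hypothetical sample IDs; assumed layout is "<sample> <lineage>" per line
example = io.StringIO("s01 multiporosa\ns02 multiporosa\ns03 nanamulti\n\n")
imap = read_imap(example)
counts = {lineage: len(samples) for lineage, samples in imap.items()}
print(counts)
```

The per-lineage counts can then be compared against the sample numbers given in the matching control file.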
File: Genetrees.zip
Description: This folder contains two Newick files, each holding 1478 gene trees inferred using IQ-TREE for subsequent analysis via ASTRAL-III. The genetrees-no-collapse.newick file contains all unmodified gene trees. The genetrees-30collapse.newick file contains the same gene trees after branches with bootstrap support values less than 30 were collapsed (this is the file used for the ASTRAL-III analysis).
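To see how much the support threshold of 30 affects each gene tree, the support labels in the uncollapsed file can be tallied. This is a rough sketch that only counts low-support branches and does not collapse them. It assumes one tree per line and that support values appear as internal node labels directly after a closing parenthesis; the example trees and taxon names are hypothetical.

```python
import io
import re

# Matches a numeric label immediately following ")" (an internal node label)
SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)")

def low_support_counts(handle, threshold=30.0):
    """Per tree, return (labelled internal branches, branches below threshold)."""
    results = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        supports = [float(value) for value in SUPPORT.findall(line)]
        n_low = sum(value < threshold for value in supports)
        results.append((len(supports), n_low))
    return results

# Two hypothetical four-taxon gene trees with support labels
example = io.StringIO(
    "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1)12:0.02);\n"
    "((A:0.1,C:0.2)29:0.05,(B:0.1,D:0.1)30:0.02);\n"
)
summary = low_support_counts(example)
print(summary)
```

A branch with support exactly 30 is not counted, matching the "less than 30" rule described above. The number of entries in the result also gives the tree count, which would be 1478 for the real file under the one-tree-per-line assumption.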
File: R_data.zip
Description: This folder contains three files needed to run the R script (MolecEcol_2025.R; see below), which performs SNP-based analyses as well as multivariate analyses of morphological data.
1. nana-group_dart.csv is the full DArT SNP data. This is in the typical format provided by DArT, where columns represent individuals and rows represent variant sites. For variant sites, a 0 value indicates the sample is homozygous for the reference allele, a 1 value represents a heterozygote for that site, and a 2 value indicates the sample is homozygous for the alternate allele. Data in this format can be imported and analysed in R using the package dartR.
2. nana-group-meta-2023.csv is the metadata associated with the SNP data mentioned above. Columns show the sample ID, the lineage to which the sample belongs, and the latitude and longitude of the respective sample.
3. nana-occi_morph.csv is the morphological data. All measurements are in mm. Rows represent individuals and columns represent: "id", the id number of the specimen; "sex", the sex of the specimen if it could be determined; "lineage", the lineage to which the sample belongs; "pop", a shortened version of the lineage name; "group", whether the lineage belongs to the "occi" group or the "nana" group; "SVL", snout-to-vent length; "HL", head length; "HD", head depth; "HW", head width; "SL", snout length; "OW", orbit width; "WBE", width between eyes; "ILL", inter-limb length; "HLL", hindlimb length; "FLL", forelimb length.
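The morphological table can be explored outside R as well. The sketch below groups snout-to-vent length (SVL) by lineage and reports the sample variance for each, as a descriptive look at body size variability. It is not the Fligner-Killeen test run by the R script. The column names follow the description above, but the rows and lineage names are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical rows using a subset of the documented columns (measurements in mm)
example = io.StringIO(
    "id,sex,lineage,pop,group,SVL\n"
    "1,F,lineageA,A,nana,40.0\n"
    "2,M,lineageA,A,nana,42.0\n"
    "3,F,lineageB,B,occi,50.0\n"
    "4,M,lineageB,B,occi,56.0\n"
)

svl_by_lineage = defaultdict(list)
for row in csv.DictReader(example):
    if row["SVL"]:  # skip specimens with a missing SVL value
        svl_by_lineage[row["lineage"]].append(float(row["SVL"]))

# Sample variance needs at least two specimens per lineage
svl_variance = {
    lineage: statistics.variance(values)
    for lineage, values in svl_by_lineage.items()
    if len(values) > 1
}
print(svl_variance)
```

For the real data, replace the `io.StringIO` object with `open("nana-occi_morph.csv", newline="")`.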
Code
One R script is included with the Zenodo files:
MolecEcol_2025.R: This script imports the SNP data and morphological data to perform SNP filtering, Principal Coordinates Analysis, NewHybrids analysis, isolation-by-distance analysis at contact zones, sNMF analysis, a MANOVA to test for sexual dimorphism, a MANOVA to test for morphological divergence among lineages, Principal Components Analysis of morphological data, and a Fligner-Killeen test to assess whether body size variance differs among lineages.
