Reproductive barriers and genomic hotspots of adaptation during allopatric species divergence: datasets for all phylogenetic reconstructions represented in Fig 2
Data files
Version of Mar 03, 2025 (73.40 MB in total):
- Phylogenetic_reconstruction.zip (73.40 MB)
- README.md (3.52 KB)
Abstract
Theory predicts that in allopatric populations, genomic divergence and reproductive barriers may be driven by random genetic drift, and thereby evolve slowly in large populations. However, local adaptation and divergence under selection may also play important roles, which remain poorly characterised. Here we address three key questions in young allopatric species: (a) How widespread are genomic signatures of adaptive divergence?, (b) What is the functional space along which young sister species show divergence at the genomic level?, and (c) How quickly might prezygotic and postzygotic reproductive barriers evolve? Analysis of 82 re-sequenced genomes of the Oriental Papilio polytes species group revealed surprisingly widespread hotspots of intense selection and selective sweeps at hundreds of genes, and spanning all chromosomes, rather than divergence only in a few genomic islands. These genes are involved in diverse ecologically important adaptive functions such as wing development, colour patterning, courtship behaviour, mimicry, pheromone synthesis and olfaction, and host plant use and digestion of secondary metabolites, that could contribute to local adaptation and subsequent reproductive isolation. Divergence at such functional genes appeared to have evolved in conjunction with reproductive consequences: behavioural and hybridisation experiments revealed strong assortative mate preference (prezygotic barriers) as well as postzygotic barriers to hybridisation in timespans as short as 1.5 my, indicating that speciation was already complete, rather than incipient. Our study thus demonstrates an underappreciated role of intense selection and potential local adaptation in creating genome-wide hotspots of rapid molecular evolution and divergence, during differentiation and speciation in young allopatric species.
https://doi.org/10.5061/dryad.wpzgmsbxx
Description of the data and file structure
Each sub-folder contains the alignment files, partition files and tree files for each phylogeny from Fig. 2. There are five sub-folders: one larger folder for extracted FASTA files of individual genes (2. single_genes), and one for each type of dataset: nuclear markers (thiolase, carbamoyl-phosphate synthetase 2, aspartate transcarbamylase, and dihydroorotase (CAD), catalase (CAT), dopa decarboxylase (DDC), glyceraldehyde 3-phosphate dehydrogenase (GAPDH), isocitrate dehydrogenase (IDH), malate dehydrogenase (MDH), ribosomal protein S2 (RPS2), ribosomal protein S5 (RPS5), hairy cell leukemia protein 1 (HCL), elongation factor-I alpha (EF1-a), and wingless) (3. nuclear), mitochondrial markers (cytochrome c oxidase-I, tRNA leucine, cytochrome c oxidase II, 16S and ND5) (4. mitochondrial), both nuclear and mitochondrial markers (1. 17-genes) and genomic SNPs (5. genomic).
Within each of these, the “alignment” sub-folders contain the individual sequence files and the .fasta alignment files, the “PartitionFinder” sub-folders contain the PartitionFinder2 output files with the recommended partitions, and the “MrBayes” sub-folders contain the log files and final .tre files from the phylogenetic reconstruction runs.
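Because the files in the “alignment” sub-folders are plain FASTA, they can be checked without any dedicated software. The following is a minimal sketch, not part of the deposited dataset, that parses FASTA text and tests whether all sequences have the same length, as expected for an alignment. The function names `read_fasta` and `is_aligned` and the two example records are hypothetical and only serve as illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect and join them later.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def is_aligned(records):
    """Return True if all sequences share one length (i.e. look like an alignment)."""
    return len({len(seq) for seq in records.values()}) <= 1


# Hypothetical two-record example; real input would be read from a .fa/.fas/.fasta file.
example = """>sample_A
ACGT-ACG
TTGA
>sample_B
ACGTTACG
TTGA
"""
records = read_fasta(example)
print(len(records), "sequences; aligned:", is_aligned(records))
```

To apply this to a file from the archive, read its contents with `open(path).read()` and pass the resulting string to `read_fasta`.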
The marker sequences were extracted using whole-genome sequencing data aligned to the Papilio polytes reference genome for six Papilio polytes group species (P. polytes, P. javanus, P. alphenor, P. phestus, P. ambrax, and P. protenor).
Access information
The FASTA alignments were derived from whole-genome sequencing data available in the NCBI SRA database (BioProject accession numbers PRJNA1166847, PRJNA234541 and PRJNA396246).
Usage Notes
- Sequence and alignment files are organized as .fa or .fas files. These are standard FASTA files.
- The partition file in each folder is called “best_scheme.txt” and contains the partition models in standard nexus format, which can be read by most phylogenetic tools, including RAxML and MrBayes.
- MrBayes runs and the intermediate files for each MCMC run are in the standard nexus format as well.
- .tre files can be visualized in tools such as FigTree (open source).
- Genomic SNPs were extracted from the VCF file containing variants for all samples, which was then converted into a FASTA alignment. This folder contains both the VCF file and the intermediate files from thinning the SNPs, which was done to obtain a final FASTA file small enough for reasonable run times in phylogenetic analysis. VCF files can be accessed with several open-source tools such as bcftools and vcftools.
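The SNP thinning and VCF-to-FASTA conversion mentioned in the last note can be illustrated with a short sketch. This is not the authors' pipeline, whose exact tools and settings are not described here. It is one plausible approach, assuming that thinning means keeping SNPs a minimum distance apart on each contig and that heterozygous genotypes are written as IUPAC ambiguity codes. The function name `vcf_to_fasta`, the `min_distance` parameter and the example records are hypothetical.

```python
# IUPAC ambiguity codes for heterozygous genotypes at biallelic SNPs.
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def vcf_to_fasta(vcf_text, min_distance=0):
    """Convert biallelic SNPs in VCF text to {sample: sequence}.

    min_distance (assumed thinning rule): a SNP is kept only if it lies at
    least this many bases after the previously kept SNP on the same contig.
    """
    samples, seqs = [], {}
    last_kept = {}  # contig -> position of the last SNP that was kept
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            seqs = {s: [] for s in samples}
            continue
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Skip indels, multiallelic sites and symbolic alleles.
        if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
            continue
        if chrom in last_kept and pos - last_kept[chrom] < min_distance:
            continue
        last_kept[chrom] = pos
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            if "." in gt:
                base = "N"  # missing genotype
            else:
                alleles = {ref if a == "0" else alt for a in gt}
                base = alleles.pop() if len(alleles) == 1 else IUPAC[frozenset(alleles)]
            seqs[sample].append(base)
    return {s: "".join(bases) for s, bases in seqs.items()}


# Hypothetical four-SNP, two-sample example built with explicit tab separators.
rows = [
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "s1", "s2"],
    ["chr1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "0/1"],
    ["chr1", "150", ".", "C", "T", ".", "PASS", ".", "GT", "1/1", "0/0"],
    ["chr1", "900", ".", "G", "T", ".", "PASS", ".", "GT", "./.", "1/1"],
    ["chr2", "50", ".", "T", "C", ".", "PASS", ".", "GT", "0|1", "1|1"],
]
example_vcf = "##fileformat=VCFv4.2\n" + "\n".join("\t".join(r) for r in rows)

thinned = vcf_to_fasta(example_vcf, min_distance=100)
print("\n".join(f">{name}\n{seq}" for name, seq in thinned.items()))
```

A larger `min_distance` yields a shorter alignment and therefore shorter run times, at the cost of discarding informative sites. For files of realistic size, dedicated tools such as bcftools or vcftools would be the more practical choice.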
Corresponding author information
Name: Krushnamegh Kunte
ORCID: https://orcid.org/0000-0002-3860-6118
Affiliation: National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bengaluru 560065, India
email: krushnamegh@ncbs.res.in
Alternative contact information
Name: Riddhi Deshmukh
ORCID: https://orcid.org/0000-0002-7634-2029
Affiliation: Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
email: riddhi.deshmukh@unil.ch
Phylogenetic reconstruction was performed using sequences of individuals from the Papilio polytes species group.