Data and code from: The interplay of sexual selection and hybridization can drive sexual radiation
Data files
Dec 15, 2025 version files 2.01 GB
-
Kagawa_HybridSexualRadiation_20251213.zip
2.01 GB
-
README.md
26.95 KB
Abstract
Many evolutionary radiations involve diversification of exaggerated sexual displays, but little is known about how they are formed. It has been theorized that there can be multiple stable equilibria in the evolution of sexually selected traits, by which multiple species having distinct sexual displays and mate preferences can stably persist. However, these theories alone do not fully explain the speciation process because the stability of each equilibrium will impede the formation of a new species from an existing species that is already occupying one of the stable equilibria. To explain the evolutionary process that forms species with diverse sexual displays, we here propose that hybridisation between phenotypically similar but genetically divergent lineages can catalyse speciation by sexual selection, via the creation of genotypic and phenotypic variation. Using evolutionary simulations, we show that hybridisation giving rise to variation in mate preference and sexual displays can alter the sexual selection regime, thereby creating opportunities for evolutionary shifts toward previously unoccupied stable equilibria. Thus, hybridisation can facilitate evolutionary exploration of the stable evolutionary equilibria that sexual selection generates, leading to speciation accompanied by the diversification of exaggerated sexual displays.
https://doi.org/10.5061/dryad.np5hqc04n
Description of the data and file structure
Files and variables
File: Kagawa_HybridSexualRadiation_20251213.zip
Description: This data set contains three directories (Fig. 1). The directory Code contains three directories including source codes of programs. First, the directory Java_Simulation contains the programs for the individual-based evolutionary simulations. The subdirectory Java_Simulation/src contains five Java source codes, and the subdirectory Java_Simulation/configs contains five text files for specifying the simulation settings; Java_Simulation/Appendix_configs_Figures contains these files used for the simulations shown in figures of the main paper. Second, the directory Julia_Analysis contains three Julia programs to analyze raw output files of the individual-based simulation. Third, the directory R_Visualization contains a R script for visualizing simulation results.
The remaining two directories contain output files from the simulation. The directory “SimulationFinalResults” contains text files summarizing results from many simulations, which are the main results of the paper. These files were generated by analyzing raw output files of simulations using the Julia programs (Analysis_Equilibria.jl, Analysis_Hybrid_Zone.jl, and Analysis_Meta_Population.jl). The directory RawSimulationOutputExamples contains raw output files of the simulation. These raw simulation outputs are intermediate files to be processed by the Julia program and are not necessary for reproducing the study, but they are useful for checking how the Julia programs work. Due to the limitation for the size of a data set to be uploaded, output files from only a single simulation run are involved in the data set.
Code/software
Instructions for running the Java program for the simulation
Step 1: installation of an external package
The Java programs rely on the Apache Commons RNG 1.5 package (https://commons.apache.org/proper/commons-rng/). Please download binaries of this package (e.g. commons-rng-1.5.zip) from the website and copy all the JAR files in the downloaded file into the following directory: Java_Simulation/rng.
Step 2: compilation of the programs
Prior to running the simulations, all java source codes files (Simulation.java, Model.java, Functions_Simulation.java, Functions_Basic.java, and DataTable.java) have to be compiled with the javac command of the Open Java Development Kit (Open JDK) or Oracle Java Development Kit (Oracle JDK). Since the programs use the Apache Commons RNG package, which should be located in Java_Simulation/rng, paths for all the JAR files in this directory must be indicated using the -classpath option of the javac command; thus, the programs should be compiled with the following command:
*javac -classpath $paths .java
where $paths is the colon-delimited (or semicolon-delimited in Windows) list of paths of all the JAR files in the directory Java_Simulation/rng. On Linux, compilation and execution of the java program can be done with the commands in the Java_Simulation/java_run.sh.
Step 3: simulating evolutionary dynamics with three geographic settings
After compilation, simulations can be run by executing the Simulation class with the java command as follows:
java -classpath $paths Simulation s n
where $paths is the colon-delimited list of paths of all the JAR files in the directory Java_Simulation/rng, and s and n are command line arguments to control the simulation scenario and the number of threads for parallel computation, respectively. The program can conduct simulations with three distinct scenarios, and the simulation scenario can be chosen by using the first command line argument (s).
Setting 1: Simulating evolution in a closed population
With the command line argument “ClosedPopulation”, the program conducts multiple simulations of evolution within single closed populations. In the main paper, this simulation scenario was used to explore stable equilibria of the evolutionary dynamics without hybridization. Parameter condition for the simulations is controlled by two text files: IBM/configs/default_parameters.txt and IBM/configs/parameter_table_Equilibria.txt. The former file controls the default values for all parameters and the latter file is the list of all alternative values of parameters to be explored. The program automatically extracts data from these files.
Setting 2: Simulating evolution in a secondary contact zone
With the command line argument “HybridZone”, the program conducts multiple simulations considering a secondary contact between two genetically distinct lineages that have evolved in allopatry from a common ancestral lineage. This scenario consists of the following three phases: (0) the preparation of the common ancestral lineage (burn-in), (1) the independent evolution of two parental lineages in allopatry, and (2) the secondary contact between the parental lineages forming a hybrid zone. Simulation of each phase of this scenario can be run separately with the following command line arguments if there are output files from the precedent phase: “Phase0”, “HybridZone_Phase1”, and “HybridZone_Phase2”. Values of parameters to be explored are listed in IBM/configs/parameter_table_AH.txt and IBM/configs/parameter_table_BH.txt. The former is the list of alternative values for parameters that are applicable to only the secondary contact phase (e.g. the intensity of immigration from parental populations to the secondary contact zone), and the latter is that for parameters that are applicable to all three phases.
Setting 3: Simulating evolution through fission and fusion of populations
Third, with the command line argument “MetaPop”, the program conducts multiple simulations considering a cycle of temporal isolation and reconnection of four habitable patches. This scenario consists of the following four phases: (0) the preparation of the common ancestral lineage (burn-in), (1) the colonization of the ancestral lineage into the four patches, (2) the independent evolution of four local populations without gene flow between them, (3) reconnection of four patches leading to a secondary contact between local populations. Simulation of each phase of this scenario can be run separately with the following command line arguments if there are output files from the precedent phase: “Phase0”, “MetaPop_Phase1”, “MetaPop_Phase2”, and “MetaPop_Phase3”. Values of parameters to be explored are listed in IBM/configs/parameter_table_MetaPop.txt and IBM/configs/parameter_table_BH.txt. The former is the list of alternative values for parameters that are applicable to only the secondary contact phase (e.g. the migration rate), and the latter is that for parameters that are applicable to all four phases.
Step 4: Running simulations with various parameter conditions
Simulations with varied values of parameters can be run by editing the text files in the directory IBM/configs (default_parameters.txt, parameter_table_Equilibria.txt, parameter_table_AH.txt, parameter_table_BH.txt, and parameter_table_MetaPop.txt). Parameters involve the preference function model (beta, Gaussian, and exponential) as well as the mode of immigration of parental lineages to the secondary contact zone (continuous or one-time). Simulations for each figure can be reproduced by using the input files in folders with names of figures in the directory Appendix_configs_Figures. The correspondence relationship between parameters in the paper and those in the Java source codes and R scripts are summarized in Table 1.
Instructions for running the Julia programs for analyzing results
There are three Julia programs for categorizing evolutionary outcomes in simulations: Analysis_Equilibria.jl, Analysis_Hybrid_Zone.jl, and Analysis_Meta_Population.jl. Each of them is to analyze a set of raw output files from multiple simulations considering one of the three simulation scenarios. First, Analysis_Equilibria.jl analyzes output files from simulations of evolution in single closed populations to generate an output file Equilibria.txt. Second, Analysis_Hybrid_Zone.jl categorizes evolutionary outcomes in simulations considering evolution in a secondary contact zone and generates an output file HybridZone_Outcomes.txt. Third, Analysis_Meta_Population.jl counts the number of species that have evolved in simulations considering a cycle of fission and fusion of populations. The output file is MetaPop_Outcomes.txt. All the three programs analyze all the raw output files of simulations existing in the focal directory, which can be specified using a command line argument (example: julia Analysis_Hybrid_Zone.jl [path-of-the-directory]). In the output file of each program, values of all parameters and the category of evolutionary outcome (or the number of species) are listed for all the simulations. The output files follow the regular format of dataframe and can be visualized using, for example, R. Another program, Functions_Analysis.jl, implements functions to be used in the above three programs. The algorithms for categorizing evolutionary outcomes and counting species are described in the Supplementary Method of the paper.
Instructions for running the R script to visualize evolutionary dynamics
The R script (Plot_Dynamics.R) generates four figures visualizing:
(1) Evolutionary dynamics of distributions of male genetic trait values (d1, d2)
(2) Evolutionary dynamics of distributions of female genetic trait values (a1, a2, b1, b2 for the beta preference function; x1, x2 for the Gaussian preference function; p1, p2 for the exponential preference function)
(3) Evolutionary changes of the correlation coefficient between the level of paternal investment to offspring (fi) and the degree of sexual display exaggeration (Σs|dsi|) in males
(4) Shapes of the mate preference function of females in each population at multiple time points
The script refers to the following raw output files of simulations existing in the focal directory (or its subdirectories): population_x_.txt (x = 0, 1, 2, …). The focal directory is indicated by the absolute path written in the script.
Format of raw output files of the individual-based simulation
The java program runs multiple simulations simultaneously, and the x-th simulation generates a set of the following raw output files:
- substitutions_x_y_.txt: The list of all derived alleles that were existing in the system at the generation y; x is the ID number of the simulation. Each row stores the following five properties of each derived allele: the ID of the allele (“id”), the position of the derived allele in the genome (“position”), the ID of trait that the allele affects (“trait”), ID of the genetic locus that the allele belongs to (“locus”), the phenotypic effect value of the allele (“effect”).
- loci_x.txt: This file stores positions of all genetic loci that can potentially affect phenotypes in the simulation run. All simulations starting from the same common ancestor share the same list of positions of loci.
- individuals_x_y_.txt: This file stores information of genotypes of all individuals that existed in the generation y. Each row stores the information of a haploid genotype as a list of IDs of derived alleles. The first five columns store the following properties of the haploid genome and its carrier (a diploid individual): generation, which should equals y; name of the population where the individual belonged to (“population_id”); sex of the individual (“female”; 0 -> male, 1 -> female); ID of the individual (“id_no”); and the ID of the haploid genome (“chromo_no”; 0 or 1). All the remaining columns list the derived alleles involved in the haploid genome.
- para_x_.txt: This file stores values for all parameters used in the simulation.
- population_x_.txt: This is the main file of raw simulation output, in which properties of all individuals in the system are recorded at a constant interval. Table 2 summarizes the meanings of all the rows of this file.
- population_x_first_last100_.txt: In this file, properties of all individuals in the system are recorded every generation in the following two periods: the first 100 generations and the last 100 generations of the simulation. The format of the file is the same as “population_x_.txt”.
- reproFitLandscape_x_pop=z: This file contains the information of the reproductive fitness landscape in the population z at multiple time points. The first row (“generation”) indicates the time point. The fourth row (“score”) indicates the average level of female preference for the combination of male display values of the second and third rows (“s0” and “s1”). This score equals the probability that a male whose trait values are s0 and s1 is accepted by a randomly sampled female of the population z within the first event of encountering between a female and a male in the mating period of the generation.
Format of the final output file of the Julia program
Each of the three Julia programs (Analysis_Equilibria.jl, Analysis_Hybrid_Zone.jl, and Analysis_Meta_Population.jl) generates a single output file summarizing results from a set of many simulations with systematically varied values of parameters. In all the three output files (Equilibria.txt, HybridZone_Outcomes.txt, and MetaPop_Outcomes.txt), each row shows information of a single simulation, and the first 23 columns show the values of all parameters used in the simulation. The remaining columns show evolutionary outcomes in the simulation. In Equilibria.txt (output file of Analysis_Equilibria.jl), the column “outcome” shows the category of evolutionary outcome. In HybridZone_Outcomes.txt (output file of Analysis_Hybrid_Zone.jl), evolutionary outcome is summarized by four columns: “RI”, “gene_flow_male”, “gene_flow_female”, and “outcome”. The column “RI” is the intensity of reproductive isolation between the hybrid population and two parental populations. The value of “RI” was calculated from the intensities of gene flow caused by immigrant male and female individuals from parental populations to the hybrid population (Supplementary Method), which are shown in columns “gene_flow_male” and “gene_flow_female”, respectively. The column “outcome” shows the category of evolutionary outcome in the simulation. In MetaPop_Outcomes.txt (output file of Analysis_Meta_Population.jl), columns “species_count_first” and “species_count_last” show the number of species at the beginning of the secondary contact period and and the end of the simulation, respectively. Since the algorithm counts only species with exaggerated male sexual displays, “species_count_last” can become zero when none of the four metapopulation patches harbor populations with exaggerated sexual displays.
Table 1. Parameters for the simulation program and their symbols in the main paper
| Definition | Symbol | Name in programs |
|---|---|---|
| The number of loci controlling each trait | L | num_loci |
| Mutation rate/locus/generation (fixed to 10-5) | - | mutation_per_locus |
| Phenotypic effect size of mutations | σm | mutation_jump |
| Population density at which survival rate becomes 0.5 | K | pop_capacity |
| Intensity of immigration in the simulations scenario 1 | m | num_parental_immigration |
| Migration rate in the secondary contact period of the simulation scenario 2 | κ | mig_rate |
| The duration of allopatric evolution of parental lineages (generations) | T0 | generation_load |
| Upper limit number of mating per male individual | Mm | num_mating_male |
| Initial ratio of two parental lineages in simulations with the one-time immigration mode | r0 | IniRatio_L1 |
| Mate search cost | θ | search_cost |
| The level of paternal investment with which the survival fitness of newborn offspring reaches 0.5 | hc | half_satur |
| The upper limit value for the male resource budget | Emax | max_energy |
| Environmental variance of the male resource budget | 𝜎e | var_energy |
| The generation at which simulation is finished | - | max_generation |
| The mode of migration in the simulation scenario 1 (continuous or one-time) | - | migration_mode |
| Intensity of immigration after the density regulation (fixed to 0 except for simulations shown in Fig. S9) | - | adult_mig_intensity |
| The number of male sexual display traits (fixed to 2) | - | dim_mating_trait |
| The mate preference function model (beta, gaussian, or exponential) | - | mating_mode |
| Recombination rate/generation/cM (fixed to 0.01) | - | recombi_rate |
| Female maximum fecundity (fixed to 100) | - | fecundity |
| Length of each genetic locus (fixed to 0.005cM) | - | single_gene_length |
| The number of chromosomes (fixed to 15) | - | num_chromo |
| Length of each chromosome (fixed to 200cM) | - | single_chromo_length |
Table 2. Contents of “population_x_.txt” (the main file of raw simulation outputs)
| Column name | Meaning |
|---|---|
| generation | Generation from the starting of the simulation |
| id | The ID number of the individual |
| mother_id | The ID number of the mother of the individual |
| father_id | The ID number of the father of the individual |
| population_id | The ID of the population that the individual finally belonged to (0, 1 -> parental population, 2 -> hybrid zone) |
| original_pop_id | The ID of the population where the individual had bone |
| sex | The sex of the individual (0 -> male, 1 -> female) |
| trait_0 | Value of the male genetic trait that determine the first sexual display phenotype (d1) |
| trait_1 | Value of the male genetic trait that determine the second sexual display phenotype (d2) |
| trait_2 | Value of the first female genetic trait (a1 of the beta preference function, or x1 of the Gaussian preference function, or p1 of the exponential preference function) |
| trait_3 | Value of the second female genetic trait (a2 of the beta preference function, or x2 of the Gaussian preference function, or p2 of the exponential preference function) |
| trait_4 | Value of the third female genetic trait (b1 of the beta preference function; Not applicable to simulations assuming other mate preference functions) |
| trait_5 | Value of the fourth female genetic trait (b2 of the beta preference function; Not applicable to simulations assuming other mate preference functions) |
| phenotype_0 | Phenotypic value of the first male sexual display (D1) |
| phenotype_1 | Phenotypic value of the second male sexual display (D2) |
| phenotype_2 | Phenotypic value of the first female trait (α1 of the beta preference function, or x1 of the Gaussian preference function, or p1 of the exponential preference function) |
| phenotype_3 | Phenotypic value of the second female trait (α2 of the beta preference function, or x2 of the Gaussian preference function, or p2 of the exponential preference function) |
| phenotype_4 | Phenotypic value of the third female trait (β1 of the beta preference function; Not applicable to simulations assuming other mate preference functions) |
| phenotype_5 | Phenotypic value of the fourth female trait (β2 of the beta preference function; Not applicable to simulations assuming other mate preference functions) |
| num_offsprings | The total number of offspring that the individual produced in its lifetime |
| paternal_investment | Value of the paternal investment (only for males) |
| num_mating | Number of times that the individual mated |
| rejected_encounter | Number of times that the individual rejected mating opportunities before mating (only for females) |
| origin_mate | The ID of the population where the mating partner of the individual bone in |
Figure 1. Hierarchical structure of the data set
Dryad_HybridSexualRadiation_20251025.zip
├── Code
│ ├── Java_Simulation
│ │ ├── java_run.sh
│ │ ├── configs
│ │ │ ├── default_parameters.txt
│ │ │ ├── parameter_table_AH.txt
│ │ │ ├── parameter_table_BH.txt
│ │ │ ├── parameter_table_Equilibria.txt
│ │ │ └── parameter_table_MetaPop.txt
│ │ ├── Appendix_configs_Figures #cont. 13 folders
│ │ ├── rng #cont. 15 files
│ │ └── src
│ │ ├── Data_Table.java
│ │ ├── Functions_Basic.java
│ │ ├── Functions_Simulation.java
│ │ ├── Model.java
│ │ └── Simulation.java
│ ├── Julia
│ │ ├── Analysis_Equilibria.jl
│ │ ├── Analysis_Hybrid_Zone.jl
│ │ ├── Analysis_Meta_Population.jl
│ │ └── Functions_Analysis.jl
│ └── R
│ └── Plot_Dynamics.R
└── Data
├── RawSimulationOutputExamples #cont. 3 folders
└── SimulationFinalResults
├── Equilibria_FigS4a.txt
├── Equilibria_FigS4b.txt
├── Equilibria_FigS4c.txt
├── HybridZone_Outcomes_Fig3a.txt
├── HybridZone_Outcomes_Fig3b.txt
├── HybridZone_Outcomes_Fig3c.txt
├── HybridZone_Outcomes_FigS9.txt
├── HybridZone_Outcomes_FigS11.txt
├── HybridZone_Outcomes_FigS18a.txt
├── HybridZone_Outcomes_FigS18b.txt
├── HybridZone_Outcomes_FigS18c.txt
├── MetaPop_Outcomes_Fig4.txt
├── MetaPop_Outcomes_FigS20.txt
└── MetaPop_Outcomes_FigS21.txt
The main paper provides individual-based simulations for evolution of sexually selected traits in a hybrid population between genetically diverged lineages. The computer program for the evolutionary simulations was developed by the author using Java programming language.
