Simulated genotype data from: Effective population size estimation in large marine populations: Considering current challenges and opportunities when simulating large datasets with high-density genomic information
Data files
Aug 20, 2025 version, total file size 134.90 MB
- gen_output_5000_m001_simR_1.zip (14.77 MB)
- gen_output_5000_m001_simR_2.zip (14.84 MB)
- gen_output_5000_m001_simR_3.zip (15.10 MB)
- gen_output_5000_m005_simR_1.zip (14.89 MB)
- gen_output_5000_m005_simR_2.zip (14.95 MB)
- gen_output_5000_m005_simR_3.zip (15.04 MB)
- gen_output_5000_m010_simR_1.zip (15.06 MB)
- gen_output_5000_m010_simR_2.zip (15.15 MB)
- gen_output_5000_m010_simR_3.zip (15.11 MB)
- README.md (2.59 KB)
Abstract
Next-generation sequencing has broadened perspectives regarding the estimation of the effective population size (Ne) by providing high-density genomic information. These technologies have expanded data collection and analytical tools in population genetics, increasing understanding of populations with high abundance, such as marine species with high commercial or conservation priority. Several common methods for estimating Ne are based on allele frequency spectra or linkage disequilibrium between loci. However, their specific constraints make it difficult to apply them to large populations, especially with confounding factors such as migration rates, complex sampling schemes, or non-independence between loci. Computer simulations have long represented invaluable tools to explore the influence of biological or logistical factors on Ne estimation and to assess the robustness of dedicated methods. Here, we outline several Ne estimation methods and their foundational principles, requirements, and likely caveats regarding application to populations of high abundance. Thereafter, we present a simulation framework built upon recent computational genomic tools that combine the possibility to generate biologically realistic datasets with realistic patterns of long-term neutral genetic diversity. This framework aims at reproducing and tracking the main critical features of data derived from a large natural population when running a simulation-based population genetics study, e.g., evaluating the strengths and limitations of various Ne estimation methods. We illustrate this framework by generating genotype datasets with varying sample sizes and locus numbers and analyzing them with three software tools (NeEstimator2, GONE, and GADMA). Detailed and annotated simulation scripts are provided to ensure reproducibility and to support future research on Ne estimation. 
These resources can support method comparisons and validations, particularly for nonspecialists, such as conservation practitioners and students.
Dataset DOI: 10.5061/dryad.6wwpzgn9w
Description of the data and file structure
This repository contains genotype data, in the "genepop" format, obtained from the simulation framework described in the related article "Effective population size estimation in large marine populations: Considering current challenges and opportunities when simulating large datasets with high-density genomic information" published in Evolutionary Applications.
A total of 108 genotypic data subsets (9 folders × 12 data files) were derived from simulations in which the gene flow 'm' between 2 populations was set at 0.01, 0.05, or 0.10. Each setting was simulated 3 times, giving 3 replicates for each value of 'm' and 9 simulations in total.
Each .zip file corresponds to one simulation, i.e., one combination of 'm' and replicate, and contains 12 genepop files that differ in sample size (14, 50, 56, or 140) and number of loci (1000, 10000, or 30000).
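The layout described above can be checked with a few lines of Python. This is only a sketch: the helper `archive_name` is hypothetical, the three-digit code in each archive name is assumed to be 'm' multiplied by 100 (as suggested by the file list), and the 12 files per archive are assumed to be the full cross of the 4 sample sizes and 3 numbers of loci.

```python
from itertools import product

MIGRATION_RATES = [0.01, 0.05, 0.10]   # gene flow 'm' between the 2 populations
REPLICATES = [1, 2, 3]                 # 3 replicates per value of 'm'
SAMPLE_SIZES = [14, 50, 56, 140]
LOCUS_COUNTS = [1000, 10000, 30000]


def archive_name(m, replicate):
    """Build an archive name such as gen_output_5000_m005_simR_2.zip.

    Assumption: the 3-digit code after 'm' is m * 100, zero-padded.
    """
    return f"gen_output_5000_m{round(m * 100):03d}_simR_{replicate}.zip"


# One archive per (m, replicate) combination: 3 x 3 = 9
archives = [archive_name(m, r) for m, r in product(MIGRATION_RATES, REPLICATES)]

# Assumed full cross of sample size and number of loci: 4 x 3 = 12
subsets_per_archive = list(product(SAMPLE_SIZES, LOCUS_COUNTS))

total_subsets = len(archives) * len(subsets_per_archive)
print(len(archives), len(subsets_per_archive), total_subsets)
```

The generated names can be compared against the downloaded files to confirm that no archive is missing.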
Files and variables
File: gen_output_5000_m001_simR_1.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.01 (replicate 1)
File: gen_output_5000_m001_simR_2.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.01 (replicate 2)
File: gen_output_5000_m001_simR_3.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.01 (replicate 3)
File: gen_output_5000_m005_simR_1.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.05 (replicate 1)
File: gen_output_5000_m005_simR_2.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.05 (replicate 2)
File: gen_output_5000_m005_simR_3.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.05 (replicate 3)
File: gen_output_5000_m010_simR_1.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.10 (replicate 1)
File: gen_output_5000_m010_simR_2.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.10 (replicate 2)
File: gen_output_5000_m010_simR_3.zip
Description: contains 12 genepop files derived from a simulation with gene flow 'm' = 0.10 (replicate 3)
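For batch analyses, it may help to recover 'm' and the replicate number directly from each archive name and to list the files inside. The sketch below uses only the Python standard library; the function `describe_archive` is hypothetical, and the decoding of the three-digit code as 'm' × 100 is an assumption based on the file descriptions above. The names of the genepop files inside each archive are not documented here, so the code simply lists whatever the archive contains.

```python
import re
import zipfile
from pathlib import Path

# Pattern taken from the archive names in this dataset
ARCHIVE_PATTERN = re.compile(r"gen_output_5000_m(\d{3})_simR_(\d+)\.zip")


def describe_archive(path):
    """Return (m, replicate, member_names) for one downloaded archive.

    Assumption: the 3-digit code after 'm' encodes m * 100
    (e.g. 'm005' -> 0.05).
    """
    match = ARCHIVE_PATTERN.fullmatch(Path(path).name)
    if match is None:
        raise ValueError(f"Unexpected archive name: {path}")
    m = int(match.group(1)) / 100
    replicate = int(match.group(2))
    with zipfile.ZipFile(path) as zf:
        # Skip directory entries; keep only file members
        members = [name for name in zf.namelist() if not name.endswith("/")]
    return m, replicate, members
```

A typical call would be `m, rep, files = describe_archive("gen_output_5000_m005_simR_2.zip")`, after which `len(files)` is expected to be 12 according to the descriptions above.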
