Inferences of population structure, and more precisely the identification of genetically homogeneous groups of individuals, are essential to the fields of ecology, evolutionary biology, and conservation biology. Such inferences are routinely made with the program STRUCTURE, which implements a Bayesian algorithm to identify groups of individuals at Hardy-Weinberg and linkage equilibrium. Although the method performs relatively well under various population models with even sampling between subpopulations, its robustness to uneven sample sizes between subpopulations and/or hierarchical levels of population structure has not yet been tested, even though both are commonly encountered in empirical datasets. In this study, I used simulated and empirical microsatellite datasets to investigate the impact of uneven sample size between subpopulations and/or hierarchical levels of population structure on the detected population structure. The results demonstrated that uneven sampling often leads to incorrect inferences of hierarchical structure and downward-biased estimates of the true number of subpopulations. Distinct subpopulations with reduced sampling tended to be merged together, whilst at the same time, individuals from extensively sampled subpopulations were generally split, despite belonging to the same panmictic population. Four new supervised methods to detect the number of clusters were developed and tested as part of this study and were found to outperform the existing methods using both evenly and unevenly sampled datasets. Additionally, a sub-sampling strategy aiming to reduce sampling unevenness between subpopulations is presented and tested. Taken together, these results demonstrate that when sampling evenness is accounted for, the detection of the correct population structure is greatly improved.
Model 1 datasets (island model), full datasets
This zipped file contains 50 datasets simulated under an island model with 10 subpopulations and 200 individuals per subpopulation. The files, in Fstat format, were created by the EASYPOP program.
M1.zip
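For readers who want to load these files outside Fstat, the following is a minimal sketch of a reader, assuming the usual Fstat layout: a header line with four integers (number of samples, number of loci, highest allele label, digits per allele), then one locus name per line, then one line per individual starting with its subpopulation number. The function name `parse_fstat` is hypothetical, and the layout should be checked against the actual files before relying on it.

```python
from typing import List, Tuple

Individual = Tuple[int, List[str]]  # (subpopulation number, genotype codes)


def parse_fstat(text: str) -> Tuple[List[str], List[Individual]]:
    """Parse Fstat-formatted text into locus names and individuals.

    Assumes the conventional layout (not verified against these archives):
    header line, one locus name per line, then one individual per line.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Header: samples, loci, highest allele label, digits per allele.
    n_samples, n_loci, max_allele, digits = (int(x) for x in lines[0].split()[:4])

    # The next n_loci lines hold the locus names.
    locus_names = lines[1:1 + n_loci]

    individuals: List[Individual] = []
    for ln in lines[1 + n_loci:]:
        fields = ln.split()
        pop = int(fields[0])                # subpopulation label
        genotypes = fields[1:1 + n_loci]    # one code per locus, kept as strings
        individuals.append((pop, genotypes))
    return locus_names, individuals
```

Genotype codes are kept as strings so that leading zeros and the digits-per-allele convention are preserved.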
Model 1 datasets (island model), sampling strategy No. 1
This zipped file contains 50 datasets simulated under an island model, with 10 subpopulations evenly sampled and a total of 620 individuals (sampling strategy no. 1; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M1_SS1.zip
Model 1 datasets (island model), sampling strategy No. 2.1
This zipped file contains 50 datasets simulated under an island model, with 10 subpopulations unevenly sampled and a total of 620 individuals (sampling strategy no. 2; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M1_SS2-1.zip
Model 1 datasets (island model), sampling strategy No. 2.2
This zipped file contains 50 datasets simulated under an island model, with 10 subpopulations sampled (20 individuals sampled in eight subpopulations and 10 in the remaining two), and a total of 180 individuals (cf. 'Sub-Sampling' section in M&M). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M1_SS2-2.zip
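The sub-sampling described for this dataset (20 individuals from eight subpopulations and 10 from the remaining two, 180 in total) can be illustrated with a short sketch. The original sampling was done in R with a custom script that is not reproduced here; the Python function below, including the name `subsample` and the choice of which two subpopulations receive 10 individuals, is an assumption for illustration only.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Individual = Tuple[int, List[str]]  # (subpopulation number, genotype codes)


def subsample(individuals: List[Individual],
              sizes: Dict[int, int],
              seed: int = 0) -> List[Individual]:
    """Draw sizes[pop] individuals without replacement from each subpopulation."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group individuals by their subpopulation label.
    by_pop: Dict[int, List[Individual]] = defaultdict(list)
    for ind in individuals:
        by_pop[ind[0]].append(ind)

    sampled: List[Individual] = []
    for pop, n in sizes.items():
        # random.sample raises ValueError if n exceeds the subpopulation size.
        sampled.extend(rng.sample(by_pop[pop], n))
    return sampled


# Hypothetical target sizes: 20 in subpopulations 1-8, 10 in subpopulations 9-10.
TARGET_SIZES = {pop: (20 if pop <= 8 else 10) for pop in range(1, 11)}
```

Applied to a full dataset of 10 subpopulations with 200 individuals each, `subsample(individuals, TARGET_SIZES)` returns 180 individuals.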
Model 1 datasets (island model), sampling strategy No. 3
This zipped file contains 50 datasets simulated under an island model, with 6 subpopulations evenly sampled (4 subpopulations not sampled) and a total of 150 individuals (sampling strategy no. 3; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M1_SS3.zip
Model 1 datasets (island model), sampling strategy No. 4
This zipped file contains 50 datasets simulated under an island model, with 6 subpopulations unevenly sampled (4 subpopulations not sampled) and a total of 150 individuals (sampling strategy no. 4; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M1_SS4.zip
Model 2 datasets (hierarchical island model), full datasets
This zipped file contains 50 datasets simulated under a hierarchical island model with 10 subpopulations and 200 individuals per subpopulation. The files, in Fstat format, were created by the EASYPOP program.
M2.zip
Model 2 datasets (hierarchical island model), sampling strategy No. 1
This zipped file contains 50 datasets simulated under a hierarchical island model, with 10 subpopulations evenly sampled and a total of 620 individuals (sampling strategy no. 1; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M2_SS1.zip
Model 2 datasets (hierarchical island model), sampling strategy No. 2.1
This zipped file contains 50 datasets simulated under a hierarchical island model, with 10 subpopulations unevenly sampled and a total of 620 individuals (sampling strategy no. 2; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M2_SS2-1.zip
Model 2 datasets (hierarchical island model), sampling strategy No. 2.2
This zipped file contains 50 datasets simulated under a hierarchical island model, with 10 subpopulations sampled (20 individuals sampled in eight subpopulations and 10 in the remaining two), and a total of 180 individuals (cf. 'Sub-Sampling' section in M&M). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M2_SS2-2.zip
Model 2 datasets (hierarchical island model), sampling strategy No. 3
This zipped file contains 50 datasets simulated under a hierarchical island model, with 6 subpopulations evenly sampled (4 subpopulations not sampled) and a total of 150 individuals (sampling strategy no. 3; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M2_SS3.zip
Model 2 datasets (hierarchical island model), sampling strategy No. 4
This zipped file contains 50 datasets simulated under a hierarchical island model, with 6 subpopulations unevenly sampled (4 subpopulations not sampled) and a total of 150 individuals (sampling strategy no. 4; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M2_SS4.zip
Model 3 datasets (hierarchical stepping stone model), full datasets
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model with 10 subpopulations and 200 individuals per subpopulation. The files, in Fstat format, were created by the EASYPOP program.
M3.zip
Model 3 datasets (hierarchical stepping stone model), sampling strategy No. 1
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model, with 10 subpopulations evenly sampled and a total of 620 individuals (sampling strategy no. 1; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M3_SS1.zip
Model 3 datasets (hierarchical stepping stone model), sampling strategy No. 2.1
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model, with 10 subpopulations unevenly sampled and a total of 620 individuals (sampling strategy no. 2; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M3_SS2-1.zip
Model 3 datasets (hierarchical stepping stone model), sampling strategy No. 2.2
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model, with 10 subpopulations sampled (20 individuals sampled in eight subpopulations and 10 in the remaining two), and a total of 180 individuals (cf. 'Sub-Sampling' section in M&M). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M3_SS2-2.zip
Model 3 datasets (hierarchical stepping stone model), sampling strategy No. 3
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model, with 6 subpopulations evenly sampled (4 subpopulations not sampled) and a total of 150 individuals (sampling strategy no. 3; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M3_SS3.zip
Model 3 datasets (hierarchical stepping stone model), sampling strategy No. 4
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model, with 6 subpopulations unevenly sampled (4 subpopulations not sampled) and a total of 150 individuals (sampling strategy no. 4; cf. Table 1). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M3_SS4.zip
Model 4 datasets (island model), full datasets
This zipped file contains 50 datasets simulated under an island model with 10 subpopulations and 200 individuals per subpopulation. The parameters used to simulate Model 4 were identical to those used for Model 1, except that the migration rate was doubled (set to 0.02) (cf. Table S1 for Fst values). The files, in Fstat format, were created by the EASYPOP program.
M4.zip
Model 4 datasets (island model), sampling strategy No. 1
This zipped file contains 50 datasets simulated under an island model, with 10 subpopulations evenly sampled and a total of 620 individuals (sampling strategy no. 1; cf. Table 1). The parameters used to simulate Model 4 were identical to those used for Model 1, except that the migration rate was doubled (set to 0.02) (cf. Table S1 for Fst values). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M4_SS1.zip
Model 5 datasets (hierarchical island model), full datasets
This zipped file contains 50 datasets simulated under a hierarchical island model with 10 subpopulations and 200 individuals per subpopulation. The parameters used to simulate Model 5 were identical to those used for Model 2, except that the migration rates between subpopulations within and between archipelagos were doubled (set to 0.02 & 0.002 respectively) (cf. Table S1 for Fst values). The files, in Fstat format, were created by the EASYPOP program.
M5.zip
Model 5 datasets (hierarchical island model), sampling strategy No. 1
This zipped file contains 50 datasets simulated under a hierarchical island model, with 10 subpopulations evenly sampled and a total of 620 individuals (sampling strategy no. 1; cf. Table 1). The parameters used to simulate Model 5 were identical to those used for Model 2, except that the migration rates between subpopulations within and between archipelagos were doubled (set to 0.02 & 0.002 respectively) (cf. Table S1 for Fst values). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M5_SS1.zip
Model 6 datasets (hierarchical stepping stone model), full datasets
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model with 10 subpopulations and 200 individuals per subpopulation. The parameters used to simulate Model 6 were identical to those used for Model 3, except that the migration rates between subpopulations on one side versus across the contact zone were doubled (set to 0.02 & 0.002 respectively) (cf. Table S1 for Fst values). The files, in Fstat format, were created by the EASYPOP program.
M6.zip
Model 6 datasets (hierarchical stepping stone model), sampling strategy No. 1
This zipped file contains 50 datasets simulated under a hierarchical stepping stone model, with 10 subpopulations evenly sampled and a total of 620 individuals (sampling strategy no. 1; cf. Table 1). The parameters used to simulate Model 6 were identical to those used for Model 3, except that the migration rates between subpopulations on one side versus across the contact zone were doubled (set to 0.02 & 0.002 respectively) (cf. Table S1 for Fst values). The files, in Fstat format, were simulated in the EASYPOP program and sampled in R via a custom written script.
M6_SS1.zip