Skip to main content
Dryad logo

Data from: The program STRUCTURE does not reliably recover the correct population structure when sampling is uneven: sub-sampling and new estimators alleviate the problem

Citation

Puechmaille, Sébastien J. (2016), Data from: The program STRUCTURE does not reliably recover the correct population structure when sampling is uneven: sub-sampling and new estimators alleviate the problem, Dryad, Dataset, https://doi.org/10.5061/dryad.2d4m9

Abstract

Inferences of population structure and more precisely the identification of genetically homogeneous groups of individuals are essential to the fields of ecology, evolutionary biology, and conservation biology. Such population structure inferences are routinely investigated via the program STRUCTURE implementing a Bayesian algorithm to identify groups of individuals at Hardy-Weinberg and linkage equilibrium. While the method is performing relatively well under various population models with even sampling between subpopulations, the robustness of the method to uneven sample size between subpopulations and/or hierarchical levels of population structure has not yet been tested despite being commonly encountered in empirical datasets. In this study, I used simulated and empirical microsatellite datasets to investigate the impact of uneven sample size between subpopulations and/or hierarchical levels of population structure on the detected population structure. The results demonstrated that uneven sampling often leads to wrong inferences on hierarchical structure and downward biased estimates of the true number of subpopulations. Distinct subpopulations with reduced sampling tended to be merged together, whilst at the same time, individuals from extensively sampled subpopulations were generally split, despite belonging to the same panmictic population. Four new supervised methods to detect the number of clusters were developed and tested as part of this study and were found to outperform the existing methods using both evenly and unevenly sampled datasets. Additionally, a sub-sampling strategy aiming to reduce sampling unevenness between subpopulations is presented and tested. These results altogether demonstrate that when sampling evenness is accounted for, the detection of the correct population structure is greatly improved.

Usage Notes

Location

Asia
Europe
America
Africa
Oceania