Dryad

Data from: The computer program structure for assigning individuals to populations: easy to use but easier to misuse

Cite this dataset

Wang, Jinliang (2016). Data from: The computer program structure for assigning individuals to populations: easy to use but easier to misuse [Dataset]. Dryad. https://doi.org/10.5061/dryad.f8n5j

Abstract

The computer program Structure implements a Bayesian method, based on a population genetics model, to assign individuals to their source populations using genetic marker data. It is widely applied in ecology, evolutionary biology, human genetics and conservation biology for detecting hidden genetic structure, inferring the most likely number of populations (K), assigning individuals to source populations and estimating admixture and migration rates. Several recent simulation studies concluded that the program yields erroneous inferences when samples from different populations are highly unbalanced in size. Analysing both simulated and empirical data sets, this study confirms that Structure indeed yields poor individual assignments to source populations and frequently gives incorrect estimates of K when sampling is unbalanced. However, this poor performance is mainly caused by the adoption of the default ancestry prior, which assumes that all source populations contribute equally to the pooled sample of individuals. When the alternative ancestry prior is adopted, which allows the source populations to be unequally represented in the sample, accurate individual assignments can be obtained even if sampling is highly unbalanced. The alternative prior also improves the inference of K by two estimators, although the improvement is smaller than that for individual assignments. For the difficult case of many populations and unbalanced sampling, a rarely used combination of parameters is required for Structure to yield accurate inferences: the alternative ancestry prior, an initial ALPHA value much smaller than the default, and the uncorrelated allele frequency model. I conclude that Structure is easy to use but easier to misuse, because of its complicated genetic model and its many parameter (prior) options, the right choice of which may not be obvious. I suggest using multiple plausible models (parameters) and K estimators when conducting comparative and exploratory Structure analyses.
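
As a rough illustration of the parameter combination the abstract describes, the Python sketch below generates the matching override lines for a Structure extraparams file. It rests on assumptions that the abstract does not state: that the alternative ancestry prior corresponds to POPALPHAS 1, that the uncorrelated allele frequency model corresponds to FREQSCORR 0, and that 1/K is a reasonable stand-in for "much smaller than the default" ALPHA of 1.0. The function names are hypothetical and not part of Structure or of this dataset.

```python
def extraparams_overrides(k: int) -> dict[str, str]:
    """Return illustrative Structure extraparams overrides for K clusters.

    Assumptions (not stated in the abstract): POPALPHAS 1 selects the
    alternative ancestry prior, FREQSCORR 0 selects the uncorrelated
    allele frequency model, and 1/K stands in for an initial ALPHA
    "much smaller than the default" of 1.0.
    """
    if k < 1:
        raise ValueError("K must be a positive integer")
    return {
        "POPALPHAS": "1",            # separate ALPHA per population (assumed mapping)
        "ALPHA": f"{1.0 / k:.4f}",   # illustrative small initial value, 1/K
        "FREQSCORR": "0",            # uncorrelated allele frequencies (assumed mapping)
    }


def render_extraparams(overrides: dict[str, str]) -> str:
    """Format overrides as '#define NAME value' lines, one per parameter."""
    return "\n".join(f"#define {name} {value}" for name, value in overrides.items())


if __name__ == "__main__":
    print(render_extraparams(extraparams_overrides(4)))
```

In line with the abstract's recommendation, such settings would be run alongside the default ones and the results compared, rather than treated as a single correct configuration.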

Usage notes