# Data from: Effects of sampling close relatives on some elementary population genetics analyses

## Citation

Wang, Jinliang (2017), Data from: Effects of sampling close relatives on some elementary population genetics analyses, Dryad, Dataset, https://doi.org/10.5061/dryad.4pv20

## Abstract

Many molecular ecology analyses assume the genotyped individuals are sampled at random from a population and thus are representative of the population. Realistically, however, a sample may contain excessive close relatives (ECR) because, for example, localized juveniles are drawn from fecund species. Our knowledge is limited about how ECR affect the routinely conducted elementary genetics analyses, and how ECR are best dealt with to yield unbiased and accurate parameter estimates. This study quantifies the effects of ECR on some popular population genetics analyses of marker data, including the estimation of allele frequencies, F-statistics, expected heterozygosity (He), effective and observed numbers of alleles, and the tests of Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE). It also investigates several strategies for handling ECR to mitigate their impact and to yield accurate parameter estimates. My analytical work, assisted by simulations, shows that ECR have large and global effects on all of the above marker analyses. The naïve approach of simply ignoring ECR could yield low-precision and often biased parameter estimates, and could cause too many false rejections of HWE and LE. The bold approach, which simply identifies and removes ECR, and the cautious approach, which estimates target parameters (e.g. He) by accounting for ECR and using naïve allele frequency estimates, eliminate the bias and the false HWE and LE rejections, but could reduce estimation precision substantially. The likelihood approach, which accounts for ECR in estimating allele frequencies and thus target parameters relying on allele frequencies, usually yields unbiased and the most accurate parameter estimates. Which of the four approaches is the most effective and efficient may depend on the particular marker analysis to be conducted. The results are discussed in the context of using marker data for understanding population properties and marker properties.