Many molecular ecology analyses assume that the genotyped individuals are sampled at random from a population and are thus representative of it. In practice, however, a sample may contain excessive close relatives (ECR) because, for example, localized juveniles are drawn from a fecund species. Little is known about how ECR affect routinely conducted elementary population genetics analyses, or about how ECR are best dealt with to yield unbiased and accurate parameter estimates. This study quantifies the effects of ECR on several popular population genetics analyses of marker data: the estimation of allele frequencies, F-statistics, expected heterozygosity (He), and the effective and observed numbers of alleles, as well as tests of Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE). It also investigates several strategies for handling ECR that mitigate their impact and yield accurate parameter estimates. My analytical work, assisted by simulations, shows that ECR have large and global effects on all of the above marker analyses. The naïve approach of simply ignoring ECR could yield low-precision and often biased parameter estimates, and could cause too many false rejections of HWE and LE. The bold approach, which simply identifies and removes ECR, and the cautious approach, which estimates target parameters (e.g. He) by accounting for ECR while using naïve allele frequency estimates, eliminate the bias and the false HWE and LE rejections, but could reduce estimation precision substantially. The likelihood approach, which accounts for ECR when estimating allele frequencies and thus when estimating target parameters that rely on allele frequencies, usually yields unbiased and the most accurate parameter estimates. Which of the four approaches is the most effective and efficient may depend on the particular marker analysis to be conducted. The results are discussed in the context of using marker data to understand population properties and marker properties.

Allele frequency simulation code

Fortran source code for simulating genotype data, estimating allele frequencies from the data by different methods, and assessing the accuracy of these methods

AlleleFre.f90

Allele frequency simulation executable

The compiled executable of file AlleleFre.f90

AlleleFre.exe
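The kind of comparison the allele frequency simulation makes can be pictured with a minimal Python sketch. It is not a translation of AlleleFre.f90: it assumes a single biallelic locus in HWE, samples either unrelated individuals or full-sib families of the same total size, and applies only the naïve allele-counting estimator. The function names, the full-sib family design and the parameter values are hypothetical choices for illustration.

```python
import random
import statistics


def sample_unrelated(n, p, rng):
    """Genotypes (copies of allele A: 0, 1 or 2) of n unrelated individuals
    drawn from a population in HWE with frequency p of allele A."""
    return [(rng.random() < p) + (rng.random() < p) for _ in range(n)]


def sample_full_sibs(n_families, family_size, p, rng):
    """Genotypes of n_families full-sib families (hypothetical design).
    Both parents are drawn from the HWE population; each offspring receives
    one randomly chosen allele from each parent (Mendelian segregation)."""
    genotypes = []
    for _ in range(n_families):
        mother = [rng.random() < p, rng.random() < p]
        father = [rng.random() < p, rng.random() < p]
        for _ in range(family_size):
            genotypes.append(rng.choice(mother) + rng.choice(father))
    return genotypes


def naive_freq(genotypes):
    """Naive allele-counting estimate of the frequency of allele A,
    which treats every sampled individual as unrelated."""
    return sum(genotypes) / (2 * len(genotypes))


def compare(p=0.5, n_families=5, family_size=10, reps=2000, seed=1):
    """Mean and variance of the naive estimate over replicate samples of
    equal size, drawn either as unrelated individuals or as full sibs."""
    rng = random.Random(seed)
    n = n_families * family_size
    unrelated = [naive_freq(sample_unrelated(n, p, rng)) for _ in range(reps)]
    sibs = [naive_freq(sample_full_sibs(n_families, family_size, p, rng))
            for _ in range(reps)]
    return (statistics.mean(unrelated), statistics.variance(unrelated),
            statistics.mean(sibs), statistics.variance(sibs))


if __name__ == "__main__":
    m_u, v_u, m_s, v_s = compare()
    print(f"unrelated: mean={m_u:.3f}  variance={v_u:.5f}")
    print(f"full sibs: mean={m_s:.3f}  variance={v_s:.5f}")
```

Under these assumptions the naïve estimate should stay close to p on average in both designs, while its sampling variance is expected to be several times larger for the full-sib sample, consistent with the loss of precision attributed to ignoring ECR.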

Fst simulation code

Fortran source code for simulating genotype data, estimating Fst from allele frequencies estimated by different methods, and assessing the accuracy of these methods

Fst.f90

Fst simulation executable

Compiled from file Fst.f90

fst.exe
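Because the Fst program feeds allele frequencies from different estimation methods into an Fst estimator, the estimator can be written as a function of frequencies alone. The Python sketch below uses Nei's G_ST for one locus with equally weighted subpopulations; this is an assumed textbook choice, not necessarily the estimator implemented in Fst.f90, and `nei_gst` is a hypothetical name.

```python
def nei_gst(subpop_freqs):
    """Nei's G_ST at one locus from per-subpopulation allele frequencies.

    subpop_freqs: one allele-frequency list per subpopulation, alleles in the
    same order in every list; each list may come from any estimator (e.g.
    naive counting or a method that accounts for relatives). Subpopulations
    are weighted equally, an assumption made here for simplicity.
    """
    k = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # H_S: mean within-subpopulation gene diversity
    hs = sum(1.0 - sum(p * p for p in freqs) for freqs in subpop_freqs) / k
    # H_T: gene diversity of the pooled (averaged) allele frequencies
    pooled = [sum(freqs[a] for freqs in subpop_freqs) / k for a in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in pooled)
    # A locus monomorphic across all subpopulations has no differentiation
    return 0.0 if ht == 0 else (ht - hs) / ht


if __name__ == "__main__":
    print(nei_gst([[0.9, 0.1], [0.1, 0.9]]))
```

Passing naïve and relative-aware frequency estimates to the same function isolates the effect of the allele frequency estimator on Fst, since everything downstream is held fixed.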

He simulation code

Fortran code for simulating genotype data, estimating expected heterozygosity (He) from the data by different methods, and assessing the accuracy of these methods

He.f90

He simulation executable

Compiled from He.f90

He.exe
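A minimal Python sketch of the naïve approach to He and the numbers of alleles, assuming diploid genotypes stored as allele pairs: allele frequencies p are estimated by simple counting, He by Nei's small-sample-corrected estimator 2n/(2n - 1) · (1 - Σp²) for n individuals, the effective number of alleles by 1/Σp², and the observed number of alleles by counting distinct alleles. All of these assume unrelated individuals, so this is the baseline that ECR-aware approaches would be compared against, not the author's method; `he_and_alleles` is a hypothetical name.

```python
from collections import Counter


def he_and_alleles(genotypes):
    """Naive single-locus estimates from diploid genotypes.

    genotypes: list of (allele1, allele2) tuples, one per individual,
    all individuals assumed unrelated.
    Returns (allele frequencies, He, effective number of alleles,
    observed number of alleles).
    """
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: c / (2 * n) for allele, c in counts.items()}
    sum_sq = sum(p * p for p in freqs.values())
    # Nei's unbiased gene diversity: corrects 1 - sum(p^2) for 2n sampled genes
    he = (2 * n / (2 * n - 1)) * (1 - sum_sq)
    ae = 1 / sum_sq  # effective number of alleles
    return freqs, he, ae, len(counts)


if __name__ == "__main__":
    sample = [(1, 1), (1, 2), (2, 2), (1, 2)]
    print(he_and_alleles(sample))
```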

HWE Test simulation code

Fortran code for simulating genotype data and testing Hardy-Weinberg equilibrium from the data

HWE_Test.f90

HWE test simulation executable

Compiled from file HWE_Test.f90

hwe_test.exe
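For a biallelic locus, a test of HWE can be sketched in Python as a chi-square goodness-of-fit test with one degree of freedom. This is an assumed, simple choice (exact tests are often preferred for small samples) that may differ from the test in HWE_Test.f90, and `hwe_chi2` is a hypothetical name. Its null distribution presumes unrelated individuals, which is why samples containing ECR can inflate the rate of false rejections.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of HWE at one biallelic locus.

    n_aa, n_ab, n_bb: observed counts of the AA, AB and BB genotypes,
    assumed to come from unrelated individuals.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # naive frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes, which only occur at a monomorphic locus
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi2(50, 0, 50))   # strong heterozygote deficit
    print(hwe_chi2(25, 50, 25))  # exact HWE proportions
```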

LD Test simulation code

Fortran code for simulating genotype data and testing linkage disequilibrium from the data

LD_Test.f90

LD Test simulation executable

Compiled from file LD_Test.f90

LD_test.exe
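A test of linkage disequilibrium from unphased genotypes can be sketched in Python with the squared correlation r² between allele counts (0, 1 or 2) at two biallelic loci, treating n·r² as approximately chi-square with one degree of freedom under linkage equilibrium. This composite, genotype-based statistic is an assumed simplification; LD_Test.f90 may use a different test, and `genotype_ld_test` is a hypothetical name. Like the other naïve tests, it assumes the individuals are unrelated.

```python
import math


def genotype_ld_test(x, y):
    """Composite test of LD between two biallelic loci from unphased data.

    x, y: copies (0, 1 or 2) of a chosen allele at locus 1 and locus 2,
    listed in the same individual order; individuals assumed unrelated.
    Returns (r squared, approximate p-value with 1 degree of freedom).
    """
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # no variation at one locus: r is undefined
    r2 = sxy * sxy / (sxx * syy)
    chi2 = n * r2  # approximately chi-square (1 df) under linkage equilibrium
    return r2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    print(genotype_ld_test([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]))
```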

Data from: Effects of sampling close relatives on some elementary population genetics analyses
