Data from: Genomic distribution and estimation of nucleotide diversity in natural populations: perspectives from the collared flycatcher (Ficedula albicollis) genome
Dutoit, Ludovic et al. (2016), Data from: Genomic distribution and estimation of nucleotide diversity in natural populations: perspectives from the collared flycatcher (Ficedula albicollis) genome, Dryad, Dataset, https://doi.org/10.5061/dryad.1n84v
Properly estimating genetic diversity in populations of non-model species requires a basic understanding of how diversity is distributed across the genome and among individuals. To this end we analysed whole-genome re-sequencing data from 20 collared flycatchers (genome size ≈1.1 Gb; 10.13 million single nucleotide polymorphisms detected). Genome-wide nucleotide diversity was almost identical among individuals (mean = 0.00394, range = 0.00384-0.00401) but diversity levels varied extensively across the genome (95% confidence interval for 200 kb windows = 0.0013-0.0053). Diversity was related to selective constraint such that in comparison to intergenic DNA, diversity at fourfold degenerate sites was reduced to 85%, 3’ UTRs to 82%, 5’ UTRs to 70% and non-degenerate sites to 12%. There was a strong positive correlation between diversity and chromosome size, probably driven by a higher density of targets for selection on smaller chromosomes increasing the diversity-reducing effect of linked selection. Simulations exploring the ability of sequence data from a small number of genetic markers to capture the observed diversity clearly demonstrated that diversity estimation from finite sampling of such data is bound to be associated with large confidence intervals. Nevertheless, we show that precision in diversity estimation in large outbred populations benefits from increasing the number of loci rather than the number of individuals. Simulations mimicking RAD sequencing showed that this approach gives accurate estimates of genome-wide diversity. Based on the patterns of observed diversity and the performed simulations we provide broad recommendations for how genetic diversity should be estimated in natural populations.
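As an illustration of the quantity reported above, the following minimal sketch computes per-site nucleotide diversity from biallelic SNP allele counts, using the standard average-pairwise-difference estimator. It is not taken from the dataset or from the authors' pipeline: the function name `nucleotide_diversity` and its parameters are hypothetical, and the sketch assumes that every site was called in all sampled chromosomes (for example, 40 chromosomes for 20 diploid individuals) and that invariant callable sites are counted in the denominator.

```python
from typing import Sequence


def nucleotide_diversity(alt_counts: Sequence[int],
                         n_chromosomes: int,
                         n_sites: int) -> float:
    """Per-site nucleotide diversity (pi) for biallelic SNPs.

    alt_counts    -- copies of the alternative allele at each SNP (hypothetical input)
    n_chromosomes -- number of sampled chromosomes (2 x diploid individuals)
    n_sites       -- total callable sites in the region, variable or not
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes to compare")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")

    # Number of distinct chromosome pairs that can be compared.
    n_pairs = n_chromosomes * (n_chromosomes - 1) / 2

    total = 0.0
    for k in alt_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError("allele count outside 0..n_chromosomes")
        # k * (n - k) pairs differ at this site; divide by all pairs to get
        # the expected number of differences per pair at the site.
        total += k * (n_chromosomes - k) / n_pairs

    # Average over all callable sites, including invariant ones.
    return total / n_sites


if __name__ == "__main__":
    # Toy example: 2 SNPs among 10 callable sites, 4 sampled chromosomes.
    print(nucleotide_diversity([1, 2], n_chromosomes=4, n_sites=10))
```

Applying such a function to windows (for example, the 200 kb windows mentioned above) or to random subsets of loci would be one way to explore how the precision of the estimate changes with the number of loci sampled.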