Data from: Minimum sample sizes for population genomics: an empirical study from an Amazonian plant species
Nazareno, Alison G.; Bemmels, Jordan B.; Dick, Christopher W.; Lohmann, Lúcia G. (2017), Data from: Minimum sample sizes for population genomics: an empirical study from an Amazonian plant species, Dryad, Dataset, https://doi.org/10.5061/dryad.bm98q
High-throughput DNA sequencing facilitates the analysis of large portions of the genome in non-model organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we used double-digest restriction-associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a non-model plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and bi-allelic markers should be sampled for accurate estimates of intra- and interpopulation genetic diversity. We identified 3,646 and 4,900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, increasing the sample size beyond eight individuals has little impact on estimates of genetic diversity within A. longifolia populations when 1,000 or more SNPs are used. Our results also show that even at a very small sample size (i.e., two individuals), accurate estimates of FST can be obtained with a large number of SNPs (≥ 1,500). These results highlight the potential of high-throughput genomic sequencing approaches to address questions related to evolutionary biology in non-model organisms. Furthermore, our findings provide insights into the optimization of sampling strategies in the era of population genomics.
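The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes genotypes are stored as a hypothetical individuals-by-SNPs matrix of alternate-allele counts (0, 1 or 2) with no missing data, and it uses Nei's unbiased expected heterozygosity as a stand-in diversity statistic. The function names, the number of replicates and the seed are all illustrative choices.

```python
import random
from statistics import mean


def expected_heterozygosity(genotypes, individuals, snps):
    """Mean unbiased expected heterozygosity over the chosen SNPs.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of
    individual i at bi-allelic SNP j (assumed layout, no missing data).
    """
    n = len(individuals)
    # Small-sample correction for 2n sampled gene copies (Nei 1978).
    correction = 2 * n / (2 * n - 1)
    per_snp = []
    for j in snps:
        # Alternate-allele frequency among the sampled individuals.
        p = sum(genotypes[i][j] for i in individuals) / (2 * n)
        per_snp.append(correction * 2 * p * (1 - p))
    return mean(per_snp)


def resample_he(genotypes, n_individuals, n_snps, replicates=100, seed=0):
    """Draw random subsets of individuals and SNPs without replacement
    and return one heterozygosity estimate per replicate."""
    rng = random.Random(seed)
    all_individuals = range(len(genotypes))
    all_snps = range(len(genotypes[0]))
    estimates = []
    for _ in range(replicates):
        individuals = rng.sample(all_individuals, n_individuals)
        snps = rng.sample(all_snps, n_snps)
        estimates.append(expected_heterozygosity(genotypes, individuals, snps))
    return estimates
```

Comparing the spread of the values returned by `resample_he` across different `n_individuals` and `n_snps` settings would show where adding more individuals or markers stops changing the estimate, which is the kind of comparison the abstract summarizes.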