SNP dataset for the threatened plant species Dinizia jueirana-facao (Fabaceae)

Cite this dataset

Nazareno, Alison; Knowles, Lacey (2021). SNP dataset for the threatened plant species Dinizia jueirana-facao (Fabaceae) [Dataset]. Dryad.


By performing sensitivity analyses, we empirically investigated how decisions about the percentage of missing data (MD) and the minor allele frequency (MAF) set in bioinformatic processing of genomic data affect direct (i.e., parentage analysis) and indirect (i.e., fine-scale spatial genetic structure, SGS) gene flow estimates. We focus specifically on these effects in small plant populations, and in particular on the rare tropical plant species Dinizia jueirana-facao, where assumptions implicit to analytical procedures for accurate estimates of gene flow may not hold. Avoiding biases in dispersal estimates is essential given that this species faces extinction risk due to habitat loss, so we also investigate the effects of forest fragmentation on the accuracy of dispersal estimates under different filtering criteria by testing for a recent decrease in the scale of gene flow. Our sensitivity analyses demonstrate that gene flow estimates are robust to different settings of MAF (0.05 to 0.35) and MD (0 to 20%). Comparing the direct and indirect estimates of dispersal, we find that the contemporary estimate of gene dispersal distance (σrt = 41.8 m) was approximately fourfold smaller than the historical estimates, supporting the hypothesis of a temporal shift in the scale of gene flow in D. jueirana-facao, which is consistent with predictions based on the recent, dramatic forest fragmentation process. While we identified settings for filtering genomic data that avoid biases in gene flow estimates, we stress that there is no ‘rule of thumb’ for bioinformatic filtering and that relying on default program settings is not advisable. Instead, we suggest that the approach implemented here be applied independently in each empirical study to confirm the settings needed to obtain unbiased population genetic estimates.
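The MAF and MD thresholds discussed above can be illustrated with a minimal sketch. It is not the filtering code used in the study (which relied on Stacks), and it assumes a hypothetical genotype coding in which each genotype is the count of one allele (0, 1, or 2) and `None` marks a missing call; the function names and the toy loci are illustrative only.

```python
def missing_fraction(genotypes):
    """Fraction of individuals with no genotype call at a locus."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF computed from called diploid genotypes coded as allele counts (0, 1, 2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1 - allele_freq)


def filter_loci(loci, min_maf, max_missing):
    """Names of loci that pass both the MAF and the missing-data threshold."""
    return [
        name
        for name, genotypes in loci.items()
        if missing_fraction(genotypes) <= max_missing
        and minor_allele_frequency(genotypes) >= min_maf
    ]


# Toy data (hypothetical): three loci genotyped in five individuals.
toy_loci = {
    "locus_a": [0, 1, 2, 1, None],
    "locus_b": [0, 0, 0, 1, 0],
    "locus_c": [None, None, 2, 2, 1],
}

# Sensitivity grid over the threshold ranges mentioned in the abstract.
for min_maf in (0.05, 0.35):
    for max_missing in (0.0, 0.2):
        kept = filter_loci(toy_loci, min_maf, max_missing)
        print(f"MAF >= {min_maf}, MD <= {max_missing}: {kept}")
```

Repeating a downstream analysis on each retained set of loci is the general idea behind a sensitivity analysis of this kind; the genotype coding in the deposited file may differ from the one assumed here.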


We created one genomic library using a double-digest restriction site-associated DNA sequencing (i.e., ddRADseq) protocol (Peterson et al. 2012), with modifications to minimize the risk of high variance in the number of reads per individual (see Nazareno et al. 2017 for more details).

Files containing the raw sequence reads were analyzed in Stacks 2.41 (Catchen et al. 2011, Catchen et al. 2013, Rochette et al. 2019) using de novo assembly. We first used the process_radtags program in Stacks to assign reads to individuals and to eliminate poor-quality reads and reads missing the expected EcoRI cut site (options --barcode_dist 2 -q -e ecoRI). All sequences were then processed in ustacks to produce consensus sequences of RAD tags, applying a maximum-likelihood framework to estimate the diploid genotype for each individual at each nucleotide position (Hohenlohe et al. 2011).

Usage notes

The CSV file contains the genotypes of all 50 samples (rows 2 to 51) at each of the 256 loci (columns D to IY). The geographic coordinates (X and Y) of each sample are given in columns B and C.
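The layout described above can be read with a short Python sketch. It assumes that row 1 is a header row and that column A holds a sample identifier, neither of which is stated here; the function names and the toy input are hypothetical, and genotype values are kept as raw strings because their coding is not documented on this page.

```python
import csv
import io


def column_index(letters):
    """Convert a spreadsheet column label (e.g. 'D', 'IY') to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


X_COLUMN = column_index("B")
Y_COLUMN = column_index("C")
FIRST_LOCUS = column_index("D")
LAST_LOCUS = column_index("IY")


def read_samples(handle):
    """Split each data row into identifier, coordinates, and genotype columns."""
    reader = csv.reader(handle)
    header = next(reader)  # assumed header row (row 1)
    loci = header[FIRST_LOCUS:LAST_LOCUS + 1]
    samples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        samples.append({
            "id": row[0],  # assumption: column A is a sample identifier
            "x": row[X_COLUMN],
            "y": row[Y_COLUMN],
            "genotypes": row[FIRST_LOCUS:LAST_LOCUS + 1],
        })
    return loci, samples


# Toy input with two loci standing in for the real file.
toy_csv = io.StringIO("id,X,Y,locus1,locus2\ns1,10,20,0,1\ns2,11,21,2,1\n")
loci, samples = read_samples(toy_csv)
print(len(loci), len(samples))
```

With the real file, passing an open file handle instead of the toy `StringIO` object should yield 256 locus names and 50 sample records if the layout matches the description.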