SNP sets of Bolivian Parajubaea species
Data files
Jul 29, 2024 version files 9.50 MB
Abstract
Conservation and sustainable management of lineages providing non-timber forest products are imperative under the current global biodiversity loss. Most non-timber forest species, however, lack genomic studies that characterize their intraspecific variation and evolutionary history, which inform species’ conservation practices. Contrary to many lineages in the Andean biodiversity hotspot that exhibit high diversification, the genus Parajubaea (Arecaceae) has only three species despite the genus’ origin 22 million years ago. Two of the three palm species, P. torallyi and P. sunkha, are non-timber forest species endemic to the Andes of Bolivia, and are listed as IUCN endangered. The third species, P. cocoides, is a vulnerable species with unknown wild populations. We investigated the evolutionary relationships of Parajubaea species, and the genetic diversity and structure of wild Bolivian populations. Sequencing of five low-copy nuclear genes (3,753 bp) challenged the hypothesis that P. cocoides is a cultigen that originated from the wild Bolivian species. We further obtained up to 15,134 de novo single-nucleotide polymorphism markers by genotyping-by-sequencing of 194 wild Parajubaea individuals. Our total DNA sequencing effort rejected the taxonomic separation of the two Bolivian species. As expected for narrow endemic species, we observed low genetic diversity, but no inbreeding signal. We found three genetic clusters shaped by geographic distance, which we use to propose three management units. Different percentages of missing genotypic data did not impact the genetic structure of populations. We use the management units to recommend in-situ conservation by creating new protected areas, and ex-situ conservation through seed collection.
README: SNP sets of Bolivian Parajubaea species
https://doi.org/10.5061/dryad.k0p2ngfgx
Data files accompanying the article: Peñafiel Loaiza, N., Chafe, A.H., Moraes, R.M., Oleas, N.H., Roncal, J. (2024) Genotyping by sequencing informs conservation of Andean palms sources of non-timber forest products
This folder contains thirteen SNP sets.
1. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_denovo_ALL-SNPs.str
SNP set created through a de novo pipeline in Stacks (populations parameters: -p 2 -r 0.5) for the analysis of genetic structure among individuals of nine collection sites of Parajubaea palms in Bolivia (1=Mataralcito[Mat], 2=UAGRM, 3=Quebrada honda[Qh], 4=El Palmar[EP], 5=Ruditayoj[Ru], 6=Lajas[La], 7=Sauce Mayo[SM], 8=Palmarcito[Pal], 9=Sucre[Su]).
The file contains information on 15134 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 146 individuals of six collection sites of Parajubaea torallyi.
This file was used to explore the genetic structure among all Parajubaea collection sites with the STRUCTURE software and by means of DAPC analyses.
Additionally, this SNP set was used to estimate genetic diversity metrics for each collection site (number of private alleles, nucleotide diversity, observed and expected heterozygosity, and inbreeding coefficient Fis) and to calculate pairwise Fst values among all Parajubaea collection sites.
2. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_denovo_ONE-SNP-PER-LOCUS.str
SNP set created through a de novo pipeline in Stacks (populations parameters: -p 2 -r 0.5) selecting only one SNP per locus (--write-single-snp flag) for the analysis of genetic structure among individuals of nine collection sites of Parajubaea palms in Bolivia (1=Mataralcito[Mat], 2=UAGRM, 3=Quebrada honda[Qh], 4=El Palmar[EP], 5=Ruditayoj[Ru], 6=Lajas[La], 7=Sauce Mayo[SM], 8=Palmarcito[Pal], 9=Sucre[Su]).
The file contains information on 4710 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 146 individuals of six collection sites of Parajubaea torallyi.
This file was used in the analysis of genetic structure among all Parajubaea collection sites with the STRUCTURE software.
3. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_denovo_TORALLYI-only.str
SNP set created through a de novo pipeline in Stacks (populations parameters: -p 2 -r 0.5) for the analysis of genetic structure among individuals of Parajubaea torallyi palms in Bolivia.
The file contains information on 15134 SNPs scored for 146 individuals of six collection sites of Parajubaea torallyi (1=El Palmar[EP], 2=Ruditayoj[Ru], 3=Lajas[La], 4=Sauce Mayo[SM], 5=Palmarcito[Pal], 6=Sucre[Su]).
This file was used for the analysis of genetic structure among Parajubaea torallyi collection sites with the STRUCTURE software.
This is a subset of SNP set #1.
4. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_denovo_SUNKHA-only.str
SNP set created through a de novo pipeline in Stacks (populations parameters: -p 2 -r 0.5) for the analysis of genetic structure among individuals of Parajubaea sunkha palms in Bolivia.
The file contains information on 15134 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha (1=Mataralcito[Mat], 2=UAGRM, 3=Quebrada honda[Qh]).
This file was used for the analysis of genetic structure among Parajubaea sunkha collection sites with the STRUCTURE software.
This is a subset of SNP set #1.
5. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_denovo_GENETIC-CLUSTERS.str
SNP set created through a de novo pipeline in Stacks (populations parameters: -p 2 -r 0.1) for the estimation of genetic diversity metrics and genetic differentiation of the genetic clusters identified for Parajubaea palms in Bolivia.
The file contains information on 2317 SNPs scored for 186 individuals of Parajubaea sunkha and Parajubaea torallyi assigned to three genetic groups.
6. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R20.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.2) for the DAPC clustering analysis of Parajubaea palms in Bolivia.
The file contains information on 19087 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the resulting amount of missing data and on the DAPC clustering pattern, in comparison to results obtained with dataset #7.
7. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R20_IMPUTATION.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.2) for the DAPC clustering analysis of Parajubaea palms in Bolivia. Missing data were imputed in GENODIVE, based on overall allele frequencies.
The file contains information on 19087 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the DAPC clustering pattern in comparison to results obtained with dataset #6.
8. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R40.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.4) for the DAPC clustering analysis of Parajubaea palms in Bolivia.
The file contains information on 9668 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the resulting amount of missing data and on the DAPC clustering pattern, in comparison to results obtained with dataset #9.
9. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R40_IMPUTATION.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.4) for the DAPC clustering analysis of Parajubaea palms in Bolivia. Missing data were imputed in GENODIVE, based on overall allele frequencies.
The file contains information on 9668 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the DAPC clustering pattern in comparison to results obtained with dataset #8.
10. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R60.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.6) for the DAPC clustering analysis of Parajubaea palms in Bolivia.
The file contains information on 5612 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the resulting amount of missing data and on the DAPC clustering pattern, in comparison to results obtained with dataset #11.
11. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R60_IMPUTATION.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.6) for the DAPC clustering analysis of Parajubaea palms in Bolivia. Missing data were imputed in GENODIVE, based on overall allele frequencies.
The file contains information on 5612 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the DAPC clustering pattern in comparison to results obtained with dataset #10.
12. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R80.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.8) for the DAPC clustering analysis of Parajubaea palms in Bolivia.
The file contains information on 2437 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the resulting amount of missing data and on the DAPC clustering pattern, in comparison to results obtained with dataset #13.
13. Peñafiel_Loaiza-et-al_2024_Conservation-genomics-of-Andean-palms_pseudoref_R80_IMPUTATION.str
SNP set created through a pseudo-reference method (see supplementary file) in Stacks (populations parameters: -R 0.8) for the DAPC clustering analysis of Parajubaea palms in Bolivia. Missing data were imputed in GENODIVE, based on overall allele frequencies.
The file contains information on 2437 SNPs scored for 48 individuals of three collection sites of Parajubaea sunkha and 138 individuals of five collection sites of Parajubaea torallyi.
This file was used to evaluate the impact of filtering parameters on the DAPC clustering pattern in comparison to results obtained with dataset #12.
Notes:
- Files with '.str' extension are in 'structure' format.
- The header containing the marker names has been manually removed.
- Samples are organized by row. Note that every sample occupies two consecutive rows. These correspond to the two alleles of each locus.
- Column 1 corresponds to the name of the sample.
- Column 2 indicates the population the sample belongs to.
- Columns 3 onward give the alleles at each SNP position (each column corresponds to one SNP). The nucleotides are coded as follows: 1=A, 2=C, 3=G, 4=T, 0=Missing data.
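The layout described in these notes can be read with a few lines of Python. The following is a minimal sketch, not the authors' own code: it assumes the fields are separated by whitespace (tabs or spaces) and that sample names contain no spaces. The function names, the toy sample names `ind1` and `ind2`, and the use of `N` to display missing data are illustrative choices, not part of the dataset.

```python
# Nucleotide coding stated in the notes; 0 marks missing data.
# Showing missing alleles as "N" is an assumption made for this sketch.
CODES = {"1": "A", "2": "C", "3": "G", "4": "T", "0": "N"}


def parse_structure(lines):
    """Return {sample: (population, [allele_row_1, allele_row_2])}.

    Each sample occupies two consecutive rows, one per allele copy.
    """
    samples = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, pop, alleles = fields[0], fields[1], fields[2:]
        entry = samples.setdefault(name, (pop, []))
        entry[1].append(alleles)
    return samples


def genotype(sample_entry, snp_index):
    """Decode both alleles of one SNP (0-based index), e.g. 'A/G'."""
    _, rows = sample_entry
    return "/".join(CODES[row[snp_index]] for row in rows)


# Hypothetical two-sample, three-SNP input (not real data from the files).
example = [
    "ind1\t1\t1\t2\t0",
    "ind1\t1\t3\t2\t0",
    "ind2\t4\t4\t4\t2",
    "ind2\t4\t4\t1\t2",
]
data = parse_structure(example)
print(genotype(data["ind1"], 0))  # A/G
```

To read one of the deposited files, pass an open file handle instead of the toy list, for example `parse_structure(open(path))`, where `path` points to a `.str` file from this folder.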
Methods
DNA was extracted from silica-dried leaf fragments. Raw reads were generated by Genotyping-by-Sequencing. SNP sets were produced with Stacks under different filtering parameters of the populations module.
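Because the SNP sets differ in their filtering thresholds, a quick check of how much missing data each file carries can be useful before reanalysis. The sketch below is an illustration only, assuming the whitespace-separated structure layout described in the notes (sample name, population, then allele codes, with 0 for missing data). The function name and the toy input are hypothetical.

```python
def missing_fraction(lines):
    """Fraction of allele calls coded as 0 (missing) across all rows."""
    missing = 0
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        alleles = fields[2:]  # columns 3 onward hold the allele codes
        missing += sum(1 for allele in alleles if allele == "0")
        total += len(alleles)
    return missing / total if total else 0.0


# Hypothetical input: two samples, three SNPs, one missing genotype.
toy = [
    "ind1 1 1 2 0",
    "ind1 1 3 2 0",
    "ind2 4 4 4 2",
    "ind2 4 4 1 2",
]
print(round(missing_fraction(toy), 3))
```

The function counts allele calls rather than genotypes. If both rows of a sample carry 0 at a missing SNP, as in the toy input, the two proportions coincide. For the imputed sets the value would be expected to be zero, which offers a simple sanity check.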