Microsatellite loci genotypes dataset (N=121 unique individuals) from: Sex-mediated gene flow of grayfoot chacma baboons (Papio ursinus griseipes) in a highly seasonal habitat of Gorongosa National Park, Mozambique
Data files
Jul 18, 2025 version files 32.98 KB
- README.md (1.15 KB)
- STRs_GNP_submitted_MJFS.xlsx (31.83 KB)
Abstract
Investigating primates’ behavioral variation at the inter-population level is important for the advancement of biological anthropology and the understanding of the evolutionary processes leading to species-specific behavioral patterns. The study of behavioral diversity among populations also contributes to improving primate conservation efforts. Dispersal patterns tend to be similar among closely related phylogenetic lineages but may vary as individuals respond to local demographic and environmental variation. Here, we investigate dispersal patterns of chacma baboons (Papio ursinus griseipes) living in Gorongosa National Park and the Catapu Forest Reserve in central Mozambique. The park consists of a mosaic landscape, located in a region of high seasonal variability. This area was the epicenter of a major war, which decimated most apex predators, resulting in limited mammalian predation on baboons and a steep increase in the number of groups. Factors such as anthropogenic habitat disturbance, decreased interactions with predators, and increased group density have the potential to alter individual dispersal behaviour in primates, which may lead to a lack of sex bias in dispersal. We used a large genetic dataset of non-invasive DNA samples analyzed for uni- and bi-parentally inherited markers to characterize the spatial distribution of genetic variation and investigate the extent and direction of sex-mediated gene flow. We found high levels of genetic diversity and evidence for historical and very recent male-biased gene flow. Our study highlights the strong conservation of male-biased dispersal patterns in chacma baboon populations facing highly unpredictable environments and suggests that dispersal behaviors in Papio sp. are resilient to environmental variability and high seasonality.
https://doi.org/10.5061/dryad.gb5mkkwxv
Description of the data and file structure
Data are organized in an Excel sheet following the GenAlEx software format. The first row contains the column headers. The first column is the sample code; the second is the Quality Index (QI) of each sample, estimated as in Miquel et al. (2006); the third is the population (Gorongosa or CFP); the fourth and fifth are the latitude and longitude coordinates (in decimal degrees); the sixth is the sex of the sample (Male, Female, or NA for not identified); and the remaining columns give the allele sizes genotyped at each locus. Please note that DYS576 was only genotyped for males (samples identified as females are indicated by "female" (NA, not applicable)). Missing data (samples not reaching a consensus genotype) are indicated by "0" (zero).
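The column layout described above can be read with a few lines of Python. This is a minimal sketch only: it assumes the sheet has been exported to CSV, and the header names, sample codes, coordinates and allele sizes below are invented for illustration (the real file has its own headers and more loci). It assumes two allele columns per autosomal locus and a single column for the Y-linked DYS576 locus.

```python
import csv
import io

# Hypothetical excerpt: header names, sample codes and values are invented.
SAMPLE = """sample,QI,population,lat,lon,sex,D1_a,D1_b,DYS576
S001,0.92,Gorongosa,-18.90,34.50,Male,150,154,201
S002,0.71,Gorongosa,-18.91,34.52,Female,0,0,female
S003,0.64,CFP,-16.85,34.19,NA,146,150,0
"""


def parse_allele(value):
    """Return the allele size as an int, or None for missing ("0")
    and for non-numeric entries such as the female marker on DYS576."""
    value = value.strip()
    if value == "0" or not value.isdigit():
        return None
    return int(value)


def load_genotypes(text):
    """Parse the CSV text into a list of per-sample dictionaries."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "sample": rec["sample"],
            "qi": float(rec["QI"]),
            "population": rec["population"],
            "lat": float(rec["lat"]),
            "lon": float(rec["lon"]),
            "sex": None if rec["sex"] == "NA" else rec["sex"],
            "D1": (parse_allele(rec["D1_a"]), parse_allele(rec["D1_b"])),
            "DYS576": parse_allele(rec["DYS576"]),
        })
    return rows


rows = load_genotypes(SAMPLE)
```

Mapping both "0" and non-numeric entries to None keeps missing genotypes and non-applicable Y-locus entries out of any allele-size arithmetic; a real analysis may prefer to keep the two cases distinct.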
Study site
We carried out the study at Gorongosa National Park (GNP, 18°43'56.54" S, 34°16'31.58" E, total area 3,770 km²), which included the floodplain central area (GNP core), and the western entrance of the park (Boé-Maria, BM). The study also included samples collected in the Catapu Forest Reserve (CFR, 16°50'54.11" S, 34°11'11.94" E, total area 250 km²).
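The site coordinates above are given in degrees, minutes and seconds, whereas the data file stores decimal degrees. The conversion can be sketched as follows; the function name is illustrative, and the usual sign convention (south and west negative) is assumed.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and west hemispheres are returned as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Reference coordinates of the two study areas, as (latitude, longitude).
gnp = (dms_to_decimal(18, 43, 56.54, "S"), dms_to_decimal(34, 16, 31.58, "E"))
cfr = (dms_to_decimal(16, 50, 54.11, "S"), dms_to_decimal(34, 11, 11.94, "E"))
```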
DNA sampling
Fecal samples from unidentified individuals were collected opportunistically along unpaved roads and in areas usually frequented by baboons, such as the edges of drinking spots or foraging paths, during field expeditions carried out in the dry seasons of 2017, 2018 and 2019. One tissue sample was retrieved from an individual that was a victim of infanticide, and one blood sample was collected during GPS collar placement as part of a different study (Lewis-Bevan et al., in prep). Both samples have been analysed before (Santander et al., 2022). Fecal samples were collected in two main areas, GNP and CFR (Fig. 1). At GNP, sampling was carried out mainly at two locations, the core of GNP around Lake Urema and Boé-Maria (BM), at a maximum distance of 30 km. BM is located approximately 11 km from the closest site at the core of GNP. At CFR, sampling was carried out at four sites located a maximum of 6 km apart (Fig. 1). Fecal samples were geo-referenced using a Global Positioning System (GPS) device (Garmin GPSMAP 64s). Notes were taken on each sample's preservation status, the sex of the individual (when observed), and group size. Samples were collected at least 2 m apart to minimize the likelihood of sampling the same individual twice. Gloves, facemasks, and hairnets were used during sampling to limit possible human cross-contamination. The collected fecal samples were preserved until DNA extraction using the “two-step protocol” (Roeder et al., 2004), a procedure in which 5 ml of fecal material is collected from the exterior part of the sample by scraping the surface with a wooden stick and immediately immersed in 99% ethanol for 24-48 h. After that period, the samples were transferred to a tube containing 30 g of Silica Gel (Type III, S-7625, indicating for desiccation, Sigma-Aldrich) and preserved as such until DNA extraction. The tissue and the blood sample were preserved in 99% ethanol at room temperature until DNA extraction.
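Distances between sampling sites, such as those quoted above, can be checked from the decimal-degree coordinates in the data file. The sketch below uses the haversine great-circle formula on a spherical Earth, which is an approximation and not necessarily how the authors measured distances; the function names are illustrative.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumption of the spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_pairwise_km(points):
    """Largest distance among all pairs of (lat, lon) points; 0.0 if fewer than two."""
    return max(
        (haversine_km(a[0], a[1], b[0], b[1]) for a, b in combinations(points, 2)),
        default=0.0,
    )
```

Applied to the sample coordinates of one area, `max_pairwise_km` gives the maximum separation between samples, which can be compared with the 30 km and 6 km figures reported for GNP and CFR.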
DNA extraction
Fecal samples were transported to the CIBIO laboratory facilities, Porto University, Portugal, for DNA extraction. Total genomic DNA was extracted using the QIAamp® DNA Stool Mini Kit (QIAGEN®) with some modifications to the manufacturer’s protocol (Ferreira da Silva et al., 2014). DNA extracts were eluted in 200 µl of Buffer AE and stored at -20 °C. Several precautions were taken to avoid contamination with exogenous DNA during sample extraction (Ferreira da Silva 2012). Negative controls were subjected to all the extraction procedures to test for possible contamination with human DNA and/or cross-contamination between samples. DNA extraction from the tissue and blood samples was performed in the GNP molecular lab using the QIAGEN DNeasy Blood & Tissue Kit, following the manufacturer’s protocol. DNA extracts were eluted in 200 µl of Buffer AE and stored at -20 °C.
DNA amplification and data production
DNA samples were amplified via Polymerase Chain Reaction (PCR) to recover 490 base pairs (bp) of the mitochondrial control region (hypervariable region I) (mtDNA) (Hapke et al., 2001), 14 autosomal microsatellite loci, and the Y chromosome-linked DYS576 locus. Microsatellite loci were human-derived with cross-amplification in other Papio sp. (Ferreira da Silva et al., 2014). The Y-locus (DYS576) is one of the few human-derived markers showing variation in Papio sp. (e.g., Jolly et al., 2011). The sex of the individuals was identified using the molecular protocol developed by Di Fiore (2005), in which a fragment of the amelogenin X gene (~200 bp) and a fragment of the SRY gene (~165 bp) are co-amplified in a multiplex PCR. Primer sequences and the PCR amplification cycling details are described in supplementary material 1. Procedures to limit cross-contamination between samples and by external DNA were implemented (supplementary material 2). All PCRs were carried out in a T100™ Bio-Rad 96-well thermal cycler at CTM/CIBIO.
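The logic of the sex-typing assay can be sketched as a simple decision rule. This is an illustration under stated assumptions, not the authors' scoring procedure: it assumes the amelogenin X fragment acts as the amplification control, that the SRY fragment indicates a male, and that any reaction lacking the X fragment is left uncalled. The function name is hypothetical.

```python
def call_sex(amelx_amplified, sry_amplified):
    """Call sex from the presence or absence of the two co-amplified fragments.

    amelx_amplified: True if the ~200 bp amelogenin X fragment was observed.
    sry_amplified:   True if the ~165 bp SRY fragment was observed.
    """
    if not amelx_amplified:
        # No X-linked product: treat the reaction as failed or ambiguous.
        return "NA"
    return "Male" if sry_amplified else "Female"
```

With degraded fecal DNA the shorter or longer fragment can fail to amplify by chance, so in practice a call of "Female" would normally rest on repeated reactions rather than a single one.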
The set of fourteen autosomal microsatellite loci and the Y-chromosome locus were amplified using five multiplex PCR systems (Table S1, supplementary material 3). Amplified allele sizes ranged between approximately 123 and 270 bp. All markers were tetranucleotide repeats, except D7S503 and D5S1457, which were dinucleotide repeats (Table S1, supplementary material 3). Successful PCR products were analysed using the CTM fragment analysis service and run on an ABI3730XL capillary analyzer using a 16 GeneScan™-500 LIZ® size standard.
Microsatellite loci genotyping followed a modified version of the “multi-tubes” approach of Taberlet et al. (1996), in which the number of repetitions per locus necessary to obtain reliable consensus genotypes was estimated from empirical data, following Ferreira da Silva et al. (2014) (supplementary material 5). As a result, we carried out a minimum of four amplifications per locus per sample. The reliability of consensus genotypes was quantified using the mean “Quality Index” (QI, Miquel et al., 2006) across loci. Genotypes with a QI below 0.55 were removed from the final dataset (Miquel et al., 2006). The Probability of Identity (PI), which is the probability that two individuals sampled randomly from the population have the same genotype at all typed loci, and the PI between siblings (PIsibs) (Waits et al., 2001) were estimated using GenAlEx v.6.503 (Peakall & Smouse, 2006). The minimum number of loci required to distinguish between different individuals using the study’s set of microsatellite loci was estimated using the poppr 2.9.3 package (Kamvar et al., 2015) in RStudio v.1.3.1093. Samples with the same genotype (possibly belonging to re-sampled individuals) were removed from the dataset.
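The two quantities used above can be illustrated with a short sketch. It assumes the usual reading of the Quality Index (each replicate scores 1 if it matches the consensus genotype and 0 otherwise, averaged per locus and then across loci) and the standard per-locus formulas for PI and PIsibs, with the multilocus value taken as the product across loci under an assumption of independent loci. It is not the GenAlEx implementation; all function names are illustrative.

```python
from math import prod


def locus_quality_index(replicates, consensus):
    """Fraction of replicate genotypes identical to the consensus genotype.

    A failed amplification is passed as None and counts as a mismatch.
    """
    target = tuple(sorted(consensus))
    matches = sum(1 for g in replicates if g is not None and tuple(sorted(g)) == target)
    return matches / len(replicates)


def sample_quality_index(replicates_by_locus, consensus_by_locus):
    """Mean of the per-locus quality indices for one sample."""
    scores = [
        locus_quality_index(replicates_by_locus[locus], consensus_by_locus[locus])
        for locus in consensus_by_locus
    ]
    return sum(scores) / len(scores)


def pi_locus(freqs):
    """Probability of identity at one locus from its allele frequencies."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygous + heterozygous


def pi_sibs_locus(freqs):
    """Probability of identity between siblings at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def multilocus(per_locus_values):
    """Combine per-locus probabilities, assuming independent loci."""
    return prod(per_locus_values)
```

A sample would then be retained when `sample_quality_index(...)` is at least 0.55, the threshold stated above.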
Departure from Hardy-Weinberg equilibrium (HWE) was tested per locus for the overall dataset and separately for GNP+BM and CFR using the PopGenReport 2.2.2 R package (Adamack & Gruber, 2014). The p-value for significant deviations from HWE expectations was corrected using the Bonferroni adjustment for multiple comparisons. Micro-Checker version 2.2.3 (van Oosterhout et al., 2006) was used to test for locus-specific heterozygote deficiency due to null alleles, stutter band-related scoring errors and large-allele dropout, all three with a 95% confidence interval. An exact test for linkage disequilibrium (LD) between all pairs of loci was computed for each sampling area using Arlequin v.3.11 (Excoffier et al., 2005). The significance of the association between pairs of loci was tested via a likelihood ratio test using 10,000 permutations. The percentage of missing data for the overall dataset and by sampling location was estimated using the poppr 2.9.3 R package.
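Two of the simpler steps above, the Bonferroni adjustment and the percentage of missing data, can be sketched directly. This is an illustration only and not the PopGenReport or poppr implementation; the function names are hypothetical, and missing calls are assumed to be coded as 0, as in the data file.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def percent_missing(genotypes):
    """Percentage of allele calls coded as 0 (missing).

    genotypes: list of per-sample lists of allele sizes.
    """
    calls = [allele for sample in genotypes for allele in sample]
    return 100.0 * sum(1 for allele in calls if allele == 0) / len(calls)
```

With one HWE test per autosomal locus, an adjusted p-value below 0.05 corresponds to a raw p-value below 0.05/14 for the 14 loci.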