Data from: Estimating the effective population size across space and time in the Critically Endangered western chimpanzee in Guinea-Bissau: Challenges and implications for conservation management
Data files
Oct 07, 2025 version files: 172.28 GB
Oct 25, 2025 version files: 172.28 GB
- Bella_PT_GB.Reference_genome.realigned.bam.zip (25.65 GB)
- Bo_PT_GB.Reference_genome.realigned.bam.zip (27.95 GB)
- Database_10_loci_estimatingNechimps_less_accuracy.xlsx (110.90 KB)
- Emi_PT_GB.Reference_genome.realigned.bam.zip (30.68 GB)
- README.md (2.61 KB)
- Simao_PT_GB.Reference_genome.realigned.bam.zip (28.63 GB)
- T-3_Chimp.Reference_genome.realigned.bam.zip (59.37 GB)
Abstract
Effective population size (Ne) is a key concept in evolutionary and conservation biology. The western chimpanzee (Pan troglodytes verus) is a Critically Endangered taxon. In Guinea-Bissau, which is considered a key area for its conservation, chimpanzees are mainly threatened by habitat loss, hunting, and disease. Genetic tools have not yet been applied to inform management there, and no estimates of Ne have been obtained. In this study, we use range-wide microsatellite data from across the country and five whole-genome sequences to obtain several estimates of Ne and to infer the recent and ancient demographic history of the populations using different methods. We also aim to integrate the different Ne estimates to improve our understanding of the evolutionary history and current demography of this great ape, and to discuss the strengths and limitations of each estimator and their complementarity in informing conservation decisions. Results from the PSMC method suggest a large ancestral Ne, likely due to ancient structure across the whole subspecies distribution until approximately 10,000-15,000 years ago. After that, a change in connectivity, a real decrease in size, or a combination of both reduced the still-large ancestral population to a smaller size (MSVAR: ~10,000 decreasing to 1,000-6,000 individuals), possibly indicating fragmentation into coastal and inland subpopulations. In the most recent past, contemporary Ne is below or close to 500 (GONE: 116-580; NeEstimator: 107-549), suggesting a high risk of extinction. The populations in the coastal parks may have been small or isolated for several generations, whereas the Boé National Park population exhibits higher long-term Ne estimates and can be considered a stronghold for chimpanzee conservation.
By combining different types of molecular markers and analytical methods, we attempt to overcome the difficulty of obtaining high-quality DNA samples from wild threatened populations and to estimate Ne at different temporal and spatial scales, information that is crucial for making informed conservation decisions at local and regional scales.
Whole genome sequencing data
Whole-genome sequencing data from five individuals analysed in the study entitled: Estimating the effective population size across space and time in the Critically Endangered western chimpanzee in Guinea-Bissau: challenges and implications for conservation management.
We deposited one BAM file per sample, containing the reads aligned to the reference genome; each BAM file is provided as a .zip archive. Files are named after the individual chimpanzees: Bella, Bo, Emilia (file prefix Emi), Simão (Simao), and T3 (T-3_Chimp).
Bella_PT_GB.Reference_genome.realigned.bam.zip
Bo_PT_GB.Reference_genome.realigned.bam.zip
Emi_PT_GB.Reference_genome.realigned.bam.zip
Simao_PT_GB.Reference_genome.realigned.bam.zip
T-3_Chimp.Reference_genome.realigned.bam.zip
Unique genotypes (10 microsatellite loci) of the western chimpanzees in Guinea-Bissau, West Africa
Description of the data and file structure
File: Database_10_loci_estimatingNechimps_less_accuracy.xlsx
The dataset consists of 143 unique genotypes for 10 microsatellite loci derived from non-invasive fecal samples collected in four protected areas in Guinea-Bissau: Cantanhez National Park (CNP), Cufada Lagoons National Park (CLNP), Dulombi National Park (DNP) and Boé National Park (BNP). The microsatellite loci are D5s1457, D13s159, D2s1326, D10s1432, D16s2624, D1s207, D14s306, D6s311, D4s1627 and HUMFIBRA.
The first sheet contains the whole dataset. The first column gives the sample code; the second, the sampling site (CNP, CLNP, DNP or BNP); the third, the sex of the individual as determined with a molecular protocol; and the fourth, the quality index (QI) across loci (Miquel et al., 2006). From the fifth column onwards, the sheet gives the allele sizes for each of the 10 loci. Zero denotes missing data.
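As a minimal sketch of how one row of the first sheet could be parsed, the function below assumes (this README does not say so) that each locus occupies two consecutive allele columns, in the locus order listed above, and that rows have already been read into Python lists, for example with openpyxl's `iter_rows(values_only=True)` or from a CSV export. The names `parse_row` and `Genotype` are hypothetical.

```python
from dataclasses import dataclass

# Locus order as listed in this README. ASSUMPTION: two consecutive columns
# (allele 1, allele 2) per locus, in this order, starting at column 5.
LOCI = ["D5s1457", "D13s159", "D2s1326", "D10s1432", "D16s2624",
        "D1s207", "D14s306", "D6s311", "D4s1627", "HUMFIBRA"]
MISSING = 0  # zero codes missing data in the spreadsheet


@dataclass
class Genotype:
    sample: str
    site: str      # CNP, CLNP, DNP or BNP
    sex: str
    qi: float      # mean quality index across loci
    alleles: dict  # locus -> (allele1, allele2), or None if missing


def parse_row(row):
    """Convert one spreadsheet row (a list of cell values) into a Genotype."""
    sample, site, sex, qi, *sizes = row
    if len(sizes) != 2 * len(LOCI):
        raise ValueError(f"expected {2 * len(LOCI)} allele columns, got {len(sizes)}")
    alleles = {}
    for i, locus in enumerate(LOCI):
        a1, a2 = int(sizes[2 * i]), int(sizes[2 * i + 1])
        # Treat the whole locus as missing if either allele is coded 0.
        alleles[locus] = None if MISSING in (a1, a2) else (a1, a2)
    return Genotype(str(sample), str(site), str(sex), float(qi), alleles)
```

For example, a hypothetical row `["S001", "BNP", "F", 0.8] + [150, 154] * 9 + [0, 0]` yields a genotype whose `HUMFIBRA` entry is `None`. If the sheet instead stores both alleles in a single cell, only the column-unpacking step would need to change.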
There are 10 other sheets in the Excel file. These correspond to the sub-datasets used to estimate Ne and the demographic dynamics of the population inhabiting each park (one sheet per park), a dataset containing only unrelated individuals, and five random datasets (see Ferreira da Silva et al. for details on how each dataset was built).
The GPS coordinates were removed to protect the primate social groups from potential hunters. As this is a Critically Endangered species and live individuals are targeted for illegal trade in the country, disclosing precise locations could pose a conservation risk. This omission does not affect the replicability of the study, as genotypes were grouped by park for all analyses.
Genomic data
Whole-genome sequences were produced from biological material collected from wild-born chimpanzees: a tissue sample from one road-killed individual (T3-Chimp, collected in 2011) and blood samples from four individuals (Bo, Bella, Simão and Emilia, collected between 2018 and 2019) that had been confiscated from private premises by the Institute for Biodiversity and Protected Areas (IBAP). Blood was collected as part of the placement of these individuals in a sanctuary abroad (Sweetwaters Chimpanzee Sanctuary, Ol Pejeta, Kenya). The blood samples were drawn by a wildlife veterinarian (P. Melo, vet_natura, https://www.vetnatura.pt/) for health screening and as part of a parasite and virus detection procedure prior to translocation (Melo et al., 2018). Samples were collected in 5 mL tubes containing the anticoagulant ethylenediaminetetraacetic acid (EDTA) and kept fresh until DNA extraction. The road-killed individual was found on the road next to CLNP; a sample of muscle tissue was collected and preserved in 98% ethanol until DNA extraction.
DNA was extracted from the five samples using an adaptation of the method of Vallet et al. (2008). We used 500 µL of each blood sample and about 10 mg of tissue from the road-killed individual. We checked the quality of the DNA extractions on 2% agarose gels and quantified DNA concentration using a NanoDrop microvolume spectrophotometer (Thermo Fisher Scientific). Laboratory procedures took place at the Instituto Gulbenkian de Ciência, and extractions were carried out in a biological safety cabinet in a dedicated Biosafety Level 2 room. Library preparation and sequencing were performed by Macrogen, using TruSeq libraries sequenced on the Illumina HiSeq X platform at a coverage of 15-30x.
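To make the 15-30x coverage figure concrete, the sketch below uses the standard expected-depth relation C = N × L / G (reads × read length / genome size). The genome size (~3 Gb) and read length (150 bp, typical for HiSeq X) are hypothetical round numbers chosen for illustration, not values reported for these samples.

```python
import math


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Expected mean sequencing depth C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target mean depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Hypothetical round numbers, NOT values reported for these samples.
GENOME_SIZE = 3_000_000_000  # ~3 Gb, order of magnitude of a great-ape genome
READ_LENGTH = 150            # bp per read

for target in (15, 30):
    print(f"{target}x requires ~{reads_needed(target, READ_LENGTH, GENOME_SIZE):,} reads")
```

This is an expected value only: duplicate removal and unmapped reads lower the realized depth of the final BAM files.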
After all samples passed quality control, we processed the sequences for downstream analysis with the PALEOMIX BAM pipeline on the High-Performance Computing (HPC) cluster of the Globe Institute (University of Copenhagen, Denmark). This pipeline trims adapter sequences, filters low-quality reads, removes PCR duplicates, and maps reads to a reference genome. We used a “makefile” (a .yaml file) specifying the tasks to be performed, with BWA as the aligner and its “mem” algorithm. BWA-MEM is robust to sequencing errors and well suited to short reads such as those in this study (Li, 2013). The “MinQuality” parameter, which sets the minimum mapping quality (a Phred-scaled score), was set to 0; because mapping qualities are non-negative, this threshold retains all aligned reads.
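The effect of a minimum mapping-quality threshold can be illustrated on SAM text, where MAPQ is the fifth tab-separated field of each alignment line and takes integer values from 0 to 255. The function below is a hypothetical stand-alone sketch, not PALEOMIX code; it assumes reads are kept when MAPQ is greater than or equal to the threshold.

```python
def filter_by_mapq(sam_lines, min_quality):
    """Yield SAM header lines and alignments whose MAPQ >= min_quality.

    ASSUMPTION: 'keep if MAPQ >= threshold' semantics; PALEOMIX applies its
    own MinQuality filter internally and this is only an illustration.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # column 5 of a SAM record is MAPQ (0-255)
        if mapq >= min_quality:
            yield line
```

With `min_quality=0`, every alignment passes because no MAPQ value is negative; a threshold such as 20 or 30 would instead discard ambiguously placed reads.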
Microsatellite data
We also generated a dataset of 143 unique genotypes for 10 microsatellite loci derived from non-invasive fecal samples (Borges, 2017; Gerini, 2018; Sá, 2013).
Eighty-five genotypes correspond to samples collected between 2015 and 2017 in CLNP (N=38), BNP (N=34), and DNP (N=13); the remaining genotypes are previously determined genotypes from CNP (N=58) (Sá, 2013).
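The per-park sample sizes above can be checked against the totals quoted in this README (85 new genotypes, 143 in total) with a short tally:

```python
# Genotype counts per park, as reported in this README.
new_samples = {"CLNP": 38, "BNP": 34, "DNP": 13}  # collected 2015-2017
previous = {"CNP": 58}                             # Sá (2013)

total_new = sum(new_samples.values())
total = total_new + sum(previous.values())
print(total_new, total)
```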
Fecal samples were collected fresh from unhabituated and unidentified individuals, at sites used by chimpanzee groups for sleeping, foraging and drinking. The methods used to preserve the fecal samples until DNA extraction are described in Ferreira da Silva et al. (2014).
DNA was extracted using two methods: i) the QIAamp® DNA Stool Mini Kit (QIAGEN®) at the MWB research group laboratory facilities, School of Biosciences, Cardiff University, UK (Sá, 2013); and ii) the CTAB method (Vallet et al., 2008, as adapted by Quéméré et al., 2010) for samples collected between 2015 and 2017, which were extracted at the Instituto Gulbenkian de Ciência (IGC, Oeiras, Portugal) laboratory facilities.
DNA samples were identified to species level using a fragment of the mitochondrial DNA hypervariable region I (approximately 600 base pairs; primers L15926 and H16555, as described in Sá, 2013). Consensus sequences were derived from forward and reverse reads by visual comparison in Geneious Pro v.4.8.5 (Biomatters Ltd, New Zealand). Standard Nucleotide BLAST at NCBI (http://www.ncbi.nlm.nih.gov/) was used to identify accessions closely related to the generated sequences and to confirm that samples were from P. troglodytes verus (i.e., GenBank accession D38113).
Allele sizes were standardized between datasets by re-extracting and re-analyzing five samples included in Sá (2013) together with the new samples analyzed in Borges (2017) and Gerini (2018). Allele scoring followed previously described procedures to minimize the impact of allelic dropout and false alleles: four replicates were run per sample, and the rules for reaching a consensus genotype were defined per locus (Ferreira da Silva et al., 2014). Each consensus genotype was assigned a Quality Index (QI; Miquel et al., 2006), and genotypes with a mean QI across loci below 0.55 were excluded from the dataset.
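A minimal sketch of the quality-index filter, assuming that the QI of a locus is the fraction of PCR replicates reproducing the consensus genotype and that the sample QI is the mean over loci. The per-locus consensus rules of Ferreira da Silva et al. (2014) are not reproduced here; the consensus is taken as input, and failed replicates are coded `None`. Function names are hypothetical.

```python
QI_THRESHOLD = 0.55  # genotypes with mean QI below this were excluded


def locus_qi(replicates, consensus):
    """Fraction of replicates whose genotype matches the consensus.

    Genotypes are unordered allele pairs, so (150, 154) equals (154, 150);
    a failed replicate (None) never matches.
    """
    if not replicates:
        return 0.0
    target = tuple(sorted(consensus))
    matches = sum(r is not None and tuple(sorted(r)) == target for r in replicates)
    return matches / len(replicates)


def sample_qi(loci):
    """Mean QI across loci; `loci` maps locus -> (replicates, consensus)."""
    values = [locus_qi(reps, cons) for reps, cons in loci.values()]
    return sum(values) / len(values)


def passes_qi(loci, threshold=QI_THRESHOLD):
    """True if the sample's mean QI reaches the retention threshold."""
    return sample_qi(loci) >= threshold
```

With four replicates per sample, a locus where only two replicates match the consensus scores 0.5, which on its own would fall below the 0.55 cut-off.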
The probability of identity (PI) and the probability of identity among siblings (PIsibs) (Waits et al., 2001), estimated using GenAlEx v.6.503 (Peakall & Smouse, 2006), were 1.5 × 10⁻¹¹ and 8.9 × 10⁻⁵, respectively, which in principle allows unique genotypes to be distinguished using six loci.
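The two identity probabilities can be computed from allele frequencies with the standard per-locus expressions (Waits et al., 2001), then multiplied across loci under the assumption of independence. This is a sketch of those formulas, not GenAlEx's code, and it omits any small-sample correction software may apply.

```python
from itertools import accumulate
from operator import mul


def pi_locus(freqs):
    """PI for unrelated individuals: sum(p_i^4) + sum_{i<j} (2 p_i p_j)^2."""
    homo = sum(p ** 4 for p in freqs)
    hetero = sum((2 * freqs[i] * freqs[j]) ** 2
                 for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return homo + hetero


def pi_sibs_locus(freqs):
    """PIsibs = 0.25 + 0.5*sum(p^2) + 0.5*(sum(p^2))^2 - 0.25*sum(p^4)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative(per_locus_values):
    """Multi-locus probability after adding loci one at a time."""
    return list(accumulate(per_locus_values, mul))
```

Because the cumulative product depends on the order in which loci are added, the number of loci needed to reach a given PIsibs depends on that order; PIsibs is always the more conservative of the two measures.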
Using MicroChecker v.2.2.3 (van Oosterhout et al., 2006), we found no evidence of genotyping errors (typing errors, large-allele dropout, or heterozygote deficiency due to null alleles), apart from locus D2S1326 in CLNP, which showed an excess of homozygotes. We retained this locus in the final dataset because we found no significant departures from Hardy-Weinberg equilibrium per locus after Bonferroni correction when geographic populations were analyzed separately. The chimpanzee population of Guinea-Bissau does not display significant population structure when assessed with individual-based Bayesian clustering algorithms such as STRUCTURE (Borges, 2017, estimated K=1).
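The Bonferroni correction divides the significance level by the number of tests m, so a per-locus Hardy-Weinberg test is called significant only if p < α/m. The sketch below assumes one test per locus within a population (m = 10 for 10 loci) and α = 0.05; both are illustrative assumptions, and the p-values in any usage are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the tests that remain significant after Bonferroni correction.

    `p_values` maps a test name (e.g. a locus) to its p-value; each is
    compared with alpha / m, where m is the number of tests.
    """
    m = len(p_values)
    threshold = alpha / m
    return {name: p for name, p in p_values.items() if p < threshold}
```

For 10 loci the per-test threshold is 0.05 / 10 = 0.005, so a nominal p = 0.01 at a single locus would not count as a significant departure.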
Changes after Oct 7, 2025: A file was added. This file contains a dataset of unique genotypes of western chimpanzees in Guinea-Bissau (N=143 genotypes) for 10 autosomal loci.
