Data from: Investigating the spatial, demographic, and genetic structures of Cylicodiscus gabunensis Harms, a light-demanding African timber species

Bhasin, Oriana (1); Doucet, Jean-Louis (2); Ndonda Makemba, Romaric (2); Gillet, Jean-François (3); Deblauwe, Vincent (4); Sonke, Bonaventure (5); Hardy, Olivier (1)

Research facility: Evolutionary Biology & Ecology

Published Nov 02, 2023 on Dryad. https://doi.org/10.5061/dryad.0zpc8674f

Data files

Nov 02, 2023 version files (211.61 KB in total):

Cgabunensis_Individuals_SiteA.csv (7.02 KB)
Cgabunensis_Individuals_SiteB.csv (1.32 KB)
Cgabunensis_Individuals_SiteC.csv (3.66 KB)
NMpi_InputData_SiteA.txt (56.74 KB)
NMpi_InputData_SiteC.txt (80.01 KB)
README.md (3.15 KB)
SPAGeDi_InputData_SiteA.txt (24.02 KB)
SPAGeDi_InputData_SiteB.txt (8.81 KB)
SPAGeDi_InputData_SiteC.txt (26.89 KB)

Abstract

Most Central African rainforest canopies consist of light-demanding tree species that hold high commercial value but also suffer locally from regeneration deficits, raising concerns about the sustainability of logging. Regeneration is influenced by factors such as past perturbations (including human activity), mating systems, and seed and pollen dispersal processes, which shape the demographic, spatial, and genetic structures of populations. To better understand these interactions, we studied the spatial distribution and trunk diameter structure of Cylicodiscus gabunensis (Fabaceae), a wind-dispersed, insect-pollinated timber species, in three plots of 400 to 839 ha situated in different environmental contexts (e.g. forest types and elephant densities) across Central Africa. We also genotyped adults and juveniles with microsatellite markers to analyze the spatial genetic structure of each population and, using the ‘neighborhood model’, to infer the selfing rate, seed and pollen dispersal capacities, and selection gradients. The selfing rate was low (3–4 %), and seed dispersal distances (ds = 184 m) were much shorter than pollen dispersal distances (dp > 2 km). The three populations displayed contrasting spatial, demographic, and genetic structures. One population showed no spatial aggregation or genetic structure, and a multimodal diameter structure indicating pulses of regeneration events. The other two populations showed strong spatial aggregation and genetic structure. One of them exhibited a unimodal diameter structure indicating a single ancient pulse of regeneration, while the other displayed a 'reverse J-shaped' diameter structure, typical of ongoing regeneration. In the latter, reproductive success appeared leptokurtic: three mother trees accounted for over 90 % of the regeneration, and no tree below the minimum cutting diameter applied by logging companies had offspring. The idiosyncratic nature of the population characteristics observed in C. gabunensis suggests that sustainable management requires a nuanced approach: protecting productive seed trees in areas where natural regeneration is occurring, and actively supporting regeneration in areas with deficits, especially where elephant densities are low.


The dataset comprises files with information on the Cylicodiscus gabunensis individuals sampled in two 400 ha forest plots (sites A and B) and one 839 ha plot (site C). It also includes SPAGeDi input data files for the three plots and NMpi input data files for sites A and C only. These data can be used to examine diameter distributions and spatial patterns, spatial genetic structure, mating system, gene flow, and the determinants of reproductive success.

Description of the data and file structure

Three types of data files are provided.

The first type consists of lists of the Cylicodiscus gabunensis individuals that were sampled (one file per site). The first row of each file is the header. The columns contain the individual name (first column), longitude (x coordinate, second column), latitude (y coordinate, third column), diameter at breast height (dbh, fourth column), and, for site A only, canopy dominance status (Dawkins' crown illumination index, fifth column). These files can be used to map size classes, with individuals categorized as saplings (dbh < 10 cm), juveniles (10 cm ≤ dbh < 20 cm), and trees with 20 cm ≤ dbh < 200 cm grouped in 10 cm intervals. They can also be used to plot histograms of the frequency distribution per dbh class and to calculate stand density. Additionally, they allow the spatial distribution of trees (dbh ≥ 20 cm) to be characterized with the 'spatstat' R package (Baddeley et al., 2015) through the pair correlation function g (PCF), a distance-dependent correlation function derived from the widely used K-function (Ripley, 1976) as g(r) = K'(r)/(2πr).
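The size classes, stand density and PCF described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' workflow (which used spatstat in R). It assumes x and y are projected coordinates in metres (geographic longitude/latitude would first need projecting), and the naive PCF uses a simple box kernel with no edge correction, so near the plot boundary it is biased low, unlike spatstat's own estimator. All function names are illustrative.

```python
import numpy as np


def dbh_class(dbh_cm):
    """Size class of one stem, following the classes described above."""
    if dbh_cm < 10:
        return "sapling"
    if dbh_cm < 20:
        return "juvenile"
    if dbh_cm < 200:
        lower = int(dbh_cm // 10) * 10  # 10 cm intervals from 20 cm to < 200 cm
        return f"{lower}-{lower + 10} cm"
    return ">=200 cm"  # outside the classes described above (assumption)


def stand_density(dbh_cm, area_ha, min_dbh=20.0):
    """Number of stems per hectare with dbh >= min_dbh."""
    dbh_cm = np.asarray(dbh_cm, dtype=float)
    return np.count_nonzero(dbh_cm >= min_dbh) / area_ha


def naive_pcf(xy, r, h, window_area):
    """Box-kernel estimate of g(r) without edge correction.

    xy: (n, 2) projected coordinates; r: distance; h: bandwidth;
    window_area: plot area in the same squared units as xy.
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    diff = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]
    n_pairs = np.count_nonzero(np.abs(d - r) < h / 2)  # unordered pairs in the ring
    # Ordered pairs (2 * n_pairs) relative to the expectation under complete
    # spatial randomness: n (n - 1) * ring area (2*pi*r*h) / window_area
    return window_area * 2 * n_pairs / (n * (n - 1) * 2 * np.pi * r * h)
```

For a 400 ha plot measured in metres, `window_area` would be 4e6 m², and `naive_pcf(xy, r=50.0, h=10.0, window_area=4e6)` estimates g at 50 m, where values above 1 suggest aggregation.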

The second type consists of the NMpi input data files. The header line of each file has the structure np no nl nf, where np is the number of parents, no the number of progeny, nl the number of loci (nuclear genetic markers only), and nf the number of phenotypic characters. Each subsequent parent or progeny line starts with 0 (indicating the generation), followed by the ID, X and Y coordinates, cytotype, genotype, phenotypic characters, and femaleness index.
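A minimal Python sketch for reading these files, assuming whitespace-separated fields, a first line holding the four integers np no nl nf, and individual lines whose first four fields are the generation flag, ID, X and Y. The function names and the example values in the test are hypothetical; the remaining fields (cytotype, genotype, phenotypic characters, femaleness index) are left unparsed because their exact layout depends on nl and nf.

```python
def parse_nmpi_header(line):
    """Split the header 'np no nl nf' into named integer counts."""
    n_parents, n_progeny, n_loci, n_traits = (int(tok) for tok in line.split()[:4])
    return {"np": n_parents, "no": n_progeny, "nl": n_loci, "nf": n_traits}


def parse_nmpi_record(line):
    """Leading fields of one individual line: generation flag, ID, X, Y.

    The remaining tokens (cytotype, genotype, phenotypic characters,
    femaleness index) are returned unparsed in 'rest'.
    """
    tokens = line.split()
    return {
        "generation": int(tokens[0]),
        "id": tokens[1],
        "x": float(tokens[2]),
        "y": float(tokens[3]),
        "rest": tokens[4:],
    }
```

Applied to the first line of NMpi_InputData_SiteA.txt, `parse_nmpi_header` would return the parent, progeny, locus and trait counts for site A.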

The third type consists of the SPAGeDi input data files, which follow a specific format. The first line contains six numbers: the number of individuals, the number of categories, the number of spatial coordinates, the number of loci, the number of digits used to code each allele, and the ploidy level. The second line defines the distance intervals, and the third line lists the column labels. From the fourth line onward, each line gives the data for one individual: its name, category, coordinates (either Cartesian coordinates or latitude and longitude), and genotypes at each locus. Each file ends with the word "END". The SPAGeDi format also allows optional lines for dominant markers or polyploid data.
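The layout described above can be read with a short Python sketch. It assumes tab- or space-separated fields; a category column only when the number of categories is non-zero; a second line made of the number of distance intervals followed by their upper limits; and codominant genotypes written as concatenated, zero-padded allele codes (e.g. 0812 for alleles 8 and 12 with two digits per allele and ploidy 2, with 0 meaning missing). It ignores the optional dominant-marker and polyploid lines, and `read_spagedi` is a hypothetical helper, not part of SPAGeDi.

```python
def read_spagedi(text):
    """Parse the main block of a SPAGeDi input file given as a string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_ind, n_cat, n_coord, n_loci, n_digits, ploidy = (
        int(v) for v in lines[0].split()[:6]
    )
    # Assumption: a negative coordinate count may flag latitude/longitude;
    # only its magnitude is used here.
    n_coord = abs(n_coord)
    upper_bounds = [float(v) for v in lines[1].split()[1:]]  # skip the class count
    labels = lines[2].split()

    individuals = []
    for ln in lines[3:]:
        tok = ln.split()
        if tok[0].upper() == "END":
            break
        pos = 1
        category = None
        if n_cat > 0:
            category = tok[pos]
            pos += 1
        coords = [float(v) for v in tok[pos:pos + n_coord]]
        pos += n_coord
        genotypes = []
        for g in tok[pos:pos + n_loci]:
            g = g.zfill(n_digits * ploidy)  # "0" (missing) -> "0000"
            alleles = [int(g[i:i + n_digits])
                       for i in range(0, n_digits * ploidy, n_digits)]
            genotypes.append(alleles)
        individuals.append({"name": tok[0], "category": category,
                            "coords": coords, "genotypes": genotypes})

    if len(individuals) != n_ind:
        raise ValueError(f"expected {n_ind} individuals, found {len(individuals)}")
    return {"upper_bounds": upper_bounds, "labels": labels,
            "individuals": individuals}
```

Checking the parsed individual count against the first header number catches truncated or misaligned files early.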

Methods

In a 400 ha forest plot in Cameroon (site A), all C. gabunensis trees with a diameter at breast height (dbh) ≥ 10 cm were systematically inventoried and georeferenced with a GPS. To ensure exhaustive sampling, six field workers, evenly spaced between two parallel lines 100 m apart, moved abreast and sampled every individual they encountered. Individuals with dbh < 10 cm were also inventoried when encountered, although their inventory is not expected to be exhaustive. The dbh of each sampled tree was measured, and its dominance status, indicating whether the tree crown lies below, within, or above the surrounding canopy layer (dominated, co-dominant, or dominant), was also recorded. For each individual, a sample of a few cm² of cambium or a leaf was collected and immediately dried in silica gel to preserve the DNA.

We used the ‘neighborhood model’ (a parentage model) to estimate seed and pollen dispersal kernels, the selfing rate, and the effects of phenotypic characters on reproductive success. The model is implemented in the software NMπ (Chybicki, 2018), which requires genotype data on progeny and their putative parents, their spatial coordinates, and, optionally, quantitative phenotypic characters of the trees to estimate their effects on male and female reproductive success. Saplings and juveniles (dbh < 20 cm) were treated as dispersed progeny, and trees with dbh ≥ 20 cm as potential parents. Such data provide insights into female reproductive success and seed dispersal.
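The progeny/parent split can be sketched as follows, assuming each genotyped individual is a dict with a `dbh` field in cm. The helper names are hypothetical, and the header string simply follows the np no nl nf layout described for the NMpi input files; individuals without a dbh value would need handling not shown here.

```python
PARENT_MIN_DBH_CM = 20.0  # potential parents: dbh >= 20 cm


def split_parents_progeny(individuals, threshold_cm=PARENT_MIN_DBH_CM):
    """Return (candidate parents, dispersed progeny) based on dbh."""
    parents = [ind for ind in individuals if ind["dbh"] >= threshold_cm]
    progeny = [ind for ind in individuals if ind["dbh"] < threshold_cm]
    return parents, progeny


def nmpi_header(parents, progeny, n_loci, n_traits):
    """Header line 'np no nl nf' for an NMpi input file."""
    return f"{len(parents)} {len(progeny)} {n_loci} {n_traits}"
```

Keeping the threshold as a named constant makes it easy to test how sensitive the results are to the 20 cm cut-off.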

Pollen and seed dispersal were characterized by: the selfing rate s; the proportions of pollen and seed immigration, m_p and m_s (i.e. the father (m_p) or both parents (m_s) were located outside the sampled plot or were missed during sampling); and four parameters of the exponential-power-von Mises distribution used to model the forward dispersal kernel (i.e. the distribution of seed or pollen rain around parents). These kernel parameters are: the mean pollen and seed dispersal distances, d_p and d_s; the shape parameters of the dispersal distribution, b_p and b_s (e.g. b = 2 for a Gaussian distribution, b = 1 for an exponential distribution, b < 1 for a fat-tailed distribution); the intensity of directionality (anisotropy) in dispersal, k_p and k_s (k = 0 under isotropic dispersal); and the azimuth of the prevailing dispersal direction, a_p and a_s (when k_p > 0 or k_s > 0). The effects of phenotypic characters on female and male reproductive success (selection gradients) were also assessed with NMπ. This analysis used centered and standardized dbh values and, for site A, centered and standardized dominance status (high values for trees dominating the surrounding canopy). Kendall's rank correlation coefficient (τ_b) was used to examine the relationship between dbh and dominance status.
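To make the kernel parameters concrete, the sketch below evaluates a commonly used exponential-power-von Mises parameterisation: an isotropic exponential-power density in distance r, with scale a derived from the mean distance d and shape b, multiplied by a von Mises factor in azimuth θ with intensity k and prevailing direction α, i.e. p(r, θ) = b / (2π a² Γ(2/b)) · exp(−(r/a)^b) · exp(k cos(θ − α)) / I0(k), with d = a Γ(3/b) / Γ(2/b). We assume, without having checked NMπ's source, that this is equivalent to the kernel fitted by NMπ; the function names are hypothetical.

```python
import math

import numpy as np


def scale_from_mean(d, b):
    """Scale a such that the mean dispersal distance equals d.

    For the exponential-power kernel, mean distance = a * Gamma(3/b) / Gamma(2/b).
    """
    return d * math.gamma(2 / b) / math.gamma(3 / b)


def dispersal_density(r, theta, d, b, k=0.0, alpha=0.0):
    """Density per unit area at distance r (m) and azimuth theta (rad).

    d: mean dispersal distance; b: shape (2 Gaussian, 1 exponential, <1 fat-tailed);
    k: anisotropy intensity (0 = isotropic); alpha: prevailing direction (rad).
    """
    a = scale_from_mean(d, b)
    r = np.asarray(r, dtype=float)
    theta = np.asarray(theta, dtype=float)
    radial = b / (2 * math.pi * a ** 2 * math.gamma(2 / b)) * np.exp(-(r / a) ** b)
    # von Mises factor: integrates to 2*pi over theta, so multiplying by it
    # keeps the 2D density normalised to 1 for any k.
    angular = np.exp(k * np.cos(theta - alpha)) / np.i0(k)
    return radial * angular
```

For instance, with the seed dispersal distance reported in the abstract (184 m) and an assumed exponential shape (b = 1), `scale_from_mean(184.0, 1.0)` gives a scale of 92 m.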

Once all parameters were estimated, NMπ provided the most likely mother and father of each progeny individual, with an associated probability, accounting simultaneously for the spatial, genotypic, and phenotypic data. Considering only progeny (saplings and juveniles) for which a mother or father was inferred with a probability P ≥ 0.8, the dbh structure was compared among all trees, inferred mothers, and inferred fathers.
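A minimal sketch of this filtering step. It assumes NMπ's parentage output has been exported as tuples (offspring ID, most likely mother, its probability, most likely father, its probability); this record layout and the helper names are hypothetical, not NMπ's actual output format. The sketch counts offspring per inferred parent at P ≥ 0.8, collects the dbh of inferred parents for comparison with all trees, and computes the share of assigned offspring produced by the most prolific mothers, the quantity behind the abstract's observation that three mothers accounted for over 90 % of the regeneration at one site.

```python
from collections import Counter

MIN_PROB = 0.8  # assignment threshold used above


def count_assigned(assignments, min_prob=MIN_PROB):
    """Offspring counts per inferred mother and father with P >= min_prob."""
    mothers, fathers = Counter(), Counter()
    for _offspring, mother, p_mother, father, p_father in assignments:
        if mother is not None and p_mother >= min_prob:
            mothers[mother] += 1
        if father is not None and p_father >= min_prob:
            fathers[father] += 1
    return mothers, fathers


def parent_dbh(counts, dbh_by_id):
    """Sorted dbh values of trees with at least one assigned offspring."""
    return sorted(dbh_by_id[tree] for tree in counts)


def top_share(counts, k=3):
    """Fraction of assigned offspring produced by the k most prolific parents."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total
```

Comparing `parent_dbh(mothers, dbh_by_id)` with the dbh of all trees shows whether reproduction is concentrated in the largest size classes.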

The fine-scale spatial genetic structure (FSGS) in site A was characterized by the decay of the kinship coefficient F_ij between trees (dbh ≥ 20 cm) with spatial distance (kinship–distance curve), following the procedure described by Vekemans & Hardy (2004) and using SPAGeDi ver. 1.5d (Hardy & Vekemans, 2002). For each pair of individuals i and j from the same population, F_ij was estimated using J. Nason's estimator (Loiselle et al., 1995), and the F_ij values were regressed on the logarithm of the spatial distance between individuals, ln(d_ij), giving the regression slope b_Ld. The latter measures the strength of FSGS through the statistic Sp = −b_Ld/(1 − F_1), where F_1 is the mean kinship coefficient between individuals in the first distance class; Sp synthesizes the decay of the kinship coefficient with distance (Vekemans & Hardy, 2004). To visualize FSGS, F_ij values were also averaged over a set of non-overlapping distance intervals (delimited by 50, 100, 200, 300, 500, 700, 1000, and 1500 m) to obtain the F(r) curve. Standard errors were obtained by jackknifing over loci (i.e. deleting the information from one locus at a time). To test for FSGS, the spatial positions of trees were permuted 999 times to draw 95 % confidence intervals of the F(r) curve under the null hypothesis of a random spatial distribution of genotypes.
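The following Python sketch reproduces the logic of the kinship–distance analysis, not SPAGeDi itself. It takes a precomputed matrix of pairwise kinship coefficients F_ij (computing Nason's estimator is outside its scope), averages F_ij within the distance intervals listed above to obtain F(r), regresses F_ij on ln(d_ij) over all pairs to obtain b_Ld, computes Sp = −b_Ld/(1 − F_1), and builds a 95 % envelope by permuting spatial positions. How SPAGeDi treats pairs beyond the last interval, or restricts the regression range, may differ; the function names are hypothetical and jackknifing over loci is not shown.

```python
import numpy as np

BOUNDS_M = [50, 100, 200, 300, 500, 700, 1000, 1500]  # upper limits of distance classes


def _pairs(xy, kinship):
    """Distances and kinship coefficients for all pairs i < j."""
    xy = np.asarray(xy, dtype=float)
    iu = np.triu_indices(len(xy), k=1)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))[iu]
    return dist, np.asarray(kinship, dtype=float)[iu]


def f_r_curve(xy, kinship, bounds=BOUNDS_M):
    """Mean F_ij per distance class; class c holds bounds[c-1] < d <= bounds[c]."""
    dist, fij = _pairs(xy, kinship)
    cls = np.digitize(dist, bounds, right=True)  # pairs beyond the last bound are dropped
    return np.array([fij[cls == c].mean() if np.any(cls == c) else np.nan
                     for c in range(len(bounds))])


def sp_statistic(xy, kinship, bounds=BOUNDS_M):
    """Return (b_Ld, Sp) with Sp = -b_Ld / (1 - F_1)."""
    dist, fij = _pairs(xy, kinship)
    keep = dist > 0  # ln(0) is undefined for co-located trees
    b_ld = np.polyfit(np.log(dist[keep]), fij[keep], 1)[0]
    f1 = f_r_curve(xy, kinship, bounds)[0]
    return b_ld, -b_ld / (1.0 - f1)


def permutation_envelope(xy, kinship, bounds=BOUNDS_M, n_perm=999, seed=0):
    """2.5 % and 97.5 % quantiles of F(r) after permuting tree positions."""
    rng = np.random.default_rng(seed)
    xy = np.asarray(xy, dtype=float)
    curves = [f_r_curve(xy[rng.permutation(len(xy))], kinship, bounds)
              for _ in range(n_perm)]
    return np.nanpercentile(np.array(curves), [2.5, 97.5], axis=0)
```

Permuting the rows of the coordinate array while keeping the kinship matrix fixed reassigns genotypes to random positions, which is the null hypothesis described above.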
