
Methodological advances for hypothesis‐driven ethnobiology

Cite this dataset

Gaoue, Orou G. et al. (2021). Methodological advances for hypothesis‐driven ethnobiology [Dataset]. Dryad.


Ethnobiology as a discipline has evolved recently to increasingly embrace theory-inspired and hypothesis-driven approaches to study why and how local people choose the plants and animals they interact with and use for their livelihood. However, testing complex hypotheses or a network of ethnobiological hypotheses is challenging, particularly for datasets with non-independent observations due to species phylogenetic relatedness or socio-relational links between participants. Further, to fully account for the dynamics of local ecological knowledge, it is important to account for the spatially explicit distribution of knowledge, the changes in knowledge, and knowledge transmission and use. To promote the use of advanced statistical modeling approaches that address these limitations, we synthesize methodological advances for hypothesis-driven research in ethnobiology while highlighting the need for more figures than tables and more tables than text in the ethnobiological literature. We present the ethnobiological motivations for conducting generalized linear mixed-effect modeling, structural equation modeling, phylogenetic generalized least squares, social network analysis, species distribution modeling, and predictive modeling. For each element of the proposed ethnobiologist's quantitative toolbox, we present practical applications along with scripts for widespread implementation. Because these statistical modeling approaches are rarely taught in most ethnobiological programs but are essential for careers in academia or industry, it is critical to promote specialized workshops organized during annual professional meetings and focused on these advanced methods. By embracing these quantitative modeling techniques without sacrificing qualitative approaches, which provide essential context, ethnobiology will further progress toward an expansive interaction with other disciplines.


These datasets include portions of data from the authors' research, used for illustrative purposes, and data from published sources.

Qian, H. & Y. Jin. (2016) An updated megaphylogeny of plants, a tool for generating plant phylogenies and an analysis of phylogenetic community structure. Journal of Plant Ecology 9(2): 233–239

Coe MA and Gaoue OG (2021) Phylogeny explains why therapeutically redundant plant species are not necessarily facing greater use pressure. People and Nature, DOI: 10.1002/pan3.10216

Bond, Matthew; Gaoue, Orou (2020), Adjacency matrices and nodal attributes for prestige and homophily predict network structure for social learning of medicinal plant knowledge, Dryad, Dataset, 

Usage notes

1. R Scripts

This includes 6 independent R scripts developed to show how to develop and implement advanced statistical models in ethnobiology.

Code R1. R script for generalized linear model and comparison with generalized linear mixed effect models.

Code R2. R script for using piecewiseSEM to develop a structural equation model of the influence of socio-demographic traits and urbanization on local people's knowledge of plants.

Code R3. R script for phylogenetic generalized least squares (PGLS) testing how controlling for species phylogenetic relatedness changes the effects of species preference and therapeutic redundancy on use pressure. This script calls the “PhyloMaker.R” function, which is included.

Code R4. R script for phylogenetic signal test for species use pressure, preference and therapeutic redundancy across plant species used by local people.

Code R5. R script for social network analysis using exponential random graph modeling (ERGM) to investigate the role of homophily in shaping knowledge distribution.

Code R6. R script to implement train/test split and cross-validation for ethnobiological data.
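The idea behind Code R6 can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the deposited R script; the function names, the 25% test fraction, the seed, and the use of 144 rows (the size reported for Dataset 1) are assumptions chosen purely for illustration.

```python
import random


def train_test_split(n_rows, test_fraction=0.2, seed=42):
    """Randomly partition row indices 0..n_rows-1 into (train, test)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5, seed=42):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Each index appears in exactly one validation fold.
    """
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


# Illustrative only: 144 rows, held-out test set of 25%.
train_idx, test_idx = train_test_split(144, test_fraction=0.25)

# Cross-validate within the training rows; the test rows stay untouched
# until the final model evaluation.
folds = list(k_fold_indices(train_idx, k=5))
```

Keeping the test rows out of the cross-validation loop is the point of combining the two steps: the folds are used to tune or compare models, and the held-out rows give a single final estimate of predictive performance.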

2. Datasets

To use these R scripts, you need the following datasets:

Dataset 1: Data on the effect of socio-demographic traits and urbanization on local people's knowledge of plants (data_1_ethno.csv). This dataset includes 144 observations of 9 variables, including “nb_species”, the number of plants freelisted by an individual. This dataset is used for analyses in Code R1 and R2.

Dataset 2: This dataset is used for analyses in Code R3 and R4.

  • Data on the list of plant species and their family to build the phylogeny (data_2_phylo.csv)
  • Phylogeny which is pruned from the general phylogeny (data_3_shipibo_phylo.nex)
  • Ethnobiological data on which to conduct PGLS (data_4_shipibo_redundancy.csv)

To build the phylogeny used for the PGLS, source the data and function from Qian & Jin (2016). This includes:

  • The general phylogeny (Qian_PhytoPhylo.tre)
  • The nodes (Qian_nodes.csv)
  • The R function (Qian_S.phyloMaker.R)

Dataset 3: This dataset is used for Code R4.

To test for the phylogenetic signal you need:

  • Data on the phylogeny constructed for the plant list, here as an R script (data_5_shipibo_tree.txt)
  • Ethnobiological data that includes a column of the trait(s) or variable(s) for which one wants to test for signal (data_4_shipibo_redundancy.csv)
  • Ethnobiological data for the NRI and NTI tests (data_6_ethno_shipibo.txt)

Dataset 4:  Data available at: Bond, Matthew; Gaoue, Orou (2020), Adjacency matrices and nodal attributes for prestige and homophily predict network structure for social learning of medicinal plant knowledge, Dryad, Dataset, 

  • Directed and asymmetric adjacency matrix of incoming knowledge sharing (data_7_adj_knowledge.csv)
  • Undirected and symmetric adjacency matrix of spouses (data_8_adj_married.csv)
  • Node attributes (data_9_node_attributes.csv)
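The difference between the two adjacency matrices listed above can be sketched on toy data. This is a Python illustration only: the 3-person matrices are invented, the helper names are hypothetical, and the orientation (entry [i][j] = 1 means a tie from person i to person j) is an assumption that should be checked against the actual files before any analysis.

```python
def is_symmetric(matrix):
    """True if matrix[i][j] == matrix[j][i] for every pair (undirected ties)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def in_degree(matrix):
    """Column sums: number of incoming ties per node.

    Assumes rows are senders and columns are receivers (an assumption,
    not something stated in the dataset description).
    """
    n = len(matrix)
    return [sum(matrix[i][j] for i in range(n)) for j in range(n)]


# Toy directed network: person 0 -> 1, person 1 -> 2, person 2 -> 1.
knowledge = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

# Toy undirected network: persons 0 and 1 are spouses.
married = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

knowledge_is_directed = not is_symmetric(knowledge)
married_is_undirected = is_symmetric(married)
incoming_ties = in_degree(knowledge)
```

A symmetry check of this kind is a quick sanity test after loading the matrices: a directed knowledge-sharing network is expected to fail it, whereas a spouse network should pass.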

Dataset 5: Data available at:

  • Census income data set (adult.csv), used here to illustrate how a train/test split analysis can be done in ethnobiology.