Data from: Institutional complexity emerges from socioecological complexity in small-scale human societies

Hamilton, Marcus; Walker, Robert; Buchanan, Briggs

Published Apr 18, 2024 on Dryad. https://doi.org/10.5061/dryad.gb5mkkwz3

Data files

Apr 18, 2024 version files (61.42 KB total):

README.md (24.91 KB)

WNAI_data.xlsx (36.51 KB)

Abstract

Human lifestyles vary enormously over time and space, and so understanding the origins of this diversity has always been a central focus of anthropology. A major source of this cultural variation is variation in institutional complexity: the cultural packages of rules, norms, ontologies, and expectations passed down through societies across generations. In this paper we study the emergence of institutions in small-scale societies. There are two primary schools of thought. The first is that institutions emerge top-down as rules are imposed by elites on their societies in order to gain asymmetrical access to power, resources, and influence over others through coercion. The second is that institutions emerge bottom-up to facilitate interactions within populations as they seek collective solutions to adaptive problems. Here, we use Bayesian networks to infer the causal structure of institutional complexity in 172 small-scale societies across ethnohistoric western North America, reflecting the wide diversity of indigenous lifestyles across this vast region immediately prior to European colonization. Our results suggest that institutional complexity emerges from underlying socioecological complexity because institutions are solutions to coordination problems in more complex environments where human-environment interactions require increased management.


These data were compiled from Jorgensen JG. 1980 Western Indians: Comparative environments, languages, and cultures of 172 Western American Indian tribes (New York: W. H. Freeman), and restructured following the Methods outlined in the main paper accompanying this dataset.

Variable description

society_name_dplace: The name of the society in D-PLACE

society_xd_id: ID number for each society

language_glottocode: The Glottolog code (glottocode) of the society's language

language_name: The name of the language

language_family: The family of the language

group: The common name of the group

Lat: Latitude

Long: Longitude

phyla: Ordinal variable representing the language phyla

technology: Ordinal variable representing the number of technological traits

Subsistence: Ordinal variable representing the number of subsistence traits

Politics: Ordinal variable representing the number of sociopolitical traits

Agricultural production: Ordinal variable representing the number of agricultural traits

material_culture: Ordinal variable representing the number of material culture traits

div_of_labor: Ordinal variable representing the division of labor

econ_dist: Ordinal variable representing the number of ceremonial traits

property: Ordinal variable representing the number of property rights traits

marriage: Ordinal variable representing the number of marriage norms traits

family: Ordinal variable representing the number of marriage norms traits

descent: Ordinal variable representing the number of marriage norms traits

war: Ordinal variable representing the number of warfare traits

ceremony: Ordinal variable representing the number of ritual traits

lifecycle: Ordinal variable representing the number of ritual traits

Spirits: Ordinal variable representing the number of supernatural traits

shamans: Ordinal variable representing the number of supernatural traits

illness: Ordinal variable representing the number of supernatural traits

magic: Ordinal variable representing the number of supernatural traits

pop density: Ordinal variable representing the level of population density

Description of the data and file structure

The Western Indians dataset contains information on a wide diversity of lifestyle traits, classified by Jorgensen into eight primary categories that are further divided into subcategories: 1) technology and material culture; 2) subsistence economy; 3) economic organization; 4) social and kinship organization; 5) political organization; 6) ceremonialism and life cycle; 7) spirit quest, shamanism, causes of illness, magic; and 8) settlement and demography. Starting from Jorgensen’s primary categories, we re-categorized the data because several subcategories contained important variables worth treating as independent categories. Our category labels and their relation to Jorgensen’s original categories and subcategories are presented in Table SI1. We clustered Jorgensen’s categories into 13 categories for our analysis: 1) technology; 2) material culture; 3) subsistence complexity; 4) division of labor; 5) economic complexity; 6) property; 7) marriage norms; 8) sociopolitical complexity; 9) war; 10) ritual; 11) the supernatural; 12) agricultural intensity; and 13) population density. Categories 1-11 are binomial variables built from traits marked as present or absent (see below for more detail), whereas categories 12 and 13 are ordinal, measuring increasing reliance on agriculture and population density, respectively.

Traits

Our goal was to quantify the complexity of the 13 trait categories across the 172 populations. We first isolated all traits in the dataset that could be dichotomized into binary variables, that is, all traits that could be either present or absent in a population. In many cases, Jorgensen coded a variable as one of multiple states; for these variables, we re-coded any state indicating that the trait occurs as present. For example, the trait “public ceremony associated with warfare” might be coded as pre-conflict, post-conflict, both, or neither in the original dataset, and so we code it as either present or absent in a society. Traits that could not be dichotomized in this way were excluded, which removed 35 variables from the dataset (see Table SI1). For example, traits such as “dominant house type”, “type of weaving”, or “place of storage” were excluded, as these cannot be absent, nor do they contribute in any meaningful sense to cultural complexity. Our cleaned dataset included 258 traits.
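A minimal Python sketch of this recoding rule follows. The state labels are hypothetical stand-ins (Jorgensen's actual codes may be numeric): any state recording that the trait occurs maps to present (1), an explicit absence state maps to absent (0), and missing values stay missing so they can be imputed later.

```python
# Hypothetical state labels; Jorgensen's original codes may differ.
ABSENT_STATES = {"neither", "absent"}

def dichotomize(state):
    """Return 1 if the coded state means the trait occurs, 0 if it is absent,
    and None if the value is missing (left for later imputation)."""
    if state is None:
        return None
    return 0 if state in ABSENT_STATES else 1

# "Public ceremony associated with warfare", coded as one of several states
codes = ["pre-conflict", "post-conflict", "both", "neither", None]
binary = [dichotomize(c) for c in codes]
print(binary)  # [1, 1, 1, 0, None]
```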

Trait categories

Categories are composed of traits, and the number of traits present within a category indexes the complexity of that category (see below for more detail). Thus, in our network, each node is a category whose complexity is measured by the number of traits present in that category. Edges represent the conditional probabilities linking the nodes within the network. We label each of the 13 trait categories as demographic, socioecological, or institutional, but note that these labels simply keep track of the classes of categories and do not enter into the statistics. Our single demographic category is population density. Socioecological categories were identified as those directly associated with the ecology and technology of human-environment interactions: technology; material culture; subsistence complexity; and agricultural intensity. We label categories as institutions if they represent the rules, norms, and customs that define a society and are likely transmitted across generations. Categories representing institutions include division of labor; economic complexity; property rights; marriage norms; sociopolitical complexity; war; ritual; and the supernatural. We do not explicitly consider any exogenous variables, such as measures of the environment, as we are interested in understanding how the interplay of the trait categories endogenously constitutes the lifestyle diversity we observe in our data.
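As a sketch, assuming each society's traits are stored as 0/1 values grouped by category (a hypothetical layout, not necessarily that of WNAI_data.xlsx), a category's complexity score is the count of traits coded 1. The class labels are included only as bookkeeping, mirroring the note that they do not enter the statistics; the dictionary and function names are illustrative.

```python
# Class label for each of the 13 categories (bookkeeping only, not used in stats)
CATEGORY_CLASS = {
    "population density": "demographic",
    "technology": "socioecological",
    "material culture": "socioecological",
    "subsistence complexity": "socioecological",
    "agricultural intensity": "socioecological",
    "division of labor": "institutional",
    "economic complexity": "institutional",
    "property rights": "institutional",
    "marriage norms": "institutional",
    "sociopolitical complexity": "institutional",
    "war": "institutional",
    "ritual": "institutional",
    "supernatural": "institutional",
}

def category_score(traits):
    """Complexity of one binomial category in one society: traits coded 1."""
    return sum(traits)

# Made-up presence/absence values for one society's war traits
war_traits = [1, 0, 1, 1, 0, 0, 1]
print(category_score(war_traits))  # 4
```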

Data structure

After cleaning and organizing the data, our dataset had the following structure (see Figure 2). Each of the 172 societies in our study is characterized by a lifestyle i, where i = 1, 2, ..., 172, represented by a random vector of 13 discrete random variables: 11 represent the states of the trait categories in that population, and the other two are the measures of population density and agricultural intensity. Category j, where j = 1, 2, ..., 13, is the jth trait category in the ith lifestyle. The 11 trait categories are binomial random variables, each the sum of binary variables representing the presence or absence of its constituent traits; the other two are ordinal estimates, as described above. Likewise, trait k is the kth trait in the jth category in the ith population. Each category consists of an exclusive set of binary traits, so each trait belongs to a single trait category, as seen at the lower level of Figure 2. At the higher level, however, each lifestyle includes a state for every trait category: every lifestyle has a population density, a level of agricultural intensity, a measure of warfare, and so on. From this structure, cultural diversity is the multivariate random vector composed of the hierarchical configuration of the 172 societies, each of which has 13 trait categories capturing the states of 258 individual traits.
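The two-level structure can be sketched in Python as below, using made-up trait IDs. The lower level checks that every trait belongs to exactly one category; the upper level assembles one lifestyle as a mapping from category to value. In the full data this mapping would have 13 entries (11 trait counts plus the two ordinal measures); the function names are hypothetical.

```python
def check_partition(category_traits):
    """Verify each trait belongs to exactly one category; return trait count."""
    seen = {}
    for category, traits in category_traits.items():
        for t in traits:
            if t in seen:
                raise ValueError(f"trait {t} in both {seen[t]} and {category}")
            seen[t] = category
    return len(seen)

def lifestyle_vector(society, category_traits, ordinal):
    """One society's lifestyle: binomial categories as trait counts,
    plus the ordinal measures copied unchanged."""
    vec = {c: sum(society[t] for t in traits) for c, traits in category_traits.items()}
    vec.update(ordinal)
    return vec

# Toy example with invented trait IDs and values
category_traits = {"war": ["w1", "w2", "w3"], "ritual": ["r1", "r2"]}
society = {"w1": 1, "w2": 0, "w3": 1, "r1": 1, "r2": 1}
ordinal = {"population density": 3, "agricultural intensity": 0}
print(check_partition(category_traits))  # 5
print(lifestyle_vector(society, category_traits, ordinal))
# {'war': 2, 'ritual': 2, 'population density': 3, 'agricultural intensity': 0}
```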

Table 1. Variables included in the study. The first two columns indicate Jorgensen’s primary and secondary cultural trait categories. The third column indicates the category names used in this study and the last two columns indicate which of Jorgensen’s variables were included and excluded from the study. Excluded variables were not included because they were not dichotomous.

Jorgensen’s Primary category	Jorgensen’s Secondary category	Trait Category used in this study	Included variables	Excluded variables
Technology and material culture
	Hunting	Technology	138-141
	Fishing	Technology	142-145
	Gathering wild plants	Technology	146-148
	Horticulture	Technology	149-151
	Food preparation and preservation	Material culture	153-159	152
	Boats	Material culture	160
	Housing	Material culture	162-166	161
	Clothing	Material culture	167-181
	Weaving	Material culture	183	182
Subsistence economy
	Agriculture	Agricultural intensity	184
	Agriculture	Subsistence complexity	185-193
	Fishing, Sea mammal hunting, and shellfish collecting	Subsistence complexity	194-198
	Hunting	Subsistence complexity	199-203
	Gathering or extracting	Subsistence complexity	204-208
	Transportation	Subsistence complexity	209-210
	Local resource availability	none	none	211
	Food storage	none	none	212, 213
Economic organization
	Division of labor	Division of labor	214-228, 230-248	229
	Economic distribution	Economic distribution	249-265
	Ownership of property	Property rights	266-269, 271-273	270
	Inheritance of property	Property rights	274-280
Social and kinship organization
	Marriage	Marriage norms	291-301	290
	Family and household	Marriage norms	302-303, 306-307	304-305
	Descent and descent groups	Marriage norms	309-325	308
	Kinship terms	none	none	326-331
Political organization
	Leadership and succession	Sociopolitical complexity	333	332
	Local and extralocal government	Sociopolitical complexity	334
	Criteria of government	Sociopolitical complexity	335-340
	Sodalities	Sociopolitical complexity	341-348
	Warfare	War	349-357, 359-365	358
Ceremonialism, life cycle
	Ceremonialism	Ritual	366-373
	Life cycle: birth	Ritual	374-377, 379-380	378
	Life cycle: naming	Ritual	381, 383-385	382
	Life cycle: Girls' puberty rites	Ritual	386-396
	Life cycle: death	Ritual	397, 400, 402-404	398-399, 401
Spirit quest, shamanism, causes of illness, magic
	Spirit quest	Supernatural	405-410	411
	Shamanism	Supernatural	413-417	412
	Causes of illness	Supernatural	418-422
	Magic	Supernatural	423-430
Settlement and demography
	Settlement pattern	none	none	281-282
	Demography	Population density	285	283-284
	Community organization	none	none	286-289
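The variable ranges in the table above can be checked mechanically. The sketch below transcribes the included and excluded Jorgensen variable numbers (the helper names are hypothetical) and confirms that the two lists are disjoint and together cover variables 138 to 430 without gaps. By this tally, 258 variables are included (256 binary traits plus the single agricultural-intensity and population-density variables) and 35 are excluded, matching the counts given in the text.

```python
def expand(spec):
    """Expand a spec such as '214-228, 230-248' into a set of variable numbers."""
    out = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            out.update(range(lo, hi + 1))
        else:
            out.add(int(part))
    return out

# Included variables per trait category, transcribed from the table
INCLUDED = {
    "Technology": "138-141, 142-145, 146-148, 149-151",
    "Material culture": "153-159, 160, 162-166, 167-181, 183",
    "Agricultural intensity": "184",
    "Subsistence complexity": "185-193, 194-198, 199-203, 204-208, 209-210",
    "Division of labor": "214-228, 230-248",
    "Economic distribution": "249-265",
    "Property rights": "266-269, 271-273, 274-280",
    "Marriage norms": "291-301, 302-303, 306-307, 309-325",
    "Sociopolitical complexity": "333, 334, 335-340, 341-348",
    "War": "349-357, 359-365",
    "Ritual": "366-373, 374-377, 379-380, 381, 383-385, 386-396, 397, 400, 402-404",
    "Supernatural": "405-410, 413-417, 418-422, 423-430",
    "Population density": "285",
}
EXCLUDED = ("152, 161, 182, 211, 212, 213, 229, 270, 290, 304-305, 308, "
            "326-331, 332, 358, 378, 382, 398-399, 401, 411, 412, "
            "281-282, 283-284, 286-289")

counts = {cat: len(expand(spec)) for cat, spec in INCLUDED.items()}
included = set().union(*(expand(s) for s in INCLUDED.values()))
excluded = expand(EXCLUDED)

print(len(included), len(excluded))  # 258 35
assert sum(counts.values()) == len(included)          # no trait in two categories
assert included.isdisjoint(excluded)                  # no variable both kept and dropped
assert included | excluded == set(range(138, 431))    # every variable accounted for
```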

The 172 indigenous groups were selected by Jorgensen from approximately 250 ethnic units identified in western North America by choosing groups that had high-quality ethnographic information. The Western Indians dataset was developed from the earlier Cultural Element Survey of Kroeber and Driver. Jorgensen, a student of Driver, continued this work by recruiting four additional researchers to systematically assess ethnographies on the 172 groups [1]. These researchers evaluated ethnographies for a wide-ranging set of pre-European-contact cultural traits and behaviors. Jorgensen notes that most of this information was gathered through interviews with elders who were asked to recall traits or behaviors prior to contact. Jorgensen (1980) employed several mechanisms to reduce biases in data recording and in the scoring of each trait, and performed checks on the resulting data. Once the data were completed, Jorgensen used them to evaluate the effect of environment and history on cultural trait variation [1,2].

Missing data imputation

Prior to any analysis, we used a random forest imputation algorithm (the missForest package) to impute the 5.4% of values missing from the dataset, which is considered a small fraction of the data [37]. For this study, imputation is both important and feasible: the dataset is small enough that randomly missing values could affect the results, yet large enough to support algorithmic imputation, thus avoiding issues of statistical circularity. The algorithm starts by filling missing values with the mode; then, for each variable with missing values, it fits a random forest to the observed part and predicts the missing part. The process is iterative, with each round refitting the forests on the updated imputations to improve the estimates of the missing values. Carrying out missing data imputation in this way allows us to use spatial and cultural proximity to improve our estimates and to ensure that our counts of traits are comparable across cultures. We then summed the number of traits within each of our categories.
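The imputation loop can be illustrated with a small, dependency-free Python sketch. It is not missForest: to stay self-contained it substitutes a 1-nearest-neighbour rule (Hamming distance over the other columns) for the random forest, and it stops when the imputations stop changing rather than using missForest's own stopping criterion. Only the overall structure (mode initialization, then per-variable prediction of missing cells from observed cells, repeated) follows the description above; the toy data are made up.

```python
from collections import Counter

def mode(values):
    """Most common observed value; ties go to the value seen first."""
    return Counter(v for v in values if v is not None).most_common(1)[0][0]

def missing_fraction(rows):
    """Share of cells that are missing (None)."""
    cells = [v for r in rows for v in r]
    return sum(v is None for v in cells) / len(cells)

def hamming(a, b, skip):
    """Number of differing cells between two rows, ignoring column `skip`."""
    return sum(x != y for j, (x, y) in enumerate(zip(a, b)) if j != skip)

def impute(rows, max_iter=10):
    """missForest-style iterative imputation, except that a 1-nearest-neighbour
    rule stands in for the random forest (a simplifying assumption)."""
    n_cols = len(rows[0])
    missing = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is None]
    filled = [list(r) for r in rows]
    # Step 1: initialise every gap with its column mode
    for j in range(n_cols):
        m = mode([r[j] for r in rows])
        for r in filled:
            if r[j] is None:
                r[j] = m
    # Step 2: re-predict each gap from rows where that column was observed,
    # using the current imputations for the other columns
    for _ in range(max_iter):
        changed = False
        for i, j in missing:
            donors = [k for k, r in enumerate(rows) if r[j] is not None]
            nearest = min(donors, key=lambda k: hamming(filled[i], filled[k], j))
            if filled[i][j] != rows[nearest][j]:
                filled[i][j] = rows[nearest][j]
                changed = True
        if not changed:  # simpler stopping rule than missForest's
            break
    return filled

data = [
    [1, 1, 0, 1],
    [1, 1, 0, None],
    [0, 0, 1, 0],
    [0, None, 1, 0],
    [1, 1, 0, 1],
]
print(missing_fraction(data))  # 0.1
print(impute(data))
```

On this toy matrix, 2 of 20 cells are missing. The gap in the fourth row is first filled with its column mode (1) and then revised to 0, because the most similar row with that column observed (the third row) has 0 there.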

1. Jorgensen JG. 1980 Western Indians: Comparative environments, languages, and cultures of 172 Western American Indian tribes. New York: W. H. Freeman.

2. Jorgensen JG. 1983 Comparative traditional economics and ecological adaptations. Handbook of North American Indians 10, 684–710.

Sharing/Access information

Links to other publicly accessible locations of the data:

https://github.com/RobertSWalker/wnai_bayesnet

Data was derived from the following sources:

Jorgensen JG. 1980 Western Indians: Comparative environments, languages, and cultures of 172 Western American Indian tribes. New York: W. H. Freeman.

Code/Software

The above dataset was used in combination with R code in the following sequence:

We first ran leave-one-out cross-validation tests to evaluate the performance of 11 different Bayesian network algorithms available in the bnlearn package [41,42]. Of these, the score-based tabu search algorithm outperformed all the others. After choosing the algorithm, we generated 10,000 bootstrapped networks by sampling from the original data with replacement. From this large sample of networks, we extracted edges with strengths of at least 0.8, meaning that the association between two variables was found in at least 80% of all bootstrapped networks. We also computed edge directionalities, which measure the proportion of bootstrapped networks in which an edge links two nodes in a particular direction. These directed links represent the causal flows. A directionality score of 0.5 indicates that the conditional dependency points one way in 50% of the bootstrapped networks, whereas a score of 1 indicates consistent directionality in 100% of them. Note that directionality scores close to 0.5 do not imply an absence of directionality, but suggest that the flow between the variables is bidirectional.
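The strength and directionality summaries can be sketched as follows, treating each bootstrapped network as a set of directed edges. Following the convention of bnlearn's boot.strength() as we understand it, direction is computed among the networks that contain the edge; the node names and the ten networks are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter

def edge_summary(networks, threshold=0.8):
    """Summarise bootstrapped DAGs, each a set of directed edges (parent, child).

    strength:  share of all networks containing the edge in either direction
    direction: share of those networks in which it points parent -> child
    Returns {(parent, child): (strength, direction)} for edges with
    strength >= threshold.
    """
    n = len(networks)
    pair_counts = Counter()
    arc_counts = Counter()
    for net in networks:
        for a, b in net:
            pair_counts[frozenset((a, b))] += 1  # a DAG never holds both a->b and b->a
            arc_counts[(a, b)] += 1
    summary = {}
    for (a, b), k in arc_counts.items():
        m = pair_counts[frozenset((a, b))]
        if m / n >= threshold:
            summary[(a, b)] = (m / n, k / m)
    return summary

# Ten made-up bootstrap networks over hypothetical node names
boot = (
    [{("subsistence", "politics"), ("war", "ritual")}] * 5
    + [{("subsistence", "politics")}] * 2
    + [{("politics", "subsistence")}] * 2
    + [set()]
)
for edge, (s, d) in edge_summary(boot).items():
    print(edge, round(s, 2), round(d, 2))
```

Here the subsistence-politics edge appears in 9 of 10 networks (strength 0.9) and points from subsistence to politics in 7 of those 9, so both orientations are returned with directions of about 0.78 and 0.22. The war-ritual edge (strength 0.5) falls below the 0.8 threshold and is dropped.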

In the second phase of the analysis, we used Bayesian hierarchical path models to conduct parameter learning, estimating the slope coefficient of each network edge with the brms package [43]. Using groups (i.e., Jorgensen’s “tribes”) as the unit of analysis introduces non-independence arising from spatial and linguistic clustering. To estimate these parameters while controlling for this clustering, we constructed Bayesian hierarchical path models in brms that use a Gaussian process to model space and a varying intercept for language phyla (we use a varying intercept for language because there is no well-established phylogeny for the native languages of western North America). Using this procedure, we estimated the slopes for all edges in our network and R² values for all dependent variables, controlling for both spatial and linguistic autocorrelation.
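For one dependent node y with parents pa(y) in the learned network, a model of the kind described above can be written roughly as follows, where i indexes societies, x_{p,i} is the value of parent p in society i, β_p is the slope for the edge p → y, phy[i] is the society's language phylum, and s_i = (Lat_i, Long_i). This is a generic sketch rather than the exact specification used in the paper: the Gaussian likelihood, the priors, and the squared-exponential kernel (which, as far as we know, is the default covariance of brms's gp() term) are assumptions.

```latex
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i, \sigma^2) \\
\mu_i &= \alpha + \alpha_{\mathrm{phy}[i]} + \sum_{p \in \mathrm{pa}(y)} \beta_p \, x_{p,i} + f(s_i) \\
\alpha_{\mathrm{phy}} &\sim \mathcal{N}(0, \tau^2) \\
f &\sim \mathcal{GP}(0, k), \qquad
k(s_i, s_j) = \eta^2 \exp\!\left( -\frac{\lVert s_i - s_j \rVert^2}{2\rho^2} \right)
\end{aligned}
```

One such regression per dependent node, fitted jointly, gives the path model; the varying intercept absorbs shared linguistic history and the Gaussian process absorbs residual similarity between nearby societies.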