Data from: Institutional complexity emerges from socioecological complexity in small-scale human societies
Data files
Apr 18, 2024 version files 61.42 KB
- README.md (24.91 KB)
- WNAI_data.xlsx (36.51 KB)
Abstract
Human lifestyles vary enormously over time and space, and so understanding the origins of this diversity has always been a central focus of anthropology. A major source of this cultural variation is the variation in institutional complexity: the cultural packages of rules, norms, ontologies, and expectations passed down through societies across generations. In this paper we study the emergence of institutions in small-scale societies. There are two primary schools of thought. The first is that institutions emerge top-down as rules are imposed by elites on their societies in order to gain asymmetrical access to power, resources, and influence over others through coercion. The second is that institutions emerge bottom-up to facilitate interactions within populations as they seek collective solutions to adaptive problems. Here, we use Bayesian networks to infer the causal structure of institutional complexity in 172 small-scale societies across ethnohistoric western North America, reflecting the wide diversity of indigenous lifestyles across this vast region immediately prior to European colonization. Our results suggest that institutional complexity emerges from underlying socioecological complexity because institutions are solutions to coordination problems in more complex environments where human-environment interactions require increased management.
https://doi.org/10.5061/dryad.gb5mkkwz3
These data were compiled from Jorgensen JG. 1980 Western Indians: Comparative environments, languages, and cultures of 172 Western American Indian tribes. New York: WH Freeman, and restructured following the Methods outlined in the main paper accompanying this dataset.
Variable description
- *society_name_dplace*: The name of the society in D-PLACE
- *society_xd_id*: ID number for each society
- *language_glottocode*: The Glottolog code (glottocode) of the equivalent language
- *language_name*: The name of the language
- *language_family*: The family of the language
- *group*: The common name of the group
- *Lat*: Latitude
- *Long*: Longitude
- *phyla*: Ordinal variable representing the language phyla
- *technology*: Ordinal variable representing the number of technological traits
- *Subsistence*: Ordinal variable representing the number of subsistence traits
- *Politics*: Ordinal variable representing the number of sociopolitical traits
- *Agricultural production*: Ordinal variable representing the number of agricultural traits
- *material_culture*: Ordinal variable representing the number of material culture traits
- *div_of_labor*: Ordinal variable representing the division of labor
- *econ_dist*: Ordinal variable representing the number of economic distribution traits
- *property*: Ordinal variable representing the number of property rights traits
- *marriage*: Ordinal variable representing the number of marriage norms traits (marriage)
- *family*: Ordinal variable representing the number of marriage norms traits (family and household)
- *descent*: Ordinal variable representing the number of marriage norms traits (descent and descent groups)
- *war*: Ordinal variable representing the number of warfare traits
- *ceremony*: Ordinal variable representing the number of ritual traits (ceremonialism)
- *lifecycle*: Ordinal variable representing the number of ritual traits (life cycle)
- *Spirits*: Ordinal variable representing the number of supernatural traits (spirit quest)
- *shamans*: Ordinal variable representing the number of supernatural traits (shamanism)
- *illness*: Ordinal variable representing the number of supernatural traits (causes of illness)
- *magic*: Ordinal variable representing the number of supernatural traits (magic)
- *pop density*: Ordinal variable representing the level of population density
Description of the data and file structure
The Western Indians dataset contains information on a wide diversity of lifestyle traits classified by Jorgensen into eight primary categories: 1) technology and material culture; 2) subsistence economy; 3) economic organization; 4) social and kinship organization; 5) political organization; 6) ceremonialism and life cycle; 7) spirit quest, shamanism, causes of illness, magic; and 8) settlement and demography, which are further divided into subcategories. Starting from Jorgensen’s primary categories, we re-categorized the data because several subcategories contained variables important enough to be treated as independent categories. Our category labels and their relation to Jorgensen’s original categories and subcategories are presented in Table SI1. We clustered Jorgensen’s categories into 13 categories for our analysis: 1) technology; 2) material culture; 3) subsistence complexity; 4) division of labor; 5) economic complexity; 6) property; 7) marriage norms; 8) sociopolitical complexity; 9) war; 10) ritual; 11) the supernatural; 12) agricultural intensity; and 13) population density. Categories 1-11 are binomial variables built from traits marked as present or absent (see below for more detail), whereas categories 12 and 13 are ordinal, measuring increasing reliance on agriculture and population density, respectively.
Traits
Our goal was to quantify the complexity of the 13 trait categories across the 172 populations. We first isolated all traits in the dataset that could be dichotomized into binary variables; this included all traits that had the potential to be either present or absent in a population. In many cases, a variable might be coded by Jorgensen as one of multiple states. For these variables, we re-coded the variable as present whenever any of those states was recorded. For example, the trait “public ceremony associated with warfare” might be coded as pre-conflict, post-conflict, both, or neither in the original dataset, and so we code it as either present or absent in a society. Traits that could not be dichotomized in this way were excluded, which removed 35 variables from the dataset (see Table SI1). For example, traits such as “dominant house type”, “type of weaving”, or “place of storage” were excluded from the data, as these cannot be absent, nor do they contribute in any meaningful sense to cultural complexity. Our cleaned dataset included 258 traits.
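The dichotomization rule described above can be illustrated with a minimal Python sketch. The state labels and the `dichotomize` helper are hypothetical, chosen only to mirror the warfare-ceremony example; the original recoding was not necessarily carried out this way.

```python
# Hypothetical state labels for one multi-state trait; not Jorgensen's actual codes.
ABSENT_STATES = {"neither", "absent"}

def dichotomize(state):
    """Return 1 if the trait is present in any form, 0 if absent, None if missing."""
    if state is None:
        return None  # keep missing values so they can be imputed later
    return 0 if state.strip().lower() in ABSENT_STATES else 1

# "Public ceremony associated with warfare" for five hypothetical societies.
raw = ["pre-conflict", "post-conflict", "both", "neither", None]
binary = [dichotomize(s) for s in raw]
print(binary)  # [1, 1, 1, 0, None]
```

Any recorded state other than an explicit absence collapses to 1, so the information about when the ceremony occurs is deliberately discarded.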
Trait categories
Categories are composed of traits, and the number of traits present within a category indexes the complexity of that category (see below for more detail). Thus, in our network, each node is a category whose complexity is measured by the count of traits present in that category. Edges represent the conditional probabilities linking the nodes within the network. We label each of the 13 trait categories as demographic, socioecological, or institutional, but note that these labels serve only to keep track of the classes of categories and do not enter into the statistics. Our single demographic category is population density. Socioecological categories were identified as those directly associated with the ecology and technology of human-environment interactions: technology, material culture, subsistence complexity, and agricultural intensity. We label categories as institutions if they represent the rules, norms, and customs that define a society and are likely transmitted across generations. Categories representing institutions are division of labor, economic complexity, property rights, marriage norms, sociopolitical complexity, war, ritual, and the supernatural. We do not explicitly consider any exogenous variables, such as measures of the environment, because we are interested in understanding how the interplay of the trait categories endogenously constitutes the lifestyle diversity we observe in our data.
Data structure
After cleaning and organizing the data, our dataset had the following structure (see Figure 2). Each of the 172 societies in our study is characterized by a lifestyle, $L_i$, where $i = 1, \dots, 172$, represented by a random vector of 13 discrete random variables, each of which represents the state of one of the 11 trait categories in a population, in addition to the measures of population density and agricultural intensity. A category, $C_{ij}$, is the $j$th trait category, where $j = 1, \dots, 13$, in the $i$th lifestyle. Eleven of the categories are binomial random variables formed by summing binary variables that represent the presence or absence of constituent traits, and the other two are ordinal estimates, as described above. So, a trait, $T_{ijk}$, is the $k$th trait in the $j$th category in the $i$th population, and for the eleven binomial categories $C_{ij} = \sum_k T_{ijk}$. Further, note that each category consists of an exclusive set of binary traits, that is, each trait belongs to a single trait category, as seen at the lower level of Figure 2. However, at the higher level, each lifestyle consists of a state of every trait category, i.e., every lifestyle has a population density, a level of agricultural intensity, a measure of warfare, and so on. From this structure, cultural diversity is then the multivariate random vector composed of the hierarchical configuration of the 172 societies, each of which has 13 trait categories capturing the states of 258 individual traits.
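As a rough illustration of this hierarchy, the Python sketch below stores one hypothetical society as categories of binary traits, sums each category into a score, and checks that no trait belongs to two categories. The trait IDs, values, and helper names are invented for the example and are not read from WNAI_data.xlsx.

```python
# Hypothetical binary traits for one society, grouped by category.
# Trait IDs and values are illustrative only.
society = {
    "war": {349: 1, 350: 0, 351: 1},
    "property": {266: 1, 267: 1},
}

def category_scores(traits_by_category):
    """Sum the binary traits in each category to get its complexity score."""
    return {cat: sum(traits.values()) for cat, traits in traits_by_category.items()}

def check_exclusive(traits_by_category):
    """Confirm that no trait ID appears in more than one category."""
    seen = set()
    for traits in traits_by_category.values():
        overlap = seen & set(traits)
        if overlap:
            raise ValueError(f"traits in multiple categories: {sorted(overlap)}")
        seen |= set(traits)
    return True

scores = category_scores(society)
print(scores)  # {'war': 2, 'property': 2}
```

The two ordinal categories (agricultural intensity and population density) would be stored directly as single values rather than as sums.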
Table 1. Variables included in the study. The first two columns indicate Jorgensen’s primary and secondary cultural trait categories. The third column indicates the category names used in this study and the last two columns indicate which of Jorgensen’s variables were included and excluded from the study. Excluded variables were not included because they were not dichotomous.
| Jorgensen’s primary category | Jorgensen’s secondary category | Trait category used in this study | Included variables | Excluded variables |
|---|---|---|---|---|
| Technology and material culture | | | | |
| | Hunting | Technology | 138-141 | |
| | Fishing | Technology | 142-145 | |
| | Gathering wild plants | Technology | 146-148 | |
| | Horticulture | Technology | 149-151 | |
| | Food preparation and preservation | Material culture | 153-159 | 152 |
| | Boats | Material culture | 160 | |
| | Housing | Material culture | 162-166 | 161 |
| | Clothing | Material culture | 167-181 | |
| | Weaving | Material culture | 183 | 182 |
| Subsistence economy | | | | |
| | Agriculture | Agricultural intensity | 184 | |
| | Agriculture | Subsistence complexity | 185-193 | |
| | Fishing, sea mammal hunting, and shellfish collecting | Subsistence complexity | 194-198 | |
| | Hunting | Subsistence complexity | 199-203 | |
| | Gathering or extracting | Subsistence complexity | 204-208 | |
| | Transportation | Subsistence complexity | 209-210 | |
| | Local resource availability | none | none | 211 |
| | Food storage | none | none | 212, 213 |
| Economic organization | | | | |
| | Division of labor | Division of labor | 214-228, 230-248 | 229 |
| | Economic distribution | Economic distribution | 249-265 | |
| | Ownership of property | Property rights | 266-269, 271-273 | 270 |
| | Inheritance of property | Property rights | 274-280 | |
| Social and kinship organization | | | | |
| | Marriage | Marriage norms | 291-301 | 290 |
| | Family and household | Marriage norms | 302-303, 306-307 | 304-305 |
| | Descent and descent groups | Marriage norms | 309-325 | 308 |
| | Kinship terms | none | none | 326-331 |
| Political organization | | | | |
| | Leadership and succession | Sociopolitical complexity | 333 | 332 |
| | Local and extralocal government | Sociopolitical complexity | 334 | |
| | Criteria of government | Sociopolitical complexity | 335-340 | |
| | Sodalities | Sociopolitical complexity | 341-348 | |
| | Warfare | War | 349-357, 359-365 | 358 |
| Ceremonialism, life cycle | | | | |
| | Ceremonialism | Ritual | 366-373 | |
| | Life cycle: birth | Ritual | 374-377, 379-380 | 378 |
| | Life cycle: naming | Ritual | 381, 383-385 | 382 |
| | Life cycle: girls’ puberty rites | Ritual | 386-396 | |
| | Life cycle: death | Ritual | 397, 400, 402-404 | 398-399, 401 |
| Spirit quest, shamanism, causes of illness, magic | | | | |
| | Spirit quest | Supernatural | 405-410 | 411 |
| | Shamanism | Supernatural | 413-417 | 412 |
| | Causes of illness | Supernatural | 418-422 | |
| | Magic | Supernatural | 423-430 | |
| Settlement and demography | | | | |
| | Settlement pattern | none | none | 281-282 |
| | Demography | Population density | 285 | 283-284 |
| | Community organization | none | none | 286-289 |
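The variable ranges in Table 1 can be tallied to cross-check the totals quoted in the text. The Python sketch below is illustrative only: the `count_ids` helper is hypothetical, and the range strings are copied by hand from the table.

```python
def count_ids(spec):
    """Count variable IDs in a string such as "214-228, 230-248"."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            total += hi - lo + 1  # ranges are inclusive
        else:
            total += 1
    return total

# Included variable IDs per study category, copied from Table 1.
included = {
    "Technology": "138-141, 142-145, 146-148, 149-151",
    "Material culture": "153-159, 160, 162-166, 167-181, 183",
    "Agricultural intensity": "184",
    "Subsistence complexity": "185-193, 194-198, 199-203, 204-208, 209-210",
    "Division of labor": "214-228, 230-248",
    "Economic distribution": "249-265",
    "Property rights": "266-269, 271-273, 274-280",
    "Marriage norms": "291-301, 302-303, 306-307, 309-325",
    "Sociopolitical complexity": "333, 334, 335-340, 341-348",
    "War": "349-357, 359-365",
    "Ritual": "366-373, 374-377, 379-380, 381, 383-385, 386-396, 397, 400, 402-404",
    "Supernatural": "405-410, 413-417, 418-422, 423-430",
    "Population density": "285",
}
# Excluded variable IDs, also copied from Table 1.
excluded = ("152, 161, 182, 211, 212, 213, 229, 270, 290, 304-305, 308, "
            "326-331, 332, 358, 378, 382, 398-399, 401, 411, 412, "
            "281-282, 283-284, 286-289")

counts = {cat: count_ids(spec) for cat, spec in included.items()}
total_included = sum(counts.values())
total_excluded = count_ids(excluded)
print(total_included, total_excluded)  # 258 35
```

The tally reproduces the 258 retained and 35 excluded variables; note that the 258 counted here include the two ordinal variables (184 and 285).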
The 172 indigenous groups were selected by Jorgensen from approximately 250 ethnic units identified in western North America by choosing groups that had high-quality ethnographic information. The Western Indians dataset was developed from the earlier Cultural Element Survey of Kroeber and Driver. Jorgensen, a student of Driver, continued this work by recruiting four additional researchers to systematically assess ethnographies on the 172 groups [1]. These researchers evaluated ethnographies for a wide-ranging set of pre-European contact cultural traits and behaviors. Jorgensen notes that most of this information was gathered through interviews with elders who were asked to recall traits or behaviors prior to contact. Jorgensen [1] employed several mechanisms to reduce biases in data recording and in the scoring of each trait, and performed checks on the resulting data. Once completed, Jorgensen used these data to evaluate the effect of environment and history on cultural trait variation [1,2].
Missing data imputation
Prior to any analysis, we used a random forest imputation algorithm (the *missForest* package) to impute the 5.4% missing data in the dataset. This is considered a small fraction of the data [37]. For the purposes of this study, imputing data is both important and feasible, as the dataset is small enough to be impacted by randomly missing values but large enough to employ an algorithm, thus avoiding issues of statistical circularity. The algorithm starts by imputing missing data with the mode, and then for each variable with missing data it fits a random forest to the observed part and predicts the missing part. The random forest algorithm is an iterative process, improving estimates of the missing values as each new set of trees is trained on more data. Carrying out missing data imputation in this way allows us to use spatial and cultural proximity to improve our estimates and to ensure that our counts of traits are comparable across cultures. We then summed the number of traits within each of our categories.
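The first step of this procedure, filling each gap with the column mode, can be sketched in a few lines of Python. This covers only the initialization and the missing-data fraction; the iterative random forest refinement performed by *missForest* (an R package) is not reproduced, and the toy data and helper names are hypothetical.

```python
from collections import Counter

def missing_fraction(rows):
    """Share of cells that are None across a list of equal-length rows."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)

def mode_fill(rows):
    """Replace None in each column with that column's most common observed value."""
    filled = [list(row) for row in rows]
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        mode = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled

# Toy data: 4 hypothetical societies x 3 binary traits, with two missing cells.
data = [
    [1, 0, None],
    [1, 1, 1],
    [None, 0, 1],
    [1, 0, 0],
]
print(round(missing_fraction(data), 3))  # 0.167
print(mode_fill(data))  # [[1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 0, 0]]
```

In the full algorithm these mode-filled values are only starting guesses that later iterations overwrite with model predictions.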
1. Jorgensen JG. 1980 Western Indians: Comparative environments, languages, and cultures of 172 Western American Indian tribes. New York: WH Freeman.
2. Jorgensen JG. 1983 Comparative traditional economics and ecological adaptations. Handbook of North American Indians 10, 684–710.
Sharing/Access information
Data were derived from the following sources:
- Jorgensen JG. 1980 Western Indians: Comparative environments, languages, and cultures of 172 Western American Indian tribes. New York: WH Freeman.
Code/Software
The above dataset was used in combination with R code in the following sequence:
We first ran leave-one-out cross-validation tests to evaluate the performance of 11 different Bayesian network algorithms available in the bnlearn package [41,42]. Of these, the score-based tabu search algorithm outperformed all the other options. After the Bayesian network algorithm was chosen, we generated 10,000 bootstrapped networks by sampling from the original data with replacement. From this large sample of networks, we extracted edges that had strengths of at least 0.8, meaning that the association between two variables was found in at least 80% of all bootstrapped networks. In addition, edge directionalities were computed: these measure the proportion of bootstrapped networks in which a given edge points in a particular direction. These links represent the causal flows. A directionality score of 0.5 indicates that the conditional dependencies point one way in 50% of the bootstrapped networks, whereas a directionality score of 1 indicates consistent directionality in 100% of the bootstrapped networks. Note that directionality scores close to 0.5 do not imply an absence of directionality, but suggest that the flow between variables is bidirectional.
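The strength and directionality scores can be illustrated with a small Python sketch over toy networks. This is not the bnlearn implementation: the networks and the `edge_summary` helper are hypothetical, and directionality is computed here conditional on the edge being present, which is an assumption about how the scores are defined.

```python
from collections import Counter

def edge_summary(networks):
    """Return {(a, b): (strength, direction)} with a < b.

    strength: share of networks containing an edge between a and b (either way).
    direction: among those networks, the share in which the edge runs a -> b.
    """
    present = Counter()
    forward = Counter()
    for net in networks:
        for pair in {tuple(sorted(edge)) for edge in net}:
            present[pair] += 1
        for a, b in net:
            if a < b:
                forward[(a, b)] += 1
    n = len(networks)
    return {p: (present[p] / n, forward[p] / present[p]) for p in present}

# Five hypothetical bootstrapped networks, each a set of directed edges.
nets = [
    {("density", "war"), ("property", "technology")},
    {("density", "war")},
    {("war", "density")},
    {("density", "war"), ("technology", "property")},
    {("density", "war")},
]
summary = edge_summary(nets)
# Keep only edges found in at least 80% of the bootstrapped networks.
strong = {p: v for p, v in summary.items() if v[0] >= 0.8}
print(strong)  # {('density', 'war'): (1.0, 0.8)}
```

In this toy case the density-war edge appears in every network and points from density to war in four of five, while the property-technology edge falls below the 0.8 threshold and has a directionality of 0.5.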
In the second phase of the analysis, we used Bayesian hierarchical path models to conduct parameter learning in order to estimate the slope coefficients of each of the network edges, using the brms package [43]. Because we use groups (i.e., Jorgensen’s “tribes”) as the unit of analysis, spatial and linguistic clustering introduces non-independence among observations. To estimate these parameters while controlling for the impact of spatial and linguistic clustering on our results, we constructed Bayesian hierarchical path models in the brms package using a Gaussian process to model space, and we included a varying intercept to control for language phyla (we use a varying intercept to control for language because there is no well-established phylogeny for the native languages of western North America). Using this procedure, we estimated the slopes for all the edges in our network and R-squared values for all dependent variables, controlling for both spatial and linguistic autocorrelation.
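One way to write the regression for a single node in such a path model is sketched below. This is an illustrative form only: the Gaussian response, the symbols, and the squared-exponential kernel are assumptions made for the sketch, and the models actually fitted are those specified in the main paper and its R code.

$$
\begin{aligned}
y_i &= \alpha + u_{p[i]} + \sum_{m} \beta_m x_{im} + f(s_i) + \varepsilon_i, \\
u_p &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \\
f &\sim \mathcal{GP}(0, k), \qquad k(s, s') = \sigma_f^2 \exp\!\left(-\frac{\lVert s - s' \rVert^2}{2\ell^2}\right).
\end{aligned}
$$

Here $y_i$ is the score of the dependent category in society $i$, $x_{im}$ are the scores of its parent categories in the network, $\beta_m$ are the edge slopes, $u_{p[i]}$ is the varying intercept for the language phylum of society $i$, and $s_i$ is its location (*Lat*, *Long*).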