Causal evidence for social group sizes from Wikipedia editing data
Data files
Apr 08, 2024 version, 320.67 KB
- episodeclusters.dat
- README.md
- trust.dat
- workclusters.dat
Abstract
Human communities have self-organizing properties in which specific Dunbar Numbers may be invoked to explain group attachments. By analyzing Wikipedia editing histories across a wide range of subject pages, we show that there is an emergent coherence in the size of transient groups formed to edit the content of subject texts, with two peaks averaging at around $N=8$ for the size corresponding to maximal contention, and at around $N=4$ as a regular team. These values are consistent with the observed sizes of conversational groups, as well as the hierarchical structuring of Dunbar graphs. We use the Promise Theory model of bipartite trust to derive a scaling law that fits the data and may apply to all group size distributions, when based on attraction to a seeded group process. In addition to providing further evidence that even spontaneous communities of strangers are self-organizing, the results have important implications for the governance of the Wikipedia commons and for the security of all online social platforms and associations.
README: Causal evidence for social group sizes from Wikipedia editing data
https://doi.org/10.5061/dryad.fn2z34v36
This is part of a project to formulate a practical Promise Theory model of trust for our Internet and machine-enabled age. It is not related to blockchain or so-called trustless technologies, and is not specifically based on cryptographic techniques. Rather, it addresses trustworthiness as an assessment of reliability in keeping specific promises, and trust as a tendency to monitor or oversee these processes.
The files contain data gathered by parsing the edit history logs of many Wikipedia pages. While looking for signatures of trust, we discovered evidence of ad hoc group formation among the users editing pages, consistent with the Dunbar number hypothesis.
We provide the cache of data used in our paper here, in accordance with procedure, but we encourage anyone to collect data themselves using the code referred to below or their own adaptation of it. The Wikipedia data are continuously observable.
Not all of the columns are used in the analysis.
A full description can be found at:
https://github.com/markburgess/Trustability
Description of the data and file structure
The file trust.dat has 21 space-separated columns, written by the Go statement below (the column numbers appear in the comments). Columns whose names end in `L` appear to hold logarithms of the corresponding base column, for use in the log-log plots.

```go
output := fmt.Sprintln(
	L,            // 1  text
	LL,           // 2
	N,            // 3  users
	NL,           // 4
	N2,           // 5  users-cluster
	N2L,          // 6
	I,            // 7  issues
	IL,           // 8
	w,            // 9  process work ratio (talk/article)
	wL,           // 10
	u,            // 11 mistrust sample work ratio
	uL,           // 12
	mistrust,     // 13 s/H
	mistrustL,    // 14
	TG,           // 15 av. episode duration, i.e. group interaction duration
	TU,           // 16 av. episode duration per user
	TGL,          // 17
	TUL,          // 18
	TU2,          // 19 av. episode duration per user
	TU2L,         // 20
	bot_fraction, // 21 bots/human users
)
```
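As a minimal sketch of reading this format back (assuming `trust.dat` is in the working directory; the variable names and column choices here are ours, for illustration), the user and issue counts can be extracted like this:

```go
// Sketch: parse the 21 space-separated columns of trust.dat and print
// the (users, issues) pairs that the gnuplot command below plots.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("trust.dat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		cols := strings.Fields(scanner.Text())
		if len(cols) < 21 {
			continue // skip malformed rows
		}
		users, _ := strconv.ParseFloat(cols[2], 64)  // column 3: users N
		issues, _ := strconv.ParseFloat(cols[6], 64) // column 7: issues I
		fmt.Println(users, issues)
	}
}
```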
Graphs are generated from these data as described in the gnuplot input file
https://github.com/markburgess/Trustability/blob/main/src/gnuplot.in

For example, the contention intensity for mistrust signals (issues, column 7, plotted against users, column 3) is generated by:

```
plot [0:15] "trust.dat" using 3:7
```
The workclusters.dat and episodeclusters.dat files hold frequency histograms of representative group sizes during editing "episodes". Each row contains the group size n, its relative frequency h/n_tot, and the logarithms of both, written as:
```go
h := float64(histogram[n]) // frequency count for group size n
s := fmt.Sprintf("%f %f %f %f\n", float64(n), h/n_tot, math.Log(float64(n)), math.Log(h/n_tot))
```
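As an illustration (a sketch assuming the four-column layout just shown; the file name and the choice to report the mode are ours), the peak of a cluster histogram can be read off like this:

```go
// Sketch: find the modal group size in a cluster histogram file whose rows
// are: n, h/n_tot, log(n), log(h/n_tot), as written by the snippet above.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("workclusters.dat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var modalSize, maxFreq float64
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		cols := strings.Fields(scanner.Text())
		if len(cols) < 2 {
			continue
		}
		n, _ := strconv.ParseFloat(cols[0], 64)    // group size n
		freq, _ := strconv.ParseFloat(cols[1], 64) // relative frequency h/n_tot
		if freq > maxFreq {
			modalSize, maxFreq = n, freq
		}
	}
	fmt.Printf("modal group size %.0f (relative frequency %f)\n", modalSize, maxFreq)
}
```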
Again, we refer to the extensive notes at https://github.com/markburgess/Trustability
Sharing/Access information
Data were derived from Wikipedia's publicly observable page histories and can be re-collected at any time using the code linked below.
Code/Software
Additional aggregation and graph-generation code can be found here:
https://github.com/markburgess/Trustability/tree/main/data/GeneratePlots
Methods
Data sets are collected by direct scanning of Wikipedia's open platform data. The data have been processed by code described at https://github.com/markburgess/Trustability and documented in detail at http://markburgess.org/trustproject.html
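For orientation only, here is a minimal sketch of fetching one page's revision history from the public MediaWiki API in Go. The page title is illustrative, and the project's own scanner in the repository above is the authoritative implementation:

```go
// Sketch: list recent revisions (user, timestamp) of a single Wikipedia
// page via the MediaWiki API. Error handling is kept minimal.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

type apiResponse struct {
	Query struct {
		Pages []struct {
			Title     string `json:"title"`
			Revisions []struct {
				User      string `json:"user"`
				Timestamp string `json:"timestamp"`
			} `json:"revisions"`
		} `json:"pages"`
	} `json:"query"`
}

func main() {
	q := url.Values{
		"action":        {"query"},
		"prop":          {"revisions"},
		"titles":        {"Promise_theory"}, // illustrative page title
		"rvprop":        {"user|timestamp"},
		"rvlimit":       {"50"},
		"format":        {"json"},
		"formatversion": {"2"}, // pages returned as a JSON array
	}
	resp, err := http.Get("https://en.wikipedia.org/w/api.php?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var r apiResponse
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		panic(err)
	}
	for _, p := range r.Query.Pages {
		for _, rev := range p.Revisions {
			fmt.Println(p.Title, rev.Timestamp, rev.User)
		}
	}
}
```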