Influence of number of individuals and observations per individual on a model of community structure
Data files
Jan 25, 2022 version files 269.61 KB
- truenetwork.csv
- TRUENETWORKCSVreadme.txt
Abstract
Social network analysis is increasingly applied to understand animal groups. However, it is rarely feasible to observe every interaction among all individuals in natural populations. Studies have assessed how missing information affects estimates of individual network positions, but less attention has been paid to metrics that characterize overall network structure such as modularity, clustering coefficient, and density. In groups displaying fission-fusion dynamics, where subgroups break apart and rejoin in changing conformations, missing information may affect estimates of global network structure differently than in groups with distinctly separated communities, because of the influence single individuals can have on the connectivity of the network. Using a bat maternity group showing fission-fusion dynamics, we quantify the effect of missing data on global network measures including community detection. In our system, estimating the number of communities was less reliable than detecting community structure. Further, reliably assorting individual bats into communities required fewer individuals and fewer observations per individual than estimating the number of communities. Specifically, our metrics of global network structure (i.e., graph density, clustering coefficient, Rcom) approached the ‘real’ values with increasing numbers of observations per individual and, as the number of individuals included increased, the variance in these estimates decreased. Consistent with previous studies, we recommend prioritizing more observations per individual over including more individuals when resources are limited. We recommend caution when drawing conclusions about animal social networks when a substantial number of individuals or observations are missing and, when possible, suggest subsampling large datasets to observe how estimates are influenced by sampling intensity.
Our study serves as an example of the reliability, or lack thereof, of global network measures with missing information, but further work is needed to determine how estimates will vary with different data collection methods, network structures, and sampling periods.
Methods
This dataset contains recordings of little brown myotis (Myotis lucifugus) that were implanted with passive integrated transponder (PIT) tags at 11 artificial roost boxes in Salmonier Nature Park, Newfoundland in 2016. Data were filtered to include only individuals with at least 40 observations over the course of the summer. For individuals with more than 40 observations, 40 observations were randomly selected to create a balanced network. All animal handling protocols were approved by the animal care committee of Saint Mary’s University, Halifax, Nova Scotia (AUP #16-12). Wildlife scientific research permits were also obtained from the Government of Newfoundland and Labrador, Department of Fisheries and Land Resources, Forestry and Wildlife Branch (# WLR2016-12).
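The filtering and balancing step described above can be illustrated with a minimal sketch. This is not the authors' code (their sample code is in the repository linked under Usage notes); the function name balance_observations, the record layout of (individual ID, observation) pairs, the toy IDs, and the choice to sample without replacement using a fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_observations(records, n_obs=40, seed=0):
    """Keep individuals with at least n_obs records and draw exactly n_obs for each.

    records: iterable of (individual_id, observation) pairs (hypothetical layout).
    Sampling is without replacement; this is an assumption, not stated in the source.
    """
    rng = random.Random(seed)

    # Group observations by individual.
    by_individual = defaultdict(list)
    for individual_id, observation in records:
        by_individual[individual_id].append(observation)

    balanced = {}
    for individual_id, observations in by_individual.items():
        if len(observations) < n_obs:
            continue  # too few observations: individual is excluded
        balanced[individual_id] = rng.sample(observations, n_obs)
    return balanced


# Hypothetical toy data: three tagged bats with 55, 40, and 12 observations.
records = (
    [("bat_A", i) for i in range(55)]
    + [("bat_B", i) for i in range(40)]
    + [("bat_C", i) for i in range(12)]
)
balanced = balance_observations(records)
for individual_id, observations in sorted(balanced.items()):
    print(individual_id, len(observations))
```

In this toy example, bat_C is dropped because it has fewer than 40 observations, while bat_A and bat_B each contribute exactly 40, so every retained individual carries equal weight in the resulting network.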
Usage notes
Note that this dataset does not contain all observations of PIT-tagged bats in 2016; it is a substantial subset of the available data. Sample code for randomizing these data and generating the figures appearing in the manuscript can be found at https://github.com/juliasunga/sample_size_community_models