The performance of permutations and exponential random graph models when analysing animal networks (R code and data)
Data files
Aug 18, 2020 version files 136.16 KB
Abstract
Social network analysis is a suite of approaches for exploring relational data. Two approaches commonly used to analyse animal social network data are permutation-based tests of significance and exponential random graph models. However, the performance of these approaches when analysing different types of network data has not been simultaneously evaluated. Here we test both approaches to determine their performance when analysing a range of biologically realistic simulated animal social networks. We examined the false positive and false negative error rates for an effect of a two-level explanatory variable (e.g. sex) on the number and combined strength of an individual’s network connections. We measured error rates for two types of simulated data collection methods in a range of network structures, with and without a confounding effect and missing observations. Both methods performed consistently well in networks of dyadic interactions, and worse on networks constructed using observations of individuals in groups. Exponential random graph models had a marginally lower rate of false positives than permutations in most cases. Phenotypic assortativity had a large influence on the false positive rate, and a smaller effect on the false negative rate for both methods in all network types. Aspects of within- and between-group network structure influenced error rates, but not to the same extent. In grouping-event based networks, increased sampling effort marginally decreased rates of false negatives, but increased rates of false positives for both analysis methods. These results provide guidelines for biologists analysing and interpreting their own network data using these methods.
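As a rough illustration of the kind of permutation-based test described above, the following base R sketch tests whether a two-level trait is associated with node strength by shuffling trait labels across individuals. It is not the code deposited here, and it is only one of several permutation schemes used in animal network analysis. The network, the trait vector `sex`, and the helper `diff_means` are all hypothetical and simulated purely for the example.

```r
# Illustrative node-label permutation test (base R only; NOT the deposited code).
set.seed(1)

n <- 20
# Hypothetical two-level explanatory variable, e.g. sex
sex <- rep(c("F", "M"), each = n / 2)

# Hypothetical weighted, undirected network stored as a symmetric adjacency matrix
w <- matrix(0, n, n)
upper <- upper.tri(w)
w[upper] <- rbinom(sum(upper), 1, 0.3) * rexp(sum(upper))
w <- w + t(w)

strength <- rowSums(w)      # combined strength of each individual's connections
degree   <- rowSums(w > 0)  # number of connections (could be tested the same way)

# Test statistic: difference in mean strength between the two trait levels
diff_means <- function(x, g) mean(x[g == "M"]) - mean(x[g == "F"])
observed <- diff_means(strength, sex)

# Null distribution: shuffle trait labels across nodes, keeping the network fixed
n_perm <- 999
null <- replicate(n_perm, diff_means(strength, sample(sex)))

# Two-sided p-value, counting the observed value as one of the permutations
p_value <- (sum(abs(null) >= abs(observed)) + 1) / (n_perm + 1)
```

Because the simulated trait here is unrelated to the network, a significant `p_value` would be a false positive, which is the type of error the simulations quantify.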
Methods
Here we provide:
a) The R code for the simulations used in the Behavioral Ecology paper "The performance of permutations and exponential random graph models when analysing animal networks", alongside a csv file used to provide parameters for network generation
b) A summary dataset produced by our run of the simulations
c) The necessary code to produce the network plot and all results plots used in the paper
The R code provided for the simulations is the full set of functions used to generate and analyse the networks as described in the paper. We do not provide the wrapper code we used to run these functions on a specific high performance computing cluster based at the University of Exeter Cornwall Campus. We also provide the .csv file that supplied the parameter information used to generate networks.
The summary dataset provided contains (summarised) output from our simulation run used in the paper. It is sufficient to reproduce the plots used in the results section of the paper.
We also provide the R code used to generate these plots, as well as code to plot networks generated using the simulation functions provided.
Usage notes
The simulation R code is provided as a set of functions that researchers can use flexibly for their own purposes. Use in an HPC environment will require wrapper scripts to run the functions multiple times with different parameter sets.
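A wrapper of the kind mentioned above might look like the following minimal sketch. The function `run_simulation`, its arguments, and the parameter columns are hypothetical stand-ins, not the names used in the deposited code. In practice the parameter table would be read from the provided csv with `read.csv()`.

```r
# Hypothetical stand-in for one of the provided simulation functions.
run_simulation <- function(n_individuals, n_groups, seed) {
  set.seed(seed)
  data.frame(n_individuals = n_individuals,
             n_groups = n_groups,
             mean_value = mean(rnorm(n_individuals)))
}

# Hypothetical parameter table, one row per parameter set.
params <- data.frame(n_individuals = c(20, 50, 100),
                     n_groups = c(2, 5, 10))

# On a cluster, the row index could instead come from a scheduler variable
# (assumption: e.g. Sys.getenv("SLURM_ARRAY_TASK_ID") under Slurm array jobs).
# Here we simply loop over all rows in a single R session.
results <- lapply(seq_len(nrow(params)), function(i) {
  run_simulation(params$n_individuals[i], params$n_groups[i], seed = i)
})
results <- do.call(rbind, results)
```

Running one parameter set per job, rather than looping within a single session, keeps each run independent and makes failed runs easy to resubmit.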
The plotting code will run with the input data files provided: network plotting requires the parameter set csv, and result plotting requires the summarised data csv.