
Data from: Extending null scenarios with Faddy distributions in a probabilistic randomization protocol for presence-absence data

Cite this dataset

Navarro Alberto, Jorge; Manly, Bryan; Gerow, Kenneth (2021). Data from: Extending null scenarios with Faddy distributions in a probabilistic randomization protocol for presence-absence data [Dataset]. Dryad.


1. The analysis of species occurrences at discrete locations makes use of statistical methods intended to elucidate whether a random process can explain a particular observed pattern of presences-absences (1-0). Various statistical methods have contributed to the development of null model analysis of (1-0) data in community ecology using randomization tests. Frequentist techniques assuming probability distributions under the null scenarios have been proposed, as in the work by Navarro and Manly (2009) (NM), a protocol that has been applied in the analysis of plant and microbial communities, and chemical hazards.

2. The NM method assumes that presences-absences are governed by independent Bernoulli random variables, and that a non-observable, non-negative random variable ("quasi-abundance") is associated with each species at each location. The quasi-abundance is presumed to follow one of three possible distributions (Poisson, Binomial or Negative Binomial) and to be log-linearly related to the qualitative effects of species and location. By connecting the probability of occurrence of each species at each location with the "best" quasi-abundance distribution (chosen by profile deviance), it is possible to estimate that probability by generalized linear modelling; the estimated probabilities are used, in turn, to generate random matrices via parametric bootstrap. The question now is whether just three distributions are enough to support an "optimal" null model.

3. We provide the theoretical formulation of the original NM protocol for null model analysis, and then expand the quasi-abundance distributions, based on extended Poisson processes (Faddy 1997), to allow general distributions of over-dispersed and under-dispersed discrete random variables. The method is illustrated using presence-absence data of island lizard communities.

4. For the binomial case and Faddy distributions, nonlinear constrained optimization algorithms are needed to obtain maximum likelihood estimates; thus, the null-model selection process faces challenging numerical problems (non-convergence to the global optimum). In addition, the process may end up suggesting that the best-fitted probabilities for the generation of null matrices are those obtained from links different from the canonical logistic link. This property of the NM protocol should not be ignored, as an improper choice of the null matrix universe may impact the outcome of randomization tests.
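
The null-matrix generation step described in point 2 can be illustrated with a minimal Python sketch for the Poisson case. It rests on assumptions not stated in the abstract: a presence is taken to mean that the quasi-abundance is greater than zero, so the occurrence probability is 1 - exp(-mu), with log(mu) equal to a species effect plus a location effect. The function names and the effect values are hypothetical and are not fitted estimates from the lizard data; the authors' own R scripts may proceed differently.

```python
import math
import random


def occurrence_probabilities(species_effects, location_effects):
    """Occurrence probabilities under a Poisson quasi-abundance (assumed form).

    Assumption: log(mu_ij) = species effect + location effect, and a species
    is present when its quasi-abundance exceeds zero, so that
    P(presence) = 1 - exp(-mu_ij).
    """
    return [
        [1.0 - math.exp(-math.exp(a + b)) for b in location_effects]
        for a in species_effects
    ]


def random_null_matrix(probs, rng):
    """Draw one 0-1 matrix with independent Bernoulli cells (parametric bootstrap)."""
    return [[1 if rng.random() < p else 0 for p in row] for row in probs]


# Hypothetical effects for 3 species and 4 locations (illustrative values only).
species_effects = [-1.0, 0.0, 0.5]
location_effects = [-0.5, 0.0, 0.3, 1.0]

probs = occurrence_probabilities(species_effects, location_effects)
rng = random.Random(2021)  # fixed seed so the draw is reproducible
null_matrix = random_null_matrix(probs, rng)
```

Repeating the last call many times yields the set of random matrices against which an observed statistic can be compared in a randomization test. For another quasi-abundance distribution, only the expression for the probability of a zero quasi-abundance would change.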


The file contains coded presence-absence data of 20 lizard species on 25 islands in the Gulf of California (Case 1983; Manly 1995). Species and island numbers follow the order shown in Manly (1995). Each space-delimited line corresponds to a single species (20 lines in the example), and each line always ends with a –1. A positive number corresponds to an island number, i.e., the column in the presence-absence matrix where that species appeared. With the exception of the last number on each line, a negative number is always preceded by a positive number, and the pair indicates the range of islands where the species occurs, from the positive number to the absolute value of the negative number.

Example. The third line in lizgal.txt (species Sceloporus orcutti) contains the numbers:

13_–16_22_23_–1    (_ indicates a whitespace)

The line indicates that Sceloporus orcutti occurred on islands 13, 14, 15, 16, 22 and 23.
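
The decoding rule can be checked with a short Python sketch, shown here only as an illustration and not as a reproduction of the authors' R scripts. The function name decode_line is hypothetical. The sketch also assumes that the minus sign may appear either as a plain hyphen or as the typographic dash used in the example above.

```python
def decode_line(line):
    """Expand one coded line into the list of islands where the species occurs."""
    # Assumption: typographic dashes are treated as ordinary minus signs.
    cleaned = line.replace("\u2013", "-").replace("\u2212", "-")
    values = [int(token) for token in cleaned.split()]
    if not values or values[-1] != -1:
        raise ValueError("each coded line is expected to end with -1")

    islands = []
    for value in values[:-1]:  # the trailing -1 is only a terminator
        if value > 0:
            islands.append(value)
        elif value < 0 and islands:
            # A negative number closes a range opened by the previous positive one.
            islands.extend(range(islands[-1] + 1, -value + 1))
        else:
            raise ValueError("a range end must follow a positive island number")
    return islands


print(decode_line("13 -16 22 23 -1"))  # prints [13, 14, 15, 16, 22, 23]
```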


Case, T. J. (1983) Niche overlap and the assembly of island lizard communities. Oikos 41:427-433.

Manly, B. F. J. (1995) A note on the analysis of species co-occurrences. Ecology 76:1109-1115.


Usage notes

This data file is read by the R scripts “FaddyP-NB.r”, “FaddyBinom.r” and “Faddyover-under.r” for null model selection. Each program, included in the Supporting Information section, detects the total number of rows and columns from the coded presence-absence data and converts the coded data into a binary matrix.
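
For readers who want to inspect the data outside R, the conversion can be approximated with the Python sketch below. It is not the authors' code. The function name to_binary_matrix is hypothetical, and the number of columns is assumed to equal the largest island number found in the file, which would undercount the islands if the last island held no species.

```python
def to_binary_matrix(text):
    """Convert coded presence-absence text into a list of 0-1 rows (species by islands)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: a typographic dash may stand in for the minus sign.
        values = [int(token) for token in line.replace("\u2013", "-").split()]
        present = []
        for value in values[:-1]:  # drop the trailing -1 terminator
            if value > 0:
                present.append(value)
            else:
                # Negative value: fill the range up to its absolute value.
                present.extend(range(present[-1] + 1, -value + 1))
        rows.append(present)

    # Assumed rule: the column count is the largest island number seen.
    n_cols = max((max(r) for r in rows if r), default=0)
    matrix = []
    for present in rows:
        occupied = set(present)
        matrix.append([1 if j in occupied else 0 for j in range(1, n_cols + 1)])
    return matrix


# Small made-up example with two species, not taken from lizgal.txt.
example = "1 -3 -1\n2 5 -1\n"
print(to_binary_matrix(example))  # prints [[1, 1, 1, 0, 0], [0, 1, 0, 0, 1]]
```

To apply it to the data file, pass the file's contents as a string, for example the result of open("lizgal.txt").read().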