Data from: Phylogenetic relatedness drives protists assembly in marine and terrestrial environments


Lentendu, Guillaume; Dunthorn, Micah (2021), Data from: Phylogenetic relatedness drives protists assembly in marine and terrestrial environments, Dryad, Dataset,


Aim: Assembly of protist communities is known to be driven mainly by environmental filtering, but the imprint of phylogenetic relatedness is unknown. In this study, we test the degree to which co-occurrences and co-exclusions of protists in different phylogenetic relatedness classes deviate from random expectation in two ecosystems, in order to link them to ecological processes.

Location: Global open-oceans and Neotropical rainforest soils

Major taxa: Protists

Time period: 2009-2013

Methods: Protist metabarcoding data originated from two large-scale studies. Co-occurrence and co-exclusion networks were constructed using a recent method combining a null distribution model with Spearman’s rank correlation coefficients among pairs of OTUs. Phylogenetic relatedness was estimated using either global pairwise sequence distance or phylogenetic distance inferred from the best maximum-likelihood trees derived from multiple alignments of OTU representative sequences. The significance of observed patterns relating networks and phylogenies was evaluated by distance class against two null models in which either the tips of the phylogenetic trees or the network edges were randomized.
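
To make the logic of the Methods concrete, the following is a minimal sketch of a tip-randomization test by distance class, not the authors' pipeline (which is documented in "File S1"). It rests on several assumptions: edges are called with a plain threshold on Spearman's rho (0.8) rather than with the null distribution model used in the study, the abundances and pairwise distances are invented toy values, and shuffling tree tips is emulated by permuting OTU labels over a fixed distance table. All function and variable names are hypothetical.

```python
import random
from itertools import combinations


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def class_counts(edges, dist, breaks):
    """Number of edges whose distance falls in each class [breaks[i], breaks[i+1])."""
    counts = [0] * (len(breaks) - 1)
    for a, b in edges:
        d = dist[frozenset((a, b))]
        for i in range(len(counts)):
            if breaks[i] <= d < breaks[i + 1]:
                counts[i] += 1
                break
    return counts


def tip_shuffle_test(edges, dist, otus, breaks, n_perm=999, seed=1):
    """One-sided permutation p-value per class for an excess of edges."""
    rng = random.Random(seed)
    observed = class_counts(edges, dist, breaks)
    at_least = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = list(otus)
        rng.shuffle(shuffled)
        relabel = dict(zip(otus, shuffled))  # move OTU labels across tips
        null = class_counts([(relabel[a], relabel[b]) for a, b in edges], dist, breaks)
        for i in range(len(observed)):
            at_least[i] += null[i] >= observed[i]
    return observed, [(c + 1) / (n_perm + 1) for c in at_least]


# Toy data (invented): read counts of four OTUs in six samples.
abund = {
    "otu1": [10, 20, 30, 40, 50, 60],
    "otu2": [1, 2, 3, 4, 5, 7],
    "otu3": [60, 50, 40, 30, 20, 10],
    "otu4": [5, 1, 4, 2, 6, 3],
}
# Toy pairwise distances (invented), keyed by unordered OTU pair.
dist = {
    frozenset(("otu1", "otu2")): 0.02, frozenset(("otu1", "otu3")): 0.30,
    frozenset(("otu1", "otu4")): 0.35, frozenset(("otu2", "otu3")): 0.28,
    frozenset(("otu2", "otu4")): 0.33, frozenset(("otu3", "otu4")): 0.05,
}
otus = sorted(abund)
pairs = list(combinations(otus, 2))
# Assumed rule: |rho| >= 0.8 defines an edge (the study uses a null model instead).
co_occ = [(a, b) for a, b in pairs if spearman(abund[a], abund[b]) >= 0.8]
co_exc = [(a, b) for a, b in pairs if spearman(abund[a], abund[b]) <= -0.8]
observed, p_values = tip_shuffle_test(co_occ, dist, otus, breaks=[0.0, 0.1, 1.0])
print(co_occ, co_exc, observed, p_values)
```

The second null model mentioned above, randomizing network edges, could be sketched in the same way by redrawing the edge list while keeping the distance table fixed.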

Results: Closely related protists co-occurred more often than expected by chance in all datasets, and co-excluded less often than expected by chance in the marine dataset only. Concurrent excesses of co-occurrences and co-exclusions were observed at intermediate phylogenetic distances in the marine dataset.

Main conclusions: This suggests that environmental filtering and dispersal limitation are the dominant forces driving protist co-occurrences in both environments, while a signal of competitive exclusion was detected only in the marine environment. Co-exclusion differences are potentially linked to the individual environments: marine waters are more homogeneous, while the rainforest soils contain a myriad of nutrient-rich micro-environments that reduce the strength of mutual exclusion.


Protistan OTUs from the world’s open oceans and seas came from de Vargas et al. (2015). This marine dataset is composed of 355 samples collected at the surface and deep chlorophyll maximum (DCM) in six oceans and two seas, which produced 366,800,845 protist reads of the V9 hyper-variable region of the SSU-rRNA locus that clustered into 302,663 OTUs. To allow for comparison, the version of this marine dataset used here is the one re-analyzed by Mahé et al. (2017). All filter-size class libraries of either the surface or DCM at a single station were pooled together, so the number of samples used here was reduced to 47 for surface and 32 for DCM waters. This corresponds to "Dataset S1" of this archive, which is an OTU table provided in the standard JSON BIOM format.
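
The pooling step described above can be illustrated with a short sketch. It assumes that pooling means summing read counts per OTU over all size-fraction libraries that share a station and depth layer; the station names, size-fraction labels, and counts below are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical libraries: (station, depth layer, size fraction) -> {OTU: read count}.
libraries = {
    ("st01", "SUR", "fractionA"): {"otu1": 12, "otu2": 3},
    ("st01", "SUR", "fractionB"): {"otu1": 4, "otu3": 7},
    ("st01", "DCM", "fractionA"): {"otu2": 9},
    ("st02", "SUR", "fractionA"): {"otu1": 1},
}


def pool_by_station_and_depth(libs):
    """Sum OTU read counts over all size fractions of each (station, depth) pair."""
    pooled = defaultdict(lambda: defaultdict(int))
    for (station, depth, _fraction), counts in libs.items():
        for otu, reads in counts.items():
            pooled[(station, depth)][otu] += reads
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(counts) for sample, counts in pooled.items()}


pooled = pool_by_station_and_depth(libraries)
for sample, counts in sorted(pooled.items()):
    print(sample, counts)
```

In this toy case, four libraries collapse into three pooled samples, one per station and depth layer.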

Protistan OTUs from three lowland Neotropical rainforests came from Mahé et al. (2017). This terrestrial dataset is composed of 144 samples collected at the soil surface, which produced 46,652,206 protist reads of the V4 hyper-variable region of the SSU-rRNA locus that clustered into 26,860 OTUs. Sequence processing included OTU clustering with Swarm v2 (Mahé et al., 2015) and taxonomic assignment using the PR² database (Guillou et al., 2013). This corresponds to "Dataset S2" of this archive, which is an OTU table provided in the standard JSON BIOM format.

The complete bash and R scripts to reproduce the analyses described in the linked publication (Lentendu & Dunthorn, 2021) are provided in the "File S1" in HTML format.

Usage Notes

The two dataset files follow the BIOM format.
Ways to open these datasets are provided in sections 2.1 and 2.2 of "File S1".
"File S1" is a standard HTML file which can be opened with any web browser.
The full list of dependencies needed to run the analyses described in "File S1" is provided in section 8 of that file.
All files were produced and analysed using Linux operating systems (Ubuntu 18.04.4 on a personal computer and Debian 4.19.12-1 on an HPC).
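
Independently of the routes described in "File S1", a JSON BIOM file can also be inspected with a general-purpose JSON parser. The sketch below assumes the BIOM 1.0 JSON layout (`rows`, `columns`, `shape`, `matrix_type`, and `data` given as `[row, column, value]` triplets when sparse). The embedded table is a tiny hand-written stand-in, not an excerpt of Dataset S1 or S2, and `biom_to_dense` is a hypothetical helper name.

```python
import json

# Hand-written stand-in for a BIOM 1.0 JSON OTU table (not taken from the datasets).
# For a real file one would use: with open(path) as fh: table = json.load(fh)
example_biom = json.loads("""
{
  "id": null,
  "format": "Biological Observation Matrix 1.0.0",
  "type": "OTU table",
  "matrix_type": "sparse",
  "matrix_element_type": "int",
  "shape": [3, 2],
  "rows": [
    {"id": "otu1", "metadata": null},
    {"id": "otu2", "metadata": null},
    {"id": "otu3", "metadata": null}
  ],
  "columns": [
    {"id": "sampleA", "metadata": null},
    {"id": "sampleB", "metadata": null}
  ],
  "data": [[0, 0, 5], [1, 1, 2], [2, 0, 1], [2, 1, 8]]
}
""")


def biom_to_dense(table):
    """Return (OTU ids, sample ids, dense OTU-by-sample matrix) from a BIOM 1.0 dict."""
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        # Sparse tables list only non-zero cells as [row, column, value].
        dense = [[0] * n_cols for _ in range(n_rows)]
        for r, c, v in table["data"]:
            dense[r][c] = v
    else:
        # Dense tables store one list of values per row.
        dense = [list(row) for row in table["data"]]
    otu_ids = [row["id"] for row in table["rows"]]
    sample_ids = [col["id"] for col in table["columns"]]
    return otu_ids, sample_ids, dense


otu_ids, sample_ids, matrix = biom_to_dense(example_biom)
for otu, counts in zip(otu_ids, matrix):
    print(otu, dict(zip(sample_ids, counts)))
```

For tables the size of Dataset S1, a dense list of lists may be memory-hungry, so keeping the sparse triplets or using a dedicated BIOM library is likely preferable.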


Deutsche Forschungsgemeinschaft, Award: DU1319/5-1