Data and code from: The forest within the forest: restoration of Galapagos host-epiphyte networks
Data files
Jun 01, 2026 version files 417.35 KB
- DATA.zip (262.42 KB)
- R_SessionInfo_packages.docx (45.80 KB)
- README.md (10.66 KB)
- SCRIPTS.zip (98.47 KB)
Abstract
Ecosystem restoration is increasingly implemented to mitigate biodiversity loss. Epiphytes can represent up to a third of tropical plant diversity and perform key ecosystem functions. Yet, they are highly vulnerable to deforestation and their recovery under restoration remains understudied. We evaluated the effect of invasive plant removal (Rubus niveus and Cestrum auriculatum) on facultative host-epiphyte interactions in the endemic and endangered Scalesia pedunculata forest on Santa Cruz Island (Galapagos). Using 20 paired 10 × 10 m plots (invaded vs. 11 years of restoration), we compared host–epiphyte network structure, epiphyte diversity, and the main factors influencing epiphyte richness. While network descriptors did not differ between treatments, restored plots supported higher epiphyte richness. Epiphyte richness per host increased with moss cover and host tree diameter (DBH), both key facilitators of epiphyte colonization, and was significantly higher under restoration, indicating improved host suitability. The endemic and threatened S. pedunculata was identified as a keystone host, with the highest species strength and degree. However, its lack of regeneration in invaded plots threatens long-term epiphyte persistence, as invasive hosts offer minimal epiphyte support. Our findings demonstrate that restoration enhanced epiphyte richness by fostering suitable hosts, highlighting the relevance of integrating biotic interactions into restoration planning and monitoring.
Dataset DOI: 10.5061/dryad.g4f4qrg3h
Description of the data and file structure
These data were collected during fieldwork in the Scalesia pedunculata forest on Santa Cruz Island (Galapagos, Ecuador), near the twin volcanic sinkholes “Los Gemelos” (550–600 m a.s.l.; 00°37′20″ S, 90°23′00″ W), and are associated with the manuscript entitled "The forest within the forest: restoration of Galapagos host-epiphyte networks". Between October and December 2024, twenty 10 × 10 m plots were sampled, organized as 10 pairs representing two treatments: invaded forest (no management) vs. restored forest, where invasive species (Rubus niveus and Cestrum auriculatum) are regularly removed. Paired plots were located 20 m apart, and pairs were at least 50 m from the nearest pair.
Sampling followed the grid-point intercept method with 36 points per plot. At each grid point, all potential host plants (>1 m height or >5 cm diameter at breast height, DBH) were recorded, and the occurrence of associated vascular epiphytes (including mosses, ferns, angiosperms, vines, and one hemi-parasite) was assessed. For each host plant, species, DBH, proportion of moss cover, and all associated epiphytes were recorded.
This repository contains 7 data files (provided in the compressed archive DATA.zip) and 8 R scripts (provided in the compressed archive SCRIPTS.zip). All analyses were conducted in R version 4.4.2 (2024-10-31). Required packages are specified in each script. All data files should be placed in the working directory, and the incidence matrices must be stored in the folder /data_plot_new within the working directory.
To facilitate reproducibility, detailed package and dependency version information is provided in the file R_SessionInfo_packages.docx. This file contains the output of sessioninfo::session_info() for each of the eight R scripts provided in this repository.
For more details on the experimental design and methods, see the Methods section of the manuscript.
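The expected folder layout can be checked before running anything. The scripts themselves are written in R; the sketch below is a language-neutral check written in Python. File names are copied from the DATA section of this README, while the function name `missing_inputs` is hypothetical and the names of the individual matrix files inside data_plot_new are not checked because they are not listed here.

```python
from pathlib import Path

# File names as listed in the DATA section of this README.
EXPECTED_FILES = [
    "RAW_Data_o.fern.xlsx",
    "origin_host.xlsx",
    "network_metrics_results_extended.csv",
    "species_metrics_combined.csv",
    "overview_o.fern.xlsx",
    "data_merged_o.fern_final.csv",
]
# Folder that must hold the incidence matrices.
MATRIX_FOLDER = "data_plot_new"


def missing_inputs(workdir):
    """Return the expected files and folders that are absent from workdir."""
    root = Path(workdir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not (root / MATRIX_FOLDER).is_dir():
        missing.append(MATRIX_FOLDER + "/")
    return missing
```

Calling `missing_inputs(".")` from the intended working directory returns an empty list when all seven inputs are in place.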
DATA
1) RAW_Data_o.fern.xlsx – raw dataset used as the basis for all analyses. It excludes the on-ground ferns (> 1 m height) that were originally documented as potential host plants. The sampling unit is the host, so each row is a unique potential host plant individual (tree/shrub > 1 m height and/or > 5 cm DBH). A second tab in this Excel file contains a detailed description of all variables (column names), including abbreviations, units of measurement, and categorical information where applicable. These variable descriptions are also provided below.
Columns are:
- "Treatment" = Whether the sampled host tree occurred in an invaded or a restored plot
- "Plot" = ID of the plot in which the individual was sampled (I = invaded, R = restored, plot number = 1-10)
- "Date" = Date of sampling
- "Point" = Grid-point-intercept location where the individual was sampled
- "Host species" = Identified species of the host
- "Host type" = Tree or shrub
- "Host_ID" = Number of the individual while sampling the corresponding plot
- "Host code" = Unique code constructed from the plot ID, the grid point, h for “host” and the Host_ID
- "Tree height (m)" = Estimated height of the host (in meters)
- "Perimeter (cm)" = Perimeter at breast height (in centimetres), measured in the field using a measuring tape
- "DBH (cm)" = diameter at breast height (calculated from perimeter)
- "moss cover (%)" = Visually estimated moss cover (in percentage %)
- "notes (host)" = Additional information about the condition of host individuals (e.g., parts were dead, lying on the ground, damaged, etc.)
- "Epiphyte species" = Identified species growing on top of the host individual
- "Abundance" = Counted individuals of the corresponding epiphyte species. Zeros indicate cases where no epiphytes were recorded on the corresponding host tree.
- "mean height" = Mean height (in meters), calculated from the minimum and maximum height at which this epiphyte species occurred. Zeros indicate cases where no epiphytes were recorded on the corresponding host tree.
- "Min. height" = Lowest observed occurrence of the epiphyte species on this host (visually estimated, in meters). Zeros indicate cases where no epiphytes were recorded on the corresponding host tree.
- "Max. height" = Highest observed occurrence of the epiphyte species on this host (visually estimated, in meters). Zeros indicate cases where no epiphytes were recorded on the corresponding host tree.
- "on moss (y/n)" = Whether the epiphyte was growing on moss (y=yes, on moss) or not (n=no moss). Zeros indicate cases where no epiphytes were recorded on the corresponding host tree.
- "flowering (f.)/sori (s.)/none" = Whether the epiphyte was flowering (f.), showing sori (s.; reproductive structures of ferns) or vegetative state (none). Zeros indicate cases where no epiphytes were recorded on the corresponding host tree.
- "notes (epiphyte)" = Additional information about the epiphyte species, when applicable
- "photo #" = Indicates whether photos were taken
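The "DBH (cm)" column is described as calculated from "Perimeter (cm)". The README does not give the formula. The sketch below assumes the standard conversion for a stem with a circular cross-section (diameter = perimeter / π) and is written in Python for illustration, not taken from the R scripts. The function name is hypothetical.

```python
import math


def dbh_from_perimeter(perimeter_cm):
    """Diameter (cm) of a circular stem with the given perimeter (cm).

    Assumes a circular cross-section, so diameter = perimeter / pi.
    """
    if perimeter_cm < 0:
        raise ValueError("perimeter must be non-negative")
    return perimeter_cm / math.pi
```

Under this assumption, the 5 cm DBH inclusion threshold corresponds to a measured perimeter of about 15.7 cm.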
2) origin_host.xlsx – Contains the list of host species and their origin status (endemic, native, introduced, invasive). This file is used in the script “single_network_final.R”.
3) data_plot_new/ – Folder containing incidence matrices as .csv files for all 20 analysed plots (10 invaded and 10 restored plots; IDs: I1-I10 & R1-R10). The matrices were generated in R from the sampled data (RAW_Data_o.fern.xlsx). Columns are:
- "plot" = identifier of each plot
- "pairID" = the number identifying the plot pair
- epiphyte species – all species occurring in this plot.
- Rows: sampled host species in the plot.
- Values: interaction frequency between each host and epiphyte species, corresponding to the abundance of epiphyte individuals of each species on each host species.
This folder is used in the R script “single_network_final.R” to visualise interactions as bipartite networks.
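The step from host-level records to a host × epiphyte matrix for one plot can be sketched as follows. This is an illustrative Python sketch, not the authors' R code. It assumes each record is a (host species, epiphyte species, abundance) triple, and that hosts recorded with zero abundance stay in the matrix as all-zero rows. Whether the published matrices keep such rows is not stated here. The function name is hypothetical.

```python
from collections import defaultdict


def incidence_matrix(records):
    """Build a host x epiphyte matrix of summed abundances for one plot.

    records: iterable of (host_species, epiphyte_species, abundance).
    Returns (hosts, epiphytes, matrix) with sorted row and column labels.
    """
    totals = defaultdict(int)
    hosts, epiphytes = set(), set()
    for host, epiphyte, abundance in records:
        hosts.add(host)  # assumption: every sampled host species gets a row
        if abundance > 0:  # zero rows mark hosts without epiphytes
            totals[(host, epiphyte)] += abundance
            epiphytes.add(epiphyte)
    host_labels = sorted(hosts)
    epiphyte_labels = sorted(epiphytes)
    matrix = [
        [totals.get((h, e), 0) for e in epiphyte_labels] for h in host_labels
    ]
    return host_labels, epiphyte_labels, matrix
```

Each cell then holds the interaction frequency described above: the number of epiphyte individuals of one species counted on one host species.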
4) network_metrics_results_extended.csv – Dataset at the plot level, including information for each plot ("plot", "treatment", "pairID" as in the other datasets) on various network-level metrics:
- "connectance"
- "nestedness"
- "modularity"
- "shannon"
- "interaction_evenness"
- "H2"
- "robustness_LL"
Additional variables:
- "Host.spp": number of different host species sampled in each plot
- "epiphyte.spp": number of different epiphyte species sampled in each plot.
- "network.size": the total number of possible links, calculated as the product of the number of host species and epiphyte species.
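Two of these quantities follow directly from an incidence matrix. "network.size" is defined above as the number of host species times the number of epiphyte species. Connectance is conventionally the share of those possible links that are realised, and the sketch below assumes that definition. It is an illustrative Python sketch, not the code used to produce the file, and the function names are hypothetical.

```python
def network_size(matrix):
    """Number of possible links: host species x epiphyte species."""
    n_hosts = len(matrix)
    n_epiphytes = len(matrix[0]) if matrix else 0
    return n_hosts * n_epiphytes


def connectance(matrix):
    """Realised links (cells > 0) divided by the number of possible links."""
    size = network_size(matrix)
    if size == 0:
        raise ValueError("matrix has no rows or no columns")
    links = sum(1 for row in matrix for value in row if value > 0)
    return links / size
```

Whether host species without any interaction are kept as rows changes both values, so the matrix passed in should match the one used for the published metrics.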
5) species_metrics_combined.csv – Dataset containing species-level metrics for the seven host species shared across treatments (Host ID 1-7). Metrics were calculated separately for the restored and the invaded treatment (calculated from all the 10 plots per treatment). Metrics included:
- "normalised.degree"
- "species.strength"
- "d" = Specialisation (d′)
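For orientation, the first two metrics can be sketched from a host × epiphyte matrix. The sketch assumes the common definitions: normalised degree is the number of epiphyte species a host interacts with divided by the number of epiphyte species in the network, and species strength is the sum, over epiphyte species, of the share of each epiphyte's interactions that falls on that host. This is illustrative Python, not the authors' R code. The function names are hypothetical, and d′ is omitted because it needs a more involved standardisation.

```python
def normalised_degree(matrix, host_index):
    """Share of epiphyte species (columns) the host interacts with."""
    row = matrix[host_index]
    return sum(1 for value in row if value > 0) / len(row)


def species_strength(matrix, host_index):
    """Sum over epiphytes of the host's share of each epiphyte's interactions."""
    strength = 0.0
    for j, value in enumerate(matrix[host_index]):
        column_total = sum(row[j] for row in matrix)
        if column_total > 0:  # skip epiphytes with no recorded interactions
            strength += value / column_total
    return strength
```

Under this definition, species strengths summed over all hosts equal the number of epiphyte species with at least one interaction, which is a convenient sanity check.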
6) overview_o.fern.xlsx – dataset at the plot level that includes calculated information for each plot per treatment. Columns (calculated variables) are:
- "epiphyte richness"
- "epiphyte abundance"
- "obligate epiphytes richness"
- "accidental epiphytes richness"
- "obligate epiphyte abundance"
- "accidental epiphyte abundance"
- "potential host richness"
- "hosts species with interactions"
- "potential host abundance"
- "host abundance without interactions"
- "host abundance with interactions"
- "pairwise interactions"
- "Shannon index"
Among these, the most relevant for analysis are:
- epiphyte richness, abundance and diversity
- number of unique pairwise interactions in each plot.
This dataset is provided in two sheets: (1) plot overview, which contains all calculated variables listed above; (2) small version, which contains only the most relevant variables for analysis. This table was generated in R from the raw dataset (“RAW_Data_o.fern.xlsx”).
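Richness and the Shannon index for a plot can be sketched from a list of per-species epiphyte abundances. The sketch assumes the usual form H′ = −Σ pᵢ ln pᵢ with the natural logarithm. The log base and the exact inputs used for the "Shannon index" column are not stated in this README. This is illustrative Python rather than the authors' R code, and the function names are hypothetical.

```python
import math


def richness(abundances):
    """Number of species with at least one recorded individual."""
    return sum(1 for n in abundances if n > 0)


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over species with n > 0."""
    counts = [n for n in abundances if n > 0]
    total = sum(counts)
    if total == 0:
        return 0.0  # no individuals recorded in the plot
    return -sum((n / total) * math.log(n / total) for n in counts)
```

Two equally abundant species give H′ = ln 2, and a plot with a single species gives H′ = 0.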
7) data_merged_o.fern_final.csv – combined host-level dataset (each row corresponds to a sampled individual), generated in R using information from the input datasets: (i) “RAW_Data_o.fern.xlsx” – raw host and epiphyte data and (ii) “overview_o.fern.xlsx” – plot-level calculated metrics.
Code/software (R SCRIPTS)
1) single_network_final.R – Creates bipartite networks for all 20 plots using the folder /data_plot_new as input data. Host species are colored according to their origin using the Excel file origin_host.xlsx. The script first tests network visualization for invaded plot 1 (I1) and then loops through all 20 plots.
2) unique_pairwise.R – Calculates the unique pairwise interactions per treatment (invaded vs. restored) and identifies the number of pairwise interactions shared between treatments. It uses interaction information from all 10 invaded and all 10 restored plots, with RAW_Data_o.fern.xlsx as the input file.
3) abundance_speciesstrength_final.R – Explores the relationship between abundance and species strength of all host species across plots using RAW_Data_o.fern.xlsx. This script was used to generate Figure S2.
4) network_o.fern_extended_final.R – Analyses all network-level metrics using the input file network_metrics_results_extended.csv and generates boxplots for Figure 3.
5) species_level_o.fern.R – Analyses species-level metrics for all shared host species using species_metrics_combined.csv. This script is used to generate boxplots for Figure S1.
6) overview_paired_final_t-test.R – Analyses epiphyte richness, abundance and diversity across treatments using the overview_o.fern.xlsx dataset. Generates boxplots for Figure 4.
7) GLMM_zeroinfltion_Epi.species_DBH.R – Analyses epiphyte richness using data_merged_o.fern_final.csv. Performs data exploration, tests distributions, evaluates relationships with potential predictors (e.g., moss cover, DBH), fits different models, selects the best-fitting model, and creates effect plots (Figure 5).
8) GLM_Scalesia_DBH.R - Analyses the distribution of moss cover and DBH across treatments, and DBH of Scalesia pedunculata using data_merged_o.fern_final.csv. This script generates Figure S3 and Figure 6, respectively.
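The counting done by unique_pairwise.R (script 2) amounts to set operations on host–epiphyte pairs. The following is a minimal Python sketch of that idea, not a translation of the R script. It assumes each interaction is represented as a (host species, epiphyte species) tuple pooled over the 10 plots of a treatment. The function name and dictionary keys are hypothetical.

```python
def pairwise_summary(invaded_pairs, restored_pairs):
    """Count unique host-epiphyte pairs per treatment and their overlap.

    Each argument is an iterable of (host_species, epiphyte_species) tuples;
    repeated observations of the same pair are counted once.
    """
    invaded = set(invaded_pairs)
    restored = set(restored_pairs)
    return {
        "invaded": len(invaded),
        "restored": len(restored),
        "shared": len(invaded & restored),
        "invaded_only": len(invaded - restored),
        "restored_only": len(restored - invaded),
    }
```

Because the inputs are converted to sets, a pair recorded in several plots of the same treatment contributes only once to that treatment's count.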
