Lineage-specific phylogenetic structure of boreal habitats suggests different assembly processes across phylogenetic and spatial scales
Data files
Version of Sep 25, 2025: 17.47 MB in total.
| File | Size |
|---|---|
| Environmental_variables.txt | 14.16 KB |
| GAMM_analysis.R | 5.76 KB |
| mpd.by.clade.R | 1.66 KB |
| No_polytomies(500_tress_per_habitat).zip | 17.05 MB |
| Plot_scale_analysis.R | 37.82 KB |
| Plot_species_list.xlsx | 232.71 KB |
| rare_species_gros_morne_1998.xlsx | 9.91 KB |
| README.md | 14.15 KB |
| Trees_nwk.zip | 49.82 KB |
| Vascan.list881-and-habitat-list.xlsx | 50.27 KB |
Abstract
The phylogenetic distance among species in a community (community phylogenetic structure) has been used, albeit with criticism, to infer deterministic and stochastic assembly processes. The effects of phylogenetic scale (old versus young lineages) and spatial scale on measures of community phylogenetic structure are rarely tested simultaneously, especially in the boreal biome, yet testing them together is essential to disentangle the different assembly processes that may operate in a community. We examined lineage-specific phylogenetic structure for six vascular plant communities defined at the habitat scale (arctic-alpine barren, bog, fen, kalmia barren, limestone barren, and serpentine barren) on the island of Newfoundland, Canada, and the phylogenetic structure of plant communities defined at the plot scale (72 plots of 1 m²). Contrary to the stress-dominance hypothesis, which predicts phylogenetic clustering in challenging boreal environments, most clades across the six boreal habitats had random phylogenetic structure. However, we observed a shift from phylogenetic clustering at the deepest nodes of the angiosperms to no phylogenetic structure at shallower nodes (< 110 Mya), suggesting that assembly processes change with phylogenetic scale within a habitat and that deterministic processes may act at deep nodes. The random phylogenetic structure of the 1 m² plots, together with our models testing the effect of an environmental stress gradient on community composition, suggests that a complex set of stochastic and deterministic factors governs species assembly at this fine spatial scale, rather than only the abiotic filtering in hostile environments such as serpentine barrens that the stress-dominance hypothesis predicts. The interpretation of phylogenetic structure metrics did not change when species abundances were considered or when polytomies were resolved.
Taken together, inferences about assembly processes must be lineage-, habitat-, and spatial scale-specific and supplemented with knowledge of the role and evolution of traits, for which we outline hypotheses for future research.
This repository contains the supplementary data, R scripts, and supporting information associated with the manuscript, with additional details on the study system, methods, datasets, and results. The study examined lineage-specific community phylogenetic structure of vascular plant communities across six boreal habitats in Newfoundland, Canada, and in 72 fine-scale (1 m²) vegetation plots.
1. Data files
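Environmental_variables.txt
Environmental variables for the 72 permanent 1 m² vegetation plots (stress class, coordinates, elevation, slope, aspect, geology, distances to ponds and rivers, wetland percentage, rock/soil cover index, ecoregion and subregion), together with plot-level SES-MPD values and species richness.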
| Variable | Definition | Units/Values | Interpretation Key |
|---|---|---|---|
| stress | Environmental stress classification from Marilyn Anions | Categorical | Competitive, Intermediate, Harsh |
| site_id | Unique plot identifier | Text | Format: P#-# (Peatland), T#-# (Tundra) |
| site_type | Plot type | Text | P = Peatland, T = Tundra |
| sur_geo | Surface geology type | Categorical | See geological classifications below |
| bed_geo | Bedrock geology type | Categorical | See geological classifications below |
| ecoregion | Broad ecoregion classification | Text | Northern Peninsula, Southwestern Newfoundland |
| ECO_NAME | Detailed ecoregion name | Text | Forest ecoregion designation |
| SUB_NAME | Ecoregion subdivision | Text | Specific subregion within ecoregion |
| site_utm_east | UTM Easting coordinate | Meters | NAD83 coordinate system |
| site_utm_north | UTM Northing coordinate | Meters | NAD83 coordinate system |
| elevation | Site elevation above sea level | Meters | |
| slope | Slope angle category | Categorical/Degrees | Level = 0°, Slight = 10°, Steep = 30° |
| aspect | Slope aspect/direction | Degrees | 0-360° from north |
| pond_m | Distance to nearest pond | Meters | |
| river_m | Distance to nearest river | Meters | |
| p100_wet | Wetland percentage in vicinity | Percentage (0-100) | Percentage of area within a 20 m radius that is wetland |
| rock/soil cover | Rock and soil cover index | Scale 0-5 | 0 = minimum cover, 5 = maximum cover |
| PA_SESMPD | Standardized Effect Size of Mean Pairwise Distance (Presence-Absence) | Standardized units | Phylogenetic community structure metric |
| AW_SESMPD | Abundance-Weighted Standardized Effect Size of Mean Pairwise Distance | Standardized units | Phylogenetic community structure metric weighted by abundance |
| speciesRichness | Number of species per plot | Count | Total vascular plant species count |
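The site_id and slope codes above can be decoded when the file is read. The Python sketch below assumes that each "#" in the P#-# / T#-# format is a run of digits and that the first number identifies a site and the second a plot within that site; both readings are assumptions, and the helper names (parse_site_id, slope_to_degrees) are hypothetical rather than part of the dataset's scripts.

```python
import re

# Slope categories and their degree values, as defined in the table above.
SLOPE_DEGREES = {"Level": 0, "Slight": 10, "Steep": 30}

# Plot-type prefixes used in site_id and site_type.
SITE_TYPES = {"P": "Peatland", "T": "Tundra"}

# Assumption: each "#" in P#-# / T#-# stands for one or more digits.
SITE_ID_PATTERN = re.compile(r"^([PT])(\d+)-(\d+)$")


def parse_site_id(site_id: str) -> dict:
    """Split a plot identifier such as 'T3-2' into its components."""
    match = SITE_ID_PATTERN.match(site_id.strip())
    if match is None:
        raise ValueError(f"unexpected site_id format: {site_id!r}")
    prefix, first, second = match.groups()
    return {
        "site_type": SITE_TYPES[prefix],
        "site": int(first),   # assumed: site number
        "plot": int(second),  # assumed: plot number within the site
    }


def slope_to_degrees(category: str) -> int:
    """Convert a slope category (Level, Slight, Steep) to degrees."""
    return SLOPE_DEGREES[category.strip().capitalize()]


print(parse_site_id("T3-2"))      # {'site_type': 'Tundra', 'site': 3, 'plot': 2}
print(slope_to_degrees("steep"))  # 30
```

Raising an error on unexpected identifiers makes typos in the plot codes visible instead of silently producing wrong site types.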
Plot_species_list.xlsx
Species identity and cover (%) per plot across two habitat types.
Sheet 1: "Rawdata Arctic-Alpine" contains two data tables:
- Table 1: Species-plot matrix with environmental data (79 species)
- Table 2: Binary presence-absence matrix for Tundra plots (T#-#)
Sheet 2: "Rawdata Peatland" contains two data tables:
- Table 1: Species-plot matrix with environmental data (76 species)
- Table 2: Binary presence-absence matrix for Peatland plots (P#-#)
Sheet 3: "Species status" provides conservation information for taxa recorded in the two habitat types.
- Columns: Species, Status (IUCN global listing), and Source (URL to the IUCN Red List entry)
Example entries:
- Agrostis mertensii: Not listed (IUCN, global)
- Alnus crispa: Not listed (IUCN, global)
- Arethusa bulbosa: LC (Least Concern)
Column definitions for the species-plot tables in Sheets 1 and 2:
| Variable | Definition | Units/Values | Notes |
|---|---|---|---|
| # | Record number | Count | Sequential identifier (Table 1 only) |
| Plot | Plot identifier | Text | T#-# (Tundra) or P#-# (Peatland) |
| Species | Species name | Text | Binomial nomenclature with underscores |
| Cover_Adjusted | Adjusted species cover | Percentage | Vegetation cover percentage |
| UTM_East_Rounded10km | UTM Easting coordinate, rounded to the nearest 10 km | Meters | NAD83 coordinate system |
| UTM_North_Rounded10km | UTM Northing coordinate, rounded to the nearest 10 km | Meters | NAD83 coordinate system |
| Elevation (m) | Plot elevation | Meters | Above sea level |
| Slope | Slope category | Text | Level, Slight, Steep |
| Aspect | Slope direction | Degrees | 0-360° from north |
Binary matrices: Rows = plots, Columns = species names, Values = 1 (present) or 0 (absent)
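A binary matrix of this kind can be derived from the long-format rows of Table 1 (Plot, Species, Cover_Adjusted). This is a minimal Python sketch, not the authors' procedure: it assumes that any positive adjusted cover means presence, the function name to_presence_absence is hypothetical, and the example rows are made up for illustration.

```python
def to_presence_absence(rows):
    """Convert (plot, species, cover) rows into a plot -> {species: 0/1} matrix.

    A species is scored 1 in a plot when its adjusted cover is positive and
    0 otherwise. Rows whose cover is missing (None) still add the species as
    a column but do not mark it present (an assumption of this sketch).
    """
    plots = sorted({plot for plot, _, _ in rows})
    species = sorted({name for _, name, _ in rows})
    matrix = {plot: {name: 0 for name in species} for plot in plots}
    for plot, name, cover in rows:
        if cover is not None and cover > 0:
            matrix[plot][name] = 1
    return matrix


# Made-up rows in the shape of Table 1 (Plot, Species, Cover_Adjusted).
example_rows = [
    ("T1-1", "Empetrum_nigrum", 12.5),
    ("T1-1", "Diapensia_lapponica", 3.0),
    ("T1-2", "Empetrum_nigrum", 40.0),
]
for plot, row in to_presence_absence(example_rows).items():
    print(plot, row)
# T1-1 {'Diapensia_lapponica': 1, 'Empetrum_nigrum': 1}
# T1-2 {'Diapensia_lapponica': 0, 'Empetrum_nigrum': 1}
```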
Vascan.list881-and-habitat-list.xlsx
Full list of 881 native vascular plant species compiled from VASCAN.
Sheet 1: "habitat-list": binary presence-absence matrix across the six boreal habitats.
| Variable | Definition | Values | Notes |
|---|---|---|---|
| Species names (rows) | Full species names with family and higher taxonomy | Text | Format: Genus_species_Family_HigherTaxon |
| Kalmia heathland | Species presence in Kalmia heath (kalmia barren) habitat | Binary | 1 = present, 0 = absent |
| Limestone Barren | Species presence in limestone barren habitat | Binary | 1 = present, 0 = absent |
| Serpentine Barren | Species presence in serpentine barren habitat | Binary | 1 = present, 0 = absent |
| Arctic-Alpine | Species presence in arctic-alpine habitat | Binary | 1 = present, 0 = absent |
| Bog | Species presence in bog habitat | Binary | 1 = present, 0 = absent |
| Fen | Species presence in fen habitat | Binary | 1 = present, 0 = absent |
Sheet 2: "Vascan.list881sp-regionaltree": master checklist with complete taxonomic information for all 881 native vascular plant species of Newfoundland.
| Variable | Definition | Values | Notes |
|---|---|---|---|
| Species | Species scientific name | Text | Binomial nomenclature |
| Taxonomy | Complete taxonomic hierarchy | Text | Format: Genus_species_Family_HigherTaxonomy_MAJORGROUP |
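The underscore-joined labels in both sheets can be split back into fields. The Python sketch below assumes that the last few tokens are always the higher-level fields (three in Sheet 2, two in the Sheet 1 row names) and that everything before them, including any infraspecific tokens, belongs to the species name. The function name parse_taxon_label is hypothetical, and the example labels, in particular their higher-taxon values, are illustrative only.

```python
def parse_taxon_label(label, trailing=("family", "higher_taxon", "major_group")):
    """Split an underscore-joined taxon label into named fields.

    The last len(trailing) tokens are taken as the higher-level fields and
    everything before them as the species name. The default matches the
    Sheet 2 format (Genus_species_Family_HigherTaxonomy_MAJORGROUP); pass
    trailing=("family", "higher_taxon") for the Sheet 1 row names.
    """
    parts = label.strip().split("_")
    n = len(trailing)
    if len(parts) < n + 2:
        raise ValueError(f"too few fields in {label!r}")
    split = len(parts) - n
    name_parts, tail = parts[:split], parts[split:]
    record = {"species": "_".join(name_parts), "genus": name_parts[0]}
    record.update(zip(trailing, tail))
    return record


# Hypothetical labels; the higher-taxon values are illustrative only.
print(parse_taxon_label("Empetrum_nigrum_Ericaceae_Ericales_ANGIOSPERMS"))
print(parse_taxon_label("Empetrum_nigrum_Ericaceae_Ericales",
                        trailing=("family", "higher_taxon")))
```

Counting fields from the end, rather than the start, keeps names with subspecies or variety tokens intact.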
rare_species_gros_morne_1998.xlsx
List of rare species found in Peatland and Tundra plots with rarity status and location information.
| Variable | Definition | Values | Notes |
|---|---|---|---|
| Rare Species | Scientific name of rare species | Text | Binomial nomenclature |
| P2 | Presence in Peatland site 2 | Binary | X = present, blank = absent |
| T1, T2, T3, T4, T5, T7, T8, T11, T12 | Presence in specific Tundra sites | Binary | X = present, blank = absent |
| Rarity National | National rarity status | Categorical | N1 = critically rare nationally |
| Rarity Province | Provincial rarity status | Categorical | S1 = critically rare, S2 = threatened, SH = historically known |
| Notes | Additional information | Text | Location notes, specimen details |
Rarity Status Codes:
- S1: Critically rare (5 or fewer locations or very few individuals)
- S2: Threatened (6 to 20 locations or a few individuals)
- SH: Historically known, but presence unverified in the past 20 years
- N1: Critically rare at the national level
Phylogenetic trees
- Newfoundland_species_tree.tre: Mega-tree reconstructed with V.PhyloMaker2
- Habitat_phylogenies/: Six pruned phylogenies, one per habitat
- Trees_nwk.zip: Archive containing the full and habitat-level trees in .nwk format
- No_polytomies(500_tress_per_habitat).zip: 500 randomly resolved trees per habitat, used to evaluate sensitivity to polytomies
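Random resolution of polytomies can be illustrated with a toy tree. The Python sketch below is one simple scheme, assumed for illustration: the children of each multifurcating node are joined pairwise at random until the node is bifurcating, and branch lengths are ignored. It is not necessarily how the archived trees were produced; in R, tools such as ape's multi2di() or RRphylo's fix.poly() are the more likely choices. All names and the example tree are hypothetical.

```python
import random


def resolve_polytomies(tree, rng):
    """Return a fully bifurcating copy of `tree`.

    A tree is a leaf name (str) or a tuple of subtrees; a tuple with more
    than two children is a polytomy. Children of such a node are joined
    pairwise at random until two remain. Branch lengths are not modelled.
    """
    if isinstance(tree, str):
        return tree
    children = [resolve_polytomies(child, rng) for child in tree]
    while len(children) > 2:
        i, j = sorted(rng.sample(range(len(children)), 2))
        merged = (children[i], children[j])
        children.pop(j)  # pop the larger index first so i stays valid
        children.pop(i)
        children.append(merged)
    return tuple(children)


def leaves(tree):
    """List the leaf names of a tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def is_binary(tree):
    """True when every internal node has exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


# Hypothetical habitat tree with a four-way polytomy among A-D.
habitat_tree = (("A", "B", "C", "D"), "E")
rng = random.Random(42)
replicates = [resolve_polytomies(habitat_tree, rng) for _ in range(500)]
print(replicates[0], all(is_binary(t) for t in replicates))
```

Repeating the metric calculation over many such replicates, as with the 500 trees per habitat above, shows whether conclusions depend on one arbitrary resolution.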
2. R Scripts
- mpd.by.clade.R: Automates calculation of standardized effect size of mean pairwise distance (SES-MPD) for all clades
- GAMM_analysis.R: Fits generalized additive mixed models (GAMMs) relating SES-MPD to clade divergence time
- Plot_scale_analysis.R: Runs distance-based redundancy analysis (dbRDA), generalized linear models (GLMs), and Moran's eigenvector maps (MEMs) to test environmental and spatial effects on the 1 m² plots
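SES-MPD compares the observed mean pairwise distance (MPD) of a community with the MPD of random communities of equal richness drawn from a species pool: SES-MPD = (MPD_obs − mean(MPD_null)) / sd(MPD_null), where negative values indicate phylogenetic clustering and positive values overdispersion. The Python sketch below estimates the null distribution by randomization, assuming uniform draws without replacement from the pool and presence-absence data only; it is not a port of mpd.by.clade.R. PhyloMeasures can compute the null moments analytically rather than by randomization, so values will differ slightly. The function names and the toy distance matrix are hypothetical.

```python
import random
import statistics
from itertools import combinations


def mpd(species, dist):
    """Mean pairwise distance among species (presence-absence version).

    `dist` maps frozenset({a, b}) to the phylogenetic distance between a and b.
    """
    pairs = list(combinations(sorted(species), 2))
    if not pairs:
        raise ValueError("MPD needs at least two species")
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


def ses_mpd(community, pool, dist, n_iter=999, rng=None):
    """SES-MPD against random draws of equal richness from `pool`.

    Negative values indicate phylogenetic clustering, positive values
    overdispersion. Returns NaN when the null distribution has zero spread.
    """
    rng = rng or random.Random()
    pool = sorted(pool)  # random.sample needs a sequence, not a set
    k = len(community)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(pool, k), dist) for _ in range(n_iter)]
    sd_null = statistics.stdev(null)
    if sd_null == 0:
        return float("nan")
    return (observed - statistics.fmean(null)) / sd_null


# Hypothetical toy tree ((A,B),(C,D)): sister pairs are 2 units apart,
# all other pairs 6 units apart.
dist = {frozenset(p): 2.0 for p in [("A", "B"), ("C", "D")]}
dist.update({frozenset(p): 6.0 for p in [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]})

score = ses_mpd({"A", "B"}, {"A", "B", "C", "D"}, dist, rng=random.Random(1))
print(round(score, 2))  # negative: A and B are more closely related than expected
```

For a clade-level analysis, the pool would be restricted to the members of one clade in the regional tree and the community to that clade's members in a habitat.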
3. Software requirements
All analyses were performed using R v4.3.0 (R Core Team, 2023).
Core Software
- RStudio (optional IDE)
- Spreadsheet software (LibreOffice, Excel, or Google Sheets) for .xls/.xlsx files
- Phylogenetic tree viewers (optional): FigTree, iTOL, or R packages (ape, ggtree)
R Packages
- Phylogenetic analysis: V.PhyloMaker2 v0.1.0, ape v5.8-1, RRphylo v2.8.0, PhyloMeasures v2.1, picante v1.8.2
- Statistical modeling: mgcv v1.9-0, spdep v1.3-10, adespatial v0.3-28, vegan v2.7-0, cluster v2.1.8
- Data handling & visualization: dplyr v1.1+, tibble v3.2+, ggplot2 v3.4+, stringr v1.5+, readr v2.1+
4. Workflow
- Load phylogenetic trees (Trees_nwk.zip or No_polytomies...zip)
- Run mpd.by.clade.R with PhyloMeasures to compute SES-MPD
- Use GAMM_analysis.R with mgcv to model SES-MPD versus divergence time (using the extracted variables NODE, AGE, and SES-MPD)
- Generate figures (phylogenies, GAMM fits, dbRDA) using ggplot2
- Analyze plot-scale structure with Plot_scale_analysis.R using vegan, cluster, spdep, and adespatial (using the Environmental_variables.txt file)
5. Data standards and missing values
Missing value codes
- Environmental and species data: NA
- Species absence in habitat lists: 0
- Species absence in plot data: blank cell
- Missing cover data: NA
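The codes above can be applied when reading individual cells. The Python sketch below is hypothetical (normalize_cell is not part of the dataset's scripts) and assumes that a blank cell outside the plot data should be treated as missing, which the codes above do not specify.

```python
def normalize_cell(raw, source):
    """Convert one raw cell to a Python value using the missing-value codes.

    source: "plot" for plot-level species data, "habitat" for the habitat
    lists, anything else (e.g. "environment") for environmental data.
    """
    text = "" if raw is None else str(raw).strip()
    if text == "NA":
        return None  # explicit missing value
    if text == "":
        # A blank cell in the plot data means the species is absent; treating
        # blanks elsewhere as missing is an assumption of this sketch.
        return 0 if source == "plot" else None
    try:
        return float(text)  # numbers, including 0/1 habitat codes
    except ValueError:
        return text  # categorical values such as "GABBRO"


print(normalize_cell("NA", "environment"))  # None
print(normalize_cell("", "plot"))           # 0
print(normalize_cell("0", "habitat"))       # 0.0
```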
Plot identifier codes
- P = Peatland plots
- T = Tundra plots
Geological classifications
Surface Geology (sur_geo):
- TILL VENEER: Thin glacial till deposits
- MARINE GRAVEL: Marine-deposited gravel
- BEDROCK: Exposed bedrock
- PEAT AND MUCK: Organic deposits
- TILL WITH FLATTENED SLOPES: Glacial till on flattened terrain
- TILL WITH SMOOTHED SLOPES: Glacial till on smoothed terrain
- TILL BLANKET: Thick glacial till cover
- BLOCKY RUBBLE (TALUS): Angular rock fragments
Bedrock Geology (bed_geo):
- BROKEN SEDIMENT (MELANGE): Mixed sedimentary rocks
- LIMESTONE: Carbonate rock
- GABBRO: Mafic igneous rock
- PERIDOTITE: Ultramafic igneous rock
- DEFORMED PERIDOTITE: Tectonically altered ultramafic rock
- GNEISS: Metamorphic rock
- SANDSTONE: Sedimentary rock
- GRANITE: Felsic igneous rock
Technical standards
- Coordinate System: NAD83 UTM
- Taxonomic Authority: VASCAN (Vascular Plants of Canada database)
- Phylogenetic Reconstruction: V.PhyloMaker2
- Plot Size: 1 m² permanent vegetation plots (n=72)
Coordinate generalization
All geographic coordinates in this dataset were generalized to reduce spatial precision, following GBIF best practices for sensitive species data, while maintaining scientific utility.
- Fields included: UTM_East_Rounded10km, UTM_North_Rounded10km
- Formula applied: =ROUND(VALUE(coordinate)/10000, 0)*10000, which rounds each coordinate to the nearest 10 km (10,000 m)
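The spreadsheet formula can be reproduced outside a spreadsheet, for example to generalize new coordinates the same way. One detail matters: spreadsheet ROUND rounds halves away from zero, whereas Python's built-in round() rounds halves to the nearest even number, so the sketch below uses the decimal module. The function name generalize_coordinate and the example coordinates are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def generalize_coordinate(value, step=10_000):
    """Round a UTM coordinate (metres) to the nearest `step` metres.

    Mirrors =ROUND(VALUE(coordinate)/10000, 0)*10000. Spreadsheet ROUND
    rounds halves away from zero, so Decimal with ROUND_HALF_UP is used;
    Python's built-in round() would round halves to the nearest even number.
    """
    units = (Decimal(str(value)) / step).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(units) * step


# Hypothetical coordinates, not taken from the dataset.
print(generalize_coordinate(512345))     # 510000
print(generalize_coordinate("5385000"))  # 5390000 (tie rounded away from zero)
print(round(5385000 / 10000) * 10000)    # 5380000 (built-in round: half to even)
```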
