Community assembly explains invasion differences between two contrasting forest types
Data files
Mar 25, 2026 version files 189.75 KB
-
1998__Combined_composition_data.csv
17.11 KB
-
1998_AboveDat.csv
10.58 KB
-
1998_belowDat.csv
10.31 KB
-
1998_PlotDat_summed.csv
10.58 KB
-
1998_PlotDat.csv
47.92 KB
-
2021-22__Combined_composition_data.csv
7.20 KB
-
2021-22_AboveDat.csv
4.49 KB
-
2021-22_BelowDat.csv
4.37 KB
-
2021-22_PlotDat_summed.csv
4.44 KB
-
2021-22_PlotDat.csv
18.08 KB
-
README.md
13.69 KB
-
sites.csv
3.04 KB
-
sites2.csv
3.67 KB
-
Soil.csv
1.95 KB
-
Species_list.csv
5.95 KB
-
Traits.csv
17.58 KB
-
USDA_Traits.csv
8.79 KB
Abstract
Plant communities within a metacommunity can vary widely in their degree of invasion by introduced species. Disturbance, propagule pressure, and biotic resistance are common explanations for this variation, but empirical evidence for these hypotheses is mixed.
Alternatively, the community assembly framework predicts that local assembly filters determine both native and exotic composition, but lower trait variation in the introduced species pool may exclude them from certain sites.
We examined evidence for this framework using observational data from forests and woodlands of Long Island, NY, USA. These forests vary in vegetation composition and invasion along a soil gradient. They are also highly disturbed and fragmented, yet some stands have almost no introduced plants.
Using data collected in 1998 and 2021-22, we quantified relationships between community composition, soil characteristics, and functional traits for native and exotic assemblages, as indicators of environmental filtering.
We found similar trait-environment relationships in native and introduced species, suggesting that both groups follow the same local assembly rules. Introduced species were predominantly found in sites with more nutrient-rich soils and were absent from sites with nutrient-poor soils.
At the regional scale, the exotic species pool was biased toward trait values favored in more nutrient-rich environments, particularly high growth rates and low leaf C:N ratios, which explains their absence from nutrient-poor environments.
These patterns were consistent over time, and stands that were uninvaded in 1998 remained so in 2021-22, supporting the robustness and reliability of short-term studies.
Synthesis: This study shows that invasion patterns in plant communities can be explained by the assembly rules that govern native species. By linking local environmental filtering with regional species pool characteristics, this work advances our understanding of how some communities remain uninvaded despite high disturbance and propagule pressure. Overall, these results highlight the utility of the community assembly framework, and emphasize the importance of regional processes in constraining the local distribution of introduced species.
Description of the data and file structure
Vegetation Composition Data
1998_AboveDat.csv - Vegetation composition data from the canopy of each field site, from the 1998 field surveys. This data was collected along a 30m transect in each site, with canopy cover of each species being estimated visually every 10 cm (see methods for details). Column names: Site - name of site (see sites.csv for details), all other columns are abbreviated species names (see Species_list.csv), with the data in these columns representing the cover of each species in each site (max possible value = 3000).
1998_belowDat.csv - Vegetation composition data from the ground layer of each field site, from the 1998 field surveys. This data was collected along a 30m transect in each site, with the cover of each species being estimated in 10 cm segments using the line-intercept method (see methods for details). Column names are the same as in 1998_AboveDat.csv.
1998_PlotDat.csv - Vegetation composition data from the 1m x 1m plots laid out in each field site, from the 1998 field surveys. This data was collected by visually estimating the cover of each species in each plot (see methods for details). Each site had 5 plots. Column names: Plot - name of the plot (format: SiteName_PlotNumber), all other columns are abbreviated species names (see Species_list.csv), with the data in these columns representing the cover of each species in each site (max possible value = 2500).
1998_PlotDat_summed.csv - Combined vegetation composition data from all the plots in each site, from the 1998 field surveys, created by adding over all the plots in a given site. Column names: Site - name of site (see sites.csv for details), all other columns are abbreviated species names (see Species_list.csv), with the data in these columns representing the cover of each species in each site (max possible value = 12500).
1998__Combined_composition_data.csv - Combined above, below and plot composition data of each site from the 1998 field surveys. This was created by summing over the abundance of each species in the 1998_AboveDat.csv, 1998_BelowDat.csv, 1998_PlotDat.csv, using the code given in 0.1_composition_data_formatting.R. Column names are the same as in 1998_Above_dat.csv.
2021-22_AboveDat.csv, 2021-22_BelowDat.csv, 2021-22_PlotDat.csv, 2021-22_PlotDat_summed.csv, and 2021-22__Combined_composition_data.csv - Similar to the corresponding files from 1998, except that they contain data from the 2021-22 field surveys (instead of 1998).
Site Data
sites.csv - Information about the field sites. Column names: Transect - name of transect (same as site names in the composition data files), Sampling_year - year in which the transect was sampled, Lat_conv - latitude of the transect starting point the in degree decimals, Long_conv - longitude of the transect starting point in degree decimals, Resampled? - whether the site was resampled in 2021-22 (for 1998 sites only; y = resampled, = not resampled, new = not applicable because corresponding site is from 2021-22). Paired with - for the 1998 sites that were resampled and for 2021-22 sites, this column gives the name of the corresponding transect from the other sampling year ( = not applicable, site was not resampled).
sites2.csv - Same as sites.csv, except with an additional Forest type column. Created using the code given in 0.2_forest_type_identification.R. Column names: Forest_Type - forest type of the given site (1 = hardwood forest, 2 = pine barrens forest). All other columns same as sites.csv.
Soil.csv - Soil data from each site (see methods for details). Column names: Site - name of site (same as site names in the composition files and in sites.csv); pH - soil pH; Ca - soil calcium content in milliequivalents per liter; Mg - soil magnesium content in milliequivalents per liter, CEC - cation exchange capacity in milliequivalents per 100 grams; Bray-P - plant-available phosphorous content of the soil (Bray phosphorous) in parts per million; Sand - % sand content; Silt - % silt content; Clay - % clay content.
Species and Functional Traits Data
Species_list.csv - List of all vascular plant species that have been recording in at least one of the study sites. Column names: Abbreviation - abbreviated species name (same as the column names in the composition data files); Scientific name - scientific name of the given species ("-" = no scientific name); Common name - common name of the given species, Native/Introduced - whether the species is native (N) or introduced (I), according to the USDA Plants Database (https://plants.usda.gov/ (unk = unknown); Composite species? - whether the corresponding abbreviation is for a 'composite species' or species aggregate, which was created by combining data from 2 or more individual species (referred to as 'species aggregates' in the methods, see methods for details; "y" = composite species, "in composite" = an individual species that is a part of a composite species group, "n" = not a composite species and not part of a composite group); Composite num - a unique group number for each composite species (blanks = corresponding species is not a composite species); Composite group num - a group number for each individual species that is a part of a composite group, identifying which group it belongs to (blanks = corresponding species is not part of a composite group); Notes - any additional information about the given species (blanks - no relevant additional information).
USDA_Traits.csv - Raw functional trait data from the USDA Plants Database (https://plants.usda.gov/. This data was gathered by visiting the USDA Plants webpage for each species and noting down the relevant information. Blanks in columns 6-13 indicate no data available for those species on the USDA database. Columns 1-5 are the same as in Species_list.csv, all other column names explained below.
| Name | Explanation |
|---|---|
| Funct_Group | The growth habit (functional group) of the given species, |
| Annual/Perennial | Whether the species is annual (a) or perrenial (p) |
| Height_ft | Expected height of the given species at maturity, in feet |
| Height_m | Expected height of the given species at maturity, in meters |
| C:N | Shoot carbon:nitrogen ratio (categorical; possible values: low, medium, high) |
| Growth_rate | Expected growth rate of the species relative to other species with the same growth habit (categorical; possible values: slow, medium, rapid) |
| Fire resistant | Fire resistant - whether the species is known to resist burning (possible values: yes, no) |
| Fire tolerance | The relative ability of the species to re-sprout, regrow, or reestablish from residual seed after a fire (categorical; possible values: none, low, medium, high) |
Traits.csv - Formatted trait data. This file was created by combining the data from USDA and TRY using the code in 0.3_trait_formatting.R. Columns 1-7 are the same as in Species_list.csv. All other column names explained below. Blanks in columns 8-24 indicate no data available.
| Name | Explanation |
|---|---|
| Leaf_C/N | Leaf carbon:nitrogen ratio |
| Leaf_Ca | Leaf calcium content per unit dry mass, in mg/g |
| Leaf_Mg | Leaf magnesium content per unit dry mass, in mg/g |
| Leaf_P | Leaf phosphorous content per unit dry mass, in mg/g |
| Root_P | Fine root phosphorous content per unit dray mass, in mg/g |
| SSD | Stem specific density in g/cm^3 |
| Woody | Woodiness, i.e. whether the species has woody tissues or not (0= not woody, 2=woody) |
| Annual.Perennial | Whether the species is annual (a) or perrenial (p) |
| C_N_ratio | Shoot carbon:nitrogen ratio (low-med = low to medium, high = high) |
| Fire_Resistance | Whether the species resists burning (0 = not fire resistant, 1 = fire resistant ) |
| Fire_Tolerance | The relative ability of the species to re-sprout, regrow, or reestablish from residual seed after a fire (0 = none,1 = low, 2 = medium, 3 = high) |
| Funct_Group | Original functional group (growth habit), as given on the USDA Plants Database |
| Growth_Rate | Expected growth rate of the species relative to other species with the same growth habit (1 = slow, 2 = medium,3 = rapid) |
| SLA | Specific leaf area in g/cm^2 |
| Height | Expected height of the species at maturity in meters |
| Funct_Group3 | Functional group of the species after combining certain subcategories and ensuring that each species is assigned to only one function group (see methods for details). Possible values - forb, graminoid, liana, shrub, tree |
| Funct_Group2 | Functional group of the species after combining forbs and graminoids into a single category, i.e. herbs (rest of the column is the saame as Funct_Group3) |
Code/Software
0.1_composition_data_formatting.R - Code for formatting raw community composition data and combining above-transect, below-transect and plot data.
0.2_forest_type_identification.R - Code for identifying which site belongs to which forest type (pine barrens vs hardwood).
0.3_trait_formatting.R - Code for formatting trait data and preparing it for analysis.
1.0_ordinations&prelim_analysis.R - Preliminary analyses and visualizations (e.g. ordinations on community composition and soil data to visualize patterns across sites, comparisons of ground cover and canopy cover across forest type and sampling year).
2.1_dcCA.R - Code for the dcCA analysis.
2.2_SpPoolsComparisons.R - Code for the species pool comparisons.
2.3_TemporalAnalysis.R - Code for analyzing temporal trends in community composition and invasion.
Study sites
Community composition data used in this study was collected at two time points: 1998 and 2021-22. The 1998 data was gathered by Howard et al. (2004), who measured the vegetation composition and soil characteristics of 38 sites in Suffolk County, New York State, USA. The authors of the present study resampled 16 of these sites between 2021 and 2022.
Suffolk County is located on Long Island, east of New York City, and spans a land area of ~2300 km2. It has a hot-summer humid continental climate (Dfa) (Beck et al., 2018). The soils of the study area are generally acidic and well-drained to excessively well-drained (USDA-SSA & CAES, 1975). All study sites were located in protected forest preserves, except for a few that were on the campus of Brookhaven National Laboratory. These stands are open to the public and consist of fragmented secondary forests with various land-use histories.
Pine barrens communities had partially-open canopies of Pinus rigida and Quercus tree species (mainly Q. alba, Q. velutina, and Q. coccinea), with dense understories dominated by shrubs such as Q. ilicifolia, Vaccinium angustifolium, V. pallidum, and Gaylussacia baccata. Hardwood forests, in contrast, were more diverse, with closed canopies dominated by Acer rubrum, Betula lenta, Prunus serotina, and Quercus species (mainly Q. alba and Q. velutina). Lianas such as Toxicodendron radicans and Parthenocissus quinquefolia were also common in hardwood forests. The understory tended to be sparse and varied from site to site in hardwood communities.
Details of the site selection process for the 1998 sites can be found in Howard et al. (2004). Sites in 2021-22 were selected to cover as much of the range of forest composition and invasion reported by Howard et al. (2004) as possible, as well as to maximize the distance between sites. Equal numbers of pine barrens and hardwood sites were chosen for resampling (for this purpose, forest type was determined by using the 1998 composition data and through visual assessments of potential sites).
Vegetation composition
The methods used for quantifying vegetation composition were the same for both time points, detailed in Howard et al. (2004). In brief, a 30 m transect line was laid out in each site, and the cover of all vascular plants intersecting with the transect was noted. Ground cover (vegetation < 1 m height) was estimated through the line-intercept method, in 10-cm segments. Canopy cover was estimated visually at every 10-cm along the transect, using a modified periscope in 1998, and a GRS Densitometer (Geographic Resources Solutions, Arcata, California, USA) in 2021-22. Additionally, five 1 m x 1 m plots were placed within 5 meters of the transect line, and the percent cover of all vascular plant species rooted in each plot was noted. The World Flora Online (WFO) Plant List (https://wfoplantlist.org/; accessed June 2023) was used for taxonomic nomenclature. Species origin (native/introduced status) information was obtained from USDA Plants Database (https://plants.usda.gov/; USDA-NRCS, 2023).
The three cover estimates (canopy, transect-ground, plot) for each species were divided by their maximum possible values (100% cover) and then averaged to obtain a single abundance metric for each species in each site, which was used in all analyses. Prior to analysis, the abundance data for certain groups of species was combined due to difficulty in distinguishing between them in the field (these groups are henceforth referred to as species aggregates). This was done by summing the relative abundances of each individual species in an aggregate group, and then treating the aggregate as a single species. Species that were combined in this manner were: Carya alba and Carya glabra, Elaeagnus angustifolia and Elaeagnus umbellata, Quercus rubra and Quercus velutina, Quercus coccinea and Quercus palustris, Solidago nemoralis and unidentified Solidago spp., Vaccinium angustifolium and Vaccinium pallidum, all Vitis species, and all unidentified graminoids. In most cases, the combined species were closely related and functionally similar, likely to hybridize in some cases (particularly Quercus species) and were often found together or in similar sites. Moreover, since our study is concerned with overall community-level patterns, we believe that this combining of a few species did not substantially impact our results.
Some of our analyses focused on comparisons of the two forest types (hardwood vs pine barrens). We used the following procedure to classify sites by forest type using vegetation composition. First, Ward’s hierarchical clustering, with Bray-Curtis dissimilarity, was applied to cluster sites based on similarities in composition. Then, the cluster dendrogram was cut to obtain two clusters. Finally, each cluster was assigned to a forest type based on the dominant native species of the sites in each cluster. After assignment of sites to forest type, the degree of association of each species with the pine barrens was calculated, as an indicator of species’ habitat preferences. For this, the cover of each species was summed across pine barrens sites and divided by its total cover across all sites. Note that a value of 1 indicates the given species was only found in the pine barrens, while 0 indicates that the species was only found in the hardwoods. Introduced species were expected to have values close to 0.
Soil characteristics
Soil characteristics were measured by Howard et al. (2004) in 1998. In each site, they collected twelve soil cores from within a 10 m x 10 m square centered on the midpoint of the transect. These were air-dried and combined into a single sample per site, and then analyzed for pH, calcium, magnesium, cation exchange capacity (CEC), Bray-phosphorus, and soil texture (% sand, % silt, % clay). Further details can be found in the original study.
We standardized all soil variables by their means and variances prior to analysis. We carried out a PCA to visualize the variation in soil composition across sites, along with t-tests on each soil variable to test whether they significantly differed between the two forest types.
Functional traits
Seven functional traits were used: expected height at maturity, specific leaf area (SLA), leaf carbon to nitrogen (C:N) ratio, shoot carbon to nitrogen (C:N) ratio (categorical), growth rate (categorical), fire tolerance (categorical), and life form (categorical) (Table 1). Height, specific leaf area (SLA), and growth rate are a part of the global plant economic spectrum (Diaz et al., 2015), i.e., the spectrum of plant life-history and competitive strategies. Leaf C:N ratio and shoot C:N ratio are measures of plant nutrient-use strategies (Ågren, 2004; Andersen et al., 2012; Reich 2014; Zhang, He, Liu et al., 2020), and they tend to vary along soil nutrient gradients (Luizão et al., 2004; Zhao et al., 2014; Gong et al., 2020; Peng et al., 2023). Fire tolerance measures the ability of a species to reestablish after a fire. It is expected to vary with forest type in this system, since pine barrens communities tend to be more fire-prone than hardwood communities. Life form refers to broad groupings of plant species based on similarities in physiognomy and life-history characteristics. Species belonging to the same life form are expected to respond similarly to the abiotic environment (Box 1996; Duckworth et al., 2000; Engemann et al., 2016).
Trait data was gathered from the TRY Plant Trait Database (https://www.try-db.org; Kattge et al., 2020) and the USDA Plants Database (https://plants.usda.gov/; USDA-NRCS, 2023). Height and leaf C:N ratio were log-transformed prior to analysis to reduce skew. Further details on the data source for each trait and trait data preprocessing can be found in Appendix S2.
Trait data was available for 60-90% of all species in this study. For each trait, species with available data represented 80-90% of total vegetation cover across all sites, except for leaf C:N ratio (~70%) and life form (>99%). Trait interpolation was necessary for the dc-CA analysis (see next section), and was performed as detailed in Appendix S2.
Relationship between community composition, functional traits, and soil environment
To evaluate the relationship between taxonomic composition, functional traits and the soil environment, we used double-constrained correspondence analysis (dc-CA). This is an ordination method that relates species composition to both functional traits and environmental predictors by maximizing the correlation between traits and environment (ter Braak et al., 2018). In practice, dc-CA first summarizes each site using canonical correspondence analysis (CCA), based on the traits of the species present, producing site scores that represent functional composition. These site scores are then related to environmental variables using redundancy analysis (RDA), to identify which environmental predictors that best explain variation in functional composition across sites. The resulting site scores, representing each site’s environment, are used to infer species’ environmental associations and the functional traits that best explain the same. The R package ‘douconca’ (ter Braak & van Rossum, 2025), was used for performing dc-CA, with separate analyses for 1998 and 2021-22. Model significance was assessed using the maximum permutation test (max test), which involves two independent permutation tests, one permuting species attributes (traits), and the other permuting site attributes (environmental variables). The more conservative (i.e. larger) p-value is then used for significance testing. This approach accounts for inflated type I error rates that often arise in trait-environment analyses (ter Braak et al., 2012; Zelený, 2018).
Due to multicollinearity between traits, we selected a reduced subset of traits for each time point, by performing CCA on the transposed composition matrix followed by stepwise selection. Lifeform was one of the traits selected through this procedure, but some of its levels showed high variance inflation factor (VIF) values in the dc-CA. Therefore, it was simplified to a binary liana vs. non-liana trait. All soil predictors except % silt and % clay were included. The latter two were excluded as the three soil texture variables (% silt, % clay and % sand) are strongly correlated with each other and sum to 100%; hence % sand was the only soil texture variable used.
To assess the similarity in the trait-environment relationships of native and introduced species, first, we visually compared the dc-CA scores of the two groups. The scores of introduced species were expected to be similar to those of hardwood-associated native species but dissimilar from pine barrens-associated native species, which would indicate that exotic species’ traits and environmental preferences align with those of co-occurring native species. Note that dc-CA produces two sets of species scores, one representing species traits, and the other representing their environmental preferences. We also tested whether species origin (native/introduced status) explained species’ environmental preferences beyond what can be explained by functional traits. For this, a second set of dc-CA models was fitted with species origin as a predictor and functional traits as conditioning variables. This allows for testing the effect of species origin after controlling for potential trait differences between native and introduced species. Therefore, if such a model is statistically significant, it would suggest that the environmental associations of introduced species are dissimilar to those of functionally-similar native species.
Species pool differences
To compare the regional species pool trait distributions of native and introduced species, we first defined the regional pool as all species observed in either survey (1998 or 2021-22). We then constructed species pool trait distributions by assuming equal abundance of all species in the regional pool. Separate distributions were created for native and introduced species, which were then compared using two-sample Kolmogorov Smirnov tests (continuous traits) and Chi-squared tests of independence (categorical tests). For continuous traits, t-tests and F-tests were also used to compare means and variances, respectively.
Temporal trends in composition
To test whether the taxonomic composition of the study sites significantly changed over time, we carried out PERMANOVAs (permutational MANOVA) on the composition of resampled sites, with sampling year (1998 vs 2021-22) as the predictor. Permutations were restricted to within site, i.e. the year labels were permuted within each site, to account for non-independence between samples from the same site. Two sets of PERMANOVAs were carried out, one on taxonomic composition, and another on functional composition. Additionally, we first carried out this analysis using all species, and then separately for native and introduced species to allow comparison between the two. Thus, there were six PERMANOVAs in total. Bray-Curtis dissimilarity was used for the PERMANOVAs on taxonomic composition. For the functional composition PERMANOVAs, the response variable was calculated as follows: first the functional distance between all pairs of species in the dataset was calculated, using the Gower distance metric, with all seven traits. Then, mean pairwise functional dissimilarity between sites, and between the two samples from each site, was calculated. This was then used as the response variable in the PERMANOVA.
To test whether the degree of invasion in the two forest types significantly changed over time, two mixed-effects ANOVAs were performed, one on log-transformed exotic species richness and another on square root-transformed introduced species cover per site. Forest type and sampling year were the fixed-effects predictors, and site was the random effect.
All analyses were performed in R version 4.2.0 (R Core Team, 2022).
Citations
Beck, H. E. et al. (2018) ‘Present and future köppen-geiger climate classification maps at 1-km resolution’, Scientific Data, 5. doi: 10.1038/SDATA.2018.214
Howard, Timothy G., Jessica Gurevitch, Laura Hyatt, Margaret Carreiro, and Manuel Lerdau. 2004. “Forest Invasibility in Communities in Southeastern New York.” Biological Invasions 6 (4): 393–410. https://doi.org/10.1023/B:BINV.0000041559.67560.7E/METRICS.
Kattge, J. et al. (2020) ‘TRY plant trait database – enhanced coverage and open access’, Global Change Biology, 26(1), pp. 119–188. doi: 10.1111/GCB.14904
R Core Team (2022) ‘R: A Language and Environment for Statistical Computing’. Vienna, Austria. Available at: https://www.r-project.org/
ter Braak, C. J. F., Cormont, A., & Dray, S. (2012). Improved testing of species traits–environment relationships in the fourth‐corner problem. Ecology, 93(7), 1525–1526. https://doi.org/10.1890/12-0126.1
ter Braak, C. J. F., Šmilauer, P., & Dray, S. (2018). Algorithms and biplots for double constrained correspondence analysis. Environmental and Ecological Statistics, 25(2), 171–197. https://doi.org/10.1007/s10651-017-0395-x
ter Braak, C. J. F., & van Rossum, B.-J. (2025). Linking multivariate trait variation to the environment: the advantages of double constrained correspondence analysis with the R package douconca. Ecological Informatics, 88, 103143. https://doi.org/10.1016/j.ecoinf.2025.103143
United States Department of Agriculture National Resources Conservation Service (USDA -NRCS) (2023) The PLANTS Database. Available at: https://plants.usda.gov/ (Accessed: 9 March 2023).
United States Department of Agriculture Soil Conservation Service (USDA - SSA) and Cornell Agricultural Experiment Station (CAES) (1975) Soil Survey of Suffolk County, New York - United States. U.S. Government Printing Office
