Plant communities within a metacommunity can vary widely in their degree of invasion by introduced species. Disturbance, propagule pressure, and biotic resistance are common explanations for this variation, but empirical evidence for these hypotheses is mixed.

Alternatively, the community assembly framework predicts that local assembly filters determine both native and exotic composition, but lower trait variation in the introduced species pool may exclude them from certain sites.

We examined evidence for this framework using observational data from forests and woodlands of Long Island, NY, USA. These forests vary in vegetation composition and invasion along a soil gradient. They are also highly disturbed and fragmented, yet some stands have almost no introduced plants.

Using data collected in 1998 and 2021-22, we quantified relationships between community composition, soil characteristics, and functional traits for native and exotic assemblages, as indicators of environmental filtering.

We found similar trait-environment relationships in native and introduced species, suggesting that both groups follow the same local assembly rules. Introduced species were predominantly found in sites with more nutrient-rich soils and were absent from sites with nutrient-poor soils.

At the regional scale, the exotic species pool was biased toward trait values favored in more nutrient-rich environments, particularly high growth rates and low leaf C:N ratios, which explains their absence from nutrient-poor environments.

These patterns were consistent over time, and stands that were uninvaded in 1998 remained so in 2021-22, supporting the robustness and reliability of short-term studies.

Synthesis: This study shows that invasion patterns in plant communities can be explained by the assembly rules that govern native species. By linking local environmental filtering with regional species pool characteristics, this work advances our understanding of how some communities remain uninvaded despite high disturbance and propagule pressure. Overall, these results highlight the utility of the community assembly framework, and emphasize the importance of regional processes in constraining the local distribution of introduced species.

Study sites 

Community composition data used in this study was collected at two time points: 1998 and 2021-22. The 1998 data was gathered by Howard et al. (2004), who measured the vegetation composition and soil characteristics of 38 sites in Suffolk County, New York State, USA. The authors of the present study resampled 16 of these sites between 2021 and 2022.

Suffolk County is located on Long Island, east of New York City, and spans a land area of ~2300 km². It has a hot-summer humid continental climate (Dfa) (Beck et al., 2018). The soils of the study area are generally acidic and well-drained to excessively well-drained (USDA-SSA & CAES, 1975). All study sites were located in protected forest preserves, except for a few that were on the campus of Brookhaven National Laboratory. These stands are open to the public and consist of fragmented secondary forests with various land-use histories.

Pine barrens communities had partially-open canopies of Pinus rigida and Quercus tree species (mainly Q. alba, Q. velutina, and Q. coccinea), with dense understories dominated by shrubs such as Q. ilicifolia, Vaccinium angustifolium, V. pallidum, and Gaylussacia baccata. Hardwood forests, in contrast, were more diverse, with closed canopies dominated by Acer rubrum, Betula lenta, Prunus serotina, and Quercus species (mainly Q. alba and Q. velutina). Lianas such as Toxicodendron radicans and Parthenocissus quinquefolia were also common in hardwood forests. The understory tended to be sparse and varied from site to site in hardwood communities.

Details of the site selection process for the 1998 sites can be found in Howard et al. (2004). Sites in 2021-22 were selected to cover as much of the range of forest composition and invasion reported by Howard et al. (2004) as possible, as well as to maximize the distance between sites. Equal numbers of pine barrens and hardwood sites were chosen for resampling (for this purpose, forest type was determined by using the 1998 composition data and through visual assessments of potential sites).

Vegetation composition

The methods used for quantifying vegetation composition were the same for both time points, detailed in Howard et al. (2004). In brief, a 30 m transect line was laid out in each site, and the cover of all vascular plants intersecting with the transect was noted. Ground cover (vegetation < 1 m height) was estimated through the line-intercept method, in 10-cm segments. Canopy cover was estimated visually at every 10-cm along the transect, using a modified periscope in 1998, and a GRS Densitometer (Geographic Resources Solutions, Arcata, California, USA) in 2021-22. Additionally, five 1 m x 1 m plots were placed within 5 meters of the transect line, and the percent cover of all vascular plant species rooted in each plot was noted. The World Flora Online (WFO) Plant List (https://wfoplantlist.org/; accessed June 2023) was used for taxonomic nomenclature. Species origin (native/introduced status) information was obtained from USDA Plants Database (https://plants.usda.gov/; USDA-NRCS, 2023).

The three cover estimates (canopy, transect-ground, plot) for each species were divided by their maximum possible values (100% cover) and then averaged to obtain a single abundance metric for each species in each site, which was used in all analyses. Prior to analysis, the abundance data for certain groups of species was combined due to difficulty in distinguishing between them in the field (these groups are henceforth referred to as species aggregates). This was done by summing the relative abundances of each individual species in an aggregate group, and then treating the aggregate as a single species. Species that were combined in this manner were: Carya alba and Carya glabra, Elaeagnus angustifolia and Elaeagnus umbellata, Quercus rubra and Quercus velutina, Quercus coccinea and Quercus palustris, Solidago nemoralis and unidentified Solidago spp., Vaccinium angustifolium and Vaccinium pallidum, all Vitis species, and all unidentified graminoids. In most cases, the combined species were closely related and functionally similar, likely to hybridize in some cases (particularly Quercus species) and were often found together or in similar sites. Moreover, since our study is concerned with overall community-level patterns, we believe that this combining of a few species did not substantially impact our results.

Some of our analyses focused on comparisons of the two forest types (hardwood vs pine barrens). We used the following procedure to classify sites by forest type using vegetation composition. First, Ward’s hierarchical clustering, with Bray-Curtis dissimilarity, was applied to cluster sites based on similarities in composition. Then, the cluster dendrogram was cut to obtain two clusters. Finally, each cluster was assigned to a forest type based on the dominant native species of the sites in each cluster. After assignment of sites to forest type, the degree of association of each species with the pine barrens was calculated, as an indicator of species’ habitat preferences. For this, the cover of each species was summed across pine barrens sites and divided by its total cover across all sites. Note that a value of 1 indicates the given species was only found in the pine barrens, while 0 indicates that the species was only found in the hardwoods. Introduced species were expected to have values close to 0.

Soil characteristics

Soil characteristics were measured by Howard et al. (2004) in 1998. In each site, they collected twelve soil cores from within a 10 m x 10 m square centered on the midpoint of the transect. These were air-dried and combined into a single sample per site, and then analyzed for pH, calcium, magnesium, cation exchange capacity (CEC), Bray-phosphorus, and soil texture (% sand, % silt, % clay). Further details can be found in the original study.

We standardized all soil variables by their means and variances prior to analysis. We carried out a PCA to visualize the variation in soil composition across sites, along with t-tests on each soil variable to test whether they significantly differed between the two forest types.

Functional traits

Seven functional traits were used: expected height at maturity, specific leaf area (SLA), leaf carbon to nitrogen (C:N) ratio, shoot carbon to nitrogen (C:N) ratio (categorical), growth rate (categorical), fire tolerance (categorical), and life form (categorical) (Table 1). Height, specific leaf area (SLA), and growth rate are a part of the global plant economic spectrum (Diaz et al., 2015), i.e., the spectrum of plant life-history and competitive strategies. Leaf C:N ratio and shoot C:N ratio are measures of plant nutrient-use strategies (Ågren, 2004; Andersen et al., 2012; Reich 2014; Zhang, He, Liu et al., 2020), and they tend to vary along soil nutrient gradients (Luizão et al., 2004; Zhao et al., 2014; Gong et al., 2020; Peng et al., 2023). Fire tolerance measures the ability of a species to reestablish after a fire. It is expected to vary with forest type in this system, since pine barrens communities tend to be more fire-prone than hardwood communities. Life form refers to broad groupings of plant species based on similarities in physiognomy and life-history characteristics. Species belonging to the same life form are expected to respond similarly to the abiotic environment (Box 1996; Duckworth et al., 2000; Engemann et al., 2016).

Trait data was gathered from the TRY Plant Trait Database (https://www.try-db.org; Kattge et al., 2020) and the USDA Plants Database (https://plants.usda.gov/; USDA-NRCS, 2023). Height and leaf C:N ratio were log-transformed prior to analysis to reduce skew. Further details on the data source for each trait and trait data preprocessing can be found in Appendix S2.

Trait data was available for 60-90% of all species in this study. For each trait, species with available data represented 80-90% of total vegetation cover across all sites, except for leaf C:N ratio (~70%) and life form (>99%). Trait interpolation was necessary for the dc-CA analysis (see next section), and was performed as detailed in Appendix S2.

Relationship between community composition, functional traits, and soil environment 

To evaluate the relationship between taxonomic composition, functional traits and the soil environment, we used double-constrained correspondence analysis (dc-CA). This is an ordination method that relates species composition to both functional traits and environmental predictors by maximizing the correlation between traits and environment (ter Braak et al., 2018). In practice, dc-CA first summarizes each site using canonical correspondence analysis (CCA), based on the traits of the species present, producing site scores that represent functional composition. These site scores are then related to environmental variables using redundancy analysis (RDA), to identify which environmental predictors that best explain variation in functional composition across sites. The resulting site scores, representing each site’s environment, are used to infer species’ environmental associations and the functional traits that best explain the same. The R package ‘douconca’ (ter Braak & van Rossum, 2025), was used for performing dc-CA, with separate analyses for 1998 and 2021-22. Model significance was assessed using the maximum permutation test (max test), which involves two independent permutation tests, one permuting species attributes (traits), and the other permuting site attributes (environmental variables). The more conservative (i.e. larger) p-value is then used for significance testing. This approach accounts for inflated type I error rates that often arise in trait-environment analyses (ter Braak et al., 2012; Zelený, 2018).

Due to multicollinearity between traits, we selected a reduced subset of traits for each time point, by performing CCA on the transposed composition matrix followed by stepwise selection. Lifeform was one of the traits selected through this procedure, but some of its levels showed high variance inflation factor (VIF) values in the dc-CA. Therefore, it was simplified to a binary liana vs. non-liana trait. All soil predictors except % silt and % clay were included. The latter two were excluded as the three soil texture variables (% silt, % clay and % sand) are strongly correlated with each other and sum to 100%; hence % sand was the only soil texture variable used.

To assess the similarity in the trait-environment relationships of native and introduced species, first, we visually compared the dc-CA scores of the two groups. The scores of introduced species were expected to be similar to those of hardwood-associated native species but dissimilar from pine barrens-associated native species, which would indicate that exotic species’ traits and environmental preferences align with those of co-occurring native species. Note that dc-CA produces two sets of species scores, one representing species traits, and the other representing their environmental preferences. We also tested whether species origin (native/introduced status) explained species’ environmental preferences beyond what can be explained by functional traits. For this, a second set of dc-CA models was fitted with species origin as a predictor and functional traits as conditioning variables. This allows for testing the effect of species origin after controlling for potential trait differences between native and introduced species. Therefore, if such a model is statistically significant, it would suggest that the environmental associations of introduced species are dissimilar to those of functionally-similar native species.

Species pool differences 

To compare the regional species pool trait distributions of native and introduced species, we first defined the regional pool as all species observed in either survey (1998 or 2021-22). We then constructed species pool trait distributions by assuming equal abundance of all species in the regional pool. Separate distributions were created for native and introduced species, which were then compared using two-sample Kolmogorov Smirnov tests (continuous traits) and Chi-squared tests of independence (categorical tests). For continuous traits, t-tests and F-tests were also used to compare means and variances, respectively.

Temporal trends in composition 

To test whether the taxonomic composition of the study sites significantly changed over time, we carried out PERMANOVAs (permutational MANOVA) on the composition of resampled sites, with sampling year (1998 vs 2021-22) as the predictor. Permutations were restricted to within site, i.e. the year labels were permuted within each site, to account for non-independence between samples from the same site. Two sets of PERMANOVAs were carried out, one on taxonomic composition, and another on functional composition. Additionally, we first carried out this analysis using all species, and then separately for native and introduced species to allow comparison between the two. Thus, there were six PERMANOVAs in total. Bray-Curtis dissimilarity was used for the PERMANOVAs on taxonomic composition. For the functional composition PERMANOVAs, the response variable was calculated as follows: first the functional distance between all pairs of species in the dataset was calculated, using the Gower distance metric, with all seven traits. Then, mean pairwise functional dissimilarity between sites, and between the two samples from each site, was calculated. This was then used as the response variable in the PERMANOVA.

To test whether the degree of invasion in the two forest types significantly changed over time, two mixed-effects ANOVAs were performed, one on log-transformed exotic species richness and another on square root-transformed introduced species cover per site. Forest type and sampling year were the fixed-effects predictors, and site was the random effect.

All analyses were performed in R version 4.2.0 (R Core Team, 2022).

Citations

Beck, H. E. et al. (2018) ‘Present and future köppen-geiger climate classification maps at 1-km resolution’, Scientific Data, 5. doi: 10.1038/SDATA.2018.214

Howard, Timothy G., Jessica Gurevitch, Laura Hyatt, Margaret Carreiro, and Manuel Lerdau. 2004. “Forest Invasibility in Communities in Southeastern New York.” Biological Invasions 6 (4): 393–410. https://doi.org/10.1023/B:BINV.0000041559.67560.7E/METRICS.

Kattge, J. et al. (2020) ‘TRY plant trait database – enhanced coverage and open access’, Global Change Biology, 26(1), pp. 119–188. doi: 10.1111/GCB.14904

R Core Team (2022) ‘R: A Language and Environment for Statistical Computing’. Vienna, Austria. Available at: https://www.r-project.org/

ter Braak, C. J. F., Cormont, A., & Dray, S. (2012). Improved testing of species traits–environment relationships in the fourth‐corner problem. Ecology, 93(7), 1525–1526. https://doi.org/10.1890/12-0126.1

ter Braak, C. J. F., Šmilauer, P., & Dray, S. (2018). Algorithms and biplots for double constrained correspondence analysis. Environmental and Ecological Statistics, 25(2), 171–197. https://doi.org/10.1007/s10651-017-0395-x

ter Braak, C. J. F., & van Rossum, B.-J. (2025). Linking multivariate trait variation to the environment: the advantages of double constrained correspondence analysis with the R package douconca. Ecological Informatics, 88, 103143. https://doi.org/10.1016/j.ecoinf.2025.103143

United States Department of Agriculture National Resources Conservation Service (USDA -NRCS) (2023) The PLANTS Database. Available at: https://plants.usda.gov/ (Accessed: 9 March 2023).

United States Department of Agriculture Soil Conservation Service (USDA - SSA) and Cornell Agricultural Experiment Station (CAES) (1975) Soil Survey of Suffolk County, New York - United States. U.S. Government Printing Office

Name	Explanation
Funct_Group	The growth habit (functional group) of the given species,
Annual/Perennial	Whether the species is annual (a) or perrenial (p)
Height_ft	Expected height of the given species at maturity, in feet
Height_m	Expected height of the given species at maturity, in meters
C:N	Shoot carbon:nitrogen ratio (categorical; possible values: low, medium, high)
Growth_rate	Expected growth rate of the species relative to other species with the same growth habit (categorical; possible values: slow, medium, rapid)
Fire resistant	Fire resistant - whether the species is known to resist burning (possible values: yes, no)
Fire tolerance	The relative ability of the species to re-sprout, regrow, or reestablish from residual seed after a fire (categorical; possible values: none, low, medium, high)

Name	Explanation
Leaf_C/N	Leaf carbon:nitrogen ratio
Leaf_Ca	Leaf calcium content per unit dry mass, in mg/g
Leaf_Mg	Leaf magnesium content per unit dry mass, in mg/g
Leaf_P	Leaf phosphorous content per unit dry mass, in mg/g
Root_P	Fine root phosphorous content per unit dray mass, in mg/g
SSD	Stem specific density in g/cm^3
Woody	Woodiness, i.e. whether the species has woody tissues or not (0= not woody, 2=woody)
Annual.Perennial	Whether the species is annual (a) or perrenial (p)
C_N_ratio	Shoot carbon:nitrogen ratio (low-med = low to medium, high = high)
Fire_Resistance	Whether the species resists burning (0 = not fire resistant, 1 = fire resistant )
Fire_Tolerance	The relative ability of the species to re-sprout, regrow, or reestablish from residual seed after a fire (0 = none,1 = low, 2 = medium, 3 = high)
Funct_Group	Original functional group (growth habit), as given on the USDA Plants Database
Growth_Rate	Expected growth rate of the species relative to other species with the same growth habit (1 = slow, 2 = medium,3 = rapid)
SLA	Specific leaf area in g/cm^2
Height	Expected height of the species at maturity in meters
Funct_Group3	Functional group of the species after combining certain subcategories and ensuring that each species is assigned to only one function group (see methods for details). Possible values - forb, graminoid, liana, shrub, tree
Funct_Group2	Functional group of the species after combining forbs and graminoids into a single category, i.e. herbs (rest of the column is the saame as Funct_Group3)

Community assembly explains invasion differences between two contrasting forest types

Data files

Abstract

Description of the data and file structure

Vegetation Composition Data

Site Data

Species and Functional Traits Data

Code/Software