Introduced plant species show weaker conspecific density dependence than native plant species
Data files
Mar 30, 2026 version files 111.33 MB
-
0.1Sp_Occurrences.R
2.35 KB
-
0.2Trait_Formatting.R
10.57 KB
-
0.3DispersalParams_Assignment.R
5.03 KB
-
0.4AreaGrowthParams_Calc.R
1.49 KB
-
0.5Mort_GermParam_Sweep_Sim.R
14.07 KB
-
0.6Mort_GermParam_Analysis.R
6.47 KB
-
1.1NullModelSims.R
13.42 KB
-
2.1Moran_I_Calc.R
3.76 KB
-
2.2CNDD_Analysis.R
6.98 KB
-
3.1HNDD_Analysis.R
5.28 KB
-
CNDD.csv
7.29 KB
-
CompDat_Plot_long.csv
102.59 KB
-
CompDat_Transect_long.csv
241.78 KB
-
DispersalDistance_Vittoz_Engler.csv
383 B
-
DispersalKernel_Bullock.csv
726 B
-
DispersalKernelParams_final.csv
1 KB
-
DispersalTraits_other.csv
15.41 KB
-
GrowthLifeSpan_Params.csv
339 B
-
HNDD.csv
4.02 KB
-
MoranI_Observed.csv
4.10 KB
-
MoranI_Simulated.csv
3.14 MB
-
Mort_DensityDep_Params.csv
1.72 KB
-
ParamsTrial1.csv
2.31 MB
-
ParamsTrial2.csv
943.05 KB
-
ParamsTrial3.csv
741.16 KB
-
README.md
27.06 KB
-
RecruitmentTraits.csv
12.46 KB
-
SimulatedTransectDat.csv
103.70 MB
-
Soil.csv
572 B
-
Sp_Occurrences_per_Site.csv
4.18 KB
-
Traits_Poddar.csv
6.81 KB
-
Traits_SelectSpecies.csv
2.68 KB
Abstract
Conspecific negative density dependence (CNDD) is necessary for plant coexistence, as it limits each species’ local density and dominance. The high densities attained by many introduced plant species suggest they experience weaker CNDD than native species, yet this has not been tested. This dataset contains fine-scale plant composition data from temperate forest understory communities on Long Island, NY, USA, which was used to contrast CNDD in native and introduced plant species. CNDD is expected to generate conspecific spatial repulsion (spatial overdispersion), but other processes such as dispersal limitation tend to generate clustered distributions of conspecifics. Therefore, we quantified observed spatial patterns in study populations to those seen in simulations of dispersal limitation. Data used to parametrize these simulations, along with the code used for simulations and data analysis, is also provided here.
Dataset DOI: 10.5061/dryad.nk98sf87b
Description of the data and file structure
This dataset contains raw data files (original community composition and trait data from Poddar et al. (2026), and raw trait data obtained from databases and prior literature), formatted data files (data that has been formatted, summarized or otherwise prepared for analysis, using the code files provided), output files (output from the null model simulations, parameter sweeps or final analyses, generated using the provided code files), and code files.
Files and variables
Raw data files
File: CompDat_Transect_long.csv
Description: Community composition data collected through transect sampling by Poddar et al. Each transect was 30 m long and was divided into 10 cm long segments. The cover of each plant species intersecting with each transect segment was visually estimated. Each row in this file represents an observation of a species in one segment of each transect. There may be multiple rows for the same segment of the same transect, if multiple species were observed in that segment.
Variables
- Transect: Unique identifier for each transect (only one transect was laid out in each study site, therefore, transect identifier is the same as site identifier)
- SegmentStart: Distance between the starting point of the transect and the starting point of the given transect segment, in cm.
- SegmentEnd: Distance between the starting point of the transect, and the end point of the given transect segment, in cm.
- Species: Scientific name of the species observed. BARE = bare ground (no plant species present in that transect segment)
- Cover: Cover of the species observed (maximum value = 10)
File: CompDat_Plot_long.csv
Description: Community composition data collected through plot sampling by Poddar et al. Along each transect, five 1 m x 1 m plots were laid out. Each plot was further subdivided into 20 cm x 20 cm subplots, and the cover of each plant species rooted in each subplot was visually estimated. Each row in this file represents an observation of a species in one subplot of a particular plot. There may be multiple rows for the same subplot of the same plot, if multiple species were observed in that subplot.
Variables
- Transect: Unique identifier for each transect/study site
- Plot: Unique identifier for each plot in each transect (ranges from 1-5)
- Subplot: Unique identifier for each subplot of each plot (ranges from A-Y)
- Species: Scientific name of the species observed. BARE = bare ground (no plant species present in that subplot)
- PercentCover: Percentage cover of the species observed (maximum value = 100).
File: Traits_Poddar.csv
Description: Trait data compiled by Poddar et al., who gathered it from the TRY trait database (https://www.try-db.org) and the USDA Plants database (https://plants.usda.gov/). Each row of this file is a separate species. This file contains all species observed in the Poddar et al dataset, including ones that were excluded from in the CNDD study. Blanks indicate missing data.
Variables
- Scientific.name: Scientific name of the species
- Common.name: Common name of the species
- Origin: Whether the species is native (N) or introduced (I). unk - unkown
- Woody: Whether the species produces woody tissues (0 - herbaceous/not woody, 2 - wood)
- Annual.Perennial: Whether the species is annual (a) or perennial (p)
- Funct_Group_orig: Functional group (growth from) of each species, as listed on the USDA Plants database
- Funct_Group_simpl: Functional group (growth form) simplified into a smaller number of categories (shrubs, trees, lianas, herbs).
- Growth_Rate: Growth rate of each species (categorical)
- Leaf_C.N: Leaf carbon to nitrogen ratio of each species, in g/g
- Height: Expected height at maturity in m
- SLA: Specific leaf area in mm/mg
File: Soil.csv
Description: Soil data for the study sites sampled by Poddar et al. See original data publication for details: Poddar, Urmi; Dong, Tracey; Lam, Kristi et al. (2026). Community assembly explains invasion differences between two contrasting forest types [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqk3k
File: DispersalTraits_TRY_Raw.csv (supplemental file - Zenodo)
Description: Data on seed dry mass and dispersal syndrome/dispersal agent of each species downloaded from the TRY trait database (https://www.try-db.org) using the request ID 42311. This dataset contains all species that were observed in the Poddar et al dataset, including ones that were excluded from the the CNDD study. This is the original raw data file as obtained from TRY without any modification by the authors of this dataset. We have tried to provide an explanation of variable names in the TRY_Variable_Names.csv file, however, the meaning of some them was unclear to us (such columns were not used in this analysis). Further questions about this file should be directed to the administrators of TRY.
Variables
- See TRY_Variable_Names.csv file
File: DispersalTraits_other.csv
Description: Data on dispersal syndrome (dispersal agent) of each species obtained from the Seed Information Database (SID; https://ser-sid.org/) or from primary and secondary literature and sources other than TRY. This dataset contains all species that were observed in the Poddar et al dataset, including ones that were excluded from the CNDD study. Each row corresponds to a different species.
Variables
- Scientific.name: Scientific name of the species
- DispSyndrome_SID: Dispersal syndrome of the given species, as reported in the seed information database (blanks = data not available)
- DispSyndrome_Other: Dispersal syndrome of the given species from other data sources (blanks = no data found)
- Source_DispSyndrome_Other: Data source for the data in the DispSyndrome_Other column
- Wind_Adaptations: Morphological adaptations for wind-dispersal (e.g. light seeds, winged seeds), for wind dispersed species only
- Source_Wind_Adaptations: Data source for the data in the Wind_Adaptations column
- VertebrateDisp_SubCategory: Subcategory of dispersal vectors for vertebrate-dispersed seeds (e.g. birds, rodents)
- Source_Vetebrate_SubCategory: Data source for the data in the VertebrateDisp_SubCategory column
File: DispersalDistance_Vittoz_Engler.csv
Description: Expected dispersal distances for plants with different dispersal syndromes, as reported by Vittoz & Engler (2007). The authors of the original study reviewed primary literature on seed dispersal, grouped species by dispersal syndrome (sometimes sub-grouped by growth form or height), and provided expected dispersal distances for each category. Each row in this file corresponds to a different dispersal syndrome category in the original study.
Variables
- GrowthForm: Growth form, if specified. Blanks - applies to all growth forms
- DispSyndrome: Dispersal syndrome
- Wind_Adapt: Morphological adaptations for wind dispersal (for wind-dispersed species only)
- Vertebrate_Subcat: Subcategories of dispersal vectors for vertebrate-dispersed seeds
- Height_min: Minimum height, in meters, if specified. Blanks - applies to plants of all heights
- Height_max: Maximum height, in meters, if specified. Blanks - applies to plants of all heights
- Distance_50Quantile: 50th quantile of the dispersal kernel (median dispersal distance), in meters
- Distance_99Quantile: 99th quantile of the dispersal kernel (maximum dispersal distance), in meters
File: DispersalKernel_Bullock.csv
Description: Expected dispersal distances for plants with different dispersal syndromes, as reported by Bullock et al (2007). The authors of the original study reviewed primary literature on seed dispersal, grouped species by dispersal syndrome (sometimes sub-grouped by height, seed mass etc.), and estimated the dispersal kernel parameters for each category. Each row in this file corresponds to a different dispersal syndrome group in the original study.
Variables
- SeedMassMin: Minimum seed mass, in mg, if specified. Blanks - applies to seeds of any mass
- SeedMassMax: Maximum seed mass, in mg, if specified. Blanks - applies to seeds of any mass
- a: alpha (scale) parameter of the dispersal kernel, assuming a exponential power distributed dispersal kernel, in meters
- b: beta (shape) parameter of the dispersal kernel, assuming a exponential power distributed dispersal kernel (unitless)
- All other variables - same as in the DispersalDistance_Vittoz_Engler.csv
File: SizeGrowthTraits_TRY_Raw.csv (supplemental file - Zenodo)
Description: Data on relative growth rate and canopy size/canopy area of each species, downloaded from the TRY trait database (https://www.try-db.org) using the request ID 42757. Only the relative growth rate data was used. This dataset contains also species that were observed in the Poddar et al dataset but were not included in the CNDD study. This is the original raw data file as obtained from TRY without any modification by the authors of this dataset. We have tried to provide an explanation of variable names in the TRY_Variable_Names.csv file, however, the meaning of some them was unclear to us (such columns were not used in this analysis). Further questions about this file should be directed to the administrators of TRY.
Variables
- See TRY_Variable_Names.csv file
File: RecruitmentTraits.csv
Description: Data on recruitment traits (seed production and seed germination rates) of each species, taken from primary and secondary literature.
Variables
- Scientific.name: Scientific name
- GrowthForm: Growth form
- SeedsPerPlant: average number of seeds produced per year per individual plant of given species. Blanks = data not available
- SeedsPerSqM: average number of seeds of the given species produced per year per m2 of ground cover of the given species. Blanks = data not available
- Source_SeedProd: Data source for seed production data
- Notes_SeedProd: Further details on how seed production data was obtained
- SeedsPerSqCm: Number of seeds produced per cm2 of ground cover, calculated from the SeedsPerSqM or SeedsPerPlant columns
- Notes_SeedsPerSqCm: Further details on how SeedsPerSqCm was calculated
- GerminationRate: Average seed germination rate (%) of the given species. Blanks = data not found
- Source_GerminationRate: Data source for seed germination rate
- Notes_Germination: Further details on how seed germination data was obtained
- GerminationPerSqM_lit: Number of seedlings of the given species germinating per m2 of ground area sampled, in seed bank studies. Blanks = data not available
- Source_GerminationPerSqM: Data source for the data in the GerminationPerSqM_lit column
- Notes_GerminationPerSqM: Further details on how the data in the GerminationPerSqM_lit column was obtained
- RecruitsPerSqCm: Expected number of recruits of the given species produced per year per cm2 of ground cover, calculated either by multiplying the SeedsPerSqCm and the GerminationRate, or by converting the values in the GerminationPerSqM_lit to /cm2. Used as the value of the b0 parameter in the null model simulations (see equation 1 of methods)
File: GrowthLifeSpan_Params.csv
Description: Life span, growth rate and individual size parameters used in the null model simulations. These values were assigned by the authors and represent what the authors consider to be reasonable and realistic values for these parameters for each growth form. Unique values were assigned to each growth form rather than for each species due to lack of species-level data.
Variables
- GrowthForm: Growth form
- Annual.Perennial: Annual (a) or perrenial (p)
- Lifespan_yrs: Lifespan in years. Used as the values of the L parameter in equation 5 (see methods)
- Age_ReproMature_yrs: Age at which an individual becomes reproductively mature, in years.
- d_max_cm: Maximum attainable canopy diameter in cm
- a_max_cm2: Maximum attainable canopy area/ground footprint of an individual plant in cm2. Calculated using the d_max_cm column assuming circular canopy. Used as the values of the ainfinity parameter in equation 3 (see methods)
- TimeTo0.9: Time taken for an individual to growth to 90% of its maximum canopy area, in days.
- a_init: value of the ainit parameter in equation 4 (see methods), calculated using the code in 0.4AreaGrowthParams_Calc.R
Formatted data files
File: Sp_Occurrences_per_Site.csv
Description: Number of occurrences of each species in each site, for the populations that were included in this study. Created using the data in CompDat_Plot_long.csv and CompDat_Transect_long.csv, and the code in 0.1Sp_Occurrences.R. Each row corresponds to a population of a given species in a given site
Variables
- Transect: Transect ID (site ID)
- Species: species scientific name
- Origin: whether the species is native (N) or introduced (I)
- GrowthForm: Growth form of the given species
- Annual.Perennial: whether the species is annual (a) or perennial (p)
- NumSegments: Number of transect segments in which the species was observed, in the given site
- NumSubPlots: Number of subplots in which the given species was observed, in the given site
- TransectCov: Cover of the given species in the given site, per unit ground area, measured through transect sampling (maximum value = 1)
- PlotCov: Relative cover of the given species in given site, per unit ground area, measured through plot sampling (maximum value = 1)
File: Traits_SelectSpecies.csv
Description: Traits of species included in this study. Created using the code in 0.2Trait_Formatting.R file and the data from the following files: DispersalTraits_TRY_Raw.csv, DispersalTraits_other.csv, RecruitmentTraits.csv, SizeGrowthTraits_TRY_Raw.csv, Traits_Poddar.csv. Each row corresponds to a separate species. Blanks = data not available
Variables
- Columns "Scientific.name" to "Growth_Rate" - same meaning as Traits_Poddar.csv file.
- RGR: Relative growth rate (g/g/day)
- Columns "DispSyndrome" to "VertebrateDisp_SubCategory" - same meaning as DispersalTraits_other.csv file
- RecruitsPerSqCm: Expected number of recruits of the given species produced per year per cm2 of ground cover, taken from the RecruitmentTraits.csv. Used as the value of the b0 parameter in the null model simulations (see equation 1 of methods)
File: DispersalKernelParams_final.csv
Description: Dispersal kernel parameter values used in the null model simulations. Created using the data in DispersalDistance_Vittoz&Engler.csv and the DispersalKernel_Bullock.csv, and the code in 0.3DispersalParams_Assignment.R file. Variable names same as in the DispersalKernel_Bullock.csv file. Each row corresponds to a seperate species.
File: Mort_DensityDep_Params.csv
Description: Values of the density dependence parameter k (equation 1 in methods) and the mortality parameter m1 (equation 5 in methods) used in the null model simulations. Values obtained from parameter sweeps matching simulated cover to observed cover, using the code in 0.5Mort&GermParam_Sweep_Sim.R and 0.6Mort&GermParam_Analysis.R files. Separate parameter values for each species x site combination
Variables
- Species: Species name
- Transect: Transect (site) ID
- k: Density dependence parameter
- m1: Mortality parameter
Output files
File: ParamsTrial1.csv
Description: Results of the first set of parameter sweeps for estimating the density dependence parameter k (equation 1 in methods) and the mortality parameter m1 (equation 5 in methods). Similarly, ParamsTrial2.csv contains the results of the second sweep, and ParamsTrial3.csv contains the results of the third sweep. Each row represent the results of a single simulation run simulating the population of a given species in a given site (transect). Generated using the 0.5Mort&GermParam_Sweep_Sim.R code file.
Variables
- Species: Species name
- Transect: Transect name
- k: Value of density dependence parameter k
- m1: Value of density dependence parameter m1
- MeanSimCov: Total cover of the simulated population in the simulated landscape, averaged over the last 500 timesteps of the simulation. Maximum possible value = 1. NA - mean simulated cover could not be calculated (see StopReason column)
- ExpecCov: Observed cover of the given population in the empirical data (mean of transect cover and plot cover). Maximum possible value = 1
- StopReason: Each simulation run was supposed to run for 2000 timesteps; however, in some cases, a run was terminated early due to simulated population size or cover being too high. In that case, the termination reason was recorded in this column, and the MeanSimCov column had an NA value. CovHigh - cover of simulated population is much higher than observed cover, PopHigh - population size is higher than maximum limit (a maximum limit of 105 individuals had to be set in the code to prevent computational overload), NA - the simulation run successfully ran for 2000 timesteps.
File: SimulatedTransectDat.csv
Description: Simulated data generated using the null model simulations. Each row contains the simulated cover of a single species in the a single transect segment. Generated using the 1.1NullModelSims.R code file.
Variables
- Rep: Unique idenifier for each replicate simulation for each species x site combination
- BlankReason: If the Cover column contains NA (missing value), the reason for that is recorded in this column. ZeroTrCov - the simulated population has zero transect cover (the population may be present in other parts of the landscape, but was not detected in the part of the landscape where the virtual transect was placed), PopHigh - population size reached higher than the maximum limit and the simulation run had to be terminated
- All other columns - same as CompDat_Transect_long.csv file
File: MoranI_Observed.csv
Description: Moran's I values for each observed population (site x species combination). Calculated using the 2.1Moran_I_Calc.R file.
Variables
- Transect: Transect ID (site ID)
- Species: Species name
- TransectCov: total cover of the given species in the given transect (maximum value = 1)
- Moran_I: Moran's I value
- P_Value: P value from standard randomization tests
File: MoranI_Simulated.csv
Description: Moran's I values for each simulated population. Calculated using the 2.1Moran_I_Calc.R file. Variable names same as MoranI_Observed.csv file (variable 'Rep' same as in the SimulatedDat.csv file)
File: CNDD.csv
Description: Observed Moran's I and mean Moran's I values of corresponding simulated populations. Observed values are the same as in MoranI_Observed.csv, while values from the MoranI_Simulated.csv file are averaged over each site x transect combination. Generated using the 2.2CNDD_Analysis.R code file.
Variables
- Transect: Transect (site) ID
- Species: Species name
- Moran_I: Moran's I value (same data as Moran_I column in MoranI_Observed.csv)
- P_Value_stdtest: P value from standard randomization test (same as P_Value column in MoranI_Observed.csv file)
- MeanNullI: Mean of Moran's I values of simulated populations for the given species x site combination
- SDNullI: Standard deviation of Moran's I values of simulated populations for the given species x site combination
- NullSampleSize: Number of simulated Moran's values available for the given species x site combination (usually 1000, but may be lower if the simulated population's spatial distribution was could not be quantified; see BlankReason column of SimulatedDat.csv file)
- P_Value_displimtest: P-value generated by comparing observed Moran's I value to the distribution of simulated values for the given species x site combination
- SES: Standardized effect size, i.e. observed Moran's I standardized by mean of the null distribution (distribution of simulated values for the given species x site combination). A measure of the magnitude of CNDD (more negative values - stronger CNDD)
- NullGreater: Whether the mean simulated Moran's I value is greater than the observed value (which would indicate CNDD)
File: HNDD.csv
Description: HNDD of each observed population, quantified as the correlation between the cover of the given species in a given transect segment, and the cover of all other species. Generated using the 3.1HNDD_Analysis.R code file.
Variables
- Transect: Transect (site) ID
- Species: Species name
- Origin: Species origin (native - N, introduced - I)
- Corr: Pearson's correlation coefficient, measuring the correlation between the cover of the given species in each transect segment, and the cover of all other species in the same segment. Higher negative values indicate stronger HNDD.
- MeanPermCorr: Mean value of the correlation coefficient from the circular shift permuation test (see Statistical Analysis section of methods)
- Perm_P_value: P-value testing whether the observed correlation coefficient significantly differs from zero, generated using the circular shift permutation test (see Statistical Analysis section of methods)
Code Files
File: 0.1Sp_Occurrences.R
Description: Code for counting the number of occurrences of each species in each site (transect), and for determining which species are suitable for inclusion in the CNDD study (see Study Sites and Vegetation Data section of the methods).
File: 0.2Trait_Formatting.R
Description: Code for formatting the raw trait data (e.g. dispersal syndrome, seed germination rates etc.) obtained from TRY, Poddar et al. and from primary literature.
File: 0.3DispersalParams_Assignment.R
Description: Code for assigning dispersal kernel parameter values to each species based on their dispersal syndrome and other traits.
File: 0.4AreaGrowthParams_Calc.R
Description: Code for estimating the ainit parameter in equation 4 (see methods), using on the expected maximum size and expected time to 90% size for each lifeform. This parameter is needed for calculating individual growth rate r.
File: 0.5Mort_GermParam_Sweep_Sim.R
Description: Code for running the parameter sweeps for estimating the density dependence parameter k (equation 1 in methods) and the mortality parameter m1 (equation 5 in methods). This code simulates the dispersal limited null model with different values of k and m1. Three sets of parameter sweeps were run by re-running this code three times and changing the 'trial_num' variable each time. The parameter values tried out in the first sweep are generated in-situ, while those tried out in subsequent sweeps are generated using 0.6Mort_GermParam_Analysis.R code file. Please note: the authors ran this code on a high performance computing cluster. The code, in its present form, requires 96 cores for parallel computing. Attempting to run it as is on a personal computer/laptop will likely produce an error, as most personal computers do not have 96 cores. Users can reduce the number of cores using the 'ncores' parameter in the 'setting up cluster & loop' code section, however, this will increase computational time.
File: 0.6Mort_GermParam_Analysis.R
Description: Code for analyzing the results of parameter sweeps and identifying the optimal values of k and m1 (optimal values are those at which simulated cover matches observed cover). If optimal values for a given population are not found in a given sweep, this code generates a list further values to try out, which was then input into the 0.5Mort_GermParam_Sweep_Sim.R file. In this way, three sets of parameter sweeps were run.
File: 1.1NullModelSims.R
Description: Code for the null model simulations, which was used to simulate the spatial distribution of each study population in the absence of CNDD, as described in the Overview of Null Model Simulations and Simulation Processes and Parameters sections of the methods. Please note: the authors ran this code on a high performance computing cluster. The code, in its present form, requires 96 cores for parallel computing. Attempting to run it as is on a personal computer/laptop will likely produce an error, as most personal computers do not have 96 cores. Users can reduce the number of cores using the 'ncores' parameter in the 'setting up cluster & loop' code section, however, this will increase computational time.
File: 2.1Moran_I_Calc.R
Description: Code for calculating Moran's I on the observed and simulated data. It also tests whether observed Moran's I values significantly differ from zero, using standard permutation tests (which assume a null model of complete spatial randomness). (See Statistical Analysis section of methods)
File: 2.2CNDD_Analysis.R
Description: Code for comparing the Moran's I values of observed and simulated populations, and for quantifying the magnitude of CNDD (observed Moran's I minus mean simulated Moran's I, divided by the latter). It also tests whether CNDD significantly differs by species origin (native/introduced) and whether it varies with soil nutrient concentrations or species traits. (See Statistical Analysis section of methods)
File: 3.1HNDD_Analysis.R
Description: Code for quantifying HNDD, testing whether CNDD and HNDD are correlated across species, and whether HNDD significantly differs by species origin (native/introduced). (See Statistical Analysis section of methods)
Other files
File: TRY_Variable_Names.csv (supplemental file - Zenodo)
Explanation of variables in the DispersalTraits_TRY_Raw.csv and SizeGrowthTraits_TRY_Raw.csv
Code/software
All code was run on R version 4.2.0. A high performance computing cluster was used for running some of the code files, which is indicated under the individual file descriptions.
Access information
Other publicly accessible locations of the data:
- NA
Data was partially derived from the following sources:
- Poddar, Urmi; Dong, Tracey; Lam, Kristi et al. (2026). Community assembly explains invasion differences between two contrasting forest types [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqk3k
- Poddar, U., Dong, T., Lam, K., Lee, V., Wilson, P., Gurevitch, J., & D’Andrea, R. (2026). Community assembly explains invasion differences between two contrasting forest types. bioRxiv, 2026.03.05.709929. https://doi.org/10.64898/2026.03.05.709929
Study Sites and Vegetation Data
The empirical data used here were collected by Poddar et al. (2026), who sampled 16 forest sites on Long Island (Suffolk County), NY, USA in 2021-22. In each site, a 30 m long transect, divided into 10 cm segments, was laid out, and the cover of all understory vascular plant species (height < 1m) was estimated in each segment using the line-intercept method. Additionally, five 1 m x 1 m plots were placed within 5 m of the transect, and the cover of all vascular plant species rooted in each plot was visually estimated. This dataset provides spatially-explicit cover data, but not counts or locations of individuals.
We used the spatial distribution of each species in each transect to quantify CNDD. Additionally, we combined transect and plot data to calculate total cover of each species at each site, which informed parameter selection for the null model simulations (see "Simulation Processes and Parameters" section). Only non-tree species and species present in at least 10 transect segments were included, yielding an initial dataset of 22 species and 57 species × site combinations. Two species were subsequently excluded due to difficulties in estimating demographic parameters for the null model simulations (see "Simulation Processes and Parameters" section). Therefore, the final dataset consisted of 20 species, with 14 being native and 6 introduced, and 50 species × site combinations (henceforth called ‘populations’).
Overview of Null Model Simulations
The observed within-transect spatial distribution of study species were compared with those generated by a dispersal-limited null model. For this, we simulated the dynamics of each study species in each site in the absence of CNDD, using individual-based, continuous-space simulations. These simulations included recruitment and dispersal limitation, size-dependent mortality, and global density dependence. The last of these refers to a decrease in recruitment with increasing total biomass in the landscape. Note that this differs from CNDD, which operates at the neighborhood scale and decreases with distance from neighbors, while global density dependence is a function of the total biomass of a given species in the entire landscape, and is not dependent on the spatial distribution of individuals. The functional forms of these processes are described in the "Simulation Processes and Parameters" section. Parameters were taken from empirical literature where available; otherwise, realistic estimated values were assigned by the authors or chosen to match simulated cover with observed cover, as detailed in the "Simulation Processes and Parameters" section.
Separate simulations were run for each species in each site; i.e., only a single population was simulated at a time. The simulated landscape was uniform and 30 m x 10 m in size, with toroidal boundary conditions to avoid edge effects. Individual plants were modelled as circles of finite area and fixed centroid positions, representing plant canopies. The canopies of two or more individuals may overlap with each other. The centroid position, area, and age of each individual was tracked over time.
Each simulation run began with a small number of individuals being randomly distributed across the landscape, with initial population size drawn from a Poisson distribution (λ = 5). At each timestep, the following processes then occurred in sequence: 1) seed production, seed dispersal and birth of new individuals, 2) growth of individuals (increase in size), 3) mortality . Each timestep represents one calendar year, but was assumed to consist of 200 growing days, given the deciduous nature of the study species. The simulations were run for 2000 timesteps, after which transect sampling was conducted on the simulated landscape. A total of 1,000 replicate runs were performed for each species × site combination.
Simulation Processes and Parameters
Recruitment rate and global density dependence
Recruitment rate or birth rate, defined as the expected number of individuals added in a given timestep per unit area occupied by reproductively mature individuals, was modeled as an exponentially declining function of total population area (biomass):
Where is the recruitment rate at time
,
is the maximum recruitment rate for the given species,
is the total area covered by all individuals at the end of timestep
, and
is the density dependence parameter. Equation 1 thus represents global density dependence. Species-specific
values were derived from the literature, by multiplying estimates of seed production per unit ground cover with estimates of seed germination rates. When such data were unavailable, soil seed bank studies were used instead, and the reported number of seeds germinating per unit area of soil sampled was taken as
.
Values of were chosen through parameter sweeps so that simulated population cover after 2000 timesteps was within 5% of observed cover, Separate values were assigned to each population. Initial sweeps explored
values from 1 to 250 at a resolution of 25. When optimal values were not identified, up to two additional sweeps were conducted by expanding the range or increasing the resolution based on prior results. The range of
values explored ultimately extended to 2000 for some species. Two species were excluded because their maximum recruitment rates were either too high or too low for simulated cover to approximate observed values at any value of
. Here simulated cover refers to the proportion of the landscape occupied by the population, without double-counting overlapping canopies. Note that this differs from
, which was calculated by summing the area of all individuals without accounting for overlap. This distinction reflects empirical cover estimates, which do not double-count overlaps but may underestimate biomass. Accordingly, total area, which represents total biomass, was used to model density dependence, whereas total cover was used for parameter fitting. Observed cover was calculated as the mean of total transect cover and total plot cover of each population, divided by the total area sampled in transect and plot sampling, respectively.
The number of recruits produced by each reproductively mature individual at each timestep was drawn from Poisson distributions with mean equal to multiplied by the individual's area. Individuals were considered reproductively mature once they reached the age of reproductive maturity. These ages were assigned based on the species’s lifeform (biennial herb = 1 year, perennial herb = 2 years, shrubs and lianas = 5 years). Lifeform data was taken from Poddar et al. (2026) , who obtained it from the PLANTS Database (USDA-NCRS, 2023).
Dispersal kernel
The distance between each recruit and its parent was drawn from an exponential power dispersal kernel:
Where is the dispersal distance,
is the scale parameter,
is the shape parameter, and
is the Gamma function.This distribution was selected based on the findings of Bullock et al. (2017), who synthesized empirical seed-dispersal data and identified it as the best fit distribution in most cases. They also provided parameter estimates based on dispersal syndrome and life form, with further subdivisions by height, seed mass, or subcategories of dispersal vectors. Another study, Vittoz & Engler (2007), estimated the median and 99th percentile dispersal distances for each dispersal syndrome through literature review. We used these two studies for assigning dispersal parameter values based on each species’ dispersal syndrome, lifeform, height, and seed mass. Dispersal syndrome data was obtained from the TRY Plant Trait Database (Kattge et al., 2020), the Seed Information Database (SER, INSR & RBGK, 2023), and from primary and secondary literature. Seed mass data was obtained from TRY, and height was taken from Poddar et al. (2026), who synthesized data from TRY and the PLANTS Database.
Additionally, the angle between each recruit and its parent was drawn from a uniform distribution over radians. This, combined with the distance from the parent, was then used to determine the absolute position of each recruit on the landscape. Note that the landscape was toroidal, therefore recruits dispersing beyond the landscape’s boundaries reappeared on the opposite side.
Growth of individuals
Newly-born individuals initially occupied an area of 1 cm2. Individual growth was modeled using the Gompertz function, which is commonly used to model plant biomass growth (Paine et al., 2012).
Where and
are the size (area) of the individual at year
and
, respectively,
is the maximum attainable size,
is the growth rate, and
is the number of growing days between the timesteps. For newly-born individuals,
was sampled uniformly from 1-200 days, representing variation in date of birth, for all others,
days. Maximum size
was assigned based on lifeform (10 cm diameter for herbs, 50 cm for shrubs, 100 cm for lianas). Growth rates (
) were derived from empirical relative growth rate (RGR) data from TRY (Kattge et al., 2020). We initially attempted to use species-specific RGR values, however, this could not be done due to low data availability. Instead, a single value was used by averaging available data across species. The relationship between RGR and
is given by (Paine et al., 2012):
Where is the initial size of individuals in the original studies. Since
values were not reported, we fit these values, using equations 3-4, such that individuals reached 90% of
in 250 days (1.25 years) for biennial herbs, 1000 days (5 years) for perennial herbs, and 2000 days (10 years) for shrubs and lianas. Thus there were separate
values for each lifeform.
Mortality
Individuals were chosen to die using Bernoulli trials with probability of death equal to their mortality rates, which was modelled as a step function of size:
(5)
Where is the mortality rate as a function of individual size
,
is the expected size at 1 year (200 days) of age,
is the mortality rate of individuals of size less than
,
is the expected lifespan of individuals who survive for at least 1 year. Values of
were assigned based on lifeform (2 years for biennial herbs, 10 years for perennial herbs, 50 years for shrubs and lianas). Values of
assigned through parameter sweeps, concurrently with the parameter sweeps for
( see Recruitment rate and global density dependence section), with separate values for each population. Initial sweeps explored
values from 0.69 to 0.99, at a resolution of 0.05, with subsequent rounds extending the range from 0.01 to 0.999 for some populations.
Statistical analysis
We calculated Moran’s I, a measure of spatial autocorrelation, to quantify spatial patterns in the observed and simulated populations. Moran’s I ranges from -1 to 1. Positive Moran’s I indicates spatial clustering, which is expected under dispersal limitation, while negative values indicate overdispersion, as expected under CNDD alone. We first calculated this metric for observed populations and assessed statistical significance using standard randomization tests, which assume a null model of complete spatial randomness. This allowed us to evaluate whether CNDD could be detected without accounting for dispersal limitation. The ‘moran.test’ function in the R package ‘spdep’ (Baddeley & Turner, 2005) was used. We then conducted a second set of tests using the dispersal-limited null model, in which observed Moran’s I values were compared to the distribution of values from the corresponding simulated populations, for each species × site combination. Furthermore, we quantified the magnitude of CNDD by calculating the proportional deviation of each observed value from the mean of the dispersal-limited null, that is, the difference between each observed value and the mean of the corresponding simulated distribution, divided by the latter. Negative values of this metric would indicate that observed populations are more overdispersed than expected under dispersal limitation alone, consistent with CNDD.
We also quantified heterospecific negative density dependence (HNDD),by examining the spatial association between each species’ abundance and the abundance of all other species in the community. Analogous to CNDD, HNDD refers to the decline in a species’ performance with increasing density of heterospecifics. We calculated, for each focal population, the Pearson correlation between the cover of the focal species in each transect segment and the total cover of all other understory species in the same segment. All observed species were included in the latter, including those excluded from the CNDD analysis. Significance was assessed using circular shift permutation tests. This tests for spatial associations between two variables while preserving the internal spatial structure of each variable. It involves shifting the distribution of one of the two variables by a random distance along the transect, with values that move past one end reappearing at the other end, as though the transect were circular. Correlation coefficients were recalculated after each shift, and were then used to create the null distribution. One thousand permutations were used. The means of most null distributions were close to zero, therefore, we did not standardize the observed correlation coefficients by the means of the null distributions, as was done for CNDD. Instead, the observed correlation coefficients were directly used as a measure of the magnitude of HNDD. Note that our values of CNDD and HNDD cannot be directly compared to each other, due to being on different scales. However, comparisons of HNDD across species could be made.
To test whether native and introduced species significantly differed in the strength of CNDD, we fit mixed-effects linear models, with species origin (native/introduced) as the fixed effect and species identity as the random effect. We similarly tested whether the strength of HNDD significantly differed by species origin. Lastly, we tested whether CNDD and HNDD were correlated with each other across species, using Spearman’s correlation coefficient. This assesses whether our CNDD metric reflects density dependence driven broadly by non-specific neighbor density rather than conspecifics per se, which would be indicated by a significant positive correlation between CNDD and HNDD.
References
Bullock, J. M., Mallada González, L., Tamme, R., Götzenberger, L., White, S. M., Pärtel, M., & Hooftman, D. A. P. (2017). A synthesis of empirical plant dispersal kernels. Journal of Ecology, 105(1), 6–19. https://doi.org/10.1111/1365-2745.12666
Kattge, J., Bönisch, G., Díaz, S., Lavorel, S., Prentice, I. C., Leadley, P., Tautenhahn, S., Werner, G. D. A., Aakala, T., Abedi, M., Acosta, A. T. R., Adamidis, G. C., Adamson, K., Aiba, M., Albert, C. H., Alcántara, J. M., Alcázar C, C., Aleixo, I., Ali, H., … Wirth, C. (2020). TRY plant trait database – enhanced coverage and open access. Global Change Biology, 26(1), 119–188. https://doi.org/10.1111/GCB.14904
Paine, C. E. T., Marthews, T. R., Vogt, D. R., Purves, D., Rees, M., Hector, A., & Turnbull, L. A. (2012). How to fit nonlinear plant growth models and calculate growth rates: an update for ecologists. Methods in Ecology and Evolution, 3(2), 245–256. https://doi.org/10.1111/j.2041-210X.2011.00155.x
Poddar, U., Dong, T., Lam, K., Lee, V., Wilson, P., Gurevitch, J., & D’Andrea, R. (2026). Community assembly explains invasion differences between two contrasting forest types. bioRxiv, 2026.03.05.709929. https://doi.org/10.64898/2026.03.05.709929
Society for Ecological Restoration, International Network for Seed Based Restoration, & Royal Botanic Gardens Kew. (2023). Seed Information Database (SID). Https://Ser-Sid.Org/.
United States Department of Agriculture National Resources Conservation Service (USDA-NRCS). (2023). The PLANTS Database. https://plants.usda.gov/
Vittoz, P., & Engler, R. (2007). Seed dispersal distances: A typology based on dispersal modes and plant traits. Botanica Helvetica, 117(2), 109–124. https://doi.org/10.1007/s00035-007-0797-8
