Introduced plant species show weaker conspecific density dependence than native plant species

Published Mar 30, 2026 on Dryad. https://doi.org/10.5061/dryad.nk98sf87b

Abstract

Conspecific negative density dependence (CNDD) is necessary for plant coexistence, as it limits each species’ local density and dominance. The high densities attained by many introduced plant species suggest they experience weaker CNDD than native species, yet this has not been tested. This dataset contains fine-scale plant composition data from temperate forest understory communities on Long Island, NY, USA, which was used to contrast CNDD in native and introduced plant species. CNDD is expected to generate conspecific spatial repulsion (spatial overdispersion), but other processes such as dispersal limitation tend to generate clustered distributions of conspecifics. Therefore, we quantified observed spatial patterns in study populations to those seen in simulations of dispersal limitation. Data used to parametrize these simulations, along with the code used for simulations and data analysis, is also provided here.

Dataset DOI: 10.5061/dryad.nk98sf87b

Description of the data and file structure

This dataset contains raw data files (original community composition and trait data from Poddar et al. (2026), and raw trait data obtained from databases and prior literature), formatted data files (data that has been formatted, summarized or otherwise prepared for analysis, using the code files provided), output files (output from the null model simulations, parameter sweeps or final analyses, generated using the provided code files), and code files.

Files and variables

Raw data files

File: CompDat_Transect_long.csv

Description: Community composition data collected through transect sampling by Poddar et al. Each transect was 30 m long and was divided into 10 cm long segments. The cover of each plant species intersecting with each transect segment was visually estimated. Each row in this file represents an observation of a species in one segment of each transect. There may be multiple rows for the same segment of the same transect, if multiple species were observed in that segment.

Variables

Transect: Unique identifier for each transect (only one transect was laid out in each study site, therefore, transect identifier is the same as site identifier)
SegmentStart: Distance between the starting point of the transect and the starting point of the given transect segment, in cm.
SegmentEnd: Distance between the starting point of the transect, and the end point of the given transect segment, in cm.
Species: Scientific name of the species observed. BARE = bare ground (no plant species present in that transect segment)
Cover: Cover of the species observed (maximum value = 10)

File: CompDat_Plot_long.csv

Description: Community composition data collected through plot sampling by Poddar et al. Along each transect, five 1 m x 1 m plots were laid out. Each plot was further subdivided into 20 cm x 20 cm subplots, and the cover of each plant species rooted in each subplot was visually estimated. Each row in this file represents an observation of a species in one subplot of a particular plot. There may be multiple rows for the same subplot of the same plot, if multiple species were observed in that subplot.

Variables

Transect: Unique identifier for each transect/study site
Plot: Unique identifier for each plot in each transect (ranges from 1-5)
Subplot: Unique identifier for each subplot of each plot (ranges from A-Y)
Species: Scientific name of the species observed. BARE = bare ground (no plant species present in that subplot)
PercentCover: Percentage cover of the species observed (maximum value = 100).

File: Traits_Poddar.csv

Description: Trait data compiled by Poddar et al., who gathered it from the TRY trait database (https://www.try-db.org) and the USDA Plants database (https://plants.usda.gov/). Each row of this file is a separate species. This file contains all species observed in the Poddar et al dataset, including ones that were excluded from in the CNDD study. Blanks indicate missing data.

Variables

Scientific.name: Scientific name of the species
Common.name: Common name of the species
Origin: Whether the species is native (N) or introduced (I). unk - unkown
Woody: Whether the species produces woody tissues (0 - herbaceous/not woody, 2 - wood)
Annual.Perennial: Whether the species is annual (a) or perennial (p)
Funct_Group_orig: Functional group (growth from) of each species, as listed on the USDA Plants database
Funct_Group_simpl: Functional group (growth form) simplified into a smaller number of categories (shrubs, trees, lianas, herbs).
Growth_Rate: Growth rate of each species (categorical)
Leaf_C.N: Leaf carbon to nitrogen ratio of each species, in g/g
Height: Expected height at maturity in m
SLA: Specific leaf area in mm/mg

File: Soil.csv

Description: Soil data for the study sites sampled by Poddar et al. See original data publication for details: Poddar, Urmi; Dong, Tracey; Lam, Kristi et al. (2026). Community assembly explains invasion differences between two contrasting forest types [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqk3k

File: DispersalTraits_TRY_Raw.csv (supplemental file - Zenodo)

Description: Data on seed dry mass and dispersal syndrome/dispersal agent of each species downloaded from the TRY trait database (https://www.try-db.org) using the request ID 42311. This dataset contains all species that were observed in the Poddar et al dataset, including ones that were excluded from the the CNDD study. This is the original raw data file as obtained from TRY without any modification by the authors of this dataset. We have tried to provide an explanation of variable names in the TRY_Variable_Names.csv file, however, the meaning of some them was unclear to us (such columns were not used in this analysis). Further questions about this file should be directed to the administrators of TRY.

Variables

See TRY_Variable_Names.csv file

File: DispersalTraits_other.csv

Description: Data on dispersal syndrome (dispersal agent) of each species obtained from the Seed Information Database (SID; https://ser-sid.org/) or from primary and secondary literature and sources other than TRY. This dataset contains all species that were observed in the Poddar et al dataset, including ones that were excluded from the CNDD study. Each row corresponds to a different species.

Variables

Scientific.name: Scientific name of the species
DispSyndrome_SID: Dispersal syndrome of the given species, as reported in the seed information database (blanks = data not available)
DispSyndrome_Other: Dispersal syndrome of the given species from other data sources (blanks = no data found)
Source_DispSyndrome_Other: Data source for the data in the DispSyndrome_Other column
Wind_Adaptations: Morphological adaptations for wind-dispersal (e.g. light seeds, winged seeds), for wind dispersed species only
Source_Wind_Adaptations: Data source for the data in the Wind_Adaptations column
VertebrateDisp_SubCategory: Subcategory of dispersal vectors for vertebrate-dispersed seeds (e.g. birds, rodents)
Source_Vetebrate_SubCategory: Data source for the data in the VertebrateDisp_SubCategory column

File: DispersalDistance_Vittoz_Engler.csv

Description: Expected dispersal distances for plants with different dispersal syndromes, as reported by Vittoz & Engler (2007). The authors of the original study reviewed primary literature on seed dispersal, grouped species by dispersal syndrome (sometimes sub-grouped by growth form or height), and provided expected dispersal distances for each category. Each row in this file corresponds to a different dispersal syndrome category in the original study.

Variables

GrowthForm: Growth form, if specified. Blanks - applies to all growth forms
DispSyndrome: Dispersal syndrome
Wind_Adapt: Morphological adaptations for wind dispersal (for wind-dispersed species only)
Vertebrate_Subcat: Subcategories of dispersal vectors for vertebrate-dispersed seeds
Height_min: Minimum height, in meters, if specified. Blanks - applies to plants of all heights
Height_max: Maximum height, in meters, if specified. Blanks - applies to plants of all heights
Distance_50Quantile: 50th quantile of the dispersal kernel (median dispersal distance), in meters
Distance_99Quantile: 99th quantile of the dispersal kernel (maximum dispersal distance), in meters

File: DispersalKernel_Bullock.csv

Description: Expected dispersal distances for plants with different dispersal syndromes, as reported by Bullock et al (2007). The authors of the original study reviewed primary literature on seed dispersal, grouped species by dispersal syndrome (sometimes sub-grouped by height, seed mass etc.), and estimated the dispersal kernel parameters for each category. Each row in this file corresponds to a different dispersal syndrome group in the original study.

Variables

SeedMassMin: Minimum seed mass, in mg, if specified. Blanks - applies to seeds of any mass
SeedMassMax: Maximum seed mass, in mg, if specified. Blanks - applies to seeds of any mass
a: alpha (scale) parameter of the dispersal kernel, assuming a exponential power distributed dispersal kernel, in meters
b: beta (shape) parameter of the dispersal kernel, assuming a exponential power distributed dispersal kernel (unitless)
All other variables - same as in the DispersalDistance_Vittoz_Engler.csv

File: SizeGrowthTraits_TRY_Raw.csv (supplemental file - Zenodo)

Description: Data on relative growth rate and canopy size/canopy area of each species, downloaded from the TRY trait database (https://www.try-db.org) using the request ID 42757. Only the relative growth rate data was used. This dataset contains also species that were observed in the Poddar et al dataset but were not included in the CNDD study. This is the original raw data file as obtained from TRY without any modification by the authors of this dataset. We have tried to provide an explanation of variable names in the TRY_Variable_Names.csv file, however, the meaning of some them was unclear to us (such columns were not used in this analysis). Further questions about this file should be directed to the administrators of TRY.

Variables

See TRY_Variable_Names.csv file

File: RecruitmentTraits.csv

Description: Data on recruitment traits (seed production and seed germination rates) of each species, taken from primary and secondary literature.

Variables

Scientific.name: Scientific name
GrowthForm: Growth form
SeedsPerPlant: average number of seeds produced per year per individual plant of given species. Blanks = data not available
SeedsPerSqM: average number of seeds of the given species produced per year per m² of ground cover of the given species. Blanks = data not available
Source_SeedProd: Data source for seed production data
Notes_SeedProd: Further details on how seed production data was obtained
SeedsPerSqCm: Number of seeds produced per cm² of ground cover, calculated from the SeedsPerSqM or SeedsPerPlant columns
Notes_SeedsPerSqCm: Further details on how SeedsPerSqCm was calculated
GerminationRate: Average seed germination rate (%) of the given species. Blanks = data not found
Source_GerminationRate: Data source for seed germination rate
Notes_Germination: Further details on how seed germination data was obtained
GerminationPerSqM_lit: Number of seedlings of the given species germinating per m² of ground area sampled, in seed bank studies. Blanks = data not available
Source_GerminationPerSqM: Data source for the data in the GerminationPerSqM_lit column
Notes_GerminationPerSqM: Further details on how the data in the GerminationPerSqM_lit column was obtained
RecruitsPerSqCm: Expected number of recruits of the given species produced per year per cm² of ground cover, calculated either by multiplying the SeedsPerSqCm and the GerminationRate, or by converting the values in the GerminationPerSqM_lit to /cm². Used as the value of the b₀ parameter in the null model simulations (see equation 1 of methods)

File: GrowthLifeSpan_Params.csv

Description: Life span, growth rate and individual size parameters used in the null model simulations. These values were assigned by the authors and represent what the authors consider to be reasonable and realistic values for these parameters for each growth form. Unique values were assigned to each growth form rather than for each species due to lack of species-level data.

Variables

GrowthForm: Growth form
Annual.Perennial: Annual (a) or perrenial (p)
Lifespan_yrs: Lifespan in years. Used as the values of the L parameter in equation 5 (see methods)
Age_ReproMature_yrs: Age at which an individual becomes reproductively mature, in years.
d_max_cm: Maximum attainable canopy diameter in cm
a_max_cm2: Maximum attainable canopy area/ground footprint of an individual plant in cm². Calculated using the d_max_cm column assuming circular canopy. Used as the values of the a_infinity parameter in equation 3 (see methods)
TimeTo0.9: Time taken for an individual to growth to 90% of its maximum canopy area, in days.
a_init: value of the a_init parameter in equation 4 (see methods), calculated using the code in 0.4AreaGrowthParams_Calc.R

Formatted data files

File: Sp_Occurrences_per_Site.csv

Description: Number of occurrences of each species in each site, for the populations that were included in this study. Created using the data in CompDat_Plot_long.csv and CompDat_Transect_long.csv, and the code in 0.1Sp_Occurrences.R. Each row corresponds to a population of a given species in a given site

Variables

Transect: Transect ID (site ID)
Species: species scientific name
Origin: whether the species is native (N) or introduced (I)
GrowthForm: Growth form of the given species
Annual.Perennial: whether the species is annual (a) or perennial (p)
NumSegments: Number of transect segments in which the species was observed, in the given site
NumSubPlots: Number of subplots in which the given species was observed, in the given site
TransectCov: Cover of the given species in the given site, per unit ground area, measured through transect sampling (maximum value = 1)
PlotCov: Relative cover of the given species in given site, per unit ground area, measured through plot sampling (maximum value = 1)

File: Traits_SelectSpecies.csv

Description: Traits of species included in this study. Created using the code in 0.2Trait_Formatting.R file and the data from the following files: DispersalTraits_TRY_Raw.csv, DispersalTraits_other.csv, RecruitmentTraits.csv, SizeGrowthTraits_TRY_Raw.csv, Traits_Poddar.csv. Each row corresponds to a separate species. Blanks = data not available

Variables

Columns "Scientific.name" to "Growth_Rate" - same meaning as Traits_Poddar.csv file.
RGR: Relative growth rate (g/g/day)
Columns "DispSyndrome" to "VertebrateDisp_SubCategory" - same meaning as DispersalTraits_other.csv file
RecruitsPerSqCm: Expected number of recruits of the given species produced per year per cm² of ground cover, taken from the RecruitmentTraits.csv. Used as the value of the b₀ parameter in the null model simulations (see equation 1 of methods)

File: DispersalKernelParams_final.csv

Description: Dispersal kernel parameter values used in the null model simulations. Created using the data in DispersalDistance_Vittoz&Engler.csv and the DispersalKernel_Bullock.csv, and the code in 0.3DispersalParams_Assignment.R file. Variable names same as in the DispersalKernel_Bullock.csv file. Each row corresponds to a seperate species.

File: Mort_DensityDep_Params.csv

Description: Values of the density dependence parameter k (equation 1 in methods) and the mortality parameter m₁ (equation 5 in methods) used in the null model simulations. Values obtained from parameter sweeps matching simulated cover to observed cover, using the code in 0.5Mort&GermParam_Sweep_Sim.R and 0.6Mort&GermParam_Analysis.R files. Separate parameter values for each species x site combination

Variables

Species: Species name
Transect: Transect (site) ID
k: Density dependence parameter
m1: Mortality parameter

Output files

File: ParamsTrial1.csv

Description: Results of the first set of parameter sweeps for estimating the density dependence parameter k (equation 1 in methods) and the mortality parameter m₁ (equation 5 in methods). Similarly, ParamsTrial2.csv contains the results of the second sweep, and ParamsTrial3.csv contains the results of the third sweep. Each row represent the results of a single simulation run simulating the population of a given species in a given site (transect). Generated using the 0.5Mort&GermParam_Sweep_Sim.R code file.

Variables

Species: Species name
Transect: Transect name
k: Value of density dependence parameter k
m1: Value of density dependence parameter m₁
MeanSimCov: Total cover of the simulated population in the simulated landscape, averaged over the last 500 timesteps of the simulation. Maximum possible value = 1. NA - mean simulated cover could not be calculated (see StopReason column)
ExpecCov: Observed cover of the given population in the empirical data (mean of transect cover and plot cover). Maximum possible value = 1
StopReason: Each simulation run was supposed to run for 2000 timesteps; however, in some cases, a run was terminated early due to simulated population size or cover being too high. In that case, the termination reason was recorded in this column, and the MeanSimCov column had an NA value. CovHigh - cover of simulated population is much higher than observed cover, PopHigh - population size is higher than maximum limit (a maximum limit of 10⁵ individuals had to be set in the code to prevent computational overload), NA - the simulation run successfully ran for 2000 timesteps.

File: SimulatedTransectDat.csv

Description: Simulated data generated using the null model simulations. Each row contains the simulated cover of a single species in the a single transect segment. Generated using the 1.1NullModelSims.R code file.

Variables

Rep: Unique idenifier for each replicate simulation for each species x site combination
BlankReason: If the Cover column contains NA (missing value), the reason for that is recorded in this column. ZeroTrCov - the simulated population has zero transect cover (the population may be present in other parts of the landscape, but was not detected in the part of the landscape where the virtual transect was placed), PopHigh - population size reached higher than the maximum limit and the simulation run had to be terminated
All other columns - same as CompDat_Transect_long.csv file

File: MoranI_Observed.csv

Description: Moran's I values for each observed population (site x species combination). Calculated using the 2.1Moran_I_Calc.R file.

Variables

Transect: Transect ID (site ID)
Species: Species name
TransectCov: total cover of the given species in the given transect (maximum value = 1)
Moran_I: Moran's I value
P_Value: P value from standard randomization tests

File: MoranI_Simulated.csv

Description: Moran's I values for each simulated population. Calculated using the 2.1Moran_I_Calc.R file. Variable names same as MoranI_Observed.csv file (variable 'Rep' same as in the SimulatedDat.csv file)

File: CNDD.csv

Description: Observed Moran's I and mean Moran's I values of corresponding simulated populations. Observed values are the same as in MoranI_Observed.csv, while values from the MoranI_Simulated.csv file are averaged over each site x transect combination. Generated using the 2.2CNDD_Analysis.R code file.

Variables

Transect: Transect (site) ID
Species: Species name
Moran_I: Moran's I value (same data as Moran_I column in MoranI_Observed.csv)
P_Value_stdtest: P value from standard randomization test (same as P_Value column in MoranI_Observed.csv file)
MeanNullI: Mean of Moran's I values of simulated populations for the given species x site combination
SDNullI: Standard deviation of Moran's I values of simulated populations for the given species x site combination
NullSampleSize: Number of simulated Moran's values available for the given species x site combination (usually 1000, but may be lower if the simulated population's spatial distribution was could not be quantified; see BlankReason column of SimulatedDat.csv file)
P_Value_displimtest: P-value generated by comparing observed Moran's I value to the distribution of simulated values for the given species x site combination
SES: Standardized effect size, i.e. observed Moran's I standardized by mean of the null distribution (distribution of simulated values for the given species x site combination). A measure of the magnitude of CNDD (more negative values - stronger CNDD)
NullGreater: Whether the mean simulated Moran's I value is greater than the observed value (which would indicate CNDD)

File: HNDD.csv

Description: HNDD of each observed population, quantified as the correlation between the cover of the given species in a given transect segment, and the cover of all other species. Generated using the 3.1HNDD_Analysis.R code file.

Variables

Transect: Transect (site) ID
Species: Species name
Origin: Species origin (native - N, introduced - I)
Corr: Pearson's correlation coefficient, measuring the correlation between the cover of the given species in each transect segment, and the cover of all other species in the same segment. Higher negative values indicate stronger HNDD.
MeanPermCorr: Mean value of the correlation coefficient from the circular shift permuation test (see Statistical Analysis section of methods)
Perm_P_value: P-value testing whether the observed correlation coefficient significantly differs from zero, generated using the circular shift permutation test (see Statistical Analysis section of methods)

Code Files

File: 0.1Sp_Occurrences.R

Description: Code for counting the number of occurrences of each species in each site (transect), and for determining which species are suitable for inclusion in the CNDD study (see Study Sites and Vegetation Data section of the methods).

File: 0.2Trait_Formatting.R

Description: Code for formatting the raw trait data (e.g. dispersal syndrome, seed germination rates etc.) obtained from TRY, Poddar et al. and from primary literature.

File: 0.3DispersalParams_Assignment.R

Description: Code for assigning dispersal kernel parameter values to each species based on their dispersal syndrome and other traits.

File: 0.4AreaGrowthParams_Calc.R

Description: Code for estimating the a_init parameter in equation 4 (see methods), using on the expected maximum size and expected time to 90% size for each lifeform. This parameter is needed for calculating individual growth rate r.

File: 0.5Mort_GermParam_Sweep_Sim.R

Description: Code for running the parameter sweeps for estimating the density dependence parameter k (equation 1 in methods) and the mortality parameter m₁ (equation 5 in methods). This code simulates the dispersal limited null model with different values of k and m₁. Three sets of parameter sweeps were run by re-running this code three times and changing the 'trial_num' variable each time. The parameter values tried out in the first sweep are generated in-situ, while those tried out in subsequent sweeps are generated using 0.6Mort_GermParam_Analysis.R code file. Please note: the authors ran this code on a high performance computing cluster. The code, in its present form, requires 96 cores for parallel computing. Attempting to run it as is on a personal computer/laptop will likely produce an error, as most personal computers do not have 96 cores. Users can reduce the number of cores using the 'ncores' parameter in the 'setting up cluster & loop' code section, however, this will increase computational time.

File: 0.6Mort_GermParam_Analysis.R

Description: Code for analyzing the results of parameter sweeps and identifying the optimal values of k and m₁ (optimal values are those at which simulated cover matches observed cover). If optimal values for a given population are not found in a given sweep, this code generates a list further values to try out, which was then input into the 0.5Mort_GermParam_Sweep_Sim.R file. In this way, three sets of parameter sweeps were run.

File: 1.1NullModelSims.R

Description: Code for the null model simulations, which was used to simulate the spatial distribution of each study population in the absence of CNDD, as described in the Overview of Null Model Simulations and Simulation Processes and Parameters sections of the methods. Please note: the authors ran this code on a high performance computing cluster. The code, in its present form, requires 96 cores for parallel computing. Attempting to run it as is on a personal computer/laptop will likely produce an error, as most personal computers do not have 96 cores. Users can reduce the number of cores using the 'ncores' parameter in the 'setting up cluster & loop' code section, however, this will increase computational time.

File: 2.1Moran_I_Calc.R

Description: Code for calculating Moran's I on the observed and simulated data. It also tests whether observed Moran's I values significantly differ from zero, using standard permutation tests (which assume a null model of complete spatial randomness). (See Statistical Analysis section of methods)

File: 2.2CNDD_Analysis.R

Description: Code for comparing the Moran's I values of observed and simulated populations, and for quantifying the magnitude of CNDD (observed Moran's I minus mean simulated Moran's I, divided by the latter). It also tests whether CNDD significantly differs by species origin (native/introduced) and whether it varies with soil nutrient concentrations or species traits. (See Statistical Analysis section of methods)

File: 3.1HNDD_Analysis.R

Description: Code for quantifying HNDD, testing whether CNDD and HNDD are correlated across species, and whether HNDD significantly differs by species origin (native/introduced). (See Statistical Analysis section of methods)

Other files

File: TRY_Variable_Names.csv (supplemental file - Zenodo)

Explanation of variables in the DispersalTraits_TRY_Raw.csv and SizeGrowthTraits_TRY_Raw.csv

Code/software

All code was run on R version 4.2.0. A high performance computing cluster was used for running some of the code files, which is indicated under the individual file descriptions.

Access information

Other publicly accessible locations of the data:

Data was partially derived from the following sources:

Poddar, Urmi; Dong, Tracey; Lam, Kristi et al. (2026). Community assembly explains invasion differences between two contrasting forest types [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqk3k
Poddar, U., Dong, T., Lam, K., Lee, V., Wilson, P., Gurevitch, J., & D’Andrea, R. (2026). Community assembly explains invasion differences between two contrasting forest types. bioRxiv, 2026.03.05.709929. https://doi.org/10.64898/2026.03.05.709929

Study Sites and Vegetation Data

The empirical data used here were collected by Poddar et al. (2026), who sampled 16 forest sites on Long Island (Suffolk County), NY, USA in 2021-22. In each site, a 30 m long transect, divided into 10 cm segments, was laid out, and the cover of all understory vascular plant species (height < 1m) was estimated in each segment using the line-intercept method. Additionally, five 1 m x 1 m plots were placed within 5 m of the transect, and the cover of all vascular plant species rooted in each plot was visually estimated. This dataset provides spatially-explicit cover data, but not counts or locations of individuals.

We used the spatial distribution of each species in each transect to quantify CNDD. Additionally, we combined transect and plot data to calculate total cover of each species at each site, which informed parameter selection for the null model simulations (see "Simulation Processes and Parameters" section). Only non-tree species and species present in at least 10 transect segments were included, yielding an initial dataset of 22 species and 57 species × site combinations. Two species were subsequently excluded due to difficulties in estimating demographic parameters for the null model simulations (see "Simulation Processes and Parameters" section). Therefore, the final dataset consisted of 20 species, with 14 being native and 6 introduced, and 50 species × site combinations (henceforth called ‘populations’).

Overview of Null Model Simulations

The observed within-transect spatial distribution of study species were compared with those generated by a dispersal-limited null model. For this, we simulated the dynamics of each study species in each site in the absence of CNDD, using individual-based, continuous-space simulations. These simulations included recruitment and dispersal limitation, size-dependent mortality, and global density dependence. The last of these refers to a decrease in recruitment with increasing total biomass in the landscape. Note that this differs from CNDD, which operates at the neighborhood scale and decreases with distance from neighbors, while global density dependence is a function of the total biomass of a given species in the entire landscape, and is not dependent on the spatial distribution of individuals. The functional forms of these processes are described in the "Simulation Processes and Parameters" section. Parameters were taken from empirical literature where available; otherwise, realistic estimated values were assigned by the authors or chosen to match simulated cover with observed cover, as detailed in the "Simulation Processes and Parameters" section.

Separate simulations were run for each species in each site; i.e., only a single population was simulated at a time. The simulated landscape was uniform and 30 m x 10 m in size, with toroidal boundary conditions to avoid edge effects. Individual plants were modelled as circles of finite area and fixed centroid positions, representing plant canopies. The canopies of two or more individuals may overlap with each other. The centroid position, area, and age of each individual was tracked over time.

Each simulation run began with a small number of individuals being randomly distributed across the landscape, with initial population size drawn from a Poisson distribution (λ = 5). At each timestep, the following processes then occurred in sequence: 1) seed production, seed dispersal and birth of new individuals, 2) growth of individuals (increase in size), 3) mortality . Each timestep represents one calendar year, but was assumed to consist of 200 growing days, given the deciduous nature of the study species. The simulations were run for 2000 timesteps, after which transect sampling was conducted on the simulated landscape. A total of 1,000 replicate runs were performed for each species × site combination.

Simulation Processes and Parameters

Recruitment rate and global density dependence

Recruitment rate or birth rate, defined as the expected number of individuals added in a given timestep per unit area occupied by reproductively mature individuals, was modeled as an exponentially declining function of total population area (biomass):

Where is the recruitment rate at time , is the maximum recruitment rate for the given species, is the total area covered by all individuals at the end of timestep , and is the density dependence parameter. Equation 1 thus represents global density dependence. Species-specific values were derived from the literature, by multiplying estimates of seed production per unit ground cover with estimates of seed germination rates. When such data were unavailable, soil seed bank studies were used instead, and the reported number of seeds germinating per unit area of soil sampled was taken as .

Values of were chosen through parameter sweeps so that simulated population cover after 2000 timesteps was within 5% of observed cover, Separate values were assigned to each population. Initial sweeps explored values from 1 to 250 at a resolution of 25. When optimal values were not identified, up to two additional sweeps were conducted by expanding the range or increasing the resolution based on prior results. The range of values explored ultimately extended to 2000 for some species. Two species were excluded because their maximum recruitment rates were either too high or too low for simulated cover to approximate observed values at any value of . Here simulated cover refers to the proportion of the landscape occupied by the population, without double-counting overlapping canopies. Note that this differs from , which was calculated by summing the area of all individuals without accounting for overlap. This distinction reflects empirical cover estimates, which do not double-count overlaps but may underestimate biomass. Accordingly, total area, which represents total biomass, was used to model density dependence, whereas total cover was used for parameter fitting. Observed cover was calculated as the mean of total transect cover and total plot cover of each population, divided by the total area sampled in transect and plot sampling, respectively.

The number of recruits produced by each reproductively mature individual at each timestep was drawn from Poisson distributions with mean equal to multiplied by the individual's area. Individuals were considered reproductively mature once they reached the age of reproductive maturity. These ages were assigned based on the species’s lifeform (biennial herb = 1 year, perennial herb = 2 years, shrubs and lianas = 5 years). Lifeform data was taken from Poddar et al. (2026) , who obtained it from the PLANTS Database (USDA-NCRS, 2023).

Dispersal kernel

The distance between each recruit and its parent was drawn from an exponential power dispersal kernel:

Where is the dispersal distance, is the scale parameter, is the shape parameter, and is the Gamma function.This distribution was selected based on the findings of Bullock et al. (2017), who synthesized empirical seed-dispersal data and identified it as the best fit distribution in most cases. They also provided parameter estimates based on dispersal syndrome and life form, with further subdivisions by height, seed mass, or subcategories of dispersal vectors. Another study, Vittoz & Engler (2007), estimated the median and 99th percentile dispersal distances for each dispersal syndrome through literature review. We used these two studies for assigning dispersal parameter values based on each species’ dispersal syndrome, lifeform, height, and seed mass. Dispersal syndrome data was obtained from the TRY Plant Trait Database (Kattge et al., 2020), the Seed Information Database (SER, INSR & RBGK, 2023), and from primary and secondary literature. Seed mass data was obtained from TRY, and height was taken from Poddar et al. (2026), who synthesized data from TRY and the PLANTS Database.

Additionally, the angle between each recruit and its parent was drawn from a uniform distribution over radians. This, combined with the distance from the parent, was then used to determine the absolute position of each recruit on the landscape. Note that the landscape was toroidal, therefore recruits dispersing beyond the landscape’s boundaries reappeared on the opposite side.

Growth of individuals

Newly-born individuals initially occupied an area of 1 cm². Individual growth was modeled using the Gompertz function, which is commonly used to model plant biomass growth (Paine et al., 2012).

Where and are the size (area) of the individual at year and , respectively, is the maximum attainable size, is the growth rate, and is the number of growing days between the timesteps. For newly-born individuals, was sampled uniformly from 1-200 days, representing variation in date of birth, for all others, days. Maximum size was assigned based on lifeform (10 cm diameter for herbs, 50 cm for shrubs, 100 cm for lianas). Growth rates () were derived from empirical relative growth rate (RGR) data from TRY (Kattge et al., 2020). We initially attempted to use species-specific RGR values, however, this could not be done due to low data availability. Instead, a single value was used by averaging available data across species. The relationship between RGR and is given by (Paine et al., 2012):

Where is the initial size of individuals in the original studies. Since values were not reported, we fit these values, using equations 3-4, such that individuals reached 90% of in 250 days (1.25 years) for biennial herbs, 1000 days (5 years) for perennial herbs, and 2000 days (10 years) for shrubs and lianas. Thus there were separate values for each lifeform.

Mortality

Individuals were chosen to die using Bernoulli trials with probability of death equal to their mortality rates, which was modelled as a step function of size:

(5)

Where is the mortality rate as a function of individual size , is the expected size at 1 year (200 days) of age, is the mortality rate of individuals of size less than, is the expected lifespan of individuals who survive for at least 1 year. Values of were assigned based on lifeform (2 years for biennial herbs, 10 years for perennial herbs, 50 years for shrubs and lianas). Values of assigned through parameter sweeps, concurrently with the parameter sweeps for ( see Recruitment rate and global density dependence section), with separate values for each population. Initial sweeps explored values from 0.69 to 0.99, at a resolution of 0.05, with subsequent rounds extending the range from 0.01 to 0.999 for some populations.

Statistical analysis

We calculated Moran’s I, a measure of spatial autocorrelation, to quantify spatial patterns in the observed and simulated populations. Moran’s I ranges from -1 to 1. Positive Moran’s I indicates spatial clustering, which is expected under dispersal limitation, while negative values indicate overdispersion, as expected under CNDD alone. We first calculated this metric for observed populations and assessed statistical significance using standard randomization tests, which assume a null model of complete spatial randomness. This allowed us to evaluate whether CNDD could be detected without accounting for dispersal limitation. The ‘moran.test’ function in the R package ‘spdep’ (Baddeley & Turner, 2005) was used. We then conducted a second set of tests using the dispersal-limited null model, in which observed Moran’s I values were compared to the distribution of values from the corresponding simulated populations, for each species × site combination. Furthermore, we quantified the magnitude of CNDD by calculating the proportional deviation of each observed value from the mean of the dispersal-limited null, that is, the difference between each observed value and the mean of the corresponding simulated distribution, divided by the latter. Negative values of this metric would indicate that observed populations are more overdispersed than expected under dispersal limitation alone, consistent with CNDD.

We also quantified heterospecific negative density dependence (HNDD),by examining the spatial association between each species’ abundance and the abundance of all other species in the community. Analogous to CNDD, HNDD refers to the decline in a species’ performance with increasing density of heterospecifics. We calculated, for each focal population, the Pearson correlation between the cover of the focal species in each transect segment and the total cover of all other understory species in the same segment. All observed species were included in the latter, including those excluded from the CNDD analysis. Significance was assessed using circular shift permutation tests. This tests for spatial associations between two variables while preserving the internal spatial structure of each variable. It involves shifting the distribution of one of the two variables by a random distance along the transect, with values that move past one end reappearing at the other end, as though the transect were circular. Correlation coefficients were recalculated after each shift, and were then used to create the null distribution. One thousand permutations were used. The means of most null distributions were close to zero, therefore, we did not standardize the observed correlation coefficients by the means of the null distributions, as was done for CNDD. Instead, the observed correlation coefficients were directly used as a measure of the magnitude of HNDD. Note that our values of CNDD and HNDD cannot be directly compared to each other, due to being on different scales. However, comparisons of HNDD across species could be made.

To test whether native and introduced species significantly differed in the strength of CNDD, we fit mixed-effects linear models, with species origin (native/introduced) as the fixed effect and species identity as the random effect. We similarly tested whether the strength of HNDD significantly differed by species origin. Lastly, we tested whether CNDD and HNDD were correlated with each other across species, using Spearman’s correlation coefficient. This assesses whether our CNDD metric reflects density dependence driven broadly by non-specific neighbor density rather than conspecifics per se, which would be indicated by a significant positive correlation between CNDD and HNDD.

References

Bullock, J. M., Mallada González, L., Tamme, R., Götzenberger, L., White, S. M., Pärtel, M., & Hooftman, D. A. P. (2017). A synthesis of empirical plant dispersal kernels. Journal of Ecology, 105(1), 6–19. https://doi.org/10.1111/1365-2745.12666

Kattge, J., Bönisch, G., Díaz, S., Lavorel, S., Prentice, I. C., Leadley, P., Tautenhahn, S., Werner, G. D. A., Aakala, T., Abedi, M., Acosta, A. T. R., Adamidis, G. C., Adamson, K., Aiba, M., Albert, C. H., Alcántara, J. M., Alcázar C, C., Aleixo, I., Ali, H., … Wirth, C. (2020). TRY plant trait database – enhanced coverage and open access. Global Change Biology, 26(1), 119–188. https://doi.org/10.1111/GCB.14904

Paine, C. E. T., Marthews, T. R., Vogt, D. R., Purves, D., Rees, M., Hector, A., & Turnbull, L. A. (2012). How to fit nonlinear plant growth models and calculate growth rates: an update for ecologists. Methods in Ecology and Evolution, 3(2), 245–256. https://doi.org/10.1111/j.2041-210X.2011.00155.x

Poddar, U., Dong, T., Lam, K., Lee, V., Wilson, P., Gurevitch, J., & D’Andrea, R. (2026). Community assembly explains invasion differences between two contrasting forest types. bioRxiv, 2026.03.05.709929. https://doi.org/10.64898/2026.03.05.709929

Society for Ecological Restoration, International Network for Seed Based Restoration, & Royal Botanic Gardens Kew. (2023). Seed Information Database (SID). Https://Ser-Sid.Org/.

United States Department of Agriculture National Resources Conservation Service (USDA-NRCS). (2023). The PLANTS Database. https://plants.usda.gov/

Vittoz, P., & Engler, R. (2007). Seed dispersal distances: A typology based on dispersal modes and plant traits. Botanica Helvetica, 117(2), 109–124. https://doi.org/10.1007/s00035-007-0797-8

Introduced plant species show weaker conspecific density dependence than native plant species

Data files

Abstract

README: Introduced plant species show weaker conspecific density dependence than native plant species

Description of the data and file structure

Files and variables

Raw data files

File: CompDat_Transect_long.csv

Variables

File: CompDat_Plot_long.csv

Variables

File: Traits_Poddar.csv

Variables

File: Soil.csv

File: DispersalTraits_TRY_Raw.csv (supplemental file - Zenodo)

Variables

File: DispersalTraits_other.csv

Variables

File: DispersalDistance_Vittoz_Engler.csv

Variables

File: DispersalKernel_Bullock.csv

Variables

File: SizeGrowthTraits_TRY_Raw.csv (supplemental file - Zenodo)

Variables

File: RecruitmentTraits.csv

Variables

File: GrowthLifeSpan_Params.csv

Variables

Formatted data files

File: Sp_Occurrences_per_Site.csv

Variables

File: Traits_SelectSpecies.csv

Variables

File: DispersalKernelParams_final.csv

File: Mort_DensityDep_Params.csv

Variables

Output files

File: ParamsTrial1.csv

Variables

File: SimulatedTransectDat.csv

Variables

File: MoranI_Observed.csv

Variables

File: MoranI_Simulated.csv

File: CNDD.csv

Variables

File: HNDD.csv

Variables

Code Files

File: 0.1Sp_Occurrences.R

File: 0.2Trait_Formatting.R

File: 0.3DispersalParams_Assignment.R

File: 0.4AreaGrowthParams_Calc.R

File: 0.5Mort_GermParam_Sweep_Sim.R

File: 0.6Mort_GermParam_Analysis.R

File: 1.1NullModelSims.R

File: 2.1Moran_I_Calc.R

File: 2.2CNDD_Analysis.R

File: 3.1HNDD_Analysis.R

Other files

File: TRY_Variable_Names.csv (supplemental file - Zenodo)

Code/software

Access information

Methods