
Rare species biodiversity, socio-demographics and local and landscape characteristics in Northern California community urban gardens

Cite this dataset

Ong, Theresa et al. (2022). Rare species biodiversity, socio-demographics and local and landscape characteristics in Northern California community urban gardens [Dataset]. Dryad.


Cities are sometimes characterized as homogenous with species assemblages composed of abundant, generalist species having similar ecological functions. Under this assumption, rare species, or species observed infrequently, would have especially high conservation value in cities for their potential to increase functional diversity. Management to increase the number of rare species in cities could be an important conservation strategy in a rapidly urbanizing world. However, most studies of species rarity define rarity in relatively pristine environments where human management and disturbance is minimized. We know little about what species are rare, how many species are rare, and what management practices promote rare species in urban environments. Here, we identified which plants and species of birds and bees that control pests and pollinate crops are rare in urban gardens and assessed how social, biophysical factors, and cross-taxonomic comparisons influence rare species richness. We found overwhelming numbers of rare species, with over 50% of plant cultivars observed classified as rare. Our results highlight the importance of women, older individuals, and gardeners who live closer to garden sites in increasing the number of rare plants within urban areas. Fewer rare plants were found in older gardens and gardens with more bare soil. There were more rare bird species in larger gardens and more rare bee species where canopy cover was higher. We also found that in some cases, rarity begets rarity, with positive correlations found between the number of rare plants and bee species and between bee and bird species. Overall, our results suggest that urban gardens include a high number of species existing at low frequency and that social and biophysical factors promoting rare, planned biodiversity can cascade down to promote rare, associated biodiversity.


Study Region 

We worked in 18 urban community gardens in three counties (Santa Clara, Santa Cruz, and Monterey) in the central coast region of California, USA. The gardens differ in local habitat (structural and compositional diversity of both crop and non-crop species) and landscape context (amount of natural, agricultural, and urban land cover in the surrounding area). All gardens have been cultivated for five to 47 years and range from 444 to 15,525 m² in size. All of the gardens use organic management practices and prohibit the use of chemical pesticides and insecticides. Gardens were chosen because they represent sites across a gradient of urban, natural, and agricultural landscapes and were separated from each other by at least 2 km; the farthest distance between gardens was 90 km and the closest was 2 km (Cohen et al., 2020; Egerer et al., 2017; Philpott and Bichier, 2017). Gardener demographic data indicate that gardeners are diverse in their make-up, covering a range of family sizes, education, salary, and food insecurity levels (Egerer et al., 2017; Philpott et al., 2020).

Data Collection

We provide the following framework (Fig. 1) to help visualize the specific set of questions posed in this study and the data and analyses used to address them. First, we ask which gardener characteristics (Q1), and which local and landscape garden features, affect the number of rare plant cultivars (Q2a) and rare bird and bee species (Q2b) in urban community gardens. We include cultivars as distinct types, as per Reiss and Drinkwater (2018). Subsequently, we ask if there is an association between the number of rare plant cultivars and the number of rare bird and bee species (Q3), and if the numbers of rare bird and bee species are also related to one another (Q4).

The data analyzed for this research was collected in two summer field seasons (2015, 2017), from May to September, which is the peak urban garden growing season for the region. Gardener characteristics data (defined below) and gardener self-reported plant data were collected in summer 2017 to address Q1 (Fig. 1). Direct sampling of biodiversity (plants, bees, birds) and garden characteristics was done in summer 2015 to address Q2-4 (Fig. 1). Though structural equation modeling (SEMs) was considered, there is no direct way to compare data from 2017 and 2015 because of the methodological differences outlined below. Thus, separate statistical analyses are conducted for 2017 and 2015 data. We can test the relationship between gardener characteristics and number of rare plant cultivars because gardeners reported what plants they grew in our surveys. We cannot directly test how gardener characteristics influenced the number of rare bird and bee species because gardeners were not asked about these species. Instead, we infer effects of gardener characteristics on bees and birds indirectly via the overall research framework in Figure 1. We explain the specific methods for each type of data collection and the analysis below.

Gardener characteristics data

We surveyed gardeners from 18 urban community gardens during the 2017 summer field season. Survey questionnaires collected gardener demographic information as well as gardening experience and use data (Table 1). Specifically, we surveyed 185 gardeners in total, or six to 14 gardeners per garden (9.5-65% of the gardener population in a site). We only included surveys in our analysis if plant information on the survey was completed (n=162). We administered surveys in English (n=123), Spanish (n=38), and Bosnian (n=1) and either read the survey out loud in person (n=138) or via phone (n=1), or had the gardener fill out the survey themselves (n=21) or had a gardener read the survey to another gardener (n=1). Two of the surveys did not have information on the method of survey administration. We also note that despite best efforts to survey gardens equally, uneven gardener availability resulted in unequal gardener sampling across the 18 community gardens, requiring us to calculate the number of rare plant cultivars in gardener-reported data (2017) by gardener survey rather than by garden, as was done for the direct field-based data (2015) described below.

Gardener-reported plant data

Gardeners were asked to identify and list the plant species and cultivars that they planted in their plots. We then classified gardener-reported plants into either crop or ornamental species. Crop species included fruits, vegetables, herbs, and other consumable plants. Ornamental species included plants grown for decorative purposes, such as flowers and non-food providing crops. Though we included plant cultivars as distinct types, gardeners varied in the level of cultivar specificity provided, which we acknowledge is a limitation to our study. We looked up scientific names for common names provided and supplemented these results with direct field-based plant data where researchers identified species and cultivars in the field using methods described in detail below.

Garden characteristics data

Landscape-level garden data

For each garden, we measured the surrounding landscape composition within buffers surrounding gardens at the 0.5, 1, and 3 km scale. We used the 2011 National Land Cover Database (NLCD) (Jin et al. 2015) to calculate the percentage of urban NLCD land cover class using ArcGIS (v. 10.1) (ESRI 2011). Urban land cover was calculated by combining developed low, medium, and high intensity developed land. Urban land cover is correlated with many other land use categories (e.g., natural land), thus we chose to focus on only urban land cover in our models because we were most interested in the effects of urbanization on biodiversity; further, urban land cover has been a significant predictor of biodiversity in previous analyses of these gardens (Quistberg et al. 2016, Egerer et al. 2017). Urban cover at the 1 km scale best predicted pooled species rarity across taxa, exhibiting the lowest AIC of all the scale models (Appendix S1: Table S1), thus the 1 km spatial scale was used for all subsequent analyses.
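As a rough illustration of the urban cover calculation described above, the following sketch sums the developed low, medium, and high intensity classes inside one buffer and expresses them as a percentage of all pixels. The pixel counts and the function name `percent_urban` are hypothetical; the class codes 22, 23, and 24 are the standard NLCD codes for those three developed classes, and the original work was done in ArcGIS rather than in code like this.

```python
# NLCD codes for the developed classes combined in the text:
# 22 = low, 23 = medium, 24 = high intensity developed land.
# Leaving out 21 (developed, open space) is an assumption based on
# the classes named in the methods.
URBAN_CLASSES = {22, 23, 24}


def percent_urban(pixel_counts):
    """Return the percentage of buffer pixels that fall in urban classes.

    pixel_counts maps an NLCD class code to the number of pixels of
    that class inside one buffer around a garden.
    """
    total = sum(pixel_counts.values())
    if total == 0:
        raise ValueError("buffer contains no pixels")
    urban = sum(n for cls, n in pixel_counts.items() if cls in URBAN_CLASSES)
    return 100.0 * urban / total


# Hypothetical pixel counts for the 1 km buffer around one garden
# (41 = deciduous forest, 82 = cultivated crops).
example_buffer = {21: 100, 22: 300, 23: 250, 24: 50, 41: 200, 82: 100}
print(percent_urban(example_buffer))
```

Running the same function on the 0.5, 1, and 3 km buffers of every garden would give the three candidate predictors that were compared by AIC.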

Local-level garden data

To collect local-scale garden characteristics, we established a 20 x 20 m plot in the center of each garden. In this plot, we measured canopy cover using a spherical densiometer at the center and N, S, E, and W edges of the plot, counted the number and species of trees and shrubs, and counted the number of trees or shrubs in flower within the plot. We determined age and size of each garden by examining historic Google Earth images and noting the first appearance of the gardens, and then we used ground-truthed GPS points taken from each garden to calculate size. For a few of the gardens older than 35 years, we used historical information gained through community resources or discussions with farm management to determine age.

We measured ground characteristics using four 1 x 1 m sub-plots within the 20 x 20 m plots. The 1 x 1 m sub-plots were randomly placed anywhere (including pathways) within the 20 x 20 m plots. Within each 1 x 1 m sub-plot, we measured the height of the tallest herbaceous vegetation and estimated ground cover composition (percent bare soil, rocks, leaf litter, grass, mulch).

We repeated sampling once per month between May and September 2015 and calculated the mean value for each environmental variable for each garden at each time point.
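The averaging step can be sketched as a simple group-and-mean over sub-plot measurements. The record layout (garden, round, variable, value), the variable name, and the numbers below are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def garden_round_means(measurements):
    """Average repeated measurements within each garden and sampling round.

    measurements is an iterable of (garden, round, variable, value)
    tuples, for example one tuple per 1 x 1 m sub-plot.
    """
    grouped = defaultdict(list)
    for garden, sampling_round, variable, value in measurements:
        grouped[(garden, sampling_round, variable)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical percent bare soil in the four sub-plots of one garden,
# recorded in two sampling rounds.
subplots = [
    ("G1", 1, "pct_bare_soil", 10),
    ("G1", 1, "pct_bare_soil", 30),
    ("G1", 1, "pct_bare_soil", 20),
    ("G1", 1, "pct_bare_soil", 40),
    ("G1", 2, "pct_bare_soil", 0),
    ("G1", 2, "pct_bare_soil", 20),
    ("G1", 2, "pct_bare_soil", 10),
    ("G1", 2, "pct_bare_soil", 10),
]
means = garden_round_means(subplots)
print(means[("G1", 1, "pct_bare_soil")], means[("G1", 2, "pct_bare_soil")])
```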

Field-based biodiversity data

Field-based plant data

We measured plant biodiversity using the same four 1 x 1 m sub-plots within the 20 x 20 m plots. Within each sub-plot, we identified the species and cultivars of all herbaceous plants and measured the percent cover for each species and cultivar. This was measured once per month for five sampling periods, separated by roughly 21 days. As with gardener-reported plant data, researchers classified field-based plant data into either crop or ornamental species and cultivars. Plants that did not fit crop or ornamental categories were designated weeds. Gardeners were not asked to report any weeds, so weeds were not classified in the gardener-reported plant data.

Bird data

All bird surveys were conducted by one observer (PB) at each sampling period (see Mayorga et al., 2020). In each garden, this person performed a 10-minute point count. Due to the small size and irregular shape of some gardens, fixed-radius point counts were not used. Instead, the observer stood approximately at the center of each garden and recorded all birds seen or heard within the garden. We assumed that birds within 30 m that were heard but not seen were in the plot unless visually observed to be outside of the plot. Each site was visited at different times during daylight hours (i.e., morning, afternoon, evening) across sample periods to reduce bias in the survey. All birds that were seen or heard inside the garden were identified, and totals were calculated for each round.

Bee data 

We sampled bees with both elevated pan traps and hand netting (Grundel et al. 2011), using 400 ml plastic bowls (yellow, white, and blue) painted with Clear Neon Brand and Clear UV spray paint for pan traps (see Quistberg et al. 2016). We placed pan traps from approximately 8-11 AM and collected traps between 4-7 PM on the same day, and sampling was repeated 5 times across the summer. We placed three 1 m tall PVC pipes in the ground in a triangle formation, 5 m apart within each of the 20 x 20 m plots, and placed one bowl of each color on top of the PVC pipes (Tuell and Isaacs 2009). We filled bowls with 300 ml of water and 4 ml of unscented Dawn dish soap. In addition, we sampled bees using aerial nets at each site for a total of 30 min per site, not including handling time. We netted bees that were observed on flowers, within 20 m of and inside the 20 x 20 m plots in each site. We stored all captured bees for later identification. We performed bee identifications with reference to online resources, image databases, books, and dichotomous keys (Ascher and Pickering, 2015; Frankie et al. 2014; Gibbs, 2010; Michener, 2007). We identified all specimens to the highest taxonomic level possible or designated morphospecies. We compared our specimens to those held in the Kenneth S. Norris Center for Natural History on the University of California, Santa Cruz campus. All voucher specimens are housed in the Philpott Lab at the University of California, Santa Cruz.

Defining rarity 

We considered a species or cultivar as rare if it occupied less than or equal to 1% of all samples (n=18 garden samples for field-based data for each of 5 sampling rounds and n=185 gardener survey samples for gardener-reported data), as per Lyons et al. (2005). In our study, a rare species or cultivar was found in only one of all 18 sites sampled (1/18 is about 5.6%, which is the lowest nonzero occurrence rate possible for our sample size) or was reported in only 2 of all 185 gardener surveys (2/185 rounds down to 1%) for gardener-reported plant data. To assess whether we adequately sampled the biodiversity of each taxon and sampling scheme, species accumulation curves were produced for the gardener-reported plant data, and field-based plant, bird, and bee biodiversity data (Figures S1-4). We tallied the number of rare species and cultivars and compared this to total numbers of species or cultivars documented for each taxonomic group and sampling protocol to determine the extent of rarity we observed in urban gardens. Full lists of rare and common cultivars and species and their frequencies are available in Appendix S1: Tables S2-9.
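The occupancy rule can be sketched as a count of the distinct samples in which each taxon appears, followed by a cut-off. The function name `rare_taxa`, the record layout, and the example records are hypothetical. The cut-offs follow the text (one garden for field-based data, two surveys for gardener-reported data), and treating the survey cut-off as "at most two" is an assumption.

```python
from collections import defaultdict


def rare_taxa(records, max_samples):
    """Return the taxa that occur in at most max_samples distinct samples.

    records is an iterable of (sample_id, taxon) pairs; repeated
    observations of a taxon in the same sample are counted once.
    """
    samples_by_taxon = defaultdict(set)
    for sample_id, taxon in records:
        samples_by_taxon[taxon].add(sample_id)
    return sorted(
        taxon
        for taxon, samples in samples_by_taxon.items()
        if len(samples) <= max_samples
    )


# Hypothetical field-based records for one sampling round: (garden, taxon).
field_records = [
    ("G1", "kale"),
    ("G2", "kale"),
    ("G1", "purple shiso"),
    ("G1", "purple shiso"),  # seen twice in the same garden
    ("G3", "tomato"),
    ("G1", "tomato"),
]

# Field-based data: rare means present in only one garden.
print(rare_taxa(field_records, max_samples=1))
```

For gardener-reported data the same function would be called with survey identifiers as samples and `max_samples=2`.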


We constructed four generalized linear mixed models (GLMM) that together address our questions in Figure 1 (Bolker et al., 2009). We used these four models to predict the number of rare gardener-reported plant, field-based plant, bird, and bee species or cultivars as a function of the following fixed and random effects:

r_{plant-reported 2017} ~ gardener characteristics + (1 | garden) + x    (1)

r_{plant-field 2015} ~ garden characteristics + r_{bee} + r_{bird} + (1 | round) + x    (2)

r_{bee} ~ garden characteristics + r_{plant-field 2015} + r_{bird} + (1 | round) + X    (3)

r_{bird} ~ garden characteristics + r_{bee} + r_{plant-field 2015} + (1 | round) + X    (4)

Here, r is the number of rare plant cultivars reported by gardeners or the number of rare plant cultivars, bee species, or bird species observed in field-based surveys. We used garden as a random effect for gardener-reported data (Eq. 1) and sampling round as a random effect for field-based data (Eqs. 2-4) because the number of rare species or cultivars is calculated by garden for field-based data and by survey for gardener-reported data. We assumed Poisson error distributions, x, for plant count data and transformed our rare bird and bee observations into a binomial presence/absence variable, as most counts (97.6% of rare birds and 98.9% of rare bees) were 1 or 0. Thus, we assumed binomial error distributions, X, for the bird and bee count data.
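The presence/absence transformation can be sketched as follows. The counts are hypothetical and the helper names are not from the original analysis, which was carried out in R.

```python
def share_zero_or_one(counts):
    """Return the fraction of counts that are exactly 0 or 1."""
    return sum(1 for c in counts if c in (0, 1)) / len(counts)


def to_presence_absence(counts):
    """Collapse counts to 1 (at least one rare species) or 0 (none)."""
    return [1 if c > 0 else 0 for c in counts]


# Hypothetical number of rare bird species per garden visit.
rare_bird_counts = [0, 1, 0, 0, 2, 1, 0, 0]

print(share_zero_or_one(rare_bird_counts))
print(to_presence_absence(rare_bird_counts))
```

When nearly all counts are already 0 or 1, as reported above, little information is lost by collapsing them, and the response can be modeled with a binomial error distribution.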

Our fixed effects include gardener characteristics, garden characteristics, and cross-taxonomic effects. Gardener and garden characteristics are composed of several variables detailed below. For all fixed effects, we used a VIF cut-off of three to remove any collinear variables (Zuur et al. 2009). Categorical variables were coded in an ordinal format when appropriate (see Table 1), as per Hildebrand et al. (1977). All analyses were run and all figures were generated in the R environment using the packages tidyverse, lme4, ggpubr, car, corrplot, and vegan (Bates et al., 2015; R Core Team, 2016; Wei and Simko, 2017; Fox and Weisberg, 2019; Wickham et al., 2019; Alboukadel 2020; Oksanen et al., 2020).
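To make the VIF screening concrete, here is a minimal sketch for numeric predictors only. It relies on the standard identity that the VIF of each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The predictor names and values are hypothetical, and this is not the code used in the study, which relied on R packages.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vifs(predictors):
    """Return {name: VIF} using the diagonal of the inverse correlation matrix."""
    names = list(predictors)
    inv = invert(correlation_matrix([predictors[n] for n in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


# Hypothetical garden-level predictors (six gardens).
garden_predictors = {
    "pct_bare_soil": [10, 20, 30, 40, 50, 60],
    "pct_mulch": [55, 52, 38, 33, 18, 12],
    "garden_age": [5, 47, 12, 30, 8, 20],
}
VIF_CUTOFF = 3  # cut-off stated in the text
for name, value in vifs(garden_predictors).items():
    flag = "collinear" if value > VIF_CUTOFF else "ok"
    print(f"{name}: VIF = {value:.2f} ({flag})")
```

A common workflow is to drop one variable from a flagged set, recompute the VIFs, and repeat until all remaining values fall below the cut-off; which variable to drop is a judgment call, as described for the garden characteristics below.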

Gardener characteristics

We used the sociodemographic variables described in Table 1 as fixed effects to predict the number of gardener-reported rare plant cultivars (Eq. 1). After removing collinear variables, the final fixed effects for gardener traits included age, number of people in the family, gender, number of languages other than English, distance of home to the garden, income, education, number of years gardening, number of hours gardening, and food insecurity. 

Garden characteristics 

Several local and landscape-level garden characteristics were measured and broadly divided into three groups: permanent garden and landscape variables, woody vegetation variables, and ground cover variables. The full list of measured variables is available above in the methods. In each group, we tested for collinearity between variables, and then among collinear sets of variables, we retained variables for final models based on perceived importance for the taxa in this system (Quistberg et al. 2016, Burks and Philpott 2017). For example, in the permanent garden and landscape group, size and age of gardens are anticipated to influence the biogeography and microclimates experienced by taxa at garden sites (Smith et al. 2005, Potter and LeBuhn 2015). We are also interested in urbanization, and thus selected the variable % urban cover over the variable % agriculture cover, with which it was correlated. We chose % canopy cover for vegetation characteristics because it is a widely used metric in other studies, because it is positively correlated with the number of trees and shrubs, and because we did not want to use the number of trees and shrubs, since this variable is also a component of our dependent variable, the number of rare plant species. In the category of ground cover, we chose percent bare soil because there is evidence from the literature that many bee species are strongly influenced by this metric and because it is negatively correlated with percent mulch and straw (Quistberg et al. 2016). We excluded percent herbaceous plants in this category because of the potential conflict with our dependent variable, the number of rare plant species and cultivars. The same garden characteristic variables were used to predict all rare taxa using Eqs. 2-4. From here on, we only discuss variables that were retained in our final models. These include garden age, garden size, % urban cover at 1 km, % canopy cover, and % bare soil.

Testing whether rarity begets rarity

Since we are interested in whether the number of rare species and cultivars of plants, birds, and bees are associated, our GLMMs also include the number of rare plant, bird, and bee species or cultivars as explanatory variables where appropriate (Eqs. 2-4). In addition to these models, which test for potential causal relationships, we ran Pearson’s r tests to assess correlations between the numbers of rare species or cultivars across taxa (Fig. S1). When correlating each pair of taxa, we constrained our analysis to include only data that were collected during the same sampling rounds and gardens. 
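The pairwise correlation step, restricted to shared sampling rounds and gardens, might look like the sketch below. The dictionaries keyed by (garden, round), the function names, and the counts are all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def matched_correlation(counts_a, counts_b):
    """Correlate two taxa using only (garden, round) keys present in both."""
    shared = sorted(set(counts_a) & set(counts_b))
    if len(shared) < 3:
        raise ValueError("need at least three shared samples")
    return pearson_r(
        [counts_a[k] for k in shared],
        [counts_b[k] for k in shared],
    )


# Hypothetical numbers of rare taxa keyed by (garden, sampling round).
rare_plants = {("G1", 1): 1, ("G2", 1): 3, ("G3", 1): 2, ("G4", 1): 4, ("G5", 1): 9}
rare_bees = {("G1", 1): 1, ("G2", 1): 2, ("G3", 1): 3, ("G4", 1): 4, ("G6", 1): 0}

# G5 and G6 are dropped because each appears in only one data set.
print(matched_correlation(rare_plants, rare_bees))
```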


USDA NIFA, Award: 2016-67019-25185

USDA NIFA, Award: 2016-67032-24987

National Science Foundation, Award: 2016-174835


UC LEADS Program

UC Santa Cruz Committee on Research

University of California, Santa Cruz

National Science Foundation, Award: 1711167