Distinguishing Impatiens capensis from Impatiens pallida (Balsaminaceae) using leaf traits
Data files
Mar 27, 2020 version files 3.19 GB
- Leaf_scans-Original.tar.gz
- Leaf_scans-Processed.tar.gz
- Leaf_scans-subset-Standardized_Color_for_RGB_analysis.tar.gz
- leaf_trait_data_files.tar.gz
- master_list_of_all_samples.csv
- morphometric_analysis_leaves.R
Abstract
Impatiens capensis (orange jewelweed) and Impatiens pallida (yellow jewelweed) are annual species with similar phenotypes that grow in similar environments throughout the eastern United States. This makes them extremely difficult to distinguish when (chasmogamous) flowers are absent. We use morphometric analyses to identify leaf characters that distinguish these species. After collecting and scanning 342 leaves from plants of both species growing in co-occurring populations in Madison, WI, we quantified leaf size, shape (using elliptical Fourier analysis), serratedness, and color. Using leaf size and shape traits, a linear discriminant analysis assigned up to 100% of leaves to the correct species. The uppermost fully expanded leaf yielded the most accurate species assignments based on size and shape traits. On average, this leaf was smaller and less deeply serrated, with a more acute base and apex and a more elliptical shape, in I. capensis than in I. pallida. Impatiens pallida leaves had more color contrast (lighter veins and margins) than I. capensis leaves, which were solid green throughout. Morphometric analysis is a promising technique to identify species-distinguishing characters in the absence of binary traits or molecular genetic analyses. Leaves from across these species’ ranges should be analyzed to test the robustness of the species-distinguishing characters we present.
Methods
Leaf collections
We identified three nature reserves (sites) on the University of Wisconsin – Madison campus in Madison, Wisconsin, USA: Muir Woods, Bill’s Woods, and Picnic Point Marsh (Fig. 1 in published paper) that contained co-occurring populations of I. capensis and I. pallida. These sites varied in light, soil moisture, and likely genetic composition. Within each site, we collected 3 leaves from each of 5-10 randomly selected plants of each species within each of 3 sub-locations (areas). We sampled until we reached 25 plants of each species in each site. We collected from areas where both species were growing in intermixed stands to standardize the range of environmental variability sampled across species. We could only find two areas of intermixed Impatiens in Bill’s Woods so we sampled two additional areas where only one of the species was present. We could only find two areas of I. capensis to sample in Muir Woods.
We collected leaves from three standardized locations within each plant (Supp. Fig. 1 in published paper) to test for differences in leaf shape or size depending on where they were growing on the plant (essentially leaf height, age, and susceptibility to herbivory and disease). We defined leaf position 1 as the lowest leaf on the plant borne on the main stem. The second leaf came from the middle of a branch that diverged from the main stem in the leafiest region of the plant. The third leaf was the fully expanded leaf closest to the top of the plant on the main stem. We collected leaves into envelopes labeled with the leaf position, species, site, and area and pressed them in a plant press. All leaves were collected between August and September of 2017.
Leaf scanning and image processing
We scanned pressed leaves one at a time using a CanoScan 8800F desktop scanner at a resolution of 300 dpi in color photo mode with auto exposure settings and saved the scans in TIFF format. All leaves were scanned with the leaf tip positioned at twelve o’clock on the scanner bed.
Using the program FIJI (Schindelin et al. 2012), we converted leaf scans into binary black and white images. We removed any leaves that did not have entire margins and filled in any interior holes using the black paint tool. We removed leaf petioles in the scans by painting over them with the white paint tool (petioles were torn at variable lengths when collected). We retained a total of 342 leaf blade silhouettes for morphometric analysis.
Because our original scans used auto exposure, their color values were not standardized and could not be compared meaningfully, so we re-scanned “leaf 2” from 2 randomly selected individuals from each area and species (N = 34) to investigate differences in leaf color. We used leaf 2 for the color analysis because, based on our field observations, it was most representative of leaf color on the plant as a whole. Often, leaf 1 was partially senesced and leaf 3 was too young to have developed full color. We used a color card to ensure the color parameters of all leaves were standardized across scans. Using FIJI, we adjusted color threshold values to exclude the ink markings on each leaf (denoting leaf number) from the analysis. We then generated a distribution of the RGB values present in each leaf (FIJI: Analyze, Color Histogram) and calculated the mean, mode, and variance of this leaf color distribution. We chose the mode (as opposed to the mean) to represent leaf color because it is not influenced by minor blemishes and imperfections on the leaf surface and is likely closer to the color we perceive than the mean value.
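The summary statistics described above can be illustrated with a minimal Python sketch. It is not the FIJI workflow used for the paper: the function name `channel_stats`, the made-up pixel values, and the choice of population variance are all assumptions for illustration. It shows why the mode is less sensitive than the mean to a small patch of blemished pixels.

```python
import statistics


def channel_stats(pixels):
    """Return mean, mode, and variance for each RGB channel.

    `pixels` is a list of (r, g, b) tuples with integer values 0-255.
    """
    stats = {}
    for index, name in enumerate("RGB"):
        values = [pixel[index] for pixel in pixels]
        stats[name] = {
            "mean": statistics.mean(values),
            "mode": statistics.mode(values),
            # Population variance is an assumption; the source does not say
            # which variance estimator was used.
            "variance": statistics.pvariance(values),
        }
    return stats


# Hypothetical leaf: 8 healthy green pixels and 2 brown blemish pixels.
leaf_pixels = [(60, 120, 40)] * 8 + [(120, 80, 30)] * 2
leaf_stats = channel_stats(leaf_pixels)

for name, values in leaf_stats.items():
    print(name, values)
```

In this toy leaf the blemish pixels pull the green-channel mean away from the dominant value, while the mode stays at the value of the healthy pixels.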
Usage notes
README for uploaded files:
master_list_of_all_samples.csv
This is a .csv file containing metadata for each leaf in this study (essentially the master sample naming key with associated metadata). Each row represents a unique leaf.
Columns are as follows:
image_file_name: name of the leaf scan image file from which leaf trait data were obtained
species: value of Impatiens_capensis or Impatiens_pallida denoting which species the leaf belongs to
collection_year: value of 2014 or 2017 denoting the year that the leaf was collected
site: codes representing the location where leaves were collected (can be thought of as population-level labels)
area: letter code representing the sub sampling location within each site, unique within each site but not across sites
plant_id: numerical identifier for each individual plant that leaves were collected from, unique within areas but not across areas or sites
leaf_position: value of 1, 2, 3, or not_standardized, representing the location on each plant that a leaf was collected from
in_morpho_analysis_2017_allusable: value of yes or no denoting if the leaf was included in the analysis of all leaves collected for this study
in_morpho_analysis_2017_onlyleaf2: value of yes or no denoting if the leaf was included in the analysis of leaves collected for this study from leaf position #2
in_morpho_analysis_2017_onlyleaf3: value of yes or no denoting if the leaf was included in the analysis of leaves collected for this study from leaf position #3
in_morpho_analysis_2014_alluseable: value of yes or no denoting if the leaf was included in the analysis of leaves collected in 2014 for other studies (used here as a validation set)
in_morpho_analysis_2017_predicting_2014: value of yes or no denoting if the leaf was included in the analysis using leaves collected for this study ("2017 leaves") to predict species labels of leaves collected for other studies ("2014 leaves").
Notes:
Leaves that have a "no" in the column "in_morpho_analysis_2017_predicting_2014" were damaged between collection and scanning (e.g. leaf dried folded over inside envelope).
Leaves collected for this study are sometimes called 2017 leaves, after the year they were collected. This distinguishes them from leaves collected in 2014 for a different project; we used a random subset of those 2014 leaves to test the robustness of models generated using the leaves collected specifically for this study.
Leaves collected for this study came from one of 3 standardized positions on the plants: 1, 2, or 3. See Supp. Fig. 1 for a visualization of these leaf positions. Leaf position 1: the lowest leaf on the plant borne on the main stem; leaf position 2: a leaf from the middle of a branch that diverged from the main stem (in the leafiest region of the plant); leaf position 3: the fully expanded leaf closest to the top of the plant on the main stem.
The leaves collected in 2014 were collected from random locations within each plant (so no leaf position data exist).
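As a rough illustration of how the columns above fit together, the following Python sketch filters leaves by one of the yes/no analysis columns and builds a plant identifier that is unique across the whole study. The column names come from the list above; the sample rows (file names, site codes, area letters) and the helper names are made up for the example and do not come from the real file. Because plant_id is unique only within an area, and area only within a site, the sketch combines site, area, and plant_id into one key.

```python
import csv
import io

# Hypothetical rows using the real column names; all values are invented.
SAMPLE_CSV = """\
image_file_name,species,collection_year,site,area,plant_id,leaf_position,in_morpho_analysis_2017_onlyleaf3
leaf_a.tif,Impatiens_capensis,2017,SITE1,A,1,3,yes
leaf_b.tif,Impatiens_capensis,2017,SITE1,B,1,3,yes
leaf_c.tif,Impatiens_pallida,2017,SITE2,A,1,2,no
leaf_d.tif,Impatiens_pallida,2017,SITE2,A,1,3,yes
"""


def load_rows(handle):
    """Read every row of the metadata table into a list of dicts."""
    return list(csv.DictReader(handle))


def select_leaves(rows, flag_column):
    """Keep only leaves marked 'yes' in the given analysis column."""
    return [row for row in rows if row[flag_column] == "yes"]


def plant_key(row):
    """Combine site, area, and plant_id into a study-wide plant identifier."""
    return (row["site"], row["area"], row["plant_id"])


rows = load_rows(io.StringIO(SAMPLE_CSV))
leaf3 = select_leaves(rows, "in_morpho_analysis_2017_onlyleaf3")
plants = {plant_key(row) for row in rows}

print(len(leaf3), "leaves selected;", len(plants), "distinct plants")
```

To run this against the real table, replace the `io.StringIO(SAMPLE_CSV)` handle with an open file handle for master_list_of_all_samples.csv.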
Leaf_scans-Original.tar.gz
A zipped file containing all of the original leaf scan images. Leaves were pressed and air dried between sheets of cardboard prior to scanning. We scanned leaves one at a time using a CanoScan 8800F desktop scanner at a resolution of 300 dpi in color photo mode with auto exposure settings and saved the scans in TIFF format. All leaves were scanned with the leaf tip positioned at twelve o’clock on the scanner bed. Meaningful quantitative color comparisons cannot be made using these images because we used the auto exposure setting. File names can be matched to the "image_file_name" column in "master_list_of_all_samples.csv". N = 542 unique leaves (files).
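One way to check file names in the archive against the master list is sketched below in Python. The helper names are hypothetical, and the sketch assumes that the values in "image_file_name" match the base names of files in the archive, extension included; that assumption should be checked against the real data before relying on the comparison.

```python
import os
import tarfile


def archive_file_names(archive_path):
    """Return the base names of all regular files in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return {
            os.path.basename(member.name)
            for member in archive.getmembers()
            if member.isfile()
        }


def compare_names(archive_names, master_names):
    """Return (names only in the archive, names only in the master list)."""
    archive_names = set(archive_names)
    master_names = set(master_names)
    only_in_archive = sorted(archive_names - master_names)
    only_in_master = sorted(master_names - archive_names)
    return only_in_archive, only_in_master


# Hypothetical names, used here instead of reading the real archive.
extra, missing = compare_names(
    {"leaf_a.tif", "leaf_b.tif"},
    {"leaf_b.tif", "leaf_c.tif"},
)
print("only in archive:", extra)
print("only in master list:", missing)
```

With the real files, `archive_file_names("Leaf_scans-Original.tar.gz")` would supply the first argument and the "image_file_name" column of the master list the second.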
Leaf_scans-Processed.tar.gz
A zipped file containing all of the processed leaf scan images. Images are batched into subfolders that reflect the different analyses conducted in the paper (based on collection year and/or leaf position). Each subfolder contains the images used in that analysis and a Groups.csv file that the R script uses to assign metadata to each image. Using the program FIJI (Schindelin et al. 2012), we converted leaf scans ("Leaf_scans-Original.tar.gz") into these binary black and white images. We removed any leaves that did not have entire margins and filled in any interior holes using the black paint tool. We removed leaf petioles in the scans by painting over them with the white paint tool (petioles were torn at variable lengths when collected). We rotated images by hand when needed so that all leaves were aligned with their apex and base along an imaginary vertical axis. These black and white image files were then processed with the R script "morphometric_analysis_leaves.R" to produce the leaf trait data in "leaf_trait_data_files.tar.gz".
Final_Images_JPEG-rotated/
Leaf silhouettes for all of the leaves collected for this study ("2017 leaves") from all leaf positions. N = 342 unique leaves.
Final_Images_JPEG-rotated-onlyleaf2/
Leaf silhouettes for all of the leaves collected for this study ("2017 leaves") from leaf position #2. N = 129 unique leaves.
Final_Images_JPEG-rotated-onlyleaf3/
Leaf silhouettes for all of the leaves collected for this study ("2017 leaves") from leaf position #3. N = 132 unique leaves.
Final_Images_JPEG-rotated-RTandHW/
Leaf silhouettes for all of the leaves collected for this study ("2017 leaves") and a random subset of the leaves collected for other studies (called "2014 leaves" in the paper) from all leaf positions (positions not standardized in 2014). N = 342 + 167 = 509 unique leaves.
Final_Images_JPEG-rotated-fromRT/
Leaf silhouettes for a random subset of the leaves collected for other studies (called "2014 leaves" in the paper) from random, not standardized leaf positions. N = 167 unique leaves.
Leaf_scans-subset-Standardized_Color_for_RGB_analysis.tar.gz
A zipped file containing the random subset of leaves from leaf position #2 that were rescanned with a standardized color setting and used to measure leaf color. Using FIJI, we adjusted color threshold values to exclude the ink markings on each leaf (denoting leaf number) from the analysis. We then generated a distribution of the RGB values present in each leaf (FIJI: Analyze, Color Histogram) and calculated the mean, mode, and variance of this leaf color distribution (these data are stored in leaf_trait_data_files.tar.gz). N = 33 unique leaves (files).
leaf_trait_data_files.tar.gz
A zipped file containing all of the leaf trait data files that were used in the statistical analyses for this paper and a detailed README file describing each file and each data column within data files. All leaf size and shape data were generated by running the R script, "morphometric_analysis_leaves.R", which reads in black and white silhouettes of individual leaves (Leaf_scans-Processed.tar.gz) and calculates various size and shape metrics. As some data (shape traits) were generated using principal components analysis, a separate data file exists for each set of leaves used for analyses (because the principal component scores change depending on which leaves are included in the analysis). That is, measurements like length and width are constant throughout these data files for an individual leaf, but principal component values change depending on the leaves included in the analysis. Only leaves with usable scans are included in these analyses (some leaves were damaged, e.g. folded, between collection and scanning).
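The dependence of principal component scores on the set of leaves analyzed can be shown with a toy Python example. This is not the Momocs workflow in "morphometric_analysis_leaves.R": it uses two made-up variables per leaf and a closed-form two-variable PCA, and all names and values are hypothetical. The first leaf has identical raw measurements in both runs, yet its PC1 score changes when one leaf is dropped from the analysis.

```python
import math


def pc1_scores(points):
    """Return each point's score on the first principal component.

    `points` is a list of (x, y) pairs. For two variables, the direction of
    the first principal axis has a closed form based on the covariance terms.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample variances and covariance of the centered data.
    sxx = sum(dx * dx for dx, _ in centered) / (n - 1)
    syy = sum(dy * dy for _, dy in centered) / (n - 1)
    sxy = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Angle of the axis with the largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis = (math.cos(theta), math.sin(theta))

    # Project each centered point onto that axis.
    return [dx * axis[0] + dy * axis[1] for dx, dy in centered]


# Hypothetical trait values for five leaves; the subset drops the last one.
all_leaves = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (10.0, 1.0)]
subset = all_leaves[:4]

score_in_full = pc1_scores(all_leaves)[0]
score_in_subset = pc1_scores(subset)[0]

print("raw values:", all_leaves[0], "(same in both sets)")
print("PC1 score, all leaves:", round(score_in_full, 3))
print("PC1 score, subset:", round(score_in_subset, 3))
```

Both the mean used for centering and the orientation of the axis are estimated from the leaves in the analysis, which is why principal component values should only be compared within a single data file.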
morphometric_analysis_leaves.R
R script used to generate leaf trait data (leaf_trait_data_files.tar.gz) from black and white leaf silhouettes (Leaf_scans-Processed.tar.gz). Primarily uses the R package Momocs.