
King Rail and Common Moorhen egg pattern matching data

Cite this dataset

McRae, Susan B.; Johnson, Emily W. (2021). King Rail and Common Moorhen egg pattern matching data [Dataset]. Dryad.


This dataset combines egg images, processed output from pattern matching of eggshell surfaces in those images using NaturePatternMatch (NPM), additional data extracted from NPM, and field data. The data are separated into four folders based on species and analyses performed. There are two folders of scaled single-egg images, one for King Rails, Rallus elegans, and one for Common Moorhens, Gallinula chloropus chloropus. Each photograph is identified by year, clutch, and egg identity. Most of the Common Moorhen eggs are numbered in the order of their laying sequence, and they are further identified by laying hen. The NMDS folder contains the data used to perform non-metric multidimensional scaling (NMDS) and permutational multivariate analysis of variance (PERMANOVA): consolidated and organized NPM matching output for each species, and files with the clutch names needed to create merged datasets for graphing. The Linear Discriminant Analyses folder includes field data (egg length, width, and identity), pattern data extracted from NPM, and the estimated proportion of pigment measured within a scaled oval on binary images of eggs, which was used as a proxy for the relative amount of pigmentation on each egg. These data were used to conduct linear discriminant analyses for each species.

Usage notes

1. Scaled individual egg images can be found in two folders: KingRailScaledIndividualEggPhotos and CommonMoorhenScaledIndividualEggPhotos
A. King Rail egg files are labeled in the following format: YYC##E## where YY is the last two digits of the year in which an egg was found, C signifies clutch, ## is the number of the clutch, E signifies egg, and ## signifies egg identification number within a clutch. For example, a file labeled 19C21E2 would indicate that the year of laying was 2019, the assigned clutch ID was 21 and the egg identification number was 2. In most cases, the laying sequence of the eggs was not known.
B. Common Moorhen egg files are labeled in the following format: YYC##H##E## where YY is the last two digits of the year in which an egg was found, C signifies clutch, ## is the number of the clutch, H signifies hen, ## is the hen's identification number, E signifies egg, and ## is the egg identification number. Because laying order was known at many nests, the egg identification number often also indicates the laying sequence. For example, a file labeled 92C13H3E4 would indicate that the year of laying was 1992, the assigned clutch ID was 13, the egg was laid by hen 3, and the egg sequence number in the clutch was 4.
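The two labeling conventions above can be decoded programmatically. The following Python sketch is illustrative only and is not part of the dataset: the function name parse_egg_label, the handling of a file extension, and the rule for expanding two-digit years (values of 50 or more are read as 19YY, otherwise 20YY) are assumptions.

```python
import re
from pathlib import Path

# Patterns follow the labeling conventions described in the usage notes.
KING_RAIL = re.compile(r"(?P<yy>\d{2})C(?P<clutch>\d+)E(?P<egg>\d+)")
MOORHEN = re.compile(r"(?P<yy>\d{2})C(?P<clutch>\d+)H(?P<hen>\d+)E(?P<egg>\d+)")


def parse_egg_label(name, pivot=50):
    """Split an egg image label into year, clutch, hen (moorhens only), and egg.

    `pivot` is an assumption, not a dataset rule: two-digit years >= pivot
    are read as 19YY, all others as 20YY.
    """
    stem = Path(name).stem  # drop a file extension if one is present
    for species, pattern in (("Common Moorhen", MOORHEN), ("King Rail", KING_RAIL)):
        match = pattern.fullmatch(stem)
        if match:
            fields = {key: int(value) for key, value in match.groupdict().items()}
            yy = fields.pop("yy")
            fields["year"] = (1900 if yy >= pivot else 2000) + yy
            fields["species"] = species
            return fields
    raise ValueError(f"unrecognized egg label: {name!r}")


# The two examples given in the usage notes:
print(parse_egg_label("19C21E2"))
print(parse_egg_label("92C13H3E4"))
```

The moorhen pattern is tried first because it is the more specific of the two; a King Rail label contains no H field and therefore cannot match it.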

2. NMDS files: Two file types are used for the NMDS and PERMANOVA analyses. The first set of files contains the pairwise NPM output. The second set of files consists of lists of clutches needed to create merged datasets for graphing in R. Numbers in a file name indicate a year, such as 2014. Files are labeled “by clutch” or “indivfolders” (individual folders) to indicate how the data were run through NPM. “all”, “NS”, or “SS” in a file name indicates, respectively, all data for a species from all years grouped in one file, all data from the North side of the refuge, or all data from the South side of the refuge. The date is added to the end of every file name in day-month-year format. The columns “Var1” and “Var2” correspond to the query egg and the best match, respectively. “Value” is the matching coefficient generated by NPM for the quality of the match. Clutches in the files are named as follows: YYC#, where YY is the last two digits of the year in which the nest was found, C signifies clutch, and # is the number assigned to the clutch when found. Moorhen nests follow a slightly different convention: YY-###-H, where YY is the last two digits of the year in which the nest was found, ### is the number assigned to the clutch when found, and H is the number assigned to the hen that laid the clutch.
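As an illustration of how the pairwise output might be reshaped before analysis, the Python sketch below pivots long-format rows (Var1, Var2, Value) into a square lookup table. It assumes the files are comma-separated with a header row, which should be checked against the actual files; the function name pairs_to_matrix and the sample rows are made up for illustration and do not come from the dataset. The values are left exactly as NPM reported them, with no conversion to distances.

```python
import csv
import io


def pairs_to_matrix(rows):
    """Pivot long-format (Var1, Var2, Value) rows into a square table.

    Returns the sorted labels and a list-of-lists matrix in which
    matrix[i][j] holds the Value for query labels[i] matched against
    labels[j]. Pairs absent from the input stay None.
    """
    rows = list(rows)
    labels = sorted({r["Var1"] for r in rows} | {r["Var2"] for r in rows})
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[None] * len(labels) for _ in labels]
    for r in rows:
        matrix[index[r["Var1"]]][index[r["Var2"]]] = float(r["Value"])
    return labels, matrix


# Made-up rows for illustration only; not values from the dataset.
sample = io.StringIO(
    "Var1,Var2,Value\n"
    "19C1,19C2,0.25\n"
    "19C2,19C1,0.25\n"
    "19C1,19C3,0.5\n"
)
labels, matrix = pairs_to_matrix(csv.DictReader(sample))
print(labels)
print(matrix[0])
```

Leaving missing pairs as None, rather than filling in a default, makes it easy to see whether a file contains every query and match combination before the table is passed to an ordination routine.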

3. LDA files: Files for linear discriminant analyses are named following this convention: shortened species name_lda_DDMMYYYY, where the shortened species name is KIRA for King Rail or MOOR for Common Moorhen, and DDMMYYYY is the date in day-month-year format. Columns within the files: Clutch_wHen is a clutch ID consisting of a two-digit code for the year, then a two- or three-digit code for the clutch, and finally a one- to three-digit code for the hen. Egg_Num refers to the individual egg sequence within a given clutch. Number black refers to the number of black pixels counted. % Black is the percentage of the measured area of the eggshell that had black pixels. Mass is the fresh weight of the egg in grams. Length and width are the measurements of each egg in mm. Volume is the volume of each egg in mL. Diameter is the calculated diameter of each egg. Number_Features refers to the number of features NPM detected on each eggshell’s surface. Scale_largest_feature refers to the size, in arbitrary units, of the largest feature detected on the eggshell’s surface by NPM. Dominant_orientation refers to the main direction in which the largest feature detected was oriented on the eggshell, as measured by NPM.
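The LDA file-naming convention can be decoded in a similar way. This Python sketch is illustrative only: the function name parse_lda_filename, the .csv extension, and the example date are assumptions and are not taken from the dataset.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Species codes as given in the usage notes.
SPECIES = {"KIRA": "King Rail", "MOOR": "Common Moorhen"}
LDA_NAME = re.compile(r"(?P<code>KIRA|MOOR)_lda_(?P<date>\d{8})")


def parse_lda_filename(name):
    """Return (species, date) decoded from a name like KIRA_lda_DDMMYYYY."""
    match = LDA_NAME.fullmatch(Path(name).stem)  # ignore any file extension
    if match is None:
        raise ValueError(f"unrecognized LDA file name: {name!r}")
    # %d%m%Y reads the eight digits as day, month, four-digit year.
    file_date = datetime.strptime(match["date"], "%d%m%Y").date()
    return SPECIES[match["code"]], file_date


# Hypothetical file name; the date and extension are made up for illustration.
print(parse_lda_filename("KIRA_lda_05032021.csv"))
```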


U.S. Fish and Wildlife Service, Piedmont South Atlantic Coast Cooperative Ecosystems Studies Unit Agreement, Award: F19AC00629

Association of Field Ornithologists, Award: E. Alexander Bergstrom Memorial Research Award