Evolutionary insights into Felidae iris color through ancestral state reconstruction

Published Feb 15, 2023; Updated May 28, 2024 on Dryad. https://doi.org/10.5061/dryad.s4mw6m9b0

Abstract

There have been very few studies with an evolutionary perspective on eye (iris) color, outside of humans and domesticated animals. Extant members of the family Felidae have a great interspecific and intraspecific diversity of eye colors, in stark contrast to their closest relatives, all of which have only brown eyes. This makes the felids a great model to investigate the evolution of eye color in natural populations. Through machine learning image analysis of publicly available photographs of all felid species, as well as a number of subspecies, five felid eye colors were identified: brown, green, yellow, gray, and blue. Using phylogenetic comparative methods, the presence or absence of these colors was reconstructed on a phylogeny. Additionally, through a new color analysis method, the specific shades of the ancestors’ eyes were quantitatively reconstructed. The ancestral felid population was predicted to have brown-eyed individuals, as well as a novel evolution of gray-eyed individuals, the latter being a key innovation that allowed the rapid diversification of eye color seen in modern felids, including numerous gains and losses of different eye colors. It was also found that the gain of yellow eyes is highly associated with, and may be necessary for, the evolution of round pupils in felids, which may influence the shades present in the eyes in turn. Along with these important insights, the methods presented in this work are widely applicable and will facilitate future research into phylogenetic reconstruction of color beyond irises.

Provenance for this README

File name: README_FelidDataset.md
Authors: Julius A. Tabin
Other contributors: Katherine A. Chiasson
Date created: 2023-02-15
Date most recently modified: 2024-05-27

Dataset Version and Release History

Current Version:
- Number: 2.5.0
- Date: 2024-05-27
- Persistent identifier: DOI: 10.5061/dryad.s4mw6m9b0
- Summary of changes: Updated to reflect an improved analysis pipeline
Embargo Provenance: n/a
- Scope of embargo: n/a
- Embargo period: n/a

Dataset Attribution and Usage

Dataset Title: Data for the article "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction"
Persistent Identifier: https://doi.org/10.5061/dryad.s4mw6m9b0
Dataset Contributors:
- Creators: Julius A. Tabin and Katherine A. Chiasson
Date of Issue: 2023-02-15
License: Use of these data from Zenodo or GitHub is covered by the following license:
- Title: GNU General Public License v3.0
- Specification: https://www.gnu.org/licenses/gpl-3.0.en.html
License: Use of these data from Dryad is covered by the following license:
- Title: CC0 1.0 Universal (CC0 1.0)
- Specification: https://creativecommons.org/publicdomain/zero/1.0/
Data Reuse
- The authors respectfully request to be contacted by researchers interested in the reuse of these data so that the possibility of collaboration can be discussed.
Suggested Citations:
- Dataset citation:
  
  Tabin J.A. and K.A. Chiasson. 2024. Data for the article "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction", Dryad, Dataset, https://doi.org/10.5061/dryad.s4mw6m9b0
- Corresponding publication:
  
  Tabin J.A. and K.A. Chiasson. 2024. Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction. iScience. Submitted.

Contact Information

Name: Julius A. Tabin
Affiliations: Department of Organismic and Evolutionary Biology, Harvard University
ORCID ID: https://orcid.org/0000-0002-3591-6620
Email: jtabin1@gmail.com
Alternate Email: jtabin@g.harvard.edu
Address: e-mail preferred
Contributor ORCID IDs:
- Julius A. Tabin: https://orcid.org/0000-0002-3591-6620
- Katherine A. Chiasson: https://orcid.org/0000-0002-9729-1718

Additional Dataset Metadata

Acknowledgements

Funding sources: This work was supported in part by a graduate stipend from the Department of Organismic and Evolutionary Biology at Harvard University.
Formatting for this README file is based on the README file of LaPergola, J.B., C. Riehl, J.E. Martínez-Gómez, B. Roldán-Clarà, and R.L. Curry. 2022. Data for the article "Extra-pair paternity correlates with genetic diversity, but not breeding density, in a Neotropical passerine, the Black Catbird", Dryad, Dataset, https://doi.org/10.5061/dryad.2bvq83btg

Methodological Information

Methods of data collection/generation: see manuscript for details

Data and File Overview

Summary Metrics

File count: 33
Range of individual file sizes: 1.0 KB - 42 MB
File formats: .csv, .Rmd, .ipynb, .pdf, .xlsx, .NEX

Tabin_Chiasson_Supplemental_Results.pdf
Tabin_Chiasson_Supplemental_Figures.pdf
Tabin_Chiasson_Supplemental_Table_1.xlsx
Tabin_Chiasson_Supplemental_Table_2.xlsx
Tabin_Chiasson_Supplemental_Table_3.xlsx
Data Collection Script.ipynb
Color Polymorphism Assessment.ipynb
Felid LMM.Rmd
Color Presence Reconstruction.Rmd
Quantitative Color Reconstruction.Rmd
Output Specific Colors.ipynb
Find Correlations.Rmd
Carnivore_phylo_Nyakatura2012.NEX
enviro_data.csv
enviro_data_onlytree.csv
poly_data_main_subset.csv
poly_data_subset.csv
general_data_reordered_withsub.csv
Tip_col_data.csv
Node_col_data.csv
general_data_brown_only.csv
general_data_green_only.csv
general_data_yellow_only.csv
general_data_gray_only.csv
general_data_blue_only.csv
general_data_reordered.csv
col_data.csv
col_data_nosub.csv
col_data_onlytree.csv
dom_col_data.csv
dom_col_data_nosub.csv
dom_col_data_onlytree.csv
Felidae_all_pixel_colors.csv

Setup

Unpacking instructions: n/a
Recommended software/tools: Python version 3.8.8; RStudio 2021.05.24; R version 4.2.1
Raw data files used for this analysis can be found at https://github.com/jtabin/Felid-Eyes

Notes

Cells left empty in any data file indicate that no data exist for that taxon. The analysis programs were designed with these gaps in mind; the gaps are intentional and do not represent missing data.
In some files below, columns are grouped by each of the five eye colors identified in the study: brown, green (which includes hazel and is therefore labeled "hazgre" in some files), yellow (which includes beige and is labeled "yelbei" in some files), gray, and blue. In the following data descriptions, x stands in for any color name.

File Details

Details for: Tabin_Chiasson_Supplemental_Results.pdf

Description: a .pdf file containing the supplemental methods and results for "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction"
Format(s): .pdf

Details for: Tabin_Chiasson_Supplemental_Figures.pdf

Description: a .pdf file containing the supplemental figures and figure captions for "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction"
Format(s): .pdf

Details for: Tabin_Chiasson_Supplemental_Table_1.xlsx

Description: an Excel sheet containing Supplemental Table 1 for "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction". The table has been split up for clarity, each part corresponding to p-values for one part of the paper's polymorphism analysis: Satterthwaite's t-test for the linear mixed models, the post-hoc Tukey test for the linear mixed models, the Kruskal-Wallis test, and the two-sided Mann-Whitney-Wilcoxon test.
Format(s): .xlsx
Variables:
- Test: The statistical test used for the following section of the table
- Comparison: The variable being compared using the statistical test
- PCx Bonferroni Adjusted P-value: The Bonferroni adjusted p-value for each principal component
- Species: The species being compared (this column is only present/necessary for the Mann-Whitney-Wilcoxon test)

Details for: Tabin_Chiasson_Supplemental_Table_2.xlsx

Description: an Excel sheet containing Supplemental Table 2 for "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction"
Format(s): .xlsx
Variables:
- Scientific Name: Taxon name
- Common Name: The common name of each taxon
- Number of Images: The number of images used for data collection

Details for: Tabin_Chiasson_Supplemental_Table_3.xlsx

Description: an Excel sheet containing Supplemental Table 3 for "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction". The table has been split up for clarity. Each part gives the AIC values for one part of the paper's analysis: the main phylogeny, the main phylogeny with only the most common colors considered, and the full phylogeny including all the subspecies. These values are repeated for each of the five eye colors (brown, green, yellow, gray, and blue), indicated by headers to the left of each section.
Format(s): .xlsx
Variables:
- Model: the phylogenetic model of trait evolution
- logLik: the log likelihood of that model
- df: the degrees of freedom
- AIC: the calculated AIC value
- Weight: the Akaike weight of the model (its share of the total relative likelihood across the candidate models)

Details for: Data Collection Script.ipynb

Description: a Jupyter Notebook file containing code that takes a folder of iris images for a species as input and outputs which colors are present and their various shades. Some of these outputs must be determined manually, following the methods outlined in "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction".
Format(s): .ipynb

Details for: Color Polymorphism Assessment.ipynb

Description: a Jupyter Notebook file containing code that takes "Felidae_all_pixel_colors.csv" (a list of all the images in the data set and the colors of their pixels) and assesses the validity of a polymorphic interpretation of iris color. The output corresponds to Figures S1 and S2 in "Evolutionary Insights Into Felidae Iris Color Through Ancestral State Reconstruction".
Format(s): .ipynb

Details for: Felid LMM.Rmd

Description: an R Markdown file containing code that takes "Felidae_all_pixel_colors.csv" (a list of all the images in the data set and the colors of their pixels), fits a linear mixed model, and performs statistical tests of the validity of a polymorphic interpretation of iris color. The output corresponds to the data in Tabin_Chiasson_Supplemental_Table_1.xlsx.
Format(s): .Rmd

Details for: Color Presence Reconstruction.Rmd

Description: an R Markdown file containing code for reconstructing which eye colors were present in the population of each ancestor. It takes the output of "Data Collection Script.ipynb" as its input and outputs the reconstructions at each phylogenetic node, along with figures.
Format(s): .Rmd

Details for: Quantitative Color Reconstruction.Rmd

Description: an R Markdown file containing code for performing the reconstruction for the more specific colors (i.e. not just whether or not a color is present, but what its shades were quantitatively). This also takes the output of "Data Collection Script.ipynb" as its input and outputs the reconstructions at each phylogenetic node, along with figures.
Format(s): .Rmd

Details for: Output Specific Colors.ipynb

Description: a Jupyter Notebook file containing code for transforming the RGB .csv output of "Data Collection Script.ipynb" and "Quantitative Color Reconstruction.Rmd" into colorful images for figure creation. It also takes "shade_bg.png", provided in the https://github.com/jtabin/Felid-Eyes repository, as a background input.
Format(s): .ipynb

Details for: Find Correlations.Rmd

Description: an R Markdown file containing code for taking the output of "Data Collection Script.ipynb" and performing phylogenetic and tetrachoric correlations, resulting in raw data, as well as figures.
Format(s): .Rmd

Details for: Carnivore_phylo_Nyakatura2012.NEX

Description: a NEXUS file containing the species-level supertree for Carnivora used in this paper. It was taken from Nyakatura, K, Bininda-Emonds ORP. 2012. Updating the evolutionary history of Carnivora (Mammalia): a new species-level supertree complete with divergence time estimates. BMC Biology 10: 1-31.
Format(s): .NEX

Details for: Tip_col_data.csv

Description: a comma-delimited file containing the eye color information for the tips of the phylogenetic tree loaded in "Color Presence Reconstruction.Rmd" and "Quantitative Color Reconstruction.Rmd". It serves as the input to those scripts.
Format(s): .csv
Variables:
- Rename: Taxon name
- x_col_num: How many distinct shades of a certain color appear for that species/node (determined by "Data Collection Script.ipynb")
- x_order: The order of shades in the eyes (i.e. which shades are more abundant). This is 1-4 letters (d, m, and l), corresponding to dark, medium, and light shades. Thus, if a cell contained mdl, then the order of shade abundance in the eye of that row's taxon would be medium > dark > light.
- x_pri and x_sec: The primary and secondary shades in the eye individually. For the mdl example, x_pri would contain m and x_sec would contain d.
- x_light_R, x_light_G, and x_light_B: The red, green, and blue RGB values, respectively, for the light shade, provided it exists.
- x_med_R, x_med_G, and x_med_B: The red, green, and blue RGB values, respectively, for the medium shade, provided it exists.
- x_dark_R, x_dark_G, and x_dark_B: The red, green, and blue RGB values, respectively, for the dark shade, provided it exists.
- x_excl_R, x_excl_G, and x_excl_B: The red, green, and blue RGB values, respectively, for a rare fourth color, if it exists for some species, that was excluded in the comparative analyses.
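A minimal Python sketch of how an x_order cell maps onto the ranked shade columns: x_pri and x_sec here, plus the x_ter and x_qua columns used in the general_data files. The helper name parse_order is hypothetical and is not part of the released scripts. It assumes that an empty cell means the color is absent for that taxon, and it treats a pandas NaN the same way.

```python
# Hypothetical helper: decode an x_order cell (e.g. "mdl") into ranked shades.
SHADE_NAMES = {"d": "dark", "m": "medium", "l": "light"}
RANKS = ["pri", "sec", "ter", "qua"]  # primary, secondary, tertiary, quaternary


def parse_order(order):
    """Map each rank (pri, sec, ...) to its shade letter, most abundant first."""
    if not isinstance(order, str) or not order:  # empty cell or NaN: color absent
        return {}
    if len(order) > len(RANKS):
        raise ValueError(f"expected at most {len(RANKS)} letters, got {order!r}")
    return dict(zip(RANKS, order))


print(parse_order("mdl"))  # {'pri': 'm', 'sec': 'd', 'ter': 'l'}
print(" > ".join(SHADE_NAMES[s] for s in "mdl"))  # medium > dark > light
```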

Details for: Node_col_data.csv

Description: a comma-delimited file containing the output of the phylogenetic reconstructions done by the programs.
Format(s): .csv
Variables:
- Node: The node ID (ordered as the R package ape orders the nodes, with 1 being the common ancestor of the whole tree)
- Other columns are identical to those for Tip_col_data.csv above.

Details for: general_data_brown_only.csv, general_data_green_only.csv, general_data_yellow_only.csv, general_data_gray_only.csv, general_data_blue_only.csv, general_data_reordered.csv, and general_data_reordered_withsub.csv

Description: comma-delimited files containing subsets of the "Tip_col_data.csv" file, formatted for comparisons using "Quantitative Color Reconstruction.Rmd".
Format(s): .csv
Variables:
- x_ter and x_qua: The tertiary and quaternary shades in the eye individually. The _R, _G, _B suffixes indicate the red, green, and blue RGB values, respectively.
- Other columns are identical to those for Tip_col_data.csv above.

Details for: col_data.csv, dom_col_data.csv, col_data_nosub.csv, dom_col_data_nosub.csv, col_data_onlytree.csv, and dom_col_data_onlytree.csv

Description: comma-delimited files containing just the presence or absence of each overall eye color for each felid taxon considered in the study. Any file beginning with "dom_" just contains the most common eye colors, determined using our experimental methods. Any file ending with "_nosub" only has species (with subspecies removed) and any file ending with "_onlytree" only has species that appear on the Nyakatura and Bininda-Emonds (2012) Carnivora tree.
Format(s): .csv
Variables:
- Rename: Taxon name
- x_pres: Whether eyes of that color occur in the population of the taxon in that row (1 for yes, 0 for no).

Details for: poly_data_subset.csv and poly_data_main_subset.csv

Description: comma-delimited files containing the presence or absence of each overall eye color for each felid taxon considered in the study, formatted as a polymorphic trait. "poly_data_main_subset.csv" is the same but contains only the most common eye colors, determined using our experimental methods.
Format(s): .csv
Variables:
- Rename: Taxon name
- color: Which eye colors are present for that taxon, separated by + signs.
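The relationship between the 0/1 x_pres columns and this "+"-separated color field can be sketched in Python as below. The function names, the color labels, and their order (brown, hazgre, yelbei, gray, blue) are assumptions borrowed from the abbreviations described in the Notes. Check the actual labels and label order in the files before relying on this sketch.

```python
# Assumed labels and order; verify against the actual poly_data_*.csv files.
COLORS = ["brown", "hazgre", "yelbei", "gray", "blue"]


def presence_to_poly(flags):
    """flags: mapping color -> 0/1 (as in the x_pres columns). Returns e.g. 'brown+gray'."""
    present = [c for c in COLORS if flags.get(c, 0) == 1]
    if not present:
        raise ValueError("at least one eye color must be present")
    return "+".join(present)


def poly_to_presence(field):
    """Inverse conversion: 'brown+gray' -> {'brown': 1, 'hazgre': 0, ...}."""
    present = set(field.split("+"))
    unknown = present - set(COLORS)
    if unknown:
        raise ValueError(f"unrecognized colors: {sorted(unknown)}")
    return {c: int(c in present) for c in COLORS}


print(presence_to_poly({"brown": 1, "gray": 1, "blue": 0}))  # brown+gray
```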

Details for: enviro_data.csv and enviro_data_onlytree.csv

Description: comma-delimited files containing the environmental/morphological data collected and converted into parameters using our methods and supplemental methods. "enviro_data_onlytree.csv" only has species that appear on the Nyakatura and Bininda-Emonds (2012) Carnivora tree.
Format(s): .csv
Variables:
- Pupil_type: The pupil information for each species looked at in the study (i.e. whether they have round, vertical, or subcircular pupils).
- Pupil_type_bin: The pupil information as numbers: 0 = vertical, 1 = subcircular, and 2 = round.
- Pupil_type_revised: The pupil information for each species looked at in the study with subcircular pupils considered vertical (i.e. whether they have round or vertical pupils).
- Pupil_type_revised_bin: The pupil information, with subcircular pupils considered vertical, as numbers: 0 = vertical, 1 = round.
- Activity_type: The animal's observed activity habits (diurnal, nocturnal, and/or crepuscular) from the University of Michigan Animal Diversity Web.
- Nocturnal, Crepuscular, Diurnal: Each corresponds to one activity mode with a 1 if that activity mode is present and a 0 if it is absent.
- Nocturnal_prop: A metric for how nocturnal the animal is with a 3 for fully nocturnal, 2 if there is one other activity mode, 1 if there are two others, and 0 if the animal isn't nocturnal.
- Region: Data on the zoogeographical region that each species is mainly found in (ethiopian, oriental, palearctic, nearctic, or neotropical). The regions and names are from the paper Johnson WE., Eizirik E, Pecon-Slattery J, Murphy WJ, Antunes A, Teeling E, O'Brien SJ. 2006. The late Miocene radiation of modern Felidae: a genetic assessment. Science 311(5757):73-77.
- Ethiopian, Oriental, Palearctic, Nearctic, Neotropical: Each corresponds to one zoogeographical region with a 1 if the animal is present in the area and a 0 if it is absent.
- Habitat: The animal's main habitat(s), determined by the University of Michigan Animal Diversity Web. Non-mutually exclusive possible options are desert, forest, savanna, mountains, rainforest, swamp, marsh, tundra, and taiga.
- Desert, Savanna, Forest, Rainforest, Forest_Rainforest, Mountains: Each corresponds to one habitat with a 1 if the animal is present in the habitat and a 0 if it is absent. Forest_Rainforest is 1 if the animal occupies forest, rainforest, or both.
- Habitat_num: The number of different habitats occupied by the animals.
- Low_elevation_m, High_elevation_m: The lowest and highest elevation each taxon has been observed in (in meters). IMPORTANT: THIS DATA IS INCOMPLETE!
- Length_low_cm, Length_high_cm, Length_avg_cm: Low, high, and average body length in cm. IMPORTANT: THIS DATA IS INCOMPLETE!
- Skull_Length_mm: The skull length in mm. IMPORTANT: THIS DATA IS INCOMPLETE!
- Mating: The mating system (promiscuous, polygynous, and/or monogamous).
- Coat_pattern: Data on the coat pattern of each taxon: flecks, uniform, stripes, sblotch (small blotches), rosettes, and/or blotches. This is based on Werdelin L, Olsson L. 1997. How the leopard got its spots: a phylogenetic view of the evolution of felid coat patterns. Biol. J. Linn. Soc. 62(3):383-400.
- Flecks, Uniform, Stripes, SBlotch, Rosettes, Blotches: Each corresponds to one coat pattern with a 1 if the animal has that pattern and a 0 if it doesn't.
- Black_body_morethaneye: This has a 1 if the animal has black fur on its body covering a greater area than its eye, and a 0 otherwise.
- Black_tail_morethaneye: This has a 1 if the animal has black fur on its tail covering a greater area than its eye, and a 0 otherwise.
- Nose_color: Whether the animal has a black or pink nose.
- Nose_black, Nose_pink: Each corresponds to one nose color with a 1 if the animal has that color and a 0 if it doesn't.
- Hybridization: A list of the species that each animal has been seen to hybridize with in the modern day.
- Ancient Hybridization: A list of the species that each animal is hypothesized to have hybridized with historically.
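Two of the derived columns above follow simple rules from other columns. They are sketched here in Python as a consistency check. The function names are hypothetical. The logic only restates the definitions given for Nocturnal_prop and for Pupil_type_revised_bin.

```python
def nocturnal_prop(nocturnal, crepuscular, diurnal):
    """Nocturnal_prop from the 0/1 activity columns: 3 if only nocturnal,
    2 with one other activity mode, 1 with two others, 0 if not nocturnal."""
    if not nocturnal:
        return 0
    return 3 - (crepuscular + diurnal)


def revise_pupil_bin(pupil_type_bin):
    """Pupil_type_bin (0 vertical, 1 subcircular, 2 round) ->
    Pupil_type_revised_bin (0 vertical, 1 round), counting subcircular as vertical."""
    return {0: 0, 1: 0, 2: 1}[pupil_type_bin]


print(nocturnal_prop(1, 1, 0))  # 2: nocturnal plus one other activity mode
```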

Details for: Felidae_all_pixel_colors.csv

Description: a comma-delimited file containing the RGB values for every pixel of every image in the data set, with each row corresponding to one pixel. This is loaded into "Color Polymorphism Assessment.ipynb" and "Felid LMM.Rmd".
Format(s): .csv
Variables:
- R: The red RGB value (from 0-255)
- G: The green RGB value (from 0-255)
- B: The blue RGB value (from 0-255)
- Color: The identified color for the overall eye the pixel comes from (determined by "Data Collection Script.ipynb")
- Species: The species for the overall eye the pixel comes from
- Folder: The file path, including the image filename for the overall eye the pixel comes from

END OF README

Data set

In order to sample all felid species, we took advantage of public databases. Images of individuals from 40 extant felid species were found on Google Images and iNaturalist by searching for both the scientific name and the common name of each species. This covers all felid species except Felis catus, which was excluded because humans have artificially selected eye color in domesticated cats. The same searches also covered 12 identifiable subspecies and four outgroups (banded linsang, Prionodon linsang; spotted hyena, Crocuta crocuta; common genet, Genetta genetta; and fennec fox, Vulpes zerda). Drawing on this enormous resource of publicly available images provides a much larger data set than exists in the published scientific literature or than could have been collected de novo for this study. Public image-based methods for character state classification have been used previously, for example in a phylogenetic analysis of felid coat patterns (Werdelin and Olsson 1997) and in a catalog of iris color variation in the white-browed scrubwren (Cake 2019). However, this approach does require strong criteria for selecting images.

Images were selected only if the animal was facing the camera, at least one eye was unobstructed, the animal was a non-senescent adult, and the eye was neither in direct light (causing glare) nor completely in shadow (causing unwanted darkening). The taxonomic identity of the animal in each selected image was verified against images in the literature, as well as the “research grade” section of iNaturalist. When possible, we collected five images per taxon, although some rarer taxa had fewer than five acceptable images available. In addition, some species with many eye colors needed more than five images to capture their variation, as determined by the quantitative methods discussed below. Each of the 56 taxa and the number of images used are given in Supplementary Table 2.

Once the images were selected, they were manually edited using macOS Preview. For each image, this involved choosing the “better” of the two eyes (i.e., the one that is most visible and has the least glare and shadow). Then, the section of that eye's iris free of obstructions such as glare, shadow, or fur was cropped out. An example is given in Figure S11. The strict selection criteria and image editing eliminated the need to color correct the images, a process that can introduce additional subjectivity. The consistency of the data can be seen in the lack of variation between eyes identified as the same color (Figure S5). This process resulted in a data set of 290 cropped, standardized irises. These images, along with the original photos, can be found in the Supplementary Material.

Eye color identification

To impartially identify the eye color(s) present in each felid population, the data set images were loaded by species into Python (version 3.8.8) using the Python Imaging Library (PIL) (Van Rossum and Drake 2009; Clark 2015). For each image, the red, green, and blue (RGB) values of every pixel were extracted. These values were then averaged, and the hex color code for the average R, G, and B values was printed. The color associated with this code was identified using curated and open-source color identification programs (Aerne 2022; Cooper 2022). There is no universally agreed-upon list of colors, since naming conventions differ between individuals and cultures. However, these programs offer a workable solution, comprising tens of thousands of color names derived from published, corporate, and governmental sources. These data allowed the color of each eye in the data set to be assigned impartially, removing much of the bias inherent in a researcher subjectively deciding the color of each iris.
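The averaging and hex-code step can be sketched in plain Python as follows. In practice the pixel tuples could come from Pillow, e.g. `list(Image.open(path).convert("RGB").getdata())`, but the sketch takes them as input so that it runs without extra packages. The function names and the round-half-up rule are assumptions, since the text does not say how fractional averages were rounded. Looking up the color name itself (Aerne 2022; Cooper 2022) is not reproduced here.

```python
def mean_rgb(pixels):
    """Average (R, G, B) tuples channel by channel, rounding half up (assumed rule)."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels to average")
    n = len(pixels)
    return tuple(int(sum(p[i] for p in pixels) / n + 0.5) for i in range(3))


def to_hex(rgb):
    """Format an (R, G, B) triple as an uppercase hex color code."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


pixels = [(100, 150, 200), (128, 170, 186)]  # toy iris with two pixels
print(to_hex(mean_rgb(pixels)))  # #72A0C1
```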

Eye colors were assigned on this basis to one of five fundamental color groups: brown, green (including hazel), yellow (including beige), gray, and blue. The possible color groups were determined before the data were examined, based on the basic color categories established in the literature: white, black, red, green, yellow, blue, brown, purple, pink, orange, and gray (Berlin and Kay 1991). Not all eleven categories were represented in the data: no irises were observed to be white, black, red, purple, pink, or orange.

As an example of this method, an iris with the RGB values R: 114, G: 160, B: 193 corresponds to the hex code #72A0C1. When this hex code is entered into the color identification programs, the result is “Air Superiority Blue”, a name derived from the British Royal Air Force’s official flag specifications (Cooper 2022; Aerne 2022). Based on this identification, the iris would be added to the “blue” color group without a researcher having to choose the color. If a color’s name did not contain one of the eleven color categories above, the name was looked up in the Inter-Society Color Council-National Bureau of Standards (ISCC–NBS) System of Color Designation (Judd and Kelly 1939). For instance, the color with RGB values R: 37, G: 29, B: 14 corresponds to hex code #251D0E, which the color identification programs identify as “Burnt Coffee”. The ISCC–NBS descriptor for this color is “moderate brown”, so the color would be added to the “brown” group. Every color could be placed directly from its name or its ISCC–NBS descriptor. For colors with both a color category in the name and an ISCC–NBS descriptor, the two never conflicted.
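The two-step placement rule (name first, ISCC–NBS descriptor as a fallback) can be sketched in Python. Several details are assumptions of this sketch, not the study's code: the whole-word matching, the synonym table (hazel to green, beige to yellow, grey to gray), and taking the last basic term in a name as its head noun (so "grayish blue" is blue). The text reports that name and descriptor never conflicted, so this tie-breaking rule should rarely matter.

```python
import re

# The eleven basic color terms (Berlin and Kay 1991).
BASIC_TERMS = {"white", "black", "red", "green", "yellow", "blue",
               "brown", "purple", "pink", "orange", "gray"}
# Assumed synonyms folded into the study's groups.
SYNONYMS = {"grey": "gray", "hazel": "green", "beige": "yellow"}


def _last_basic_term(text):
    """Return the last whole-word basic color term in text, or None."""
    words = [SYNONYMS.get(w, w) for w in re.findall(r"[a-z]+", text.lower())]
    terms = [w for w in words if w in BASIC_TERMS]
    return terms[-1] if terms else None


def assign_group(color_name, iscc_nbs_descriptor=None):
    """Place a named color into a basic group: name first, then ISCC-NBS descriptor."""
    group = _last_basic_term(color_name)
    if group is None and iscc_nbs_descriptor is not None:
        group = _last_basic_term(iscc_nbs_descriptor)
    if group is None:
        raise LookupError(f"cannot place {color_name!r}; look up its ISCC-NBS descriptor")
    return group


print(assign_group("Air Superiority Blue"))            # blue
print(assign_group("Burnt Coffee", "moderate brown"))  # brown
```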

While color itself lies on a spectrum, splitting the colors into discrete fundamental groups is the most tractable way to analyze eye color in a biologically reasonable manner. If every eye color were instead placed on one spectrum and analyzed as a continuous trait, the results would be highly unrealistic. For example, consider two sister taxa, one with blue eyes (R: 0, G: 0, B: 139) and one with brown eyes (R: 150, G: 75, B: 0). A continuous reconstruction would assign their ancestor the intermediate color in color space: R: 75, G: 37, B: 69. However, this color is firmly within the “purple” category. It is highly unlikely that a recent ancestor of two taxa with blue and brown eyes had purple eyes, rather than blue eyes, brown eyes, or both, which would be the result if blue and brown were treated as separate categories. The same issue would arise if categories were dropped at an earlier stage and each taxon were considered to have only one eye color, determined by averaging all of its irises. A taxon with blue and brown eyes would again be said to have purple eyes, a color that none of its members have. Separating the data into color groups is therefore the most realistic way to investigate this trait, preserving the variation present in natural populations while avoiding biologically impossible reconstructions. The boundaries between color categories are not always clear to an observer (e.g., grayish blues and bluish grays can look alike), and however they are defined, they may still be arbitrary. This is why we used color identification programs, which draw the boundaries impartially and make the analysis possible.
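The arithmetic behind this example is shown below in Python. The first part averages the two tip colors channel by channel; integer truncation reproduces the values in the text. The second part lists the discrete states a categorical model could assign to the ancestor. This only illustrates the argument and is not a reconstruction method. An actual analysis assigns likelihoods to such states on the phylogeny.

```python
from itertools import combinations

blue = (0, 0, 139)
brown = (150, 75, 0)

# Continuous treatment: the ancestor is placed halfway between the two tips
# (integer division truncates the .5 values, matching the figures in the text).
midpoint = tuple((a + b) // 2 for a, b in zip(blue, brown))
print(midpoint)                                  # (75, 37, 69)
print("#{:02X}{:02X}{:02X}".format(*midpoint))  # #4B2545

# Discrete treatment: candidate ancestral states are combinations of the
# observed tip states (blue, brown, or both), never a new intermediate color.
tip_states = ["blue", "brown"]
candidates = [c for r in (1, 2) for c in combinations(tip_states, r)]
print(candidates)  # [('blue',), ('brown',), ('blue', 'brown')]
```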

To ensure that no colors were missed because of small sample sizes, the first 500 Google Images results, as well as all the “research grade” images on iNaturalist, were viewed manually for each species. This review was carried out with reference to the already analyzed data and with periodic checks using the color identification programs (Aerne 2022; Cooper 2022). Any missed colors were added to the data set. This method still has a small but non-zero chance of missing rare eye colors that are present in a species. Overall, however, it provides a robust and repeatable way to identify the general iris colors present in animals.

In addition, one, two, or three eye colors could greatly predominate for a given species in the available online data (i.e., the first 500 Google Images results and all the “research grade” images on iNaturalist). When they did, they were defined as that species' most common eye color(s). For three colors to be considered the most common, each had to be present in >26.6% of the images. For two colors, each had to be present in >40% of the images. If neither condition was met, the eye color present in the highest percentage of the images was the single most common eye color. A cutoff of 20% was set for four colors, but no species had four colors that met that threshold. With this assessment, the phylogenetic analysis below could be carried out both with all recorded eye colors and with only the most common eye colors, ensuring that rare eye colors did not skew the results.
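The cutoff rule can be written as a short Python function. The function name is hypothetical, and the input is assumed to be the number of surveyed images showing each color. The thresholds are those stated in the text, applied with a strict "greater than". Larger sets of colors are tried first, and the code falls back to the single most frequent color when no set qualifies.

```python
# Cutoffs from the text: each of the top n colors must exceed this share of images.
THRESHOLDS = {4: 0.20, 3: 0.266, 2: 0.40}


def most_common_colors(image_counts):
    """image_counts: {color: number of surveyed images showing that color}."""
    total = sum(image_counts.values())
    if total == 0:
        raise ValueError("no images counted")
    ranked = sorted(image_counts.items(), key=lambda kv: kv[1], reverse=True)
    for n in (4, 3, 2):  # try the largest qualifying set first
        top = ranked[:n]
        if len(top) == n and all(count / total > THRESHOLDS[n] for _, count in top):
            return sorted(color for color, _ in top)
    return [ranked[0][0]]  # otherwise, the single most frequent color


print(most_common_colors({"yellow": 30, "green": 30, "gray": 30, "brown": 10}))
# ['gray', 'green', 'yellow']
print(most_common_colors({"brown": 70, "gray": 30}))  # ['brown']
```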

Color polymorphism assessment

Although placing the eyes in the data set into discrete color groups is useful for downstream analyses, we also wanted to verify that this polymorphic, discrete treatment of iris color reflects the actual variation in the trait. To do this, we performed a principal component analysis (PCA) on the R, G, and B values of every pixel of every iris in the data set, using the package scikit-learn (version 1.2.0) in Python and the built-in stats package in R (version 4.2.1) (Pedregosa et al. 2011; R Core Team 2022). The utility of PCA for color polymorphism assessment has been demonstrated before (Paterson and Blouin-Demers 2017). We averaged the pixels, irrespective of color group, in twenty equally spaced bins along each of the first three principal components (PCs), which showed which aspect of the color variation each PC captured. Then, for each PC, we fit a linear mixed model with assigned eye color as a fixed effect, using the R package lme4 (version 1.1.34). The species and individual the pixels came from were included as nested random effects (Bates et al. 2015).
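The PCA and binning step can be sketched in Python with NumPy. In place of scikit-learn, the sketch computes PCA directly from a singular value decomposition of the centered pixel matrix. The resulting scores should match scikit-learn's `PCA(n_components=3).fit_transform` up to the sign of each component. Reading "equally spaced bins" as equal-width bins between the minimum and maximum score is an assumption. The linear mixed models were fit in R and are not reproduced here.

```python
import numpy as np


def pca_scores(rgb, n_components=3):
    """PCA via SVD of the centered pixel matrix (rows = pixels, columns = R, G, B)."""
    X = np.asarray(rgb, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    axes = vt[:n_components]       # principal axes, one per row
    return Xc @ axes.T, axes       # scores, loadings


def binned_mean_colors(rgb, pc, n_bins=20):
    """Average the original RGB values of pixels in equal-width bins along one PC."""
    X = np.asarray(rgb, dtype=float)
    edges = np.linspace(pc.min(), pc.max(), n_bins + 1)
    idx = np.digitize(pc, edges[1:-1])    # bin index 0 .. n_bins - 1
    means = np.full((n_bins, 3), np.nan)  # NaN marks empty bins
    for b in range(n_bins):
        if np.any(idx == b):
            means[b] = X[idx == b].mean(axis=0)
    return means


# Usage on synthetic pixels: the mean color of each bin along PC2 gives a
# "ramp" that shows what kind of color variation PC2 captures.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(1000, 3))
scores, axes = pca_scores(pixels)
pc2_ramp = binned_mean_colors(pixels, scores[:, 1])
```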

This method allowed us to test the effect of assigned iris color along each principal component axis using Satterthwaite’s t-test with Bonferroni correction. A significant effect of color group for a given PC in the linear mixed model would indicate that the color category assigned according to the methods above meaningfully predicts a pixel’s value along that PC. Suppose, for example, that the real irises were not adequately represented by the discrete color categories proposed here (e.g., brown eyes were all brownish-gray and gray eyes all grayish-brown, so that the categories overlapped substantially). In that case, there should be no significant effect of assigned eye color for a PC that separates pixels by color (e.g., a PC that separates gray and brown). Because the photographs are not fully standardized, many individual pixels are outliers within a given eye. For instance, a brown pixel might appear in an eye categorized as “blue” because of a fleck of dust in the cat’s eye or some irregular pigmentation. Unless these outliers are numerous (in which case they would not be outliers), they should not affect this analysis.

If a significant effect of color group was found for a PC, the PC values of all categories were compared to one another using a post-hoc Tukey HSD test with Bonferroni correction from the package emmeans (version 1.8.8), in order to identify which groups in particular differ significantly along that PC (Lenth 2023). Although this analysis could determine which color groups adequately reflect the true trait distributions and are meaningful overall, it does not follow that a polymorphic view of eye color is appropriate for every species. PC2 was shown to be the axis that separates pixels by the relevant colors. To address this question, the pixels in each color group for each species were therefore compared along PC2 using a Kruskal–Wallis test with Bonferroni correction, to determine whether color group had any significant effect. If it did, a two-sided pairwise Mann-Whitney-Wilcoxon test with Bonferroni correction was used to compare each pair of groups. In this way, we were able to determine the biological appropriateness of using discrete color categories to analyze felid iris color.
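The per-species testing sequence can be sketched in Python with SciPy's `kruskal` and `mannwhitneyu`. The function name and its parameters are hypothetical. Two choices are also assumptions: the Bonferroni family size for the Kruskal–Wallis test (passed in as `kw_family_size`, e.g. the number of species tested), and correcting the pairwise tests over the number of pairs within the species.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kruskal, mannwhitneyu


def bonferroni(p, m):
    """Bonferroni-adjust a p-value for m tests, capped at 1."""
    return min(p * m, 1.0)


def compare_color_groups(pc2_by_color, kw_family_size=1, alpha=0.05):
    """pc2_by_color: {color: PC2 values of one species' pixels in that color group}."""
    groups = {c: np.asarray(v, dtype=float) for c, v in pc2_by_color.items() if len(v) > 0}
    if len(groups) < 2:
        return None  # a single color group: nothing to compare
    kw_p = bonferroni(kruskal(*groups.values()).pvalue, kw_family_size)
    result = {"kruskal_p_adj": kw_p, "pairwise_p_adj": {}}
    if kw_p < alpha:  # only follow up if color group has an overall effect
        pairs = list(combinations(sorted(groups), 2))
        for a, b in pairs:
            p = mannwhitneyu(groups[a], groups[b], alternative="two-sided").pvalue
            result["pairwise_p_adj"][(a, b)] = bonferroni(p, len(pairs))
    return result


rng = np.random.default_rng(2)
example = {"brown": rng.normal(-1.0, 0.3, 200), "gray": rng.normal(1.0, 0.3, 200)}
print(compare_color_groups(example, kw_family_size=40))
```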

Shade measurements within each color group

Although averaging the pixels within each iris was sufficient to categorize the colors present for each felid taxon, not every felid iris is homogeneously pigmented. For example, some colors in some taxa show central heterochromia, with a darker pigment near the pupil and a lighter pigment in the periphery (Figure 1b, h), or the reverse (Figure 1c). We therefore calculated corresponding “shade” values for each color group in each species. To do this, the images for each species were sorted into their color groups. For each group, the RGB values of every pixel in each image were again extracted, yielding a three-dimensional data set. This was reduced to two dimensions using Uniform Manifold Approximation and Projection (UMAP). UMAP was chosen because it preserves local structure, which matters for detecting fine shade differences (McInnes et al. 2018). The UMAP projection for each image was then clustered with k-means through scikit-learn (Pedregosa et al. 2011). The number of clusters (k), representing the number of distinct shades of color in the iris of each animal, was determined using elbow plots.
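The elbow step can be sketched as follows, assuming scikit-learn's `KMeans`, whose fitted `inertia_` attribute is the within-cluster sum of squared distances. The original elbow plots were presumably inspected visually. The automatic knee rule in `elbow_k` is a hypothetical stand-in: it picks the point farthest from the chord joining the first and last points of the normalized curve. The UMAP step (e.g. with the umap-learn package) is replaced here by toy 2-D points.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(points, k_max=6, seed=0):
    """k-means inertia (within-cluster sum of squared distances) for k = 1..k_max."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(points).inertia_
        for k in range(1, k_max + 1)
    ]


def elbow_k(inertias):
    """Hypothetical knee rule: after scaling both axes to [0, 1], return the k whose
    point lies farthest from the straight line through the first and last points."""
    ks = np.arange(1, len(inertias) + 1, dtype=float)
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = np.asarray(inertias, dtype=float)
    y = (y - y.min()) / (y.max() - y.min())
    dy = y[-1] - y[0]
    # Perpendicular distance from (x, y) to the line through (0, y[0]) and (1, y[-1]).
    dist = np.abs(dy * x - y + y[0]) / np.hypot(dy, 1.0)
    return int(ks[np.argmax(dist)])


# Toy 2-D embedding (standing in for one image's UMAP projection) with three shades.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
k = elbow_k(inertia_curve(points))
print(k)  # 3
```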

After this was done for all images in the group, the k values were averaged, and each image was re-clustered using the average k rounded to the nearest integer. This standardized the number of shades within each group, avoided confounding from lower-quality images, and allowed comparative analysis. The average RGB values of each cluster in each image were then calculated, and the clusters were matched across images based on similarity. To do this, the clusters of one image from the group were labeled in order (with three clusters, 0, 1, and 2). For each other image in the group, the distances in 3D RGB space between each of its clusters and each labeled cluster were computed. The sum of squared errors was then calculated for every possible one-to-one assignment of its clusters to the labeled clusters, and the assignment with the minimum sum was taken as optimal. The matched clusters were then merged. Repeating this for every image in the group, and for every color of every species, yielded the number of shades within the iris for each color in each species, along with an average of each shade across the data. Throughout this process, images were not resized, so that higher-quality images with more pixels contributed more to the averages. This ensured that blurring in lower-quality images did not obscure the true variety of shades in each eye.
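The cluster-matching step lends itself to a brute-force sketch. Every one-to-one assignment (permutation) of a new image's clusters to the reference clusters is scored by its summed squared RGB distance, and the cheapest assignment is kept. With only a handful of shades per eye, this is fast. Weighting the merged averages by pixel count is an assumption, chosen to mirror the note that images were not resized. The function names and RGB values are hypothetical.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_clusters(reference, clusters):
    """Reorder `clusters` so that position i pairs with reference[i].

    Each cluster is an (rgb, pixel_count) pair. Every one-to-one assignment
    is scored by its summed squared RGB distance and the minimum is kept.
    """
    best = min(
        permutations(range(len(clusters))),
        key=lambda perm: sum(
            sq_dist(reference[i][0], clusters[j][0]) for i, j in enumerate(perm)
        ),
    )
    return [clusters[j] for j in best]


def merge_clusters(reference, clusters):
    """Match `clusters` to `reference`, then pool each pair as a pixel-weighted mean."""
    merged = []
    for (rgb_a, n_a), (rgb_b, n_b) in zip(reference, match_clusters(reference, clusters)):
        n = n_a + n_b
        rgb = tuple((a * n_a + b * n_b) / n for a, b in zip(rgb_a, rgb_b))
        merged.append((rgb, n))
    return merged


# Hypothetical clusters: (average RGB, pixel count) for a labeled image and a new one.
reference = [((40, 30, 20), 100), ((120, 90, 60), 300), ((200, 170, 120), 100)]
new_image = [((190, 160, 110), 100), ((50, 40, 30), 100), ((110, 80, 50), 100)]
merged = merge_clusters(reference, new_image)
print(merged[0])  # ((45.0, 35.0, 25.0), 200)
```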

The final, combined clusters were ranked by their prevalence within the eyes, measured as the number of pixels in each cluster, and each shade was categorized as “dark”, “medium”, or “light”. Where a color of a species had three clusters, the distance from black (RGB: 0,0,0) in 3D color space was computed for each cluster’s average RGB value. The clusters were then labeled “dark”, “medium”, and “light” in order of increasing distance from black. For species with two shades of a given color, each cluster’s average RGB value was compared, again by distance, with the average “dark”, “medium”, and “light” values of three-shade eyes. Each cluster was then assigned the label it was closest to. The remaining slot was filled as follows: if “dark” or “light” was empty, the “medium” value was duplicated; if “medium” was empty, the “dark” and “light” values were averaged. This allows two-shade eyes to be compared with three-shade eyes without losing vital information. For a color with only one shade (none occurred in the data set), its average RGB values would be assigned to “dark”, “medium”, and “light”. Lastly, for eyes with four shades, the two most similar shades were combined to make them comparable with the rest of the data set. The purpose of this pipeline is to produce a data set that can be compared in a standardized way. Information about which shades are most represented was also collected and saved; these data can be found in the Supplementary Material.
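A sketch of the shade-labeling rules in Python follows. Three-shade eyes are ordered by Euclidean distance from black. Two-shade eyes are matched to the reference three-shade averages, and the empty slot is filled as described. The source does not say how two shades competing for the same label were resolved, so `label_two` assumes the pair of distinct labels with the smallest total distance. Names and values are hypothetical.

```python
import math
from itertools import permutations

BLACK = (0, 0, 0)
LABELS = ("dark", "medium", "light")


def label_three(shades):
    """Label three cluster-average RGB values by increasing distance from black."""
    ordered = sorted(shades, key=lambda rgb: math.dist(rgb, BLACK))
    return dict(zip(LABELS, ordered))


def label_two(shades, reference):
    """Label two shades using the three-shade reference averages, then fill the gap.

    Assumption: the two shades take the pair of distinct labels with the smallest
    total distance to the reference values, so they never claim the same label.
    """
    best = min(
        permutations(LABELS, 2),
        key=lambda labs: sum(math.dist(s, reference[lab]) for s, lab in zip(shades, labs)),
    )
    out = dict(zip(best, shades))
    if "medium" not in out:
        # Medium missing: average the dark and light values.
        out["medium"] = tuple((dk + lt) / 2 for dk, lt in zip(out["dark"], out["light"]))
    else:
        # Dark or light missing: duplicate the medium value.
        for lab in ("dark", "light"):
            out.setdefault(lab, out["medium"])
    return {lab: out[lab] for lab in LABELS}


# Hypothetical reference averages from three-shade eyes, and a two-shade eye.
reference = label_three([(180, 150, 90), (60, 45, 30), (120, 95, 60)])
two_shade = label_two([(65, 50, 35), (175, 145, 85)], reference)
print(two_shade["medium"])  # (120.0, 97.5, 60.0)
```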

To ensure that these results accurately assessed eye color, the RGB values for each shade within each species were tracked as increasing numbers of images from the data set were included (for examples, see Figure S5). If the RGB values leveled off as sample size increased, the sample was taken to be representative of the “true” shades. Major fluctuations would indicate that the sample size was too small to overcome variation in lighting conditions. Because the RGB values leveled off, the sample sizes for each color present in each taxon were confirmed to be sufficient.
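One way to make the leveling-off check concrete is to track the cumulative mean of a channel as images are added. The check then asks whether that mean stays within a small band over the last few images. The `window` and `tolerance` values, like the example red-channel values, are hypothetical; the source describes a visual assessment (Figure S5) rather than a numeric threshold.

```python
def running_means(values):
    """Cumulative mean of one RGB channel after each additional image."""
    means, total = [], 0.0
    for i, value in enumerate(values, start=1):
        total += value
        means.append(total / i)
    return means


def has_leveled_off(values, window=5, tolerance=2.0):
    """Hypothetical criterion: over the last `window` images, the cumulative mean
    stays within a band narrower than `tolerance` RGB units."""
    means = running_means(values)
    if len(means) < window:
        return False
    tail = means[-window:]
    return max(tail) - min(tail) < tolerance


# Hypothetical red-channel values for one shade, in the order images were added.
stable = [100, 110, 95, 105, 100, 102, 98, 101, 99, 100, 100, 101, 99, 100]
unstable = [50, 200, 60, 210, 40, 220]
print(has_leveled_off(stable), has_leveled_off(unstable))  # True False
```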

Phylogeny

The phylogeny used for this work was a subset of the Carnivora supertree from Nyakatura and Bininda-Emonds (2012). This ultrametric phylogeny synthesizes 188 literature and gene trees and includes members of all eight Felidae lineages. More recent phylogenies are largely congruent with it. They differ mainly in the placement of the Bay Cat Lineage and the Pallas’s cat (Otocolobus manul), partly because Y-chromosome evidence differs from other lines of evidence (Li et al. 2016). Alternative placements were tested and did not produce significantly different results, so these discrepancies do not affect this study.

This Carnivora supertree is missing nine of the extant felid groups for which data were collected. A second tree (termed the “full” tree) was therefore created by manually adding the missing taxa according to their placements on a Felidae-specific tree from Johnson et al. (2006) and/or the more recent tree from Li et al. (2016). Subspecies were defined according to the most recent classifications in Kitchener et al. (2017) and Liu et al. (2018). They were added as a polytomy next to the previously defined species on the tree. Because divergence data were unavailable for some of the species and subspecies, the additions were given branch lengths equal to the nearest resolved neighboring branch, which severely overestimates the divergence between groups.

Manually adding taxa to a tree without proper sequence data is a flawed method. It should not be relied upon for ancestral state predictions or broad claims, as there is no guarantee that any addition reflects true divergence. This tree was created purely to provide some insight into local areas of the tree at the species level (e.g. what was the eye color of the ancestral tiger?). Even so, these predictions must be understood as far more uncertain than analyses using the original supertree with its more limited taxon set. The full tree, with all the eye colors present for each species, is shown in Figure 2. The main tree, considering only the most common eye colors, is presented in Figure S3a-e.

General color reconstruction

To begin ancestral state reconstruction, the phylogenetic trees were read into R using the package ape (version 5.6-2) (Paradis and Schliep 2019). A table listing each taxon and the eye colors present in it was loaded and scored 0/1 for absence/presence. The same table containing only the most common eye colors was also loaded.

The optimal model of trait evolution was then determined for each of the five eye colors independently across the tree. This was done by comparing the Akaike information criterion (AIC) values of 14 different models fitted with the R package phytools (version 1.2-0) (Revell 2012). Six of the models used equal/symmetric rates (ER):

- a continuous-time Markov (Mk) model using fitMk();
- an Mk model with edge rates assumed to be sampled randomly from a Γ distribution using fitgammaMk();
- a hidden rates model with four hidden states using fitHRM();
- a hidden rates model with six hidden states using fitHRM();
- a hidden rates model with a possible hidden state when a color is present, but not when it is absent, using fitHRM();
- a hidden rates model with a possible hidden state when a color is absent, but not when it is present, using fitHRM().

Another six models were identical except that they used asymmetric rates (ARD). The final two used an Mk model but assumed, respectively, that a color cannot be lost after it is gained or that it cannot be gained after it is lost. This process was repeated for the data on all observed eye colors, for the data on the most common eye colors, and for the full phylogeny. The AIC values and weights for the ER and ARD models are given in Supplementary Table 3. The model with the lowest AIC value, indicating the best explanation of the data given the number of parameters, was used for subsequent analyses. The best models were also rerun in corHMM (version 2.8), and the results did not differ (Beaulieu et al. 2022).
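The model comparison reduces to simple arithmetic. For a model with k free parameters and maximized log-likelihood ln L, AIC = 2k − 2 ln L, and Akaike weights normalize exp(−ΔAIC/2) across the candidate models. The sketch below assumes the log-likelihoods and parameter counts have already been obtained (for example, from the fitted phytools models); the numbers are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aics):
    """Relative support for each model: exp(-dAIC / 2), normalized to sum to 1."""
    best = min(aics.values())
    rel = {name: math.exp(-(value - best) / 2) for name, value in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical (log-likelihood, number of free parameters) for three binary-trait models.
fits = {"ER Mk": (-20.0, 1), "ARD Mk": (-19.5, 2), "ER gamma Mk": (-19.9, 2)}
aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
weights = akaike_weights(aics)
best_model = min(aics, key=aics.get)
print(best_model)  # ER Mk
```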

Although the presence/absence of each eye color was analyzed on its own, the colors are likely not fully independent. They were therefore also analyzed together as a polymorphic trait using stochastic mapping through fitpolyMk() in phytools. The polymorphic coding yields far too many states (2⁵ − 1 = 31), with correspondingly high parameter complexity, for adequate interpretation as a polymorphic character. Because of this, and because the two analyses generally aligned (data not shown), the independent model was used. A color was considered present at a given node (Figure 3) if the marginal maximum likelihood ancestral state reconstruction for that color was greater than 50%, indicating more support for presence than absence.
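Both the state count behind the decision to use the independent model and the 50% presence rule can be checked in a few lines. The marginal probabilities in the example are hypothetical.

```python
from itertools import combinations

COLORS = ("brown", "green", "yellow", "gray", "blue")


def polymorphic_states(colors):
    """Every non-empty combination of colors: 2**n - 1 possible polymorphic states."""
    return [combo for r in range(1, len(colors) + 1) for combo in combinations(colors, r)]


def present_at_node(marginal_presence, threshold=0.5):
    """Colors whose marginal probability of presence exceeds the threshold."""
    return sorted(color for color, p in marginal_presence.items() if p > threshold)


states = polymorphic_states(COLORS)
print(len(states))  # 31
# Hypothetical marginal probabilities of presence at one node.
print(present_at_node({"brown": 0.93, "gray": 0.61, "blue": 0.12}))  # ['brown', 'gray']
```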

Quantitative color reconstruction

Once the eye colors present at every node on the tree were known, more specific reconstructions were possible. For each node, a separate tree was created for each eye color present at that node. Each of these subset trees included every descendant of that node that retained the eye color continuously from it. Descendants in which the color was lost and then re-arose independently were excluded. For example, an ancestral node determined to have both green and brown eyes present would have one tree with all its continuously green-eyed descendants and another with all its continuously brown-eyed descendants. A diagram of this method is given in Figure S12. This approach restricts each reconstruction to plausible evolutionary pathways. To predict the shade of a specific eye color at a specific node, one should omit two kinds of taxa. The first is taxa that have lost that eye color, since their present condition cannot convey any relevant information about the shade of that color in their ancestor. The second is taxa that have lost that eye color and then regained it, since it is unknown whether their present condition is related at all to the shade of that color in their ancestor.
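The descendant-selection rule can be sketched as a depth-first walk that stops at the first node lacking the color. This excludes both lineages that lost the color and those that later regained it. The tree, node states, and taxon names below are hypothetical. In R, the returned tip list could then be used to prune the tree, for example by passing all other tips to ape's drop.tip().

```python
def continuous_descendants(children, present, node, color):
    """Tips below `node` reachable through nodes that all carry `color`.

    children: dict mapping an internal node to its child nodes (tips are absent).
    present:  dict mapping every node to the set of colors present there.
    A lineage is cut at the first node lacking the color, which drops taxa that
    lost the color as well as taxa that lost it and later regained it.
    """
    tips, stack = [], [node]
    while stack:
        current = stack.pop()
        kids = children.get(current, [])
        if not kids:
            tips.append(current)
        else:
            stack.extend(kid for kid in kids if color in present[kid])
    return sorted(tips)


# Hypothetical tree: root -> (n1, tip_D); n1 -> (tip_A, n2); n2 -> (tip_B, tip_C).
children = {"root": ["n1", "tip_D"], "n1": ["tip_A", "n2"], "n2": ["tip_B", "tip_C"]}
present = {
    "root": {"brown", "green"}, "n1": {"brown", "green"}, "n2": {"brown"},
    "tip_A": {"green"}, "tip_B": {"brown", "green"}, "tip_C": {"brown"}, "tip_D": {"brown"},
}
print(continuous_descendants(children, present, "root", "green"))  # ['tip_A']
```

Here tip_B carries green but is excluded, because green was lost at n2 before reappearing.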

After the trees were created, the specific colors were reconstructed by maximum likelihood using the function fastAnc() from phytools (Revell 2012). This was done independently for the red, green, and blue values in each of the data sets for the light, medium, and dark shades. RGB values can only range from 0 to 255. It was therefore encouraging that the 95% confidence intervals for the quantitative reconstructions almost always fell well within this realistic range, lending considerable support to the reconstructions. Large confidence intervals are a known limitation of likelihood reconstructions of continuous traits, so the reconstructions should not be read as the exact eye shades of the felid ancestors. They are most useful in comparison with one another, to illuminate larger trends.
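For intuition about what fastAnc() estimates, the maximum likelihood value at the root under Brownian motion can be computed by pruning (Felsenstein's contrasts algorithm). Each child contributes its estimate weighted by the inverse of its branch length plus the extra variance carried up from its own subtree. This is a minimal sketch for one channel (e.g. the red value of the dark shade) at the root only. fastAnc() re-roots the tree to obtain every internal node and also reports confidence intervals, neither of which this sketch does. The nested-tuple tree format and values are hypothetical.

```python
def bm_root_estimate(node):
    """ML state at the root of `node` under Brownian motion, via pruning.

    A node is either ("tip", value) or ("node", [(child, branch_length), ...]).
    Returns (estimate, extra_variance), where extra_variance is the amount by which
    the branch above this node must be lengthened when pruning further up.
    """
    kind, payload = node
    if kind == "tip":
        return float(payload), 0.0
    weights, weighted_sum = 0.0, 0.0
    for child, branch_length in payload:
        estimate, extra = bm_root_estimate(child)
        w = 1.0 / (branch_length + extra)  # inverse of the child's effective variance
        weights += w
        weighted_sum += w * estimate
    return weighted_sum / weights, 1.0 / weights


# Hypothetical red-channel values and branch lengths.
cherry = ("node", [(("tip", 100), 1.0), (("tip", 140), 3.0)])
tree = ("node", [(cherry, 0.25), (("tip", 120), 1.0)])
cherry_value, cherry_extra = bm_root_estimate(cherry)
root_value, _ = bm_root_estimate(tree)
print(round(cherry_value, 6), round(root_value, 6))
```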

Beyond reconstructing the colors themselves, corHMM’s rayDISC() was used to reconstruct the number of shades within each eye color for each node, treating the shade representation data as a discrete, multistate trait (Beaulieu et al. 2022). This was also done for the primary and secondary shades within each eye. Together, these methods allow a high-resolution understanding of the iris color of ancestral felids. For each ancestral felid population, we can estimate which eye colors were present (out of brown, green, yellow, gray, and blue), how many different shades each color had, which shades were more or less common, and approximately what those shades would have been.

Environmental/behavioral/physical trait encoding

Data on pupil shape were obtained from Banks et al. (2015), and data on activity by time of day and primary habitat(s) were obtained from the University of Michigan Animal Diversity Web (Banks et al. 2015; Myers et al. 2022). Data on zoogeographical regions were based on Johnson et al. (2006), and data on coat patterns were based on Werdelin et al. (1997). Nose color (pink or black) and whether any black was present in the coat or tail were determined manually from observation of images.

For correlation comparisons, each multistate trait was converted into a set of binary traits. Pupil shape was scored with 0 for vertical/subcircular pupils and 1 for round pupils. Likewise, whether black coloration is present in the coat and whether black coloration is present in the tail were scored with a 0 for absence and a 1 for presence. Activity was split into three traits, each corresponding to an activity lifestyle: nocturnal, crepuscular, and diurnal. Then, for each felid taxon, each trait was scored as present or absent using 1 or 0, respectively. This was especially useful, given that some taxa fall into multiple categories. The same was done for historical zoogeographical region (nearctic [North America], neotropical [South America], palearctic [Europe and North Asia], oriental [South Asia], ethiopian [Africa]), primary habitat (mountains, rainforest, forest, savanna, desert), and coat pattern (flecks, uniform, stripes, rosettes, blotches, sblotch [small blotches]). This method was also used for pink and black nose colors because some felid noses contain both colors and some species have both of the colors represented in their populations; in both cases, both colors would be marked as present.
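The binary recoding, including taxa that fall into several categories, can be sketched as below. Taxon names and trait assignments are placeholders, not data from this study.

```python
def encode_binary(values_by_taxon, categories):
    """Split a multistate trait into one 0/1 trait per category.

    values_by_taxon maps a taxon to the set of categories it falls into; a taxon
    may fall into several (e.g. both nocturnal and crepuscular), so several 1s are allowed.
    """
    return {
        taxon: {category: int(category in values) for category in categories}
        for taxon, values in values_by_taxon.items()
    }


ACTIVITY = ("nocturnal", "crepuscular", "diurnal")
# Placeholder taxa and assignments, for illustration only.
activity = {"taxon_A": {"nocturnal", "crepuscular"}, "taxon_B": {"diurnal"}}
encoded = encode_binary(activity, ACTIVITY)
print(encoded["taxon_A"])  # {'nocturnal': 1, 'crepuscular': 1, 'diurnal': 0}
```

The same function applies unchanged to zoogeographical region, habitat, coat pattern, and nose color (e.g. a taxon scored with both "pink" and "black").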

Correlation analysis

Apart from reconstructing ancestral states, several correlation analyses were performed to investigate possible evolutionary interactions related to eye color variation. The environmental/physical trait data, along with the presence/absence data for each eye color, were analyzed with a maximum likelihood approach using BayesTraits (version 3.0.5), accessed from R through the package btw (version 2.0) (Pagel et al. 2004; Griffin 2018). For each pair of binary traits, two models were built. In one, the two traits evolve independently. In the other, their evolution is dependent (i.e. the rate of change in one trait is influenced by the state of the other). The models were then compared using a log Bayes Factor, with a log Bayes Factor above 2 indicating positive evidence for the dependent model. Because these model fits are stochastic, each comparison was run 100 times and the resulting log Bayes Factors were averaged, ensuring robust and reproducible results. The presence of each eye color was compared with the presence of every other eye color. The environmental/behavioral/physical data were compared with the presence of each eye color, the average shade of the RGB values in each eye color, and the average shade of the RGB values across all eye colors. This overall average was computed for every taxon, ignoring NA values. To convert the average values into discrete traits, each value was categorized using Jenks natural breaks optimization, performed with the getJenksBreaks() command in the package BAMMtools (version 2.1.10) (Rabosky et al. 2014). Finally, tetrachoric correlation coefficients were calculated using the tetrachoric() command in the package psych (version 2.2.9) to indicate the direction of each association (Revelle 2022). For the shade correlations, a positive association indicates that the trait is associated with lighter shades.
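A minimal sketch of the replicate averaging follows. It assumes the BayesTraits convention log BF = 2 × (ln L of the dependent model − ln L of the independent model), applied here to maximum likelihood fits, with values above 2 read as positive evidence for the dependent model. The log-likelihoods below are hypothetical stand-ins for the replicate runs.

```python
from statistics import mean


def log_bayes_factor(lnl_dependent, lnl_independent):
    """BayesTraits-style log Bayes Factor: 2 x (dependent - independent)."""
    return 2.0 * (lnl_dependent - lnl_independent)


def averaged_support(dependent_runs, independent_runs, threshold=2.0):
    """Average the log BF over replicate runs; above `threshold` favors dependence."""
    lbfs = [log_bayes_factor(d, i) for d, i in zip(dependent_runs, independent_runs)]
    avg = mean(lbfs)
    return avg, avg > threshold


# Hypothetical log-likelihoods from three replicate runs (the study used 100).
dependent = [-30.1, -30.3, -30.2]
independent = [-32.0, -32.4, -31.9]
avg_lbf, favors_dependent = averaged_support(dependent, independent)
```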

Evolutionary insights into Felidae iris color through ancestral state reconstruction

Data files

Abstract

Provenance for this README

Dataset Version and Release History

Dataset Attribution and Usage

Contact Information

Additional Dataset Metadata

Acknowledgements

Methodological Information

Data and File Overview

Summary Metrics

Table of Contents

Setup

Notes

File Details

Details for: Tabin_Chiasson_Supplemental_Results.docx

Details for: Tabin_Chiasson_Supplemental_Figures.pdf

Details for: Tabin_Chiasson_Supplemental_Table_1.xlsx

Details for: Tabin_Chiasson_Supplemental_Table_2.xlsx

Details for: Tabin_Chiasson_Supplemental_Table_3.xlsx

Details for: Data Collection Script.ipynb

Details for: Color Polymorphism Assessment.ipynb

Details for: Felid LMM.Rmd

Details for: Color Presence Reconstruction.Rmd

Details for: Quantitative Color Reconstruction.Rmd

Details for: Output Specific Colors.ipynb

Details for: Find Correlations.Rmd

Details for: Carnivore_phylo_Nyakatura2012.NEX

Details for: Tip_col_data.csv

Details for: Node_col_data.csv

Details for: general_data_brown_only.csv, general_data_green_only.csv, general_data_yellow_only.csv, general_data_gray_only.csv, general_data_blue_only.csv, general_data_reordered.csv, and general_data_reordered_withsub.csv

Details for: col_data.csv, dom_col_data.csv, col_data_nosub.csv, dom_col_data_nosub.csv, col_data_onlytree.csv, and dom_col_data_onlytree.csv

Details for: poly_data_subset.csv and poly_data_main_subset.csv

Details for: enviro_data.csv and enviro_data_onlytree.csv

Details for: Felidae_all_pixel_colors.csv


Methods

Works referencing this dataset