Biochemical and genomic underpinnings of carotenoid color variation across a hybrid zone between South Asian flameback woodpeckers
Data files
Jul 17, 2025 version; 42 MB in total
- Dinopium_crown.zip (1.53 MB)
- Colorimetric.Data-Dinopium_mantle_and_crown_from_pavo.csv (22.23 KB)
- Dinopium_mantle.zip (1.53 MB)
- Genotype_ReadDepth_data-GWAS-SNP.csv (14.77 KB)
- HPLC_data_Mantle_Crown_feathers_All_Dinopium.csv (11.40 KB)
- DW.pre.filtered_n.106.mac3.biallelic.minDP3.minGQ25.NOindel.maxmiss0.4.BIMBAMimputedPC1untrnfm.lmm1.chronNumEdited.assoc.txt (38.87 MB)
- GWAS_using_GEMMA_pipeline.rtf (17.61 KB)
- README.md (6.58 KB)
Abstract
Coloration and patterning have been implicated in lineage diversification across many taxa, because color traits are strongly shaped by sexual and natural selection. Investigating the biochemical and genomic foundations of these traits therefore provides deeper insight into the interplay between genetics, ecology, and social interactions in shaping the diversity of life. In this study, we assessed the pigment chemistry and genomic underpinnings of carotenoid color variation in naturally hybridizing Dinopium flamebacks in tropical South Asia. We used reflectance spectrometry to quantify species-specific plumage coloration, High-Performance Liquid Chromatography (HPLC) to characterize the feather carotenoids of flamebacks across the hybrid zone, and a Genome-Wide Association Study (GWAS) based on next-generation sequencing data to identify the genetic factors underlying carotenoid color variation in flamebacks. Our analysis revealed that the red mantle feathers of D. psarodes primarily contain astaxanthin, with small amounts of other 4-keto-carotenoids. In contrast, the yellow mantle feathers of D. benghalense predominantly contain lutein and 3'-dehydro-lutein, alongside minor amounts of zeaxanthin, β-cryptoxanthin, and canary xanthophylls A and B. Hybrids with intermediate (orange) coloration deposited all of these pigments in their mantle feathers, with notably higher concentrations of carotenoids bearing ε-end rings. The GWAS identified the CYP2J2 gene, which plays a role in carotenoid ketolation, as associated with the expression of carotenoid coloration. This gene showed significant allelic variation and evidence of multiple copies in flamebacks. These findings add to the growing knowledge of avian carotenoid metabolism and highlight how genomic architecture can influence phenotypic diversity.
Dataset DOI: 10.5061/dryad.gqnk98szb
Raw Data Availability
The raw Genotyping-by-Sequencing (GBS) data used for the GWAS analysis are publicly available on the NCBI Sequence Read Archive (SRA) under accession number SUB14620227.
Description of the Dataset and File Structure
Genomic Data Processing and Analysis:
GWAS_using_GEMMA_pipeline.rtf - This document provides a detailed overview of the pipeline used to conduct the Genome-Wide Association Study (GWAS) analysis using GEMMA.
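The RTF document is the authoritative description of the pipeline. For orientation only, the results file name hints at its final stage: a BIMBAM-format genotype file, an untransformed PC1 color score as the phenotype, and GEMMA's univariate linear mixed model (`lmm1`). The sketch below assembles the two GEMMA calls that such a stage typically needs (a relatedness matrix, then the LMM Wald test), assuming the input file names `geno.bimbam.txt` and `pheno_pc1.txt` and the output prefix `DW_pc1`, all of which are hypothetical. Upstream VCF filtering and imputation are deliberately left out.

```python
"""Hypothetical sketch of the final GEMMA stage; not the authors' exact commands."""
import shutil
import subprocess


def kinship_cmd(geno: str, pheno: str, prefix: str) -> list[str]:
    # -gk 1: centered relatedness matrix, written by GEMMA to output/<prefix>.cXX.txt
    return ["gemma", "-g", geno, "-p", pheno, "-gk", "1", "-o", prefix]


def lmm_cmd(geno: str, pheno: str, kinship: str, prefix: str) -> list[str]:
    # -lmm 1: univariate linear mixed model with the Wald test (reported as p_wald)
    return ["gemma", "-g", geno, "-p", pheno, "-k", kinship, "-lmm", "1", "-o", prefix]


def run_pipeline(geno: str, pheno: str, prefix: str) -> None:
    """Run both steps; results are written to output/<prefix>.assoc.txt."""
    if shutil.which("gemma") is None:
        raise RuntimeError("gemma executable not found on PATH")
    subprocess.run(kinship_cmd(geno, pheno, prefix + "_kin"), check=True)
    kinship = f"output/{prefix}_kin.cXX.txt"
    subprocess.run(lmm_cmd(geno, pheno, kinship, prefix), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(" ".join(kinship_cmd("geno.bimbam.txt", "pheno_pc1.txt", "DW_pc1_kin")))
    print(" ".join(lmm_cmd("geno.bimbam.txt", "pheno_pc1.txt",
                           "output/DW_pc1_kin.cXX.txt", "DW_pc1")))
```

Keeping the commands as argument lists avoids shell-quoting problems and makes the step easy to inspect before it is run.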
Phenotypic Data Analysis:
All pigment concentration data and reflectance spectrometric measurements were analyzed using R. For a full demonstration of the data processing and visualization steps, please visit the website: Data Analysis and Visualization – Flameback
Data Files
-
Dinopium_crown.zip - This folder includes plain text files containing reflectance spectrometry data for crown feather samples collected across the Dinopium hybrid zone in Sri Lanka. Each sample includes three replicate measurements.
File Format: Each .txt file holds one replicate measurement of a single feather sample, so every individual is represented by three files.
Naming convention: Files are named with the bird’s sample ID, its phenotype (“Red” for red-backed D. psarodes, “Yellow” for yellow-backed D. benghalense, and “Orange” for intermediate orange-backed individuals), the feather region as it appears in the file name, and the replicate number (“001,” “002,” or “003”), for example UC12RR02_Red_creast-001.txt, UC12RR02_Red_creast-002.txt, and UC12RR02_Red_creast-003.txt.
File structure: Each file begins with metadata, followed by a table where the first column lists wavelength values (177–8830 nm), and the second column lists reflectance percentages (0–100%).
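Because sample, phenotype, region, and replicate are all encoded in the file name, the files can be grouped without opening them. The sketch below assumes, as in the examples, that sample IDs contain no underscores and that the replicate is a three-digit suffix; the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Assumption: sample IDs contain no underscores (true for the README examples).
NAME_RE = re.compile(
    r"^(?P<sample>[^_]+)_(?P<phenotype>Red|Orange|Yellow)_"
    r"(?P<region>[A-Za-z]+)-(?P<rep>\d{3})\.txt$"
)

PHENOTYPE_LABEL = {
    "Red": "D. psarodes (red-backed)",
    "Yellow": "D. benghalense (yellow-backed)",
    "Orange": "intermediate (orange-backed)",
}


class SpectrumFile(NamedTuple):
    sample: str
    phenotype: str
    region: str
    replicate: int


def parse_name(filename: str) -> SpectrumFile:
    """Split one spectrum file name into its components."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return SpectrumFile(m["sample"], m["phenotype"], m["region"], int(m["rep"]))


def group_replicates(filenames):
    """Map (sample, region) to the sorted replicate numbers present."""
    groups = defaultdict(list)
    for name in filenames:
        f = parse_name(name)
        groups[(f.sample, f.region)].append(f.replicate)
    return {key: sorted(reps) for key, reps in groups.items()}
```

A group whose replicate list is not `[1, 2, 3]` flags a missing or misnamed file before any spectra are averaged.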
-
Dinopium_mantle.zip - This folder includes plain text files containing reflectance spectrometry data for mantle feather samples collected across the Dinopium hybrid zone in Sri Lanka. Each sample includes three replicate measurements.
File Format: Each .txt file holds one replicate measurement of a single feather sample, so every individual is represented by three files.
Naming convention: Files are named with the bird’s sample ID, its phenotype (“Red” for red-backed D. psarodes, “Yellow” for yellow-backed D. benghalense, and “Orange” for intermediate orange-backed individuals), the feather region (“mantle”), and the replicate number (“001,” “002,” or “003”), for example UC12RR02_Red_mantle-001.txt, UC12RR02_Red_mantle-002.txt, and UC12RR02_Red_mantle-003.txt.
File structure: Each file begins with metadata, followed by a table where the first column lists wavelength values (177–8830 nm), and the second column lists reflectance percentages (0–100%).
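The metadata header has to be skipped before the two-column table can be used, and the three replicates are usually averaged per sample. The sketch below assumes that metadata lines never consist of exactly two numbers, that replicates of one sample share a single wavelength grid, and that analysis is restricted to 300 to 700 nm (a common choice for avian color work, not stated in this README). Function names are hypothetical.

```python
def read_spectrum(text: str) -> tuple[list[float], list[float]]:
    """Return (wavelengths, reflectance) from one exported spectrum file.

    Any line made of exactly two numeric fields is treated as a data row;
    everything else (the metadata header) is skipped.
    """
    wl, refl = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            w, r = float(parts[0]), float(parts[1])
        except ValueError:
            continue
        wl.append(w)
        refl.append(r)
    return wl, refl


def mean_of_replicates(spectra, lo=300.0, hi=700.0):
    """Average replicate spectra that share one wavelength grid, trimmed to [lo, hi] nm."""
    grid = spectra[0][0]
    for wl, _ in spectra[1:]:
        if wl != grid:
            raise ValueError("replicates use different wavelength grids; interpolate first")
    keep = [i for i, w in enumerate(grid) if lo <= w <= hi]
    return ([grid[i] for i in keep],
            [sum(s[1][i] for s in spectra) / len(spectra) for i in keep])
```

In practice each replicate would be loaded with `read_spectrum(Path(name).read_text())` and the three results passed together to `mean_of_replicates`.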
-
Colorimetric.Data-Dinopium_mantle_and_crown_from_pavo.csv - This CSV file contains colorimetric variables extracted with the pavo R package for crown and mantle feather samples. Each row corresponds to a single feather sample (crown or mantle) from an individual bird.
Key Columns and Layout:
ID: unique sample identifier for each individual bird; Species: scientific name of the species; Feather: whether the row corresponds to the crown or mantle feathers of the bird; B1, B2, B3: brightness variables; S1U through S10: saturation variables; H1 through H5: hue variables, all extracted from the spectral data. For details on how brightness, saturation, and hue are calculated and interpreted, refer to the pavo R package manual.
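To make a few of these columns concrete, the sketch below recomputes five of them from a single spectrum using the definitions given in the pavo documentation: B1 (total brightness, the sum of reflectance), B2 (mean brightness), B3 (intensity, the peak reflectance), S8 (chroma, (Rmax − Rmin)/B2), and H1 (the wavelength of peak reflectance). It is an illustration, not a replacement for pavo: pavo normally interpolates spectra to 1-nm steps, so B1 depends on the wavelength range and sampling used. The function name is hypothetical, and values should be checked against pavo's own summary output.

```python
def pavo_style_summary(wl, refl):
    """Recompute B1, B2, B3, S8 and H1 for one spectrum (sketch of pavo's definitions)."""
    if len(wl) != len(refl) or not refl:
        raise ValueError("wavelength and reflectance vectors must be non-empty and equal length")
    b1 = sum(refl)                 # B1: total brightness
    b2 = b1 / len(refl)            # B2: mean brightness
    rmax, rmin = max(refl), min(refl)
    b3 = rmax                      # B3: intensity (peak reflectance)
    s8 = (rmax - rmin) / b2        # S8: chroma
    h1 = wl[refl.index(rmax)]      # H1: wavelength of peak reflectance
    return {"B1": b1, "B2": b2, "B3": b3, "S8": s8, "H1": h1}
```

For a red mantle feather, H1 would be expected toward the long-wavelength end of the range, whereas S8 summarizes how strongly reflectance varies across it.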
-
HPLC_data_Mantle_Crown_feathers_All_Dinopium.csv - This CSV file contains carotenoid pigment data from crown and mantle feathers of Dinopium flamebacks, measured using High-Performance Liquid Chromatography (HPLC). Each row represents a feather sample.
Key Columns and Layout:
Identification_II: unique sample identifier for each individual bird; Scientific name: scientific name of the species; color: feather coloration; feathers: whether the row corresponds to the crown or mantle feathers; % beta-cryptoxanthin ident through %zeaxanthin ident: relative abundance (%) of each identified pigment; lutein ident__concentration through papilioerythrinone_concentration: absolute concentration of each identified pigment (μg/g); avg 4-keto_groups through avg epsilon rings: average functional-group composition (e.g., 4-keto groups and ε-end rings) of the carotenoids in each feather sample.
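This README does not state how the averaged functional-group columns were computed. One plausible reading is a concentration-weighted mean number of groups per carotenoid molecule, and the sketch below implements that reading for a deliberately small, illustrative pigment table: astaxanthin carries two 4-keto groups and no ε-end rings, lutein carries one ε-end ring and no keto groups, and zeaxanthin carries neither. The weighting by μg/g, the table, and the function name are all assumptions; a molar weighting would give slightly different values.

```python
# Illustrative and incomplete: per-molecule counts for three well-characterized carotenoids.
GROUPS = {
    "astaxanthin": {"keto4": 2, "epsilon": 0},
    "lutein": {"keto4": 0, "epsilon": 1},
    "zeaxanthin": {"keto4": 0, "epsilon": 0},
}


def average_groups(conc):
    """Concentration-weighted mean count of 4-keto groups and ε-end rings.

    conc maps pigment name -> concentration (μg/g) in one feather sample.
    """
    total = sum(conc.values())
    if total == 0:
        raise ValueError("no carotenoids detected in this sample")
    return {
        group: sum(c * GROUPS[pigment][group] for pigment, c in conc.items()) / total
        for group in ("keto4", "epsilon")
    }
```

Under this reading a pure astaxanthin feather would score 2 for 4-keto groups, a pure lutein feather 1 for ε-end rings, and mixtures, as expected in orange hybrids, would fall in between.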
-
DW.pre.filtered_n.106.mac3.biallelic.minDP3.minGQ25.NOindel.maxmiss0.4.BIMBAMimputedPC1untrnfm.lmm1.chronNumEdited.assoc.txt - This file contains GWAS results generated by GEMMA using a linear mixed model (LMM).
Key Columns and Layout:
chr: chromosome number or name; rs: SNP identifier (formatted as contig:position); ps: base-pair position; n_miss: number of individuals with missing genotype calls; allele1 / allele0: minor and major alleles at the SNP site; af: frequency of the minor allele (allele1); beta: estimated effect size of the SNP on the trait value; se: standard error of the beta estimate; logl_H1: log-likelihood of the alternative model (with the SNP effect); l_remle: restricted maximum likelihood (REML) estimate of λ, the ratio of the genetic to the residual variance component, under the alternative model; p_wald: Wald test p-value for the SNP-trait association.
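Because the file is a tab-separated table with a header row, candidate SNPs can be pulled out with the standard library alone. The sketch below applies a Bonferroni threshold (α divided by the number of tested SNPs) purely as an example; the significance threshold the authors used is not stated in this README, and the function name is hypothetical.

```python
import csv
import math


def top_hits(handle, alpha=0.05):
    """Return (threshold, hits): SNPs with p_wald below a Bonferroni threshold, most significant first."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    if not rows:
        raise ValueError("no SNP rows found")
    threshold = alpha / len(rows)
    hits = []
    for r in rows:
        p = float(r["p_wald"])
        if p < threshold:
            hits.append({
                "rs": r["rs"],
                "beta": float(r["beta"]),
                "p_wald": p,
                "neg_log10_p": -math.log10(p),  # the usual Manhattan-plot scale
            })
    return threshold, sorted(hits, key=lambda h: h["p_wald"])
```

With the full file this would be called as `with open(path, newline="") as fh: threshold, hits = top_hits(fh)`. Bonferroni is conservative when nearby SNPs are in linkage disequilibrium, so it is best read here as a convenient upper bound on stringency.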
-
Genotype_ReadDepth_data-GWAS-SNP.csv - This CSV file contains genotype and standardized read depth data for each Dinopium flameback sample. Each row represents a single bird.
Key Columns and Layout:
ID: sample identifier; Phenotype: phenotype of the bird; CHR31_3082775 through CHR4_17938643: SNP genotypes in nucleotide format (e.g., GG, GC); Chr4_17938638 through Chr31_3082780: SNP genotypes coded as -1 (missing), 0 (homozygous reference), 1 (heterozygous), or 2 (homozygous alternative); stndized_read_dpth_CH4_17938638 through stndized_read_dpth_CHR31_3082775: read depth at each SNP, standardized by dividing the individual’s read depth by the genome-wide mean read depth across all SNPs and all individuals; Ind_stndized_read_dpth_CH4_17938638 through Ind_stndized_read_dpth_CHR31_3082775: read depth at each SNP, standardized by dividing the individual’s read depth by the genome-wide mean read depth within that individual; mean_read_dpth_per_sample: mean read depth per individual across all loci.
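The two read-depth standardizations differ only in their denominator, and the 0/1/2 coding follows directly from the nucleotide genotypes. The sketch below illustrates both under stated assumptions: depths arrive as a matrix (individuals by SNPs) with `None` for missing data, and missing nucleotide genotypes are anything other than a two-letter combination of the reference and alternative alleles. Because the dataset's means are genome-wide, the matrix passed in should cover all SNPs, not only the focal loci. Under these definitions a value near 1 indicates typical coverage, while values consistently near 2 at a locus could hint at an additional gene copy. Both function names are hypothetical.

```python
def standardize_depths(depth):
    """Return (global_std, within_std) for depth[i][j] = read depth of individual i at SNP j.

    global_std divides by the mean over all SNPs and individuals;
    within_std divides by each individual's own mean. None marks missing data.
    """
    observed = [d for row in depth for d in row if d is not None]
    global_mean = sum(observed) / len(observed)
    global_std, within_std = [], []
    for row in depth:
        vals = [d for d in row if d is not None]
        ind_mean = sum(vals) / len(vals)
        global_std.append([None if d is None else d / global_mean for d in row])
        within_std.append([None if d is None else d / ind_mean for d in row])
    return global_std, within_std


def encode_genotype(gt, ref, alt):
    """Code a two-letter genotype as the number of alt alleles (0/1/2), or -1 if missing."""
    if gt is None or len(gt) != 2 or set(gt) - {ref, alt}:
        return -1
    return gt.count(alt)
```

For example, with reference G and alternative C, `GG`, `GC`, and `CC` map to 0, 1, and 2, matching the coded genotype columns.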