
VCF data file and code for: CYP2J19 mediates carotenoid colour introgression across a natural avian hybrid zone

Cite this dataset

Kirschel, Alexander et al. (2021). VCF data file and code for: CYP2J19 mediates carotenoid colour introgression across a natural avian hybrid zone [Dataset]. Dryad.


It has long been of interest to identify the phenotypic traits that mediate reproductive isolation between related species and, more recently, the genes that underpin them. Much work has focused on identifying genes associated with animal colour, with the candidate gene CYP2J19 identified in laboratory studies as the ketolase converting yellow dietary carotenoids to red ketocarotenoids in birds with red pigments. But evidence that CYP2J19 explains variation between red and yellow feather coloration in wild populations of birds is lacking. Hybrid zones provide the opportunity to identify genes associated with specific traits. Here we investigate genomic regions associated with colour in red-fronted and yellow-fronted tinkerbirds across a hybrid zone in southern Africa. We sampled 85 individuals, measuring spectral reflectance of forecrown feathers and scoring colours from photographs, while testing for carotenoid presence with Raman spectroscopy. We performed a genome-wide association study to identify associations with carotenoid-based coloration, using double-digest RAD sequencing reads aligned to a short-read whole-genome assembly of a Pogoniulus tinkerbird. Admixture mapping using 104,933 SNPs identified a region of chromosome 8 that includes CYP2J19 as the only locus with more than two SNPs significantly associated with both crown hue and crown score, while Raman spectra provided evidence of ketocarotenoids in red feathers. Asymmetric backcrossing in the hybrid zone suggests that yellow-fronted females mate more often with red-fronted males than vice versa. That female red-fronted tinkerbirds mate assortatively with red-crowned males is consistent with the hypothesis that converted carotenoids are an honest signal of quality.


Field sampling, reflectance spectrometry, double-digest RAD sequencing, genome-wide association study

Usage notes

GEMMA takes PLINK binary input rather than VCF, so the VCF data are first reduced in vcftools to either the 78 samples used in plumage scoring or the 57 samples used in spectrometry. We then use PLINK to convert the reduced VCF into a binary fileset: .bed, .bim and .fam files. The .fam file holds the phenotype information: the default missing-phenotype value of -9 in its sixth column is replaced with the trait values included in the supplementary information.
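The preparation steps in the usage notes can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: all file names, sample IDs and trait values are hypothetical, the vcftools and PLINK 1.9 command lines are only assembled (not executed), and the `--double-id` and `--allow-extra-chr` flags are assumptions about how the VCF sample names and non-human chromosome names should be handled. The GEMMA model options are not given in the notes, so the sketch stops at a .fam file ready for GEMMA.

```python
"""Hypothetical sketch: subset the VCF, convert it to PLINK binary files,
then write trait values into the .fam phenotype column for GEMMA.
File names, sample IDs and trait values are placeholders, not dataset values."""


def vcftools_subset_cmd(vcf: str, keep_file: str, out_prefix: str) -> list[str]:
    # Keep only the individuals listed one per line in keep_file
    # (e.g. the 78 plumage-scored or the 57 spectrometry samples);
    # vcftools writes <out_prefix>.recode.vcf.
    return ["vcftools", "--vcf", vcf, "--keep", keep_file,
            "--recode", "--out", out_prefix]


def plink_bed_cmd(vcf: str, out_prefix: str) -> list[str]:
    # PLINK 1.9 writes <out_prefix>.bed/.bim/.fam.
    # Assumption: --double-id sets FID and IID to the VCF sample name, and
    # --allow-extra-chr accepts scaffold/chromosome names PLINK doesn't know.
    return ["plink", "--vcf", vcf, "--double-id", "--allow-extra-chr",
            "--make-bed", "--out", out_prefix]


def fill_fam_phenotypes(fam_lines: list[str], traits: dict[str, float]) -> list[str]:
    """Replace the phenotype (6th column) of each whitespace-delimited .fam
    line with the trait value for its individual ID (2nd column).
    Individuals without a trait value keep the missing code -9."""
    updated = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"malformed .fam line: {line!r}")
        iid = fields[1]
        if iid in traits:
            fields[5] = str(traits[iid])
        updated.append(" ".join(fields))
    return updated


def update_fam_file(path: str, traits: dict[str, float]) -> None:
    # Rewrite the .fam file in place with the filled-in phenotypes.
    with open(path) as handle:
        lines = handle.read().splitlines()
    with open(path, "w") as handle:
        handle.write("\n".join(fill_fam_phenotypes(lines, traits)) + "\n")


if __name__ == "__main__":
    # Hypothetical example: two birds, only one with a crown-hue value.
    fam = ["TB001 TB001 0 0 0 -9", "TB002 TB002 0 0 0 -9"]
    for row in fill_fam_phenotypes(fam, {"TB001": 0.42}):
        print(row)
```

Running the example prints `TB001 TB001 0 0 0 0.42` followed by `TB002 TB002 0 0 0 -9`. Leaving unmatched individuals at -9 keeps them flagged as missing rather than silently assigning a value.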


University of Cyprus, Award: Internal Grant

A G Leventis Foundation, Award: Scholarship