Dryad

A supergene underlies linked variation in color and morphology in a Holarctic songbird

Cite this dataset

Funk, Erik et al. (2021). A supergene underlies linked variation in color and morphology in a Holarctic songbird [Dataset]. Dryad. https://doi.org/10.5061/dryad.q83bk3jjm

Abstract

The genetic architecture of a phenotype can have considerable effects on the evolution of a trait or species. Characterizing genetic architecture provides insight into the complexity of a given phenotype and, potentially, the role of the phenotype in evolutionary processes like speciation. We use genome sequences to investigate the genetic basis of phenotypic variation in redpoll finches (Acanthis spp.). We demonstrate that variation in redpoll phenotype is broadly controlled by a ~55-Mb chromosomal inversion. Within this inversion, we find multiple candidate genes related to melanogenesis, carotenoid coloration, and bill shape, suggesting the inversion acts as a supergene controlling multiple linked traits. A latitudinal gradient in ecotype distribution suggests that supergene-driven variation in color and bill morphology is likely under environmental selection, maintaining supergene haplotypes as a balanced polymorphism. Our results provide a mechanism for the maintenance of ecotype variation in redpolls despite a genome largely homogenized by gene flow.

Methods

Whole-genome sequence data were generated from extracted DNA on two lanes of an S4 flow cell on an Illumina NovaSeq. Raw reads were trimmed using Trimmomatic PE and aligned to a brown-capped rosy-finch (Leucosticte australis) reference genome using BWA mem with default settings. Variants were called using bcftools "mpileup" and filtered to keep only single nucleotide polymorphisms (SNPs) with a quality score greater than 80. Potentially paralogous loci were removed by filtering out SNPs with a depth lower than 2x or higher than 12x. SNPs with a minor allele frequency less than 5% were removed, and two datasets were generated based on allowed missing data: a 100p dataset that did not allow any missing data, and a 75p dataset that allowed up to 25% missing data. Data are presented in Variant Call Format (VCF). One individual was dropped from the original sequencing effort because it was a sibling of another individual.
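
The filtering thresholds described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record does not give the exact commands or flags, so the `Site` record, the helper names, the use of a single mean depth value per site, the inclusive depth bounds, and diploid genotypes coded as alternate-allele counts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Site:
    """Hypothetical, simplified stand-in for one SNP record in a VCF."""
    qual: float                      # site quality score
    mean_depth: float                # assumed: mean read depth at the site
    genotypes: List[Optional[int]]   # alt-allele count per individual (0, 1, 2); None = missing


def minor_allele_frequency(genotypes: List[Optional[int]]) -> float:
    """MAF computed from called diploid genotypes only (an assumption)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def missing_fraction(genotypes: List[Optional[int]]) -> float:
    """Fraction of individuals with no genotype call."""
    if not genotypes:
        return 1.0
    return sum(g is None for g in genotypes) / len(genotypes)


def passes(site: Site, max_missing: float) -> bool:
    """Apply the thresholds stated in the Methods text."""
    return (
        site.qual > 80                                        # quality score greater than 80
        and 2 <= site.mean_depth <= 12                        # drop depth < 2x or > 12x
        and minor_allele_frequency(site.genotypes) >= 0.05    # drop MAF < 5%
        and missing_fraction(site.genotypes) <= max_missing   # missing-data allowance
    )


def build_datasets(sites: List[Site]) -> Tuple[List[Site], List[Site]]:
    """Return (100p, 75p): no missing data allowed vs. up to 25% missing."""
    d100 = [s for s in sites if passes(s, max_missing=0.0)]
    d75 = [s for s in sites if passes(s, max_missing=0.25)]
    return d100, d75


example_sites = [
    Site(qual=90, mean_depth=6, genotypes=[0, 1, 2, 1]),     # complete data
    Site(qual=90, mean_depth=6, genotypes=[0, 1, None, 1]),  # 25% missing
    Site(qual=70, mean_depth=6, genotypes=[0, 1, 2, 1]),     # low quality
    Site(qual=90, mean_depth=15, genotypes=[0, 1, 2, 1]),    # depth too high
    Site(qual=90, mean_depth=6, genotypes=[0, 0, 0, 0]),     # monomorphic
]
d100, d75 = build_datasets(example_sites)
```

With these example records, only the first site would enter the 100p dataset, while the first two would enter the 75p dataset. Every site in the 100p dataset also satisfies the 75p criterion, so under these assumptions the 100p dataset is a subset of the 75p dataset.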