Animal coloration is one of the most conspicuous phenotypic traits in natural populations and has important implications for adaptation and speciation. Changes in coloration can occur over surprisingly short evolutionary timescales, while recurrence of similar colour patterns across large phylogenetic distances is also common. Even though the genetic basis of pigment production is well understood, little is known about the mechanisms regulating colour patterning. In this study, we shed light on the molecular elements regulating regional pigment production in two genetically near-identical crow taxa with striking differences in a eumelanin-based phenotype: black carrion crows and grey-coated hooded crows. We produced a high-quality genome annotation and analysed transcriptome data from a 2 × 2 design of active melanogenic feather follicles from head (black in both taxa) and torso (black in carrion crows and grey in hooded crows). Extensive, parallel expression differences between body regions in both taxa, enriched for melanogenesis genes (e.g. ASIP, CORIN, and ALDH6), indicated the presence of cryptic prepatterning also in all-black carrion crows. Meanwhile, colour-specific expression (grey vs. black) was limited to a small number of melanogenesis genes in close association with the central transcription factor MITF (most notably HPGDS, NDP and RASGRF1). We conclude that colour pattern differences between the taxa likely result from an interaction between divergence in upstream elements of the melanogenesis pathway and genes that provide an underlying prepattern across the body through positional information. A model of evolutionarily stable prepatterns that can be exposed and masked through simple regulatory changes may explain the phylogenetically independent recurrence of colour patterns that is observed across corvids and many other vertebrate groups.
Hooded crow genome annotation v2.7A
Whole-genome annotation file for hooded crow (Corvus [corone] cornix) in GFF format.
C.c.cornix_annotation_v2.7A.gff
RSEM output (gene counts)
Contains the gene counts for each library as produced by the RSEM software. For each tissue, a separate file is provided, in which columns are libraries (with the library identifiers corresponding to those in the file containing metadata by library) and rows are genes (with gene identifiers as in the annotation file).
RSEM.output.tar.gz
PCA input
A matrix with the input data (normalized gene counts) for the PCA that is presented in the paper. Columns are libraries (with the library identifiers corresponding to those in the file containing metadata by library) and rows are genes (with gene identifiers as in the annotation file).
PCA.input.txt
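As a rough illustration of how a genes-by-libraries matrix like this can be turned into principal-component scores per library, here is a minimal standard-library sketch. It is not the authors' procedure: the toy values are made up, no scaling or transformation is applied, and power iteration is used only to keep the example dependency-free (a numerical library would normally be used instead). The function name `first_pc_scores` is hypothetical.

```python
import math

def first_pc_scores(matrix, iterations=200):
    """Return the PC1 score of each library.

    matrix: list of gene rows, each a list with one value per library
    (rows are genes, columns are libraries, as in the PCA input file).
    """
    n_lib = len(matrix[0])
    # Center each gene across libraries.
    centered = []
    for row in matrix:
        mean = sum(row) / n_lib
        centered.append([v - mean for v in row])
    # Library-by-library Gram matrix of the centered data.
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n_lib)]
            for i in range(n_lib)]
    # Power iteration for the leading eigenvector of the Gram matrix.
    vec = [float(i + 1) for i in range(n_lib)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n_lib))
               for i in range(n_lib)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            return [0.0] * n_lib  # no variation between libraries
        vec = [v / norm for v in nxt]
    # Eigenvalue via the Rayleigh quotient; scores are sqrt(eigenvalue) * vec.
    eigenvalue = sum(vec[i] * sum(gram[i][j] * vec[j] for j in range(n_lib))
                     for i in range(n_lib))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [v * scale for v in vec]

# Made-up toy matrix: 3 genes (rows) x 4 libraries (columns).
toy = [
    [10.0, 11.0, 2.0, 1.0],
    [5.0, 6.0, 1.0, 0.0],
    [3.0, 3.0, 3.0, 3.0],
]
scores = first_pc_scores(toy)
print(scores)
```

In the toy matrix the first two libraries differ from the last two, so their PC1 scores fall on opposite sides of zero. The overall sign of a principal component is arbitrary.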
Differential expression results (edgeR and EBSeq output)
Contains files with the results of the differential expression analyses conducted with edgeR and EBSeq. A separate file is provided for each program and each comparison. Comparisons were made between carrion and hooded crow for each of the tissues: forebrain, liver, gonads, skin_bodypool (skin from torso), and skin_headpool (skin from head), and between body regions for each of the taxa: skin_CC (carrion crow skin) and skin_CX (hooded crow skin). For example, the file "ebseq_liver.txt" presents the results of the carrion vs. hooded crow comparison in liver, performed with EBSeq. In each file, the columns are: gene_id (gene identifiers as in the annotation file), FDR (False Discovery Rate) and LFC (Log-Fold Change).
edgeR.and.ebseq.output.tar.gz
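The three-column layout described above (gene_id, FDR, LFC) lends itself to a simple filter. The following is a minimal sketch, assuming the files are tab-delimited with a header row naming those three columns. The delimiter, the 0.05 FDR cutoff, the function name `significant_genes`, and the example rows are all assumptions made for illustration, not values taken from the study.

```python
import csv
import io

def significant_genes(handle, fdr_cutoff=0.05, min_abs_lfc=0.0):
    """Return {gene_id: (FDR, LFC)} for rows passing both thresholds.

    Assumes a tab-delimited table with header columns gene_id, FDR, LFC.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = {}
    for row in reader:
        try:
            fdr = float(row["FDR"])
            lfc = float(row["LFC"])
        except ValueError:
            continue  # skip rows with non-numeric entries such as "NA"
        if fdr <= fdr_cutoff and abs(lfc) >= min_abs_lfc:
            hits[row["gene_id"]] = (fdr, lfc)
    return hits

# Made-up example tables standing in for one edgeR and one EBSeq file.
edger_text = (
    "gene_id\tFDR\tLFC\n"
    "geneA\t0.001\t2.5\n"
    "geneB\t0.20\t0.3\n"
    "geneC\t0.03\t-1.2\n"
    "geneD\tNA\tNA\n"
)
ebseq_text = (
    "gene_id\tFDR\tLFC\n"
    "geneA\t0.01\t2.1\n"
    "geneC\t0.40\t-0.9\n"
)

edger_hits = significant_genes(io.StringIO(edger_text))
ebseq_hits = significant_genes(io.StringIO(ebseq_text))

# Genes called significant by both programs in this toy example.
both = sorted(set(edger_hits) & set(ebseq_hits))
print(both)  # ['geneA']
```

With real data, the `io.StringIO` objects would be replaced by opened files extracted from the archive, for example `open("ebseq_liver.txt", newline="")`.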
Metadata by individual
Metadata for each crow sampled in this study, including population of origin, GPS coordinates, dates of feather plucking, dates of sampling, times of day of sampling, weight at sampling, approximate age at sampling, and number of days of feather regrowth until sampling.
metadata_individuals.txt
Metadata by library
Metadata for each RNA-seq library used in this study, including the tissue and individual (see the metadata by individual file for more info on each individual) from which the library is derived, and the SRA accession numbers. Columns: "library_ID" (library identifier 1), "library_code" (library identifier 2), "individual_short" (individual ID, same as in metadata by individual file), "tissue_long" (name of tissue), "use_for_DE" (whether the sample was used for differential expression analysis (if TRUE) or only for annotation improvement (if FALSE)), "Biosample_ID" (SRA Biosample identifier), "SRR_ID" (SRA run ID), "SRS_ID" (SRA sample ID), "sample".
metadata_libraries.txt
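The column descriptions above suggest a simple way to list the SRA runs that entered the differential expression analysis. The sketch below assumes the metadata file is tab-delimited with a header row and that use_for_DE holds the literal strings TRUE or FALSE. The delimiter, the function name `srr_ids_for_de`, and every identifier and tissue value in the example table are made-up placeholders.

```python
import csv
import io

def srr_ids_for_de(handle, tissue=None):
    """Return SRR run IDs of libraries flagged use_for_DE == TRUE.

    If tissue is given, keep only libraries whose tissue_long matches it.
    Assumes a tab-delimited table with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    ids = []
    for row in reader:
        if row["use_for_DE"].strip().upper() != "TRUE":
            continue  # library used only for annotation improvement
        if tissue is not None and row["tissue_long"] != tissue:
            continue
        ids.append(row["SRR_ID"])
    return ids

# Made-up placeholder rows; real IDs are in metadata_libraries.txt.
example = (
    "library_ID\tindividual_short\ttissue_long\tuse_for_DE\tSRR_ID\n"
    "lib1\tind1\tliver\tTRUE\tSRR0000001\n"
    "lib2\tind1\tforebrain\tTRUE\tSRR0000002\n"
    "lib3\tind2\tliver\tFALSE\tSRR0000003\n"
)

all_de = srr_ids_for_de(io.StringIO(example))
liver_de = srr_ids_for_de(io.StringIO(example), tissue="liver")
print(all_de)    # ['SRR0000001', 'SRR0000002']
print(liver_de)  # ['SRR0000001']
```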
TopHat and Cufflinks script
This script downloads our RNA-seq data from SRA (using the SRR IDs listed in the metadata by library file), maps the reads to the genome with TopHat (the genome FASTA file must be in the working directory), and then runs Cufflinks.
tophat_script.sh