The color patterns of African cichlid fishes provide notable examples of phenotypic convergence. Across the more than 1200 East African rift lake species, melanic horizontal stripes have evolved numerous times. We discovered that regulatory changes of the gene agouti-related peptide 2 (agrp2) act as molecular switches controlling this evolutionarily labile phenotype. Reduced agrp2 expression is convergently associated with the presence of stripe patterns across species flocks. However, cis-regulatory mutations are not predictive of stripes across radiations, suggesting independent regulatory mechanisms. Genetic mapping confirms the link between the agrp2 locus and stripe patterns. The crucial role of agrp2 is further supported by a CRISPR-Cas9 knockout that reconstitutes stripes in a nonstriped cichlid. Thus, we unveil how a single gene affects the convergent evolution of a complex color pattern.
Agrp2_interval_reference_Pundamilia_nyererei
Reference sequence of the fine-mapped QTL interval. The sequence is based on the Pundamilia nyererei genome assembly (Brawand et al. 2014) and was fully corrected by Sanger sequencing, which filled gaps in the assembly and corrected nucleotide variants.
Agrp2_Enhancer_sequences_and_trees
Gene tree of the agrp2 enhancer (enh.a), reconstructed in BEAST for Oreochromis niloticus and for the same 24 Great Lake cichlid species in which the association between agrp2 expression and the presence of stripes was examined. To infer the locus history, we ran the analysis five times for 20 million generations each and discarded the trees sampled during the first 10 million generations of each run as burn-in. We then combined the post-burn-in trees to estimate the posterior probabilities of the nodes subtending the 24 Great Lake cichlids for this single locus. The zip file contains the alignment of the 25 species (Agrp2_Enhancer_sequences.nex) and the gene trees from the 5 runs (Agrp2_Enhancer_trees_*run.trees).
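The burn-in and pooling steps can be illustrated with a minimal Python sketch. It assumes the standard BEAST tree-log layout, in which each sample is written on one line as `tree STATE_<generation> = ...;` in Newick format, and it takes the posterior probability of a clade to be the fraction of pooled post-burn-in trees that contain it. All function names are hypothetical, unquoted tip labels are assumed, and clades are reported with the numeric labels of each file's translate block rather than species names. This is an illustration only, not the authors' pipeline; BEAST's own LogCombiner and TreeAnnotator tools are the usual way to do this.

```python
import glob
import re
from collections import Counter
from pathlib import Path

BURNIN = 10_000_000  # generations discarded as burn-in in each run

# Assumed BEAST tree-log sample line: "tree STATE_<n> = [&R] <newick>;"
TREE_LINE = re.compile(r"^\s*tree\s+STATE_(\d+)\s*=\s*(?:\[[^\]]*\]\s*)?(.+);",
                       re.IGNORECASE)

def post_burnin_trees(nexus_text, burnin=BURNIN):
    """Yield Newick strings for trees sampled at generation >= burnin."""
    for line in nexus_text.splitlines():
        match = TREE_LINE.match(line)
        if match and int(match.group(1)) >= burnin:
            yield match.group(2)

def clades(newick):
    """Return every clade of a rooted Newick tree as a frozenset of tip labels."""
    newick = re.sub(r"\[[^\]]*\]", "", newick)   # drop [&...] annotations
    newick = re.sub(r":[^,();]+", "", newick)    # drop branch lengths
    stack, found, previous = [[]], set(), None
    for token in re.findall(r"[(),]|[^(),;\s]+", newick):
        if token == "(":
            stack.append([])                     # open a new clade
        elif token == ")":
            tips = stack.pop()                   # close clade, record its tips
            found.add(frozenset(tips))
            stack[-1].extend(tips)               # tips also belong to the parent
        elif token != "," and previous != ")":
            stack[-1].append(token)              # tip label (internal labels skipped)
        previous = token
    return found

def clade_posteriors(runs, burnin=BURNIN):
    """Pool post-burn-in trees from several runs; map clade -> posterior probability."""
    counts, n_trees = Counter(), 0
    for text in runs:
        for newick in post_burnin_trees(text, burnin):
            counts.update(clades(newick))
            n_trees += 1
    return {clade: count / n_trees for clade, count in counts.items()}

if __name__ == "__main__":
    files = sorted(glob.glob("Agrp2_Enhancer_trees_*run.trees"))
    posteriors = clade_posteriors(Path(f).read_text() for f in files)
    # Print the ten best-supported clades (the root clade is always 1.0).
    for clade, prob in sorted(posteriors.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{prob:.3f}  {sorted(clade)}")
```

Reading each run as a whole string keeps the sketch short; for very large tree logs, iterating over the file line by line would use less memory.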
Gene_Trees_6_genes_dataset
To examine the evolutionary association between agrp2 expression and the presence of stripes, we reconstructed the phylogenetic relationships among 24 focal cichlid species with or without stripes from Lakes Victoria, Malawi and Tanganyika. Oreochromis niloticus was included as an outgroup. Evolutionary relationships were estimated in a species tree framework using three data sets. First, we estimated the species tree from six loci (ND2, mitfa, mitfb, lws, s7 and rag1). This 3474 bp data matrix combined sequences available from GenBank with newly generated Sanger sequences and was used to reconstruct the species tree in BEAST. The analysis was run five times for 20 million generations each, and the trees sampled during the first 10 million generations of each run were discarded as burn-in. The zip file contains the alignment of the 25 species and 6 loci (Gene_Sequences_*.nex), a table of stripe phenotypes and relative agrp2 gene expression (Matrix_expression_stripes_6_genes.csv) and the gene trees from the 5 runs (Gene_Trees_6_genes_*run.trees).
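As a quick sanity check on the alignments, the minimal Python sketch below reads the NTAX and NCHAR values from NEXUS DIMENSIONS statements and sums the number of aligned sites. It assumes, without confirmation from the dataset, that each locus is stored in its own Gene_Sequences_*.nex file with its own DIMENSIONS statement; under that assumption the summed site count would be expected to equal the 3474 bp matrix described above. The function names are hypothetical.

```python
import glob
import re
from pathlib import Path

# Fields of a NEXUS DIMENSIONS statement: "DIMENSIONS NTAX=<taxa> NCHAR=<sites>;"
_FIELDS = {key: re.compile(rf"\b{key}\s*=\s*(\d+)", re.IGNORECASE)
           for key in ("NTAX", "NCHAR")}

def dimensions(nexus_text):
    """Return (ntax, nchar) from the first NTAX= and NCHAR= found in a NEXUS file."""
    values = []
    for key, pattern in _FIELDS.items():
        match = pattern.search(nexus_text)
        if match is None:
            raise ValueError(f"{key}= not found")
        values.append(int(match.group(1)))
    return values[0], values[1]

def total_sites(nexus_texts):
    """Sum NCHAR over several alignments (assumed: one alignment per locus)."""
    return sum(dimensions(text)[1] for text in nexus_texts)

if __name__ == "__main__":
    paths = sorted(glob.glob("Gene_Sequences_*.nex"))
    for p in paths:
        ntax, nchar = dimensions(Path(p).read_text())
        print(f"{p}: {ntax} taxa, {nchar} sites")
    print("total aligned sites:", total_sites(Path(p).read_text() for p in paths))
```

Per-locus taxon counts are printed rather than required to match, since loci assembled partly from GenBank may not cover every species.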
Gene_Trees_30_genes_dataset
To examine the evolutionary association between agrp2 expression and the presence of stripes, we reconstructed the phylogenetic relationships among 24 focal cichlid species with or without stripes from Lakes Victoria, Malawi and Tanganyika, with Oreochromis niloticus as an outgroup. To obtain an independent phylogenetic hypothesis for testing this association, we constructed a phylogenetic matrix for 22 of the 25 species in the Gene_Trees_6_genes_dataset from genomic sequences of 30 additional genes (arnt, azin1a, bcanb, bmp4, cspg5a, ctnnb1, dlx1a, dlx2a, dlx3b, dlx4a, dlx4b, edn2, fzd6, irx1a, irx2a, irx4a, isl1, myg1, ndrg1b, nrp2b, ntrk1, osr2, pitx1, pitx2, pitx3, shh, smo, sostdc1a, tuft1a and wnt4b), obtained from published genomes and from target enrichment data. As for the six-gene dataset, the analysis was run five times for 20 million generations each, and the trees sampled during the first 10 million generations of each run were discarded as burn-in. The zip file contains the alignment of the 22 species and 30 loci (Gene_Sequences_*.nex), a table of stripe phenotypes and relative agrp2 gene expression (Matrix_expression_stripes_30_genes.csv) and the gene trees from the 5 runs (Gene_Trees_30_genes_*run.trees).
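To inspect the expression table, a minimal Python sketch can read the CSV and compare mean relative agrp2 expression between striped and nonstriped species. The column names and the 1/0 coding of the stripe phenotype are assumptions, not taken from the dataset; check them against the header of Matrix_expression_stripes_30_genes.csv (or the six-gene equivalent) before use. The comparison ignores the species trees, so it is only a descriptive summary and not a substitute for the phylogenetically informed analysis described above.

```python
import csv
from pathlib import Path
from statistics import mean

# Hypothetical column names and coding; adjust to the real CSV header.
SPECIES_COL = "species"
STRIPE_COL = "stripes"              # assumed: "1" = stripes present, "0" = absent
EXPRESSION_COL = "agrp2_expression"

def load_matrix(lines):
    """Map species -> (has_stripes, relative agrp2 expression) from CSV lines."""
    matrix = {}
    for row in csv.DictReader(lines):
        matrix[row[SPECIES_COL]] = (row[STRIPE_COL].strip() == "1",
                                    float(row[EXPRESSION_COL]))
    return matrix

def mean_expression_by_phenotype(matrix):
    """Return (mean expression of striped species, mean of nonstriped species)."""
    striped = [expr for has_stripes, expr in matrix.values() if has_stripes]
    plain = [expr for has_stripes, expr in matrix.values() if not has_stripes]
    return mean(striped), mean(plain)   # raises StatisticsError if a group is empty

if __name__ == "__main__":
    path = Path("Matrix_expression_stripes_30_genes.csv")
    if path.exists():
        with path.open(newline="") as handle:
            striped, plain = mean_expression_by_phenotype(load_matrix(handle))
        print(f"striped: {striped:.3f}  nonstriped: {plain:.3f}")
```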