Genetic introgression is pervasive in nature and may lead to large-scale phenotypic assimilation and/or admixture of populations, but there is limited knowledge on whether large phenotypic changes are typically accompanied by high levels of introgression throughout the genome. Using bioacoustic, biometric and spectrophotometric data from a flycatcher (Tyrannidae) system in the Neotropical genus Zimmerius, we document a mosaic pattern of phenotypic admixture in which a population of Z. viridiflavus in northern Peru (henceforth ‘mosaic’) is vocally and biometrically similar to conspecifics to the south but shares plumage characteristics with a different species (Z. chrysops) to the north. To clarify the origins of the mosaic population, we used the RAD-seq approach to generate a dataset of 37,361 genome-wide SNPs. A range of population-genetic diagnostics shows that the genome of the mosaic population is largely indistinguishable from southern Z. viridiflavus and distinct from northern Z. chrysops, and the application of parsimony and species tree methods to the genome-wide SNP dataset confirms the close affinity of the mosaic population with southern Z. viridiflavus. Even so, using a subset of 2710 SNPs found across all sampled lineages in configurations appropriate for a recently proposed statistical (‘ABBA/BABA’) test that distinguishes gene flow from incomplete lineage sorting, we detected low levels of gene flow from northern Z. chrysops into the mosaic population. Mapping the candidate loci for introgression from Z. chrysops into the mosaic population to the zebra finch genome reveals close linkage with genes significantly enriched in functions involving cell projection and plasma membranes. Introgression of key alleles may have led to phenotypic assimilation in the plumage of mosaic birds, suggesting that selection may have been a key factor facilitating introgression.
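The ABBA/BABA test mentioned above compares counts of two discordant site patterns across four lineages. The following is a minimal sketch of that idea in Python, not the authors' script (their analysis is in Supplementary File 5). It assumes, purely for illustration, the ordering P1 = southern Z. viridiflavus, P2 = mosaic, P3 = Z. chrysops, plus an outgroup, with one allele per lineage at each biallelic site. The function names and the example counts are hypothetical.

```python
def classify_site(p1, p2, p3, outgroup):
    """Label one biallelic site as 'ABBA', 'BABA', or None.

    Assumed ordering (illustrative): P1 = southern, P2 = mosaic,
    P3 = chrysops; the outgroup allele is treated as ancestral ('A').
    """
    if len({p1, p2, p3, outgroup}) != 2:
        return None  # not biallelic across the four lineages
    if p3 == outgroup:
        return None  # P3 must carry the derived allele
    if p1 == outgroup and p2 == p3:
        return "ABBA"  # P2 shares the derived allele with P3
    if p2 == outgroup and p1 == p3:
        return "BABA"  # P1 shares the derived allele with P3
    return None


def patterson_d(n_abba, n_baba):
    """D = (ABBA - BABA) / (ABBA + BABA); near 0 under lineage sorting alone."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (n_abba - n_baba) / total


# Toy counts, not values from the study.
d_example = patterson_d(60, 40)
```

Under this assumed ordering, an excess of ABBA over BABA sites (D > 0) would point to allele sharing between the mosaic population and Z. chrysops beyond what incomplete lineage sorting predicts. Assessing significance requires a resampling step that is not shown here.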
Supp_File_1
SUPPLEMENTARY FILE 1. Custom script (‘tetracolor’) by Rafael Maia in R
Supp_File_2
SUPPLEMENTARY FILE 2. Names (FASTA sequence labels) of the chromosome sequences from the zebra finch genome used in this study.
Supp_File_3
SUPPLEMENTARY FILE 3. Tabular BLASTN results for the Zimmerius contigs against the zebra finch genome, retaining only hits with e-value ≤ 1e-20.
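For readers who want to apply the same e-value cutoff to their own tabular BLASTN output, the sketch below shows one way to do it in Python. It assumes the default 12-column tabular layout, in which the e-value is the 11th column; whether Supplementary File 3 uses exactly that layout is an assumption. The function name and the two example rows are hypothetical.

```python
import csv
import io

EVALUE_COLUMN = 10   # 0-based index; assumes default 12-column tabular output
MAX_EVALUE = 1e-20   # cutoff stated for Supplementary File 3


def filter_hits(handle, max_evalue=MAX_EVALUE):
    """Return the rows whose e-value is at or below the cutoff."""
    reader = csv.reader(handle, delimiter="\t")
    return [
        row
        for row in reader
        if row
        and not row[0].startswith("#")          # skip comment lines
        and float(row[EVALUE_COLUMN]) <= max_evalue
    ]


# Two made-up rows: the first passes the cutoff, the second does not.
toy = io.StringIO(
    "contig_1\tchr1\t95.0\t100\t5\t0\t1\t100\t500\t599\t1e-30\t150\n"
    "contig_2\tchr2\t80.0\t60\t12\t0\t1\t60\t900\t959\t2e-05\t40\n"
)
hits = filter_hits(toy)
```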
Supp_File_6
SUPPLEMENTARY FILE 6. NEXUS file for parsimony analysis in PAUP.
Supp_File_7
SUPPLEMENTARY FILE 7. R script for gene ontology enrichment analysis.
Supp_File_8
SUPPLEMENTARY FILE 8. Estimates of theta for various datasets and priors. Each of six datasets was subjected to two sets of alpha and beta parameters allowing for different current and ancestral population sizes: (1) a gamma (2,2000) prior (with mean θ = 0.001) for small population sizes, and (2) a gamma (1,10) prior (with mean θ = 0.1) for large population sizes. Three datasets (called ‘allLoci’; n = 954 SNPs) include all SNPs called across all individuals (see Methods), while the three remaining datasets (n = 947) have constant heterozygotes removed; in each of these two groups of datasets, the first dataset includes all lineages, while the other two datasets have one lineage (‘southern’ Z. viridiflavus or ‘intermediate’ mosaic birds) removed. Each theta stands for one of the nodes in the tree. Note how thetas are heavily influenced by the prior.
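The values 0.001 and 0.1 quoted for the two priors match the mean of a gamma distribution written as gamma(alpha, beta) with beta as a rate parameter, so that the mean is alpha / beta. The short Python sketch below checks this arithmetic. The rate parameterisation is an assumption inferred from the quoted values, and the function names are hypothetical.

```python
def gamma_mean(alpha, beta):
    """Mean of a gamma(alpha, beta) distribution, with beta taken as a rate."""
    return alpha / beta


def gamma_variance(alpha, beta):
    """Variance of the same distribution under the rate parameterisation."""
    return alpha / beta ** 2


small_prior_mean = gamma_mean(2, 2000)  # prior for small population sizes
large_prior_mean = gamma_mean(1, 10)    # prior for large population sizes
```

The two prior means differ by two orders of magnitude, which is consistent with the note that the theta estimates are heavily influenced by the prior.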
Supp_File_9
SUPPLEMENTARY FILE 9. A summary of the mapped ABBA/BABA contig statistics, produced by running the script in Supplementary File 5.
Supp_File_10
SUPPLEMENTARY FILE 10. Excel file with six sheets showing associations between ABBA and BABA-like sites with gene ontology (GO) terms from the zebra finch genome. The first three sheets refer to ABBA-like sites; the last three sheets refer to BABA-like sites. Sheets 1 and 4 give associations with GO terms related to cellular components (“cc”), sheets 2 and 5 refer to GO terms related to biological processes (“bp”), and sheets 3 and 6 refer to GO terms related to molecular function (“mf”). In each sheet, GO terms are accompanied by the number of zebra finch genes they are annotated to (“annotated”), followed by the number of ABBA or BABA-linked genes they are annotated to (“significant”), followed by the number of ABBA or BABA-linked genes they would be expected to be annotated to by chance (“expected”), followed by the corrected p value for their over-representation in the ABBA or BABA-linked gene set (“corrected”). The only gene set showing significant (p < 0.05) over-representation of any GO term is the ABBA set for cellular components (sheet 1).
Supp_File_4
SUPPLEMENTARY FILE 4. The output from VarScan used as our SNP datafile for downstream analysis.
Supp_File_5.tar
SUPPLEMENTARY FILE 5. Our main analysis script for population-genetic and introgression analyses, compiled by P.R.W. (requires Python 2.6+ and Biopython). Includes all required accompanying data files. Best run interactively.
zimmerius_prg.fasta
This is the Pseudo-Reference Genome (PRG) assembly file.
sample_1_to_12_mpileup
This is the pileup file (see VarScan application).
Supplementary Tables
Supplementary tables
Fig_S1
FIGURE S1. Fraction of missing SNP calls in each individual.
Fig_S2
FIGURE S2. Number of 100-bp Illumina reads per individual. Individual labels refer to those used in Table S4 and Figure 5.
Fig_S3
FIGURE S3. STRUCTURE plots showing 10 in-group individuals for three replicated runs (‘rep’) using all available SNPs polymorphic in the ingroup lineages that mapped to the zebra finch genome (n = 9525) with K = 2 (upper panel) and K = 3 (lower panel).
Fig_S4
FIGURE S4. STRUCTURE plots showing 10 in-group individuals for three replicated runs (‘rep’) in which we excluded SNPs whose coverage amongst called individuals was greater than two standard deviations above the mean (n = 32,103) with K = 2 (upper panel) and K = 3 (lower panel).
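The coverage filter described for Figure S4 can be sketched as follows. This is an illustration only, not the authors' code: it assumes one coverage value per SNP and uses the sample standard deviation, and the function name and example values are hypothetical.

```python
import statistics


def drop_high_coverage(snp_coverage, n_sd=2.0):
    """Keep SNPs whose coverage is at most mean + n_sd * SD.

    snp_coverage maps a SNP identifier to its coverage among called
    individuals (an assumed data layout). The sample SD is used here.
    """
    values = list(snp_coverage.values())
    threshold = statistics.mean(values) + n_sd * statistics.stdev(values)
    return {snp: cov for snp, cov in snp_coverage.items() if cov <= threshold}


# Made-up coverages with one obvious outlier (snp_10).
toy_coverage = {
    "snp_1": 10, "snp_2": 12, "snp_3": 11, "snp_4": 9, "snp_5": 10,
    "snp_6": 11, "snp_7": 10, "snp_8": 12, "snp_9": 9, "snp_10": 100,
}
kept = drop_high_coverage(toy_coverage)
```

Unusually high coverage can indicate collapsed paralogous or repetitive regions, which is one common motivation for a filter of this kind.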
Fig_S5
FIGURE S5. Parsimony consensus tree (outgroups not shown) using a stepmatrix in which each genotype is one step from the adjacent one sharing one allele, with heterozygotes being one step from each homozygote but the two homozygotes being two steps from each other; bootstrap support values at nodes are based on 2000 replicates; only bootstrap values >50 are shown.
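The stepmatrix described for Figure S5 can be written out explicitly. The Python sketch below builds it for a single biallelic SNP with hypothetical genotype labels AA, AB and BB, taking the cost of a change to be the difference in the number of copies of allele B. The actual matrix used in PAUP is in Supplementary File 6, and the names here are illustrative.

```python
GENOTYPES = ("AA", "AB", "BB")  # hypothetical labels for one biallelic SNP


def step_cost(g1, g2):
    """Number of steps between two genotypes: difference in B-allele count."""
    return abs(g1.count("B") - g2.count("B"))


# Rows and columns follow the order of GENOTYPES.
STEPMATRIX = [[step_cost(a, b) for b in GENOTYPES] for a in GENOTYPES]

for genotype, row in zip(GENOTYPES, STEPMATRIX):
    print(genotype, row)
```

Each heterozygote is one step from either homozygote, and the two homozygotes are two steps apart, so a change from AA to BB costs the same as passing through AB.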
Fig_S6
FIGURE S6. (a) Number of contigs mapped against zebra finch chromosomes plotted versus chromosome length; fewer contigs mapped against the Z chromosome (blue) than expected by chromosome length. (b) Number of contigs that contain a SNP with an ABBA or BABA pattern mapped against zebra finch chromosomes versus chromosome length; far fewer ABBA-BABA contigs mapped against the Z chromosome (blue) than expected by chromosome length. (c) Number of contigs that contain an ABBA or BABA pattern versus number of total contigs for each zebra finch chromosome they mapped against; fewer ABBA-BABA contigs mapped to the Z chromosome (blue) than expected. (d) Frequency plot giving the number of ABBA-BABA contigs mapped against each zebra finch chromosome per contigs mapped.
Fig_S7
FIGURE S7. Illustration of an example case of ancient population subdivision that could produce the same data signal as genetic introgression between mosaic birds and Z. chrysops. Note that substantial levels of persistent ancestral subdivision (across speciation events) are needed to account for such patterns of asymmetry of tree signal (Slatkin and Pollard 2008). Given the turbulent Pleistocene evolutionary history of most Andean birds, this alternative scenario is unlikely.