Dryad

High heterogeneity in genomic differentiation between phenotypically divergent songbirds: A test of mitonuclear co-introgression

Citation

Nikelski, Ellen; Irwin, Darren; Rubtsov, Alexander (2023), High heterogeneity in genomic differentiation between phenotypically divergent songbirds: A test of mitonuclear co-introgression, Dryad, Dataset, https://doi.org/10.5061/dryad.tmpg4f538

Abstract

Comparisons of genomic variation among closely related species often show more differentiation in mitochondrial DNA (mtDNA) and sex chromosomes than in autosomes, a pattern expected from the differing effective population sizes and evolutionary dynamics of these genomic components. Yet introgression can cause species pairs to deviate dramatically from general differentiation trends. The yellowhammer (Emberiza citrinella) and pine bunting (E. leucocephalos) are hybridizing avian sister species that differ greatly in appearance and moderately in nuclear DNA but show no mtDNA differentiation. This discordance is best explained by adaptive mtDNA introgression, a process that can select for co-introgression at nuclear genes with mitochondrial functions (mitonuclear genes). To better understand these discordant differentiation patterns and to characterize nuclear differentiation in this system, we investigated genome-wide differentiation between allopatric yellowhammers and pine buntings and compared it with the patterns previously observed in mtDNA. We found significant nuclear differentiation that was highly heterogeneous across the genome, with a particularly wide differentiation peak on the Z sex chromosome. We further investigated mitonuclear gene co-introgression between yellowhammers and pine buntings and found support for this process in the direction of pine buntings into yellowhammers. Genomic signals indicative of co-introgression were common in mitonuclear genes coding for subunits of the mitoribosome and of the electron transport chain complexes. Such introgression of mtDNA and mitonuclear genes provides a possible explanation for the high heterogeneity in genomic differentiation seen among some species groups.

Methods

Full methods can be found in the linked 2023 Heredity paper, with additional details provided in the attached scripts and text files. Briefly, DNA was extracted from blood and tissue samples taken from individual birds and then sequenced using a genotyping-by-sequencing (GBS) approach. Sequence reads were processed using the scripts provided in the "GBS_Reads_processing_to_VCF.txt" file and then analyzed using the scripts provided in the "Heredity_Pub_PopulationGenetic_Analysis_Code.R" and "Heredity_Pub_Mitonuclear_Analysis_Code.R" files. The former R file contains the general population-genetic analyses, while the latter contains the mitonuclear analyses.

Usage notes

GBS_Reads_processing_to_VCF.txt

This text file contains the scripts used to convert raw Illumina GBS sequence reads into a genome-wide variant site file and into chromosome-specific "all sites" files that include both variant and invariant sites, all in "012NA" format. The resulting files were loaded into R and investigated with the genetic and mitonuclear analyses. These scripts were adapted from those used by Irwin et al. (2018; citation below), which are available on Dryad at: https://doi.org/10.5061/dryad.4j2662g
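For readers unfamiliar with the "012NA" encoding, the sketch below assumes it follows the VCFtools --012 convention (one tab-separated row per individual, a leading row index, genotypes coded as 0, 1 or 2 copies of the non-reference allele, and -1 for a missing call), with each -1 rewritten as NA so that R treats it as missing. This layout is an assumption, the function name is hypothetical, and the commands actually used are those in "GBS_Reads_processing_to_VCF.txt".

```python
def to_012na(line: str) -> str:
    """Rewrite one tab-separated .012 row so that missing calls (-1) become NA.

    Assumes a VCFtools-style row: a row index followed by one genotype per site.
    Only whole fields equal to "-1" are replaced; the row index is kept as-is.
    """
    fields = line.rstrip("\n").split("\t")
    index, genotypes = fields[0], fields[1:]
    converted = ["NA" if g == "-1" else g for g in genotypes]
    return "\t".join([index] + converted)


example = to_012na("0\t0\t1\t-1\t2")  # "0\t0\t1\tNA\t2"
```

Replacing whole fields rather than doing a blind text substitution keeps the row index and the other genotype codes untouched.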

GBS_Plate#_Barcods.txt

These four text files contain the nucleotide barcodes assigned to the DNA samples sequenced on each of the four GBS plates. This information is required to run the GBS read-processing scripts contained in "GBS_Reads_processing_to_VCF.txt".
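To illustrate why the barcode files are needed, the sketch below assigns a read to a sample when the read begins with that sample's barcode and then trims the barcode off. The two-column "sample barcode" line layout, the sample names, and the function names are assumptions for illustration only; they do not describe the actual barcode files or the demultiplexing step in "GBS_Reads_processing_to_VCF.txt".

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_barcodes(lines: Iterable[str]) -> Dict[str, str]:
    """Map barcode -> sample name from lines of '<sample> <barcode>' (hypothetical layout)."""
    table: Dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        sample, barcode = parts[0], parts[1].upper()
        table[barcode] = sample
    return table


def assign_read(seq: str, barcodes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (sample, read without barcode) for the longest matching barcode, or None."""
    seq = seq.upper()
    # Try longer barcodes first so that a short barcode which happens to be
    # a prefix of a longer one cannot claim the longer barcode's reads.
    for barcode in sorted(barcodes, key=len, reverse=True):
        if seq.startswith(barcode):
            return barcodes[barcode], seq[len(barcode):]
    return None


table = parse_barcodes(["bird_A CTCC", "bird_B TGCA", "bird_C CTCCG"])
hit = assign_read("CTCCGCAGCTTAA", table)  # ('bird_C', 'CAGCTTAA')
```

Checking longer barcodes first matters because GBS designs often use barcodes of different lengths on the same plate.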

Genome_wide_variant_site_files.zip

This zip file contains the files produced (.indv, .pos, .012NA) when using the scripts available in "GBS_Reads_processing_to_VCF.txt" to create a genome-wide variant site dataset (in "012NA" format). The produced files were loaded into R and analyzed using the scripts contained in "Heredity_Pub_Mitonuclear_Analysis_Code.R" and "Heredity_Pub_PopulationGenetic_Analysis_Code.R".
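As a sketch of how one such trio might be parsed outside R, the function below assumes the .indv file lists one sample ID per line, the .pos file lists a chromosome and a position per line, and each .012NA row starts with a row index followed by one genotype per site. These layouts are assumed from VCFtools --012 conventions, and the function name and example values are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def read_012na(
    indv_lines: Iterable[str],
    pos_lines: Iterable[str],
    geno_lines: Iterable[str],
) -> Tuple[List[str], List[Tuple[str, int]], List[List[float]]]:
    """Parse an .indv / .pos / .012NA trio (assumed VCFtools-style layout).

    Returns sample IDs, (chromosome, position) pairs, and a genotype matrix
    with one row per individual and one column per site; "NA" becomes NaN.
    """
    samples = [line.strip() for line in indv_lines if line.strip()]
    positions: List[Tuple[str, int]] = []
    for line in pos_lines:
        if line.strip():
            chrom, pos = line.split()[:2]
            positions.append((chrom, int(pos)))
    genotypes: List[List[float]] = []
    for line in geno_lines:
        if line.strip():
            fields = line.split()[1:]  # drop the leading row-index column
            genotypes.append([math.nan if g == "NA" else float(g) for g in fields])
    # The three files must agree on the number of samples and of sites.
    if len(genotypes) != len(samples):
        raise ValueError(".012NA row count does not match .indv")
    if any(len(row) != len(positions) for row in genotypes):
        raise ValueError(".012NA column count does not match .pos")
    return samples, positions, genotypes


samples, positions, geno = read_012na(
    ["bird1", "bird2"], ["chr1\t100", "chr1\t250"], ["0\t0\tNA", "1\t2\t1"]
)
```

To read real files, open file handles can be passed in place of the lists, since the function only iterates over lines.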

Chromosome_Info_site_files.zip

This zip file contains the files produced (.indv, .pos, .012NA) when using the scripts available in "GBS_Reads_processing_to_VCF.txt" to create chromosome-specific "all sites" datasets that contain both variant and invariant sites (in "012NA" format). Each chromosome analyzed in this research has its own trio of files. The "all sites" files were loaded into R and analyzed using the scripts contained in "Heredity_Pub_Mitonuclear_Analysis_Code.R" and "Heredity_Pub_PopulationGenetic_Analysis_Code.R".
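Invariant sites matter because per-site diversity statistics such as nucleotide diversity (π) are averages over all sequenced sites, not only over variable ones, so dropping invariant sites inflates them. The minimal sketch below, assuming diploid genotypes coded 0/1/2 with NaN for missing data, contrasts the two averages. The function names are hypothetical, and this is an illustration rather than the calculation performed by the authors' R functions.

```python
import math
from typing import List, Sequence


def site_pi(column: Sequence[float]) -> float:
    """Unbiased expected heterozygosity at one diploid site (NaN = missing call)."""
    calls = [g for g in column if not math.isnan(g)]
    n = 2 * len(calls)            # number of sampled allele copies
    if n < 2:
        return math.nan           # too little data to compare two alleles
    p = sum(calls) / n            # non-reference allele frequency
    return 2 * p * (1 - p) * n / (n - 1)


def mean_pi(genotypes: List[List[float]], variant_only: bool = False) -> float:
    """Average site_pi over sites with data; optionally keep only polymorphic sites."""
    n_sites = len(genotypes[0])
    values = [site_pi([row[j] for row in genotypes]) for j in range(n_sites)]
    values = [v for v in values if not math.isnan(v)]
    if variant_only:
        values = [v for v in values if v > 0]  # mimic a variant-only dataset
    return sum(values) / len(values) if values else math.nan


g = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # 2 birds x 4 sites
all_sites = mean_pi(g)                       # about 0.167
variant_sites = mean_pi(g, variant_only=True)  # about 0.667
```

With one variable site out of four, the all-sites mean is a quarter of the variant-only mean, which is why the chromosome-specific files keep invariant sites.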

Heredity_Pub_PopulationGenetic_Analysis_Code.R

This R file contains the R scripts used both to conduct the genetic analyses and to produce the figures found in the associated paper. Author comments accompany major lines of code to explain their purpose. The input files for these analyses can be found within "Genome_wide_variant_site_files.zip" and "Chromosome_Info_site_files.zip". To run these analyses, the "Genomics_R_functions_TajD.Ellen.Fixes.R" file must also be loaded into R. This file contains necessary functions that are applied in the analysis scripts. Also, two additional CSV files ("All_Plates_Fst_Allo_Sym.csv" and "Allopatric_Map_Info.csv") are needed to run certain parts of the analysis.

Heredity_Pub_Mitonuclear_Analysis_Code.R

This R file contains the R scripts used to conduct the mitonuclear analyses described in the associated paper. Author comments accompany major lines of code to explain their purpose. The input files for these analyses can be found within "Genome_wide_variant_site_files.zip" and "Chromosome_Info_site_files.zip". To run these analyses, the "Genomics_R_functions_TajD.Ellen.Fixes.R" file must also be loaded into R. This file contains necessary functions that are applied in the analysis scripts. Also, one additional CSV file ("All_Plates_Fst_Allo_Sym.csv") is needed to run certain parts of the analysis.

Genomics_R_functions_TajD.Ellen.Fixes.R

This R file contains the R functions used to conduct the genetic and mitonuclear analyses in the "Heredity_Pub_PopulationGenetic_Analysis_Code.R" and "Heredity_Pub_Mitonuclear_Analysis_Code.R" files. It was adapted from the functions file used by Irwin et al. (2018) but includes some modifications, as well as additional functions for calculating Tajima's D for the populations being analyzed. The scripts used by Irwin et al. (2018) are available on Dryad at: https://doi.org/10.5061/dryad.4j2662g
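For reference, the sketch below implements the standard formula for Tajima's D from the number of sampled sequences n (for diploid data, twice the number of genotyped individuals), the number of segregating sites S, and the average number of pairwise differences π summed over the region. It does not handle missing data, and the modified functions in "Genomics_R_functions_TajD.Ellen.Fixes.R" may compute D differently; the function name is hypothetical and the code is an illustration only.

```python
import math


def tajimas_d(n: int, segregating: int, pi_total: float) -> float:
    """Tajima's D for n sequences, S segregating sites and total pairwise diversity pi."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if segregating == 0:
        return math.nan  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = segregating / a1  # Watterson's estimator from S
    variance = e1 * segregating + e2 * segregating * (segregating - 1)
    return (pi_total - theta_w) / math.sqrt(variance)


d_excess = tajimas_d(10, 5, 3.0)   # more pairwise diversity than S predicts: D > 0
d_deficit = tajimas_d(10, 5, 1.0)  # less pairwise diversity than S predicts: D < 0
```

D is near zero when π matches Watterson's estimate; positive values point to an excess of intermediate-frequency variants and negative values to an excess of rare variants.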

All_Plates_Fst_Allo_Sym.csv

This CSV file contains descriptive metadata for the samples analyzed in this study, including the sex, phenotype, species, distribution, and collection location of each sample. It was used to sort samples into discrete groups that were then compared in the genetic and mitonuclear analyses contained in the "Heredity_Pub_PopulationGenetic_Analysis_Code.R" and "Heredity_Pub_Mitonuclear_Analysis_Code.R" files.
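The sketch below shows one way such metadata could drive group comparisons: samples are grouped by a metadata column, and non-reference allele frequencies are then computed per group from a 0/1/2 genotype matrix. The column names ("ID", "species"), group labels, sample IDs, and function names are all hypothetical; the real column headers are those in "All_Plates_Fst_Allo_Sym.csv".

```python
import csv
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def group_samples(csv_lines: Iterable[str], id_col: str, group_col: str) -> Dict[str, List[str]]:
    """Return {group label: [sample IDs]} from a metadata table with a header row."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        groups[row[group_col]].append(row[id_col])
    return dict(groups)


def allele_freqs(genotypes: List[List[float]], samples: List[str], members: List[str]) -> List[float]:
    """Per-site non-reference allele frequency among the listed samples (NaN = missing)."""
    index = {s: i for i, s in enumerate(samples)}
    rows = [genotypes[index[s]] for s in members if s in index]
    if not rows:
        raise ValueError("none of the group members are in the genotype matrix")
    freqs = []
    for j in range(len(rows[0])):
        calls = [row[j] for row in rows if not math.isnan(row[j])]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else math.nan)
    return freqs


meta = ["ID,species", "s1,yellowhammer", "s2,yellowhammer", "s3,pine_bunting"]
groups = group_samples(meta, "ID", "species")
samples = ["s1", "s2", "s3"]
genotypes = [[0, 2], [1, math.nan], [2, 2]]
yellowhammer_freqs = allele_freqs(genotypes, samples, groups["yellowhammer"])  # [0.25, 1.0]
```

Per-group allele frequencies like these are the usual starting point for between-group differentiation statistics.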

Allopatric_Map_Info.csv

This CSV file contains the information needed to create the map in Figure 1 of the associated paper using scripts in the "Heredity_Pub_PopulationGenetic_Analysis_Code.R" file. It lists the number and identities of the individuals collected at each sampling location, along with the latitude and longitude of each location. Some sampling locations were combined on the map because they were too close together to be shown accurately. Details on how sampling locations were combined are given in the associated paper.
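The associated paper describes how locations were actually combined. Purely to illustrate the idea, the sketch below greedily merges each site into the first cluster whose first site lies within a chosen distance (a hypothetical threshold), using great-circle (haversine) distances, and places each merged point at the count-weighted mean coordinate. The tuple layout, site names, threshold, and function names are assumptions, not the authors' procedure.

```python
import math
from typing import List, Tuple

Site = Tuple[str, float, float, int]  # (name, latitude, longitude, number of birds)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of mean Earth radius."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def merge_sites(sites: List[Site], max_km: float) -> List[Site]:
    """Greedily merge sites within max_km of a cluster's first site (hypothetical rule)."""
    clusters: List[List[Site]] = []
    for site in sites:
        for cluster in clusters:
            anchor = cluster[0]
            if haversine_km(anchor[1], anchor[2], site[1], site[2]) <= max_km:
                cluster.append(site)
                break
        else:
            clusters.append([site])  # no nearby cluster: start a new one
    merged: List[Site] = []
    for cluster in clusters:
        total = sum(s[3] for s in cluster)
        lat = sum(s[1] * s[3] for s in cluster) / total
        lon = sum(s[2] * s[3] for s in cluster) / total
        merged.append(("+".join(s[0] for s in cluster), lat, lon, total))
    return merged


sites = [("A", 55.0, 37.0, 3), ("B", 55.1, 37.1, 1), ("C", 60.0, 90.0, 5)]
merged = merge_sites(sites, max_km=50.0)  # A and B merge; C stays separate
```

The weighted mean of longitudes assumes no cluster straddles the 180° meridian, and each site is assumed to contribute at least one bird.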

References

Irwin, D. E., Milá, B., Toews, D. P. L., Brelsford, A., Kenyon, H. L., Porter, A. N., Grossen, C., Delmore, K. E., Alcaide, M., & Irwin, J. H. (2018). A comparison of genomic islands of differentiation across three young avian species pairs. Molecular Ecology, 27(23), 4839-4855. https://doi.org/10.1111/mec.14858

Funding

Natural Sciences and Engineering Research Council of Canada, Award: 03919

Natural Sciences and Engineering Research Council of Canada, Award: 507830

Natural Sciences and Engineering Research Council of Canada, Award: CGSM

University of British Columbia, Award: Werner and Hildegard Hesse Research Awards