Data from: Whole-genome phylogeography of the Blue-faced honeyeater (Entomyzon cyanotis) and discovery and characterization of a neo-Z chromosome

Citation

Edwards, Scott; Burley, John; Orzechowski, Sophie; Sin, Yun Wa (2022), Data from: Whole-genome phylogeography of the Blue-faced honeyeater (Entomyzon cyanotis) and discovery and characterization of a neo-Z chromosome, Dryad, Dataset, https://doi.org/10.5061/dryad.7pvmcvdvg

Abstract

Whole-genome surveys of genetic diversity and geographic variation often yield unexpected discoveries of novel structural variation, which long-read DNA sequencing can help clarify. Here we report on whole-genome phylogeography of a bird exhibiting classic vicariant geographies across Australia and New Guinea, the Blue-faced honeyeater (Entomyzon cyanotis), and the discovery and characterization of a novel neo-Z chromosome by long-read sequencing. Using short-read genome-wide SNPs, we inferred population divergence events within E. cyanotis across the Carpentarian and other biogeographic barriers during the Pleistocene (~0.3 – 1.7 MYA). Evidence for introgression between non-sister populations supports a hypothesis of reticulate evolution around a triad of dynamic barriers around Pleistocene Lake Carpentaria between Australia and New Guinea. During this phylogeographic survey, we discovered a large (134 Mbp) neo-Z chromosome and explore its diversity, divergence and introgression landscape. We show that, as in some Sylvioid passerine birds, a fusion occurred between chromosome 5 and the Z chromosome to form a neo-Z chromosome, with the ancestral pseudoautosomal region (PAR) appearing to become non-recombinant between Z and W, along with most of the fused chromosome 5 (~37.2 Mbp). The added non-recombinant portion of the neo-Z displays reduced heterozygosity and faster population genetic differentiation compared with the ancestral Z. Yet, the new PAR shows elevated diversity and reduced differentiation compared to autosomes, potentially resulting from introgression. In our case, long-read sequencing helped clarify the genomic landscape of population divergence on autosomes and sex chromosomes in a species where prior knowledge of genome structure was still incomplete.

Methods

We generated VCF files for downstream analyses using the GATK pipeline (McKenna et al. 2010) and samtools (Li et al. 2009). We generated estimates of heterozygosity and coverage across scaffolds with samtools. Sliding window population genetic statistics were generated using ANGSD and ngstools (Fumagalli et al. 2013, 2014; Korneliussen et al. 2014). pixy was used to calculate population statistics across windows (Korunes and Samuk 2021). We used SNAPP to generate a coalescent estimate of the population tree using SNPs (Bryant et al. 2012). We estimated migration surfaces with EEMS (Petkova et al. 2016). Satsuma was used to align contigs and scaffolds between species, sexes and different assemblies (Grabherr et al. 2010). We generated statistics for a phylogenetic network using TreeMix (Pickrell and Pritchard 2012).
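As an illustration of what a windowed diversity statistic involves, the sketch below computes nucleotide diversity (pi) per window as summed pairwise differences divided by summed pairwise comparisons, skipping missing calls. It follows the general idea described by Korunes and Samuk (2021) but is not pixy's code and not the pipeline used for this dataset. The function name `windowed_pi`, the input layout (position plus a list of allele calls, with `None` for missing), and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def windowed_pi(sites, window_size):
    """Return {window_start: pi} for non-overlapping windows.

    sites: iterable of (pos, alleles), where pos is a 1-based position and
    alleles is a list of allele codes (e.g. 0/1) with None for missing calls.
    Invariant sites should be included so that comparisons are counted.
    """
    diffs = defaultdict(int)   # pairwise differences per window
    comps = defaultdict(int)   # pairwise comparisons per window
    for pos, alleles in sites:
        called = [a for a in alleles if a is not None]
        # 1-based start coordinate of the window containing this site
        start = ((pos - 1) // window_size) * window_size + 1
        for a, b in combinations(called, 2):
            comps[start] += 1
            if a != b:
                diffs[start] += 1
    # Windows with no valid comparisons are omitted rather than reported as 0
    return {s: diffs[s] / comps[s] for s in sorted(comps) if comps[s] > 0}


# Hypothetical toy data: four sampled chromosomes at four sites
toy_sites = [
    (1, [0, 0, 0, 0]),          # invariant site
    (2, [0, 1, 0, 1]),          # variable site
    (3, [0, 1, None, None]),    # variable site with two missing calls
    (12, [1, 1, 1, None]),      # invariant site in the second window
]
pi_by_window = windowed_pi(toy_sites, window_size=10)
```

Summing numerators and denominators across sites before dividing, rather than averaging per-site values, keeps sites with many missing calls from being weighted the same as fully genotyped sites.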

Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & RoyChoudhury, A. (2012). Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Molecular Biology and Evolution, 29(8), 1917–1932. https://doi.org/10.1093/molbev/mss086

Fumagalli, M., Vieira, F. G., Korneliussen, T. S., Linderoth, T., Huerta-Sánchez, E., Albrechtsen, A., & Nielsen, R. (2013). Quantifying population genetic differentiation from next-generation sequencing data. Genetics, 195(3), 979–992. https://doi.org/10.1534/genetics.113.154740

Fumagalli, M., Vieira, F. G., Linderoth, T., & Nielsen, R. (2014). ngsTools: Methods for population genetics analyses from next-generation sequencing data. Bioinformatics, 30(10), 1486–1487. https://doi.org/10.1093/bioinformatics/btu041

Grabherr, M. G., Russell, P., Meyer, M., Mauceli, E., Alföldi, J., Di Palma, F., & Lindblad-Toh, K. (2010). Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics, 26(9), 1145–1151. https://doi.org/10.1093/bioinformatics/btq102

Korneliussen, T. S., Albrechtsen, A., & Nielsen, R. (2014). ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics, 15(1), 356. https://doi.org/10.1186/s12859-014-0356-4

Korunes, K. L., & Samuk, K. (2021). pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Molecular Ecology Resources, 21(4), 1359–1368. https://doi.org/10.1111/1755-0998.13326

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., & 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., & DePristo, M. A. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297–1303. https://doi.org/10.1101/gr.107524.110

Petkova, D., Novembre, J., & Stephens, M. (2016). Visualizing spatial population structure with estimated effective migration surfaces. Nature Genetics, 48(1), 94–100. https://doi.org/10.1038/ng.3464

Pickrell, J. K., & Pritchard, J. K. (2012). Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data. PLoS Genetics, 8(11), e1002967. https://doi.org/10.1371/journal.pgen.1002967

Usage Notes

See README.txt.

Funding

Harvard University

Erasmus Mundus Master Programme in Evolutionary Biology