Data from: Whole-genome phylogeography of the Blue-faced honeyeater (Entomyzon cyanotis) and discovery and characterization of a neo-Z chromosome
Data files
Jun 26, 2022 version files (5.54 GB):
- Entomyzon_data_sets.zip (5.54 GB)
- README.txt (19.78 KB)
Abstract
Whole-genome surveys of genetic diversity and geographic variation often yield unexpected discoveries of novel structural variation, which long-read DNA sequencing can help clarify. Here we report on the whole-genome phylogeography of a bird exhibiting classic vicariant geographies across Australia and New Guinea, the Blue-faced honeyeater (Entomyzon cyanotis), and on the discovery and characterization of a novel neo-Z chromosome by long-read sequencing. Using short-read genome-wide SNPs, we inferred population divergence events within E. cyanotis across the Carpentarian and other biogeographic barriers during the Pleistocene (~0.3–1.7 MYA). Evidence for introgression between non-sister populations supports a hypothesis of reticulate evolution involving a triad of dynamic barriers around Pleistocene Lake Carpentaria between Australia and New Guinea. During this phylogeographic survey, we discovered a large (134 Mbp) neo-Z chromosome and explored its diversity, divergence and introgression landscape. We show that, as in some Sylvioid passerine birds, a fusion occurred between chromosome 5 and the Z chromosome to form a neo-Z chromosome, with the ancestral pseudoautosomal region (PAR) appearing to become non-recombinant between Z and W, along with most of the fused chromosome 5 (~37.2 Mbp). The added non-recombinant portion of the neo-Z displays reduced heterozygosity and faster population genetic differentiation compared with the ancestral Z. Yet the new PAR shows elevated diversity and reduced differentiation compared with autosomes, potentially resulting from introgression. In our case, long-read sequencing helped clarify the genomic landscape of population divergence on autosomes and sex chromosomes in a species where prior knowledge of genome structure was still incomplete.
Methods
We generated VCF files for downstream analyses using the GATK pipeline (McKenna et al. 2010) and samtools (Li et al. 2009). We estimated heterozygosity and coverage across scaffolds with samtools. Sliding-window population genetic statistics were generated using ANGSD and ngsTools (Fumagalli et al. 2013, 2014; Korneliussen et al. 2014), and pixy was used to calculate nucleotide diversity and divergence across windows (Korunes and Samuk 2021). We used SNAPP to generate a coalescent estimate of the population tree from SNPs (Bryant et al. 2012) and estimated effective migration surfaces with EEMS (Petkova et al. 2016). Satsuma was used to align contigs and scaffolds between species, sexes and different assemblies (Grabherr et al. 2010). Finally, we generated statistics for a phylogenetic network using TreeMix (Pickrell and Pritchard 2012).
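To illustrate the kind of per-window estimate pixy reports, the sketch below computes nucleotide diversity (π) within a population and absolute divergence (dXY) between two populations as the total number of pairwise allele differences divided by the total number of pairwise comparisons, summed over all sites in a window, including invariant sites and skipping missing genotypes, as described by Korunes and Samuk (2021). This is a minimal sketch under stated assumptions, not pixy's code or API: the genotype encoding (a pair of allele indices, or None for a missing call) and the function names `allele_counts`, `window_pi` and `window_dxy` are hypothetical, and each window is assumed to have already been extracted from an all-sites VCF.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: a diploid genotype is a pair of allele indices
# (0 = ref, 1 = alt, ...); None marks a missing call.
Genotype = Optional[Tuple[int, int]]
Site = Sequence[Genotype]  # one genotype per individual at a site


def allele_counts(site: Site) -> Counter:
    """Count called alleles at one site, skipping missing genotypes."""
    counts: Counter = Counter()
    for gt in site:
        if gt is not None:
            counts.update(gt)
    return counts


def window_pi(sites: Sequence[Site]) -> float:
    """Within-population diversity: summed differences / summed comparisons.

    Invariant sites contribute comparisons but no differences, which is
    why they must be present in the window.
    """
    diffs = comps = 0
    for site in sites:
        counts = allele_counts(site)
        n = sum(counts.values())
        # Unordered pairs of haplotypes carrying different alleles.
        diffs += (n * n - sum(c * c for c in counts.values())) // 2
        # All unordered pairs of called haplotypes at this site.
        comps += n * (n - 1) // 2
    return diffs / comps if comps else float("nan")


def window_dxy(sites_a: Sequence[Site], sites_b: Sequence[Site]) -> float:
    """Between-population divergence over the same window of sites."""
    diffs = comps = 0
    for site_a, site_b in zip(sites_a, sites_b):
        ca, cb = allele_counts(site_a), allele_counts(site_b)
        na, nb = sum(ca.values()), sum(cb.values())
        # Cross-population pairs minus those sharing the same allele.
        diffs += na * nb - sum(ca[a] * cb[a] for a in ca)
        comps += na * nb
    return diffs / comps if comps else float("nan")


# Toy window of three sites for two hypothetical populations.
pop1: List[Site] = [[(0, 0), (0, 1)], [(0, 0), (0, 0)], [(1, 1), None]]
pop2: List[Site] = [[(1, 1), (1, 1)], [(0, 0), (0, 1)], [(1, 1), (1, 1)]]

print(window_pi(pop1))         # 3 / 13
print(window_dxy(pop1, pop2))  # 16 / 40
```

Pooling differences and comparisons across sites before dividing, rather than averaging per-site values, is what keeps sites with many missing calls from being over-weighted; a missing genotype simply reduces the number of comparisons at that site.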
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & RoyChoudhury, A. (2012). Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Molecular Biology and Evolution, 29(8), 1917–1932. https://doi.org/10.1093/molbev/mss086
Fumagalli, M., Vieira, F. G., Korneliussen, T. S., Linderoth, T., Huerta-Sánchez, E., Albrechtsen, A., & Nielsen, R. (2013). Quantifying population genetic differentiation from next-generation sequencing data. Genetics, 195(3), 979–992. https://doi.org/10.1534/genetics.113.154740
Fumagalli, M., Vieira, F. G., Linderoth, T., & Nielsen, R. (2014). ngsTools: Methods for population genetics analyses from next-generation sequencing data. Bioinformatics, 30(10), 1486–1487. https://doi.org/10.1093/bioinformatics/btu041
Grabherr, M. G., Russell, P., Meyer, M., Mauceli, E., Alföldi, J., Di Palma, F., & Lindblad-Toh, K. (2010). Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics, 26(9), 1145–1151. https://doi.org/10.1093/bioinformatics/btq102
Korneliussen, T. S., Albrechtsen, A., & Nielsen, R. (2014). ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics, 15(1), 356. https://doi.org/10.1186/s12859-014-0356-4
Korunes, K. L., & Samuk, K. (2021). pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Molecular Ecology Resources, 21(4), 1359–1368. https://doi.org/10.1111/1755-0998.13326
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., & 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., & DePristo, M. A. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297–1303. https://doi.org/10.1101/gr.107524.110
Petkova, D., Novembre, J., & Stephens, M. (2016). Visualizing spatial population structure with estimated effective migration surfaces. Nature Genetics, 48(1), 94–100. https://doi.org/10.1038/ng.3464
Pickrell, J. K., & Pritchard, J. K. (2012). Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data. PLoS Genetics, 8(11), e1002967. https://doi.org/10.1371/journal.pgen.1002967
See README.txt.