Chromosome assembly and preliminary gene and repeat annotations for Myzomela tristrami reference genome
Data files
Jul 27, 2024 version files (1.28 GB total)
- Myzomela_tristrami_ReferenceGenome.zip (1.28 GB)
- README.md (4.01 KB)
Abstract
Secondary contact between closely related taxa represents a “moment of truth” for speciation: an opportunity to test the efficacy of reproductive isolation that evolved in allopatry and to identify the genetic, behavioral, and/or ecological barriers that separate species in sympatry. Sex chromosomes are known to rapidly accumulate differences between species, an effect that may be exacerbated for neo-sex chromosomes that are transitioning from autosomal to sex-specific inheritance. Here we report that, in the Solomon Islands, two closely related bird species in the honeyeater family, Myzomela cardinalis and Myzomela tristrami, carry neo-sex chromosomes and have come into recent secondary contact after ~1.1 million years of geographic isolation. Hybrids of the two species were first observed in sympatry ~100 years ago. To determine the genetic consequences of hybridization, we use population genomic analyses of individuals sampled in allopatry and in sympatry to characterize gene flow in the contact zone. Using genome-wide estimates of diversity, differentiation, and divergence, we find that the degree and direction of introgression vary dramatically across the genome. For sympatric birds, autosomal introgression is bidirectional, with phenotypic hybrids and phenotypic parentals of both species showing admixed ancestry. In other regions of the genome, however, the story is different. While introgression on the Z/neo-Z-linked sequence is limited, introgression of W/neo-W regions and mitochondrial sequence (mtDNA) is highly asymmetric, moving only from the invading M. cardinalis to the resident M. tristrami. The recent hybridization between these species has thus enabled gene flow in some genomic regions, but the interaction of admixture, asymmetric mate choice, and/or natural selection has led to variation in the amount and direction of gene flow at sex-linked regions of the genome.
I. Files
(GENOME)
Mt_v1.0_MAIN.fa.gz
Primary genome, (largely) scaffolded to chromosome-level, plus other primary assembled contigs
Mt_v1.0_MAIN.gff.gz
Simple gene annotations for primary genome, annotated using GeMoMa v1.8 and a zebra finch
(bTaeGut1.4.pri) annotation reference
Mt_v1.0_extra.fa.gz
Additional contigs, not for use in most analyses but some may be of interest
This set is a combination of hand-identified haplotigs of the main genome, and assembler-identified
“alternate” (haplotig) contigs
(ORIGINAL_ASSEMBLY_CONTIGS)
Mt_hifi.asm.p.fa.gz
“primary” assembly contigs, output from hifiasm (v0.13-r308)
Mt_hifi.asm.a.fa.gz
“alternate” assembly contigs, output from hifiasm (v0.13-r308)
(REPEAT_MASKING)
TElib_Myzo_preliminary.fa.gz
Preliminary Myzomela-tuned TE/repeat library, generated using RepeatModeler (v2)
Mt_v1.0_MAIN_RM_sites_to_filter.txt
List of sites in the “MAIN” genome file masked by RepeatMasker (v4.1.0) using the Myzomela-specific library
Mt_v1.0_extra_RM_sites_to_filter.txt
Same for the “extra” contigs
(SCAFFOLDING_ANNOTATIONS)
Mt_hifi_zf_reference_gene_table.tabular
Tabular list of genes annotated in the M. tristrami hifiasm output, using zebra finch (bTaeGut1.4.pri)
annotations and GeMoMa (v1.8) software
Mt_hifi_ch_reference_gene_table.tabular
Tabular list of genes annotated in the M. tristrami hifiasm output, using chicken (bGalGal1.mat.broiler.GRCg7b)
annotations and GeMoMa (v1.8) software
zf_zf_reference_gene_table.tabular
Tabular list of genes from GeMoMa (v1.8) reannotation of zebra finch (bTaeGut1.4.pri), using its own
annotations as reference
ch_ch_reference_gene_table.tabular
Tabular list of genes from GeMoMa (v1.8) reannotation of chicken (bGalGal1.mat.broiler.GRCg7b), using its own
annotations as reference
Mt_v1.0_MAIN_key.txt
Key relating contigs to their Mt_v1.0_MAIN.fa scaffold/chromosome
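
As a quick sanity check after downloading, the sequence lengths in any of the FASTA files listed above can be tabulated. The following is a minimal Python sketch, not part of the original pipeline; the function names `read_fasta_lengths` and `n50` are hypothetical, and it assumes only that the files are standard (optionally gzip-compressed) FASTA.

```python
import gzip


def read_fasta_lengths(path):
    """Return {sequence_name: length} for a plain or gzip-compressed FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length of the sequence at which the running total, taken from longest
    to shortest, first reaches half of the summed length."""
    values = sorted(lengths, reverse=True)
    total = sum(values)
    running = 0
    for length in values:
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For example, `lengths = read_fasta_lengths("Mt_v1.0_MAIN.fa.gz")` followed by `n50(lengths.values())` would summarize the primary genome, assuming the archive has been unzipped into the working directory.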
Brief notes about the scaffolded genome (Mt_v1.0_MAIN.fa):
-- While the bulk of the primary genome has been scaffolded into chromosomes, the sex chromosomes have only been partially
scaffolded. In particular, the four large contigs that comprise W-linked sequence have been left un-joined. These
W-linked contigs are nevertheless thought to be physically contiguous, as are the “Z1” and “Z2” portions of chrZ
(along with their copy of the diploid “neoPAR” sequence that, in this assembly, is linked to a W contig).
-- We identified and removed haplotigs from the primary genome using hand-curation rather than automated pipelines. This
means that this genome will retain genuinely repetitive regions that might be of interest (satellite arrays,
etc.), but we also expect that there will be at least some erroneous duplications and misassemblies.
-- chr30-40 should be regarded as highly tentative. Standards for naming/inclusion in this genome version:
1. greater than 1 Mb in length
2. at least 10 potential gene annotations
3. clear haplotig sequence, roughly syntenic barring a handful of rearrangements (i.e., reasonably well-behaved diploid)
4. sorted/named by length.
We were conservative about joining contigs (doing so only when supported by an alternate haplotig) to form these
chromosomes, and it is likely that at least some of them are actually fragments of larger chromosomes. We
included them as named chromosomes here but stress that the arrangements and the specific names used here are highly
likely to change in future genome versions as more data become available.
-- A large number of un-scaffolded contigs from the hifiasm primary assembly are retained in the “MAIN” genome. Many of
these sequences seem to derive from non-specific repetitive sequence (telomeres and telomere-adjacent sequence,
for example), although there are some with gene annotations. Where possible, haplotigs of these contigs have been
moved to the “extra” file, but there are undoubtedly many remaining duplications in the retained contigs.
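
The naming standards given above for chr30-40 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the `Candidate` class and the `name_small_chromosomes` function are hypothetical, and criterion 3 (a clear, roughly syntenic haplotig) was judged by hand in the original work, so it appears here as a simple true/false flag rather than something computed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str           # hypothetical scaffold identifier
    length: int         # scaffold length in bp
    n_genes: int        # number of potential gene annotations
    has_haplotig: bool  # criterion 3, assessed by hand (assumed input)


def name_small_chromosomes(candidates, first_number=30,
                           min_length=1_000_000, min_genes=10):
    """Apply criteria 1-3, then name the survivors in descending order of length."""
    kept = [
        c for c in candidates
        if c.length > min_length      # 1. greater than 1 Mb
        and c.n_genes >= min_genes    # 2. at least 10 gene annotations
        and c.has_haplotig            # 3. clear haplotig sequence
    ]
    kept.sort(key=lambda c: c.length, reverse=True)  # 4. sorted/named by length
    return {f"chr{first_number + i}": c.name for i, c in enumerate(kept)}
```

Because the names depend only on rank by length, adding or merging a single scaffold shifts every name after it, which is one reason these names are likely to change in future genome versions.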
This data repository contains Myzomela tristrami reference genome files. The sequences associated with this assembly are available on the NCBI Sequence Read Archive at https://www.ncbi.nlm.nih.gov/sra/?term=SRA%20SRR29254783.
We sequenced an M. tristrami female at the University of Delaware DNA Sequencing & Genotyping Center. HiFi libraries were prepared with the SMRTbell prep kit, followed by Blue Pippin size selection (15-20 kbp) before sequencing on a PacBio Sequel IIe. We generated a de novo assembly from the resulting long reads using hifiasm v0.13-r308 with default parameters (Cheng et al. 2021, 2022).
We used GeMoMa (v1.8) and the annotation from the zebra finch genome bTaeGut1.4.pri to infer a rough annotation of genes in the Myzomela genome. We then used these rough annotations, comparing contigs against both the zebra finch genome and the chicken genome bGalGal1.mat.broiler.GRCg7b, to infer synteny relationships, remove duplicate haplotigs, and, finally, scaffold contigs into chromosomes in Myzomela. The resulting assembly uses the zebra finch numbering system for chromosomes 1-29; chromosomes 30-40 were named in descending order of size. Final chromosomes and contigs were aligned with those of related species, the helmeted honeyeater (Lichenostomus melanops cassidix) and the blue-faced honeyeater (Entomyzon cyanotis), using Mauve (version 2015-02-25), and visualized using FastANI (v1.33) (Darling et al. 2004, Jain et al. 2018, Robledo-Ruiz et al. 2022, Burley et al. 2023).
We generated repetitive DNA libraries using the RepeatModeler v2 pipeline (Flynn et al. 2020), which combines de novo and homology-based characterization of different classes of repeats. The repeat library was annotated and combined with Repbase and with manually curated repeat libraries from other studies (Suh et al. 2018, Boman et al. 2019, Weissensteiner et al. 2020, Peona et al. 2021). We then used RepeatMasker (v4.1.0) to identify and mask repetitive regions of the genome (Smit et al. 2013).
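
The synteny and scaffolding steps described above rely on knowing which genes were annotated on which contig. One simple summary that supports this kind of comparison is a per-contig count of annotated genes. The sketch below is an assumption-laden illustration rather than the authors' method: `count_features_per_sequence` is a hypothetical helper, and it assumes a standard nine-column, tab-separated GFF file in which gene records carry the feature type `gene` in column 3 (the feature type may need adjusting for a given annotation file).

```python
import gzip
from collections import Counter


def count_features_per_sequence(gff_path, feature_type="gene"):
    """Count records of one feature type per sequence ID in a GFF file."""
    opener = gzip.open if str(gff_path).endswith(".gz") else open
    counts = Counter()
    with opener(gff_path, "rt") as handle:
        for line in handle:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # A GFF record has nine tab-separated columns; skip anything else.
            if len(fields) < 9:
                continue
            if fields[2] == feature_type:
                counts[fields[0]] += 1
    return counts
```

Counts of this kind could, for instance, feed the "at least 10 potential gene annotations" standard used when naming the smaller chromosomes.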
Boman, J., C. Frankl-Vilches, M. D. S. dos Santos, E. H. C. de Oliveira, M. Gahr, and A. Suh. 2019. The genome of Blue-capped Cordon-bleu uncovers hidden diversity of LTR retrotransposons in Zebra Finch. Genes 10.
Burley, J. T., S. C. M. Orzechowski, S. Y. W. Sin, and S. V. Edwards. 2023. Whole-genome phylogeography of the blue-faced honeyeater (Entomyzon cyanotis) and discovery and characterization of a neo-Z chromosome. Molecular Ecology 32:1248–1270.
Cheng, H., G. T. Concepcion, X. Feng, H. Zhang, and H. Li. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18:170–175.
Cheng, H., E. D. Jarvis, O. Fedrigo, K. P. Koepfli, L. Urban, N. J. Gemmell, and H. Li. 2022. Haplotype-resolved assembly of diploid genomes without parental data. Nature Biotechnology 40:1332–1335.
Darling, A. C. E., B. Mau, F. R. Blattner, and N. T. Perna. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Research 14:1394–1403.
Flynn, J. M., R. Hubley, C. Goubert, J. Rosen, A. G. Clark, C. Feschotte, and A. F. Smit. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America 117:9451–9457.
Jain, C., L. M. Rodriguez-R, A. M. Phillippy, K. T. Konstantinidis, and S. Aluru. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9:5114.
Peona, V., O. M. Palacios-Gimenez, J. Blommaert, J. Liu, T. Haryoko, K. A. Jønsson, M. Irestedt, Q. Zhou, P. Jern, and A. Suh. 2021. The avian W chromosome is a refugium for endogenous retroviruses with likely effects on female-biased mutational load and genetic incompatibilities. Philosophical Transactions of the Royal Society B: Biological Sciences 376.
Robledo-Ruiz, D. A., H. M. Gan, P. Kaur, O. Dudchenko, D. Weisz, R. Khan, E. Lieberman Aiden, E. Osipova, M. Hiller, H. E. Morales, M. J. L. Magrath, R. H. Clarke, P. Sunnucks, and A. Pavlova. 2022. Chromosome-length genome assembly and linkage map of a critically endangered Australian bird: the helmeted honeyeater. GigaScience 11:giac025.
Smit, A., R. Hubley, and P. Green. 2013, 2015. RepeatMasker Open-4.0.
Suh, A., L. Smeds, and H. Ellegren. 2018. Abundant recent activity of retrovirus-like retrotransposons within and among flycatcher species implies a rich source of structural variation in songbird genomes. Molecular Ecology 27:99–111.
Weissensteiner, M. H., I. Bunikis, A. Catalán, K. J. Francoijs, U. Knief, W. Heim, V. Peona, S. D. Pophaly, F. J. Sedlazeck, A. Suh, V. M. Warmuth, and J. B. W. Wolf. 2020. Discovery and population genomics of structural variation in a songbird genus. Nature Communications 11:1–11.