Data from: Repositioning of centromere-associated repeats during karyotype evolution in Oryzias fishes
Data files
Nov 22, 2023 version; files total 198.93 MB
OryCel_1.2.fasta.gz (198.93 MB)
README.md (383 B)
Abstract
The karyotype, which is the number and shape of chromosomes, is a fundamental characteristic of all eukaryotes. Karyotypic changes play an important role in many aspects of evolutionary processes, including speciation. In organisms with monocentric chromosomes, it was previously thought that chromosome number changes were mainly caused by centric fusions and fissions, whereas chromosome shape changes, that is changes in arm numbers, were mainly due to pericentric inversions. However, recent genomic and cytogenetic studies have revealed examples of alternative cases, such as tandem fusions and centromere repositioning, found in the karyotypic changes within and between species. Here, we employed comparative genomic approaches to investigate whether centromere repositioning occurred during karyotype evolution in medaka fishes. In the medaka family (Adrianichthyidae), the three phylogenetic groups differed substantially in their karyotypes. The Oryzias latipes species group has larger numbers of chromosome arms than the other groups, with most chromosomes being metacentric. The O. javanicus species group has similar numbers of chromosomes to the O. latipes species group, but smaller arm numbers, with most chromosomes being acrocentric. The O. celebensis species group has fewer chromosomes than the other two groups and several large metacentric chromosomes that were likely formed by chromosomal fusions. By comparing the genome assemblies of O. latipes, O. javanicus, and O. celebensis, we found that repositioning of centromere-associated repeats might be more common than simple pericentric inversion. Our results demonstrated that centromere repositioning may play a more important role in karyotype evolution than previously appreciated.
README: Repositioning of centromere-associated repeats during karyotype evolution in Oryzias fishes
https://doi.org/10.5061/dryad.280gb5mwf
Authors: Satoshi Ansai, Atsushi Toyoda, Kohta Yoshida, Jun Kitano
Description of the data and file structure
OryCel_1.2.fasta.gz
A gzip-compressed FASTA file containing the new reference genome sequence of Oryzias celebensis assembled in this study.
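As a quick sanity check after download, the sequence names and lengths in the file can be listed with the Python standard library alone. The following is a minimal sketch, not part of the original data package. The function name `fasta_lengths` is hypothetical, and the sketch assumes each header's first whitespace-delimited token is the sequence name.

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_name: length} for a gzip-compressed FASTA file."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the first token of the header is the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped, so lengths are accumulated.
                lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    import os

    path = "OryCel_1.2.fasta.gz"  # file name as listed in this dataset
    if os.path.exists(path):
        for seq_name, length in fasta_lengths(path).items():
            print(seq_name, length, sep="\t")
```

Reading the file line by line keeps memory use low, which matters for a genome-scale FASTA file.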
Methods
De novo assembly of Oryzias celebensis genome
To assemble an O. celebensis reference genome with improved contiguity, PacBio continuous long reads (CLRs) (PacBio RSII with P6/C4v2 chemistry; 7.5 million reads; 88.1 Gb; ~110x coverage; PacBio, Menlo Park, CA, USA) (DRX230016) acquired in our previous study (Ansai et al., 2021) were assembled using NextDenovo 2.5.2 (Hu et al., 2023). The assembled contigs were then polished with the same PacBio reads and with Illumina short reads (Illumina HiSeq 2500; 2 x 250 bp; 296 million reads; 148 Gb; ~185x coverage; Illumina, San Diego, CA, USA) (DRX230015) using NextPolish 1.4.1 (Hu et al., 2020).
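The sequencing figures quoted above can be cross-checked with simple arithmetic: total bases divided by coverage gives the genome size those numbers imply. The sketch below is illustrative only. The function names are hypothetical, and because the coverage values are approximate in the source, the results are rough figures rather than a genome size estimate reported by the authors.

```python
def implied_genome_size_gb(total_gb, coverage):
    """Genome size (Gb) implied by total sequenced bases and fold coverage."""
    return total_gb / coverage


def mean_read_length_bp(total_gb, reads_million):
    """Mean bases per read, given total bases (Gb) and read count (millions)."""
    return (total_gb * 1e9) / (reads_million * 1e6)


if __name__ == "__main__":
    # PacBio CLRs: 88.1 Gb at ~110x coverage, 7.5 million reads
    print(f"PacBio implied genome size: {implied_genome_size_gb(88.1, 110):.2f} Gb")
    print(f"PacBio mean read length: {mean_read_length_bp(88.1, 7.5):.0f} bp")
    # Illumina: 148 Gb at ~185x coverage, 296 million reads
    print(f"Illumina implied genome size: {implied_genome_size_gb(148, 185):.2f} Gb")
    print(f"Illumina bases per counted read: {mean_read_length_bp(148, 296):.0f} bp")
```

Both data sets imply a genome of roughly 0.8 Gb. The Illumina figure of about 500 bases per counted read would be consistent with the 296 million being read pairs at 2 x 250 bp, although the source does not say so explicitly.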
To scaffold the contigs into chromosome-level assemblies, a linkage map was constructed using an F2 family generated in a previous study (Ansai et al., 2021) by crossing an O. celebensis female with a male of the closely related species O. woworae (2n = 42; Myosho et al., 2018). Genotypes of the F2 family, consisting of two grandparents and 164 F2 fish, were determined using double digest restriction site-associated DNA sequencing (ddRAD-seq) (DRA010679). The ddRAD-seq reads were trimmed using Trim Galore 0.4.3 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with cutadapt 1.12 (Martin, 2011) and mapped to the assembled contigs using BWA-backtrack 0.7.17 (Li & Durbin, 2009). Polymorphisms in the RAD tags were called using Stacks version 2.64 (Rochette et al., 2019). A linkage map was constructed from 3,757 informative markers using Lep-MAP3 (Rastas, 2017). After the parental genotypes were called using the ParentCall2 module, markers with high segregation distortion (dataTolerance=0.001) or an excess of missing genotypes (>10% of individuals; missingLimit=0.1) were removed using the Filtering2 module. In total, 1,756 markers were assigned to 18 linkage groups (LGs) by grouping markers with pairwise LOD scores higher than 10 (lodLimit=10) using the SeparateChromosomes2 module, so that the number of LGs equaled the haploid chromosome number of O. celebensis (n = 18). The order of the markers within each LG was determined by maximizing the likelihood over 50 iterations per LG (numMergeIterations=50) using the OrderMarkers2 module.
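The two thresholds described above (missing genotypes in more than 10% of individuals, and pairwise LOD higher than 10) can be illustrated with a toy sketch. This is not the Lep-MAP3 implementation. The function names, the `"NA"` missing-value code, and the input layouts are assumptions made for illustration, and the grouping step is a simple single-linkage simplification over pairwise LOD scores that are assumed to be precomputed.

```python
def filter_by_missingness(genotypes, missing_limit=0.1, missing="NA"):
    """Keep markers whose fraction of missing calls is at most missing_limit.

    genotypes: {marker: [call per individual]} (hypothetical layout).
    """
    kept = {}
    for marker, calls in genotypes.items():
        if not calls:
            continue  # a marker with no calls carries no information
        frac_missing = sum(c == missing for c in calls) / len(calls)
        if frac_missing <= missing_limit:
            kept[marker] = calls
    return kept


def group_by_lod(markers, pairwise_lod, lod_limit=10):
    """Group markers by single linkage: join pairs with LOD > lod_limit.

    pairwise_lod: {(marker_a, marker_b): lod} (hypothetical, precomputed).
    """
    parent = {m: m for m in markers}

    def find(m):
        # Union-find root lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), lod in pairwise_lod.items():
        if lod > lod_limit:
            parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())
```

In this simplification, a pair with a LOD of exactly 10 is not joined, matching the "higher than 10" wording, and a marker with exactly 10% missing genotypes is retained.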
Based on the linkage map, three contigs in which four or more markers mapped to two different linkage groups were each split into two contigs. Their breakpoints were determined from a cross-species synteny map between O. celebensis and O. latipes, built by aligning the cDNA sequences of O. latipes (Ensembl Release 104; http://www.ensembl.org/) to the reassembled contigs of O. celebensis using minimap2 2.24 (r1122) (Li, 2018). Finally, chromosomal sequences were reconstructed using the positional information of 1,755 markers of the genetic map with ALLMAPS in JCVI utility libraries v1.2.7 (Tang et al., 2015). LG numbers were assigned based on the syntenic chromosomes of O. latipes. The names of the fused chromosomes list all syntenic chromosomes (e.g., LG1_19_17 is syntenic to LG1, LG19, and LG17 of the O. latipes chromosomes).
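The naming convention for fused chromosomes can be decoded programmatically. The sketch below is a minimal illustration: the function name `latipes_lgs` is hypothetical, and it assumes that every chromosome name has the form `LG` followed by one or more underscore-separated numbers, which the source confirms only for the example LG1_19_17.

```python
import re


def latipes_lgs(name):
    """Return the O. latipes LG numbers encoded in a chromosome name."""
    # Assumed pattern: "LG" + numbers joined by "_", e.g. "LG1_19_17".
    match = re.fullmatch(r"LG(\d+(?:_\d+)*)", name)
    if match is None:
        raise ValueError(f"unexpected chromosome name: {name!r}")
    return [int(part) for part in match.group(1).split("_")]


if __name__ == "__main__":
    print(latipes_lgs("LG1_19_17"))
```

A name that yields more than one number would, under this convention, mark a chromosome syntenic to several O. latipes chromosomes.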