De novo assembly of a Japan Sea stickleback (Gasterosteus nipponicus) and a Japanese Pacific Ocean lineage of three-spined stickleback (Gasterosteus aculeatus)
Data files
Jun 19, 2025 version files 2.88 GB
- Gaculeatus_male_hap1_20240517_dryad.zip (420.17 MB)
- Gaculeatus_male_hap2_20240517_dryad.zip (426.12 MB)
- Gnipponicus_male_hap1_20240517_dryad.zip (420.55 MB)
- Gnipponicus_male_hap2_20240517_dryad.zip (420.33 MB)
- JSv20240517_dryad.zip (605.65 MB)
- POv20240517_dryad.zip (589.60 MB)
- README.md (7.03 KB)
Abstract
DNA within the nucleus is organized into a well-regulated three-dimensional (3D) structure. However, it remains largely unknown how such 3D genome structures influence speciation processes. Recent studies have shown that 3D genome structures influence mutation rates, including the occurrence of chromosomal rearrangements. For example, breakpoints of chromosomal rearrangements are often enriched at topologically associating domain (TAD) boundaries. Here, we hypothesized that TAD structures may constrain the location of chromosomal inversions and thereby shape the genomic landscape of divergence between species with ongoing gene flow, because inversions can contribute to barriers to gene flow. To test this hypothesis, we used a Japanese stickleback species pair, Gasterosteus nipponicus (Japan Sea stickleback) and G. aculeatus (three-spined stickleback). First, we constructed high-quality genome assemblies of both species using PacBio HiFi and Dovetail Omni-C technologies and identified several chromosomal inversions. Second, population genomic analyses revealed higher genetic differentiation in inverted regions than in colinear regions and no gene flow within inversions, in contrast to significant gene flow in colinear regions. Third, using Hi-C data, we characterized 3D genome structures of sticklebacks, such as A/B compartments and TADs. Finally, we found that inversion breakpoints were enriched at TAD boundaries. Thus, our study demonstrates that the 3D genome constrains breakpoints of inversions that can act as barriers to gene flow in the stickleback. Further integration of 3D genome analyses with population genomics has the potential to provide novel insights into the mechanisms by which the 3D genome influences speciation processes.
We assembled the genomes of two Gasterosteus species in Japan: G. nipponicus and G. aculeatus. We used PacBio HiFi and Dovetail Omni-C technologies to build chromosome-scale assemblies.
General information
- Title of Dataset: Genome assemblies of Gasterosteus nipponicus and Gasterosteus aculeatus
- Author Information
- Principal Investigator
Name: Jun Kitano
Institution: National Institute of Genetics
Address: Mishima, Shizuoka, Japan
Email: jkitano@nig.ac.jp
- Data curation
Name: Yo Yamasaki
Institution: National Institute of Genetics
Address: Mishima, Shizuoka, Japan
Email: yamasaki@nig.ac.jp
- Date of sample collection
- G. nipponicus: 15/5/2023
- G. aculeatus: 24/5/2022
- Geographic location of data collection
- G. nipponicus: Hamanaka, Hokkaido, Japan
- G. aculeatus: Akkeshi, Hokkaido, Japan
- Information about funding sources that supported the collection of the data
- JSPS Kakenhi (22H04983, 20J01503, 21H02542, and 22KK0105)
- JST CREST (JPMJCR20S2)
File extensions and their contents
"XXX" in the following descriptions indicates the file prefix.
- XXX.fa
FASTA file of the genome assembly.
- XXX.masked.fa
The following regions of XXX.fa were hardmasked: highly repetitive regions on chrXII and chrXXI, pseudo-autosomal regions, and the neo-Y part of the fused Y chromosome of G. nipponicus.
- XXX.hardmasked.fa
XXX.fa repeat-masked with the RepeatModeler and RepeatMasker pipeline. Repetitive regions are masked with N.
- XXX.softmasked.fa
XXX.fa repeat-masked with the RepeatModeler and RepeatMasker pipeline. Repetitive regions are masked with lowercase letters.
- XXX.gff
Annotation of gene positions in XXX.fa.
- XXX.centromere.bed
Annotation of centromere positions in XXX.fa.
- XXX.GAP.bed
Positions of assembly gaps, indicated by N.
- XXX.compartment_KR_100kb.bed
Regions of A/B compartments. Calculated with Knight-Ruiz normalization and 100 kb bins using FAN-C v0.9.1.
- POvsJS_inversions.XXcoordinate.masked.bed
Inversions detected from the G. aculeatus vs G. nipponicus alignment. One inversion found in a highly repetitive region on chrXXI was removed.
Note
- The columns in XXX.compartment_KR_100kb.bed have the following meanings:
- 1st column: chromosome
- 2nd column: domain start position
- 3rd column: domain end position
- 4th column: assignment of compartment types for the domain (A or B)
- 5th column: the average eigenvector entry values of all bins in the domain
- 6th column: no entry
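The column layout above can be read with a few lines of Python. This is a minimal sketch, not part of the authors' pipeline: it assumes the file is tab-delimited with no header line and that coordinates follow the usual half-open BED convention, neither of which is stated in this README. The function names and the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def read_compartments(handle):
    """Yield (chrom, start, end, compartment, eigenvector) from a compartment BED file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines, if any
        chrom, start, end, comp, eig = row[:5]  # the 6th column carries no entry
        yield chrom, int(start), int(end), comp, float(eig)

def compartment_lengths(records):
    """Sum domain lengths per compartment type (assumes half-open intervals)."""
    totals = defaultdict(int)
    for _chrom, start, end, comp, _eig in records:
        totals[comp] += end - start
    return dict(totals)

# Made-up rows for illustration only; they are not taken from the dataset.
example = io.StringIO(
    "chrI\t0\t300000\tA\t0.021\t.\n"
    "chrI\t300000\t500000\tB\t-0.013\t.\n"
    "chrII\t0\t100000\tA\t0.008\t.\n"
)
print(compartment_lengths(read_compartments(example)))
# prints {'A': 400000, 'B': 200000}
```

To run it on a real file, pass an open file handle (for example, one opened on JSv20240517.compartment_KR_100kb.bed) in place of the in-memory example.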
- XXX.TAD_25kb_Level1.bed
Regions of topologically associating domains (TADs) estimated by SpectralTAD. Only primary TADs are included.
- XXX.TAD_25kb_Level2.bed
Regions of topologically associating domains (TADs) estimated by SpectralTAD. TADs estimated as secondary TADs are included. Some coincide exactly with primary TADs.
Note
- The columns in XXX.TAD_25kb_Level1.bed and XXX.TAD_25kb_Level2.bed have the following meanings:
- 1st column: chromosome
- 2nd column: domain start position
- 3rd column: domain end position
- 4th column: TAD hierarchy. "1" indicates a primary TAD, "2" indicates a secondary TAD
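Because each TAD is stored as a start/end pair, TAD boundaries can be recovered as the set of distinct start and end positions per chromosome. The sketch below illustrates this and looks up the distance from a position (for example, an inversion breakpoint) to the nearest boundary. It is only a distance lookup, not the enrichment analysis described in the manuscript, and it assumes whitespace-delimited lines with no header. The function names and example coordinates are hypothetical.

```python
import bisect
from collections import defaultdict

def tad_boundaries(lines):
    """Collect sorted, de-duplicated TAD start/end positions per chromosome."""
    positions = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        positions[chrom].update((start, end))
    return {chrom: sorted(pos) for chrom, pos in positions.items()}

def distance_to_nearest_boundary(boundaries, chrom, position):
    """Return the distance in bp to the closest boundary, or None if the chromosome is absent."""
    sites = boundaries.get(chrom)
    if not sites:
        return None
    i = bisect.bisect_left(sites, position)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sites[max(i - 1, 0):i + 1]
    return min(abs(position - s) for s in candidates)

# Made-up TADs for illustration only.
example = ["chrI\t0\t250000\t1", "chrI\t250000\t600000\t1"]
bounds = tad_boundaries(example)
print(bounds["chrI"])
# prints [0, 250000, 600000]
print(distance_to_nearest_boundary(bounds, "chrI", 260000))
# prints 10000
```

Since the TADs were called at 25 kb resolution, distances smaller than one bin are below the resolution of the boundary calls and should be interpreted with that in mind.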
Description of the data and file structure
There are six groups of datasets. Details of data collection and analysis methods are described in the manuscripts.
1. JSv20240517_dryad
Genome assembly of G. nipponicus. For each autosome, we selected the chromosome-scale haplotype with fewer gaps; the chromosome-scale X and Y_IX chromosomes were also selected from the following phased assembly sets (i.e., JS_male_hap1.v20240517.fa and JS_male_hap2.v20240517.fa).
Files
- JSv20240517.fa
- JSv20240517.masked.fa
JSv20240517.fa with some highly repetitive regions masked. The masked regions are: the first 3,207,000 bp of chrY_IX; 21,72,300-44,829,387 bp on chrY_IX; 20,676,000-21,892,999 bp and 22,110,000-29,005,999 bp on chrXII; and 580,259-3,826,000 bp on chrXXI.
- JSv20240517.hardmasked.fa
- JSv20240517.softmasked.fa
- JSv20240517.gff
- JSv20240517.centromere.bed
- JSv20240517.GAP.bed
- JSv20240517.compartment_KR_100kb.bed
- JSv20240517.TAD_25kb_Level1.bed
- JSv20240517.TAD_25kb_Level2.bed
- POvsJS_inversions.JScoordinate.masked.bed
2. Gnipponicus_male_hap1_20240517_dryad
Haplotype set 1 of G. nipponicus. Included sex chromosome is X (chrXIX).
Files
- JS_male_hap1.v20240517.fa
- JS_male_hap1.v20240517.hardmasked.fa
- JS_male_hap1.v20240517.softmasked.fa
- JS_male_hap1.v20240517.gff
3. Gnipponicus_male_hap2_20240517_dryad
Haplotype set 2 of G. nipponicus. Included sex chromosome is Y (chrY_IX).
Files
- JS_male_hap2.v20240517.fa
- JS_male_hap2.v20240517.hardmasked.fa
- JS_male_hap2.v20240517.softmasked.fa
- JS_male_hap2.v20240517.gff
4. POv20240517_dryad
Genome assembly of G. aculeatus. For each autosome, we selected the chromosome-scale haplotype with fewer gaps; the chromosome-scale X and Y chromosomes were also selected from the following phased assembly sets (i.e., PO_male_hap1.v20240517.fa and PO_male_hap2.v20240517.fa).
Files
- POv20240517.fa
- POv20240517.masked.fa
POv20240517.fa with some highly repetitive regions masked. The masked regions are: the first 3,236,000 bp of chrY; 21,026,999-23,662,723 bp on chrXII; and 4,361,000-8,588,000 bp on chrXXI.
- POv20240517.hardmasked.fa
- POv20240517.softmasked.fa
- POv20240517.gff
- POv20240517.centromere.bed
- POv20240517.GAP.bed
- POv20240517.compartment_KR_100kb.bed
- POv20240517.TAD_25kb_Level1.bed
- POv20240517.TAD_25kb_Level2.bed
- POvsJS_inversions.POcoordinate.masked.txt
5. Gaculeatus_male_hap1_20240517_dryad
Haplotype set 1 of G. aculeatus. Included sex chromosome is Y (chrY).
Files
- PO_male_hap1.v20240517.fa
- PO_male_hap1.v20240517.hardmasked.fa
- PO_male_hap1.v20240517.softmasked.fa
- PO_male_hap1.v20240517.gff
6. Gaculeatus_male_hap2_20240517_dryad
Haplotype set 2 of G. aculeatus. Included sex chromosome is X (chrXIX).
Files
- PO_male_hap2.v20240517.fa
- PO_male_hap2.v20240517.hardmasked.fa
- PO_male_hap2.v20240517.softmasked.fa
- PO_male_hap2.v20240517.gff
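The FASTA variants listed above differ only in how bases are masked: lowercase letters in the .softmasked.fa files and N in the .hardmasked.fa and .masked.fa files, while N in the unmasked .fa files marks assembly gaps. As a rough way to inspect any of these files, the following sketch counts lowercase and N bases per sequence and reports runs of N as 0-based, half-open intervals. The coordinate convention of the GAP.bed files is an assumption here, and the function names and the example sequence are hypothetical.

```python
import re
from collections import namedtuple

MaskStats = namedtuple("MaskStats", "total lowercase n_bases n_runs")

def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def mask_stats(sequence):
    """Summarize soft masking (lowercase), N content, and runs of N for one sequence."""
    lowercase = sum(1 for base in sequence if base.islower())
    # A lowercase 'n' counts both as lowercase and as part of an N run.
    runs = [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]
    n_bases = sum(end - start for start, end in runs)
    return MaskStats(len(sequence), lowercase, n_bases, runs)

# Made-up sequence for illustration only.
example = [">chrTest", "ACGTacgtNNNN", "ACGTNNAC"]
for name, seq in read_fasta(example):
    stats = mask_stats(seq)
    print(name, stats.total, stats.lowercase, stats.n_bases, stats.n_runs)
# prints chrTest 20 4 6 [(8, 12), (16, 18)]
```

For a real assembly, pass an open file handle to read_fasta instead of the example list. Each chromosome is held in memory as a single string, which should be manageable for assemblies of this size.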
Sharing/Access information
- Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
- Links to publications that cite or use the data:
Yo Y. Yamasaki, Atsushi Toyoda, Mitsutaka Kadota, Shigehiro Kuraku, Jun Kitano
3D genome constrains breakpoints of inversions that can act as barriers to gene flow in the stickleback
- Links to other publicly accessible locations of the data:
- All raw sequence data are available from DDBJ: G. nipponicus HiFi and Omni-C data, PRJDB19945; G. aculeatus HiFi and Omni-C data, PRJDB19949; iconHi-C for G. nipponicus and G. aculeatus, PRJDB19958.
