Genome sequencing/assembly and genetic diversity of Heterocephalus glaber
Data files
Aug 06, 2025 version; files total 18.98 GB
Damara_mole_rat_genome.tgz (695.56 MB)
Guinea_pig_genome.tgz (792.29 MB)
Naked_mole_rat_genome.tgz (770.87 MB)
README.md (4.28 KB)
SuppDataFile1.json (111.70 KB)
SuppDataFile2.Depth.tgz (1.45 GB)
SuppDataFile2.Geno.tgz (341.36 MB)
SuppDataFile3.Maf.tgz (13.49 GB)
SuppDataFile3.Sup.tgz (1.35 GB)
SuppDataFile4.tgz (84.25 MB)
SuppTables1-8.xlsx (71.18 KB)
SuppTextFile1.txt (1.21 KB)
Abstract
Naked mole-rats (Heterocephalus glaber) are a species of rodent endemic to the Horn of Africa, notable among mammals for their long lifespans, resistance to a variety of stresses, and eusocial mating behavior. Though their natural species range extends across large portions of Kenya, Ethiopia, Somalia, and Djibouti, the vast majority of genetic and genomic analyses focus on Kenyan specimens. Here, we constructed a chromosome-scale reference genome assembly for H. glaber, then leveraged it, along with modern whole-genome sequencing, to characterize the genetic diversity of specimens deriving from Kenya, southern Ethiopia, and eastern Ethiopia. We found the Kenyan and southern Ethiopian specimens to be closely related to each other and highly diverged from eastern Ethiopian specimens. We also found specimens collected from nearby locations in southern Ethiopia to be more closely related to Kenyan specimens than to each other, emphasizing the importance of local migration barriers to gene flow in wild H. glaber populations.
Dataset DOI: 10.5061/dryad.m37pvmdf4
Description of the data and file structure
This study presents a genome assembly for H. glaber, along with gene annotations and alignments to other mammalian genomes. It also provides polymorphism data from animals collected from multiple geographic regions.
Files and variables
File: SuppTables1-8.xlsx
Description: An Excel file with all Supplemental Tables:
Supplemental Table 1: Genome assembly statistics across assembly steps. This table provides a detailed breakdown of assembly metrics at each stage of the genome assembly pipeline.
Supplemental Table 2: Groups of super-scaffolds, compiled using chromosome-sorted sequencing (as depicted in Figure 1A). This table lists the organization of scaffolds into chromosome-scale groupings, based on chromosome sorting and sequencing.
Supplemental Table 3: Statistics for the 10-species whole genome alignment. Supplemental Figure 1 depicts the phylogenetic tree, which is provided in Newick format in Supplemental Text File 1.
Supplemental Table 4: Metadata for genome-sequenced animals, including species and sample identifiers. See the preprint for details.
Supplemental Table 5: For genome-sequenced animals, information about tissue/cell sample type, coverage statistics, and polymorphism counts. Each animal's SRA experiment ID is also listed.
Supplemental Table 6: Contains exact pairwise kinship values used to generate the kinship matrix displayed in Figure 2B. Useful for replicating or extending kinship analysis.
Supplemental Table 7: Provides the raw intersection counts between alleles that were used to generate the Venn diagrams in Figure 3, enabling precise comparison of overlapping features.
Supplemental Table 8: This table maps each RNA-seq slice (based on size selection) to its corresponding experiment ID in the SRA database. Further methodological details are available in the preprint.
File: SuppTextFile1.txt
Description: Phylogenetic trees in Newick format, for the genome-aligned species and for H. glaber individuals from Kenya and Ethiopia.
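As a rough illustration of reading Newick text without extra libraries, the sketch below splits a text blob into trees and pulls out leaf labels. It assumes each tree ends with a semicolon and that labels are unquoted; the example string is a toy input, not the contents of SuppTextFile1.txt, and the function names are hypothetical.

```python
import re

def split_trees(text):
    """Split a text blob into individual Newick strings (each ends with ';')."""
    return [chunk.strip() + ";" for chunk in text.split(";") if chunk.strip()]

def leaf_names(newick):
    """Return leaf labels, i.e. names that directly follow '(' or ','.

    Internal-node labels and branch lengths follow ')' or ':' and are skipped.
    Quoted labels are not handled in this sketch.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)

# Toy input with two trees (labels are placeholders).
example = "((A:0.1,B:0.2):0.3,C:0.4);\n(X:1,(Y:2,Z:3):0.5);"
trees = split_trees(example)
for tree in trees:
    print(leaf_names(tree))
```

For anything beyond listing labels (rerooting, distances, plotting), a dedicated phylogenetics library is likely a better fit than this regular-expression approach.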
File: SuppDataFile1.json
Description: A JSON-formatted pandas data frame of the enrichment scores, from flow-sorted chromosome sequencing. These scores estimate the likelihood of each pair of scaffolds deriving from the same chromosome.
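The serialization layout of this JSON file is not stated here, so it is worth inspecting the top-level structure before relying on it. The sketch below uses only the standard library and assumes, purely for illustration, the nested {column: {row: value}} layout that pandas writes by default for a data frame; the scaffold names and scores are made up, and the helper names are hypothetical.

```python
import json

def describe(table):
    """Summarize the top-level JSON structure so the layout can be checked."""
    if isinstance(table, dict):
        return "dict with {} keys, e.g. {}".format(len(table), sorted(table)[:3])
    return "{} with {} items".format(type(table).__name__, len(table))

def pair_score(table, scaffold_a, scaffold_b):
    """Look up one score, ASSUMING a {column: {row: value}} layout.

    Returns None when either name is absent.
    """
    column = table.get(scaffold_a, {})
    return column.get(scaffold_b)

# Toy table standing in for json.load(open("SuppDataFile1.json")).
toy_text = json.dumps({
    "scaf1": {"scaf1": 1.0, "scaf2": 0.8},
    "scaf2": {"scaf1": 0.8, "scaf2": 1.0},
})
toy = json.loads(toy_text)
print(describe(toy))
print(pair_score(toy, "scaf1", "scaf2"))
```

If describe() reports a different shape (for example top-level "columns", "index", and "data" keys), the lookup would need to be adapted accordingly.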
File: SuppDataFile2.Geno.tgz
Description: Polymorphisms discovered through whole-genome sequencing. This file unzips into 63 *.geno_score.csv files, organized by super-scaffold, and pairs with the coverage data provided in SuppDataFile2.Depth.tgz.
File: SuppDataFile2.Depth.tgz
Description: Per-animal, per-polymorphism information on the sequencing coverage used to call genotypes. This file unzips into 63 *.depth.csv files, organized like SuppDataFile2.Geno.tgz.
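Because the genotype and depth archives are organized the same way, one way to work with them is to match files across the two archives by name. The following is a minimal sketch under the assumption that paired files share a base name before the .geno_score.csv and .depth.csv suffixes; the member names in the example are hypothetical, as are the helper names.

```python
import tarfile
from pathlib import PurePosixPath

GENO_SUFFIX = ".geno_score.csv"
DEPTH_SUFFIX = ".depth.csv"

def csv_members(tgz_path, suffix):
    """List member names in a gzipped tar archive that end with the suffix."""
    with tarfile.open(tgz_path, "r:gz") as archive:
        return [name for name in archive.getnames() if name.endswith(suffix)]

def pair_by_prefix(geno_names, depth_names):
    """Map each shared base name to (geno_file, depth_file or None)."""
    def key(name, suffix):
        # Drop any directory part, then strip the known suffix.
        return PurePosixPath(name).name[: -len(suffix)]

    depth_by_key = {key(name, DEPTH_SUFFIX): name for name in depth_names}
    return {
        key(name, GENO_SUFFIX): (name, depth_by_key.get(key(name, GENO_SUFFIX)))
        for name in geno_names
    }

# Hypothetical member names; real ones would come from csv_members(...).
geno = ["geno/ss1.geno_score.csv", "geno/ss2.geno_score.csv"]
depth = ["depth/ss1.depth.csv"]
pairs = pair_by_prefix(geno, depth)
for base, (geno_file, depth_file) in sorted(pairs.items()):
    print(base, geno_file, depth_file)
```

A genotype file with no depth counterpart shows up with None in the second slot, which makes any mismatch in the naming assumption easy to spot.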
File: SuppDataFile3.Maf.tgz
Description: Whole-genome alignments, generated by Cactus, of the genome assemblies produced here (H. glaber, F. damarensis, and C. porcellus) plus published genome assemblies of relevant mammals (see preprint for details). This GNU-zipped tar file includes the .maf-format whole-genome alignment. Support files can be found in SuppDataFile3.Sup.tgz.
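Given the size of the alignment, reading it block by block is usually more practical than loading it whole. The sketch below is a minimal streaming reader for the general MAF layout, where an "a" line opens a block and each "s" line carries source, start, size, strand, source size, and aligned text. It ignores other line types, and the species and scaffold names in the toy input are placeholders rather than names from this dataset.

```python
import io

def iter_maf_blocks(handle):
    """Yield each alignment block as a list of dicts, one per 's' line."""
    block = None
    for line in handle:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
            block = [] if fields else None
        elif fields[0] == "s" and block is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            block.append({
                "src": src,
                "start": int(start),
                "size": int(size),
                "strand": strand,
                "src_size": int(src_size),
                "text": text,
            })
        # Comment lines ('#') and other line types are skipped.
    if block:
        yield block

# Toy MAF text (placeholder names and coordinates).
toy_maf = """##maf version=1
a score=10.0
s speciesA.chr1 0 5 + 100 ACGTA
s speciesB.scaf2 10 4 - 200 ACG-A

a score=5.0
s speciesA.chr1 50 3 + 100 GGC
"""
blocks = list(iter_maf_blocks(io.StringIO(toy_maf)))
for rows in blocks:
    print([row["src"] for row in rows])
```

For the real archive, a text handle could be obtained by wrapping the object returned by tarfile's extractfile in io.TextIOWrapper, so the alignment never has to be fully unpacked to disk.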
File: SuppDataFile3.Sup.tgz
Description: Support files for the whole-genome alignments provided in SuppDataFile3.Maf.tgz (multiple files packaged into this GNU-zipped tar file).
File: SuppDataFile4.tgz
Description: Gene annotation files, in .gff3 and .gp formats, bundled into a GNU-zipped tar file.
Genome assemblies: Naked_mole_rat_genome.tgz, Damara_mole_rat_genome.tgz, Guinea_pig_genome.tgz
GNU-zipped tar files: each contains a FASTA-format text file with the genome assembly. For the Damaraland mole-rat and guinea pig genomes, contig names had to be changed from the assembler's output to make them NCBI-compliant. The original names are used in the multiple sequence alignments, so a table mapping updated names to original names is also provided (the original names also appear as extra information on the sequence header lines of the FASTA files).
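To see how updated and original contig names relate, the header lines can be scanned without reading the sequences into memory. This sketch follows the usual FASTA convention that the first whitespace-delimited token after ">" is the sequence name and the remainder is free text. Exactly how the original name is written in that free text is not specified here, so the "original=" layout in the toy input is hypothetical.

```python
import io

def iter_headers(handle):
    """Yield (sequence_name, extra_text) for each '>' header line."""
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].strip().split(None, 1)
            name = parts[0] if parts else ""
            extra = parts[1] if len(parts) > 1 else ""
            yield name, extra

# Toy FASTA text (names and header layout are placeholders).
toy_fasta = ">NEW_000001 original=contig_1\nACGT\n>NEW_000002\nGG\n"
headers = list(iter_headers(io.StringIO(toy_fasta)))
for name, extra in headers:
    print(name, "|", extra)
```

The provided mapping table remains the authoritative source for name translation; scanning headers this way is mainly useful as a cross-check.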
