Data from: The origin of a new chromosome in gerbils
Data files
May 16, 2023 version files 469.73 MB
- Meriones_chromonome_v6.1_RepetitiveElementsAnnotated.gff (366.07 MB)
- Meriones_chromonome_v6.1.geneticmap.csv (114.45 KB)
- Meriones_chromonome_v6.1.sorted.gff (53.99 MB)
- populations.snps.vcf (49.55 MB)
- README_Dryad_upload.txt (596 B)
Abstract
Gerbil genomes have both an extensive set of GC-rich genes and chromosomes strikingly enriched for constitutive heterochromatin. We sought to determine if there was a link between these two phenomena and found that the two heterochromatic chromosomes of the Mongolian gerbil (Meriones unguiculatus) have distinct underpinnings: chromosome 5 has a large block of intra-arm heterochromatin as the result of a massive expansion of centromeric repeats (probably due to centromeric drive), whereas chromosome 13 is composed of extremely large (>150 kb) repeated sequences. We suggest that chromosome 13 originated when a functionally important ‘seed’ broke off from another chromosome and underwent multiple breakage-fusion-bridge cycles. Genes with the most extreme GC skew are encoded on this chromosome, most likely due to the restriction of recombination to a narrow permissive region (since GC bias is linked with recombination-associated processes). Our results demonstrate the importance of including karyotypic features such as chromosome number and the locations of centromeres in the interpretation of genome sequence data, and highlight novel patterns involved in the evolution of chromosomes.
This dataset contains a GFF gene annotation file, a VCF file of single nucleotide variants at the genetic map markers in an F2 mapping panel, a second GFF file annotating repetitive elements, and a genetic map. The codebase for the project (i.e. Supplemental Material 3) is also included.
All of these files correspond to the chromosome-scale Meriones unguiculatus genome that we published, which is available at NCBI (BioProject PRJNA397533). Methods are described in the manuscript "The origin of a new chromosome in gerbils".
The gene annotation was produced with the MAKER pipeline using RNA-seq data from kidney and testis. The VCF is based on the sequence data from Brekke et al. (2019) and contains genotype data for the entire mapping panel at the genetic markers. The physical and genetic positions of the markers are given in the genetic map. The repetitive element annotation file was created with the EarlGrey pipeline.
Brekke, T. D., S. Supriya, M. G. Denver, A. Thom, K. A. Steele, and J. F. Mulley. 2019. A high-density genetic map and molecular sex-typing assay for gerbils. Mamm Genome 30:63–70.
The annotation and variant files are flat-text files in standard GFF3 and VCF format; the genetic map is a comma-separated (CSV) text file.
The code is written in bash, R, Python 3, and Processing.
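Because the annotation and variant files follow the standard GFF3 and VCF layouts, they can be read with the Python standard library alone. The following is a minimal sketch, not part of the project codebase: the function names and the demo records (sequence name `chrA`, feature IDs, sample name) are hypothetical and serve only to illustrate the column layout defined by the two formats.

```python
import io
from collections import Counter
from urllib.parse import unquote


def parse_gff3_line(line):
    """Return one GFF3 feature as a dict, or None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:  # GFF3 feature lines have exactly nine tab-separated columns
        return None
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for field in attrs.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key.strip()] = unquote(value.strip())  # GFF3 percent-encodes reserved characters
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def count_feature_types(handle):
    """Count features per (sequence, feature type) in an open GFF3 file."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):  # an optional embedded sequence section ends the features
            break
        feature = parse_gff3_line(line)
        if feature is not None:
            counts[(feature["seqid"], feature["type"])] += 1
    return counts


def parse_vcf_sites(handle):
    """Yield (chrom, pos, ref, alt_alleles) for each record in an open VCF file."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and the column header
        cols = line.rstrip("\n").split("\t")
        yield cols[0], int(cols[1]), cols[3], cols[4].split(",")


# Hypothetical demo records, NOT taken from the deposited files.
demo_gff = (
    "##gff-version 3\n"
    "chrA\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=demo%20gene\n"
    "chrA\tmaker\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=gene1\n"
)
demo_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"
    "chrA\t150\t.\tA\tG,T\t.\tPASS\t.\tGT\t0/1\n"
)

gff_counts = count_feature_types(io.StringIO(demo_gff))
vcf_sites = list(parse_vcf_sites(io.StringIO(demo_vcf)))
```

To apply the same functions to the deposited data, pass an open file instead of the demo strings, for example `open("Meriones_chromonome_v6.1.sorted.gff")` or `open("populations.snps.vcf")`. Reading line by line keeps memory use low, which matters for the 366 MB repetitive element annotation.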