Data from: De novo reference genome of a Geomyid rodent, Botta’s pocket gopher (Thomomys bottae)
Data files
Oct 16, 2024 version files (3.42 MB total):
- mThoBot1_denovo_repeat_library.fa (3.42 MB)
- README.md (2.02 KB)
Abstract
A well-known subterranean rodent of the North American west, Botta’s pocket gopher (Thomomys bottae) is both a garden pest and an important soil engineer. It is also a fascinating example of intraspecific variation, with considerable phenotypic diversity across its range and unusual levels of variation in chromosome number and composition. Here, we present a high-quality reference genome from a male T. b. bottae captured in the San Francisco Bay Area, representing one of the first two genomes assembled for the rodent family Geomyidae. The assembly, composed of 2,792 scaffolds, with a scaffold N50 value of 23.6 Mb and a BUSCO score of 91.0%, fills a significant taxonomic sampling gap in rodent genome resources. With this new reference genome, we envision new opportunities to investigate questions regarding the genomics of adaptation to the belowground niche space and the impact of associated life history traits, such as limited dispersal and low population connectivity, on intraspecific genetic and phenotypic variation, genome evolution, speciation, and phylogenetic relationships across the Geomyoidea.
https://doi.org/10.5061/dryad.wh70rxwwp
Description of the data and file structure
This dataset includes two files:
- mThoBot1_denovo_repeat_library.fa: Fasta file containing 1,944 de novo identified repetitive elements from the Thomomys bottae genome (NCBI:GCA_024803745.1).
- (Zenodo) mThoBot_de_novo_repeat_annotation.sh: Script with basic commands and flags used to generate the fasta file.
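As a quick sanity check after download, the number of records in the FASTA file can be compared against the 1,944 elements stated above. The following is a minimal sketch, not part of the deposited code. It assumes headers follow the `name#Class/Family` convention commonly produced by RepeatModeler; the function name `summarize_repeat_library` and the example headers are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_repeat_library(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count FASTA records and tally the class label found after '#' in each header."""
    n_records = 0
    classes: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        n_records += 1
        # The sequence ID is the first whitespace-delimited token after '>'.
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        # Assumed convention: "name#Class/Family"; headers without '#' are "Unlabeled".
        label = seq_id.split("#", 1)[1] if "#" in seq_id else "Unlabeled"
        classes[label.split("/", 1)[0]] += 1
    return n_records, classes


# Hypothetical in-memory example; the real headers in the library may differ.
example = [
    ">rnd-1_family-0#LINE/L1 (hypothetical header)",
    "ACGTACGT",
    ">rnd-1_family-1#LTR/ERVK",
    "TTGACA",
    ">rnd-1_family-2#Unknown",
    "GGGCCC",
]
total, by_class = summarize_repeat_library(example)
print(total, dict(by_class))
```

To run this on the deposited file, pass an open file handle, for example `summarize_repeat_library(open("mThoBot1_denovo_repeat_library.fa"))`, and check that the returned total matches the count given in the file description.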
Sharing/Access information
Please contact Erin Voss (erinvoss@berkeley.edu) if there are any questions or issues about the data shared here.
These data correspond to the Thomomys bottae bottae reference genome mThoBot1.0.p, which is publicly available on NCBI:
https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_024803745.1/
Code/Software
The related script (in Zenodo) mThoBot_de_novo_repeat_annotation.sh includes packages and flags that were used to generate the de novo repetitive element library. Briefly, we used the following packages to identify transposable and repetitive elements in the Thomomys bottae genome:
- RepeatModeler (v.2.0.3)
- GenomeTools LTR harvest (v.1.6.5) https://github.com/genometools/genometools
- LTR Retriever (v.2.9.8) https://github.com/oushujun/LTR_retriever
- DeepTE (Commit babd65e950) https://github.com/LiLabAtVT/DeepTE
Downstream analyses:
This de novo repeat library was combined with known rodent and ancestral repetitive elements from the Dfam database (v.3.3) to mask the genome using RepeatMasker (v.4.1.2-p1). Overall masking and repetitive genome content are reported in Voss et al. 2024 (Journal of Heredity).
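The exact procedure used to combine the libraries is given in the Zenodo script, not here. As a rough illustration only, one way to build a single custom library from several FASTA sources is to concatenate them while skipping repeated sequence IDs. The helper names `read_fasta` and `merge_libraries` are hypothetical, and keeping the first occurrence of a duplicated ID is an assumption of this sketch rather than the authors' method.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(*libraries: Iterable[str]) -> List[Tuple[str, str]]:
    """Concatenate libraries in order, keeping the first record for each sequence ID."""
    seen = set()
    merged = []
    for library in libraries:
        for header, seq in read_fasta(library):
            parts = header.split()
            seq_id = parts[0] if parts else ""
            if seq_id in seen:
                continue  # assumption: the first occurrence of an ID wins
            seen.add(seq_id)
            merged.append((header, seq))
    return merged


# Hypothetical inputs standing in for the de novo library and a Dfam-derived subset.
de_novo = [">rnd-1_family-0#LINE/L1", "ACGT", "ACGT"]
known = [">L1_Rod#LINE/L1", "TTTT", ">rnd-1_family-0#LINE/L1", "GGGG"]
for header, seq in merge_libraries(de_novo, known):
    print(f">{header}\n{seq}")
```

A merged file written this way could then be supplied to RepeatMasker as a custom library through its `-lib` option; whether the authors used this route or another is not stated in this description.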
This dataset was generated as a preliminary annotation step for the Thomomys bottae genome assembly available at NCBI BioProject ID PRJNA851166. We annotated repetitive elements in the genome using a combination of known repetitive elements and de novo identification of transposable element motifs in the genome. We identified these repetitive elements using RepeatModeler v.2.0.3 (Flynn et al. 2020) with GenomeTools LTR harvest (v.1.6.5), LTR Retriever v.2.9.8 (Ou and Jiang 2018) and DeepTE (version: Commit babd65e950, Yan et al. 2020) to perform de novo identification on the mThoBot1.0 genome.
Here, we make the de novo annotated repetitive element library available in fasta format. Code used to perform annotation is also included.