Dryad

Protein-coding sequences (CDS) of the Mus musculus genome

Data files

May 01, 2026 version files 874.55 KB


Abstract

This dataset contains 22,759 non-redundant protein-coding sequences (CDS) from the Mus musculus C57BL/6 reference genome (NCBI RefSeq assembly GCF_000001635.27_GRCm39). Each CDS is annotated with a gene symbol, a GenBank protein reference ID, and its sequence length in base pairs. This curated CDS reference file enables consistent cross-species comparisons at the transcript level and was used to identify orthologous expression patterns and differential regulation in response to viral infection. The file is structured as a single spreadsheet in which each row represents a unique gene and its corresponding CDS. The dataset can be reused in any study that requires canonical CDS references from M. musculus C57BL/6, particularly in comparative transcriptomics.
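
As an illustration of the row structure described above, the following is a minimal sketch of how such a table might be loaded and sanity-checked in Python. It assumes the spreadsheet has been exported to CSV and that the columns are named `gene_symbol`, `reference_id`, and `length_bp`; these names are hypothetical, as are the placeholder rows, and the real file's headers may differ. The codon-length check is only a heuristic, since partial CDS records need not be a multiple of three.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; adjust to match the headers in the actual file.
REQUIRED = ("gene_symbol", "reference_id", "length_bp")


def load_cds_table(handle):
    """Read the table from a text handle and convert lengths to integers."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["length_bp"] = int(row["length_bp"])
        rows.append(row)
    return rows


def duplicate_symbols(rows):
    """Return gene symbols that occur on more than one row."""
    counts = Counter(r["gene_symbol"] for r in rows)
    return sorted(s for s, n in counts.items() if n > 1)


def non_codon_lengths(rows):
    """Return gene symbols whose CDS length is not a multiple of three."""
    return [r["gene_symbol"] for r in rows if r["length_bp"] % 3 != 0]


# Placeholder rows for illustration only; these are not values from the dataset.
sample = io.StringIO(
    "gene_symbol,reference_id,length_bp\n"
    "GeneA,ID_0001,300\n"
    "GeneB,ID_0002,451\n"
    "GeneA,ID_0003,600\n"
)
rows = load_cds_table(sample)
print(len(rows))                # 3
print(duplicate_symbols(rows))  # ['GeneA']
print(non_codon_lengths(rows))  # ['GeneB']
```

With the real file, `open(path, newline="")` would replace the in-memory `sample`; an empty result from `duplicate_symbols` would be consistent with the one-row-per-gene layout described in the abstract.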