A rodent anchored hybrid enrichment probe set for a range of phylogenetic utility – from order to species

Cite this dataset

Bangs, Max; Steppan, Scott (2021). A rodent anchored hybrid enrichment probe set for a range of phylogenetic utility – from order to species [Dataset]. Dryad.


Rodents are the largest order of mammals and contain several model organisms important to scientific research in a variety of fields, yet no large set of genomic markers has been designed for this group to date, hindering evolutionary studies into relationships of the group as a whole. Here we present a genomic probe set designed and optimized for rodents, with a protocol that is easy to replicate with little laboratory investment. This design uses an anchored hybrid enrichment approach specifically targeting rodents to generate longer loci with a higher mutation rate than existing vertebrate probes, providing utility at various taxonomic levels. Using a test set of rodents from all five suborders, we successfully obtained alignments for 416 of the 418 target loci, with an average of 1,379 base pairs per locus and a total alignment of more than half a million base pairs. This genomic dataset performed well in all phylogenetic analyses, especially in recent phylogenetic splits, with ample parsimony-informative sites within genera and even within species, showing more than four times as many single nucleotide polymorphisms per locus as a recent vertebrate ultra-conserved elements study. Additional support is provided in resolving basal clades in Rodentia. By providing this probe design, we hope that more labs can easily generate data for answering questions in rodents, from species delimitation to understanding relationships among families in rapid radiations.


Probes were designed using MyBaits (Chafin et al. 2018) with the mouse-60-way alignment from UCSC, along with a subset of probes from Lemmon et al. 2012. Each target site ranged from 240 to 400 bp and was split into 21 tiled probes for five species (one per rodent suborder). For more details on the methods, see the linked manuscript in Molecular Ecology Resources.

Usage notes

Dataset includes:

1) Three .tre files: ASTRAL LPP, ASTRAL polytomy test, and IQ-TREE ultrafast bootstrap.

2) A list of probes in .csv format: a total of 46,893 probes designed for 446 loci, totaling 5.6 Mbp. This file can be used to directly order the rodent418probe set from Agilent or a similar company. The first 25,116 probes are new rodent-specific probes and are labeled as follows:

species, Name of locus, probe number

Example: mR001p1 = mouse locus 1 probe 1
The species key is as follows:
m = mouse, Mus musculus, genome version mm10
k = Ord's kangaroo rat, Dipodomys ordii, genome version dipOrd1
g = guinea pig, Cavia porcellus, genome version cavPor3
s = 13-lined ground squirrel, Ictidomys tridecemlineatus, genome version speTri2
c = Chinese hamster, Cricetulus griseus, genome version criGri1
gr = Gerbilus robustus (from supplied sequences)
sm = Sigmodon mascotensis (from supplied sequences)
n = Nesomys rufus (from supplied sequences)
This covers 239 new rodent loci (labeled R001 - R239) as well as a set for RAG1 and IRBP (labeled as such). This file also includes the vertebrate probes (n=21,777) from the 205 loci that passed the filtering. These are the same probes as in Lemmon et al. 2012 and carry the same names as in that paper (first an L, then the locus number, then a species abbreviation, then the probe number; e.g., L5D1 is locus 5, designed from Danio rerio, probe 1).
3) Full sequences for all new rodent targets in .csv format.
4) Reference sequences pulled from the mm10 genome in .fasta format, used for retrieval of targets after sequencing.
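
The probe naming scheme described under item 2 can be decoded programmatically when sorting or filtering the probe list. The following Python sketch is illustrative only and is not part of the dataset. The function name parse_probe_name and the regular expressions are assumptions based on the two examples given above (mR001p1 and L5D1). The exact label format of the RAG1 and IRBP probes is not specified here, so those names are deliberately left unparsed, and vertebrate species abbreviations are assumed to consist of letters only.

```python
import re

# Species codes copied from the species key above (spelling as in the key,
# except for the corrected hamster genus, Cricetulus).
RODENT_SPECIES = {
    "m": "Mus musculus",
    "k": "Dipodomys ordii",
    "g": "Cavia porcellus",
    "s": "Ictidomys tridecemlineatus",
    "c": "Cricetulus griseus",
    "gr": "Gerbilus robustus",
    "sm": "Sigmodon mascotensis",
    "n": "Nesomys rufus",
}

# Assumed pattern for rodent probes: species code, "R" + locus number,
# "p" + probe number. Two-letter codes come first in the alternation so
# "gr" and "sm" are not misread as "g" and "s".
RODENT_RE = re.compile(r"^(gr|sm|m|k|g|s|c|n)R(\d+)p(\d+)$")

# Assumed pattern for vertebrate probes: "L" + locus number,
# a letters-only species abbreviation, then the probe number.
VERTEBRATE_RE = re.compile(r"^L(\d+)([A-Za-z]+)(\d+)$")


def parse_probe_name(name):
    """Return a dict describing a probe name, or None if it is not recognized."""
    match = RODENT_RE.match(name)
    if match:
        code, locus, probe = match.groups()
        return {
            "set": "rodent",
            "species": RODENT_SPECIES[code],
            "locus": "R" + locus,
            "probe": int(probe),
        }
    match = VERTEBRATE_RE.match(name)
    if match:
        locus, abbreviation, probe = match.groups()
        return {
            "set": "vertebrate",
            "species": abbreviation,  # abbreviation as used in Lemmon et al. 2012
            "locus": "L" + locus,
            "probe": int(probe),
        }
    return None  # e.g. RAG1/IRBP labels, whose format is not specified here


if __name__ == "__main__":
    for example in ("mR001p1", "L5D1"):
        print(example, parse_probe_name(example))
```

Names that match neither pattern return None, so they can be inspected by hand rather than being assigned to the wrong locus.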


National Science Foundation, Award: DEB-1754748