A new perspective on the taxonomy and systematics of Arvicolinae (Gray, 1821) and a new time-calibrated phylogeny for the clade
Data files
Dec 08, 2023 version files 5.93 MB
- Appendix_A_FINAL.xlsx (46.49 KB)
- Calibrated_Bayesblock.txt (880.28 KB)
- Combined_Bayesblock.txt (879 KB)
- Combined_best_scheme.txt (6.90 KB)
- combined_dates.txt (3.28 KB)
- combined_newick.nwk (9.10 KB)
- Combined_RAxML_inputfile.txt (877.24 KB)
- Combined.nex (885.66 KB)
- mito_dates.txt (3.30 KB)
- mito_newick.nwk (9.04 KB)
- Mito_only_Bayesblock.txt (405.43 KB)
- Mitochondrial_best_scheme.txt (4.41 KB)
- Mitochondrial_only.nex (414.72 KB)
- Mitochondrial_RAxML_inputfile.txt (404.17 KB)
- Nuclear_best_scheme.txt (6.11 KB)
- nuclear_dates.txt (2.44 KB)
- nuclear_newick.nwk (6.87 KB)
- Nuclear_only_Bayesblock.txt (354.14 KB)
- Nuclear_only.nex (365.01 KB)
- Nuclear_RAxML_inputfile.txt (357.65 KB)
- RAxML_parameters_Cipres.txt (2.43 KB)
- README.md (5.46 KB)
Abstract
Background. Arvicoline rodents are one of the most speciose and rapidly evolving mammalian lineages. Fossil arvicolines are also among the most common vertebrate fossils found in sites of Pliocene and Pleistocene age in Eurasia and North America. However, there is no taxonomically robust, well-supported, time-calibrated phylogeny for the group.
Methods. Here we present well-supported hypotheses of arvicoline rodent systematics using maximum likelihood and Bayesian inference of DNA sequences of two mitochondrial genes and three nuclear genes representing 146 species (82% coverage) and 100% of currently recognized arvicoline genera. We elucidated well-supported major clades, reviewed the relationships and taxonomy of many species and genera, and critically compared our resulting molecular phylogenetic hypotheses to previously published hypotheses. We also used five fossil calibrations to generate a time-calibrated phylogeny of Arvicolinae that permitted some reconciliation between paleontological and neontological data.
Results. Our results are largely congruent with previous molecular phylogenies, but we increased the support in many regions of the arvicoline tree that were previously poorly sampled. Our sampling resulted in a better understanding of relationships within Clethrionomyini and of the early-diverging position and close relationship of true lemmings (Lemmus and Myopus) and bog lemmings (Synaptomys), and provided support for recent taxonomic changes within Microtini. Our results indicate an origin of ~6.4 Ma for crown arvicoline rodents. These results have major implications (e.g., diversification rates, paleobiogeography) for our confidence in the fossil record of arvicolines and their utility as biochronological tools in Eurasia and North America during the Quaternary.
README: A new perspective on the taxonomy and systematics of Arvicolinae (Gray, 1821) and a new time-calibrated phylogeny for the clade
https://doi.org/10.5061/dryad.qrfj6q5cg
This dataset contains molecular data from five genes (two mitochondrial and three nuclear) for 146 species of arvicoline rodents and 3 outgroups. Maximum-likelihood analysis was completed in RAxML v8.2.12 (Stamatakis, 2014) and Bayesian inference was computed in MrBayes 3.2.7 (Ronquist et al., 2012) on the Cipres Cluster. Divergence dating using five fossil calibrations was also completed in MrBayes 3.2.7.
Description of the data and file structure
Included in this dataset are all of the input files needed to reproduce our analyses.
RAxML_parameters_Cipres.txt
1. Parameters entered into Cipres for the RAxML analysis. From top to bottom: Mitochondrial only, Nuclear only, Combined Mitochondrial and Nuclear data.
Age_prior_Evolutionary_rate_code.R
1. At the beginning is R code for producing the prior distribution for the age of a calibrated node.
- A new user should change only myOffset (the minimum age of a node) and myMedian (the median of the exponential distribution of the age prior).
2. Further down is the code derived from Gunnell et al. (2018) for calculating the evolutionary rate for an uncalibrated tree. This was done for the mitochondrial-only, nuclear-only, and combined datasets.
- You will need a Newick file of your tree of interest and a text file of dates. If all taxa are modern, then use an age of “0”. If fossils are included, enter the age of each fossil instead.
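As a rough illustration of the age prior described above, the following Python sketch (not the R code that accompanies this dataset) shows how an offset exponential prior can be parameterized from a minimum age and a median. It assumes that myMedian is an absolute age older than myOffset; if the script instead treats the median as a distance from the offset, drop the subtraction. The function names and the example ages are hypothetical.

```python
import math


def offset_exponential_prior(my_offset, my_median):
    """Return (rate, mean_age) for an exponential prior shifted to start at my_offset.

    Assumption: my_median is an absolute age, so the exponential part
    has median (my_median - my_offset).
    """
    if my_median <= my_offset:
        raise ValueError("my_median must be older than my_offset")
    # For an exponential distribution, median = ln(2) / rate.
    rate = math.log(2) / (my_median - my_offset)
    # The mean of the exponential part is 1 / rate; shift it by the offset.
    mean_age = my_offset + 1.0 / rate
    return rate, mean_age


def prior_density(age, my_offset, rate):
    """Probability density of the prior at a given node age."""
    if age < my_offset:
        return 0.0  # ages younger than the minimum are excluded
    return rate * math.exp(-rate * (age - my_offset))


def prior_quantile(p, my_offset, rate):
    """Age below which a fraction p of the prior mass lies (0 <= p < 1)."""
    return my_offset - math.log(1.0 - p) / rate


# Hypothetical calibration: minimum age 2.0 Ma, median 3.0 Ma.
rate, mean_age = offset_exponential_prior(2.0, 3.0)
print("rate:", rate)
print("mean age:", mean_age)
print("95% of prior mass is younger than:", prior_quantile(0.95, 2.0, rate))
```

Some programs parameterize the offset exponential by its mean rather than its median; mean_age gives that conversion under the same assumption.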
combined_newick.nwk
1. Newick file of the combined mitochondrial and nuclear dataset that you would use in your calculation of the evolutionary rate from the file Age_prior_Evolutionary_rate_code.R.
nuclear_newick.nwk
1. Newick file of the nuclear dataset that you would use in your calculation of the evolutionary rate from the file Age_prior_Evolutionary_rate_code.R.
mito_newick.nwk
1. Newick file of the mitochondrial dataset that you would use in your calculation of the evolutionary rate from the file Age_prior_Evolutionary_rate_code.R.
nuclear_dates.txt
1. Text file of the nuclear dataset ages that you would use in your calculation of the evolutionary rate from the file Age_prior_Evolutionary_rate_code.R.
mito_dates.txt
1. Text file of the mitochondrial dataset ages that you would use in your calculation of the evolutionary rate from the file Age_prior_Evolutionary_rate_code.R.
combined_dates.txt
1. Text file of the combined mitochondrial and nuclear dataset ages that you would use in your calculation of the evolutionary rate from the file Age_prior_Evolutionary_rate_code.R.
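The two kinds of input described above (a Newick tree with branch lengths and a table of tip ages) can be combined into a rough rate estimate. The Python sketch below is neither the Gunnell et al. (2018) code nor the R script from this dataset. It assumes a simple approach in which each root-to-tip path length is divided by the time between an assumed root age and the tip age. The parser handles only plain Newick without quoted labels or bracketed comments, and the toy tree, tip ages, and root age are hypothetical.

```python
from statistics import mean


def parse_newick(text):
    """Parse plain Newick into nested (name, branch_length, children) tuples."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]  # tip name, or internal label such as a support value
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return name, length, children

    return node()


def root_to_tip(node, depth=0.0, out=None):
    """Return {tip_name: summed branch length from the root to that tip}."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        root_to_tip(child, depth, out)
    return out


def mean_rate(newick, tip_ages, root_age):
    """Mean of (root-to-tip length) / (root_age - tip_age) across tips.

    tip_ages maps tip name to age; tips not listed are treated as modern (0).
    """
    rates = []
    for tip, dist in root_to_tip(parse_newick(newick)).items():
        elapsed = root_age - tip_ages.get(tip, 0.0)
        if elapsed <= 0:
            raise ValueError(f"root_age must be older than tip {tip!r}")
        rates.append(dist / elapsed)
    return mean(rates)


# Hypothetical three-taxon tree with all tips modern and an assumed root age of 10.
toy_tree = "((A:1,B:2):0.5,C:3);"
toy_ages = {"A": 0.0, "B": 0.0, "C": 0.0}
print(mean_rate(toy_tree, toy_ages, root_age=10.0))
```

The root age is an input here, not something estimated from the tree, so the result is only as good as that assumption.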
Combined_RAxML_inputfile.txt
1. Text file with the combined mitochondrial and nuclear dataset used in the RAxML analysis on Cipres.
Mitochondrial_RAxML_inputfile.txt
1. Text file with the mitochondrial dataset used in the RAxML analysis on Cipres.
Nuclear_RAxML_inputfile.txt
1. Text file with the nuclear dataset used in the RAxML analysis on Cipres.
Appendix_A_FINAL.xlsx
1. Table with all species included in the analysis
2. Missing data = total number of nucleotides missing from the complete dataset (n=5857)
3. Columns H-L = GenBank accession numbers
4. Columns M-Q = Voucher number
- Blank spaces = no specimen and no voucher
- NONE = Specimen present but no voucher
5. Source of Data = Primary source for the data deposited in GenBank
Combined_best_scheme.txt
1. Partition and evolutionary model suggested by PartitionFinder2 for the combined mitochondrial and nuclear dataset.
Mitochondrial_best_scheme.txt
1. Partition and evolutionary model suggested by PartitionFinder2 for the mitochondrial dataset.
Nuclear_best_scheme.txt
1. Partition and evolutionary model suggested by PartitionFinder2 for the nuclear dataset.
Combined_Bayesblock.txt
1. Bayesblock of the combined mitochondrial and nuclear dataset entered into Cipres.
Calibrated_Bayesblock.txt
1. Bayesblock of the time-calibrated combined mitochondrial and nuclear dataset entered into Cipres.
Mito_only_Bayesblock.txt
1. Bayesblock of the mitochondrial dataset entered into Cipres.
Nuclear_only_Bayesblock.txt
1. Bayesblock of the nuclear dataset entered into Cipres.
Combined.nex
1. Nexus file of the combined mitochondrial and nuclear dataset.
2. Gene partitions are as follows:
- Cytb 1-1143
- COI 1144-2678
- GHR 2679-3558
- IRBP/RBP3 3559-4824
- BRCA1 4825-5857
Mitochondrial_only.nex
1. Nexus file of the mitochondrial dataset.
2. Gene partitions are as follows:
- Cytb 1-1143
- COI 1144-2682
Nuclear_only.nex
1. Nexus file of the nuclear dataset.
2. Gene partitions are as follows:
- GHR 1-880
- IRBP/RBP3 881-2146
- BRCA1 2155-3175
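The partition coordinates listed above can be checked and used to slice aligned sequences. The Python sketch below copies the coordinates exactly as they appear in this README (1-based, inclusive) and reports each gene's length along with any alignment columns that fall between listed partitions. The function names and the short example sequence are hypothetical, and the Nexus files themselves remain the authoritative source for the coordinates.

```python
# Partition coordinates copied from this README (1-based, inclusive).
PARTITIONS = {
    "Combined.nex": [
        ("Cytb", 1, 1143),
        ("COI", 1144, 2678),
        ("GHR", 2679, 3558),
        ("IRBP/RBP3", 3559, 4824),
        ("BRCA1", 4825, 5857),
    ],
    "Mitochondrial_only.nex": [
        ("Cytb", 1, 1143),
        ("COI", 1144, 2682),
    ],
    "Nuclear_only.nex": [
        ("GHR", 1, 880),
        ("IRBP/RBP3", 881, 2146),
        ("BRCA1", 2155, 3175),
    ],
}


def partition_lengths(parts):
    """Return {gene: number of alignment columns}."""
    return {gene: end - start + 1 for gene, start, end in parts}


def unassigned_columns(parts):
    """Return (first, last) column ranges not covered by any listed partition."""
    gaps = []
    expected = 1
    for _, start, end in sorted(parts, key=lambda p: p[1]):
        if start > expected:
            gaps.append((expected, start - 1))
        expected = max(expected, end + 1)
    return gaps


def extract_gene(aligned_sequence, start, end):
    """Slice one partition out of an aligned sequence (1-based, inclusive)."""
    return aligned_sequence[start - 1:end]


for filename, parts in PARTITIONS.items():
    print(filename, partition_lengths(parts), unassigned_columns(parts))

# Hypothetical 8-column sequence, extracting columns 3 through 5.
print(extract_gene("ACGTACGT", 3, 5))
```

With the coordinates as listed, the combined partitions are contiguous and sum to 5857 columns, matching the n=5857 given for Appendix_A_FINAL.xlsx. The nuclear list leaves columns 2147-2154 unassigned, and COI spans 1535 columns in the combined list but 1539 in the mitochondrial-only list, so confirm the coordinates against the Nexus files before relying on them.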
Sharing/Access information
This dataset was made possible by data found on GenBank. See individual accession numbers for the specific sequences used from GenBank (https://www.ncbi.nlm.nih.gov/genbank/).
Code/Software
All included code was run in RStudio 2022.02.1.
Methods
This dataset is a portion of a chapter of Charles Withnell's doctoral dissertation at The University of Texas at Austin and of an article published by PeerJ. Molecular data were obtained from GenBank (we did not sequence any genetic data ourselves). This resulted in three separate datasets: (1) a dataset of only taxa with mitochondrial data (n=146); (2) a dataset of only taxa with nuclear data (n=107); and (3) a concatenated dataset that includes both mitochondrial and nuclear loci for 146 species of extant arvicolines. Maximum likelihood was computed in RAxML v8.2.12 (Stamatakis, 2014), and Bayesian inference in MrBayes 3.2.7 (Ronquist et al., 2012). All analyses were conducted on the Cipres Cluster. Five fossil calibrations were used to constrain the phylogeny produced in MrBayes using all five genes.
Usage notes
All scripts should run and produce results. Note: running these scripts on other machines or with different software versions can prove difficult. All analyses were conducted on a MacBook running macOS Monterey.