Temperature predicts the rate of molecular evolution in Australian Eugongylinae skinks
Data files
Aug 27, 2021 version files 300.44 MB
-
ALL_concat_loci.fasta
300.44 MB
-
README.txt
440 B
Abstract
Temperature differences over time and space has been hypothesized to cause variation in the rate of molecular evolution of species, but empirical evidence is mixed. To further test this hypothesis, we utilized a large exon-capture sequence data of Australian Eugongylinae skinks, exemplifying a radiation of temperature-sensitive ectotherms spanning a large latitudinal gradient. The association between temperature (and other species traits) and long-term substitution rate was assessed based on 1268 sequenced exons of 44 species pairs from the Eugongylinae subfamily using regression analyses. Temperature is the strongest, positively-correlated predictor of variation in substitution rate across the Australian Eugongylinae. It explains 45% of variation in synonymous substitution rate, and 11% after controlling for all the other factors. Synonymous substitution rate is also negatively associated with body size, with 6% variation explained by body size after controlling for the effects of temperature. Other factors are not associated with synonymous substitution rate after controlling for temperature. Overall, this study points to temperature as a strong predictor of the molecular evolution rate in the Eugongylinae subfamily, and demonstrates the power of large-scale exonic data to identify correlates of the rate of molecular evolution.
Exon sequences, generated using a custom exon-capture system (Bragg et al. 2016) were obtained from Blom et al. (2016), Bragg et al. (2018) and unpublished data. In brief, the raw sequencing reads were quality-assessed and assembled de novo as haplotypes for each individual following the pipelines from Singhal (2013) and Bragg et al. (2016, 2018). We note that this workflow involves mapping cleaned sequencing reads to a set of reference sequences that were assembled for the same sample (Bowtie 2, version 2.2.2, Langmead and Salzberg 2012) and calling genotypes appropriate for the estimation of individual level heterozygosity (using GATK release gb82c674, McKenna et al. 2010, and sites with a genotype quality, GQ, greater than 20). We included 435 samples (and 8 outgroups) from 135 species of Eugongylinae subfamily that had less than 1000 missing loci out of the initial set of 3171 loci.
Using EAPhy v.1.2 (Blom 2015), we first checked correct amino acid coding for all the aligned haplotypes (one per individual per locus), removed loci with more than 20% missing data and/or which had less than 200bp. The filtered sequence alignments (N = 1782 loci) were further cross-checked by eye using Geneious Prime 2020.1.1 to ensure correct reading frame and no stop codons in the middle of alignments. Then, the gene tree of each locus was estimated using IQTree v1.6.11 (Nguyen et al. 2014) with the MG codon model with additional transition/transversion rate ratio, as selected by ModelFinder (Kalyaanamoorthy et al. 2017). The gene trees and alignments were examined in TreeShrink (Mai and Mirarab 2018) to filter out loci in which some species had abnormally long branches, resulting in 1268 loci that passed the default parameters. Finally, the alignments that passed this filter concatenated using EAPhy.
ALL_concat_loci.fasta
Sequence alignment of 435 samples (and 8 outgroups) from 135 species of Eugongylinae subfamily in FASTA format. The README file contains additional information of the data.