Data from: Taller plants have lower rates of molecular evolution

Lanfear, Robert; Ho, Simon Y. W.; Davies, T. Jonathan; Moles, Angela T.; Aarssen, Lonnie; Swenson, Nathan G.; Warman, Laura; Zanne, Amy E.; Allen, Andrew P.

Published May 22, 2013 on Dryad. https://doi.org/10.5061/dryad.43mg3

Data files

May 22, 2013 version files (6.60 MB total)

01_ML_phylogeny.zip 1.11 MB
02_sister_pairs.zip 10.66 KB
03_R8S.zip 3.96 MB
04_PGLS.zip 1.51 MB

Abstract

Rates of molecular evolution have a central role in our understanding of many aspects of species’ biology. However, the causes of variation in rates of molecular evolution remain poorly understood, particularly in plants. Here we show that height accounts for about one-fifth of the among-lineage rate variation in the chloroplast and nuclear genomes of plants. This relationship holds across 138 families of flowering plants, and when accounting for variation in species richness, temperature, ultraviolet radiation, latitude and growth form. Our observations can be explained by a link between height and rates of genome copying in plants, and we propose a mechanistic hypothesis to account for this—the ‘rate of mitosis’ hypothesis. This hypothesis has the potential to explain many disparate observations about rates of molecular evolution across the tree of life. Our results have implications for understanding the evolutionary history and future of plant lineages in a changing world.

01_ML_phylogeny.zip

This file contains all the data necessary to re-run our ML tree using the full alignment containing all 564 species. There are 5 files. 'burleigh_concat.phy' is the alignment file in phylip format. 'commandline' is the command line we used to run RAxML version 7. 'partitions' describes how we partitioned our data (RAxML uses that file). 'RAxML_result.burleigh_concat.raxml.out' has the tree file that results from our analysis. 'raxmlHPC' is the RAxML executable we used to run our analyses on a Mac desktop computer.
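Before re-running RAxML, it can be worth confirming that the alignment has the expected dimensions. The following is a minimal Python sketch, not part of the archive, that relies only on the PHYLIP convention that the first line gives the number of taxa followed by the number of sites. The function name `read_phylip_header` and the tiny example alignment are made up for illustration.

```python
def read_phylip_header(text):
    """Return (n_taxa, n_sites) from the first non-blank line of a PHYLIP alignment."""
    for line in text.splitlines():
        if line.strip():
            fields = line.split()
            return int(fields[0]), int(fields[1])
    raise ValueError("no header line found")


# Tiny made-up alignment, used only to show the call.
example = "3 8\nsp_A      ACGTACGT\nsp_B      ACGTACGA\nsp_C      ACGAACGT\n"
n_taxa, n_sites = read_phylip_header(example)
print(n_taxa, n_sites)
```

Applied to the contents of 'burleigh_concat.phy', the first value should be 564 if the file matches the description above.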

02_sister_pairs.zip

This file contains data and R code for the sister pairs analyses. There are 2 files. 'raw_data_sister_pairs.csv' is a comma-separated values file that contains all of the branch length and life history data we used in the sister pairs analyses. There is one row per sister pair in the analysis, which has data on: the two families in the sister pair, the proportion of genera we had height data for in each family, the number of species in each family, the average height of each family (log-transformed value in mm), nuclear branch length for each family (in units of substitutions/site), chloroplast dN branch length for each family (also in substitutions/site), chloroplast dS branch length for each family (also in substitutions/site), latitude for each family (as distance from the equator), UV for each family (measured as in Davies et al. 2004), and temperature for each family (in kelvin). 'sister_pairs_analyses.r' contains all R code used for the sister pairs analyses. To use it, you will need to download R (it's free), install the relevant packages (listed at the top of the .r file), and then change the line at the top which starts 'setwd' to point to the folder on your computer that contains the raw_data_sister_pairs.csv file. As you go through the R code, it will print out all of the results in the paper that used sister pairs analyses.
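To illustrate how a one-row-per-pair table like this can be used, here is a small Python sketch. It is not the analysis in 'sister_pairs_analyses.r': it swaps in a simple sign test, counting the pairs in which the taller family has the shorter branch. The column names (`height_1`, `height_2`, `rate_1`, `rate_2`) and the four data rows are hypothetical; check the header of the real CSV file for the actual names.

```python
import csv
import io
from math import comb


def count_taller_slower(rows, height_cols, rate_cols):
    """Count pairs where the taller family has the shorter branch.

    Returns (matching_pairs, informative_pairs); ties are skipped.
    """
    h1, h2 = height_cols
    r1, r2 = rate_cols
    matching = 0
    informative = 0
    for row in rows:
        dh = float(row[h1]) - float(row[h2])
        dr = float(row[r1]) - float(row[r2])
        if dh == 0 or dr == 0:
            continue  # a tie carries no sign information
        informative += 1
        if (dh > 0) != (dr > 0):
            matching += 1
    return matching, informative


def sign_test_p(k, n):
    """Two-sided exact sign-test p-value for k successes in n trials (p = 0.5)."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up data with hypothetical column names, for illustration only.
example_csv = """height_1,height_2,rate_1,rate_2
4.2,2.9,0.010,0.025
3.1,3.8,0.040,0.020
4.5,3.0,0.015,0.030
2.7,3.3,0.018,0.020
"""
rows = list(csv.DictReader(io.StringIO(example_csv)))
k, n = count_taller_slower(rows, ("height_1", "height_2"), ("rate_1", "rate_2"))
p = sign_test_p(k, n)
print(k, n, p)
```

A sign test uses only the direction of each difference, so it ignores how large the height and branch length differences are.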

03_R8S.zip

This file contains the input files and results files from running R8S on the ML tree and on 1000 bootstraps of the ML tree. There are 3 files and 2 folders. 'base_files' is a holder for some basic R8S input files that the python script uses (see below). This folder contains: the r8s executable we used (compiled for Macs), the basic r8s.txt input file we used for each r8s analysis (which has data on our fossil calibrations), and a .txt file that contains the ML tree and 1000 bootstrapped trees estimated in RAxML as described in the paper. 'bootstrap_rates.txt' contains the results of the R8S analyses. Each family is listed on its own row, and each row has 1001 associated columns: the first column contains the ML rate, and subsequent columns contain bootstrap rates (in substitutions/site/myr). This is the main output file produced by the python script (see next). 'run_BS_r8s.py' is a python script that will run r8s on the ML tree and the 1000 bootstrap trees. Before running it, create an empty directory called "bootstrap_results" in the same folder as the script, then change the "start_dir" and "tree_file" variables at the top of the script to point to the directory the script is in and to the tree file in the 'base_files' folder, respectively. Then run the script using python. Briefly, the script takes each tree from the tree file, makes an r8s input file, runs r8s, then parses the output to extract the rates for each family. It then outputs these to the 'bootstrap_rates.txt' file. Be aware that the script stores all r8s results, which can take a lot of space (about 1GB) when the analyses are all complete.
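As a rough guide to working with 'bootstrap_rates.txt', the Python sketch below reads one family per row and summarises the bootstrap rates with a simple percentile interval. It assumes, without having verified it against the real file, that fields are whitespace-separated, that the family name comes first, and that there is no header row. The function names and the example row are hypothetical.

```python
def parse_rates(text):
    """Map family name -> (ml_rate, [bootstrap rates]).

    Assumes whitespace-separated fields: name, ML rate, then bootstrap rates.
    """
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        values = [float(x) for x in fields[1:]]
        rates[fields[0]] = (values[0], values[1:])
    return rates


def percentile_interval(values, lower=0.025, upper=0.975):
    """Nearest-rank percentile interval of a list of numbers."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


# One made-up row with 5 bootstrap values instead of 1000.
example = "Fam_A 0.0010 0.0009 0.0011 0.0012 0.0008 0.0010\n"
rates = parse_rates(example)
ml_rate, boot = rates["Fam_A"]
low, high = percentile_interval(boot)
print(ml_rate, low, high)
```

With the real file, each family should have 1000 bootstrap values after the ML rate if the layout matches the description above.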

04_PGLS.zip

This file contains 5 files, sufficient to re-do all of our PGLS analyses. 'bootstrap_rates.txt' contains the results of the ML r8s analysis and all 1000 subsequent bootstrap analyses. 'growth_forms.csv' contains information on the growth forms of species in each family. 'PGLS_analyses.r' is an R script which you can use to re-run all of our PGLS analyses. To use it you will need to change the line at the top that starts 'setwd' to point to the folder on your computer that contains all of the input files here. You'll also need to download R and the packages listed at the top of the file. 'R8S_trees.txt' is a file of the 1001 trees from R8S. These are used in the PGLS analyses to correct for non-independence. The first tree is the ML tree; the rest are bootstrap trees. 'raw_data_sister_pairs.csv' is a csv file of the raw data. It's included here so that the R script will run without additional hassle, but it's identical to the file described in the '02_sister_pairs' section above.
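Because the PGLS analyses pair each tree with a column of rates, a quick count of the trees can catch a truncated download. The Python sketch below assumes 'R8S_trees.txt' stores plain Newick strings, each ending in a semicolon, with no semicolons inside taxon labels; that format is an assumption rather than something stated above. The function name and the two example trees are made up.

```python
def count_newick_trees(text):
    """Count Newick trees by counting non-empty, semicolon-terminated chunks."""
    return sum(1 for chunk in text.split(";") if chunk.strip())


# Two made-up trees, for illustration only.
example_trees = "((A:1,B:1):1,C:2);\n((A:1,C:1):1,B:2);\n"
n_trees = count_newick_trees(example_trees)
print(n_trees)
```

If the assumption holds, the real file should give 1001: one ML tree followed by 1000 bootstrap trees.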