This archive contains code and data used in creating the paper "A New Hierarchy of Phylogenetic Models Consistent with Heterogeneous Substitution Rates" by Michael Woodhams, Jesús Fernández-Sánchez and Jeremy Sumner, Systematic Biology, 2015.

Please note that this software was written for research purposes rather than for general use. It is not user friendly and, because I did not know where I was going when I started, the structure is often poor. It also contains many options for trying parameterizations and optimizations which turned out to work poorly and are not recommended for use. The primary purpose of releasing it is to allow replication of our results.

* supplementary_material.pdf
Additional tables and figures, referenced in the main paper. This is mostly derived from likelihoods.ods and LieMarkov.nb, but in a more readable format.

* LMM.jar
An executable jar file for conducting the likelihood comparisons between models. See below for more information.

* test.sh
The Unix shell script used to run LMM.jar to create the results in the paper. This script runs each Lie Markov model under multiple parameterization schemes, runs four processes in parallel, and takes a few days to run.

* likelihoods.ods
Spreadsheet built from the output of test.sh, from which BIC and AICc scores are determined. This also compares the various parameterization schemes. The file is in OpenOffice/LibreOffice format.

* Java_source.tgz is an archive containing Java source code. The projects were created using the Eclipse IDE, but the source is easily transferred to some other environment. This release on Data Dryad is (as per Data Dryad policy) released under the Creative Commons 0 license (essentially public domain). The pal-mdw library needs to be downloaded separately (licensing issues preclude including it here) from https://github.com/MichaelWoodhams/pal-mdw/releases
** LieAlgebraMarkovModels: The main project. Other projects are libraries to this one. This compiles to create LMM.jar.
** mdwUtils: Just a collection of useful little routines I've started building, which I might want to use from other projects also.
** Jama: Linear algebra library for Java. http://math.nist.gov/javanumerics/jama/ This is public domain software, included here for convenience. No alterations have been made.

* sequences.tgz contains the DNA alignments used for testing. See the paper for full citations.
** 53humans.phy: Ingman et al. (2000). Accessions AF346963.1 to AF347015.1, aligned with dialign-tx.
** acorus15.phy: Goremykin et al. (2005). Accessions AJ879453.1 and NC_007407.1 plus other data found by BLAST; see Goremykin et al. for details.
** cormorants.nex: Holland et al. (2010). Treebase study S10249.
** rokas.nex: Rokas et al. (2003). Data available at http://as.vanderbilt.edu/rokaslab/data/Rokas_etal_Nature_2003.zip
** z.11x2178: Zakon et al. (2006), http://www.pnas.org/content/103/10/3675.short From accessions AF378144 DQ351534 DQ351533 AY204537 DQ149506 DQ275142 DQ336343 DQ336344 M22252 CAAE01014976 AB030482. Downloadable from https://code.google.com/p/garli/source/browse/garli/trunk/tests/data/z.11x2178.wtset.nex
** buttercup.nex: Joly et al. (2009). Treebase study S9948. From accessions FJ744168–FJ744237, FJ711776–FJ712023.
** M4310.nex: Phillips et al. (2010). Treebase study S10184.

==================
How to run LMM.jar:

Run in a directory which contains the 'sequences' directory from this distribution. (This contains the default data files.)
You also need a directory "LMM_lib" containing library jar files: pal.jar, downloaded from https://github.com/MichaelWoodhams/pal-mdw/releases, and Jama-1.0.3.jar, downloadable from http://math.nist.gov/javanumerics/jama/Jama-1.0.3.jar

To run:
    java -jar LMM.jar

To get usage information:
    java -jar LMM.jar -help

Important options are:
-M: which models to run
-i, -g: use invariant sites and gamma rates respectively
-F: which of the seven default data files to analyse
-f: to specify a non-default data file to analyse

For example:
    java -jar LMM.jar -F2,3 -M5-7,105-107 -i -g
would analyse the cormorants and yeast data sets, using models RY2.2b, RY3.3b, RY3.3c using the 'cubic' parameterization (models 5, 6 and 7) and the same models using the Fourier-Motzkin parameterization (models 105, 106 and 107). The rates-across-sites model includes both invariable sites and gamma rate distribution.

Some of the parameterizations available in this program are not discussed in the paper, as they work poorly.
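
Because a full run takes days, it may be worth checking the directory layout described under "How to run LMM.jar" before starting. The following is a minimal sketch only, not part of the distribution. The function name check_lmm_layout is hypothetical. It also assumes that LMM.jar, the 'sequences' directory and the "LMM_lib" directory all sit in the same directory, which the text above suggests but does not state outright.

```shell
#!/bin/sh
# check_lmm_layout [DIR]
# Hypothetical helper (not shipped with this archive). Checks that DIR
# (default: current directory) holds the items the run instructions mention.
# Assumption: LMM.jar, sequences/ and LMM_lib/ are siblings in DIR.
check_lmm_layout() {
    dir="${1:-.}"
    missing=0
    for path in LMM.jar sequences LMM_lib/pal.jar LMM_lib/Jama-1.0.3.jar; do
        if [ ! -e "$dir/$path" ]; then
            # Report each missing item on stderr, one per line.
            echo "missing: $path" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything was found, 1 otherwise.
    return "$missing"
}
```

For example, `check_lmm_layout . && java -jar LMM.jar -help` would only invoke Java once all four items are present.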