Recent advances in molecular sequencing techniques have led to a surge in phylogenetic studies that incorporate large amounts of genetic data. We test the assumption that analyzing large amounts of genetic data will improve tree resolution and branch support using moths in the superfamily Bombycoidea, a group in which some inter-familial relationships have been difficult to resolve. Specifically, we examine how codon position and saturation might influence resolution and node support among three key families using a next-gen dataset that included 19 taxa and 938 genes (~1.2M bp). Maximum likelihood, parsimony, and species tree analyses using gene-tree parsimony, on numerous nucleotide and amino acid datasets, resulted in largely congruent topologies with high bootstrap support compared to prior studies that included fewer loci. However, for a few shallow nodes, nucleotide and amino acid data provided high support for conflicting relationships. The third codon position was saturated, and phylogenetic analysis of this position alone supported a completely different, potentially misleading sister group relationship. We used the program RADICAL to assess the number of genes needed to fix some of these difficult nodes. One such node needed a total of 850 genes, but only 250 when synonymous signal was removed. While transcriptomics can provide the large amounts of data needed to resolve many difficult phylogenetic relationships, it remains important to assess the effect of synonymous substitutions and third codon positions in next-gen datasets.
Breinholt_Kawahara_2013_nuc
Nexus file containing 938 genes for 19 taxa. See Taxon_list.txt for the name of each taxon. This is a nucleotide nexus file with a CHARSET that defines each gene. Gene names correspond to gene numbers in the Insecta HMMER v3-2 core ortholog database. For further information on these genes, see Supplementary Table 2 from Breinholt and Kawahara 2013.
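Because the abstract reports that third codon positions were saturated, users of the nucleotide file may want to separate codon positions within a gene. The following is a minimal Python sketch, not the authors' procedure. It assumes that each gene alignment is in frame and begins at the first codon position, which this README does not confirm. The function names and the toy sequence are hypothetical. Dropping third positions is also not the same as removing synonymous signal as described in the abstract.

```python
def codon_position(seq, position):
    """Return the characters of seq that fall at codon position 1, 2, or 3.

    Assumes seq is in frame and starts at codon position 1.
    """
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2, or 3")
    return seq[position - 1::3]


def without_third_positions(seq):
    """Return seq with every third codon position removed."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


# Toy in-frame sequence for illustration only; not taken from the data files.
toy = "ATGGCCAAA"
third_only = codon_position(toy, 3)
first_and_second = without_third_positions(toy)
print(third_only)
print(first_and_second)
```

Gap characters and missing data are passed through unchanged, so the same functions can be applied to each aligned row of a gene.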
Breinholt_Kawahara_2013_aa.nex
Nexus file containing 938 genes for 19 taxa. See Taxon_list.txt for the name of each taxon. This is an amino acid nexus file with a CHARSET that defines each gene. Gene names correspond to gene numbers in the Insecta HMMER v3-2 core ortholog database. For further information on these genes, see Supplementary Table 2 from Breinholt and Kawahara 2013.
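Both nexus files define gene boundaries with CHARSET statements, so a short script can recover the coordinates of each gene. The following is a minimal Python sketch. It assumes that each CHARSET is written on its own line as a single contiguous range of the form `CHARSET name = start-end;`, which is a common layout but is not confirmed by this README. The function name and the two-gene example are hypothetical.

```python
import re

# Matches lines such as "CHARSET gene_a = 1-300;" (case-insensitive).
CHARSET_RE = re.compile(
    r"^\s*charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE
)


def parse_charsets(nexus_text):
    """Return {gene_name: (start, end)} using 1-based inclusive coordinates."""
    charsets = {}
    for line in nexus_text.splitlines():
        match = CHARSET_RE.match(line)
        if match:
            name, start, end = match.groups()
            charsets[name] = (int(start), int(end))
    return charsets


# Hypothetical two-gene example; not taken from the actual data files.
example = """#NEXUS
BEGIN SETS;
  CHARSET gene_a = 1-300;
  CHARSET gene_b = 301-750;
END;
"""

charsets = parse_charsets(example)
for name, (start, end) in charsets.items():
    # Print the gene name, its coordinates, and its length in characters.
    print(name, start, end, end - start + 1)
```

To use it on a real file, read the file contents into a string and pass that string to `parse_charsets`. Charsets written as multiple ranges or with step notation would need a more complete NEXUS parser.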
Taxon_list.txt
Tab-delimited text file listing the taxon codes, names, and data sources for the two nexus files above.
acti2_assembly.fasta
Assembly of Actias luna from GenBank SRA accession #SRR1002974, using multiple k-mers (13,23,33,43,63) with SOAPdenovo-Trans v1.01. The different k-mer assemblies were combined with cd-hit-est and processed with the fastx toolkit. See SOAP_assembly.qsub for the command used for this assembly.
attac_assembly.fasta
Assembly of Attacus atlas from GenBank SRA accession #SRR1002994, using multiple k-mers (13,23,33,43,63) with SOAPdenovo-Trans v1.01. The different k-mer assemblies were combined with cd-hit-est and processed with the fastx toolkit. See SOAP_assembly.qsub for the command used for this assembly.
cundu3_assembly.fasta
Assembly of Ceratomia undulosa from GenBank SRA accession #SRR1002985, using multiple k-mers (13,23,33,43,63) with SOAPdenovo-Trans v1.01. The different k-mer assemblies were combined with cd-hit-est and processed with the fastx toolkit. See SOAP_assembly.qsub for the command used for this assembly.
dara_assembly.fasta
Assembly of Darapsa myron from GenBank SRA accession #SRR1002986, using multiple k-mers (13,23,33,43,63) with SOAPdenovo-Trans v1.01. The different k-mer assemblies were combined with cd-hit-est and processed with the fastx toolkit. See SOAP_assembly.qsub for the command used for this assembly.
elug1_assembly.fasta
Assembly of Enyo lugubris from GenBank SRA accession #SRR1002983, using multiple k-mers (13,23,33,43,63) with SOAPdenovo-Trans v1.01. The different k-mer assemblies were combined with cd-hit-est and processed with the fastx toolkit. See SOAP_assembly.qsub for the command used for this assembly.
SOAP_assembly.qsub
This script was used for the multiple k-mer transcriptome assemblies. It is specific to the University of Florida module system but can be easily edited for use on other HPC systems.
HaMStR.qsub
This script contains commands used for HaMStR ortholog prediction. It is specific to HPC systems with PBS schedulers and requires the set of files and directories detailed in the HaMStR manual.
README.txt
This file contains descriptions of all the files associated with this Dryad package.