Data from: Resolving relationships among the megadiverse butterflies and moths with a novel pipeline for Anchored Phylogenomics

Breinholt JW, Earl C, Lemmon AR, Lemmon EM, Xiao L, Kawahara AY

Date Published: May 18, 2017

DOI: http://dx.doi.org/10.5061/dryad.rf7g5.2

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data. CC0 Open Data

Title README
Description README file listing the files and scripts contained in this Dryad package
Download README_new.txt (11.03 Kb)

Title Breinholt_et_al_Supplementary_Figure_S1
Description Supplementary Figure S1 from Breinholt et al. (2017)
Download Breinholt_et_al_Supplementary_Figure_S1.pdf (156.5 Kb)

Title Breinholt_et_al_Supplementary_Figure_S2
Description Supplementary Figure S2 from Breinholt et al. (2017)
Download Breinholt_et_al_Supplementary_Figure_S2.pdf (234.2 Kb)

Title Breinholt_et_al_Supplementary_Figure_S3
Description Supplementary Figure S3 from Breinholt et al. (2017)
Download Breinholt_et_al_Supplementary_Figure_S3.pdf (321.6 Kb)

Title Breinholt_et_al_Supplementary_Figure_S4
Description Supplementary Figure S4 from Breinholt et al. (2017)
Download Breinholt_et_al_Supplementary_Figure_S4.pdf (318.2 Kb)

Title Breinholt_et_al_Supplementary_Figure_S5
Description Supplementary Figure S5 from Breinholt et al. (2017)
Download Breinholt_et_al_Supplementary_Figure_S5.pdf (272.6 Kb)

Title Breinholt_et_al_Supplementary_File_1__S1-S11
Description Supplementary File 1: Microsoft Excel workbook containing Supplementary Tables S1-S11 from Breinholt et al. (2017)
Download Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx (3.101 Mb)

Title Breinholt_et_al_Supplementary_File_2_Lep1
Description Specification file for the Lep1 probe set used to order probes from Agilent Technologies (http://www.agilent.com/)
Download Breinholt_et_al_Supplementary_File_2_Lep1.txt (8.580 Mb)

Title Breinholt_et_al_Supplementary_File_3
Description Word document that expands on the discussion in Breinholt et al. (2017) and discusses lepidopteran relationships in more detail
Download Breinholt_et_al_Supplementary_File_3.docx (140.2 Kb)

Title Lep1_ref
Description Compressed file containing the reference data for each locus in the Lep1 kit; these references are also used in the IBA assembly.
Download Lep1_ref.tar.gz (1.149 Mb)

Title JAVA_SourceCode
Description Compressed directory holding Java source code written by A.R.L. (alemmon@evotutor.org). It contains a readme and instructions for using and compiling the Java code for IdentifySpacedKmers7, QuickScan5, and ShallowMapper4. It also contains the Lep1_ProbeDesign directory, which was used with these Java programs to design the Lep1 probe set. Contents: IdentifySpacedKmers7, IdentifySpacedKmers7_readme.txt, Lep1_ProbeDesign, LepRefFiles.txt, QuickScan5_readme.txt, QuickScan5.java, ShallowMapper4_readme.txt, ShallowMapper4.java.
ShallowMapper4: Java program by A.R.L. used to identify intron boundaries in the genes of five reference taxa by mapping raw genomic reads to the corresponding transcriptomic sequences.
QuickScan5: Java program by A.R.L. used to scan the additional 23 transcriptomes and ESTs by generating reference k-mers from the five-species alignments and using those k-mers to map transcriptome contig sequences to the candidate locus set.
Download JAVA_SourceCode.tar.gz (4.356 Mb)

Title Breinholt_et_al_LOG_COMMANDS
Description Set of commands used to run the bioinformatic pipeline that generated the data for Breinholt et al. (2017)
Download Breinholt_et_al_LOG_COMMANDS.log (7.291 Kb)

Title Scripts_README
Description Description of the Python scripts and directions for running them
Download Scripts_README.txt (12.85 Kb)

Title IBA
Description Python script to assemble AHE data locus by locus
Download IBA.py (10.94 Kb)

Title IBA_trans
Description Python script to assemble AHE data locus by locus from a FASTQ file of transcriptome data
Download IBA_trans.py (9.121 Kb)

Title extract_probe_region
Description Python script to split an alignment into head, probe, and tail regions based on the start and end of a reference sequence in the alignment
Download extract_probe_region.py (3.405 Kb)
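The head/probe/tail split described for extract_probe_region.py can be illustrated with a minimal Python sketch. It is not the code in that script: it assumes the probe region spans the alignment columns from the first to the last non-gap character of a chosen reference, and that "-" and "?" are the only gap/missing symbols.

```python
# Hypothetical sketch, not the code in extract_probe_region.py.
# Assumption: the probe region spans the alignment columns from the first to the
# last non-gap character of the chosen reference sequence.
from typing import Dict, Tuple

GAP_CHARS = {"-", "?"}  # assumed gap/missing-data symbols


def probe_bounds(reference: str) -> Tuple[int, int]:
    """Return (start, end) column indices of the reference's non-gap span; end is exclusive."""
    positions = [i for i, ch in enumerate(reference) if ch not in GAP_CHARS]
    if not positions:
        raise ValueError("reference sequence contains only gaps")
    return positions[0], positions[-1] + 1


def split_regions(alignment: Dict[str, str], ref_name: str) -> Dict[str, Dict[str, str]]:
    """Slice every aligned sequence into 'head', 'probe', and 'tail' pieces."""
    start, end = probe_bounds(alignment[ref_name])
    regions: Dict[str, Dict[str, str]] = {"head": {}, "probe": {}, "tail": {}}
    for name, seq in alignment.items():
        regions["head"][name] = seq[:start]
        regions["probe"][name] = seq[start:end]
        regions["tail"][name] = seq[end:]
    return regions


aln = {"ref": "--ACGT--", "taxA": "TTACGAAA"}
print(split_regions(aln, "ref")["probe"])  # {'ref': 'ACGT', 'taxA': 'ACGA'}
```

Using column positions taken from a single reference keeps all three regions aligned across taxa, which is what allows probe-only and flank-only datasets to be cut from the same alignment.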

Title s_hit_checker
Description Python script to process BLAST output and find sequences that meet the single-hit criteria
Download s_hit_checker.py (2.365 Kb)

Title ortholog_filter
Description Python script to process BLAST output and determine whether the best hit on the genome falls at the same location as the probe target in that genome
Download ortholog_filter.py (4.722 Kb)
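The location check performed by ortholog_filter.py can be sketched as below. This is a hypothetical illustration, not the script itself: it assumes BLAST tabular output (-outfmt 6, default 12 columns), takes the highest bit score as the "best hit", and uses a hypothetical `targets` mapping from each query ID to the (contig, start, end) of its probe target in the same genome.

```python
# Hypothetical sketch, not the code in ortholog_filter.py.
# Assumptions: BLAST tabular output (-outfmt 6: qseqid sseqid pident length mismatch
# gapopen qstart qend sstart send evalue bitscore); "best hit" = highest bit score;
# `targets` maps query ID -> (contig, start, end) of its probe target.
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, int, int, float]   # (sseqid, sstart, send, bitscore)
Target = Tuple[str, int, int]       # (contig, start, end)


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-bit-score hit for each query."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid = cols[0], cols[1]
        sstart, send, bitscore = int(cols[8]), int(cols[9]), float(cols[11])
        if qseqid not in best or bitscore > best[qseqid][3]:
            best[qseqid] = (sseqid, sstart, send, bitscore)
    return best


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two intervals share at least one position."""
    # sstart > send for minus-strand hits, so normalise each interval first
    lo_a, hi_a = sorted((a_start, a_end))
    lo_b, hi_b = sorted((b_start, b_end))
    return lo_a <= hi_b and lo_b <= hi_a


def orthologous_queries(lines: Iterable[str], targets: Dict[str, Target]) -> List[str]:
    """Return query IDs whose best hit lies on the same contig and overlaps the probe target."""
    keep = []
    for qseqid, (sseqid, sstart, send, _) in best_hits(lines).items():
        target = targets.get(qseqid)
        if target and sseqid == target[0] and overlaps(sstart, send, target[1], target[2]):
            keep.append(qseqid)
    return keep
```

A query whose best hit lands elsewhere in the genome is treated as a possible paralog and left out of the returned list.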

Title split
Description Python script to split a single-line FASTA file containing many loci into locus-specific FASTA files
Download split.py (1.386 Kb)
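Splitting a single-line FASTA file by locus, as split.py is described to do, could look like the following sketch. It is not the authors' script. "Single-line" is taken to mean one sequence line per header, and the header convention (locus name before the first underscore, e.g. ">L123_I0001") is a hypothetical placeholder to be replaced with the real one.

```python
# Hypothetical sketch, not the code in split.py.
# Assumption: each header is followed by exactly one sequence line, and the locus
# name is the header text before the first underscore (hypothetical convention).
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

Records = Dict[str, List[Tuple[str, str]]]


def locus_of(header: str) -> str:
    """Hypothetical rule: '>L123_I0001' belongs to locus 'L123'."""
    return header.lstrip(">").split("_", 1)[0]


def group_by_locus(lines: Iterable[str], key: Callable[[str], str] = locus_of) -> Records:
    """Pair each header with the sequence line after it, grouped by locus."""
    groups: Records = defaultdict(list)
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line
        elif header is not None:
            groups[key(header)].append((header, line))
            header = None
        else:
            raise ValueError("sequence line without a preceding header")
    return dict(groups)


def write_locus_files(groups: Records, outdir: str) -> None:
    """Write one FASTA file per locus, named <locus>.fa, into outdir."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for locus, records in groups.items():
        with open(out / f"{locus}.fa", "w") as handle:
            for header, seq in records:
                handle.write(f"{header}\n{seq}\n")
```

Grouping in memory first keeps the file writing to a single pass per locus; for very large inputs, opening one output handle per locus while streaming would avoid holding everything in memory.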

Title alignment_DE_trim
Description Python script to trim alignments by density and entropy
Download alignment_DE_trim.py (5.534 Kb)
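Density-and-entropy trimming can be sketched as a column filter. The definitions below are assumptions, not necessarily those in alignment_DE_trim.py: density is the fraction of sequences with a non-gap character in a column, and entropy is the base-2 Shannon entropy of the non-gap characters. The defaults mirror the 60% density and 1.5 entropy values reported for Dataset 4.

```python
# Hypothetical sketch, not the code in alignment_DE_trim.py.
# Assumed definitions: density = fraction of non-gap characters in a column;
# entropy = Shannon entropy (log base 2) of the non-gap characters.
import math
from collections import Counter
from typing import Dict

GAPS = {"-", "?"}  # assumed gap/missing-data symbols


def column_entropy(column: str) -> float:
    """Shannon entropy in bits of the non-gap characters in one column."""
    counts = Counter(ch for ch in column.upper() if ch not in GAPS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def trim_alignment(alignment: Dict[str, str], min_density: float = 0.6,
                   max_entropy: float = 1.5) -> Dict[str, str]:
    """Keep columns with density >= min_density and entropy <= max_entropy."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences are not aligned to equal length")
    keep = []
    for i in range(length):
        column = "".join(alignment[name][i] for name in names)
        density = sum(ch not in GAPS for ch in column) / len(column)
        if density >= min_density and column_entropy(column) <= max_entropy:
            keep.append(i)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


aln = {"a": "AA-", "b": "AC-", "c": "AG-", "d": "AT-"}
print(trim_alignment(aln))  # {'a': 'A', 'b': 'A', 'c': 'A', 'd': 'A'}
```

In the example, the first column is kept (full density, zero entropy), the second is dropped because four equally frequent bases give 2 bits of entropy, and the all-gap third column fails the density test.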

Title flank_dropper
Description Python script to remove poorly aligned sequences in the flanking head and tail regions
Download flank_dropper.py (7.253 Kb)
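One way to drop poorly aligned flanks with a standard-deviation cutoff is sketched below. The scoring is a hypothetical stand-in, since the description does not say how flank_dropper.py measures alignment quality: each flank is scored by its mismatch proportion against the column-wise majority consensus, and flanks scoring above mean + n_sd × SD are replaced with gaps. The default n_sd = 2 mirrors the two standard deviations reported for Dataset 4.

```python
# Hypothetical sketch, not the code in flank_dropper.py.
# Assumed score: mismatch proportion of each flank against the majority consensus,
# counted over columns where both have a non-gap character. Flanks scoring above
# mean + n_sd * SD (sample SD) are replaced with gaps.
import statistics
from collections import Counter
from typing import Dict, List, Optional

GAPS = {"-", "?"}  # assumed gap/missing-data symbols


def consensus(seqs: List[str]) -> str:
    """Majority non-gap character per column ('-' if a column is all gaps)."""
    cols = []
    for column in zip(*seqs):
        counts = Counter(ch for ch in column if ch not in GAPS)
        cols.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(cols)


def mismatch_score(seq: str, cons: str) -> Optional[float]:
    """Fraction of comparable columns where seq differs from the consensus."""
    pairs = [(a, b) for a, b in zip(seq, cons) if a not in GAPS and b not in GAPS]
    if not pairs:
        return None  # nothing to score: the flank is all gaps
    return sum(a != b for a, b in pairs) / len(pairs)


def drop_outlier_flanks(flanks: Dict[str, str], n_sd: float = 2.0) -> Dict[str, str]:
    """Replace outlier flanks with gaps of the same length."""
    cons = consensus(list(flanks.values()))
    scores = {name: mismatch_score(seq, cons) for name, seq in flanks.items()}
    scored = [s for s in scores.values() if s is not None]
    if len(scored) < 2:
        return dict(flanks)  # a standard deviation needs at least two scores
    cutoff = statistics.mean(scored) + n_sd * statistics.stdev(scored)
    return {
        name: "-" * len(seq) if scores[name] is not None and scores[name] > cutoff else seq
        for name, seq in flanks.items()
    }
```

Head and tail regions would each be passed through the filter separately, so a taxon can lose one flank while keeping the other.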

Title counting_monster
Description Python script to count the loci per taxon and write the counts to a tab-separated matrix
Download counting_monster.py (3.283 Kb)
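A taxon-by-locus count matrix of the kind counting_monster.py is described as producing can be sketched as follows. The input format is a hypothetical assumption (a mapping from locus name to the taxon IDs recovered for that locus), and the trailing total_loci column is an illustrative addition, not a documented output of the script.

```python
# Hypothetical sketch, not the code in counting_monster.py.
# Assumed input: locus name -> list of taxon IDs recovered for that locus.
# The total_loci column is an illustrative extra, not a documented feature.
from collections import Counter
from typing import Dict, List


def count_matrix(loci: Dict[str, List[str]]) -> str:
    """Return a tab-separated matrix: one row per taxon, one column per locus."""
    locus_names = sorted(loci)
    counts = {locus: Counter(ids) for locus, ids in loci.items()}
    all_taxa = sorted({taxon for ids in loci.values() for taxon in ids})
    rows = ["taxon\t" + "\t".join(locus_names) + "\ttotal_loci"]
    for taxon in all_taxa:
        row = [counts[locus][taxon] for locus in locus_names]  # Counter gives 0 if absent
        n_loci = sum(1 for c in row if c > 0)
        rows.append(taxon + "\t" + "\t".join(map(str, row)) + f"\t{n_loci}")
    return "\n".join(rows)


print(count_matrix({"L2": ["tA"], "L1": ["tA", "tB", "tB"]}))
```

Counts above 1 in a cell flag taxa with multiple sequences for a locus, which is the situation remove_duplicates.py is described as handling.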

Title removelist
Description Python script to remove a list of sequences from a FASTA file
Download removelist.py (892 bytes)

Title getlist
Description Python script to retrieve a list of sequences from a FASTA file
Download getlist.py (866 bytes)
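Keeping (as getlist.py is described to do) or removing (as removelist.py is described to do) FASTA records by ID can share one filter. This sketch is not either script; it assumes the record ID is the header text after ">" up to the first whitespace.

```python
# Hypothetical sketch, not the code in getlist.py or removelist.py.
# Assumption: the record ID is the header text up to the first whitespace.
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; works for single-line or wrapped FASTA."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(lines: Iterable[str], ids: Set[str], keep: bool = True) -> Iterator[Tuple[str, str]]:
    """keep=True retains listed IDs (getlist role); keep=False drops them (removelist role)."""
    for header, seq in read_fasta(lines):
        record_id = (header.split() or [""])[0]
        if (record_id in ids) == keep:
            yield header, seq
```

Matching on the first whitespace-delimited token means descriptive text after the ID in a header does not prevent a match.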

Title contamination_filter
Description Python script to identify contamination by processing the BLAST results from searching the sequences of each locus against themselves with USEARCH
Download contamination_filter.py (5.218 Kb)

Title remove_duplicates
Description Python script to identify and remove sequences for each taxon that had more than one sequence per locus
Download remove_duplicates.py (1.751 Kb)
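The duplicate check described for remove_duplicates.py can be sketched for a single locus as below. It is a hypothetical illustration: `taxon_of` assumes a "<locus>_<taxon>" header convention, and every copy from a duplicated taxon is removed because the description does not say whether the real script keeps one of them.

```python
# Hypothetical sketch, not the code in remove_duplicates.py.
# Assumptions: headers follow a hypothetical "<locus>_<taxon>" convention, and all
# copies from a taxon with more than one sequence in the locus are removed.
from collections import Counter
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def taxon_of(header: str) -> str:
    """Hypothetical rule: 'L1_tA' -> 'tA'."""
    return header.split("_", 1)[1]


def remove_duplicates(records: List[Record],
                      key: Callable[[str], str] = taxon_of) -> Tuple[List[Record], List[str]]:
    """Return (records kept, sorted list of taxa that were duplicated)."""
    counts = Counter(key(header) for header, _ in records)
    duplicated = {taxon for taxon, n in counts.items() if n > 1}
    kept = [(h, s) for h, s in records if key(h) not in duplicated]
    return kept, sorted(duplicated)
```

Returning the duplicated taxa alongside the kept records makes it easy to log which taxa lost a locus.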

Title taxa_list
Description Tab-delimited text file listing the sample IDs used in the Nexus files and the corresponding species names
Download taxa_list.txt (12.08 Kb)

Title Breinholt_et_al_RAW_DATA.tar.gz
Description Compressed file containing the raw Illumina (2X100) AHE data
Download Breinholtetal_RAW_DATA.tar.gz (24.14 Gb)

Title final_soap_FG120036B
Description Assembly of Apatelodes pithala from GenBank SRA accession #SRR1794032, built with SOAPdenovo-Trans v1.01 using multiple k-mer sizes (13, 23, 33, 43, 63). The assemblies from the different k-mer sizes were combined with cd-hit-est and processed with the FASTX-Toolkit. See Breinholt et al. (2017) for more details.
Download final_soap_FG120036B.fa (35.62 Mb)

Title final_soap_calo2
Description Assembly of Caloptilia triadicae from GenBank SRA accession #SRR1794032, built with SOAPdenovo-Trans v1.01 using multiple k-mer sizes (13, 23, 33, 43, 63). The assemblies from the different k-mer sizes were combined with cd-hit-est and processed with the FASTX-Toolkit. See Breinholt et al. (2017) for more details.
Download final_soap_calo2.fa (47.00 Mb)

Title final_soap_GV120010B
Description Assembly of Urbanus proteus from GenBank SRA accession #SRR1794082, built with SOAPdenovo-Trans v1.01 using multiple k-mer sizes (13, 23, 33, 43, 63). The assemblies from the different k-mer sizes were combined with cd-hit-est and processed with the FASTX-Toolkit. See Breinholt et al. (2017) for more details.
Download final_soap_GV120010B.fa (36.93 Mb)

Title Breinholt_et_al_acrossLep_full_assemblies_all_loci
Description FASTA-formatted sequence file containing the sequences that passed pipeline steps 1-6 for all loci and taxa in Datasets 1-3. The file can be split into FASTA files for individual loci with split.py.
Download Breinholt_et_al_acrossLep_full_assemblies_...ci.fa (25.41 Mb)

Title Breinholt_et_al_shallow_full_assemblies_all_loci
Description FASTA-formatted sequence file containing the sequences that passed pipeline steps 1-6 for all loci and taxa in Datasets 4-6. The file can be split into FASTA files for individual loci with split.py.
Download Breinholt_et_al_shallow_full_assemblies_al...ci.fa (39.87 Mb)

Title Breinholt_et_al_allcodonpostion123_acrossLep
Description Nexus file containing codon positions 1, 2, and 3 for 557 loci and 75 taxa, used to make Datasets 1-3. See taxa_list.txt for the species name of each taxon. This is a nucleotide Nexus file with a CHARSET defining each gene; each gene starts at codon position 1. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_allcodonpostion123_acrossLep.nex (10.17 Mb)

Title Breinholt_et_al_degen12_DS1
Description Dataset 1 (acrossLEP_AHE). Nexus file containing codon positions 1 and 2 for 557 loci and 23 taxa. See taxa_list.txt for the species name of each taxon. This is a nucleotide Nexus file with a CHARSET defining each gene; each gene starts at codon position 1. Synonymous signal was removed using the degen v1.4 Perl script (http://www.phylotools.com), and third codon positions were removed. Locus names correspond to the locus numbers in the Lep1 enrichment kit included in this Dryad package. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_degen12_DS1.nex (2.091 Mb)
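Because each CHARSET gene starts at codon position 1, third positions can be located by offset within each gene. The sketch below shows only that step; it does not reproduce degen's recoding of synonymous-change sites. It assumes simple CHARSET statements of the form "CHARSET name = start-end;" (1-based, inclusive), and the function names are hypothetical.

```python
# Hypothetical sketch: drop third codon positions gene by gene using Nexus CHARSETs.
# Only third-position removal is shown; degen's degenerate recoding is NOT reproduced.
# Assumes simple "CHARSET name = start-end;" statements (1-based, inclusive) and that
# every gene starts at codon position 1, as stated for these Nexus files.
import re
from typing import List, Tuple

CHARSET_RE = re.compile(r"charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE)


def parse_charsets(nexus_text: str) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for each simple CHARSET range."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in CHARSET_RE.finditer(nexus_text)]


def drop_third_positions(seq: str, charsets: List[Tuple[str, int, int]]) -> str:
    """Concatenate codon positions 1 and 2 of every gene, in CHARSET order."""
    kept = []
    for _, start, end in charsets:
        gene = seq[start - 1:end]
        # offsets 0 and 1 within each codon are positions 1 and 2; offset 2 is dropped
        kept.append("".join(ch for i, ch in enumerate(gene) if i % 3 != 2))
    return "".join(kept)


text = "begin sets;\n  charset L1 = 1-6;\n  CHARSET L2 = 7-12;\nend;"
print(drop_third_positions("ATGCCCGGGTTT", parse_charsets(text)))  # ATCCGGTT
```

Charset ranges written with a step (for example "1-300\3") are not matched by this simple pattern and would need extra handling.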

Title Breinholt_et_al_aminoacid_DS1
Description Nexus file containing amino acid data for 557 loci and 23 taxa. See taxa_list.txt for the species name of each taxon. This is an amino acid Nexus file with a CHARSET defining each locus. Locus names correspond to the locus numbers in the Lep1 enrichment kit included in this Dryad package. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_aminoacid_DS1.nex (1.055 Mb)

Title Breinholt_et_al_degen12_DS2
Description Dataset 2 (acrossLEP_AHE+PARTtrans). Nexus file containing codon positions 1 and 2 for 557 loci and 75 taxa. See taxa_list.txt for the species name of each taxon. This is a nucleotide Nexus file with a CHARSET defining each gene; each gene starts at codon position 1. Synonymous signal was removed using the degen v1.4 Perl script (http://www.phylotools.com), and third codon positions were removed. Locus names correspond to the locus numbers in the Lep1 enrichment kit included in this Dryad package. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_degen12_DS2.nex (6.785 Mb)

Title Breinholt_et_al_aminoacid_DS2
Description Nexus file containing amino acid data for 557 loci and 75 taxa. See taxa_list.txt for the species name of each taxon. This is an amino acid Nexus file with a CHARSET defining each locus. Locus names correspond to the locus numbers in the Lep1 enrichment kit included in this Dryad package. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_aminoacid_DS2.nex (3.403 Mb)

Title Breinholt_et_al_degen12_DS3
Description Dataset 3 (acrossLEP_AHE+ALLtrans). Nexus file combining the AHE data and the transcriptomic data of Kawahara and Breinholt (2015). The file contains codon positions 1 and 2 for 2948 loci and 76 taxa. See taxa_list.txt for the species name of each taxon. This is a nucleotide Nexus file with a CHARSET defining each gene; each gene starts at codon position 1. Synonymous signal was removed using the degen v1.4 Perl script (http://www.phylotools.com), and third codon positions were removed. Locus names correspond to the locus numbers in the Lep1 enrichment kit included in this Dryad package. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_degen12_DS3.nex (191.8 Mb)

Title Breinholt_et_al_DS4
Description Dataset 4 (shallow_probe+flanks). Nexus file containing 749 loci and 48 taxa. Alignments were trimmed with alignment_DE_trim.py with density set to 60% and entropy set to 1.5, and the flanking regions were processed with flank_dropper.py to remove head or tail sequences using a cutoff of two standard deviations for both the head and the tail. See taxa_list.txt for the species name of each taxon. This is a nucleotide Nexus file with a CHARSET defining each gene. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_DS4.nex (13.52 Mb)

Title Breinholt_et_al_DS5
Description Dataset 5 (shallow_probe). Nexus file containing 749 loci and 48 taxa. The extract_probe_region.py script was applied to Dataset 4 to isolate the data from the probe region. See taxa_list.txt for the species name of each taxon. This is a nucleotide Nexus file with a CHARSET defining each gene. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_DS5.nex (8.028 Mb)

Title Breinholt_et_al_DS6
Description Dataset 6 (shallow_flanks). Nexus file containing 749 loci and 35 taxa. The extract_probe_region.py script was applied to Dataset 4 to isolate the data from the flanking regions. See taxa_list.txt for the species name of each taxon. This is a nucleotide Nexus file with a CHARSET defining each gene. For further information, see Breinholt et al. (2017) and Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx in this Dryad package.
Download Breinholt_et_al_DS6.nex (4.064 Mb)

When using this data, please cite the original publication:

Breinholt JW, Earl C, Lemmon AR, Lemmon EM, Xiao L, Kawahara AY (2017) Resolving relationships among the megadiverse butterflies and moths with a novel pipeline for Anchored Phylogenomics. Systematic Biology, online in advance of print. http://dx.doi.org/10.1093/sysbio/syx048

Additionally, please cite the Dryad data package:

Breinholt JW, Earl C, Lemmon AR, Lemmon EM, Xiao L, Kawahara AY (2017) Data from: Resolving relationships among the megadiverse butterflies and moths with a novel pipeline for Anchored Phylogenomics. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.rf7g5.2