Data from: Resolving relationships among the megadiverse butterflies and moths with a novel pipeline for Anchored Phylogenomics
Data files
May 02, 2017 version files 24.59 GB
- alignment_DE_trim.py (5.53 KB)
- Breinholt_et_al_acrossLep_full_assemblies_all_loci.fa (25.42 MB)
- Breinholt_et_al_allcodonpostion123_acrossLep.nex (10.17 MB)
- Breinholt_et_al_aminoacid_DS1.nex (1.06 MB)
- Breinholt_et_al_aminoacid_DS2.nex (3.40 MB)
- Breinholt_et_al_degen12_DS1.nex (2.09 MB)
- Breinholt_et_al_degen12_DS2.nex (6.79 MB)
- Breinholt_et_al_degen12_DS3.nex (191.84 MB)
- Breinholt_et_al_DS4.nex (13.52 MB)
- Breinholt_et_al_DS5.nex (8.03 MB)
- Breinholt_et_al_DS6.nex (4.06 MB)
- Breinholt_et_al_LOG_COMMANDS.log (7.29 KB)
- Breinholt_et_al_shallow_full_assemblies_all_loci.fa (39.88 MB)
- Breinholt_et_al_Supplementary_Figure_S1.pdf (156.60 KB)
- Breinholt_et_al_Supplementary_Figure_S2.pdf (234.23 KB)
- Breinholt_et_al_Supplementary_Figure_S3.pdf (321.63 KB)
- Breinholt_et_al_Supplementary_Figure_S4.pdf (318.25 KB)
- Breinholt_et_al_Supplementary_Figure_S5.pdf (272.67 KB)
- Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx (3.10 MB)
- Breinholt_et_al_Supplementary_File_2_Lep1.txt (8.58 MB)
- Breinholtetal_RAW_DATA.tar.gz (24.15 GB)
- contamination_filter.py (5.22 KB)
- counting_monster.py (3.28 KB)
- extract_probe_region.py (3.40 KB)
- final_soap_calo2.fa (47.01 MB)
- final_soap_FG120036B.fa (35.63 MB)
- final_soap_GV120010B.fa (36.93 MB)
- flank_dropper.py (7.25 KB)
- getlist.py (866 B)
- IBA_trans.py (9.12 KB)
- IBA.py (10.94 KB)
- JAVA_SourceCode.tar.gz (4.36 MB)
- Lep1_ref.tar.gz (1.15 MB)
- ortholog_filter.py (4.72 KB)
- README.txt (10.87 KB)
- remove_duplicates.py (1.75 KB)
- removelist.py (892 B)
- s_hit_checker.py (2.36 KB)
- Scripts_README.txt (12.85 KB)
- split.py (1.39 KB)
- taxa_list.txt (12.09 KB)
May 18, 2017 version files 49.06 GB
- alignment_DE_trim.py (5.53 KB)
- Breinholt_et_al_acrossLep_full_assemblies_all_loci.fa (25.42 MB)
- Breinholt_et_al_allcodonpostion123_acrossLep.nex (10.17 MB)
- Breinholt_et_al_aminoacid_DS1.nex (1.06 MB)
- Breinholt_et_al_aminoacid_DS2.nex (3.40 MB)
- Breinholt_et_al_degen12_DS1.nex (2.09 MB)
- Breinholt_et_al_degen12_DS2.nex (6.79 MB)
- Breinholt_et_al_degen12_DS3.nex (191.84 MB)
- Breinholt_et_al_DS4.nex (13.52 MB)
- Breinholt_et_al_DS5.nex (8.03 MB)
- Breinholt_et_al_DS6.nex (4.06 MB)
- Breinholt_et_al_LOG_COMMANDS.log (7.29 KB)
- Breinholt_et_al_shallow_full_assemblies_all_loci.fa (39.88 MB)
- Breinholt_et_al_Supplementary_Figure_S1.pdf (156.60 KB)
- Breinholt_et_al_Supplementary_Figure_S2.pdf (234.23 KB)
- Breinholt_et_al_Supplementary_Figure_S3.pdf (321.63 KB)
- Breinholt_et_al_Supplementary_Figure_S4.pdf (318.25 KB)
- Breinholt_et_al_Supplementary_Figure_S5.pdf (272.67 KB)
- Breinholt_et_al_Supplementary_File_1_S1-S11.xlsx (3.10 MB)
- Breinholt_et_al_Supplementary_File_2_Lep1.txt (8.58 MB)
- Breinholt_et_al_Supplementary_File_3.docx (140.26 KB)
- Breinholtetal_RAW_DATA.tar.gz (24.15 GB)
- contamination_filter.py (5.22 KB)
- counting_monster.py (3.28 KB)
- extract_probe_region.py (3.40 KB)
- final_soap_calo2.fa (47.01 MB)
- final_soap_FG120036B.fa (35.63 MB)
- final_soap_GV120010B.fa (36.93 MB)
- flank_dropper.py (7.25 KB)
- getlist.py (866 B)
- IBA_trans.py (9.12 KB)
- IBA.py (10.94 KB)
- JAVA_SourceCode.tar.gz (4.36 MB)
- Lep1_ref.tar.gz (1.15 MB)
- ortholog_filter.py (4.72 KB)
- README.txt (11.04 KB)
- remove_duplicates.py (1.75 KB)
- removelist.py (892 B)
- s_hit_checker.py (2.36 KB)
- Scripts_README.txt (12.85 KB)
- split.py (1.39 KB)
- taxa_list.txt (12.09 KB)
Abstract
The advent of next-generation sequencing technology has allowed for the collection of large portions of the genome for phylogenetic analysis. Hybrid enrichment and transcriptomics are two techniques that leverage next-generation sequencing and have shown much promise. However, methods for processing hybrid enrichment data are still limited. We developed a pipeline for anchored hybrid enrichment (AHE) read assembly, orthology determination, contamination screening, and data processing for sequences flanking the target “probe” region. We apply this approach to study the phylogeny of butterflies and moths (Lepidoptera), a megadiverse group of more than 157,000 described species with poorly understood deep-level phylogenetic relationships. We introduce a new, 855-locus anchored hybrid enrichment kit for Lepidoptera phylogenetics and compare the resulting trees to those from transcriptomes. The enrichment kit was designed from existing genomes, transcriptomes, and expressed sequence tag (EST) data and was used to capture sequence data from 54 species from 23 lepidopteran families. Phylogenies estimated from AHE data were largely congruent with trees generated from transcriptomes, with strong support for relationships at all but the deepest taxonomic levels. We combine AHE and transcriptomic data to generate a new Lepidoptera phylogeny, representing 76 exemplar species in 42 families. The tree provides robust support for many relationships, including those among the seven butterfly families. The addition of AHE data to an existing transcriptomic dataset lowers node support along the Lepidoptera backbone, but firmly places taxa with AHE data on the phylogeny. To examine the efficacy of AHE at different taxonomic levels, phylogenetic analyses were also conducted on a sister group representing a more recent divergence, the Saturniidae and Sphingidae. These analyses utilized sequences from the probe region and data flanking it, nearly doubling the size of the dataset; all resulting trees were well supported. We hope that our data processing pipeline, hybrid enrichment gene set, and approach of combining AHE data with transcriptomes will be useful for the broader systematics community.