This is data for the DEPP project.

1. Accessory files
   To place your query sequences onto the WoL tree using DEPP, the file you need is accessory.tar.gz (use DEPP version >= 0.1.51).
   To place rRNA sequences, the file you need is 16s_accessory.tar.gz.

2. Simulated data
   * simulated_data/ils_data
     This directory contains the simulated data with ILS used in the paper. For multiple genes, we concatenate the sequences from all the genes.
     - simulated_data/ils_data/$c.tar.gz: data for model condition $c. model.200.500000.0.000001, model.200.2000000.0.000001, and model.200.10000000.0.000001 correspond to high, medium, and low discordance in the paper, respectively.
     - simulated_data/ils_data/$c/$r/$n: data for tree replicate $r with $n genes. Files in it include:
       - seq.fa: sequence file
       - query_label.txt: sequence IDs of the selected queries
       - model.ckpt: DEPP model
     - simulated_data/ils_data/$c/$r/backbone.nwk: backbone tree used in the paper, with branch lengths re-estimated using 32 genes
     - simulated_data/ils_data/$c/$r/s_trees.tree: complete species tree
     - simulated_data/ils_data/$c/$r/RAxML_bestTree.r100.run: backbone tree with branch lengths re-estimated by RAxML
     - simulated_data/ils_data/$c/$r/subsample.fa: sequences used for re-estimating branch lengths, generated by subsampling sites from 32 genes (each providing 500 sites)
   * simulated_data/hgt_data
     This directory contains the simulated data with HGT used in the paper.
     - simulated_data/hgt_data/rep.$r.tar.gz: data for tree replicate $r
     - simulated_data/hgt_data/rep.$r/$g/seq.fa: sequence data for tree replicate $r, gene $g
     - simulated_data/hgt_data/rep.$r/$g/model.ckpt: DEPP model
     - simulated_data/hgt_data/rep.$r/subsample.fa: sequences used for re-estimating branch lengths, generated by subsampling sites from 5 genes (each providing 500 sites)
     - simulated_data/hgt_data/rep.$r/RAxML_bestTree.r100.run: backbone tree with branch lengths re-estimated by RAxML
     - simulated_data/hgt_data/rep.$r/s_tree.trees: species tree for tree replicate $r
     - simulated_data/hgt_data/rep.$r/query_label.txt: sequence IDs of the selected queries

3. WoL data
   * WoL_data
     This directory contains the WoL data used in the paper.
     - wol.nwk: WoL species tree
     - WoL_data/30_marker_genes.tar.gz: data for 30 marker genes with high, medium, or low discordance (10 genes for each condition)
     - WoL_data/30_marker_genes/$gene_id: data for the gene with ID $gene_id
     - WoL_data/50_marker_genes.tar.gz: data for 50 randomly selected marker genes
     - WoL_data/50_marker_genes/$gene_id: data for the gene with ID $gene_id
     - WoL_data/5s: 5S data
     - WoL_data/16s.tar.gz: 16S data
     - WoL_data/16s/full_length: full-length 16S data
     - WoL_data/16s/v3_v4: V3+V4 region of 16S
     - WoL_data/16s/v4_150: part of the V4 region, ~150 bp long
     - WoL_data/16s/v4_100: part of the V4 region, ~100 bp long
     - Files in the above directories include:
       - query_label.txt: sequence IDs of the selected queries
       - seq.fa: all the sequences of the gene
       - model.ckpt: DEPP model for the gene
       - model.recon.ckpt (for the 30 marker genes only): DEPP model with reconstruction network
     - WoL/380_marker_genes: sequences for the 380 WoL marker genes (backbone sequences for the Traveler's Diarrhea experiment in the paper)

4. Traveler's Diarrhea
   * travelers_diarrhea
     This directory contains the Traveler's Diarrhea data used in the paper.
     - travelers_diarrhea/MAG.tar.gz: MAG data
     - travelers_diarrhea/MAG/$gene_id.fa: query sequences for the gene with ID $gene_id
     - travelers_diarrhea/ASV: ASV data. Files in the directory include:
       - query.fa: query sequences from the Traveler's Diarrhea data
       - backbone.fa: backbone sequences in the WoL tree (trimmed to the same region as the ASVs in the queries)
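
Example: separating queries from backbone sequences

Many of the directories above pair a seq.fa file with a query_label.txt file. The following is a minimal Python sketch of how the two could be combined to split the sequences into a backbone set and a query set. It is not part of DEPP. It assumes that seq.fa is a standard FASTA file whose sequence ID is the first whitespace-delimited token of each header line, and that query_label.txt lists one sequence ID per line; neither format is spelled out in this README. The function names (read_fasta, read_labels, split_queries) are hypothetical and chosen for illustration only.

```python
from pathlib import Path


def read_fasta(path):
    """Return {sequence_id: sequence} from a FASTA file.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    seqs = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                tokens = line[1:].split()
                current = tokens[0] if tokens else ""
                seqs[current] = []
            elif current is not None:
                # Sequence lines may be wrapped, so collect the pieces.
                seqs[current].append(line)
    return {name: "".join(parts) for name, parts in seqs.items()}


def read_labels(path):
    """Return the non-empty lines of a label file.

    Assumption: query_label.txt holds one sequence ID per line.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def split_queries(data_dir):
    """Split seq.fa in data_dir into (backbone, queries) dictionaries."""
    data_dir = Path(data_dir)
    seqs = read_fasta(data_dir / "seq.fa")
    query_ids = set(read_labels(data_dir / "query_label.txt"))
    queries = {k: v for k, v in seqs.items() if k in query_ids}
    backbone = {k: v for k, v in seqs.items() if k not in query_ids}
    return backbone, queries
```

For instance, after extracting WoL_data/16s.tar.gz, calling split_queries("WoL_data/16s/full_length") would return the two dictionaries for the full-length 16S data, provided the files follow the assumed formats.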