# Dryad submission for Owen et al. (2022) Systematic Biology

https://doi.org/10.1093/sysbio/syac043

Below we describe each file in the main directory. Each compressed file contains an additional README.md that describes the data within it.

### Family_Cicadidae_AHE_genomic_analysis_samples.xlsx

- This is an MS Excel file that contains summary statistics from our analyses. It has multiple tabs, described below.
    - "contamination-gene-align-stats": this tab contains the following columns for each locus before the identified contamination was removed
        1. locus: the name of the AHE locus
        2. num-seqs: number of taxa present for that locus
        3. parsimony-informative-sites: number of parsimony-informative sites in the aligned sequences for the locus
        4. alignment-len: length of the alignment in base pairs (bp)
        5. constant-sites: number of constant sites in the alignment
        6. partition-rate-multipliers: the relative rates of each partition from PartitionFinder
        7. tree-len: the length of the maximum likelihood gene tree
        8. models: the evolutionary model assigned to each partition identified by PartitionFinder according to the BIC
        9. num-partitions: the number of partitions that PartitionFinder identified
        10. mod-desc: the count of each model across all of the gene tree analyses
    - "contamination-gene-tree-info": this tab contains the average bootstrap support for gene trees that did not have contamination removed
        1. locus: the name of the AHE locus
        2. average-bs: average of all bootstrap supports in the gene tree
    - "NO-contamination-gene-alignment-stats": this tab contains the following columns for each locus after contamination was removed
        1. locus: the name of the AHE locus
        2. num-seqs: number of taxa present for that locus
        3. parsimony-informative-sites: number of parsimony-informative sites in the aligned sequences for the locus
        4. alignment-len: length of the alignment in base pairs (bp)
        5. constant-sites: number of constant sites in the alignment
        6. partition-rate-multipliers: the relative rates of each partition from PartitionFinder
        7. tree-len: the length of the maximum likelihood gene tree
        8. p4-pvals: the p-value returned by the program [p4](https://p4.nhm.ac.uk/tutorial/tut_compo.html) to test for compositional homogeneity
    - "NO-contamination-gene-tree-info": this tab contains the average bootstrap support for gene trees where identified contamination was removed
        1. locus: the name of the AHE locus
        2. average-bs: average of all bootstrap supports in the gene tree
    - "Taxon-occurences-in-datasets": this tab gives the number of loci in which each taxon occurs in the different datasets
        1. Taxon: the name of the taxon in the study
        2. WITH-contamination: number of loci in the contamination dataset in which the taxon in Column A is present
        3. NO-contamination: number of loci in the non-contamination dataset in which the taxon in Column A is present
        4. NO-contamination-no-p4: number of loci in which the taxon in Column A is present in the non-contamination dataset after loci were removed based on the p4 analysis
    - "Contamination-stats": this tab describes the most common contamination pairs and summarizes the contamination pairs in each locus
        1. Contamination-Taxon-Pairs: pairs of taxa that were identified as contaminated
        2. loci-present: the number of contaminated loci for the pair of taxa in Contamination-Taxon-Pairs (Column A)
        3. locus: the name of each locus in the dataset
        4. num-contam-pairs: the number of pairs of taxa identified as contaminated for the locus in Column D

### Table S1.docx

- This MS Word file contains the museum label data for each specimen used in this study.
    1. Subfamily/Tribe: Linnean ranks for each taxon
    2. Genus: Linnean rank for each taxon
    3. Species: Linnean rank for each taxon
    4. Species author: name of the author who originally described the species
    5. Code: unique laboratory collection code for the specimen
    6. Lat.: decimal latitude at which the specimen was collected
    7. Lon.: decimal longitude at which the specimen was collected
    8. Location: general geographic location where the specimen was collected
    9. Date(D/M/Y): the collection date in the format DAY/MONTH/YEAR
    10. Collectors: name of the individual(s) or laboratory that collected the specimen

### Table S2.xlsx

- This file describes the raw sequence output and assembly of the AHE data for each specimen in the study. Please see [Mendoza et al. 2020](https://www.frontiersin.org/articles/10.3389/fpls.2019.01761/full) and the supplementary data for a description.

### Phylogenetic-data-analysis-results-and-log-files.zip

- This compressed directory contains the sequence alignments and phylogenetic analysis results for the manuscript. Please uncompress the .zip file and see the README.md inside, which is specific to this directory.

### Contamination-analyses.zip

- This compressed directory contains the analyses we used to identify contamination in our study. Please uncompress the .zip file and see the README.md inside, which is specific to this directory.
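
Both compressed directories ship their own README.md, so a convenient first step is to extract an archive and locate that file. The following is a minimal sketch using only the Python standard library. The function name `extract_and_find_readmes` and the choice of output folder are illustrative assumptions, not part of this submission, and the internal folder layout of the archives is not assumed (the search is recursive).

```python
import zipfile
from pathlib import Path


def extract_and_find_readmes(archive, dest):
    """Extract a .zip archive into dest and return the README.md paths found inside.

    Hypothetical helper for illustration; not part of the Dryad submission.
    """
    dest = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # Search recursively, since the README.md may sit inside a top-level folder.
    return sorted(dest.rglob("README.md"))


if __name__ == "__main__":
    # Archive names as listed in this README; each is skipped if not present locally.
    for name in ("Phylogenetic-data-analysis-results-and-log-files.zip",
                 "Contamination-analyses.zip"):
        if Path(name).exists():
            # Assumption: extract into a folder named after the archive.
            for readme in extract_and_find_readmes(name, Path(name).stem):
                print(readme)
```

Run from the directory that holds the downloaded files, this prints the path of each README.md it finds so that it can be opened before working with the data.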