Premise of the study: Hybrid capture with high-throughput sequencing (Hyb-Seq) is a powerful tool for evolutionary studies. The applicability of an Asteraceae family-specific Hyb-Seq method and the outcomes of different phylogenetic analyses are assessed.
Methods: Hyb-Seq data from 112 Asteraceae samples were organized into groups at different taxonomic levels (tribe, genus, and species). For each group, datasets of non-paralogous loci were built and proportions of parsimony informative characters estimated. The impacts of analyzing alternative datasets, removing long branches, and the type of analysis on tree resolution and inferred topologies were investigated in tribe Cichorieae.
Results: Alignments of the Asteraceae-wide Hyb-Seq locus set were parsimony informative at all taxonomic levels. Each genus contained uniquely non-paralogous loci within their respective tribes. Levels of resolution and topologies inferred at shallower nodes differed depending on the locus dataset and the type of analysis, and were affected by the presence of long branches.
Discussion: The approach used to build a Hyb-Seq locus dataset and methodological artefacts, such as long branch attraction, influence resolution and topologies inferred in phylogenetic analyses. The Asteraceae Hyb-Seq probe set is applicable at multiple taxonomic depths and therefore probe sets do not necessarily need to be lineage specific.
Trimming approach testing of supercontig (exon + flanking intron region) alignments
Two trimming approaches were assessed for two datasets, Cichorieae wide and the Picris hieracioides species complex, in order to select an approach for all other datasets (see Tables 1 and 2 in Jones et al. 2019 APPS). Spurious sequences were removed from each alignment in all datasets using trimAl v. 1.4 (Capella-Gutiérrez et al., 2009). The following two trimming approaches were assessed: (1) “strict”, in which -resoverlap and -seqoverlap were set to 0.75 and 0.80, respectively; and (2) “less-strict”, in which -resoverlap and -seqoverlap were set to 0.65 and 70, respectively. In addition, we used the -gappyout parameter, which efficiently removes poorly aligned regions (Capella-Gutiérrez et al., 2009). We manually checked ~30 alignments from each dataset and found that the “less-strict” approach generated reliable alignments and minimized the number of samples that needed to be removed. Therefore, all other sample groups were trimmed in trimAl with -resoverlap = 0.65 and -seqoverlap = 70. We provide the alignments and summary statistics from both approaches for the two sample groups.
1_Alignments_preliminary_supercontig_splashzones_Appendix_S4_APPS_Jonesetal.zip
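The two trimming approaches described above can be written out as trimAl command lines. The following is a minimal sketch, not the authors' script: the function name, the dictionary, and the file names are hypothetical, the threshold values are copied as reported in the text, and combining -gappyout with the overlap thresholds in a single call is an assumption.

```python
from typing import List

# Overlap thresholds (-resoverlap, -seqoverlap) as reported in the text.
# Values are kept as strings so they are passed to trimAl unchanged.
TRIM_SETTINGS = {
    "strict": ("0.75", "0.80"),
    "less-strict": ("0.65", "70"),
}


def trimal_command(in_fasta: str, out_fasta: str,
                   approach: str = "less-strict") -> List[str]:
    """Build (but do not run) a trimAl call for one alignment.

    Assumption: -gappyout is applied in the same call as the overlap
    thresholds; the text does not say whether one or two calls were used.
    """
    resoverlap, seqoverlap = TRIM_SETTINGS[approach]
    return [
        "trimal",
        "-in", in_fasta,
        "-out", out_fasta,
        "-resoverlap", resoverlap,
        "-seqoverlap", seqoverlap,
        "-gappyout",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a system where trimAl is installed, for example `trimal_command("locus1.fasta", "locus1.trimmed.fasta")` (hypothetical file names).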
Asteraceae exon alignments after removing long branches in TreeShrink ("shrunken" datasets)
Alignments of exon "shrunken" datasets were built for the Cichorieae wide and Picris hieracioides species complex datasets according to the following steps: gene trees were estimated for locus exon alignments (see dryad data file "Asteraceae exon alignments of non paralogous loci after trimming") using RAxML (see Table 2 in Jones et al. 2019 APPS) with the GTR+GAMMA model and 100 rapid BS replicates. Subsequently, we used TreeShrink to detect samples that had unexpectedly long branches in gene trees (false-positive tolerance level 0.10); we then removed those samples from alignments (Table 2).
All_alignments_shrunken_APPS_Jonesetal.zip
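The last step described above, removing samples flagged as long branches from a locus alignment, can be sketched as follows. This is an illustration only: the function names are hypothetical, the alignment is assumed to be in FASTA format, and the list of flagged samples is assumed to have been read from the TreeShrink output beforehand.

```python
from typing import Dict, Iterable


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sample name: sequence}, keeping input order."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the sample name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def drop_samples(records: Dict[str, str],
                 flagged: Iterable[str]) -> Dict[str, str]:
    """Return the alignment without the samples flagged as long branches."""
    flagged_set = set(flagged)
    return {n: s for n, s in records.items() if n not in flagged_set}


def to_fasta(records: Dict[str, str]) -> str:
    """Write {sample name: sequence} back to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

Because different samples may be flagged in different gene trees, a filter like this would be applied per locus.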
Asteraceae exon alignments of non paralogous loci after trimming
Exon alignments of COS loci for all Asteraceae sample groups in Table 1 in Jones et al. 2019 APPS. HybPhyloMaker steps 1-3 were used for raw read quality filtering, mapping to targets, and contig assembly. Alignments were built according to the following steps: 1. aligning with MAFFT, 2. removing samples with >70% missing length from the particular locus alignment, and 3. applying a 100% sample presence criterion, so that loci not present in all samples of a sample group were removed from that group. See Table 1 attached for names of sample groups and numbers of loci, and Appendix S9 in Jones et al. 2019 APPS for exon alignment summary statistics.
All_alignments_exons_nonparalogs_posttrimming.zip
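Filtering steps 2 and 3 described above (dropping samples with >70% missing length, then keeping only loci present in all samples) can be sketched as below. This is a simplified illustration rather than the HybPhyloMaker implementation: the function names are hypothetical, and treating gaps, "?" and "N" as missing data is an assumption.

```python
from typing import Dict, Set

Alignment = Dict[str, str]  # sample name -> aligned sequence

# Assumption: these characters count as missing data.
MISSING_CHARS = set("-?Nn")


def missing_fraction(seq: str) -> float:
    """Fraction of alignment columns that are missing for one sample."""
    if not seq:
        return 1.0
    return sum(c in MISSING_CHARS for c in seq) / len(seq)


def drop_incomplete_samples(aln: Alignment,
                            max_missing: float = 0.70) -> Alignment:
    """Step 2: remove samples with more than max_missing missing length."""
    return {n: s for n, s in aln.items()
            if missing_fraction(s) <= max_missing}


def loci_with_all_samples(loci: Dict[str, Alignment],
                          samples: Set[str]) -> Dict[str, Alignment]:
    """Step 3: keep only loci in which every sample of the group remains."""
    kept: Dict[str, Alignment] = {}
    for locus, aln in loci.items():
        filtered = drop_incomplete_samples(aln)
        if samples <= set(filtered):
            kept[locus] = filtered
    return kept
```

Under the 100% presence criterion, a locus is discarded for the whole sample group as soon as a single sample fails the missing-data threshold, which is why the number of retained loci differs between sample groups.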
Asteraceae supercontig (exon + intron) alignments
Supercontig matrices were built using HybPiper (intronerate.py) for all sample groups in Table 1 (Jones et al. 2019 APPS). All matrices were aligned using MAFFT and cleaned using the following trimAl settings: -resoverlap and -seqoverlap were 0.65 and 70, respectively, with the -gappyout parameter. Potentially paralogous loci flagged for each sample group (according to HybPiper) were then removed from all alignments. See Appendix S8 in Jones et al. 2019 APPS for the numbers of samples in alignments per sample group and alignment summary statistics for all sample groups.
All_alignments_supercontigs_APPS_Jonesetal_2019.zip
Cichorieae and Picris species complex supercontig alignments analysed
Supercontig alignments analysed for the Cichorieae wide and the Picris hieracioides species complex sample groups (see Table 2 in Jones et al. 2019 APPS). See Fig. 2 for the pipeline and analyses conducted for the two datasets.
Supercontig_alignments_analysed_Cichorieae_and_Picris_spcomple_APPS_Jonesetal.zip