Data from: Misconceptions on missing data in RAD-seq phylogenetics with a deep-scale example from flowering plants

Eaton DAR, Spriggs EL, Park B, Donoghue MJ

Date Published: October 11, 2016

DOI: http://dx.doi.org/10.5061/dryad.g549v

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data. CC0, Open Data

Title Supplementary Figure 1
Downloaded 37 times
Description Simulation procedure for dropping RAD-seq data by mutation-disruption in the program simrrls.
Download DRYAD_fig_S1.pdf (70.22 Kb)
Title Supplementary Figure 2
Downloaded 39 times
Description The impact of missing data on quartet informativeness for simulated data (a-f) and the empirical Viburnum data set (g). This is an extension of Fig. 1. Data were simulated on three topologies: a balanced tree, an imbalanced tree, and the Viburnum topology with branch lengths scaled by penalized likelihood and the outgroup removed. The number of loci that are quartet informative for each split is shown under each tree. In the absence of missing data, all 1,000 simulated loci are informative about every edge (a-f; black circles). Under mutation-disruption (a-c), quartet information is lost faster in double-digest data (light grey) than in single-digest data (dark grey), and the effect varies with tree shape (see description in Fig. 1). Data simulated at low sequencing coverage (d-f) had either 50% (dark grey) or 80% (light grey) of data randomly missing. Here the effect of tree shape is more pronounced. Nearly all information is recovered across the deepest splits in the balanced topology (d) due to its hierarchical redundancy, but no data are recovered in the imbalanced topology (e), which does not increase in hierarchical redundancy across deeper edges. The empirical Viburnum topology is relatively balanced, and data simulated on this topology (c, f) appear similar to those simulated on the balanced topology (a, d). The true distribution of quartet informativeness recovered in the Viburnum RAD-seq data set (g) is similar to the expectation when data were simulated on this topology under low sequencing coverage (f).
Download DRYAD_fig_S2.pdf (83.34 Kb)
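The role of hierarchical redundancy described for Supplementary Figure 2 can be illustrated with a minimal sketch. It assumes that a locus counts as quartet informative for an edge when it has data for at least one tip in each of the four subtrees adjoining that edge; this definition, the eight-tip example trees, and all function names are illustrative assumptions, not the authors' code or the simrrls implementation.

```python
import random

def quartet_informative(present, groups):
    """Return True if the locus has data for at least one tip in every group."""
    return all(any(tip in present for tip in group) for group in groups)

def count_informative(loci, groups):
    """Count loci that are quartet informative for the edge defined by groups."""
    return sum(quartet_informative(present, groups) for present in loci)

def simulate_random_missing(tips, nloci, p_missing, rng):
    """Each tip independently lacks each locus with probability p_missing."""
    return [{t for t in tips if rng.random() >= p_missing} for _ in range(nloci)]

tips = list("abcdefgh")
# Deepest edge of a balanced 8-tip tree: four two-tip clades adjoin it.
balanced = [{"a", "b"}, {"c", "d"}, {"e", "f"}, {"g", "h"}]
# Central edge of a fully imbalanced (ladder-like) tree: the two tips
# attached at the ends of the edge form singleton groups.
imbalanced = [{"a", "b", "c"}, {"d"}, {"e"}, {"f", "g", "h"}]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
loci = simulate_random_missing(tips, 1000, 0.5, rng)
print("balanced:", count_informative(loci, balanced))
print("imbalanced:", count_informative(loci, imbalanced))
```

Under this simplified model, a group of n tips retains at least one sample with probability 1 - p^n when each tip is missing with probability p. At 50% missing data the balanced edge is expected to be informative for about 0.75^4 ≈ 32% of loci, against about 0.5 × 0.875 × 0.5 × 0.875 ≈ 19% for the imbalanced edge, because the singleton groups offer no redundancy.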
Title Supplementary Figure 3
Downloaded 20 times
Description The effect of two forms of mutation-disruption in causing allelic dropout in simulations. Single digest (restriction recognition site length = 4, 6, or 8 bp; grey) and double digest (enzyme1 recognition length = 4, 6, or 8 bp, and enzyme2 recognition length = 4 bp; black) differ in the number of loci retained. (a) When only mutations occurring within cut sites cause disruption (dashed lines), longer cutters recover fewer data than short cutters. When only mutations giving rise to new cut sites within sequences cause disruption (solid lines), shorter cutters recover fewer data than long cutters. In both cases, the double-digest data recover fewer loci than single-digest data, due to the greater opportunity for disruption. (b) When both forms of disruption cause dropout simultaneously, the length of a single cutter (4, 6, or 8 bp) has little effect on the amount of data loss, while adding a second independent cutter causes the rate of mutation-disruption to approximately double.
Download DRYAD_fig_S3.pdf (80.22 Kb)
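The opposite effects of cutter length in panel (a) of Supplementary Figure 3 can be sketched with a back-of-the-envelope calculation. This is not the simrrls model: it assumes random sequence with equal base frequencies, a hypothetical per-base substitution probability `mu`, and a first-order approximation for the gain of new sites. The function names and the 100 bp locus length are likewise illustrative.

```python
def p_sites_intact(site_lengths, mu):
    """Probability that no base mutates in any of the recognition sites."""
    total = sum(site_lengths)
    return (1.0 - mu) ** total

def expected_new_sites(seq_len, site_len, mu):
    """First-order expected number of new recognition sites within a locus.

    A window that is one mismatch away from the site, which has probability
    k * (3/4) * (1/4)**(k - 1), becomes a site when the mismatched base
    mutates to the required nucleotide (probability mu / 3). The product
    simplifies to k * mu * (1/4)**k per window.
    """
    windows = max(seq_len - site_len + 1, 0)
    per_window = site_len * mu * 0.25 ** site_len
    return windows * per_window

mu = 0.01  # hypothetical per-base substitution probability
for k in (4, 6, 8):
    single = p_sites_intact([k], mu)     # single digest: one site of length k
    double = p_sites_intact([k, 4], mu)  # double digest: second cutter of 4 bp
    gained = expected_new_sites(100, k, mu)
    print(k, round(single, 4), round(double, 4), gained)
```

In this sketch, longer recognition sites are more likely to be hit by a mutation, whereas shorter sites are more likely to arise anew within a sequence, and a second cutter always lowers the probability that every site stays intact.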
Title Supplementary Figure 4
Downloaded 44 times
Description Scatterplots of the number of shared loci among quartets of sampled individuals in ten empirical data sets and their relationship with two predictor variables: phylogenetic distance and log median number of input reads. All values are mean-standardized.
Download DRYAD_fig_S4.pdf (3.983 Mb)
Title Supplementary Figure 5
Downloaded 25 times
Description Histograms of sequencing depth (coverage) in clusters recovered by pyrad across ten empirical data sets. In each, the sample with the fewest excluded low depth clusters (loci at depth <6X; green) is shown. The proportion of low coverage loci varies greatly across data sets with respect to total sequencing effort and the evenness of sequencing (Table 1).
Download DRYAD_fig_S5.pdf (266.0 Kb)
Title Supplementary Table 1
Downloaded 74 times
Description Archived locations of raw data files for ten empirical RAD-seq data sets. Jupyter notebooks containing the code and assembly statistics for each assembled data set are available in the online repository: https://github.com/dereneaton/RADmissing.
Download DRYAD_tab_S1.pdf (36.85 Kb)
Title Supplementary Table 2
Downloaded 11 times
Description Sequence Read Archive metadata for BioProject accession PRJNA299402 (Viburnum RAD sequences).
Download SRA_metadata_final.csv (20.48 Kb)
Title Supplementary Figure 6
Downloaded 36 times
Description Maximum likelihood phylogeny of Viburnum inferred from the full (a) and half (b) min4 data sets, and (c) a species tree constructed by quartet-joining with quartets inferred from the full min4 SNP alignment.
Download DRYAD_fig_S6.pdf (42.72 Kb)

When using this data, please cite the original publication:

Eaton DAR, Spriggs EL, Park B, Donoghue MJ (2016) Misconceptions on missing data in RAD-seq phylogenetics with a deep-scale example from flowering plants. Systematic Biology 66(3):399-412. http://dx.doi.org/10.1093/sysbio/syw092

Additionally, please cite the Dryad data package:

Eaton DAR, Spriggs EL, Park B, Donoghue MJ (2016) Data from: Misconceptions on missing data in RAD-seq phylogenetics with a deep-scale example from flowering plants. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.g549v