Data from: Next-generation polyploid phylogenetics: rapid resolution of hybrid polyploid complexes using PacBio single-molecule sequencing
Data files
Jun 17, 2017 version; files: 1.74 MB
- all_regimes_trees.zip (1.51 MB)
- barcodes_pacbio.xlsx (60.36 KB)
- batch_run_purc.sh (327 B)
- extract_specific_accessions.py (3 KB)
- merge_allRegimes_and_align.sh (2.73 KB)
- PURC_molecularlab_tracking.xlsx (81.30 KB)
- PURCvsSanger_trees.zip (8.67 KB)
- scripts_for_figures.zip (12.06 KB)
- vouchertable_allruns.xlsx (55.89 KB)
Abstract
Difficulties in generating nuclear data for polyploids have impeded phylogenetic study of these groups. We describe a high-throughput protocol and an associated bioinformatics pipeline (PURC: “Pipeline for Untangling Reticulate Complexes”) that generates these data quickly and conveniently, and demonstrate its efficacy on accessions from the fern family Cystopteridaceae. We conclude with a demonstration of the downstream utility of these data by inferring a multilabeled species tree for a subset of our accessions. We amplified four nuclear loci, each ~1 kb long, and sequenced them in a parallel-tagged amplicon sequencing approach using the PacBio platform. PURC infers the final sequences from the raw reads via an iterative approach that corrects PCR and sequencing errors and removes PCR-mediated recombinant sequences (chimeras). We generated data for all gene copies (homeologs, paralogs, and segregating alleles) present in each of three sets of 50 mostly polyploid accessions, for four loci, in three PacBio runs (one run per set). From the raw sequencing reads, PURC accurately inferred the underlying sequences. This approach makes it easy and economical to study the phylogenetics of polyploids and, in conjunction with recent analytical advances, facilitates investigation of broad patterns of polyploid evolution.
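The parallel-tagged amplicon approach mentioned in the abstract depends on assigning each pooled read back to its accession by a barcode. The Python sketch below illustrates that general idea only. It is not PURC's implementation: the function name `demultiplex`, the barcode sequences, and the accession labels are all hypothetical, and it assumes exact-match barcodes at the start of each read, which real PacBio data would not guarantee.

```python
from collections import defaultdict

# Hypothetical barcode-to-accession map (made-up sequences, for illustration only).
BARCODES = {
    "ACGTAC": "accession_A",
    "TGCATG": "accession_B",
}


def demultiplex(reads, barcodes):
    """Group reads by the barcode at their start, trimming the barcode off.

    Assumes exact barcode matches; reads without a known barcode are
    collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, accession in barcodes.items():
            if read.startswith(barcode):
                bins[accession].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Toy pooled reads: two from accession_A, one from accession_B, one unknown.
reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTACGGGTTA", "NNNNNNAAAA"]
demultiplexed = demultiplex(reads, BARCODES)
```

Error correction and chimera removal, the iterative steps the abstract describes, would operate on the per-accession bins and are deliberately left out of this sketch.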