Premise of the study: Phylogenetic inference is increasingly based on large multilocus data sets, yet the choice of marker and sequencing method at low taxonomic levels remains uncertain. To address this gap, we present a method for enriching long loci spanning intron-exon boundaries in the genus Heuchera. Methods: Two hundred seventy-eight loci were designed using a splice-site prediction method that combines transcriptomic and genomic data, and biotinylated probes were designed to enrich these loci. Reads were assembled against genomic references; chloroplast and mitochondrial genomes were additionally used as references for off-target reads. The data were aligned and subjected to coalescent and concatenated phylogenetic analyses to assess support for major relationships. Results: Complete or nearly complete (>99%) sequences were assembled for essentially all loci in all taxa. Aligned introns showed fourfold higher divergence than exons. Concatenated analysis gave decisive support to all nodes; the coalescent analysis also gave high support and recovered mostly similar relationships. Organellar phylogenies were also well supported but conflicted with the nuclear signal. Discussion: Our approach shows promise for resolving a recent radiation. Enrichment of introns is highly successful at low taxonomic levels, with little or no sequencing dropout despite higher substitution and indel frequencies, and should be exploited in studies of species complexes.
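
The divergence metric behind the intron/exon contrast is not specified here. As a minimal sketch, assuming uncorrected pairwise p-distance (the proportion of differing sites, skipping gaps and ambiguity codes), the two regions could be compared by averaging that distance over all taxon pairs in each alignment. The function names are illustrative and are not part of the original analysis.

```python
from __future__ import annotations

from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float | None:
    """Proportion of differing sites, counting only columns where both
    sequences carry an unambiguous base; None if no column is comparable."""
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    return differing / compared if compared else None


def mean_p_distance(alignment: dict[str, str]) -> float:
    """Average p-distance over all taxon pairs with at least one comparable site."""
    values = [
        d
        for x, y in combinations(alignment.values(), 2)
        if (d := p_distance(x, y)) is not None
    ]
    if not values:
        raise ValueError("no taxon pair shares a comparable site")
    return sum(values) / len(values)
```

Applied separately to an exon-only and an intron-only alignment, the ratio of the two means gives one rough measure of their relative divergence; model-corrected distances would be a reasonable alternative.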

Concatenated low-copy nuclear matrix

Concatenated data matrix of all 277 successfully enriched loci, assembled with BWA and aligned with MAFFT.

nuclearmatrix.phy
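
The matrices are distributed in PHYLIP format. A minimal reader is sketched below, assuming the sequential, relaxed variant (a header line giving the number of taxa and sites, then one whitespace-separated name and sequence per line). Whether these files are sequential or interleaved is not stated, so that assumption should be checked against the files before use; `read_sequential_phylip` is a hypothetical helper name.

```python
from __future__ import annotations


def read_sequential_phylip(text: str) -> dict[str, str]:
    """Parse a sequential, relaxed PHYLIP alignment into {name: sequence}.

    Assumes names contain no whitespace and each sequence sits on one line
    (internal spaces, e.g. 10-character blocks, are removed). Interleaved
    PHYLIP is not handled.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    alignment: dict[str, str] = {}
    for line in lines[1:1 + ntax]:
        name, seq = line.split(None, 1)
        seq = "".join(seq.split())  # drop any internal whitespace
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, found {len(seq)}")
        alignment[name] = seq
    if len(alignment) != ntax:
        raise ValueError(f"expected {ntax} taxa, found {len(alignment)}")
    return alignment
```

For example, passing the contents of `nuclearmatrix.phy` to this function would return a dictionary keyed by taxon label, with the length checks catching a file that is actually interleaved.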

Concatenated nuclear matrix, exons only

The concatenated nuclear matrix with intronic regions removed, leaving only exons.

exononlymatrix.phy.txt

Concatenated nuclear matrix, introns only

The concatenated nuclear matrix with exonic regions removed. The resulting matrix was then trimmed to exactly the length of the exon-only alignment so that phylogenetic signal could be compared fairly between the two regions.

intrononlyreducedmatrix.phy.txt
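
Because the intron-only matrix was trimmed to the length of the exon-only matrix, the PHYLIP headers of `exononlymatrix.phy.txt` and `intrononlyreducedmatrix.phy.txt` should report the same dimensions. The quick check below assumes a standard header line of the form `ntax nchar`; the helper names are hypothetical.

```python
from __future__ import annotations


def phylip_dimensions(text: str) -> tuple[int, int]:
    """Return (number of taxa, number of sites) from the first non-blank line."""
    header = next(line for line in text.splitlines() if line.strip())
    ntax, nchar = header.split()[:2]
    return int(ntax), int(nchar)


def same_dimensions(exon_text: str, intron_text: str) -> bool:
    """True if both matrices report the same taxon count and alignment length."""
    return phylip_dimensions(exon_text) == phylip_dimensions(intron_text)
```

Reading only the header keeps the check cheap even for large concatenated matrices.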

Chloroplast genome matrix

Chloroplast genome alignment, assembled with BWA and aligned with MAFFT.

chloroplastmatrix.phy

Mitochondrial genome matrix

Mitochondrial genome alignment, assembled with BWA and aligned with Mauve. Taxon labels refer to specimen voucher numbers.

mitochondrialmatrix.phy

Concatenated nuclear ML tree

Maximum likelihood tree inferred from the concatenated low-copy nuclear data using RAxML. Tip labels refer to specimen voucher numbers.

RAxMLnucleartree.tre

Chloroplast ML tree

Maximum likelihood tree inferred from the chloroplast data using RAxML. Tip labels refer to specimen voucher numbers.

RAxMLchloroplasttree.tre

Mitochondrial ML tree

Maximum likelihood tree inferred from the mitochondrial data using RAxML. Tip labels refer to specimen voucher numbers.

RAxMLmitochondrialtree.tre

Annotated target loci

The 278 loci for which probes were developed, with exonic and intronic regions annotated, in GenBank flatfile format. The loci are numbered in descending order of length, so that "Locus 1" is the longest and "Locus 278" the shortest. One locus, "Locus 4", was not consistently enriched and was dropped from all analyses.

AnnotatedTargetLoci.gb
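
The numbering scheme (Locus 1 longest, Locus 278 shortest) can be checked against the sequence length recorded on each GenBank `LOCUS` line. This sketch assumes the records appear in locus order and that each `LOCUS` line follows the usual `LOCUS name length bp ...` layout; Biopython's `SeqIO.parse(handle, "genbank")` could do the same parsing, and the standard-library version below simply avoids the dependency. Function names are hypothetical.

```python
from __future__ import annotations

import re

# Captures the record name and length from a line such as
# "LOCUS       name        2500 bp    DNA     linear   PLN ..."
LOCUS_LINE = re.compile(r"^LOCUS\s+(\S+)\s+(\d+)\s+bp\b")


def locus_lengths(genbank_text: str) -> list[tuple[str, int]]:
    """Return (record name, sequence length) for each record, in file order."""
    records = []
    for line in genbank_text.splitlines():
        match = LOCUS_LINE.match(line)
        if match:
            records.append((match.group(1), int(match.group(2))))
    return records


def is_descending(lengths: list[int]) -> bool:
    """True if no record is longer than the record before it."""
    return all(a >= b for a, b in zip(lengths, lengths[1:]))
```

Ties in length would still pass this check, since it only requires that lengths never increase.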

Individual nuclear locus alignments

Individual locus alignments for the 277 successfully enriched loci. Locus labels match those in the other files, so loci are numbered in descending order of target length (Locus 4 is absent).

Alignments.zip

Gene trees

ML gene trees (inferred in RAxML) for each of the 277 enriched loci. The trees are not individually named, but they appear in the same order as in the other files (descending locus length): line 1 contains the tree for Locus 1, and line 277 contains the tree for Locus 278. Because Locus 4 is omitted, every tree from line 4 onward belongs to the locus numbered one higher than its line. Tip labels refer to specimen voucher numbers.

Genetrees.tre
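
Because the gene trees carry no locus names, the locus for each tree has to be recovered from its line number. A minimal mapping is sketched below, assuming one Newick tree per non-blank line in the order described above; the helper names are hypothetical.

```python
from __future__ import annotations

DROPPED_LOCUS = 4  # the one target locus excluded from all analyses
N_TREES = 277      # one tree per successfully enriched locus


def locus_for_tree_line(line_number: int) -> int:
    """Map a 1-based line number in the gene tree file to its locus number.

    Lines 1-3 hold Loci 1-3; because Locus 4 is omitted, each later line is
    shifted by one (line 4 -> Locus 5, ..., line 277 -> Locus 278).
    """
    if not 1 <= line_number <= N_TREES:
        raise ValueError(f"line number must be between 1 and {N_TREES}")
    return line_number if line_number < DROPPED_LOCUS else line_number + 1


def trees_by_locus(text: str) -> dict[int, str]:
    """Key each non-blank Newick line by its locus number."""
    trees = (line.strip() for line in text.splitlines() if line.strip())
    return {locus_for_tree_line(i): tree for i, tree in enumerate(trees, start=1)}
```

Keying by locus number makes it straightforward to pair each gene tree with the matching alignment or GenBank record.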

MP-EST tree

Species tree inferred in MP-EST; the STAR tree was topologically identical. Branch labels are branch lengths in coalescent units. Because there is no infraspecific sampling, tip branch lengths cannot be estimated; coalescent programs generally report these at the maximum allowed value (here, 9), and they should be ignored. Internal branch-length estimates are valid only under the assumption that all gene tree discordance is due to the coalescent process (incomplete lineage sorting). Tip labels refer to specimen voucher numbers.

MP_EST.tre
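
Internal branch lengths in coalescent units can be read in terms of expected gene tree discordance. For three taxa separated by an internal branch of t coalescent units, the multispecies coalescent (with incomplete lineage sorting as the only source of conflict, the same assumption stated above) predicts that a gene tree matches the species-tree topology with probability 1 - (2/3)e^(-t), each of the two alternative topologies having probability (1/3)e^(-t). The sketch below is an interpretive aid only, not part of the MP-EST output, and applies strictly to rooted three-taxon subtrees; the function name is hypothetical.

```python
import math


def triplet_concordance(t: float) -> float:
    """Probability that a three-taxon gene tree matches the species tree,
    given an internal branch of t coalescent units, assuming incomplete
    lineage sorting is the only source of gene tree discordance."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)
```

At t = 0 all three topologies are equally likely (probability 1/3 each), and concordance rises toward 1 as the internal branch lengthens.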

Data from: A protocol for targeted enrichment of intron-containing sequence markers for recent radiations: a phylogenomic example from Heuchera (Saxifragaceae)
