A two-tier bioinformatic pipeline to develop probes for target capture of nuclear loci with applications in Melastomataceae

Cite this dataset

Jantzen, Johanna et al. (2021). A two-tier bioinformatic pipeline to develop probes for target capture of nuclear loci with applications in Melastomataceae [Dataset]. Dryad.


Premise of the study: Putatively single-copy nuclear (SCN) loci, identified using genomic resources of closely related species, are ideal for phylogenomic inference. However, suitable genomic resources are not available for many clades, including Melastomataceae. We introduce a versatile approach to identify SCN loci for clades with few genomic resources and use it to develop probes for target enrichment in the distantly related Memecylon and Tibouchina (Melastomataceae).

Methods: We present a two-tiered pipeline. First, we identified putatively SCN loci using MarkerMiner and transcriptomes from distantly related species in Melastomataceae. Published loci and genes of functional significance were added (384 total loci). Second, using HybPiper, we retrieved 689 homologous template sequences for these loci using genome-skimming data from within the focal clades.

Results: We sequenced 193 loci from both Memecylon and Tibouchina, with probes designed from 56 template sequences successfully targeting sequences in both clades. Probes designed from genome-skimming data within a focal clade were more successful than probes designed from other sources.

Discussion: Our pipeline successfully identified and targeted SCN loci in Memecylon and Tibouchina, enabling phylogenomic studies in both clades and potentially across Melastomataceae. This pipeline could be easily applied to other clades with few genomic resources. 


The loci developed for this paper were identified using a two-tiered pipeline. First, MarkerMiner was used to identify putatively SCN loci. Second, genome-skimming reads were assembled using HybPiper to retrieve homologous sequences for these loci for probe design.


Sequences were recovered using target enrichment and Illumina sequencing, and assembly was conducted using HybPiper. The success and utility of these loci and probes for use in phylogenomic analysis were assessed using HybPiper scripts and custom scripts.
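As an illustration of how recovery success might be summarized, the following is a minimal sketch and not the custom scripts used for this dataset. It assumes a tab-separated table with locus names in the first row, mean target lengths in the second row, and one row of recovered sequence lengths per sample. The function name `recovery_summary`, the sample and locus names, and the 50% length threshold are all hypothetical choices made for the example.

```python
import csv
from io import StringIO

def recovery_summary(table_text, min_fraction=0.5):
    """Return {sample: fraction of loci recovered at >= min_fraction of target length}.

    Assumed layout (hypothetical): row 1 = locus names, row 2 = mean target
    lengths, remaining rows = recovered lengths per sample, tab-separated.
    """
    rows = [r for r in csv.reader(StringIO(table_text), delimiter="\t") if r]
    loci = rows[0][1:]
    target_len = [float(x) for x in rows[1][1:]]
    summary = {}
    for row in rows[2:]:
        sample = row[0]
        lengths = [float(x) for x in row[1:]]
        # Count loci whose recovered length reaches the chosen fraction of the target.
        recovered = sum(
            1 for got, want in zip(lengths, target_len)
            if want > 0 and got / want >= min_fraction
        )
        summary[sample] = recovered / len(loci)
    return summary

# Made-up example data, for illustration only.
example = (
    "Species\tlocusA\tlocusB\tlocusC\tlocusD\n"
    "MeanLength\t1000\t800\t600\t400\n"
    "sample1\t900\t100\t600\t0\n"
    "sample2\t500\t800\t0\t300\n"
)
summary = recovery_summary(example)
for sample, fraction in summary.items():
    print(sample, fraction)
```

The threshold is a parameter so that stricter or looser definitions of a "recovered" locus can be compared on the same table.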


Usage notes

This dataset includes the template sequences recovered by the second tier for the loci identified by the first tier of this pipeline, as well as the probe sequences used for target enrichment. Custom scripts for analyzing these data are available on GitHub. Cleaned reads are deposited in the NCBI SRA (PRJNA592250, PRJNA573947, PRJNA576018).
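For readers who want to inspect the template sequences, the sketch below shows one way to group FASTA records by locus. It assumes the sequence headers follow a `source-locus` naming pattern (the convention HybPiper expects for target files). Whether the files in this dataset use that pattern is an assumption, and the sequences, source names, and locus names shown are invented for the example.

```python
from collections import defaultdict
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def group_by_locus(records):
    """Map locus -> list of (source, sequence), assuming 'source-locus' headers."""
    loci = defaultdict(list)
    for header, seq in records:
        name = header.split()[0]
        # Split on the last hyphen; headers without a hyphen get source "unknown".
        source, _, locus = name.rpartition("-")
        loci[locus].append((source or "unknown", seq))
    return dict(loci)

# Invented records; replace StringIO(...) with open("<template file>") for real data.
example = StringIO(
    ">Memecylon-locusA\nATGC\nGGTA\n"
    ">Tibouchina-locusA\nATGCGG\n"
    ">Tibouchina-locusB\nTTGACA\n"
)
loci = group_by_locus(read_fasta(example))
for locus, entries in sorted(loci.items()):
    print(locus, len(entries), [len(seq) for _, seq in entries])
```

Grouping by locus makes it easy to see which loci have templates from both focal clades and which have a template from only one.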


National Science Foundation, Award: DEB-1343612

American Society of Plant Taxonomists

Botanical Society of America

Society for the Study of Evolution

Society of Systematic Biologists

University of Florida

Florida Museum of Natural History