A two-tier bioinformatic pipeline to develop probes for target capture of nuclear loci with applications in Melastomataceae
Jantzen, Johanna et al. (2021), A two-tier bioinformatic pipeline to develop probes for target capture of nuclear loci with applications in Melastomataceae, Dryad, Dataset, https://doi.org/10.5061/dryad.8931zcrm2
Premise of the study: Putatively single-copy nuclear (SCN) loci, identified using genomic resources of closely related species, are ideal for phylogenomic inference. However, suitable genomic resources are not available for many clades, including Melastomataceae. We introduce a versatile approach to identify SCN loci for clades with few genomic resources and use it to develop probes for target enrichment in the distantly related Memecylon and Tibouchina (Melastomataceae).
Methods: We present a two-tiered pipeline. First, we identified putatively SCN loci using MarkerMiner and transcriptomes from distantly related species in Melastomataceae. Published loci and genes of functional significance were added (384 total loci). Second, using HybPiper, we retrieved 689 homologous template sequences for these loci using genome-skimming data from within the focal clades.
Results: We sequenced 193 loci from both Memecylon and Tibouchina, with probes designed from 56 template sequences successfully targeting sequences in both clades. Probes designed from genome-skimming data within a focal clade were more successful than probes designed from other sources.
Discussion: Our pipeline successfully identified and targeted SCN loci in Memecylon and Tibouchina, enabling phylogenomic studies in both clades and potentially across Melastomataceae. This pipeline could be easily applied to other clades with few genomic resources.
The loci developed for this paper were identified using a two-tiered pipeline. First, MarkerMiner was used to identify putatively SCN loci. Second, genome-skimming reads were assembled using HybPiper to retrieve homologous sequences for these loci for probe design.
Sequences were recovered using target enrichment and Illumina sequencing, and assembly was conducted using HybPiper. The success and utility of these loci and probes for use in phylogenomic analysis were assessed using HybPiper scripts and custom scripts.
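As an illustration of the kind of assessment described above, the following is a minimal sketch of how locus recovery might be summarized across the two clades. It is not the authors' custom script. It assumes a tab-separated table of recovered sequence lengths with one header row of locus names, one row of reference lengths, and one row per sample, a layout similar to what HybPiper's length-summary step produces. The sample names, locus names, lengths, the 50% threshold, and both function names are hypothetical.

```python
import csv
import io

# Hypothetical example table (tab-separated): header row of locus names,
# a reference-length row, then one row of recovered lengths per sample.
EXAMPLE_TABLE = (
    "Species\tlocusA\tlocusB\tlocusC\n"
    "MeanLength\t1000\t800\t600\n"
    "Memecylon_1\t900\t0\t600\n"
    "Tibouchina_1\t950\t500\t100\n"
)


def loci_recovered(table_text, min_fraction=0.5):
    """Map each sample to the set of loci recovered at >= min_fraction
    of the reference length (the threshold is an assumption)."""
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    loci = rows[0][1:]
    ref_lengths = [float(x) for x in rows[1][1:]]
    recovered = {}
    for row in rows[2:]:
        sample = row[0]
        lengths = [float(x) for x in row[1:]]
        recovered[sample] = {
            locus
            for locus, ref, got in zip(loci, ref_lengths, lengths)
            if ref > 0 and got / ref >= min_fraction
        }
    return recovered


def shared_loci(recovered, samples_a, samples_b):
    """Loci recovered in at least one sample of each group."""
    in_a = set().union(*(recovered[s] for s in samples_a))
    in_b = set().union(*(recovered[s] for s in samples_b))
    return in_a & in_b


if __name__ == "__main__":
    rec = loci_recovered(EXAMPLE_TABLE)
    print(sorted(shared_loci(rec, ["Memecylon_1"], ["Tibouchina_1"])))
```

Grouping samples by clade and intersecting the recovered sets gives a count of loci sequenced in both clades; the appropriate length threshold depends on the downstream analysis.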
This dataset includes the template sequences recovered by the second tier of the pipeline for the loci identified by the first tier, as well as the probe sequences used for target enrichment. Custom scripts for analyzing these data are available on GitHub at https://github.com/jjantzen/Probe_design. Cleaned reads are deposited in the NCBI SRA (PRJNA592250, PRJNA573947, PRJNA576018).
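Because several template sequences can correspond to a single locus, it may help to group the templates by locus before use. The sketch below shows one way to do this with the standard library only. It assumes FASTA headers of the form `>Source-Locus`, the naming convention HybPiper uses for target files; the actual headers in this dataset may differ. The example records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical FASTA records; headers assumed to follow ">Source-Locus".
EXAMPLE_FASTA = """\
>Memecylon-locus1
ACGT
ACG
>Tibouchina-locus1
ACGTAC
>Tibouchina-locus2
AC
"""


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def templates_by_locus(lines):
    """Group templates by locus as {locus: [(source, length), ...]}.

    The locus is taken as the text after the last '-' in the header;
    a header without '-' is treated as a locus with an empty source.
    """
    groups = defaultdict(list)
    for name, seq in read_fasta(lines):
        source, _, locus = name.rpartition("-")
        groups[locus].append((source, len(seq)))
    return dict(groups)


if __name__ == "__main__":
    for locus, templates in templates_by_locus(EXAMPLE_FASTA.splitlines()).items():
        print(locus, templates)
```

For a real file, pass an open file handle in place of `EXAMPLE_FASTA.splitlines()`.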
National Science Foundation, Award: DEB-1343612
American Society of Plant Taxonomists
Botanical Society of America
Society for the Study of Evolution
Society of Systematic Biologists
University of Florida
Florida Museum of Natural History