A target capture approach for phylogenomic analyses at multiple evolutionary timescales in rosewoods (Dalbergia spp.) and the legume family (Fabaceae)
Crameri, Simon; Fior, Simone; Zoller, Stefan; Widmer, Alex (2022), A target capture approach for phylogenomic analyses at multiple evolutionary timescales in rosewoods (Dalbergia spp.) and the legume family (Fabaceae), Dryad, Dataset, https://doi.org/10.5061/dryad.73n5tb2z7
Understanding the genetic changes associated with the evolution of biological diversity is of fundamental interest to molecular ecologists. The assessment of genetic variation at hundreds or thousands of unlinked genetic loci forms a sound basis to address questions ranging from micro- to macro-evolutionary timescales, and is now possible thanks to advances in sequencing technology. Major difficulties are associated with i) the lack of genomic resources for many taxa, especially from tropical biodiversity hotspots, ii) scaling the numbers of individuals analyzed and loci sequenced, and iii) building tools for reproducible bioinformatic analyses of such datasets. To address these challenges, we developed a set of target capture probes for phylogenomic studies of the highly diverse, pantropically distributed and economically significant rosewoods (Dalbergia spp.), explored the performance of an overlapping probe set for target capture across the legume family (Fabaceae), and built a general-purpose bioinformatics pipeline. Phylogenomic analyses of Dalbergia species from Madagascar yielded highly resolved and well supported hypotheses of evolutionary relationships. Population genomic analyses identified differences between closely related species and revealed the existence of a potentially new species, suggesting that the diversity of Malagasy Dalbergia species has been underestimated. Analyses at the family level corroborated previous findings by the recovery of monophyletic subfamilies and many well-known clades, as well as high levels of gene tree discordance, especially near the root of the family. The new genomic and bioinformatics resources will hopefully advance systematics and ecological genetics research in legumes, and promote conservation of the highly diverse and endangered Dalbergia rosewoods.
We produced a transcriptome assembly of a cultivated individual of Dalbergia madagascariensis subsp. antongilensis Bosser & R. Rabev., based on 63 million paired-end sequencing reads generated on an Illumina® HiSeq™ 2000 platform. We performed de novo assembly of the transcriptome using Trinity release 2012-01-25 (Grabherr et al., 2011), resulting in 146,484 scaffolds, which were between 201 and 17,129 bp long, with a mean length of 815 bp (see Supplementary Methods). We then aligned the Dalbergia transcriptome pairwise with the reference genomes of five legume species available in public databases to generate a set of 12,049 probes from 6,555 conserved target regions (see Supplementary Methods). This probe set was used to synthesize hybridization probes as myBaits® Custom Target Capture Kits (Arbor Biosciences; https://arborbiosci.com).
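Summary statistics like the scaffold count, length range and mean length reported above can be recomputed from any assembly in FASTA format. The following is a minimal sketch, not part of the CaptureAl pipeline; the function names fasta_lengths and summarize and the toy sequences are hypothetical and serve illustration only.

```python
import io
from statistics import mean


def fasta_lengths(handle):
    """Yield the sequence length of every record in a FASTA stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length  # finish the last record


def summarize(lengths):
    """Return count, minimum, maximum and mean of the scaffold lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


# Toy input (hypothetical sequences standing in for an assembly file).
toy = io.StringIO(">s1\nACGT\nAC\n>s2\nACGTACGTAC\n>s3\nAC\n")
stats = summarize(fasta_lengths(toy))
print(stats)
```

For a real assembly, the in-memory toy input would be replaced by an opened FASTA file, for example `with open(path) as handle: summarize(fasta_lengths(handle))`, where path is whatever file holds the scaffolds.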
Motivated by the need for genomic resources to inform a reliable taxonomy and foster conservation practice for endangered rosewood species (Dalbergia spp., Fabaceae), we introduce a target capture approach for anchored phylogenomic analyses in Dalbergia (Dalbergia2396 set). We further explored the applicability of our approach for analyses across the entire legume family, which resulted in a second probe set (Fabaceae1005 set).
Target capture sequencing data are available from the European Nucleotide Archive under project PRJEB41848: https://www.ebi.ac.uk/ena/browser/view/PRJEB41848
The accompanying bioinformatics pipeline CaptureAl was used for data analysis. It is available and documented here: https://github.com/scrameri/CaptureAl