Taxon-specific or universal? Using target capture to study the evolutionary history of a rapid radiation
Data files
Nov 28, 2023 version files 7.60 MB
- Baits-80-40-Pass_Client_filtering_GC_Under_70.fas.clust-75-95.fasta
- README.md
- supporting_information_Table1_legend.csv
- supporting_information_Table1.csv
- supporting_information_Table2.csv
- supporting_information_Table3.csv
- supporting_information_Table4.csv
- supporting_information_Table5.csv
Abstract
Target capture has emerged as an important tool for phylogenetics and population genetics in non-model taxa. Whereas developing taxon-specific capture probes requires sustained effort, available universal kits may have lower power to reconstruct relationships at shallow phylogenetic scales and within rapidly radiating clades. We present here a newly developed target capture set for Bromeliaceae, a large and ecologically diverse plant family with highly variable diversification rates. The set targets 1,776 coding regions, including genes putatively involved in key innovations, with the aim of enabling tests of a wide range of evolutionary hypotheses. We compare the relative power of this taxon-specific set, Bromeliad1776, to the universal Angiosperms353 kit. The taxon-specific set results in higher enrichment success across the entire family; however, the overall performance of the two kits in reconstructing phylogenetic trees is comparable, highlighting the vast potential of universal kits for resolving evolutionary relationships. For more detailed phylogenetic or population genetic analyses, e.g. the exploration of gene tree concordance, nucleotide diversity or population structure, the taxon-specific capture set presents clear benefits. We discuss the potential lessons that this comparative study provides for future phylogenetic and population genetic investigations, in particular for the study of evolutionary radiations.
README: Taxon-specific or universal? Using target capture to study the evolutionary history of a rapid radiation
https://doi.org/10.5061/dryad.mpg4f4r11
Description of the data and file structure
- Baits-80-40-Pass_Client_filtering_GC_Under_70.fas.clust-75-95.fasta - FASTA sequences for bait probes. Baits are 80 bp long with 2x tiling (40 bp overlap) and were designed as described in the manuscript (doi: 10.1111/1755-0998.13523). Further information is available in the GitHub repository: https://github.com/giyany/Bromeliad1776/tree/main
- supporting_information_Table1.csv - Genes included in the Bromeliad1776 bait design, with identifiers as annotated in Ananas comosus genome v.3 (Ming et al., 2015). The table includes details about exon composition, copy number and putatively associated pathways. See legend in file supporting_information_Table1_legend.csv.
- supporting_information_Table1_legend.csv - Legend for Table S1.
- supporting_information_Table2.csv - Categories of pathways and traits used to choose genes of interest for the Bromeliad1776 bait set, including literature source and number of genes in each category.
- supporting_information_Table3.csv - List of accessions used in the study, including source and collection details. For samples of Tillandsia subgenus Tillandsia, locality codes are also indicated.
- supporting_information_Table4.csv - Number of reads, and number and percentage of reads mapping to target, for all samples enriched with the Angiosperms353 kit or the Bromeliad1776 kit.
- supporting_information_Table5.csv - Averaged levels of nucleotide diversity at synonymous (πS) and non-synonymous (πN) sites for five Tillandsia subgenus Tillandsia species.
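As a quick way to check what each supporting table contains before analysis, the sketch below reads a CSV and reports its column names and number of data rows. It assumes each table has a single header row, which the descriptions above do not state explicitly; `summarize_csv` is a hypothetical helper name, and the column names in the in-memory demo are invented placeholders rather than the real headers.

```python
import csv
import io
from typing import TextIO


def summarize_csv(handle: TextIO) -> tuple[list[str], int]:
    """Return (column names, number of non-empty data rows).

    Assumes the first row is a header row (an assumption, not confirmed
    by the README).
    """
    reader = csv.reader(handle)
    header = next(reader, [])
    n_rows = sum(1 for row in reader if row)
    return header, n_rows


# In-memory stand-in for a table; these column names are placeholders.
demo = io.StringIO("sample,total_reads\nA,100\nB,250\n")
columns, n_rows = summarize_csv(demo)
print(columns, n_rows)
```

For a real table, the handle could be opened with `open("supporting_information_Table4.csv", newline="")`; the file encoding may need to be set explicitly.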
Methods
The bait set was designed using whole-genome sequences and gene models from Ananas comosus v.3 (Ming et al., 2015). Random protein-coding genes were selected based on genetic diversity parameters, total exonic size, individual exon size and copy-number variation. We then added genes associated with key innovative traits in Bromeliaceae. These were either genes previously annotated in A. comosus or, when annotated in other species, the A. comosus genes with the highest BLAST match scores. Genes underpinning innovative traits were included in the bait design regardless of the criteria applied to the random protein-coding genes, such as size and duplication rate. We also included markers previously used for phylogenomic inference in Bromeliaceae and genes orthologous to those in the Angiosperms353 bait set. An additional round of filtering was performed by the manufacturer of the final bait set, Arbor Biosciences (Ann Arbor, MI, USA): multi-copy genes with sequences more than 95% identical were collapsed into a single sequence, and baits with more than 70% GC content or containing at least 25% repeated sequences were excluded. In addition, targets including exons smaller than 80 bp were completed with regions flanking the exons according to the A. comosus reference genome. The final kit included 1776 genes and is subsequently referred to as the Bromeliad1776 bait set.
The 1776 selected genes, as annotated for Ananas comosus v.3, are detailed in Supporting Information Tables S1 and S2.
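To make the tiling and GC criteria concrete, here is a minimal sketch of cutting a target into 80 bp baits with a 40 bp step and dropping baits above 70% GC. It illustrates the stated thresholds only and is not the manufacturer's pipeline: the function names are hypothetical, and the repeat-content filter, the 95% identity clustering and the padding of short exons are not reproduced.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_baits(target: str, bait_len: int = 80, step: int = 40) -> list[str]:
    """Cut target into bait_len windows starting every step bases.

    With bait_len=80 and step=40, adjacent baits overlap by 40 bp.
    Targets shorter than bait_len yield no baits in this sketch.
    """
    if len(target) < bait_len:
        return []
    last_start = len(target) - bait_len
    return [target[i:i + bait_len] for i in range(0, last_start + 1, step)]


def filter_by_gc(baits: list[str], max_gc: float = 0.70) -> list[str]:
    """Keep baits whose GC fraction does not exceed max_gc."""
    return [b for b in baits if gc_fraction(b) <= max_gc]


# Toy 200 bp target: windows start at positions 0, 40, 80 and 120.
toy_target = "AT" * 100
toy_baits = tile_baits(toy_target)
kept = filter_by_gc(toy_baits + ["G" * 80])
print(len(toy_baits), len(kept))
```

The all-G bait appended in the demo has a GC fraction of 1.0 and is removed, while the AT-only baits pass.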
Usage notes
The FASTA file lists the bait sequences of the Bromeliad1776 target capture kit: all 57,445 baits, each 80 bp long.
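One way to check a downloaded copy against this description is to count the records and tabulate their lengths. The sketch below uses only the Python standard library; `read_fasta` and `bait_length_counts` are hypothetical helper names, and the demo runs on a small in-memory FASTA rather than the real file.

```python
import io
from collections import Counter
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bait_length_counts(handle: TextIO) -> Counter:
    """Map each sequence length to the number of records with that length."""
    return Counter(len(seq) for _, seq in read_fasta(handle))


# In-memory stand-in for the bait file (toy sequences, not real baits).
demo = io.StringIO(">bait1\nACGT\nAC\n>bait2\nGGTTAA\n>bait3\nAC\n")
counts = bait_length_counts(demo)
print(sum(counts.values()), dict(counts))
```

Applied to the bait file opened with `open(...)`, the total record count would be expected to equal 57,445 with 80 as the only length, if the file matches the description above.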