Data from: Identifying conserved genomic elements and designing universal bait sets to enrich them

Cite this dataset

Faircloth, Brant C. (2017). Data from: Identifying conserved genomic elements and designing universal bait sets to enrich them [Dataset]. Dryad. https://doi.org/10.5061/dryad.v0k4h

Abstract

Targeted enrichment of conserved genomic regions is a popular method for collecting large amounts of sequence data from non-model taxa for phylogenetic, phylogeographic and population genetic studies. For example, two available bait sets each allow enrichment of thousands of orthologous loci from >20 000 species (Faircloth et al. Systematic Biology, 61, 717–726, 2012; Molecular Ecology Resources, 15, 489–501, 2015). Unfortunately, few open-source workflows are available to identify conserved genomic elements shared among divergent taxa and to design enrichment baits targeting these regions. Those that do exist require extensive bioinformatics expertise and significant amounts of time to use. These shortcomings limit the application of targeted enrichment methods to additional organismal groups. Here, I describe a universal workflow for identifying conserved genomic regions in available genomic data and for designing targeted enrichment baits to collect data from these conserved regions. These methods require less expertise and less time, and they make better use of commonly available information to identify conserved loci and design baits to capture them. I apply this computational approach to the understudied arthropod groups Arachnida, Coleoptera, Diptera, Hemiptera and Lepidoptera to identify thousands of conserved loci in each group and design target enrichment baits to capture these loci. I then use in silico analyses to demonstrate that targeted enrichment of the conserved loci can be used to reconstruct the accepted relationships among genome sequences from the focal arthropod orders. The software workflow I created allowed me to identify thousands of conserved loci in five diverse arthropod groups and design sequence capture baits to target them. This suite of capture bait designs should enable collection of phylogenomic data from >900 000 arthropod species.
Although the examples in this manuscript focus on understudied arthropod groups, the approach I describe is applicable to all organismal groups having some form of pre-existing genomic information (e.g. other invertebrates, plants, fungi and microbes). Finally, the documentation, design steps, software code and bait sets developed here are available under an open-source license for restriction-free testing, use, and additional modification by any research group.

Usage notes