Although massively parallel sequencing has facilitated large-scale DNA sequencing, comparisons among distantly related species rely upon small portions of the genome that are easily aligned. Methods are needed to efficiently obtain comparable DNA fragments prior to massively parallel sequencing, particularly for biologists working with non-model organisms. We introduce a new class of molecular marker, anchored by ultraconserved genomic elements (UCEs), that universally enables target enrichment and sequencing of thousands of orthologous loci across species separated by hundreds of millions of years of evolution. Our analyses here focus on the use of UCE markers in Amniota, because UCEs and phylogenetic relationships are well known in some amniotes. We perform an in silico experiment to demonstrate that sequence flanking 2,030 UCEs contains information sufficient to enable unambiguous recovery of the established primate phylogeny. We extend this experiment by performing an in vitro enrichment of 2,386 UCE-anchored loci from nine non-model avian species. We then use alignments of 854 of these loci to unambiguously recover the established evolutionary relationships within and among three ancient bird lineages. Because many organismal lineages have UCEs, this type of genetic marker and the analytical framework we outline can be applied across the tree of life, potentially reshaping our understanding of phylogeny at many taxonomic levels.
primate-probes-matches.sqlite
SQLITE database of matches of the probes in all-probes.fasta to various primate genome sequences. The 'matches' table shows the match status (1 = TRUE) of each probe to each genome, and the 'match-map' table shows the primate contig matched and the orientation of the match.
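A minimal Python sketch of pulling out the loci that match every primate genome from a 'matches' table of 0/1 flags. The column names here (a probe-identifier column called uce and one column per genome such as hg19) are hypothetical; check the real schema with PRAGMA table_info(matches) before adapting it. The demo builds a toy in-memory table instead of opening primate-probes-matches.sqlite.

```python
import sqlite3


def loci_in_all(conn, id_col, taxa, table="matches"):
    """Return probe IDs flagged 1 (TRUE) in every listed genome column."""
    # Double-quote identifiers: genome/table names may contain hyphens (e.g. "match-map").
    where = " AND ".join(f'"{t}" = 1' for t in taxa)
    sql = f'SELECT "{id_col}" FROM "{table}" WHERE {where} ORDER BY "{id_col}"'
    return [row[0] for row in conn.execute(sql)]


# Toy stand-in for primate-probes-matches.sqlite; the column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (uce TEXT, hg19 INTEGER, panTro2 INTEGER, rheMac2 INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [("uce-1", 1, 1, 1), ("uce-2", 1, 0, 1), ("uce-3", 1, 1, 1)],
)
shared = loci_in_all(conn, "uce", ["hg19", "panTro2", "rheMac2"])
print(shared)  # ['uce-1', 'uce-3']
```

Against the real file, replace the toy setup with sqlite3.connect("primate-probes-matches.sqlite"); any query on the 'match-map' table likewise needs the name quoted because of the hyphen.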
all-probes
FASTA-formatted text file of UCE-anchored probe sequences, designed from UCEs identified across reptiles (birds and lizard). We (1) aligned these sequences to extant primate genomic sequences and (2) used synthetic oligos identical to a subset of these sequences to enrich target DNA in birds. The FASTA header gives each probe's location relative to the chromosomes of galGal3 (UCSC).
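Several files in this package are FASTA-formatted, so a small reader is a useful starting point. This sketch yields (header, sequence) pairs and keeps the full header text untouched, because the exact layout of the galGal3 location inside the all-probes headers is not spelled out here; the example header is a made-up placeholder.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the '>' and start a new record
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Toy input; the header text is a placeholder, not the real all-probes header format.
example = io.StringIO(">probe-a some location text\nACGT\nTTGA\n>probe-b\nGGCC\n")
records = list(read_fasta(example))
print(records)  # [('probe-a some location text', 'ACGTTTGA'), ('probe-b', 'GGCC')]
```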
probe-matches-to-taxa.sqlite
SQLITE database of all-probes.fasta matches to genome-enabled vertebrate taxa (Supplementary Table 1). A column value of 1 = TRUE (i.e., the probe matched that taxon).
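Because a value of 1 means TRUE, summing the flags in each taxon column gives the number of probes that matched that taxon. This is a sketch only: the table name and the taxon column names used below are hypothetical and are passed in as parameters so they can be swapped for the real ones in probe-matches-to-taxa.sqlite.

```python
import sqlite3


def match_counts(conn, table, taxa):
    """Count rows whose 0/1 match column equals 1 (TRUE) for each taxon."""
    counts = {}
    for t in taxa:
        sql = f'SELECT COUNT(*) FROM "{table}" WHERE "{t}" = 1'
        counts[t] = conn.execute(sql).fetchone()[0]
    return counts


# Toy table; the real table and column names in probe-matches-to-taxa.sqlite may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (uce TEXT, galGal3 INTEGER, anoCar2 INTEGER, danRer7 INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [("uce-1", 1, 1, 0), ("uce-2", 1, 0, 0), ("uce-3", 1, 1, 1)],
)
counts = match_counts(conn, "matches", ["galGal3", "anoCar2", "danRer7"])
print(counts)  # {'galGal3': 3, 'anoCar2': 2, 'danRer7': 1}
```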
primate-uce-anchored-loci.fasta.tar.bz2
BZIP2 archive of FASTA sequences sliced from primate genomes; each sequence includes the match site of a probe in all-probes.fasta ± flanking sequence. The archive contains a FASTA file for each primate in Supplementary Table 2.
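Slicing a match site plus flanking sequence can be sketched as below. The flank length is not given in this listing, so it is a parameter; coordinates are assumed to be 0-based and half-open, and the strand argument stands in for the match orientation recorded in the 'match-map' table. This illustrates the idea and is not the authors' extraction code.

```python
# Complement table for unambiguous bases plus N (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def slice_with_flank(contig, start, end, flank, strand="+"):
    """Return contig[start - flank : end + flank], clipped to the contig ends.

    Assumes 0-based, half-open match coordinates; minus-strand matches are
    reverse-complemented so the slice reads in the probe's orientation.
    """
    lo = max(0, start - flank)
    hi = min(len(contig), end + flank)
    piece = contig[lo:hi]
    if strand == "-":
        piece = piece.translate(COMPLEMENT)[::-1]
    return piece


contig = "AAAACCCCGGGGTTTT"
print(slice_with_flank(contig, 6, 10, 2))       # CCCCGGGG
print(slice_with_flank(contig, 2, 5, 3, "-"))   # GGGGTTTT (left flank clipped at 0)
print(slice_with_flank(contig, 14, 16, 4))      # GGTTTT (right flank clipped at the end)
```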
primate-uce-anchored-alignments.nexus.tar.bz2
NEXUS-formatted files providing alignments of the UCE-anchored genomic regions identified in primate genomes where probes from all-probes.fasta match the respective primate genome. We used these alignments to reconstruct the primate phylogeny.
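For a quick look at an alignment (taxon count, alignment length) without a phylogenetics library, a minimal MATRIX-block reader is sketched below. It assumes simple sequential or basic interleaved NEXUS and does not handle bracketed comments or quoted names containing spaces; a full NEXUS parser is the safer choice for real analyses. The taxon names in the toy file are placeholders.

```python
import io


def read_simple_nexus(handle):
    """Return {taxon: sequence} from the MATRIX block of a simple NEXUS file.

    Handles sequential or basic interleaved matrices; does not handle NEXUS
    comments ([...]), quoted names containing spaces, or multiple data blocks.
    """
    seqs, in_matrix = {}, False
    for raw in handle:
        line = raw.strip()
        if not in_matrix:
            in_matrix = line.lower() == "matrix"  # start collecting after MATRIX
            continue
        if line.startswith(";"):
            break  # end of the matrix block
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # blank line between interleaved blocks
        name, seq = parts
        done = seq.endswith(";")
        key = name.strip("'")
        # Append so that interleaved blocks concatenate per taxon.
        seqs[key] = seqs.get(key, "") + seq.rstrip(";").replace(" ", "")
        if done:
            break
    return seqs


nexus_text = """#NEXUS
begin data;
  dimensions ntax=3 nchar=8;
  format datatype=dna missing=? gap=-;
  matrix
  homo_sapiens    ACGTACGT
  pan_troglodytes ACGTACGA
  macaca_mulatta  ACG-ACGA
  ;
end;
"""
aln = read_simple_nexus(io.StringIO(nexus_text))
print(len(aln), {len(s) for s in aln.values()})  # 3 {8}
```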
birds-contigs-assembled-from-captures.tar.bz2
BZIP2 archive of FASTA files providing bird contigs assembled from reads following target enrichment with a subset of probes in all-probes.fasta. Archive contains one file per species.
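Since the archive holds one FASTA file per species, the standard-library tarfile module can count contigs per species without unpacking to disk. The sketch below counts '>' header lines in each member; the demo builds a small in-memory archive with placeholder file names rather than reading the real one.

```python
import io
import tarfile


def count_contigs_per_file(tar):
    """Count FASTA records ('>' lines) in each regular file of an open tar archive."""
    counts = {}
    for member in tar.getmembers():
        if not member.isfile():
            continue
        text = io.TextIOWrapper(tar.extractfile(member), encoding="utf-8")
        counts[member.name] = sum(1 for line in text if line.startswith(">"))
    return counts


def _toy_archive():
    """Build a small bzip2-compressed tar in memory (placeholder species files)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, text in [("species-a.fasta", ">c1\nACGT\n>c2\nGGCC\n"),
                           ("species-b.fasta", ">c1\nTTAA\n")]:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=_toy_archive(), mode="r:bz2") as tar:
    counts = count_contigs_per_file(tar)
print(counts)  # {'species-a.fasta': 2, 'species-b.fasta': 1}
```

For the real data, open the archive with tarfile.open("birds-contigs-assembled-from-captures.tar.bz2", "r:bz2") and pass the result to the same function.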
birds-probe-matches.sqlite
SQLITE database of contigs (in birds-contigs-assembled-from-captures.tar.bz2) that match probes within all-probes.fasta without duplication.
birds-uce-anchored-loci.fasta.bz2
BZIP2 archive of FASTA files corresponding to those contigs matching target enrichment probes (in birds-probe-matches.sqlite) that are not duplicated. We aligned these sequences on a locus-by-locus basis to generate the alignments in birds-uce-anchored-alignments.nexus.tar.bz2.
birds-uce-anchored-alignments.nexus.tar.bz2
BZIP2 archive of NEXUS-formatted alignments generated, on a locus-by-locus basis, from the FASTA sequences in birds-uce-anchored-loci.fasta.bz2.
dbSNP132-to-hg19-uce-200
CSV-formatted file giving the locations of SNPs in dbSNP132 that overlap UCEs. The 'snp-name' column gives the rs accession for each SNP record.
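A sketch of tallying distinct SNPs per UCE with csv.DictReader. Only the 'snp-name' column is documented here; the UCE identifier column is a hypothetical name passed as a parameter, and the toy rows (including the rs numbers) are invented for illustration.

```python
import csv
import io
from collections import Counter


def snps_per_uce(handle, uce_col):
    """Count distinct rs accessions ('snp-name') per UCE, ignoring repeated rows."""
    seen = set()
    counts = Counter()
    for row in csv.DictReader(handle):
        key = (row[uce_col], row["snp-name"])
        if key not in seen:
            seen.add(key)
            counts[row[uce_col]] += 1
    return counts


# Toy CSV: only 'snp-name' is documented; the 'uce' column is a hypothetical stand-in.
toy = io.StringIO("uce,snp-name\nuce-1,rs1\nuce-1,rs2\nuce-1,rs2\nuce-2,rs3\n")
counts = snps_per_uce(toy, "uce")
print(dict(counts))  # {'uce-1': 2, 'uce-2': 1}
```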
probe-matches-to-taxa.lastz.tar.bz2
TAB-delimited LASTZ output from matches of probes in all-probes.fasta to the vertebrate genomes in Supplementary Table 1. We parsed this file to remove duplicates. We archived results for non-duplicated matches in probe-matches-to-taxa.sqlite.
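The listing does not define "duplicates" precisely. One plausible reading, sketched below as an assumption rather than the authors' exact filter, keeps a probe only if it hits exactly one target sequence and no other probe hits that target. Where several probes tile one locus, pass locus identifiers rather than individual probe IDs, or those probes would flag each other as duplicates. Column indices are parameters because the LASTZ field order depends on the output format requested.

```python
import csv
import io
from collections import defaultdict


def read_pairs(handle, probe_col, target_col):
    """Yield (probe, target) name pairs from tab-delimited LASTZ-style output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and header/comment lines
            continue
        yield row[probe_col], row[target_col]


def non_duplicate_matches(pairs):
    """Keep probes hitting exactly one target, when no other probe hits that target."""
    targets_by_probe = defaultdict(set)
    probes_by_target = defaultdict(set)
    for probe, target in pairs:
        targets_by_probe[probe].add(target)
        probes_by_target[target].add(probe)
    kept = {}
    for probe, targets in targets_by_probe.items():
        if len(targets) == 1:
            (target,) = targets
            if len(probes_by_target[target]) == 1:
                kept[probe] = target
    return kept


# Toy two-column input; real LASTZ output has more columns, so pick the indices to match.
toy = io.StringIO(
    "#probe\ttarget\n"
    "uce-1\tchr1\nuce-2\tchr2\nuce-2\tchr5\nuce-3\tchr3\nuce-4\tchr3\nuce-5\tchr9\n"
)
kept = non_duplicate_matches(read_pairs(toy, 0, 1))
print(kept)  # {'uce-1': 'chr1', 'uce-5': 'chr9'}
```

Here uce-2 is dropped because it hits two targets, and uce-3 and uce-4 are dropped because they hit the same target.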
primate-probes-matches.lastz.tar.bz2
TAB-delimited LASTZ output from matches of probes in all-probes.fasta to the primate genomes in Supplementary Table 2. We parsed this file to remove duplicates. We archived results for non-duplicated matches in primate-probes-matches.sqlite.
birds-probes-matches.lastz.tar.bz2
TAB-delimited LASTZ output from matches of probes in all-probes.fasta to the assembled bird contigs (birds-contigs-assembled-from-captures.tar.bz2; Supplementary Table 3). We parsed this file to remove duplicates. We archived results for non-duplicated matches in birds-probe-matches.sqlite.
probe-subset-2560-synthesized
FASTA file of the subset of 2,560 probes from all-probes that we synthesized for in vitro targeted capture of 2,386 loci in birds. FASTA header gives the probe id, the probe position in the chicken (galGal3) genome, and the count of probes targeting that locus.
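Grouping probe IDs by locus is a quick consistency check on this file: per the description, the 2,560 synthesized probes should collapse to 2,386 loci. The ID pattern below (a locus name followed by a suffix such as "_p2") is a hypothetical convention, not the documented header format, so the regular expression will likely need adjusting.

```python
import re
from collections import Counter

# Hypothetical ID pattern: a locus name followed by a per-probe suffix, e.g. "uce-12_p2".
PROBE_ID = re.compile(r"^(?P<locus>.+?)_p\d+$")


def probes_per_locus(probe_ids):
    """Count probes per locus, deriving the locus from each probe ID."""
    counts = Counter()
    for pid in probe_ids:
        m = PROBE_ID.match(pid)
        if m is None:
            raise ValueError(f"unrecognized probe id: {pid}")
        counts[m.group("locus")] += 1
    return counts


ids = ["uce-1_p1", "uce-1_p2", "uce-2_p1", "uce-3_p1"]
counts = probes_per_locus(ids)
print(len(ids), len(counts), counts["uce-1"])  # 4 3 2
```

If the pattern fits the real IDs, sum(counts.values()) and len(counts) can be compared with 2,560 and 2,386, and each locus count with the per-locus probe count given in the header.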
all-sample-sim1000.txt
Simulation results, used in a Supplementary Table, showing how adding taxa reduces the number of loci in a complete data matrix.
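The simulation code itself is not part of this listing; the sketch below only illustrates the idea behind the result. A locus enters a complete data matrix only if it was recovered in every sampled taxon, so the count can only stay the same or shrink as taxa are added. The presence/absence data, taxon labels, and random-sampling scheme are all toy assumptions.

```python
import random


def complete_matrix_size(presence, taxa):
    """Count loci present in every sampled taxon, i.e. the loci in a complete data matrix."""
    wanted = set(taxa)
    return sum(1 for taxa_with_locus in presence.values() if wanted <= taxa_with_locus)


def simulate(presence, all_taxa, n_taxa, reps, seed=0):
    """Mean complete-matrix size over `reps` random draws of `n_taxa` taxa."""
    rng = random.Random(seed)
    sizes = [complete_matrix_size(presence, rng.sample(all_taxa, n_taxa)) for _ in range(reps)]
    return sum(sizes) / reps


# Toy presence/absence data: locus -> set of taxa in which it was recovered.
presence = {
    "uce-1": {"A", "B", "C", "D"},
    "uce-2": {"A", "B", "C"},
    "uce-3": {"A", "B"},
    "uce-4": {"A", "B", "C", "D"},
}
taxa = ["A", "B", "C", "D"]
sizes = [complete_matrix_size(presence, taxa[:k]) for k in range(1, 5)]
print(sizes)  # [4, 4, 3, 2]
print(simulate(presence, taxa, 3, reps=1000))
```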
all-sample-sim1000.txt
Data used to create Supplementary Figure 7.
birds-contig-lengths-by-probe-filtered-birds.csv
Data used in Supplementary Figure 1.
birds-gc-length-species-matches.csv
Data used in Supplementary Figures 2-5.
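This file pairs GC content with contig length. The exact GC definition used is not stated here; the sketch below assumes GC is the fraction of G and C among unambiguous A/C/G/T bases, ignoring Ns and gap characters, which is one common convention.

```python
def gc_content(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (ambiguous bases and gaps ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # no unambiguous bases to measure
    return (seq.count("G") + seq.count("C")) / acgt


print(gc_content("GGCCAT"), gc_content("acgtn-"))  # 0.6666666666666666 0.5
```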
supplementary-tables-and-figures
Supplementary Tables 1-4 and Supplementary Figures 1-8.
computer-code-README
Location, commit (i.e., snapshot), and URL information for the computer code used in this manuscript. This file is deposited in Dryad in lieu of releasing the code itself under CC0.