Data from: An evaluation of transcriptome-based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura)

Portik DM, Smith LL, Bi K

Date Published: May 27, 2016

DOI: http://dx.doi.org/10.5061/dryad.pr3pr

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0).

Title Afrixalus paradorsalis annotated transcriptome
Downloaded 6 times
Description Whole RNA from a portion of liver sample preserved in RNA Later was extracted using the RNeasy Protect Mini Kit (Qiagen). Sequencing libraries were prepared using half reactions of the TruSeq RNA Library Preparation Kit V2 (Illumina), beginning with Poly-A selection for samples with high RIN scores (> 7.0) and Ribo-Zero Magnetic Gold (Epicentre) ribosomal RNA removal for samples with low RIN scores (< 7.0). Libraries were sequenced on an Illumina HiSeq2500 with 100 bp paired-end reads. Transcriptomic data were cleaned following Singhal (2013). Cleaned data were assembled using TRINITY (Grabherr et al. 2011) and annotated with Xenopus tropicalis (Ensembl) as a reference genome using reciprocal BLASTX (Altschul et al. 1997) and EXONERATE (Slater & Birney 2005).
Download Afr_paradorsalis.fasta (13.68 Mb)
Title Hyperolius balfouri annotated transcriptome
Downloaded 5 times
Description Whole RNA from a portion of liver sample preserved in RNA Later was extracted using the RNeasy Protect Mini Kit (Qiagen). Sequencing libraries were prepared using half reactions of the TruSeq RNA Library Preparation Kit V2 (Illumina), beginning with Poly-A selection for samples with high RIN scores (> 7.0) and Ribo-Zero Magnetic Gold (Epicentre) ribosomal RNA removal for samples with low RIN scores (< 7.0). Libraries were sequenced on an Illumina HiSeq2500 with 100 bp paired-end reads. Transcriptomic data were cleaned following Singhal (2013). Cleaned data were assembled using TRINITY (Grabherr et al. 2011) and annotated with Xenopus tropicalis (Ensembl) as a reference genome using reciprocal BLASTX (Altschul et al. 1997) and EXONERATE (Slater & Birney 2005).
Download Hyp_balfouri.fasta (11.05 Mb)
Title Hyperolius riggenbachi annotated transcriptome
Downloaded 4 times
Description Whole RNA from a portion of liver sample preserved in RNA Later was extracted using the RNeasy Protect Mini Kit (Qiagen). Sequencing libraries were prepared using half reactions of the TruSeq RNA Library Preparation Kit V2 (Illumina), beginning with Poly-A selection for samples with high RIN scores (> 7.0) and Ribo-Zero Magnetic Gold (Epicentre) ribosomal RNA removal for samples with low RIN scores (< 7.0). Libraries were sequenced on an Illumina HiSeq2500 with 100 bp paired-end reads. Transcriptomic data were cleaned following Singhal (2013). Cleaned data were assembled using TRINITY (Grabherr et al. 2011) and annotated with Xenopus tropicalis (Ensembl) as a reference genome using reciprocal BLASTX (Altschul et al. 1997) and EXONERATE (Slater & Birney 2005).
Download Hyp_riggenbachi.fasta (10.97 Mb)
Title Kassina decorata annotated transcriptome
Downloaded 4 times
Description Whole RNA from a portion of liver sample preserved in RNA Later was extracted using the RNeasy Protect Mini Kit (Qiagen). Sequencing libraries were prepared using half reactions of the TruSeq RNA Library Preparation Kit V2 (Illumina), beginning with Poly-A selection for samples with high RIN scores (> 7.0) and Ribo-Zero Magnetic Gold (Epicentre) ribosomal RNA removal for samples with low RIN scores (< 7.0). Libraries were sequenced on an Illumina HiSeq2500 with 100 bp paired-end reads. Transcriptomic data were cleaned following Singhal (2013). Cleaned data were assembled using TRINITY (Grabherr et al. 2011) and annotated with Xenopus tropicalis (Ensembl) as a reference genome using reciprocal BLASTX (Altschul et al. 1997) and EXONERATE (Slater & Birney 2005).
Download Kass_decorata.fasta (10.72 Mb)
Title Hyperoliid Orthologous Transcript Set
Downloaded 5 times
Description Marker set consisting of 1,265 orthologous transcripts (trimmed to 500-850 bp) from four species of hyperoliid frogs (5,060 total sequences). We compared annotated transcripts from the four species to search for orthologs via BLAST (Altschul et al. 1990). We removed mitochondrial loci from the transcripts. We kept only transcripts with a GC content between 40% and 70%, because extreme GC content reduces capture efficiency for the targets (Bi et al. 2012). Orthologous transcripts with a minimum length of 500 base pairs (bp) were identified across all four samples, yielding 2,444 shared transcripts. Transcripts exceeding 850 bp were arbitrarily trimmed to this length for probe design, reflecting a trade-off between locus length and the total number of loci included in the experiment. The orthologous transcripts were subjected to additional filtering steps before a final gene set was chosen. The initial filtering step applied lower and upper limits on average transcript divergence, eliminating loci with low variation (< 5.0% average divergence) and exceptionally high variation (> 15.0% average divergence), which removed 266 genes. The remaining 2,178 genes were examined for repetitive elements, short repeats, and low complexity regions, which are problematic for probe design and capture. The four transcripts per gene (8,712 sequences in total) were screened using the REPEATMASKER Web Server (Smit et al. 2015). This step masked repetitive elements or low complexity regions in 929 sequences, with 7,783 sequences passing the filters. To be conservative, if any of the four transcripts for a gene contained masked sites, that gene was removed from the final marker set, which removed an additional 468 markers.
From this reduced set of 1,710 markers, 400 markers with the highest divergence were selected (average divergence ranging from 10.4% to 14.9%) followed by 860 randomly drawn markers from the remaining subset. This marker set was supplemented with five positive controls, which consisted of nuclear sequence data generated using Sanger sequencing for five loci: POMC (624 bp), RAG-1 (777 bp), TYR (573 bp), FICD (524 bp), and KIAA2013 (540 bp). The final marker set selected for probe design included 1,265 genes from four species and 5,060 individual sequences.
Download Hyperoliid_Probe_Set.fasta (4.298 Mb)
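
The length, GC, and divergence filters described for this marker set can be sketched in Python as shown below. This is a minimal illustration, not the authors' pipeline. The thresholds come from the description; everything else is an assumption: the function names are hypothetical, "average divergence" is taken to mean the mean uncorrected pairwise distance over aligned sequences, the GC and length checks are applied to every sequence of a locus, and trimming keeps the first 850 bp (the description does not say which end was trimmed).

```python
from itertools import combinations

# Thresholds stated in the description.
MIN_LEN, MAX_LEN = 500, 850      # bp
GC_MIN, GC_MAX = 0.40, 0.70      # fraction of G+C
DIV_MIN, DIV_MAX = 0.05, 0.15    # average divergence


def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def p_distance(a, b):
    """Uncorrected distance between two aligned sequences.

    Assumption: sites with gaps or ambiguity codes are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_divergence(aligned):
    """Mean pairwise p-distance; needs at least two sequences."""
    dists = [p_distance(a, b) for a, b in combinations(aligned, 2)]
    if not dists:
        raise ValueError("need at least two aligned sequences")
    return sum(dists) / len(dists)


def keep_locus(aligned):
    """Apply the length, GC, and divergence filters to one locus."""
    ungapped = [s.replace("-", "") for s in aligned]
    if any(len(s) < MIN_LEN for s in ungapped):
        return False
    if any(not GC_MIN <= gc_fraction(s) <= GC_MAX for s in ungapped):
        return False
    return DIV_MIN <= mean_divergence(aligned) <= DIV_MAX


def trim(seq):
    """Trim to MAX_LEN (assumption: the leading 850 bp are kept)."""
    return seq[:MAX_LEN]
```

The counts reported in the description are consistent with these steps: 2,444 - 266 = 2,178 genes after the divergence filter, 2,178 - 468 = 1,710 after repeat screening, and 400 + 860 + 5 = 1,265 genes in the final set.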
Title Hyperoliid MYbaits-3 custom probe set
Downloaded 14 times
Description Design of the MYbaits-3 custom bait library (MYcroarray). This file contains 60,179 120-mer baits, allowing for 2x tiling (a new bait every 60 bp) of the 5,060 sequences. The kit allows 60,060 probes, so 119 probes were randomly dropped for the final kit design.
Download bait-120-60.fas (10.64 Mb)
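
The 2x tiling scheme described for this bait set can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the MYcroarray design procedure: the function names are hypothetical, windows that would run past the end of a sequence are simply discarded (how sequence ends were handled is not stated), and the random drop uses a fixed seed only to make the example repeatable.

```python
import random

BAIT_LEN = 120   # bait length in bp (from the description)
STEP = 60        # a new bait starts every 60 bp, giving 2x tiling


def tile_baits(seq, bait_len=BAIT_LEN, step=STEP):
    """Return full-length baits starting every `step` bp along `seq`.

    Assumption: a trailing window shorter than `bait_len` is dropped.
    """
    return [seq[i:i + bait_len]
            for i in range(0, len(seq) - bait_len + 1, step)]


def drop_to_capacity(baits, capacity, seed=0):
    """Randomly discard baits until at most `capacity` remain.

    The original order of the retained baits is preserved.
    """
    if len(baits) <= capacity:
        return list(baits)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(baits)), capacity))
    return [baits[i] for i in keep]
```

With the numbers given in the description, reducing 60,179 baits to the kit capacity of 60,060 removes 60,179 - 60,060 = 119 baits.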
Title Captured Exon Alignments
Downloaded 11 times
Description A fasta file of combined aligned sets of captured transcripts (exons only) for the four hyperoliid species used for transcriptome sequencing and probe design. Only markers with all four species present were kept, resulting in 999 transcripts. Names of transcripts correspond to those used in the annotated transcriptomes, orthologous transcript set, and probe set for cross-referencing. The concatenated alignment length of the exon regions is 592,651 base pairs.
Download Combined_Target_Output.fasta (2.603 Mb)
Title Captured Flanking Region Alignments
Downloaded 3 times
Description A fasta file of combined aligned sets of captured flanking regions for the four hyperoliid species used for transcriptome sequencing and probe design. Only markers with all four species present were kept, resulting in 1,071 flanking markers. Names of markers associated with flanking regions correspond to those used in the annotated transcriptomes, orthologous transcript set, and probe set for cross-referencing. The concatenated alignment length of flanking regions is 797,016 base pairs.
Download Combined_Flanking_Output.fasta (3.437 Mb)

When using this data, please cite the original publication:

Portik DM, Smith LL, Bi K (2016) An evaluation of transcriptome-based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura). Molecular Ecology Resources 16(5): 1069–1083. http://dx.doi.org/10.1111/1755-0998.12541

Additionally, please cite the Dryad data package:

Portik DM, Smith LL, Bi K (2016) Data from: An evaluation of transcriptome-based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura). Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.pr3pr