We developed genetic resources for two North American frogs, Lithobates clamitans and Pseudacris regilla, widespread native amphibians that are potential indicator species of environmental health. For both species, mRNA from multiple tissues was sequenced using 454 technology. De novo assemblies with Mira3 resulted in 50,238 contigs (N50 = 687 bp) and 48,213 contigs (N50 = 686 bp) for L. clamitans and P. regilla, respectively, after clustering with CD-Hit-EST and purging contigs below 200 bp. We performed BLASTX similarity searches against the Xenopus tropicalis proteome and, for predicted ORFs, HMMER similarity searches against the Pfam-A database. Because there is broad interest in amphibian immune factors, we manually annotated putative antimicrobial peptides. To identify conserved regions suitable for amplicon re-sequencing across a broad taxonomic range, we performed an additional assembly of public short-read transcriptome data derived from two species of the genus Rana and identified reciprocal best TBLASTX matches among all assemblies. Although P. regilla, a hylid frog, is substantially more diverged from the ranid species, we identified 56 genes that were sufficiently conserved to allow non-degenerate primer design with Primer3. In addition to providing a foundation for comparative genomics and quantitative gene expression analysis, our results enable quick development of nuclear sequence-based markers for phylogenetics or population genetics.
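As an illustration of the assembly statistics quoted above, the following minimal sketch shows how an N50 value can be computed after discarding contigs shorter than 200 bp. The function names `filter_short` and `n50` are hypothetical and are not part of the Mira3 or CD-Hit-EST pipelines; the input is assumed to be a plain list of contig lengths.

```python
def filter_short(lengths, min_len=200):
    """Keep only contigs of at least min_len bp (200 bp cutoff as in the text)."""
    return [n for n in lengths if n >= min_len]


def n50(lengths):
    """Return the N50 of a list of contig lengths.

    The N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases. Returns 0 for an empty list.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Toy contig lengths, not data from the assemblies described here.
    toy_lengths = [100, 150, 200, 300, 400, 500]
    kept = filter_short(toy_lengths)
    print(len(kept), n50(kept))
```

Because the N50 is computed after the length filter, removing short contigs raises the reported value relative to the unfiltered assembly.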
Annotation spreadsheet for transcriptome assembly of Lithobates clamitans
An Excel spreadsheet of contig sequence similarity for the Lithobates clamitans assembly. BLASTX hits to Xenopus tropicalis, HMMER matches to the Pfam-A database (for predicted ORFs), and other annotation information are given. The column headings are as follows:
Contig: contig name
Length (bp): length in base pairs
%GC content: percent of contig sequence that is C or G
Longest ORF (min 50 amino acids): the longest ORF identified on the contig that was at least 50 amino acids long
Pfam (E-value threshold 0.1): a colon-delimited list of matched Pfam accessions, E-values, and domain descriptions
Xenopus.tropicalis BLASTX E-value: E-value of the best BLASTX match to X. tropicalis proteins
bit: bit score of this match
id%: percentage identity (protein sequence) of this match to the query contig
description: descriptive text in the FASTA header of the best match
RefSeq ID: associated RefSeq ID for best match
Entrez gene ID: associated Entrez gene ID for best match
UniProt ID: associated UniProt ID for best match
Lithobates.clamitans.annotation.xls
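The "%GC content" and "Longest ORF" columns described above can be illustrated with a short sketch. The ORF definition used here is an assumption: the longest stop-free stretch of codons in any of the six reading frames, with no start-codon requirement. The spreadsheet description does not state which definition or tool was used, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq):
    """Percent of the sequence that is G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_aa(seq, min_aa=50):
    """Length in codons of the longest stop-free run over six frames.

    Assumption: an ORF is any stretch between stop codons (or sequence
    ends). Returns None when no run reaches min_aa codons.
    """
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    best = 0
    for strand in (seq, reverse_complement):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0  # a stop codon ends the current ORF
                else:
                    run += 1
                    best = max(best, run)
    return best if best >= min_aa else None


if __name__ == "__main__":
    toy_contig = "ATG" + "GCT" * 60 + "TAA"  # toy sequence, not real data
    print(gc_percent(toy_contig), longest_orf_aa(toy_contig))
```

Contigs for which the function returns None correspond to rows where no ORF met the 50 amino acid minimum.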
Annotation spreadsheet for transcriptome assembly of Pseudacris regilla
A spreadsheet of contig sequence similarity for the Pseudacris regilla assembly. BLASTX hits to Xenopus tropicalis, HMMER matches to the Pfam-A database (for predicted ORFs), and other annotation information are given. The column headings are as follows:
Contig: contig name
Length (bp): length in base pairs
%GC content: percent of contig sequence that is C or G
Longest ORF (min 50 amino acids): the longest ORF identified on the contig that was at least 50 amino acids long
Pfam (E-value threshold 0.1): a colon-delimited list of matched Pfam accessions, E-values, and domain descriptions
Xenopus.tropicalis BLASTX E-value: E-value of the best BLASTX match to X. tropicalis proteins
bit: bit score of this match
id%: percentage identity (protein sequence) of this match to the query contig
description: descriptive text in the FASTA header of the best match
RefSeq ID: associated RefSeq ID for best match
Entrez gene ID: associated Entrez gene ID for best match
UniProt ID: associated UniProt ID for best match
Pseudacris.regilla.annotation.xls
antimicrobial-peptide-clusters
A FASTA-formatted file containing aligned representative sequences for each cluster of antimicrobial peptide reads and/or contigs. The sequence headers correspond to the phylogeny in Figure 1.
antimicrobial-peptide-cluster-reads
The raw reads that mapped to each sequence cluster given in antimicrobial-peptide-clusters.fasta. This file provides the raw data underlying our clustering of antimicrobial peptide transcripts.
3-way-orthologs
A simple list of contigs from each of three transcriptome assemblies that were reciprocal-best TBLASTX matches. Each row represents a triplet of putatively orthologous sequences that may be useful for comparative genomics, primer design, etc.
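The idea of a three-way reciprocal-best match can be sketched as follows. This is a minimal illustration that assumes the best TBLASTX hit for every contig has already been extracted into dictionaries, one per ordered pair of assemblies, and that a triplet must be reciprocal-best in all three pairwise comparisons. The labels "A", "B", and "C" and the function name are hypothetical; the exact criteria used to build the list are not specified beyond "reciprocal-best".

```python
def reciprocal_best_triplets(best):
    """Return (a, b, c) contig triplets that are mutually best hits.

    best[(x, y)] is a dict mapping each contig of assembly x to its
    single best hit in assembly y (hypothetical pre-parsed input).
    """
    triplets = []
    for a, b in best[("A", "B")].items():
        # A <-> B must be reciprocal.
        if best[("B", "A")].get(b) != a:
            continue
        # A <-> C must be reciprocal.
        c = best[("A", "C")].get(a)
        if c is None or best[("C", "A")].get(c) != a:
            continue
        # B <-> C must be reciprocal and agree with the A-based hits.
        if best[("B", "C")].get(b) == c and best[("C", "B")].get(c) == b:
            triplets.append((a, b, c))
    return triplets


if __name__ == "__main__":
    # Toy best-hit tables; contig names are invented for illustration.
    toy = {
        ("A", "B"): {"a1": "b1", "a2": "b2"},
        ("B", "A"): {"b1": "a1", "b2": "a9"},
        ("A", "C"): {"a1": "c1", "a2": "c2"},
        ("C", "A"): {"c1": "a1", "c2": "a2"},
        ("B", "C"): {"b1": "c1", "b2": "c2"},
        ("C", "B"): {"c1": "b1", "c2": "b2"},
    }
    print(reciprocal_best_triplets(toy))
```

Requiring reciprocity in every pairwise comparison is conservative: it discards contigs with close paralogs, which is desirable when the goal is primer design on putative orthologs.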
3-way-aligned-orthologous-segments
This FASTA-formatted file is a series of 56 sequence alignments. Each alignment contains one sequence from each frog transcriptome studied. The sequences were aligned at the protein level using MUSCLE (Edgar 2004) and then converted to nucleotide alignments. Annotation information for each set of contigs can be found in the two annotation.xls spreadsheets. This file provides the expected amplicons from each reference sequence for the primers listed in conserved-primer-candidates-for-orthologous-segments.xls, to guide the selection of loci that may be useful for a given population-genetic or phylogenetic study.
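Converting a protein-level alignment back to a nucleotide alignment amounts to threading each sequence's codons onto its gapped protein row. The sketch below shows that step under simple assumptions: the coding sequence is in frame, contains exactly three nucleotides per aligned residue, and gaps are written as "-". The function name `thread_codons` is hypothetical, and this is not necessarily the tool used to produce the file.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an in-frame coding sequence onto a gapped protein alignment row.

    Each residue is replaced by its codon and each gap by three gap
    characters. Raises ValueError if the lengths are inconsistent.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be 3x the number of residues")
    codons = iter(cds[i:i + 3] for i in range(0, len(cds), 3))
    columns = []
    for ch in aligned_protein:
        columns.append(gap * 3 if ch == gap else next(codons))
    return "".join(columns)


if __name__ == "__main__":
    # Toy example: two residues (M, K) separated by one alignment gap.
    print(thread_codons("M-K", "ATGAAA"))  # ATG---AAA
```

Aligning at the protein level and then threading codons keeps gaps in multiples of three, so the reading frame is preserved in the resulting nucleotide alignment.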
conserved-primer-candidates-for-orthologous-segments
A spreadsheet containing sets of forward and reverse PCR primers in successive rows for each of the 56 segments in 3-way-aligned-orthologous-segments.fas. The primers were predicted with BatchPrimer3 (You et al. 2008), and the principal output, such as expected product size and Tm, is included. We also include the estimated dN/dS ratio for each pairwise comparison of the three frog transcriptomes as an index of the overall conservation of each set of predicted cDNAs. The primer sets have not been systematically evaluated on either cDNA or genomic DNA, and may or may not bridge intronic sequence.
Summary-figure-of-nucleotide-distance-conserved-regions
This figure summarizes the nucleotide distances from the spreadsheet "conserved-primer-candidates-for-orthologous-segments.xls" to guide the selection of loci for molecular phylogenetics. Loci with greater nucleotide distances may be more informative for analysis of closely related species.
Conserved signal peptides of AMPs
Sequence logos of the highly conserved signal peptide and acidic propiece region for each species, based on aligned cluster sequences. Standard amino-acid symbols are used, with dark font representing more acidic residues. The predicted cleavage site is C-terminal of the conserved cysteine residue. Position 24 of the P. regilla alignment is blank because the majority of sequences had a gap at this alignment position.