Skip to main content
Dryad

Data from: Sexy fingers: Pheromones in the glands of male dendrobatid frogs

Cite this dataset

Abondano Almeida, Diana et al. (2024). Data from: Sexy fingers: Pheromones in the glands of male dendrobatid frogs [Dataset]. Dryad. https://doi.org/10.5061/dryad.dv41ns26n

Abstract

Our study investigates the role of the swollen fingers of two dendrobatid species, Leucostethus brachistriatus, and Epipedobates anthonyi, in pheromone production using whole-transcriptome sequencing (RNAseq). We examined differential gene expression in the swollen versus non-swollen fingers and toes. The overwhelming pattern of gene expression in both species was strong upregulation of sodefrin precursor-like factors (SPFs) in swollen fingers, a well-known pheromone system in salamanders. As part of the results, we include a fasta-file with the RNA-seq assemblies from each species, as well as their annotation and Differential expression (DE) analyses. Likewise, we include the alignment of some of the most highly expressed sodefrin precursor-like-factor (SPF) sequences

README: Sexy fingers: pheromones in the glands of male dendrobatid frogs

Diana Abondano Almeida1,a, Evan Twomey1, Fernando Vargas-Salinas2, Carmen Meyer1, Lisa M. Schulte1

Affiliations

1 Department of Wildlife-/Zoo-Animal-Biology and Systematics, Faculty of Biological Sciences, Goethe University Frankfurt, Frankfurt am Main, Germany

2 Grupo de Investigación en Evolución, Ecología y Conservación (EECO), Programa de
Biología, Universidad del Quindío, Armenia, Colombia

a Correspondence: e-mail, dianabondano1@gmail.com
ORCID: 0000-0002-3336-3487

We investigated the role of the swollen fingers of two dendrobatid species, Leucostethus brachistriatus, and Epipedobates anthonyi, in pheromone production using whole-transcriptome sequencing (RNAseq). We examined differential gene expression in the swollen versus non-swollen fingers and toes with a focus on searching for known amphibian pheromones.

The RNA-seq assemblies per species are FASTA files:

Our general approach was to (1) assemble on a per-individual basis using both tissue samples at once (swollen and control), (2) perform each assembly at multiple kmer sizes, and (3) merge the multiple assemblies into a single non-redundant assembly for each species. Multi-kmer assembly and subsequent merging with the tr2aacds pipeline (Gilbert, 2013) has been found to yield assemblies with greater completeness and lower sequence duplication with respect to BUSCO benchmarks (Mamrot et al., 2017). Transcriptome assembly was done with SPAdes v.3.13.1 (Bankevich et al., 2012). For each individual, we assembled with kmer sizes 25, 53 and 75, supplying both sets of reads (swollen and control) corresponding to that individual. As there were three individuals sequenced per species, this meant that a total of nine assemblies were generated per species, which were merged into a single assembly using the EvidentialGene (Evigene) tr2aacds pipeline v.2017.12.21 (Gilbert, 2013).

The explanations to the headers of the supplementary_data
Source original reference (see Table S8 & 9. List of annotated genes in L. brachistriatus and E. anthonyi)
qseqid= Query Seq-id
sseqid= Subject Seq-id
stitle= Subject Title (annotation)
pident= Percentage of identical matches
length=Alignment length
mismatch= Number of mismatches
gapopen= Number of gap openings
qstart= Start of alignment in query
qend=End of alignment in query
sstart=Start of alignment in subject
send=End of alignment in subject
evalue=Expect value
bitscore=Bit score

The explanations to the headers of the supplementary_data
Source original reference (see Table S10 & S11. Deseq_Leucostethus_brachistriatus and Deseq_Epipedobates_anthonyi)
ID_transcript= Accession number of the transcript in NCBI
Transcript= transcript ID in the assembly
annotations= Known protein reference sequence from NCBI
log2FoldChange= The effect size estimate. Indicates how much the transcript's expression changed between the comparison and control groups. This value is reported on a logarithmic scale to base 2.
lfcSE= The standard error estimate for the log2 fold change estimate
stat= The value of the test statistic for the gene or transcript
pvalue= P-value of the test for the gene or transcript
padj= Adjusted P-value for multiple testing for the gene or transcript.

Usage notes
Microsoft Excel is required to open annotation and Deseq analysis.

Funding
The Zoo Frankfurt financially supported this work.

Methods

We collected adult males from Leucostethus brachistriatus and Epipedobates anthonyi. Three males were used for the RNA sequencing analysis. We sampled both fingers IV (as a target tissue) of three males from each species and as a control tissue, we sampled both fingers III of L. brachistriatus and both toes III of E. anthonyi. We realized the RNA extraction, and library preparation and sequencing were done by Genewiz in Leipzig, Germany.
A total of 12 samples were sequenced for this study in a paired-sample design (i.e., paired control–swollen samples for each of three males per species).

Assembly
Our general approach was to (1) assemble on a per-individual basis using both tissue samples at once (swollen and control), (2) perform each assembly at multiple kmer sizes, and (3) merge the multiple assemblies into a single non-redundant assembly for each species. Multi-kmer assembly and subsequent merging with the tr2aacds pipeline (Gilbert, 2013) has been found to yield assemblies with greater completeness and lower sequence duplication with respect to BUSCO benchmarks (Mamrot et al., 2017). Transcriptome assembly was done with SPAdes v.3.13.1 (Bankevich et al., 2012). For each individual, we assembled with kmer sizes 25, 53 and 75, supplying both sets of reads (swollen and control) corresponding to that individual. As there were three individuals sequenced per species, this meant that a total of nine assemblies were generated per species, which were merged into a single assembly using the EvidentialGene (Evigene) tr2aacds pipeline v.2017.12.21 (Gilbert, 2013). The completeness of our merged assemblies were assessed with BUSCO v.3.0.2 (Simão et al., 2015) using the vertebrata_odb9 database. Contig metrics were obtained using TransRate v1.0.125 (Smith-Unna et al., 2016).

Annotation, read alignment, and gene expression analysis
For each species, we annotated the final merged (evigene) assembly with Diamond v.0.9.22.123 (Buchfink et al., 2015), using Blastx mode to search assembly transcripts against the Identical Protein Groups database from NCBI (NCBI, 2022), which was filtered to include only vertebrate sequences. Diamond was run with max-target-seqs 1 and max-hsps 1 in order to retain a single, best hit for each transcript.
Sequencing reads were aligned to the reference transcriptome with HISAT2 version 2.1.0 (Kim et al., 2015) in paired-end mode with default parameters. To quantify gene expression, we manually created a GTF annotation track from the Diamond results, appending the gene_id with transcript_id to ensure that each transcript had a unique identifier and thus the expression was calculated at the transcript (rather than gene) level. Reads aligning to annotated transcript regions were tallied with FeatureCounts version 2.0.6 (Liao et al., 2014), providing the counts data used for the differential expression analysis.
Differential expression (DE) analysis was performed with the R package “DESeq2” (version 4.2). The Benjamini-Hochberg correction was applied to p-values to account for multiple hypothesis testing (Love et al., 2014). Given that we employed a paired-sample design (swollen and control tissues taken from each individual), the final model included tissue type as a fixed effect and individual as a random effect to account for both treatment and individual variation.

Phylogenetic analysis of SPF transcripts
We conducted a phylogenetic analysis in order to investigate the evolution of the SPFs genes found in both studied species. In particular, we were interested in knowing whether these belong to the SPF beta clade, in which a pheromonal function has previously been demonstrated in salamanders (Van Bocxlaer et al., 2015), and is strongly suspected in frogs (Bossuyt et al., 2019). For this analysis, we selected SPFs that had both a significant p-value (from DEseq2) and high expression in the swollen tissue (average TPM > 1000 in swollen fingers). This was the case for six SPF transcripts in L. brachistriatus and eight in E. anthonyi. For consistency with previous studies, we added these new sequences to a dataset similar to what was previously used Bossuyt et al., (2019) and Schulte et al., (2022), which includes highly expressed SPFs from the breeding glands of several frog and salamander species (24 salamander and 19 frog SPF sequences), all SPFs from the Xenopus tropicalis genome (28 sequences), 13 SPFs from amniotes, and one sequence from the pufferfish Takifugu rubripes, used as an outgroup.
Amino acid sequences were aligned in MAFFT v7 (Katoh & Standley, 2013) using the L-INS-I procedure and a maximum 1,000 iterations. Phylogenetic analysis was done in IQ-TREE v2.3.2 (Minh et al., 2020), using the VT+I+G4 model of amino acid evolution (selected by ModelFinder according to the Bayesian Information Criterion), with 1000 initial parsimony trees (-ninit), 100 top initial trees (-ntop), thorough NNI search (-allnni), 500 unsucessful iterations to stop (-nstop), and 1000 ultrafast bootstrap replicates (-bb).

We use whole-transcriptome sequencing (RNAseq) and differential gene expression analysis to investigate gene expression and possible pheromone production in the swollen fingers of two dendrobatid species, Leucostethus brachistriatus and Epipedobates anthonyi 

Funding

The Zoo Frankfurt