Ecologically mediated differences in electric organ discharge drive evolution in a sodium channel gene in South American electric fishes
Cite this dataset
Hauser, Frances et al. (2024). Ecologically mediated differences in electric organ discharge drive evolution in a sodium channel gene in South American electric fishes [Dataset]. Dryad. https://doi.org/10.5061/dryad.7m0cfxq1x
Abstract
Active electroreception — the ability to detect objects and communicate with conspecifics via the detection and generation of electric organ discharges (EODs) — has evolved convergently in several fish lineages. South American electric fishes (Gymnotiformes) are a highly species-rich group, possibly in part due to evolution of an electric organ (EO) that produces diverse EODs. Neofunctionalization of a voltage-gated sodium channel accompanied the evolution of electrogenic tissue from muscle and resulted in a novel gene (scn4aa) uniquely expressed in the EO. Here, we investigate the link between variation in scn4aa and differences in EOD waveform. We combine gymnotiform scn4aa sequences encoding the C-terminus of the Nav1.4a protein with biogeographic data and EOD recordings. We test whether physiological transitions among EOD types accompany differential selection pressures on scn4aa. We found positive selection on scn4aa coincided with shifts in EOD types. Species that evolved in the absence of predators, which likely selected for reduced EOD complexity, exhibited increased scn4aa evolutionary rates. We model mutations in the protein that may underlie changes in protein function and discuss our findings in the context of gymnotiform signalling ecology. Together, this work sheds light on the selective forces underpinning major evolutionary transitions in electric signal production.
README: Molecular evolution of the Nav1.4a C-terminus in 105 species of South American electric fishes
Hauser FE, Xiao D, Van Nynatten A, Brochu-DeLuca KK, Rajakulendran T, Elbassiouny AE, Sivanesan H, Sivananthan P, Crampton WGR, Lovejoy NR. Ecologically mediated differences in electric organ discharge drive evolution in a sodium channel gene in South American electric fishes. In press, Biology Letters.
Reference Information
Provenance for this README
- File name: README_Hauser_Nav.txt
- Authors: Frances E Hauser
- Other contributors: Dawn Xiao, Alexander Van Nynatten, Kristen K. Brochu-De Luca, Thanara Rajakulendran, Ahmed E. Elbassiouny, Harunya Sivanesan, Pradeega Sivananthan, William G.R. Crampton, Nathan R. Lovejoy
- Date created: 2024-01-16
Dataset Attribution and Usage
Dataset Title: Data for the article "Ecologically mediated differences in electric organ discharge drive evolution in a sodium channel gene in South American electric fishes"
Persistent Identifier: https://doi.org/10.5061/dryad.7m0cfxq1x
Supplemental Information: https://figshare.com/s/7214ad8139a75357887a
Genbank Accession numbers for sequences: OR838936-OR839040
Contact Information
- Name: Frances E Hauser
- Affiliations: Biological Sciences, University of Toronto Scarborough
- ORCID ID: https://orcid.org/0000-0001-9694-7300
- Email: frances.hauser@utoronto.ca
Methodological Information
This paper investigated the molecular evolution of the Nav1.4a C-terminus (scn4aa gene) from 105 species of electric fishes. We used PAML random sites analyses (the M0, M1, M2, M3, M7, M8, and M8a models) and clade model analyses (CMC), which were compared to a null model (M2a_rel).
PAML analyses are run by combining an alignment file (.txt), a phylogenetic tree (.tre), and a control file (.ctl) that tells PAML which models to run. PAML then writes a variety of outfiles with parameter estimates for the dataset, including dN/dS (the ratio of nonsynonymous to synonymous nucleotide substitution rates) from the random sites analyses.
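As a rough illustration of how the three inputs fit together, the sketch below builds the text of a minimal codeml control file in Python. The option names (seqfile, treefile, outfile, model, NSsites, and so on) are standard codeml settings, but the values, the output file names, and the labelled tree name are assumptions chosen for illustration. The settings actually used for each analysis are in the *.ctl files in the corresponding folder.

```python
def make_ctl(seqfile, treefile, outfile, model=0,
             nssites=(0, 1, 2, 3, 7, 8), fix_omega=0, omega=1.0):
    """Return the text of a minimal codeml control file (illustrative values)."""
    options = {
        "seqfile": seqfile,      # alignment (PHYLIP format, .txt in this dataset)
        "treefile": treefile,    # NEWICK tree
        "outfile": outfile,      # main results outfile
        "noisy": 3,
        "verbose": 1,
        "runmode": 0,            # 0 = use the user-supplied tree
        "seqtype": 1,            # 1 = codon sequences
        "CodonFreq": 2,          # 2 = F3x4 (assumed here)
        "model": model,          # 0 = site models; 3 = clade models
        "NSsites": " ".join(str(n) for n in nssites),
        "icode": 0,              # 0 = universal genetic code (assumed here)
        "fix_kappa": 0,
        "kappa": 2,
        "fix_omega": fix_omega,
        "omega": omega,
        "cleandata": 0,
    }
    return "\n".join(f"{key:>12} = {value}" for key, value in options.items()) + "\n"


# Random sites models M0, M1, M2, M3, M7, and M8 in a single run.
random_sites_ctl = make_ctl(
    "scn4aa_ML_105x561.txt", "scn4aa_ML_105x561.tre", "random_sites_out.txt"
)

# Clade Model C: model = 3 with NSsites = 2. The tree must carry foreground
# branch labels; the tree and outfile names below are hypothetical.
cmc_ctl = make_ctl(
    "scn4aa_ML_105x561.txt", "scn4aa_ML_105x561_mono.tre", "cmc_mono_out.txt",
    model=3, nssites=(2,),
)

print(random_sites_ctl)
```

Saving either string to a .ctl file and running `codeml` on it, in a folder that also holds the alignment and tree, would start the corresponding analysis.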
Clade Model C (CMC), a different type of PAML analysis, tests whether preselected (foreground) branches are evolving differently from the remaining (background) branches.
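Nested PAML models such as CMC and its null model M2a_rel are typically compared with a likelihood ratio test (LRT): twice the difference in log-likelihood is compared against a chi-square distribution. The sketch below uses only the Python standard library and assumes 1 degree of freedom for CMC versus M2a_rel, since CMC adds one extra dN/dS parameter. The log-likelihood values are made-up placeholders, not results from this dataset; the real values are reported in the PAML results outfiles.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (df = 1 or even df)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch supports only df = 1 or even df")


def lrt(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Placeholder log-likelihoods for M2a_rel (null) and CMC (alternative).
stat, p = lrt(lnl_null=-10250.0, lnl_alt=-10245.0, df=1)
print(f"2*deltaLnL = {stat:.2f}, p = {p:.4f}")
```

The same function can be applied to the random sites comparisons, for example M8a versus M8 with df = 1, or M1 versus M2 and M7 versus M8 with df = 2.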
Analyses
We investigated:
- 105 GYMNOTIFORMES: We tested for positive selection by running random sites models M0-M8. We tested for divergent selection in wave-type, monophasic, and neuronally-derived electric organ fishes using Clade Model C.
- 26 GYMNOTIDAE: We tested for positive selection by running random sites models M0-M8 in only members of the family Gymnotidae (Electric eels and the genus Gymnotus). We tested for divergent selection in monophasic fishes using Clade Model C.
- 72 PULSE-TYPE GYMNOTIFORMES: We tested for positive selection by running random sites models M0-M8 in only pulse-type Gymnotiformes. We tested for divergent selection in monophasic fishes using Clade Model C.
Data and File Overview
Naming Conventions
ML = maximum likelihood
CMC = clade model C
all = all 105 gymnotiformes
mono = monophasic species selected as foreground branches in CMC
wave = wave type species selected as foreground branches in CMC
neuro = neurogenic electric organ species selected as foreground branches in CMC
XXXxXXX = the dimensions of the alignment, e.g., 105x561 indicates 105 species and 561 nucleotides
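Because the alignment dimensions are encoded in the file names, they can be recovered programmatically. The helper below is a small hypothetical sketch (the function name and regular expression are not part of the dataset) that extracts the species and nucleotide counts from a file name following this convention.

```python
import re


def alignment_dimensions(filename):
    """Return (n_species, n_nucleotides) parsed from a name like *_105x561*."""
    match = re.search(r"_(\d+)x(\d+)", filename)
    if match is None:
        return None  # name does not follow the XXXxXXX convention
    n_species, n_nucleotides = (int(group) for group in match.groups())
    return n_species, n_nucleotides


dims = alignment_dimensions("scn4aa_ML_105x561.fas")
print(dims)
```

File names without a dimension tag, such as scn4aa_ML_gymno_mono.txt, return None.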
File types
*.fas are fasta alignments
*.nex are nexus files that can be viewed in Mesquite
*.tre are phylogenetic tree files
*.txt are results outfiles. Note that the alignments used as PAML input (e.g., the one in the alignments_tree folder) also have a .txt file type.
*.ctl are PAML control files that specify which analyses PAML runs
Some PAML outfiles (2NG.dN, 2NG.dS, 2NG.t, 4fold.nuc, lnf, rst, rst1, rub) always contain the same kind of information (with different values for different datasets) and are only annotated once in the directory structure.
The remaining files (*.ctl, *.txt) depend on the particular analysis run and the details of the analysis are found within the file.
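To find the files that are specific to each analysis, one option is to walk the directory tree and skip the outfiles that codeml writes for every run. The sketch below assumes the dataset has been unpacked locally; the function name and the choice to key the result by folder are illustrative, not part of the dataset.

```python
from pathlib import Path

# Outfiles that appear in every analysis folder, as listed above.
COMMON_OUTFILES = {"2NG.dN", "2NG.dS", "2NG.t", "4fold.nuc",
                   "lnf", "rst", "rst1", "rub"}


def analysis_specific_files(root):
    """Map each folder that holds a control file to its run-specific files."""
    root = Path(root)
    summary = {}
    for ctl in sorted(root.rglob("*.ctl")):
        folder = ctl.parent
        specific = sorted(
            path.name for path in folder.iterdir()
            if path.is_file() and path.name not in COMMON_OUTFILES
        )
        summary[str(folder.relative_to(root))] = specific
    return summary


# List the control file and results outfiles for every analysis found
# under the current directory (prints nothing if there are none).
for folder, files in analysis_specific_files(".").items():
    print(folder, files)
```

Each entry then lists the control file, the results outfile, and any alignment or tree stored alongside them.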
Data Structure
There are four main directories:
alignments_tree: alignments in fasta, nexus, and phylip format, and a NEWICK tree
all: tests run on all 105 gymnotiform species
gymno: tests run on 26 Gymnotidae species
pulse_only: tests run on 72 pulse-type gymnotiform species
├── alignments_tree
│ ├── scn4aa_ML_105x561.fas # fasta alignment
│ ├── scn4aa_ML_105x561.nex # nexus alignment
│ ├── scn4aa_ML_105x561.tre # NEWICK tree
│ └── scn4aa_ML_105x561.txt # phylip alignment
├── all # all 105 gymnotiformes tested
│ ├── CMC_mono # Monophasic gymnotiformes tested in CMC
│ │ ├── 2NG.dN # estimates of the rate of nonsynonymous substitutions for the dataset
│ │ ├── 2NG.dS # estimates of the rate of synonymous substitutions for the dataset
│ │ ├── 2NG.t # transition transversion ratios
│ │ ├── 4fold.nuc # 4fold degenerate codon sites
│ │ ├── CmC.ctl # PAML control file
│ │ ├── lnf # likelihood estimates
│ │ ├── rst # results on ancestral reconstruction, if performed
│ │ ├── rst1 # results on ancestral reconstruction, if performed
│ │ ├── rub # log likelihood and parameter estimates
│ │ └── scn4aa_ML_105x561_mono.txt # PAML results outfile
│ ├── CMC_mono_alt_topology # Monophasic gymnotiformes tested on an alternate tree topology with a different placement of Wave gymnotiformes
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── CmC.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_ML_105x561_mono.txt
│ ├── CMC_neuro # Neurogenic gymnotiformes tested in CMC
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── CmC.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_ML_105x561_neuro.txt
│ ├── CMC_neuro_alt_topology # Neurogenic gymnotiformes tested on an alternate tree topology with a different placement of Wave gymnotiformes
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── CmC.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_ML_105x561_neuro.txt
│ ├── CMC_wave # Wave gymnotiformes tested in CMC
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── CmC.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_ML_105x561_wave.txt
│ ├── CMC_wave_alt_topology # Wave gymnotiformes tested on an alternate tree topology with a different placement of Wave gymnotiformes
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── CmC.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_ML_105x561_wave.txt
│ ├── M2a_rel # CMC Null Model
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── M2a_rel.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_ML_105x561_m2arel.txt
│ ├── M2a_rel_alt_topology # CMC Null Model tested on an alternate tree topology with a different placement of Wave gymnotiformes
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── M2a_rel.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_ML_105x561_m2arel.txt
│ ├── random_sites # PAML random sites analyses (Models 0,1,2,3,7,8)
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── M012378.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ └── scn4aa_105x561(M012378).txt
│ └── random_sites_M8a # PAML M8 null model
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── M8a.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ └── scn4aa_ML_105x561_m8a.txt
├── gymno # tests run on 26 members of Gymnotidae
│ ├── CMC_134 # CMC analysis with monophasic, biphasic, and tetraphasic Gymnotidae all selected as experiencing divergent selection
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── CmC.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ ├── scn4aa_ML_26x561_gymno.txt # alignment
│ │ ├── scn4aa_ML_26x561_gymno_134.tre # tree
│ │ └── scn4aa_ML_gymno_134.txt # outfile
│ ├── CMC_mono # tests run on a foreground consisting only of monophasic Gymnotidae
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── CmC.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ ├── scn4aa_ML_26x561_gymno.txt
│ │ ├── scn4aa_ML_26x561_gymno_mono.tre
│ │ └── scn4aa_ML_gymno_mono.txt
│ ├── M2a_rel # CMC null model
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── M2a_rel.ctl
│ │ ├── lnf
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ ├── scn4aa_ML_26x561_gymno.tre
│ │ ├── scn4aa_ML_26x561_gymno.txt
│ │ └── sscn4aa_ML_26x561_gymno_m2arel.txt
│ ├── random_sites
│ │ ├── 2NG.dN
│ │ ├── 2NG.dS
│ │ ├── 2NG.t
│ │ ├── 4fold.nuc
│ │ ├── M012378.ctl
│ │ ├── lnf
│ │ ├── m8a # M8 null model
│ │ │ ├── 2NG.dN
│ │ │ ├── 2NG.dS
│ │ │ ├── 2NG.t
│ │ │ ├── 4fold.nuc
│ │ │ ├── M8a.ctl
│ │ │ ├── lnf
│ │ │ ├── rst
│ │ │ ├── rst1
│ │ │ ├── rub
│ │ │ ├── scn4aa_ML_26x561_gymno.tre
│ │ │ ├── scn4aa_ML_26x561_gymno.txt
│ │ │ └── scn4aa_gymno(M8a).txt
│ │ ├── rst
│ │ ├── rst1
│ │ ├── rub
│ │ ├── scn4aa_ML_26x561_gymno.tre
│ │ ├── scn4aa_ML_26x561_gymno.txt
│ │ └── scn4aa_gymno(M012378).txt
│ ├── scn4aa_ML_26x561_gymno.fas
│ └── scn4aa_ML_26x561_gymno.nex
└── pulse_only # tests run on 72 pulse-type Gymnotiformes
├── CMC_1234 # CMC analysis with monophasic, biphasic, triphasic, and tetraphasic pulse-type gymnotiformes all selected as experiencing divergent selection
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── CmC.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ ├── scn4aa_ML_72x561_pulse.txt # alignment
│ ├── scn4aa_ML_72x561_pulse_1234phase.tre # tree
│ └── scn4aa_ML_72x561_pulse_1234phase.txt # results outfile
├── CMC_mono # CMC analysis with monophasic species tested as foreground branches
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── CmC.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ ├── scn4aa_ML_72x561_mono.tre
│ ├── scn4aa_ML_72x561_pulse.txt
│ └── scn4aa_ML_pulse_mono.txt
├── M2a_rel # CMC null model
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── M2a_rel.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ ├── scn4aa_ML_72x561_pulse.tre
│ ├── scn4aa_ML_72x561_pulse.txt
│ └── sscn4aa_ML_72x561_pulse_m2arel.txt
└── random_sites # random sites analyses
├── M3 # results from the M3 model run separately
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── M3.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ ├── scn4aa_ML_72x561_pulse.tre
│ ├── scn4aa_ML_72x561_pulse.txt
│ └── scn4aa_pulse(M3).txt
├── m8a # M8 null model
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── M8a.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ └── scn4aa_pulse(M8a).txt
├── random_sites_M012 # Models 0, 1, 2 run together
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── M012.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ ├── scn4aa_ML_72x561_pulse.tre
│ ├── scn4aa_ML_72x561_pulse.txt
│ └── scn4aa_pulse(M012).txt
├── random_sites_M378 # Models 3, 7 and 8 run together
│ ├── 2NG.dN
│ ├── 2NG.dS
│ ├── 2NG.t
│ ├── 4fold.nuc
│ ├── M378.ctl
│ ├── lnf
│ ├── rst
│ ├── rst1
│ ├── rub
│ ├── scn4aa_ML_72x561_pulse.tre
│ ├── scn4aa_ML_72x561_pulse.txt
│ └── scn4aa_pulse(M378).txt
├── scn4aa_ML_72x561_pulse.tre # NEWICK tree for pulse fishes
└── scn4aa_ML_72x561_pulse.txt # alignment for pulse fishes
Additional supplemental information can be found on Figshare: https://figshare.com/s/7214ad8139a75357887a
Funding
Natural Sciences and Engineering Research Council