Data from: Comparative analysis of convergent jellyfish eyes reveals extensive differences in expression of vision-related genes
Data files
Aug 05, 2025 version files 290.16 MB
- aurelia_ARSv1_proteins.opsins.fasta (2.30 KB)
- aurelia_workflow.tar.gz (137.95 MB)
- expression_analyses.tar.gz (28.43 MB)
- README.md (3.19 KB)
- readme.txt (3.81 KB)
- sarsia_lowredundancy_reference.tar.gz (23.10 MB)
- sarsia_Trinity_20210112_longestORFperGene.opsins.fasta (10.46 KB)
- sarsia_workflow.tar.gz (43 MB)
- tripedalia_GHAQ01.1.fsa_nt.fasta.fixed_longestORFperGene.opsins.fasta (5.68 KB)
- tripedalia_lowredundancy_reference.tar.gz (18.21 MB)
- tripedalia_workflow.tar.gz (39.44 MB)
Abstract
Quantifying gene expression across convergent origins of traits clarifies the degree to which those traits arise from shared versus distinct genetic programs, revealing how gene re-use relates to the repeatability of evolution. Eyes are important traits that evolved in many distantly related lineages, including at least nine times within cnidarians. Here, we investigate gene expression in eye-bearing and non-visual tissues from three cnidarian species representing long-diverged lineages where eyes evolved convergently (Cubozoa, Scyphozoa, and Hydrozoa). We find gene expression in eye-bearing tissues to be mostly lineage-specific, with only a small proportion of genes having convergent expression across species. Nevertheless, all species express homologs of deeply conserved vision-related genes known from Bilateria, which likely reflects deep homology (parallel evolution across vast phylogenetic distances) of a metazoan phototransduction toolkit. A gene tree analysis of opsins—the prototypical animal photosensors—shows that convergent eyes recruited different opsin paralogs, with the potential exception of an opsin ortholog shared between scyphozoan and cubozoan eyes. Our results suggest that eyes have mostly lineage-specific patterns of gene expression, yet some key phototransduction components are repeatedly recruited across multiple independent eye origins in Medusozoa.
This repository contains Snakemake workflows and data files used in the analysis of RNA-seq data for Sarsia, Tripedalia, and Aurelia, including the generation of low-redundancy transcriptome references and downstream analyses.
Contents
readme.txt: copy of this readme file's contents
🔧 Workflow Archives
sarsia_workflow.tar.gz
This archive contains the Snakemake workflow used to generate a low-redundancy transcriptome reference for Sarsia and to map RNA-seq reads.
Directory Structure:
- workflow/
  - Snakefile: Main workflow file.
  - envs/: Conda environment YAML files.
  - rules/: Snakemake rules for each analysis step.
  - scripts/: Custom Bash and Python scripts.
- config/
  - config.yaml: General configuration file with file paths and variable definitions.
  - samples.tsv, conditions.tsv: Metadata about samples and conditions.
- resources/
  - GO_terms.gtf: GO term information.
  - pia/: Accessory files for the get_opsins rule in workflow/rules/pia.smk.
  - reference/: Includes Trinity_20210112.fasta, used as the base transcriptome.
  - rawdata/: Raw RNA-seq reads.
Note: When using SRA data, set mergeReads: False in config.yaml and update config/units.tsv accordingly, as technical replicates were already merged during SRA submission.
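The mergeReads setting described in the note can also be changed from a script rather than by hand. The following is a minimal sketch, assuming mergeReads appears as a top-level key on its own line in config/config.yaml; the helper name set_merge_reads is hypothetical and not part of the workflow, and the sketch does not touch config/units.tsv, which still needs to be updated separately.

```python
import re


def set_merge_reads(config_text: str, value: bool = False) -> str:
    """Return config_text with the top-level mergeReads key set to value.

    Assumption: the key is written as "mergeReads: <value>" on a single line.
    Any trailing comment on that line is preserved.
    """
    pattern = re.compile(r"^(mergeReads[ \t]*:[ \t]*)\S+(.*)$", flags=re.MULTILINE)
    new_text, replaced = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", config_text, count=1
    )
    if replaced == 0:
        raise KeyError("mergeReads key not found in config text")
    return new_text
```

To apply it, read config/config.yaml into a string, pass it through set_merge_reads, and write the result back. A text-based edit is used here so that comments and key order in the file are left untouched.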
tripedalia_workflow.tar.gz
Snakemake workflow for generating a low-redundancy transcriptome for Tripedalia and mapping RNA-seq reads.
- Automatically downloads raw data from SRA to resources/rawdata.
- Reference transcriptome corresponds to TSA accession GHAQ00000000.1.
- Structure is similar to sarsia_workflow.
aurelia_workflow.tar.gz
Workflow for mapping RNA-seq reads to the Aurelia aurita genome.
- Structure mirrors the Sarsia workflow.
- Technical replicates were merged during SRA submission; set mergeReads: False in config.yaml and update units.tsv accordingly.
🧬 Result Files
aurelia_ARSv1_proteins.opsins.fasta
Opsins from the Aurelia genome protein models.
sarsia_Trinity_20210112_longestORFperGene.opsins.fasta
Opsins from the longest ORFs of the low-redundancy Sarsia reference assembly.
tripedalia_GHAQ01.1.fsa_nt.fasta.fixed_longestORFperGene.opsins.fasta
Opsins from the longest ORFs of the low-redundancy Tripedalia reference assembly.
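The opsin files are ordinary FASTA files, so they can be read with any sequence library. As a minimal sketch that uses only the Python standard library, the hypothetical helper read_fasta below collects sequences by identifier; it assumes the identifier is the first whitespace-delimited token of each header line, and a later record silently overwrites an earlier one with the same identifier.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} dictionary."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            # Sequence lines belong to the most recent header.
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

For example, opening aurelia_ARSv1_proteins.opsins.fasta in text mode and passing the file handle to read_fasta returns a dictionary whose length is the number of opsin sequences in that file.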
📦 Reference Archives
sarsia_lowredundancy_reference.tar.gz
Protein and nucleotide sequences for the low-redundancy Sarsia reference.
tripedalia_lowredundancy_reference.tar.gz
Protein and nucleotide sequences for the low-redundancy Tripedalia reference.
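Before unpacking one of the .tar.gz archives, it can help to check what it contains. The sketch below uses Python's standard tarfile module; the helper names list_archive and top_level_entries are hypothetical, and nothing is assumed about the internal layout of the archives beyond their being gzip-compressed tar files.

```python
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


def top_level_entries(path: str) -> List[str]:
    """Return the sorted, de-duplicated top-level entries of the archive."""
    entries = set()
    for name in list_archive(path):
        # Some archives prefix member names with "./"; drop it if present.
        if name.startswith("./"):
            name = name[2:]
        if name and name != ".":
            entries.add(name.split("/", 1)[0])
    return sorted(entries)
```

Calling top_level_entries("sarsia_lowredundancy_reference.tar.gz") on a downloaded copy gives a quick overview of the archive without extracting it.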
📊 Expression Analysis
expression_analyses.tar.gz
Contains R scripts and data for:
- Differential expression
- GO enrichment
- Cross-species comparisons
Includes count matrices and primary data files.
📫 Contact
If you have questions or need assistance, please contact:
- Natasha Picciani – natasha.picciani@gmail.com
- Cory Berger – cberger@ucsb.edu
