Data from: Rewilding shows differential gene expression in sympatric Physella acuta (Draparnaud, 1805) snail lineages
Data files
May 19, 2025 version files (9.22 MB)
- 3302_ALL_Physella_acuta_DEGS.fasta (6.11 MB)
- MitoCarta_Physella_acuta_transcripts.fasta (3.08 MB)
- MitoCarta_Table.csv (33.45 KB)
- README.md (3.79 KB)
Abstract
This dataset contains 3,302 differentially expressed genes (DEGs) from the freshwater snail Physella acuta, identified through transcriptomic analysis comparing environmental conditions (Field vs. Lab) across both invasive (A) and native (B) lineages. The FASTA file includes sequences for both upregulated genes (URGs) and downregulated genes (DRGs) from four pairwise comparisons: Field A vs. Field B, Field B vs. Lab B, Field A vs. Lab A, and Lab A vs. Lab B. DEGs were identified by both of two independent RNA-seq analysis pipelines, STAR/Limma and Kallisto/DESeq2, which supports the robustness of detection. The dataset also includes a set of 1,228 transcript sequences representing orthologs of human and mouse MitoCarta genes. The MitoCarta database catalogs nuclear-encoded mitochondrial proteins involved in ATP synthesis, metabolism, and cellular stress responses. These sequences were identified through transcriptome-wide similarity searches against the MitoCarta 3.0 reference set (Rath et al. 2021) using a genome-guided transcriptome assembly derived from RNA-seq data of both laboratory and rewilded snails. An accompanying CSV file categorizes each transcript ID by homology to the human database only, the mouse database only, or both.
Dataset DOI: 10.5061/dryad.wstqjq2z8
Description of the data and file structure
Laboratory-bred Physella acuta snails from two genetically distinct lineages (A and B) were exposed to either controlled laboratory or natural field conditions for one week to investigate molecular mechanisms underlying differential fitness. RNA-seq was performed on these snails, and differential gene expression was analyzed using STAR/Limma and Kallisto/DESeq2. The resulting dataset includes 3,302 differentially expressed genes and a set of 1,228 transcript sequences identified as orthologs of vertebrate mitochondrial genes based on similarity to the MitoCarta 3.0 database. Transcriptome assembly was genome-guided and incorporated RNA-seq data from both rewilded and laboratory conditions.
Files and variables
File: 3302_ALL_Physella_acuta_DEGS.fasta
Description:
This FASTA file contains 3,302 nucleotide sequences corresponding to differentially expressed genes (DEGs) identified in Physella acuta snails. Each DEG was detected by both of two independent RNA-seq analysis pipelines, STAR/Limma (SL) and Kallisto/DESeq2 (KD), in pairwise comparisons of invasive (A) and native (B) lineages under laboratory and field conditions. Each sequence entry includes a unique transcript identifier and the corresponding nucleotide sequence.
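As a quick check on the file described above, the following is a minimal Python sketch that reads FASTA records and counts them. It assumes the standard FASTA layout (a `>` header line followed by one or more sequence lines) and treats the first whitespace-delimited token of each header as the transcript identifier; the function names `read_fasta` and `summarize_fasta` are illustrative and not part of the dataset.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            # Assumption: the identifier is the first token after ">".
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(path: str) -> Tuple[int, int]:
    """Return (number of records, total nucleotides) for a FASTA file."""
    n_records = 0
    n_bases = 0
    with open(path) as handle:
        for _, seq in read_fasta(handle):
            n_records += 1
            n_bases += len(seq)
    return n_records, n_bases
```

Calling `summarize_fasta("3302_ALL_Physella_acuta_DEGS.fasta")` should report 3,302 records if the downloaded file matches this description.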
File: MitoCarta_Table.csv
Description:
This CSV file contains a categorized table of the 1,228 Physella acuta transcript IDs from the MitoCarta_Physella_acuta_transcripts.fasta file. The table sorts transcripts into three groups based on their similarity to vertebrate mitochondrial genes: those matching only the human MitoCarta database, only the mouse database, or both.
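The column layout of the table is not spelled out here, so the Python sketch below makes no assumption about column names. It reads the header row and counts the non-empty cells under each column. If each group (human only, mouse only, both) occupies its own column, which is an assumption, these counts are the group sizes; otherwise the output at least shows the header and the number of filled rows. The function name `count_nonempty_by_column` is illustrative.

```python
import csv
from typing import Dict, Iterable


def count_nonempty_by_column(lines: Iterable[str]) -> Dict[str, int]:
    """Map each header name to the number of non-empty cells below it."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return {}  # empty input
    counts = {name: 0 for name in header}
    for row in reader:
        # zip stops at the shorter of header/row, so ragged rows are tolerated.
        for name, cell in zip(header, row):
            if cell.strip():
                counts[name] += 1
    return counts
```

For the real file, open it with `open("MitoCarta_Table.csv", newline="")` and pass the handle to `count_nonempty_by_column`.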
File: MitoCarta_Physella_acuta_transcripts.fasta
Description:
This FASTA file includes 1,228 transcript sequences from Physella acuta identified as orthologs of genes in the human and mouse MitoCarta 3.0 databases. These nuclear genes encode mitochondrial proteins involved in energy metabolism, ATP synthesis, and cellular stress responses. The sequences were derived from a genome-guided transcriptome assembly of RNA-seq data collected from both rewilded and laboratory snails of lineages A and B.
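Because MitoCarta_Table.csv categorizes the transcript IDs from this FASTA file, the two can be checked against each other. The Python sketch below is one hedged way to do that. It assumes that IDs in the CSV are written exactly like the first token of each FASTA header, and that every non-empty cell below the CSV header row is a transcript ID; neither assumption is confirmed by this description, and all function names are illustrative.

```python
import csv
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first header token of every FASTA record."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def table_cells(lines: Iterable[str]) -> Set[str]:
    """Collect every non-empty cell below the header row of a CSV.

    Assumption: all such cells are transcript IDs.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return {cell.strip() for row in reader for cell in row if cell.strip()}


def missing_from_table(fasta_lines: Iterable[str],
                       csv_lines: Iterable[str]) -> Set[str]:
    """Return FASTA identifiers that do not appear anywhere in the table."""
    return fasta_ids(fasta_lines) - table_cells(csv_lines)
```

An empty result for the two MitoCarta files would indicate that every sequence has a category entry. A non-empty result may only mean that the ID formats differ from the assumptions above.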
Code/software
The FASTA files are plain text and can be viewed or edited with any text editor, such as Notepad++ (version 8.6.2) on Windows, TextEdit (version 1.18) on macOS, or nano (version 7.2) or vim (version 9.1) on Linux. These editors display the contents of a FASTA file directly, including sequence headers and nucleotide sequences.
The accompanying CSV file contains a table categorizing transcript IDs by homology to the human and/or mouse MitoCarta databases. It can be viewed and edited on Windows, macOS, or Linux with several free tools. LibreOffice Calc and OnlyOffice provide full spreadsheet functionality, including sorting and filtering. Lightweight alternatives include Gnumeric (on Linux) and CSVed (a dedicated CSV editor for Windows). If a text-based approach is preferred, Notepad++ or VSCode (with CSV plugins) allow quick viewing with syntax highlighting, and VisiData offers a terminal-based interface.
No proprietary software is required to access or analyze this dataset. All of the programs mentioned are free to use, and most are open source.
