Comprehensive analysis of mRNA 3′ ends during early zebrafish development reveals dynamics of 3′UTR changes on a genome-wide scale
Data files
Jul 17, 2025 version, files total 112.12 GB
0hpf-2_1.fastq (1.55 GB)
0hpf-2_2.fastq (1.55 GB)
0hpf-3_1.fastq (3.80 GB)
0hpf-3_2.fastq (3.80 GB)
2hpf-1_1.fastq (12.08 GB)
2hpf-1_2.fastq (12.08 GB)
2hpf-2_1.fastq (16.74 GB)
2hpf-2_2.fastq (16.74 GB)
2hpf-3_1.fastq (15.16 GB)
2hpf-3_2.fastq (15.16 GB)
4hpf-2_1.fastq (2.68 GB)
4hpf-2_2.fastq (2.68 GB)
4hpf-3_1.fastq (4.04 GB)
4hpf-3_2.fastq (4.04 GB)
README.md (4.29 KB)
Abstract
Eggs accumulate more than ten thousand mRNAs, from which proteins are synthesized constitutively or in temporally controlled manners. Although changes in the translation state of these mRNAs are crucial for early development, how embryos orchestrate them remains unclear. Here, we investigated changes in mRNA 3′ untranslated regions (UTRs) using 3′ end-RNA sequencing of zebrafish embryos and computational methods. Consistent with our previous finding that the 3′UTR of pou5f3 mRNA shortens during early development, thousands of mRNAs were found to shorten their 3′UTRs. Moreover, we found that most mRNAs in embryos had several different 3′ ends, and that the proportions of these ends changed dynamically. These changes were coupled with protein synthesis. Our results reveal genome-wide 3′-end dynamics in the direction of biological processes. The datasets consist of 3′ end-RNA sequencing data from zebrafish embryos at 0, 2, and 4 hours post-fertilization (hpf).
Dataset DOI: 10.5061/dryad.0zpc8678g
Description of the data and file structure
3′ end-RNA sequencing libraries were constructed using the Lexogen QuantSeq 3′ mRNA-Seq Library Prep Kit for Illumina platforms following the manufacturer's instructions. RNA samples were collected at three time points (0, 2, and 4 hours post-fertilisation (hpf)) from 50 embryos per time point. The experiment was repeated three times with different fish pairs to ensure reproducibility. Library quality was checked before sequencing on a TapeStation (high D1000) to confirm DNA fragment size and purity. Sequencing was carried out by Macrogen Inc. on the Illumina HiSeq X Ten platform. The target sequencing depth was 30 million reads per sample to ensure sufficient coverage for downstream analyses.
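As a quick check against the 30 million read target, the number of reads in a FASTQ file can be obtained by counting records. The sketch below is illustrative only and was not part of the original workflow: it assumes standard four-line FASTQ records (no wrapped sequence lines), and the names `count_fastq_reads` and `meets_target` are hypothetical helpers.

```python
from typing import Iterable

TARGET_READS = 30_000_000  # target depth per sample stated in the description


def count_fastq_reads(lines: Iterable[str]) -> int:
    """Count reads, assuming every record spans exactly four lines."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


def meets_target(n_reads: int, target: int = TARGET_READS) -> bool:
    """Return True when a sample reaches the target depth."""
    return n_reads >= target


# Example use on one of the deposited files (streams the file line by line):
# with open("0hpf-3_1.fastq") as handle:
#     print(meets_target(count_fastq_reads(handle)))
```

Because the file handle is consumed lazily, this works on multi-gigabyte files without loading them into memory.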
Files and variables
File: 0hpf-3_1.fastq
Description: Basecalled raw file from the 0 hpf dataset (sample n°3), forward
File: 0hpf-3_2.fastq
Description: Basecalled raw file from the 0 hpf dataset (sample n°3), reverse
File: 2hpf-1_1.fastq
Description: Basecalled raw file from the 2 hpf dataset (sample n°1), forward
File: 2hpf-1_2.fastq
Description: Basecalled raw file from the 2 hpf dataset (sample n°1), reverse
File: 4hpf-3_1.fastq
Description: Basecalled raw file from the 4 hpf dataset (sample n°3), forward
File: 4hpf-3_2.fastq
Description: Basecalled raw file from the 4 hpf dataset (sample n°3), reverse
File: 0hpf-2_1.fastq
Description: Basecalled raw file from the 0 hpf dataset (sample n°2), forward
File: 0hpf-2_2.fastq
Description: Basecalled raw file from the 0 hpf dataset (sample n°2), reverse
File: 2hpf-2_1.fastq
Description: Basecalled raw file from the 2 hpf dataset (sample n°2), forward
File: 2hpf-2_2.fastq
Description: Basecalled raw file from the 2 hpf dataset (sample n°2), reverse
File: 4hpf-2_1.fastq
Description: Basecalled raw file from the 4 hpf dataset (sample n°2), forward
File: 4hpf-2_2.fastq
Description: Basecalled raw file from the 4 hpf dataset (sample n°2), reverse
File: 2hpf-3_1.fastq
Description: Basecalled raw file from the 2 hpf dataset (sample n°3), forward
File: 2hpf-3_2.fastq
Description: Basecalled raw file from the 2 hpf dataset (sample n°3), reverse
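The file names above follow a regular pattern: time point in hpf, sample number, then `_1` for forward or `_2` for reverse reads. A minimal sketch of a parser for that pattern is given below; the names `FastqInfo` and `parse_fastq_name` are hypothetical and only illustrate how the naming scheme can be read programmatically.

```python
import re
from typing import NamedTuple

# Pattern inferred from the file list, e.g. "2hpf-1_2.fastq"
FASTQ_NAME = re.compile(r"^(?P<hpf>\d+)hpf-(?P<sample>\d+)_(?P<read>[12])\.fastq$")


class FastqInfo(NamedTuple):
    hpf: int        # hours post-fertilisation
    sample: int     # sample (replicate) number
    direction: str  # "forward" for _1, "reverse" for _2


def parse_fastq_name(name: str) -> FastqInfo:
    """Split a dataset file name into time point, sample number, and read direction."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    direction = "forward" if match["read"] == "1" else "reverse"
    return FastqInfo(int(match["hpf"]), int(match["sample"]), direction)
```

For example, `parse_fastq_name("2hpf-1_2.fastq")` yields time point 2, sample 1, reverse.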
Code/software
Raw sequencing reads were first processed with fastp (version 0.23.4) to trim low-quality bases and adapters and to filter out short reads. The resulting reads were checked with FastQC (version 0.11.8), and any problematic reads identified in the FastQC reports were further trimmed or excluded. Reads were aligned to the zebrafish genome (GRCz11) using STAR (version 2.7.9a) with default parameters.
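The three steps above can be sketched as command lines assembled in Python. This is a minimal illustration, not the exact commands used in the study: the output file names, the `qc` directory, and the pre-built STAR index directory are hypothetical placeholders, and aligning both read files together is an assumption. Only the basic input/output options of each tool are shown, with all other parameters left at their defaults; conversion of the STAR output to BAM is not shown.

```python
from typing import List


def fastp_command(r1: str, r2: str, out1: str, out2: str) -> List[str]:
    """Paired-end trimming: -i/-I are the inputs, -o/-O the trimmed outputs."""
    return ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2]


def fastqc_command(fastqs: List[str], outdir: str) -> List[str]:
    """Quality report for the trimmed reads, written to outdir."""
    return ["fastqc", "-o", outdir, *fastqs]


def star_command(genome_dir: str, r1: str, r2: str, prefix: str) -> List[str]:
    """Alignment against an existing STAR index, other parameters at defaults."""
    return ["STAR", "--genomeDir", genome_dir,
            "--readFilesIn", r1, r2,
            "--outFileNamePrefix", prefix]


def build_pipeline(sample: str, genome_dir: str, qc_dir: str) -> List[List[str]]:
    """Return the fastp, FastQC, and STAR commands for one sample, in order."""
    raw = [f"{sample}_1.fastq", f"{sample}_2.fastq"]
    trimmed = [f"{sample}_1.trimmed.fastq", f"{sample}_2.trimmed.fastq"]  # hypothetical names
    return [
        fastp_command(raw[0], raw[1], trimmed[0], trimmed[1]),
        fastqc_command(trimmed, qc_dir),
        star_command(genome_dir, trimmed[0], trimmed[1], f"{sample}_"),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the index has been generated.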
The resulting BAM files were then analysed in pairs (e.g., 0hpf-3 vs 2hpf-1, or 2hpf-1 vs 4hpf-3) using three tools: a change point model (http://utr.sourceforge.net/) with Java, APAlyzer (https://github.com/RJWANGbioinfo/APAlyzer) with R, and Peaks2UTR (https://github.com/haessar/peaks2utr) with Python.
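The pairwise design can be illustrated by enumerating sample pairs between consecutive time points. The sketch below is an assumption about how such pairs might be listed: the text gives only two example pairs, so whether every cross-combination was actually analysed is not stated. Sample names are taken from the deposited file list; `adjacent_pairs` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List, Tuple

# Samples present in this dataset, grouped by time point (hpf)
SAMPLES: Dict[int, List[str]] = {
    0: ["0hpf-2", "0hpf-3"],
    2: ["2hpf-1", "2hpf-2", "2hpf-3"],
    4: ["4hpf-2", "4hpf-3"],
}


def adjacent_pairs(samples: Dict[int, List[str]]) -> List[Tuple[str, str]]:
    """Pair every sample with every sample from the next time point."""
    timepoints = sorted(samples)
    pairs: List[Tuple[str, str]] = []
    for early, late in zip(timepoints, timepoints[1:]):
        pairs.extend(product(samples[early], samples[late]))
    return pairs
```

With the samples listed here, the two example comparisons from the text, `("0hpf-3", "2hpf-1")` and `("2hpf-1", "4hpf-3")`, are both among the enumerated pairs, while 0 hpf samples are never paired directly with 4 hpf samples.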
All three tools were run with the same Ensembl annotation (GRCz11.110). Each tool was run using the basic code provided in its manual, with the following modifications:
For change point (version 0.1.1): the shortening mode was used, and the default minimum of 20 reads was required for each region to be analysed.
For APAlyzer (version 1.0): 3′ end files were created using the ThreeMostPair function, and reference files were created from the Ensembl annotation file (GRCz11.110) with PAS2GEF.
For Peaks2UTR (version 1.1.2): the override-utr option was used, and the maximum distance was set to 3000 bases, as zebrafish UTRs tend to be long.
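As an illustration of the Peaks2UTR settings described above, the command line could be assembled as follows. This is a sketch under stated assumptions: the flag spellings `--max-distance` and `--override-utr` and the positional order (annotation first, then BAM) should be checked against `peaks2utr --help` for version 1.1.2, and the file names in the usage comment are hypothetical placeholders.

```python
from typing import List

MAX_DISTANCE = 3000  # bases, as stated in the text


def peaks2utr_command(annotation: str, bam: str,
                      max_distance: int = MAX_DISTANCE) -> List[str]:
    """Build a peaks2utr call with UTR override and an extended search distance.

    Flag names and argument order are assumptions; verify with `peaks2utr --help`.
    """
    return ["peaks2utr",
            "--max-distance", str(max_distance),
            "--override-utr",
            annotation, bam]


# Hypothetical usage:
# cmd = peaks2utr_command("GRCz11.110.gff3", "2hpf-1.bam")
# subprocess.run(cmd, check=True)
```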
Access information
The libraries were created for this study; this is therefore the first upload of the raw data.