Bulk RNA sequencing the zebrafish sox17 lineage at 8, 12, and 24 hours post-fertilization
Data files
May 03, 2025 version files (17.51 MB total)
- All_genes.Weiner_Woo_0662_copy.xlsx (15.75 MB)
- counts.STARZEBRAFISH.txt (1.74 MB)
- Log.STARZEBRAFISH.txt (3.23 KB)
- README.md (12.72 KB)
Abstract
Although much progress has been made in identifying the molecular and genetic factors that contribute to gastrointestinal tract development and disease, how these factors are translated into dynamic cell behaviors that shape the organ is less well understood. The zebrafish is an excellent model system for studying the cell biology of gut formation in vivo as the embryo develops externally, is optically transparent, and is amenable to numerous methods for manipulating gene function. The gastrointestinal epithelium is derived from the endoderm. In zebrafish, endodermal cells are specified just prior to gastrulation and soon after become highly motile and quickly disperse across the inner surface of the embryo. These scattered cells then undergo a switch in migratory behavior to converge into a coherent endodermal sheet, which ultimately gives rise to the epithelial lining of the gut tube. The aim of our studies is to identify the cell biological mechanisms driving the transition from single-cell migration to epithelial sheet formation.
https://doi.org/10.5061/dryad.nk98sf83h
Contact Stephanie Woo (swoo6@ucmerced.edu) for any questions. These data are described in the following bioRxiv preprint:
LaBelle, J., Wyatt, T., and Woo, S. Endodermal cells use contact inhibition of locomotion to achieve uniform cell dispersal during zebrafish gastrulation. bioRxiv 2023.06.01.543209. doi: https://doi.org/10.1101/2023.06.01.543209
Description of the data and file structure
Project Background: In zebrafish, endodermal cells are specified just prior to gastrulation and soon after become highly motile and quickly disperse across the inner surface of the embryo. These scattered cells then undergo a switch in migratory behavior to converge into a coherent endodermal sheet, which ultimately gives rise to the epithelial lining of the gut tube. To identify the cell biological mechanisms driving the transition from single-cell migration to epithelial sheet formation, we performed transcriptional profiling by RNA sequencing of endodermal cells at three time points, each representing a different morphogenetic movement: 8 hpf (contact inhibition of locomotion, CIL), 12 hpf (convergence), and 24 hpf (mature adhesion).
Sample preparation: GFP-positive cells were isolated from Tg(sox17:GFP) embryos at 8, 12, and 24 hpf by fluorescence-activated cell sorting. For each time point, at least 100,000 cells were collected per replicate. We collected 4 replicates at 8 hpf (n = 4), 4 replicates at 12 hpf (n = 4), and 3 replicates at 24 hpf (n = 3). RNA was extracted from sorted GFP-positive cells using the RNAqueous-Micro Kit (Ambion). Library construction was performed using the Illumina TruSeq mRNA stranded kit. 50-base pair (bp) single-end sequencing was performed on an Illumina HiSeq 4000 machine.
Analysis: Sequencing yielded approximately 1.4 billion reads with an average read depth of 62 million reads per sample. Reads were then normalized and aligned to the zebrafish genome (GRCz10.87) using the STAR_2.5.2b aligner. Reads that mapped uniquely to known mRNAs were used to assess differential expression. An unusually high number of differentially expressed (DE) genes was detected across these samples at a false discovery rate (FDR) threshold of 0.1. To compensate for this, we used the stricter cutoff of FDR < 1e-5 to call DE genes.
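The effect of tightening the FDR cutoff can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used to produce the dataset: the function name `select_de_genes` and the toy rows are hypothetical, and the FDR values are invented for the example. Only the column name `age_12hpfvs8hpf.FDR` comes from the spreadsheet described in this README.

```python
def select_de_genes(rows, fdr_column, cutoff=1e-5):
    """Return rows whose FDR value is strictly below the cutoff.

    rows: iterable of dicts, one per gene.
    fdr_column: name of the adjusted p-value column to test.
    Rows with a missing or non-numeric FDR are skipped.
    """
    selected = []
    for row in rows:
        try:
            fdr = float(row[fdr_column])
        except (KeyError, TypeError, ValueError):
            continue
        if fdr < cutoff:  # the comparison is also False for NaN
            selected.append(row)
    return selected


# Illustrative values only, not taken from the dataset.
toy_rows = [
    {"Gene": "geneA", "age_12hpfvs8hpf.FDR": 3e-9},
    {"Gene": "geneB", "age_12hpfvs8hpf.FDR": 0.04},
    {"Gene": "geneC", "age_12hpfvs8hpf.FDR": "NA"},
    {"Gene": "geneD", "age_12hpfvs8hpf.FDR": 0.5},
]

lenient = select_de_genes(toy_rows, "age_12hpfvs8hpf.FDR", cutoff=0.1)
strict = select_de_genes(toy_rows, "age_12hpfvs8hpf.FDR")  # default 1e-5
print([r["Gene"] for r in lenient])
print([r["Gene"] for r in strict])
```

With these toy values, two genes pass at 0.1 and only one passes at 1e-5, which mirrors why the stricter cutoff was chosen for a dataset with a very large number of DE calls.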
Files and variables
File: All_genes.Weiner_Woo_0662_copy.xlsx
Description: Contains all statistics for all pairwise comparisons as well as read counts for each sample (in counts per million reads).
Variables: Column headings defined below:
A. Ensembl_ID: Unique identifier for each gene assigned by the Ensembl genome database
B. Gene: Gene symbol as approved by the Zebrafish Information Network (ZFIN) Zebrafish Nomenclature Committee
C. Description: Full-length gene name as approved by ZFIN
D. Chromosome_Name: Chromosome the gene is located on according to genome assembly version Ensembl GRCz10.87.
E. Start_Position: Start position in base pairs of the gene according to genome assembly version Ensembl GRCz10.87.
F. End_Position: End position in base pairs of the gene according to genome assembly version Ensembl GRCz10.87.
G. Strand: Strand location of the gene. 1 indicates the forward strand, -1 indicates the reverse strand according to genome assembly version Ensembl GRCz10.87.
H. Biotype: Ensembl annotation of the gene, as follows:
- antisense, transcripts that overlap the genomic span (i.e. exons or introns) of a protein-coding locus on the opposite strand
- IG_C_gene, constant chain immunoglobulin gene that undergoes somatic recombination before transcription
- IG_C_pseudogene, inactivated constant chain immunoglobulin gene
- IG_J_pseudogene, inactivated joining chain immunoglobulin gene
- IG_pseudogene, inactivated immunoglobulin gene
- IG_V_pseudogene, inactivated variable chain immunoglobulin gene
- lincRNA, long intergenic non-coding RNA
- miRNA, microRNA
- misc_RNA, miscellaneous other RNA
- Mt_rRNA, ribosomal RNA located in the mitochondrial genome
- Mt_tRNA, transfer RNA located in the mitochondrial genome
- polymorphic_pseudogene, pseudogene owing to a SNP/indel but in other individuals/haplotypes/strains the gene is translated
- processed_pseudogene, pseudogene that lacks introns and is thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome
- processed_transcript, gene or transcript that doesn’t contain an open reading frame (ORF)
- protein_coding, gene or transcript that contains an ORF
- pseudogene, a gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF
- ribozyme, catalytically active RNA
- rRNA, ribosomal RNA
- scaRNA, small Cajal body-specific RNA
- sense_intronic, a long non-coding transcript in introns of a coding gene that does not overlap any exons
- sense_overlapping, a long non-coding transcript that contains a coding gene in its intron on the same strand
- snRNA, small nuclear RNA
- snoRNA, small nucleolar RNA
- sRNA, small cytoplasmic RNA
- TEC, to be experimentally confirmed
- TR_D_gene, diversity chain T cell receptor gene
- TR_J_gene, joining chain T cell receptor gene
- TR_V_gene, variable chain T cell receptor gene
- TR_V_pseudogene, inactivated variable chain T cell receptor gene
- transcribed_unprocessed, pseudogene where protein homology or genomic structure indicates a pseudogene, but the presence of locus-specific transcripts indicates expression; retains introns
- unprocessed_pseudogene, pseudogene that can contain introns because it was produced by gene duplication
I. GC_Percent: Percentage of guanine (G) and cytosine (C) bases in the gene sequence
J. Total_Exon_Length: Total length in base pairs of the exonic sequences
K. RefSeq_ID: Corresponding reference sequence identifier in the NCBI RefSeq database
L. Unigene ID: Corresponding unique identifier in the NCBI Unigene transcriptome database
M. Entrez_ID: Corresponding unique identifier in the NCBI Gene database
N. Mean_Normalized_Counts, normalized read counts averaged across all replicates and time points
O. age_12hpfvs8hpf.log2FC, log2 fold change for 12 versus 8 hours post-fertilization comparison
P. age_12hpfvs8hpf.FC, unlogged fold change for 12 versus 8 hours post-fertilization comparison
Q. age_12hpfvs8hpf.RawP, unadjusted p value for 12 versus 8 hours post-fertilization comparison
R. age_12hpfvs8hpf.FDR, p value adjusted for multiple comparisons for 12 versus 8 hours post-fertilization comparison
S. age_24hpfvs8hpf.log2FC, log2 fold change for 24 versus 8 hours post-fertilization comparison
T. age_24hpfvs8hpf.FC, unlogged fold change for 24 versus 8 hours post-fertilization comparison
U. age_24hpfvs8hpf.RawP, unadjusted p value for 24 versus 8 hours post-fertilization comparison
V. age_24hpfvs8hpf.FDR, p value adjusted for multiple comparisons for 24 versus 8 hours post-fertilization comparison
W. age_24hpfvs12hpf.log2FC, log2 fold change for 24 versus 12 hours post-fertilization comparison
X. age_24hpfvs12hpf.FC, unlogged fold change for 24 versus 12 hours post-fertilization comparison
Y. age_24hpfvs12hpf.RawP, unadjusted p value for 24 versus 12 hours post-fertilization comparison
Z. age_24hpfvs12hpf.FDR, p value adjusted for multiple comparisons for 24 versus 12 hours post-fertilization comparison
AA. sox17_12hpf_1.norm, normalized read count for 12 hours post-fertilization, replicate 1
AB. sox17_12hpf_2.norm, normalized read count for 12 hours post-fertilization, replicate 2
AC. sox17_12hpf_3.norm, normalized read count for 12 hours post-fertilization, replicate 3
AD. sox17_12hpf_4.norm, normalized read count for 12 hours post-fertilization, replicate 4
AE. sox17_24hpf_1.norm, normalized read count for 24 hours post-fertilization, replicate 1
AF. sox17_24hpf_2.norm, normalized read count for 24 hours post-fertilization, replicate 2
AG. sox17_24hpf_3.norm, normalized read count for 24 hours post-fertilization, replicate 3
AH. sox17_8hpf_1.norm, normalized read count for 8 hours post-fertilization, replicate 1
AI. sox17_8hpf_2.norm, normalized read count for 8 hours post-fertilization, replicate 2
AJ. sox17_8hpf_3.norm, normalized read count for 8 hours post-fertilization, replicate 3
AK. sox17_8hpf_4.norm, normalized read count for 8 hours post-fertilization, replicate 4
AL. sox17_12hpf_1.stabilized, read count adjusted for variance across sample conditions for 12 hours post-fertilization, replicate 1
AM. sox17_12hpf_2.stabilized, read count adjusted for variance across sample conditions for 12 hours post-fertilization, replicate 2
AN. sox17_12hpf_3.stabilized, read count adjusted for variance across sample conditions for 12 hours post-fertilization, replicate 3
AO. sox17_12hpf_4.stabilized, read count adjusted for variance across sample conditions for 12 hours post-fertilization, replicate 4
AP. sox17_24hpf_1.stabilized, read count adjusted for variance across sample conditions for 24 hours post-fertilization, replicate 1
AQ. sox17_24hpf_2.stabilized, read count adjusted for variance across sample conditions for 24 hours post-fertilization, replicate 2
AR. sox17_24hpf_3.stabilized, read count adjusted for variance across sample conditions for 24 hours post-fertilization, replicate 3
AS. sox17_8hpf_1.stabilized, read count adjusted for variance across sample conditions for 8 hours post-fertilization, replicate 1
AT. sox17_8hpf_2.stabilized, read count adjusted for variance across sample conditions for 8 hours post-fertilization, replicate 2
AU. sox17_8hpf_3.stabilized, read count adjusted for variance across sample conditions for 8 hours post-fertilization, replicate 3
AV. sox17_8hpf_4.stabilized, read count adjusted for variance across sample conditions for 8 hours post-fertilization, replicate 4
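The per-sample column headings encode both the time point and the replicate number, so they can be grouped programmatically. The following is a rough sketch under the assumption that headings follow the `sox17_<N>hpf_<replicate>.<kind>` pattern listed above. The helper names `group_sample_columns` and `mean_by_time_point` and the toy row are hypothetical, and the numbers are invented for illustration.

```python
import re
from collections import defaultdict

# Matches headings such as "sox17_12hpf_1.norm" or "sox17_8hpf_4.stabilized".
# Optional whitespace after the period is tolerated.
SAMPLE_COLUMN = re.compile(r"^sox17_(\d+)hpf_(\d+)\.\s*(norm|stabilized)$")


def group_sample_columns(headings, kind="norm"):
    """Map each time point (hpf, as int) to its column headings of one kind."""
    groups = defaultdict(list)
    for heading in headings:
        match = SAMPLE_COLUMN.match(heading.strip())
        if match and match.group(3) == kind:
            groups[int(match.group(1))].append(heading)
    return dict(groups)


def mean_by_time_point(row, groups):
    """Average one gene's values over the replicates of each time point."""
    return {
        hpf: sum(float(row[col]) for col in cols) / len(cols)
        for hpf, cols in groups.items()
    }


# Toy headings and values, for illustration only.
headings = [
    "Gene",
    "sox17_8hpf_1.norm",
    "sox17_8hpf_2.norm",
    "sox17_12hpf_1.norm",
    "sox17_12hpf_1.stabilized",
]
row = {
    "Gene": "geneA",
    "sox17_8hpf_1.norm": 10,
    "sox17_8hpf_2.norm": 30,
    "sox17_12hpf_1.norm": 5,
    "sox17_12hpf_1.stabilized": 2.5,
}

groups = group_sample_columns(headings)
means = mean_by_time_point(row, groups)
print(groups)
print(means)
```

Passing `kind="stabilized"` selects the variance-adjusted columns instead. Note that a simple mean of replicates is only a convenience summary and is not necessarily how the fold-change columns were derived.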
File: counts.STARZEBRAFISH.txt
Description: Tab-delimited text file containing the raw (unnormalized) read counts per gene for each sample
Variables: Column headings defined below
· Ensembl_ID, Unique identifier for each gene assigned by the Ensembl genome database
· sox17_12hpf_1, read count for 12 hours post-fertilization, replicate 1
· sox17_12hpf_2, read count for 12 hours post-fertilization, replicate 2
· sox17_12hpf_3, read count for 12 hours post-fertilization, replicate 3
· sox17_12hpf_4, read count for 12 hours post-fertilization, replicate 4
· sox17_24hpf_1, read count for 24 hours post-fertilization, replicate 1
· sox17_24hpf_2, read count for 24 hours post-fertilization, replicate 2
· sox17_24hpf_3, read count for 24 hours post-fertilization, replicate 3
· sox17_8hpf_1, read count for 8 hours post-fertilization, replicate 1
· sox17_8hpf_2, read count for 8 hours post-fertilization, replicate 2
· sox17_8hpf_3, read count for 8 hours post-fertilization, replicate 3
· sox17_8hpf_4, read count for 8 hours post-fertilization, replicate 4
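As a starting point for working with the raw counts, the sketch below parses a tab-delimited table with an `Ensembl_ID` column and scales each sample to counts per million (CPM). It assumes the header row uses the column headings listed above. The function names `read_counts` and `counts_per_million` are hypothetical, the toy table is invented, and plain library-size scaling may differ from the normalization used for the `.norm` columns in the spreadsheet.

```python
import csv
import io


def read_counts(handle):
    """Parse a tab-delimited count table into {sample: {gene_id: count}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [name for name in reader.fieldnames if name != "Ensembl_ID"]
    counts = {sample: {} for sample in samples}
    for row in reader:
        for sample in samples:
            counts[sample][row["Ensembl_ID"]] = int(row[sample])
    return counts


def counts_per_million(sample_counts):
    """Scale one sample's raw counts by its library size (total counts)."""
    total = sum(sample_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in sample_counts}
    return {gene: count * 1_000_000 / total
            for gene, count in sample_counts.items()}


# Toy table for illustration; gene IDs and counts are made up.
toy_table = (
    "Ensembl_ID\tsox17_8hpf_1\tsox17_12hpf_1\n"
    "GENE_A\t30\t0\n"
    "GENE_B\t10\t5\n"
)
counts = read_counts(io.StringIO(toy_table))
cpm = {sample: counts_per_million(c) for sample, c in counts.items()}
print(cpm["sox17_8hpf_1"])

# For the real file, something like:
# with open("counts.STARZEBRAFISH.txt", newline="") as handle:
#     counts = read_counts(handle)
```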
File: Log.STARZEBRAFISH.txt
Description: Tab-delimited text file containing summary statistics on the read mapping of each sample
Variables: Column headings defined below
· Stat, Parameter of the mapping process
· sox17_12hpf_1, 12 hours post-fertilization, replicate 1
· sox17_12hpf_2, 12 hours post-fertilization, replicate 2
· sox17_12hpf_3, 12 hours post-fertilization, replicate 3
· sox17_12hpf_4, 12 hours post-fertilization, replicate 4
· sox17_24hpf_1, 24 hours post-fertilization, replicate 1
· sox17_24hpf_2, 24 hours post-fertilization, replicate 2
· sox17_24hpf_3, 24 hours post-fertilization, replicate 3
· sox17_8hpf_1, 8 hours post-fertilization, replicate 1
· sox17_8hpf_2, 8 hours post-fertilization, replicate 2
· sox17_8hpf_3, 8 hours post-fertilization, replicate 3
· sox17_8hpf_4, 8 hours post-fertilization, replicate 4
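Because the log file stores one mapping statistic per row and one sample per column, it can be convenient to transpose it into a per-sample lookup. The sketch below assumes only the layout described above (a `Stat` column followed by sample columns). The function name `read_mapping_log` is hypothetical, and the statistic names `stat_one` and `stat_two` are placeholders, since the actual row labels are not listed in this README.

```python
import csv
import io


def read_mapping_log(handle):
    """Return {sample: {stat_name: value_as_text}} from a tab-delimited log."""
    reader = csv.DictReader(handle, delimiter="\t")
    per_sample = {}
    for row in reader:
        stat = row["Stat"]
        for sample, value in row.items():
            if sample == "Stat":
                continue
            # Values are kept as text because the log may mix counts,
            # percentages, and other formats.
            per_sample.setdefault(sample, {})[stat] = value
    return per_sample


# Toy log for illustration; statistic names and values are placeholders.
toy_log = (
    "Stat\tsox17_8hpf_1\tsox17_8hpf_2\n"
    "stat_one\t100\t200\n"
    "stat_two\t90%\t85%\n"
)
log = read_mapping_log(io.StringIO(toy_log))
print(log["sox17_8hpf_2"])

# For the real file, something like:
# with open("Log.STARZEBRAFISH.txt", newline="") as handle:
#     log = read_mapping_log(handle)
```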
Code/software
The file “All_genes.Weiner_Woo_0662_copy.xlsx” can be opened and viewed with Microsoft Excel. All other files can be opened with any text editor.
In this study, GFP-positive cells were isolated from transgenic Tg(sox17:GFP)s870 zebrafish embryos by fluorescence-activated cell sorting at 8, 12, and 24 hours post-fertilization. Total RNA was extracted from the sorted cells. Library preparation and sequencing were performed by the Genomics Core at the University of California San Francisco (Core manager, Andrea Barczak; Core director, David Erle).
Sequencing type: Single-end 50bp RNAseq
Library Kit: Illumina TruSeq mRNA stranded
Machine: Illumina HiSeq 4000
Aligner: STAR_2.5.2b
Alignment Genome: Ensembl Zebrafish GRCz10.87