Library transgenesis in zebrafish through delayed site-specific mosaic integration for in vivo pooled screening of transgenes
Data files
Feb 10, 2026 version files 50.66 GB
6978_001_S58_R2_001.fastq (2.70 GB)
7462_001_S101_R1_001.fastq (2.18 GB)
Fish1_barcoded-GFP_25_5_26_25.nd2 (453.74 MB)
Fish1_FP-library_hind-fore_8_19_25.nd2 (4.35 GB)
Fish1_FP-library_spinal_8_19_25.nd2 (1.29 GB)
Fish10_barcoded-GFP_25_5_26_25.nd2 (378.15 MB)
Fish11_barcoded-GFP_25_5_26_25.nd2 (504 MB)
Fish12_barcoded-GFP_25_5_26_25.nd2 (478.95 MB)
Fish2_barcoded-GFP_25_5_26_25.nd2 (453.74 MB)
Fish2_FP-library_hind_8_19_25.nd2 (4.08 GB)
Fish3_barcoded-GFP_25_5_26_25.nd2 (604.80 MB)
Fish3_FP-library_hind-fore_8_19_25.nd2 (4.21 GB)
Fish4_barcoded-GFP_25_5_26_25.nd2 (428.49 MB)
Fish4_FP-library_hind-mid_8_19_25.nd2 (5.37 GB)
Fish5_barcoded-GFP_25_5_26_25.nd2 (327.81 MB)
Fish5_FP-library_hind-mid_8_19_25.nd2 (4.28 GB)
Fish6_barcoded-GFP_25_5_26_25.nd2 (478.83 MB)
Fish6_FP-library_dpf3_fore_25_12_5.nd2 (2.24 GB)
Fish6_FP-library_dpf3_hind_25_12_5.nd2 (1.71 GB)
Fish7_barcoded-GFP_25_5_26_25.nd2 (327.81 MB)
Fish7_FP-library_hind_9_10_25.nd2 (881.55 MB)
Fish8_barcoded-GFP_25_5_26_25.nd2 (403.44 MB)
Fish8_FP-library_hind_25_12_27.nd2 (1.23 GB)
Fish9_barcoded-GFP_25_5_26_25.nd2 (403.44 MB)
FP-library_dpf3_fore_25_12_5.nd2 (2.24 GB)
FP-library_dpf3_hind_25_12_5.nd2 (1.71 GB)
FP-library_dpf3_spinal_25_12_5.nd2 (680.28 MB)
FP-library_dpf3_spinal2_25_12_5.nd2 (982.32 MB)
FP-library_dpf5_25_9_10_10xObjective.nd2 (151.50 MB)
FP-library_dpf5_fore_25_12_27.nd2 (1.84 GB)
FP-library_dpf5_hind-spinal_25_12_27.nd2 (1.01 GB)
FP-library_dpf5_mid-hind_25_12_27.nd2 (1.54 GB)
FP-library_dpf5_spinal_25_12_27.nd2 (730.62 MB)
README.md (3.60 KB)
Abstract
This dataset is from a study developing a zebrafish library transgenesis method that uses delayed site-specific integration to achieve single-transgene-per-cell mosaicism with high library diversity. The raw imaging data include confocal Z-stacks (.nd2 files) of zebrafish larvae expressing a two-component library of fluorescent proteins (GFP and mScarlet) in neurons, captured across multiple brain regions to assess transgene mutual exclusivity in mosaic animals. The raw sequencing data (.fastq files) include Illumina amplicon reads generated from genomic DNA of 12 mosaic larvae with integrated libraries of barcoded GFP constructs (6978-R2), as well as reads from the injected source barcode library (7462-R1). Larvae were injected with a complex library of barcoded plasmids, enabling quantification of independent integration events.
Dataset DOI: 10.5061/dryad.d2547d8h0
Description of the data and file structure
This dataset contains confocal microscopy images and amplicon sequencing data characterizing a zebrafish library transgenesis method using delayed site-specific integration. It includes all the raw imaging data associated with Fig. 2, S1, S2, and Table S1, and all the raw Illumina sequencing data associated with Fig. 3, S3, S4, S5, S6, and Tables S2, S3.
Files and variables
GFP-mScarlet Fluorescent Library Imaging
Format: .nd2 (Nikon raw imaging format, confocal hyperstacks).
Experimental setup: pIGLET embryos were injected with a 50:50 mix of GFP-CAAX and mScarlet-CAAX plasmids. Larvae were imaged at 3 dpf or 5 dpf, live or fixed in 4% PFA, mounted in 1% agarose. Imaging used spinning disk confocal microscopy (Yokogawa CSU-W1 on Nikon Ti) with 10×/40× objectives, 488 nm (GFP) and 561 nm (mScarlet) channels, and Z-stacks. These images were used to quantify integration mutual exclusivity (prevalence of neurons with single vs. multiple transgenes per cell). Associated with Fig. 2, S2, Table S1.
File naming:
Files used for mutual-exclusivity quantification (Table S1): [FishID]_FP-library_[Brain_Region]_[Date_acquired].nd2
Files used to produce representative images (Figure 2, S2): FP-library_[Age]_[Region]_[Date].nd2
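The two naming patterns above can be separated programmatically when batch-processing the imaging files. The following is a minimal sketch, not part of the published pipeline: the regular expression and the function name `parse_fp_library_name` are illustrative assumptions. Because some file names in this dataset carry an age token after the fish ID, omit the region, or add a trailing suffix, the age, region, and suffix fields are all treated as optional, and the date is kept as a raw string since its field order varies between acquisition sessions.

```python
import re
from typing import Optional

# Hypothetical pattern covering both FP-library naming schemes described above.
FP_LIBRARY_NAME = re.compile(
    r"^(?:(?P<fish>Fish\d+)_)?FP-library_"       # optional fish ID prefix
    r"(?:(?P<age>dpf\d+)_)?"                     # optional age, e.g. dpf3
    r"(?:(?P<region>[A-Za-z][A-Za-z0-9-]*)_)?"   # optional region, e.g. hind-fore
    r"(?P<date>\d+_\d+_\d+)"                     # acquisition date, kept verbatim
    r"(?:_(?P<suffix>.+))?\.nd2$"                # optional trailing note
)


def parse_fp_library_name(filename: str) -> Optional[dict]:
    """Split an FP-library .nd2 file name into its fields, or return None."""
    match = FP_LIBRARY_NAME.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    # Files with a fish ID were used for quantification (Table S1);
    # files without one were used for representative images.
    info["use"] = "quantification" if info["fish"] else "representative"
    return info
```

For example, `parse_fp_library_name("Fish1_FP-library_hind-fore_8_19_25.nd2")` yields the fish ID `Fish1`, the region `hind-fore`, and the date string `8_19_25`.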
Barcoded GFP Library Imaging
Format: .nd2 (Nikon raw imaging format, confocal hyperstacks).
Experimental setup: pIGLET embryos were injected with a library of barcoded GFP-CAAX plasmids (each with a unique 15-nt random barcode). Larvae were raised to 5 dpf and imaged live before genome extraction. Confocal imaging (same setup as above) with 488 nm channel to confirm GFP expression before selecting fish for genomic DNA extraction and sequencing. These 12 fish correspond to the samples in sequencing file 6978. Associated with Fig. S1.
File naming: Fish[N]_barcoded-GFP_25_5_26_25.nd2
Sequencing Data Files
Format: .fastq
Experimental setup: 12 larvae injected with the barcoded plasmid library at the 1-cell stage were raised to 5 dpf, then genomic DNA was extracted from whole bodies. Junction-spanning PCR amplified the integrated barcodes and added 5-nt sample barcodes and Illumina overhangs. The source library was amplified directly with primers adding Illumina overhangs. Samples were sequenced on an Illumina NextSeq 2000 (paired-end 2×150 bp). Associated with: Fig. 3, S3, S4, S5, S6, Tables S2, S3.
File naming: [SampleID]_001_S[RunNumber]_R[ReadDirection]_001.fastq
6978_001_S58_R2_001.fastq - Fish-derived amplicons (12 fish pooled, multiplexed)
7462_001_S101_R1_001.fastq - Source library amplicons
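As an illustration of the sequencing file naming convention, the sketch below parses a file name into its sample ID, run number, and read direction, and iterates over FASTQ records using only the Python standard library. It is not taken from the published pipeline; the names `parse_fastq_name` and `read_fastq` are hypothetical, and the reader assumes the usual four-line Illumina record layout (no wrapped sequence lines).

```python
import re
from typing import Iterator, Optional, TextIO, Tuple

# [SampleID]_001_S[RunNumber]_R[ReadDirection]_001.fastq
FASTQ_NAME = re.compile(
    r"^(?P<sample>\d+)_001_S(?P<run>\d+)_R(?P<read>[12])_001\.fastq$"
)


def parse_fastq_name(filename: str) -> Optional[dict]:
    """Return sample ID, run number, and read direction, or None if no match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality
```

Because the files are several gigabytes each, streaming records one at a time as above avoids loading a whole file into memory.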
Code/software
The complete Python pipeline for processing the sequencing (.fastq) data is available at: https://github.com/shaharbr/library_transgenesis
The pipeline performs demultiplexing of the 12 fish samples (using the 5-nt sample barcodes) from sample 6978, extraction of 15-nt variable barcodes (via flanking anchor sequences) from both samples, error correction (Levenshtein distance ≤1 clustering), quality filtering (≥3 reads/fish, ≥2 reads for source library), and diversity analysis. It reproduces all sequencing figures (Fig. 3D-F, S3-S6) and tables (Tables S2-S3).
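The processing steps listed above can be sketched in a few short functions. This is an illustrative outline only, not the code in the repository: the function names are hypothetical, the anchor sequences must be supplied by the user (they are not given here), the sample barcode is assumed to occupy the first 5 bases of each read, and the clustering shown is a simple greedy scheme that merges each barcode into the most abundant existing cluster within one edit. The repository remains the authoritative implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def demultiplex(reads: Iterable[str], sample_barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by fish. ASSUMPTION: the sample barcode is the first 5 bases."""
    by_fish: Dict[str, List[str]] = {fish: [] for fish in sample_barcodes.values()}
    for seq in reads:
        fish = sample_barcodes.get(seq[:5])
        if fish is not None:  # reads with an unrecognized sample barcode are dropped
            by_fish[fish].append(seq[5:])
    return by_fish


def extract_barcode(seq: str, left_anchor: str, right_anchor: str, length: int = 15) -> Optional[str]:
    """Return the barcode between the two anchors, or None if they do not flank it."""
    start = seq.find(left_anchor)
    if start == -1:
        return None
    start += len(left_anchor)
    end = start + length
    if seq[end:end + len(right_anchor)] != right_anchor:
        return None
    return seq[start:end]


def within_one_edit(a: str, b: str) -> bool:
    """True if the Levenshtein distance between a and b is at most 1."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):  # exactly one substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make a the shorter string
    # one insertion/deletion: removing a single base from b must give a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def cluster_barcodes(counts: Counter) -> Counter:
    """Greedy error correction (an assumed scheme): most abundant barcodes seed clusters."""
    clustered: Counter = Counter()
    for barcode, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = next((c for c in clustered if within_one_edit(barcode, c)), None)
        clustered[parent if parent is not None else barcode] += n
    return clustered


def filter_by_reads(counts: Dict[str, int], min_reads: int) -> Dict[str, int]:
    """Keep barcodes supported by at least min_reads reads."""
    return {bc: n for bc, n in counts.items() if n >= min_reads}
```

With the thresholds stated above, `filter_by_reads(clustered, 3)` would apply to each fish and `filter_by_reads(clustered, 2)` to the source library. The pairwise comparison in `cluster_barcodes` is quadratic in the number of distinct barcodes, so a highly diverse library would need an indexed neighbor search instead.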
The raw imaging files (.nd2) can be opened with ImageJ/Fiji using the Bio-Formats plugin.
