Dryad

Scalable and cost-efficient custom gene library assembly from oligopools

Data files

Apr 09, 2026 version; files: 269.96 GB

Abstract

This dataset contains all next-generation sequencing (NGS) data generated to evaluate the OMEGA (Oligo-based Multiplexed Efficient Gene Assembly) platform for multiplexed gene library construction. It includes PacBio HiFi long-read sequencing (BAM with index and metadata), Illumina paired-end sequencing (FASTQ), and Oxford Nanopore long-read sequencing (FASTQ) across assembly validation (Rubisco and Cas9 libraries), amplicon-based quality control (PCR and JSCAN), and functional screening of GFP variant libraries. Data are organized by sequencing platform and experiment type, and include replicate-level presort libraries as well as fluorescence-activated cell sorting (FACS) populations spanning multiple fluorescence bins and negative controls. These files enable reconstruction of full-length variants, quantification of assembly accuracy and uniformity, and analysis of sequence–function relationships. All data are in standard DNA sequencing formats compatible with common open-source tools and are suitable for benchmarking gene assembly methods, developing analysis pipelines, and training machine learning models. All sequences are synthetic or non-pathogenic research constructs; no human or clinical data are included, and there are no ethical or legal restrictions on reuse.
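As an illustration of how the FASTQ files might be inspected with standard tooling, the sketch below reads four-line FASTQ records using only the Python standard library, tallies identical read sequences, and summarizes how evenly reads are spread across them with a Gini coefficient (0 means perfectly even). This is not the authors' analysis pipeline: the function names, the demo records, and the choice of Gini as a uniformity measure are assumptions made for this example, and a real analysis would map reads to library members rather than count exact sequence matches.

```python
import gzip
import io
from collections import Counter


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz compression."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].strip(), seq, qual


def count_sequences(records):
    """Count how many reads carry each exact sequence."""
    return Counter(seq for _, seq, _ in records)


def gini(counts):
    """Gini coefficient of a list of non-negative counts (0 = perfectly even)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((rank + 1) * v for rank, v in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Made-up records for demonstration only; not taken from the dataset.
    demo = io.StringIO(
        "@r1\nACGT\n+\nIIII\n"
        "@r2\nACGT\n+\nIIII\n"
        "@r3\nGGCC\n+\nIIII\n"
    )
    counts = count_sequences(read_fastq(demo))
    print(counts.most_common())
    print(gini(list(counts.values())))
```

To apply this to a downloaded file, the in-memory demo stream would be replaced with `open_fastq(path)`, where `path` is whichever FASTQ file was retrieved. The PacBio BAM files need a dedicated BAM reader and are outside the scope of this sketch.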