Characterizing the genotype-phenotype relationships of biomolecules (e.g., ribozymes) requires accurate activity measurements for large sets of molecules. Kinetic measurement using high-throughput sequencing (e.g., k-Seq) is an emerging assay, applicable in various domains, that can potentially scale measurement throughput to more than 10⁶ unique nucleic acid sequences. However, maximizing the return of such assays requires understanding the technical challenges introduced by sequence heterogeneity and DNA sequencing. We characterized the k-Seq method in terms of model identifiability, effects of sequencing error, accuracy, and precision, using simulated datasets and experimental data from a variant pool constructed from previously identified ribozymes. Relative abundance, kinetic coefficients, and measurement noise were found to affect the measurement of each sequence. We introduced bootstrapping to robustly quantify the uncertainty in estimating model parameters and proposed interpretable metrics to quantify model identifiability. These efforts enabled rigorous reporting of data quality for individual sequences in k-Seq experiments. Here we present detailed protocols, define critical experimental factors, and identify general guidelines to maximize the number of sequences and their measurement accuracy from k-Seq data. Analogous practices could be applied to improve the rigor of other sequencing-based assays.
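
The bootstrapping idea can be illustrated with a minimal sketch. It assumes a generic pseudo-first-order form f = A(1 − exp(−k·c·t)), where f is the reacted fraction of a sequence, c the substrate concentration, t a fixed reaction time, A the maximum reacted fraction, and k the rate coefficient. The reaction time T_MIN = 90, the function names, the toy data, and the simple resampling of data points are all illustrative assumptions; the paper's exact model parameterization and bootstrap procedure may differ. The sketch fits A and k by least squares with SciPy's curve_fit, refits on data points resampled with replacement, and reports a 95% percentile interval for k.

```python
import numpy as np
from scipy.optimize import curve_fit

T_MIN = 90.0  # assumed fixed reaction time (minutes); illustrative value only


def reacted_fraction_model(c, A, k):
    """Generic pseudo-first-order model: f = A * (1 - exp(-k * c * T_MIN))."""
    return A * (1.0 - np.exp(-k * c * T_MIN))


def fit_A_k(c, f):
    """Least-squares fit of (A, k) with A constrained to [0, 1] and k >= 0."""
    p0 = [0.5, 1.0 / (T_MIN * np.median(c))]  # rough starting guess
    popt, _ = curve_fit(reacted_fraction_model, c, f, p0=p0,
                        bounds=([0.0, 0.0], [1.0, np.inf]))
    return popt


def bootstrap_k(c, f, n_boot=200, seed=0):
    """Refit on data points resampled with replacement; return (k_hat, 2.5th pct, 97.5th pct)."""
    c, f = np.asarray(c, dtype=float), np.asarray(f, dtype=float)
    rng = np.random.default_rng(seed)
    k_hat = fit_A_k(c, f)[1]
    ks = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(c), size=len(c))
        try:
            ks.append(fit_A_k(c[idx], f[idx])[1])
        except RuntimeError:  # curve_fit did not converge on this resample
            continue
    if not ks:
        raise RuntimeError("no bootstrap fit converged")
    lo, hi = np.percentile(ks, [2.5, 97.5])
    return k_hat, lo, hi


if __name__ == "__main__":
    # Toy data simulated from the model itself (not from this dataset)
    conc = np.repeat([2.0, 10.0, 50.0, 250.0], 3)  # hypothetical substrate concentrations
    rng = np.random.default_rng(1)
    frac = reacted_fraction_model(conc, 0.8, 1e-3) + rng.normal(0.0, 0.02, conc.size)
    print(bootstrap_k(conc, frac))
```

In this sketch, a wide interval for k signals that the data constrain the coefficient poorly, for example when most concentrations already drive the reaction to saturation.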

DNA sequencing data (FASTQ) were collected from a kinetic sequencing (k-Seq) experiment on a variant RNA pool constructed from previously identified ribozymes. FASTQ files were processed with EasyDIVER (https://github.com/ichen-lab-ucsb/EasyDIVER) into sequence counts, which were used to quantify each sequence in the k-Seq samples. Reacted fractions were then calculated and fitted to a pseudo-first-order kinetic model to estimate the kinetic coefficients for each sequence, using the k-seq package (https://github.com/ichen-lab-ucsb/k-seq). Simulated datasets were also generated to study the utility of the k-Seq method. Please see the paper "Kinetic sequencing (k-Seq) as a massively parallel assay for ribozyme kinetics: utility and critical parameters" for details.
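
The step from sequence counts to reacted fractions can be sketched as follows. This minimal sketch assumes that a sequence's absolute amount in a sample is its relative abundance (count divided by total reads) multiplied by the total DNA amount of that sample (for example, as quantified by qPCR), and that the reacted fraction is the amount recovered after reaction divided by the amount in the input pool. The function name and dictionary inputs are hypothetical; this is not the k-seq package API, which may apply additional normalization.

```python
def reacted_fractions(input_counts, reacted_counts, input_amount, reacted_amount):
    """
    Hypothetical normalization, for illustration only.
    input_counts / reacted_counts: {sequence: read count} for the input pool and a reacted sample.
    input_amount / reacted_amount: total DNA amount of each sample (same units, e.g., from qPCR).
    Returns {sequence: reacted fraction} for every sequence seen in the input pool.
    """
    n_in = sum(input_counts.values())
    n_re = sum(reacted_counts.values())
    fractions = {}
    for seq, count_in in input_counts.items():
        amount_in = count_in / n_in * input_amount  # absolute amount in the input pool
        amount_re = reacted_counts.get(seq, 0) / n_re * reacted_amount  # amount after reaction
        fractions[seq] = amount_re / amount_in
    return fractions


if __name__ == "__main__":
    # Toy counts, purely illustrative (not from this dataset)
    print(reacted_fractions({"A": 50, "B": 30, "C": 20}, {"A": 60, "B": 40}, 100.0, 10.0))
```

Sequences absent from the reacted sample get a reacted fraction of 0 here; how undetected sequences are actually handled is a modeling choice that the paper discusses in more detail.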

This dataset is for the paper "Kinetic sequencing (k-Seq) as a massively parallel assay for ribozyme kinetics: utility and critical parameters".

It contains the following files:
- core-data.tar.gz: data, results, and code needed to repeat the paper's analysis and generate its figures. It contains the following folders:
  - data: preprocessed k-Seq data and data for the qPCR standard curve
  - results: results of the experimental and simulation studies
  - code: the codebase, including a snapshot of the k-seq package, scripts to repeat the data analysis, and notebooks to generate the figures in the paper
- raw-data.tar.gz: raw or large data, including sequencing FASTQ files and deduplicated reads

Please see the README files inside each archive (.tar.gz) for details.
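
To find the README files without unpacking an archive, Python's standard tarfile module can list and read members directly. This minimal sketch assumes only that the archives are gzip-compressed tar files, as their .tar.gz names suggest, and that they sit in the working directory; it makes no assumption about the paths inside them. The helper names are hypothetical.

```python
import os
import tarfile


def list_readmes(archive_path):
    """Return names of regular files in a .tar.gz archive whose basename starts with 'README'."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return [
            m.name
            for m in tar.getmembers()
            if m.isfile() and os.path.basename(m.name).upper().startswith("README")
        ]


def read_member(archive_path, member_name, encoding="utf-8"):
    """Read one text member from the archive without extracting it to disk."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        handle = tar.extractfile(member_name)
        if handle is None:  # member exists but is not a regular file
            raise ValueError(f"{member_name} is not a regular file")
        return handle.read().decode(encoding)


if __name__ == "__main__":
    for archive in ("core-data.tar.gz", "raw-data.tar.gz"):
        if os.path.exists(archive):  # assumed to be in the working directory
            print(archive, list_readmes(archive))
```

Listing the members of a gzip-compressed tar file requires reading through the whole compressed stream, so this can be slow for a large archive such as raw-data.tar.gz.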

Kinetic sequencing (k-Seq) as a massively parallel assay for ribozyme kinetics: utility and critical parameters
