Raw data associated with the article: "Single-molecule DNA sequencing of widely varying GC-content using nucleotide release, capture and detection in microdroplets.", NAR, Puchtler et.al.
Cite this dataset
Puchtler, Tim (2020). Raw data associated with the article: "Single-molecule DNA sequencing of widely varying GC-content using nucleotide release, capture and detection in microdroplets.", NAR, Puchtler et.al. [Dataset]. Dryad. https://doi.org/10.5061/dryad.4xgxd2575
All data taken in the production of the corresponding paper: "Single-molecule DNA sequencing of widely varying GC-content using nucleotide release, capture and detection in microdroplets."
The associated manuscript describes a method for DNA sequencing which involves the sequential release of nucleotides from a single, immobilised strand of DNA via pyrophosphorolysis (PPL). Released nucleotides, in the form of dNTPs, are captured in microdroplets which are manipulated using an optical-EWOD platform. A detection chemistry within each droplet releases a specific dye depending on which dNTPs are present, allowing the optical read-out of bases within each droplet. Hence, by capturing bases sequentially within droplets as they are cleaved from the strand of DNA, the sequence can be optically identified.
Manipulation of the droplets, each containing the required reagents for PPL, is performed using a photoactive-dielectric chip for optical electrowetting-on-dielectric (oEWOD). This chip is addressed using programmable illumination from a DMD-projector setup, with droplets in oil between the dielectric stack.
DNA is immobilised on the sample surface by binding to a silica microsphere and subsequently trapping the bead using compression.
Once droplets containing the reagents necessary for PPL are passed over the DNA, they are mixed with the detection reagents and kept in order. They are subsequently scanned using a standard fluorescence microscope, where each wavelength band used corresponds to a different base present.
All data is derived from fluorescence images of the droplets after they have captured the released nucleotides, with the parameters of droplet size and brightness in each channel being recorded (see 'Basecalling' tabs in data). It is from this intensity data per detection channel that the presence of nucleotides in each droplet can be determined, and hence the sequence of released nucleotides reconstructed.
All methods for collection of this data are presented in detail in the corresponding manuscript.
Whilst the majority of the data presented is self-explanatory when read alongside the associated manuscript, some notes are provided here for the 'Basecalling' tabs, as these are more complicated:
- Each row represents a different identified droplet taken from the fluorescence scan.
- Of these, only entries marked 'RA Success' should be used. This denotes that the droplets have been successfully merged (PPL and DET mixes) after passing over DNA, as determined by background fluorescence in a tracer channel (not linked to the dNTP detection fluors).
- For these droplets, the raw data is essentially "size" and intensities in 4 scanning channels (i.e. "I_532", "I594", "I_633" and "I_700"). This is the data presented in the manuscript figures.
- These intensities are plotted, as in the manuscript, and the distributions are used to assign occupation of the bases. The decision of whether each droplet belongs to the 'occupied' or 'unoccupied' peak is written in the columns labelled "called_532", "called_594" etc. It is from this that sequence data is generated.
- All other data in the 'Basecalling' tabs are additional estimations of the likelihood of assigning the droplets to the wrong category, or error handling for other models used. They may be of interest, but are not in any way key to the dataset as already described.
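The notes above can be sketched in code. The following is a minimal, illustrative Python example that keeps only 'RA Success' droplets and turns the "called_*" columns into per-droplet base calls. Several details are assumptions not stated in this description: that a 'Basecalling' tab has been exported to CSV, that the merge result sits in a column named "status", that the "called_*" columns hold 1 for 'occupied' and 0 for 'unoccupied', and the channel-to-base mapping, which is purely hypothetical (see the manuscript for the real assignment). The sample rows are made-up placeholder values, not values from the dataset.

```python
import csv
import io

CHANNELS = ("532", "594", "633", "700")

# HYPOTHETICAL channel-to-base mapping, for illustration only.
CHANNEL_TO_BASE = {"532": "A", "594": "C", "633": "G", "700": "T"}


def call_droplets(rows, status_column="status"):
    """Return one call per successfully merged droplet, in scan order.

    `status_column` and the "1"/"0" encoding of the called_* columns
    are assumptions about how the tab was exported.
    """
    calls = []
    for row in rows:
        # Keep only droplets flagged as successfully merged.
        if row[status_column] != "RA Success":
            continue
        occupied = [ch for ch in CHANNELS if row[f"called_{ch}"] == "1"]
        # "-" marks a droplet in which no channel was called as occupied.
        calls.append("".join(CHANNEL_TO_BASE[ch] for ch in occupied) or "-")
    return calls


# Made-up placeholder rows standing in for an exported 'Basecalling' tab.
SAMPLE = """status,size,I_532,I_594,I_633,I_700,called_532,called_594,called_633,called_700
RA Success,10.1,900,50,40,60,1,0,0,0
RA Fail,9.8,30,40,800,50,0,0,1,0
RA Success,10.3,40,45,50,55,0,0,0,0
RA Success,10.0,35,700,42,48,0,1,0,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
calls = call_droplets(rows)
print(calls)
```

Keeping the calls in scan order matters because the droplets are kept in order after passing over the DNA, so the order of the rows carries the sequence information.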
One other set of data should be given additional user notes: "False-Signal Analysis". In this, blocks of data are grouped vertically for a given sequence of DNA, and horizontal groupings are for every identified droplet within an experiment ("All droplets"), those which have not passed over DNA but still contain PPL and detection reagents ("Merge-only") and those which have actually passed over the DNA and are to be used for sequencing ("DOB-1s"). Each block contains the data of one experiment, with data for each scan wavelength measured. Estimations are then made for the likely concentration of dNTPs present ("scan concentration"), based on estimations of the size of the droplets and their intensities, as well as for the overlap between the 'occupied' and 'unoccupied' peaks of the intensity distribution, as described previously ("FP" and "FN"). Concentrations of dNTPs are also converted into a 'per droplet' value ("bases per drop").
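As a rough illustration of how a concentration relates to a 'per droplet' value, the sketch below converts a molar concentration and a droplet diameter into a number of molecules per droplet. It assumes a spherical droplet and a diameter given in micrometres; neither assumption is stated in this description, and the calculation actually used for "bases per drop" is the one given in the manuscript. The function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def droplet_volume_litres(diameter_um):
    """Volume of a droplet in litres, ASSUMING a sphere of the given diameter (micrometres)."""
    radius_m = diameter_um * 1e-6 / 2
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    return volume_m3 * 1000.0  # 1 cubic metre = 1000 litres


def molecules_per_droplet(concentration_molar, diameter_um):
    """Number of molecules in one droplet at the given molar concentration."""
    return concentration_molar * droplet_volume_litres(diameter_um) * AVOGADRO


# Example: a 10 micrometre droplet at 1 nM (illustrative numbers only).
example = molecules_per_droplet(1e-9, 10.0)
print(round(example, 1))
```

Because the volume scales with the cube of the diameter, small errors in the estimated droplet size have a large effect on the per-droplet value.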
Base4 Innovation, Ltd.