.bam alignment files of Illumina and ONT sequencing of pREF plasmid
Data files
Feb 05, 2024 version files 155.88 MB
- Illumina_DNApREF_1mil.sorted.bam
- Illumina_DNApREF_1mil.sorted.bam.bai
- ONT_DNApREF_sorted.bam
- ONT_DNApREF_sorted.bam.bai
- README.md
Abstract
The expression of genes encompasses their transcription into mRNA followed by translation into protein. In recent years, next-generation sequencing and mass spectrometry methods have profiled DNA, RNA and protein abundance in cells. However, there are currently no reference standards that are compatible across these genomic, transcriptomic and proteomic methods, and provide an integrated measure of gene expression. Here, we use synthetic biology principles to engineer a multi-omics control, termed pREF, that can act as a universal molecular standard for next-generation sequencing and mass spectrometry methods. The pREF sequence encodes 21 synthetic genes that can be in vitro transcribed into spike-in mRNA controls, and in vitro translated to generate matched protein controls. The synthetic genes provide qualitative controls that can measure sensitivity and quantitative accuracy of DNA, RNA and peptide detection. We demonstrate the use of pREF in metagenome DNA sequencing and RNA sequencing experiments and evaluate the quantification of proteins using mass spectrometry. Unlike previous spike-in controls, pREF can be independently propagated and the synthetic mRNA and protein controls can be sustainably prepared by recipient laboratories using common molecular biology techniques. Together, this provides the first universal synthetic standard able to integrate genomic, transcriptomic and proteomic methods.
README
This README file was generated on 2024-01-25 by Helen Gunter
GENERAL INFORMATION
1. Title of Dataset: bam alignment files of Illumina and ONT sequencing
of pREF plasmid
2. Author information
A. Principal Investigator Contact Information
Name: Timothy Mercer
Institution: Australian Institute for Bioengineering and Nanotechnology, The University of Queensland
Address: Brisbane, Qld, Australia
Email: t.mercer@uq.edu.au
B. Co-investigator
Name: Helen Gunter
Institution: Australian Institute for Bioengineering and Nanotechnology, The University of Queensland
Address: Brisbane, Qld, Australia
Email: h.gunter@uq.edu.au
3. Date of data collection: 2021
4. Geographic location of data collection: Sydney, Australia
5. Information about funding sources that supported the collection of
the data: NHMRC grants APP1108254, APP1114016, APP1136067, UNSW Tuition
Fee Scholarship and Cancer Institute NSW Early Career Fellowship
2018/ECF013.
---
SHARING/ACCESS INFORMATION
1. Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0
1.0) Public Domain
2. Links to publications that cite or use the data: Gunter, H. M., et
al. (in editorial review) A universal molecular control for DNA, mRNA
and protein expression. Nat. Commun.
3. Links to other publicly accessible locations of the data: None
4. Links/relationships to ancillary data sets:
Genomic and transcriptomic sequencing data generated from
pREF: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA815898
5. Was data derived from another source? No
A. If yes, list source(s): NA
6. Recommended citation for this dataset:
Gunter, H. M., et al. (in editorial review) A universal molecular
control for DNA, mRNA and protein expression. Nat. Commun.
DATA & FILE OVERVIEW
1. File List:
A) ONT_DNApREF_sorted.bam.bai
B) ONT_DNApREF_sorted.bam
C) Illumina_DNApREF_1mil.sorted.bam.bai
D) Illumina_DNApREF_1mil.sorted.bam
2. Relationship between files, if important:
A) ONT_DNApREF_sorted.bam.bai is an index file for
ONT_DNApREF_sorted.bam
B) Illumina_DNApREF_1mil.sorted.bam.bai is an index file for
Illumina_DNApREF_1mil.sorted.bam
C) ONT_DNApREF_sorted.bam and Illumina_DNApREF_1mil.sorted.bam include
sequences of the same pREF plasmid, generated using different sequencing
chemistries
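The pairing described above can be checked before the files are opened in a viewer or analysis tool. The following is a minimal sketch, not part of the original dataset: the helper name `pair_bam_with_index` is hypothetical, and it assumes only the naming convention stated in this README, where the index of `X.bam` is named `X.bam.bai`.

```python
def pair_bam_with_index(filenames):
    """Map each .bam file name to its .bam.bai index name, or None if missing.

    Hypothetical helper for illustration; it only inspects file names and
    does not open or validate the files themselves.
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        if name.endswith(".bam"):
            index = name + ".bai"
            pairs[name] = index if index in names else None
    return pairs


# File names exactly as listed in this README.
files = [
    "ONT_DNApREF_sorted.bam.bai",
    "ONT_DNApREF_sorted.bam",
    "Illumina_DNApREF_1mil.sorted.bam.bai",
    "Illumina_DNApREF_1mil.sorted.bam",
]

pairs = pair_bam_with_index(files)
for bam, bai in pairs.items():
    print(bam, "->", bai if bai else "index missing")
```

A `None` value flags an alignment file whose index was not downloaded alongside it.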
3. Additional related data collected that was not included in the
current data package:
Genomic and transcriptomic sequencing data generated from
pREF: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA815898
4. Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated: NA
i. Why was the file updated? NA
ii. When was the file updated? NA
#########################################################################
DATA-SPECIFIC INFORMATION FOR: ONT_DNApREF_sorted.bam
1. Sample type sequenced: pREF plasmid
2. Library preparation method: Ligation Sequencing
3. Sequencing platform used: Oxford Nanopore
4. Data filtering: trimmed, passed reads
5. Aligned to: pREF plasmid sequence
#########################################################################
DATA-SPECIFIC INFORMATION FOR: Illumina_DNApREF_1mil.sorted.bam
1. Sample type sequenced: pREF plasmid
2. Library preparation method: KAPA HyperPlus PCR-based kit
3. Sequencing platform used: Illumina
4. Data filtering: trimmed, passed reads
5. Aligned to: pREF plasmid sequence
#########################################################################
Methods
Illumina DNA sequencing of pREF.
We first sequenced neat preparations of pREF. Four replicate libraries were prepared using the KAPA HyperPlus PCR-based kit (Illumina) according to the manufacturer’s instructions. Prepared libraries were quantified on a Qubit (Invitrogen) and verified on the Agilent 2100 Bioanalyzer with the Agilent High Sensitivity DNA Kit (Agilent Technologies). The libraries were then sequenced on a NovaSeq (Illumina). The sequencing was performed at the Kinghorn Centre for Clinical Genomics, Darlinghurst, New South Wales.
ONT DNA sequencing of pREF.
pREF was linearised using restriction enzymes, and four replicate libraries were prepared for nanopore sequencing, with the LSK108 kit (1D ligation) according to the manufacturer’s instructions. The resulting libraries were sequenced on a PromethION instrument, at the Kinghorn Centre for Clinical Genomics, Darlinghurst, New South Wales. Base-calling was achieved using ONT Albacore Sequencing Pipeline Software (version 1.2.6).
pREF alignment and kmer analysis.
The four replicate Illumina short-read DNA libraries were aligned to reference sequences containing the pREF plasmid using BWA-MEM2, while the two Illumina RNA libraries (transcribed with Sp6 and T7 polymerase) were aligned using bowtie2 (v2.4.0). Long-read DNA and RNA libraries generated by Oxford Nanopore sequencing were aligned to the pREF reference sequence using minimap2 (v2.17-r941) with the parameters 'minimap2 -ax map-ont', a preset optimized for Oxford Nanopore libraries. Alignment files were sorted and indexed using samtools (v1.9), and pysamstats was used to retrieve the coverage and specific error types, such as mismatches, insertions and deletions, for every reference sequence position.
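To make the per-position statistics concrete, the following is a minimal pure-Python sketch of how coverage, mismatches, insertions and deletions can be tallied from alignment records. It is not pysamstats and not the authors' code. The function name `per_position_stats`, the toy reference and the toy reads are all assumptions for illustration. The counting conventions are also choices made here: coverage counts aligned bases only, and an insertion is attributed to the preceding reference base. Real tools may use different conventions.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def per_position_stats(reference, reads):
    """Tally coverage and error types at every reference position.

    reads: iterable of (pos0, cigar, seq), where pos0 is the 0-based
    leftmost aligned reference position. Illustrative sketch only.
    """
    stats = [
        {"coverage": 0, "mismatches": 0, "insertions": 0, "deletions": 0}
        for _ in reference
    ]
    for pos0, cigar, seq in reads:
        ref_i, read_i = pos0, 0
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "M=X":  # aligned bases: consume reference and read
                for k in range(n):
                    s = stats[ref_i + k]
                    s["coverage"] += 1
                    if seq[read_i + k] != reference[ref_i + k]:
                        s["mismatches"] += 1
                ref_i += n
                read_i += n
            elif op == "I":  # insertion: consumes read only
                stats[max(ref_i - 1, 0)]["insertions"] += 1
                read_i += n
            elif op == "D":  # deletion: consumes reference only
                for k in range(n):
                    stats[ref_i + k]["deletions"] += 1
                ref_i += n
            elif op == "N":  # skipped region: advance reference, no count
                ref_i += n
            elif op == "S":  # soft clip: consumes read only
                read_i += n
            # H and P consume neither reference nor read
    return stats


# Toy data (assumed, not from the dataset).
reference = "ACGTACGT"
reads = [
    (0, "4M", "ACCT"),       # one mismatch at position 2
    (2, "2M1I2M", "GTTAC"),  # one inserted base after position 3
    (1, "2M1D2M", "CGAC"),   # one deleted base at position 3
]

stats = per_position_stats(reference, reads)
for i, s in enumerate(stats):
    print(i, reference[i], s)
```

For the BAM files in this dataset, the same tallies would normally come from pysamstats or a similar tool run against the indexed alignments; the sketch only shows what each per-position count means.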