Nanopore sequencing data analysis using Microsoft Azure cloud computing service
Data files
Oct 10, 2022 version files 9.66 GB
BC01_P17-7990954.fastq (220.32 MB)
BC02_P17-7990841.fastq (14.02 MB)
BC03_P17-7990891.fastq (38.39 MB)
BC04_P17-7990931.fastq (7.07 MB)
BC05_P17-7990932.fastq (120.90 MB)
BC06_P17-7990962.fastq (5.07 MB)
BC07_P17-7990894.fastq (100.49 MB)
BC08_P17-7990895.fastq (6.30 MB)
BC09_P17-7990897.fastq (191.12 MB)
BC10_P17-7990846.fastq (115.67 MB)
BC11_P16-7991191.fastq (131.92 MB)
BC12_P16-7991192.fastq (93.84 MB)
BC13_P16-7991203.fastq (309.99 MB)
BC14_P17-7990597.fastq (221.04 MB)
BC15_P17-7990614.fastq (252.80 MB)
BC16_P17-7990600.fastq (75.27 MB)
BC17_P17-7990603.fastq (151.08 MB)
BC18_P17-7990608.fastq (20.42 MB)
BC19_P17-7990966.fastq (165 MB)
BC20_P17-7990832.fastq (6.75 MB)
BC21_P17-7990906.fastq (107.64 MB)
BC22_P17-7990904.fastq (223.25 MB)
BC23_P17-7990952.fastq (71.41 MB)
BC24_P17-7990957.fastq (68.97 MB)
BC25_P07-7427787.fastq (35.84 MB)
BC26_P19-9406426.fastq (42.85 MB)
BC27_P19-9505478.fastq (22.48 MB)
BC28_P19-9078230.fastq (23.59 MB)
BC29_P12-9901151.fastq (239.70 MB)
BC30_IMM-20-542.fastq (25.77 MB)
BC31_IMM-20-774.fastq (14.47 MB)
BC32_IMM-20-776.fastq (204.95 MB)
BC33_IMM-20-888.fastq (60.40 MB)
BC34_IMM-20-1275.fastq (98.71 MB)
BC35_IMM-20-1947.fastq (208.34 MB)
BC36_IMM-20-1805.fastq (215.76 MB)
BC37_IMM-20-2215.fastq (84.28 MB)
BC38_IMM-20-2122.fastq (413.94 MB)
BC39_P18-3725089.fastq (106.92 MB)
BC40_IMM-20-2609.fastq (16.29 MB)
BC41_R97-0900270.fastq (266.04 MB)
BC42_R05-0114117.fastq (81.28 MB)
BC43_Q94-0054090.fastq (66.44 MB)
BC44_Q94-0056290.fastq (281.21 MB)
BC45_Q94-0057490.fastq (304.29 MB)
BC46_Q94-0053774.fastq (100.63 MB)
BC47_Q95-0051198.fastq (116.97 MB)
BC48_Q94-0052178.fastq (259.16 MB)
IMM-20-1275_barcode34.fastq (62.95 MB)
IMM-20-1805_barcode36.fastq (59.16 MB)
IMM-20-1947_barcode35.fastq (115.09 MB)
IMM-20-2122_barcode38.fastq (325.47 MB)
IMM-20-2215_barcode37.fastq (50.04 MB)
IMM-20-2609_barcode40.fastq (3.79 MB)
IMM-20-542_barcode30.fastq (3.73 MB)
IMM-20-774_barcode31.fastq (2.65 MB)
IMM-20-776_barcode32.fastq (86.83 MB)
IMM-20-888_barcode33.fastq (12.85 MB)
P07-7427787D_barcode25.fastq (29 MB)
P12-9901151E_barcode29.fastq (63.96 MB)
P16-7991191_barcode11.fastq (137.36 MB)
P16-7991192_barcode12.fastq (105.32 MB)
P16-7991203_barcode13.fastq (227.39 MB)
P17-7990597_barcode14.fastq (75.09 MB)
P17-7990600_barcode16.fastq (23.86 MB)
P17-7990603_barcode17.fastq (117.67 MB)
P17-7990608_barcode18.fastq (28.93 MB)
P17-7990614_barcode15.fastq (102.11 MB)
P17-7990832_barcode20.fastq (2.21 MB)
P17-7990841_barcode02.fastq (17.39 MB)
P17-7990846_barcode10.fastq (119.90 MB)
P17-7990891_barcode03.fastq (40.25 MB)
P17-7990894_barcode07.fastq (107.90 MB)
P17-7990895_barcode08.fastq (9.54 MB)
P17-7990897_barcode09.fastq (197.63 MB)
P17-7990904_barcode22.fastq (70.24 MB)
P17-7990906_barcode21.fastq (58.15 MB)
P17-7990931_barcode04.fastq (12.70 MB)
P17-7990932_barcode05.fastq (141.73 MB)
P17-7990952_barcode23.fastq (31.35 MB)
P17-7990954_barcode01.fastq (239.24 MB)
P17-7990957_barcode24.fastq (50.03 MB)
P17-7990962_barcode06.fastq (7.80 MB)
P17-7990966_barcode19.fastq (141.66 MB)
P18-3725089A_barcode39.fastq (62.19 MB)
P19-9078230S_barcode28.fastq (16.71 MB)
P19-9406426S_barcode26.fastq (21.57 MB)
P19-9505478C_barcode27.fastq (2.81 MB)
Q94-0052178_barcode48.fastq (86.89 MB)
Q94-0053774_barcode46.fastq (23.95 MB)
Q94-0054090_barcode43.fastq (44.54 MB)
Q94-0056290_barcode44.fastq (154.96 MB)
Q94-0057490_barcode45.fastq (157.44 MB)
Q95-0051198_barcode47.fastq (90.63 MB)
R05-0114117_barcode42.fastq (42.59 MB)
R97-0900270_barcode41.fastq (67.58 MB)
README.md (416 B)
Abstract
Genetic information provides insights into the exome, genome, epigenetics and structural organisation of an organism. With the enormous amount of genetic information now available, scientists can undertake large-scale tasks that improve the standard of health care, such as determining genetic influences on the outcome of allogeneic transplantation. Cloud-based computing has increasingly become a key choice for many scientists, engineers and institutions because it offers on-demand network access and lets users rent, rather than buy, the computing resources they require. Given the advances in cloud computing and in nanopore sequencing data output, we were motivated to develop an automated and scalable analysis pipeline on Microsoft Azure cloud infrastructure to accelerate the HLA genotyping service and improve the efficiency of the workflow at lower cost. In this study, we describe (i) the selection of suitable virtual machine sizes to balance performance against cost-effectiveness; (ii) the building of Docker containers to include all tools in the cloud computational environment; and (iii) the comparison of HLA genotype concordance between the in-house manual method and the automated cloud-based pipeline to assess data accuracy. In conclusion, the Microsoft Azure cloud-based data analysis pipeline was shown to meet all the key imperatives for performance, cost, usability, simplicity and accuracy. Importantly, the pipeline allows for the ongoing maintenance and testing of version changes before implementation. This pipeline is suitable for analysing data from MinION sequencing platforms and could be adopted for other data analysis applications.
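As an illustration of point (iii), genotype concordance can be read as the fraction of sample and locus combinations for which two methods report the same allele pair. The abstract does not give the study's exact definition, so the sketch below is one plausible reading rather than the authors' procedure. The function names, the dictionary layout and the allele values are hypothetical and chosen only for the example.

```python
from typing import Dict, Tuple

# Hypothetical layout: (sample ID, locus) -> the two reported alleles.
Genotype = Tuple[str, str]
Calls = Dict[Tuple[str, str], Genotype]


def normalise(genotype: Genotype) -> Genotype:
    """Sort the allele pair so that reporting order does not affect equality."""
    a, b = sorted(genotype)
    return (a, b)


def concordance(manual: Calls, automated: Calls) -> float:
    """Fraction of (sample, locus) keys present in both call sets whose
    allele pairs agree. Keys found in only one call set are ignored."""
    shared = set(manual) & set(automated)
    if not shared:
        raise ValueError("no (sample, locus) keys in common")
    matches = sum(
        normalise(manual[key]) == normalise(automated[key]) for key in shared
    )
    return matches / len(shared)


# Made-up allele calls, for illustration only.
manual_calls = {
    ("S1", "HLA-A"): ("A*01:01", "A*02:01"),
    ("S1", "HLA-B"): ("B*07:02", "B*08:01"),
    ("S2", "HLA-A"): ("A*03:01", "A*03:01"),
}
automated_calls = {
    ("S1", "HLA-A"): ("A*02:01", "A*01:01"),  # same pair, different order
    ("S1", "HLA-B"): ("B*07:02", "B*44:02"),  # one allele differs
    ("S2", "HLA-A"): ("A*03:01", "A*03:01"),
}
rate = concordance(manual_calls, automated_calls)
```

Here two of the three shared calls agree. Counting a call as concordant only when both alleles match is a design choice made for this sketch; a per-allele count would give a different figure.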
Methods
Manual analysis performed on-site versus automated analysis performed on the Azure cloud server.
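The file listing uses two naming patterns, `BC<nn>_<sample>.fastq` and `<sample>_barcode<nn>.fastq`, each covering barcodes 01 to 48. The source does not state which pattern belongs to which analysis route, so the sketch below only pairs files that share a barcode number and makes no claim about which is manual or automated. Matching on barcode rather than sample ID is a choice made here because some IDs carry a trailing letter in one pattern only (for example `P07-7427787` and `P07-7427787D`). The function name `pair_by_barcode` and the dictionary keys are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# The two filename patterns seen in the file listing.
BC_PREFIX = re.compile(r"^BC(\d{2})_(.+)\.fastq$")
BARCODE_SUFFIX = re.compile(r"^(.+)_barcode(\d{2})\.fastq$")


def pair_by_barcode(filenames: Iterable[str]) -> Dict[int, Dict[str, str]]:
    """Group FASTQ filenames by barcode number.

    Returns {barcode: {"bc_prefix": name, "barcode_suffix": name}};
    a key is absent when no file of that pattern exists for the barcode.
    Names matching neither pattern (such as README.md) are skipped.
    """
    pairs: Dict[int, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = BC_PREFIX.match(name)
        if match:
            pairs[int(match.group(1))]["bc_prefix"] = name
            continue
        match = BARCODE_SUFFIX.match(name)
        if match:
            pairs[int(match.group(2))]["barcode_suffix"] = name
    return dict(pairs)


# A few names copied from the file listing.
listing = [
    "BC25_P07-7427787.fastq",
    "P07-7427787D_barcode25.fastq",
    "BC01_P17-7990954.fastq",
    "P17-7990954_barcode01.fastq",
    "README.md",
]
pairs = pair_by_barcode(listing)
```

With the full listing, every barcode from 1 to 48 should map to one file of each pattern; a barcode with only one entry would indicate a missing or renamed file.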