Dryad

Data from: Structural variation and its potential impact on genome instability: novel discoveries in the EGFR landscape by long-read sequencing

Data files

Aug 04, 2021 version files 4.85 MB

Abstract

Studies of structural variation (SV) have been challenging due to technological constraints. Third-generation (long-read) sequencing technology has made it possible to explore longer stretches of DNA that were not easily examined previously. In the present study, we used long-read sequencing to examine SV in the EGFR landscape of four haplotypes derived from two human samples. We analyzed the EGFR gene and its landscape (± 500,000 base pairs) and identified regions of non-coding DNA with relatively high similarity to the most common activating EGFR mutation in non-small cell lung cancer. We found that the reverse complements of the exon 19 deletion mutation that had at least 60% homology to the canonical EGFR exon 19 deletion and lay within ± 421,000 bp of the deletion varied across the five haploid genomes examined (four patient landscapes and hg38). Although the sample size in this study is limited, the estimated variation in genomic stability observed among the five EGFR haplotypes is novel and encourages further work to examine structural variation in larger cohorts.
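
The kind of search described in the abstract (reverse complements with at least 60% homology to a query sequence) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the ungapped sliding-window comparison, and the definition of homology as per-base percent identity are all assumptions made for illustration, and the sequences are toy strings rather than the EGFR exon 19 deletion or data from this dataset.

```python
# Illustrative sketch only; not the method used to produce this dataset.
# Assumption: "homology" is approximated as ungapped per-base percent identity.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_identity(a, b):
    """Percentage of positions at which two equal-length strings match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def find_reverse_complement_hits(region, query, min_identity=60.0):
    """Slide the reverse complement of `query` along `region`.

    Returns a list of (start, identity) tuples for every window whose
    identity to the reverse complement is at least `min_identity`.
    """
    target = reverse_complement(query)
    region = region.upper()
    k = len(target)
    hits = []
    for start in range(len(region) - k + 1):
        identity = percent_identity(region[start:start + k], target)
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    toy_query = "AAGGCT"           # hypothetical query, not an EGFR sequence
    toy_region = "TTTTAGCCTTGGGG"  # hypothetical region containing its reverse complement
    for start, identity in find_reverse_complement_hits(toy_region, toy_query):
        print(start, round(identity, 1))
```

Counting the hits returned for each haplotype within a fixed window around the deletion would give one simple way to compare haplotypes. A real analysis would more likely use a gapped aligner, which this sketch does not attempt.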