Data from: DNA O-MAP uncovers the molecular neighborhoods associated with specific genomic loci
Data files
Apr 01, 2026 version files 60.26 GB
| File | Size |
|---|---|
| chr10anchor0_e_S4_R1_001.fastq.gz | 949.37 MB |
| chr10anchor0_e_S4_R2_001.fastq.gz | 863.08 MB |
| chr10anchor0_i_S8_R1_001.fastq.gz | 588.74 MB |
| chr10anchor0_i_S8_R2_001.fastq.gz | 541.39 MB |
| chr3anchor1_e_S1_R1_001.fastq.gz | 856.20 MB |
| chr3anchor1_e_S1_R2_001.fastq.gz | 781.19 MB |
| chr3anchor1_i_S5_R1_001.fastq.gz | 590.24 MB |
| chr3anchor1_i_S5_R2_001.fastq.gz | 542.39 MB |
| chr3anchor2_e_S3_R1_001.fastq.gz | 14.63 GB |
| chr3anchor2_e_S3_R2_001.fastq.gz | 13.39 GB |
| chr3anchor2_i_S7_R1_001.fastq.gz | 693.50 MB |
| chr3anchor2_i_S7_R2_001.fastq.gz | 638.11 MB |
| DNA.OMAP.manuscript.figure.panel.micrographs.tar.gz | 1.98 MB |
| DNA.OMAP.manuscript.tablular.raw.data.tar.gz | 4.16 MB |
| Figure2S3_Images.tar.gz | 4.24 GB |
| figure3_genome_pileups.tar.gz | 16.78 GB |
| R-anchor-3_input_S13_R1_001.fastq.gz | 116.86 MB |
| R-anchor-3_input_S13_R2_001.fastq.gz | 120.90 MB |
| R-anchor-3_S5_R1_001.fastq.gz | 726.90 MB |
| R-anchor-3_S5_R2_001.fastq.gz | 751.40 MB |
| R-anchor-4_input_S15_R1_001.fastq.gz | 234.24 MB |
| R-anchor-4_input_S15_R2_001.fastq.gz | 245.02 MB |
| R-anchor-4_S7_R1_001.fastq.gz | 970.87 MB |
| R-anchor-4_S7_R2_001.fastq.gz | 1 GB |
| README.md | 18.80 KB |
Abstract
This dataset contains the raw sequencing data, processed alignments, fluorescent microscopy images, and tabular figure source data associated with the development and application of DNA O-MAP, a method for profiling the proteins and DNA interactions proximal to specific genomic loci in fixed cells. The dataset includes: (1) 20 paired-end FASTQ files (10 R1/R2 pairs) from Illumina MiSeq sequencing of DNA O-MAP experiments targeting single-copy chromatin loop anchors on chromosomes 3, 10, and 19 in HCT 116 cells; (2) 10 BAM alignment files (GRCh38/hg38, ENCODE blacklist-filtered) derived from these sequencing libraries; (3) 20 raw three-dimensional fluorescent microscopy image stacks (.nd2 format) used to quantify labeling specificity and efficiency for telomere, pericentromeric alpha-satellite, mitochondrial, and no-probe control conditions; (4) 48 individual micrograph panels (PNG) used in manuscript figures; and (5) 32 CSV files and 1 Cytoscape JSON file containing the processed numerical data underlying all manuscript figure panels, including proteomic enrichment analyses, gene set enrichment results, genome coverage profiles, and protein interaction networks. FASTQ and BAM files can be reanalyzed with standard genomic alignment tools. Microscopy files can be opened with Fiji/ImageJ (Bio-Formats plugin) or NIS-Elements. Tabular data files can be read with R, Python, or any spreadsheet software.
Dataset DOI: 10.5061/dryad.fn2z34v98
Overview
This dataset contains the raw sequencing data, processed alignments, microscopy images, and tabular source data underlying all figures in Liu*, McGann*, Herlihy* et al. (eLife, 2026). DNA O-MAP is a method that uses programmable oligonucleotide probes conjugated to horseradish peroxidase (HRP) to biotinylate proteins in spatial proximity to targeted genomic loci in fixed cells. The biotinylated proteins and associated DNA can then be purified and analyzed by mass spectrometry or next-generation sequencing. This dataset includes data from experiments targeting telomeres, pericentromeric alpha-satellite repeats, the mitochondrial genome, single-copy chromatin loop anchors, HOX gene clusters, and the active and inactive X chromosomes.
Description of the data and file structure
This dataset consists of four compressed archives and 20 paired-end FASTQ sequencing files (10 R1/R2 pairs). The contents and naming conventions for each are described below.
Paired-end FASTQ sequencing files
These files contain raw paired-end Illumina sequencing reads from DNA O-MAP experiments targeting single-copy chromatin loop anchors in HCT 116 cells (corresponding to Figure 3 and associated supplements in the manuscript). DNA O-MAP was used to biotinylate chromatin at specific loop anchor sites, and the biotinylated DNA was purified and sequenced to detect trans-interacting genomic loci. The reference genome used for downstream alignment was GRCh38 (hg38).
File naming convention: {target}_{fraction}_S{index}_R{read}_001.fastq.gz
- target: The genomic locus targeted by the DNA O-MAP probe.
  - chr3anchor1 = Chromosome 3 loop anchor, left side (Figure 3D)
  - chr3anchor2 = Chromosome 3 loop anchor, right side (Figure 3D)
  - chr10anchor0 = Chromosome 10 non-loop anchor control (Figure 3D)
  - R-anchor-3 = Chromosome 19 loop anchor (Figure 3E/F)
  - R-anchor-4 = Chromosome 19 loop anchor (Figure 3E/F)
- fraction: The sample fraction.
  - e or no suffix (for R-anchor files without _input) = Eluate, the biotin-enriched fraction pulled down with streptavidin beads
  - i or input = Input, the whole-cell chromatin sample prior to streptavidin pulldown, serving as a normalization control
- S{index}: Illumina sample index number assigned during library preparation (can be ignored for data interpretation)
- R1/R2: Paired-end sequencing read direction. R1 = forward read, R2 = reverse read. Each R1 and R2 file pair represents the two ends of the same sequencing library, not biological replicates.
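The naming convention above can be applied programmatically. The following is a minimal sketch, not part of the deposited analysis code: the regular expression and the helper name `parse_fastq_name` are assumptions written to match the file names listed in this README.

```python
import re

# Pattern for the FASTQ names listed in this README. The R-anchor eluate
# files carry no fraction token, so the fraction group is optional.
FASTQ_NAME = re.compile(
    r"^(?P<target>chr\d+anchor\d+|R-anchor-\d+)"
    r"(?:_(?P<fraction>e|i|input))?"
    r"_S(?P<index>\d+)_R(?P<read>[12])_001\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split a FASTQ file name into target, fraction, sample index, and read."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized FASTQ name: {name}")
    fraction = match["fraction"]
    return {
        "target": match["target"],
        # 'e' or a missing token means eluate; 'i' or 'input' means input.
        "fraction": "input" if fraction in ("i", "input") else "eluate",
        "sample_index": int(match["index"]),
        "read": int(match["read"]),
    }


print(parse_fastq_name("chr3anchor2_i_S7_R2_001.fastq.gz"))
print(parse_fastq_name("R-anchor-3_S5_R1_001.fastq.gz"))
```

Treating a missing fraction token as eluate mirrors the rule stated for the R-anchor files; a name that does not fit the pattern raises an error rather than being guessed at.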
Complete file list:
| File | Target | Fraction |
|---|---|---|
| chr3anchor1_e_S1_R1_001.fastq.gz | Chr3 loop anchor, left | Eluate, read 1 |
| chr3anchor1_e_S1_R2_001.fastq.gz | Chr3 loop anchor, left | Eluate, read 2 |
| chr3anchor1_i_S5_R1_001.fastq.gz | Chr3 loop anchor, left | Input, read 1 |
| chr3anchor1_i_S5_R2_001.fastq.gz | Chr3 loop anchor, left | Input, read 2 |
| chr3anchor2_e_S3_R1_001.fastq.gz | Chr3 loop anchor, right | Eluate, read 1 |
| chr3anchor2_e_S3_R2_001.fastq.gz | Chr3 loop anchor, right | Eluate, read 2 |
| chr3anchor2_i_S7_R1_001.fastq.gz | Chr3 loop anchor, right | Input, read 1 |
| chr3anchor2_i_S7_R2_001.fastq.gz | Chr3 loop anchor, right | Input, read 2 |
| chr10anchor0_e_S4_R1_001.fastq.gz | Chr10 non-loop control | Eluate, read 1 |
| chr10anchor0_e_S4_R2_001.fastq.gz | Chr10 non-loop control | Eluate, read 2 |
| chr10anchor0_i_S8_R1_001.fastq.gz | Chr10 non-loop control | Input, read 1 |
| chr10anchor0_i_S8_R2_001.fastq.gz | Chr10 non-loop control | Input, read 2 |
| R-anchor-3_S5_R1_001.fastq.gz | Chr19 loop anchor | Eluate, read 1 |
| R-anchor-3_S5_R2_001.fastq.gz | Chr19 loop anchor | Eluate, read 2 |
| R-anchor-3_input_S13_R1_001.fastq.gz | Chr19 loop anchor | Input, read 1 |
| R-anchor-3_input_S13_R2_001.fastq.gz | Chr19 loop anchor | Input, read 2 |
| R-anchor-4_S7_R1_001.fastq.gz | Chr19 loop anchor | Eluate, read 1 |
| R-anchor-4_S7_R2_001.fastq.gz | Chr19 loop anchor | Eluate, read 2 |
| R-anchor-4_input_S15_R1_001.fastq.gz | Chr19 loop anchor | Input, read 1 |
| R-anchor-4_input_S15_R2_001.fastq.gz | Chr19 loop anchor | Input, read 2 |
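Because each R1/R2 pair holds the two ends of one library, both files should contain the same number of records. A quick integrity check after download might look like the sketch below. It assumes the standard four-line FASTQ record layout; the helper names are illustrative and not taken from the authors' pipeline.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def check_pair(r1_path, r2_path):
    """Return the shared read count, or raise if R1 and R2 disagree."""
    n1 = count_fastq_reads(r1_path)
    n2 = count_fastq_reads(r2_path)
    if n1 != n2:
        raise ValueError(f"read count mismatch: {n1} (R1) vs {n2} (R2)")
    return n1


# Example call, assuming the files sit in the working directory:
# check_pair("chr3anchor1_i_S5_R1_001.fastq.gz",
#            "chr3anchor1_i_S5_R2_001.fastq.gz")
```

The chr3anchor2 eluate files are over 13 GB each, so a full pass over them takes noticeably longer than for the other libraries.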
Archive: figure3_genome_pileups.tar.gz
Contains BAM alignment files used to generate the genome browser pileup tracks in Figure 3 and associated supplements. These are the aligned and blacklist-filtered versions of the FASTQ files described above, aligned to GRCh38 (hg38).
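This README does not state which tool performed the blacklist filtering. To illustrate what the filter means, the sketch below flags an alignment when its interval overlaps any blacklist region, using BED-style half-open coordinates. The function names and the coordinates in the example are made up for illustration and are not real ENCODE blacklist entries.

```python
from bisect import bisect_right


def build_index(blacklist):
    """Merge (chrom, start, end) regions into sorted, disjoint lists per chromosome."""
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching regions are fused into one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_blacklist(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps a blacklist region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last region that begins before the alignment ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


# Hypothetical regions, for illustration only.
index = build_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)])
print(overlaps_blacklist(index, "chr1", 290, 310))  # overlaps the merged region
print(overlaps_blacklist(index, "chr1", 300, 310))  # starts where the region ends
```

Merging the regions first keeps the lookup to a single binary search per alignment.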
File naming convention: {target}_{fraction}_no_blacklist.bam
- target: Same naming as the FASTQ files above, except that the R-anchor targets drop the second hyphen (e.g., chr3anchor1, R-anchor3)
- fraction: e = eluate (enriched); i = input. For R-anchor files, files without _i are eluate.
- no_blacklist: Indicates that reads mapping to ENCODE blacklist regions have been removed.
Contents:
| File | Description |
|---|---|
| chr3anchor1_e_no_blacklist.bam | Chr3 left anchor, eluate |
| chr3anchor1_i_no_blacklist.bam | Chr3 left anchor, input |
| chr3anchor2_e_no_blacklist.bam | Chr3 right anchor, eluate |
| chr3anchor2_i_no_blacklist.bam | Chr3 right anchor, input |
| chr10anchor0_e_no_blacklist.bam | Chr10 non-loop control, eluate |
| chr10anchor0_i_no_blacklist.bam | Chr10 non-loop control, input |
| R-anchor3_no_blacklist.bam | Chr19 anchor 3, eluate |
| R-anchor3_i_no_blacklist.bam | Chr19 anchor 3, input |
| R-anchor4_no_blacklist.bam | Chr19 anchor 4, eluate |
| R-anchor4_i_no_blacklist.bam | Chr19 anchor 4, input |
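Since the input fraction serves as the normalization control for its eluate, it is convenient to pair the two BAM files for each target before any comparison. The sketch below does this from the file names alone. The pattern and the helper `pair_bams` are assumptions based on the names in the table above.

```python
import re

# BAM names in this archive; R-anchor eluate files have no fraction token.
BAM_NAME = re.compile(
    r"^(?P<target>chr\d+anchor\d+|R-anchor\d+)"
    r"(?:_(?P<fraction>e|i))?_no_blacklist\.bam$"
)


def pair_bams(names):
    """Map each target to its eluate and input BAM file names."""
    pairs = {}
    for name in names:
        match = BAM_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognized BAM name: {name}")
        fraction = "input" if match["fraction"] == "i" else "eluate"
        pairs.setdefault(match["target"], {})[fraction] = name
    return pairs


names = [
    "chr3anchor1_e_no_blacklist.bam",
    "chr3anchor1_i_no_blacklist.bam",
    "R-anchor3_no_blacklist.bam",
    "R-anchor3_i_no_blacklist.bam",
]
for target, files in pair_bams(names).items():
    print(target, files["eluate"], files["input"])
```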
Archive: DNA.OMAP.manuscript.figure.panel.micrographs.tar.gz
Contains the individual fluorescent micrograph panels (PNG format) used to compose the microscopy figure panels in Figures 1–5 of the manuscript. These are cropped, single-channel or merged images exported from the microscopy acquisition software.
Directory: final_microscopy/
File naming convention: Figure{N}_{NN}.png
- N: The manuscript figure number (1–5).
- NN: A two-digit panel index within that figure (01, 02, 03, ...). Individual panels correspond to different channels (e.g., DAPI, DNA FISH, streptavidin) or merged views of the same field of view. The ordering follows the left-to-right, top-to-bottom arrangement of micrograph panels within each manuscript figure.
The archive contains 48 PNG files total: 8 for Figure 1, 16 for Figure 2, 8 for Figure 3, 12 for Figure 4, and 4 for Figure 5.
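After extracting the archive, the per-figure counts can be checked against the numbers stated above. This is a small sketch with an illustrative helper name; the example assumes panel indices run consecutively from 01, which this README implies but does not state outright.

```python
import re
from collections import Counter

PANEL_NAME = re.compile(r"^Figure(?P<figure>\d+)_(?P<panel>\d{2})\.png$")

# Panel counts per figure as stated in this README.
EXPECTED = {1: 8, 2: 16, 3: 8, 4: 12, 5: 4}


def tally_panels(names):
    """Count PNG panels per manuscript figure, ignoring unrelated files."""
    counts = Counter()
    for name in names:
        match = PANEL_NAME.match(name)
        if match is not None:
            counts[int(match["figure"])] += 1
    return dict(counts)


# In practice the names would come from os.listdir("final_microscopy").
# Here they are generated under the consecutive-index assumption.
names = [
    f"Figure{fig}_{panel:02d}.png"
    for fig, n in EXPECTED.items()
    for panel in range(1, n + 1)
]
counts = tally_panels(names)
print(counts == EXPECTED, sum(counts.values()))
```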
Archive: Figure2S3_Images.tar.gz
Contains the raw three-dimensional fluorescent microscopy image stacks (Nikon .nd2 format) used for the quantitative analysis of DNA O-MAP labeling specificity and efficiency reported in Figure S3 (supplement to Figure 2). Images were acquired on a Nikon Ti2-Eclipse microscope equipped with a Yokogawa SoRa spinning disk confocal unit using a 60x objective. Each .nd2 file contains a multi-channel z-stack with DAPI (nuclear stain), DNA FISH (probe signal), and streptavidin (biotin detection) channels.
Directory: Images_For_Nico/
File naming convention: {Condition}_{objective}FoV_p{plate}_{date}_{field}.nd2
- Condition: The DNA O-MAP probe target or control.
  - Telomere = Telomere-targeting oligonucleotide probe
  - PanAlpha = Pericentromeric alpha-satellite-targeting oligonucleotide probe
  - Mito = Mitochondrial genome-targeting oligonucleotide probe
  - NoProbe = No-primary-probe negative control
- objective: 60x in all files
- p{plate}: Plate identifier (e.g., p27, p28)
- date: Acquisition date in YYYY-MM-DD format
- field: An integer index identifying the field of view (0-indexed)
The archive contains 20 .nd2 files total: 5 fields of view for each of the four conditions (Telomere, PanAlpha, Mito, and NoProbe).
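The .nd2 naming convention can be parsed to group fields of view by condition. The sketch below is an illustration only: it assumes the plate identifier is numeric after the leading p, and the file name in the example is hypothetical (in particular, its date is invented).

```python
import re
from datetime import date

ND2_NAME = re.compile(
    r"^(?P<condition>Telomere|PanAlpha|Mito|NoProbe)"
    r"_(?P<objective>\d+x)FoV"
    r"_p(?P<plate>\d+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_(?P<field>\d+)\.nd2$"
)


def parse_nd2_name(name):
    """Split an .nd2 file name into condition, objective, plate, date, and field."""
    match = ND2_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized .nd2 name: {name}")
    return {
        "condition": match["condition"],
        "objective": match["objective"],
        "plate": int(match["plate"]),
        "date": date.fromisoformat(match["date"]),
        "field": int(match["field"]),  # 0-indexed field of view
    }


# Hypothetical file name; the date is not taken from the archive.
print(parse_nd2_name("Telomere_60xFoV_p27_2024-01-01_0.nd2"))
```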
Archive: DNA.OMAP.manuscript.tablular.raw.data.tar.gz
Contains CSV files with the underlying numerical data for all plots in the manuscript figures, as well as one JSON file containing protein interaction network data.
Directory: Figure-Data/
File naming convention: Figure{N}{Panel}_Graph-Data.csv
- N: The manuscript figure number (1, 2, 4, 5) or supplement designation (e.g., 2S1, 2S2, 2S4, 2S5, 4S1, 5S1).
- Panel: The letter or letter-number identifier of the specific panel within that figure (e.g., A, B, C, D, E, F, G, H, I, J).
Each CSV file contains the processed data underlying a single figure panel. Because these files correspond to diverse plot types (volcano plots, bar charts, box plots, gene set enrichment plots, genome coverage tracks, BioPlex network analyses), the column structure varies by file. As an exemplar, the structure of Figure1E_Graph-data.csv is described below. This file contains the protein overlap analysis comparing DNA O-MAP telomeric hits to five previously published telomeric proteomics datasets.
Columns in Figure1E_Graph-data.csv:
- Prey: UniProt accession ID of the detected protein
- PreyGene: Gene symbol of the detected protein
- dataset1 through dataset5: Columns indicating the name of each previously published telomeric proteomics dataset in which the protein was detected. NA indicates the protein was not found in that dataset. The five datasets are PICh, C-BERST, CAPLOCUS, CAPTURE, and BioID.
- dataset0: Indicates the dataset of origin for the current study. "OMAP" indicates the protein was detected by DNA O-MAP.
- count: The number of prior datasets (out of 5) in which the protein was previously observed.
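As described above, the count column should equal the number of dataset1 through dataset5 entries that are not NA. The sketch below recomputes it as a consistency check. The rows are invented placeholders, and the column order and the assignment of dataset names to columns are assumptions, so adjust them to the real file.

```python
import csv
import io

DATASET_COLUMNS = [f"dataset{i}" for i in range(1, 6)]


def prior_dataset_count(row):
    """Number of prior datasets (out of 5) with a non-NA entry for this protein."""
    return sum(1 for col in DATASET_COLUMNS if row[col] not in ("", "NA"))


def find_count_mismatches(handle):
    """Return Prey IDs whose count column disagrees with the recomputed value."""
    return [
        row["Prey"]
        for row in csv.DictReader(handle)
        if prior_dataset_count(row) != int(row["count"])
    ]


# Placeholder rows standing in for Figure1E_Graph-data.csv.
sample = io.StringIO(
    "Prey,PreyGene,dataset1,dataset2,dataset3,dataset4,dataset5,dataset0,count\n"
    "PREY_A,GENE_A,PICh,NA,CAPLOCUS,NA,BioID,OMAP,3\n"
    "PREY_B,GENE_B,NA,NA,NA,NA,NA,OMAP,0\n"
)
print(find_count_mismatches(sample))
```

To run the check on the real file, pass `open("Figure-Data/Figure1E_Graph-data.csv", newline="")` in place of the in-memory sample.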
The archive also contains Figure 1 Bioplex Cytoscape Confident interactors.json, which is a Cytoscape-format JSON file encoding the BioPlex protein-protein interaction network for DNA O-MAP telomeric hits shown in Figure 1D. This file can be opened with Cytoscape (version 3.10 or later).
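The network can also be inspected without Cytoscape. The sketch below assumes the file follows the Cytoscape.js layout, with nodes and edges under a top-level "elements" key. That layout is an assumption about this file, and the tiny network shown is a placeholder rather than real data.

```python
import json
from collections import Counter


def summarize_network(doc):
    """Return node count, edge count, and per-node degree for a Cytoscape.js document."""
    nodes = doc["elements"]["nodes"]
    edges = doc["elements"].get("edges", [])
    degree = Counter({node["data"]["id"]: 0 for node in nodes})
    for edge in edges:
        degree[edge["data"]["source"]] += 1
        degree[edge["data"]["target"]] += 1
    return len(nodes), len(edges), dict(degree)


# Placeholder network; the real file would be loaded with json.load, e.g.
# with open("Figure 1 Bioplex Cytoscape Confident interactors.json") as fh:
#     doc = json.load(fh)
doc = json.loads(
    '{"elements": {"nodes": [{"data": {"id": "a"}}, {"data": {"id": "b"}},'
    ' {"data": {"id": "c"}}],'
    ' "edges": [{"data": {"source": "a", "target": "b"}}]}}'
)
print(summarize_network(doc))
```

Nodes with degree 0 correspond to unconnected nodes, which matters when comparing panels that include or exclude them.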
The archive contains 32 CSV files and 1 JSON file total.
Figure descriptions:
Figure 1: Overview of DNA O-MAP workflow and label-free quantitative proteomics analysis of telomeres.
A) Schematic of DNA O-MAP. B) Fluorescent microscopy data showing the observed patterns of DNA (DAPI, left) and in situ biotinylation detected by staining with fluorescent streptavidin conjugates (middle, left) and overview of telomere-targeted DNA O-MAP experiment. C) Significant gene sets identified by the Gene Set Enrichment Analysis of the proteins enriched by the telomere probe. D) DNA O-MAP telomeric proteins mapped onto the BioPlex interaction network. The red box highlights shelterin complex proteins. Nodes are colored by the fold-enrichment compared to a no-primary-probe control shown in B, excluding unconnected nodes. E) Telomeric proteins observed in five previous datasets (PICh, C-BERST, CAPLOCUS, CAPTURE, BioID) superimposed onto Figure 1D, colored by the number of prior datasets where the protein was present and including unconnected nodes. Scale bars, 5 µm.
Figure 2: DNA O-MAP reveals distinct features of the sub-proteomes at peri-centromeric alpha satellites, telomeres, and the mitochondrial genome.
A) Workflow of DNA O-MAP integrated with sample multiplexing quantitative proteomics. B) Schematic of the three DNA loci examined in the TMT16plex experiment: peri-centromeric alpha satellites, telomeres, and mitochondrial genomes. C) Co-localization of DNA FISH and the streptavidin staining of the proteins biotinylated by DNA O-MAP targeting the peri-centromeric alpha satellites, telomeres, and mitochondrial genomes. Scale bar: 5 µm. D) Principal component analysis of scaled intensities of proteins enriched by the pan-alpha probe, telomere probe, mitochondrial genome oligo pool, and no-primary-probe control. E) Unsupervised hierarchical clustering of scaled intensities of proteins enriched by the pan-alpha probe, telomere probe, mitochondrial genome oligo pool, and no-primary-probe control. F) Log2 fold change of proteins compared to no-primary-probe control, grouped by HPA subcellular location. Significance calculated based on Welch’s t-test for pairwise comparisons (****: p-value <0.0001). G–J) Log2 fold change of proteins compared to mitochondrial probe enriched proteins for the RNA Polymerases (G), mtDNA nucleoid packaging proteins (H), Shelterin (I), and CENP-A nucleosomal complexes (J). Significance calculated based on Welch’s t-test for pairwise comparisons (p-value: *<0.05, **<0.01, ***<0.001, ****<0.0001).
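For readers reanalyzing the tabular data, the Welch's t-test named above compares two groups without assuming equal variances. The sketch below computes only the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library. It is not the authors' code, and the input values are arbitrary numbers chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Arbitrary illustrative values, not data from the manuscript.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))
```

Obtaining a p-value additionally requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.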
Figure 4: DNA O-MAP efficiently identifies the local proteome of the HOXA and HOXB gene clusters.
A) Schematic of DNA O-MAP being applied to the HOXA and HOXB gene clusters for identification of differentially enriched proteins. B) Representative images depicting overlap of FISH and Streptavidin labeling at HOXA and HOXB loci. C) Volcano plot of proteins identified at HOXA and HOXB loci. Each dot represents a single protein with proteins of interest called out in black. Green dots indicate proteins that passed significant enrichment thresholds with an absolute Log2 Fold Change greater than 1 (2-fold change) and corrected p-value < 0.05. D) ENCODE ChIP-seq data showing peak calls and p-values at HOXA and HOXB loci for selected enriched proteins, ZC3H13, SMARCB1, HDAC3, and TCF12. E) Schematic depicting the use of GSK126 with DNA O-MAP. F) Bar chart showing proteins with significantly altered abundance following treatment with GSK126 at HOXA. G) Bar chart showing proteins with significantly altered abundance at both HOXA and HOXB following treatment with GSK126 (Welch’s t-test, corrected p-value < 0.05).
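The enrichment thresholds described for the volcano plot can be expressed as a simple filter. This is a minimal sketch with illustrative parameter names; the numeric inputs in the example are invented.

```python
def passes_threshold(log2_fc, corrected_p, min_abs_log2_fc=1.0, alpha=0.05):
    """True if a protein exceeds the fold-change cutoff and is below the p-value cutoff."""
    return abs(log2_fc) > min_abs_log2_fc and corrected_p < alpha


# Invented values for illustration.
print(passes_threshold(1.6, 0.01))    # enriched and significant
print(passes_threshold(-2.3, 0.03))   # depleted and significant
print(passes_threshold(0.8, 0.001))   # fold change too small
print(passes_threshold(1.6, 0.2))     # not significant
```

Both comparisons are strict, matching the "greater than 1" and "< 0.05" wording; the `alpha` parameter can be changed for analyses that use a different corrected p-value cutoff.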
Figure 5: DNA O-MAP elucidates the homolog-resolved chromosome X proteome.
A) Schematic of DNA O-MAP being applied to Xi and Xa for identification of differentially enriched proteins. B) Schematic showing the region of the X chromosome targeted by our primary hybridization probes. C) Representative images depicting overlap of Xist FISH and Xi Streptavidin labeling while spatially differentiated from Xa FISH. Scale bars are 10 µm. D) Volcano plot of proteins identified at Xa and Xi. Each dot represents a single protein with proteins of interest called out in black text. Green dots indicate proteins that passed significant enrichment thresholds with an absolute Log2 Fold Change greater than 1 (2-fold change) and corrected p-value < 0.1. E) Bar chart showing example proteins with significant enrichment at Xi in green and at Xa in blue. Corrected p-value < 0.1. F) ENCODE ChIP-seq data in mouse fibroblast cells at our targeted region of chromosome X for SMC3. G) Protein interaction networks of EIF and SWI/SNF complexes enriched at Xi. Node width is a function of corrected p-value and node color is a function of enrichment (Log2 Fold Change). H) Protein interaction network of the SWI/SNF complex from previously published RNA O-MAP of Xist. Node width is a function of -Log10(p-value) and node color is a function of enrichment (Log2 Fold Change).
Figure S1: Predicted genome-wide binding profile of the pan-alpha probe.
Figure S2: Replicate analysis of multi-target DNA O-MAP proteomics experiment. A) Pearson correlation coefficient of the raw protein intensity values for each replicate of the analysis with hierarchical clustering on the rows and columns.
Figure S4: Relative quantitation for the multi-target DNA O-MAP proteomics experiment compared to no-probe control and mtDNA datasets. Volcano plots from multiplexed proteomics experiments with proteins of interest highlighted. A-C) Fold-changes and significance calculated compared to no probe. D-F) Fold-changes and significance calculated compared to mtDNA probe.
Figure S5: Comparison of histone proteins between telomere and pan-alpha probes.
A) Log2 fold change of proteins compared to mitochondrial probe enriched histone complex proteins. Significance calculated based on Welch’s t-test for pairwise comparisons (p-value: *<0.05, **<0.01, ***<0.001, ****<0.0001). B) Volcano plot comparing the fold change of pan-alpha to the mtDNA probe with spindle proteins highlighted.
Data collection
- Cell lines: HCT 116 (human colorectal carcinoma) was used for label-free proteomics (telomere, Figure 1) and chromatin loop sequencing experiments (Figure 3). K562 (human chronic myelogenous leukemia) was used for multiplexed proteomics experiments (Figures 2, 4, 5). EY.T4 (mouse embryonic stem cells) was used for Xist-related experiments where noted in the manuscript.
- Sequencing: Paired-end sequencing was performed on an Illumina MiSeq.
- Microscopy: Fluorescent images were acquired on a Nikon Ti2-Eclipse microscope equipped with a Yokogawa SoRa spinning disk confocal unit.
- Mass spectrometry: Proteomic data were collected on a Thermo Fisher Orbitrap Eclipse (label-free experiments) or Orbitrap Fusion Eclipse (multiplexed TMTpro experiments). Raw mass spectrometry data are available at the proteomics data repository indicated in the manuscript. The CSV files in this archive contain the processed proteomic results used for figure generation.
Reuse potential
The raw FASTQ and BAM files can be realigned and reanalyzed with standard genomic pipelines (e.g., BWA, Bowtie2, SAMtools) to examine chromatin loop interactions at the targeted loci. The .nd2 microscopy stacks can be opened in Fiji/ImageJ (with the Bio-Formats plugin) or NIS-Elements for reanalysis of labeling specificity and colocalization. The tabular CSV data can be loaded in R, Python, or any spreadsheet application. The Cytoscape JSON file can be visualized in Cytoscape 3.10 or later.
Code availability
Analysis code is available in the associated GitHub repository linked in the manuscript.
Associated publication
Liu Y*, McGann CD*, Herlihy CP*, et al. DNA O-MAP uncovers the molecular neighborhoods associated with specific genomic loci. eLife (2026). https://doi.org/10.7554/eLife.102489
Access information
Data were generated using the following instruments:
- Illumina MiSeq (sequencing)
- Nikon Ti2-Eclipse with Yokogawa SoRa spinning disk confocal (microscopy)
- Thermo Fisher Orbitrap Eclipse / Orbitrap Fusion Eclipse (mass spectrometry)
