Data from: Generative modeling of multi-mapping reads with mHi-C advances analysis of Hi-C studies

Zheng, Ye; Ay, Ferhat; Keles, Sunduz

Published Feb 04, 2019 on Dryad. https://doi.org/10.5061/dryad.v7k3140

Data files

Feb 04, 2019 version files 6.50 GB

CTCF.zip (800.68 MB)

H3K27ac.zip (981.14 MB)

H3K27me3.zip (1.17 GB)

H3K36me3.zip (903.34 MB)

H3K4me1.zip (1.23 GB)

H3K4me3.zip (661.96 MB)

p300.zip (224.35 MB)

p65.zip (181.21 MB)

PolII.zip (357.66 MB)

Abstract

Current Hi-C analysis approaches are unable to account for reads that align to multiple locations, and hence underestimate biological signal from repetitive regions of genomes. We developed and validated mHi-C, a multi-read mapping strategy to probabilistically allocate Hi-C multi-reads. On benchmarks, mHi-C outperformed both the use of uni-reads alone and heuristic approaches aimed at rescuing multi-reads. Specifically, mHi-C increased the sequencing depth by an average of 20%, resulting in higher reproducibility of contact matrices and detected interactions across biological replicates. The impact of the multi-reads on the detection of significant interactions is influenced marginally by the relative contribution of multi-reads to the sequencing depth compared to uni-reads, the cis-to-trans ratio of contacts, and the broad data quality as reflected by the proportion of mappable reads of datasets. Computational experiments highlighted that in Hi-C studies with short read lengths, mHi-C-rescued multi-reads can emulate the effect of longer reads. mHi-C also revealed biologically supported bona fide promoter-enhancer interactions and topologically associating domains involving repetitive genomic regions, thereby unlocking a previously masked portion of the genome for conformation capture studies.

p300 ChIP-seq peaks using uni- and multi-reads

ChIP-seq peaks for p300, detected following the standard ENCODE ChIP-seq data processing pipeline (The ENCODE Project Consortium, 2012) using both uni-reads and multi-reads aligned by Permseq (Zeng et al., 2015).

p300.zip

p65 ChIP-seq peaks using uni- and multi-reads

p65.zip

PolII ChIP-seq peaks using uni- and multi-reads

PolII.zip

H3K4me1 ChIP-seq peaks using uni- and multi-reads

H3K4me1.zip

CTCF ChIP-seq peaks using uni- and multi-reads

CTCF.zip

H3K4me3 ChIP-seq peaks using uni- and multi-reads

H3K4me3.zip

H3K27ac ChIP-seq peaks using uni- and multi-reads

H3K27ac.zip

H3K27me3 ChIP-seq peaks using uni- and multi-reads

H3K27me3.zip

H3K36me3 ChIP-seq peaks using uni- and multi-reads

H3K36me3.zip
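
Inspecting the downloaded archives

The archives listed above can be inspected without extracting them. The following is a minimal Python sketch, assuming the zip files have been downloaded into a single local directory. The internal layout of each archive is not described in this record, so the helper only lists whatever members it finds. The function names `list_archive_members` and `summarize` are hypothetical and chosen for illustration only.

```python
import zipfile
from pathlib import Path

# Archive names exactly as listed in this record.
ARCHIVES = [
    "CTCF.zip", "H3K27ac.zip", "H3K27me3.zip", "H3K36me3.zip",
    "H3K4me1.zip", "H3K4me3.zip", "p300.zip", "p65.zip", "PolII.zip",
]


def list_archive_members(zip_path):
    """Return (member name, uncompressed size in bytes) for each file in a zip.

    Directory entries are skipped. No assumption is made about the file
    types or folder structure inside the archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()
        ]


def summarize(download_dir):
    """Map each archive found in download_dir (an assumed local path) to its members.

    Archives that have not been downloaded are silently omitted.
    """
    summary = {}
    for name in ARCHIVES:
        path = Path(download_dir) / name
        if path.is_file():
            summary[name] = list_archive_members(path)
    return summary
```

Calling `summarize("path/to/downloads")` returns a dictionary keyed by archive name, which makes it easy to check which of the nine archives are present and what each one contains before extracting anything.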