Dryad

Predicting genome-wide tissue-specific enhancers via combinatorial transcription factor genomic occupancy analysis

Data files

Oct 07, 2024 version files 18.15 MB

Abstract

Background

Enhancers belong to the class of non-coding cis-regulatory elements that play a vital role in transcriptional regulation. Mutations in enhancers affect gene regulation and can lead to various disease phenotypes. This has led to increased interest in identifying enhancers and evaluating the impact of mutations on enhancer activity. However, in contrast to protein-coding intervals, enhancers lack a stereotyped sequence composition. Therefore, the computational prediction of enhancers and their tissue specificity has remained challenging. Consequently, enhancers are typically predicted based on certain chromatin features, including DNA accessibility, post-translational modifications of histones, and transcription factor (TF) binding. Although these features correlate with enhancer regions, they are imperfect predictors.

Results

The present study reports a sequence-based computational model that employs combinatorial TF genomic occupancy as the principal determinant to predict tissue-specific enhancers. This model was trained on different data sets, including Encyclopedia of DNA Elements (ENCODE)-based DNA accessibility data, VISTA Enhancer Browser-based in vivo experimental data, and phylogenetic footprinting of binding motifs. The application of this novel computational scheme has enabled the prediction of 25,000 forebrain-specific cis-regulatory modules (CRMs) in the human genome. These predicted CRMs were validated using ENCODE-based enhancer-associated biochemical features, GWAS-based disease-associated SNPs, and in vivo analysis in zebrafish.

Conclusion

Validation data revealed that this new computational model is suitable for predicting less well-conserved tissue-specific enhancer regions that are devoid of characterized chromatin features, and can therefore complement and facilitate experimental approaches in tissue-specific enhancer discovery.