This archive contains material to reproduce the identification of candidate regulators of cell states. The analysis is presented in SA08_MotifEnrichmentAnalysis.Rmd in the archive Hydra_Seurat_NMF_regulators_analyses. That document holds more information on the objects found in this archive.

The directory enrichment_resources/ contains the files needed to execute the enrichment analysis. It should be placed in the same directory as SA08_MotifEnrichmentAnalysis.Rmd.

The directory Enrichment_Results/ contains the results of our original analysis. We include lists of genes within extended metagenes, plots for top transcription factors with metagene-correlated expression, peak lists (regions of open chromatin), and identified motifs within putative regulatory regions of genes within metagenes.

Please see comments and text in SA08_MotifEnrichmentAnalysis.Rmd for further details.

enrichment_resources/

2Rep.IDR.mod.bed - ATAC-seq peak consensus file, available as a track on the Hydra 2.0 genome browser (https://research.nhgri.nih.gov/hydra/).
collapsedHeatmap.csv - Manually collapsed motif enrichment results table used to generate the enrichment matrix in manuscript Fig. 5A.
findMotifs_homer.sh - Shell script used to run the HOMER enrichment analysis on metagene peaks.
Hm105_Dovetail_Assembly_1.0.fasta - Hydra 2.0 genome reference, as available at https://research.nhgri.nih.gov/hydra/.
Hydra_PFMs/ - JASPAR motifs identified as candidate binding motifs for Hydra proteins.
hydra.augustus.nameMod.fastp - Protein sequences derived from Hydra 2.0 gene models, used in JASPAR profile inference.
hydra.augustus.pfam.filtered.csv - Pfam domains identified in Hydra 2.0 proteins using an independent expect-value equal to or below 1e-6 and a minimum alignment length of 4 aa.
JASPAR2018_CORE_redundant_pfms_jaspar - Complete set of available JASPAR motifs (available at http://jaspar.genereg.net).
jaspar2homer.sh - Shell script to reformat the JASPAR motifs in folder Hydra_PFMs to HOMER format. Uses:
    o parseJasparMatrix.pl - Script provided by HOMER, used to convert JASPAR PWM files to HOMER motif format
    o PWM_Convert.R
metaMap.txt - Metagene to cell state annotations. Used as columns in the enrichment matrix that is presented in markdown Fig. 2 (SA08_MotifEnrichmentAnalysis.pdf).
motifHeatmapFull.csv - Results table of significantly enriched motifs for all metagenes with at least one enriched motif.
S_Enrichment_Workflow.png - Figure to be included in the markdown (Fig. 1, SA08_MotifEnrichmentAnalysis.pdf).
TF_domains.txt - List of considered Pfam DNA-binding domains (DBDs). This list was modified from a previously published set of Pfam domains (Mendoza et al. 2013, doi:10.1073/pnas.1311818110) by adding selected domains.
Whole_2Rep_IDR_finalhits.txt - File containing peak-gene annotations (UROPA output; Kondili et al. 2017, doi:10.1038/s41598-017-02464-y).

Enrichment_Results/

corResults.Rdata - Data frame of scores showing how well genes correlate with each metagene.
enrich.results.Rdata - HOMER motif enrichment results.
enrichment_workspace.RData - Workspace holding multiple objects from the original analysis. For objects and content, see annotations in SA08_MotifEnrichmentAnalysis.Rmd.
genes.motifs.Rdata - Genes and identified JASPAR binding motifs.
Homer_results/ - Enriched motifs in putative regulatory regions of metagene gene members, as identified by HOMER.
metagene_genes/ - Extended list of genes assigned to a metagene.
metagene_peaks/ - Regions of open chromatin (peaks) in putative regulatory regions of metagene gene members.
TF.match.csv - Table of putative regulators of cell state (Table S5 in the supplemental material of the manuscript). A correlation cut-off of 0.3 between transcription factor expression and metagene expression was applied.
premissive.TF.match.csv - More permissive table of putative regulators of cell state. A correlation cut-off of 0.1 between transcription factor expression and metagene expression was applied.
MotifEnrichmentTable.xlsx - More permissive table of putative regulators of cell state (Excel version). A correlation cut-off of 0.1 between transcription factor expression and metagene expression was applied.
topTF/ - Transcription factors (identified using Pfam) with expression strongly correlated to the expression of a metagene.
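
Because the analysis expects enrichment_resources/ to sit in the same directory as SA08_MotifEnrichmentAnalysis.Rmd, a quick pre-flight check can save a failed run. The sketch below is not part of the archive. The function name check_resources is hypothetical, and the file list is only a subset of the entries described above, so extend it as needed.

```shell
#!/bin/sh
# Sketch only: verify that the Rmd and a few resource files are where the
# analysis expects them. check_resources is a hypothetical helper name.
check_resources() {
    base_dir="${1:-.}"   # directory that should hold the Rmd (default: current)
    missing=0

    if [ ! -f "$base_dir/SA08_MotifEnrichmentAnalysis.Rmd" ]; then
        echo "missing: SA08_MotifEnrichmentAnalysis.Rmd"
        missing=1
    fi

    # Subset of the files listed under enrichment_resources/ in this README.
    for f in 2Rep.IDR.mod.bed collapsedHeatmap.csv findMotifs_homer.sh \
             Hm105_Dovetail_Assembly_1.0.fasta jaspar2homer.sh metaMap.txt \
             TF_domains.txt Whole_2Rep_IDR_finalhits.txt; do
        if [ ! -e "$base_dir/enrichment_resources/$f" ]; then
            echo "missing: enrichment_resources/$f"
            missing=1
        fi
    done

    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

Calling check_resources with the directory that holds the Rmd prints one line per missing item and returns a non-zero status if anything is absent, so it can gate a subsequent rendering step.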