Dryad

Supporting data for: Three-dimensional genome re-wiring in loci with Human Accelerated Regions

Cite this dataset

Keough, Kathleen (2023). Supporting data for: Three-dimensional genome re-wiring in loci with Human Accelerated Regions [Dataset]. Dryad. https://doi.org/10.7272/Q6057D5N

Abstract

Human Accelerated Regions (HARs) are conserved genomic loci that evolved at an accelerated rate in the human lineage and may underlie human-specific traits. We identified HARs and chimpanzee accelerated regions using an automated pipeline applied to an alignment of 241 mammalian genomes. Combining deep learning with chromatin capture experiments in human and chimpanzee neural progenitor cells, we discovered a significant enrichment of HARs in topologically associating domains (TADs) containing human-specific genomic variants that change three-dimensional (3D) genome organization. Differential gene expression between humans and chimpanzees at these loci suggests rewiring of regulatory interactions between HARs and neurodevelopmental genes. Thus, comparative genomics together with models of 3D genome folding revealed enhancer hijacking as an explanation for the rapid evolution of HARs.

Methods

Lentivirus-based massively parallel reporter assay (lentiMPRA) library design and synthesis

Tiles 270bp in length were generated from all 312 zooHARs; for zooHARs longer than 270bp, multiple tiles were generated with a sliding window of 20bp. In total, 549 oligos were designed to cover all zooHARs. We also included 143 oligos centered on active chromatin marks as positive controls. The oligo pool was synthesized by Twist Bioscience.
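The tiling step can be sketched on genomic coordinates as below. This is a minimal illustration under two assumptions not stated in the Methods: "sliding window of 20bp" is read as a 20bp step between successive 270bp tiles, and a zooHAR of 270bp or less gets a single tile centered on it and extended into flanking sequence. The function name `tile_region` is hypothetical, and the sketch is not meant to reproduce the exact count of 549 oligos.

```python
def tile_region(start, end, tile_len=270, step=20):
    """Return half-open (start, end) tiles of length tile_len covering [start, end)."""
    length = end - start
    if length <= tile_len:
        # Assumption: short regions get one tile centered on the region,
        # extended into flanking genomic sequence to reach tile_len.
        center = start + length // 2
        tile_start = center - tile_len // 2
        return [(tile_start, tile_start + tile_len)]
    tiles = []
    pos = start
    # Slide the window by `step` while it still ends before the region's end.
    while pos + tile_len < end:
        tiles.append((pos, pos + tile_len))
        pos += step
    # Final tile is right-aligned so the last bases of the region are covered.
    tiles.append((end - tile_len, end))
    return tiles


print(tile_region(0, 300))    # [(0, 270), (20, 290), (30, 300)]
print(tile_region(100, 300))  # [(65, 335)]
```

Right-aligning the last tile keeps every tile at the synthesized length while guaranteeing the region's 3' end is included, at the cost of a final step shorter than 20bp.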

Primary cortical cell culture for lentiMPRA

De-identified tissue samples were collected with consent in strict observance of legal and institutional ethical regulations. Protocols were approved by the Human Gamete, Embryo, and Stem Cell Research Committee (institutional review board) at the University of California, San Francisco. Gestational week 18 cortical tissue was dissociated into a single-cell suspension using papain (LK003150, Worthington Biochemical) and plated on 15cm dishes coated with poly-O-lysine, laminin, and fibronectin. Culture medium (DMEM with B27 and penicillin-streptomycin, all from Gibco) was changed every 24 hours.

Construction and sequencing of plasmid libraries for lentiMPRA

LentiMPRA was performed as previously described, with the modifications noted here. A 31bp minimal promoter and a 15bp random barcode were added to each lentiMPRA oligo through two rounds of PCR. The amplicon was then cloned into the empty reporter backbone pLS-SceI (plasmid #137725, Addgene). Recombination products were transformed into electrocompetent cells (C3020, NEB) for amplification and grown on LB agar plates (100217-214, VWR) at 37℃ overnight. We harvested ~3.5M colonies and purified the plasmid library by Midiprep (12945, Qiagen), which yielded around 70 barcodes per oligo. A 477bp region of the plasmid containing the oligo and barcode was amplified and sequenced on one lane of an Illumina NextSeq Mid-Output run to identify the barcodes associated with each oligo.
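Building the barcode-to-oligo association from the 477bp amplicon reads might look like the sketch below. It assumes reads have already been parsed into hypothetical `(barcode, insert_sequence)` pairs, keeps only inserts that exactly match a designed oligo (mirroring the exact-match requirement in the computational analyses), and discards barcodes observed with more than one oligo. That last filter is a common MPRA practice assumed here, not one stated in the Methods. The function name `associate_barcodes` is hypothetical.

```python
from collections import Counter, defaultdict


def associate_barcodes(read_pairs, designed):
    """Map each barcode to a single designed oligo.

    read_pairs: iterable of (barcode, insert_sequence) tuples from the
        association reads (hypothetical, pre-parsed input format).
    designed: dict of oligo_name -> designed sequence.
    Returns (barcode -> oligo_name, Counter of barcodes per oligo).
    """
    seq_to_name = {seq: name for name, seq in designed.items()}
    hits = defaultdict(set)  # barcode -> names of oligos it was seen with
    for barcode, insert in read_pairs:
        name = seq_to_name.get(insert)
        if name is not None:  # reads whose insert is not an exact match are ignored
            hits[barcode].add(name)
    # Assumed filter: drop barcodes linked to more than one designed oligo.
    bc_to_oligo = {bc: next(iter(names)) for bc, names in hits.items() if len(names) == 1}
    per_oligo = Counter(bc_to_oligo.values())
    return bc_to_oligo, per_oligo
```

The per-oligo barcode counts are what a downstream filter such as "at least 10 unique barcodes" would be applied to.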

Lentivirus packaging and infection

Lentivirus was produced following the manufacturer’s protocol (LT002, Genecopoeia). To achieve a high titer, the crude viral supernatant was concentrated using Lenti-X Concentrator (631232, Takara Bio). The concentrated virus was immediately stored at -80℃ in single-use aliquots. For each replicate, cell density was estimated with a cell counter, and 20 million primary cortical cells were plated in a 10cm dish and cultured for 2 days before infection. Each dish was infected with 500 μl of lentivirus to achieve a multiplicity of infection (MOI) of 85. Each barcode was estimated to integrate at random genomic loci more than 200 times. The medium was refreshed the next day, and cells were incubated for 2 days before DNA and RNA were harvested.
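As a rough consistency check on the "more than 200 integrations per barcode" estimate, expected integrations per barcode can be approximated as cells × MOI / number of barcodes. This sketch assumes the 20 million plated cells are the infected population (ignoring any change in cell number during the 2 days of culture) and that the ~3.5M harvested colonies approximate the number of distinct barcodes. Since the true number of distinct barcodes is likely no higher than the colony count, this should be a conservative figure. The function name is hypothetical.

```python
def integrations_per_barcode(n_cells, moi, n_barcodes):
    """Expected integration events per barcode.

    Total integrations are approximated as cells x MOI and assumed to be
    spread evenly across all barcodes in the library.
    """
    return n_cells * moi / n_barcodes


# Values from the Methods: 20 million cells, MOI 85, ~3.5M colonies
# (assumed here to equal the number of distinct barcodes).
estimate = integrations_per_barcode(20e6, 85, 3.5e6)
print(f"~{estimate:.0f} integrations per barcode")  # prints "~486 integrations per barcode"
```

Under these assumptions the estimate (about 486) sits comfortably above the stated threshold of 200.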

DNA/RNA harvest and sequencing

DNA and RNA were extracted simultaneously from infected samples using the AllPrep kit (80204, Qiagen). To prepare sequencing libraries, 8μg of RNA was reverse transcribed into cDNA using SuperScript IV RT (18090200, Invitrogen). Barcodes in the cDNA and in 15μg of genomic DNA (gDNA) were PCR amplified to add a unique molecular identifier (UMI), a sample index, and the Illumina P5/P7 sequences. DNA and RNA barcode libraries were pooled at a 1:3 molar ratio and sequenced on a NextSeq High-Output run.
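The UMI added during library PCR lets reads that are PCR copies of the same molecule be counted once. A minimal sketch of that collapsing, assuming reads are already parsed into hypothetical `(barcode, umi)` pairs and using exact UMI matching (whether the original pipeline corrected UMI sequencing errors is not stated), followed by summing barcode counts into per-oligo totals with a barcode-to-oligo table:

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs per barcode from (barcode, umi) pairs.

    Reads sharing the same barcode and UMI are treated as PCR duplicates
    of one molecule and counted once.
    """
    umis = defaultdict(set)
    for barcode, umi in reads:
        umis[barcode].add(umi)
    return {bc: len(u) for bc, u in umis.items()}


def oligo_counts(barcode_counts, bc_to_oligo):
    """Sum UMI-collapsed barcode counts into per-oligo totals."""
    totals = defaultdict(int)
    for bc, n in barcode_counts.items():
        oligo = bc_to_oligo.get(bc)
        if oligo is not None:  # skip barcodes missing from the association table
            totals[oligo] += n
    return dict(totals)
```

The same two functions would be applied separately to the DNA and RNA barcode libraries of each replicate.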

LentiMPRA computational analyses

Sequencing libraries were batch corrected to account for differences between samples from different donors. Oligos were required to have at least 10 unique barcodes and an exact match to the designed sequence. UMI-normalized reads per oligo were summed over all barcodes, and oligos with fewer than 40 total DNA reads were discarded. Of the 312 zooHARs, 276 passed these quality control steps.

For each of these zooHARs, depth normalization was performed using counts per million reads sequenced (CPM), and an RNA CPM / DNA CPM ratio was calculated for each oligo in each replicate. A zooHAR was considered active if its most active tile had a mean (across replicates) RNA CPM / DNA CPM ratio exceeding the median of this statistic for a set of positive control sequences with enhancer-associated epigenetic marks in neurodevelopment (median = 1.06). To compare machine learning predictions with lentiMPRA measurements, each of the 276 zooHARs passing lentiMPRA quality control was evaluated for activity above 1.06 (139 active zooHARs) and for a machine learning score > 0.3 (175 predicted zooHARs). The 88 zooHARs meeting both criteria were designated high-confidence neurodevelopmental enhancers validated by both approaches.
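The filtering, normalization, and activity calls described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the input format and function names are hypothetical, donor batch correction is omitted, "total DNA reads" is assumed to be summed across replicates, and CPM is computed over each replicate's full library before filtering.

```python
from statistics import mean, median


def cpm(counts):
    """Scale raw counts to counts per million reads in the library."""
    total = sum(counts.values())
    return {oligo: c * 1e6 / total for oligo, c in counts.items()}


def mean_ratios(replicates, n_barcodes, min_barcodes=10, min_dna=40):
    """Mean RNA CPM / DNA CPM per oligo for oligos passing QC.

    replicates: list of {"dna": {oligo: count}, "rna": {oligo: count}} holding
        UMI-collapsed counts summed over barcodes (hypothetical format).
    n_barcodes: {oligo: number of unique barcodes}.
    """
    # Assumption: the DNA read threshold applies to counts summed over replicates.
    total_dna = {}
    for rep in replicates:
        for oligo, c in rep["dna"].items():
            total_dna[oligo] = total_dna.get(oligo, 0) + c
    kept = {o for o, c in total_dna.items()
            if c >= min_dna and n_barcodes.get(o, 0) >= min_barcodes}

    per_oligo = {o: [] for o in kept}
    for rep in replicates:
        dna, rna = cpm(rep["dna"]), cpm(rep["rna"])
        for o in kept:
            if dna.get(o, 0) > 0:  # skip replicates with no DNA reads for this oligo
                per_oligo[o].append(rna.get(o, 0.0) / dna[o])
    return {o: mean(r) for o, r in per_oligo.items() if r}


def active_zoohars(ratios, tile_to_zoohar, positive_controls):
    """zooHARs whose most active tile exceeds the positive-control median."""
    threshold = median(ratios[o] for o in positive_controls if o in ratios)
    best = {}
    for oligo, r in ratios.items():
        z = tile_to_zoohar.get(oligo)
        if z is not None:
            best[z] = max(r, best.get(z, float("-inf")))
    return {z for z, r in best.items() if r > threshold}, threshold


def high_confidence(active, ml_scores, min_score=0.3):
    """zooHARs called active by lentiMPRA and scored above min_score by the model."""
    return {z for z in active if ml_scores.get(z, 0.0) > min_score}
```

Taking the maximum over tiles means a long zooHAR is called active if any one of its tiles drives expression above the control median, which matches the "most active tile" rule in the text.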

Funding

National Institute of Mental Health, Award: DP2MH122400-01

Gladstone Institutes

Schmidt Futures Foundation

Shurl and Kay Curci Foundation

National Institute of Mental Health, Award: R01MH109907

National Institute of Mental Health, Award: U01MH116438