CENH3 information from: Einkorn genomics sheds light on history of the oldest domesticated wheat
Data files
Jun 12, 2023 version files 70.45 GB
Abstract
Einkorn (Triticum monococcum) is the first domesticated wheat species, being central to the birth of agriculture and the Neolithic Revolution in the Fertile Crescent ~10,000 years ago. Here, we generate and analyze 5.2-gigabase genome assemblies for wild and domesticated einkorn, including completely assembled centromeres. Einkorn centromeres are highly dynamic, showing evidence of ancient and recent centromere shifts caused by structural rearrangements. Whole-genome sequencing of a diversity panel uncovered the population structure and evolutionary history of einkorn, revealing complex patterns of hybridizations and introgressions following the dispersal of domesticated einkorn from the Fertile Crescent. We also discovered that around 1% of the modern bread wheat (Triticum aestivum) A subgenome originates from einkorn. These resources and findings highlight the history of einkorn evolution and provide a basis to accelerate the genomics-assisted improvement of einkorn and bread wheat.
Methods
Chromatin immunoprecipitation (ChIP) and sequencing (ChIP-seq): ChIP was performed following the method of Nagaki et al., standardized with a wheat CENH3 antibody. Nuclei were isolated from 2-week-old seedlings and digested with micrococcal nuclease (Sigma, MO) to liberate nucleosomes. The digested mixture was incubated overnight with 3 µg of wheat CENH3 antibody at 4°C. The chromatin-antibody complexes were captured using Dynabeads Protein G (Invitrogen, CA). The chromatin was eluted with 100 µl of preheated elution buffer (1% sodium dodecyl sulfate and 0.1 M NaHCO3) for 30 min at 65°C. DNA from the ChIP was isolated using the ChIP DNA Clean and Concentrator Kit (Zymo Research, CA). ChIP-seq libraries were then constructed using the TruSeq ChIP Library Preparation Kit (Illumina, CA) according to the manufacturer’s instructions, and the libraries were sequenced on a NovaSeq S4 in a 150-bp paired-end run.
CENH3 ChIP-seq data analysis: Raw ChIP sequencing reads were quality filtered and adapter sequences were removed with Trimmomatic using LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:50. Trimmed reads were then mapped to the respective genomes using bowtie2 with the following parameters: --score-min L,-0.6,-0.25. The SAM output file was converted to BAM format, the reads were sorted by position, and duplicates were removed using SAMtools (v1.8). Multi-mappers were allowed in order to include reads that come from highly repetitive regions. The resulting BAM file was filtered using samtools view with the -h flag and the command grep -e $'^@' -e $'\t151M\t=' -e $'\t150M\t=' to keep only reads that have no mismatch over the full read length, which can be either 150 bp or 151 bp. The filtered BAM files were indexed using samtools index with the -c flag, and read depth per base was calculated using samtools depth with the -a flag so that positions with no mapped reads are also reported. The resulting depth per base was then averaged in non-overlapping 100-kb windows.
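The read filter and the window averaging described above can be sketched in Python. This is a minimal illustration, not the code used to produce the deposited files. The function names keep_sam_line and window_means are hypothetical. The filter checks the CIGAR and RNEXT fields directly, which is a field-based approximation of the grep pattern, and the averaging assumes three-column samtools depth output (chromosome, 1-based position, depth). Note that in the SAM format a CIGAR M operation denotes an alignment match that may be either a sequence match or a mismatch, so the pattern selects full-length alignments without gaps or clipping.

```python
from collections import defaultdict


def keep_sam_line(line, cigars=("150M", "151M")):
    """Return True for SAM header lines and for full-length alignments.

    Field-based approximation of:
        grep -e $'^@' -e $'\t151M\t=' -e $'\t150M\t='
    """
    if line.startswith("@"):
        return True  # header lines are always kept
    fields = line.rstrip("\n").split("\t")
    # SAM column 6 is the CIGAR string, column 7 is RNEXT
    # ("=" means the mate maps to the same reference sequence).
    return len(fields) > 6 and fields[5] in cigars and fields[6] == "="


def window_means(depth_lines, window=100_000):
    """Average per-base depth in non-overlapping windows.

    depth_lines: iterable of "chrom<TAB>pos<TAB>depth" lines, as written
    by `samtools depth -a` for a single BAM file (assumed input format).
    Returns a sorted list of (chrom, window_start_0based, mean_depth).
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        chrom, pos, depth = line.split()[:3]
        key = (chrom, (int(pos) - 1) // window)  # positions are 1-based
        sums[key] += int(depth)
        counts[key] += 1
    return [
        (chrom, idx * window, sums[(chrom, idx)] / counts[(chrom, idx)])
        for chrom, idx in sorted(sums)
    ]
```

Because the mean divides by the number of positions actually reported, the last window of a chromosome, which is usually shorter than 100 kb, is averaged over its true length. Streaming the depth file line by line keeps memory use proportional to the number of windows rather than the number of bases.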
The resulting CENH3 ChIP-seq read coverage plots were then used to define the boundaries of functional centromeres. Boundaries were determined by eye as the positions where CENH3 coverage sharply increased and decreased, respectively. Boundaries were defined with a resolution of 100 kb. To assess the contiguous assembly of the centromeres, we obtained the breakpoints (start and end) of each contig in the pseudomolecule assemblies using MUMmer (v4.0.0.2). We then compared the contig breakpoints with the CENH3 read density to see whether each centromere is contained on a single contig.
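The final comparison, whether a centromere lies within a single contig, amounts to an interval containment test. The sketch below is illustrative only. The function name spanning_contig and the tuple layout (contig name, start, end) are assumptions, and the coordinates are taken to be positions on one pseudomolecule in the same coordinate system as the centromere boundaries.

```python
def spanning_contig(contigs, cen_start, cen_end):
    """Return the name of the contig that fully contains the centromere.

    contigs: iterable of (name, start, end) tuples giving contig
    breakpoints on one pseudomolecule (hypothetical input layout).
    cen_start, cen_end: centromere boundaries on the same pseudomolecule.
    Returns None if the centromere crosses a contig breakpoint.
    """
    for name, start, end in contigs:
        if start <= cen_start and cen_end <= end:
            return name
    return None
```

A return value of None indicates that at least one contig breakpoint falls inside the centromere interval, meaning the centromere is not contained on a single contig.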
Usage notes
The link contains the BED files and the BAM (mapped) files of CENH3 reads aligned against the respective genome assemblies.