Data from: Centromere innovations within a mouse species
Data files
Nov 15, 2023 version files, 69.65 KB total
- 2020-04-29-INP-consensus-align-hist-line.m (2.36 KB)
- Code1_joining_fq2sam_linux_JDMmod20190521.m (10.41 KB)
- Code2_aligning_main_JDM_20160401.m (7.14 KB)
- Code2_FwdRevSum.m (6.42 KB)
- Code2_getFileNames.m (1.04 KB)
- Code2_hist.m (2.89 KB)
- Code2_loadFile.m (486 B)
- Code2_mkdir.m (478 B)
- Code2_parentSeq.m (14.87 KB)
- Code3_plotting_fixIncrement_1sizeClass_JDM20170206_allPlots.m (20.44 KB)
- findKmers.py (1.42 KB)
- README.md (1.69 KB)
Abstract
Mammalian centromeres direct faithful genetic inheritance and are typically characterized by regions of highly repetitive and rapidly evolving DNA. We focused on a mouse species, Mus pahari, that we found has evolved to house centromere-specifying CENP-A nucleosomes at the nexus of a satellite repeat that we identified and term p-satellite (p-sat), a small number of recruitment sites for CENP-B, and short stretches of perfect telomere repeats. One M. pahari chromosome, however, houses a radically divergent centromere harboring ~6 Mbp of a homogenized p-sat-related repeat, p-satB, that contains >20,000 functional CENP-B boxes. There, CENP-B abundance drives accumulation of microtubule-binding components of the kinetochore, as well as a microtubule-destabilizing kinesin of the inner centromere. The balance of pro- and anti-microtubule-binding by the new centromere permits it to segregate during cell division with high fidelity alongside the older ones whose sequence creates a markedly different molecular composition.
README: Centromere innovations within a mouse species
List of scripts used:
findKmers.py
Code2_hist.m
Code2_loadFile.m
Code3_plotting_fixIncrement_1sizeClass_JDM20170206_allPlots.m
Code2_mkdir.m
Code2_parentSeq.m
Code2_aligning_main_JDM_20160401.m
Code2_FwdRevSum.m
Code2_getFileNames.m
Code1_joining_fq2sam_linux_JDMmod20190521.m
2020-04-29-INP-consensus-align-hist-line.m
findKmers.py
Identifies instances of a k-mer sequence in a FASTA file. Python version 3.8.10. Run from the command line as ./findKmers.py --kmers {CENP-B_box_sequences} --fasta {genome_assembly.fasta} --out {out.bed}. Dependent on Bio.SeqIO and matplotlib.pyplot (v3.5.3).
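The kind of search described above can be sketched as follows. This is a minimal illustration, not the contents of findKmers.py: the function names (`read_fasta`, `find_kmers`, `to_bed`) are hypothetical, a small pure-Python FASTA reader stands in for Bio.SeqIO so the example has no dependencies, and exact matching on the given strand only is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, int, int, str]  # (chrom, start, end, kmer), 0-based half-open


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_name: uppercase_sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {key: "".join(parts) for key, parts in chunks.items()}


def find_kmers(records: Dict[str, str], kmers: Iterable[str]) -> List[Hit]:
    """Return every exact occurrence of each k-mer, sorted by record and position."""
    hits: List[Hit] = []
    for chrom, seq in records.items():
        for kmer in kmers:
            kmer = kmer.upper()
            if not kmer:
                continue  # skip empty query strings
            pos = seq.find(kmer)
            while pos != -1:
                hits.append((chrom, pos, pos + len(kmer), kmer))
                pos = seq.find(kmer, pos + 1)  # advance by 1 so overlapping hits are kept
    return sorted(hits)


def to_bed(hits: Iterable[Hit]) -> str:
    """Format hits as tab-separated BED lines (chrom, start, end, name)."""
    return "".join(f"{c}\t{s}\t{e}\t{k}\n" for c, s, e, k in hits)


# Toy input; the sequences and k-mers are made up for illustration.
example_fasta = ">chrA demo\nACGTTTCGAC\nGTTTCG\n>chrB\nTTTTT\n"
example_hits = find_kmers(read_fasta(example_fasta.splitlines()), ["TTTCG", "TTTT"])
print(to_bed(example_hits), end="")
```

BED coordinates are 0-based and half-open, so a 5-mer starting at the fourth base of a record is reported as start 3, end 8.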
Code1_joining_fq2sam_linux_JDMmod20190521.m
Joins paired-end fastq reads and converts them to a sam file. Plots a histogram of the frequency of joined reads. Run from MATLAB 2017a (9.2.0.556344).
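For readers unfamiliar with read joining, the general idea can be sketched in Python. This is not a port of the MATLAB script: the function names, the exact-match overlap rule, the `min_overlap` parameter, and the tally of joined lengths are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def join_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Join a read pair using the longest exact overlap between the end of
    read 1 and the start of reverse-complemented read 2; None if no overlap."""
    fwd = read1.upper()
    rev = reverse_complement(read2)
    longest = min(len(fwd), len(rev))
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if fwd[-overlap:] == rev[:overlap]:
            return fwd + rev[overlap:]
    return None


# Toy fragment sequenced from both ends with 20-base reads (made-up sequence).
fragment = "ACGTACGGTTCAGCTAGGCTAACGTTAGC"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])

joined = [join_pair(read1, read2)]
# Tally joined-fragment lengths; this is the kind of data a histogram could show.
length_counts = Counter(len(seq) for seq in joined if seq is not None)
print(length_counts)
```

Real joining tools usually tolerate mismatches in the overlap and use base qualities; exact matching keeps the sketch short.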
Code2_aligning_main_JDM_20160401.m
Aligns joined reads from Code1 to a sequence. Sequence choices are described in Code2_parentSeq.m. Run from MATLAB 2017a (9.2.0.556344). Dependent on Code2_hist.m, Code2_loadFile.m, Code2_mkdir.m, Code2_parentSeq.m, Code2_FwdRevSum.m, Code2_getFileNames.m.
Code3_plotting_fixIncrement_1sizeClass_JDM20170206_allPlots.m
Using the output from Code2, generates several plots of sequencing reads aligned to a specific sequence, including nucleosome occupancy plots, midpoint plots, and stack plots. Run from MATLAB 2017a (9.2.0.556344). See "Hasson et al. NSMB. 2013." for examples.
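As a rough guide to what occupancy and midpoint plots summarize, the sketch below computes both quantities from aligned fragment coordinates. It is an illustration under assumptions, not the Code3 script: the function names and the `(start, end)` 0-based half-open fragment representation are hypothetical, and no plotting is done.

```python
from typing import List, Sequence, Tuple

Fragment = Tuple[int, int]  # (start, end) on the reference, 0-based half-open


def occupancy(fragments: Sequence[Fragment], ref_len: int) -> List[int]:
    """Count, for each reference position, how many fragments cover it."""
    diff = [0] * (ref_len + 1)
    for start, end in fragments:
        start = min(max(start, 0), ref_len)  # clamp to the reference
        end = min(max(end, 0), ref_len)
        if end > start:
            diff[start] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:ref_len]:
        running += delta  # running sum of +1/-1 marks gives the coverage
        coverage.append(running)
    return coverage


def midpoints(fragments: Sequence[Fragment]) -> List[int]:
    """Return the midpoint position of each fragment (rounded down)."""
    return [(start + end) // 2 for start, end in fragments]


# Toy fragments on an 8-base reference.
example_fragments = [(0, 4), (2, 6), (3, 5)]
print(occupancy(example_fragments, 8))  # [1, 1, 2, 3, 2, 1, 0, 0]
print(midpoints(example_fragments))     # [2, 4, 4]
```

Occupancy shows how many fragments cover each base, while midpoints mark the center of each protected fragment, which is why the two plot types are often shown together.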
2020-04-29-INP-consensus-align-hist-line.m
Using the output from Code2_aligning_main_JDM_20160401.m, generates a histogram of % identity of reads to the target sequence. Run from MATLAB 2017a (9.2.0.556344).
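A percent-identity histogram of this kind can be sketched as below. The definition of identity used here (matching columns divided by all alignment columns, gaps counted as mismatches) is an assumption, since the script's exact definition is not stated; the function names and the `bin_width` parameter are likewise hypothetical.

```python
from typing import Iterable, List


def percent_identity(aligned_read: str, aligned_target: str) -> float:
    """Percent of alignment columns where read and target share the same base.
    Inputs are gapped alignment strings of equal length ('-' marks a gap)."""
    if len(aligned_read) != len(aligned_target):
        raise ValueError("aligned strings must have the same length")
    if not aligned_read:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_read.upper(), aligned_target.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_read)


def identity_histogram(values: Iterable[float], bin_width: int = 10) -> List[int]:
    """Count values in bins of bin_width percent; 100% goes in the last bin.
    bin_width is assumed to divide 100 evenly."""
    counts = [0] * (100 // bin_width)
    for value in values:
        index = min(int(value // bin_width), len(counts) - 1)
        counts[index] += 1
    return counts


# Toy alignments (made-up sequences).
identities = [
    percent_identity("ACGT-A", "ACCTTA"),  # 4 of 6 columns match
    percent_identity("ACGT", "ACGT"),      # identical
]
print(identity_histogram(identities))
```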