Divergent C. elegans toxin alleles are suppressed by distinct mechanisms
Data files
Dec 13, 2024 version: 460.04 MB total
- README.md (9.99 KB)
- supp_data_noexternal.zip (460.03 MB)
Abstract
Toxin-antidote elements (TAs) are selfish DNA sequences that bias their transmission to the next generation. TAs typically consist of two linked genes: a toxin and an antidote. The toxin kills progeny that do not inherit the TA, while the antidote counteracts the toxin in progeny that inherit the TA. We previously discovered two TAs in C. elegans that follow the canonical TA model of two linked genes: peel-1/zeel-1 and sup-35/pha-1. Here, we report a new TA that exists in three distinct states across the C. elegans population. The canonical TA, which is found in isolates from the Hawaiian islands, consists of two genes that encode a maternally deposited toxin (MLL-1) and a zygotically expressed antidote (SMLL-1). The toxin induces larval lethality in embryos that do not inherit the antidote gene. A second version of the TA has lost the toxin gene but retains a partially functional antidote. Most C. elegans isolates, including the standard laboratory strain N2, carry a highly divergent allele of the toxin that has retained its activity, but have lost the antidote through pseudogenization. We show that the N2 toxin allele has acquired mutations that enable piRNA binding to initiate MUT-16-dependent 22G small RNA amplification that targets the transcript for degradation. The N2 haplotype represents the first naturally occurring unlinked toxin-antidote system where the toxin is post-transcriptionally suppressed by endogenous small RNA pathways.
README: supplementary data sets
https://doi.org/10.5061/dryad.3ffbg79tq
Description of the data and file structure
Supplementary materials for the manuscript: Divergent *C. elegans* toxin alleles are suppressed by distinct mechanisms
Supplementary tables that support the conclusions of the manuscript Divergent C. elegans toxin alleles are suppressed by distinct mechanisms:
- Table S1: Strains - A table of strains that were used in or constructed for this paper. It has the following columns:
  - Strain - strain name
  - Allele name - Kruglyak lab allele name
  - Genotype - relevant genotype information associated with the strain
  - Description - a description of the strain and what it was used for
- Table S2: Plasmids - A table of plasmids that were used in or constructed for this paper. It has the following columns:
  - Strain - plasmid name
  - Genotype - genetic parts that make up the plasmid
  - Description - a description of the plasmid and what it was used for
  - Source - how the plasmid was acquired
- Table S3: Plasmid Construction - A table containing information on key reagents used for this paper. It has the following columns:
  - Plasmid - plasmid name
  - Backbone - the plasmid that was used to construct this plasmid
  - Insert name - a description of the insert sequence
  - Insert sequence - the nucleotide sequence that was inserted into the specified backbone
  - Insert amplified with - the oligos used to amplify the insert for cloning
  - Other - additional notes about the construction of the plasmid
- Table S4: Crosses - Lethality counts for the crosses used to define the toxin-antidote element described in this manuscript. It has the following columns:
  - Cross - the cross that was performed
  - Description - a brief description of the cross
  - Embryos - the number of F2 embryos transferred to a fresh 6 cm agar plate to assess the lethality associated with the cross
  - Alive - the number of F2 worms that were healthy
  - Dead L1s - the number of F2 worms that arrested as L1s
  - Arrested embryos - the number of F2 embryos that arrested
  - Sick worms - the number of F2 worms that displayed a non-rod arrest phenotype
  - Corresponding figure - the figure in which the data for the associated cross are displayed
- Table S5: Oligos - A table of DNA oligos that were used to verify the strains constructed for this manuscript. It has the following columns:
  - Purpose - a brief description of what the corresponding oligo pair was used for
  - Fwd name - the oZ name of the forward primer
  - Fwd primer - the nucleotide sequence of the forward primer
  - Rev name - the oZ name of the reverse primer
  - Rev primer - the nucleotide sequence of the reverse primer
  - Strains generated - the strains that these oligos were used to construct
- Table S6: gRNAs and repair - A table of gRNAs and repair templates for the Cas9 experiments described in this manuscript. It has the following columns:
  - Description - a brief description of what the guide RNA was used for
  - gRNA name - the name of the guide RNA
  - gRNA sequence - the nucleotide sequence of the guide RNA
  - Repair name - the name of the repair template that was used
  - Repair template - the nucleotide sequence of the repair template
- Table S7: Inducible construct phenotypes - A table of phenotypes associated with the inducible toxin constructs. The first row of this table is descriptive; the data start on the second row. The table has the following columns:
  - the strain and inducible construct that was used
  - G(+) - the number of GFP+ worms
  - number healthy - the number of healthy GFP+ worms
  - number phenoA - the number of worms that displayed phenotype A
  - number phenoB - the number of worms that displayed phenotype B
  - number phenoC - the number of worms that displayed phenotype C
  - number phenoD - the number of worms that displayed phenotype D
  - number phenoE - the number of worms that displayed phenotype E
  - number phenoF - the number of worms that displayed phenotype F
  - Key - a key to phenotypes A-F
- Table S8: COPAS biosort data - Individual animal measurements from the COPAS Biosorter. This data set contains many standard columns output by the Biosorter; each row corresponds to one object detected by the instrument, and the columns describe that object. Only the following columns were used to draw conclusions in this manuscript:
  - strain - the strain name
  - bleach - the replicate number
  - TOF - time of flight, a measure of the length of the object
  - EXT - extinction, or the side scatter of the object
  - Green - green fluorescence intensity of the object (note that the strains tested had no associated fluorescence)
  - Yellow - yellow fluorescence intensity of the object (note that the strains tested had no associated fluorescence)
  - Red - red fluorescence intensity of the object (note that the strains tested had no associated fluorescence)
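The count tables above support simple per-row summaries. The Python sketch below is illustrative only and is not the authors' analysis (the published figures come from the R scripts in this folder). It assumes each table has been exported to CSV with headers spelled exactly as listed above; that in Table S7 the row after the descriptive first row holds the headers and the first column holds the strain/construct; and that blank count cells can be skipped or treated as 0. All function names are hypothetical. Because this README does not state whether "Embryos" equals the sum of the outcome columns in Table S4, the fractions need not sum to 1. The Table S8 summary applies none of the filtering (such as the bubble SVM thresholds) used by Figure4.R.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# ASSUMPTION: tables exported to CSV with the column headers listed in this
# README. These helpers are hypothetical and for illustration only.

S4_CATEGORIES = ["Alive", "Dead L1s", "Arrested embryos", "Sick worms"]


def _to_int(value):
    """Parse a count cell; blank or missing cells become None."""
    value = (value or "").strip()
    return int(value) if value else None


def s4_category_fractions(csv_text):
    """Table S4: fraction of scored F2 embryos in each outcome category, per cross."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        embryos = _to_int(row.get("Embryos"))
        if not embryos:  # skip blank rows and avoid dividing by zero
            continue
        # ASSUMPTION: a blank category cell is treated as 0
        result[row["Cross"]] = {
            cat: (_to_int(row.get(cat)) or 0) / embryos for cat in S4_CATEGORIES
        }
    return result


def s7_fraction_healthy(csv_text):
    """Table S7: fraction of GFP+ worms scored as healthy, per strain/construct."""
    handle = io.StringIO(csv_text)
    next(handle)  # skip the descriptive first row
    reader = csv.DictReader(handle)  # ASSUMPTION: the next row holds the headers
    label = reader.fieldnames[0]  # ASSUMPTION: first column = strain/construct
    result = {}
    for row in reader:
        gfp_pos = _to_int(row.get("G(+)"))
        healthy = _to_int(row.get("number healthy"))
        if gfp_pos and healthy is not None:  # skip blank rows (e.g. key-only rows)
            result[row[label]] = healthy / gfp_pos
    return result


def s8_median_tof(csv_text):
    """Table S8: median TOF (length measure) per (strain, bleach replicate)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[(row["strain"], row["bleach"])].append(float(row["TOF"]))
    return {key: median(values) for key, values in groups.items()}
```

Keying the Table S8 summary on (strain, bleach) keeps biological replicates separate, so replicate-to-replicate variation stays visible before any pooling.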
Additional files contained in the folder:
- Figure1.R - Script to generate Figure 1, Figure S1
- Figure2.R - Script to generate Figure 2, Figure S1B
- Figure3_dnds.R - Script to generate Figure 3E
- Figure3_cross.R - Script to generate Figure 3C
- Figure4.R - Script to generate Figure 4 and Figure S6
  - This script also populates the "sorter_plots" folder, which contains plots similar to those in Fig. 4 of our manuscript for different bubble SVM thresholds
- region_tree.R - Script to build the similarity tree in Figure 3A (V_20454811-20473950.tree is the output of this script)
- world_map.R - Script to generate Figure 3B
Additional datasets:
- qx_2_xz_coverage.tsv - sequencing depth of QX1211 short reads aligned to the XZ1516 genome. Used in Figure2.R to generate Figure 2A.
- 20220216_Caendr_c_elegans_strain_data.csv - C. elegans strain isolation information. Used to generate the map in Figure 3B
- V20454811-20473950.vcf.gz - VCF of the mll-1/smll-1 locus used to construct the tree in region_tree.R
- XZ1516_TA_strains.txt - C. elegans strains that contain the XZ1516 mll-1/smll-1 TA element
- lethality_numbers.csv - lethality numbers from crosses used throughout the paper
- XZ1516_dRNA.cram (with index XZ1516_dRNA.cram.crai) - dRNA sequencing reads aligned to the XZ1516 genome. Used to generate Figure S2
- XZ1516_QX1211_F4_frequencies.tsv - allele frequencies associated with Figure 1A
- qx1211_alnto_xz_variants.tsv - variants identified after aligning QX1211 to the XZ1516 genome
- XZ1516_DL238_F14_frequencies.table - GATK ASEReadCounter output table of allele counts for the XZ1516 x DL238 cross, used for Figure 1 and Figure S1
- XZ1516_QX1211_F14_frequencies.table - GATK ASEReadCounter output table of allele counts for the XZ1516 x QX1211 cross, used for Figure 1 and Figure S1
- XZ1516-DL238_parental_sites.tsv - variant sites between XZ1516 and DL238, used for Figure 1 and Figure S1
- XZ1516-QX1211_parental_sites.tsv - variant sites between XZ1516 and QX1211, used for Figure 1 and Figure S1
- xz_qx_genetic_map.rds - genetic map of XZ1516 and QX1211
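The *_F14_frequencies.table files hold per-site allele counts, from which an allele frequency at each site is a simple ratio. The Python sketch below assumes these files use the tab-separated layout written by GATK ASEReadCounter, with columns named contig, position, refCount, and altCount; verify the headers before relying on it. It computes altCount / (refCount + altCount) per site. The min_depth filter is a hypothetical addition for illustration. The sketch does not assign alleles to parental strains (that would need the *_parental_sites.tsv files, whose column layout is not described here) and does not reproduce Figure1.R.

```python
import csv
import io

# ASSUMPTION: tab-separated GATK ASEReadCounter-style table with columns
# including contig, position, refCount and altCount.


def alt_allele_frequencies(table_text, min_depth=1):
    """Return {(contig, position): altCount / (refCount + altCount)}.

    min_depth is a hypothetical filter: sites whose ref + alt read depth
    falls below it are skipped.
    """
    freqs = {}
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        ref = int(row["refCount"])
        alt = int(row["altCount"])
        depth = ref + alt  # only reads supporting the ref or alt allele
        if depth >= min_depth and depth > 0:
            freqs[(row["contig"], int(row["position"]))] = alt / depth
    return freqs
```

Using refCount + altCount as the denominator ignores reads carrying a third base, a choice of this sketch rather than of the original analysis.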
Additional Folders:
- reed_2020 - contains a figure made from the supplementary data sets of the following publication:
  - K. J. Reed, J. M. Svendsen, K. C. Brown, B. E. Montgomery, T. N. Marks, T. Vijayasarathy, D. M. Parker, E. O. Nishimura, D. L. Updike, T. A. Montgomery, Widespread roles for piRNAs and WAGO-class siRNAs in shaping the germline transcriptome of Caenorhabditis elegans. Nucleic Acids Res. 48, 1811–1827 (2020).
  - reed2020_mut16_transcripts_srna.pdf - a figure that forms part of Supplementary Figure 5
  - To reproduce the visualization, download Supplementary Table 17 from the above publication, convert it to CSV, and save it in this folder as table17.csv (the file name that supp3.R loads); this enables supp3.R in the comprehensive_arg_paper folder to run
- comprehensive_arg_paper - contains a script to visualize supplementary data sets from the following publication:
  - U. Seroussi, A. Lugowski, L. Wadi, R. X. Lao, A. R. Willis, W. Zhao, A. E. Sundby, A. G. Charlesworth, A. W. Reinke, J. M. Claycomb, A comprehensive survey of C. elegans argonaute proteins reveals organism-wide gene regulatory networks and functions. eLife 12 (2023).
  - The only relevant file in this folder is:
    - supp3.R - a script that loads "supp3.csv" from this folder and "table17.csv" from the reed_2020 folder and generates the component plots for Fig. S5 of our publication
  - To reproduce the visualization, download Supplementary Table 3 from the above publication, convert it to CSV, and save it in this folder as supp3.csv; this enables supp3.R to run
- dnds - a folder containing the files needed to calculate dN/dS between the N2 and XZ1516 coding sequences. The relevant files are:
  - caenorhabditis_XZ1516.cds-transcripts.fa - coding sequences for XZ1516 in FASTA format
  - WS276.cds.fa - coding sequences for N2 in FASTA format
  - parse_orthogroups.R - a script that calculates dN/dS across all orthogroups defined by the orthologr R package. The output of this script is:
    - XZ_N2_ds.Rda - the output of the dNdS function in the orthologr R package; the output columns are described in the package documentation. This file is used by the Figure3_dnds.R script to plot dN/dS between the N2 and XZ1516 coding sequences for Fig. 3E
- plots - this is a folder that contains the output plots generated by all the aforementioned scripts
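Because supp3.R only runs after two manually downloaded tables have been converted to CSV and placed in the right folders, a quick pre-flight check can catch a missing file. The Python sketch below takes the two paths from this README (reed_2020/table17.csv and comprehensive_arg_paper/supp3.csv) and assumes they are relative to the root of the unzipped supplementary folder; the function name is hypothetical, and the check tests only that the files exist, not their contents.

```python
from pathlib import Path

# Inputs that supp3.R loads, per this README.
# ASSUMPTION: paths are relative to the root of the unzipped supplementary folder.
REQUIRED_SUPP3_INPUTS = [
    Path("reed_2020") / "table17.csv",
    Path("comprehensive_arg_paper") / "supp3.csv",
]


def missing_supp3_inputs(root="."):
    """Return the required input paths (as strings) not present under root."""
    root = Path(root)
    return [str(p) for p in REQUIRED_SUPP3_INPUTS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_supp3_inputs()
    if missing:
        print("Missing inputs for supp3.R:", ", ".join(missing))
    else:
        print("All supp3.R inputs found.")
```

Run it from the folder root before opening supp3.R in R.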
- Finalized version