Data from: The role of breakpoint mutations, supergene effects, and ancient nested rearrangements in the evolution of adaptive chromosome inversions in the yellow monkey flower, Mimulus guttatus
Data files
Feb 05, 2025 version files 2.06 GB
- high-impact-L1_synSNV_nothets.zip (3.76 MB)
- high-impact-S1_synSNV_nothets.zip (3.79 MB)
- L1_filtered.AL.zip (57.09 MB)
- L1_pseudogene_intact_gene.zip (19.03 KB)
- L1_reads_minimap2_sniffles.zip (9.21 MB)
- L1_snps_filtered.zip (267.75 MB)
- L1_svim_variants.zip (17.76 MB)
- L1query.pseudo_S1ref.pseudo_mumandco.SVs_notranslocations_nocontractions_noNNNN.zip (47.26 MB)
- L1reads_S1ref_minimappbsv.sorted.readgr.zip (31.78 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.cds.fa (29.78 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.functional-annotations.tsv (3.84 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.gff (32.17 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.intact-TEs.gff (4.41 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.proteins.fa (10.22 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.pseudogenes.gff (3.14 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.repeatmasker.gff (65.58 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.suspect-TE-genes.txt (11.86 KB)
- Mimulus_guttatus.LMC-L1-2.v1.2.TEs.fa (9.60 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.TEs.gff (74.42 MB)
- Mimulus_guttatus.LMC-L1-2.v1.2.transcripts.fa (39.60 MB)
- Mimulus_guttatus.LMC-L1-2.v1.fa (282.31 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.cds.fa (29.76 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.functional-annotations.tsv (3.76 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.intact-TEs.gff (4.16 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.proteins.fa (10.21 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.pseudogenes.gff (3.21 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.repeatmasker.gff (65.92 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.suspect-TE-genes.txt (9.35 KB)
- Mimulus_guttatus.SWB-S1-2.v1.2.TEs.fa (9.56 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.TEs.gff (72.46 MB)
- Mimulus_guttatus.SWB-S1-2.v1.2.transcripts.fa (39.59 MB)
- Mimulus_guttatus.SWB-S1-2.v1.fa (283.03 MB)
- Mimulus_guttatus.SWB-S1-2.v1.gff (32.15 MB)
- README.md (10.74 KB)
- S1_filtered.AL.zip (57.90 MB)
- S1_pseudogene_intact_gene.zip (19.49 KB)
- S1_reads_minimap2_sniffles.zip (9.21 MB)
- S1_snps_filtered.zip (274.52 MB)
- S1_svim_variants.zip (18.59 MB)
- S1query.pseudo_L1ref.pseudo_mumandco.SVs__notranslocations_nocontractions_noNNNN.zip (47.54 MB)
- S1reads_L1ref_minimappbsv.sorted.readgr.zip (31.73 MB)
- Supplemental_files-20231211T163038Z-001.zip (35.98 MB)
- survivor_pbsv_sniffles_svim_300_L1.zip (21.60 MB)
- survivor_pbsv_sniffles_svim_300_S1.zip (17.11 MB)
Abstract
Large chromosomal inversion polymorphisms are ubiquitous across the diversity of diploid organisms and play a large role in the evolution of adaptations in those species. Despite their importance, the underlying mechanisms by which inversions produce their adaptive phenotypic effects and become geographically widespread are still poorly understood. One way inversions could cause phenotypic effects is through meiotic recombination suppression, which can result in the formation of a supergene containing linked adaptive alleles. This supergene hypothesis has been promoted by theoreticians, but thus far, no studies have definitively identified multiple linked adaptive genes within an inversion. Alternatively, according to the breakpoint mutation hypothesis, the inversion mutation itself could result in adaptive phenotypic effects if it disrupts genes or alters the regulation of nearby genes. Here, we evaluate both of these hypotheses using new long-read sequencing-based genomes of the yellow monkey flower, Mimulus guttatus. Our results provide support for both the supergene and breakpoint mutation hypotheses of adaptive inversion evolution and suggest that functional molecular studies will be required to definitively test each of these hypotheses. We also identified an ancient large inversion nested within a well-established adaptive inversion. This finding suggests that the supergene mechanism may occur in phases, with an expansion of the region of suppressed recombination capturing an increasing number of adaptive loci over time.
README: Mimulus guttatus inland annual and coastal perennial genome assembly and associated files
https://doi.org/10.5061/dryad.2ngf1vhwj
These datasets are the result of long-read Oxford Nanopore Technologies (ONT) genome assemblies. We assembled genomes for one individual from the inland annual ecotype (LMC) and one individual from the coastal perennial ecotype (SWB). We used these genomes to identify chromosomal inversion breakpoint regions, call single nucleotide polymorphisms (SNPs), identify structural variants (SVs), and run differential expression and population genetic analyses on previously published datasets.
Description of the data and file structure
Genome assemblies and annotations
- Mimulus_guttatus.SWB-S1-2.v1.fa - SWB-S1 genome assembly (pseudochromosomes)
- Mimulus_guttatus.LMC-L1-2.v1.fa - LMC-L1 genome assembly (pseudochromosomes)
- Mimulus_guttatus.LMC-L1-2.v1.2.gff - LMC-L1 genome annotation file
- Mimulus_guttatus.SWB-S1-2.v1.2.gff - SWB-S1 genome annotation file
- Mimulus_guttatus.LMC-L1-2.v1.2.TEs.gff - LMC-L1 TE annotation file
- Mimulus_guttatus.SWB-S1-2.v1.2.TEs.gff - SWB-S1 TE annotation file
- Mimulus_guttatus.LMC-L1-2.v1.2.pseudogenes.gff - LMC-L1 pseudogene annotation file
- Mimulus_guttatus.SWB-S1-2.v1.2.pseudogenes.gff - SWB-S1 pseudogene annotation file
- Mimulus_guttatus.LMC-L1-2.v1.2.cds.fa - fasta file with coding sequence.
- Mimulus_guttatus.SWB-S1-2.v1.2.cds.fa - fasta file with coding sequence.
- Mimulus_guttatus.LMC-L1-2.v1.2.transcripts.fa - fasta file with transcripts
- Mimulus_guttatus.SWB-S1-2.v1.2.transcripts.fa - fasta file with transcripts
- Mimulus_guttatus.LMC-L1-2.v1.2.repeatmasker.gff - repeatmasker annotation file
- Mimulus_guttatus.SWB-S1-2.v1.2.repeatmasker.gff - repeatmasker annotation file
- Mimulus_guttatus.LMC-L1-2.v1.2.proteins.fa - protein sequence file
- Mimulus_guttatus.SWB-S1-2.v1.2.proteins.fa - protein sequence file
- Mimulus_guttatus.LMC-L1-2.v1.2.suspect-TE-genes.txt - potential TE genes
- Mimulus_guttatus.SWB-S1-2.v1.2.suspect-TE-genes.txt - potential TE genes
- Mimulus_guttatus.LMC-L1-2.v1.2.intact-TEs.gff - intact TE annotation file
- Mimulus_guttatus.SWB-S1-2.v1.2.intact-TEs.gff - intact TE annotation file
- Mimulus_guttatus.LMC-L1-2.v1.2.functional-annotations.tsv - functional annotation file
- Mimulus_guttatus.SWB-S1-2.v1.2.functional-annotations.tsv - functional annotation file
- Mimulus_guttatus.SWB-S1-2.v1.2.TEs.fa - TE sequence file
- Mimulus_guttatus.LMC-L1-2.v1.2.TEs.fa - TE sequence file
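As a quick sanity check on any of the annotation files above, the feature types in a GFF file can be tallied with the Python standard library alone. The following is a minimal sketch, assuming the files follow the usual nine-column, tab-delimited GFF layout; the function name `count_feature_types` and the in-memory example records are hypothetical and are not taken from the deposited files.

```python
from collections import Counter
import io


def count_feature_types(handle):
    """Count the values in column 3 (feature type) of a GFF stream."""
    counts = Counter()
    for line in handle:
        # Skip blank lines and header/comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A well-formed GFF record has nine tab-separated columns
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Made-up records standing in for a real annotation file
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
    "chr1\tsrc\texon\t100\t400\t.\t+\t.\tParent=g1.t1\n"
    "chr1\tsrc\texon\t600\t900\t.\t+\t.\tParent=g1.t1\n"
)
feature_counts = count_feature_types(example)
print(feature_counts)
```

To run it on a downloaded file, pass an open file handle instead of the in-memory example, for instance `open("Mimulus_guttatus.LMC-L1-2.v1.2.gff")`.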
________________________________________________________________
Variant files
SNP supplemental files descriptions
All files listed in this section were generated by aligning whole-genome sequencing (WGS) reads to the opposite ecotype's reference genome, then calling and filtering SNPs with the GATK pipeline.
- L1_snps_filtered.vcf - Filtered and genotyped SNPs for the inland annual ecotype (LMC)
- S1_snps_filtered.vcf - Filtered and genotyped SNPs for the coastal perennial ecotype (SWB)
- L1_filtered.AL.avinput - Filtered and annotated SNPs for the inland annual ecotype (LMC)
- S1_filtered.AL.avinput - Filtered and annotated SNPs for the coastal perennial ecotype (SWB)
- high-impact-L1_synSNV_nothets.csv - A list of high-impact mutations for the inland annual ecotype (LMC)
- high-impact-S1_synSNV_nothets.csv - A list of high-impact mutations for the coastal perennial ecotype (SWB)
Whole genome alignment structural variant calling supplemental files
All files listed in this section were generated by aligning both the SWB-S1 and LMC-L1 assemblies and calling SVs using MUM&co.
- L1query.pseudo_S1ref.pseudo_mumandco.SVs_notranslocations_nocontractions_noNNNN.zip - L1 structural variants from MUM&co after removing complicated calls (translocations, contractions, and nested SVs), long strings of “NNNN” in insertions and deletions, low-quality SVs, and heterozygous SVs.
- S1query.pseudo_L1ref.pseudo_mumandco.SVs__notranslocations_nocontractions_noNNNN.zip - S1 structural variants from MUM&co after removing complicated calls (translocations, contractions, and nested SVs), long strings of “NNNN” in insertions and deletions, low-quality SVs, and heterozygous SVs.
- L1_querysnps_SV_intersected_genes.txt - L1 structural variants intersected with LMC-L1 genes
- S1_querysnps_SV_intersected_genes.txt - S1 structural variants intersected with SWB-S1 genes
- S1_pseudogene_intact_gene.csv - A list of SWB-S1 pseudogenes with their corresponding intact gene and the intact genes functional annotation.
- L1_pseudogene_intact_gene.csv - A list of LMC-L1 pseudogenes with their corresponding intact gene and the intact genes functional annotation.
Read-based alignment structural variant calling supplemental files
All files listed in this section were generated by aligning ONT reads from LMC-L1 to the SWB-S1 assembly, and ONT reads from SWB-S1 to the LMC-L1 assembly, using minimap2.
- S1reads_L1ref_minimappbsv.sorted.readgr.vcf - SWB-S1 SVs called using the program pbsv.
- S1_reads_minimap2_sniffles.vcf - SWB-S1 SVs called using the program sniffles
- S1_svim_variants.vcf - SWB-S1 SVs called using the program svim.
- L1reads_S1ref_minimappbsv.sorted.readgr.vcf - LMC-L1 SVs called using the program pbsv.
- L1_reads_minimap2_sniffles.vcf - LMC-L1 SVs called using the program sniffles.
- L1_svim_variants.vcf - LMC-L1 SVs called using the program svim.
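To compare the call sets from the three SV callers at a glance, one option is to tally records by the `SVTYPE` key in the VCF INFO column. The sketch below uses only the Python standard library and assumes each record carries `SVTYPE` in INFO, which is the VCF convention for structural variants; the function name `tally_sv_types` and the example records are hypothetical and not drawn from the deposited files.

```python
from collections import Counter
import io


def tally_sv_types(handle):
    """Count VCF records by the SVTYPE value in the INFO column."""
    counts = Counter()
    for line in handle:
        # Skip meta-information and the column header line
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        # INFO is column 8: semicolon-separated KEY=VALUE pairs or bare flags
        info = dict(
            item.split("=", 1) if "=" in item else (item, True)
            for item in fields[7].split(";")
        )
        counts[info.get("SVTYPE", "unknown")] += 1
    return counts


# Made-up records standing in for a real caller output
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-350;END=1350\n"
    "chr1\t5000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=420\n"
    "chr2\t800\tsv3\tN\t<DEL>\t.\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-90;END=890\n"
)
sv_counts = tally_sv_types(example)
print(sv_counts)
```

Running the same function over the pbsv, sniffles, and svim files for one ecotype gives a rough side-by-side view of how many deletions, insertions, and other SV classes each caller reports.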
Differential expression supplemental files
All files in this section were generated from the RNAseq data in Gould et al. 2018. These files are the output of DESeq2 and represent differentially expressed genes.
** All files below this line are in Supplemental_files-20231211T163038Z-001.zip
- SWB_aligned_differential_expression_LMC_between_fieldsites.csv - genes differentially expressed between field sites within LMC-L1; transcripts were aligned to the SWB-S1 genome.
- SWB_aligned_differential_expression_SWB_between_fieldsites.csv - genes differentially expressed between field sites within SWB-S1; transcripts were aligned to the SWB-S1 genome.
- SWB_aligned_differential_expression_genotype_by_fieldsites.csv - genes with a genotype-by-environment (G x E) effect on expression; transcripts were aligned to the SWB-S1 genome.
- SWB_aligned_differential_expression_between_genotypes.csv - genes differentially expressed between genotypes; transcripts were aligned to the SWB-S1 genome.
- LMC_aligned_differential_expression_LMC_between_fieldsites.csv - genes differentially expressed between field sites within LMC-L1; transcripts were aligned to the LMC-L1 genome.
- LMC_aligned_differential_expression_SWB_between_fieldsites.csv - genes differentially expressed between field sites within SWB-S1; transcripts were aligned to the LMC-L1 genome.
- LMC_aligned_differential_expression_genotype_by_fieldsites.csv - genes with a genotype-by-environment (G x E) effect on expression; transcripts were aligned to the LMC-L1 genome.
- LMC_aligned_differential_expression_between_genotypes.csv - genes differentially expressed between genotypes; transcripts were aligned to the LMC-L1 genome.
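A DESeq2 results table can be filtered further with a few lines of Python. This is a minimal sketch under stated assumptions: it assumes the CSV files keep DESeq2's default `log2FoldChange` and `padj` column names, and it assumes a gene identifier column named `gene`, which is hypothetical and may differ in the deposited files. The thresholds (`alpha`, `min_abs_lfc`) are illustrative values, not the cutoffs used in the study.

```python
import csv
import io


def significant_genes(handle, alpha=0.05, min_abs_lfc=1.0):
    """Return gene IDs passing adjusted p-value and fold-change thresholds."""
    hits = []
    for row in csv.DictReader(handle):
        padj, lfc = row["padj"], row["log2FoldChange"]
        # DESeq2 reports NA when a gene was filtered out or could not be tested
        if padj in ("", "NA") or lfc in ("", "NA"):
            continue
        if float(padj) < alpha and abs(float(lfc)) >= min_abs_lfc:
            hits.append(row["gene"])
    return hits


# Made-up rows standing in for a real results file
example = io.StringIO(
    "gene,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj\n"
    "geneA,120.5,2.3,0.4,5.75,1e-8,1e-6\n"
    "geneB,80.1,0.2,0.3,0.67,0.5,0.8\n"
    "geneC,15.0,-1.8,0.5,-3.6,3e-4,0.01\n"
    "geneD,2.0,3.0,1.5,2.0,0.04,NA\n"
)
hits = significant_genes(example)
print(hits)
```

Check the header row of each CSV before use and adjust the column names in the function accordingly.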
Pool sequencing supplemental files
All files listed in this section were generated from the pool sequencing data in Gould et al. 2017. These files contain the windows in the top 1% of the *G* statistic, Fst, and π ratio, as well as the bottom 1% of π ratio.
- S1_top1_pi_intersect_descriptions_all.txt - the top 1% of pi ratio windows falling within genes. This file was aligned to the SWB genome.
- S1_bottom1_pi_intersect_descriptions_all.txt - the bottom 1% of pi ratio windows falling within genes. This file was aligned to the SWB genome.
- S1_top1_FST_descriptions.txt - the top 1% of FST windows falling within genes. This file was aligned to the SWB genome.
- S1_top1_gstat_descriptions.txt - the top 1% of *G* statistic windows falling within genes. This file was aligned to the SWB genome.
- L1_top1_pi_intersect_descriptions_all.txt - the top 1% of pi ratio windows falling within genes. This file was aligned to the LMC genome.
- L1_bottom1_pi_intersect_descriptions_all.txt - the bottom 1% of pi ratio windows falling within genes. This file was aligned to the LMC genome.
- L1_top1_FST_descriptions.txt - the top 1% of FST windows falling within genes. This file was aligned to the LMC genome.
- L1_top1_gstat_descriptions.txt - the top 1% of *G* statistic windows falling within genes. This file was aligned to the LMC genome.
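For readers unfamiliar with outlier-window selection, the idea of keeping the top or bottom 1% of windows for a statistic can be sketched as follows. This is an illustration only: the exact windowing and cutoff procedure used for these files is in the authors' code repository, and the function name `extreme_windows`, the window labels, and the statistic values below are all hypothetical.

```python
def extreme_windows(windows, percent=1, lowest=False):
    """Return the `percent`% of (name, value) windows with the most extreme values.

    By default the highest values are kept; set lowest=True for the bottom tail.
    """
    # Integer arithmetic avoids floating-point surprises; keep at least one window
    k = max(1, len(windows) * percent // 100)
    ranked = sorted(windows, key=lambda w: w[1], reverse=not lowest)
    return ranked[:k]


# 200 made-up windows whose statistic simply equals the window index
windows = [(f"win{i}", float(i)) for i in range(200)]

top = extreme_windows(windows)                  # top 1% -> 2 windows
bottom = extreme_windows(windows, lowest=True)  # bottom 1% -> 2 windows
print(top, bottom)
```

The selected windows would then be intersected with gene coordinates to produce lists like the ones above.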
GENESPACE output files for CNVs, PAVs, tandem duplicates, and syntenic and non-syntenic genes
All files listed in this section were generated using orthology constrained synteny (GENESPACE) and further manipulated using custom scripts.
- SWB_SWB_dotplots.pdf - comparison of SWB and SWB synteny
- LMC_LMC_dotplots.pdf - comparison of LMC and LMC synteny
- LMC_SWB_dotplots.pdf - comparison of LMC and SWB synteny
- LMC_LMC_synHits.txt.gz - a list of LMC syntenic hits
- LMC_SWB_synHits.txt.gz - a list of LMC and SWB syntenic hits
- SWB_SWB_synHits.txt.gz - a list of SWB syntenic hits
- syntenicBlocks.txt.gz - a list of syntenic blocks for both LMC and SWB
- tandem_arrays_L1S1.txt - a list of tandem arrays between LMC and SWB
- S1_syntelogs.txt - a list of SWB syntelogs
- L1_syntelogs.txt - a list of LMC syntelogs
- S1_nonsyntenic.txt - a list of non syntenic genes in SWB
- L1_nonsyntenic.txt - a list of non syntenic genes in LMC
- L1_S1_different_copy_numbers.txt - a list of genes with different copy numbers between LMC and SWB
- L1_1x1_syntelogs.txt - filtered LMC syntelogs
- S1_1x1_syntelogs.txt - filtered SWB syntelogs
- absent_S1_present_L1.txt - a list of genes present in LMC but not SWB.
- absent_L1_present_S1.txt - a list of genes present in SWB but not LMC
Sharing/Access information
Links to other publicly accessible locations of the data:
- Raw ONT data - NCBI Sequence Read Archive under project identifier PRJNA1047892 – accessions SAMN38597945 (LMC-L1) and SAMN38597946 (SWB-S1)
- Raw RNA sequencing - NCBI Sequence Read Archive under project identifier PRJNA1054563.
- LMC and SWB Genome assemblies - Can be found on CoGe as well as this repository
Code/Software
Code for genome assembly can be found at https://github.com/niederhuth/mimulus-assembly and code for calling structural variants and SNPs, reanalysis of pooled sequencing and RNAseq, and figures can be found at https://github.com/Kollarlm/Breakpoint-mutations-supergene-effects-and-ancient-nested-rearrangements-in-yellow-monkey-flower.git.