Data from: Stepwise evolution of a butterfly supergene via duplication and inversion

Cite this dataset

Martin, Simon Henry (2022). Data from: Stepwise evolution of a butterfly supergene via duplication and inversion [Dataset]. Dryad. https://doi.org/10.5061/dryad.xwdbrv1g0

Abstract

Supergenes maintain adaptive clusters of alleles in the face of genetic mixing. Although usually attributed to inversions, supergenes can be complex, and reconstructing the precise processes that led to recombination suppression and their timing is challenging. We investigated the origin of the BC supergene, which controls variation in warning colouration in the African Monarch butterfly, Danaus chrysippus. By generating chromosome-scale assemblies for all three alleles, we identified multiple structural differences. Most strikingly, we find that a region of >1 million bp underwent several segmental duplications at least 7.5 million years ago. The resulting duplicated fragments appear to have triggered four inversions in surrounding parts of the chromosome, resulting in stepwise growth of the region of suppressed recombination. Phylogenies for the inversions are incongruent with the species tree, and suggest that structural polymorphisms have persisted for at least 4.1 million years. In addition to the role of duplications in triggering inversions, our results suggest a previously undescribed mechanism of recombination suppression through independent losses of divergent duplicated tracts. Overall, our findings add support for a stepwise model of supergene evolution involving a variety of structural changes.

Methods

  • PacBio Sequel and Sequel II HiFi sequencing
  • Illumina sequencing
  • De novo genome assembly
  • Alignment to reference genome and genotype calling
  • Whole genome alignments and synteny analysis
  • Read depth and copy number variation
  • Phylogenetic analysis

Usage notes

Genome_assemblies.tgz (7 files)

  • MB18102_MAT.fasta.gz - Genome assembly of maternal haplotype of individual MB18102, made with Canu.
  • MB18102_PAT.fasta.gz - Genome assembly of paternal haplotype of individual MB18102, made with Canu.
  • SB211_MAT.fasta.gz - Genome assembly of maternal haplotypes from brood SB211, made with Canu.
  • SB211_PAT.fasta.gz - Genome assembly of paternal haplotypes from brood SB211, made with hifiasm.
  • SB211_MAT.hifiasm.fasta.gz - Alternative genome assembly of maternal haplotypes from brood SB211, made with hifiasm.
  • SB211_PAT.canu.fasta.gz - Alternative genome assembly of paternal haplotypes from brood SB211, made with Canu.
  • Dchry2.haplotigs.fasta.gz - Purged haplotigs from Danaus chrysippus assembly Dchry2.2 (Singh et al. 2022).
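
As a quick sanity check after download, the assemblies can be summarised by sequence length. The following is a minimal sketch using only the Python standard library; the function name `fasta_lengths` is hypothetical, and it assumes the archive has been extracted and that the files are ordinary (optionally gzipped) FASTA.

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_name: length} for a FASTA file, gzipped or plain."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                tokens = line[1:].split()
                name = tokens[0] if tokens else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

For example, `fasta_lengths("MB18102_MAT.fasta.gz")` would return one entry per contig or scaffold, which can then be sorted to find the largest sequences.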

 

Genome_annotations.tgz (4 files)

  • MB18102_MAT.tidy.gff3 - Genome annotation for MB18102MAT maternal assembly
  • MB18102_PAT.tidy.gff3 - Genome annotation for MB18102PAT paternal assembly
  • SB211_MAT.tidy.gff3 - Genome annotation for SB211MAT maternal assembly
  • SB211_PAT.tidy.gff3 - Genome annotation for SB211PAT paternal assembly
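
The annotation files can be inspected with a few lines of Python. This sketch assumes the files follow the standard nine-column, tab-delimited GFF3 layout, in which the third column holds the feature type; `count_feature_types` is a hypothetical helper name, not part of the dataset.

```python
from collections import Counter


def count_feature_types(gff_path):
    """Count GFF3 records per feature type (column 3), skipping comments."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                # Not a feature line (for example, embedded sequence data).
                continue
            counts[fields[2]] += 1
    return counts
```

Calling `count_feature_types("MB18102_MAT.tidy.gff3")` gives a tally such as the number of gene and CDS records, which is a convenient first comparison between the four annotations.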

 

Repeat_data.tgz (7 files)

  • Lepidoptera_and_danaus_chrysippus2.2.repeatmasker - Repeat library for Dchry2.2 assembly
  • Dchry2.2.chr15.TE_50kb - Repeat content in 50 kb windows for Dchry2.2 assembly chr15
  • MB18102MAT.chr15.TE_50kb - Repeat content in 50 kb windows for MB18102MAT maternal assembly chr15
  • MB18102PAT.chr15.TE_50kb - Repeat content in 50 kb windows for MB18102PAT paternal assembly chr15
  • SB211PAT.chr15.TE_50kb - Repeat content in 50 kb windows for SB211PAT paternal assembly chr15
  • SB211MAT.chr15.TE_50kb - Repeat content in 50 kb windows for SB211MAT maternal assembly chr15
  • DplexMex.chr15.TE_50kb - Repeat content in 50 kb windows for D. plexippus DplexMex assembly chr15

 

minimap2_alignments.tgz (13 files)

  • DplexMex_Dchry2.2_minimap2.asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=Dchry2.2
  • DplexMex_Dchry2hap.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=Dchry2.2HAP haplotigs
  • DplexMex_MB18102MAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=MB18102MAT
  • DplexMex_MB18102PAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=MB18102PAT
  • DplexMex_SB211MAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=SB211MAT
  • DplexMex_SB211PAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=SB211PAT
  • Dchry2.2_MB18102MAT_mm2asm10.paf.gz - minimap2 alignment: reference=Dchry2.2, query=MB18102MAT
  • Dchry2.2_MB18102PAT_mm2asm10.paf.gz - minimap2 alignment: reference=Dchry2.2, query=MB18102PAT
  • Dchry2HAP_MB18102MAT_mm2asm10.paf.gz - minimap2 alignment: reference=Dchry2.2HAP haplotigs, query=MB18102MAT
  • MB18102MAT_SB211PAT.mm2asm10.paf.gz - minimap2 alignment: reference=MB18102MAT, query=SB211PAT
  • SB211PAT_Dchry2.2_mm2asm10.paf.gz - minimap2 alignment: reference=SB211PAT, query=Dchry2.2
  • SB211MAT_hifiasm_vs_canu.mm2asm10.paf.gz - minimap2 alignment: reference=SB211MAT alternative hifiasm assembly, query=SB211MAT canu assembly
  • SB211PAT_hifiasm_vs_canu.mm2asm10.paf.gz - minimap2 alignment: reference=SB211PAT hifiasm assembly, query=SB211PAT alternative canu assembly
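
The alignment files can be read without special tools. The sketch below parses the 12 mandatory PAF columns and lists reverse-strand blocks, which are candidates for inverted segments. It assumes that the "reference" in the descriptions above corresponds to the PAF target columns; the names `PafRecord`, `read_paf` and `inverted_blocks`, and the `min_block_length` threshold, are hypothetical and are not the filters used in the study.

```python
import gzip
from typing import NamedTuple


class PafRecord(NamedTuple):
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int
    block_length: int
    mapq: int


def read_paf(path):
    """Yield one PafRecord per alignment line; optional tag columns are ignored."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            f = line.rstrip("\n").split("\t")
            if len(f) < 12:
                continue
            yield PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                            int(f[10]), int(f[11]))


def inverted_blocks(path, min_block_length=10000):
    """Return reverse-strand alignments of at least min_block_length (assumed cutoff)."""
    return [r for r in read_paf(path)
            if r.strand == "-" and r.block_length >= min_block_length]
```

A reverse-strand block is only a hint of an inversion; repeats and assembly orientation can produce the same signal, so results are best checked against a dot plot of the same alignment.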

 

Regions_coordinates.tgz (7 files)

  • BC_regions_coordinates_DplexMex_Dplex4.xlsx - Regions 1-4 coordinates in D. plexippus assemblies DplexMex and Dplex4
  • DplexMex_region_coordinates.tsv - Regions 1-4 coordinates in DplexMex assembly
  • MB18102MAT_region_coordinates.tsv - Regions 1-4 coordinates in MB18102MAT assembly
  • MB18102PAT_region_coordinates.tsv - Regions 1-4 coordinates in MB18102PAT assembly
  • Dchry2.2_region_coordinates.tsv - Regions 1-4 coordinates in Dchry2.2 assembly
  • SB211MAT_region_coordinates.tsv - Regions 1-4 coordinates in SB211MAT assembly
  • SB211PAT_region_coordinates.tsv - Regions 1-4 coordinates in SB211PAT assembly

 

VCF_and_geno_files.tgz (14 files)

  • dan17.BT.DP5GQ20.CDS.vcf.gz - VCF for CDS sites only for 16 Danaus samples and outgroup Tirumala formosa aligned to the Danaus plexippus Dplex4 assembly.
  • dan17.BT.DP5GQ20.4Dsites.geno.gz - Genotypes file for 4-fold degenerate sites only for 16 Danaus samples and outgroup Tirumala formosa aligned to the Danaus plexippus Dplex4 assembly.
  • chry10.BT.Dchry2.2.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the Dchry2.2 assembly, chr15 only.
  • chry10.BT.Dchry2HAP.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the Dchry2.2 alternative haplotig, chr15 only.
  • chry10.BT.MB18102MAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the MB18102MAT maternal assembly, chr15 only.
  • chry10.BT.MB18102PAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the MB18102PAT paternal assembly, chr15 only.
  • chry10.BT.SB211MAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the SB211MAT maternal assembly, chr15 only.
  • chry10.BT.SB211PAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the SB211PAT paternal assembly, chr15 only.
  • chry10.BT.Dchry2.2.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the Dchry2.2 assembly, chr15 only.
  • chry10.BT.Dchry2HAP.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the Dchry2.2 alternative haplotig, chr15 only.
  • chry10.BT.MB18102MAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the MB18102MAT maternal assembly, chr15 only.
  • chry10.BT.MB18102PAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the MB18102PAT paternal assembly, chr15 only.
  • chry10.BT.SB211MAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the SB211MAT maternal assembly, chr15 only.
  • chry10.BT.SB211PAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the SB211PAT paternal assembly, chr15 only.
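
For a first look at the VCF files, the sample names and the number of records per chromosome can be extracted with the standard library alone. This is a minimal sketch that assumes standard VCF text layout and gzip or bgzip compression (both readable by Python's `gzip` module); `vcf_samples_and_counts` is a hypothetical helper, and dedicated VCF libraries are a better choice for real analyses.

```python
import gzip


def vcf_samples_and_counts(path):
    """Return (sample_names, {chrom: n_records}) for a VCF, gzipped or plain."""
    opener = gzip.open if str(path).endswith(".gz") else open
    samples = []
    counts = {}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                # Sample names follow the nine fixed VCF columns.
                samples = fields[9:]
                continue
            if not fields[0]:
                continue  # blank line
            counts[fields[0]] = counts.get(fields[0], 0) + 1
    return samples, counts
```

For the `chry10` files, the returned sample list should contain the 10 D. chrysippus samples described above.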

 

all_gene_alignments.tgz (5954 files) - Sequence alignments for each of 5954 genes (Dplex4 assembly) for 16 Danaus samples and an outgroup Tirumala formosa sample.

BC_Region_alignments.tgz (4 files)

  • Region1.1_concat.fasta - Concatenated alignment for genes in Region 1.1
  • Region1.2_concat.fasta - Concatenated alignment for genes in Region 1.2
  • Region2_concat.fasta - Concatenated alignment for genes in Region 2
  • Region4_concat.fasta - Concatenated alignment for genes in Region 4

 

Diversity_and_divergence_data.tgz (7 files)

  • chry10.BT.Dchry2.2.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the Dchry2.2 assembly chr15
  • chry10.BT.Dchry2HAP.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the Dchry2HAP haplotig assembly chr15
  • chry10.BT.MB18102MAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the MB18102MAT assembly chr15
  • chry10.BT.MB18102PAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the MB18102PAT assembly chr15
  • chry10.BT.SB211MAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the SB211MAT assembly chr15
  • chry10.BT.SB211PAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the SB211PAT assembly chr15
  • dan17.BT.DP5GQ20.4Dsites.divStats.geneWindows.csv.gz - divergence measures for each gene for 16 Danaus samples and one outgroup Tirumala formosa aligned to the Dplex4 assembly

 

Read_depth_data.tgz (7 files)

  • chry10.BT.Dchry2.2.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the Dchry2.2 assembly
  • chry10.BT.Dchry2HAP.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the Dchry2.2HAP haplotig assembly
  • chry10.BT.MB18102MAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the MB18102MAT assembly
  • chry10.BT.MB18102PAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the MB18102PAT assembly
  • chry10.BT.SB211MAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the SB211MAT assembly
  • chry10.BT.SB211PAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the SB211PAT assembly
  • dan17.Dplex4.chr7.BT.CDS.allshared.dpstats.geneWindows.csv - Read depth statistics for each gene in Dplex4 chr7 (=chr15 in D. chrysippus) for 16 Danaus samples and an outgroup Tirumala formosa sample
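
The windowed tables (the read depth `.csv` files here and the diversity and divergence `.csv.gz` files above) can be loaded the same way. Their column names are not documented on this page, so this sketch makes no assumption about them and simply returns the header alongside the rows; `read_window_table` is a hypothetical helper name.

```python
import csv
import gzip


def read_window_table(path):
    """Return (column_names, rows) for a CSV file, gzipped or plain.

    Each row is a dict mapping column name to the raw string value.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows
```

Inspecting the returned column names first is a reasonable way to find which fields hold window coordinates and which hold per-sample statistics before doing any plotting.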

 

IBSrelate_results_NewTrios140202.txt - Relatedness measures for SM18W01, SM18S10 and MB18102

Funding

Royal Society, Award: URF\R1\180682

Royal Society, Award: RGF\EA\181071

Swiss National Research Foundation, Award: P2BEP3_195567

National Geographic Society, Award: WW-138R-17