Striking variation in chromosome structure within Musa acuminata and its diploid cultivars
Data files
Apr 19, 2024 version files 16.26 MB
Abstract
The majority of cultivated bananas originated from inter- and intra(sub)specific crosses between two wild diploid species, Musa acuminata and Musa balbisiana. Hybridization and polyploidization events during the evolution of bananas led to the formation of clonally propagated cultivars characterized by a high level of genome heterozygosity and reduced fertility. The combination of low fertility of edible clones and differences in chromosome structure among M. acuminata subspecies greatly hampers the breeding of improved banana cultivars. Using comparative oligo painting, we investigated large chromosomal rearrangements in a set of wild M. acuminata subspecies and cultivars that originated from natural crosses. Additionally, we analyzed the chromosome structure of F1 progeny that resulted from crosses between Mchare bananas and the wild M. acuminata ‘Calcutta 4’ genotype. Analysis of chromosome structure within M. acuminata revealed a large number of chromosomal rearrangements that correlate with banana speciation. Chromosome painting of F1 hybrids was complemented by Illumina resequencing, which made it possible to identify the contribution of parental subgenomes to the diploid hybrid clones. A balanced presence of both parental genomes was revealed in all F1 hybrids with the exception of one clone, which contained only Mchare-specific SNPs and thus most probably originated from an unreduced diploid gamete of Mchare.
README: SNP datasets (vcf files) used for in silico painting of Mchare x M. acuminata 'Calcutta 4' F1 hybrid clones
https://doi.org/10.5061/dryad.44j0zpcnq
Genomic DNA was isolated with the NucleoSpin Plant II kit (Macherey-Nagel, Düren, Germany) according to the manufacturer’s recommendations and then sheared with a Bioruptor Plus (Diagenode, Liege, Belgium) to achieve an insert size of about 500 bp. Sequencing libraries were prepared from 2 μg of fragmented DNA using the TruSeq® DNA PCR-free kit (Illumina) and sequenced on a NovaSeq 6000 (Illumina), producing 2 × 150-bp paired-end reads at a minimum sequencing depth of 25×. Raw reads were trimmed to remove low-quality bases and adapter sequences and cut to the same length using fastp v.0.20.1 (Chen et al., 2018).
The proportion of each parental subgenome in the F1 hybrid clones was analyzed using the vcfHunter pipeline (https://github.com/SouthGreenPlatform/vcfHunter) according to Baurens et al. (2019). Briefly, trimmed reads were aligned to the reference genome sequence of M. acuminata ssp. malaccensis ‘DH Pahang’ v4 (Belser et al., 2021) with BWA-MEM v0.7.15 (Li 2013). Redundant reads were then removed using MarkDuplicates from Picard Tools v2.7.0, and the alignments were locally realigned around indels using the IndelRealigner tool of the GATK v3.3 package (McKenna et al., 2010). Bases with a mapping quality ≥10 were counted using the process_reseq_1.0.py Python script (https://github.com/SouthGreenPlatform/vcfHunter). Variant calling and SNP filtering were performed according to Baurens et al. (2019) using the VcfPreFilter.1.0 Python script (alleles supported by at least three reads and with a frequency of at least 0.25 were kept as variants) and the vcfFilter.1.0.py Python script (data points with <6-fold coverage for the minor allele were converted to missing data) (https://github.com/SouthGreenPlatform/vcfHunter). Finally, the proportion of parental genomes in the F1 hybrid clones along the individual chromosomes of the reference genome sequence was calculated from biallelic SNPs (SNPs specific to Mchare cultivars and M. acuminata ssp. burmannicoides ‘Calcutta 4’) in CDS genome regions using the vcf2allPropAndCov.py and vcf2allPropAndCovByChr.py Python scripts (https://github.com/SouthGreenPlatform/vcfHunter) according to Baurens et al. (2019).
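The two filtering thresholds described above can be illustrated with a minimal Python sketch. This is not the vcfHunter code: the function names, the dictionary input, and the reading of "minor allele" as the least-covered retained allele are assumptions made for illustration only.

```python
# Illustrative sketch only; NOT the vcfHunter implementation.
# Function names and the input format are hypothetical.

def prefilter_alleles(allele_depths, min_reads=3, min_freq=0.25):
    """Keep alleles supported by >= min_reads reads and a frequency >= min_freq."""
    total = sum(allele_depths.values())
    if total == 0:
        return []
    return sorted(
        allele
        for allele, depth in allele_depths.items()
        if depth >= min_reads and depth / total >= min_freq
    )


def call_or_missing(allele_depths, min_minor_cov=6):
    """Return the retained alleles, or None (missing data) when the
    least-covered retained allele has fewer than min_minor_cov reads.
    Assumption: this is how the "<6-fold coverage" rule is applied."""
    kept = prefilter_alleles(allele_depths)
    if not kept:
        return None
    minor_depth = min(allele_depths[allele] for allele in kept)
    if minor_depth < min_minor_cov:
        return None
    return tuple(kept)


# Toy read counts for one accession at three sites (invented values).
print(call_or_missing({"A": 12, "T": 8}))   # both alleles pass both filters
print(call_or_missing({"A": 10, "T": 4}))   # T passes the prefilter but has < 6 reads
print(call_or_missing({"A": 20, "T": 2}))   # T is dropped by the prefilter
```

In this sketch, a site where one retained allele has low coverage is set to missing rather than being called homozygous, which is the conservative behaviour the text describes.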
Description of the data and file structure
The parental genome proportions of eight F1 hybrid clones were analyzed:
Accession name of F1 hybrid | Male parent | Female parent (Mchare clone) |
---|---|---|
‘NM275_4’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Laini’ |
‘NM258_3’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Laini’ |
‘NM209_3’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Laini’ |
‘NM237_8’ | Musa acuminata ‘Calcutta 4’ | ‘Ijihu Inkudu’ |
‘T2269_1’ | Musa acuminata ‘Calcutta 4’ | ‘Huti White’ |
‘T2274_6’ | Musa acuminata ‘Calcutta 4’ | ‘Huti White’ |
‘T2274_9’ | Musa acuminata ‘Calcutta 4’ | ‘Huti White’ |
‘T2619_15’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Mlelembo’ |
The vcf files contain biallelic SNPs specific to the male (M. acuminata 'Calcutta 4'; 2n = 2x = 22) and female (Mchare cultivars; 2n = 2x = 22) parents, which were used to analyze the contribution of parental genomes in the F1 hybrids. The analysis was done using the vcfHunter pipeline (https://github.com/SouthGreenPlatform/vcfHunter) according to Baurens et al. (2019).
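As a rough illustration of how parent-specific SNPs translate into a per-chromosome genome proportion, the sketch below sums the reads supporting each parent's allele. It is a simplified stand-in and not the vcf2allPropAndCov.py logic; the function name, the tuple layout, and the toy read counts are all hypothetical.

```python
# Illustrative sketch only; NOT the vcf2allPropAndCov.py implementation.
from collections import defaultdict


def parental_proportions(sites):
    """sites: iterable of (chromosome, mchare_reads, calcutta4_reads) tuples,
    one per parent-specific biallelic SNP in a single F1 hybrid.
    Returns {chromosome: (mchare_fraction, calcutta4_fraction)}."""
    totals = defaultdict(lambda: [0, 0])
    for chrom, mchare_reads, calcutta_reads in sites:
        totals[chrom][0] += mchare_reads
        totals[chrom][1] += calcutta_reads

    proportions = {}
    for chrom, (mchare, calcutta) in totals.items():
        depth = mchare + calcutta
        if depth:  # skip chromosomes with no informative reads
            proportions[chrom] = (mchare / depth, calcutta / depth)
    return proportions


# Invented toy counts; chromosome names are placeholders.
toy_sites = [
    ("chr01", 10, 10),
    ("chr01", 12, 8),
    ("chr02", 20, 0),
]
for chrom, (mchare, calcutta) in parental_proportions(toy_sites).items():
    print(f"{chrom}: Mchare={mchare:.2f} Calcutta4={calcutta:.2f}")
```

Under this simplified view, a balanced F1 hybrid would show fractions near 0.5 for both parents on every chromosome, whereas a clone carrying only Mchare-specific SNPs would show a Mchare fraction near 1.0 throughout.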
Sharing/Access information
Data was derived from the following sources:
- Raw Illumina sequences of the parents and F1 hybrid clones are stored in the NCBI Sequence Read Archive (SRA) under experiment accessions SRX22339926 - SRX22339938.