Sequencing and assembly of the Mexican lime genome
Data files
Aug 28, 2025 version files 2.98 GB
Assembly_pipeline_MXLIME.txt (18.73 KB)
Mxlime_hifiasm_hap1.fa (396.33 MB)
Mxlime_hifiasm_hap2.fa (361.33 MB)
Mxlime_USDA_v1_proteins.fa (25.28 MB)
Mxlime_USDA_v1_unplaced_scaffolds.fa (19.94 MB)
Mxlime_USDA_v1.fa (684.80 MB)
Mxlime_USDA_v1.gtf (99.80 MB)
Mxlime_USDA_v1.hrd.msk (694.21 MB)
Mxlime_USDA_v1.sft.msk (694.21 MB)
README.md (2.71 KB)
Abstract
Many citrus species show high levels of heterozygosity due to their hybrid origin and clonal propagation. This heterozygosity can both hinder and aid efforts to study and improve these cultivars, making it increasingly clear that diploid assemblies have significant advantages over the previous generation of haploid assemblies. In this work, we assemble both subgenomes of Mexican lime (Citrus x aurantifolia), an interspecific hybrid between C. hystrix var. micrantha and C. medica. The resulting diploid assembly is nearly telomere-to-telomere, spanning 680 Mb. Using subgenome-specific repeats, we were able to phase the 18 chromosomes based on their parent of origin. The resulting hystrix and medica haplotypes show a number of large structural variations, consistent with their distant hybrid ancestry. Despite divergence between haplotypes, syntenic gene pairs were identified for over 90% of the annotated protein-coding genes. Within these genes, we find extensive divergence between haplotypes, with at least 89% harboring polymorphisms at an average rate of 13 per kilobase of coding sequence. Knowledge of this variation will be important for future efforts to improve this cultivar using genetic engineering technologies.
Sequencing and assembly of the Mexican lime genome
https://doi.org/10.5061/dryad.3xsj3txqv
The genome was assembled with hifiasm, using Hi-C reads to generate two haplotype assemblies. Each haplotype assembly was then scaffolded with SALSA using the Hi-C reads.
Masked assemblies were generated with RepeatMasker (v4.0.7) and a de novo repeat library for Mexican lime built with RepeatModeler (v2.0.1).
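As a quick check on the masked files, the fraction of masked sequence can be tallied directly from a FASTA file. The sketch below is a minimal illustration, not part of the authors' pipeline. It assumes the `.msk` files are FASTA-formatted, that softmasked repeats appear as lowercase bases, and that hardmasked repeats appear as `N` (or `X`), which are the usual RepeatMasker conventions. The function name `masking_summary` and the toy records are hypothetical.

```python
from typing import Dict, Iterable


def masking_summary(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Count total, soft-masked (lowercase) and hard-masked (N/X) bases.

    Assumption: input is FASTA text, one line per element of the iterable.
    Note that uppercase N also marks scaffold gaps, so the "hard" count
    is an upper bound on hard-masked repeat content.
    """
    counts = {"total": 0, "soft": 0, "hard": 0}
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        counts["total"] += len(line)
        counts["soft"] += sum(1 for base in line if base.islower())
        counts["hard"] += sum(1 for base in line if base in "NX")
    return counts


# Toy input (not real data) to show the expected shape of the result.
demo = [">chr_toy", "ACGTacgtNNNN", "acgtACGT"]
summary = masking_summary(demo)
print(summary)
```

To run it on a real file, pass an open file handle instead of the toy list, for example `with open("Mxlime_USDA_v1.sft.msk") as handle: masking_summary(handle)`. Reading line by line keeps memory use low for files of several hundred megabytes.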
BRAKER3 was used to annotate protein-coding genes on the nine chromosomes of each haplotype in the softmasked genome.
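One way to inspect the resulting annotation is to count gene records per sequence in the GTF file. The following is a minimal sketch under stated assumptions: the GTF is tab-delimited with nine columns, and gene-level rows carry the feature type `gene` in the third column. The function name `count_features` and the sequence names `chrA` and `chrB` are hypothetical and are not the names used in the dataset.

```python
from collections import Counter
from typing import Iterable


def count_features(gtf_lines: Iterable[str], feature: str = "gene") -> Counter:
    """Count rows of a given feature type per sequence name in GTF text."""
    counts: Counter = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed rows rather than guessing their layout
        seqname, feature_type = fields[0], fields[2]
        if feature_type == feature:
            counts[seqname] += 1
    return counts


# Toy rows (not real data) in the nine-column GTF layout.
demo = [
    "# toy GTF",
    "chrA\ttoy\tgene\t100\t900\t.\t+\t.\tg1",
    "chrA\ttoy\ttranscript\t100\t900\t.\t+\t.\tg1.t1",
    "chrA\ttoy\tgene\t1500\t2500\t.\t-\t.\tg2",
    "chrB\ttoy\tgene\t50\t700\t.\t+\t.\tg3",
]
gene_counts = count_features(demo)
print(dict(gene_counts))
```

Because the function only reads the first and third columns, it does not depend on how the attribute column is formatted. If the file has no `gene` rows, passing `feature="transcript"` or `feature="CDS"` gives a comparable per-sequence tally.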
The dataset contains the following:
Mxlime_hifiasm_hap1.fa: HiFi assembly of haplotype 1, before scaffolding and subgenome assignment.
Mxlime_hifiasm_hap2.fa: HiFi assembly of haplotype 2, before scaffolding and subgenome assignment.
Mxlime_USDA_v1.fa: Unmasked diploid assembly (18 chromosomes).
Mxlime_USDA_v1.hrd.msk: Hardmasked diploid assembly (18 chromosomes).
Mxlime_USDA_v1.sft.msk: Softmasked diploid assembly (18 chromosomes).
Mxlime_USDA_v1_unplaced_scaffolds.fa: Unplaced scaffolds remaining after pseudomolecule construction.
Mxlime_USDA_v1_cds.fa: Coding sequences for the Mxlime_USDA_v1.fa assembly.
Mxlime_USDA_v1_proteins.fa: Protein sequences for the Mxlime_USDA_v1.fa assembly.
Mxlime_USDA_v1.gtf: Annotations of protein-coding genes for the Mxlime_USDA_v1.fa assembly.
Assembly_pipeline_MXLIME.txt: Computational pipeline used to assemble and annotate the genome.
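After downloading the FASTA files listed above, a simple sanity check is to tabulate sequence lengths, for instance to confirm that the diploid assembly contains 18 chromosome-scale sequences with a combined length near the reported 680 Mb. The sketch below is illustrative only. The helper names `sequence_lengths` and `n50` and the toy records are hypothetical, and the sequence identifier is assumed to be the first whitespace-delimited token of each header line.

```python
from typing import Dict, Iterable, List


def sequence_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for FASTA text supplied line by line."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # ID is the first header token
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy records (not real data).
demo = [">s1 description", "ACGT", "ACGT", ">s2", "AC", ">s3", "ACGTAC"]
lengths = sequence_lengths(demo)
print(lengths, sum(lengths.values()), n50(list(lengths.values())))
```

For a real file, pass an open handle, for example `with open("Mxlime_USDA_v1.fa") as handle: sequence_lengths(handle)`, and compare `len(lengths)` and `sum(lengths.values())` against the values stated in the description.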
Assemblies and annotations for Mexican lime.
Description of the data and file structure
Extended genome assemblies and annotation files for Mxlime_USDA_v1 (NCBI BioProject PRJNA1137419).
Sharing/Access information
The assembly, associated sample information, and raw sequencing data are linked to NCBI BioProject PRJNA1137419.
