Data supporting: Chromosome-level genome of the transformable northern wattle, Acacia crassicarpa
Data files
Nov 27, 2023 version files 4.76 GB
- Acra_USDA_v1_cds.fa (44.18 MB)
- Acra_USDA_v1_proteins.fa (14.98 MB)
- Acra_USDA_v1.100k.fa (774.06 MB)
- Acra_USDA_v1.1M.hrd.msk (765.75 MB)
- Acra_USDA_v1.1M.sft.msk (765.75 MB)
- Acra_USDA_v1.fsa (763.25 MB)
- Acra_USDA_v1.gtf (58.56 MB)
- Acra_USDA_v1.hap1.fa (792.17 MB)
- Acra_USDA_v1.hap2.fa (782.30 MB)
- Assembly_pipeline_ACRA3RX.txt (13.55 KB)
- README.md (1.28 KB)
Abstract
The genus Acacia is a large group of woody legumes with enormous morphological diversity in leaf shape. This diversity is at least in part the result of an innovation in leaf development: many Acacia species are capable of developing leaves of both bifacial and unifacial morphology. While not unique in the plant kingdom, unifaciality is most commonly associated with monocots, and its developmental genetic mechanisms have yet to be explored beyond this group. Here we identify an accession of Acacia crassicarpa with high regeneration rates and isolate a clone for genome sequencing. We generate a chromosome-level assembly of this readily transformable clone and, using comparative analyses, confirm a whole-genome duplication unique to Caesalpinioid legumes. This resource will be important for future work examining genome evolution in legumes and the unique developmental genetic mechanisms underlying unifacial morphogenesis in Acacia.
Assemblies and annotations for Acacia crassicarpa, clone Acra3RX.
Description of the data and file structure
Extended genome assemblies and annotation files for Acra_USDA_v1 (Bioproject PRJNA975180).
Sharing/Access information
The assembly, associated sample information, and raw sequencing data are linked to Bioproject PRJNA975180.
This dataset contains the following:
Acra_USDA_v1.100k.fa Scaffolds larger than 100 kb from the SALSA scaffolding step.
Acra_USDA_v1.1M.hrd.msk Hardmasked scaffolds larger than 1 Mb.
Acra_USDA_v1.1M.sft.msk Softmasked scaffolds larger than 1 Mb.
Acra_USDA_v1_cds.fa Coding sequences for Acra_USDA_v1.fsa assembly.
Acra_USDA_v1.fsa Scaffolds larger than 1 Mb. No masking.
Acra_USDA_v1.gtf Annotations for protein coding genes from the Acra_USDA_v1.fsa assembly.
Acra_USDA_v1.hap1.fa Haplotype 1 before SALSA scaffolding (no size filtering).
Acra_USDA_v1.hap2.fa Haplotype 2 before SALSA scaffolding (no size filtering).
Acra_USDA_v1_proteins.fa Protein sequences for Acra_USDA_v1.fsa assembly.
Assembly_pipeline_ACRA3RX.txt Computational pipeline for assembling and annotating genome.
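Several of the files above differ only in the minimum scaffold length they retain (100 kb or 1 Mb). As a minimal sketch of that kind of size filter, and not the authors' actual pipeline (which is documented in Assembly_pipeline_ACRA3RX.txt), the following Python reads FASTA records and keeps those longer than a threshold. The function names `read_fasta` and `filter_by_length` and the toy sequences are hypothetical, chosen only for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len):
    """Keep records whose sequence is strictly longer than min_len bases."""
    return [(h, s) for h, s in records if len(s) > min_len]


# Toy input (invented, not taken from the dataset); a real run on the
# assembly would use a threshold such as min_len=1_000_000.
toy = io.StringIO(">scaffold_1\nACGTACGT\nACGT\n>scaffold_2\nACG\n")
kept = filter_by_length(read_fasta(toy), min_len=5)
print([h for h, _ in kept])
```

For a real file, the same functions could be applied by opening it with `open("Acra_USDA_v1.100k.fa")` in place of the toy handle.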
The genomes were assembled with hifiasm, using Omni-C reads to generate haplotype-resolved assemblies. The larger of the two haplotype assemblies was then scaffolded with SALSA and the Omni-C reads.
Masked assemblies were generated with RepeatMasker (v.4.0.7) and an A. crassicarpa de novo repeat library built with RepeatModeler (v.2.0.1).
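The two masked files follow the usual convention: softmasking writes repeat bases in lowercase, while hardmasking replaces them with N. The sketch below illustrates the relationship between the two on a toy sequence. It assumes this convention holds for the .sft.msk and .hrd.msk files, and the helper names `hardmask` and `masked_fraction` are hypothetical.

```python
def hardmask(seq, mask_char="N"):
    """Replace every lowercase (softmasked) base with mask_char."""
    return "".join(mask_char if c.islower() else c for c in seq)


def masked_fraction(seq):
    """Return the fraction of bases that are softmasked (lowercase)."""
    if not seq:
        return 0.0
    return sum(c.islower() for c in seq) / len(seq)


# Toy softmasked sequence (invented): six lowercase repeat bases out of 14.
soft = "ACGTacgtacGGTT"
hard = hardmask(soft)
print(hard)
print(round(masked_fraction(soft), 3))
```

Softmasked input is generally preferable when the repeat sequence itself may still be needed downstream, since the original bases are preserved.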
BRAKER3 was used to identify protein-coding genes in the softmasked genome, on scaffolds larger than 1 Mb.
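A quick way to inspect an annotation file such as Acra_USDA_v1.gtf is to tally the feature types in its third column. The following is a minimal sketch that relies only on the standard nine-column, tab-separated GTF layout. The function name `count_features` is hypothetical, and the toy rows are invented for illustration rather than copied from the dataset.

```python
import io
from collections import Counter


def count_features(handle):
    """Count GTF records by feature type (third tab-separated column)."""
    counts = Counter()
    for line in handle:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Ignore malformed rows with fewer than nine columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Toy GTF rows (invented; attribute layout is an assumption).
toy_gtf = io.StringIO(
    "# toy example, not taken from Acra_USDA_v1.gtf\n"
    "scaffold_1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tg1\n"
    "scaffold_1\tAUGUSTUS\ttranscript\t100\t900\t.\t+\t.\tg1.t1\n"
    'scaffold_1\tAUGUSTUS\tCDS\t100\t400\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";\n'
    'scaffold_1\tAUGUSTUS\tCDS\t600\t900\t.\t+\t2\ttranscript_id "g1.t1"; gene_id "g1";\n'
)
counts = count_features(toy_gtf)
print(dict(counts))
```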