Aralia elata genome and annotation
Data files
Version of Mar 19, 2022; total size of files: 1.17 GB
Abstract
We report a chromosome-level genome assembly of Aralia elata with a total length of 1.08 Gb and a contig N50 of 1.2 Mb, generated using SMRT (PacBio), Hi-C, and next-generation (Illumina) sequencing. The Aralia elata genome is the first chromosome-level genome of the genus Aralia. This dataset also includes the annotation of the Aralia elata genome and gene expression profiles.
Methods
Genomic DNA was extracted from a young leaf of a 5-year-old Aralia elata plant collected in Harbin, Heilongjiang Province, China, using the DNAsecure Plant Kit (TIANGEN), and was then fragmented at random. Short-read libraries were constructed according to the manufacturer's instructions (Illumina, San Diego, CA) and sequenced on an Illumina HiSeq 4000 system at the Beijing Genomics Institute-Shenzhen (BGI-Shenzhen). For the PacBio libraries, the whole genome was sequenced on the PacBio Sequel II System using single-molecule real-time (SMRT) sequencing, yielding 99.88 Gb (~90.6×) of data.
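Sequencing depth is the total number of sequenced bases divided by the genome size. The worked arithmetic below only checks the reported figures; which genome size the authors divided by is not stated, and the suggestion that it was an estimate (for example from a k-mer survey) rather than the 1.08 Gb assembly length is an assumption.

```latex
\text{depth} = \frac{\text{total sequenced bases}}{\text{genome size}},
\qquad
\frac{99.88\ \text{Gb}}{1.08\ \text{Gb}} \approx 92.5\times,
\qquad
\frac{99.88\ \text{Gb}}{90.6} \approx 1.10\ \text{Gb}
```

So the reported ~90.6× corresponds to a genome size of about 1.10 Gb, slightly larger than the 1.08 Gb assembly.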
We combined three strategies to predict genes in the A. elata genome: homology-based prediction, de novo prediction, and alignment of RNA-Seq data. Total RNA was extracted from four tissues (root, stem, leaf, and seed, each with three biological replicates) using an RNA extraction kit (Huayueyang, ZH120, China). mRNA from all samples was purified using the TruSeq RNA Sample Prep Kit (Illumina) and sequenced on an Illumina HiSeq 4000 sequencer. We used the Python version of MCScan (JCVI, v1.1.18) to identify synteny between A. elata and other species.
Usage notes
"AE.chrom_level.genome.fa" is the chromosome-level genome file of Aralia elata.
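A quick way to check the genome file against the figures in the abstract is to read the sequence lengths and compute the total length and N50. The sketch below assumes a standard FASTA file whose sequence IDs are the first word of each header; the helper names (read_fasta_lengths, n50, summarize) are hypothetical. Note that running it on this chromosome-level file reports the scaffold (chromosome) N50, not the 1.2 Mb contig N50 quoted in the abstract.

```python
from typing import Dict, Iterable, Optional


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence ID: length}; the ID is the first word after '>'."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize(path: str) -> str:
    """Hypothetical helper: one-line summary of a FASTA file on disk."""
    with open(path) as fh:
        lengths = read_fasta_lengths(fh)
    total = sum(lengths.values())
    return f"{len(lengths)} sequences, {total:,} bp, N50 = {n50(lengths.values()):,} bp"
```

For example, `print(summarize("AE.chrom_level.genome.fa"))` prints the sequence count, total length, and N50 of the assembly.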
"AE.maker.final.chrom_level.gene.gff" contains the annotation of the Aralia elata genome, including gene, mRNA, CDS, exon, and UTR features.
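To see how many gene, mRNA, CDS, exon, and UTR records the annotation holds, the feature types in column 3 can be tallied. This is a minimal sketch assuming the file is tab-delimited GFF3 with nine columns per record; the exact type strings (MAKER probably writes UTRs as five_prime_UTR and three_prime_UTR) should be checked against the file itself, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_gff_features(lines: Iterable[str]) -> Counter:
    """Count GFF3 records by feature type (column 3).

    Comment/directive lines ('#') and blank lines are skipped, and so is any
    line without nine tab-separated fields (e.g. an embedded ##FASTA section).
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


def count_gff_file(path: str) -> Counter:
    """Hypothetical helper: tally feature types in a GFF3 file on disk."""
    with open(path) as fh:
        return count_gff_features(fh)
```

Calling `count_gff_file("AE.maker.final.chrom_level.gene.gff").most_common()` lists each feature type with its count.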
"AE.genes.fpkm.txt" contains the gene expression profiles of different Aralia elata tissues, given as FPKM values. "R_AE_Le2", "R_AE_Le4", and "R_AE_Le6" are the three biological replicates of leaf; "R_AE_St2", "R_AE_St4", and "R_AE_St6" are the three replicates of stem; "R_AE_Ro", "R-AE-RoA-2", and "R-AE-RoA" are the three replicates of root; and "R_AE_Se", "R-AE-SeA", and "R-AE-SeA-2" are the three replicates of seed.
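Because the replicate column names do not follow a single pattern, it helps to map them to tissues explicitly before averaging. The sketch below copies the column names listed above; it assumes the table is tab-delimited with one gene per row and the gene ID in the first column. Whether the header row includes a label for the gene ID column is not documented, so the code handles both cases. The function name tissue_means is hypothetical.

```python
import csv
from statistics import mean
from typing import Dict, Iterable, List

# Replicate columns per tissue, copied from the sample names in this README.
TISSUE_COLUMNS: Dict[str, List[str]] = {
    "leaf": ["R_AE_Le2", "R_AE_Le4", "R_AE_Le6"],
    "stem": ["R_AE_St2", "R_AE_St4", "R_AE_St6"],
    "root": ["R_AE_Ro", "R-AE-RoA-2", "R-AE-RoA"],
    "seed": ["R_AE_Se", "R-AE-SeA", "R-AE-SeA-2"],
}


def tissue_means(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Return {gene ID: {tissue: mean FPKM over its three replicates}}."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    index = {name: i for i, name in enumerate(header)}
    result: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue
        # If the header has no label for the gene ID column (one field shorter
        # than the data rows), every sample column is shifted right by one.
        offset = 1 if len(row) == len(header) + 1 else 0
        result[row[0]] = {
            tissue: mean(float(row[index[col] + offset]) for col in cols)
            for tissue, cols in TISSUE_COLUMNS.items()
        }
    return result
```

Usage: `with open("AE.genes.fpkm.txt") as fh: means = tissue_means(fh)`; then `means[gene_id]["seed"]` gives the mean seed FPKM for that gene.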
"ae.ae.lifted.anchors" contains the paralogous gene pairs (syntenic anchors) within the Aralia elata genome, identified using the Python version of MCScan (JCVI).
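JCVI anchor files list one gene pair per line, with lines beginning with "###" separating synteny blocks. The sketch below groups the pairs by block and keeps only the two gene IDs; it assumes whitespace-separated columns and deliberately does not interpret the trailing score column, whose exact form in lifted anchor files should be checked against the JCVI documentation. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def read_anchor_blocks(lines: Iterable[str]) -> List[List[Pair]]:
    """Group anchor gene pairs into synteny blocks.

    Any line starting with '#' closes the current block; every other
    non-empty line contributes its first two fields as a gene pair.
    """
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if current:
                blocks.append(current)
            current = []
            continue
        fields = line.split()
        if len(fields) >= 2:
            current.append((fields[0], fields[1]))
    if current:
        blocks.append(current)
    return blocks
```

For example, `len(read_anchor_blocks(open("ae.ae.lifted.anchors")))` gives the number of paralogous synteny blocks, and the block lengths give the number of anchor pairs in each.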
"AE.maker.final.chrom_level.gene.cds.fa" contains the CDS sequences of Aralia elata.
"AE.maker.final.chrom_level.gene.pep.fa" contains the protein sequences of Aralia elata.
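A simple consistency check between the two files is that each CDS should be three times the length of its protein, plus three bases if the stop codon is included. This sketch assumes the CDS and protein records share the same IDs (the first word of each header), and it strips a trailing "*" or "." from proteins in case the stop is written as a symbol; both are assumptions about these files. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence ID: sequence}; the ID is the first word after '>'."""
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def length_mismatches(cds: Dict[str, str], pep: Dict[str, str]) -> List[str]:
    """IDs found in both sets whose CDS length is neither 3*aa nor 3*(aa + 1)."""
    bad: List[str] = []
    for seq_id in sorted(cds.keys() & pep.keys()):
        aa = len(pep[seq_id].rstrip("*."))
        nt = len(cds[seq_id])
        if nt not in (3 * aa, 3 * (aa + 1)):
            bad.append(seq_id)
    return bad
```

Reading both files with `read_fasta` and passing the results to `length_mismatches` lists any transcripts whose CDS and protein lengths disagree; IDs present in only one file are ignored.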