Whole genome sequence and annotation of Penstemon davidsonii
Data files
Version files (1.12 GB each): Oct 02, 2023; Oct 09, 2023; Jan 02, 2024. Each version lists the same files:
- P_davidsonii_2022_Trinity_transcriptome.fasta
- P_davidsonii_2022_v1.1.fasta
- P_davidsonii_2022_v1.1.fasta.fai
- P_davidsonii_2022_v1.1.gff
- README.md
Abstract
Penstemon is the most speciose flowering plant genus endemic to North America. Penstemon species' diverse morphology and adaptation to various environments have made them a valuable model system for studying evolution, but the absence of publicly available reference genomes limits possible research directions. Here we report the first reference genome assembly and annotation for Penstemon davidsonii. Using PacBio long-read sequencing and Hi-C scaffolding technology, we constructed a de novo reference genome of 437,568,744 bases, with a contig N50 of 40 Mb and L50 of 5. The annotation includes 18,199 gene models, and both the genome and transcriptome assembly contain over 95% complete eudicot BUSCOs. This genome assembly will serve as a valuable reference for studying the evolutionary history and genetic diversity of the Penstemon genus.
README: Whole genome sequence and annotation of Penstemon davidsonii
https://doi.org/10.5061/dryad.4f4qrfjjr
These are files associated with the reference genome and annotation for Penstemon davidsonii.
Description of the data and file structure
There are 4 files included here:
P_davidsonii_2022_v1.1.gff = annotation file
P_davidsonii_2022_Trinity_transcriptome.fasta = transcriptome file
P_davidsonii_2022_v1.1.fasta = genome reference file
P_davidsonii_2022_v1.1.fasta.fai = genome reference index file
For information about how the files were created, please read the manuscript. https://doi.org/10.1093/g3journal/jkad296
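As a quick sanity check after downloading, the index and annotation files can be summarised with a few lines of Python. The sketch below is illustrative only and is not part of the deposited scripts. It assumes the `.fai` file follows the usual samtools faidx layout (tab-separated, sequence length in the second column) and that the `.gff` file uses `gene` as the feature type in its third column; the function names are hypothetical. Because the index lists the sequences of the final scaffolded assembly, the N50 it yields is a scaffold-level value and need not match the contig N50 quoted in the abstract.

```python
from pathlib import Path


def read_fai_lengths(fai_path):
    """Return sequence lengths from a samtools-style .fai index (column 2)."""
    lengths = []
    for line in Path(fai_path).read_text().splitlines():
        if line.strip():
            fields = line.split("\t")
            lengths.append(int(fields[1]))
    return lengths


def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    Sequences are sorted longest first; N50 is the length of the sequence at
    which the running total first reaches half of the summed length, and L50
    is the number of sequences needed to get there.
    """
    total = sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, rank
    raise ValueError("no sequences found")


def count_features(gff_path, feature_type="gene"):
    """Count GFF rows whose third column equals feature_type.

    Comment lines are skipped, and parsing stops at an embedded ##FASTA
    section if one is present (an assumption about MAKER-style output).
    """
    count = 0
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2] == feature_type:
                count += 1
    return count
```

For example, `sum(read_fai_lengths("P_davidsonii_2022_v1.1.fasta.fai"))` gives the total assembly length, which can be compared with the size reported in the abstract, and `count_features("P_davidsonii_2022_v1.1.gff")` gives the number of `gene` rows, which can be compared with the reported number of gene models.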
Sharing/Access information
The raw sequencing data used to produce the above (PacBio WGS, Hi-C, & RNA-seq) have been deposited into the NCBI Sequence Read Archive under the accession number PRJNA1010203.
Code/Software
Note that the paths in these scripts will need to be changed before they can be run.
P_davidsonii_2022_canu_clr.sh = Assembles the genome using HiCanu v2.1
P_davidsonii_2022_ccs.sh = Generates HiFi reads from the PacBio subreads.bam
P_davidsonii_2022_busco_slurm.sh = Assesses completeness of assembly using BUSCO
P_davidsonii_2022_annotation_scripts.sh = Aligns RNA-seq reads to the genome, runs RepeatModeler and RepeatMasker, assembles the transcriptome, and annotates the genome using MAKER