Dryad

A genome-wide investigation of adaptations related to tool use behaviour in New Caledonian and Hawaiian crows

Cite this dataset

Dussex, Nicolas et al. (2021). A genome-wide investigation of adaptations related to tool use behaviour in New Caledonian and Hawaiian crows [Dataset]. Dryad. https://doi.org/10.5061/dryad.w0vt4b8m9

Abstract

GFF3 file with protein-coding gene predictions, generated using the MAKER2 pipeline, for the C. moneduloides de novo genome assembly (available at the National Center for Biotechnology Information, NCBI; assembly accession number VRTO00000000).
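A GFF3 file is tab-delimited, with nine columns per feature line; the third column gives the feature type. As a minimal sketch for a first look at a file like this one, the function below tallies feature types. The function name and the file name in the usage comment are hypothetical, and the parser assumes standard GFF3, in which an optional `##FASTA` directive marks the start of embedded sequence.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3), e.g. gene, mRNA, exon, CDS."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # a feature line has exactly nine tab-separated columns
            counts[cols[2]] += 1
    return counts


# Usage (file name is hypothetical):
# with open("C_moneduloides.maker.gff3") as fh:
#     print(count_feature_types(fh).most_common())
```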

Methods

The C. moneduloides de novo genome assembly was annotated using the MAKER2 annotation pipeline (v2.31.7; Holt and Yandell 2011) in an iterative setup. After masking putative repeats in the genome, this pipeline generates gene models, including 5′ and 3′ UTRs, by integrating ab initio gene predictions with aligned transcript and protein evidence. For the first iteration, we used three evidence datasets: an avian protein dataset (uniprot.aves.reviewed.fasta, downloaded from UniProtKB); a dataset of 1,300 manually curated C. (corone) cornix proteins based on the C. (corone) cornix reference assembly (assembly accession GCF_000738735.1, available at NCBI; Poelstra et al. 2014, 2015); and a transcriptome dataset for C. woodfordi, which is closely related to C. moneduloides (https://www.ebi.ac.uk/ena; ENA project accession number PRJEB33755). MAKER2 was run with default parameters except for est2genome=1 and protein2genome=1. We used ‘aves’ as the model organism in RepeatMasker.
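MAKER reads its settings from control files such as `maker_opts.ctl` (the template written by `maker -CTL`), where each option is a `key=value` line and yes/no flags are written as 1/0. The sketch below renders the first-iteration settings in that form. The genome, curated-protein and transcriptome file names are hypothetical, and placing the C. woodfordi transcripts under `est` (rather than MAKER's `altest` option for transcripts from a related organism) is an assumption, since the text does not say which option was used.

```python
def render_ctl(options: dict) -> str:
    """Render options as MAKER-style 'key=value' control-file lines.

    Lists become comma-separated file lists; booleans become 1/0.
    """
    lines = []
    for key, value in options.items():
        if isinstance(value, bool):  # check bool before anything else
            value = int(value)
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# First iteration: evidence-based gene models only.
# File names other than uniprot.aves.reviewed.fasta are hypothetical.
ROUND1 = {
    "genome": "C_moneduloides.fasta",
    "protein": ["uniprot.aves.reviewed.fasta", "cornix_curated_proteins.fasta"],
    "est": "C_woodfordi_transcriptome.fasta",  # assumption: est, not altest
    "model_org": "aves",       # RepeatMasker model organism
    "est2genome": True,        # infer gene models directly from transcript alignments
    "protein2genome": True,    # infer gene models from protein alignments
}

print(render_ctl(ROUND1), end="")
```

In practice these values would be edited into the generated template rather than written from scratch; the dictionary form just makes the two non-default flags easy to see.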

For the second iteration, two ab initio gene predictors were included in the annotation process, Augustus (v2.5.5; Stanke, Tzvetkova, and Morgenstern 2006) and SNAP (v2010-7-28; Korf 2004), with est2genome=0 and protein2genome=0. Augustus was trained with a set of 1,300 manually curated gene models for C. (corone) cornix (Poelstra et al. 2015). SNAP was trained on the MAKER2 gene models produced during the first round of annotation, yielding an *.hmm parameter file. All gene predictors were run with default parameter values. tRNAs were identified with tRNAscan-SE (v1.23; Lowe and Eddy 1997) within the MAKER2 pipeline during the second iteration. To reduce running time, the genome was split into 137 parts that were analysed independently on a computing cluster. The resulting .gff files were then merged and filtered to retain the exon, CDS and mRNA features for the entire genome assembly.
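The text does not say how the genome was divided into 137 parts or how the outputs were merged. The sketch below shows one plausible approach under stated assumptions: whole sequences are assigned to parts with a greedy longest-first heuristic so that each part holds a similar number of bases, and the per-part GFF3 outputs are concatenated while keeping only mRNA, exon and CDS rows. Both function names are hypothetical.

```python
from __future__ import annotations

import heapq
from typing import Iterable, Iterator


def balance_parts(seq_lengths: dict[str, int], n_parts: int) -> list[list[str]]:
    """Assign whole sequences to n_parts bins of roughly equal total length.

    Greedy longest-first: each sequence goes to the bin that currently
    holds the fewest bases.
    """
    heap = [(0, i) for i in range(n_parts)]  # (bases assigned, bin index)
    parts: list[list[str]] = [[] for _ in range(n_parts)]
    for name, length in sorted(seq_lengths.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        parts[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return parts


KEEP = {"mRNA", "exon", "CDS"}


def merge_and_filter(gff_chunks: Iterable[Iterable[str]]) -> Iterator[str]:
    """Concatenate per-part GFF3 outputs, keeping only mRNA, exon and CDS rows."""
    yield "##gff-version 3\n"  # one header for the merged file
    for chunk in gff_chunks:
        for line in chunk:
            if line.startswith("##FASTA"):
                break  # stop at embedded sequence in this part
            if line.startswith("#") or not line.strip():
                continue  # drop per-part headers, comments and blank lines
            cols = line.split("\t")
            if len(cols) == 9 and cols[2] in KEEP:
                yield line
```

The greedy assignment is a simple load-balancing choice: it does not split scaffolds, so gene models are never cut at part boundaries.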

To check the quality of the annotations visually, we lifted the C. moneduloides annotations over to the annotated C. cornix reference assembly (NCBI annotation; Poelstra et al. 2014). We used the SatsumaSynteny program included in Satsuma-3.0 (Grabherr et al. 2010) to produce a table of corresponding coordinates in the two species from a synteny-based pairwise whole-genome alignment. This liftover file was uploaded to the Apollo genome browser (Lewis et al. 2002) so that C. moneduloides gene predictions could be compared visually with the C. cornix annotation.
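A coordinate table from a whole-genome alignment lifts a position by finding the aligned block that contains it and applying that block's offset. The sketch below assumes a simplified, hypothetical table of ungapped, non-overlapping blocks with 0-based half-open coordinates; it does not reproduce the exact column layout of SatsumaSynteny output, which would need to be checked before adapting it. `Block` and `LiftOver` are illustrative names.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One ungapped aligned block (hypothetical layout, 0-based half-open)."""
    src_chrom: str
    src_start: int
    src_end: int
    dst_chrom: str
    dst_start: int
    dst_end: int
    strand: str  # "+" or "-"


class LiftOver:
    """Map source positions to target positions via non-overlapping blocks."""

    def __init__(self, blocks: list[Block]):
        self._by_chrom: dict[str, list[Block]] = {}
        for b in blocks:
            self._by_chrom.setdefault(b.src_chrom, []).append(b)
        for lst in self._by_chrom.values():
            lst.sort(key=lambda b: b.src_start)
        # Sorted block starts per source sequence, for binary search.
        self._starts = {c: [b.src_start for b in lst] for c, lst in self._by_chrom.items()}

    def lift(self, chrom: str, pos: int) -> tuple[str, int] | None:
        """Return (target sequence, position), or None if pos is unaligned."""
        blocks = self._by_chrom.get(chrom)
        if not blocks:
            return None
        i = bisect_right(self._starts[chrom], pos) - 1  # last block starting <= pos
        if i < 0 or pos >= blocks[i].src_end:
            return None  # before the first block or in a gap between blocks
        b = blocks[i]
        offset = pos - b.src_start
        if b.strand == "+":
            return b.dst_chrom, b.dst_start + offset
        return b.dst_chrom, b.dst_end - 1 - offset  # reverse strand counts down
```

Positions that fall between blocks return `None`, which mirrors how gene features in unaligned regions cannot be carried over by a synteny-based liftover.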

We assessed the completeness of the assemblies and annotations of the two reference genomes using the BUSCO pipeline (Simão et al. 2015) and the “vertebrata” dataset (2,586 genes). The C. moneduloides and C. cornix genomes contained 2,514 and 2,478 complete, single-copy orthologs (97.2% and 96.2%) and 33 and 66 fragmented (partial) orthologs (1.3% and 2.6%) of the BUSCO proteins, respectively. Finally, we identified orthologous coding sequences between the C. cornix and C. moneduloides annotations using orthAgogue (Ekseth, Kuiper, and Mironov 2014).
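BUSCO percentages are counts divided by the number of genes in the lineage dataset, with genes not found at all reported as missing. The sketch below shows that arithmetic in the layout of BUSCO's one-line short summary (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing, n = total). The function name is hypothetical, and it recomputes the summary from counts rather than parsing BUSCO's own output files.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, total: int) -> str:
    """Format BUSCO counts as a one-line summary of percentages of `total`.

    Complete = single-copy + duplicated; missing = whatever is left over.
    """
    if total <= 0 or min(single, duplicated, fragmented) < 0:
        raise ValueError("total must be positive and counts non-negative")
    if single + duplicated + fragmented > total:
        raise ValueError("counts cannot exceed the lineage total")
    missing = total - single - duplicated - fragmented

    def pct(n: int) -> str:
        return f"{100 * n / total:.1f}%"  # one decimal place, as BUSCO reports

    return (f"C:{pct(single + duplicated)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{total}")


print(busco_summary(90, 5, 3, 100))
# prints: C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:100
```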