Reference genome of an irruptive migrant, the pine siskin (Spinus pinus)
Data files
Oct 18, 2025 version, 2.40 GB total
- curated_sequences_NR.fa (6.38 MB)
- curated_sequences_R.fa (9.92 MB)
- pisiContigs_hap1.fasta (1.27 GB)
- pisiContigs_hap2.fasta (1.11 GB)
- pisiRepeats-families.fa (1.37 MB)
- ragtag.scaffold.Hap1.agp (64.26 KB)
- ragtag.scaffold.Hap2.agp (22.22 KB)
- README.md (1.96 KB)
Abstract
This dataset contains the unscaffolded assemblies and supporting files associated with the chromosome-level genome assembly of the pine siskin (Spinus pinus). We provide two unscaffolded haplotype assemblies (pisiContigs_hap1.fasta and pisiContigs_hap2.fasta) that correspond to the scaffolded assemblies deposited in NCBI under BioProject accessions PRJNA1281197 (primary haplotype) and PRJNA1281196 (alternate haplotype). To document the scaffolding process, we include the AGP files generated by RagTag (ragtag.scaffold.Hap1.agp and ragtag.scaffold.Hap2.agp), which specify the order and orientation of contigs within pseudochromosomes.
To support repeat annotation and comparative analyses, we also provide both raw and curated repeat libraries. The raw library (pisiRepeats-families.fa) contains de novo repeat family predictions. The curated libraries were refined using MCHelper and are provided in redundant (curated_sequences_R.fa) and nonredundant (curated_sequences_NR.fa) versions.
Dataset DOI: 10.5061/dryad.xsj3tx9tc
Description of the data and file structure
These data were generated as part of an effort to produce a high-quality, chromosome-level genome assembly of the pine siskin (Spinus pinus). The assembly was designed to support research on avian genome evolution, comparative genomics, and studies of immune and ecological adaptation.
Files and variables
File: pisiContigs_hap1.fasta
Description: Unscaffolded contig-level assembly of the pine siskin (Spinus pinus) primary haplotype. Corresponds to the scaffolded genome deposited under BioProject accession PRJNA1281197.
File: pisiContigs_hap2.fasta
Description: Unscaffolded contig-level assembly of the pine siskin (Spinus pinus) alternate haplotype. Corresponds to the scaffolded genome deposited under BioProject accession PRJNA1281196.
File: ragtag.scaffold.Hap1.agp
Description: Scaffold path file produced by RagTag v2.1.0 showing the order and orientation of contigs for the primary haplotype relative to the Pyrrhula murina (bPyrMur1.1) reference genome.
File: ragtag.scaffold.Hap2.agp
Description: Scaffold path file produced by RagTag v2.1.0 showing the order and orientation of contigs for the alternate haplotype relative to the Pyrrhula murina (bPyrMur1.1) reference genome.
File: pisiRepeats-families.fa
Description: Raw de novo repeat family predictions generated by RepeatModeler2 v2.0.6 from the pine siskin genome assemblies.
File: curated_sequences_R.fa
Description: Curated repeat library generated with MCHelper v1.1.7, containing redundant sequences from raw repeat predictions.
File: curated_sequences_NR.fa
Description: Curated repeat library generated with MCHelper v1.1.7, containing nonredundant sequences from raw repeat predictions.
To generate scaffolded assemblies, we used RagTag v2.1.0 to order and orient contigs against the Azores Bullfinch (Pyrrhula murina) reference genome (bPyrMur1.1, NCBI). This process produced AGP files documenting contig placement and orientation (ragtag.scaffold.Hap1.agp, ragtag.scaffold.Hap2.agp).
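As an illustration of how the contig placements in an AGP file can be read, the following is a minimal Python sketch, assuming the files follow the standard tab-separated AGP layout (object, start, end, part number, component type, then either component or gap fields). The `parse_agp` function, the `Component` tuple, and the example rows are hypothetical and are not taken from the deposited files.

```python
from collections import namedtuple

# Field names chosen for this sketch; they mirror the columns of an AGP
# component (non-gap) row.
Component = namedtuple(
    "Component",
    "scaffold scaffold_start scaffold_end part contig contig_start contig_end orientation",
)


def parse_agp(lines):
    """Yield one Component per placed contig, skipping comments and gap rows."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line or header/comment line
        cols = line.split("\t")
        if cols[4] in ("N", "U"):
            continue  # gap row: columns 6-9 describe the gap, not a contig
        yield Component(
            cols[0], int(cols[1]), int(cols[2]), int(cols[3]),
            cols[5], int(cols[6]), int(cols[7]), cols[8],
        )


# Hypothetical rows for illustration only.
example = [
    "## agp-version 2.1",
    "chr1_RagTag\t1\t1000\t1\tW\tcontig_A\t1\t1000\t+",
    "chr1_RagTag\t1001\t1100\t2\tU\t100\tscaffold\tyes\talign_genus",
    "chr1_RagTag\t1101\t1600\t3\tW\tcontig_B\t1\t500\t-",
]
placed = list(parse_agp(example))
for comp in placed:
    print(comp.scaffold, comp.contig, comp.orientation)
```

To apply the same function to the deposited files, the list of example rows could be replaced with an open file handle, for example `open("ragtag.scaffold.Hap1.agp")`.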
For repeat annotation, we ran RepeatModeler2 v2.0.6, which identified and classified de novo repeat families (pisiRepeats-families.fa). Curated repeat libraries were then generated using MCHelper v1.1.7, producing redundant (curated_sequences_R.fa) and nonredundant (curated_sequences_NR.fa) sequence sets for downstream analyses.
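A quick way to summarize a repeat library is to tally its families by class. The sketch below assumes that FASTA headers follow the `name#Class/Family` convention commonly used by RepeatModeler; whether every header in the curated libraries follows that convention is an assumption, and headers without a `#` are counted as "Unlabeled". The function name and example records are hypothetical.

```python
from collections import Counter


def count_repeat_classes(fasta_lines):
    """Count FASTA records per top-level repeat class (text between '#' and '/')."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""
        # Assumed header layout: name#Class/Family
        classification = seq_id.partition("#")[2]
        top_level = classification.split("/")[0] or "Unlabeled"
        counts[top_level] += 1
    return counts


# Hypothetical records for illustration only.
example = [
    ">family-1#LTR/ERVK",
    "ACGT",
    ">family-2#LINE/CR1 extra description",
    "GGCC",
    ">family-3#LTR/ERVL",
    "TTAA",
    ">family-4",
    "ACAC",
]
class_counts = count_repeat_classes(example)
for repeat_class, n in class_counts.most_common():
    print(repeat_class, n)
```

Running the same tally on the redundant and nonredundant libraries would give a rough view of how curation changed the number of families in each class.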
Unscaffolded haplotype assemblies (pisiContigs_hap1.fasta, pisiContigs_hap2.fasta) correspond to the scaffolded genomes deposited in NCBI under BioProject accessions PRJNA1281197 (primary) and PRJNA1281196 (alternate).
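When comparing the contig-level files with the scaffolded records, basic contiguity statistics are a useful starting point. The following is a minimal sketch that computes contig count, total length, and N50 from FASTA text; the function names and the toy sequences are hypothetical, and for files over 1 GB a dedicated tool may be more practical than pure Python.

```python
def fasta_lengths(lines):
    """Return a dict mapping each sequence ID to its total length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy sequences for illustration only.
example = [
    ">c1", "ACG", "TAC",
    ">c2", "ACGT",
    ">c3 some description", "ACG",
    ">c4", "AC",
]
contig_lengths = fasta_lengths(example)
print(len(contig_lengths), sum(contig_lengths.values()), n50(contig_lengths.values()))
```

For the deposited assemblies, the example list could be replaced with an open file handle such as `open("pisiContigs_hap1.fasta")`.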
