Synteny enabled upgrade of the Galapagos giant tortoise genome improves inferences of runs of homozygosity
Data files
May 16, 2025 version files (1.24 GB)
- 37GalTort_FilteredSNPs_cheloabing1.0.vcf.gz (229.47 MB)
- 37GalTort_FilteredSNPs_cheloabing2.0.vcf.gz (267.15 MB)
- CheloAbing2.0.fasta.zip (730.12 MB)
- liftoff_annotation_1.0_to_2.0.gff.zip (14.92 MB)
- README.md (1.51 KB)
Abstract
The utility and importance of whole-genome sequences are recognized across various fields, including evolution and conservation. However, for some taxa, such as extinct species, methods that rely on high-quality DNA to generate contiguous genomes cannot be used. In such cases, an alternative may be to employ synteny-based methods that use a genome from a closely related taxon to generate more complete genomes. Here we update the reference genome for the Pinta Island Galapagos giant tortoise (Chelonoidis abingdonii), without conducting additional sequencing, by rescaffolding against the most closely related chromosome-level genome assembly, that of the Aldabra giant tortoise (Aldabrachelys gigantea). This effort resulted in a much more contiguous genome, CheloAbing_2.0, with an N50 that is two orders of magnitude longer and with large reductions in L50 and the number of gaps. We then examined the impact of the CheloAbing_2.0 genome on estimates of runs of homozygosity (ROH), using genome resequencing data from 37 individual Galapagos giant tortoises from the 13 extant lineages, to test the mechanisms by which a fragmented assembly may over- or underestimate the number and extent of ROH. The use of CheloAbing_2.0 resulted in individual estimates of inbreeding, including ROH proportion (FROH), number (NROH), and cumulative length (SROH), that were statistically different from those derived from the earlier genome assembly. This improved genome will serve as a resource for future efforts focusing on the ecology, evolution, and conservation of this species group. More broadly, our results highlight that synteny-based scaffolding is promising for generating contiguous genomes without needing additional data types.
Dataset DOI: 10.5061/dryad.sxksn03f6
Description of the data and file structure
The files herein consist of the final genome assembly, annotation file and filtered VCF files used for ROH analyses.
Files and variables
File: 37GalTort_FilteredSNPs_cheloabing1.0.vcf.gz
Description: VCF file of the 37 Galapagos giant tortoise individuals used for ROH analyses, based on genome version CheloAbing_1.0. This file can be opened and manipulated with vcftools, or in R with the vcfR package.
File: 37GalTort_FilteredSNPs_cheloabing2.0.vcf.gz
Description: VCF file of the 37 Galapagos giant tortoise individuals used for ROH analyses, based on genome version CheloAbing_2.0. This file can be opened and manipulated with vcftools, or in R with the vcfR package.
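For a quick sanity check without vcftools or vcfR, the following minimal Python sketch lists the sample names and counts the variant records in either VCF file. It uses only the standard library and assumes the files follow the usual VCF layout (nine fixed columns followed by one column per sample). The function name `summarize_vcf` is illustrative and not part of this dataset.

```python
import gzip


def summarize_vcf(path):
    """Return (sample_names, n_variant_records) for a gzip-compressed VCF.

    Assumes the standard VCF layout: '##' meta lines, one '#CHROM' header
    line, then one tab-separated record per variant.
    """
    samples = []
    n_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Header: 9 fixed columns, then one column per sample
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_records += 1
    return samples, n_records


# Example call (path assumed to be in the working directory):
# samples, n = summarize_vcf("37GalTort_FilteredSNPs_cheloabing2.0.vcf.gz")
# print(len(samples), "samples;", n, "variant records")
```

If the files match the descriptions above, the sample list should contain 37 names for both VCFs.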
File: liftoff_annotation_1.0_to_2.0.gff.zip
Description: GFF annotation file for CheloAbing_2.0. Once unzipped, this file can be opened in a standard text editor.
File: CheloAbing2.0.fasta.zip
Description: FASTA file containing the CheloAbing_2.0 genome assembly. Once unzipped, this file can be opened in a standard text editor.
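The contiguity statistics mentioned in the abstract (N50 and L50) can be recomputed from the unzipped FASTA file. The sketch below is a minimal, standard-library illustration and not the pipeline used in the study. N50 is taken as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. The function names and the example file name are assumptions.

```python
def sequence_lengths(handle):
    """Yield the length of each sequence in a FASTA text stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length  # last record in the file


def n50_l50(lengths):
    """Return (N50, L50) for an iterable of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty input


# Example call (file name of the unzipped FASTA is an assumption):
# with open("CheloAbing2.0.fasta") as fasta:
#     print(n50_l50(sequence_lengths(fasta)))
```

Reading the file line by line keeps memory use low, which matters for an assembly of this size.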
Access information
Other publicly accessible locations of the data:
- This study made use of existing data available on the NCBI SRA under BioSample SAMN07840320 and BioProject PRJNA761229.
