
Evaluation of four methods to identify the homozygotic sex chromosome in small populations

Cite this dataset

Hansen, Charles Christian Riis; Westfall, Kristen; Pálsson, Snæbjörn (2022). Evaluation of four methods to identify the homozygotic sex chromosome in small populations [Dataset]. Dryad.


Whole genomes are commonly assembled into a collection of scaffolds and often lack annotations of autosomes, sex chromosomes, and organelle genomes (i.e., mitochondrial and chloroplast). As these chromosome types can have highly disparate evolutionary histories, it is imperative to take this information into account when analyzing genomic variation. Here we assessed the accuracy of four methods for identifying the homogametic sex chromosome using two whole genome sequenced (WGS) and 133 RAD sequenced white-tailed eagles (Haliaeetus albicilla): i) difference in read depth per scaffold, ii) heterozygosity per scaffold in a male and female bird, iii) mapping to a reference genome of a related species (chicken) with identified sex chromosomes, and iv) an analysis of SNP-loadings from a principal components analysis (PCA), based on low-depth RADseq data from 133 individuals. In i and ii, the WGS reads were mapped to a reference genome consisting of 1142 assembled scaffolds from the golden eagle (Aquila chrysaetos) with no identified chromosomes. The read depth per scaffold identified 86.41% of the homogametic sex chromosome (Z) with few false positives. The SNP-loading scores found 78.6% of the Z-chromosome but had a false positive discovery rate of more than 10%. The heterozygosity per scaffold did not provide clear results due to a lack of diversity in both the Z and autosomal chromosomes, and potential interference from the heterogametic sex chromosome (W).
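The idea behind method i can be illustrated with a minimal sketch. In birds the female is heterogametic (ZW), so a Z-linked scaffold is expected to show roughly half the normalized read depth in a female compared with a male (ZZ). The function name, the median-based normalization, and the 0.35 to 0.65 ratio window below are illustrative assumptions, not the settings used for this dataset.

```python
from statistics import median


def flag_z_scaffolds(male_depth, female_depth, low=0.35, high=0.65):
    """Flag scaffolds whose normalized female/male depth ratio is near 0.5.

    male_depth, female_depth: dicts mapping scaffold name -> mean read depth.
    low, high: hypothetical bounds of the ratio window treated as Z-like.
    """
    # Only compare scaffolds covered in both individuals.
    shared = [s for s in male_depth if s in female_depth and male_depth[s] > 0]

    # Normalize each individual by its own median scaffold depth so that
    # differences in total sequencing effort cancel out.
    male_norm = median(male_depth[s] for s in shared)
    female_norm = median(female_depth[s] for s in shared)

    ratios = {
        s: (female_depth[s] / female_norm) / (male_depth[s] / male_norm)
        for s in shared
    }
    z_like = sorted(s for s, r in ratios.items() if low <= r <= high)
    return z_like, ratios
```

With made-up depths of about 30x for every scaffold in the male and about 40x in the female, a scaffold at 20x in the female would give a ratio close to 0.5 and be flagged, while the remaining scaffolds would stay close to 1.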


Blood samples were collected from white-tailed eagle chicks as part of a monitoring program run in Iceland since 2001 by the Natural History Institute of Iceland. The sex of the chicks was determined in the field based on morphology. Three to ten mL of blood was extracted from each chick. The blood was stored in EDTA buffer at -20 °C until DNA extraction.

DNA was extracted from the blood samples of 133 chicks using the ThermoFisher GeneJET Whole Blood Genomics DNA Purification Mini Kit following the standard protocol (Thermo Fisher, 2016). DNA concentration was estimated using the NanoDrop 1000, and the extracts were run on 0.7% agarose gels to evaluate the fragment size. Samples with a concentration higher than 60 ng/µl were selected for library preparation and sequencing.

The 133 samples were prepared for double digest restriction-site associated DNA sequencing (ddRADseq) using modified protocols from Elshire et al. (2011) and Peterson et al. (2012). Total genomic DNA (100-500 ng) was sequentially digested using the restriction endonucleases Sau3AI (1U) and ApeKI (2U), each for four hours at manufacturer (NEB) recommended temperatures in NEB Buffer 4. Digested DNA (100 ng) was ligated to adapters (sequences in Elshire et al. (2011)) containing unique combinatorial barcodes (16 unique 5 bp barcodes for ApeKI adapters and five unique 6 bp barcodes for Sau3AI adapters) for each individual (barcode and adapter sequences in Supplementary Information S1) using T4 DNA ligase (NEB) in supplied buffer at 21°C for four hours. Ligation reactions contained a 6:1 molar excess of adapter to fragmented DNA, calculated using the mean fragment size determined from an agarose gel. Ligated DNA was pooled and purified using magnetic beads (Macherey-Nagel NGS clean-up and size selection) following the manufacturer’s protocol. Size selection of ligated DNA fragments was performed on a Pippin Prep (Sage Science) with 2% ethidium-free agarose gels and an external size standard. The narrow range setting was used, targeting a mean fragment size of 350 bp ± 18 bp. The eluate was split among eight PCR reactions and amplified using the primers and PCR conditions as in Elshire et al. (2011). Each PCR reaction had a total volume of 25 μL containing 1x OneTaq Master Mix with Standard Buffer (NEB), 0.5 mM of each primer, and 8 μL template DNA. PCR products were pooled and purified using magnetic beads before quantification using a SYBR Gold fluorometric assay (protocol in Supplementary Information S2). The library was prepared for sequencing following the manufacturer’s instructions with a final concentration of 38 nM. The library was sequenced on an Illumina HiSeq2500 using the Illumina TruSeq kit (2x125bp) following the manufacturer’s instructions. The sequencing was done on one lane and yielded 303 million unambiguous paired-end (PE) reads.
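The 6:1 adapter-to-DNA molar excess used in the ligation step can be worked through with a short sketch. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA and treats the excess as adapter molecules per DNA fragment; the function names and the 500 bp mean fragment size in the usage note are hypothetical, since the gel-derived value is not reported here.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def dsdna_pmol(mass_ng, mean_fragment_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of fragments."""
    # ng -> g is 1e-9 and mol -> pmol is 1e12, hence the net factor of 1000.
    return mass_ng * 1000.0 / (AVG_BP_MASS * mean_fragment_bp)


def adapter_pmol_needed(mass_ng, mean_fragment_bp, molar_excess=6.0):
    """Picomoles of adapter giving the requested molar excess over fragments."""
    return molar_excess * dsdna_pmol(mass_ng, mean_fragment_bp)
```

For example, `adapter_pmol_needed(100, 500)` evaluates 100 ng of digested DNA at an assumed mean fragment size of 500 bp: about 0.30 pmol of fragments, so about 1.8 pmol of adapter for a 6:1 excess.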

Two individuals of white-tailed eagle, a male and a female, were selected for high-depth whole genome shotgun sequencing with two lanes each on an Illumina HiSeqX. Library preparation and sequencing were done at deCODE genetics, using the TruSeq Nano sample preparation method.

Two reference assemblies from male golden eagles (ZZ), one in 1142 scaffolds and one assembled to chromosome level (GenBank Assembly Accession numbers: GCA_000766835.1 and GCA_900496995.2, respectively), and one reference assembly from a female chicken (ZW) (GenBank Assembly Accession: GCA_0000023315.3) were downloaded from NCBI and used in the analysis (Doyle et al., 2014; Hillier et al., 2004).

The white-tailed eagle RADseq data were demultiplexed, sorting forward and reverse sequence reads into individual files, using the command process_radtags in Stacks (Catchen et al., 2013; Catchen et al., 2011). Standard settings were used for the RADseq data, with the option “r” applied to rescue barcodes and RAD-Tags.
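The principle of combinatorial demultiplexing with barcode rescue can be sketched as follows. This is not the Stacks implementation: it is a simplified illustration that assumes a one-mismatch tolerance and discards ambiguous matches, and all function names, barcodes, and sample labels are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def rescue_barcode(observed, known_barcodes, max_mismatches=1):
    """Return the known barcode matching `observed`, or None if unresolved."""
    if observed in known_barcodes:
        return observed
    candidates = [
        b for b in known_barcodes
        if len(b) == len(observed) and hamming(observed, b) <= max_mismatches
    ]
    # Rescue only when exactly one known barcode is within the tolerance.
    return candidates[0] if len(candidates) == 1 else None


def assign_sample(barcode_a, barcode_b, known_a, known_b, sample_map):
    """Map a pair of observed barcodes (one per adapter) to a sample label."""
    a = rescue_barcode(barcode_a, known_a)
    b = rescue_barcode(barcode_b, known_b)
    if a is None or b is None:
        return None
    return sample_map.get((a, b))
```

Rejecting ambiguous matches trades a small loss of reads for a lower risk of assigning a read to the wrong individual.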

After demultiplexing, FastQC (Babraham Bioinformatics, 2010) was run for quality control. For the RADseq data, we found an excess of specific sequences (k-mers), which were removed using AdapterRemoval v2 (version 2.2.2) (Schubert, Lindgreen & Orlando, 2016). The high-depth shotgun sequenced individuals were tested in the same way, but no excess of k-mers was found.

The Burrows-Wheeler Aligner (BWA) and SAMtools (Li & Durbin, 2009; Li et al., 2009) were used to process RADseq and high depth shotgun data and map reads to the golden eagle scaffold assembly of 1142 scaffolds with no identified chromosomes (GCA_000766835.1) (Doyle et al., 2014).
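Once reads are mapped, per-scaffold read depth can be summarized from a per-position depth table. The sketch below assumes tab-separated lines of scaffold name, position, and depth, the three-column layout that `samtools depth` writes for a single alignment file; how depth was actually summarized for this dataset is not stated here, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_depth_per_scaffold(lines):
    """Average the depth column per scaffold from 'scaffold<TAB>pos<TAB>depth' lines."""
    total = defaultdict(int)
    count = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        scaffold, _pos, depth = line.split("\t")
        total[scaffold] += int(depth)
        count[scaffold] += 1
    # The mean is taken over the positions present in the input only, so
    # positions absent from the table do not contribute zeros.
    return {s: total[s] / count[s] for s in total}
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines.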


The Icelandic Centre for Research, Award: 185280-052