Evaluation of four methods to identify the homogametic sex chromosome in small populations
Data files
Version: Apr 27, 2022 (172.11 GB total)
- A6511_truncated_sorted.bam (140.77 MB)
- A6514_truncated_sorted.bam (96.86 MB)
- A6515_truncated_sorted.bam (55.06 MB)
- A6517_truncated_sorted.bam (27.69 MB)
- A6532_truncated_sorted.bam (68.55 MB)
- A6533_truncated_sorted.bam (181.27 MB)
- A6534_truncated_sorted.bam (64.44 MB)
- A6537_truncated_sorted.bam (124.73 MB)
- A6538_truncated_sorted.bam (68.13 MB)
- A6540_truncated_sorted.bam (53.47 MB)
- A6542_truncated_sorted.bam (104.42 MB)
- A6543_truncated_sorted.bam (31.87 MB)
- A6544_truncated_sorted.bam (83.09 MB)
- A6545_truncated_sorted.bam (69.76 MB)
- A6546_truncated_sorted.bam (54.41 MB)
- A6547_truncated_sorted.bam (46.60 MB)
- A6548_truncated_sorted.bam (93.62 MB)
- A6549_truncated_sorted.bam (145.45 MB)
- A6550_truncated_sorted.bam (127.38 MB)
- A6551_truncated_sorted.bam (84.35 MB)
- A6552_truncated_sorted.bam (70.65 MB)
- A6553_truncated_sorted.bam (156.57 MB)
- A7000_truncated_sorted.bam (202.55 MB)
- A7001_truncated_sorted.bam (196.63 MB)
- A7002_truncated_sorted.bam (381.58 MB)
- A7003_truncated_sorted.bam (160.16 MB)
- A7004_truncated_sorted.bam (167.87 MB)
- A7005_truncated_sorted.bam (241.62 MB)
- A7006_truncated_sorted.bam (118.90 MB)
- A7007_truncated_sorted.bam (182.75 MB)
- A7009_truncated_sorted.bam (62.30 MB)
- A7010_truncated_sorted.bam (134.23 MB)
- A7012_truncated_sorted.bam (205.60 MB)
- A7013_truncated_sorted.bam (119.61 MB)
- A7014_truncated_sorted.bam (125.11 MB)
- A7015_truncated_sorted.bam (127.59 MB)
- A7016_truncated_sorted.bam (107.01 MB)
- A7018_truncated_sorted.bam (125.73 MB)
- A7019_truncated_sorted.bam (156.93 MB)
- A7020_truncated_sorted.bam (94.29 MB)
- A7023_truncated_sorted.bam (78.60 MB)
- A7024_truncated_sorted.bam (66.15 MB)
- A7026_truncated_sorted.bam (81.08 MB)
- A7028_truncated_sorted.bam (114.50 MB)
- A7029_truncated_sorted.bam (49.56 MB)
- A7030_truncated_sorted.bam (95.38 MB)
- A7031_truncated_sorted.bam (88.27 MB)
- A7032_truncated_sorted.bam (195.69 MB)
- A7033_truncated_sorted.bam (114.17 MB)
- A7034_truncated_sorted.bam (160.06 MB)
- A7035_truncated_sorted.bam (183.26 MB)
- A7036_truncated_sorted.bam (157.05 MB)
- A7037_truncated_sorted.bam (217.46 MB)
- A7038_truncated_sorted.bam (139.82 MB)
- A7040_truncated_sorted.bam (121.56 MB)
- A7041_truncated_sorted.bam (102.38 MB)
- A7042_truncated_sorted.bam (192.32 MB)
- A7043_truncated_sorted.bam (130.59 MB)
- A7045_truncated_sorted.bam (169.96 MB)
- A7047_truncated_sorted.bam (261.94 MB)
- A7049_truncated_sorted.bam (152.40 MB)
- A7052_truncated_sorted.bam (124.60 MB)
- A7054_truncated_sorted.bam (83.38 MB)
- A7059_truncated_sorted.bam (100.82 MB)
- A7061_truncated_sorted.bam (126.56 MB)
- A7064_truncated_sorted.bam (128.81 MB)
- A7070_truncated_sorted.bam (108.24 MB)
- A7076_truncated_sorted.bam (64.53 MB)
- A7077_truncated_sorted.bam (20.38 MB)
- A7078_truncated_sorted.bam (38.52 MB)
- A7081_truncated_sorted.bam (17.81 MB)
- A7082_truncated_sorted.bam (57.11 MB)
- A7083_truncated_sorted.bam (73.90 MB)
- A7084_truncated_sorted.bam (54.22 MB)
- A7085_truncated_sorted.bam (200.53 MB)
- A7086_truncated_sorted.bam (80.93 MB)
- A7088_truncated_sorted.bam (171.03 MB)
- A7090_truncated_sorted.bam (141 MB)
- A7091_truncated_sorted.bam (58.16 MB)
- A7092_truncated_sorted.bam (63.72 MB)
- A7095_truncated_sorted.bam (26.49 MB)
- A7097_truncated_sorted.bam (92.37 MB)
- A7098_truncated_sorted.bam (51.65 MB)
- A7099_truncated_sorted.bam (27.87 MB)
- A7101_truncated_sorted.bam (86.07 MB)
- A7102_truncated_sorted.bam (38.23 MB)
- A7104_truncated_sorted.bam (52.67 MB)
- A7105_truncated_sorted.bam (144.14 MB)
- A7109_truncated_sorted.bam (125.53 MB)
- A7112_truncated_sorted.bam (232.56 MB)
- A7115_truncated_sorted.bam (23.09 MB)
- A7118_truncated_sorted.bam (277.36 MB)
- A7123_truncated_sorted.bam (40.34 MB)
- A7124_truncated_sorted.bam (58.47 MB)
- A7125_truncated_sorted.bam (55.66 MB)
- A7127_truncated_sorted.bam (222.38 MB)
- A7128_truncated_sorted.bam (127.13 MB)
- A7130_truncated_sorted.bam (65.09 MB)
- A7131_truncated_sorted.bam (40.93 MB)
- A7133_truncated_sorted.bam (105.38 MB)
- A7135_truncated_sorted.bam (197.49 MB)
- A7136_truncated_sorted.bam (182.51 MB)
- A7139_truncated_sorted.bam (127.47 MB)
- A7140_truncated_sorted.bam (50.93 MB)
- A7141_truncated_sorted.bam (43.78 MB)
- A7143_truncated_sorted.bam (87.18 MB)
- A7147_truncated_sorted.bam (349.89 MB)
- A7149_truncated_sorted.bam (180.91 MB)
- A7150_truncated_sorted.bam (22.95 MB)
- A7153_truncated_sorted.bam (156.70 MB)
- A7167_truncated_sorted.bam (319.81 MB)
- A7170_truncated_sorted.bam (100.24 MB)
- A7171_truncated_sorted.bam (110.70 MB)
- A7179_truncated_sorted.bam (378.84 MB)
- A7181_truncated_sorted.bam (239.80 MB)
- A7183_truncated_sorted.bam (273.44 MB)
- A7184_truncated_sorted.bam (269.35 MB)
- A7185_truncated_sorted.bam (241.87 MB)
- A7188_truncated_sorted.bam (444.94 MB)
- A7189_truncated_sorted.bam (193.96 MB)
- A7192_truncated_sorted.bam (327.29 MB)
- A7193_truncated_sorted.bam (335.34 MB)
- A7194_truncated_sorted.bam (415.50 MB)
- A7195_truncated_sorted.bam (298.82 MB)
- A7203_truncated_sorted.bam (537.67 MB)
- A7206_truncated_sorted.bam (225.04 MB)
- A7207_truncated_sorted.bam (198.16 MB)
- A7252_truncated_sorted.bam (92.35 MB)
- A7254_truncated_sorted.bam (112.57 MB)
- A7255_truncated_sorted.bam (174.66 MB)
- A7261_truncated_sorted.bam (163.97 MB)
- A7262_truncated_sorted.bam (39.37 MB)
- A7268_truncated_sorted.bam (162.31 MB)
- hiseqx18_160628_HiSeqX18_0137_BHGWVLCCXX.s_1.001.R1.fastq.gz (34.48 GB)
- hiseqx18_160628_HiSeqX18_0137_BHGWVLCCXX.s_1.001.R2.fastq.gz (40.08 GB)
- hiseqx18_160628_HiSeqX18_0137_BHGWVLCCXX.s_3.001.R1.fastq.gz (36.23 GB)
- hiseqx18_160628_HiSeqX18_0137_BHGWVLCCXX.s_3.001.R2.fastq.gz (42.91 GB)
- README_SexChromosome_Dataset.txt (13.17 KB)
Abstract
Genome assemblies commonly consist of a collection of scaffolds and often lack annotation of autosomes, sex chromosomes, and organelle genomes (i.e., mitochondrial and chloroplast). As these chromosome types can have highly disparate evolutionary histories, it is imperative to take this information into account when analyzing genomic variation. Here we assessed the accuracy of four methods for identifying the homogametic sex chromosome using two whole-genome sequenced (WGS) and 133 RAD-sequenced white-tailed eagles (Haliaeetus albicilla): i) the difference in read depth per scaffold, ii) heterozygosity per scaffold in a male and a female bird, iii) mapping to the reference genome of a related species (chicken) with identified sex chromosomes, and iv) an analysis of SNP loadings from a principal component analysis (PCA) based on low-depth RADseq data from 133 individuals. For methods i and ii, the WGS reads were mapped to a golden eagle (Aquila chrysaetos) reference genome consisting of 1142 assembled scaffolds with no identified chromosomes. Read depth per scaffold identified 86.41% of the homogametic sex chromosome (Z) with few false positives. The SNP-loading scores identified 78.6% of the Z chromosome but had a false discovery rate of more than 10%. Heterozygosity per scaffold did not provide clear results, owing to low diversity on both the Z chromosome and the autosomes and to potential interference from the heterogametic sex chromosome (W).
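The abstract reports, for each method, the share of the Z chromosome recovered and the false discovery rate. As a minimal sketch of how such figures can be computed, the Python function below compares a set of scaffolds called as Z against a reference set. It assumes that both figures are weighted by scaffold length and that the "true" Z scaffolds are known (for example from a chromosome-level assembly); neither assumption is stated here. The function name, scaffold names, and lengths are hypothetical.

```python
def evaluate_z_calls(predicted, true_z, lengths):
    """Length-weighted recall and false discovery rate (FDR) of Z-scaffold calls.

    Assumption: both metrics are weighted by scaffold length (bp); the source
    does not say whether the reported percentages use lengths or counts.
    predicted, true_z: sets of scaffold names; lengths: dict of name -> bp.
    """
    def total_bp(names):
        return sum(lengths[name] for name in names)

    true_pos = predicted & true_z    # scaffolds correctly called Z
    false_pos = predicted - true_z   # non-Z scaffolds wrongly called Z
    recall = total_bp(true_pos) / total_bp(true_z) if true_z else 0.0
    fdr = total_bp(false_pos) / total_bp(predicted) if predicted else 0.0
    return recall, fdr


# Hypothetical scaffolds: s1-s3 are Z-linked, s4 is autosomal.
lengths = {"s1": 100, "s2": 300, "s3": 600, "s4": 1000}
recall, fdr = evaluate_z_calls({"s2", "s3", "s4"}, {"s1", "s2", "s3"}, lengths)
print(f"recall={recall:.2f}, FDR={fdr:.2f}")  # recall=0.90, FDR=0.53
```

Weighting by length keeps a few long scaffolds from counting the same as many short fragments; a count-based version only needs `total_bp` replaced by `len`.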
Methods
Blood samples were collected from white-tailed eagle chicks as part of a monitoring program run in Iceland by the Natural History Institute of Iceland since 2001. The sex of each chick was determined in the field based on morphology. Three to ten mL of blood was drawn from each chick and stored in EDTA buffer at -20 °C until DNA extraction.
DNA was extracted from the blood samples of 133 chicks using the Thermo Fisher GeneJET Whole Blood Genomic DNA Purification Mini Kit, following the standard protocol (Thermo Fisher, 2016). DNA concentration was estimated using a NanoDrop 1000, and samples were run on 0.7% agarose gels to evaluate fragment size. Samples with a concentration higher than 60 ng/µL were selected for library preparation and sequencing.
The 133 samples were prepared for double digest restriction-site associated DNA sequencing (ddRADseq) using protocols modified from Elshire et al. (2011) and Peterson et al. (2012). Total genomic DNA (100-500 ng) was digested sequentially with the restriction endonucleases Sau3AI (1 U) and ApeKI (2 U), each for four hours in NEB Buffer 4 at the temperature recommended by the manufacturer (NEB). Digested DNA (100 ng) was ligated to adapters (sequences in Elshire et al. (2011)) containing a unique combination of barcodes for each individual (16 unique 5 bp barcodes for ApeKI adapters and five unique 6 bp barcodes for Sau3AI adapters; barcode and adapter sequences in Supplementary Information S1) using T4 DNA ligase (NEB) in the supplied buffer at 21 °C for four hours. Ligation reactions contained a 6:1 molar excess of adapter to fragmented DNA, calculated using the mean fragment size determined from an agarose gel.
Ligated DNA was pooled and purified using magnetic beads (Macherey-Nagel NGS clean-up and size selection) following the manufacturer's protocol. Size selection of ligated DNA fragments was performed on a Pippin Prep (Sage Science) with 2% ethidium-free agarose gels and an external size standard, using the narrow-range setting with a mean fragment size of 350 bp ± 18 bp.
The eluate was split among eight PCR reactions and amplified using the primers and PCR conditions of Elshire et al. (2011). Each 25 µL PCR reaction contained 1x OneTaq Master Mix with Standard Buffer (NEB), 0.5 mM of each primer, and 8 µL of template DNA. PCR products were pooled and purified using magnetic beads before quantification with a SYBR Gold fluorometric assay (protocol in Supplementary Information S2). The library was prepared for sequencing according to the manufacturer's instructions at a final concentration of 38 nM and sequenced on an Illumina HiSeq 2500 using the Illumina TruSeq kit (2x125 bp).
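The 6:1 adapter-to-fragment ratio in the ligation step depends on converting a DNA mass into moles using the mean fragment size. A minimal sketch of that arithmetic follows. It assumes an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation that the source does not state) and applies the excess to the total moles of fragments, since the text does not say how it was split between the two adapter types. The 350 bp fragment size in the example is only illustrative, and the function names are hypothetical.

```python
# Assumed average molar mass of one double-stranded base pair (g/mol).
BP_MASS_G_PER_MOL = 650.0


def pmol_dsdna(mass_ng, mean_fragment_bp):
    """Picomoles of dsDNA fragments given total mass (ng) and mean length (bp)."""
    # moles = grams / (g/mol); ng -> g is 1e-9 and mol -> pmol is 1e12, net 1e3
    return mass_ng * 1e3 / (mean_fragment_bp * BP_MASS_G_PER_MOL)


def adapter_pmol(mass_ng, mean_fragment_bp, molar_excess=6.0):
    """Picomoles of adapter needed for the given molar excess over fragments."""
    return molar_excess * pmol_dsdna(mass_ng, mean_fragment_bp)


# 100 ng of digested DNA with an illustrative mean fragment size of 350 bp.
print(f"{pmol_dsdna(100, 350):.3f} pmol fragments")  # 0.440 pmol fragments
print(f"{adapter_pmol(100, 350):.3f} pmol adapter")  # 2.637 pmol adapter
```

Because the fragment moles scale inversely with fragment length, an underestimated mean fragment size inflates the computed adapter amount, which is why the gel-based estimate matters.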
Sequencing was performed on a single lane and yielded 303 million unambiguous paired-end (PE) reads.
Two white-tailed eagles, a male and a female, were selected for high-depth whole-genome shotgun sequencing on two lanes each of an Illumina HiSeq X. Library preparation and sequencing were performed at deCODE genetics using the TruSeq Nano sample preparation method.
Two reference assemblies of male golden eagles (ZZ), one consisting of 1142 scaffolds and one assembled to chromosome level (GenBank Assembly Accessions GCA_000766835.1 and GCA_900496995.2, respectively), and one assembly of a female chicken (ZW) (GenBank Assembly Accession GCA_000002315.3) were downloaded from NCBI and used in the analysis (Doyle et al., 2014; Hillier et al., 2004).
The white-tailed eagle RADseq data were demultiplexed into individual files, for both forward and reverse reads, using the process_radtags command in Stacks (Catchen et al., 2011; Catchen et al., 2013). Default settings were used, with the -r option to rescue barcodes and RAD-Tags.
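Barcode rescue recovers reads whose inline barcode contains a sequencing error. The sketch below is not the Stacks implementation; it only illustrates the idea under stated assumptions: one barcode sits at the start of each read of a pair (which read carries the ApeKI or the Sau3AI barcode is assumed), a barcode is accepted if it matches exactly or differs at a single position from exactly one known barcode, and the function names, barcode sequences, and sample table are hypothetical. Rescue of RAD-Tag cut sites is not shown.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_mismatch=1):
    """Return the barcode at the start of `read`, rescuing up to `max_mismatch` errors.

    Returns None when no barcode is close enough or when the rescue is ambiguous.
    """
    candidates = []
    for bc in barcodes:
        if len(read) < len(bc):
            continue  # read too short to contain this barcode
        dist = hamming(read[:len(bc)], bc)
        if dist == 0:
            return bc  # exact match
        if dist <= max_mismatch:
            candidates.append(bc)
    return candidates[0] if len(candidates) == 1 else None


def demultiplex_pair(r1, r2, r1_barcodes, r2_barcodes, samples):
    """Look up the sample for a read pair from its combinatorial barcode pair."""
    key = (assign_barcode(r1, r1_barcodes), assign_barcode(r2, r2_barcodes))
    return samples.get(key)  # None if either barcode could not be resolved


# Hypothetical barcodes and sample table.
r1_barcodes = ["ACGTA", "TTGCA"]    # 5 bp barcodes assumed on read 1
r2_barcodes = ["AGTCAA", "CTGATG"]  # 6 bp barcodes assumed on read 2
samples = {("ACGTA", "AGTCAA"): "sample_01", ("TTGCA", "CTGATG"): "sample_02"}
# Read 1 has one mismatch in its barcode (ACGTT), which is rescued to ACGTA.
print(demultiplex_pair("ACGTTGCAGC", "AGTCAATTAG", r1_barcodes, r2_barcodes, samples))  # sample_01
```

Rejecting ambiguous rescues keeps a read from being assigned to the wrong individual when two barcodes are equally close, at the cost of discarding that read.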
After demultiplexing, FastQC (Babraham Bioinformatics, 2010) was run for quality control. The RADseq data contained an excess of specific sequences (k-mers), which were removed using AdapterRemoval v2 (version 2.2.2) (Schubert, Lindgreen & Orlando, 2016). The high-depth shotgun data were checked in the same way, and no excess of k-mers was found.
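To illustrate what an "excess of k-mers" means, the sketch below counts every k-mer across a set of reads and reports those above a fixed share of all k-mers. This is a deliberately simplified check, not FastQC's algorithm and not the AdapterRemoval trimming step; the function name, the value of k, and the threshold are arbitrary assumptions. The toy reads use the start of the common Illumina adapter sequence (AGATCGGAAGAGC) as an example of a contaminating sequence.

```python
from collections import Counter


def overrepresented_kmers(reads, k=7, min_fraction=0.01):
    """Return (k-mer, fraction) pairs whose share of all k-mers exceeds min_fraction.

    Results are sorted from most to least frequent.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(kmer, n / total) for kmer, n in counts.most_common() if n / total > min_fraction]


# Toy reads: an adapter-derived sequence appears twice, a genomic read once.
reads = ["AGATCGGAAGAGC", "AGATCGGAAGAGC", "TTGACCATGCAAT"]
for kmer, frac in overrepresented_kmers(reads, k=7, min_fraction=0.05):
    print(kmer, round(frac, 3))
```

With these toy reads only the seven adapter-derived 7-mers exceed the 5% threshold; in real data the threshold would need to account for read length, library complexity, and genuine repeats.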
The Burrows-Wheeler Aligner (BWA) and SAMtools (Li & Durbin, 2009; Li et al., 2009) were used to map the RADseq and high-depth shotgun reads to the golden eagle assembly of 1142 scaffolds with no identified chromosomes (GCA_000766835.1) (Doyle et al., 2014) and to process the resulting alignments.
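Once reads are mapped to the scaffold assembly, the read-depth method (i) can be sketched as a female-to-male coverage comparison: a female (ZW) carries one Z copy and a male (ZZ) two, so Z-linked scaffolds are expected to show roughly half the relative coverage in the female, whereas autosomal scaffolds should show similar relative coverage in both. The sketch below reads the tab-separated output of `samtools idxstats` (scaffold name, length, mapped reads, unmapped reads) and uses mapped reads per base as a proxy for depth. Normalizing by the median assumes most scaffolds are autosomal. The function names, toy counts, and the 0.3-0.7 cut-offs are hypothetical, not the thresholds used in the study.

```python
import statistics


def parse_idxstats(text):
    """Parse `samtools idxstats` output into {scaffold: mapped reads per bp}."""
    density = {}
    for line in text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*" or int(length) == 0:
            continue  # skip the line for reads without a reference
        density[name] = int(mapped) / int(length)
    return density


def depth_ratio(female, male):
    """Female/male ratio of median-normalized read density per scaffold."""
    f_med = statistics.median(female.values())
    m_med = statistics.median(male.values())
    if f_med == 0 or m_med == 0:
        raise ValueError("median read density is zero")
    ratios = {}
    for name in female.keys() & male.keys():
        male_norm = male[name] / m_med
        if male_norm > 0:  # skip scaffolds with no male reads
            ratios[name] = (female[name] / f_med) / male_norm
    return ratios


def call_z(ratios, low=0.3, high=0.7):
    """Scaffolds whose ratio is near the ~0.5 expected for Z (hypothetical cut-offs)."""
    return {name for name, r in ratios.items() if low <= r <= high}


# Toy idxstats output (hypothetical scaffold names and counts).
male_txt = "scafA\t1000\t1000\t0\nscafB\t2000\t2000\t0\nscafC\t1000\t1000\t0\nscafZ\t1000\t1000\t0\n*\t0\t0\t5\n"
female_txt = "scafA\t1000\t500\t0\nscafB\t2000\t1000\t0\nscafC\t1000\t500\t0\nscafZ\t1000\t250\t0\n*\t0\t0\t7\n"
ratios = depth_ratio(parse_idxstats(female_txt), parse_idxstats(male_txt))
print(call_z(ratios))  # {'scafZ'}
```

Read counts per base are only a proxy; per-base depth (for example from `samtools depth`) would be less sensitive to read length differences, and reads from the W chromosome mapping to Z gametologs can pull female Z coverage above the expected half.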