Pronounced differentiation on the Z chromosome and parts of the autosomes in crowned sparrows contrasts with mitochondrial paraphyly: implications for speciation
Cite this dataset
McCallum, Quinn et al. (2024). Pronounced differentiation on the Z chromosome and parts of the autosomes in crowned sparrows contrasts with mitochondrial paraphyly: implications for speciation [Dataset]. Dryad. https://doi.org/10.5061/dryad.bzkh189cw
Abstract
When a single species evolves into multiple descendent species, some parts of the genome can play a key role in the evolution of reproductive isolation while other parts flow between the evolving species via interbreeding. Genomic evolution during the speciation process is particularly interesting when major components of the genome—for instance, sex chromosomes vs. autosomes vs. mitochondrial DNA—show widely differing patterns of relationships between three diverging populations. The golden-crowned sparrow (Zonotrichia atricapilla) and the white-crowned sparrow (Zonotrichia leucophrys) are phenotypically differentiated sister species that are largely reproductively isolated despite possessing similar mitochondrial genomes, likely due to recent introgression. We assessed variation in more than 45,000 single nucleotide polymorphisms (SNPs) to determine the structure of nuclear genomic differentiation between these species and between two hybridizing subspecies of Z. leucophrys. The two Z. leucophrys subspecies showed moderate levels of relative differentiation and patterns consistent with a history of recurrent selection in both ancestral and daughter populations, with much of the sex chromosome Z and a large region on the autosome 1A showing increased differentiation compared to the rest of the genome. The two species Z. leucophrys and Z. atricapilla show high relative differentiation and strong heterogeneity in the level of differentiation among various chromosomal regions, with a large portion of the sex chromosome (Z) showing highly divergent haplotypes between these species. Studies of speciation often emphasize mitochondrial DNA differentiation, but speciation between Z. atricapilla and Z. leucophrys appears primarily associated with Z chromosome divergence and more moderately associated with autosomal differentiation, whereas mitochondria appear highly similar due apparently to recent introgression. 
These results add to the growing body of evidence for highly heterogeneous patterns of genomic differentiation during speciation, with some genomic regions showing lack of gene flow between populations many hundreds of thousands of years before other genomic regions.
README: Extreme sex chromosome differentiation, likely driven by inversion, contrasts with mitochondrial paraphyly between species of crowned sparrows
This dataset contains all the metadata, barcodes, and photos associated with the samples included in the study "Extreme sex chromosome differentiation, likely driven by inversion, contrasts with mitochondrial paraphyly between species of crowned sparrows" by McCallum, Askelson, Fogarty, Natola, Nikelski, Huang and Irwin, as well as all scripts used in the bioinformatics pipeline (Bash and Perl) and data analyses (R).
Associated sequences can be found on the SRA as part of the BioProject "Genotyping-by-sequencing reads from the bird genera Sphyrapicus, Zonotrichia, and Leiothlypis".
Description of the Data and file structure
barcodes_plate1.txt - Contains sample names and barcodes for samples included in this study on plate 1 ("Plate 1 - Zonotrichia, and Leiothlypis GBS Library" on SRA). This is used by the demultiplex and trim Perl script included in Zonotrichia-all-command-line-scripts-tidy.txt.
barcodes_plate2.txt - Contains sample names and barcodes for samples included in this study on plate 2 ("Plate 2 - Sphyrapicus, Zonotrichia, and Leiothlypis GBS Library" on SRA). This is used by the demultiplex and trim Perl script included in Zonotrichia-all-command-line-scripts-tidy.txt.
photo-archive.zip - Contains high-resolution photographs in .jpg format of all birds sampled in this study.
Zonotrichia-metadata.csv - Contains metadata for all samples included in this study. Samples are organized by sample name.
Zonotrichia-metadata-README.txt - Contains detailed descriptions of the contents of each column of Zonotrichia-metadata.csv.
Zonotrichia-all-command-line-scripts-tidy.txt - Contains all Bash and Perl scripts used to demultiplex, trim, and align our sequences, call and filter SNPs, and calculate Fst and linkage disequilibrium.
ld_heatmap_plotting__sparrows.R - Contains all R scripts used to produce figures 7B and S7B.
genomics_R_functions_V2.R - Contains a variety of R functions from Irwin et al. 2016 that are used by several other R scripts used in this project.
genotype-by-individual-plot.R - Contains R scripts used to produce figures 7A, S6, and S7A.
GCSP_vs_WCSP_fst_nuclear_genome.R - Contains R scripts used to calculate pairwise genome-wide Fst between groups (table 1) and produce preliminary per-SNP Fst Manhattan plots (figure S5) and PCAs (figures 1B and S3).
genotype_by_individual_function.R - Contains the R function used by genotype-by-individual-plot.R.
Sparrow_slidingWindow.R - Contains R scripts that calculate Fst, Pi_between, and Pi_within across sliding windows, perform analyses using these sliding windows, and produce summary plots of these analyses (figures 2, 3, 4, 6, S4). Additionally, contains scripts used to compare the Z chromosome to the autosomes (table 2) and produce figure 5.
Sharing/access Information
Links to other publicly accessible locations of the data:
Sequences associated with this project are uploaded to the SRA under BioProject "Genotyping-by-sequencing reads from the bird genera Sphyrapicus, Zonotrichia, and Leiothlypis" as BioSamples "Plate 1 - Zonotrichia, and Leiothlypis GBS Library" and "Plate 2 - Sphyrapicus, Zonotrichia, and Leiothlypis GBS Library".
URL of BioProject:
Was data derived from another source?
No data was derived from another source.
Some of the scripts used are derived from other sources. Please refer to the comments in the scripts and the manuscript for sources.
Methods
Sample Collection
For this project, we collected blood samples for genomic analysis from individuals of wild Z. leucophrys and Z. atricapilla in south-western British Columbia. The dataset includes a metadata file in .csv format, with locality, date and time of collection, measurements, barcodes, etc. associated with all samples.
The first round of fieldwork was conducted at the Iona Island Bird Observatory (IIBO) in Richmond, British Columbia, in Spring 2019. Birds were captured passively in mist nets and banded as part of the station’s normal migration monitoring operations. We then took 10–40 μL of blood from the brachial vein of each bird of the target species and stored these samples in 500 μL of Queen’s Lysis Buffer (Seutin et al., 1991). We measured wing chord, tail length, tarsus length, mass, culmen length, beak depth, and beak width. A total of 19 Z. atricapilla and 14 Z. leucophrys of indeterminate subspecies were sampled during this initial period.
The second round of field sampling focused on increasing the sample size of Z. leucophrys and was conducted between June 16 and 26, 2020 at the Vancouver Campus of the University of British Columbia (UBC). Singing birds were located, and a mist net was set up nearby. Song recordings of local Z. leucophrys were used to attract birds to the net. Once birds were captured, they were immediately removed, banded, sampled, and measured using the methods outlined above. We sampled an additional 16 breeding Z. leucophrys individuals during this collection period, all of the pugetensis subspecies based on their bill colour and location. Additionally, we took samples of pectoral muscle tissue from seven unprepared specimens from the freezer of the Cowan Tetrapod Collection at the Beaty Biodiversity Museum of UBC that had been salvaged from throughout BC.
Bioinformatics Pipeline
The dataset includes all the scripts used to process the raw GBS reads acquired from the Genome Quebec Innovation Centre.
Reads were demultiplexed using custom scripts from Baute et al. (2016), and sequences were trimmed for quality using Trimmomatic (version 0.36; Bolger et al., 2014). The reads were then aligned to the Taeniopygia guttata reference genome version bTaeGut2.pat.W.v2 (Warren et al., 2010; Rhie et al., 2021; GenBank accession number GCA_008822105.2) using BWA-MEM (version 0.7.15; Li & Durbin, 2009). The resulting SAM files were converted to BAM files using Picard (version 2.23.8; Broad Institute https://broadinstitute.github.io/picard/), and single-end and paired-end BAM files were combined using SAMtools (version 1.3.2; Li et al., 2009). Single nucleotide polymorphisms (SNPs) were identified using the HaplotypeCaller tool in GATK (version 3.8; McKenna et al., 2010), which produced a GVCF file for each individual. Individual GVCF files were then combined into a single VCF file using the GenotypeGVCFs tool in GATK. We filtered genomic sites using VCFtools (version 1.16; Danecek et al., 2011), first by removing indels and SNPs with more than 2 alleles. Using a custom Perl script (Owens et al., 2016), we removed sites with a mapping quality lower than 20 and a heterozygosity above 0.6. We then used VCFtools to remove individuals with more than 60% missing data, which excluded three Z. atricapilla and three Z. leucophrys from the dataset. Finally, we used VCFtools to filter out SNPs missing from more than 20% of individuals and SNPs with a minor allele frequency of less than 0.05.
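The last three filters in the paragraph above (per-individual missingness, per-site missingness, and allele frequency) can be illustrated with a minimal Python sketch. This is not the authors' code, which uses VCFtools on VCF files. It assumes genotypes are already coded as counts of the alternate allele (0, 1, or 2, with None for a missing call), and the function and parameter names are hypothetical. Only the thresholds (60%, 20%, 0.05) come from the text.

```python
from typing import List, Optional, Tuple

# Hypothetical coding: 0, 1, or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_fraction(calls: List[Genotype]) -> float:
    """Fraction of calls that are missing (1.0 for an empty list)."""
    if not calls:
        return 1.0
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency among the non-missing diploid calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_genotypes(
    matrix: List[List[Genotype]],
    max_ind_missing: float = 0.6,   # drop individuals with > 60% missing data
    max_site_missing: float = 0.2,  # drop SNPs missing from > 20% of individuals
    min_maf: float = 0.05,          # drop SNPs with minor allele frequency < 0.05
) -> Tuple[List[int], List[int]]:
    """matrix[i][j] is the genotype of individual i at SNP j.

    Returns the indices of the retained individuals and retained SNPs.
    """
    n_sites = len(matrix[0]) if matrix else 0
    # Step 1: remove individuals with too much missing data.
    kept_inds = [i for i, row in enumerate(matrix)
                 if missing_fraction(row) <= max_ind_missing]
    # Step 2: among the remaining individuals, remove sparse or low-frequency SNPs.
    kept_sites = []
    for j in range(n_sites):
        column = [matrix[i][j] for i in kept_inds]
        if missing_fraction(column) > max_site_missing:
            continue
        if minor_allele_frequency(column) < min_maf:
            continue
        kept_sites.append(j)
    return kept_inds, kept_sites


# Toy data invented for illustration; these are not genotypes from the study.
toy = [
    [0, 1, 2, 0],
    [0, 1, None, 0],
    [None, None, None, 0],  # 75% missing, so this individual is removed
    [0, 2, None, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]
kept_individuals, kept_snps = filter_genotypes(toy)
```

Applying the individual filter before the site filters matters: site-level missingness and allele frequencies are then computed only over the individuals that remain, which matches the order described in the text.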
A comparison of mean coverage for each individual across the sex chromosomes (W and Z) and a representative autosome (chromosome 3) revealed that coverage across the W chromosome was approximately three times higher than across the autosomes (Fig. S2). This was true for known female birds as well as known males. Since female birds are heterogametic (ZW) and males are homogametic (ZZ) (Irwin, 2018), we expect coverage across the W to be roughly half that of autosomal coverage in females, and zero in males. These results indicate that many of the sequences from our short-read dataset that mapped to the Zebra Finch W chromosome cannot be from the Zonotrichia W chromosome. These likely represent sequences found on autosomes in Zonotrichia but on the W chromosome in the Zebra Finch, or repetitive elements found at high frequency in Zonotrichia but infrequently in the Zebra Finch. After filtering and removal of the W sequences, a total of 45,986 SNPs were retained for analysis.
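The coverage reasoning above can be written out as a small Python sketch: compute the ratio of W to autosomal mean depth for an individual and compare it with the expectation for its sex (about 0.5 in females, 0 in males). This is an illustration only, not part of the deposited scripts. The function names, the tolerance value, and the example depths are all hypothetical.

```python
# Expected W:autosome coverage ratio by sex, as stated in the text:
# females carry one W (about half the autosomal depth), males carry none.
EXPECTED_W_RATIO = {"female": 0.5, "male": 0.0}


def coverage_ratio(w_mean_depth: float, autosome_mean_depth: float) -> float:
    """Ratio of mean depth on the W to mean depth on a reference autosome."""
    if autosome_mean_depth <= 0:
        raise ValueError("autosomal mean depth must be positive")
    return w_mean_depth / autosome_mean_depth


def exceeds_expectation(sex: str, w_mean_depth: float,
                        autosome_mean_depth: float,
                        tolerance: float = 0.25) -> bool:
    """True if W coverage is higher than expected for this sex.

    `tolerance` is an arbitrary illustrative margin, not a value from the study.
    """
    ratio = coverage_ratio(w_mean_depth, autosome_mean_depth)
    return ratio > EXPECTED_W_RATIO[sex] + tolerance


# Invented depths: a W depth three times the autosomal depth is anomalous
# for either sex, which is the pattern the text reports.
male_flagged = exceeds_expectation("male", 30.0, 10.0)
female_flagged = exceeds_expectation("female", 30.0, 10.0)
female_typical = exceeds_expectation("female", 5.0, 10.0)
```

A ratio near 3 in both sexes cannot be explained by true W-linked sequence, which is why the W-mapped reads were removed before analysis.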
Analysis
Finally, this dataset contains all scripts used to analyze the SNP dataset. For detailed methods of these analyses, please refer to the associated article.
Usage notes
The metadata file is in .csv format, which can be opened in any text editor.
The files containing scripts used to process samples are in .txt format and can be opened in any text editor. These scripts were written in Bash for use in a UNIX environment. They require additional scripts from Baute et al. (2016) and the programs Trimmomatic (version 0.36; Bolger et al., 2014), BWA-MEM (version 0.7.15; Li & Durbin, 2009), Picard (version 2.23.8; Broad Institute https://broadinstitute.github.io/picard/), SAMtools (version 1.3.2; Li et al., 2009), GATK (version 3.8; McKenna et al., 2010), VCFtools (version 1.16; Danecek et al., 2011), and Plink (version 1.9; Chang et al., 2015).
The files containing scripts used for analyses were written in and require R (version 3.6.3; R Core Team), and the R package pcaMethods (version 1.78.0; Stacklies et al., 2007).
Funding
Natural Sciences and Engineering Research Council