Data from: A sex chromosome polymorphism maintains divergent plumage phenotypes between extensively hybridizing yellowhammers (Emberiza citrinella) and pine buntings (E. leucocephalos)
Data files
Sep 05, 2024 version files 7.50 GB
- Background_Pheno.txt
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60_LDtrim.2.Q
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60.vcf.recode.vcf
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60.vcf.recode2.gds
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.background.recode.gemma.results.ULMM.assoc.txt
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.eyebrow.recode.gemma.results.ULMM.assoc.txt
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.throat.recode.gemma.results.ULMM.assoc.txt
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.1.Group.4.Z.chromo.LD.ld
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.2.Group.6.Z.chromo.LD.ld
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.3.Group.4.5.6.Z.chromo.LD.ld
- Emberiza_PC_Longitude_Scores.csv
- Emeriza_Combined_Map_Metadata.csv
- Eyebrow_Pheno.txt
- IBD.RData
- README.md
- Throat_Pheno.txt
Abstract
Under allopatric speciation, populations of a species become isolated by a geographic barrier and develop reproductive isolation through genetic differentiation. When populations meet in secondary contact, the strength of evolved reproductive barriers determines the extent of hybridization and whether the populations will continue to diverge or merge together. The yellowhammer (Emberiza citrinella) and pine bunting (E. leucocephalos) are avian sister species that diverged in allopatry on either side of Eurasia during the Pleistocene glaciations. Though they differ greatly in plumage and form distinct genetic clusters in allopatry, these taxa show negligible mitochondrial DNA differentiation and hybridize extensively where they overlap in central Siberia, lending uncertainty to the state of reproductive isolation in the system. To assess the strength of reproductive barriers between taxa, we examined genomic differentiation across the system. We found that extensive admixture has occurred in sympatry, indicating that reproductive barriers between taxa are weak. We also identified a putative Z chromosome inversion region that underlies plumage variation in the system, with the “pine bunting” haplotype showing dominance over the “yellowhammer” haplotype. Our results suggest that yellowhammers and pine buntings are currently at a crossroads and that evolutionary forces may push this system towards either continued differentiation or population merging. However, even if these taxa merge, recombination suppression between putative chromosome Z inversion haplotypes may maintain divergent plumage phenotypes within the system. In this way, our findings highlight the important role hybridization plays in increasing the genetic and phenotypic variation as well as the evolvability of a system.
README: A sex chromosome polymorphism maintains divergent plumage phenotypes between extensively hybridizing yellowhammers (Emberiza citrinella) and pine buntings (E. leucocephalos)
https://doi.org/10.5061/dryad.prr4xgxw7
Description of the data and file structure
This repository is associated with Nikelski, E., Rubtsov, A. S., & Irwin, D. (2024). A sex chromosome polymorphism maintains divergent plumage phenotypes between extensively hybridizing yellowhammers (Emberiza citrinella) and pine buntings (E. leucocephalos). Molecular Ecology. In Press.
In this study, DNA was obtained from Emberizidae individuals (Emberiza citrinella, Emberiza leucocephalos, and hybrids) and sequenced using a genotyping-by-sequencing approach. The raw GBS reads were processed to produce variant site datasets using scripts from a previous publication (Nikelski et al. 2023) and Dryad repository (https://doi.org/10.5061/dryad.tmpg4f538). Once the required variant site files were produced, a series of genomic investigations were performed that included Fst, PCA, kinship, admixture, linkage disequilibrium and admixture mapping analyses. Based on these analyses, we presented evidence of a sex chromosome polymorphism that is segregating across the ranges of Emberiza citrinella and Emberiza leucocephalos and that maintains diverse plumage phenotypes within this system.
This repository contains the scripts and raw data necessary to perform all Fst, PCA, admixture, linkage disequilibrium and admixture mapping analyses described in this study and to produce all associated figures. The software used in these scripts is: R, RStudio, Julia, GATK, VCFtools, PLINK, ADMIXTURE, BCFtools, BIMBAM and GEMMA. All software is open source and free to access.
The scripts and datasets necessary to process raw GBS reads into the variant site datasets utilized in this study can be found in a previous Dryad repository (https://doi.org/10.5061/dryad.tmpg4f538).
Files and variables
File: Background_Pheno.txt
Description: This file contains the phenotypic scores of all yellowhammer, pine bunting and hybrid individuals for the "Background" plumage trait, which describes the head and body plumage colour in regions without brown or black streaking. "Background" colour ranges from bright yellow to pure white. Scores of "0" are associated with a pure yellowhammer phenotype, scores of "7" are associated with a pure pine bunting phenotype, scores of "1-6" are associated with a hybrid phenotype and "NA" indicates a lack of information for the individual at this trait. This file is one of the metadata files necessary to perform the admixture mapping analysis associated with the "Background" plumage trait, and the code necessary to perform this analysis can be found in the Admixture-Mapping-GEMMA-Analyses-Code.txt file. The entries in this file are ordered to match other datasets that are included in this analysis. This dataset does not have a header and this file can be opened with any text editing software. In the present study, this file was included in analyses that utilized the GEMMA program.
Variables
- A single variable that includes a "Background" phenotypic score for all individuals
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60_LDtrim.2.Q
Description: This file contains the results from an admixture analysis performed using the program ADMIXTURE while imposing a "k" value of 2. This analysis was performed to determine whether there had been admixture between yellowhammer and pine bunting individuals and to investigate the proportion of yellowhammer and pine bunting genetic ancestry seen within hybrid individuals with different plumage phenotypes. The code necessary to produce this dataset can be found in the Admixture-Analyses-Code.txt file. The data in this file was used to produce panels in Figures 2 and 3 of the associated publication. This dataset does not have a header and this file can be opened with any text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- Column 1: The proportion of pine bunting genetic ancestry associated with an individual
- Column 2: The proportion of yellowhammer genetic ancestry associated with an individual
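As a rough illustration of the layout described above, the following Python sketch parses a two-column, headerless .Q table and checks that each individual's two ancestry proportions sum to one. The study itself loaded this file into R; Python is used here only for illustration. The function name `read_q`, the whitespace delimiter and the three example rows are assumptions, and the values are invented rather than taken from the dataset.

```python
import io

def read_q(handle):
    """Parse an ADMIXTURE-style .Q table: one row per individual, no header."""
    rows = []
    for line in handle:
        fields = line.split()  # assumes whitespace-delimited columns
        if not fields:
            continue
        rows.append(tuple(float(x) for x in fields))
    return rows

# Invented example rows (NOT real data). Column 1 = pine bunting ancestry,
# column 2 = yellowhammer ancestry, following the variable list above.
example = io.StringIO(
    "0.999990 0.000010\n"
    "0.450000 0.550000\n"
    "0.000010 0.999990\n"
)
ancestry = read_q(example)

for pine_bunting, yellowhammer in ancestry:
    # With k = 2 the two proportions should sum to 1, within rounding.
    assert abs(pine_bunting + yellowhammer - 1.0) < 1e-4
    print(f"pine bunting: {pine_bunting:.3f}  yellowhammer: {yellowhammer:.3f}")
```

To read the real file, replace the `io.StringIO` object with `open(path)`, where `path` points to the downloaded .Q file. The rows carry no sample IDs, so they need to be matched to individuals by order.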
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.eyebrow.recode.gemma.results.ULMM.assoc.txt
Description: This file contains the results from an admixture mapping analysis performed using the program GEMMA on the "Brow" phenotypic trait. This analysis was performed to determine what areas of the genome control phenotypic variation at the "Brow" trait among yellowhammers, pine buntings and hybrids. The code necessary to produce this dataset can be found in the Admixture-Mapping-GEMMA-Analyses-Code.txt file. The data in this file was used to produce panels in Figure 6 and Supplementary Figure 9 of the associated publication. In the associated publication, we utilized p-values from the likelihood-ratio test. This file can be opened with any text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- chr: Chromosome number
- rs: SNP identification
- ps: Base pair position
- n_miss: Number of missing values for a given SNP
- allele1: Minor allele
- allele0: Major allele
- af: Allele frequency
- beta: Estimate of beta
- se: Standard error for beta estimate
- logl_H1: Log likelihood of the alternative hypothesis as a measure of goodness of fit
- l_remle: REML estimate for lambda
- l_mle: MLE estimate for lambda
- p_wald: p-value for the Wald test testing for a significant association between phenotype and genotype
- p_lrt: p-value for the likelihood ratio test testing for a significant association between phenotype and genotype
- p_score: p-value for the score test testing for a significant association between phenotype and genotype
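The column layout above can be read with a few lines of Python. This is a minimal sketch, not the authors' workflow (the study loaded these tables into R): it ranks SNPs by `p_lrt`, the likelihood-ratio p-value used in the associated publication, and reports -log10(p). The function names `read_assoc` and `top_hits`, the assumption that the file is whitespace- or tab-delimited with a header row, and all example values (including the SNP and chromosome labels) are hypothetical.

```python
import io
import math

def read_assoc(handle):
    """Read a GEMMA-style association table into a list of dicts keyed by column."""
    header = handle.readline().split()
    records = []
    for line in handle:
        fields = line.split()  # assumes tab- or space-delimited columns
        if fields:
            records.append(dict(zip(header, fields)))
    return records

def top_hits(records, n=2, column="p_lrt"):
    """Return the n records with the smallest p-value in the chosen column."""
    return sorted(records, key=lambda r: float(r[column]))[:n]

# Invented rows for illustration only (NOT real results).
example = io.StringIO(
    "chr\trs\tps\tn_miss\tallele1\tallele0\taf\tbeta\tse\tlogl_H1\tl_remle\tl_mle\tp_wald\tp_lrt\tp_score\n"
    "1\tsnp_a\t1000\t2\tA\tG\t0.310\t0.05\t0.20\t-310.5\t1.2\t1.1\t0.80\t0.81\t0.82\n"
    "Z\tsnp_b\t2500\t0\tT\tC\t0.420\t0.85\t0.15\t-300.1\t1.2\t1.1\t1.0e-06\t2.0e-06\t3.0e-06\n"
    "Z\tsnp_c\t4000\t1\tC\tT\t0.120\t0.40\t0.14\t-305.7\t1.2\t1.1\t0.002\t0.003\t0.004\n"
)
records = read_assoc(example)

for hit in top_hits(records):
    score = -math.log10(float(hit["p_lrt"]))
    print(f"{hit['chr']}:{hit['ps']} {hit['rs']}  -log10(p_lrt) = {score:.2f}")
```

The same functions should apply to the "Background" and "Throat" result files, which share this column layout.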
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.background.recode.gemma.results.ULMM.assoc.txt
Description: This file contains the results from an admixture mapping analysis performed using the program GEMMA on the "Background" phenotypic trait. This analysis was performed to determine what areas of the genome control phenotypic variation at the "Background" trait among yellowhammers, pine buntings and hybrids. The code necessary to produce this dataset can be found in the Admixture-Mapping-GEMMA-Analyses-Code.txt file. The data in this file was used to produce panels in Figure 6 and Supplementary Figure 9 of the associated publication. In the associated publication, we utilized p-values from the likelihood-ratio test. This file can be opened with any text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- chr: Chromosome number
- rs: SNP identification
- ps: Base pair position
- n_miss: Number of missing values for a given SNP
- allele1: Minor allele
- allele0: Major allele
- af: Allele frequency
- beta: Estimate of beta
- se: Standard error for beta estimate
- logl_H1: Log likelihood of the alternative hypothesis as a measure of goodness of fit
- l_remle: REML estimate for lambda
- l_mle: MLE estimate for lambda
- p_wald: p-value for the Wald test testing for a significant association between phenotype and genotype
- p_lrt: p-value for the likelihood ratio test testing for a significant association between phenotype and genotype
- p_score: p-value for the score test testing for a significant association between phenotype and genotype
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.throat.recode.gemma.results.ULMM.assoc.txt
Description: This file contains the results from an admixture mapping analysis performed using the program GEMMA on the "Throat" phenotypic trait. This analysis was performed to determine what areas of the genome control phenotypic variation at the "Throat" trait among yellowhammers, pine buntings and hybrids. The code necessary to produce this dataset can be found in the Admixture-Mapping-GEMMA-Analyses-Code.txt file. The data in this file was used to produce panels in Figure 6 and Supplementary Figure 9 of the associated publication. In the associated publication, we utilized p-values from the likelihood-ratio test. This file can be opened with any text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- chr: Chromosome number
- rs: SNP identification
- ps: Base pair position
- n_miss: Number of missing values for a given SNP
- allele1: Minor allele
- allele0: Major allele
- af: Allele frequency
- beta: Estimate of beta
- se: Standard error for beta estimate
- logl_H1: Log likelihood of the alternative hypothesis as a measure of goodness of fit
- l_remle: REML estimate for lambda
- l_mle: MLE estimate for lambda
- p_wald: p-value for the Wald test testing for a significant association between phenotype and genotype
- p_lrt: p-value for the likelihood ratio test testing for a significant association between phenotype and genotype
- p_score: p-value for the score test testing for a significant association between phenotype and genotype
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.1.Group.4.Z.chromo.LD.ld
Description: This file contains the results from a chromosome Z linkage disequilibrium analysis using the program Plink when examining only individuals that are homozygous for the "pine bunting" chromosome Z haplotype. This analysis was performed to determine whether a highly differentiated region of chromosome Z when comparing yellowhammers and pine buntings showed high linkage suggestive of low recombination. The code necessary to produce this dataset can be found in the Linkage-Disequilibrium-Analyses-Code.txt file. The data in this file was used to produce panels in Figure 5 of the associated publication. This file can be opened with any text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- CHR_A: The chromosome of the first site included in a pairwise comparison
- BP_A: The base pair position of the first site included in a pairwise comparison
- SNP_A: The SNP ID of the first site included in a pairwise comparison
- CHR_B: The chromosome of the second site included in a pairwise comparison
- BP_B: The base pair position of the second site included in a pairwise comparison
- SNP_B: The SNP ID of the second site included in a pairwise comparison
- R2: A measure of linkage disequilibrium between the two sites as the squared inter-variant allele correlation
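One way to summarize a pairwise table with these columns is to average R2 within bins of physical distance between the two sites. The Python sketch below does this under stated assumptions: the file has a header row and whitespace-aligned columns, the helper names `read_ld` and `mean_r2_by_distance` and the 10 kb bin size are hypothetical choices, and the example rows are invented. It is not the analysis used to produce Figure 5.

```python
import io
from collections import defaultdict

def read_ld(handle):
    """Read a PLINK-style pairwise LD table into a list of dicts keyed by column."""
    header = handle.readline().split()
    return [dict(zip(header, line.split())) for line in handle if line.strip()]

def mean_r2_by_distance(records, bin_size=10_000):
    """Average R2 within bins of base-pair distance between the two sites."""
    bins = defaultdict(list)
    for rec in records:
        distance = abs(int(rec["BP_B"]) - int(rec["BP_A"]))
        bins[distance // bin_size].append(float(rec["R2"]))
    # Key each bin by its lower bound in base pairs.
    return {b * bin_size: sum(v) / len(v) for b, v in sorted(bins.items())}

# Invented rows for illustration only (NOT real results).
example = io.StringIO(
    " CHR_A   BP_A  SNP_A  CHR_B   BP_B  SNP_B     R2\n"
    "     Z   1000   snp1      Z   1500   snp2   0.90\n"
    "     Z   1000   snp1      Z  12000   snp3   0.50\n"
    "     Z   1500   snp2      Z  12000   snp3   0.30\n"
)
pairs = read_ld(example)

for lower_bound, mean_r2 in mean_r2_by_distance(pairs).items():
    print(f"distance >= {lower_bound} bp: mean R2 = {mean_r2:.2f}")
```

Where LD stays high across large distances in one haplotype group, that pattern is consistent with the low recombination the analysis was designed to detect.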
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60.vcf.recode2.gds
Description: This file contains called genotypes and genomic data for yellowhammer, pine bunting and hybrid individuals in a .gds (genomic data structure) format. The file was produced by converting a .vcf file produced during admixture analyses that used scripts in the Admixture-Analyses-Code.txt file. This file was necessary to complete kinship analyses to determine whether individuals within our dataset were related to each other and to produce Supplementary Figure 5 in the associated manuscript. This is a binary file that cannot be opened using text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio and commands in the SNPRelate package.
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.2.Group.6.Z.chromo.LD.ld
Description: This file contains the results from a chromosome Z linkage disequilibrium analysis using the program Plink when examining only individuals that are homozygous for the "yellowhammer" chromosome Z haplotype. This analysis was performed to determine whether a highly differentiated region of chromosome Z when comparing yellowhammers and pine buntings showed high linkage suggestive of low recombination. The code necessary to produce this dataset can be found in the Linkage-Disequilibrium-Analyses-Code.txt file. The data in this file was used to produce panels in Figure 5 of the associated publication. This file can be opened with any text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- CHR_A: The chromosome of the first site included in a pairwise comparison
- BP_A: The base pair position of the first site included in a pairwise comparison
- SNP_A: The SNP ID of the first site included in a pairwise comparison
- CHR_B: The chromosome of the second site included in a pairwise comparison
- BP_B: The base pair position of the second site included in a pairwise comparison
- SNP_B: The SNP ID of the second site included in a pairwise comparison
- R2: A measure of linkage disequilibrium between the two sites as the squared inter-variant allele correlation
File: Emberiza_PC_Longitude_Scores.csv
Description: This file contains metadata for yellowhammer, pine bunting and hybrid individuals included in this study in association with PC1 scores from a PCA conducted on the genomic data from all individuals to examine population structure within the system. The code necessary to perform the PCA and produce the PC1 scores can be found in the Molecular-Ecology-Data-Genomic-Analyses.R file. The data in this file was used to produce panels in Figure 2, Figure 3, Supplementary Figure 4 and Supplementary Figure 7 in the associated publication. This file can be opened with any text editing or spreadsheet software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- Sample ID: The unique sample ID assigned to each individual
- Species: The species assigned to each individual. In this column: "E.leucocephalos" is assigned to individuals with the phenotypic classes PL or SL, "E.citrinella" is assigned to individuals with the phenotypic classes PC or SC and "Hybrid" is assigned to individuals with the phenotypic classes CH, YH, WH or LH
- Sex: The sex of each individual. "m" indicates male, "f" indicates female and "uk" indicates unknown
- Throat Score: The phenotypic score given to each individual for the "throat" plumage trait (0-7 or NA)
- Eyebrow Score: The phenotypic score given to each individual for the "brow" plumage trait (0-7 or NA)
- Background Score: The phenotypic score given to each individual for the "background" plumage trait (0-7 or NA)
- Phenotypic Class: The phenotypic class assigned to each individual. In this column: "PC" = Pure citrinella, "SC" = Almost citrinella, "CH" = Citrinella hybrid, "YH" = Yellow hybrid, "WH" = White hybrid, "LH" = Leucocephalos hybrid, "SL" = Almost leucocephalos, "PL" = Pure leucocephalos, "FML" = Female and "UK" = Unknown
- Geographic Distribution: The geographic distribution assigned to each individual depending on where the sample was collected. In this column: "Allopatric" = Collected in the allopatric range, "Inter" = Collected in the near sympatric range and "sympatric" = Collected in the sympatric range
- Fst group: The Fst group assigned to each individual which is necessary to run certain analyses. The "Fst group" column was created by combining information from the "Species" and "Geographic Distribution" columns
- Region: The region from which each individual was collected
- Exact Location: A detailed description of the location where each individual was collected
- Source: The source of the sample which includes whether it was provided by a museum or collected by researchers associated with this study
- Lat (N): The latitude coordinate where each individual was collected
- Long (E): The longitude coordinate where each individual was collected
- PC1: The PC1 score associated with each individual from a PCA conducted on genomic information obtained from the individuals in this dataset
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.3.Group.4.5.6.Z.chromo.LD.ld
Description: This file contains the results from a chromosome Z linkage disequilibrium analysis using the program Plink when examining individuals that are either homozygous or heterozygous for chromosome Z haplotypes. This analysis was performed to determine whether a highly differentiated region of chromosome Z when comparing yellowhammers and pine buntings showed high linkage suggestive of low recombination. The code necessary to produce this dataset can be found in the Linkage-Disequilibrium-Analyses-Code.txt file. The data in this file was used to produce panels in Figure 5 of the associated publication. This file can be opened with any text editing software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- CHR_A: The chromosome of the first site included in a pairwise comparison
- BP_A: The base pair position of the first site included in a pairwise comparison
- SNP_A: The SNP ID of the first site included in a pairwise comparison
- CHR_B: The chromosome of the second site included in a pairwise comparison
- BP_B: The base pair position of the second site included in a pairwise comparison
- SNP_B: The SNP ID of the second site included in a pairwise comparison
- R2: A measure of linkage disequilibrium between the two sites as the squared inter-variant allele correlation
File: Emeriza_Combined_Map_Metadata.csv
Description: This file contains metadata for the sampling locations included in this study as well as a breakdown of the numbers and identities of yellowhammer, pine bunting and hybrid individuals collected at each location. The data in this file was used to produce the map included in Figure 1 of the associated publication and the code necessary to produce this map can be found in the Molecular-Ecology-Data-Genomic-Analyses.R file. Sampling locations may include multiple sites that appeared too close together to be shown in detail on the map. Full details for the sites included in each sampling location can be found in Supplementary Table 1 of the associated publication. This file can be opened with any text editing or spreadsheet software. In the present study, this file was loaded into R using its accompanying IDE RStudio.
Variables
- Exact Location: A detailed description of the location of each sampling site
- Lat: The latitude coordinate of each sampling site
- Long: The longitude coordinate of each sampling site
- Sample_size: The number of avian individuals collected at each sampling site
- E_cit_allo: The number of allopatric Emberiza citrinella individuals collected at each sampling site
- E_cit_inter: The number of near sympatric Emberiza citrinella individuals collected at each sampling site
- E_cit_sym: The number of sympatric Emberiza citrinella individuals collected at each sampling site
- Hyb: The number of hybrid individuals collected at each sampling site
- E_leuc_sym: The number of sympatric Emberiza leucocephalos individuals collected at each sampling site
- E_leuc_inter: The number of near sympatric Emberiza leucocephalos individuals collected at each sampling site
- E_leuc_allo: The number of allopatric Emberiza leucocephalos individuals collected at each sampling site
- E_cit_allo_%: The proportion of the collected samples at each site that were allopatric Emberiza citrinella individuals
- E_cit_inter_%: The proportion of the collected samples at each site that were near sympatric Emberiza citrinella individuals
- E_cit_sym_%: The proportion of the collected samples at each site that were sympatric Emberiza citrinella individuals
- Hyb_%: The proportion of the collected samples at each site that were hybrid individuals
- E_leuc_sym_%: The proportion of the collected samples at each site that were sympatric Emberiza leucocephalos individuals
- E_leuc_inter_%: The proportion of the collected samples at each site that were near sympatric Emberiza leucocephalos individuals
- E_leuc_allo_%: The proportion of the collected samples at each site that were allopatric Emberiza leucocephalos individuals
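The relationship between the count columns and the "_%" columns can be sketched as each group count divided by `Sample_size`. The Python example below recomputes those proportions for one invented row. The helper name `group_proportions`, the site name and all numbers are hypothetical, and whether the published "_%" columns store fractions (0 to 1) or percentages (0 to 100) is not stated here, so this sketch returns fractions as an assumption.

```python
import csv
import io

# Count columns listed above, one per taxon-by-range group.
GROUPS = [
    "E_cit_allo", "E_cit_inter", "E_cit_sym", "Hyb",
    "E_leuc_sym", "E_leuc_inter", "E_leuc_allo",
]

def group_proportions(row):
    """Recompute each group's share of the samples collected at one site."""
    total = int(row["Sample_size"])
    return {group: int(row[group]) / total for group in GROUPS}

# Invented site for illustration only (NOT real data).
example = io.StringIO(
    "Exact Location,Lat,Long,Sample_size,"
    "E_cit_allo,E_cit_inter,E_cit_sym,Hyb,E_leuc_sym,E_leuc_inter,E_leuc_allo\n"
    "Site A,55.0,83.0,10,0,0,4,5,1,0,0\n"
)
sites = list(csv.DictReader(example))
proportions = group_proportions(sites[0])

for group, share in proportions.items():
    print(f"{group}: {share:.2f}")
```

Comparing these recomputed values with the stored "_%" columns is a quick consistency check after loading the real file.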
File: Eyebrow_Pheno.txt
Description: This file contains the phenotypic scores of all yellowhammer, pine bunting and hybrid individuals for the "Brow" plumage trait, which describes the amount of chestnut plumage at the brow of an individual versus white or yellow plumage. Scores of "0" are associated with a pure yellowhammer phenotype, scores of "7" are associated with a pure pine bunting phenotype, scores of "1-6" are associated with a hybrid phenotype and "NA" indicates a lack of information for the individual at this trait. This file is one of the metadata files necessary to perform the admixture mapping analysis associated with the "Brow" plumage trait, and the code necessary to perform this analysis can be found in the Admixture-Mapping-GEMMA-Analyses-Code.txt file. The entries in this file are ordered to match other datasets that are included in this analysis. This dataset does not have a header and this file can be opened with any text editing software. In the present study, this file was included in analyses that utilized the GEMMA program.
Variables
- A single variable that includes a "Brow" phenotypic score for all individuals.
File: Throat_Pheno.txt
Description: This file contains the phenotypic scores of all yellowhammer, pine bunting and hybrid individuals for the "Throat" plumage trait, which describes the amount of chestnut plumage at the throat of an individual versus white or yellow plumage. Scores of "0" are associated with a pure yellowhammer phenotype, scores of "7" are associated with a pure pine bunting phenotype, scores of "1-6" are associated with a hybrid phenotype and "NA" indicates a lack of information for the individual at this trait. This file is one of the metadata files necessary to perform the admixture mapping analysis associated with the "Throat" plumage trait, and the code necessary to perform this analysis can be found in the Admixture-Mapping-GEMMA-Analyses-Code.txt file. The entries in this file are ordered to match other datasets that are included in this analysis. This dataset does not have a header and this file can be opened with any text editing software. In the present study, this file was included in analyses that utilized the GEMMA program.
Variables
- A single variable that includes a "Throat" phenotypic score for all individuals.
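The three phenotype files (Background, Brow and Throat) share the same single-column, headerless layout, so one reader can serve all of them. The Python sketch below parses such a column, turns "NA" into a missing value and applies the score interpretation given in the descriptions (0 = yellowhammer, 7 = pine bunting, 1-6 = hybrid). The function names `read_pheno` and `describe_score` and the example scores are hypothetical, and the assumption that scores are whole numbers comes only from the "0-7" range stated above.

```python
import io

def read_pheno(handle):
    """Read a headerless single-column score file; 'NA' becomes None."""
    scores = []
    for line in handle:
        value = line.strip()
        if not value:
            continue
        scores.append(None if value == "NA" else int(value))
    return scores

def describe_score(score):
    """Translate a 0-7 plumage score into the phenotype it is associated with."""
    if score is None:
        return "missing"
    if score == 0:
        return "yellowhammer"
    if score == 7:
        return "pine bunting"
    if 1 <= score <= 6:
        return "hybrid"
    raise ValueError(f"score outside the documented 0-7 range: {score}")

# Invented scores for illustration only (NOT real data).
example = io.StringIO("0\n7\n3\nNA\n")
scores = read_pheno(example)
labels = [describe_score(s) for s in scores]
print(list(zip(scores, labels)))
```

Because the entries are ordered to match the other datasets in the analysis, the list should be kept in file order and not sorted or filtered before it is paired with genotype data.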
File: IBD.RData
Description: This file contains pairwise kinship scores between all yellowhammer, pine bunting and hybrid individuals that were produced as part of a kinship analysis to determine if any of the individuals included in this dataset were related to each other. This information was saved as a .RData file so that it could be loaded into R quickly. The code necessary to produce this dataset can be found in the Molecular-Ecology-Data-Genomic-Analyses.R file. The data in this file was used to produce Supplementary Figure 5 in the associated publication. This file can be opened using R and its accompanying IDE RStudio.
File: Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60.vcf.recode.vcf
Description: This file contains called genotypes and genomic data for yellowhammer, pine bunting and hybrid individuals in a .vcf file format and was produced during admixture analyses that used scripts in the Admixture-Analyses-Code.txt file. This file was necessary to complete kinship analyses to determine whether individuals within the dataset were related to each other and to produce Supplementary Figure 5 in the associated manuscript. This file can be opened and viewed in the terminal, but is too large to view using a text editing program. In the present study, this file was loaded into R using its accompanying IDE RStudio and commands in the SNPRelate package.
Code/software
File: Admixture-Analyses-Code.txt
Description: This file contains the scripts necessary to conduct the admixture analyses described in the associated paper and can be opened with any text editing software. Author comments within this file are used to explain the purpose of important lines of code. These analyses were written in the C programming language. The input files for these analyses are produced using scripts from a previous Dryad repository (https://doi.org/10.5061/dryad.tmpg4f538) to process raw GBS reads and call genotypes. All code was run on a command-line interface. The output file from these analyses is:
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60_LDtrim.2.Q
File: Admixture-Mapping-GEMMA-Analyses-Code.txt
Description: This file contains the scripts necessary to conduct the admixture mapping analyses described in the associated paper for the "Background", "Brow" and "Throat" phenotypic traits and can be opened with any text editing software. Author comments within this file are used to explain the purpose of important lines of code. These analyses were written in the C programming language. The input files for these analyses are produced using scripts from a previous Dryad repository (https://doi.org/10.5061/dryad.tmpg4f538) to process raw GBS reads and call genotypes. In addition to these files, the following three files are also required to complete these analyses: Background_Pheno.txt, Eyebrow_Pheno.txt and Throat_Pheno.txt. All code was run on a command-line interface. The output files from these analyses are:
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.background.recode.gemma.results.ULMM.assoc.txt
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.eyebrow.recode.gemma.results.ULMM.assoc.txt
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.GEMMA.All.Chr.GEMMA.recode.throat.recode.gemma.results.ULMM.assoc.txt
File: Chromsome-Z-Analyses-Julia-Code.zip
Description: This zipped file contains EmberizaGenomics2024_Julia.qmd, EmberizaGenomics2024_Julia.html and a folder called "EmberizaGenomics2024_Julia_files". The EmberizaGenomics2024_Julia.qmd file contains all the scripts necessary to perform chromosome Z analyses in a Quarto markdown file. The EmberizaGenomics2024_Julia.html file contains all the same scripts in a more user-friendly format that is readable in a web browser. In order to view the .html file, the folder called "EmberizaGenomics2024_Julia_files" must be saved to the same directory as the .html file. This folder contains various housekeeping files necessary for the .html file to load properly. Author comments within these files are used to explain the purpose of important lines of code. These analyses were written in the Julia programming language and run within the Julia IDE. Files can be opened using any text editing software and with Julia and its accompanying IDE.
File: Linkage-Disequilibrium-Analyses-Code.txt
Description: This file contains the scripts necessary to conduct the linkage disequilibrium analyses described in the associated paper and can be opened with any text editing software. Author comments within this file are used to explain the purpose of important lines of code. These analyses were written in the C programming language. The input files for these analyses are produced using scripts from a previous Dryad repository (https://doi.org/10.5061/dryad.tmpg4f538) to process raw GBS reads and call genotypes. All code was run on a command-line interface. The output files from these analyses are:
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.1.Group.4.Z.chromo.LD.ld
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.2.Group.6.Z.chromo.LD.ld
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.max2allele_noindel.GATKfiltered_excluded.maxmiss60.Link.3.Group.4.5.6.Z.chromo.LD.ld
File: Molecular-Ecology-Data-Genomic-Analyses.R
Description: This file contains the scripts necessary to conduct the Fst, PCA and kinship analyses described in the associated paper as well as produce all the main and supplementary figures. It can be opened with any text editing software or with R and its accompanying IDE RStudio. Author comments within this file are used to explain the purpose of important lines of code. These analyses were written in the R programming language and all files were run in R with its accompanying IDE, RStudio. The input files for these analyses are produced using scripts from a previous Dryad repository (https://doi.org/10.5061/dryad.tmpg4f538) to process raw GBS reads and call genotypes. In addition to these files, the following files are also required to complete these analyses:
- Emberiza_PC_Longitude_Scores.csv
- Emeriza_Combined_Map_Metadata.csv
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60.vcf.recode.vcf
- Emberiza_All_Plates.genotypes.variant_only.whole_genome.ADMIX.max2allele_noindel.GATKfiltered_excluded.maxmiss60.vcf.recode2.gds
- IBD.RData
Methods
The methods employed in this study can be found in the linked 2024 Molecular Ecology publication, with additional information specified in the provided scripts and text files. Further detail is available in a previous publication (Nikelski et al. 2023, Heredity; https://doi.org/10.1038/s41437-022-00580-8) and related Dryad repository (https://doi.org/10.5061/dryad.tmpg4f538) that utilized a portion of the dataset included in this research. To summarize, DNA was extracted from avian blood and tissue samples and sequenced using a genotyping-by-sequencing protocol. Resulting DNA reads were processed and then investigated using various genomic and bioinformatic approaches that included Fst, PCA, kinship, admixture, linkage disequilibrium and admixture mapping analyses. The scripts used to conduct these analyses are written in three coding languages (R, C and Julia) and are included in this repository along with any necessary data files, provided that the scripts and data files were not previously included in the aforementioned data repository.