Parallel and convergent evolution in genes underlying seasonal migration
Data files
Dec 11, 2024 version files 10.58 GB
- Admixture_plots.7z (28.48 KB)
- FST_outlier_plots.zip (135.17 MB)
- Genotype_files.zip (8.16 GB)
- Genotype_probability_files.7z (1.83 GB)
- Populations_admixture.7z (756 B)
- Populations_angsd.7z (724 B)
- README.md (8.18 KB)
- Reference_genome.7z (295.10 MB)
- Selection_scan.7z (167.75 MB)
- Treemix.7z (1.58 KB)
Abstract
Seasonal migration has fascinated scientists and natural historians for centuries. While the genetic basis of migration has been widely studied across different taxa, there is little consensus regarding which genomic regions play a role in the ability to migrate and whether they are similar across species. Here, we examine the genetic basis of intraspecific variation within and between distinct migratory phenotypes in a songbird. We focus on the Common Yellowthroat (Geothlypis trichas) as a model system because the polyphyletic origin of eastern and western clades across North America provides a strong framework for understanding the extent to which there has been parallel or convergent evolution in the genes associated with migratory behavior. First, we investigate genome-wide population genetic structure in the Common Yellowthroat in 196 individuals collected from 22 locations across the breeding range. Then, to identify candidate genes involved in seasonal migration, we identify signals of putative selection in replicate comparisons between resident and migratory phenotypes within and between eastern and western clades. Overall, we find widespread support for parallel evolution at the genic level, particularly in genes that mediate biological timekeeping. However, we find little evidence of parallelism at the individual SNP level, supporting the idea that there are multiple genetic pathways involved in the modulation of migration.
README: Parallel and convergent evolution in genes underlying seasonal migration: Repository
Luz E. Zamudio-Beltrán, Christen M. Bossu, Alfredo A. Bueno-Hernández, Peter O. Dunn, Nick D. Sly, Christine Rayne, Eric C. Anderson, Blanca E. Hernández-Baños, Kristen C. Ruegg.
This repository includes the scripts and data used in the analyses performed in Zamudio-Beltrán et al. 2024. It also contains the scripts and files used to visualize the results as plots.
This study includes low coverage whole genome sequencing data from 202 individuals of the Common Yellowthroat warbler (Geothlypis trichas). Raw reads were processed following a pipeline adapted from the GATK Best Practices Guide (Van der Auwera et al., 2013).
Description of the data and file structure
Files within this repository are organized in two main directories: data and bin.
Usage notes
Files within this repository contain genomic variant data, including SNPs and annotations. We recommend the following software/packages for working with them:
- bcftools: For manipulation and querying of VCF files.
- vcftools: For analysis of VCF files, such as filtering or summarizing variant data.
- Command-line visualization: to explore raw sequences (e.g. COYE.fa).
data
Files within this data directory include the reference genome used to map paired reads and its annotation, called genotype files from GATK, and genotype probability files created in ANGSD. We also include the population files used to generate site allele frequencies for each population, data and scripts for admixture analyses and plots, and data used in the FST outlier analyses, the selection scan, and the phylogenetic reconstruction using treemix.
Reference_genome.7z: the Common Yellowthroat reference genome and annotation data.
- COYE.fa
- COYE.fa.fai
- coye_final.all.clean.renamed.rm_matchp.sort.gff
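As a minimal sketch of how the index file can be used, the snippet below reads contig names and lengths from a samtools-style .fai index, which has five tab-separated columns (name, length, offset, bases per line, bytes per line). The function name `read_fai_lengths` and the demo contig names and values are hypothetical and are not taken from COYE.fa.fai.

```python
import io


def read_fai_lengths(handle):
    """Return {contig_name: length} from a samtools-style .fai index."""
    lengths = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        # Columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


# Illustrative two-contig index (made-up values, not from COYE.fa.fai).
demo_fai = io.StringIO(
    "scaffold_1\t1000\t12\t60\t61\n"
    "scaffold_2\t250\t1041\t60\t61\n"
)
demo_lengths = read_fai_lengths(demo_fai)
total_bases = sum(demo_lengths.values())
print(len(demo_lengths), "contigs,", total_bases, "bases")
```

For the real file, the same function could be called on `open("COYE.fa.fai")` after extracting the archive.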
Genotype_files.zip:
- COYEP1-4c.merged_gatk.srs_filt.EAST.recode.vcf.gz: called genotype file used in admixture analysis for the eastern group. This file includes 18,130,407 genotypes for 100 individuals.
- COYEP1-4c.merged_gatk.srs_filt.WEST.recode.vcf.gz: called genotype file used in admixture analysis for the western group. This file includes 18,130,407 genotypes for 96 individuals.
Genotype_probability_files.7z:
- COYE.Plate1_4c.East.rm_rel.minInd50.05maf.baq.beagle.gz: genotype probabilities of 100 eastern individuals from ANGSD that are used as input for the selection scan analysis.
- COYE.Plate1_4c.West.rm_rel.minInd50.05maf.baq.beagle.gz: genotype probabilities of 96 western individuals from ANGSD that are used as input for the selection scan analysis.
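A quick sanity check on these files is to confirm the number of individuals from the header. The sketch below assumes the standard beagle layout written by ANGSD: three leading columns (marker, allele1, allele2) followed by three values per individual. The helper names and the demo content are hypothetical.

```python
import gzip
import io


def open_beagle(path):
    """Open a gzip-compressed beagle file as text."""
    return gzip.open(path, "rt")


def beagle_shape(handle):
    """Return (n_individuals, n_sites) for a beagle-format text stream."""
    header = handle.readline().rstrip("\n").split("\t")
    n_value_cols = len(header) - 3  # drop marker, allele1, allele2
    if n_value_cols <= 0 or n_value_cols % 3 != 0:
        raise ValueError("header does not look like a beagle file")
    n_sites = sum(1 for line in handle if line.strip())
    return n_value_cols // 3, n_sites


# Illustrative file with 2 individuals and 2 sites (made-up values).
demo_beagle = io.StringIO(
    "marker\tallele1\tallele2\tInd0\tInd0\tInd0\tInd1\tInd1\tInd1\n"
    "scaffold_1_100\t0\t2\t0.33\t0.33\t0.33\t0.90\t0.10\t0.00\n"
    "scaffold_1_250\t1\t3\t0.00\t0.20\t0.80\t0.33\t0.33\t0.33\n"
)
demo_shape = beagle_shape(demo_beagle)
print(demo_shape)
```

On the real data, `beagle_shape(open_beagle(path))` would be expected to report 100 individuals for the East file and 96 for the West file if the layout assumption holds.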
Populations_angsd.7z: list of individuals per population to be compared (bin/6.angsd_saf.sbatch).
- AZ.list: 12 individuals in the Arizona resident population.
- CA_n.list: 10 individuals in the northern California resident population.
- FL_AL.list: 18 individuals in the Eastern resident population.
- EAST_migrants.list: 73 individuals from all migratory populations in the East, excluding North Carolina.
- WEST_migrants.list: 74 individuals from all migratory populations in the West.
Populations_admixture.7z: east and west datasets used in admixture analyses.
- popmap.east.txt: 100 individuals from 11 localities.
- popmap.west.txt: 96 individuals from 11 localities.
Admixture_plots.7z:
- admixture_pophelper_plots.R: script used to plot admixture results.
- east: files from the east group used for plotting admixture results.
  - COYE.chr_number.east_cv_summary_k1_k6.txt: summary of CV error values.
  - .Q files: files that contain cluster assignments for each individual.
- west: files from the west group used for plotting admixture results.
  - COYE.chr_number.west_cv_summary_k1_k6.txt: summary of CV error values.
  - .Q files: files that contain cluster assignments for each individual.
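The CV error summaries can be used to compare values of K. The sketch below assumes the summary files contain lines in the form that ADMIXTURE prints to its log, `CV error (K=3): 0.41234`; the actual layout of the `*_cv_summary_k1_k6.txt` files is not documented here, so the pattern may need adjusting. Picking the K with the lowest CV error is a common heuristic and is not necessarily how K was chosen in the study. All values in the demo are made up.

```python
import re

# Assumed line format, as printed by ADMIXTURE with cross-validation enabled.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(text):
    """Return {K: cv_error} for every matching line in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(text)}


def lowest_cv_k(cv_errors):
    """Return the K with the smallest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Illustrative summary (made-up values).
demo_summary = """\
CV error (K=1): 0.52110
CV error (K=2): 0.50875
CV error (K=3): 0.51340
"""
demo_cv = parse_cv_errors(demo_summary)
demo_best_k = lowest_cv_k(demo_cv)
print(demo_best_k, demo_cv[demo_best_k])
```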
FST_outlier_plots.zip: files and scripts used to calculate FST within 50 kb sliding windows, considering outlier regions that fell within the 99th percentile.
- AZ.West_migrants.COYE.baq.minInd4.fst.50kbwin.txt
- AZ.West_migrants.COYE.baq.minInd4.realsfs.fst.siteA_B.out_unweighted.bed
- CAn.West_migrants.COYE.baq.minInd4.fst.50kbwin.txt
- CAn.West_migrants.COYE.baq.minInd4.realsfs.fst.siteA_B.out_unweighted.bed
- FL_AL.East_migrants.COYE.baq.minInd4.fst.50kbwin.txt
- FL_AL.East_migrants.COYE.baq.minInd4.realsfs.fst.siteA_B.out_unweighted.bed
- MI.NB.COYE.baq.minInd4.fst.50kbwin.txt
- MI.NB.COYE.baq.minInd4.realsfs.fst.siteA_B.out_unweighted.bed
- WA.WY.COYE.baq.minInd4.fst.50kbwin.txt
- WA.WY.COYE.baq.minInd4.realsfs.fst.siteA_B.out_unweighted.bed
- 99percentile.angsd.R
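The 99th-percentile outlier step can be sketched as follows. This is not a translation of 99percentile.angsd.R. It assumes the window files are whitespace-delimited with one header line and rows of region, chromosome, window midpoint, number of sites, and FST (taken here from the last column). The percentile uses linear interpolation between order statistics, and all function names and demo values are hypothetical.

```python
import io
import math


def percentile(values, p):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def read_windows(handle):
    """Return [(chrom, mid_pos, fst)] from an assumed window-FST table."""
    next(handle)  # skip the header line
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) < 5:
            continue
        rows.append((fields[1], int(fields[2]), float(fields[-1])))
    return rows


def outlier_windows(rows, p=0.99):
    """Return (cutoff, windows with FST at or above the cutoff)."""
    cutoff = percentile([fst for _, _, fst in rows], p)
    return cutoff, [row for row in rows if row[2] >= cutoff]


# Illustrative table: 101 windows with FST from 0.000 to 0.100 (made up).
demo_lines = ["region\tchr\tmidPos\tNsites"]
for i in range(101):
    demo_lines.append(f"(0,1)(0,1)(0,1)\tscaffold_1\t{25000 + i * 10000}\t100\t{i / 1000}")
demo_rows = read_windows(io.StringIO("\n".join(demo_lines) + "\n"))
demo_cutoff, demo_outliers = outlier_windows(demo_rows)
print(len(demo_outliers), "windows at or above the cutoff")
```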
Selection_scan.7z: this folder holds the scripts and results of the selection scan we used to estimate selection on positions in the east and west. The input data for this analysis are the east and west beagle files generated in ANGSD.
- COYE.Plate1_4c.East.mafs.gz: position file for the selection scan of east COYE samples.
- COYE.Plate1_4c.East.sel_scan.selection.npy: the result of the selection scan of east COYE samples.
- COYE.Plate1_4c.East.sel_scan.cov
- COYE.Plate1_4c.West.mafs.gz: position file for the selection scan of west COYE samples.
- COYE.Plate1_4c.West.sel_scan.selection.npy: the result of the selection scan of west COYE samples.
- COYE.Plate1_4c.West.sel_scan.cov
- a.angsd_east_west.COYE.sh
- b.selection_scan.COYE.sh
- SelectionScanPCANGSD.Rmd: this file contains the commands and packages to read selection scan results, such as .npy files (e.g. the npyLoad function in the RcppCNPy R package).
- Manhattan_COYE.selection_scan.chr_order.R
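For readers working outside R, the conversion from selection statistics to p-values can be sketched with the standard library alone. This assumes the values in the .selection.npy files are chi-square distributed with 1 degree of freedom, as is the case for the PCAngsd selection scan; SelectionScanPCANGSD.Rmd remains the authoritative description of how the results were read and plotted. The function names are hypothetical.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if stat < 0:
        raise ValueError("chi-square statistics cannot be negative")
    # For 1 df, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def neg_log10_p(stat):
    """Value typically plotted on the y-axis of a Manhattan plot."""
    p = chi2_1df_pvalue(stat)
    return -math.log10(p) if p > 0 else math.inf


# Illustrative statistics (made-up values, not from the dataset).
demo_stats = [0.0, 3.841458820694124, 30.0]
demo_pvalues = [chi2_1df_pvalue(s) for s in demo_stats]
print([round(neg_log10_p(s), 2) for s in demo_stats])
```

In Python the arrays themselves can be loaded with `numpy.load`, after which each value can be passed through `chi2_1df_pvalue`.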
Treemix.7z: scripts and model used to convert the original vcf file into a readable file for treemix analysis using py-popgen, and the script to generate a phylogenetic reconstruction using treemix.
- a.vcf_to_treemix.sbatch
- b.treemix.sbatch
- out.model
bin (Code/Software)
Processing scripts: These bash scripts are submitted to our HPC cluster to process raw fastq files, map reads to the COYE.fa reference genome, call genotypes with GATK HaplotypeCaller or generate genotype probabilities with ANGSD, and filter called genotypes. The directory also includes analysis scripts for population structure (srsStuff, admixture), FST outlier, selection scan, and treemix analyses.
- 1.trimgalore.COYE.sbatch: script to remove low quality fragments and trim adapters using trimgalore.
- 2.map.COYEP1.bwa_trim.sbatch: script to map paired reads to the Common Yellowthroat reference genome (data/reference_genome/COYE.fa) with bwa 0.7.3a, add read groups using picard 2.23.2, remove PCR duplicates and merge bam files using samtools 1.11.0.
- 3.coverage.sdepth.sbatch: script to calculate coverage using samtools.
- 4a.snpcalling_angsd.COYE.sbatch: script to get genotype probabilities using angsd 0.935.
- 4b.snpcalling_gatk.ALL.sbatch: script to get genotype calls using GATK HaplotypeCaller.
- 4c.mergevcf_nardfilter.COYE2.sbatch: this script merges all region-specific vcf files generated in GATK, then selects SNPs and filters them based on maf and missingness. Hard filters for Variant Quality Score Recalibration (VQSR) were implemented, resulting in a final clean called genotype dataset.
- 5.allele_depth&srsStuff.Rmd: script to obtain allele depths using bcftools, identify relatedness and perform a PCA using srsStuff.
- 6.angsd_saf.sbatch: script to generate site allele frequencies for each population to be compared (data/populations_angsd/).
- 7.angsd_sfs.sbatch: script to determine global FST and identify regions potentially significant to migration strategy.
- 8a.admixture_east.sbatch: script used in admixture analysis for east group.
- 8b.admixture_west.sbatch: script used in admixture analysis for west group.
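To illustrate the maf and missingness filtering mentioned for the merged GATK calls, the sketch below computes both quantities from the GT fields of a single biallelic site. It is not the filtering code from 4c.mergevcf_nardfilter.COYE2.sbatch, and the thresholds `min_maf=0.05` and `max_missing=0.2` are hypothetical placeholders, since the values used in the study are not stated here.

```python
def site_stats(genotypes):
    """Return (maf, missing_fraction) for GT strings such as '0/1' or './.'.

    Assumes a biallelic site: any allele other than '0' counts as alternate.
    maf is None when no genotype is called.
    """
    n_missing = 0
    alt = 0
    called = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            n_missing += 1
            continue
        called += len(alleles)
        alt += sum(1 for allele in alleles if allele != "0")
    missing = n_missing / len(genotypes)
    if called == 0:
        return None, missing
    freq = alt / called
    return min(freq, 1.0 - freq), missing


def passes_filters(genotypes, min_maf=0.05, max_missing=0.2):
    """Apply hypothetical maf and missingness thresholds to one site."""
    maf, missing = site_stats(genotypes)
    return maf is not None and maf >= min_maf and missing <= max_missing


# Illustrative site with four samples (made-up genotypes).
demo_gts = ["0/0", "0/1", "1|1", "./."]
demo_maf, demo_missing = site_stats(demo_gts)
print(demo_maf, demo_missing, passes_filters(demo_gts))
```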