Data from: Genomic signatures of adaptation in native lizards exposed to human-introduced fire ants
Data files
Version: Oct 23, 2024 (65.28 GB total)
- AL_100kb_window_20kb_step.TajimaD (5.39 MB)
- AL_window_200_step100.saltilassi.out (23.37 MB)
- AR_100kb_window_20kb_step.TajimaD (5.23 MB)
- AR_window_200_step100.saltilassi.out (8.31 MB)
- filtered-three-pops-HC.vcf (3.11 GB)
- HC-gtALL.PE.bwa_mem.Sceloporus_undulatus_funannotate.passed.RG.sorted_passedSNPs.vcftoolsFiltered-2alleles-noIndels-hwe-geno-mind.vcf (12.79 GB)
- LSBL.AL.csv (3.25 GB)
- LSBL.AR.csv (3.34 GB)
- LSBL.TN.csv (3.25 GB)
- maf.imputed.all.vcf (35.10 GB)
- README.md (6.07 KB)
- SD_ALL_SCAFFS_passedSNPs.PE.bwa_mem.Sceloporus_undulatus_funannotate.passed.RG.sorted.vcftoolsFiltered.vcf (4.39 GB)
- TN_100kb_window_20kb_step.TajimaD (5.27 MB)
- TN_window_200_step100.saltilassi.out (11.23 MB)
Abstract
Understanding the process of genetic adaptation in response to human-mediated ecological change will help elucidate the eco-evolutionary impacts of human activity. In the 1930s, red imported fire ants (Solenopsis invicta) were accidentally introduced to the Southeastern USA, where today they are both venomous predators and toxic prey to native eastern fence lizards (Sceloporus undulatus). Here, we investigate potential lizard adaptation to invasive fire ants by generating whole-genome sequences from 420 lizards across three populations: one with long exposure to fire ants and two unexposed populations. Signatures of positive selection exclusive to the exposed population overlap immune system, growth factor pathway, and morphological development genes. Among invaded lizards, longer limbs (used to remove stinging ants) are associated with increased survival. We identify alleles associated with longer limbs that are highly differentiated from the unexposed populations, a pattern counter to the pre-invasion latitudinal cline in limb length based on museum specimens. While we cannot rule out other environmental differences between populations driving these patterns, these results do constitute plausible genetic adaptations in lizards invaded by fire ants.
README: Genomic signatures of adaptation in native lizards exposed to human-introduced fire ants
https://doi.org/10.5061/dryad.tht76hf50
This repository contains the VCF files used for all analyses, along with their respective outputs.
HC-gtALL.PE.bwa_mem.Sceloporus_undulatus_funannotate.passed.RG.sorted_passedSNPs.vcftoolsFiltered-2alleles-noIndels-hwe-geno-mind.vcf:
High-coverage (HC) genotypes for three populations of fence lizards: 20 from Alabama (SD), 20 from Tennessee (EE), and 19 from Arkansas (SF). Following GATK recommendations, SNPs were first extracted from the raw variant calls with SelectVariants and then hard-filtered with VariantFiltration using these thresholds: quality score by depth (QD) < 2.0, Phred-scaled p-value of Fisher's exact test for strand bias (FS) > 60.0, and mapping quality (MQ) < 40.0. SelectVariants was then applied again to keep only SNPs that passed VariantFiltration. The remaining SNPs were further filtered with VCFtools to keep only biallelic sites (--min-alleles 2, --max-alleles 2), remove insertions and deletions, and remove sites deviating from Hardy-Weinberg equilibrium (--hwe 0.000001). Finally, prior to downstream analyses, PLINK (v1.9) was used with --geno 0.05 and --mind 0.1 to remove variants with missing call rates above 5% and individuals with missing call rates above 10%.
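The hard-filter thresholds and allele checks above can be expressed as per-site predicates. The following is a minimal Python sketch, not the authors' pipeline: it assumes INFO annotations have already been parsed into a dictionary, treats an absent annotation as not failing the filter (an assumption), and approximates the VCFtools biallelic/no-indel selection by requiring single-nucleotide REF and ALT alleles. The function names are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Thresholds quoted in this README; a site fails if ANY condition holds.
QD_MIN = 2.0   # fail if QD < 2.0
FS_MAX = 60.0  # fail if FS > 60.0
MQ_MIN = 40.0  # fail if MQ < 40.0
BASES = {"A", "C", "G", "T"}


def passes_hard_filters(info: Mapping[str, Optional[float]]) -> bool:
    """Return True if the site survives the QD/FS/MQ hard filters.

    Assumption (hypothetical): an absent or None annotation does not fail the site.
    """
    qd, fs, mq = info.get("QD"), info.get("FS"), info.get("MQ")
    if qd is not None and qd < QD_MIN:
        return False
    if fs is not None and fs > FS_MAX:
        return False
    if mq is not None and mq < MQ_MIN:
        return False
    return True


def is_biallelic_snp(ref: str, alts: Sequence[str]) -> bool:
    """Approximate the biallelic, no-indel selection: exactly one ALT allele,
    and both REF and ALT are single nucleotides."""
    return (
        len(alts) == 1
        and ref.upper() in BASES
        and alts[0].upper() in BASES
    )
```

A site would move on to the HWE and missingness steps only if both predicates return True; those later steps (VCFtools --hwe, PLINK --geno/--mind) are not reproduced here.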
filtered-three-pops-HC.vcf:
Used for demographic analyses. Same as the VCF file above, but additionally filtered to remove every SNP at which an allele fixed in two of the populations is the minor allele in the third population.
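The removal rule for the demographic dataset can be made concrete as a check on per-population allele frequencies at one SNP. This Python sketch is an interpretation, not the authors' code: it assumes exactly three populations, checks both the REF and ALT allele, and assumes that "minor allele" means the allele is present but below 50% frequency (0 < f < 0.5), so a fixed difference (frequency 0 in the third population) is kept. The function name is hypothetical.

```python
from typing import Mapping


def is_fixed_minor_pattern(alt_freqs: Mapping[str, float], tol: float = 1e-9) -> bool:
    """Return True if the SNP matches the removal rule.

    alt_freqs maps population label -> ALT allele frequency (0..1) at one SNP.
    The REF allele frequency is 1 - f, so both alleles are checked.
    Assumption (hypothetical): "minor" means 0 < f < 0.5.
    """
    if len(alt_freqs) != 3:
        raise ValueError("expected exactly three populations")
    alt = list(alt_freqs.values())
    ref = [1.0 - f for f in alt]
    for allele_freqs in (alt, ref):
        for i in range(3):
            # Frequencies of this allele in the two populations other than i.
            others = [allele_freqs[j] for j in range(3) if j != i]
            third = allele_freqs[i]
            fixed_in_others = all(abs(f - 1.0) <= tol for f in others)
            if fixed_in_others and tol < third < 0.5:
                return True
    return False
```

For example, a SNP whose ALT allele is fixed in AL and TN but present at 20% in AR would be removed, whereas a SNP fixed for different alleles between populations would be kept under the stated assumption.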
SD_ALL_SCAFFS_passedSNPs.PE.bwa_mem.Sceloporus_undulatus_funannotate.passed.RG.sorted.vcftoolsFiltered.vcf:
Low-coverage (LC) genotypes for 381 fence lizards from Alabama. Following GATK recommendations, SNPs were first extracted from the raw variant calls with SelectVariants and then hard-filtered with VariantFiltration using these thresholds: quality score by depth (QD) < 2.0, Phred-scaled p-value of Fisher's exact test for strand bias (FS) > 60.0, and mapping quality (MQ) < 40.0. SelectVariants was then applied again to keep only SNPs that passed VariantFiltration. The remaining SNPs were further filtered with VCFtools to keep only biallelic sites (--min-alleles 2, --max-alleles 2), remove insertions and deletions, remove sites deviating from Hardy-Weinberg equilibrium (--hwe 0.001), and remove sites with a minor allele frequency below 0.05 to prevent inflation in downstream statistical estimates and imputation.
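The minor allele frequency threshold can be illustrated directly from VCF genotype (GT) strings. This minimal Python sketch assumes diploid, biallelic calls and skips missing alleles; it keeps sites with MAF >= 0.05, which corresponds to removing sites whose MAF is below 0.05. Function names are hypothetical.

```python
from typing import Iterable, Optional


def alt_allele_frequency(genotypes: Iterable[str]) -> Optional[float]:
    """ALT allele frequency from GT strings such as '0/1' or '1|1'.

    Missing alleles ('.') are skipped. Returns None if no allele is called.
    """
    alt = total = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    return None if total == 0 else alt / total


def passes_maf(genotypes: Iterable[str], min_maf: float = 0.05) -> bool:
    """Keep a site only if its minor allele frequency is at least min_maf."""
    p = alt_allele_frequency(genotypes)
    if p is None:
        return False
    return min(p, 1.0 - p) >= min_maf
```

With 381 diploid lizards (762 alleles), a 0.05 threshold means the minor allele must be called on at least 39 chromosomes when no genotypes are missing.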
maf.imputed.all.vcf:
To improve genotyping rates for the LC dataset, we used the 20 HC sequences from AL as a reference panel for genotype imputation. We first used Shape-IT version 2.r837 (Delaneau et al. 2008) to phase each of the 24 scaffolds of the 20 HC sequences and obtain haplotype files, which served as the reference for imputing the 381 LC sequences. Imputation was performed with Beagle 5.2 (Browning et al. 2018) using a window size of 100 and an overlap of 10. Before the genome-wide genotype-phenotype association analysis of the n = 381 AL lizards, we removed SNPs with a minor allele frequency < 0.05, leaving 4,245,544 SNPs.
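As a sketch of the imputation configuration, the helper below assembles a Beagle 5 command line from Beagle's key=value arguments gt= (target genotypes), ref= (phased reference panel), out= (output prefix), window= and overlap=, using the values reported above. The jar and file names are hypothetical placeholders, and the sketch assumes the Shape-IT haplotypes have already been converted to a phased VCF that Beagle can read as a reference panel; the exact invocation the authors used is not given in this README.

```python
from typing import List


def beagle_command(
    jar: str,
    target_vcf: str,
    ref_vcf: str,
    out_prefix: str,
    window: float = 100,
    overlap: float = 10,
) -> List[str]:
    """Build (without running) a Beagle 5 imputation command.

    target_vcf: LC genotypes to impute (hypothetical path).
    ref_vcf: phased HC haplotypes used as the reference panel (hypothetical path).
    """
    return [
        "java", "-jar", jar,
        f"gt={target_vcf}",
        f"ref={ref_vcf}",
        f"out={out_prefix}",
        f"window={window:g}",
        f"overlap={overlap:g}",
    ]


if __name__ == "__main__":
    cmd = beagle_command("beagle.jar", "lc_alabama.vcf.gz", "hc_alabama_phased.vcf.gz", "imputed")
    print(" ".join(cmd))
    # To execute (requires Java and the Beagle jar): subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single shell string keeps paths with spaces safe if the command is later passed to subprocess.run.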
*.TajimaD:
Tajima's D values were calculated in 100 kb windows with a 20 kb step using VCF-kit (Cook & Andersen, 2017) for the AL (SD), TN (EE), and AR (SF) populations. Column headers:
CHROM: scaffold on which the window lies; BIN_START: start position (bp) of the 100 kb window; BIN_END: end position (bp) of the 100 kb window; N_sites: number of variable nucleotide sites in the window; N_SNPs: number of single nucleotide polymorphisms in the window; TajimaD: Tajima's D statistic as defined in Tajima (1989).
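To show what the TajimaD column measures, the sketch below computes Tajima's (1989) D from the allele counts at segregating sites in a window and slides 100 kb windows in 20 kb steps along one scaffold. It is a minimal Python illustration, not VCF-kit's implementation: it assumes no missing data (every site sampled on the same n chromosomes), half-open windows starting at position 0, and a simple quadratic scan; VCF-kit's exact window boundaries and site counting may differ. Function names are hypothetical.

```python
import math
from typing import Iterator, Optional, Sequence, Tuple


def tajimas_d(allele_counts: Sequence[int], n: int) -> Optional[float]:
    """Tajima's D for one window.

    allele_counts: copies of one allele (either allele; the statistic is
    symmetric) at each site, out of n sampled chromosomes.
    Returns None if n < 4 (variance term is zero) or no site segregates.
    """
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean pairwise differences: k(n-k) differing pairs out of n(n-1)/2 per site.
    pi = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2.0)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def windowed_tajimas_d(
    sites: Sequence[Tuple[int, int]],
    n: int,
    size: int = 100_000,
    step: int = 20_000,
) -> Iterator[Tuple[int, int, int, Optional[float]]]:
    """Yield (bin_start, bin_end, n_segregating, D) for windows
    [bin_start, bin_start + size) on one scaffold; sites are (position, count)."""
    if not sites:
        return
    last = max(pos for pos, _ in sites)
    start = 0
    while start <= last:
        end = start + size
        counts = [k for pos, k in sites if start <= pos < end]
        n_seg = sum(1 for k in counts if 0 < k < n)
        yield start, end, n_seg, tajimas_d(counts, n)
        start += step
```

Negative D indicates an excess of rare variants relative to neutral expectations (as after a sweep or expansion); positive D indicates an excess of intermediate-frequency variants.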
*saltilassi.out:
Haplotype-based selection statistics, including saltiLASSI, were calculated in 200-SNP windows with a 100-SNP step using lassip (DeGiorgio & Szpiech, 2022) for the AL (SD), TN (EE), and AR (SF) populations. Column headers:
chr: scaffold on which the window lies; start: start position of the 200-SNP window; end: end position of the 200-SNP window; SNP: number of SNPs in the window; pos: window midpoint, (start + end)/2; *_nhaps: total number of haplotypes in the window; *_uhaps: number of unique haplotypes in the window; *_h12: H12 statistic; *_h2h1: H2/H1 statistic; *_m: number of sweeping haplotypes; *_A: LASSI statistic; *_L: saltiLASSI statistic (DeGiorgio & Szpiech, 2022).
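The *_nhaps, *_uhaps, *_h12, and *_h2h1 columns summarize the haplotype frequency spectrum in each window. A minimal Python sketch of the standard haplotype-homozygosity definitions of H12 and H2/H1 is shown below; it assumes each haplotype is encoded as a string of alleles across the window's SNPs, and the function name is hypothetical. The LASSI and saltiLASSI likelihood statistics are not reproduced here; lassip computes those.

```python
from collections import Counter
from typing import Sequence, Tuple


def haplotype_stats(haplotypes: Sequence[str]) -> Tuple[int, int, float, float]:
    """Return (nhaps, uhaps, H12, H2/H1) for one window.

    With haplotype frequencies sorted so that p1 >= p2 >= ...:
      H1  = sum_i p_i^2
      H12 = (p1 + p2)^2 + sum_{i>=3} p_i^2
      H2  = H1 - p1^2
    """
    if not haplotypes:
        raise ValueError("window contains no haplotypes")
    counts = sorted(Counter(haplotypes).values(), reverse=True)
    n = len(haplotypes)
    freqs = [c / n for c in counts]
    h1 = sum(p * p for p in freqs)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    # H12 pools the two most common haplotypes, so it also detects soft sweeps.
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    h2 = h1 - p1 * p1
    return n, len(counts), h12, h2 / h1
```

High H12 indicates that one or two haplotypes dominate the window; H2/H1 near 0 points to a single dominant haplotype (hard sweep), while larger values suggest several haplotypes at elevated frequency (soft sweep).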
LSBL*:
Locus-specific branch lengths (LSBL; Shriver et al., 2004) were calculated for each focal population using a custom script (https://github.com/braulioassis/sce-sol/blob/main/lsbl.R) from pairwise Weir & Cockerham FST values computed with VCFtools (Danecek et al., 2011). Column headers:
CHROM: scaffold on which the SNP lies; POS: genomic position of the SNP for which the statistic is calculated; *LSBL: LSBL statistic for the focal population.
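LSBL partitions the three pairwise FST distances among populations into a branch length for each population: for focal population A with comparison populations B and C, LSBL_A = (FST_AB + FST_AC - FST_BC) / 2 (Shriver et al., 2004). The Python sketch below applies this formula at a single SNP; it is not the authors' R script (lsbl.R), and how that script handles negative Weir & Cockerham estimates or missing values is not described here. The function name is hypothetical; the population labels in the usage are the ones used in this README.

```python
from typing import Dict, FrozenSet, Mapping


def lsbl_all(pairwise_fst: Mapping[FrozenSet[str], float]) -> Dict[str, float]:
    """Return the LSBL of each of exactly three populations at one SNP.

    pairwise_fst maps frozenset({pop1, pop2}) -> FST between those populations.
    """
    pops = sorted(set().union(*pairwise_fst))
    if len(pops) != 3 or len(pairwise_fst) != 3:
        raise ValueError("expected FST for all three pairs of three populations")
    result: Dict[str, float] = {}
    for focal in pops:
        b, c = [p for p in pops if p != focal]
        # Branch leading to the focal population: add both distances that
        # include it, subtract the distance that does not, and halve.
        result[focal] = (
            pairwise_fst[frozenset({focal, b})]
            + pairwise_fst[frozenset({focal, c})]
            - pairwise_fst[frozenset({b, c})]
        ) / 2.0
    return result
```

For example, with FST(AL,TN) = 0.3, FST(AL,AR) = 0.4, and FST(TN,AR) = 0.1, the branch lengths are about 0.3 for AL, 0 for TN, and 0.1 for AR, so the differentiation at that SNP is attributed mostly to AL.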
Sharing/Access information
Raw sequences are available on NCBI SRA BioProject: PRJNA656311.
Scripts are available at https://github.com/braulioassis/sce-sol
References
Cook DE, Andersen EC. 2017. VCF-kit: assorted utilities for the variant call format. Bioinformatics. 33(10):1581–1582. doi:10.1093/bioinformatics/btx011.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics. 27(15):2156–2158. doi:10.1093/bioinformatics/btr330.
DeGiorgio M, Szpiech ZA. 2022. A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet. 18(4):e1010134.
Shriver MD, Kennedy GC, Parra EJ, Lawson HA, Sonpar V, Huang J, Akey JM, Jones KW. 2004. The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs. Hum Genomics. 1(4):274–286.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 123:585–595.