Data from: Genetic constraints on wing pattern variation in Lycaeides butterflies: a case study on mapping complex, multifaceted traits in structured populations
Lucas, Lauren K., Utah State University
Nice, Chris C., Texas State University
Gompert, Zach, Utah State University
Gompert, Zachariah, Utah State University
Published Feb 27, 2018 on Dryad.
Cite this dataset
Lucas, Lauren K.; Nice, Chris C.; Gompert, Zach; Gompert, Zachariah (2018). Data from: Genetic constraints on wing pattern variation in Lycaeides butterflies: a case study on mapping complex, multifaceted traits in structured populations [Dataset]. Dryad. https://doi.org/10.5061/dryad.fc827
Patterns of phenotypic variation within and among species can be shaped and constrained by trait genetic architecture. This is particularly true for complex traits, such as butterfly wing patterns, that consist of multiple elements. Understanding the genetics of complex trait variation across species boundaries is difficult, as it necessitates mapping in structured populations and can involve many loci with small or variable phenotypic effects. Here, we investigate the genetic architecture of complex wing pattern variation in Lycaeides butterflies as a case study of mapping multivariate traits in wild populations that include multiple nominal species or groups. We identify conserved modules of integrated wing pattern elements within populations and species. We show that trait covariances within modules have a genetic basis, and thus represent genetic constraints that can channel evolution. Consistent with this, we find evidence that evolutionary changes in wing patterns among populations and species occur mostly in the directions of genetic covariances within these groups. Thus, we show that genetic constraints affect patterns of biological diversity (wing pattern) in Lycaeides, and we provide an analytical template for similar work in other systems.
Genetic data (filtered vcf file)
This text file contains the filtered genetic data (SNP set) in variant call format (vcf). It includes genetic data for 78,567 SNPs.
Variant filtering scripts
This compressed directory contains three perl scripts used for filtering and processing the genetic data (i.e., the vcf file). morefilter_filtered2x-70_varsLycGwa.vcf was generated by running the two filter scripts. Most filters described in the paper are implemented in the main script, vcfFilter.pl, which was run first. The second script, filterSomeMore.pl, applies maximum coverage filters and a filter to drop variants that are near each other (within 3 bp in this case). The final script, vcf2gl.pl, extracts the genotype likelihoods from the vcf file.
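The proximity filter can be illustrated with a minimal Python sketch. This is not the logic of filterSomeMore.pl itself; the function name `drop_close_variants` is hypothetical, and the sketch assumes that every variant with a neighbor within the distance threshold on the same scaffold is dropped (rather than keeping one member of each close pair).

```python
from collections import defaultdict

def drop_close_variants(variants, min_dist=3):
    """Return variants that have no neighbor within `min_dist` bp on the same scaffold.

    `variants` is an iterable of (scaffold, position) tuples.
    Assumption: every member of a close pair is dropped, not just the later one.
    """
    by_scaffold = defaultdict(list)
    for scaffold, pos in variants:
        by_scaffold[scaffold].append(pos)

    kept = []
    for scaffold, positions in by_scaffold.items():
        positions.sort()
        for i, pos in enumerate(positions):
            # Compare each variant with its nearest sorted neighbors only.
            near_prev = i > 0 and pos - positions[i - 1] <= min_dist
            near_next = i + 1 < len(positions) and positions[i + 1] - pos <= min_dist
            if not (near_prev or near_next):
                kept.append((scaffold, pos))
    return kept
```

Because positions are sorted within each scaffold, checking only the two adjacent neighbors is enough to detect any variant inside the distance threshold.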
Variant calling script
Shell script used for variant calling with samtools and bcftools.
This perl script generates shell scripts to submit to a SLURM job scheduler to run bwa, which we used for DNA sequence alignments. Note that this depends on having a bwa module installed on a cluster running SLURM.
Lycaeides melissa reference genome
Reference genome for Lycaeides melissa as a fasta file.
Lycaeides melissa linkage map
Text file describing the L. melissa linkage map. There are three columns giving the linkage group (lg), scaffold number (scaf; these match the reference genome scaffold names), and position (pos) in centimorgans along the relevant linkage group.
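As a rough guide to working with the three-column layout described above, the following Python sketch groups scaffolds by linkage group and orders them by map position. The function name `read_linkage_map` is hypothetical, and the sketch assumes whitespace-delimited columns in the order lg, scaf, pos, with an optional header line; check the actual file before relying on either assumption.

```python
from collections import defaultdict

def read_linkage_map(lines):
    """Group (scaffold, cM position) pairs by linkage group.

    Assumes whitespace-delimited columns in the order lg, scaf, pos.
    A header line whose first field is "lg" is skipped if present.
    """
    lg_map = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0].lower() == "lg":
            continue  # skip blank, malformed, or header lines
        lg, scaf, pos = fields
        lg_map[lg].append((scaf, float(pos)))
    # Order markers along each linkage group by map position.
    for markers in lg_map.values():
        markers.sort(key=lambda marker: marker[1])
    return dict(lg_map)
```

The function accepts any iterable of lines, so an open file handle can be passed directly.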
Gemma BSLMM infiles
This compressed directory contains the infiles for the genomic prediction/genome-wide association mapping analysis. These are in BIMBAM format and include mean genotype (*geno*) and phenotype (*pheno*) files. We include files for size and position (*coord*) traits. Files are included for all biological levels we considered and for all groups: AN = L. anna, GNP = L-ID-GNP, ID = L. idas, JA = Jackson Hole, ME = L. melissa east, MW = L. melissa west, RI = L. anna ricei, SIN = L-ME-SIN, SN = Sierra Nevada, WA = Warner Mt., and YBG = L-AN-YBG. Files without prefixes are for the species-complex level analysis. We have also included the perl wrapper script used to fit the BSLMM in gemma (forkRunGemmaPop.pl).
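For readers unfamiliar with BIMBAM input, here is a minimal Python sketch of how the two file types are laid out. In a mean genotype file each row is one SNP: an identifier, two alleles, then one mean genotype (between 0 and 2) per individual. In a phenotype file each row is one individual, with one column per trait and NA for missing values. The function names are hypothetical, and the example rows in the usage note are made up rather than taken from this dataset.

```python
import re

def parse_mean_genotype_line(line):
    """Split one BIMBAM mean genotype row into (snp_id, alleles, dosages).

    Columns: SNP id, two alleles, then one mean genotype per individual.
    Fields may be separated by commas or whitespace; "NA" becomes None.
    """
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    snp_id, allele_a, allele_b = fields[:3]
    dosages = [None if f == "NA" else float(f) for f in fields[3:]]
    return snp_id, (allele_a, allele_b), dosages

def parse_phenotype_lines(lines):
    """Return one list of trait values per individual; "NA" becomes None."""
    rows = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if fields:
            rows.append([None if f == "NA" else float(f) for f in fields])
    return rows
```

For example, `parse_mean_genotype_line("rs1, A, T, 0.02, 1.80, NA")` yields the identifier `"rs1"`, the allele pair `("A", "T")`, and three per-individual values, the last of which is missing. Individuals must appear in the same order in the genotype and phenotype files.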
PCA infile and scripts
This compressed directory contains the phenotype infile (resid-sizeANDcoord-6vi17-subgroups-NoNA.csv) used for the PCA, as well as the PCA script PCA_sizeANDposition-final.R and an additional script defining the Bayesian model that was used for the 95% PC mean ellipses (fitEllipse.R).
This file contains the wing pattern data, including area and position measurements. Individual IDs and groups are given in the first and second columns, respectively.
Genomic prediction script
Perl wrapper script (forks to run multiple jobs) to run the genomic prediction option in gemma. This is used after the standard BSLMM has been fit.
Scripts for processing BSLMM output
This compressed directory contains a series of perl scripts used to summarize the output from gemma's BSLMM. The calpost* scripts summarize the hyperparameter estimates (e.g., pve, pge, etc.), getBvs* extract breeding values from the standard BSLMM (based only on the polygenic term), getPrdtBvs* extract breeding values from genomic prediction that include SNPs with measurable effects, and grabCalsEffects* extract and summarize the SNP effect estimates. All summaries are across MCMC chains.
QTL summaries and scripts
This compressed directory contains three files. sortedCombinedPips.txt contains the SNP posterior inclusion probabilities (PIPs) for all traits and biological levels, averaged across MCMC chains. This file is the required input for the other two files, which are R scripts. pipColocal.R quantifies correlations in PIPs across traits, and plotPipLgs.R runs the QTL number/density analyses per linkage group (LG) and makes some plots.
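The idea of correlating PIPs across traits can be sketched in a few lines of Python. This is not a port of pipColocal.R: the description above does not say which correlation coefficient that script uses, so the Pearson coefficient here is an assumption, and the function name `pearson` and the input values in the usage note are illustrative only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of per-SNP PIPs.

    Assumption: Pearson's r is used purely for illustration; the original
    analysis may use a different measure of association.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / sqrt(sxx * syy)
```

Calling `pearson(pips_trait_a, pips_trait_b)` on two PIP vectors indexed by the same SNPs gives a value near 1 when the same SNPs tend to be associated with both traits and a value near 0 when the traits map to different SNPs.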
G- and P-matrix infiles and analyses
This compressed directory includes the genome-estimated breeding values from genomic prediction (catbv*csv) for each group, phenotypic data for the P-matrices (resid*), and an R script that runs analyses on these files, matcomp.R (which has annotations throughout). The R script runs the comparisons of P- and G-matrices and the evolvability/constraint analyses, and makes the related plots.