Data from the paper: The genetics, evolution, and maintenance of a biological rock-paper-scissors game
Data files
Nov 04, 2025 version files 1.11 GB
-
Allele_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo1996.txt
4.51 KB
-
Allele_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo2001.txt
4.66 KB
-
bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
220.86 MB
-
bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.91 MB
-
Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta
2.27 MB
-
Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta_snps_LD_output_no.r2.limit.txt
25.07 MB
-
Blue_and_yellow_morph_with_genotypes_exact_test_final.R
371 B
-
Contig_lengths_Uta_Revio_genome_2-6-24_UTA.bp.p_ctg.txt
5.28 KB
-
Fig_1B_S3_S4_Uta_Revio_scaffold_plots_final.R
30.14 KB
-
Fig_1D_S6_Lastz_dotplot_final.R
12.14 KB
-
Fig_2ABC_S7AB_Fixed_difs_shared_polymorphism_variation_and_genetic_diversity_across_haplotypes_final.R
20.85 KB
-
Fig_2D_Genetic_diversity_plot_final.R
2.40 KB
-
Fig_2E_S9_Uta_male_RNAseq_data.csv
8.57 KB
-
Fig_2E_S9_Uta_SPR_region_RNAseq_final.R
11.88 KB
-
Fig_2F_S9_Uta_SPR_haplotype_specific_expression_final.R
12.82 KB
-
Fig_2G_S9i_S10_Mapping_morphs_on_Pacbio_tree_final.R
6.02 KB
-
Fig_3AB_S11_Correlations_among_morphs_final.R
8.80 KB
-
Fig_3AB_S11_Malegame_w_3_allele_model_stochastic_to_get_number_of_each_morph.R
10.20 KB
-
Fig_3C_Two_allele_model_with_plasticity_final.R
6.49 KB
-
Fig_3D_S12_Two_allele_model_with_plasticity_varying_the_proportion_of_yellow_morphs_final.R
8.40 KB
-
Fig_3E_Fig_S14_Two_allele_model_with_semi-optimized_p_Sinervo1996_payoff_matrix.R
14.52 KB
-
Fig_3F_S16_Plotting_stochastic_trials_and_alleles_lost.R
16.27 KB
-
Fig_3F_S16_Three_allele_model_with_genetic_drift_and_varying_population_size.R
10.50 KB
-
Fig_3F_S16_Two_allele_model_with_fixed_p_with_genetic_drift_and_varying_population_size.R
17.39 KB
-
Fig_3F_S16_Two_allele_model_with_optimized_p_with_genetic_drift_and_varying_population_size_Sinervo_2001_payoff_matrix.R
8.88 KB
-
Fig_S13_Two_allele_model_with_optimized_p_Sinervo_1996_payoff_matrix.R
13.86 KB
-
Fig_S13_Two_allele_model_with_optimized_p_Sinervo_2001_payoff_matrix.R
12.48 KB
-
Fig_S14_Three_allele_model_deterministic_version.R
9.57 KB
-
Fig_S14_Two_allele_model_with_semi-optimized_p_Sinervo2001_payoff_matrix.R
11.93 KB
-
Fig_S15_Uta_ESS_plot_final.R
10.05 KB
-
Fig_S16_Two_allele_model_with_optimized_p_with_genetic_drift_and_varying_population_size_Sinervo_1996_payoff_matrix.R
8.92 KB
-
Fig_S1E_F_Uta_spec_data_analyses_final.R
7.39 KB
-
Fig_S2_Genome_PCA_analyses_99individuals_final.R
5.24 KB
-
Fig_S7_Morphology_by_morph_and_genotype_final.R
6.46 KB
-
Fig_S8_LD_heatmap_Uta_amplicon_data_imported_LD_values_final.R
4.29 KB
-
NGSrelate_output_for_oo_bb_yy_Revio_map_maf_at_least_0.4_and_100_individuals_rab_greater_than_0.01.xlsx
113.78 KB
-
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
208.47 MB
-
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.86 MB
-
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
222.20 MB
-
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.93 MB
-
oo_bb_yy_Revio_mapping_w_standard_filters_but_no_baq_no_LB367_Folded_first_SFS_all_contigs_thetasWindow10000_step10000.pestPG
37.92 MB
-
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
220.99 MB
-
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.90 MB
-
PCA_info_for_genome_samples_no_LB367.txt
2.52 KB
-
PCA_w_filters_oo_bb_yy_Revio_map_noLB367_10-4-24_output.covar
96.75 KB
-
Phenotype_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo1996.txt
6.32 KB
-
Phenotype_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo2001.txt
6.55 KB
-
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
11.14 KB
-
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
514.54 KB
-
RAxML_bestTree.SPR_oo_bb_yy_haplotypes_noLB367_rooted.tre
16.54 KB
-
RAxML_bestTree.SPR_oo_bb_yy_haplotypes_transcriptome_individuals.rooted.tre
4.35 KB
-
README.md
52.05 KB
-
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000...56526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
8.66 KB
-
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
171.63 KB
-
scaffold2135_to_ptg000029l_each_10kb_around_SPR_region.strandplus.rdotplot.txt
5.16 KB
-
scaffold2135_to_ptg000029l_Revio_scaffold_reverse_complement.strandplus.rdotplot2.txt
34.24 KB
-
SPR_CDS_diploid_sequences_name_and_ID.txt
1.29 KB
-
SPR_exon1_amino_acid_alignment_diploid_data.fasta
28.79 KB
-
SPR_genotypes_for_yellow_morphs.txt
395 B
-
SPR_oo_bb_yy_haplotypes_noLB367.fasta
1.01 MB
-
SPR_oo_bb_yy_haplotypes_transcriptome_individuals.fasta
273 KB
-
SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.small.tree.csv
5.11 KB
-
SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.w.genotypes_noLB367.csv
16.96 KB
-
Stochastic_trials_w_alleles_lost.zip
19.82 KB
-
Three_allele_model_N250_t10000_interval100_morph_numbers_and_frequency.txt
4.68 KB
-
Uta_color_data_2013_2019_paired_down.csv
17.77 KB
-
Uta_DNAseq_samples_with_SRA_accession_numbers.xlsx
33.06 KB
-
Uta_field_data_on_morphs.csv
3.69 KB
-
Uta_PacBio_amplicon_metadata_and_Genbank_ID.xlsx
28.75 KB
-
Uta_RNAseq_data_haplotype_specific_expression_1-14-25_long_form.csv
7.03 KB
-
Uta_RNAseq_data_haplotype_specific_expression_1-14-25_skin_comparison.csv
2.08 KB
-
Uta_RNAseq_data_haplotype_specific_expression_1-14-25.csv
4.06 KB
-
Uta_RNAseq_samples_with_SRA_accession_numbers.xlsx
24.39 KB
-
Uta_SPR_exon1_CDS_analyses_final.R
6.45 KB
-
Uta_stansburiana_size_morph_and_genotype_data.csv
5.46 KB
Nov 19, 2025 version files 1.11 GB
-
Allele_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo1996.txt
4.51 KB
-
Allele_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo2001.txt
4.66 KB
-
bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
220.86 MB
-
bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.91 MB
-
Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta
2.27 MB
-
Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta_snps_LD_output_no.r2.limit.txt
25.07 MB
-
Blue_and_yellow_morph_with_genotypes_exact_test_final.R
371 B
-
Contig_lengths_Uta_Revio_genome_2-6-24_UTA.bp.p_ctg.txt
5.28 KB
-
Fig_1B_S3_S4_Uta_Revio_scaffold_plots_final.R
30.14 KB
-
Fig_1D_S6_Lastz_dotplot_final.R
12.14 KB
-
Fig_2ABC_S7AB_Fixed_difs_shared_polymorphism_variation_and_genetic_diversity_across_haplotypes_final.R
20.85 KB
-
Fig_2D_Genetic_diversity_plot_final.R
2.40 KB
-
Fig_2E_S9_Uta_male_RNAseq_data.csv
8.57 KB
-
Fig_2E_S9_Uta_SPR_region_RNAseq_final.R
11.88 KB
-
Fig_2F_S9_Uta_SPR_haplotype_specific_expression_final.R
12.82 KB
-
Fig_2G_S9i_S10_Mapping_morphs_on_Pacbio_tree_final.R
6.02 KB
-
Fig_3AB_S11_Correlations_among_morphs_final.R
8.80 KB
-
Fig_3AB_S11_Malegame_w_3_allele_model_stochastic_to_get_number_of_each_morph.R
10.20 KB
-
Fig_3C_Two_allele_model_with_plasticity_final.R
6.49 KB
-
Fig_3D_S12_Two_allele_model_with_plasticity_varying_the_proportion_of_yellow_morphs_final.R
8.40 KB
-
Fig_3E_Fig_S14_Two_allele_model_with_semi-optimized_p_Sinervo1996_payoff_matrix.R
14.52 KB
-
Fig_3F_S16_Plotting_stochastic_trials_and_alleles_lost.R
16.27 KB
-
Fig_3F_S16_Three_allele_model_with_genetic_drift_and_varying_population_size.R
10.50 KB
-
Fig_3F_S16_Two_allele_model_with_fixed_p_with_genetic_drift_and_varying_population_size.R
17.39 KB
-
Fig_3F_S16_Two_allele_model_with_optimized_p_with_genetic_drift_and_varying_population_size_Sinervo_2001_payoff_matrix.R
8.88 KB
-
Fig_S13_Two_allele_model_with_optimized_p_Sinervo_1996_payoff_matrix.R
13.86 KB
-
Fig_S13_Two_allele_model_with_optimized_p_Sinervo_2001_payoff_matrix.R
12.48 KB
-
Fig_S14_Three_allele_model_deterministic_version.R
9.57 KB
-
Fig_S14_Two_allele_model_with_semi-optimized_p_Sinervo2001_payoff_matrix.R
11.93 KB
-
Fig_S15_Uta_ESS_plot_final.R
10.05 KB
-
Fig_S16_Two_allele_model_with_optimized_p_with_genetic_drift_and_varying_population_size_Sinervo_1996_payoff_matrix.R
8.92 KB
-
Fig_S1E_F_Uta_spec_data_analyses_final.R
7.39 KB
-
Fig_S2_Genome_PCA_analyses_99individuals_final.R
5.24 KB
-
Fig_S7_Morphology_by_morph_and_genotype_final.R
6.46 KB
-
Fig_S8_LD_heatmap_Uta_amplicon_data_imported_LD_values_final.R
4.29 KB
-
NGSrelate_output_for_oo_bb_yy_Revio_map_maf_at_least_0.4_and_100_individuals_rab_greater_than_0.01.xlsx
113.78 KB
-
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
208.47 MB
-
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.86 MB
-
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
222.20 MB
-
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.93 MB
-
oo_bb_yy_Revio_mapping_w_standard_filters_but_no_baq_no_LB367_Folded_first_SFS_all_contigs_thetasWindow10000_step10000.pestPG
37.92 MB
-
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
220.99 MB
-
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
41.90 MB
-
PCA_info_for_genome_samples_no_LB367.txt
2.52 KB
-
PCA_w_filters_oo_bb_yy_Revio_map_noLB367_10-4-24_output.covar
96.75 KB
-
Phenotype_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo1996.txt
6.32 KB
-
Phenotype_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo2001.txt
6.55 KB
-
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
11.14 KB
-
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
514.54 KB
-
RAxML_bestTree.SPR_oo_bb_yy_haplotypes_noLB367_rooted.tre
16.54 KB
-
RAxML_bestTree.SPR_oo_bb_yy_haplotypes_transcriptome_individuals.rooted.tre
4.35 KB
-
README.md
52.05 KB
-
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000...56526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
8.66 KB
-
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
171.63 KB
-
scaffold2135_to_ptg000029l_each_10kb_around_SPR_region.strandplus.rdotplot.txt
5.16 KB
-
scaffold2135_to_ptg000029l_Revio_scaffold_reverse_complement.strandplus.rdotplot2.txt
34.24 KB
-
SPR_CDS_diploid_sequences_name_and_ID.txt
1.29 KB
-
SPR_exon1_amino_acid_alignment_diploid_data.fasta
28.79 KB
-
SPR_genotypes_for_yellow_morphs.txt
395 B
-
SPR_oo_bb_yy_haplotypes_noLB367.fasta
1.01 MB
-
SPR_oo_bb_yy_haplotypes_transcriptome_individuals.fasta
273 KB
-
SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.small.tree.csv
5.11 KB
-
SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.w.genotypes_noLB367.csv
16.96 KB
-
Stochastic_trials_w_alleles_lost.zip
19.82 KB
-
Three_allele_model_N250_t10000_interval100_morph_numbers_and_frequency.txt
4.68 KB
-
Uta_color_data_2013_2019_paired_down.csv
17.77 KB
-
Uta_DNAseq_samples_with_SRA_accession_numbers.xlsx
33.22 KB
-
Uta_field_data_on_morphs.csv
3.69 KB
-
Uta_PacBio_amplicon_metadata_and_Genbank_ID.xlsx
28.75 KB
-
Uta_RNAseq_data_haplotype_specific_expression_1-14-25_long_form.csv
7.03 KB
-
Uta_RNAseq_data_haplotype_specific_expression_1-14-25_skin_comparison.csv
2.08 KB
-
Uta_RNAseq_data_haplotype_specific_expression_1-14-25.csv
4.06 KB
-
Uta_RNAseq_samples_with_SRA_accession_numbers.xlsx
24.39 KB
-
Uta_SPR_exon1_CDS_analyses_final.R
6.45 KB
-
Uta_stansburiana_size_morph_and_genotype_data.csv
5.46 KB
Abstract
Side-blotched lizards (Uta stansburiana) play a biological rock-paper-scissors game in which three differently colored male morphs utilize alternative mating strategies. This polymorphsim was previously posited to arise from three alleles at one locus. We identified the genetic basis of this polymorphism, using genome-wide association studies. Orange usurper and blue mate-guarder morphs were associated with two divergent haplotypes in the regulatory region of the sepiapterin reductase gene, but yellow sneaker morphs appear to arise via phenotypic plasticity from the same genetic background as blue-morphs. We aligned genome assemblies for an orange and blue-morph to determine that the associated genetic region was not in a chromosomal inversion. We studied the genetic diversity of the morph-associated region and found that it had elevated nucleotide diversity, consistent with balancing selection. We used RNA-seq data to show that there were difference in gene expression among the morphs. We tested the predictions of the two and three-allele models against data from field surveys of the polymorphism and found that the two-allele model provided a better fit to the field data.
We updated the models of the maintenance of the polymorphism to fit our new genetic understanding of the system. We found that a variety of different models maintained the polymorphism, some of which gave rise to frequency-cycles. We also conducted a game theoretical analysis of the conditions that lead to a stable polymorphism. In addition, we added genetic drift to our population genetic models to study how well they are able to maintain the polymorphism. Our simulations showed that rock-paper-scissors dynamics can better maintain a polymorphism with a genetic system of two-alleles plus plasticity than with a three-allele system. This form of balancing selection that combines genetic determination with phenotypic plasticity expands the possibilities for how stable polymorphisms arise in nature.
Dataset DOI: 10.5061/dryad.4qrfj6qq7
Description of the data and file structure
We investigated the genetic basis of the three throat-color morphs (orange, blue, and yellow) found in side-blotched lizards that correspond to different male mating strategies. To do this, we conducted analyses of the morph phenotypes, genetic association studies, comparisons between different reference genomes, gene expression (RNAseq) analyses, analyses of genetic diversity, analyses of recombination, phylogenetic analyses, and analyses of the frequencies of the morphs in the field. We found that a single region upstream of the gene for sepiapterin reductase (SPR) is associated with the morph phenotypes. However, blue and yellow-morphs appear to arise via phenotypic plasticity from the same genetic background. We also conducted multiple simulations of the maintenance of the polymorphism based on our new understanding that the three color morphs arise from both genetic differences and phenotypic plasticity. We found that the polymorphism is better maintained with our new model of two alleles with plasticity than with previously proposed three-allele model.
Files and variables
The data and code for analyzing them have been split into general categories of analysis. The categories are: 1) analyses of the morph phenotypes, 2) tests of correlations among morphs, 3) genetic data and analyses, 4) genomic association studies, 5) the alignment of genomic contigs/scaffolds, 6) gene expression (RNA-seq) data, 7) data for phylogenetic analyses and visualization of data on the phylogeny, 8) simulations using different models for the maintenance of the morphs, and 9) game theory analysis of the maintenance of the morphs.
Files and Folders
#------------------------------
Data for analyses of the morph phenotypes
SPR_genotypes_for_yellow_morphs.txt
This is data on the yellow-morph lizards that had data collected on them with a spectrometer. The columns are: ID = field code of the lizard, morph.score = a yellow morph (yy), spr.genotype = genotype of the SPR region, spr.b.haplotype = whether or not the lizard had a B-haplotype, tube.number = the lab code of the lizard, collection.group = the set of samples in which the color measurements were taken. The file is analyzed by the script: Fig_S1E_F_Uta_spec_data_analyses_final.R
Uta_color_data_2013_2019_paired_down.csv
This is a data file of the color measurements made on the throats of the lizards. The columns are: filenames = name of the original file,Year = year the data was collected, Lizard_code = the field code of the lizard, Sex = the sex of the lizard, Morph = the throat color morph, Throat_region = the throat region that the spec measurement was taken on (see Fig S1D of the paper), Spec_type = which spectrometer was used for the measurements, Observer = setting of the spectrometer, Illuminant = setting of the spectrometer, Color.Mode = setting of the spectrometer, and the last six variables are the color measurements in Commission Internationale de l'Éclairage (CIE) XYZ and xyz color space. The file is analyzed by the script: Fig_S1E_F_Uta_spec_data_analyses_final.R
Uta_stansburiana_size_morph_and_genotype_data.csv
This is a data file of body size information for the Uta stanburiana morphs. The columns in the file are: Name = field sample name, DNA_tube = lab sample name, Year = year the sample was collected, Date = date of sample collection, Sex = sex of the individual, SVL = snout vent length in millimeters, Weight = weight in grams, Morph = throat color morph, SPR_region_genotype = genotype of the SPR-region haplotypes. It is analyzed by the script:
Fig_S7_Morphology_by_morph_and_genotype_final.R
#--------------------------------
Data for tests of correlations among morphs
Three_allele_model_N250_t10000_interval100_morph_numbers_and_frequency.txt
This file is used for predictions of morph frequencies based on the three allele model. This file was generated with the script: Fig_3AB_S11_Malegame_w_3_allele_model_stochastic_to_get_number_of_each_morph.R. The simulations of the three allele model were done with a population size of 250 individuals. It is analyzed by: Fig_3AB_S11_Correlations_among_morphs_final.R
Uta_field_data_on_morphs.csv
This is field data for the morphs and their frequencies first reported in (Friedman et al. 2017). The columns of the data are: World = the rock outcrop population of lizards that was sampled, year = the year of sampling, the next six columns score the number of individuals with different color phenotypes with oo = an all orange throat, ob = an orange and blue throat, yo = a yellow and orange throat, bb = an all blue throat, by = blue and yellow throat, and yy = an all yellow throat. The next three columns are the numbers of each of the three morphs with N_orange_morph = oo, ob, + oy phenotypes; N_blue_morph = bb phenotype, and N_yellow_morph = yy + by phenotypes, N_territorial = the number of territorial individuals, which is N_blue_morph + N_orange_morph, Freq_O_Morph = frequency of the orange morph, Freq_B_Morph = frequency of the blue morph, Freq_Y_Morph = frequency of the yellow morph, N_males = the total number of males. It is analyzed by: Fig_3AB_S11_Correlations_among_morphs_final.R
#-------------------------------------
Genetic data and data for genetic analyses
Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta
This is an alignment of the SPR haplotypes. It is analyzed by the script: Fig_2ABC_S7AB_Fixed_difs_shared_polymorphism_variation_and_genetic_diversity_across_haplotypes_final.R
SPR_CDS_diploid_sequences_name_and_ID.txt
This provide information on the morph and ID associated with the exon1 amino acid sequences. It has two columns: morph = color morph with bb = blue morph, oo = orange morph, and yy = yellow morph, ID = the lab ID of the sample. This file is analyzed by: Uta_SPR_exon1_CDS_analyses_final.R
SPR_exon1_amino_acid_alignment_diploid_data.fasta
This is an alignment of SPR exon1 amino acid from 27 orange and 32 blue-morphs that each had data for two PacBio haplotypes, which allowed us to score the diploid genotypes of the structural variants for these individuals. This file is analyzed by: Uta_SPR_exon1_CDS_analyses_final.R
Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta_snps_LD_output_no.r2.limit.txt
This is output from quickLD (87) which gave the r2 measure of linkage disequilibrium that was calculated in with ploidy set to haploid as appropriate for the haplotype data of the SPR-haplotypes. Linkage disequilibrium was only measured for biallelic SNPs and the data was filtered to include only SNPs with a minimum frequency of 0.2. The columns of relevance in the file are POS2 = position of one of the polymorphic positions, POS1 = position of the other polymorphic position, freq.2. = frequency of the POS2 polymorphism, freq.1. = frequency of the POS1 polymorphism, LD = linkage disequilibrium value with the r2 metric. This file is analyzed by the script: Fig_S8_LD_heatmap_Uta_amplicon_data_imported_LD_values_final.R
NGSrelate_output_for_oo_bb_yy_Revio_map_maf_at_least_0.4_and_100_individuals_rab_greater_than_0.01.xlsx
This is output from the program NGSrelate (Korneliussen and Moltke 2015), which was used to calculate relatedness among all the genome resequencing samples. Information on the columns can be found at. https://github.com/ANGSD/NgsRelate. The data has been sorted so that the most related pairs are at the top of the matrix according to the rab metric, which is the pairwise relatedness. The file has been paired down to only include comparisons with rab values 0.01 and greater.
oo_bb_yy_Revio_mapping_w_standard_filters_but_no_baq_no_LB367_Folded_first_SFS_all_contigs_thetasWindow10000_step10000.pestPG
This is data on genetic diversity for the whole genome as output by ANGSD (Korneliussen et al. 2014). It is analyzed by: Fig_2D_Genetic_diversity_plot_final.R A description of the output and the columns in it can be found at: https://www.popgen.dk/angsd/index.php/Thetas,Tajima,Neutrality_tests
PCA_info_for_genome_samples_no_LB367.txt
This file provides information on the samples analyzed in a Principal Component Analysis (PCA) performed on the genomic resequencing data of 99 samples (i.e. 44 orange, 45 blue, and 10 yellow-morphs) that was conducted with the ngsCovar function of ngsTools (Fumagalli et al. 2014). It is analyzed by the script Fig_S2_Genome_PCA_analyses_99individuals_final.r. The fields in the file are: 1) library = the lab name of the genetic sample, 2) morph = a two letter code for the type of color morph with bb = blue, oo = orange, yy = yellow, 3) tube = a shortened version of the library name, 4) year_collected = the year the sample was collected, and 5) morph_color = the color morph phenotype assocatiated with the sample.
PCA_w_filters_oo_bb_yy_Revio_map_noLB367_10-4-24_output.covar
This is the output of a Principal Component Analysis (PCA) performed on the genomic resequencing data of 99 samples (i.e. 44 orange, 45 blue, and 10 yellow-morphs) that was conducted with the ngsCovar function of ngsTools (Fumagalli et al. 2014). It is analyzed by the script Fig_S2_Genome_PCA_analyses_99individuals_final.r.
#------------------------------------
Data used in the genomic association studies
Contig_lengths_Uta_Revio_genome_2-6-24_UTA.bp.p_ctg.txt
This is information on the size of the orange morph genome’s contigs. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the file are: Scaffold_name = name of the contig in the orange morph genome, Length = size of the contig, Original_scaffold_number = number associated with the original contig, Original_order = order of the original contig, and Scaffold_size_ordered = new order of the contigs based on size, with the contigs ordered from large to small.
bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
This is the output of the ANGSD association test of blue versus yellow morphs with data mapped to the orange-morph genome. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
This is the output of the ANGSD association test of blue versus blue morphs with data mapped to the orange-morph genome but paired down to the top million SNPs for ease of plotting. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
This is the output of the ANGSD association test of orange versus blue morphs with data mapped to the blue-morph genome. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
This is the output of the ANGSD association test of orange versus blue morphs with data mapped to the blue-morph genome but paired down to the top million SNPs for ease of plotting. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
This is the output of the ANGSD association test of orange versus blue morphs with data mapped to the orange-morph genome. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
This is the output of the ANGSD association test of orange versus blue morphs with data mapped to the orange-morph genome but paired down to the top million SNPs for ease of plotting. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
This is the output of the ANGSD association test of orange versus yellow morphs with data mapped to the orange-morph genome. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
This is the output of the ANGSD association test of orange versus yellow morphs with data mapped to the orange-morph genome but paired down to the top million SNPs for ease of plotting. It is analyzed by Uta_Revio_scaffold_plots_final2.r. The columns in the data are Chromosome = the contig/scaffold name, Position = position on that contig/scaffold, Major = major allele, Minor = minor allele, Frequency = frequency of the minor allele, and LRT = likelihood ratio test statistic of the association.
#-------------------------------------
Data used from the alignment of genomic contigs/scaffolds
scaffold2135_to_ptg000029l_Revio_scaffold_reverse_complement.strandplus.rdotplot2.txt
This is the output of the program Lastz (Harris 2007) comparing the full blue-morph genome scaffold 2135 to the full contig ptg000029l in the orange-morph genome. It is analyzed by the script: Fig_1D_S6_Lastz_dotplot_final.R
scaffold2135_to_ptg000029l_each_10kb_around_SPR_region.strandplus.rdotplot.txt
This is the output of the program Lastz (Harris 2007) comparing the blue-morph genome scaffold 2135 to the contig ptg000029l in the orange-morph genome. It is restricted to the SPR region data with 10000 bp from the SPR CDS and the beginning of the divergent region. It is analyzed by the script: Fig_1D_S6_Lastz_dotplot_final.R
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
This is the output of the program Lastz (Harris 2007) comparing the Uta stansburiana orange morph genome contig with SPR to a 1 MB region from Sceloporus undulatus with the SPR gene. The 1 MB region in the Sceloporus undulatus genome contains SPR at positions 627655-632064. The SPR gene is at positions 192,627,655-192,632,06 in the full Chromosome 5 of the Sceloporus genome, so 192 MB has to be added to the coordinates of the 1 MB region to get the position in Chromosome 5. It is analyzed by the script: Fig_1D_S6_Lastz_dotplot_final.R
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
This is the output of the program Lastz (Harris 2007) comparing the Uta stansburiana orange morph genome contig with SPR to a 1 MB region from Sceloporus undulatus with the SPR gene. The alignment has been paired down to 15000 bp upstream (to account for ~10000 upstream associated region) and 5000 bp downstream of the SPR CDS. It is analyzed by the script: Fig_1D_S6_Lastz_dotplot_final.R
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
This is the output of the program Lastz (Harris 2007) comparing the Uta stansburiana blue-morph genome scaffold 2135 (that has SPR) with two gaps filled in by the PacBio amplicon data that has been aligned to the Sceloporus undulatus genome, with the 1 MB region containing SPR. It is analyzed by the script: Fig_1D_S6_Lastz_dotplot_final.R
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
This is the output of the program Lastz (Harris 2007) comparing the Uta stansburiana blue-morph genome scaffold 2135 with SPR to a 1 MB region from Sceloporus undulatus with the SPR gene. The alignment has been paired down to 15000 bp upstream (to account for ~10000 upstream associated region) and 5000 bp downstream of the SPR CDS. It is analyzed by the script: Fig_1D_S6_Lastz_dotplot_final.R
#-------------------------------------
Gene expression (RNA-Seq) data
Fig_2E_S9_Uta_male_RNAseq_data.csv
This is the RNAseq data for the side-blotched lizard morphs. The columns in the file are: Sample = a long sample name, Number_of_input_reads = the total number of reads in the RNA data for that individual, Uniquely_mapped_reads_number = the number of reads that mapped uniquely, SPR = the number of reads mapping to the SPR gene, SPR_fragments_per_million = the SPR column divided by Uniquely_mapped_reads_number and then times 1 million (this gives a standardized metric of gene expression), morph = the throat color morph, tissue = brain or skin tissue, skin_type = pigmented or unpigmented skin, ID = the field ID of the lizard, Pacbio_tube = the lab ID of the sample, genotype = the SPR region genotype, RIN = the RIN score of RNA integrity of the sample, alternative_skin_sample_SPR_fragments_per_million = the SPR_fragments_per_million for the other skin sample from that individual (ie for pigmented skin it is the expression of the unpigmented skin and for the unpigmented skin it is the expression of the pigmented skin), SPR_fragments_per_million_divided_by_unpigmented = the ratio of expression of in the pigmented skin divided by the expression in the unpigmented skin. This data is analyzed by the script: Fig_2E_S9_Uta_SPR_region_RNAseq_final.R
Uta_RNAseq_data_haplotype_specific_expression_1-14-25_long_form.csv
This is the haplotype-specific RNAseq data for the side-blotched lizard morphs in a long form for analyses. The columns in the file are: Sample = a long sample name, Number_of_input_reads = the total number of reads in the RNA data for that individual, Uniquely_mapped_reads_number = the number of reads that mapped uniquely, haplo_SPR_exon1_and_upstream = the number of reads mapping to the haplotype, haplotype = whether the haplotype is the blue or orange haplotype, haplo_fragments_per_million = the haplo_SPR_exon1_and_upstream column divided by Uniquely_mapped_reads_number and then times one million (this gives a standardized metric of gene expression), SPR_fragments_per_million = the standardized metric of gene expression for the SPR gene for the non-haplotype specific mapping (included for reference), morph = the throat color morph, sex = the sex of the individual, tissue = brain or skin tissue, skin_type = pigmented or unpigmented skin, ID = the field ID of the lizard, Pacbio_tube = the lab ID of the sample, genotype = the SPR region genotype. This data is analyzed by the script: Fig_2F_S9_Uta_SPR_haplotype_specific_expression final.R
Uta_RNAseq_data_haplotype_specific_expression_1-14-25_skin_comparison.csv
This is the haplotype-specific RNAseq data for the side-blotched lizard morphs, which has been formatted to facilitate comparisons in expression between pigmented/colored skin and unpigmented/non-colored skin. The columns in the file are: Sample = a long sample name for colored skin data, Number_of_input_reads = the total number of reads in the RNA data for that individual for colored skin data, Uniquely_mapped_reads_number = the number of reads that mapped uniquely for colored skin data, blue_haplo_SPR_exon1_and_upstream = the number of reads mapping to the blue haplotype for colored skin data, orange_haplo_SPR_exon1_and_upstream = the number of reads mapping to the orange haplotype for colored skin data, blue_haplo_fragments_per_million = the blue_haplo_SPR_exon1_and_upstream column divided by Uniquely_mapped_reads_number and then times 1 million (this gives a standardized metric of gene expression) for colored skin data, orange_haplo_fragments_per_million = the orange_haplo_SPR_exon1_and_upstream column divided by Uniquely_mapped_reads_number and then times 1 million (this gives a standardized metric of gene expression) for colored skin data, Sample _w_skin = a long sample name for noncolored skin data, Number_of_input_reads _w_skin = the total number of reads in the RNA data for that individual for noncolored skin data, Uniquely_mapped_reads_number _w_skin = the number of reads that mapped uniquely for noncolored skin data, blue_haplo_SPR_exon1_and_upstream _w_skin = the number of reads mapping to the blue haplotype for noncolored skin data, orange_haplo_SPR_exon1_and_upstream _w_skin = the number of reads mapping to the orange haplotype for noncolored skin data, blue_haplo_fragments_per_million_w_skin = the blue_haplo_SPR_exon1_and_upstream _w_skin column divided by Uniquely_mapped_reads_number_w_skin and then times 1 million (this gives a standardized metric of gene expression) for noncolored skin data, orange_haplo_fragments_per_million _w_skin = the orange_haplo_SPR_exon1_and_upstream _w_skin column divided by Uniquely_mapped_reads_number _w_skin and then times 1 million (this gives a standardized metric of gene expression) for noncolored skin data, blue_haplo_fragments_per_million_p_skin_divided_by_w_skin = blue_haplo_fragments_per_million divided by blue_haplo_fragments_per_million_w_skin, orange_haplo_fragments_per_million_p_skin_divided_by_w_skin = orange_haplo_fragments_per_million divided by orange_haplo_fragments_per_million_w_skin, morph = the throat color morph, ID = the field ID of the lizard, Pacbio_tube = the lab ID of the sample, genotype = the SPR region genotype. This data is analyzed by the script: Fig_2F_S9_Uta_SPR_haplotype_specific_expression final.R
Uta_RNAseq_data_haplotype_specific_expression_1-14-25.csv
This is the haplotype-specific RNAseq data for the side-blotched lizard morphs. The columns in the file are: Sample = a long sample name, Number_of_input_reads = the total number of reads in the RNA data for that individual, Uniquely_mapped_reads_number = the number of reads that mapped uniquely, blue_haplo_SPR_exon1_and_upstream = the number of reads mapping to the blue haplotype, orange_haplo_SPR_exon1_and_upstream = the number of reads mapping to the orange haplotype, blue_haplo_fragments_per_million = the blue_haplo_SPR_exon1_and_upstream column divided by Uniquely_mapped_reads_number and then times 1 million (this gives a standardized metric of gene expression), orange_haplo_fragments_per_million = the orange_haplo_SPR_exon1_and_upstream column divided by Uniquely_mapped_reads_number and then times 1 million (this gives a standardized metric of gene expression), blue_minus_orange_counts = blue_haplo_SPR_exon1_and_upstream minus orange_haplo_SPR_exon1_and_upstream, blue_divided_by_orange_counts = blue_haplo_SPR_exon1_and_upstream divided by orange_haplo_SPR_exon1_and_upstream, SPR_fragments_per_million = the standardized metric of gene expression for the SPR gene for the non-haplotype specific mapping (included for reference), morph = the throat color morph, sex = the sex of the individual, tissue = brain or skin tissue, skin_type = pigmented or unpigmented skin, ID = the field ID of the lizard, Pacbio_tube = the lab ID of the sample, genotype = the SPR region genotype. This data is analyzed by the script: Fig_2F_S9_Uta_SPR_haplotype_specific_expression final.R
#--------------------------------
Data for phylogenetic analyses
SPR_oo_bb_yy_haplotypes_noLB367.fasta
This is and alignment of 156 haplotypes of the SPR region that was used to infer a phylogeny by RAxML version 8.2.12 (Stamatakis 2014).
SPR_oo_bb_yy_haplotypes_transcriptome_individuals.fasta
This is an alignment of 14 SPR-region haplotypes each from orange, blue, and yellow-morph individuals (42 haplotypes total) that was used to infer a phylogeny by RAxML version 8.2.12 (Stamatakis 2014).
SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.small.tree.csv
This is the data set that is used with the phylogeny for the 14 SPR-region haplotypes each from orange, blue, and yellow-morph individuals (42 haplotypes total). The columns in the file are: name = name of the haplotype, morph = a two letter code for the type of color morph with bb = blue, oo = orange, yy = yellow, color = the color used for plotting, sample_name = genetic sample ID, colored_skin_haplo_fragments_per_million = this is a standardized metric of the haplotype-specific gene expression of the SPR gene, colored_skin_log2_haplo_fragments_per_million = this is a log2 transformation of colored_skin_haplo_fragments_per_million, Genotype = genotype at the SPR region, name_and_genotype = combination of the sample name and genotype, haplotype = haplotype 1 or 2 for an individual, name_and_genotype_and_haplotype = combination of the sample name, haplotype, and genotype. It is used in the script: Fig_2G_S9i_S10_Mapping_morphs_on_Pacbio_tree_final.R
SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.w.genotypes_noLB367.csv
This is the data set that is used with the phylogeny for the 156 haplotypes of the SPR region. The columns in the file are: name = name of the haplotype, morph = a two letter code for the type of color morph with bb = blue, oo = orange, yy = yellow, color = the color used for plotting, sample_name = genetic sample ID, genotype = genotype at the SPR region, name_and_genotype = combination of the sample name and genotype, haplotype = haplotype 1 or 2 for an individual, name_and_genotype_and_haplotype = combination of the sample name, haplotype, and genotype. It is used in the script: Fig_2G_S9i_S10_Mapping_morphs_on_Pacbio_tree_final.R
RAxML_bestTree.SPR_oo_bb_yy_haplotypes_noLB367_rooted.tre
This is a phylogeny for the 156 haplotypes of the SPR region that was inferred by RAxML version 8.2.12 (Stamatakis 2014). It is used in the script: Fig_2G_S9i_S10_Mapping_morphs_on_Pacbio_tree_final.R
RAxML_bestTree.SPR_oo_bb_yy_haplotypes_transcriptome_individuals.rooted.tre
This is a phylogeny for 14 SPR-region haplotypes each from orange, blue, and yellow-morph individuals (42 haplotypes total) that was inferred by RAxML version 8.2 (Stamatakis 2014). It is used in the script: Fig_2G_S9i_S10_Mapping_morphs_on_Pacbio_tree_final.R
#----------------------------------
Data associated with the simulations using different models for the maintenance of the morphs.
Stochastic_trials_w_alleles_lost.zip
This folder holds the output from different simulations of morph maintenance with genetic drift and different populations sizes. The output comes from the different scripts that simulate a model under genetic drift with varying population size (eg Fig_3F_S16_Two_allele_model_with_fixed_p_with_genetic_drift_and_varying_population_size.R). The folder has been zipped and needs to be unzipped before being used in an R-script. The files in this folder are used by the script: Fig_3F_S16_Plotting_stochastic_trials_and_alleles_lost.R
Allele_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo1996.txt
This is the output of the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_1996_payoff_matrix.R. The output was saved to record the stochastic simulation that underlay Fig. 13H in the paper. The columns are frequency = the allele frequency, phenotype.frequency = the frequency of the morph phenotype, allele = which allele is being considered, and years = the generation number of the simulation. This data is used to plot the allele frequencies by the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_1996_payoff_matrix.R.
Allele_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo2001.txt
This is the output of the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_2001_payoff_matrix.R. The output was saved to record the stochastic simulation that underlay Fig. 13G in the paper. The columns are frequency = the allele frequency, phenotype.frequency = the frequency of the morph phenotype, allele = which allele is being considered, and years = the generation number of the simulation. This data is used to plot the allele frequencies by the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_2001_payoff_matrix.R.
Phenotype_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo1996.txt
This is the output of the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_1996_payoff_matrix.R. The output was saved to record the stochastic simulation that underlay Fig. 13H in the paper. The columns are frequency = the allele frequency, phenotype.frequency = the frequency of the morph phenotype, allele = which allele is being considered, and years = the generation number of the simulation. This data is used to plot the phenotype frequencies by the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_1996_payoff_matrix.R.
Phenotype_freq_stochastic_model_optimized_p_N300_t50_orange_0.5_start_Sinervo2001.txt
This is the output of the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_2001_payoff_matrix.R. The output was saved to record the stochastic simulation that underlay Fig. 13G in the paper. The columns are frequency = the allele frequency, phenotype.frequency = the frequency of the morph phenotype, allele = which allele is being considered, and years = the generation number of the simulation. This data is used to plot the phenotype frequencies by the script: Fig_S13_Two_allele_model_with_optimized_p_Sinervo_2001_payoff_matrix.R.
Code/software
All the code is written in the R language and environment for statistical computing (https://www.r-project.org/). The scripts are known to run with R version 4.4.1 (2024-06-14). In general, all the scripts depend on the R-packages ggplot2, ggthemes, and gridExtra. Additional dependencies are noted after each script. Within each script are annotations of what is being done within the script and details of the analyses or visualization of the data. Scripts that generate figures have a name that begins with the figure numbers associated with the script.
#--------------------------
Analyses of the morph phenotypes
Fig_S1E_F_Uta_spec_data_analyses_final.R
This script analyzes the coloration data taken by a spectrometer on four regions on the throat of the lizard. It depends on the PAVO (52) and car (Fox and Weisberg 2011) R-packages. It utilizes two data files: Uta_color_data_2013_2019_paired_down.csv and SPR_genotypes_for_yellow_morphs.txt.
Fig_S7_Morphology_by_morph_and_genotype_final.R
This script analyzes the body size data and conducts comparisons among different morphs and SPR genotypes. It depends on the car (Fox and Weisberg 2011), multcomp (Hothorn et al. 2008), and dplyr (Wickham et al. 2025)R-packages. It analyzes data in the file: Uta_stansburiana_size_morph_and_genotype_data.csv
#--------------------------------
Tests of correlations among morphs
Fig_3AB_S11_Malegame_w_3_allele_model_stochastic_to_get_number_of_each_morph.R
This simulated the three allele model to determine the expected correlation in the numbers of morphs. The simulations of the three allele model were done with a population size of 250 individuals. The output was the file: Three_allele_model_N250_t10000_interval100_morph_numbers_and_frequency.txt
Fig_3AB_S11_Correlations_among_morphs_final.R
This script tests for correlations among the observed frequencies of the morphs, plots the observed and predicted correlations among morphs, and calculates summary statistics on the proportions of morph phenotypes. It depends on the car (Fox and Weisberg 2011) R-package. It requires two files: Uta_field_data_on_morphs.csv and Three_allele_model_N250_t10000_interval100_morph_numbers_and_frequency.txt
#-------------------------------------
Analyzes of the genetic data
Blue_and_yellow_morph_with_genotypes_exact_test_final.R
This tests whether BB and OB genotypes differ in producing blue and yellow morphs. It uses a Fisher’s Exact Test.
Uta_SPR_exon1_CDS_analyses_final.R
This script conducts Fisher's Exact Tests on the three structural variants in exon1 of the SPR gene and on one position in the upstream region this indicative of the upstream haplotypes. It depends on the R package ape (Paradis and Schliep 2019). It utilizes two data files: SPR_CDS_diploid_sequences_name_and_ID.txt and SPR_exon1_amino_acid_alignment_diploid_data.fasta
Fig_1D_S6_Lastz_dotplot_final.R
This script analyzes the output of Lastz (Harris 2007) alignments of different genomic regions. . It utilizes the following six data files:
scaffold2135_to_ptg000029l_Revio_scaffold_reverse_complement.strandplus.rdotplot2.txt
scaffold2135_to_ptg000029l_each_10kb_around_SPR_region.strandplus.rdotplot.txt
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
ptg000029l_reverse_complement_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot.txt
scaffold2135_improved_with_Pacbio_data_for_lastz_alignment_to_Sceloporus_undulatus_chromosome5_192000001-193000000_NC_056526.1.strandplus.rdotplot_paired_down_to_15000bp_upstream_5000bp_downstream_Uta_SPR_CDS_region.txt
Fig_2ABC_S7AB_Fixed_difs_shared_polymorphism_variation_and_genetic_diversity_across_haplotypes_final.R
This script analyzes the alignment of the SPR haplotypes. It determines the number of fixed differences and shared polymorphisms between the haplotypes. It also measures genetic diversity in windows across the haplotypes. It also visualizes each position of the haplotype alignment to show whether the position is: 1) an invariant position, 2) a gap position, 3) a position unique to either the O or B-haplotypes, 4) a fixed difference, 5) a shared polymorphism, or 6) a complex variable position. It utilizes the following data file:
Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta
Fig_2D_Genetic_diversity_plot_final.R
This script analyzes data on genetic diversity for the whole genome as output by ANGSD (Korneliussen et al. 2014). It analyzes data in the file: oo_bb_yy_Revio_mapping_w_standard_filters_but_no_baq_no_LB367_Folded_first_SFS_all_contigs_thetasWindow10000_step10000.pestPG.
A description of the output can be found at: https://www.popgen.dk/angsd/index.php/Thetas,Tajima,Neutrality_tests
Fig_S2_Genome_PCA_analyses_99individuals_final.R
This script analyzes a Principal Component Analysis (PCA) performed on the genomic resequencing data of 99 samples (i.e. 44 orange, 45 blue, and 10 yellow-morphs) that was conducted with the ngsCovar function of ngsTools (Fumagalli et al. 2014). It utilizes two data files: PCA_w_filters_oo_bb_yy_Revio_map_noLB367_10-4-24_output.covar and PCA_info_for_genome_samples_no_LB367.txt
Fig_S8_LD_heatmap_Uta_amplicon_data_imported_LD_values_final.R
This script plots a matrix of linkage disequilibrium among SNPs with LDheatmap (Shin et al. 2006). It depends on the LDheatmap R-package. It analyzes the data in: Best_oo_bb_yy_alignment_w_all_gaps_no_scaffold2135_no_Revio_ref_no_LB367_10-1-24.fasta_snps_LD_output_no.r2.limit.txt
#--------------------------
Genome wide association study analyses
Fig_1B_S3_S4_Uta_Revio_scaffold_plots_final.R
This script makes Manhattan plots of the association test results from ANGSD. It also makes qqplots of the distribution of the P-values. It depends on the R-package dplyr (Wickham et al. 2025). It utilizes eight data files: bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
bb_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
Contig_lengths_Uta_Revio_genome_2-6-24_UTA.bp.p_ctg.txt
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
oo_bb_BGIgenome_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
oo_bb_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.gz
oo_yy_Revio_association_test_extra_filters_and_PCs_noLB367_output.lrt0.sorted.top.1000000.snps.txt
#-------------------------------------
Analyzes of the gene expression (RNA-Seq) data
Fig_2E_S9_Uta_SPR_region_RNAseq_final.R
This script analyses the RNAseq data for the side-blotched lizard morphs. It depends on the car (Fox and Weisberg 2011), multcomp (Hothorn et al. 2008), and scales (Wickham et al. 2024) R-packages. It analyzes data in the file: Fig_2E_S9_Uta_male_RNAseq_data.csv
Fig_2F_S9_Uta_SPR_haplotype_specific_expression_final.R
This script conducts analyses of the haplotype-specific expression of the SPR gene in side-blotched lizards. It depends on the scales R-package. It analyzes data in three files: Uta_RNAseq_data_haplotype_specific_expression_1-14-25_long_form.csv
Uta_RNAseq_data_haplotype_specific_expression_1-14-25_skin_comparison.csv
Uta_RNAseq_data_haplotype_specific_expression_1-14-25.csv
#-------------------------------------
Visualizing data on a phylogeny
Fig_2G_S9i_S10_Mapping_morphs_on_Pacbio_tree_final.R
This script plots the phylogenies of the SPR-region haplotypes. The color morph of the individual lizard associated with each haplotype is plotted on the tips of the phylogeny. It also plots haplotype-specific expression of SPR data on to the phylogeny. It depends on phytools (Revell 2012), ape (Paradis and Schliep 2019), and plotrix (Lemon 2006) R-packages. It reads in four files: RAxML_bestTree.SPR_oo_bb_yy_haplotypes_noLB367_rooted.tre, SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.w.genotypes_noLB367.csv, RAxML_bestTree.SPR_oo_bb_yy_haplotypes_transcriptome_individuals.rooted.tre, SPR_oo_bb_yy_haplotypes.fasta.names.and.morph.small.tree.csv
#---------------------------------
Simulations using different models for the maintenance of the morphs.
Fig_3C_Two_allele_model_with_plasticity_final.R
This simulates the one locus, two allele model with plasticity. Number of yellow-morphs is determined by the a fixed proportion of the individuals with the blue haplotype.
Fig_3D_S12_Two_allele_model_with_plasticity_varying_the_proportion_of_yellow_morphs_final.R
This simulated the one locus, two allele model with plasticity. Number of yellow-morphs is determined by the a fixed proportion of the individuals with the blue haplotype. This version of the model looks at the effects of varying the proportion of yellow-morphs across different simulations.
Fig_3E_Fig_S14_Two_allele_model_with_semi-optimized_p_Sinervo1996_payoff_matrix.R
This simulates the model of one locus with two alleles with plasticity determined by the a proportion of individuals with the blue haplotype. This version of the model looks at the effects of having a semi-optimized proportion of yellow and blue-morphs. A semi-optimized proportion means that the proportion is optimized to where the fitness of blue and yellow morphs are equal, but this value is adjusted by a constant to maintain a certain proportion of one morph. It uses the Sinervo 1996 payoff matrix (ie territory overlap fitness matrix).
Fig_3F_S16_Plotting_stochastic_trials_and_alleles_lost.R
This script plot the outcomes of the stochastic models looking at the stability of the two-allele model with plasticity with different levels of the p parameter. It also plots the comparative stability of the three-allele model and of the two-allele model in which an optimal level of the p parameter is calculated in each generation. It depends on all the files in the folder: Stochastic_trials_w_alleles_lost
Fig_3F_S16_Three_allele_model_with_genetic_drift_and_varying_population_size.R
This simulates the model based on one locus with three alleles for the determination of the three morphs. This is a stochastic model that includes genetic drift.
Fig_3F_S16_Two_allele_model_with_fixed_p_with_genetic_drift_and_varying_population_size.R
This simulated the model of one locus with two alleles with the numbers of blue and yellow morphs plasticity determined by the a proportion of individuals with the blue haplotype.
This version of the model looks at the effects of having fixed proportion of yellow and blue-morphs across generations. It looks at the stability of this model with genetic drift and different population sizes. It uses either the Sinervo 2001 payoff matrix (ie paternity data fitness matrix) with the time parameter is set to 1000 or the Sinervo and Lively 1996 payoff matric (ie territory overlap fitness matrix) with the time parameter set to 100. This is a stochastic model that includes genetic drift.
Fig_3F_S16_Two_allele_model_with_optimized_p_with_genetic_drift_and_varying_population_size_Sinervo_2001_payoff_matrix.R
This simulates the model of one locus with two alleles with the numbers of blue and yellow morphs plasticity determined by the a proportion of individuals with the blue haplotype.
This version of the model looks at the effects of having an optimized proportion of yellow and blue-morphs. It looks at the stability of this model with genetic drift and different population sizes. It uses the Sinervo 2001 payoff matrix (ie paternity data fitness matrix). This is a stochastic model that includes genetic drift.
Fig_S13_Two_allele_model_with_optimized_p_Sinervo_1996_payoff_matrix.R
This simulates the model of one locus with two alleles with plasticity determined by the a proportion of individuals with the blue haplotype. This version of the model looks at the effects of having an optimized proportion of yellow and blue-morphs. An optimized proportion means that the proportion is optimized to where the fitness of blue and yellow morphs are equal. It uses the Sinervo 1996 payoff matrix (ie territory overlap fitness matrix). This can be a deterministic or stochastic model with genetic drift.
Fig_S13_Two_allele_model_with_optimized_p_Sinervo_2001_payoff_matrix.R
This simulates the model of one locus with two alleles with plasticity determined by the a proportion of individuals with the blue haplotype. This version of the model looks at the effects of having an optimized proportion of yellow and blue-morphs. An optimized proportion means that the proportion is optimized to where the fitness of blue and yellow morphs are equal. It uses the Sinervo 2001 payoff matrix (ie paternity data fitness matrix). This can be a deterministic or stochastic model with genetic drift.
Fig_S14_Three_allele_model_deterministic_version.R
This simulates the model based on one locus with three alleles for the determination of the three morphs. This is a deterministic model.
Fig_S14_Two_allele_model_with_semi-optimized_p_Sinervo2001_payoff_matrix.R
This simulates the model of one locus with two alleles with plasticity determined by the a proportion of individuals with the blue haplotype. This version of the model looks at the effects of having a semi-optimized proportion of yellow and blue-morphs. A semi-optimized proportion means that the proportion is optimized to where the fitness of blue and yellow morphs are equal, but this value is adjusted by a constant to maintain a certain proportion of one morph. It uses the Sinervo 2001 payoff matrix (ie paternity data fitness matrix).
Fig_S16_Two_allele_model_with_optimized_p_with_genetic_drift_and_varying_population_size_Sinervo_1996_payoff_matrix.R
This simulates the model of one locus with two alleles with the numbers of blue and yellow morphs plasticity determined by the a proportion of individuals with the blue haplotype.
This version of the model looks at the effects of having an optimized proportion of yellow and blue-morphs. It looks at the stability of this model with genetic drift and different population sizes. It uses the Sinervo and Lively, 1996 payoff matrix (ie territory overlap fitness matrix).
This is a stochastic model that includes genetic drift.
#------------------------
Game theory analysis of the maintenance of the morphs
Fig_S15_Uta_ESS_plot_final.R
This script generates a plot of the allele and phenotype frequencies at a certain proportion of yellow-morphs based on the equations of the game theory ESS analysis in the paper (i.e. it generates Figure S15).
Citations
Fox, J., and S. Weisberg. 2011. An {R} companion to applied regression. Third. Sage, Thousand Oaks, CA.
Friedman, D., J. Magnani, D. Paranjpe, and B. Sinervo. 2017. Evolutionary games, climate and the generation of diversity. PLoS ONE 12:e0184052. Public Library of Science.
Fumagalli, M., F. G. Vieira, T. Linderoth, and R. Nielsen. 2014. ngsTools: Methods for population genetics analyses from next-generation sequencing data. Bioinformatics 30:1486–1487.
Harris, R. C. 2007. Improved pairwise alignment of genomic DNA. Ph.D. Thesis, The Pennsylvania State University.
Hothorn, T., F. Bretz, and P. Westfall. 2008. Simultaneous inference in general parametric models. Biometrical Journal 50:346–363.
Korneliussen, T. S., A. Albrechtsen, and R. Nielsen. 2014. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15:356. BioMed Central.
Korneliussen, T. S., and I. Moltke. 2015. NgsRelate: a software tool for estimating pairwise relatedness from next-generation sequencing data. Bioinformatics 31:4009–4011.
Lemon, J. 2006. Plotrix: a package in the red light district of R. R-News 6:8–12.
Paradis, E., and K. Schliep. 2019. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528.
Revell, L. J. 2012. Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3:217–223.
Shin, J.-H., S. Blay, and B. McNeney. 2006. LDheatmap: an R function for graphical display of pairwise linkage disequilibria between single nucleotide polymorphisms. J. Stat. Soft. 16:1–10.
Stamatakis, A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
Wickham, H., R. Francois, L. Henry, K. Muller, and D. Vaughan. 2025. dplyr: A grammar of data manipulation. R package version 1.1.4, https://dplyr.tidyverse.org.
Wickham, H., T. Pedersen, and D. Seidel. 2024. scales: Scale functions for visualization. R package version 1.3.0, https://scales.r-lib.org.
Access information
Other publicly accessible locations of the data:
-
Specimens, tissue samples, and DNA samples associated with this study have been deposited in the Museum of Vertebrate Zoology (MVZ) at the University of California, Berkeley. https://mvz.berkeley.edu/
Genetic data has been deposited in the Sequence Read Archive (SRA) in Bioproject PRJNA1328718.
https://www.ncbi.nlm.nih.gov/sra/PRJNA1328718
Genetic sequences for the SPR region have been deposited in Genbank (https://www.ncbi.nlm.nih.gov/genbank/) with the accession numbers: PX436996 - PX437172.
The SRA or Genbank accession numbers associated with particular samples can be found in the following three files, which provide metadata on the samples, the MVZ catalog number of the samples, and the SRA or Genbank accession numbers.
DNA sequence data deposited in the SRA: Uta_DNAseq_samples_with_SRA_accession_numbers.xlsx
RNA sequence data deposited in the SRA: Uta_RNAseq_samples_with_SRA_accession_numbers.xlsx
DNA sequence data deposited in Genbank: Uta_PacBio_amplicon_metadata_and_Genbank_ID.xlsx
The metadata associated with all the samples in the files are: Color_morph = throat color morph, collection_date = date of sample collection, latitude_longitude = GPS coordinates for where the sample was collect, Elevation_in_feet = the elevation where the sample was collected, Lab_Sample_ID = lab sample name, Life_Stage = whether the individual was an adult or juvenile, Sex = sex of the individual, Snout_vent_length_mm = snout vent length in millimeters, Weight_g = weight in grams, and SPR_region_genotype = genotype of the SPR-region. haplotypes. The samples in the file Uta_RNAseq_samples_with_SRA_accession_numbers.xlsx each have their own SRA Biosample accession numbers listed in the column “Sequence_Read_Archive_Biosample_Accession_Number,” but we also list for reference the Biosample accession numbers for the DNA sequence data associated with these lizards in the column “SRA_Biosample_accession_number_for_DNA_data.” The sequence data in the file “Uta_PacBio_amplicon_metadata_and_Genbank_ID.xlsx” is for haplotypes of the SPR region. We were able to resolve two haplotypes for most of the individuals that were sequenced (i.e. hap1 and hap2), but we were only able to resolve a single haplotype for some individuals (i.e. only hap1). The "Haplotype" column shows whether the individual haplotype is part of either the orange or blue grouping of haplotypes.
Data was derived from the following sources:
- The field data on the morph frequencies was first reported in: Friedman, D., J. Magnani, D. Paranjpe, and B. Sinervo. 2017. Evolutionary games, climate and the generation of diversity. PLoS ONE 12:e0184052. Public Library of Science. That data is available at: https://doi.org/10.1371/journal.pone.0184052.s002
Changes after Nov 4, 2025: The file "Uta_DNAseq_samples_with_SRA_accession_numbers.xlsx" has been updated to include two additional Genbank accession numbers associated with the samples.
