Yosemite Toad (Anaxyrus canorus) transcriptome reveals interplay between speciation genes and adaptive introgression
Data files
Mar 05, 2024 version files 3.36 GB
- README.md
- YOTOAD_ADAPTIVE_INTROGRESSION.zip
Abstract
Genomes are heterogeneous during the early stages of speciation, with small “islands” of DNA appearing to reflect strong adaptive differences, surrounded by vast seas of relative homogeneity. As species diverge, secondary contact zones between them can act as an interface, and selectively filter through advantageous alleles of hybrid origin. Such introgression is another important adaptive process, one that allows beneficial mosaics of recombinant DNA (“rivers”) to flow from one species into another. Although genomic islands of divergence appear to be associated with reproductive isolation, and genomic rivers form by adaptive introgression, it is unknown whether islands and rivers tend to be the same or different loci. We examined three replicate secondary contact zones for the Yosemite toad (Anaxyrus canorus) using two genomic datasets and a morphometric dataset to answer the questions: (1) How predictably different are islands and rivers, both in terms of genomic location and gene function; (2) Are the adaptive genetic trait loci underlying tadpole growth and development reliably islands, rivers, or neither? We found that island and river loci have significant overlap within a contact zone, suggesting that some loci are first islands, and later are predictably converted into rivers. However, gene ontology enrichment analysis showed strong overlap in gene function unique to all island loci, suggesting predictability in overall gene pathways for islands. GWAS outliers for tadpole development included LPIN3, a lipid metabolism gene potentially involved in climate change adaptation, that is island-like for all three contact zones but also appears to be introgressing (as a river) across one zone. Taken together, our results suggest that adaptive divergence and introgression may be more complementary forces than currently appreciated.
README: Yosemite Toad (Anaxyrus canorus) transcriptome reveals interplay between speciation genes and adaptive introgression
GENERAL INFORMATION
Title of Dataset: YOTOAD_ADAPTIVE_INTROGRESSION
Author Information:
Paul Maier
paulm@genebygene.com
San Diego State University
5500 Campanile Drive
San Diego, CA 92182
Principal Investigator: Paul Maier
Co-investigators: Amy Vandergast, Andrew Bohonak
Date of data collection: 2011–2013
Geographic location of data collection: Yosemite National Park
SHARING/ACCESS INFORMATION
No licenses/restrictions have been placed on the data.
** NOTE: ALL GEOGRAPHIC COORDINATES HAVE BEEN REMOVED TO PROTECT THE LOCATION OF THIS FEDERALLY PROTECTED SPECIES **
** PLEASE CONTACT THE CORRESPONDING AUTHOR TO REQUEST PERMISSION FOR THE ORIGINAL COORDINATES **
Recommended citation for the data:
Maier P.A., Vandergast A.G., Bohonak A.J. (2024). Yosemite Toad (Anaxyrus canorus) transcriptome reveals interplay between speciation genes and adaptive introgression. Molecular Ecology, in press. https://doi.org/10.1111/mec.17317
Citation for and links to publications that cite or use the data:
Main article DOI:
https://doi.org/10.1111/mec.17317
NCBI BioProject with links to Sequence Read Archive (SRA) fastq files for ddRADseq data (previously published):
https://www.ncbi.nlm.nih.gov/sra/PRJNA558546
NCBI BioProject with links to Sequence Read Archive (SRA) fastq files for RNAseq data (published in this paper):
https://www.ncbi.nlm.nih.gov/sra/PRJNA574353
This Dryad DOI:
https://doi.org/10.5061/dryad.pg4f4qrvj
DATA & FILE OVERVIEW
├── README.txt This README file
│
├── scripts Scripts directory
│ │
│ ├── analysis Scripts for main analyses of the paper
│ │ ├── 01_BlastRADtoRNAseq.R Script for BLASTing ddRAD sequences to transcriptome
│ │ ├── 02_Colony.sh Script for Colony analysis to remove close-kinship samples
│ │ ├── 03_Structure.sh Script for STRUCTURE analysis of three contact zones
│ │ ├── 04_NewHybrids.R Script for NewHybrids, HiEst, and PCA analysis to identify admixture zone
│ │ ├── 05_IslandRiverMakeInput.R Script for identifying genomic islands of divergence, producing BGC input files
│ │ ├── 06_RunBGC.sh Script for running BGC to identify rivers of introgression
│ │ ├── 07_TracePlotBGC.R Script for summarizing BGC output and identifying rivers
│ │ ├── 08_Genind2migrate.R Script for generating Migrate-n input from ddRAD sequences
│ │ ├── 09_Migrate.sh Script for running Migrate-n
│ │ ├── 10_PlotMigrateResults.R Script for summarizing and plotting Migrate-n results
│ │ ├── 11_Upset.R Script for comparative analyses (Fisher's and CMH tests, clustering, upset plots, power analysis, SNP effects)
│ │ ├── 12_GWAS.R Script for GWAS analyses
│ │ ├── IslandRiverFunctions.R Helper functions sourced by other R scripts
│ │ ├── SlatkinMaddisonS.py Helper function called by 05_IslandRiverMakeInput.R to calculate Slatkin's 's'
│ │ └── StructureRunner.sh Helper function called by 03_Structure.sh to run STRUCTURE
│ │
│ └── transcriptome Scripts for bioinformatic assembly, variant calling, and annotation of transcriptome
│ ├── assembly Directory of scripts to assemble and annotate the transcriptome
│ │ ├── 01_inchworm.sh Script for running inchworm
│ │ ├── 02_chrysalis.sh Script for running chrysalis
│ │ ├── 03_RSEM.sh Script for running RSEM
│ │ ├── 04_transdecoder.sh Script for running Transdecoder
│ │ ├── 05_hmmscan.sh Script for running hmmscan
│ │ ├── 06_RNAmmer.sh Script for running RNAmmer
│ │ ├── 07_blast_homologies.sh Script for running blast homologies
│ │ ├── 08_signalp.sh Script for running signalp
│ │ ├── 09_tmhmm.sh Script for running tmhmm
│ │ └── 10_supertranscriptsALL.sh Script for running Trinity_gene_splice_modeler.py
│ ├── qc Directory of scripts for QC
│ │ ├── 01_readrepresentation.sh Script for running bowtie2
│ │ └── 02_busco.sh Script for running run_BUSCO.py
│ └── variants Directory of scripts to call variants, identify genotypes, and annotate SNPs
│ ├── 00_genotyping_protocol.txt Master file with pseudocode for how and why to run each script
│ ├── 01_variantcallingALL_part1.sh Script to prepare pooled (3) samples for HaplotypeCaller
│ ├── 02_variantcallingALL_part2.sh Script to make sure the output includes only biallelic SNPs
│ ├── 03_variantcalling229_part1.sh Script to prepare RNAseq Sample 229 for HaplotypeCaller
│ ├── 04_variantcalling229_part2.sh Script to run HaplotypeCaller on Sample 229
│ ├── 05_variantcalling230_part1.sh Script to prepare RNAseq Sample 230 for HaplotypeCaller
│ ├── 06_variantcalling230_part2.sh Script to run HaplotypeCaller on Sample 230
│ ├── 07_variantcalling231_part1.sh Script to prepare RNAseq Sample 231 for HaplotypeCaller
│ ├── 08_variantcalling231_part2.sh Script to run HaplotypeCaller on Sample 231
│ ├── 09_changeVCFnames.sh Script to change sample names in the VCF files for Samples 229, 230, and 231
│ ├── 10_genotypecalling_fast1.sh Script to run joint genotyping pipeline
│ ├── 11_genotypecalling_fast2.sh Script to make sure the output includes only biallelic SNPs
│ ├── 12_final.editing.R Script to make sure REF/ALT for SNPs agree between pooled and individual genotypes
│ ├── 13_Transdecoder_supertranscripts.sh Script to run TransDecoder to get GFF3 of CDS
│ ├── 14_vep_tableoutput.sh Script to run VEP and use table output
│ ├── 15_JoinGenotypesAndSnpAnnotations.R Script to combine genotypes, SNP annotations, and gene annotations
│ ├── VCF2multiInterval.R Helper function called by 10_genotypecalling_fast1.sh to return intervals to actual chrom/pos
│ ├── VCF2oneInterval.R Helper function called by 10_genotypecalling_fast1.sh to assign SNPs to same fake interval
│ └── master.sh Helper script to split up genotype calling into intervals
│
├── input Input directory for main analyses of paper
│ ├── EN_loci.txt Static list of loci chosen for EN contact zone
│ ├── ES_loci.txt Static list of loci chosen for ES contact zone
│ ├── EW_loci.txt Static list of loci chosen for EW contact zone
│ ├── YOSEbndry.dbf Yosemite NP boundary shapefile for plotting
│ ├── YOSEbndry.prj Yosemite NP boundary shapefile for plotting
│ ├── YOSEbndry.qpj Yosemite NP boundary shapefile for plotting
│ ├── YOSEbndry.shp Yosemite NP boundary shapefile for plotting
│ ├── YOSEbndry.shx Yosemite NP boundary shapefile for plotting
│ ├── YOSEgwasSNPs.ped GWAS SNPs in PED format
│ ├── YOSEgwasSNPs_loci.txt GWAS list of loci
│ ├── YOSEhaplo.stru STRUCTURE genotype file using ddRAD haplotypes (unique integers per haplotype)
│ ├── YOSEseqs.stru STRUCTURE genotype file with full ddRAD sequences
│ ├── YOSEseqs.txt Text genotype file with full ddRAD sequences
│ ├── YOSEsnp.stru STRUCTURE genotype file using SNPs
│ ├── coordinates.txt Coordinates of sampled tadpoles (** NOTE: ALL GEOGRAPHIC COORDINATES HAVE BEEN REMOVED **)
│ ├── meadows.txt List of meadows and which contact zone they're assigned to (1=EN, 2=ES, 3=EW)
│ ├── sPCAloci.txt sPCA scores for first three eigenvectors, corresponding to EN, ES, EW divergences
│ └── tadData.txt Tadpole morphometric (stage/length) data used for GWAS analysis
│
├── data Directory for data produced by scripts and subsequently used as input for main analyses
│ ├── admixed.haplo.EN.100.txt EN contact zone: BGC input data
│ ├── admixed.haplo.ES.100.txt ES contact zone: BGC input data
│ ├── admixed.haplo.EW.100.txt EW contact zone: BGC input data
│ ├── haplo.EN.100BREAKS.txt EN contact zone: Start and end "coordinates" for concatenated ddRAD loci
│ ├── haplo.EN.100FASTA.txt EN contact zone: FASTA of concatenated ddRAD sequences
│ ├── haplo.EN.100FST.txt EN contact zone: PhiST, Dxy, Slatkin's 's', and island outlier status (prior to running BGC)
│ ├── haplo.EN.100LABELS.txt EN contact zone: Individual (haplotype) and parental population labels for BGC
│ ├── haplo.EN.100LOCI.txt EN contact zone: List of loci used for BGC
│ ├── haplo.EN.100_raxml EN contact zone: Directory of phylip files and NJ trees for Slatkin's 's' analysis
│ │ ├── *.phy EN contact zone: Phylip files (1126) one per locus
│ │ ├── ID.txt EN contact zone: Individual, meadow, and parental pop IDs for each individual
│ │ ├── NJ_*.tre EN contact zone: Neighbor joining trees (1126) one per locus
│ │ └── SlatkinMaddisonS.txt EN contact zone: Slatkin's 's' results
│ ├── haplo.ES.100BREAKS.txt ES contact zone: Start and end "coordinates" for concatenated ddRAD loci
│ ├── haplo.ES.100FASTA.txt ES contact zone: FASTA of concatenated ddRAD sequences
│ ├── haplo.ES.100FST.txt ES contact zone: PhiST, Dxy, Slatkin's 's', and island outlier status (prior to running BGC)
│ ├── haplo.ES.100LABELS.txt ES contact zone: Individual (haplotype) and parental population labels for BGC
│ ├── haplo.ES.100LOCI.txt ES contact zone: List of loci used for BGC
│ ├── haplo.ES.100_raxml ES contact zone: Directory of phylip files and NJ trees for Slatkin's 's' analysis
│ │ ├── *.phy ES contact zone: Phylip files (1138) one per locus
│ │ ├── ID.txt ES contact zone: Individual, meadow, and parental pop IDs for each individual
│ │ ├── NJ_*.tre ES contact zone: Neighbor joining trees (1138) one per locus
│ │ └── SlatkinMaddisonS.txt ES contact zone: Slatkin's 's' results
│ ├── haplo.EW.100BREAKS.txt EW contact zone: Start and end "coordinates" for concatenated ddRAD loci
│ ├── haplo.EW.100FASTA.txt EW contact zone: FASTA of concatenated ddRAD sequences
│ ├── haplo.EW.100FST.txt EW contact zone: PhiST, Dxy, Slatkin's 's', and island outlier status (prior to running BGC)
│ ├── haplo.EW.100LABELS.txt EW contact zone: Individual (haplotype) and parental population labels for BGC
│ ├── haplo.EW.100LOCI.txt EW contact zone: List of loci used for BGC
│ ├── haplo.EW.100_raxml EW contact zone: Directory of phylip files and NJ trees for Slatkin's 's' analysis
│ │ ├── *.phy EW contact zone: Phylip files (925) one per locus
│ │ ├── ID.txt EW contact zone: Individual, meadow, and parental pop IDs for each individual
│ │ ├── NJ_*.tre EW contact zone: Neighbor joining trees (925) one per locus
│ │ └── SlatkinMaddisonS.txt EW contact zone: Slatkin's 's' results
│ ├── p1.haplo.EN.100.txt EN contact zone: Parent 1 genotype file
│ ├── p1.haplo.ES.100.txt ES contact zone: Parent 1 genotype file
│ ├── p1.haplo.EW.100.txt EW contact zone: Parent 1 genotype file
│ ├── p2.haplo.EN.100.txt EN contact zone: Parent 2 genotype file
│ ├── p2.haplo.ES.100.txt ES contact zone: Parent 2 genotype file
│ └── p2.haplo.EW.100.txt EW contact zone: Parent 2 genotype file
│
├── transcriptome Transcriptome directory
│ ├── genotyping Genotyping directory
│ │ ├── snpeff snpEff directory
│ │ │ ├── config Configuration directory
│ │ │ │ └── snpEff.config snpEff configuration file
│ │ │ ├── data Data directory
│ │ │ │ ├── yotoad.v1 Yosemite toad "genome" v1 (all gene transcripts) used for snpEff
│ │ │ │ │ ├── genes.gtf Genes in General Transfer Format
│ │ │ │ │ ├── sequences.fa Genes in FASTA Format
│ │ │ │ │ └── snpEffectPredictor.bin Genes in snpEff binary format
│ │ │ │ └── yotoad.v2 Yosemite toad "genome" v2 (Transdecoder transcripts) used for snpEff
│ │ │ │ ├── genes.gtf Genes in General Transfer Format
│ │ │ │ └── sequences.fa Genes in FASTA Format
│ │ │ └── output snpEff output directory
│ │ │ ├── genotyped.multiInterval.bisnps.edited.vcf Joint genotyping biallelic SNPs VCF file (removed SNPs where REF/ALT don't agree across datasets)
│ │ │ ├── snpEff_genes.txt Gene variants binned into effect categories
│ │ │ └── snpEff_summary.html HTML summary of snpEff results
│ │ ├── snplist Lists of SNPs for various functions
│ │ │ ├── SNPlist.txt List of all transcriptome SNPs used to split genotype calling into 1000 SNP intervals for faster processing
│ │ │ ├── SNPlistREFALT.txt List of REF and ALT for finding any SNPs where REF or ALT do not agree across samples
│ │ │ ├── SNPlist_concat.txt List of SNP real and fake positions used to return intervals to actual coordinates
│ │ │ └── SNPlist_test.txt Test version of SNPlist.txt
│ │ ├── snpref Reference for called SNPs
│ │ │ ├── concat.dict Dictionary of concatenated REF SNPs from pooled output
│ │ │ ├── concat.fa FASTA file of concatenated REF SNPs from pooled output
│ │ │ ├── concat.fa.fai FASTA file of concatenated REF SNPs from pooled output (index)
│ │ │ └── fasta.bed Region file that allows gvcftools break_blocks to break a gVCF file into all positions for a chosen contig
│ │ ├── transdecoder Transdecoder directory
│ │ │ ├── my.stdout.Transdecoder Standard output from analysis
│ │ │ ├── supertranscripts.wOrfs.bed Predicted ORFs superimposed onto the supertranscript (bed file)
│ │ │ ├── supertranscripts.wOrfs.gff3 Predicted ORFs superimposed onto the supertranscript (gff3 file)
│ │ │ ├── supertranscripts.wOrfs.gtf Predicted ORFs superimposed onto the supertranscript (gtf file)
│ │ │ ├── trinity_genes.gff3 Transcriptome genes in gff3 format
│ │ │ ├── trinity_transcripts.fasta Extracted transcript sequences from gtf_genome_to_cdna_fasta.pl
│ │ │ ├── trinity_transcripts.fasta.transdecoder.bed Transdecoder candidate ORFs (bed file)
│ │ │ ├── trinity_transcripts.fasta.transdecoder.cds Transdecoder candidate ORFs (cds file)
│ │ │ ├── trinity_transcripts.fasta.transdecoder.gff3 Transdecoder candidate ORFs (gff3 file)
│ │ │ └── trinity_transcripts.fasta.transdecoder.pep Transdecoder candidate ORFs (pep file)
│ │ └── vep VEP directory
│ │ ├── sequences.fa Input FASTA file of yotoad.v2 transcript sequences
│ │ ├── sequences.fa.fai Input FASTA file of yotoad.v2 transcript sequences (index)
│ │ ├── transcriptome.SNPs.final.annotated.vcf Transcriptome SNPs VCF file annotated with VEP output
│ │ ├── transcriptome.SNPs.final.vcf Transcriptome SNPs VCF file
│ │ ├── variant_effect_output.txt_summary.html HTML summary of VEP results
│ │ ├── variant_effect_output.txt_warnings.txt VEP program warnings
│ │ ├── yotoad.gff Yosemite toad "genome" v2 (Transdecoder transcripts) used for VEP
│ │ ├── yotoad.gff.gz Yosemite toad "genome" v2 (Transdecoder transcripts) used for VEP (zipped)
│ │ └── yotoad.gff.gz.tbi Yosemite toad "genome" v2 (Transdecoder transcripts) used for VEP (index)
│ ├── output Transcriptome output directory
│ │ ├── annotation Annotation directory
│ │ │ └── trinotate_annotation_report.txt Annotation for entire transcriptome, incorporating all nucleotide, protein, GO, and motif matches
│ │ ├── gVCF gVCF output directory for genotyping
│ │ │ ├── output.229.g.vcf.gz gVCF for sample 229
│ │ │ ├── output.229.g.vcf.gz.tbi gVCF for sample 229 (index)
│ │ │ ├── output.230.g.vcf.gz gVCF for sample 230
│ │ │ ├── output.230.g.vcf.gz.tbi gVCF for sample 230 (index)
│ │ │ ├── output.231.g.vcf.gz gVCF for sample 231
│ │ │ └── output.231.g.vcf.gz.tbi gVCF for sample 231 (index)
│ │ ├── genes Genes directory
│ │ │ ├── trinity_genes.dict Genes dictionary
│ │ │ ├── trinity_genes.fasta.fai Genes FASTA (index)
│ │ │ ├── trinity_genes.fasta.gz Genes FASTA
│ │ │ ├── trinity_genes.gtf Genes in General Transfer Format
│ │ │ └── trinity_genes.malign.gz Genes multiple alignment with different candidate splicing isoforms
│ │ ├── supertranscriptome Primary output directory for supertranscriptome (gene complexes)
│ │ │ └── Trinity_v2.fasta Primary supertranscriptome result
│ │ ├── transcriptome Primary output directory for transcriptome
│ │ │ ├── combined All three samples combined
│ │ │ │ └── Trinity.fasta Primary transcriptome result
│ │ │ └── individual Each of three samples separately
│ │ │ ├── TrinityNewRun229.Trinity.fasta Transcriptome for sample 229
│ │ │ ├── TrinityNewRun230.Trinity.fasta Transcriptome for sample 230
│ │ │ └── TrinityNewRun231.Trinity.fasta Transcriptome for sample 231
│ │ └── variants Variant calling directory
│ │ ├── genotyped.multiInterval.bisnps.edited.vcf Joint genotyping biallelic SNPs VCF file (removed SNPs where REF/ALT don't agree across datasets)
│ │ ├── genotyped.multiInterval.bisnps.vcf Joint genotyping biallelic SNPs VCF file
│ │ ├── genotyped.multiInterval.bisnps.vcf.idx Joint genotyping biallelic SNPs VCF file (index)
│ │ ├── genotyped.multiInterval.vcf Joint genotyping SNPs VCF file
│ │ ├── genotyped.vcf Joint genotyping SNPs VCF file with fake SNP positions for better processing speed
│ │ ├── genotyped.vcf.gz.tbi Joint genotyping SNPs VCF file with fake SNP positions for better processing speed (index)
│ │ ├── output.229.all.oneInterval.fixed.vcf.gz Sample 229 SNPs VCF file with fake SNP positions with updated sequence dictionary
│ │ ├── output.229.all.oneInterval.fixed.vcf.gz.tbi Sample 229 SNPs VCF file with fake SNP positions with updated sequence dictionary (index)
│ │ ├── output.229.all.oneInterval.vcf.gz Sample 229 SNPs VCF file with fake SNP positions
│ │ ├── output.229.all.oneInterval.vcf.gz.tbi Sample 229 SNPs VCF file with fake SNP positions (index)
│ │ ├── output.230.all.oneInterval.fixed.vcf.gz Sample 230 SNPs VCF file with fake SNP positions with updated sequence dictionary
│ │ ├── output.230.all.oneInterval.fixed.vcf.gz.tbi Sample 230 SNPs VCF file with fake SNP positions with updated sequence dictionary (index)
│ │ ├── output.230.all.oneInterval.vcf.gz Sample 230 SNPs VCF file with fake SNP positions
│ │ ├── output.230.all.oneInterval.vcf.gz.tbi Sample 230 SNPs VCF file with fake SNP positions (index)
│ │ ├── output.231.all.oneInterval.fixed.vcf.gz Sample 231 SNPs VCF file with fake SNP positions with updated sequence dictionary
│ │ ├── output.231.all.oneInterval.fixed.vcf.gz.tbi Sample 231 SNPs VCF file with fake SNP positions with updated sequence dictionary (index)
│ │ ├── output.231.all.oneInterval.vcf.gz Sample 231 SNPs VCF file with fake SNP positions
│ │ ├── output.231.all.oneInterval.vcf.gz.tbi Sample 231 SNPs VCF file with fake SNP positions (index)
│ │ ├── output.all.filtered.bisnps.ann.edited.vcf Gene variants called using HaplotypeCaller with biallelic SNPs and FS > 30 & QD < 2 filters (annotated and filtered)
│ │ ├── output.all.filtered.bisnps.ann.vcf Gene variants called using HaplotypeCaller with biallelic SNPs and FS > 30 & QD < 2 filters (annotated)
│ │ ├── output.all.filtered.bisnps.vcf Gene variants called using HaplotypeCaller with biallelic SNPs and FS > 30 & QD < 2 filters
│ │ ├── output.all.filtered.bisnps.vcf.idx Gene variants called using HaplotypeCaller with biallelic SNPs and FS > 30 & QD < 2 filters (index)
│ │ ├── output.all.filtered.vcf Gene variants called using HaplotypeCaller with FS > 30 & QD < 2 filters
│ │ ├── output.all.filtered.vcf.gz.tbi Gene variants called using HaplotypeCaller with FS > 30 & QD < 2 filters (index)
│ │ ├── output.all.vcf.gz Gene variants called using HaplotypeCaller
│ │ ├── output.all.vcf.gz.tbi Gene variants called using HaplotypeCaller (index)
│ │ ├── transcriptome.SNPs.annotation.table.metadata.txt Variant Effect Predictor (VEP) fields
│ │ ├── transcriptome.SNPs.annotation.table.txt VEP output table
│ │ ├── transcriptome.SNPs.final.annotated.vcf Transcriptome SNPs VCF file annotated with VEP output
│ │ ├── transcriptome.SNPs.final.vcf Transcriptome SNPs VCF file
│ │ ├── transcriptome.SNPs.genes.final.annotated.txt Transcriptome genic SNPs VCF file annotated with VEP output
│ │ ├── transcriptome.variant.definition.table.txt Table defining types of variant effects
│ │ ├── transcriptome.variant.hydrophobicity.table.txt Table defining hydrophobicity of each possible amino acid change
│ │ └── transcriptome.variant.rank.table.txt Table ranking effect magnitude of each possible variant
│ └── qc QC directory
│ ├── BUSCO_full_table.tsv BUSCO orthologs tabular output
│ ├── BUSCO_missing_list.tsv BUSCO missing orthologs
│ ├── BUSCO_summary.txt BUSCO summarized output
│ ├── STARlog229.txt STAR mapping QC for sample 229
│ ├── STARlog230.txt STAR mapping QC for sample 230
│ ├── STARlog231.txt STAR mapping QC for sample 231
│ ├── STARlogALL.txt STAR mapping QC for combined samples
│ └── readrepresentation.txt Bowtie2 read representation and alignment stats
│
├── blast BLAST directory for matching transcriptome against ddRAD sequences (IDs only)
│ ├── RAD.RNA.table.txt BLAST match results of ddRAD sequences and RNAseq contigs
│ ├── RAD.fa ddRAD FASTA file
│ ├── RAD.txt ddRAD SNP positions and values
│ ├── RAD_supertranscriptome.txt BLAST results for matching supertranscriptome against ddRAD sequences
│ ├── RAD_transcriptome.txt BLAST results for matching transcriptome against ddRAD sequences
│ ├── supertranscriptome.fa Supertranscriptome FASTA file (abbreviated headers)
│ ├── supertranscriptome.nhr Supertranscriptome BLAST database
│ ├── supertranscriptome.nin Supertranscriptome BLAST database
│ ├── supertranscriptome.nsq Supertranscriptome BLAST database
│ ├── transcriptome.fa Transcriptome FASTA file (abbreviated headers)
│ ├── transcriptome.nhr Transcriptome BLAST database
│ ├── transcriptome.nin Transcriptome BLAST database
│ └── transcriptome.nsq Transcriptome BLAST database
│
├── colony Colony directory for finding close kinship relationships for sample removal
│ ├── RemoveFullSibs.xlsx Excel spreadsheet selecting samples to remove based on Colony results
│ ├── colony_remove.txt List of samples to remove based on Colony results
│ ├── input Input directory
│ │ └── yosemite2.dat Colony input data
│ └── output Output directory
│ ├── yosemite2.AlleleFreq Refined allele frequencies taking ML family structure into account
│ ├── yosemite2.BestCluster Clusters of related individuals
│ ├── yosemite2.BestConfig Best configuration with ML
│ ├── yosemite2.BestConfig_Ordered Best configuration with ML (ordered)
│ ├── yosemite2.BestFSFamily Best full-sib family with ML
│ ├── yosemite2.ConfigArchive Archived configurations with ML
│ ├── yosemite2.DadGenotype True father ID and genotypes
│ ├── yosemite2.Distribution Distribution of the number of pat/mat/full-sib families, children per father/mother, mates per male/female
│ ├── yosemite2.ErrorRate Mistyping error rates
│ ├── yosemite2.FullSibDyad Pairs of inferred full siblings
│ ├── yosemite2.GtypeData Binarized genotype data
│ ├── yosemite2.HalfSibDyad Pairs of inferred half siblings
│ ├── yosemite2.MidResult Intermediate results 1
│ ├── yosemite2.MidResult2 Intermediate results 2
│ ├── yosemite2.MidResult3 Intermediate results 3
│ ├── yosemite2.MidResult5 Intermediate results 5
│ ├── yosemite2.MidResult6 Intermediate results 6
│ ├── yosemite2.MumGenotype Genotypes of inferred mothers
│ ├── yosemite2.Ne Ne estimates from sibship assignment and heterozygote excess methods
│ ├── yosemite2.OffGenotype Individuals with the same clone index
│ ├── yosemite2.PairCluster Cluster pairs
│ ├── yosemite2.PairwiseFullSibDyad Pairs of inferred full siblings inferred by PPL
│ ├── yosemite2.PairwiseHalfSibDyad Pairs of inferred half siblings inferred by PPL
│ └── yosemite2.data Input Colony genotype matrix
│
├── structure STRUCTURE analysis directory
│ ├── EN.stru EN contact zone input STRUCTURE data
│ ├── EN_1_f EN replicate 1 results
│ ├── EN_2_f EN replicate 2 results
│ ├── EN_3_f EN replicate 3 results
│ ├── EN_4_f EN replicate 4 results
│ ├── EN_5_f EN replicate 5 results
│ ├── EN_ind.txt EN individual ancestry proportion results (single replicate)
│ ├── EN_indfile EN individual ancestry proportion results (all reps for clumpp)
│ ├── EN_paramfile EN STRUCTURE parameters
│ ├── EN_popfile EN population ancestry proportion results (all reps for clumpp)
│ ├── ES.stru ES contact zone input STRUCTURE data
│ ├── ES_1_f ES replicate 1 results
│ ├── ES_2_f ES replicate 2 results
│ ├── ES_3_f ES replicate 3 results
│ ├── ES_4_f ES replicate 4 results
│ ├── ES_5_f ES replicate 5 results
│ ├── ES_ind.txt ES individual ancestry proportion results (single replicate)
│ ├── ES_indfile ES individual ancestry proportion results (all reps for clumpp)
│ ├── ES_paramfile ES STRUCTURE parameters
│ ├── ES_popfile ES population ancestry proportion results (all reps for clumpp)
│ ├── EW.stru EW contact zone input STRUCTURE data
│ ├── EW_1_f EW replicate 1 results
│ ├── EW_2_f EW replicate 2 results
│ ├── EW_3_f EW replicate 3 results
│ ├── EW_4_f EW replicate 4 results
│ ├── EW_5_f EW replicate 5 results
│ ├── EW_ind.txt EW individual ancestry proportion results (single replicate)
│ ├── EW_indfile EW individual ancestry proportion results (all reps for clumpp)
│ ├── EW_paramfile EW STRUCTURE parameters
│ ├── EW_popfile EW population ancestry proportion results (all reps for clumpp)
│ ├── extraparams STRUCTURE extra parameters
│ ├── mainparams STRUCTURE main parameters
│ ├── miscfile STRUCTURE misc parameters
│ ├── parameters Parameters file looped over by StructureRunner.sh
│ ├── permfile STRUCTURE and clumpp permutations file
│ └── yose.stru Yosemite NP (all contact zones) input STRUCTURE data
│
├── newhybrids NewHybrids analysis directory
│ ├── EN EN zone directory
│ │ ├── EN.names.txt EN: individual and meadow names for genotype rows
│ │ ├── EN.txt EN: NewHybrids external input genotype data
│ │ ├── EN_HIest.txt EN: HIest results
│ │ ├── EchoedGtypData.txt EN: NewHybrids internally formatted genotype data
│ │ ├── aa-EchoedGtypFreqCats.txt EN: input genotype frequency proportions
│ │ ├── aa-LociAndAlleles.txt EN: input alleles at each locus
│ │ ├── aa-Pi.aves EN: average of mixing proportions
│ │ ├── aa-Pi.hist EN: histogram of mixing proportions
│ │ ├── aa-PofZ.txt EN: posterior probabilities of hybrid classes
│ │ ├── aa-ScaledLikelihood.txt EN: likelihood for each individual of each hybrid class
│ │ ├── aa-Theta.hist EN: histogram of estimated allele proportions
│ │ └── aa-ThetaAverages.txt EN: average estimated allele frequencies at each locus
│ ├── ES ES zone directory
│ │ ├── ES.names.txt ES: individual and meadow names for genotype rows
│ │ ├── ES.txt ES: NewHybrids external input genotype data
│ │ ├── ES_HIest.txt ES: HIest results
│ │ ├── EchoedGtypData.txt ES: NewHybrids internally formatted genotype data
│ │ ├── aa-EchoedGtypFreqCats.txt ES: input genotype frequency proportions
│ │ ├── aa-LociAndAlleles.txt ES: input alleles at each locus
│ │ ├── aa-Pi.aves ES: average of mixing proportions
│ │ ├── aa-Pi.hist ES: histogram of mixing proportions
│ │ ├── aa-PofZ.txt ES: posterior probabilities of hybrid classes
│ │ ├── aa-ScaledLikelihood.txt ES: likelihood for each individual of each hybrid class
│ │ ├── aa-Theta.hist ES: histogram of estimated allele proportions
│ │ └── aa-ThetaAverages.txt ES: average estimated allele frequencies at each locus
│ └── EW EW zone directory
│ ├── EW.names.txt EW: individual and meadow names for genotype rows
│ ├── EW.txt EW: NewHybrids external input genotype data
│ ├── EW_HIest.txt EW: HIest results
│ ├── EchoedGtypData.txt EW: NewHybrids internally formatted genotype data
│ ├── aa-EchoedGtypFreqCats.txt EW: input genotype frequency proportions
│ ├── aa-LociAndAlleles.txt EW: input alleles at each locus
│ ├── aa-Pi.aves EW: average of mixing proportions
│ ├── aa-Pi.hist EW: histogram of mixing proportions
│ ├── aa-PofZ.txt EW: posterior probabilities of hybrid classes
│ ├── aa-ScaledLikelihood.txt EW: likelihood for each individual of each hybrid class
│ ├── aa-Theta.hist EW: histogram of estimated allele proportions
│ └── aa-ThetaAverages.txt EW: average estimated allele frequencies at each locus
│
├── bgc BGC analysis directory
│ ├── EN BGC results for EN contact zone
│ │ ├── BGChaplo.EN.100.1.LnL.txt EN replicate 1: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EN.100.1.alpha.txt EN replicate 1: α of each MCMC step
│ │ ├── BGChaplo.EN.100.1.beta.txt EN replicate 1: β of each MCMC step
│ │ ├── BGChaplo.EN.100.1.gamma-quantile.txt EN replicate 1: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EN.100.1.hdf5 EN replicate 1: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.EN.100.1.hi.txt EN replicate 1: Hybrid index of each MCMC step
│ │ ├── BGChaplo.EN.100.1.zeta-quantile.txt EN replicate 1: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.EN.100.2.LnL.txt EN replicate 2: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EN.100.2.alpha.txt EN replicate 2: α of each MCMC step
│ │ ├── BGChaplo.EN.100.2.beta.txt EN replicate 2: β of each MCMC step
│ │ ├── BGChaplo.EN.100.2.gamma-quantile.txt EN replicate 2: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EN.100.2.hdf5 EN replicate 2: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.EN.100.2.hi.txt EN replicate 2: Hybrid index of each MCMC step
│ │ ├── BGChaplo.EN.100.2.zeta-quantile.txt EN replicate 2: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.EN.100.3.LnL.txt EN replicate 3: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EN.100.3.alpha.txt EN replicate 3: α of each MCMC step
│ │ ├── BGChaplo.EN.100.3.beta.txt EN replicate 3: β of each MCMC step
│ │ ├── BGChaplo.EN.100.3.gamma-quantile.txt EN replicate 3: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EN.100.3.hdf5 EN replicate 3: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.EN.100.3.hi.txt EN replicate 3: Hybrid index of each MCMC step
│ │ ├── BGChaplo.EN.100.3.zeta-quantile.txt EN replicate 3: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.EN.100.LnL.txt EN across reps: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EN.100.alpha.txt EN across reps: α of each MCMC step
│ │ ├── BGChaplo.EN.100.beta.txt EN across reps: β of each MCMC step
│ │ ├── BGChaplo.EN.100.gamma-quantile.txt EN across reps: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EN.100.hi.txt EN across reps: Hybrid index of each MCMC step
│ │ ├── BGChaplo.EN.100.hiest.txt EN across reps: HIest results
│ │ └── BGChaplo.EN.100.zeta-quantile.txt EN across reps: ζ quantiles of each MCMC step
│ ├── ES BGC results for ES contact zone
│ │ ├── BGChaplo.ES.100.1.LnL.txt ES replicate 1: Log likelihood of each MCMC step
│ │ ├── BGChaplo.ES.100.1.alpha.txt ES replicate 1: α of each MCMC step
│ │ ├── BGChaplo.ES.100.1.beta.txt ES replicate 1: β of each MCMC step
│ │ ├── BGChaplo.ES.100.1.gamma-quantile.txt ES replicate 1: γ quantiles of each MCMC step
│ │ ├── BGChaplo.ES.100.1.hdf5 ES replicate 1: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.ES.100.1.hi.txt ES replicate 1: Hybrid index of each MCMC step
│ │ ├── BGChaplo.ES.100.1.zeta-quantile.txt ES replicate 1: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.ES.100.2.LnL.txt ES replicate 2: Log likelihood of each MCMC step
│ │ ├── BGChaplo.ES.100.2.alpha.txt ES replicate 2: α of each MCMC step
│ │ ├── BGChaplo.ES.100.2.beta.txt ES replicate 2: β of each MCMC step
│ │ ├── BGChaplo.ES.100.2.gamma-quantile.txt ES replicate 2: γ quantiles of each MCMC step
│ │ ├── BGChaplo.ES.100.2.hdf5 ES replicate 2: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.ES.100.2.hi.txt ES replicate 2: Hybrid index of each MCMC step
│ │ ├── BGChaplo.ES.100.2.zeta-quantile.txt ES replicate 2: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.ES.100.3.LnL.txt ES replicate 3: Log likelihood of each MCMC step
│ │ ├── BGChaplo.ES.100.3.alpha.txt ES replicate 3: α of each MCMC step
│ │ ├── BGChaplo.ES.100.3.beta.txt ES replicate 3: β of each MCMC step
│ │ ├── BGChaplo.ES.100.3.gamma-quantile.txt ES replicate 3: γ quantiles of each MCMC step
│ │ ├── BGChaplo.ES.100.3.hdf5 ES replicate 3: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.ES.100.3.hi.txt ES replicate 3: Hybrid index of each MCMC step
│ │ ├── BGChaplo.ES.100.3.zeta-quantile.txt ES replicate 3: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.ES.100.LnL.txt ES across reps: Log likelihood of each MCMC step
│ │ ├── BGChaplo.ES.100.alpha.txt ES across reps: α of each MCMC step
│ │ ├── BGChaplo.ES.100.beta.txt ES across reps: β of each MCMC step
│ │ ├── BGChaplo.ES.100.gamma-quantile.txt ES across reps: γ quantiles of each MCMC step
│ │ ├── BGChaplo.ES.100.hi.txt ES across reps: Hybrid index of each MCMC step
│ │ └── BGChaplo.ES.100.zeta-quantile.txt ES across reps: ζ quantiles of each MCMC step
│ ├── EW BGC results for EW contact zone
│ │ ├── BGChaplo.EW.100.1.LnL.txt EW replicate 1: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EW.100.1.alpha.txt EW replicate 1: α of each MCMC step
│ │ ├── BGChaplo.EW.100.1.beta.txt EW replicate 1: β of each MCMC step
│ │ ├── BGChaplo.EW.100.1.gamma-quantile.txt EW replicate 1: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EW.100.1.hdf5 EW replicate 1: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.EW.100.1.hi.txt EW replicate 1: Hybrid index of each MCMC step
│ │ ├── BGChaplo.EW.100.1.zeta-quantile.txt EW replicate 1: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.EW.100.2.LnL.txt EW replicate 2: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EW.100.2.alpha.txt EW replicate 2: α of each MCMC step
│ │ ├── BGChaplo.EW.100.2.beta.txt EW replicate 2: β of each MCMC step
│ │ ├── BGChaplo.EW.100.2.gamma-quantile.txt EW replicate 2: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EW.100.2.hdf5 EW replicate 2: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.EW.100.2.hi.txt EW replicate 2: Hybrid index of each MCMC step
│ │ ├── BGChaplo.EW.100.2.zeta-quantile.txt EW replicate 2: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.EW.100.3.LnL.txt EW replicate 3: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EW.100.3.alpha.txt EW replicate 3: α of each MCMC step
│ │ ├── BGChaplo.EW.100.3.beta.txt EW replicate 3: β of each MCMC step
│ │ ├── BGChaplo.EW.100.3.gamma-quantile.txt EW replicate 3: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EW.100.3.hdf5 EW replicate 3: HDF5 formatted output made to be summarized by estpost
│ │ ├── BGChaplo.EW.100.3.hi.txt EW replicate 3: Hybrid index of each MCMC step
│ │ ├── BGChaplo.EW.100.3.zeta-quantile.txt EW replicate 3: ζ quantiles of each MCMC step
│ │ ├── BGChaplo.EW.100.LnL.txt EW across reps: Log likelihood of each MCMC step
│ │ ├── BGChaplo.EW.100.alpha.txt EW across reps: α of each MCMC step
│ │ ├── BGChaplo.EW.100.beta.txt EW across reps: β of each MCMC step
│ │ ├── BGChaplo.EW.100.gamma-quantile.txt EW across reps: γ quantiles of each MCMC step
│ │ ├── BGChaplo.EW.100.hi.txt EW across reps: Hybrid index of each MCMC step
│ │ └── BGChaplo.EW.100.zeta-quantile.txt EW across reps: ζ quantiles of each MCMC step
│ ├── haplo.EN.100FST_AB.txt Final islands and rivers file for EN contact zone
│ ├── haplo.ES.100FST_AB.txt Final islands and rivers file for ES contact zone
│ └── haplo.EW.100FST_AB.txt Final islands and rivers file for EW contact zone
│\
├── migrate Migrate-n analysis directory
│ ├── EN.all Directory for EN contact zone, all loci
│ │ ├── EN.all.mig EN migrate-n input file (all loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── EN.islands Directory for EN contact zone, island loci
│ │ ├── EN.islands.mig EN migrate-n input file (island loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── EN.rivers.neg Directory for EN contact zone, negative river loci
│ │ ├── EN.rivers.neg.mig EN migrate-n input file (negative river loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── EN.rivers.pos Directory for EN contact zone, positive river loci
│ │ ├── EN.rivers.pos.mig EN migrate-n input file (positive river loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── ES.all Directory for ES contact zone, all loci
│ │ ├── ES.all.mig ES migrate-n input file (all loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── ES.islands Directory for ES contact zone, island loci
│ │ ├── ES.islands.mig ES migrate-n input file (island loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── ES.rivers.neg Directory for ES contact zone, negative river loci
│ │ ├── ES.rivers.neg.mig ES migrate-n input file (negative river loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── ES.rivers.pos Directory for ES contact zone, positive river loci
│ │ ├── ES.rivers.pos.mig ES migrate-n input file (positive river loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── EW.all Directory for EW contact zone, all loci
│ │ ├── EW.all.mig EW migrate-n input file (all loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── EW.islands Directory for EW contact zone, island loci
│ │ ├── EW.islands.mig EW migrate-n input file (island loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── EW.rivers.neg Directory for EW contact zone, negative river loci
│ │ ├── EW.rivers.neg.mig EW migrate-n input file (negative river loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ ├── EW.rivers.pos Directory for EW contact zone, positive river loci
│ │ ├── EW.rivers.pos.mig EW migrate-n input file (positive river loci)
│ │ ├── outfile.pdf Migrate-n output PDF
│ │ └── outfile.txt Migrate-n output text file
│ └── MIGRATE_PARAMS Migrate-n parameters used
│\
├── upset Comparative analysis directory (Fisher's and CMH tests, clustering, upset plots, power analysis, SNP effects)
│ ├── GO_outliers.txt GO 2 categories and the group (zone or marker class) in which each is enriched
│ ├── annotation_heatmap1.txt Hierarchical clustering of gene function by group
│ ├── annotation_heatmap2.txt Hierarchical clustering of gene function by group (sorted)
│ ├── fisher_odds_ratio.txt Fisher's test results of gene overlap (odds ratio)
│ ├── fisher_pval_corr.txt Fisher's test results of gene overlap (p-value corrected)
│ └── fisher_pval_uncorr.txt Fisher's test results of gene overlap (p-value uncorrected)
│\
└── gwas GWAS analysis directory
  ├── GWAS.txt GWAS results from GWAA method (not primary method of paper)
  ├── cross.data.file.csv GWAS genotype matrix for lmem.gwaser method
  ├── gwas_quantiles.txt GWAS outliers and zone quantiles from lmem.gwaser method
  ├── gwas_reports lmem.gwaser analysis directory
  │ ├── gwas_results_p.file.txt GWAS intermediate output from lmem.gwaser method
  │ └── gwas_results_selected.txt GWAS intermediate output from lmem.gwaser method
  └── lmem_table.txt GWAS outliers, ddRAD positions, and p-values
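
The BGC output files listed above follow a regular naming pattern (`BGChaplo.<zone>.100.<replicate>.<suffix>` for each replicate, and `BGChaplo.<zone>.100.<suffix>` for the across-replicate summaries). As a convenience, the following minimal Python sketch checks that an extracted copy of the archive contains the files this pattern implies. It is not part of the archive. The function names and the `bgc_dir` argument (the directory holding the `EN`, `ES`, and `EW` subdirectories) are hypothetical, and the sketch assumes that every replicate has the same seven outputs as those shown for the ES and EW zones.

```python
from pathlib import Path

ZONES = ("EN", "ES", "EW")
REPLICATES = (1, 2, 3)

# Per-replicate outputs, as listed in the tree above.
REP_SUFFIXES = (
    "LnL.txt", "alpha.txt", "beta.txt", "gamma-quantile.txt",
    "hdf5", "hi.txt", "zeta-quantile.txt",
)
# The across-replicate summaries in the listing have no HDF5 file.
COMBINED_SUFFIXES = tuple(s for s in REP_SUFFIXES if s != "hdf5")


def expected_bgc_files(zone):
    """Return the BGC file names the listing implies for one contact zone."""
    names = [
        f"BGChaplo.{zone}.100.{rep}.{suffix}"
        for rep in REPLICATES
        for suffix in REP_SUFFIXES
    ]
    names += [f"BGChaplo.{zone}.100.{suffix}" for suffix in COMBINED_SUFFIXES]
    if zone == "EN":
        # Only the EN zone lists an extra HIest results file.
        names.append("BGChaplo.EN.100.hiest.txt")
    return names


def missing_bgc_files(bgc_dir):
    """List expected files (as 'zone/name') that are absent under bgc_dir.

    bgc_dir is an assumed path: the directory containing EN/, ES/ and EW/.
    """
    bgc_dir = Path(bgc_dir)
    return [
        f"{zone}/{name}"
        for zone in ZONES
        for name in expected_bgc_files(zone)
        if not (bgc_dir / zone / name).is_file()
    ]
```

Calling `missing_bgc_files()` on the extracted BGC results directory returns an empty list when every expected file is present. Any names it does return point to an incomplete download or extraction.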
METHODOLOGICAL INFORMATION
For details on how the analyses were performed and which parameters were chosen, and for all sample metadata, please refer to the main article and its supporting information.