Sequential introgression of a carotenoid processing gene underlies sexual ornament diversity in a genus of manakins
Data files
Nov 21, 2024 version files (20.80 MB)
- adn8339.zip (20.79 MB)
- README.md (9.48 KB)
Abstract
In a hybrid zone between two tropical lekking birds, the yellow male plumage of one species has introgressed asymmetrically, replacing the white plumage of another via sexual selection. Here, we present a detailed analysis of the plumage trait to uncover its physical and genetic bases and trace its evolutionary history. We determine that the carotenoid lutein underlies the yellow phenotype, and describe microstructural feather features likely to enhance color appearance. These same features reduce the predicted water-shedding capacity of feathers, a potential liability in the tropics. Through genome-scale DNA sequencing of hybrids and each species in the genus, we identify BCO2 as the major gene responsible for the color polymorphism. The BCO2 gene tree and genome-wide allele frequency patterns suggest that carotenoid-pigmented collars initially arose in a third species and reached the hybrid zone through historical gene flow. The complex interplay between sexual selection and hybridization has thus shaped the phenotypes of these species, where conspicuous sexual traits are key to male reproductive success.
README: Sequential introgression of a carotenoid processing gene underlies sexual ornament diversity in a genus of manakins
Feather data
A directory containing the data files needed to produce the feather-related results, including microstructure, spectrophotometry, and HPLC chromatograms.
- Barb.data.csv - Individual replicate feather measurements for each species (5 measurements per feather per individual). Contains measurements of barb width, spacing, and length; D and r from the Rijke (1970) equations; repellency and penetration resistance estimates; and presence/absence of barbules.
- USNM: Specimen catalog number at National Museum of Natural History
- Barb.Spacing: Distance between adjacent barbs (µm)
- Barb.D: Half the distance between adjacent barbs
- Barb.Length: Barb length (µm)
- Barb.Width: Barb width (µm)
- Barb.r: Barb radius (µm)
- Repellency (unitless), Penetration (g/cm^2): Calculations based on equations (1) and (2) in the main text methods.
- barb_data_avgs.csv - Output from feather.morph.script.R. Contains the morphology measurements averaged per individual bird for each species: average barb length, width, and spacing, feather repellency (Rijke 1970), and feather penetration resistance (Rijke 1970). See the descriptions above. These averages are combined with data from Feather_color_for_R.csv in feather.color.script.R.
- Feather_color_for_R.csv - Contains metadata for feather samples, and for each feather measured the lutein concentrations from HPLC analysis and all Pavo components from spectrophotometric analysis (B1-B3, S1-S9, H1-H5). See the Pavo documentation for descriptions of colorimetric variables: https://rafaelmaia.net/pavo/reference/summary.rspec.html.
- HPLC_chromatograms.csv - HPLC output
- Time: Minutes
- Other columns: intensity of standards and samples. Conversion of feather IDs (e.g., 6c, 13b, etc.) to species/USNM numbers can be found in Feather_color_for_R.csv.
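As an illustration of how the replicate measurements in Barb.data.csv relate to the per-individual averages in barb_data_avgs.csv, here is a minimal Python sketch. It is not the authors' feather.morph.script.R: the sample rows and their values are made up, only the column names USNM, Barb.Spacing, and Barb.Width come from the descriptions above, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows in the layout described for Barb.data.csv; values are made up.
SAMPLE = """USNM,Barb.Spacing,Barb.Width
1001,200,40
1001,220,44
1002,300,50
"""

def average_per_individual(handle, columns):
    """Average the named numeric columns over all replicate rows sharing a USNM."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle):
        for col in columns:
            groups[row["USNM"]][col].append(float(row[col]))
    return {usnm: {col: mean(vals) for col, vals in cols.items()}
            for usnm, cols in groups.items()}

averages = average_per_individual(io.StringIO(SAMPLE), ["Barb.Spacing", "Barb.Width"])

# Barb.D is described above as half the distance between adjacent barbs.
half_spacing = {usnm: vals["Barb.Spacing"] / 2 for usnm, vals in averages.items()}
```

To run this on the real file, pass an open file handle for Barb.data.csv in place of the in-memory sample.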
Feather scripts
Code necessary to produce feather results.
feather.morph.script_04-04-24.R - Takes as input the barb morphology measurements recorded for each species and models differences in barb length, width, and number. The script also models feather repellency and penetration resistance among species and generates comparative graphs. Outputs averages of feather morphology for import to feather.color.script.R.
feather.color.script_06-03-2024.R - Processes feather color data from HPLC, spec, and microscopy and combines these with feather morphology averages. The script contains models of lutein concentrations among species, lutein’s effect on color components from Pavo, HPLC chromatogram plotting, color component comparisons among species, and barb morphology’s effect on color components from Pavo. Generates several comparative graphs and figures.
Genetics data
The necessary data for producing many of the genetic results. Some analyses that are described sufficiently in the Methods would need to be regenerated from the raw genetic data. Access to that data on NCBI is described in the text of the paper.
- popgen3_10k.csv - Output from popgenWindows script available here: https://github.com/simonhmartin/genomics_general. Focal populations are sites 3 and 4.
- sites: Number of sites in the genomic window used for population genetics statistics calculation
- pi_3: Nucleotide diversity, site 3
- pi_4: Nucleotide diversity, site 4
- dxy_3_4: dxy between sites 3 and 4
- Fst_3_4: Fst between sites 3 and 4
- popgen_4-12_min21_10k.csv - Output from popgenWindows. Focal populations are sites 4 and 12
- See above for variable descriptions
- P1pop3.10kb.pip.csv - Output from the ABBABABAWindows script from the GitHub page above
- Output variables described in Martin et al. 2015. Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Molecular Biology and Evolution 32(1):244-257
- chromosome_orderings.csv - Output from RaGOO, scaffolding M. vitellinus reference scaffolds to chromosomes
- orient: Orientation of scaffold on reference chromosome
- conf_pos: Confidence of scaffold position
- conf_orient: Confidence of scaffold orientation
- fix_diff.txt - Invariant site locations, used in get.popgen.data.R to correct pi and dxy estimates
- GCF_001715985.3_ASM171598v3_assembly_report.txt - M. vitellinus assembly data downloaded from NCBI used to get scaffold lengths
- freqs_min21_filtered.csv - dSNP locations (described in the text), also available in data S1
- pwfreq0/1/u: Frequency of the ref/alt/uncalled allele among individuals in the "parental white" site, AKA, site 2
- wfreq0/1/u: Same as above in the cross-river white-collared site, AKA, site 3
- yfreq0/1/u: Same as above in the cross-river yellow-collared site, AKA, site 4
- pyfreq0/1/u: Same as above in the "parental yellow" site, AKA, site 12
- shifted_clines.csv - Locations of SNPs showing cline center significantly shifted from the center of the hybrid zone (also available as data S2).
- center: Cline center, measured in km from site 10
- width: Cline width, measured in km
- Pmin/max: Observed minimum/maximum allele frequencies
- center/width_low/med/high: 5%, 50%, and 95% credibility intervals for cline center/width
- bco2_48kb.treefile, dcps.treefile, and tbcel.treefile - End results from BCO2 gene tree analysis. Generating these is described in tree_workflow.txt and gatk_baseQ_recalibration_workflow.txt.
- manacus_filter1-snp-mm10_ABBABABAv2.csv - Output of the ABBABABAWindows script from the GitHub page above, run between M. aurantiacus and M. vitellinus. Generating this file is described in tree_workflow.txt, gatk_baseQ_recalibration_workflow.txt, and abba-baba_workflow.txt. Variables described in Martin et al. 2015. Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Molecular Biology and Evolution 32(1):244-257
- Cerebellum.exonCounts.txt - Data used for cerebellum expression analysis (muscle expression data is all available from Pease et al. 2022. Layered evolution of gene expression in “superfast” muscles for courtship. PNAS 119 (14) e2119671119)
- manacus_filter1-snp-mm10-biallel-chr24.vcf - Result of GATK pipeline combining M. vitellinus/M. candei sequencing run and M. aurantiacus/M. manacus sequencing run, filtered to just chromosome 24. Used in topos-analysis-workflow.txt. Information on VCF (Variant Call Format) files can be found here: https://samtools.github.io/hts-specs/VCFv4.2.pdf
- chr24-win100.bed - A .bed file used in topos-analysis-workflow.txt to make trees in 100-bp windows across chr. 24. Columns: scaffold, start, end
- aur-vit-topos100.list - Output of topos-analysis-workflow.txt. Lists the position of the 100-bp windows where M. aurantiacus and M. vitellinus are sister to one another, which is the topology expected under gene flow. Input for aur-vit-topos.plot.R.
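The three-column layout of chr24-win100.bed (scaffold, start, end) can be illustrated with a short Python sketch that tiles a scaffold into 100-bp windows. This is an assumption-laden example, not the authors' procedure: the scaffold name and length are hypothetical, and whether the original file uses non-overlapping windows or keeps a final partial window is not stated here.

```python
def bed_windows(scaffold, length, size=100):
    """Yield (scaffold, start, end) tuples tiling [0, length).

    BED intervals are 0-based and half-open, so the first window is 0-100.
    The last window is truncated at the scaffold end (an assumption).
    """
    for start in range(0, length, size):
        yield (scaffold, start, min(start + size, length))

# "chr24_example" and its length of 250 bp are placeholders for illustration.
rows = list(bed_windows("chr24_example", 250))

# Tab-separated text in the scaffold/start/end column order described above.
bed_text = "\n".join("\t".join(map(str, row)) for row in rows)
```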
Genetics scripts
Scripts necessary to run main genomic analyses on site 3 / site 4 divergence and introgression and on M. vitellinus / M. aurantiacus introgression. Also includes data for BCO2 expression analysis in the cerebellum.
- allele.frequency.geno.R - Takes in VCF (generated using methods described in the Methods section) and outputs allele frequencies for each SNP (or only those SNPs passing a custom set of filters for missing data or allele frequency). Outputs dSNP file.
- popgen.source.R - Load before running get.popgen.data.R or plot.popgen.R. Contains custom functions necessary for running those scripts.
- get.popgen.data.R - Processes and combines much of the vitellinus-candei-hybrid data above, including popgen3_10k.csv, P1pop3.10kb.pip.csv, popgen_4-12_min21_10k.csv, chromosome_orderings.csv, fix_diff.txt, GCF_001715985.3_ASM171598v3_assembly_report.txt, and freqs_min21_filtered.csv.
- plot.popgen.R - Plots Fig. 3, A-E.
- tree_workflow.txt, gatk_baseQ_recalibration_workflow.txt, and abba-baba_workflow.txt - Description and example code for going from raw data to manacus_filter1-snp-mm10_ABBABABAv2.csv and the three tree files in the Genetics data directory.
- topos-analysis-workflow.txt - Takes in the manacus_filter1-snp-mm10-biallel-chr24.vcf file and chr24-win100.bed. Creates a consensus sequence for each species in 100-bp windows across chromosome 24, aligns them, and makes trees. Produces a list of window positions that show the topology expected given gene flow between M. aurantiacus and M. vitellinus.
- aur-vit-topos.plot.R - Takes in aur-vit-topos100.list and produces the plots used in Fig. 4H and 4J.
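To make the ref/alt/uncalled frequency columns (for example pwfreq0/1/u) concrete, the following Python sketch tallies allele copies from VCF-style genotype strings. It is not a port of allele.frequency.geno.R; it assumes biallelic sites, that frequencies are taken over all allele copies including uncalled ones, and the function name is hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return (freq0, freq1, frequ) over all allele copies in a list of GT strings.

    "0" is the reference allele, "1" the alternate allele, and anything else
    (such as ".") is counted as uncalled. Assumes biallelic sites.
    """
    counts = Counter()
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            counts[allele if allele in ("0", "1") else "u"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return tuple(counts[key] / total for key in ("0", "1", "u"))

# Four hypothetical individuals at one site.
freq0, freq1, frequ = allele_frequencies(["0/0", "0/1", "./.", "1/1"])
```

If the original script instead computes ref and alt frequencies among called alleles only, the denominator would exclude the uncalled copies.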
HZAR scripts
Scripts necessary to generate SNP and morphological clines from raw data (available from Long et al. 2024. Ongoing introgression of a secondary sexual plumage trait in a stable avian hybrid zone. Evolution 78(9):1539–1553 and explained in detail at https://github.com/kiralong/HZAR_pipeline/tree/main).
- run_process_radtags.sh - Raw to processed reads
- run_bwa.sh - Align processed reads to reference
- run_gstacks.sh - Assemble RAD loci from aligned reads
- run_populations.sh - Call variants
- filter_sumstats_to_whitelist.py - Filter variants
- HZAR_morphological_clines.R - Run HZAR, generate morphological clines
- HZAR_genetic_clines.R - Run HZAR, generate SNP clines
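The cline parameters reported in shifted_clines.csv (center, width, Pmin, Pmax) can be read through the standard sigmoid cline model, sketched below in Python. This is a generic illustration, not the HZAR code: it assumes a cline without exponential tails, and the function and argument names are hypothetical. Whether the fitted models included tails is not stated here.

```python
import math

def cline_frequency(x, center, width, p_min=0.0, p_max=1.0):
    """Expected allele frequency at distance x (km) under a sigmoid cline.

    The frequency rises from p_min to p_max, passes the midpoint at `center`,
    and has maximum slope (p_max - p_min) / width at that point.
    """
    sigmoid = 1.0 / (1.0 + math.exp(-4.0 * (x - center) / width))
    return p_min + (p_max - p_min) * sigmoid

# Hypothetical cline centered 10 km along the transect with a 5 km width.
midpoint = cline_frequency(10.0, center=10.0, width=5.0, p_min=0.1, p_max=0.9)
left = cline_frequency(0.0, center=10.0, width=5.0)
right = cline_frequency(20.0, center=10.0, width=5.0)
```

Under this form, a SNP whose fitted center is displaced from the hybrid zone center has its midpoint frequency at a different transect position, which is the kind of shift shifted_clines.csv records.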