Male song structure predicts offspring recruitment to the breeding population in a migratory bird
Data files
Feb 29, 2024 version files 1.03 GB
- ADULTS_2019-2022.csv
- Agena_Genotyping_PCR_Conditions.csv
- Bioinformatics_Methods_and_Code.pdf
- DiSciullo_et_al_HOWR_MSA_fitness_and_song_factor.csv
- DiSciullo_et_al_HOWR_MSA_fitness.csv
- DiSciullo_et_al_HOWR_MSA_focal_male_nests.csv
- DiSciullo_et_al_HOWR_MSA_genotypes.csv
- DiSciullo_et_al_HOWR_MSA_MCMCped.csv
- DiSciullo_et_al_HOWR_MSA_phenotypes.csv
- DiSciullo_et_al_HOWR_MSA_putative_parents.csv
- DiSciullo_et_al_HOWR_MSA_Raven_Pro_selections.csv
- DiSciullo_et_al_HOWR_MSA_Rcode.R
- DiSciullo_et_al_HOWR_MSA_rel_recr_eigenvalues.csv
- DiSciullo_et_al_HOWR_MSA_rel_time2pair_eigenvalues.csv
- DiSciullo_et_al_HOWR_MSA_song_and_bout_manual_edit.csv
- DiSciullo_et_al_HOWR_MSA_song_and_bout.csv
- DiSciullo_et_al_HOWR_MSA_song_parameters.csv
- DiSciullo_et_al_HOWR_MSA_strategy.csv
- DiSciullo_et_al_HOWR_MSA_time2pair.csv
- filtered_output2.vcf.gz
- final.thinned.tsv
- HOWR-1.html
- HOWR-2.html
- NS_x.csv
- NS_y.csv
- README.md
- Trinity.fasta
- Trinity.SuperTrans.fasta
Abstract
Bird song is a classic example of a sexually selected trait, but much of the work relating individual song components to fitness has not accounted for song typically being composed of multiple, often-correlated components, necessitating a multivariate approach. We explored the role of sexual selection in shaping complex male song of house wrens (Troglodytes aedon) by simultaneously relating its multiple components to fitness using multivariate selection analysis, which is widely used in insect and anuran studies but not in birds. The analysis revealed significant variation in the form and strength of selection acting on song across different selection episodes, from nest-site defense to recruitment of offspring to the breeding population. Males that sang more song typically employed in close communication sired more offspring that were subsequently recruited to the breeding population than those that sang far-communication song. However, this relationship was not consistent across earlier selection episodes, as evidenced by non-linear selection acting on these song components in other contexts. Collectively, our results present a complex picture of multivariate selection on male song structure that would not be evident using univariate approaches and suggest possible trade-offs within and among song components at different points of the breeding season.
README: Male song structure predicts offspring recruitment to the breeding population in a migratory bird
This data set includes the files relevant to house wren (Troglodytes aedon) transcriptome assembly and code used in bioinformatic analyses, as well as the files necessary to run the paternity analysis, summarize song, and conduct canonical analyses integral to our study concerning the sexual selection of male song in northern house wrens.
Rachael A DiSciullo(1)*, Anna M Forsman(2), Robert R Fitak(2), John Hunt(3), Pirmin Nietlisbach(1), Charles F Thompson(1), Scott K Sakaluk(1)
(1)Behavior, Ecology, Evolution, and Conservation Section, School of Biological Sciences, Illinois State University, Normal, Illinois, USA
(2)Department of Biology, University of Central Florida, Orlando, Florida, USA
(3)School of Science, Western Sydney University, Penrith, New South Wales, Australia
*corresponding author: radisci@ilstu.edu
Description of the data and file structure
Files for transcriptome assembly
NOTE: The md5sum is given for each file except "Bioinformatics_Methods_and_Code.pdf". To verify that a downloaded file is not corrupted, compare the md5sum listed below with the output of a UNIX terminal command such as "md5 file" or "md5sum file". The file formats are described in Appendix 1, immediately following the list of files.
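For readers without access to the "md5" or "md5sum" commands, the same check can be sketched in Python with the standard library. This is a minimal sketch, not part of the authors' code: the function names `md5_of_file` and `check_files` are hypothetical, and the expected digests are copied from the file list in this README.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the FILE/MD5SUM entries in this README.
EXPECTED_MD5 = {
    "Trinity.fasta": "8b07883b45f799ccffef2880844b6342",
    "Trinity.SuperTrans.fasta": "96411710a67fbca96f3fcec148f57526",
    "final.thinned.tsv": "5059f31f2abd2d67bf7453b7fc7ae7ca",
    "HOWR-1.html": "d559aa104e96b5a28de27f0e92c21c6a",
    "HOWR-2.html": "d0c8764427adbb32db5cc950467a8e98",
    "filtered_output2.vcf.gz": "5e9358c00868215318a0ab5ed1ff2329",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == md5) if path.exists() else None
    return results
```

Calling `check_files("path/to/download")` (the directory name is a placeholder) returns a dictionary that flags any file whose digest differs from the listed value.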
FILE: Trinity.fasta
MD5SUM: 8b07883b45f799ccffef2880844b6342
DESCRIPTION: This file contains the transcriptome
assembly produced using Trinity.
FORMAT: FASTA
FILE: Trinity.SuperTrans.fasta
MD5SUM: 96411710a67fbca96f3fcec148f57526
DESCRIPTION: This file contains the transcriptome
assembly after being collapsed into SuperTranscripts.
FORMAT: FASTA
FILE: final.thinned.tsv
MD5SUM: 5059f31f2abd2d67bf7453b7fc7ae7ca
DESCRIPTION: This file contains information about the
final set of SNPs and associated flanking regions that was
submitted for the final iPLEX assay development. The
columns are:
SuperTranscript, Position, SNP_ID, SNP_Set, bTaeGut1.4_Chrom, bTaeGut1.4_start, bTaeGut1.4_end, Sequence
FORMAT: TSV
FILE: HOWR-1.html
MD5SUM: d559aa104e96b5a28de27f0e92c21c6a
DESCRIPTION: The results of all the raw read
cleaning and adapter removal using the fastp
software.
FORMAT: HTML
FILE: HOWR-2.html
MD5SUM: d0c8764427adbb32db5cc950467a8e98
DESCRIPTION: The results of all the raw read
cleaning and adapter removal using the fastp
software.
FORMAT: HTML
FILE: filtered_output2.vcf.gz
MD5SUM: 5e9358c00868215318a0ab5ed1ff2329
DESCRIPTION: This file contains all the SNPs
identified among the SuperTranscripts. It is
recommended to use only SNPs with the "PASS"
label for subsequent analyses.
FORMAT: VCF
FILE: Bioinformatics_Methods_and_Code.pdf
DESCRIPTION: This file contains supplementary bioinformatics methods and code.
FORMAT: PDF
Appendix 1: File Formats
Format: FASTA
This format represents DNA sequences without quality
scores. See
https://en.wikipedia.org/wiki/FASTA_format
for more specific information.
Format: TSV
This format stores tabular data in plain text.
Each line is a record, with fields (columns)
separated by a tab ("\t"). It can be opened
in any text editor or spreadsheet software, e.g.,
MS Excel.
Format: HTML
HTML (HyperText Markup Language) is the format for
web pages created for display in browsers. These files
can be opened in any web browser.
Format: VCF
This is the standard format for representing genomic
variation data. It is a tab-delimited text file that
also contains a header. All header lines begin with
a "#". The complete VCF specification can be found at
the following link:
http://samtools.github.io/hts-specs/VCFv4.2.pdf
The file provided here is a VCF file that has been
compressed with gzip (.gz). On most computers it can be
uncompressed by double-clicking it.
On most UNIX machines, it can also be unpacked from
the command line using:
gunzip file.vcf.gz
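The compressed VCF can also be read without unpacking it first. The following is a minimal Python sketch, assuming the standard VCF column order in which the seventh tab-separated field is FILTER; it keeps only records labelled "PASS", as recommended for filtered_output2.vcf.gz. The function names are hypothetical and are not taken from the authors' code.

```python
import gzip


def pass_records(lines):
    """Yield the tab-split fields of each VCF data line whose FILTER column is PASS."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        fields = line.rstrip("\r\n").split("\t")
        # Standard VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
        if len(fields) >= 7 and fields[6] == "PASS":
            yield fields


def read_pass_records(path):
    """Read a gzip-compressed VCF and return its PASS records as lists of fields."""
    with gzip.open(path, "rt") as handle:
        return list(pass_records(handle))
```

For example, `read_pass_records("filtered_output2.vcf.gz")` would return one list of fields per retained SNP.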
Files for paternity analysis, summarising song, and conducting canonical analyses
FILE: DiSciullo_et_al_HOWR_MSA_Rcode.R
DESCRIPTION: This file contains the R code that allows one to conduct the paternity analysis, summarise song, and conduct the canonical analyses.
FILE: ADULTS_2019-2022.csv
DESCRIPTION: This file contains all adult house wrens captured or visually sighted on the Mackinaw Study Area from the 2019-2022 breeding seasons (April-August). The columns are BANDNO (unique number of aluminum band), NEST (nest of capture or sighting), SEX (M=male, F=female), DATE (day of year of capture or sighting), COLR (color of bands on right leg, top to bottom), COLL (color of bands on left leg, top to bottom), NOTES, YEAR
FILE: Agena_Genotyping_PCR_Conditions.csv
DESCRIPTION: This file contains the PCR conditions, and SNP and oligo data from the University of Minnesota Genomics Center Agena Bioscience iPLEX Genotyping.
FILE: DiSciullo_et_al_HOWR_MSA_fitness.csv
DESCRIPTION: This file contains fitness proxy information for the 63 focal males from the 2018 breeding season (except recruitment of young, which is determined in subsequent breeding seasons). The columns are maleID (unique number of aluminum band), nest (nest of capture or sighting), num_nest_sites (number of nests at which a male was captured or was sighted defending the nest), time2paired_ctrl4Msettle (the regression of average male settlement on average number of days between male and female settlement), WPYsired (number of within-pair young sired), WPYandEPYsired (number of within- and extra-pair young sired), num_sired_yg_recruited (number of young recruited to the breeding population in 2019 and 2020)
FILE: DiSciullo_et_al_HOWR_MSA_fitness_and_song_factor.csv
DESCRIPTION: This file contains fitness proxy information for the 63 focal males and the extracted values of the four song factors from the factor analysis. The columns are maleID (unique number of aluminum band), nest (nest of capture or sighting), num_nest_sites (number of nests at which a male was captured or was sighted defending the nest), time2paired_ctrl4Msettle (the regression of average male settlement on average number of days between male and female settlement), WPYsired (number of within-pair young sired), WPYandEPYsired (number of within- and extra-pair young sired), num_sired_yg_recruited (number of young recruited to the breeding population in 2019 and 2020), Close.MLCF1 (factor ML1), Complex.MLCF2 (factor ML3), Perform.MLCF3 (factor ML2), Far.MLCF4 (factor ML4)
FILE: DiSciullo_et_al_HOWR_MSA_focal_male_nests.csv
DESCRIPTION: This file contains a list of all nests at which a male was captured or was sighted defending the nest in the 2018 breeding season. The columns are nest (nest of capture or sighting), maleID (unique number of aluminum band), COLR (color of bands on right leg, top to bottom), COLL (color of bands on left leg, top to bottom)
FILE: DiSciullo_et_al_HOWR_MSA_genotypes.csv
DESCRIPTION: This file contains the genotypes for 807 house wrens (104 males, 84 females [four of whom have no genotype--all loci set to NA], and 619 offspring [two of whom have no genotype--all loci set to NA]) for the paternity analysis. The first column "id" is the unique number of aluminum band. All other columns are two alleles of one locus (Name_1 and Name_2). Loci at which genotyping was unsuccessful have NA.
FILE: DiSciullo_et_al_HOWR_MSA_MCMCped.csv
DESCRIPTION: This file is the output from the MCMC model. The column names are id (unique number of aluminum band of the 619 offspring), dam (social.mother of the offspring as supplied in the model), sire (the most likely sire from the model), prob (posterior probability of assignment of father to offspring)
FILE: DiSciullo_et_al_HOWR_MSA_phenotypes.csv
DESCRIPTION: This file contains the phenotypes for 807 house wrens (104 males, 84 females, and 619 offspring) for the paternity analysis. The columns are id (unique number of aluminum band), sex (Male, Female, or NA if offspring), social.father (unique number of aluminum band of father who tended to offspring in the nest), social.mother (unique number of aluminum band of mother who tended to offspring in the nest), offspring (0=adult, 1=offspring), nest (nest of social family), notes, long (average x coordinate of all nests at which the male was captured or sighted), lat (average y coordinate of all nests at which the male was captured or sighted), link.father (unique identifier to connect social.father value in the same row to id value of the male in another row), link.mother (unique identifier to connect social.mother value in the same row to id value of the female in another row)
FILE: DiSciullo_et_al_HOWR_MSA_putative_parents.csv
DESCRIPTION: This file is the output from the MCMC model with the social father information added. The column names are id (unique number of aluminum band of the 619 offspring), nest (nest of social family), social.mother (unique number of aluminum band of mother who tended to offspring in the nest), dam (social.mother of the offspring as supplied in the model), social.father (unique number of aluminum band of father who tended to offspring in the nest), sire (the most likely sire from the model), prob (posterior probability of assignment of father to offspring), match.fat.90 (TRUE if social.father=fat.90, else FALSE), fat.90 (unique number of aluminum band of sire when prob >0.9, else "unk" for unknown)
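The relationship between prob, fat.90, and match.fat.90 described above can be sketched in Python as follows. This is only an illustration of the stated rule (sire retained when prob > 0.9, otherwise "unk"); the function name `add_fat90` and the band numbers in the usage example are hypothetical, and the authors' actual implementation is in DiSciullo_et_al_HOWR_MSA_Rcode.R.

```python
def add_fat90(rows, threshold=0.9):
    """Return copies of the rows with fat.90 and match.fat.90 filled in.

    Each row is a dict with at least the keys "sire", "prob", and
    "social.father", named after the columns described in this README.
    """
    out = []
    for row in rows:
        new = dict(row)
        # Keep the assigned sire only when the posterior probability exceeds 0.9.
        new["fat.90"] = row["sire"] if float(row["prob"]) > threshold else "unk"
        # TRUE when the confidently assigned sire is also the social father.
        new["match.fat.90"] = new["fat.90"] == row["social.father"]
        out.append(new)
    return out


# Hypothetical band numbers, for illustration only.
example = add_fat90([
    {"id": "O1", "social.father": "M1", "sire": "M1", "prob": 0.99},
    {"id": "O2", "social.father": "M1", "sire": "M2", "prob": 0.95},
    {"id": "O3", "social.father": "M1", "sire": "M1", "prob": 0.60},
])
```

In this example the second offspring would count as extra-pair young, and the third is left unassigned because its posterior probability does not exceed the threshold.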
FILE: DiSciullo_et_al_HOWR_MSA_Raven_Pro_selections.csv
DESCRIPTION: This file contains the concatenated output from Raven Pro 1.6 selection files, to summarise song parameters. The column names are maleID (unique number of aluminum band), recording (name of .wav recording file), day (day of year on which recording was gathered), duration (length of the recording in min), name (initials of individual who recorded the file), nest (active nest at which the recording was gathered), status (breeding stage [pre=pre-pairing]), fem (Y=female was sighted during the recording, N=no female was sighted during the recording, unk=unknown if the female was present during the recording), scored (date the recording file was quantified in Raven Pro 1.6), Begin Time (s) (time in sec that the selection begins), End Time (s) (time in sec that the selection ends), BW 90% (Hz) (the difference between the frequencies [Hz] that divide the selection into two frequency intervals that contain 5% and 95% of the energy), notes to self, section (intro=introduction section, terminal=terminal section), element (name of the note; all uppercase letters are notes in the introduction and all lowercase letters are notes in the terminal section), introNum (overall number of the introduction sections within the file, labelled sequentially), termNum (overall number of the terminal sections within the file, labelled sequentially), songNum (overall number of the song within a file including intro-only, terminal-only, or intro+terminal [full-song], labelled sequentially), introOnly (Y=song is only composed of introduction section notes, blank otherwise), termOnly (Y=song is only composed of terminal section notes, blank otherwise)
FILE: DiSciullo_et_al_HOWR_MSA_rel_recr_eigenvalues.csv
DESCRIPTION: This file contains the extracted values of the four song factors (Close.MLCF1, Complex.MLCF2, Perform.MLCF3, Far.MLCF4), the relative recruitment (rel.recr), and the individual scores for each eigenvector of the canonical analysis comparing song factors to recruitment (m1, m2, m3, m4). The column names are Close.MLCF1, Complex.MLCF2, Perform.MLCF3, Far.MLCF4, rel.recr, m1, m2, m3, m4.
FILE: DiSciullo_et_al_HOWR_MSA_rel_time2pair_eigenvalues.csv
DESCRIPTION: This file contains the extracted values of the four song factors (Close.MLCF1, Complex.MLCF2, Perform.MLCF3, Far.MLCF4), the relative time-to-pairing (rel.time2pair), and the individual scores for each eigenvector of the canonical analysis comparing song factors to time-to-pairing (m1, m2, m3, m4). The column names are Close.MLCF1, Complex.MLCF2, Perform.MLCF3, Far.MLCF4, rel.time2pair, m1, m2, m3, m4.
FILE: DiSciullo_et_al_HOWR_MSA_song_and_bout.csv
DESCRIPTION: This file contains the summarised sequences of notes for each song within a file and when (in sec) the song starts and ends within the file, for song-type renaming and bout number labelling. The columns are recording (name of .wav recording file), songNum (overall number of the song including intro-only, terminal-only, or intro+terminal [full-song] within the file, labelled sequentially), song (notes of the song, separated by spaces), min.start (time in sec in the recording when the song started), max.end (time in sec in the recording when the song ended), intersong.post (time between consecutive songs (n and n+1) in sec), bout.label (X=bout ends here [intersong.post>30sec])
FILE: DiSciullo_et_al_HOWR_MSA_song_and_bout_manual_edit.csv
DESCRIPTION: This file contains the summarised sequences of notes for each song within a file and when (in sec) the song starts and ends within the file, as well as the manually entered song-type condensing steps and the bout number. The column names are recording (name of .wav recording file), songNum (overall number of the song including intro-only, terminal-only, or intro+terminal [full-song] within the file, labelled sequentially), song (notes of the song, separated by spaces), song.squish (song with repeated notes removed), song.squish.syll (song.squish with syllables identified but true order of note and syllable presentation maintained), song.squish.syll.conden (condensed version of song.squish.syll with extra notes removed and song-type defined as note and syllable presentation order, regardless of number of times a note or syllable was repeated [including half presentations of 2-note syllables]), min.start (time in sec in the recording when the song started), max.end (time in sec in the recording when the song ended), intersong.post (time between consecutive songs (n and n+1) in sec), bout.label (X=bout ends here [intersong.post>30sec]), bout.num (sequential number of bout within a recording)
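The bout columns described above (intersong.post, bout.label, bout.num) can be illustrated with a short Python sketch. It rests on two assumptions that are not confirmed by this README: that intersong.post is measured from the end of song n (max.end) to the start of song n+1 (min.start), and that the last song in a recording has no intersong.post value and is left unlabelled. The function name `label_bouts` is hypothetical.

```python
def label_bouts(songs, gap=30.0):
    """Label bouts for one recording.

    songs: list of (min_start, max_end) tuples in sec, sorted by start time.
    A bout ends after a song when the silence before the next song exceeds `gap`.
    """
    rows = []
    bout_num = 1
    for i, (start, end) in enumerate(songs):
        if i + 1 < len(songs):
            # Assumption: gap measured from end of this song to start of the next.
            intersong_post = songs[i + 1][0] - end
        else:
            intersong_post = None  # assumption: undefined for the last song
        ends_bout = intersong_post is not None and intersong_post > gap
        rows.append({
            "min.start": start,
            "max.end": end,
            "intersong.post": intersong_post,
            "bout.label": "X" if ends_bout else "",
            "bout.num": bout_num,
        })
        if ends_bout:
            bout_num += 1  # the next song starts a new bout
    return rows


# Made-up timings: the 43-sec silence after the second song ends bout 1.
demo_rows = label_bouts([(0.0, 2.0), (5.0, 7.0), (50.0, 52.0), (60.0, 61.0)])
```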
FILE: DiSciullo_et_al_HOWR_MSA_song_parameters.csv
DESCRIPTION: This file contains the summarised song parameters for use in the factor analysis. The column names are recording (name of .wav recording file), nest (active nest at which the recording was gathered), status (breeding stage [pre=pre-pairing]), maleID (unique number of aluminum band), total_songNum (total number of songs/recording), avg_notesD_per_song (average number of distinct notes/song/recording), avg_total_notes_per_song (average total number of notes/song/recording), avg_song_duration (average song duration/recording), avg_element_rate (average number of notes/song/recording), avg_bandwidth (average 90% frequency bandwidth of notes/song/recording), avg_intro_duration (average duration of the introduction section/recording), avg_term_duration (average duration of the terminal section/recording), avg_term_rate (average number of notes in the terminal section only/song/recording), total_bout (total number of bouts/recording), avg_rate_within_bout (average number of songs/sec within a bout per recording), n.distinct.songtype (total number of distinct songs/recording), prop.unique.songtype (proportion of all songs within the recording that are distinct), consistency (1 minus the number of changes between successive song-types divided by the total number of songs sung)
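As a worked illustration of the consistency definition above, the following Python sketch computes 1 minus the number of changes between successive song-types divided by the total number of songs. The function name `song_consistency` and the example song-type labels are hypothetical; the values in the data file were produced by the authors' R code.

```python
def song_consistency(song_types):
    """Return 1 - (changes between successive song-types / total songs sung)."""
    if not song_types:
        raise ValueError("at least one song is required")
    # Count positions where a song differs in type from the song before it.
    changes = sum(1 for prev, cur in zip(song_types, song_types[1:]) if prev != cur)
    return 1 - changes / len(song_types)


# Four songs with one switch of song-type: 1 - 1/4 = 0.75
demo_consistency = song_consistency(["a", "a", "b", "b"])
```

A male that repeats a single song-type throughout a recording scores 1, and a male that switches after every song approaches 0 as the number of songs grows.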
FILE: DiSciullo_et_al_HOWR_MSA_strategy.csv
DESCRIPTION: This file contains the mating strategy used by each of the 63 focal males. The columns are strategy (seq=sequentially monogamous, mon=singly monogamous, pol=socially polygynous)
FILE: DiSciullo_et_al_HOWR_MSA_time2pair.csv
DESCRIPTION: This file contains the information necessary to calculate the time-to-pairing proxy that accounts for day of year. The column names are nest (active nest at which the recording was gathered), earliest.M.settle (earliest possible day of year the male added >=50% sticks), latest.M.settle (latest possible day of year the male added >=50% sticks), avg.M.settle (average of earliest.M.settle and latest.M.settle), earliest.F.settle (earliest possible day of year the female added soft lining material), latest.F.settle (latest possible day of year the female added soft lining material), avg.F.settle (average of earliest.F.settle and latest.F.settle), time2pair (difference between avg.F.settle and avg.M.settle), maleID (unique number of aluminum band), COLR (color of bands on right leg, top to bottom), COLL (color of bands on left leg, top to bottom)
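The derived columns in this file follow directly from the four settlement dates. A minimal Python sketch of that arithmetic, with a hypothetical function name and made-up day-of-year values, is given below. It does not reproduce time2paired_ctrl4Msettle in the fitness files, which additionally involves the regression on male settlement.

```python
def time_to_pair(earliest_m, latest_m, earliest_f, latest_f):
    """Compute avg.M.settle, avg.F.settle, and time2pair from day-of-year values."""
    avg_m = (earliest_m + latest_m) / 2
    avg_f = (earliest_f + latest_f) / 2
    return {
        "avg.M.settle": avg_m,
        "avg.F.settle": avg_f,
        # Days between average male and average female settlement.
        "time2pair": avg_f - avg_m,
    }


# Made-up values: male settled on days 120-124, female on days 126-130.
demo_pairing = time_to_pair(120, 124, 126, 130)
```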
FILE: NS_x.csv
DESCRIPTION: This file contains the extracted values of the four song factors for each of the 63 focal males for canonical analysis.
FILE: NS_y.csv
DESCRIPTION: This file contains the five fitness proxies for each of the 63 focal males for canonical analysis.
Sharing/Access information
Links to other publicly accessible locations of the data:
All raw sequencing data can be obtained through the NCBI BioProject database, under accession PRJNA631786
- https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA631786
The computer code and commands used to generate the transcriptome results are available at Bob Fitak's GitHub page. Thanks for promoting data sharing! (Bob Fitak, 04/11/2023)
- https://github.com/rfitak/HouseWren_Transcriptome