GENERAL INFORMATION

1. Title: Song complexity is maintained during inter-population cultural transmission of humpback whale songs
2. Co-authors: Jenny A. Allen*, Ellen C. Garland, Claire Garrigue, Rebecca A. Dunlop, & Michael J. Noad
   *contact: jenny.allen@griffith.edu.au
3. Data collection: Queensland, Australia (2009-2014) and South Lagoon, New Caledonia (2010-2015)
4. Funding sources: Australian Government Research Training Program, American Australian Association, Winifred Violet Scott Trust, the Royal Society
5. Data citation: Allen et al. (2022). Data from: Song complexity is maintained during inter-population cultural transmission of humpback whale songs, Dryad, Dataset

DATA OVERVIEW

The data consist of a summary of calculated song complexity metrics, a summary of calculated theme complexity metrics, and a complete set of song transcriptions for each location and year.

Transcriptions are numerical sequences that represent the sound units comprising the song patterns. Each number (1-125) represents a unique sound that occurs in the humpback whale song repertoire. These numbers are derived from a self-organising map dictionary built from measured sound units sampled from across the data set (for details, see Allen et al. 2017. Using self-organizing maps to classify humpback whale song units and quantify their similarity. JASA).

Color-coding reflects the unique song type being sung in that year and population. There are six colored song types: Purple, Light Purple, Brown, Light Brown, Teal, and Orange.

SONG COMPLEXITY METRICS

These data are a summary of the metrics used to calculate the complexity score for each year and population (as seen in Figure 2). Each row is a unique year and population record. This reflects the fact that each song type was recorded in each population in a separate year.

Number of variables: 12
Number of rows: 12
Missing data: none

Variable list:
Year: year from which data were derived
Population: population from which data were derived
Song Type: song type based on color-coded system
Song cycles: the number of complete song cycles from which data were derived
Individuals: the total number of individual singers from which data were derived
Mean Unique Units per Cycle: the average number of unique sound units per song cycle in that population and year
Mean Total Units per Cycle: the average number of total sound units (both unique and repeated) per song cycle in that population and year
Mean Song Cycle Length (m): the average length of song cycles in minutes for that year and population
Mean Themes: the average number of unique themes per song cycle for that year and population
Mean Phrase Length (s): the average length of phrase repetitions in seconds across all themes for that year and population
Mean Score of Themes: the average complexity score calculated for all themes in that year and population (this number is relative to the other values in this column)
Complexity Scores: the complexity score calculated for each year and population (this number is relative to the other values in this column)

THEME COMPLEXITY MEASURES

These data are a summary of the metrics used to calculate the complexity score for each theme as recorded in each year and population (as seen in Figure 1). Each row is a theme as recorded in a specific year and population. There are a total of 40 unique themes: 29 were recorded in both populations and 11 were recorded in only a single population.

Number of variables: 11
Number of rows: 69
Missing data: none

Variable list:
Year: year from which data were derived
Population: population from which data were derived
Song Type: song type based on color-coded system
Theme: unique theme identifier (numbered 1-40)
Shared or Unique: identifies whether a theme was shared between the two populations or unique to a single population
Count: the number of times the theme was recorded in that year and population
Relative Occurrence (%): the number of times a theme was recorded, as a percentage of the total number of phrase repetitions recorded in that year and population
Units: the average number of total sound units (both unique and repeated) for that theme as recorded in that population and year
Unique Units: the average number of unique sound units for that theme as recorded in that population and year
Duration (s): the average length of phrase repetitions in seconds for that theme as recorded in that population and year
Complexity Score: the complexity score calculated for that theme as recorded in that year and population (this number is relative to the other values in this column)

TRANSCRIPTIONS FOR EACH POPULATION AND YEAR

These data represent the individual song transcriptions for all song cycles recorded and used in this study. These data comprise 12 sheets: EA2009, NC2010, EA2010, NC2011, EA2011, NC2012, EA2012, NC2013, EA2013, NC2014, EA2014, NC2015. Each row is a single phrase repetition of a theme, made up of a sequence of units. Phrase repetitions do not have uniform length, which results in a variable number of strings.

Number of variables: 11
Number of rows:
EA2009: 1147
NC2010: 223
EA2010: 1344
NC2011: 357
EA2011: 1117
NC2012: 521
EA2012: 1004
NC2013: 327
EA2013: 1060
NC2014: 594
EA2014: 790
NC2015: 720
Missing data: none

Variable list:
Year: year in which recording was made
Population: population in which recording was made
Date: date of recording
Record ID: unique identifier for each row, composed of the recording date and the row number within that date, used to indicate the correct order of the records for each date
Singer #: distinct identifier for the singer of the recording
Total Cycle Count: distinct identifier for the song cycle of the recording
Theme: unique theme identifier (as seen in Theme Complexity Metrics)
Units: total number of sound units in phrase repetition
Unique Units: total number of unique sound unit types in phrase repetition
Phrase Duration (s): total length in seconds of phrase repetition
Sequence of units: numeric string representing the sequence of sound units comprising the phrase repetition. Because the number of units in each phrase repetition varies, the sequences are not of uniform length. Shorter sequences may therefore appear to have missing data for "Sequence of units", but all sequences are complete.
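
As an illustration of how the variable-length "Sequence of units" relates to the "Units" and "Unique Units" columns, the following is a minimal Python sketch. It assumes that each unit occupies its own cell and that shorter sequences are padded with blank cells; that layout is an assumption, not something this README specifies. The function names `parse_sequence` and `summarize_phrase` and the example values are hypothetical and are not part of the dataset.

```python
import math


def parse_sequence(cells):
    """Return the unit sequence as a list of ints, skipping blank cells.

    Assumption: blank padding may show up as None, an empty string, or NaN,
    depending on how the sheet was read.
    """
    units = []
    for cell in cells:
        if cell is None:
            continue
        if isinstance(cell, float) and math.isnan(cell):
            continue
        text = str(cell).strip()
        if not text:
            continue
        unit = int(float(text))  # accepts 12, 12.0 and "12"
        # Unit labels in this dataset range from 1 to 125.
        if not 1 <= unit <= 125:
            raise ValueError(f"unit label out of range: {unit}")
        units.append(unit)
    return units


def summarize_phrase(cells):
    """Recompute the per-row unit counts from a sequence of units."""
    units = parse_sequence(cells)
    return {"Units": len(units), "Unique Units": len(set(units))}


# Hypothetical phrase repetition, padded with blank cells.
example_row = [12, 12.0, "47", 3, None, "", float("nan")]
print(parse_sequence(example_row))
print(summarize_phrase(example_row))
```

Under these assumptions, blank cells at the end of a row are treated as padding rather than missing data, and the recomputed counts can be compared against the "Units" and "Unique Units" columns as a consistency check.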