Data and code from: Dialect formation in ghost bats: Genetic, geographic, and morphological drivers of social and echolocation call divergence
Data files
Apr 22, 2026 version files 15.68 MB
- all_ct.csv (70.56 KB)
- all_e.csv (42.63 KB)
- all_s.csv (58.09 KB)
- all_u.csv (59.86 KB)
- covariates_corrected_lat_lon_generalised_TandC_swapped.csv (2.98 KB)
- README.md (8.81 KB)
- Report_DBat17-3211_SNP_singlerow_2_NT.csv (15.43 MB)
Abstract
Geographical patterns of vocal dialects are poorly understood in bats, despite growing evidence that they possess complex vocal communication systems.
We investigated patterns and drivers of variation in vocalisations recorded at five ghost bat colonies in the Northern Territory, Australia. We calculated the genetic and morphological distances among individuals and investigated correlations with geographic distance. We then determined variation within three ghost bat social vocalisations (“Chirp-trill”, “Squabble”, “Ultrasonic Social”) and their “Echolocation” call using seven spectrographic measurements. Finally, we tested whether acoustic distance could be explained by genetic, geographic, or morphological distance.
Geographic and genetic distance were highly correlated, suggesting the occurrence of isolation by distance. All measures of morphological distance were consistent with Bergmann’s Rule, except noseleaf shape, which is likely constrained by its role in echolocation. Geographic variation was evident within each of the three social vocalisations and the echolocation call, with the patterns of geographic variation differing among the four vocalisation types. The degree of difference was surprising, given the ghost bat’s long-range seasonal dispersal. Acoustic distance in Chirp-trill and Squabble calls was marginally significantly correlated with genetic (and geographic) distance. In contrast, Ultrasonic Social and Echolocation calls varied among colonies but showed no significant associations with other metrics, apart from a weak correlation between Ultrasonic Social distance and forearm length. This supports the view that these ultrasonic calls are under stabilising selection due to their role in foraging and orientation.
This study provides the first evidence of dialect formation in megadermatid bats. It highlights the importance of considering multiple vocalisation types and investigating multiple processes in signal evolution. Overall, we found genetic, geographic, and morphological distances accounted for some of the variation in acoustic differences among colonies, but further work is needed to investigate other processes that may also contribute to dialect formation in ghost bats.
Dataset DOI: 10.5061/dryad.kprr4xhjf
Description of the data and file structure
To investigate dialect formation in the vocal repertoire of the ghost bat (Macroderma gigas), we collected genetic, geographic, morphological, and acoustic data from five colonies in the Northern Territory, Australia. Distances among colonies were calculated for each data type, and relationships between acoustic distance and genetic, geographic, and morphological distance were assessed using R.
This repository contains acoustic measurement datasets, genetic datasets, and sample covariates associated with those analyses.
Files and variables
File: all_ct.csv, all_u.csv, all_s.csv, all_e.csv
Description: These files contain acoustic measurements extracted from individual ghost bat vocalisations. Each row represents one selected vocalisation measured from a sound recording using the software Raven Pro. The same variable definitions apply across all four files.
Variables:
- selec: selection number assigned to the vocalisation within the annotation software.
  Units: none
- start: start time of the selected vocalisation within the source audio file.
  Units: seconds
- end: end time of the selected vocalisation within the source audio file.
  Units: seconds
- Begin.Path: original file path of the source audio recording on the authors’ computer system.
  Units: none
  Note: retained as original metadata identifying the source recording used for measurement.
- Begin.File: filename of the source audio recording from which the vocalisation was measured.
  Units: none
- bw90: 90% bandwidth of the vocalisation.
  Units: Hz
  Interpretation: frequency bandwidth containing 90% of the signal energy.
- iqrbw: interquartile range bandwidth of the vocalisation.
  Units: Hz
  Interpretation: the difference between the 75th and 25th percentile of the frequency distribution.
- maxentropy: maximum entropy value measured for the vocalisation.
  Units: unitless
- minentropy: minimum entropy value measured for the vocalisation.
  Units: unitless
- aggentropy: aggregate entropy value measured for the vocalisation.
  Units: unitless
- peakfreq: peak frequency of the vocalisation.
  Units: Hz
  Interpretation: the frequency at which the call was the loudest (kHz)
- length: duration of the selected vocalisation.
  Units: milliseconds
- pfcminfreq: minimum frequency of the peak frequency contour.
  Units: Hz
- pfcmaxfreq: maximum frequency of the peak frequency contour.
  Units: Hz
- peaktime: time from the start of the vocalisation to the point of maximum amplitude or peak energy.
  Units: seconds
  Interpretation: measured relative to the start of the selection.
- rownames: row identifier carried over from the original analysis workflow.
  Units: none
- sound.files: source audio filename associated with the vocalisation.
  Units: none
- sel.name: filename assigned to the exported or labelled vocalisation selection.
  Units: none
- site: sampling locality code for the vocalisation.
  Units: none
  Interpretation key:
  - PIN = Pine Creek
  - CLA = Claravale
  - PUN = Pungalina
  - KAK = Kakadu
  - TOL = Tolmer
- name: vocalisation type.
  Units: none
  Interpretation key:
  - CT = chirp–trill
  - U = ultrasonic social call
  - S = squabble
  - E = echolocation call
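As an illustration of how these tables might be read outside the authors' R workflow, the following is a minimal sketch using only the Python standard library. It assumes the column names listed above (`site`, `name`, `peakfreq`) appear verbatim in the CSV header and that missing values, if any, are written as empty strings or `NA`; neither assumption has been checked against the files. The function names and the demo values are invented for illustration and are not part of the dataset or the authors' code.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Site codes as given in the interpretation key above.
SITE_NAMES = {
    "PIN": "Pine Creek",
    "CLA": "Claravale",
    "PUN": "Pungalina",
    "KAK": "Kakadu",
    "TOL": "Tolmer",
}


def read_acoustic_csv(path):
    """Read one acoustic file (e.g. all_ct.csv) into a list of dict rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def mean_by_site(rows, variable="peakfreq"):
    """Return {site code: mean of `variable`} for an iterable of dict rows.

    Rows whose value is empty or "NA" are skipped (an assumption about how
    missing values are encoded, not something stated in the README).
    """
    values = defaultdict(list)
    for row in rows:
        raw = row.get(variable, "")
        if raw in ("", "NA"):
            continue
        values[row["site"]].append(float(raw))
    return {site: mean(vals) for site, vals in values.items()}


# Tiny in-memory demo with INVENTED numbers (not taken from the dataset).
demo = io.StringIO(
    "site,name,peakfreq\n"
    "PIN,CT,10000\n"
    "PIN,CT,12000\n"
    "KAK,CT,9000\n"
)
demo_means = mean_by_site(csv.DictReader(demo))
```

With the real files in the working directory, a call such as `mean_by_site(read_acoustic_csv("all_ct.csv"))` would give a per-colony summary of peak frequency for the chirp-trill calls, keyed by the site codes above.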
File: covariates_corrected_lat_lon_generalised_TandC_swapped.csv
Description: Sample metadata for the individuals included in the DArTseq analysis. This file includes sample identifiers, colony/population assignments, and generalised geographic coordinates for the sampling locations.
Variables:
- id: unique sample identifier for each individual.
  Units: none
- colony: colony or sampling locality to which the individual was assigned.
  Units: none
- pop: population grouping used in the analyses.
  Units: none
  Note: population labels correspond to colony names in this file.
- lat: latitude of the sampling location.
  Units: decimal degrees
  Note: coordinates have been generalised to reduce disclosure of sensitive ghost bat roost locations.
- lon: longitude of the sampling location.
  Units: decimal degrees
  Note: coordinates have been generalised to reduce disclosure of sensitive ghost bat roost locations.
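For readers who want a rough sense of the geographic distances between colonies, the sketch below computes great-circle (haversine) distances from latitude and longitude in decimal degrees. This is one common way to obtain such distances and is offered as an assumption: the README does not state which method the authors' R scripts use. Because the coordinates in this file are generalised, any distances derived from them are approximate. The function names and the placeholder colony labels `A` and `B` are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairwise_distances(coords):
    """Return {(colony_a, colony_b): km} for every unordered pair.

    `coords` maps a colony label to a (lat, lon) tuple, e.g. one entry per
    distinct value of the `colony` column with its `lat` and `lon`.
    """
    return {
        (a, b): haversine_km(*coords[a], *coords[b])
        for a, b in combinations(sorted(coords), 2)
    }


# Placeholder points on the equator (NOT colony coordinates).
demo_pairs = pairwise_distances({"A": (0.0, 0.0), "B": (0.0, 90.0)})
```

In the demo, the two placeholder points are a quarter of the equator apart, so the result is a quarter of the sphere's circumference.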
File: Report_DBat17-3211_SNP_singlerow_2_NT.csv
Description: Raw DArTseq SNP report for ghost bat samples used in the population genetic analyses. This file is in standard DArT export format and includes several preliminary header rows containing batch, plate, and sample metadata, followed by a main header row and the SNP genotype data table.
File structure:
- The first several rows contain DArT export metadata and sample layout information.
- The main header row begins with locus-level metadata fields and is followed by one column per individual sample.
- Each row after the header represents one SNP locus.
Locus-level variables:
- AlleleID: unique identifier for the SNP locus and allele
- AlleleSequence: full sequence associated with the SNP locus
- TrimmedSequence: trimmed sequence used in the DArT report
- Chrom_Pteropus_vampyrus_v01: reference chromosome/scaffold in the alignment reference genome
- ChromPos_Pteropus_vampyrus_v01: genomic position in the alignment reference genome
- AlnCnt_Pteropus_vampyrus_v01: number of alignments to the reference genome
- AlnEvalue_Pteropus_vampyrus_v01: alignment expectation value (E-value) for the reference genome match
- SNP: SNP identity and base change at the focal position
- SnpPosition: position of the SNP within the sequence
- CallRate: proportion of individuals successfully genotyped at that locus
- OneRatioRef: proportion of one allele state for the reference allele in the dataset, as reported by DArT
- OneRatioSnp: proportion of one allele state for the SNP allele in the dataset, as reported by DArT
- FreqHomRef: frequency of homozygotes for the reference allele
- FreqHomSnp: frequency of homozygotes for the SNP allele
- FreqHets: frequency of heterozygotes
- PICRef: polymorphism information content for the reference allele
- PICSnp: polymorphism information content for the SNP allele
- AvgPIC: average polymorphism information content
- AvgCountRef: average count for the reference allele
- AvgCountSnp: average count for the SNP allele
- RepAvg: repeatability/reproducibility metric reported by DArT
Sample genotype columns:
Columns named by sample ID (e.g., Mg_PC1, Mg_K6, Mg_C5) give the genotype call for each individual at each SNP locus.
Genotype coding used in the raw DArT report:
- 0 = reference allele homozygote
- 1 = SNP allele homozygote
- 2 = heterozygote
- - = missing call / null allele state, as reported in the raw DArT export
Notes:
- Sample IDs correspond to individual bats and are linked to sample metadata in the covariates file.
- Some individuals were later excluded during processing and filtering steps; the raw report retains the original exported sample set.
- Preliminary rows above the main header are part of the native DArT export structure and contain plate/batch/sample metadata rather than SNP observations.
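The file structure described above (preliminary metadata rows, then a main header row, then one row per SNP locus) can be handled by scanning for the main header before parsing the table. The following is a minimal sketch using the Python standard library, not the authors' method; their analysis is in the R script listed under Code/software. It assumes that `AlleleID` is the first field of the main header row, which follows the order of the variable list above but has not been verified against the file. The function names, the label strings (`hom_ref`, `hom_snp`, `het`), and the demo table are invented for illustration.

```python
import csv
import io

# Genotype coding from the README; the label strings are hypothetical.
GENOTYPE_LABELS = {"0": "hom_ref", "1": "hom_snp", "2": "het", "-": None}


def read_dart_report(handle, first_field="AlleleID"):
    """Skip preliminary rows, then return (header, list of locus dicts).

    The main header is taken to be the first row whose first cell equals
    `first_field` (an assumption about the export layout).
    """
    reader = csv.reader(handle)
    for row in reader:
        if row and row[0] == first_field:
            header = row
            break
    else:
        raise ValueError(f"no header row starting with {first_field!r}")
    # The reader now sits just after the header: remaining rows are loci.
    return header, [dict(zip(header, row)) for row in reader if row]


def decode_genotype(code):
    """Map a raw DArT call to a label; missing calls ('-') become None."""
    return GENOTYPE_LABELS[code.strip()]


def observed_call_rate(locus, sample_columns):
    """Proportion of the given samples with a non-missing call at one locus.

    Codes not in GENOTYPE_LABELS are counted as missing.
    """
    calls = [locus[col].strip() for col in sample_columns]
    called = sum(1 for c in calls if GENOTYPE_LABELS.get(c) is not None)
    return called / len(calls)


# INVENTED miniature table in the same general shape (not real data).
demo_text = (
    "plate info,,,\n"
    "AlleleID,CallRate,Mg_PC1,Mg_K6\n"
    "locus_1,1.0,0,2\n"
    "locus_2,0.5,-,1\n"
)
header, loci = read_dart_report(io.StringIO(demo_text))
```

For the real report, the same function could be called on an open file handle, for example `open("Report_DBat17-3211_SNP_singlerow_2_NT.csv", newline="")`. Recomputing call rates from the sample columns gives a simple consistency check against the `CallRate` field, bearing in mind that some individuals were excluded later in the authors' filtering.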
Code/software
The analysis code associated with this dataset is provided via the linked Zenodo software record rather than being hosted directly in Dryad. This allows the code to be released under a software licence appropriate for code.
Two scripts are provided:
- Acoustic_analysis_lost_in_translation_updated28Nov25.R: R code for analyses of acoustic distance among colonies for four vocalisation types, and for tests of correlation with genetic, geographic, and morphological distance.
- dartseq_NT_readdata.R: R code for population genetic analyses using DArTseq single-nucleotide polymorphism (SNP) data.
Notes on file paths
The scripts use relative file paths and assume that the user is running them from the relevant project directory.
- dartseq_NT_readdata.R assumes the working directory contains the subfolder rawdata/.
- Acoustic_analysis_lost_in_translation_updated28Nov25.R assumes the working directory contains the input CSV files all_ct.csv, all_s.csv, all_u.csv, and all_e.csv.
Users should place the input files in the expected folder structure or modify the path settings to match their local environment.
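Before running the scripts, it can help to confirm that the expected inputs are in place. The sketch below is a small standalone check written in Python, separate from the R scripts themselves. It assumes that both scripts are run from the same project directory, which the notes above do not state explicitly; the function name `missing_inputs` is hypothetical.

```python
from pathlib import Path
import tempfile

# Inputs named in the notes on file paths.
ACOUSTIC_INPUTS = ("all_ct.csv", "all_s.csv", "all_u.csv", "all_e.csv")
GENETIC_SUBFOLDER = "rawdata"


def missing_inputs(project_dir="."):
    """Return the expected files/folders that are absent from project_dir."""
    root = Path(project_dir)
    missing = [name for name in ACOUSTIC_INPUTS if not (root / name).is_file()]
    if not (root / GENETIC_SUBFOLDER).is_dir():
        missing.append(GENETIC_SUBFOLDER + "/")
    return missing


# Demo in a throwaway directory: everything is missing at first,
# then nothing is missing once empty placeholders are created.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_inputs(tmp)
    for name in ACOUSTIC_INPUTS:
        (Path(tmp) / name).touch()
    (Path(tmp) / GENETIC_SUBFOLDER).mkdir()
    after = missing_inputs(tmp)
```

Calling `missing_inputs()` with no argument checks the current working directory; an empty list means the layout matches what the scripts expect under the assumption above.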
