Discordance among nuclear, mitochondrial, plumage, and vocal differentiation in the Dysithamnus mentalis (Plain Antvireo) complex
Data files
Abstract
The study investigates genetic structure and phenotypic variation in Dysithamnus mentalis, a bird species complex with 18 plumage-based taxa distributed across Central and South America. The taxonomy of this complex has long been controversial due to the sparse sampling in prior studies, and recent genetic work added complexity to the problem by revealing discordance between genotypic and phenotypic variation. We integrate genomic and phenotypic data to investigate the population genetic structure and geographic variation of the D. mentalis complex, sampling extensively across its vast range. We find that the D. mentalis complex comprises 12–14 phenotypically distinct populations grouped into six distinct nuclear genomic clusters. Mitochondrial variation exhibits a different geographic structure that does not align fully with either nuclear genomic or phenotypic variation. While some genetic and phenotypic clusters align, our results revealed widespread discordance between patterns of variation inferred from different markers. Increased geographic sampling further revealed multiple previously unrecognized hybrid zones, indicating weak premating reproductive isolation between most parapatric populations. However, some lineages exhibit substantial genome-wide differentiation and limited gene flow in contact zones, suggesting some degree of reproductive isolation. Our results offer a redefined understanding of the genetic structure and geographic phenotypic variation of the D. mentalis complex, providing a significant step toward resolving its taxonomy while highlighting areas that require further research, particularly in newly identified contact zones.
This repository contains all data and code necessary to reproduce analyses in the article "Discordance among nuclear, mitochondrial, plumage, and vocal differentiation in the Dysithamnus mentalis (Plain Antvireo) complex," published in Ornithology by Lima et al. in 2025.
Description of the compressed files
The file data.zip contains all data files used by the R scripts in code.zip. Each data file is described in detail in the sections that follow.
The file code.zip contains R scripts necessary to reproduce all genetic and vocal analyses and most figures in the paper (except maps, which were made in QGIS).
Note: Plumage and playback experiment data were analyzed only qualitatively and are provided in the supplementary material of the article at https://doi.org/10.1093/ornithology/ukaf044
Description of the data files in data.zip
This comma-separated spreadsheet contains song trait measurement data. Each row is a unique sample (i.e., one measured vocalization). Empty cells indicate unavailable data; for instance, a vocalization with only ten notes will have empty cells for measurements pertaining to an eleventh note, which is absent in this sample.
Variables:
- recording: Unique accession number for the sound recording used for measurement.
- group: One of several groups made for comparisons.
- sex: Sex of the recorded bird. Either male (m) or unknown (u).
- country: Country of recording.
- locality: Locality of recording.
- latitude: Latitude in decimal degrees.
- longitude: Longitude in decimal degrees.
- has.intro.note: Indicates whether the measured song has an introductory note, as defined in the methods of the study.
- nxd: Duration of the xth note, in seconds, where x is the note number (e.g., n1d is the duration of the 1st note).
- nxi: Duration of the xth interval between notes, in seconds (e.g., n1i is the duration of the 1st interval).
- nxfa: Frequency amplitude of the xth note, in hertz (e.g., n1fa is the frequency amplitude of the 1st note).
- nxpf: Peak frequency of the xth note, in hertz (e.g., n1pf is the peak frequency of the 1st note).
- number.of.notes: Number of notes.
- total.dur: Total duration of the vocalization, in seconds.
- overall.pace: Pace of the vocalization, in notes per second.
- pace.1section: Pace of the 1st third of the vocalization, in notes per second.
- pace.2section: Pace of the 2nd third of the vocalization, in notes per second.
- pace.3section: Pace of the 3rd third of the vocalization, in notes per second.
- change.in.pace: Pace of the 1st third of the vocalization divided by the pace of the 3rd third.
- peak.freq.1section: Peak frequency of the note with the highest peak frequency in the 1st third of the vocalization. Measured in hertz.
- peak.freq.2section: Peak frequency of the note with the highest peak frequency in the 2nd third of the vocalization. Measured in hertz.
- peak.freq.3section: Peak frequency of the note with the highest peak frequency in the 3rd third of the vocalization. Measured in hertz.
- change.in.pf: Peak frequency pattern along the vocalization.
- peak.freq: Peak frequency of the note with the highest peak frequency in the entire vocalization. Measured in hertz.
- freq.ampl.1section: Mean frequency amplitude of notes in the 1st third of the vocalization. Measured in hertz.
- freq.ampl.2section: Mean frequency amplitude of notes in the 2nd third of the vocalization. Measured in hertz.
- freq.ampl.3section: Mean frequency amplitude of notes in the 3rd third of the vocalization. Measured in hertz.
- change.in.fa: Frequency amplitude pattern along the vocalization.
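As a rough illustration of how the derived columns relate to the raw ones, the sketch below parses a tiny inline table that uses the column names listed above and recomputes change.in.pace from pace.1section and pace.3section, treating empty cells as unavailable data. The two rows are made-up values, not data from this repository, and the helper names (to_float, change_in_pace) are hypothetical; the R scripts in code.zip may handle this differently.

```python
import csv
import io

# Made-up rows for illustration only; column names follow the variable list above.
SAMPLE = """recording,pace.1section,pace.3section,n1d,n11d
rec_A,4.0,8.0,0.10,
rec_B,5.0,,0.12,0.09
"""

def to_float(cell):
    """Return a float, or None for an empty cell (unavailable data)."""
    cell = cell.strip()
    return float(cell) if cell else None

def change_in_pace(row):
    """Pace of the 1st third divided by the pace of the 3rd third."""
    first = to_float(row["pace.1section"])
    third = to_float(row["pace.3section"])
    if first is None or third is None or third == 0:
        return None  # cannot compute the ratio from missing or zero values
    return first / third

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ratios = {row["recording"]: change_in_pace(row) for row in rows}
```

Under this definition, a value below 1 means the vocalization accelerates from its first to its last third, and a value above 1 means it slows down.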
Files used for the EEMS analysis. dysit.diffs is a matrix of average pairwise genetic dissimilarities, dysit.coord is a list of sample coordinates in the same order as the rows of the dissimilarity matrix, and dysit.outer is a list of coordinates delimiting a polygon where the analysis was performed.
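Because the three EEMS inputs must agree with one another, a quick consistency check can be useful before running the analysis. The sketch below is one possible check, assuming whitespace-delimited text, two columns per coordinate row, and an outer polygon whose first and last vertices coincide; these layout details are assumptions of the sketch and should be confirmed against the EEMS documentation. The function names and the three-sample example are hypothetical.

```python
def parse_table(text):
    """Parse whitespace-delimited numeric text into a list of rows."""
    return [[float(v) for v in line.split()]
            for line in text.strip().splitlines()]

def check_eems_inputs(diffs, coord, outer, tol=1e-9):
    """Return a list of problems found; an empty list means the inputs agree."""
    problems = []
    n = len(diffs)
    if any(len(row) != n for row in diffs):
        problems.append("diffs is not a square matrix")
    else:
        if any(abs(diffs[i][i]) > tol for i in range(n)):
            problems.append("diffs has a nonzero diagonal")
        if any(abs(diffs[i][j] - diffs[j][i]) > tol
               for i in range(n) for j in range(i)):
            problems.append("diffs is not symmetric")
    if len(coord) != n:
        problems.append("coord and diffs have different numbers of samples")
    if any(len(point) != 2 for point in coord + outer):
        problems.append("a coordinate row does not have two columns")
    if outer and outer[0] != outer[-1]:
        problems.append("outer polygon is not closed")
    return problems

# Made-up three-sample example, not data from this repository.
diffs = parse_table("0 0.2 0.5\n0.2 0 0.4\n0.5 0.4 0")
coord = parse_table("-45.0 -22.0\n-46.0 -23.0\n-60.0 -3.0")
outer = parse_table("-70 -30\n-40 -30\n-40 5\n-70 5\n-70 -30")
issues = check_eems_inputs(diffs, coord, outer)
```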
Mitochondrial DNA sequence data.
Unfiltered set of 222,171 SNPs in variant call format (for details, see Methods in the paper).
Set of 3,187 quality-filtered SNPs in variant call format (for details, see Methods in the paper).
Set of 1,231 SNPs with a single randomly selected SNP per locus, subset from the previous one (for details, see Methods in the paper).
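The idea of keeping a single randomly selected SNP per locus can be sketched as follows. This is a minimal illustration, not the filtering script from code.zip: it assumes that each locus appears as a distinct value in the first (CHROM) column of the VCF, and the function name, seed, and example records are hypothetical.

```python
import random

def thin_one_snp_per_locus(vcf_lines, seed=0):
    """Keep header lines and one randomly chosen record per locus.

    Assumption: the first tab-separated field (CHROM) identifies the locus.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    header, by_locus = [], {}
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            locus = line.split("\t", 1)[0]
            by_locus.setdefault(locus, []).append(line)
    # Dictionaries preserve insertion order, so loci stay in file order.
    kept = [rng.choice(records) for records in by_locus.values()]
    return header + kept

# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "locus1\t10\t.\tA\tG\t.\tPASS\t.",
    "locus1\t55\t.\tC\tT\t.\tPASS\t.",
    "locus2\t7\t.\tG\tA\t.\tPASS\t.",
]
thinned = thin_one_snp_per_locus(example)
```

Fixing the random seed makes the thinned subset repeatable across runs, which matters when downstream analyses are compared.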
Sample information used in various R scripts for genetic analyses.
Variables:
- collection: Acronym for collection name.
- skin.number: Accession number for the study skin (if any).
- tissue.number: Accession number for the tissue sample (if any).
- DNA.source: Either "tissue" or "toepad".
- sample.id: Unique custom identifier for the sample.
- sequence.id: Identifier used by the sequencing facility.
- probe: Either 2.5K or 5K, denoting the number of UCEs targeted during sequencing.
- included.in.nuclear.analyses: Either "yes" or "no".
- included.in.mitochondrial.analyses: Either "yes" or "no".
- pop: One of multiple possible populations for analysis.
- locality: Sampling location name.
- latitude: Latitude in decimal degrees.
- longitude: Longitude in decimal degrees.
- sampling.site: One of multiple aggregated localities (used in some analyses where samples within a certain radius of each other were grouped).
- sampling.site.lat: Latitude of the aggregated locality, in decimal degrees.
- sampling.site.long: Longitude of the aggregated locality, in decimal degrees.
- plot.order: Number indicating plot order for some graphs (e.g., sNMF bar plot).
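The inclusion flags and plot.order column suggest a common first step: keep only the samples used in a given analysis and arrange them for plotting. The sketch below shows one way to do that with the column names listed above. The rows are made up, the function name is hypothetical, and it assumes plot.order holds an integer for every retained sample.

```python
import csv
import io

# Made-up rows for illustration only; column names follow the variable list above.
SAMPLES = """sample.id,DNA.source,included.in.nuclear.analyses,included.in.mitochondrial.analyses,pop,plot.order
s1,tissue,yes,yes,popA,2
s2,toepad,no,yes,popA,3
s3,tissue,yes,no,popB,1
"""

def nuclear_samples(rows):
    """Keep samples flagged for nuclear analyses, sorted by plot.order."""
    kept = [r for r in rows if r["included.in.nuclear.analyses"] == "yes"]
    return sorted(kept, key=lambda r: int(r["plot.order"]))

rows = list(csv.DictReader(io.StringIO(SAMPLES)))
nuclear_ids = [r["sample.id"] for r in nuclear_samples(rows)]
```

The same pattern applies to the mitochondrial data set by filtering on included.in.mitochondrial.analyses instead.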
Description of the code files in code.zip
R script to perform all vocal analyses and most associated figures (except maps, which were made in QGIS).
Scripts to call SNPs from UCE sequence data.
R script to perform all SNP filtering described in the paper.
R script to perform PCA on filtered SNP data.
R script to perform DAPC on filtered SNP data.
R script to perform sNMF analysis on filtered SNP data.
R script to estimate pairwise Fst on filtered SNP data.
R script to perform EEMS analysis on filtered SNP data.
