Data and code from: Structural instability and concerted evolution in the mitochondrial control region of the grey-headed lapwing (Vanellus cinereus) during range expansion
Data files
Mar 18, 2026 version files (1.35 MB total):
- 16699.fas (768.67 KB)
- 2454VcVV3.fas (251.50 KB)
- 6.6kb.fas (297.08 KB)
- 6kb.frags (20.88 KB)
- GeneConv_Visualizer.py (5.36 KB)
- README.md (2.57 KB)
Abstract
Mitochondrial genome duplications, particularly within the control region, can influence evolutionary trajectories and population structure, yet their prevalence and dynamics in birds remain insufficiently understood. The grey-headed lapwing (Vanellus cinereus) has recently undergone a rapid range expansion in Japan, providing a unique opportunity to study genome evolution under demographic change. We combined Nanopore long-read sequencing and Sanger validation to characterize complete mitogenomes of 44 V. cinereus individuals from historical and newly established populations across Japan, along with partial sequences from a congener, the northern lapwing (V. vanellus). We analyzed structural variations, gene conversion events, and phylogenetic relationships to elucidate the evolutionary history of these populations. All V. cinereus individuals harbored a conserved ~2.5 kb tandem duplication spanning cytochrome b to the control region. The duplicated copies exhibited high similarity within individuals (mean 99.05%) with evidence of ongoing concerted evolution. We identified a "chimeric" individual displaying discordant phylogenetic positions between copies within single long-reads, capturing a snapshot of the incomplete homogenization process via gene conversion, rather than heteroplasmy. Phylogenomic analysis revealed a specific lineage ("Akita–Okayama" clade) that became predominant in the recently established Okayama population (62.5%), likely due to a founder effect. This study demonstrates the utility of long-read sequencing for resolving complex mitochondrial structures. The results reveal that the mitochondrial genome of V. cinereus is shaped by the interplay of structural instability (duplication), concerted evolution, and demographic history (founder effects) during range expansion.
Dataset DOI: 10.5061/dryad.jsxksn0qw
Description of the data and file structure
This dataset contains the genomic and analytical data supporting the study of mitochondrial duplicated region instability in Vanellus cinereus. The files include whole mitochondrial genome alignments, specific duplicated region alignments, and custom Python scripts for data visualization.
Files and variables
File: 16699.fas
Description: Full mitochondrial genome alignment of Vanellus cinereus (approx. 16,699 bp). Format: FASTA
File: 6.6kb.fas
Description: Sequence alignment of the duplicated 6.6 kb control region. Format: FASTA
File: 2454VcVV3.fas
Description: Multiple sequence alignment used for comparative analysis between V. cinereus and V. vanellus. Format: FASTA
File: 6kb.frags
Description: Fragment file output by the gene conversion analysis in GENECONV. Format: .frags (plain text)
File: GeneConv_Visualizer.py
Description: Custom Python script developed to visualize the gene conversion events detected in the mitochondrial control region. Dependencies: Python 3.9 or later, matplotlib, and pandas; Biopython is optional.
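As a quick sanity check on the .fas files described above, the following minimal sketch reads FASTA text with the Python standard library, reports the number of sequences and whether all sequences share one length (as expected for an alignment), and computes a simple pairwise percent identity. The function names (read_fasta, summarize, percent_identity), the toy sequences, and the choice to skip gapped columns are assumptions made for illustration. This is not necessarily how the similarity values in the abstract were calculated.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of {header: sequence}.

    Note: records with identical headers would overwrite one another.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def summarize(records):
    """Report the sequence count and whether all sequences have equal length."""
    lengths = sorted({len(seq) for seq in records.values()})
    return {
        "n_sequences": len(records),
        "lengths": lengths,
        "is_aligned": len(lengths) == 1,
    }


def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap ('-').

    Assumption: gapped columns are ignored; other conventions exist.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy alignment (hypothetical data, not taken from the dataset).
toy = """>sample_A
ACGT-ACGTA
>sample_B
ACGTTACGTC
""".splitlines()

records = read_fasta(toy)
summary = summarize(records)
identity = percent_identity(records["sample_A"], records["sample_B"])
print(summary, round(identity, 2))

# To run on a file from this dataset instead of the toy data:
# with open("16699.fas") as handle:
#     records = read_fasta(handle)
```

For large alignments or more elaborate analyses, a dedicated library such as Biopython may be more convenient than this hand-written parser.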
Code/software
The following free and open-source software is recommended to view and process the data:
1. Sequence Alignments (.fas):
- MEGA10 (Molecular Evolutionary Genetics Analysis) can be used to view and edit the FASTA alignment files.
2. Python Script (GeneConv_Visualizer.py):
- Python (v3.9 or later) is required to run the script.
- Required libraries/packages: matplotlib, pandas, and Biopython (optional, depending on the script's specific implementation).
- The script visualizes gene conversion events by processing output fragments from GENECONV software.
3. Fragment Files (.frags):
- These are text-based files that can be viewed with any standard text editor (e.g., Notepad++, TextEdit, or VS Code).
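Besides opening a .frags file in a text editor, a short script can give a first overview of its contents. The sketch below is a hedged example under stated assumptions: it assumes that lines starting with "#" are comments or headers, that the remaining lines are whitespace-delimited, and that the first token on each line is a record-type code. The toy lines and the codes "GI" and "PI" are invented for illustration and should be checked against the actual 6kb.frags file. The function tally_frags is hypothetical and does not reproduce the logic of GeneConv_Visualizer.py.

```python
from collections import Counter


def tally_frags(lines):
    """Count non-comment, non-blank lines by their first whitespace-delimited token.

    Assumption: '#' marks comment/header lines; this is not verified
    against the file shipped with the dataset.
    """
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        counts[stripped.split()[0]] += 1
    return counts


# Toy input (hypothetical lines, not copied from 6kb.frags).
toy = [
    "# hypothetical header line",
    "GI  seqA;seqB  0.0010  0.0200  100  250  151",
    "GI  seqA;seqC  0.0030  0.0450  300  420  121",
    "",
    "PI  seqB;seqC  0.0100  0.1000  500  560  61",
]

counts = tally_frags(toy)
print(dict(counts))

# To run on the file from this dataset instead of the toy data:
# with open("6kb.frags") as handle:
#     counts = tally_frags(handle)
```

Tallying by the first token makes few assumptions about the column layout, so it is a reasonable first step before writing a stricter parser.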
Access information
Other publicly accessible locations of the data:
- Mitochondrial genome sequences newly determined in this study were deposited in the DDBJ/EMBL/GenBank databases under accession numbers LC902723–LC902771. These will be released upon the formal publication of the article.
Data was derived from the following sources:
- Reference mitochondrial genome sequences were obtained from NCBI GenBank (accessions NC_025514, MW303998, NC_025637, and NC_056257).
