Data from: Limited population structure but signals of recent selection in introduced African Fig Fly (Zaprionus indianus) in North America
Data files
Aug 14, 2025 version files (5.55 GB):
- PO1791_Zaprionus_indianus.annotation.gff (14.03 MB)
- PO1791_Zaprionus_indianus.protein.fasta.gz (4.28 MB)
- PO1791_Zaprionus_indianus.RepeatMasked.gff (37.25 MB)
- PO1791_Zaprionus_indianus.transcript.fasta (23.44 MB)
- README.md (4.44 KB)
- zap_all_called_sv.smoove.square.vcf.gz (107.09 MB)
- zap_full_info_updated_v2.csv (29.56 KB)
- zaprionus.individual.2023.vcf.gz (5.37 GB)

Feb 13, 2026 version files (8.38 GB):
- PO1791_Zaprionus_indianus.annotation.gff (14.03 MB)
- PO1791_Zaprionus_indianus.protein.fasta.gz (4.28 MB)
- PO1791_Zaprionus_indianus.RepeatMasked.gff (37.25 MB)
- PO1791_Zaprionus_indianus.transcript.fasta (23.44 MB)
- README.md (4.65 KB)
- zap_all_called_sv.smoove.square.vcf.gz (107.09 MB)
- zap_full_info_updated_v2.csv (29.56 KB)
- zaprionus.individual.2023.vcf.gz (5.37 GB)
- zaprionus.individual.nosingleton.2023.annotated.vcf.gz (2.83 GB)
Abstract
Invasive species have devastating consequences for human health, food security, and the environment. Many invasive species adapt to new ecological niches following invasion, but little is known about the early steps of adaptation. Here, we examine the population genomics of a recently introduced drosophilid in North America, the African Fig Fly, Zaprionus indianus. This species is likely intolerant of subfreezing temperatures and recolonizes temperate environments yearly. We generated a new chromosome-level genome assembly for Z. indianus. Using resequencing of over 200 North American individuals collected over four years in temperate Virginia, plus a single collection from subtropical Florida, we tested for signatures of population structure and adaptation within invasive populations. We show that founding populations are sometimes small and contain close genetic relatives, yet temporal population structure and differentiation of populations are mostly absent across North America. However, we identify two haplotypes that are differentiated between African and invasive populations and show signatures of selective sweeps. Both haplotypes contain genes in the cytochrome P450 pathway, indicating these sweeps may be related to pesticide resistance. X chromosome evolution in invasive populations is strikingly different from the autosomes, and a haplotype on the X chromosome that is differentiated between Virginia and Florida populations is a candidate for temperate adaptation. These results show that despite limited population structure, populations may rapidly evolve genetic differences early in an invasion. Further uncovering how these genomic regions influence invasive potential and success in new environments will advance our understanding of how organisms evolve in changing environments.
https://doi.org/10.5061/dryad.q2bvq83v3
Description of the data and file structure
This dataset contains data related to Zaprionus indianus (African fig fly) individuals that were collected from the wild and sequenced with Illumina short read sequencing.
Files and variables
File: zap_full_info_updated_v2.csv
Description: This file contains metadata for the sequencing samples generated in this study and incorporated from other studies. Missing or non-applicable data are indicated with NA.
Variables
- sample.id: A unique name for each sample. IDs beginning with "SRR" were sequenced previously by Comeault et al. 2020 or 2021; IDs beginning with "ZAP" were sequenced in this study.
- library: The source of the sample. Comeault2020 and Comeault2021 refer to previously published studies. ZAP-1, ZAP-2, and ZAP-3 are three 96-well plates of DNA samples prepared for this study.
- well: The location of the sample in its respective plate.
- Location: The location where the fly sample was collected:
  - NC, HI, TN, NJ, FL, NY, Colombia: invasive-range locations sampled previously; see Comeault et al. 2020, 2021 for more information.
  - Kenya, Zambia, SenegalForest, SenegalDesert, SaoTome: native African locations sampled previously; see Comeault et al. 2020, 2021 for more information.
  - VA-CM = Carter's Mountain Orchard, Charlottesville, VA
  - VA-HPO = Hanover Peach Orchard, Mechanicsville, VA
  - MIA = Miami, Florida
- Year: The year the sample was collected
- Season: The season the sample was collected:
  - Early = August or earlier in the year
  - mid = September
  - late = October-November
- Sex: The sex of the fly assigned visually at the time of DNA extraction
- long_name: a unique ID including the sample location and number
- date: The date the sample was collected as MM/DD/YY
- continent: The continent where the sample was collected
- group: a unique identifier that groups samples by location and collection year and season
- loc.spec: Groups samples by broader collection locations such as "Africa" and "Northeast"
- sex.to.auto: Ratio of sequencing coverage on the X chromosome to coverage on the autosomes
- assigned_sex: Sex assigned from the X-to-autosome coverage ratio (sex.to.auto); this may differ from the sex recorded at the time of DNA extraction.
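As a minimal sketch using only the Python standard library, the metadata can be loaded so that the literal string NA becomes a missing value, and a season label can be re-derived from the MM/DD/YY `date` column using the month bins listed under Season. The function names are hypothetical, the lowercase labels returned here may not match the exact spelling used in the `Season` column, and December is not covered by the bins, so it is returned as missing.

```python
import csv
import io
from datetime import datetime
from typing import Optional, TextIO


def load_metadata(handle: TextIO) -> list[dict]:
    """Read the metadata CSV, turning the literal string 'NA' into None."""
    return [
        {key: (None if value == "NA" else value) for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


def season_from_date(date: Optional[str]) -> Optional[str]:
    """Bin an MM/DD/YY date into the README's Season categories.

    Months 1-8 -> 'early', 9 -> 'mid', 10-11 -> 'late'.
    December is not covered by the README, so it returns None.
    """
    if date is None:
        return None
    month = datetime.strptime(date, "%m/%d/%y").month
    if month <= 8:
        return "early"
    if month == 9:
        return "mid"
    if month in (10, 11):
        return "late"
    return None


# Hypothetical two-row excerpt; the real file has more columns.
demo = io.StringIO(
    "sample.id,Location,date\n"
    "ZAP-example-1,VA-CM,09/15/20\n"
    "SRR-example-2,Kenya,NA\n"
)
for row in load_metadata(demo):
    print(row["sample.id"], season_from_date(row["date"]))
```

With the real file, `open("zap_full_info_updated_v2.csv", newline="")` can stand in for the in-memory excerpt.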
File: zap_all_called_sv.smoove.square.vcf.gz
Description: A standard VCF file of all structural variants identified by smoove using paired-end sequencing data. All relevant metadata can be found within the header lines. Sample names correspond to those in zap_full_info_updated_v2.csv above.
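Because the sample names in the VCF header are meant to match `sample.id` in the metadata, one quick consistency check is to read the `#CHROM` header line and compare the two. The sketch below assumes a standard tab-delimited VCF (nine fixed columns, then one column per sample); the helper names are hypothetical, and reading stops at the header line, so the whole file is not decompressed.

```python
import gzip
import io
from typing import Iterable, TextIO


def vcf_samples(handle: TextIO) -> list[str]:
    """Return sample names from the #CHROM header line of a VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then samples.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data lines without finding a header
    raise ValueError("no #CHROM header line found")


def vcf_samples_from_path(path: str) -> list[str]:
    """Open a gzip-compressed VCF and read only its header."""
    with gzip.open(path, "rt") as handle:
        return vcf_samples(handle)


def missing_from_metadata(samples: Iterable[str], metadata_ids: set[str]) -> list[str]:
    """VCF sample names that have no matching sample.id in the metadata."""
    return [s for s in samples if s not in metadata_ids]


# Tiny in-memory example with hypothetical sample names.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
)
print(missing_from_metadata(vcf_samples(demo), {"S1"}))
# For the real file:
# vcf_samples_from_path("zap_all_called_sv.smoove.square.vcf.gz")
```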
File: zaprionus.individual.2023.vcf.gz
Description: A standard VCF file of all SNPs identified with the GATK pipeline and the called genotypes for all samples. All relevant metadata can be found within the header lines. Sample names correspond to those in zap_full_info_updated_v2.csv above.
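Per-sample counts of missing genotype calls are a common first check on a multi-sample SNP VCF. The sketch below streams a VCF with the standard library and counts, for each sample, the sites where the GT field is entirely missing (for example `./.`). It is illustrative only: the function names are hypothetical, and a pure-Python pass over a file of this size (5.37 GB compressed) will be slow, so dedicated tools such as bcftools or a library such as cyvcf2 are the usual choice in practice.

```python
import gzip
import io
from collections import Counter
from typing import TextIO


def missing_calls_per_sample(handle: TextIO) -> dict[str, int]:
    """Count sites where a sample's GT is fully missing ('.', './.', '.|.')."""
    samples: list[str] = []
    missing: Counter = Counter()
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        for name, sample_field in zip(samples, fields[9:]):
            parts = sample_field.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if all(a == "." for a in alleles):
                missing[name] += 1
    return {name: missing[name] for name in samples}


def missing_calls_from_path(path: str) -> dict[str, int]:
    """Run the count directly on a gzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        return missing_calls_per_sample(handle)


# Hypothetical two-site, two-sample example.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "c1\t10\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:8\t./.:0\n"
    "c1\t20\t.\tC\tT\t50\tPASS\t.\tGT:DP\t.|.:0\t1/1:9\n"
)
print(missing_calls_per_sample(demo))
```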
File: zaprionus.individual.nosingleton.2023.annotated.vcf.gz
Description: A standard VCF file that has been filtered to remove singleton SNPs and annotated with functional annotations for the SNPs.
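A singleton is usually a variant whose minor allele is observed only once across all called genotypes. The README does not state the exact rule used to build this file, so the sketch below is an assumption: it treats a site as a singleton when at least two alleles are observed among called genotypes and the rarest observed allele appears exactly once. The function names are hypothetical.

```python
from collections import Counter


def allele_counts(sample_fields: list[str], gt_index: int) -> Counter:
    """Count observed allele indices ('0', '1', ...) across called genotypes."""
    counts: Counter = Counter()
    for field in sample_fields:
        parts = field.split(":")
        if gt_index >= len(parts):
            continue  # GT dropped for this sample; treat as missing
        for allele in parts[gt_index].replace("|", "/").split("/"):
            if allele != ".":
                counts[allele] += 1
    return counts


def is_singleton(vcf_line: str) -> bool:
    """True if >= 2 alleles are observed and the rarest appears exactly once."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    counts = allele_counts(fields[9:], gt_index)
    if len(counts) < 2:
        return False  # monomorphic among called genotypes
    return min(counts.values()) == 1


# Hypothetical site: one heterozygote among three samples.
print(is_singleton("c1\t5\t.\tA\tT\t.\tPASS\t.\tGT\t0/0\t0/1\t0/0"))
```

Counting the rarest allele, rather than only the ALT allele, also flags sites where the reference allele is the one seen once.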
File: PO1791_Zaprionus_indianus.RepeatMasked.gff
Description: A standard GFF file containing the genome locations of all repeats.
File: PO1791_Zaprionus_indianus.annotation.gff
Description: A GFF file containing the genome locations of all annotated exons and mRNAs.
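Both annotation files are GFF, a nine-column tab-delimited format (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based, inclusive coordinates. The sketch below additionally assumes GFF3-style `key=value` attributes and exons linked to transcripts through a `Parent` attribute; neither is confirmed by this README, and the function names are hypothetical. It counts features by type and sums exon lengths per parent transcript.

```python
import io
from collections import Counter, defaultdict
from typing import Iterator, TextIO


def parse_attributes(column9: str) -> dict[str, str]:
    """Parse GFF3-style 'key=value;key=value' attributes."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff_records(handle: TextIO) -> Iterator[list[str]]:
    """Yield the nine columns of each feature line, skipping comments."""
    for line in handle:
        if line.startswith("##FASTA"):
            break  # optional embedded sequence section at the end of GFF3
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            yield fields


def summarize(handle: TextIO) -> tuple[Counter, dict[str, int]]:
    """Return (feature counts by type, total exon bp per Parent ID)."""
    type_counts: Counter = Counter()
    exon_bp: dict[str, int] = defaultdict(int)
    for fields in gff_records(handle):
        ftype, start, end, attrs = fields[2], fields[3], fields[4], fields[8]
        type_counts[ftype] += 1
        if ftype == "exon":
            parents = parse_attributes(attrs).get("Parent")
            if parents is None:
                continue
            for parent in parents.split(","):  # GFF3 allows several parents
                # 1-based, inclusive coordinates: length is end - start + 1.
                exon_bp[parent] += int(end) - int(start) + 1
    return type_counts, dict(exon_bp)


# Hypothetical one-transcript, two-exon example.
demo = io.StringIO(
    "##gff-version 3\n"
    "ctg1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=m1;Parent=g1\n"
    "ctg1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=m1\n"
    "ctg1\tsrc\texon\t300\t500\t.\t+\t.\tID=e2;Parent=m1\n"
)
print(summarize(demo))
```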
File: PO1791_Zaprionus_indianus.transcript.fasta
Description: A standard FASTA file containing the nucleotide sequences of all annotated transcripts.
File: PO1791_Zaprionus_indianus.protein.fasta.gz
Description: A standard FASTA file containing the amino acid sequences of all annotated proteins.
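The transcript FASTA is uncompressed while the protein FASTA is gzipped, so a reader that opens either form is convenient. A minimal standard-library sketch follows; the function names are hypothetical, and records are keyed by the full header line (without the leading `>`) because the header format is not described here.

```python
import gzip
import io
from typing import Iterator, TextIO


def open_maybe_gzip(path: str) -> TextIO:
    """Open a FASTA file as text, decompressing if the name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record example: print each record's length.
demo = io.StringIO(">p1 example\nMKV\nLL\n>p2\nM\n")
for name, seq in read_fasta(demo):
    print(name, len(seq))
# For the real file:
# with open_maybe_gzip("PO1791_Zaprionus_indianus.protein.fasta.gz") as fh:
#     lengths = {name: len(seq) for name, seq in read_fasta(fh)}
```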
Code/software
The data analysis workflow is described in the manuscript. The code is provided on Zenodo (individual_sequencing_revision2_for_zenodo.zip) and at https://github.com/ericksonp/Zindianus_individual_sequencing
Access information
Other publicly accessible locations of the data:
- NA
Data was derived from the following sources:
- New individual sequencing data have been deposited in the SRA under project number PRJNA991922. RNA sequencing data from larval and pupal samples, and the larval Hi-C data used for scaffolding, are deposited under the same project number. The genome sequence has been deposited at DDBJ/ENA/GenBank under accession JAUIZU000000000.
This dataset contains processed data from whole-genome sequencing of over 200 Zaprionus indianus fruit flies from North America. Reads were mapped to a new genome assembly, and SNPs were called with GATK.
Changes after Aug 14, 2025:
The file zaprionus.individual.nosingleton.2023.annotated.vcf.gz was added to include SNP annotations.
