Testing the potential contribution of Wolbachia to speciation when cytoplasmic incompatibility becomes associated with host‐related reproductive isolation

Cite this dataset

Bruzzese, Daniel et al. (2021). Testing the potential contribution of Wolbachia to speciation when cytoplasmic incompatibility becomes associated with host‐related reproductive isolation [Dataset]. Dryad.


Endosymbiont induced cytoplasmic incompatibility (CI) may play an important role in arthropod speciation. However, whether CI consistently becomes associated or coupled with other host-related forms of reproductive isolation (RI) to impede the transfer of endosymbionts between hybridizing populations and further the divergence process remains an open question. Here, we show varying degrees of pre- and post-mating RI exist among allopatric populations of two interbreeding cherry-infesting tephritid fruit flies (Rhagoletis cingulata and R. indifferens) across North America. These flies display allochronic and sexual isolation among populations, as well as unidirectional reductions in egg hatch in hybrid crosses involving southwestern USA males. All populations are infected by a Wolbachia strain, wCin2, whereas a second strain, wCin3, only coinfects flies from the Southwest USA and Mexico. Strain wCin3 is associated with a unique mtDNA haplotype and unidirectional postmating RI, implicating the strain as the cause of CI. When coupled with non-endosymbiont RI barriers, we estimate the strength of CI associated with wCin3 would not prevent the strain from introgressing from infected Southwestern to uninfected populations elsewhere in the USA if populations were to come into secondary contact and hybridize. In contrast, cytoplasmic-nuclear coupling may be sufficient to impede the transfer of wCin3 if Mexican and USA populations were to come into contact. We discuss our results in the context of the general paucity of examples demonstrating stable Wolbachia hybrid zones and whether the spread of Wolbachia among taxa can be constrained in natural hybrid zones long enough for the endosymbiont to participate in speciation.


These data were collected for the published manuscript "Testing the potential contribution of Wolbachia to speciation when cytoplasmic incompatibility becomes associated with host-related reproductive isolation".

Usage notes

README file for:
Testing the potential contribution of Wolbachia to speciation when cytoplasmic incompatibility becomes associated with host-related reproductive isolation

You can contact Daniel Bruzzese or Hannes Schuler for additional help and information.

Authors and acknowledgment:
We would like to thank Thomas Wolfe, Mary Glover, Joseph Mastroni, Meredith Doellman, Cheyenne Tait, Wee Yee, Juan Rull, Martin Aluja, Glen Hood, Robert Goughnour, Christian Stauffer, Patrik Nosil, and Jeffery L. Feder for their contributions to the project.

These files and data are part of an open source project.

TEEseq raw paired-end Miseq data and associated barcodes:
The fastq.gz files are gzip-compressed and are otherwise in standard fastq format.
The barcode file is a .txt file where the first column contains the barcode sequence and the second column contains the associated individual ID number.
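
As a minimal sketch of reading the barcode file described above, the following assumes the two columns are separated by whitespace (tab or spaces) and that there is no header row; neither detail is stated in this README. The function name and the example rows are illustrative only.

```python
def read_barcodes(lines):
    """Map barcode sequence -> individual ID from a two-column text file.

    Assumes whitespace-separated columns and no header row (not confirmed
    by the README).
    """
    barcodes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        barcodes[fields[0]] = fields[1]
    return barcodes


# Illustrative rows only; real barcodes and IDs come from the deposited file.
example_lines = ["ACGTAC\tcinAZ5a\n", "TGCATG\tcinAZ5b\n", "\n"]
barcode_to_id = read_barcodes(example_lines)
print(barcode_to_id["ACGTAC"])
```

With the real file, the same function can be called on an open file handle, since iterating over a text file yields its lines.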


The ASV fasta files are in standard fasta format, with the ASV number as the fasta header and the unique ASV sequence as the sequence. For each gene used for MLST barcoding there are two ASV files, one for paired-end data and the other for single-end data.



ASV counts:
These files count the total number of ASV reads found for each gene for each individual. There are two summary files for each MLST gene, one for paired-end data and the other for single-end data.
The first column is labeled OTU and identifies the ASV being counted. The ASVs listed in the OTU column are the same as those found in the ASV fasta files above. The remaining columns are the individual samples that were sequenced; these are the same individuals found in the barcode file above.

Example use: For the paired-end_db_coxA.csv file we see that for individual cinAZ5a, OTU (ASV) 1 has 145 reads associated with it. We can then identify the ASV sequence by looking in the ASV_paired_coxA.fasta file above. 
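
The lookup in the example use above could be sketched as follows. This assumes that the values in the OTU column match the fasta headers exactly and that read counts are stored as integers; both are assumptions, not documented facts. The in-memory tables stand in for the real files, and apart from the 145 reads for cinAZ5a quoted above, every value (including the second individual and both sequences) is made up for illustration.

```python
import csv
import io


def read_fasta(handle):
    """Return {header: sequence} from a fasta file, joining wrapped lines."""
    sequences = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def asv_reads(handle, individual):
    """Return {ASV label: read count} for one individual's column."""
    return {row["OTU"]: int(row[individual]) for row in csv.DictReader(handle)}


# Stand-ins for the ASV count csv and the matching ASV fasta file.
counts_csv = "OTU,cinAZ5a,cinAZ5b\n1,145,0\n2,3,98\n"
asv_fasta = ">1\nACGTACGT\nACGT\n>2\nTTGACCAA\n"

reads = asv_reads(io.StringIO(counts_csv), "cinAZ5a")
sequences = read_fasta(io.StringIO(asv_fasta))
top_asv = max(reads, key=reads.get)  # ASV with the most reads
print(top_asv, reads[top_asv], sequences[top_asv])
```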



TEEseq ASV consensus:
This csv file contains the strain consensus for each individual and the strain consensus for each gene for both paired-end and single-end data. Consensus strain typing for each gene was calculated by comparing the number of reads for each ASV. Overall strain consensus was calculated by counting the consensus strains from each MLST gene. There are 16 columns in this file, described below.

pop = where the samples were collected from
ID  = individual ID for each sample
12 columns of strain consensus for each MLST gene (single and paired)
consensus = lists the consensus strain
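
One way to read "counting the consensus strains from each MLST gene" is a simple majority vote over the per-gene calls. The sketch below illustrates that reading only; the authors' exact rule, their handling of ties or missing genes, and how coinfected individuals are labeled are not specified here, so the function name, the tie behavior, and the example calls are all assumptions.

```python
from collections import Counter


def overall_consensus(gene_calls):
    """Majority vote over per-gene strain calls.

    Empty or missing calls are ignored. Returns None when there are no
    calls or when the top two strains are tied (an assumed convention).
    """
    calls = [call for call in gene_calls if call]
    if not calls:
        return None
    ranked = Counter(calls).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical per-gene calls for one individual.
print(overall_consensus(["wCin2", "wCin2", "wCin2", "wCin3", ""]))
```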


2018 R. cingulata and R. indifferens eclosion data:
This csv file contains 2018 eclosion data from Rhagoletis indifferens and Rhagoletis cingulata used to calculate allochronic isolation. This file contains 7 columns, described below.

pop = population where flies were collected
pull = date flies were removed from overwintering treatment
region = region the fly populations were from (PNW, SW, ENA)
dateEclosed = date adult fly eclosed
numberEclosed = number of adult flies that eclosed on that date
daysToEclosion = how many days the flies took to eclose
yearEclosed = the year these flies eclosed.
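
Because each row records how many flies eclosed on a given date, a per-population summary needs to weight daysToEclosion by numberEclosed. The sketch below computes that weighted mean using the column names listed above. It is a descriptive summary only, not the allochronic isolation calculation from the manuscript, and the population names and rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_days_to_eclosion(handle):
    """Mean daysToEclosion per pop, weighted by numberEclosed."""
    total_days = defaultdict(float)
    total_flies = defaultdict(int)
    for row in csv.DictReader(handle):
        n = int(row["numberEclosed"])
        total_days[row["pop"]] += n * float(row["daysToEclosion"])
        total_flies[row["pop"]] += n
    return {
        pop: total_days[pop] / total_flies[pop]
        for pop in total_flies
        if total_flies[pop] > 0
    }


# Invented rows; dates are not parsed, so their format does not matter here.
eclosion_csv = (
    "pop,pull,region,dateEclosed,numberEclosed,daysToEclosion,yearEclosed\n"
    "popA,2018-03-01,PNW,2018-03-31,2,30,2018\n"
    "popA,2018-03-01,PNW,2018-04-10,2,40,2018\n"
    "popB,2018-03-01,SW,2018-04-20,1,50,2018\n"
)
mean_days = mean_days_to_eclosion(io.StringIO(eclosion_csv))
print(mean_days)
```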


Premating trial data:
This csv file contains premating trial data used to calculate premating isolation. This file contains 7 columns described below.

cross = name of cross performed
female.pop = female pop used in cross assay
male.pop = male pop used in cross assay
cross.type = (parental / hybrid)
time = hours each trial was observed for
success = number of matings (lasting > 5 min)
sucess_rate = rate of successful matings


Postmating isolation crossing data:
This csv file contains crossing assay data used to determine postmating isolation. This file contains 12 columns described below.

CrossID = unique identifier for the cross
name = describes what pops were crossed
pop.Female = female population 
pop.Male = male population
Type = how many females were crossed with males in cage 
n_eggs = number of eggs laid during assay
n_hatch = number of eggs hatched during assay
hours_trial = number of hours the assay was run for 
n_mating = total number of matings for that cross name
hatch_rate = rate of egg hatch for the assay
eggsPerDay = number of eggs per day for the assay
cross.type = (parental / hybrid)


COI alignment fasta:

This MUSCLE-aligned fasta file contains COI sequences from 10 individuals from each population crossed, as well as the GenBank sequences used to root the COI tree.


ab1 strain verification files:
These individual AB1 files for the wsp and hcpA Wolbachia barcoding genes were used to verify that the Wolbachia strains in the crossing assays were the same as those identified with TEEseq. These files are zipped to reduce file size.

Blast results:

These csv files contain saved BLAST results from 7/15/20 for the wsp and hcpA genes of the Wolbachia strains found in this paper. The wCin2 strain, the wCin2 SW strain (hcpA only), and the wCin3 strain were BLASTed for the wsp and hcpA barcoding genes, resulting in the 5 BLAST tables listed below. Each BLAST table contains 7 columns that represent standard BLAST output (described below).

Description = description of BLAST hit 
Max Score = highest alignment score
Total Score = sum of alignment scores
Query Cover = % of query sequence that is aligned
E value = significance of alignment
per.  ident = percent identity of aligned sequence
Accession = GenBank accession number for the BLAST hit

hcpa wCin2 SW BLAST results.csv
hcpa wCin3 BLAST results.csv
hcpa wCin2 BLAST results.csv
wsp wcin3 BLAST results.csv
wsp wcin2 BLAST results.csv


National Institute of Food and Agriculture, Award: 2015‐67013‐23289

FWF Austrian Science Fund, Award: J3527,P31441