Background: Sex determination mechanisms are known to be evolutionarily labile but the factors driving transitions in sex determination mechanisms are poorly understood. All insects of the Hymenoptera are haplodiploid, with males normally developing from unfertilized haploid eggs. Under complementary sex determination (CSD), diploid males can be produced from fertilized eggs that are homozygous at the sex locus. Diploid males have near-zero fitness and thus represent a genetic load, which is especially severe under inbreeding. Here, we study mating structure and sex determination in the parasitoid Cotesia vestalis to investigate what may have driven the evolution of two complementary sex determination loci in this species. Results: We genotyped Cotesia vestalis females collected from eight fields in four townships in Western Taiwan. 98 SNP markers were developed by aligning Illumina sequence reads of pooled DNA of eight different females against a de novo assembled genome of C. vestalis. This proved to be an efficient method for this non-model species and provides a resource for future use in related species. We found significant genetic differentiation within the sampled population but variation could not be attributed to sampling locations by AMOVA. Non-random mating was detected, with 8.1% of matings between siblings. Diploid males, detected by flow cytometry, were produced at a rate of 1.4% among diploids. Conclusions: We think that the low rate of diploid male production is best explained by a CSD system with two independent sex loci, supporting laboratory findings on the same species. Fitness costs of diploid males in C. vestalis are high because diploid males can mate with females and produce infertile triploid offspring. This severe fitness cost of diploid males combined with non-random mating may have resulted in evolution from single locus CSD to CSD with two independent loci.
Reference genome sequence of Cotesia vestalis: scaffolds and contigs
No genome information was available for C. vestalis. To build a draft reference genome and to develop SNP assays, we sequenced the C. vestalis genome on a single lane of an Illumina HiSeq 2000 (Illumina Inc., U.S.A.) instrument, generating paired-end reads (2x100 bp). The SNP discovery panel consisted of eight C. vestalis females, one from each of eight fields. Before assembly, Illumina reads were trimmed using an in-house Perl script that trims the sequence as soon as two consecutive bases have a quality score lower than 20. Reads shorter than 50 bp after trimming were removed from the analysis. To obtain C. vestalis sequence contigs to be used as a pseudo-reference genome, we performed a de novo assembly on the 133 million 100 bp reads using SOAPDENOVO version 1.05 [37]. The assembly was done using a k-mer size of 45, and k-mers that were seen only once were removed (option -d). After contig construction, scaffolding was performed using intra-scaffold closure (option -F) and a minimum length for scaffolding of 50 bp. The total size of the assembly was 152 Mb, with a contig N50 size of 761 bp and a scaffold N50 size of 2400 bp.
de Boer_Cotesia vestalis scaffold sequences.scafSeq
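The read-trimming rule described for the reference genome can be sketched as follows. This is a minimal Python illustration, not the in-house Perl script itself: the function name `trim_read` is hypothetical, qualities are assumed to be already decoded to integer Phred scores, and cutting at the first base of the low-quality pair is an assumption about where the original script trims.

```python
def trim_read(seq, quals, min_q=20, min_len=50):
    """Trim a read at the first pair of consecutive low-quality bases.

    seq   : read sequence as a string
    quals : list of integer Phred scores, one per base (assumed input format)
    Returns (trimmed_seq, trimmed_quals), or None if the trimmed read is
    shorter than min_len and should be discarded.
    """
    cut = len(seq)  # default: keep the whole read
    for i in range(len(quals) - 1):
        # Two consecutive bases below the threshold trigger the cut.
        if quals[i] < min_q and quals[i + 1] < min_q:
            cut = i  # assumption: both low-quality bases are removed
            break
    if cut < min_len:
        return None  # read too short after trimming
    return seq[:cut], quals[:cut]
```

A single isolated low-quality base does not trigger trimming under this rule; only two in a row do.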
Cotesia vestalis SNP information and sequences
From our list of putative SNPs across the C. vestalis genome, we selected 100 SNPs for genotyping assay development. We first selected the 200 largest scaffolds; they varied in length from 17 to 58 kb and contained a total of 7,878 SNPs. We then removed SNPs with a minor allele frequency (MAF) <0.2, SNPs that had another SNP within 50 bp up- or downstream, and SNPs with more than 2 alleles. The remaining SNPs were grouped into MAF bins of 0.2-0.3 (1,908 SNPs), 0.3-0.4 (1,605 SNPs) and 0.4-0.5 (1,156 SNPs). Within each MAF bin, SNPs were ranked by SNP quality score. We then selected the SNPs with the highest quality scores, picking 20 SNPs with a MAF between 0.2-0.3 and 40 each with a MAF between 0.3-0.4 and 0.4-0.5, all on different scaffolds. All selected SNPs had a quality score of more than 200 (based on SAMTOOLS), and their average read depth was 61. High-throughput genotyping assays based on allele-specific forward primers were developed for these 100 SNP sequences at KBioscience (now LGC Genomics, Hoddesdon, U.K.).
de Boer_SNP info and sequences.xlsx
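The SNP selection steps described for the assay panel can be sketched in Python as below. This is an illustrative reconstruction under stated assumptions, not the procedure actually run: the `Snp` record and `select_snps` function are hypothetical names, the 50 bp neighbour check is assumed to be made against all putative SNPs on the scaffold, and the MAF bins are assumed to be half-open with 0.5 included in the last bin.

```python
from collections import namedtuple

# Hypothetical record for one putative SNP.
Snp = namedtuple("Snp", "scaffold pos maf n_alleles quality")

# (MAF range, number of SNPs to pick), as given in the description.
BINS = [((0.2, 0.3), 20), ((0.3, 0.4), 40), ((0.4, 0.5), 40)]


def select_snps(snps, bins=BINS, min_gap=50):
    """Filter SNPs, rank them by quality within MAF bins, pick one per scaffold."""
    positions = {}
    for s in snps:
        positions.setdefault(s.scaffold, []).append(s.pos)

    def isolated(s):
        # No other SNP within min_gap bp up- or downstream.
        return all(p == s.pos or abs(p - s.pos) > min_gap
                   for p in positions[s.scaffold])

    candidates = [s for s in snps
                  if s.maf >= 0.2 and s.n_alleles == 2 and isolated(s)]

    chosen, used_scaffolds = [], set()
    for (lo, hi), n_wanted in bins:
        # Assumption: bins are [lo, hi), except that MAF 0.5 joins the last bin.
        in_bin = [s for s in candidates
                  if lo <= s.maf < hi or (hi == 0.5 and s.maf == 0.5)]
        in_bin.sort(key=lambda s: s.quality, reverse=True)
        picked = 0
        for s in in_bin:
            if picked == n_wanted:
                break
            if s.scaffold in used_scaffolds:
                continue  # keep all selected SNPs on different scaffolds
            chosen.append(s)
            used_scaffolds.add(s.scaffold)
            picked += 1
    return chosen
```

Processing the bins in order means a scaffold claimed by a lower MAF bin is unavailable to later bins; the description does not say how such ties were resolved, so this ordering is a design choice of the sketch.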
Genotypes of C. vestalis females at 98 SNPs
SNP genotypes of 139 Cotesia vestalis females collected in Western Taiwan at 98 polymorphic SNPs
de Boer_genotype matrix C vestalis.xlsx
Putative SNPs discovered in genome of Cotesia vestalis
Individual paired-end reads were aligned against the artificial Cotesia vestalis reference genome obtained from the de novo genome assembly using BWA. The resulting BAM file was then used to identify putative SNPs using SAMTOOLS and varFilter from the samtools.pl utility. We only considered nucleotide substitutions and ignored small indels. We retained SNPs that had a mapping quality higher than 20, a minimum read depth of 3 and a maximum read depth of 90 (3x the average read depth, a strategy to avoid orthologous SNPs, e.g. in multi-copy genes).
de Boer_putative SNPs.xlsx
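The filtering thresholds described for putative SNP discovery can be expressed as a small predicate. This Python sketch only restates the stated cut-offs; it does not reproduce the SAMTOOLS varFilter implementation, and the function name `passes_filters` and its arguments are hypothetical.

```python
def passes_filters(mapq, depth, is_indel,
                   min_mapq=20, min_depth=3, max_depth=90):
    """Return True if a variant site meets the thresholds in the description.

    mapq     : mapping quality at the site
    depth    : read depth at the site
    is_indel : True for small indels, which are ignored
    """
    if is_indel:
        return False  # only nucleotide substitutions are considered
    # Mapping quality must be strictly higher than 20; depth must lie in
    # [3, 90], where 90 is 3x the average read depth reported above.
    return mapq > min_mapq and min_depth <= depth <= max_depth
```

The upper depth bound is the part that guards against collapsed multi-copy regions, where reads from several copies pile up on one assembled locus.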