Data from: Population structure and gene flow in the global pest, Helicoverpa armigera

Anderson, Craig J.; Tay, Wee T.; McGaughran, Angela; Gordon, Karl; Walsh, Tom K.

Published Sep 28, 2016 on Dryad. https://doi.org/10.5061/dryad.875n5

Data files

Sep 28, 2016 version files (225.09 MB total)

all_bacs.vcf.bz2 (193.38 MB)

b3_b1b2_snps.vcf.bz2 (28.12 MB)

denovogbs.tar (2.04 MB)

mtdna.fasta (833.70 KB)

Supplementary document FINAL.docx (712.96 KB)

Abstract

Helicoverpa armigera is a major agricultural pest that is distributed across Europe, Asia, Africa and Australasia. This species is hypothesized to have spread to the Americas 1.5 million years ago, founding a population that is, at present, a distinct species, Helicoverpa zea. In 2013, H. armigera was confirmed to have re-entered South America via Brazil and subsequently spread. The source of the recent incursion is unknown and population structure in H. armigera is poorly resolved, but a basic understanding would highlight potential biosecurity failures and determine the recent evolutionary history of region-specific lineages. Here, we integrate several end points derived from high-throughput sequencing to assess gene flow in H. armigera and H. zea from populations across six continents. We first assemble mitochondrial genomes to demonstrate the phylogenetic relationship of H. armigera with other Heliothine species and the lack of distinction between populations. We subsequently use de novo genotyping-by-sequencing and whole-genome sequences aligned to bacterial artificial chromosomes to assess levels of admixture. Primarily, we find that Brazilian H. armigera are derived from diverse source populations, with strong signals of gene flow from European populations, as well as prevalent signals of Asian and African ancestry. We also demonstrate a potential field-caught hybrid between H. armigera and H. zea, and are able to provide genomic support for the presence of the H. armigera conferta subspecies in Australasia. While structure among the bulk of populations remains unresolved, we present distinctions that are pertinent to future investigations as well as to the biosecurity threat posed by H. armigera.

FASTA mtDNA alignment, 12,248 bp assembly of resequencing data from heliothine species

Heliothine moths were collected between 2004 and 2014 from 16 different countries around the world across various climatic zones and altitudes (Tables S1 and S2), many of which are described in Behere et al. (2007) and Tay et al. (2013). Samples were collected as larvae from wild and crop host plants, as adult moths via light/pheromone traps, or as larvae after bioassay, and were preserved in ethanol (>95%) or RNAlater, or stored at -20°C, prior to DNA extraction. DNA was extracted from samples using DNeasy Blood & Tissue kits (Qiagen). Nextera libraries were produced following the manufacturer’s instructions and sequenced as 100 bp paired-end (PE) reads (Illumina HiSeq 2000; Biological Resources Facility, Australian National University, Canberra, Australia, and Beijing Genomics Institute, Hong Kong). Sample and sequencing data are included in the supplementary material (Table S2). Raw sequence reads obtained from whole-genome sequencing were aligned to the H. armigera mitochondrial genome using BBMap v. 33.43 (http://sourceforge.net/projects/bbmap/), with a minimum identity of 0.6; reads were trimmed where quality fell below the equivalent of Q10 over two consecutive bases. Reads were assembled using MIRA v. 4 (Chevreux et al. 2004) before MITObim v. 1.7 (Hahn et al. 2013) was used to iteratively map and assemble whole mitochondrial sequences. Heterozygous bases were removed, sequences were aligned using MAFFT v. 7.017 (Katoh 2002), and the alignment was trimmed using the Gblocks v. 0.91b online server (http://molevol.cmima.csic.es/castresana/Gblocks_server.html) (Talavera & Castresana 2007).

mtdna.fasta
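A quick sanity check on an alignment file such as mtdna.fasta is to confirm that every record has the same number of columns (12,248 bp according to the title above). The following is a minimal sketch using only the Python standard library; the function names and the toy records are illustrative assumptions, not part of the dataset or of the authors' pipeline.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record ID: sequence}, joining wrapped lines."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = (line[1:].split() or [""])[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


def alignment_length(records: Dict[str, str]) -> int:
    """Return the shared column count, or raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"not an alignment: lengths {sorted(lengths)}")
    return lengths.pop()


if __name__ == "__main__":
    # Toy records standing in for the real file (hypothetical sample names).
    toy = [">sample_A", "ACGT-A", "CC", ">sample_B", "ACGTTA", "C-"]
    print(alignment_length(read_fasta(toy)))
```

For the real file, passing an open file handle, for example `read_fasta(open("mtdna.fasta"))`, should work in the same way, since the parser only iterates over lines.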

SNP data in plink bed format from GBS analysis of Helicoverpa armigera, H. zea and H. punctigera

Heliothine moths were collected between 2004 and 2014 from 16 different countries around the world across various climatic zones and altitudes (Tables S1 and S2), many of which are described in Behere et al. (2007) and Tay et al. (2013). Samples were collected as larvae from wild and crop host plants, as adult moths via light/pheromone traps, or as larvae after bioassay, and were preserved in ethanol (>95%) or RNAlater, or stored at -20°C, prior to DNA extraction. DNA was extracted from samples using DNeasy Blood & Tissue kits (Qiagen), before being quantified with a Qubit 2.0. GBS library preparation and sequencing were performed by the Genomic Diversity Facility, Cornell University, NY, USA. Information regarding the samples used and sequencing output is recorded in the supplementary material (Table S1). Briefly, 50 ng of gDNA was digested using PstI, before library construction as in Elshire et al. (2011) and sequencing using an Illumina HiSeq. A negative control was included with each plate. Raw data were assessed for quality and processed using Stacks v. 1.30 (Catchen et al. 2013b). process_radtags was used to demultiplex samples, trim reads to 90 bp and assess read quality; reads were then passed to denovo_map, which was run using default settings. The Populations module was then run, limiting the output to loci present in at least 5% of samples from each sampling location, with at least 5x coverage, and was used to output SNP data in Plink format.

denovogbs.tar
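The locus filter described above (present in at least 5% of samples from each sampling location, at 5x coverage or more) can be illustrated with a small sketch. This is one reading of the stated rule, written in plain Python for clarity; it is not the Stacks implementation, and the data layout (`depth`, `popmap`) and function name are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

# depth[locus][sample] = read depth for that sample at that locus.
# Samples missing from the inner dict are treated as depth 0.
Depths = Dict[str, Dict[str, int]]


def filter_loci(depth: Depths, popmap: Dict[str, str],
                min_fraction: float = 0.05, min_depth: int = 5) -> List[str]:
    """Keep loci covered at >= min_depth in >= min_fraction of the samples
    of every sampling location listed in popmap (sample -> location)."""
    samples_by_pop = defaultdict(list)
    for sample, pop in popmap.items():
        samples_by_pop[pop].append(sample)

    kept = []
    for locus, per_sample in depth.items():
        passes_everywhere = True
        for members in samples_by_pop.values():
            covered = sum(1 for s in members if per_sample.get(s, 0) >= min_depth)
            if covered / len(members) < min_fraction:
                passes_everywhere = False
                break
        if passes_everywhere:
            kept.append(locus)
    return kept


if __name__ == "__main__":
    # Hypothetical samples and locations, for illustration only.
    popmap = {"s1": "AU", "s2": "AU", "s3": "BR", "s4": "BR"}
    depth = {
        "loc1": {"s1": 6, "s3": 5},   # covered in both locations
        "loc2": {"s1": 10, "s2": 9},  # absent from BR
        "loc3": {"s1": 4, "s3": 8},   # AU depth below 5x
    }
    print(filter_loci(depth, popmap))
```

With a 5% threshold and small sampling locations, a single adequately covered sample is enough for a locus to pass in that location, so the filter mainly excludes loci that are entirely missing from one or more locations.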

A VCF file of all whole-genome sequenced heliothine individuals, aligned to BACs (excluding those containing CYP337B1, 2 or 3) available on NCBI

BAC descriptions are available in the supplementary document. Heliothine moths were collected between 2004 and 2014 from 16 different countries around the world across various climatic zones and altitudes (Tables S1 and S2), many of which are described in Behere et al. (2007) and Tay et al. (2013). Samples were collected as larvae from wild and crop host plants, as adult moths via light/pheromone traps, or as larvae after bioassay, and were preserved in ethanol (>95%) or RNAlater, or stored at -20°C, prior to DNA extraction. DNA was extracted from samples using DNeasy Blood & Tissue kits (Qiagen), before being quantified with a Qubit 2.0. Nextera libraries were produced following the manufacturer’s instructions and sequenced as 100 bp paired-end (PE) reads (Illumina HiSeq 2000; Biological Resources Facility, Australian National University, Canberra, Australia, and Beijing Genomics Institute, Hong Kong). Sample and sequencing data are included in the supplementary material (Table S2). Raw reads were aligned to BAC sequences, originally derived from H. armigera and available on NCBI (accessions in supplementary document), using BBMap. Reads were trimmed when quality in at least 2 bases fell below Q10. Only uniquely aligning reads were included in the analysis, to avoid spurious inference of evolutionary processes occurring independently on each BAC. The resulting BAM files were sorted, duplicate reads were removed, and files were annotated with read groups using Picard v. 1.138 (http://picard.sourceforge.net). BAC reference sequences were indexed using Samtools v. 1.1.0 (Li et al. 2009). UnifiedGenotyper in GATK v. 3.3-0 (McKenna et al. 2010) was used to estimate genotypes across all individuals simultaneously, implementing a heterozygosity value of 0.01. Variant call format files containing SNP calls were reformatted into Plink format using VCFtools v. 0.1.12b (Danecek et al. 2011).

all_bacs.vcf.bz2
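Because the VCF files are distributed bzip2-compressed, they can be streamed without decompressing to disk first. The sketch below, using only the Python standard library, lists the sample names from the `#CHROM` header line and counts records per reference sequence; the function names are illustrative, and the reference names actually used in the CHROM column of all_bacs.vcf.bz2 are not stated here, so the toy names in the example are hypothetical.

```python
import bz2
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_vcf(lines: Iterable[str]) -> Tuple[List[str], Counter]:
    """Return (sample names, record count per CHROM) from VCF text lines."""
    samples: List[str] = []
    per_chrom: Counter = Counter()
    for line in lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed VCF fields; sample names follow.
            samples = fields[9:]
        else:
            per_chrom[fields[0]] += 1
    return samples, per_chrom


def summarize_vcf_bz2(path: str) -> Tuple[List[str], Counter]:
    """Stream a bzip2-compressed VCF in text mode and summarize it."""
    with bz2.open(path, "rt") as handle:
        return summarize_vcf(handle)


if __name__ == "__main__":
    # Tiny in-memory VCF with hypothetical reference and sample names.
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2"
    toy = [
        "##fileformat=VCFv4.1",
        header,
        "bac_A\t10\t.\tA\tG\t50\t.\t.\tGT\t0/1\t0/0",
        "bac_A\t25\t.\tC\tT\t60\t.\t.\tGT\t1/1\t0/1",
        "bac_B\t7\t.\tG\tA\t40\t.\t.\tGT\t0/0\t0/1",
    ]
    print(summarize_vcf(toy))
```

Calling `summarize_vcf_bz2("all_bacs.vcf.bz2")` on the downloaded file would apply the same summary to the real data.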

Supplementary document containing accession codes and eigenstrat analyses of all BACs used as references for whole genome sequencing of heliothine species


Supplementary document FINAL.docx

VCF file of all whole-genome sequenced heliothine individuals, aligned to the B3 and B1/B2 BACs

Chromosome "1" contains the B3 BAC; chromosome "B1_B2" contains the B1/B2 BAC. Heliothine moths were collected between 2004 and 2014 from 16 different countries around the world across various climatic zones and altitudes (Tables S1 and S2), many of which are described in Behere et al. (2007) and Tay et al. (2013). Samples were collected as larvae from wild and crop host plants, as adult moths via light/pheromone traps, or as larvae after bioassay, and were preserved in ethanol (>95%) or RNAlater, or stored at -20°C, prior to DNA extraction. DNA was extracted from samples using DNeasy Blood & Tissue kits (Qiagen), before being quantified with a Qubit 2.0. Nextera libraries were produced following the manufacturer’s instructions and sequenced as 100 bp paired-end (PE) reads (Illumina HiSeq 2000; Biological Resources Facility, Australian National University, Canberra, Australia, and Beijing Genomics Institute, Hong Kong). Sample and sequencing data are included in the supplementary material (Table S2). Raw reads were aligned to BAC sequences, originally derived from H. armigera and available on NCBI (accessions in supplementary document), using BBMap. Reads were trimmed when quality in at least 2 bases fell below Q10. Only uniquely aligning reads were included in the analysis, to avoid spurious inference of evolutionary processes occurring independently on each BAC. The resulting BAM files were sorted, duplicate reads were removed, and files were annotated with read groups using Picard v. 1.138 (http://picard.sourceforge.net). BAC reference sequences were indexed using Samtools v. 1.1.0 (Li et al. 2009). UnifiedGenotyper in GATK v. 3.3-0 (McKenna et al. 2010) was used to estimate genotypes across all individuals simultaneously, implementing a heterozygosity value of 0.01. Variant call format files containing SNP calls were reformatted into Plink format using VCFtools v. 0.1.12b (Danecek et al. 2011).
When linkage disequilibrium (LD)-based pruning was necessary, Plink v. 1.07 (Purcell et al. 2007) was used to remove one SNP from each pair exceeding a pairwise LD threshold (r2 = 0.5), within windows of 50 SNPs that moved forward 5 SNPs per iteration.

b3_b1b2_snps.vcf.bz2
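The LD-based pruning described above (windows of 50 SNPs, a step of 5 SNPs, and a pairwise r2 threshold of 0.5) can be sketched as a sliding-window procedure. The code below is a simplified illustration of the idea, not Plink's implementation: it assumes genotypes are coded as allele counts (0, 1, 2, or `None` for missing), uses the squared Pearson correlation of those counts as r2, and always drops the later SNP of a correlated pair, which may differ from the choice Plink makes.

```python
from typing import List, Optional, Sequence, Set

# One SNP = per-sample allele counts (0/1/2); None marks a missing genotype.
Genotypes = Sequence[Optional[int]]


def r_squared(a: Genotypes, b: Genotypes) -> float:
    """Squared Pearson correlation over samples genotyped at both SNPs."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def ld_prune(snps: List[Genotypes], window: int = 50, step: int = 5,
             threshold: float = 0.5) -> List[int]:
    """Return indices of SNPs retained after sliding-window pruning."""
    removed: Set[int] = set()
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        idx = [i for i in range(start, end) if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                if j not in removed and r_squared(snps[i], snps[j]) > threshold:
                    removed.add(j)  # simplification: drop the later SNP
        start += step
    return [i for i in range(len(snps)) if i not in removed]


if __name__ == "__main__":
    # Four toy SNPs across six hypothetical samples.
    toy = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],  # identical to SNP 0
        [2, 0, 1, 1, 0, 2],  # uncorrelated with SNP 0
        [0, 1, 2, 0, 1, 1],  # strongly correlated with SNP 0
    ]
    print(ld_prune(toy))
```

Raising the threshold retains more SNPs: in the toy data, a threshold of 0.9 would remove only the exact duplicate.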