Collection and ddRadSeq sequencing data for Sitophilus zeamais from Oaxaca and Chiapas, Mexico
Data files
Mar 08, 2023 version files 24.75 GB
- Expt81_Lib3_1_S88_L008_R1_001.fastq.gz.trimmed.fq.gz
- Expt81_Lib3_2_S89_L008_R1_001.fastq.gz.trimmed.fq.gz
- Expt81_Lib3_4_S90_L008_R1_001.fastq.gz.trimmed.fq.gz
- Expt81_Lib3_6_S91_L008_R1_001.fastq.gz.trimmed.fq.gz
- lib6_1_S7_L002_R1_001.fastq.gz.trimmed.fq.gz
- lib6_2_S8_L002_R1_001.fastq.gz.trimmed.fq.gz
- lib6_4_S9_L002_R1_001.fastq.gz.trimmed.fq.gz
- lib6_6_S10_L002_R1_001.fastq.gz.trimmed.fq.gz
- MX_CollectionData_formatted.xlsx
- README.md
Abstract
The maize weevil, Sitophilus zeamais, is a ubiquitous pest of maize and other cereal crops worldwide and remains a threat to food security in subsistence communities. Few population genetic studies have been conducted on the maize weevil. Those that exist found very little genetic differentiation between geographically dispersed populations and suggest that the species likely underwent a range expansion within the last few hundred years. These studies, however, relied primarily on mitochondrial and nuclear microsatellite markers. More fine-scaled population genetic structure may exist due to local adaptation, the biological limits of natural species dispersal, and the isolated nature of subsistence farming communities. In contrast to previous studies, we used genome-wide single nucleotide polymorphism (SNP) data to evaluate the population genetic structure of the maize weevil in the southern and coastal Mexican states of Oaxaca and Chiapas. We employed strict SNP filtering to manage large next-generation sequencing lane effects. This study is the first to find fine-scale population genetic structure in the maize weevil: although gene flow continues between populations, fine-scale genetic structure exists. This structure may be shaped by local adaptation of the insects, the movement and trade of maize by humans in the region, geographic barriers to gene flow, or a combination of these factors.
Methods
Field collections
From 2 August to 18 August 2016, maize weevils were collected from a total of 61 small- to medium-sized farms within rural communities in southern Mexico. Weevils collected from each community were treated as one population in subsequent analyses. The sample locations were chosen based on previously established relationships with community members. Members of each residence verbally consented to specimen collection; the ethics committee waived the need for review. We sampled 7 communities in Oaxaca, Mexico and 5 communities in Chiapas, Mexico (Fig 1). In each community, we collected approximately 200 maize weevils from 3 to 7 individual residences. These samples were stored in 100% ethanol at ambient temperature for the duration of field collection. Upon return to the United States, samples were stored at -20°C in the laboratory until processed. At each sampling location, we recorded latitude, longitude, and elevation using a Garmin GPS unit (Garmin International, Inc., Olathe, KS, USA) (Table 1). The sites in Oaxaca ranged in elevation from 23.5 m to 2,005 m and those in Chiapas ranged from 8.0 m to 1,142.5 m.
Fig 1. Map of sampling locations. Communities marked with red dots were sequenced in Lane 1; communities marked with blue dots were sequenced in Lane 2. Full community names are listed in Table 1. A) A map of Mexico showing the relative location of the sampled communities in the southern and coastal region of the country. B) A zoomed-in view centered on Oaxaca and Chiapas showing the communities where maize weevils were collected.
Table 1. Summary table of communities visited in Oaxaca and Chiapas.
| State | Community | Label | Homes | Elevation (m) | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- | --- |
| Oaxaca | Santa María Yavesía | YAV | 6 | 1988.25 | 17.234986 | -96.436065 |
| Oaxaca | Santa Ana Zegache | ZEG | 4 | 1482.94 | 16.837518 | -96.724833 |
| Oaxaca | Santos Reyes Nopala | NOP | 7 | 466.57 | 16.078580 | -97.154197 |
| Oaxaca | San Isídro Campechero | ISD | 7 | 105.53 | 15.980961 | -97.302886 |
| Oaxaca | Santiago Yaitepec | YAI | 5 | 1801.40 | 16.226998 | -97.268016 |
| Oaxaca | Santa Rosa de Lima | SRL | 3 | 24.27 | 16.039082 | -97.491641 |
| Oaxaca | San Pedro Pochutla | POC | 6 | 145.58 | 15.760327 | -96.499309 |
| Chiapas | Huixtla | HUX | 5 | 13.72 | 15.071078 | -92.543628 |
| Chiapas | Golondrinas | GOL | 4 | 745.00 | 15.419210 | -92.651981 |
| Chiapas | Montecristo | MTC | 4 | 329.31 | 16.937728 | -93.307673 |
| Chiapas | Villa Corzo | VLC | 5 | 585.65 | 16.181779 | -93.269561 |
| Chiapas | Nuevo León | NVL | 5 | 1106.10 | 16.486764 | -92.572740 |
Table 1 lists, for each community, the label used in Fig 1, the number of households visited, the mean latitude and longitude, and the mean elevation (m) of the households visited.
Genomic DNA isolations
Genomic DNA was isolated from the head and thorax of individual weevils following the manufacturer's protocol for the DNeasy kit (Qiagen Inc., Germantown, MD, USA) in our laboratory at North Carolina State University in Raleigh, NC. DNA was isolated from 24 weevils from each of four communities (SRL, YAV, HUX, and NVL) for the proof-of-concept sequencing run, and from 12 weevils from each of the 8 remaining communities for the second sequencing run. Homogenized weevil tissue samples were incubated overnight in the lysis buffer solution at 55°C. On day two of the isolation, an RNase A (4 mg/mL) treatment was performed before the recommended wash steps. Finally, samples were eluted into 300 µl of 70°C dH2O and stored at -20°C for short-term storage or -80°C for long-term storage.
ddRadSeq library preparation and sequencing
Genomic DNA isolated from individual weevils as described above was prepared for sequencing following the ddRadSeq protocol first described by Peterson et al. [25] and adapted to our laboratory as described by Fritz et al. [26]. Two separate libraries were prepared and sequenced on two different lanes of an Illumina HiSeq 2500 (125 bp single-end reads). The first library included 96 individual weevils, 24 each from the 4 most distant communities sampled: Santa María Yavesía, Santa Rosa de Lima, Huixtla, and Nuevo León. This sampling regime included the communities with the highest and lowest elevation from each state. This library was used to test the hypothesis that genetic differentiation could be found by genotyping SNPs. A second library, sequenced in the same manner as the first, was prepared after confirmation that genetic differentiation could be identified. This library included 12 individual weevils from each of the remaining 8 communities sampled: Santa Ana Zegache, Santos Reyes Nopala, San Isidro Campechero, Santiago Yaitepec, San Pedro Pochutla, Golondrinas, Montecristo, and Villa Corzo. The communities in the second library represent intermediate distances and elevations compared to those in the first library.
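The library composition described above can be cross-checked against Table 1. The following is a minimal Python sketch, not part of the deposited scripts: it copies the labels and mean elevations from Table 1, picks the lowest and highest community in each state, and tallies the per-library sample counts. The function and variable names are hypothetical, and selecting by elevation extremes is only a consistency check on the text; the authors describe choosing the communities by distance as well.

```python
# Mean elevation (m) per community, copied from Table 1.
COMMUNITIES = [
    ("Oaxaca", "YAV", 1988.25),
    ("Oaxaca", "ZEG", 1482.94),
    ("Oaxaca", "NOP", 466.57),
    ("Oaxaca", "ISD", 105.53),
    ("Oaxaca", "YAI", 1801.40),
    ("Oaxaca", "SRL", 24.27),
    ("Oaxaca", "POC", 145.58),
    ("Chiapas", "HUX", 13.72),
    ("Chiapas", "GOL", 745.00),
    ("Chiapas", "MTC", 329.31),
    ("Chiapas", "VLC", 585.65),
    ("Chiapas", "NVL", 1106.10),
]


def elevation_extremes(communities):
    """Return the labels of the lowest and highest community in each state."""
    by_state = {}
    for state, label, elevation in communities:
        by_state.setdefault(state, []).append((elevation, label))
    selected = set()
    for rows in by_state.values():
        selected.add(min(rows)[1])  # lowest mean elevation in the state
        selected.add(max(rows)[1])  # highest mean elevation in the state
    return selected


library_1 = elevation_extremes(COMMUNITIES)
library_2 = {label for _, label, _ in COMMUNITIES} - library_1

# Weevils per community: 24 in the first library, 12 in the second.
n_lib1 = 24 * len(library_1)
n_lib2 = 12 * len(library_2)

print(sorted(library_1), n_lib1)
print(sorted(library_2), n_lib2)
```

With the Table 1 values, the elevation extremes are the same four communities named for the first library (YAV, SRL, HUX, NVL), and both libraries contain 96 individuals.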
Bioinformatic and statistical analyses
Data produced by each lane were first checked with FastQC, then reads were trimmed with Trimmomatic to remove adapters and Illumina indices [27-28]. Trimming was verified with FastQC, and a MultiQC report was generated to compare results across indices [27-29]. Data from both lanes were analyzed together in Stacks v. 1.48 following the de novo procedure described by Rochette and Catchen [30-31]. The procedure is outlined here to specify certain parameter choices. Samples were first cleaned, demultiplexed, and truncated to 115 bp using process_radtags. Per-sample coverage was then checked, and ustacks was run on all samples (-M = 5, -m = 3). The 4 highest-coverage individuals from each of the 12 populations were chosen to build the catalog; sample zag.c1 was excluded from this selection because its coverage was much higher than that of the other samples. Using the catalog population map created in the previous step, cstacks was run (-n = 5), followed by sstacks on all samples. To improve genotype calls, rxstacks was run with the flags --prune_haplo and --conf_lim = 0.10, and cstacks and sstacks were then run again to produce and match against the improved catalog. Finally, genotype information for each individual was exported in Genepop format using the populations program. These data were stringently filtered using the adegenet package in R [32-33]: individuals with more than 80% missing genotype calls were removed first, and loci with more than 35% missing data were removed next. Principal components were then computed and plotted with the same package. F-statistics were calculated with the hierfstat package in R [34].
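The two-step missing-data filter described above can be illustrated with a small example. This is a minimal Python sketch under stated assumptions, not the authors' R/adegenet code: genotypes are held in a plain dictionary with `None` marking a missing call, locus missingness is assumed to be recomputed on the individuals that survive the first step, and all names (`filter_genotypes`, `toy`, and so on) are hypothetical. The thresholds are the ones given in the text.

```python
def missing_fraction(calls):
    """Fraction of entries in `calls` that are missing (None)."""
    return sum(call is None for call in calls) / len(calls)


def filter_genotypes(genotypes, max_ind_missing=0.80, max_locus_missing=0.35):
    """Apply the individual filter, then the locus filter.

    genotypes: dict mapping sample name -> list of genotype calls,
               all lists in the same locus order, None = missing call.
    Returns (filtered genotypes, indices of the loci that were kept).
    """
    # Step 1: remove individuals with more than 80% missing calls.
    kept = {
        sample: calls
        for sample, calls in genotypes.items()
        if missing_fraction(calls) <= max_ind_missing
    }
    if not kept:
        return {}, []

    # Step 2: remove loci with more than 35% missing data,
    # measured only across the individuals retained in step 1 (assumption).
    n_loci = len(next(iter(kept.values())))
    kept_loci = [
        j
        for j in range(n_loci)
        if missing_fraction([calls[j] for calls in kept.values()])
        <= max_locus_missing
    ]
    filtered = {
        sample: [calls[j] for j in kept_loci] for sample, calls in kept.items()
    }
    return filtered, kept_loci


# Toy data: 4 individuals genotyped at 5 loci.
toy = {
    "ind1": ["AA", "AB", None, "BB", "AA"],
    "ind2": ["AA", None, None, "BB", "AB"],
    "ind3": [None, None, None, None, None],  # 100% missing
    "ind4": ["AB", "AA", None, "AB", "AA"],
}

filtered, kept_loci = filter_genotypes(toy)
print(sorted(filtered))
print(kept_loci)
```

In the toy data, `ind3` is dropped in the first step, and the third locus (index 2), which is missing in every remaining individual, is dropped in the second. The order matters: removing poorly genotyped individuals first lowers the apparent missingness of the loci evaluated afterwards.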
Usage notes
Files are in: Command Line (Bash), Stacks (https://catchenlab.life.illinois.edu/stacks/) and R.