Genotype data of Anoplophora glabripennis from invasive populations in North America and native populations in Asia
Data files
Feb 07, 2025 version files (1.54 GB)
- 490_2768.recode.vcf (5.61 MB)
- FastGBS_platypus_basic.log (1.17 KB)
- FastGBS_platypus_basic.recode.vcf (1.53 GB)
- metadata490.txt (37.32 KB)
- README.md (2.13 KB)
Abstract
This dataset provides genomic resources for the invasive Asian longhorned beetle (Anoplophora glabripennis Motschulsky, ALB), a significant pest threatening global forest ecosystems. It includes 2,768 genome-wide single nucleotide polymorphisms (SNPs) derived from invasive ALB populations in North America, enabling the study of genetic variation, invasion history, and population dynamics. The dataset is structured to support analyses of genetic bottlenecks, population expansions, and secondary spread patterns, offering insights into multiple independent introductions from the native range.
The dataset is organized into genotype matrices and metadata, including sample locations, collection dates, and population identifiers. It is reusable for studies on invasion biology, biosurveillance, and biosecurity, providing a foundation for tracing the origins of intercepted individuals and informing pest management strategies. Legal and ethical considerations include compliance with data-sharing policies and restrictions on the use of genetic data for invasive species management.
This resource is designed to enhance genome-based biosurveillance tools, supporting regulatory agencies in strengthening biosecurity measures against ALB and other invasive pests.
https://doi.org/10.5061/dryad.280gb5n05
Description of the data and file structure
490_2768.recode.vcf: This file contains the filtered genotype data in VCF format.
metadata490.txt: This file provides corresponding metadata for the genotype data.
FastGBS_platypus_basic.recode.vcf: This file contains the initial genotype data in VCF format generated using the Fast-GBS pipeline. The SNPs underwent basic filtering with VCFtools to reduce file size by removing variants with excessive missing data or other low-quality characteristics.
FastGBS_platypus_basic.log: This is the log file generated by VCFtools during filtering of the initial variant calls.
Files and variables
File: metadata490.txt
Description: This file provides the sample metadata corresponding to the genotype data.
Variables
- INDV: Individual sample identifier (e.g., "P1_c01c02_SKOR52.sort").
- pop_id: Population identifier or code (e.g., "KOR" for Korea).
- Latitude: Geographic latitude of the collection site.
- Longitude: Geographic longitude of the collection site.
- City: City where the sample was collected.
- State.Region: State or region where the sample was collected.
- Country: Country of origin for the sample (e.g., "Korea").
- Range: Indicates whether the population is native or invasive.
- Fst_group: Grouping based on genetic differentiation (Fst), typically indicating a population or region of origin (e.g., "Korea").
- Collection_year: The year in which the sample was collected (e.g., 2001).
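The variables above can be read into a simple per-sample table. The following is a minimal sketch, not part of the dataset: it assumes metadata490.txt is tab-delimited with a header row using the column names listed above (the delimiter is an assumption and may need changing). The helper names `load_metadata` and `count_by` are hypothetical, and the in-memory rows are made-up placeholders, not values from the real file.

```python
import csv
import io
from collections import Counter


def load_metadata(handle, delimiter="\t"):
    """Read metadata rows into a list of dicts keyed by column name.

    The tab delimiter is an assumption about metadata490.txt.
    """
    return list(csv.DictReader(handle, delimiter=delimiter))


def count_by(rows, column):
    """Count samples per distinct value of one metadata column."""
    return Counter(row[column] for row in rows)


# Tiny in-memory stand-in for metadata490.txt. These values are invented
# for illustration only and are NOT taken from the real file.
example = io.StringIO(
    "INDV\tpop_id\tCountry\tRange\tCollection_year\n"
    "sample_A\tKOR\tKorea\tnative\t2001\n"
    "sample_B\tKOR\tKorea\tnative\t2001\n"
    "sample_C\tXXX\tCanada\tinvasive\t2010\n"
)
rows = load_metadata(example)
print(count_by(rows, "pop_id"))
print(count_by(rows, "Range"))
```

To use the real file, pass an open file handle instead of the stand-in, for example `with open("metadata490.txt", newline="") as fh: rows = load_metadata(fh)`.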
File: 490_2768.recode.vcf
Description: This file contains the filtered genotype data in Variant Call Format (VCF). It includes single nucleotide polymorphisms (SNPs) for the Anoplophora glabripennis specimens. The data have undergone quality control filtering, retaining only high-quality, biallelic SNPs that meet specific criteria for missing data, read depth, minor allele frequency, and relatedness among individuals.
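A quick sanity check on a VCF file is to count its samples and records and confirm that the records are biallelic SNPs. The sketch below relies only on the standard VCF column layout (sample columns start after the FORMAT column). The function name `summarize_vcf` is hypothetical, and the example lines, including the `##fileformat` version, are invented for illustration rather than copied from 490_2768.recode.vcf.

```python
def summarize_vcf(lines):
    """Return sample IDs, record count, and biallelic-SNP count for VCF text lines."""
    samples = []
    n_records = 0
    n_biallelic_snps = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
            samples = fields[9:]
            continue
        n_records += 1
        ref, alt = fields[3].upper(), fields[4].upper()
        # A biallelic SNP has one single-base REF and one single-base ALT.
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            n_biallelic_snps += 1
    return {
        "samples": samples,
        "n_records": n_records,
        "n_biallelic_snps": n_biallelic_snps,
    }


# Invented example lines, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "chr1\t200\t.\tC\tT,G\t.\tPASS\t.\tGT\t0/1\t./.",
]
summary = summarize_vcf(example_vcf)
print(summary)
```

For the real file, iterate over an open handle, for example `with open("490_2768.recode.vcf") as fh: summary = summarize_vcf(fh)`. The returned sample IDs can then be compared with the INDV column of metadata490.txt, assuming the two use the same identifiers.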
DNA was extracted from individual specimens using a single leg or larval thoracic muscle, surface-sterilized with ethanol, flash frozen in liquid nitrogen, and homogenized. The samples were processed using the DNeasy 96 Blood & Tissue Kit (Qiagen). Additionally, previously published A. glabripennis data from China and South Korea were included to generate a native reference collection.
Genotyping was performed using genotyping-by-sequencing (GBS) on an Ion Proton platform. The Fast-GBS v1.0 pipeline was used for variant calling. SNP variants were filtered using VCFtools and PLINK to retain high-quality biallelic SNPs. Quality filters included thresholds for missing data, minor allele frequency, read depth, and relatedness among samples.
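To make the missing-data and minor-allele-frequency filters concrete, the sketch below computes both quantities for a single site from its genotype (GT) strings. It is an illustration of the general idea only, not the VCFtools or PLINK implementation used for this dataset. The function names `site_stats` and `passes_filters` are hypothetical, and the thresholds `max_missing=0.2` and `min_maf=0.05` are placeholder assumptions, since the actual thresholds are not stated here.

```python
import re


def site_stats(genotypes):
    """Return (missing_fraction, minor_allele_frequency) for one site.

    `genotypes` is a list of GT strings such as "0/1", "1|1", or "./.".
    Only the GT value is expected, not the full colon-separated sample field.
    """
    n_missing = 0
    allele_counts = {}
    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)
        if "." in alleles:
            n_missing += 1  # treat partially called genotypes as missing
            continue
        for allele in alleles:
            allele_counts[allele] = allele_counts.get(allele, 0) + 1
    missing_fraction = n_missing / len(genotypes) if genotypes else 0.0
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) < 2:
        maf = 0.0  # monomorphic or entirely missing site
    else:
        maf = min(allele_counts.values()) / total
    return missing_fraction, maf


def passes_filters(genotypes, max_missing=0.2, min_maf=0.05):
    """Apply placeholder thresholds (assumptions, not the dataset's values)."""
    missing_fraction, maf = site_stats(genotypes)
    return missing_fraction <= max_missing and maf >= min_maf


# Invented genotypes for two sites, for illustration only.
site_a = ["0/0", "0/1", "1/1", "./."]
site_b = ["0/0", "0/0", "0/1", "0/0"]
print(site_stats(site_a), passes_filters(site_a))
print(site_stats(site_b), passes_filters(site_b))
```

Read depth and relatedness filters are not covered by this sketch, because they depend on per-sample depth fields and pairwise comparisons that are not described in detail above.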
