Tammar & parma wallaby GBS data from NZ
Data files
Aug 30, 2024 version files (55.63 MB)
- NI_wall_Auto_mac3_mindp5_maxmeanDP60_maxmiss0.7.recode.vcf
- NZ_Wallaby_metadata_PC.csv
- README.md
Abstract
While conducting a landscape genomics study of invasive tammar wallabies (Notamacropus eugenii) in Aotearoa New Zealand, we discovered that parma wallabies (N. parma) are also present in the North Island. This population has gone undetected for at least 30 years (and potentially for over a century), hidden amongst the morphologically similar tammar wallabies. The fact that an invasive wallaby species could remain undetected for so long highlights the need for greater monitoring of invasive species, including genomic species identification.
README: Tammar & Parma Wallaby GBS data from NZ
https://doi.org/10.5061/dryad.80gb5mkzz
Description of the data and file structure
The .csv file (NZ_Wallaby_metadata_PC.csv) contains metadata for each sample (ID, Species, Date, Sex, Latitude and Longitude) as well as the first four principal coordinates.
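As a rough illustration, the metadata file can be loaded with Python's standard csv module. This is a minimal sketch that assumes the header row uses the field names listed above plus PC1 to PC4 for the principal coordinates; those column names are assumptions, so check them against the actual header before use.

```python
import csv
from collections import Counter
from typing import Dict, List, Optional, TextIO

# ASSUMED header names: the README lists the fields, but the exact spelling
# (and the PC1-PC4 labels) in NZ_Wallaby_metadata_PC.csv may differ.
NUMERIC_COLUMNS = ["Latitude", "Longitude", "PC1", "PC2", "PC3", "PC4"]


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a cell to float, returning None for empty or missing cells."""
    if value is None or not value.strip():
        return None
    return float(value)


def load_metadata(handle: TextIO) -> List[Dict[str, object]]:
    """Read one record per sample, converting coordinates and PCs to floats."""
    records: List[Dict[str, object]] = []
    for row in csv.DictReader(handle):
        record: Dict[str, object] = dict(row)
        for column in NUMERIC_COLUMNS:
            record[column] = to_float(row.get(column))
        records.append(record)
    return records


def count_by_species(records: List[Dict[str, object]]) -> Counter:
    """Tally the number of samples assigned to each species."""
    return Counter(str(r.get("Species")) for r in records)
```

For example, opening the file with `open("NZ_Wallaby_metadata_PC.csv", newline="")`, passing the handle to `load_metadata` and then to `count_by_species` would give the number of samples per species.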
The VCF file contains the filtered SNP data for all samples.
Code/software
No specific software is required: the .csv file can be opened in any text editor or database software, and the VCF file can be read in any text editor and analysed with packages such as vcftools, bcftools or PLINK.
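Because VCF is plain tab-delimited text, it can also be parsed without a dedicated package. The sketch below is illustrative and not part of the dataset: it reads sample names from the #CHROM header line, converts each GT field to an ALT-allele count (0, 1, 2, or None when missing) and keeps the per-genotype DP value when the FORMAT column includes one. It assumes biallelic SNPs, which is typical of filtered GBS data but worth checking.

```python
from typing import List, NamedTuple, Optional, TextIO, Tuple


class Site(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    gt: List[Optional[int]]   # ALT-allele count per sample (None = missing)
    dp: List[Optional[int]]   # read depth per sample (None = absent or '.')


def parse_gt(gt: str) -> Optional[int]:
    """Turn a GT string such as '0/1' or '1|1' into an ALT-allele count."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    # Counts every non-reference allele, so this assumes biallelic sites.
    return sum(1 for a in alleles if a != "0")


def read_vcf(handle: TextIO) -> Tuple[List[str], List[Site]]:
    """Return the sample names and one Site per data line of a VCF file."""
    samples: List[str] = []
    sites: List[Site] = []
    for line in handle:
        if line.startswith("##") or not line.strip():
            continue  # meta-information lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        fmt = fields[8].split(":")
        gt_idx = fmt.index("GT")
        dp_idx = fmt.index("DP") if "DP" in fmt else None
        gts: List[Optional[int]] = []
        dps: List[Optional[int]] = []
        for sample_field in fields[9:]:
            parts = sample_field.split(":")
            # Trailing FORMAT fields may be dropped, e.g. a bare './.'.
            gts.append(parse_gt(parts[gt_idx] if gt_idx < len(parts) else "."))
            if dp_idx is not None and dp_idx < len(parts) and parts[dp_idx] != ".":
                dps.append(int(parts[dp_idx]))
            else:
                dps.append(None)
        sites.append(Site(fields[0], int(fields[1]), fields[3], fields[4], gts, dps))
    return samples, sites
```

Reading the whole file into memory is fine at this dataset's size (tens of MB); for larger files the loop could be turned into a generator.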
Methods
Laboratory and bioinformatic methods
We used a DNeasy Blood and Tissue extraction kit on a QIAcube to extract DNA from c. 0.5 cm² of tissue, with an overnight proteinase K digest according to the manufacturer's protocols (QIAGEN, Hilden, Germany). DNA was eluted into 200 μL of Buffer AE and stored at −20 °C. DNA quality and quantity were evaluated using a DeNovix DS-11 spectrophotometer, with the 260/280 and 260/230 absorbance ratios used to check for contamination. Samples that did not meet the purity criteria (260/280 = 1.7–2.1, 260/230 = 1.9–2.2) were removed.
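The purity screen amounts to two inclusive range checks. A minimal sketch follows; the dictionary layout for the spectrophotometer readings and the sample IDs in the example are hypothetical.

```python
from typing import Dict, List

# Purity windows reported in the Methods (inclusive bounds).
RATIO_260_280 = (1.7, 2.1)
RATIO_260_230 = (1.9, 2.2)


def passes_purity(a260_280: float, a260_230: float) -> bool:
    """True if both absorbance ratios fall inside their accepted windows."""
    lo1, hi1 = RATIO_260_280
    lo2, hi2 = RATIO_260_230
    return lo1 <= a260_280 <= hi1 and lo2 <= a260_230 <= hi2


def retained_samples(readings: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the IDs of samples whose ratios meet both criteria.

    `readings` maps a sample ID to {"260/280": ..., "260/230": ...};
    this layout is hypothetical, not a DS-11 export format.
    """
    return [
        sample_id
        for sample_id, r in readings.items()
        if passes_purity(r["260/280"], r["260/230"])
    ]
```

A sample with a good 260/280 ratio but a low 260/230 ratio (often a sign of salt or organic carry-over) is still excluded, because both conditions must hold.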
All DNA extracts were then normalised to 50 ng/μL (samples with low concentrations were first concentrated in a SpeedVac), and 1 μg of DNA per sample was sent for GBS. GBS was performed by the GenomNZ Animal Genomics Group (AgResearch, New Zealand). Procedures followed Dodds et al. (2015), based on Elshire et al. (2011), with the modifications described below. Genomic DNA was digested with the PstI and MspI restriction enzymes (NEB R140L and R0106L, New England Biolabs, Ipswich, United States). Enzymes were chosen on the basis of bioanalyser traces (2100 Bioanalyser, Agilent Technologies, Santa Clara, United States) showing an even digestion pattern with no evidence of repeat sequences across the region of interest. Following ligation to barcoded adapters, the uniquely barcoded individuals were pooled into two multiplexed libraries of 94 samples each. After pooling, the libraries were PCR-amplified in multiples of four and pooled again before column clean-up; each library was then further purified and size-selected (193–500 bp) using a Pippin (SAGE Science, Beverly, United States; 2% agarose, dye-free with internal standards CDF2050, Marker L CDF2010). Each library was sequenced on an Illumina HiSeq 2500 using single-end reads (101 cycles, high-output mode, v4 chemistry).
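Normalising to 50 ng/μL means that 1 μg of DNA corresponds to 1000 ng ÷ 50 ng/μL = 20 μL of extract. The stock and diluent volumes follow from C1V1 = C2V2. The sketch below is a generic worked calculation, not the lab's protocol: the final volume in the example is hypothetical, and the function returns None when the stock is already below the target, the case in which samples were concentrated in a SpeedVac instead.

```python
from typing import Optional, Tuple

TARGET_NG_PER_UL = 50.0   # working concentration from the Methods
MASS_SENT_NG = 1000.0     # 1 ug of DNA per sample


def volume_for_mass(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (uL) holding `mass_ng` of DNA at concentration `conc_ng_per_ul`."""
    return mass_ng / conc_ng_per_ul


def dilution_volumes(
    stock_ng_per_ul: float,
    final_volume_ul: float,
    target_ng_per_ul: float = TARGET_NG_PER_UL,
) -> Optional[Tuple[float, float]]:
    """Return (stock uL, diluent uL) from C1*V1 = C2*V2.

    Returns None when the stock is below the target, i.e. the sample would
    need concentrating rather than diluting.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        return None
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul
```

For example, a 200 ng/μL extract made up to a hypothetical 40 μL needs 10 μL of stock and 30 μL of diluent, and 20 μL of the normalised extract then carries the 1 μg sent for sequencing.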
Quality checks and adapter removal followed Dodds et al. (2015). Raw FASTQ files were quality-checked using FastQC v. 0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Barcodes and adapters were removed using cutadapt (Martin 2011), and a random subset of 15,000 reads was then checked for contamination using BLAST+ against the nt database (https://blast.ncbi.nlm.nih.gov) with the following settings: blastn -query <query file> -task blastn -num_threads 2 -db nt -evalue 1.0e-10 -dust '20 64 1' -max_target_seqs 1 -outfmt '7 qseqid sseqid pident evalue staxids sscinames scomnames sskingdoms stitle'. We then produced a catalogue of SNP loci (single nucleotide polymorphisms, i.e. single-base variants) following Dodds et al. (2015) and the general guidelines of Benestan et al. (2016).
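The Methods do not say how the 15,000 reads were drawn. Reservoir sampling (Algorithm R) is one standard way to take a uniform random subset from a FASTQ file in a single pass without holding the file in memory; the sketch below uses it and writes the subset as FASTA, a format blastn accepts as a query. The function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, TextIO, Tuple, TypeVar

T = TypeVar("T")


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read ID, sequence) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # tolerate blank lines between records
        seq = handle.readline().strip()
        handle.readline()               # '+' separator line
        handle.readline()               # quality string (not needed here)
        yield header[1:].split()[0], seq


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform random sample of k items in one pass (Algorithm R)."""
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def to_fasta(reads: Iterable[Tuple[str, str]]) -> str:
    """Format (ID, sequence) pairs as FASTA, e.g. for use as a BLAST query."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in reads)
```

Usage would look like `reservoir_sample(fastq_records(fh), 15000, random.Random(42))` on an open FASTQ handle; fixing the seed makes the subset reproducible.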
After adapter trimming with cutadapt (Martin 2011), reads were mapped and SNPs were called using a modified reference-based pipeline in bcftools (Danecek et al. 2011) against the recently completed tammar wallaby genome (mMacEug1.pri: GCA_028372415.1), which was assembled from a male obtained by lead author Dr Veale from the Waimangu thermal area in the Bay of Plenty, New Zealand. All subsequent data filtering was conducted in vcftools (Danecek et al. 2011). Loci were filtered with the following parameters: minimum minor allele count = 3, maximum mean depth = 45, minimum depth = 5 and maximum missingness = 0.3. Individuals were retained only if their mean depth was at least 2 (measured before any other filtering).
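The thresholds above correspond to the vcftools options --mac, --max-meanDP, --minDP and --max-missing. Note that --max-missing takes the fraction of data that must be present, so a maximum missingness of 0.3 is written --max-missing 0.7, and that --minDP masks individual low-depth genotypes rather than removing whole sites. The sketch below re-implements these filters on the genotype and depth lists for illustration only; the exact order in which vcftools applies them, and how it averages depth, may differ, and counting a missing depth as 0 reads for the individual filter is an assumption.

```python
from fractions import Fraction
from typing import List, Optional, Sequence

Genotype = Optional[int]   # ALT-allele count (0, 1, 2) or None if missing

MIN_MAC = 3                       # vcftools --mac 3
MAX_MEAN_DP = 45                  # vcftools --max-meanDP 45
MIN_DP = 5                        # vcftools --minDP 5 (masks low-depth genotypes)
MAX_MISSING = Fraction(3, 10)     # vcftools --max-missing 0.7 (fraction *called*)
MIN_INDIVIDUAL_MEAN_DP = 2


def keep_individuals(depths: Sequence[Sequence[Optional[int]]]) -> List[int]:
    """Indices of samples whose mean depth over all sites is >= 2.

    depths[site][sample]; a missing depth is counted as 0 reads (assumption).
    """
    if not depths:
        return []
    n_sites = len(depths)
    return [
        s
        for s in range(len(depths[0]))
        if sum(row[s] or 0 for row in depths) / n_sites >= MIN_INDIVIDUAL_MEAN_DP
    ]


def site_passes(gts: Sequence[Genotype], dps: Sequence[Optional[int]]) -> bool:
    """Mask low-depth genotypes, then apply missingness, mean-depth and MAC filters."""
    if not gts:
        return False
    masked = [g if d is not None and d >= MIN_DP else None for g, d in zip(gts, dps)]
    called = [g for g in masked if g is not None]
    # Exact fractions avoid floating-point error at the 30% boundary.
    if Fraction(len(gts) - len(called), len(gts)) > MAX_MISSING:
        return False
    observed = [d for d in dps if d is not None]
    if observed and sum(observed) / len(observed) > MAX_MEAN_DP:
        return False
    alt_copies = sum(called)      # diploid: two allele copies per called genotype
    minor = min(alt_copies, 2 * len(called) - alt_copies)
    return minor >= MIN_MAC
```

The individual filter is applied first, matching the statement that it was measured before other filtering; sites would then be tested using only the retained samples' columns.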
GBS analyses
We performed a Principal Coordinate Analysis (PCoA) within KGD (Dodds et al. 2015) on the genomic relatedness between individuals. The KGD analysis showed two clearly distinct, highly divergent groups (corresponding to the two species; see results), although two individuals had slightly intermediate values between these groups. To investigate this further, we converted the VCF file to GESTE format to calculate the number of fixed differences between the two species (excluding the two potential hybrids). Once the fixed differences between the two species had been identified, we counted the proportion of each genotype category at these loci in the two potentially hybridised individuals (as per Etherington et al. 2022). To map the distribution of samples, we used the R package ggmap (Kahle & Wickham 2013).
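The fixed-difference logic can be sketched directly on genotypes, without the GESTE conversion the authors used. This is an illustrative re-implementation, not their pipeline: a site counts as a fixed difference when every called individual of one species is homozygous for one allele and every called individual of the other species is homozygous for the other, and the "categories" are assumed here to be the three possible genotypes of a putative hybrid at those sites. Under simple expectations, an F1 hybrid would be heterozygous at essentially all fixed differences, and a first-generation backcross at roughly half of them.

```python
from typing import Dict, Optional, Sequence

Genotype = Optional[int]   # ALT-allele count (0, 1, 2) or None if missing


def fixed_difference(a: Sequence[Genotype], b: Sequence[Genotype]) -> Optional[int]:
    """If groups A and B are fixed for opposite alleles, return A's genotype (0 or 2)."""
    a_called = {g for g in a if g is not None}
    b_called = {g for g in b if g is not None}
    if a_called == {0} and b_called == {2}:
        return 0
    if a_called == {2} and b_called == {0}:
        return 2
    return None


def count_fixed_differences(sites_a, sites_b) -> int:
    """Number of sites at which the two reference groups are fixed for different alleles."""
    return sum(fixed_difference(a, b) is not None for a, b in zip(sites_a, sites_b))


def hybrid_profile(sites_a, sites_b, hybrid_gts) -> Dict[str, float]:
    """Proportion of fixed-difference sites where one individual is hom-A, het or hom-B.

    sites_a / sites_b: per-site genotype lists for the two species (hybrids excluded)
    hybrid_gts: per-site genotype of the individual being assessed
    """
    counts = {"homozygous_A": 0, "heterozygous": 0, "homozygous_B": 0}
    for ga, gb, h in zip(sites_a, sites_b, hybrid_gts):
        a_state = fixed_difference(ga, gb)
        if a_state is None or h is None:
            continue
        if h == 1:
            counts["heterozygous"] += 1
        elif h == a_state:
            counts["homozygous_A"] += 1
        else:
            counts["homozygous_B"] += 1
    n = sum(counts.values())
    return {k: (v / n if n else 0.0) for k, v in counts.items()}
```

Requiring every called genotype to be homozygous is strict; with GBS data, genotyping error or undercalled heterozygotes at low depth can remove genuine fixed differences, so the count is best read as conservative.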