Data from: Lineage diversity within a widespread endemic Australian skink to better inform conservation in response to regional-scale disturbance

Dissanayake, Duminda S. B.; Georges, Arthur

Published Feb 09, 2024 on Dryad. https://doi.org/10.5061/dryad.tx95x69zc

Data files

Files in the Feb 09, 2024 version (492.19 MB total):

bassiana_full.Rdata (8.47 MB)
bassiana_ingroup.Rdata (8.39 MB)
bassiana_raw.Rdata (74.44 MB)
metadata_raw.txt (39.02 KB)
R_Code_Final_Dryad.R (14.99 KB)
README.md (11.32 KB)
Report_DSk20-4927_1_moreOrders_SNP_2.csv (215.71 MB)
Report_DSk20-4927_1_moreOrders_SNP_3.csv (185.12 MB)

Abstract

This dataset was used to examine the phylogeographic genetic structure of the Eastern three-lined skink, Bassiana duperreyi. It comprises SNP data used for population genetic analysis and phylogenetic reconstruction. The data provide a foundation for a detailed taxonomic re-evaluation of this species complex. They also reinforce the need for biodiversity assessment to examine cryptic species and/or cryptic diversity below the species level. Information on lineage diversity within species, and on how that diversity is distributed in relation to regional-scale disturbance, can be factored into conservation planning regardless of whether a decision is made to formally diagnose new species taxonomically and nomenclaturally.


This Dryad entry contains the data files and the associated R script. The script prepares the data for analysis and runs the analyses presented in the companion article. The files include SNP datasets for the Eastern three-lined skink Bassiana duperreyi.

Description of the data and file structure

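The SNP data comprise a matrix of entities (individuals) by attributes (loci). Each genotype is scored as 0 for the homozygous reference allele, 1 for the heterozygous state, or 2 for the homozygous alternate allele. The scores therefore have no units of measurement. Each score is, however, the number of alternate alleles an individual carries at that locus. Conveniently, the sum of the scores across individuals gives the count of alternate alleles, and dividing that count by twice the number of scored individuals gives the frequency of the alternate allele.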
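The arithmetic behind the last sentence can be sketched as follows. This is a minimal, language-neutral illustration rather than the dartR implementation. It assumes a small hypothetical matrix held as Python lists, with `None` standing in for a missing (NA) score. The function name `alt_allele_freq` is illustrative only.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1, 2, or None for a missing (NA) score


def alt_allele_freq(matrix: List[List[Genotype]]) -> List[Optional[float]]:
    """Alternate-allele frequency per locus from a 0/1/2 genotype matrix.

    Rows are individuals and columns are loci. Each non-missing score counts
    the alternate alleles carried by a diploid individual, so the column sum
    is the number of alternate alleles. Twice the number of scored
    individuals is the total number of alleles sampled at that locus.
    """
    n_loci = len(matrix[0])
    freqs: List[Optional[float]] = []
    for j in range(n_loci):
        calls = [row[j] for row in matrix if row[j] is not None]
        if not calls:
            freqs.append(None)  # locus missing in every individual
        else:
            freqs.append(sum(calls) / (2 * len(calls)))
    return freqs


# Hypothetical toy data: 4 individuals x 3 loci
toy = [
    [0, 2, None],
    [1, 2, 1],
    [0, 1, None],
    [1, 2, 0],
]
print(alt_allele_freq(toy))  # [0.25, 0.875, 0.25]
```

Missing scores are excluded from both the numerator and the denominator. Without that exclusion, loci with many NAs would appear to have lower alternate-allele frequencies than they do.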

The data are stored in compressed form as an adegenet genlight object with associated locus metadata (e.g. callrate, reproducibility) and individual metadata (e.g. latitude, longitude, population). The SNP scores can be examined in R with as.matrix(gl); the names of the locus metadata can be obtained with names(gl@other$loc.metrics); the names of the individual metadata can be obtained with names(gl@other$ind.metrics).

Locus metadata are:

AlleleID	Unique identifier for the sequence in which the SNP marker occurs
AlleleSequence	In 1 row format: the sequence of the Reference allele. In 2 rows format: the sequence of the Reference allele is in the Ref row, the sequence of the SNP allele in the SNP row
AvgCountRef	The sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the Reference allele row
AvgCountSnp	The sum of the tag read counts for all samples, divided by the number of samples with non-zero tag read counts, for the SNP allele row
AvgPIC	The average of the polymorphism information content (PIC) of the Reference and SNP allele rows
CallRate	The proportion of samples for which the genotype call is either “1” or “0”, rather than “-”
CloneID	Unique identifier for the sequence in which the SNP marker occurs
FreqHets	The proportion of samples which score as heterozygous
FreqHomRef	The proportion of samples which score as homozygous for the Reference allele
FreqHomSnp	The proportion of samples which score as homozygous for the SNP allele
OneRatioRef	The proportion of samples for which the genotype score is “1”, in the Reference allele row
OneRatioSnp	The proportion of samples for which the genotype score is “1”, in the SNP allele row
PICRef	The polymorphism information content (PIC) for the Reference allele row
PICSnp	The polymorphism information content (PIC) for the SNP allele row
RepAvg	The proportion of technical replicate assay pairs for which the marker score is consistent
SNP	In 2 rows format: this column is blank in the Reference row, and contains the base position and base variant details in the SNP row. In 1 row format: contains the base position and base variant details
SnpPosition	The position (zero indexed) in the sequence tag at which the defined SNP variant base occurs
TrimmedSequence	Same as the full sequence (AlleleSequence), but with adapter sequence removed from short marker tags
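Several of these locus metrics can be recomputed from a locus's genotype calls, which helps clarify what they measure. The sketch below is not DArT's or dartR's code. It makes two assumptions. First, the genotype frequencies use the number of called samples as the denominator. Second, the PIC of each presence/absence row uses the standard dominant-marker formula 1 − f² − (1 − f)², where f is the proportion of called samples in which that allele is present. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def dominant_pic(present: List[bool]) -> float:
    """PIC of a presence/absence row: 1 - f**2 - (1 - f)**2 (assumed formula)."""
    f = sum(present) / len(present)
    return 1 - f ** 2 - (1 - f) ** 2


def locus_metrics(calls: List[Optional[int]]) -> Dict[str, float]:
    """Recompute illustrative locus metrics from 0/1/2 calls (None = missing)."""
    if not calls:
        raise ValueError("no samples supplied")
    called = [g for g in calls if g is not None]
    metrics = {"CallRate": len(called) / len(calls)}
    if not called:
        return metrics  # nothing else can be computed for an all-missing locus
    n = len(called)  # assumption: frequencies are relative to called samples
    metrics["FreqHomRef"] = called.count(0) / n
    metrics["FreqHets"] = called.count(1) / n
    metrics["FreqHomSnp"] = called.count(2) / n
    # Reference allele is present in 0 and 1 genotypes; SNP allele in 1 and 2.
    metrics["PICRef"] = dominant_pic([g in (0, 1) for g in called])
    metrics["PICSnp"] = dominant_pic([g in (1, 2) for g in called])
    metrics["AvgPIC"] = (metrics["PICRef"] + metrics["PICSnp"]) / 2
    return metrics


# Hypothetical locus scored in 8 samples, 2 of them missing
print(locus_metrics([0, 1, 1, 2, None, 0, 1, None])["CallRate"])  # 0.75
```

Values stored in the datasets come from the DArT report and later processing. They may therefore differ from this sketch if DArT uses other denominators or definitions.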

Individual metadata are:

no	Specimen identifier
id	Sample identifier
pop	Population code
popname	Population name
legend	Location label useful for figure legends
bioregion	Bioregion from which the animal was captured
species	Species of lizard
elevation	Elevation above sea level
age	Age category, e.g. adult, neonate
lat	Latitude of location of capture
lon	Longitude of location of capture

Missing data are scored as NA. Missing data arise for one of two reasons. In some cases the target sequence tag is present in the genome but is missed by chance because of finite read depth. In others, the sequence tag is not amplified because of a mutation at one or both of the restriction enzyme sites (a null allele).

1. Report_DSk20-4927_1_moreOrders_SNP_2.csv

Raw data as provided by Diversity Arrays Technology in 2-row format. Refer to https://www.diversityarrays.com/ and Kilian et al. (2012) for details of this format and the DArT platform, and to Georges et al. (2018) for an overview of how the data were generated.
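In the 2-row format, each locus has a Ref row and a SNP row. Each row scores the presence (“1”) or absence (“0”) of that allele in each sample, and “-” marks a missing call. The sketch below shows how a pair of rows collapses into the 0/1/2 coding described above. It assumes the two rows have already been extracted as lists of tokens and does not parse the full report layout; dartR's `gl.read.dart()` handles that. The function names are hypothetical, and the choice to treat a 0/0 pair as missing is also an assumption.

```python
from typing import List, Optional


def parse_score(token: str) -> Optional[int]:
    """Convert a presence/absence token ("1", "0" or "-") to 1, 0 or None."""
    token = token.strip()
    return None if token == "-" else int(token)


def combine_two_row(ref: Optional[int], snp: Optional[int]) -> Optional[int]:
    """Collapse Ref-row and SNP-row scores into one 0/1/2 genotype."""
    if ref is None or snp is None:
        return None
    if ref == 1 and snp == 0:
        return 0  # reference allele only: homozygous reference
    if ref == 1 and snp == 1:
        return 1  # both alleles detected: heterozygous
    if ref == 0 and snp == 1:
        return 2  # SNP allele only: homozygous alternate
    return None  # assumption: neither allele detected is treated as missing


def two_row_to_genotypes(ref_row: List[str], snp_row: List[str]) -> List[Optional[int]]:
    """Combine the Ref and SNP rows of one locus across all samples."""
    if len(ref_row) != len(snp_row):
        raise ValueError("Ref and SNP rows must cover the same samples")
    return [combine_two_row(parse_score(r), parse_score(s))
            for r, s in zip(ref_row, snp_row)]


# Hypothetical locus scored in five samples
ref_row = ["1", "1", "0", "-", "0"]
snp_row = ["0", "1", "1", "-", "0"]
print(two_row_to_genotypes(ref_row, snp_row))  # [0, 1, 2, None, None]
```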

2. Report_DSk20-4927_1_moreOrders_SNP_3.csv

3. metadata_raw.txt

Metadata associated with each individual, including population assignments, sex, stage of maturity and location of capture.

4. bassiana_raw.Rdata

Contains the raw data from Report_DSk20-4927_1_moreOrders_SNP_2.csv and Report_DSk20-4927_1_moreOrders_SNP_3.csv in binary format. Can be read into R for use with dartR using gl <- readRDS(file = "bassiana_raw.Rdata")

5. bassiana_full.Rdata

Contains the data after filtering (see R_Code_Final_Dryad.R) for all individuals in the study. Can be read into R for use with dartR using gl <- readRDS(file = "bassiana_full.Rdata")

6. bassiana_ingroup.Rdata

Contains the data after filtering (see R_Code_Final_Dryad.R) for the ingroup taxon Bassiana duperreyi only. Can be read into R for use with dartR using gl <- readRDS(file = "bassiana_ingroup.Rdata")

7. R_Code_Final_Dryad.R

R code used to generate the results. The point of entry is indicated in the script, which begins by reading in the data from one of bassiana_raw.Rdata, bassiana_full.Rdata or bassiana_ingroup.Rdata using the function readRDS(). To access the datasets (files with the .Rdata extension), use my.genlight.object <- readRDS("path/filename") or dartR.base::gl.load("path/filename"). To interrogate the datasets after loading, use the accessors provided in the R package {adegenet}. Refer to the accompanying ms_initial_read.R.

Code/Software

The script used for the SNP analyses depends on the R package dartR.base (see also Gruber et al., 2018; Mijangos et al., 2022), which is available on the Comprehensive R Archive Network (CRAN). dartR.base works with adegenet genlight objects such as those listed above.

Software for phylogenetic analysis of the SNPs includes SVDQuartets (Singular Value Decomposition Quartets; Chifman & Kubatko, 2014), available through PAUP* (Swofford, 2003).

Data availability

These data are freely available for download and use without conditions. Although some of the data were generated using a commercial service, the intellectual property associated with the data resides entirely with the authors, and no restrictions on use arise from it.

References

Chifman, J., & Kubatko, L. (2014). Quartet inference from SNP data under the coalescent model. Bioinformatics, 30, 3317–3324. https://doi.org/10.1093/bioinformatics/btu530

Georges, A., Gruber, B., Pauly, G., Adams, M., White, D., Young, M., Kilian, A., Zhang, X., Shaffer, H. B., & Unmack, P. J. (2018). Genome-wide SNP markers breathe new life into phylogeography and species delimitation for the problematic short-necked turtles (Chelidae: Emydura) of eastern Australia. Molecular Ecology, 27, 5195–5213. https://doi.org/10.1111/mec.14925

Gruber, B., Unmack, P. J., Berry, O. F., & Georges, A. (2018). dartR: An R package to facilitate analysis of SNP data generated from reduced representation genome sequencing. Molecular Ecology Resources, 18(3), 691–699. https://doi.org/10.1111/1755-0998.12745

Kilian, A., Wenzl, P., Huttner, E., Carling, J., Xia, L., Blois, H., Caig, V., Heller-Uszynska, K., Jaccoud, D., Hopper, C., Aschenbrenner-Kilian, M., Evers, M., Peng, K., Cayla, C., Hok, P., & Uszynski, G. (2012). Diversity Arrays Technology: a generic genome profiling technology on open platforms. In F. Pompanon & A. Bonin (Eds.), Data Production and Analysis in Population Genomics: Methods and Protocols (pp. 67–89). Humana Press. https://doi.org/10.1007/978-1-61779-870-2_5

Mijangos, J., Gruber, B., Berry, O., Pacioni, C., & Georges, A. (2022). dartR v2: An accessible genetic analysis platform for conservation, ecology and agriculture. Methods in Ecology and Evolution, 13, 2150–2158. https://doi.org/10.1111/2041-210X.13918

Swofford, D. L. (2003). PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates.