Data from: Canis STR-seq: A universal approach for non-invasive genetic monitoring of wolves and coyotes
Data files
Apr 02, 2026 version files 3.59 GB
- README.md (3.81 KB)
- SatAnalyzerInputFiles.zip (3.59 GB)
Abstract
Population genetic studies have traditionally relied on data from short tandem repeat (STR) markers, known as microsatellites, to produce individual genotypes used in population genetics research. However, size fragment analysis from traditional capillary electrophoresis presents scoring challenges and limits data comparisons among labs. Here, we present a new cost-effective, universal microsatellite genotype-by-sequencing assay for Canis species that allows for unambiguous allele calls, flags homoplasy for more accurate assignment tests and estimates of diversity, and improves genotyping output from low-template DNA. We note size homoplasy in 18 of 26 loci, with the number of alleles being 32% higher in the dataset that included sequence mutations (Namut=334) compared to the dataset based on size alone (Nalen=253). Bayesian cluster analysis was similar for both datasets, although 63 of 83 samples had higher assignment values to their primary cluster when mutations were considered. We document and code a list of sequence mutations associated with each locus and propose a framework for building an accessible, universal STR dataset for wolves, coyotes, and dogs that improves cluster assignments and admixture estimates in a system with complex demography and hybridization patterns. Overall, the assay provides an improved method of genetic monitoring for improved conservation of wolf populations.
Dataset DOI: 10.5061/dryad.02v6wwqgb
Description of the data and file structure
We amplified 33 autosomal microsatellite markers and the Amelogenin sex marker in a single PCR, indexed individual Canis tissue and scat samples (in separate runs) with Nextera XT unique dual indexes, and sequenced them on an Illumina MiSeq platform.
Prior to amplification, tissue samples were standardized to 5 ng of DNA per PCR, and scat samples were screened for sufficient nuclear DNA by amplification with a microsatellite primer and visualization against known standards on an agarose gel.
Five of the 33 markers were removed from the analysis, leaving 26 markers for routine genotyping. The raw data includes all 33 markers.
The data file includes input files to run Seq2Sat / SatAnalyzer with the raw data provided.
Files and variables
File: SatAnalyzerInputFiles.zip
Description: This .zip file includes raw sequence data and input files for genotyping the data in Seq2Sat / SatAnalyzer.
- CanisSTRseq_Run04_Tissue_Fastq.zip
  - Tissue samples: raw demultiplexed paired-end (2 x 150 bp) output data from Illumina MiSeq Standard v2 kit;
  - sequence identifiers appear before the first underscore and are Tissue Sample IDs (see Supplemental Tables published with manuscript).
    - e.g., CAN003735_S12_L001_R1_001.fastq.gz and CAN003735_S12_L001_R2_001.fastq.gz are zipped paired-end raw sequence output for tissue sample ID "CAN003735".
- CanisSTRseq_Run05_Scat_Fastq.zip
  - Scat samples: raw demultiplexed paired-end (2 x 150 bp) output data from Illumina MiSeq Micro v2 kit;
  - sequence identifiers appear before the first underscore and are Scat Sample IDs (see Supplemental Tables published with manuscript).
    - e.g., CP-2023-016_S1_L001_R1_001.fastq.gz and CP-2023-016_S1_L001_R2_001.fastq.gz are zipped paired-end raw sequence output for scat sample ID "CP-2023-016".
- CanisRun04TissueSamples.txt
  - Input sample file (tissue samples) for the Seq2Sat / SatAnalyzer analysis
- CanisRun05ScatSamples.txt
  - Input sample file (scat samples) for the Seq2Sat / SatAnalyzer analysis
- CanisSexLoc.txt
  - Input sex locus file for the Seq2Sat / SatAnalyzer analysis
- Canis31Loci.txt
  - Input loci file (31 loci) for the Seq2Sat / SatAnalyzer analysis
- Canis26Loci.txt
  - Input loci file (26 loci) for the Seq2Sat / SatAnalyzer analysis
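The naming convention described for the FASTQ archives (sample ID before the first underscore, with R1 and R2 files forming a pair) can be checked with a short script before running Seq2Sat / SatAnalyzer. The following is a minimal sketch, assuming the standard Illumina `_R1_001.fastq.gz` / `_R2_001.fastq.gz` suffixes seen in the example file names; the helper names `sample_id` and `pair_reads` are hypothetical and are not part of Seq2Sat or SatAnalyzer.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed Illumina suffix, as seen in the example file names of this dataset.
READ_SUFFIX = re.compile(r"_(R[12])_001\.fastq\.gz$")


def sample_id(fastq_name: str) -> str:
    """Return the sample ID: the text before the first underscore."""
    return PurePath(fastq_name).name.split("_", 1)[0]


def pair_reads(fastq_names):
    """Group R1/R2 FASTQ file names by sample ID (hypothetical helper)."""
    pairs = defaultdict(dict)
    for name in fastq_names:
        base = PurePath(name).name
        match = READ_SUFFIX.search(base)
        if match is None:
            continue  # not a paired-end FASTQ file name; skip it
        pairs[sample_id(base)][match.group(1)] = base
    return dict(pairs)


names = [
    "CAN003735_S12_L001_R1_001.fastq.gz",
    "CAN003735_S12_L001_R2_001.fastq.gz",
    "CP-2023-016_S1_L001_R1_001.fastq.gz",
    "CP-2023-016_S1_L001_R2_001.fastq.gz",
]
paired = pair_reads(names)

# Report any sample that is missing one of its two read files.
incomplete = [sid for sid, reads in paired.items() if set(reads) != {"R1", "R2"}]
print(sorted(paired), incomplete)
```

Splitting on the first underscore only keeps hyphenated scat IDs intact, since those IDs contain hyphens rather than underscores.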
Code/software
We created a custom Python script that takes short tandem repeat (STR/microsatellite) sequence data output from SatAnalyzer (Seq2Sat) and incorporates mutations found in the flanking and repeat regions into a consolidated allele call that reflects both length and mutations. It takes as input all the *_genotypes_mra_final.txt output files from SatAnalyzer and outputs, for each locus, a list of alleles with appended mutation codes indicating mutations in the MRA, snpsFF, and/or snpsRF regions.
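To illustrate the idea of a consolidated allele call, the sketch below appends a mutation code to an allele length and shows how two alleles of identical length become distinct once mutations are considered (size homoplasy). It is not the allele_muts.py implementation: the record fields, the one-letter codes (`m`, `f`, `r`), and the locus name are all hypothetical, and MRA, snpsFF, and snpsRF are read here as the repeat region and the forward and reverse flanking regions.

```python
from typing import NamedTuple


class AlleleCall(NamedTuple):
    # Hypothetical record; field names are illustrative and are not
    # SatAnalyzer column names.
    locus: str
    length: int
    mra_mut: bool  # mutation in the repeat region (MRA)
    snps_ff: bool  # SNP(s) in the forward flanking region (snpsFF)
    snps_rf: bool  # SNP(s) in the reverse flanking region (snpsRF)


def coded_allele(call: AlleleCall) -> str:
    """Append illustrative one-letter mutation codes to the allele length."""
    suffix = ""
    if call.mra_mut:
        suffix += "m"
    if call.snps_ff:
        suffix += "f"
    if call.snps_rf:
        suffix += "r"
    return f"{call.length}{suffix}"


calls = [
    AlleleCall("LocusA", 180, False, False, False),
    AlleleCall("LocusA", 180, False, True, False),  # same length, flank SNP
    AlleleCall("LocusA", 184, True, False, False),
]

# Alleles distinguished by length alone vs. by length plus mutation code.
by_length = {(c.locus, c.length) for c in calls}
by_code = {(c.locus, coded_allele(c)) for c in calls}
print(len(by_length), len(by_code))
```

In this toy input, the two 180 bp alleles collapse into one allele when scored by size but remain separate when the mutation code is appended, which is the effect behind the higher allele count reported for the mutation-aware dataset.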
The allele_muts.py code is available on the GitHub repository and can be cited as follows:
Rutledge, L. Y., & Rutledge, G. A. (2025). lrutledge107/allele_muts: v1.0.0 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.15794713
Access information
The source code for coding mutations into the fragment-length microsatellite data is archived on Zenodo and is available on GitHub.
- Rutledge, L. Y., & Rutledge, G. A. (2025). lrutledge107/allele_muts: v1.0.0 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.15794713
- Rutledge, L. Y., & Rutledge, G. A. (2025). lrutledge107/allele_muts. https://github.com/lrutledge107/allele_muts
