First insights into population structure and genetic diversity versus host specificity in trypanorhynch tapeworms using multiplexed shotgun genotyping
Data files
Oct 16, 2023 version files: 21.80 GB
- README.md (1.03 KB)
- trimmed_fastqs_Cgracilis.tar.gz (16.83 GB)
- trimmed_fastqs_Rmegacantha.tar.gz (4.97 GB)
Abstract
Theory predicts relaxed host specificity and high host vagility should contribute to reduced genetic structure in parasites while strict host specificity and low host vagility should increase genetic structure. Though these predictions are intuitive, they have never been explicitly tested in a population genomic framework. Trypanorhynch tapeworms, which parasitize sharks and rays (elasmobranchs) as definitive hosts, are the only order of elasmobranch tapeworms that exhibit considerable variability in their definitive host specificity. This allows for unique combinations of host use and geographic range, making trypanorhynchs ideal candidates for studying how these traits influence population-level structure and genetic diversity. Multiplexed shotgun genotyping (MSG) datasets were generated to characterize component population structure and infrapopulation diversity for a representative of each trypanorhynch suborder: the ray-hosted Rhinoptericola megacantha (Trypanobatoida) and the shark-hosted Callitetrarhynchus gracilis (Trypanoselachoida). Adults of R. megacantha are more host-specific and less broadly distributed than adults of C. gracilis, allowing correlation between these factors and genetic structure. Replicate tapeworm specimens were sequenced from the same host individual, from multiple conspecific hosts within and across geographic regions, and from multiple definitive host species. For R. megacantha, population structure coincided with geography rather than host species. For C. gracilis, limited population structure was found, suggesting a potential link between degree of host specificity and structure. Conspecific trypanorhynchs from the same host individual were found to be as, or more, genetically divergent from one another as from conspecifics from different host individuals. For both species, high levels of homozygosity and positive FIS values were documented.
Two datasets are provided, each as a gzipped tarball.
The first dataset (trimmed_fastqs_Cgracilis.tar.gz) is for the focal species Callitetrarhynchus gracilis (md5 -r key: 082769af225375407dd36c48e1b72094).
The second dataset (trimmed_fastqs_Rmegacantha.tar.gz) is for the focal species Rhinoptericola megacantha (md5 -r key: d4978a59c296a5dea190c22499b34cfb).
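The checksums listed above can be used to confirm that each tarball downloaded intact. The following is a minimal Python sketch, assuming both tarballs sit in the current working directory; the helper names `md5_of_file` and `verify` are illustrative and not part of the dataset.

```python
import hashlib
import os

# MD5 keys as listed in the dataset description above.
EXPECTED_MD5 = {
    "trimmed_fastqs_Cgracilis.tar.gz": "082769af225375407dd36c48e1b72094",
    "trimmed_fastqs_Rmegacantha.tar.gz": "d4978a59c296a5dea190c22499b34cfb",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the tarballs are in the current working directory.
    for name, expected in EXPECTED_MD5.items():
        if not os.path.exists(name):
            print(f"{name}: not found, skipped")
            continue
        status = "OK" if verify(name, expected) else "MISMATCH"
        print(f"{name}: {status}")
```

Reading in chunks matters here because the larger tarball is 16.83 GB and should not be loaded into memory at once.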
Description of the data and file structure
Each dataset contains one concatenated fastq file per individual tapeworm sequenced. Each file holds the Illumina reads generated for that specimen after demultiplexing, adaptor trimming, and read quality control (via Stacks and Trimmomatic).
Files that include "combo" in the file name contain read data from both the initial sequencing run and a second round of re-sequencing.
Files are named by specimen code. Additional details for each specimen can be found in Supplementary Table S2.
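The naming convention above can be used to separate re-sequenced specimens from those sequenced once. The sketch below lists the members of a tarball and splits them on the presence of "combo" in the file name. It assumes that members carry a fastq-style extension; the function names are hypothetical and the internal directory layout of the tarballs is not assumed.

```python
import tarfile
from pathlib import PurePosixPath

# Assumption: per-specimen files use one of these common fastq extensions.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def classify_members(names):
    """Split fastq file names into (combo, single_run) lists by file name."""
    combo, single_run = [], []
    for name in names:
        base = PurePosixPath(name).name  # drop any directory prefix
        if not base.endswith(FASTQ_SUFFIXES):
            continue  # skip directories and non-fastq members
        if "combo" in base:
            combo.append(base)
        else:
            single_run.append(base)
    return sorted(combo), sorted(single_run)


def list_fastqs(tarball_path):
    """Return (combo, single_run) fastq file names found in a gzipped tarball."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return classify_members(archive.getnames())
```

Listing members requires scanning the whole compressed archive, so this can take a while on the larger tarball.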
Genomic DNA was extracted from specimens of two species of elasmobranch tapeworms preserved in 95% ethanol: Rhinoptericola megacantha (n=39 worms) and Callitetrarhynchus gracilis (n=47 worms), collected from various species of shark or ray hosts at multiple geographic localities. Microscissors were used to remove a piece of the strobila and/or scolex of each worm for DNA extraction. (To ensure accurate genotyping, DNA was intentionally not extracted from gravid proglottids [i.e., proglottids containing eggs].) Methods for DNA extraction follow Herzog and Jensen (2022).
Extracted genomic DNA for all specimens was used to generate two multiplexed shotgun genotyping (MSG) libraries using the restriction enzyme MseI and the protocol of Andolfatto et al. (2011) with the following modifications: (1) unique in-line barcodes were ligated to digested DNA prior to pooling to allow for bioinformatic identification of each specimen after sequencing; and (2) following bead purification, pooled libraries were run on a Blue Pippin 2% agarose gel cassette (Sage Science, Beverly, MA, USA) to elute DNA fragments within a 300–400 bp range. Each of the two multiplexed libraries was sequenced on a single flow cell of an Illumina NextSeq 550 High Output Next Generation Sequencer (Illumina, San Diego, CA, USA) for 75 bp single-end reads. All specimens were included in the first library. The second library consisted of a subset of specimens for which insufficient read counts were generated following the first round of library preparation and sequencing. Library preparation and sequencing were completed at the University of Kansas Genome Sequencing Core.
For both rounds of sequencing, specimens were demultiplexed and low-quality reads and adaptor contamination were removed using the process_radtags module of Stacks v. 2.53 (Catchen et al. 2013; Rochette et al. 2019) with the -r, -c, and -q flags specified. Further filtering was performed using Trimmomatic v. 0.39 (Bolger et al. 2014) to remove remaining low-quality reads and adaptor contamination, and to enforce a consistent read length of 70 bp.
Datasets are formatted as gzipped fastq files (.fastq.gz), so any program that accepts fastq files as input will be capable of processing these data.
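As one illustration of reading these files without specialized software, the following Python sketch tallies read lengths in a single gzipped fastq file using only the standard library. It assumes standard four-line fastq records (no wrapped sequence lines); the function name `read_length_counts` is hypothetical.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_gz_path):
    """Count reads by sequence length in a gzipped, four-line-per-record fastq file."""
    lengths = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In each 4-line record, the second line (index 1) is the sequence.
            if line_number % 4 == 1:
                lengths[len(line.rstrip("\r\n"))] += 1
    return lengths
```

The total number of reads is `sum(lengths.values())`. Given the 70 bp read length described in the methods, a file processed as described would be expected to yield a single key, 70; any other key would be worth inspecting before downstream analysis.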
