Population genetic structure of the gastropod species Bulinus truncatus

Cite this dataset

Vangestel, Carl (2022). Population genetic structure of the gastropod species Bulinus truncatus [Dataset]. Dryad.


Background: Gastropod snails remain strongly understudied, despite their important role in transmitting parasitic diseases. Knowledge of their distribution and population dynamics increases our understanding of the processes driving disease transmission. This is the first study to use High Throughput Sequencing (HTS) to elucidate the population genetic structure of the hermaphroditic snail Bulinus truncatus (Gastropoda, Heterobranchia) on a regional (17 to 150 km) and an inter-regional (1,000 to 5,400 km) scale. This snail species acts as an intermediate host of Schistosoma haematobium and Schistosoma bovis, which cause human and animal schistosomiasis, respectively.

Methods: Bulinus truncatus snails were collected in Senegal, Cameroon, Egypt and France and identified through DNA barcoding. A single-end Genotyping by Sequencing (GBS) library, comprising 87 snail specimens from the respective countries, was built and sequenced on an Illumina HiSeq 2000 platform. Reads were mapped against S. bovis and S. haematobium reference genomes to identify schistosome infections, and Single Nucleotide Polymorphisms (SNPs) were scored using the Stacks pipeline. These SNPs were used to estimate genetic diversity, assess population structure and construct phylogenetic trees of Bulinus truncatus.

Results: A total of 10,750 SNPs were scored and used in downstream analyses. The phylogenetic analysis identified five clades, each consisting of snails from a single country, with two distinct clades within Senegal. Genetic diversity was low in all populations, reflecting high selfing rates, but varied between locations due to habitat variability. Significant genetic differentiation and isolation by distance patterns were observed at both spatial scales, indicating that gene flow is not strong enough to counteract the effects of population bottlenecks, high selfing rates and genetic drift. Remarkably, the population genetic differentiation on a regional scale (i.e. within Senegal) was as large as that between populations on an inter-regional scale. The blind GBS technique was able to pick up parasite DNA in snail tissue, demonstrating the potential of HTS techniques to further elucidate the role of snail species in parasite transmission.

Conclusions: HTS techniques offer a valuable toolbox to further investigate the population genetic patterns of schistosome intermediate host snails and the role of snail species in parasite transmission.


Samples of B. truncatus (numbers of specimens ranging from four to 16 per locality) were collected from 2011 to 2014 in Cameroon (one site, Cam_BAK), Senegal (four sites, Sen_DIAM, Sen_GUEC, Sen_MBO, Sen_NDO), Egypt (two sites, Egy_BEK, Egy_MAN) and France (one site, Cor_SUT).

A single-end GBS reduced representation library (RRL) was built and sequenced on an Illumina HiSeq 2000 platform (Cornell University Biotechnology Resource Center). The library was prepared by digesting DNA with EcoT22I, ligating P1 and P2 Illumina sequencing primers (the former including molecular identifiers for sequence demultiplexing) and Illumina adapters, before proceeding with titration and library enrichment. The choice of the restriction enzyme followed preliminary comparisons between EcoT22I and PstI on representatives of B. truncatus.
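To illustrate what a comparison between the two candidate enzymes involves, the sketch below performs an in silico digest of a short, made-up sequence. It is not the authors' procedure, and their preliminary comparison may well have been done in the laboratory rather than computationally. The recognition sites used (ATGCA^T for EcoT22I, CTGCA^G for PstI) are the standard ones; the function name `digest` and the toy sequence are assumptions made for this example only.

```python
import re

# Recognition sequences and cut offsets: both enzymes cut after the
# fifth base of their palindromic six-base site (ATGCA^T, CTGCA^G).
ENZYMES = {"EcoT22I": ("ATGCAT", 5), "PstI": ("CTGCAG", 5)}


def digest(sequence, site, cut_offset):
    """Return the fragment lengths obtained by cutting `sequence` at every `site`."""
    # A lookahead is used so that overlapping sites are also found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 28 bp sequence containing two EcoT22I sites and one PstI site.
toy = "GGATGCATTTACCTGCAGGAATGCATCC"

for name, (site, offset) in ENZYMES.items():
    fragments = digest(toy, site, offset)
    print(name, "sites:", len(fragments) - 1, "fragment lengths:", fragments)
```

On a real genome, the number of cut sites and the distribution of fragment lengths give a rough idea of how many loci each enzyme would sample.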

HTS data processing

The HiSeq run produced 257 × 10^6 single-end raw reads of 101 bp. FastQC was used to check sequence quality before trimming the last 7 bp of the raw sequences with the FASTX-Toolkit. We explored whether, and to what extent, parasite infection could be detected using GBS data by adding two snail samples from an additional site in Senegal (Sen_PAK) that were known to be infected with S. bovis (i.e. cercariae were collected after shedding). To remove possible contaminant sequences originating from Schistosoma spp. flukes, reads were mapped against the S. bovis and S. haematobium reference genomes (GenBank assemblies GCA_003958945.1 and GCA_000699445.1, respectively) using DeconSeq 0.4.3 with default settings. Decontaminated data were processed with the Stacks 1.21 pipeline, implementing (1) process_radtags for demultiplexing and quality filtering and (2) SNP calling. After pilot runs with various parameter settings, we applied a minimum threshold of 10 raw reads per stack (m), a maximum of two mismatches between loci from a single individual (M), a maximum of two mismatches when aligning secondary reads to primary stacks (N), and a maximum of two mismatches between loci when building the catalog (n). SNPs were scored when they occurred in at least four samples. VCFtools v0.1.14 was used to filter out specimens with more than 60% missing data and SNPs with a minor allele frequency < 0.05. To eliminate putative paralogs, we conducted a heterozygosity excess test for each SNP using VCFtools and applied a significance threshold of 0.01. The data were further filtered by selecting a single SNP per RAD tag. This generated a main dataset of 10,750 polymorphic sites scored in 87 specimens.
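The logic of the post-calling filters can be sketched in a few lines of Python. This is a minimal illustration on a toy genotype table, not the VCFtools commands actually run. It assumes genotypes are coded as counts of the alternate allele (0, 1, 2, or None when missing), it interprets the 0.05 cut-off as a minor allele frequency, it omits the heterozygosity excess test, and it keeps the first passing SNP per RAD tag as one possible way of selecting a single SNP. The function name, sample names and locus IDs are hypothetical.

```python
MAX_MISSING = 0.60  # drop specimens with more than 60% missing genotypes
MIN_MAF = 0.05      # drop SNPs with a minor allele frequency below 0.05


def filter_genotypes(genotypes, snp_loci):
    """Apply the three filters to a toy genotype table.

    genotypes: {sample: [0, 1, 2 or None for each SNP]}
    snp_loci:  RAD tag (locus) ID of each SNP, in column order
    Returns (kept sample names, kept SNP column indices).
    """
    n_snps = len(snp_loci)

    # Step 1: remove specimens with too much missing data.
    kept = {
        sample: calls
        for sample, calls in genotypes.items()
        if sum(c is None for c in calls) / n_snps <= MAX_MISSING
    }

    # Step 2: remove SNPs whose minor allele frequency is below the threshold.
    passing = []
    for j in range(n_snps):
        calls = [g[j] for g in kept.values() if g[j] is not None]
        if not calls:
            continue
        alt_freq = sum(calls) / (2 * len(calls))
        if min(alt_freq, 1 - alt_freq) >= MIN_MAF:
            passing.append(j)

    # Step 3: keep only the first passing SNP of each RAD tag.
    seen, selected = set(), []
    for j in passing:
        if snp_loci[j] not in seen:
            seen.add(snp_loci[j])
            selected.append(j)

    return sorted(kept), selected


# Hypothetical data: four specimens, four SNPs on three RAD tags.
toy = {
    "snail_A": [0, 2, 1, 0],
    "snail_B": [0, 0, 1, None],
    "snail_C": [None, None, None, 2],
    "snail_D": [0, 2, 0, 0],
}
loci = ["tag1", "tag2", "tag2", "tag3"]

samples, snps = filter_genotypes(toy, loci)
print(samples, snps)
```

In this toy table, snail_C is dropped for missing data, the first and last SNPs become monomorphic among the remaining specimens, and only one of the two SNPs on tag2 is retained.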

Usage notes

Linux platform


Belgian Federal Science Policy Office

Research Foundation - Flanders