Genomic data reveal a North-South split and introgression history of blood fluke populations across Africa
Data files
Mar 20, 2025 version files 27.32 GB
- README.md (61.40 KB)
- scan_snvs.vcf.gz (27.07 GB)
- scan_snvs.vcf.gz.tbi (359.58 KB)
- sch_hae_scan_SUPPLEMENTAL_DATA_mito_assem_2025-02-19.fasta (2.74 MB)
- sch_hae_scan_SUPPLEMENTAL_DATA_xpehh_source_data_2025-02-19.csv.gz (252.42 MB)
Abstract
The human parasitic fluke, Schistosoma haematobium, hybridizes with the livestock parasite S. bovis in the laboratory, but the frequency of hybridization in nature is unclear. Here, we analyze 34.6 million single nucleotide variants in 162 samples from 18 African countries, revealing a sharp genetic discontinuity between northern and southern S. haematobium. We find no evidence for recent hybridization. Instead, the data reveal admixture events that occurred 257-879 generations ago in northern S. haematobium populations. Fifteen introgressed S. bovis genes are approaching fixation in northern S. haematobium, with four genes potentially driving adaptation. Further, we identify 19 regions that are resistant to introgression; these are enriched on the sex chromosomes. These results (i) suggest strong barriers to gene flow between these species, (ii) indicate that hybridization may be less common than currently envisaged, but (iii) reveal profound genomic consequences of rare interspecific hybridization between schistosomes of medical and veterinary importance.
https://doi.org/10.5061/dryad.xgxd254sk
Description of the data and file structure
Roy N. Platt II, Egie E. Enabulele, Ehizogie Adeyemi, Marian O Agbugui, Oluwaremilekun G Ajakaye, Ebube C Amaechi, Chika E Ejikeugwu, Christopher Igbeneghu, Victor S Njom, Precious Dlamini, Grace A. Arya, Robbie Diaz, Muriel Rabone, Fiona Allan, Bonnie Webster, Aidan Emery, David Rollinson, Timothy J.C. Anderson
Texas Biomedical Research Institute, San Antonio TX, United States: RNPII, EEE, GA, RD, TJCA
Department of Pathology, University of Benin Teaching Hospital, Edo State, Nigeria: EA
Department of Biological Sciences, Edo State University, Uzairue, Nigeria: MOA
Department of Animal and Environmental Biology, Adekunle Ajasin University, Nigeria: OGA
Department of Zoology, University of Ilorin, Kwara State, Nigeria: ECA
Enugu State University of Science and Technology, Nigeria: CEE
Department of Medical Laboratory Science, Ladoke Akintola University of Technology, Nigeria: CI
Department of Applied Biology and Biotechnology, Enugu State University of Science and Technology, Nigeria: VSN
Central Public Health Offices, Manzini, Swaziland: PD
Department of Life Sciences, Natural History Museum, London, United Kingdom: MR, FA, BW, AE, DR
Data supports the published manuscript in 4 files:
Files and variables
sch_hae_scan_SUPPLEMENTAL_DATA_xpehh_source_data_2025-02-19.csv.gz
- data necessary to recreate Figure 2C in the main text.
sch_hae_scan_SUPPLEMENTAL_DATA_mito_assem_2025-02-19.fasta
- assembled mitochondrial genomes
sch_hae_scan_SUPPLEMENTAL_DATA_TABLE-S1_2025-02-19.xlsx
- supplemental data tables as referenced in the manuscript. This file contains data from four supplemental tables:
STable1-Spec_Examined
- descriptive information on each parasite sample examined
STable2-Fst_eq_1
- all variants that are fixed between parasite populations
STable3-xpEHH
- regions that are under directional selection between populations
STable4-FixedAA
- genes with fixed amino acid differences (mutations) between populations
table | variables | type | description |
---|---|---|---|
STable1-Spec_Examined | Library ID | string | Unique identifiers given to samples for this study only. Information contained within the sample ID was tentatively assigned based on country and collecting location. Country, locality, and all other data provided in the table have been verified by museum staff and should be given priority over conflicting information contained in this column. |
STable1-Spec_Examined | Museum Accession Number (NHM) or Donor ID | string | Used to track sequence data back to the original sample if it was provided by the Natural History Museum (London) |
STable1-Spec_Examined | Predicted Species | string | The expected species assignment based on collecting location, host, or previous DNA sequence data. Sequence data can range from single-gene sequencing of mitochondrial COX1 or ribosomal ITS markers to whole exome or genome sequencing. |
STable1-Spec_Examined | Mitchondrial Haplotype | string | Mitochondrial haplotype differentiates between samples that have the S. haematobium or S. bovis mitotype regardless of introgression status. Samples that failed mitochondrial genome assembly are indicated with an “Na” |
STable1-Spec_Examined | Population Assignment | string | Parasite population as determined in this study |
STable1-Spec_Examined | NCBI SRA Accession | string | NCBI accession number to access the raw read data |
STable1-Spec_Examined | Project Citation | string | Manuscript where data is first reported |
STable1-Spec_Examined | Country | string | Country of origin |
STable1-Spec_Examined | Locality | string | Collection location usually provided as a state or province |
STable1-Spec_Examined | Latitude | string | latitude of collection site |
STable1-Spec_Examined | Longitude | string | longitude of collection site |
STable1-Spec_Examined | Collection Date | string | date of sample collection |
STable1-Spec_Examined | Collection Host | string | parasite host sampled |
STable1-Spec_Examined | Original life-cycle stage collected | string | parasite life cycle stage upon collection |
STable1-Spec_Examined | Life-cycle stage sequenced | string | parasite life cycle stage upon sequencing |
STable1-Spec_Examined | Num Read Pairs (1e6) | float | number of sequence read pairs (in millions) generated for the sample |
STable1-Spec_Examined | Coverage | float | genome coverage for the sample when mapped to a S. haematobium reference |
STable1-Spec_Examined | Final SNV Dataset | conditional | indicates samples that were included in analyses. Samples labeled as “False” were sequenced, but failed one or more quality control steps and were excluded. |
STable1-Spec_Examined | Discordant COX1/ITS | string | many of the samples we examined had been previously labeled as possible hybrids based on genetic discordance between the mitochondrial COX1 and ribosomal ITS markers. |
STable1-Spec_Examined | % S. haematobium ancestry (q) | float | percent S. haematobium ancestry as estimated (q) from the supervised Admixture analysis shown in Figure 1 in the main text. |
STable1-Spec_Examined | Origin | string | Repository that provided the sample |
STable1-Spec_Examined | Project (Collector) | string | specific collection effort responsible for the sample |
STable1-Spec_Examined | Comments | string | descriptive comments relevant to the sample |
STable2-Fst_eq_1 | Chromosome | string | chromosome on which the variant is located |
STable2-Fst_eq_1 | Position | integer | chromosomal position of variant |
STable2-Fst_eq_1 | Ref. allele | nucleotide | genome reference allele |
STable2-Fst_eq_1 | Alt. allele | nucleotide | alternate allele |
STable2-Fst_eq_1 | S. haem minor allele frequency | float | allele frequency of the S. haematobium minor variant/allele |
STable2-Fst_eq_1 | S. bovis minor allele frequency | float | allele frequency of the S. bovis minor variant/allele |
STable2-Fst_eq_1 | FST | float | Weir & Cockerham Fixation index |
STable3-xpEHH | Chrom | string | chromosome on which the window is located |
STable3-xpEHH | Location | integer | chromosomal position of variant |
STable3-xpEHH | Median xpEHH | float | the median value of the cross population extended haplotype homozygosity (xpEHH) statistic in the window |
STable3-xpEHH | Sig SNVs | integer | the number of single nucleotide variants with significant xpEHH values in the window |
STable3-xpEHH | Window Size | integer | the number of single nucleotide variants within the window |
STable3-xpEHH | Distance to Chrom End | integer | number of bases between the window and the closest chromosome end |
STable3-xpEHH | Per Distance to Chrom End | float | the “Distance to Chrom End” normalized by the chromosomal length |
STable4-FixedAA | Chrom | string | chromosome on which the variant is located |
STable4-FixedAA | Pos | integer | chromosomal position of variant |
STable4-FixedAA | Location | string | positional coordinates in the <chrom>:<pos> format |
STable4-FixedAA | Ref | nucleotide | genome reference allele |
STable4-FixedAA | Alt | nucleotide | alternate allele |
STable4-FixedAA | NW Af | float | allele frequency of the alternate allele in the northern S. haematobium population |
STable4-FixedAA | SE Af | float | allele frequency of the alternate allele in the southern S. haematobium population |
STable4-FixedAA | SB Af | float | allele frequency of the alternate allele in the S. bovis population |
STable4-FixedAA | NW vs. SE FST | float | Weir & Cockerham Fixation index of the alternate variant between northern and southern S. haematobium populations |
STable4-FixedAA | NW vs. Sb FST | float | Weir & Cockerham Fixation index of the alternate variant between northern S. haematobium and S. bovis populations |
STable4-FixedAA | DNA Mutation | string | DNA mutation caused by the alternate allele |
STable4-FixedAA | AA Mutation | string | Amino acid mutation caused by the alternate allele |
STable4-FixedAA | NCBI Reference | string | NCBI reference ID for the transcript impacted by the alternate allele |
STable4-FixedAA | Gene ID | string | NCBI Gene ID |
STable4-FixedAA | Gene Name | string | common name of the gene impacted |
scan_snvs.vcf*
- VCF (and index) for all filtered SNVs used in the analyses.
sch_hae_scan_SOURCE_DATA__2025-02-19.xlsx
- contains all data necessary to recreate figures from the main text and supplement. Each
table | variables | type | description |
---|---|---|---|
Figure1A | Sample ID | string | identification number for parasite sample specific to this study |
Figure1A | Country | string | country of origin for the parasite sample |
Figure1A | Lattitude | float | latitude of parasite location when collected |
Figure1A | Longitude | float | longitude of parasite location when collected |
Figure1A | Precise Coordinates | string | indicates whether precise collection coordinates are available for the parasite. When precise coordinates are not available, the collection location defaults to the country capital |
Figure1A | Species | string | species identification of the sample |
Figure1C | Sample ID | string | identification number for parasite sample specific to this study |
Figure1C | PC1 | float | principal component 1 value for the sample from unlinked high frequency (<5%) variants |
Figure1C | PC2 | float | principal component 2 value for the sample from unlinked high frequency (<5%) variants |
Figure1C | Country | string | country of origin for the parasite sample |
Figure1C | Species | string | species identification of the sample |
Figure1C | Population | string | population assigned to each parasite determined from this study |
Figure1D | Sample ID | string | identification number for parasite sample specific to this study |
Figure1D | Country | string | country of origin for the parasite sample |
Figure1D | Population Category | string | population assigned to each parasite determined from this study |
Figure1D | q1 | float | ancestry component 1 as determined by the program Admixture |
Figure1D | q2 | float | ancestry component 2 as determined by the program Admixture |
Figure2A | Chrom | string | chromosome on which the window is located |
Figure2A | Chrom Start | integer | start position of the variant window |
Figure2A | Chrom End | integer | end position of the variant window |
Figure2A | N SNVs in window | integer | number of single nucleotide variants genotyped in the window |
Figure2A | %Sb alleles in Sb | float | percent of alleles associated with S. bovis in the S. bovis population |
Figure2A | %Sb alleles in Sh North | float | percent of alleles associated with S. bovis in the northern S. haematobium population |
Figure2A | %Sb alleles in Sh South | float | percent of alleles associated with S. bovis in the southern S. haematobium population |
Figure2B | Chrom | string | chromosome on which the window is located |
Figure2B | Chrom Start | integer | start position of the variant window |
Figure2B | Chrom End | integer | end position of the variant window |
Figure2B | topo1 | float | weight of topology 1 in the window as determined with TWISST |
Figure2B | topo2 | float | weight of topology 2 in the window as determined with TWISST |
Figure2B | topo3 | float | weight of topology 3 in the window as determined with TWISST |
Figure2D | Chrom | string | chromosome on which the region is located |
Figure2D | Chrom Start | integer | start position of the variant window |
Figure2D | Chrom End | integer | end position of the variant window |
Figure2D | N SNVs in region | integer | number of single nucleotide variants within the region |
Figure2D | Fst North V South | float | Weir & Cockerham Fixation index of the alternate variant between northern and southern S. haematobium populations |
Figure2D | Fst North vs Sb | float | Weir & Cockerham Fixation index of the alternate variant between northern S. haematobium and S. bovis populations |
Figure2D | Fst South v Sb | float | Weir & Cockerham Fixation index of the alternate variant between southern S. haematobium and S. bovis populations |
Figure2D | Fst Sh vs Sb | float | Weir & Cockerham Fixation index of the alternate variant between S. haematobium and S. bovis populations |
Figure2E | Chrom | string | chromosome on which the region is located |
Figure2E | Chrom Start | integer | start position of the variant window |
Figure2E | Chrom End | integer | end position of the variant window |
Figure2E | N SNVs in region | integer | number of single nucleotide variants within the region |
Figure2E | D (ABBA/BABA) | float | D statistic (also known as ABBA/BABA) within the region |
Figure2E | Z Score | float | Z-Score of the D statistic |
Figure2F | Chrom | string | chromosome on which the region is located |
Figure2F | Chrom Start | integer | start position of the variant window |
Figure2F | Chrom End | integer | end position of the variant window |
Figure2F | Region Size | integer | number of bases within the region |
Figure2F | Robust Z-score | float | robust Z-score of the S. bovis ancestry for the region, calculated from the median absolute deviation |
Figure2F | Outlier | conditional | designates whether the region is a significant outlier after multiple-test correction (Bonferroni) |
Figure3 | Sample ID | string | identification number for parasite sample specific to this study |
Figure3 | Country | string | country of origin for the parasite sample |
Figure3 | Species | string | species identification of the sample |
Figure3 | Population | string | population assigned to each parasite determined from this study |
Figure4 | Sample ID | string | identification number for parasite sample specific to this study |
Figure4 | Country | string | country of origin for the parasite sample |
Figure4 | Species | string | species identification of the sample |
Figure4 | Population | string | population assigned to each parasite determined from this study |
Figure5A | Sample ID | string | identification number for parasite sample specific to this study |
Figure5A | q1 | float | ancestry component 1 as determined by the program Admixture |
Figure5A | Country | string | country of origin for the parasite sample |
Figure5A | Lattitude | float | latitude of parasite location when collected |
Figure5A | Longitude | float | longitude of parasite location when collected |
Figure5A | Population ID | string | population assigned to each parasite determined from this study |
Figure5B | Chrom | string | chromosome on which the window is located |
Figure5B | Start Pos | integer | start position of the variant window |
Figure5B | Stop Pos | integer | end position of the variant window |
Figure5B | N SNVs | integer | number of single nucleotide variants within the region |
Figure5B | Pi S. bovis | float | Nucleotide diversity in the window for the S. bovis population |
Figure5B | Pi S.haem | float | Nucleotide diversity in the window for the S. haematobium population |
Figure5B | Pi S. haem (North) | float | Nucleotide diversity in the window for the northern S. haematobium population |
Figure5B | Pi S. haem (South) | float | Nucleotide diversity in the window for the southern S. haematobium population |
Figure5B | Pi S. haem (North-Masked) | float | Nucleotide diversity in the window for the northern S. haematobium population after masking introgressed S. bovis variants |
Figure5C | Sample ID | string | identification number for parasite sample specific to this study |
Figure5C | PC1 | float | principal component 1 value for the sample from unlinked high frequency (<5%) variants |
Figure5C | PC2 | float | principal component 2 value for the sample from unlinked high frequency (<5%) variants |
Figure5C | Country | string | country of origin for the parasite sample |
Figure5C | Species | string | species identification of the sample |
Figure5C | Population | string | population assigned to each parasite determined from this study |
sFigure1 | Chromome | string | chromosome on which the window is located |
sFigure1 | Mid-position | integer | middle position of the variant window |
sFigure1 | Dxy S.haem vs. S. bovis | float | Genetic differentiation between S. haematobium and S. bovis |
sFigure1 | Dxy S.haem (North) vs S. haem (South) | float | Genetic differentiation between northern and southern S. haematobium populations |
sFigure1 | Dxy S. haem (North) vs S. bovis | float | Genetic differentiation between northern S. haematobium and S. bovis |
sFigure1 | Dxy S. haem (South) vs. S. bovis | float | Genetic differentiation between southern S. haematobium and S. bovis |
sFigure2A | Sample ID | string | identification number for parasite sample specific to this study |
sFigure2A | Country | string | country of origin for the parasite sample |
sFigure2A | Population Category | string | population assigned to each parasite determined from this study |
sFigure2A | q1 | float | ancestry component 1 as determined by the program Admixture |
sFigure2A | q2 | float | ancestry component 2 as determined by the program Admixture |
sFigure2B | Sample ID | string | identification number for parasite sample specific to this study |
sFigure2B | Country | string | country of origin for the parasite sample |
sFigure2B | Population Category | string | population assigned to each parasite determined from this study |
sFigure2B | q1 | float | ancestry component 1 as determined by the program Admixture |
sFigure2B | q2 | float | ancestry component 2 as determined by the program Admixture |
sFigure2B | q3 | float | ancestry component 3 as determined by the program Admixture |
sFigure2C | Sample ID | string | identification number for parasite sample specific to this study |
sFigure2C | Country | string | country of origin for the parasite sample |
sFigure2C | Population Category | string | population assigned to each parasite determined from this study |
sFigure2C | q1 | float | ancestry component 1 as determined by the program Admixture |
sFigure2C | q2 | float | ancestry component 2 as determined by the program Admixture |
sFigure2C | q3 | float | ancestry component 3 as determined by the program Admixture |
sFigure2C | q4 | float | ancestry component 4 as determined by the program Admixture |
sFigure2D | Sample ID | string | identification number for parasite sample specific to this study |
sFigure2D | Country | string | country of origin for the parasite sample |
sFigure2D | Population Category | string | population assigned to each parasite determined from this study |
sFigure2D | q1 | float | ancestry component 1 as determined by the program Admixture |
sFigure2D | q2 | float | ancestry component 2 as determined by the program Admixture |
sFigure2D | q3 | float | ancestry component 3 as determined by the program Admixture |
sFigure2D | q4 | float | ancestry component 4 as determined by the program Admixture |
sFigure2D | q5 | float | ancestry component 5 as determined by the program Admixture |
sFigure3 | Library ID | string | identification number for parasite sample specific to this study |
sFigure3 | Country | string | country of origin for the parasite sample |
sFigure3 | % non S. haematobium ancestry (q) | float | sum of all ancestry components as determined by the program Admixture that are not associated with S. haematobium |
sFigure4 | Chromosome | string | chromosome on which the window is located |
sFigure4 | Start | integer | start position of the variant window |
sFigure4 | End | integer | end position of the variant window |
sFigure4 | N (SNVs) | integer | number of single nucleotide variants within the region |
sFigure4 | Nigeria | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Nigeria |
sFigure4 | Cote d’ Ivoire | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Cote d’ Ivoire |
sFigure4 | Niger | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Niger |
sFigure4 | Sudan | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Sudan |
sFigure4 | Other Countries | float | percent of alleles introgressed from S. bovis in the S. haematobium population from all other north African countries |
sFigure5 | Chromosome | string | chromosome on which the window is located |
sFigure5 | Start | integer | start position of the variant window |
sFigure5 | End | integer | end position of the variant window |
sFigure5 | N (SNVs) | integer | number of single nucleotide variants within the region |
sFigure5 | Nigeria | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Nigeria |
sFigure5 | Cote d’ Ivoire | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Cote d’ Ivoire |
sFigure5 | Niger | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Niger |
sFigure5 | Sudan | float | percent of alleles introgressed from S. bovis in the S. haematobium population from Sudan |
sFigure5 | Other Countries | float | percent of alleles introgressed from S. bovis in the S. haematobium population from all other north African countries |
sFigure6 | Haplotype_ID | string | identifier used to differentiate introgression estimates from each sample/haplotype |
sFigure6 | Generations | float | estimated number of generations since S. bovis introgression |
sFigure6 | Country | string | country of origin for the parasite sample |
sFigure6 - Pvalues | Group 1 | string | country of origin for samples in group 1 |
sFigure6 - Pvalues | Group 2 | string | country of origin for samples in group 2 |
sFigure6 - Pvalues | p-adj | float | Bonferroni-adjusted p-value comparing differences in introgression estimates between group 1 and group 2 |
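Because scan_snvs.vcf.gz is roughly 27 GB, it is usually more practical to stream it than to load it into memory. The following is a minimal sketch, assuming the file follows the standard tab-delimited VCF layout (CHROM, POS, ID, REF, ALT as the first five columns); the function names are hypothetical and are not part of the authors' code.

```python
import gzip
from collections import Counter


def iter_vcf_records(lines):
    """Yield (chrom, pos, ref, alt) from an iterable of VCF text lines."""
    for line in lines:
        if line.startswith("#"):  # skip meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        yield chrom, int(pos), ref, alt


def count_snvs_per_chromosome(path):
    """Stream a bgzip-compressed VCF and tally records per chromosome."""
    counts = Counter()
    # bgzip output is a series of gzip members, which gzip.open reads in sequence.
    with gzip.open(path, "rt") as handle:
        for chrom, _pos, _ref, _alt in iter_vcf_records(handle):
            counts[chrom] += 1
    return counts
```

Calling `count_snvs_per_chromosome("scan_snvs.vcf.gz")` makes a single pass over the whole file. For queries restricted to a genomic region, the accompanying .tbi index can be used with an indexed reader such as tabix instead.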
Code/software
Environment recipe files, Jupyter notebooks, and other code are available on GitHub (github.com/nealplatt/sch_hae_scan v0.1z) and archived on Zenodo at: https://doi.org/10.5281/zenodo.13124719
Access information
The DNA sequence data generated in this study have been deposited in the NCBI Sequence Read Archive database under the BioProject accession code PRJNA636746 [https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA636746]. The autosomal and mitochondrial phylogenetic trees and the mitochondrial genome generated in this study, as well as all source data needed to recreate figures, are provided in Supplemental Data files and archived on Dryad.
Sample collection: description, ethics, and identification: We used samples or data from three sources. i) The first dataset was generated from samples provided by the Schistosomiasis Collection at the Natural History Museum (SCAN) [1], which is housed at the Natural History Museum (London). SCAN samples consisted of individual miracidia and cercariae preserved on Whatman FTA cards [2]. We analyzed 114 S. haematobium and S. bovis samples from 123 individual hosts (snails or humans) and 12 African countries. ii) In addition to the SCAN samples, we collected nine adult schistosome worms, presumed to be S. bovis, from the intestines of routinely slaughtered cattle obtained from meat vendors at three abattoirs located in Auchi, Benin City, and Enugu in Nigeria. In the laboratory, the mesenteric vessels of each purchased intestine were visually inspected for schistosome parasites. Adult schistosomes were recovered using forceps and washed in saline solution. Adult pairs were separated into males and females before being stored in 96% ethanol for subsequent DNA isolation. iii) Finally, for the third source of data, we used whole genome sequence data from NCBI [3-8].
Samples provided by the SCAN repository were originally collected in accordance with protocols approved by local, state, and national authorities, including the Ministry of Health. The Imperial College Research Ethics Committee (ICREC) at Imperial College London, in conjunction with ongoing Schistosomiasis Control Initiative (SCI) activities, provided additional ethical guidance for samples collected through the CONTRAST program. Ethical clearance and study protocols for Nigerian samples were approved by the National Health Research Ethics Committee of Nigeria (NHREC) (protocol number: NHREC/01/01/2007– 30/10/2020 and approval number: NHREC/01/01/2007– 29/03/2021) and the Institutional Review Board (IRB) of University of Texas Health, San Antonio Texas, United States of America (protocol number: HSC20180612H). Informed consent was obtained from all participants, with processes tailored to ensure understanding and voluntary participation. All data were anonymized to protect participant privacy, and schistosomiasis-positive individuals were treated with a single dose of praziquantel (40 mg/kg). For livestock parasite collection, approval was secured from local veterinarians. No animals were euthanized for research purposes; Schistosoma samples were collected during routine activities at abattoirs. Further details on collection methods, ethical approvals, and data availability for public samples can be found in their respective publications documented in Supplemental Data File 1.
Provisional species identifications were assigned to cercariae and miracidia based on the sampled host. For example, miracidia hatched from eggs collected from human urine samples were assumed to be S. haematobium, while miracidia hatched from eggs in cattle feces were assumed to be S. bovis. Cercariae collected from snails were identified by Sanger sequencing the mitochondrial cox1 region and the ribosomal internal transcribed spacer (ITS) rDNA. The mitochondrial genome was genotyped at the cox1 gene using a multiplex PCR that contains a standard forward primer (Asmit1: forward [TTT TTT GGT CAT CCT GAG GTG TAT]) and species-specific reverse primers (SbR: reverse [CAC AGG ATC AGA CAA ACG AGT ACC], ShR: reverse [TGA TAA TCA ATG ACC CTG CAA TAA]). Amplicons were visualized on a 2% gel. Larger amplicons (543 bp) indicated S. haematobium and smaller amplicons (306 bp) were diagnostic for S. bovis [9]. The ribosomal ITS sequence was amplified using the ETTS1 (TGC TTA AGT TCA GCG GGT) and ETTS2 (TAA CAA GGT TTC CGT AGG TGA A) primers [10]. Amplicons were Sanger sequenced and the resulting fragments were assigned to species based on comparison to reference samples [11]. Downstream genetic analysis with whole genome SNVs was used to confirm and reassign species identifications where necessary.
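The cox1 multiplex PCR readout described above reduces to a size comparison. A minimal sketch follows, assuming band sizes have already been estimated from the gel; the function name and the 25 bp tolerance are hypothetical choices for illustration, not values from the study.

```python
# Expected amplicon sizes (bp) reported in the text for the cox1 multiplex PCR.
EXPECTED_BP = {"S. haematobium": 543, "S. bovis": 306}


def call_mitotype(band_bp, tolerance_bp=25):
    """Return the species whose expected amplicon is within tolerance, else None.

    tolerance_bp is a hypothetical allowance for gel-sizing error.
    """
    matches = [
        species
        for species, size in EXPECTED_BP.items()
        if abs(band_bp - size) <= tolerance_bp
    ]
    # Only return a call when exactly one expected size matches the band.
    return matches[0] if len(matches) == 1 else None
```

A band that matches neither expected size returns `None`, which leaves the sample to be resolved by ITS sequencing or by the whole-genome SNVs.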
Library prep and sequencing: DNA from single parasites stored on FTA cards was subjected to whole-genome amplification (WGA). Single miracidia [2] were isolated by punching the FTA card into a sterile tube. We used the GenomiPhi V2 DNA Amplification kit (Sigma-Aldrich: Cat. No. GE25-6600-31) and the recommended protocols to amplify the schistosome DNA. DNA was extracted from single male adult S. bovis worms using the DNeasy® Blood and Tissue kit (Qiagen: Cat. No. 69504) before subsequent WGA. We quantified the amount of schistosome DNA in each WGA sample by real-time quantitative PCR (qPCR) using primers for the single-copy α-tubulin 1 gene (S. haematobium: forward [GGT GGT ACT GGT TCT GGT TT], reverse [AAA GCA CAA TCC GAA TGT TCT AA]; S. bovis: forward [ATG GCC TCG TTA TCA ACC AT], reverse [TGG CCT CGT TAT CAA CCA TA]) [2]. The PCR reaction was denatured at 95 °C for 10 minutes, followed by 40 cycles of 95 °C for 15 seconds and 60 °C for 1 minute. A standard curve was generated using six dilutions of α-tubulin PCR product spanning 1.29 × 10¹ to 1.29 × 10⁷ copies/µL. DNA sequencing libraries were generated from 500 ng of DNA per sample using the KAPA Hyperplus kit (Roche: Cat. No. 07962401001) protocol with the following modifications: i) enzymatic fragmentation at 37 °C for 10 minutes, ii) adapter ligation at 20 °C for an hour, and iii) 4 cycles of library PCR amplification. After qPCR quantification of each library with KAPA Library Quantification kits (Roche: Cat. No. 07960140001), samples with similar concentrations were combined into pools for sequencing at 4 nM, while samples with disparate concentrations were equalized in 10 mM Tris-HCl pH 8.5 before pooling. Libraries were sequenced with 150 bp paired-end reads on two Illumina NovaSeq flowcells. All resulting reads were deposited in the NCBI Sequence Read Archive under BioProject PRJNA636746 and are documented in Supplemental Data File 1.
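Quantification against a qPCR standard curve typically means fitting Ct as a linear function of log10(copies) and inverting the fit for each sample. The sketch below illustrates that general calculation only; it is not the authors' analysis code, the function names are hypothetical, and any Ct values passed in are the user's own.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies/µL for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means doubling every cycle)."""
    return 10 ** (-1 / slope) - 1
```

A slope near -3.32 corresponds to an efficiency near 1.0, which is a common sanity check before trusting the back-calculated copy numbers.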
Read filtering and Mapping: Raw reads were quality trimmed with trimmomatic v0.39 [12] using the following parameters: LEADING:10, TRAILING:10, SLIDINGWINDOW:4:15, MINLEN:36, ILLUMINACLIP:2:30:10:1:true. These steps removed low-quality bases at the beginnings and ends of the reads, removed portions of the read where quality dropped below a minimum threshold, trimmed adapter sequences, and discarded reads <36 nt. We then mapped the trimmed reads to the Egyptian-strain S. haematobium reference genome, GCF_000699445.3 [5], with BBMap v38.18 [13]. On average, the S. haematobium and S. bovis (GCA_944470425.1) genome assemblies are ~97% similar across their genomes [8], so reference bias when mapping short reads should be minimal. However, to further limit reference bias, we used the ‘vslow’ and ‘minid=0.8’ options with BBMap and discarded ambiguously mapping reads (‘ambig=toss’).
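The trimming and mapping settings above can be collected into command lines as follows. This is a sketch under stated assumptions rather than the authors' pipeline: the `trimmomatic` wrapper name, the output file naming, and the function names are hypothetical, and the ILLUMINACLIP step is left out because the adapter file is not named in the text.

```python
def trimmomatic_cmd(r1, r2, prefix):
    """Paired-end Trimmomatic call with the quality-trimming steps listed in the text."""
    outputs = [
        f"{prefix}_R1.paired.fq.gz",
        f"{prefix}_R1.unpaired.fq.gz",
        f"{prefix}_R2.paired.fq.gz",
        f"{prefix}_R2.unpaired.fq.gz",
    ]
    steps = ["LEADING:10", "TRAILING:10", "SLIDINGWINDOW:4:15", "MINLEN:36"]
    # ILLUMINACLIP is omitted here: the adapter FASTA is not given in the text.
    return ["trimmomatic", "PE", r1, r2, *outputs, *steps]


def bbmap_cmd(ref, r1, r2, out_sam):
    """BBMap call using the sensitivity and ambiguity options named in the text."""
    return [
        "bbmap.sh",
        f"ref={ref}",
        f"in={r1}",
        f"in2={r2}",
        f"out={out_sam}",
        "vslow=t",     # most sensitive preset
        "minid=0.8",   # minimum alignment identity
        "ambig=toss",  # discard reads that map ambiguously
    ]
```

Each function returns an argument list that can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file paths.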
Genotyping, phasing, and filtering: Mapped reads were sorted with SAMtools v1.13 [14] and checked for duplicates with GATK v4.2.0.0 [15] MarkDuplicates. Then single nucleotide variants (SNVs) were genotyped with HaplotypeCaller and GenotypeGVCFs. To make the dataset more manageable, we genotyped each chromosome separately using the -L option. Next, we removed all indels and hard filtered SNVs based on QualByDepth (QD < 2.0), RMSMappingQuality (MQ < 30.0), FisherStrand (FS > 60.0), StrandOddsRatio (SOR > 3.0), MappingQualityRankSumTest (MQRankSum < -12.5), and ReadPosRankSumTest (ReadPosRankSum < -8.0) with GATK’s VariantFiltration. We removed multi-allelic sites, and sites with genotype quality (GQ) <20 or read depth (DP) <8, with VCFtools v0.1.16 [16]. After these filters were applied, we removed genomic sites that were genotyped in ≤50% of individuals and then any individuals that were genotyped at ≤50% of sites.
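The filtering logic above can be expressed compactly. The sketch below restates the thresholds from the text in plain Python for illustration; it is not a substitute for GATK or VCFtools, the function names are hypothetical, and treating a missing annotation as passing is an assumption of this sketch.

```python
def fails_hard_filter(info):
    """True if a site's INFO annotations trip any hard-filter threshold from the text.

    Annotations absent from `info` are skipped (an assumption of this sketch).
    """
    checks = [
        ("QD", lambda v: v < 2.0),
        ("MQ", lambda v: v < 30.0),
        ("FS", lambda v: v > 60.0),
        ("SOR", lambda v: v > 3.0),
        ("MQRankSum", lambda v: v < -12.5),
        ("ReadPosRankSum", lambda v: v < -8.0),
    ]
    return any(key in info and test(info[key]) for key, test in checks)


def mask_genotype(gt, gq, dp):
    """Set a genotype to missing (None) when GQ < 20 or DP < 8."""
    return None if gq < 20 or dp < 8 else gt


def apply_missingness_filters(matrix):
    """Drop sites, then samples, genotyped in <=50% of entries.

    matrix[site][sample] holds a genotype string or None for missing.
    Returns the filtered matrix and the indices of the retained samples.
    """
    sites = [row for row in matrix if sum(g is not None for g in row) > 0.5 * len(row)]
    if not sites:
        return [], []
    n_samples = len(sites[0])
    keep = [
        j
        for j in range(n_samples)
        if sum(row[j] is not None for row in sites) > 0.5 * len(sites)
    ]
    return [[row[j] for j in keep] for row in sites], keep
```

The order matters: sample missingness is computed only over the sites that survive the site-level filter, mirroring the sequence described in the text.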
References
1 Emery, A. M., Allan, F. E., Rabone, M. E. & Rollinson, D. Schistosomiasis collection at NHM (SCAN). Parasites & vectors 5, 185, doi:10.1186/1756-3305-5-185 (2012).
2 Le Clec'h, W. et al. Whole genome amplification and exome sequencing of archived schistosome miracidia. Parasitology 145, 1739-1747, doi:10.1017/s0031182018000811 (2018).
3 Rey, O. et al. Diverging patterns of introgression from Schistosoma bovis across S. haematobium African lineages. PLoS pathogens 17, e1009313, doi:10.1371/journal.ppat.1009313 (2021).
4 Platt, R. N. et al. Ancient Hybridization and Adaptive Introgression of an Invadolysin Gene in Schistosome Parasites. Molecular biology and evolution 36, 2127-2142, doi:10.1093/molbev/msz154 (2019).
5 Stroehlein, A. J. et al. Chromosome-level genome of Schistosoma haematobium underpins genome-wide explorations of molecular variation. PLoS pathogens 18, e1010288, doi:10.1371/journal.ppat.1010288 (2022).
6 International Helminth Genomes Consortium. Comparative genomics of the major parasitic worms. Nature genetics 51, 163-174, doi:10.1038/s41588-018-0262-1 (2019).
7 Young, N. D. et al. Whole-genome sequence of Schistosoma haematobium. Nature genetics 44, 221-225, doi:10.1038/ng.1065 (2012).
8 Oey, H. et al. Whole-genome sequence of the bovine blood fluke Schistosoma bovis supports interspecific hybridization with S. haematobium. PLoS pathogens 15, e1007513, doi:10.1371/journal.ppat.1007513 (2019).
9 Webster, B. L., Rollinson, D., Stothard, J. R. & Huyse, T. Rapid diagnostic multiplex PCR (RD-PCR) to discriminate Schistosoma haematobium and S. bovis. Journal of helminthology 84, 107-114, doi:10.1017/s0022149x09990447 (2010).
10 Kane, R. A. & Rollinson, D. Repetitive sequences in the ribosomal DNA internal transcribed spacer of Schistosoma haematobium, Schistosoma intercalatum and Schistosoma mattheei. Molecular and biochemical parasitology 63, 153-156, doi:10.1016/0166-6851(94)90018-3 (1994).
11 Pennance, T. et al. Interactions between Schistosoma haematobium group species and their Bulinus spp. intermediate hosts along the Niger River Valley. Parasites & vectors 13, 268, doi:10.1186/s13071-020-04136-9 (2020).
12 Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120, doi:10.1093/bioinformatics/btu170 (2014).
13 Bushnell, B. BBMap: a fast, accurate, splice-aware aligner. (Lawrence Berkeley National Lab.(LBNL), Berkeley, CA (United States), 2014).
14 Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079, doi:10.1093/bioinformatics/btp352 (2009).
15 McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297-1303, doi:10.1101/gr.107524.110 (2010).
16 Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156-2158, doi:10.1093/bioinformatics/btr330 (2011).