Data from: A landscape genetics approach reveals species-specific connectivity patterns for stream insects in fragmented habitats
Data files
Mar 12, 2025 version files (21.48 MB)
- ColoburiscusHumeralis_SNP_mapping.csv.zip (5.78 MB)
- HydropsycheFimbriata_SNP_mapping.csv.zip (5.15 MB)
- README.md (5.45 KB)
- ZelandobiusConfusus_SNP_mapping.csv.zip (10.54 MB)
Abstract
Dispersal is a critical process in ecology and evolution, shaping global biodiversity patterns. In stream habitats, which often exist within diverse and fragmented landscapes, dispersal ensures population connectivity and survival. For aquatic insects in particular, landscape features may significantly influence the degree of genetic connectivity among populations. Thus, understanding connectivity drivers in such populations is essential for the conservation and management of streams.
We conducted a landscape genetic study using mitochondrial DNA (mtDNA) and genome-wide single nucleotide polymorphism (SNP) markers to assess functional connectivity of stream insects in a fragmented pasture-dominated landscape. We focused on three species with terrestrial winged adults: the mayfly Coloburiscus humeralis, the stonefly Zelandobius confusus, and the caddisfly Hydropsyche fimbriata.
We observed significant spatial genetic structure at larger geographical distances (populations separated by ~30 and 170 km). However, the effect of landscape factors, which were assessed at fine spatial scales, varied among species: for C. humeralis SNP data, genetic differentiation was weakly correlated with land cover, suggesting greater population connectivity within stream channels protected by forested riparian zones compared to fragmented streams; for Z. confusus, widespread gene flow indicated high dispersal potential across forested and pasture land; and although overland dispersal was reduced for H. fimbriata (potentially due to local habitat features), this did not seem to hinder broader population connectivity.
Our results emphasise the importance of assessing landscape features when evaluating population connectivity in stream riparian zones, which can greatly benefit stream management efforts through enhanced understanding of connectivity dynamics.
https://doi.org/10.5061/dryad.rxwdbrvm0
Description of the data and file structure
Dataset Overview
This dataset contains the genomic data used for the population genetic analyses in the manuscript, which tested rates of gene flow among aquatic insect populations within and between streams across three regions in the North Island of New Zealand: Pirongia, Karioi and Taranaki. The genomic dataset contains three csv files, one per study species, each holding a single nucleotide polymorphism (SNP) report: Coloburiscus humeralis, Zelandobius confusus and Hydropsyche fimbriata.
The key features of the csv file include SNP position, frequency of genotypes and SNP markers scored as binary presence/absence data:
- "1" → SNP is present in the sample.
- "0" → SNP is absent in the sample.
- "-" → Failed to score (low-quality or missing data).
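The three score values listed above can be decoded before analysis. The following is a minimal sketch, not part of the deposited data or the authors' workflow; the function names `decode_score` and `decode_row` are hypothetical, and representing a failed score as `None` is an assumption of this example.

```python
from typing import List, Optional


def decode_score(raw: str) -> Optional[int]:
    """Map a raw score string to 1 (present), 0 (absent) or None (failed)."""
    value = raw.strip()
    if value == "1":
        return 1
    if value == "0":
        return 0
    if value == "-":
        # Failed to score (low-quality or missing data).
        return None
    raise ValueError(f"unexpected SNP score: {raw!r}")


def decode_row(scores: List[str]) -> List[Optional[int]]:
    """Decode every per-sample score in one row of the csv file."""
    return [decode_score(s) for s in scores]
```

Raising on any other value, rather than silently treating it as missing, makes it easier to notice if a file uses a different encoding than the one described here.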
Files and variables
File: ColoburiscusHumeralis_SNP_mapping.csv.zip
Description: Contains a single csv file with SNP scores for individual Coloburiscus humeralis samples
AlleleID: Unique identifier for the allele.
CloneID: Identifier for the clone associated with the SNP.
AlleleSequence: Full nucleotide sequence of the allele.
TrimmedSequence: Trimmed version of the allele sequence.
SNP: SNP mutation.
SnpPosition: Position of the SNP within the sequence.
CallRate: Proportion of successful genotype calls for this SNP.
OneRatioRef: Ratio of homozygous reference alleles.
OneRatioSnp: Ratio of homozygous SNP alleles.
FreqHomRef: Frequency of homozygous reference genotype.
FreqHomSnp: Frequency of homozygous SNP genotype.
FreqHets: Frequency of heterozygous genotype.
PICRef: Polymorphic Information Content (PIC) for reference allele.
PICSnp: Polymorphic Information Content (PIC) for SNP allele.
AvgPIC: Average Polymorphic Information Content across samples.
AvgCountRef: Average read count for reference allele.
AvgCountSnp: Average read count for SNP allele.
RepAvg: Average repeatability score.
CHDF0M01: Unique sample identifier and SNP scored as binary presence/absence data: "1" if SNP is present in the sample; "0" if SNP is absent in the sample; "-" if failed to score (low-quality or missing data). All remaining columns starting with CH follow this description.
File: HydropsycheFimbriata_SNP_mapping.csv.zip
Description: Contains a single csv file with SNP scores for individual Hydropsyche fimbriata samples
AlleleID: Unique identifier for the allele.
CloneID: Identifier for the clone associated with the SNP.
AlleleSequence: Full nucleotide sequence of the allele.
TrimmedSequence: Trimmed version of the allele sequence.
SNP: SNP mutation.
SnpPosition: Position of the SNP within the sequence.
CallRate: Proportion of successful genotype calls for this SNP.
OneRatioRef: Ratio of homozygous reference alleles.
OneRatioSnp: Ratio of homozygous SNP alleles.
FreqHomRef: Frequency of homozygous reference genotype.
FreqHomSnp: Frequency of homozygous SNP genotype.
FreqHets: Frequency of heterozygous genotype.
PICRef: Polymorphic Information Content (PIC) for reference allele.
PICSnp: Polymorphic Information Content (PIC) for SNP allele.
AvgPIC: Average Polymorphic Information Content across samples.
AvgCountRef: Average read count for reference allele.
AvgCountSnp: Average read count for SNP allele.
RepAvg: Average repeatability score.
HDF0M3: Unique sample identifier and SNP scored as binary presence/absence data: "1" if SNP is present in the sample; "0" if SNP is absent in the sample; "-" if failed to score (low-quality or missing data). All remaining columns starting with HF follow this description.
File: ZelandobiusConfusus_SNP_mapping.csv.zip
Description: Contains a single csv file with SNP scores for individual Zelandobius confusus samples
AlleleID: Unique identifier for the allele.
CloneID: Identifier for the clone associated with the SNP.
AlleleSequence: Full nucleotide sequence of the allele.
TrimmedSequence: Trimmed version of the allele sequence.
SNP: SNP mutation.
SnpPosition: Position of the SNP within the sequence.
CallRate: Proportion of successful genotype calls for this SNP.
OneRatioRef: Ratio of homozygous reference alleles.
OneRatioSnp: Ratio of homozygous SNP alleles.
FreqHomRef: Frequency of homozygous reference genotype.
FreqHomSnp: Frequency of homozygous SNP genotype.
FreqHets: Frequency of heterozygous genotype.
PICRef: Polymorphic Information Content (PIC) for reference allele.
PICSnp: Polymorphic Information Content (PIC) for SNP allele.
AvgPIC: Average Polymorphic Information Content across samples.
AvgCountRef: Average read count for reference allele.
AvgCountSnp: Average read count for SNP allele.
RepAvg: Average repeatability score.
ZCDF0M11: Unique sample identifier and SNP scored as binary presence/absence data: "1" if SNP is present in the sample; "0" if SNP is absent in the sample; "-" if failed to score (low-quality or missing data). All remaining columns starting with ZC follow this description.
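The column layout described for the three files can be used to separate marker-level columns from per-sample score columns. The sketch below is illustrative only: it assumes the first row of the unzipped csv is the header row, treats every column not named in this README as a sample column, and uses a tiny made-up table (the sample IDs other than CHDF0M01 and all values are invented). The helper names `read_snp_table` and `observed_call_rate` are hypothetical.

```python
import csv
import io

# Marker-level columns as listed in this README; any other column is
# treated as a per-sample score column (an assumption of this sketch).
MARKER_COLUMNS = {
    "AlleleID", "CloneID", "AlleleSequence", "TrimmedSequence", "SNP",
    "SnpPosition", "CallRate", "OneRatioRef", "OneRatioSnp", "FreqHomRef",
    "FreqHomSnp", "FreqHets", "PICRef", "PICSnp", "AvgPIC",
    "AvgCountRef", "AvgCountSnp", "RepAvg",
}


def read_snp_table(handle):
    """Return (sample_ids, rows) from an open text handle on the csv."""
    reader = csv.DictReader(handle)
    sample_ids = [c for c in reader.fieldnames if c not in MARKER_COLUMNS]
    return sample_ids, list(reader)


def observed_call_rate(row, sample_ids):
    """Fraction of samples with a successful call ("1" or "0") in one row."""
    if not sample_ids:
        raise ValueError("no sample columns found")
    scored = sum(1 for s in sample_ids if row[s].strip() in ("0", "1"))
    return scored / len(sample_ids)


# Made-up table for illustration only; not taken from the dataset.
demo = io.StringIO(
    "AlleleID,CallRate,CHDF0M01,CHDF0M02,CHDF0M03,CHDF0M04\n"
    "a1,0.75,1,0,-,1\n"
    "a2,1.0,0,0,1,1\n"
)
sample_ids, rows = read_snp_table(demo)
rates = [observed_call_rate(r, sample_ids) for r in rows]
```

Identifying sample columns by exclusion rather than by prefix avoids relying on a naming pattern for the sample IDs. Comparing `rates` with the reported CallRate column is one way to check that the header and sample columns were parsed as intended.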
Insect Collection
The study sampled aquatic insects at 11 sites across three streams in two neighboring catchments on Mount Pirongia, New Zealand, selected based on land cover, accessibility, and species presence. Sampling occurred at 3–4 sites per stream, spaced at least 490 m apart. Three additional stream sites were included at Mount Karioi (Wainui Stream), approximately 30 km from the main study area, and Mount Taranaki (Katikara & Patea Streams), about 170 km away. All streams had perennial flow, boulder/cobble bottoms, and minimal sediment deposition. Coloburiscus humeralis and Zelandobius confusus nymphs, and Hydropsyche fimbriata larvae were collected via kick-netting or hand-picking and preserved in 95% ethanol.
DNA extraction and SNP sequencing
DNA extraction, sequencing, and SNP genotyping were conducted by Diversity Arrays Technology (DArTseq™) in Australia. DNA was digested with the PstI-SphI enzyme pair, selected through a pilot study for optimal genome complexity reduction. Custom adapters were used for Illumina sequencing, and fragments were amplified via PCR. Amplified products were pooled and sequenced using 77 cycles of single-read sequencing on the HiSeq2500 (Illumina) platform. Raw reads were processed with a proprietary DArT pipeline for filtering, variant calling, and genotype generation. Each DNA sample was genotyped in duplicate to assess marker reproducibility. SNPs polymorphic across samples were scored as binary data: ‘1’ for presence, ‘0’ for absence, and ‘-’ for missing data.
Additional sequence data information
The cytochrome c oxidase subunit I (COI) gene was also sequenced for all individuals by the Canadian Centre for DNA Barcoding. Each sample ID in the SNP csv files can be matched to both the raw COI sequence data and the sample metadata in the dataset DS-EPTNZNI, published on the Barcode of Life Data Systems (BOLD) at dx.doi.org/10.5883/DS-EPTNZNI.
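Matching sample IDs between the SNP files and the BOLD dataset could look roughly like the sketch below. It assumes the BOLD records have already been downloaded and keyed by sample ID. The function name `match_samples`, the record structure, the `site` field, and the example values are all hypothetical and do not reflect the actual BOLD export format.

```python
def match_samples(snp_sample_ids, bold_records):
    """Split SNP sample IDs into those with and without a BOLD record.

    bold_records is assumed to be a dict mapping sample ID to a metadata
    dict (a hypothetical shape chosen for this sketch).
    """
    matched = {s: bold_records[s] for s in snp_sample_ids if s in bold_records}
    missing = [s for s in snp_sample_ids if s not in bold_records]
    return matched, missing


# Made-up records for illustration; field names and values are invented.
bold_records = {"CHDF0M01": {"site": "example-site"}}
matched, missing = match_samples(["CHDF0M01", "CHDF0M02"], bold_records)
```

Returning the unmatched IDs explicitly makes it easy to spot any samples whose identifiers differ in spelling between the two sources.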
