Population genomics of deep-sea hydrothermal vent copepod Stygiopontius lauensis: from raw fasta files to filtered vcf file
Data files
Version: Mar 08, 2024; files: 1.37 GB
- 1_snp_per_locus_het_0.6_maxmi.vcf
- 1_tuimalila.fasta
- 10_tuimalila.fasta
- 100_tahimoana_filtered.fasta
- 101_tahimoana_filtered.fasta
- 102_tahimoana_filtered.fasta
- 103_tahimoana_filtered.fasta
- 104_tahimoana_filtered.fasta
- 106_tahimoana_filtered.fasta
- 107_tahimoana_filtered.fasta
- 108_mangatolo.fasta
- 109_mangatolo.fasta
- 11_tuimalila.fasta
- 110_mangatolo.fasta
- 111_mangatolo.fasta
- 112_mangatolo.fasta
- 113_mangatolo.fasta
- 114_mangatolo.fasta
- 115_mangatolo.fasta
- 116_mangatolo.fasta
- 117_mangatolo.fasta
- 118_mangatolo.fasta
- 119_mangatolo.fasta
- 12_tuimalila.fasta
- 120_mangatolo.fasta
- 121_mangatolo.fasta
- 122_mangatolo.fasta
- 123_mangatolo.fasta
- 124_mangatolo.fasta
- 125_mangatolo.fasta
- 126_mangatolo.fasta
- 127_mangatolo.fasta
- 128_mangatolo.fasta
- 129_mangatolo.fasta
- 13_tuimalila.fasta
- 130_mangatolo.fasta
- 131_mangatolo.fasta
- 132_mangatolo.fasta
- 133_mangatolo.fasta
- 134_mangatolo.fasta
- 135_mangatolo.fasta
- 136_mangatolo.fasta
- 137_mangatolo.fasta
- 138_mangatolo.fasta
- 139_mangatolo.fasta
- 14_tuimalila.fasta
- 140_mangatolo.fasta
- 141_mangatolo.fasta
- 142_mangatolo.fasta
- 143_mangatolo.fasta
- 144_mangatolo.fasta
- 145_mangatolo.fasta
- 146_mangatolo.fasta
- 147_mangatolo.fasta
- 148_mangatolo.fasta
- 149_mangatolo.fasta
- 15_tuimalila.fasta
- 150_mangatolo.fasta
- 151_mangatolo.fasta
- 152_mangatolo.fasta
- 153_mangatolo.fasta
- 154_mangatolo.fasta
- 155_mangatolo.fasta
- 156_mangatolo.fasta
- 157_mangatolo.fasta
- 158_mangatolo.fasta
- 159_mangatolo.fasta
- 16_tuimalila.fasta
- 160_mangatolo.fasta
- 161_mangatolo.fasta
- 162_mangatolo.fasta
- 163_mangatolo.fasta
- 164_mangatolo.fasta
- 165_mangatolo.fasta
- 166_mangatolo.fasta
- 167_mangatolo.fasta
- 168_mangatolo.fasta
- 169_mangatolo.fasta
- 17_tuimalila.fasta
- 170_mangatolo.fasta
- 171_mangatolo.fasta
- 18_tuimalila.fasta
- 19_tuimalila.fasta
- 2_tuimalila.fasta
- 20_tuimalila.fasta
- 21_tuimalila.fasta
- 22_tuimalila.fasta
- 23_tuimalila.fasta
- 24_tuimalila.fasta
- 3_tuimalila.fasta
- 4_tuimalila.fasta
- 45_ABE_filtered.fasta
- 46_ABE_filtered.fasta
- 47_ABE_filtered.fasta
- 48_ABE_filtered.fasta
- 49_ABE_filtered.fasta
- 5_tuimalila.fasta
- 50_ABE_filtered.fasta
- 51_ABE_filtered.fasta
- 52_ABE_filtered.fasta
- 53_ABE_filtered.fasta
- 54_ABE_filtered.fasta
- 55_ABE_filtered.fasta
- 56_ABE_filtered.fasta
- 57_ABE_filtered.fasta
- 58_ABE_filtered.fasta
- 59_ABE_filtered.fasta
- 6_tuimalila.fasta
- 60_ABE_filtered.fasta
- 61_ABE_filtered.fasta
- 62_ABE_filtered.fasta
- 63_ABE_filtered.fasta
- 64_ABE_filtered.fasta
- 65_ABE_filtered.fasta
- 66_ABE_filtered.fasta
- 67_ABE_filtered.fasta
- 68_ABE_filtered.fasta
- 69_ABE_filtered.fasta
- 7_tuimalila.fasta
- 70_ABE_filtered.fasta
- 71_ABE_filtered.fasta
- 72_ABE_filtered.fasta
- 73_ABE_filtered.fasta
- 74_ABE_filtered.fasta
- 75_ABE_filtered.fasta
- 76_ABE_filtered.fasta
- 77_ABE_filtered.fasta
- 78_ABE_filtered.fasta
- 79_ABE_filtered.fasta
- 8_tuimalila.fasta
- 80_ABE_filtered.fasta
- 81_ABE_filtered.fasta
- 82_ABE_filtered.fasta
- 83_ABE_filtered.fasta
- 84_ABE_filtered.fasta
- 86_tahimoana_filtered.fasta
- 87_tahimoana_filtered.fasta
- 88_tahimoana_filtered.fasta
- 89_tahimoana_filtered.fasta
- 9_tuimalila.fasta
- 90_tahimoana_filtered.fasta
- 91_tahimoana_filtered.fasta
- 92_tahimoana_filtered.fasta
- 93_tahimoana_filtered.fasta
- 94_tahimoana_filtered.fasta
- 95_tahimoana_filtered.fasta
- 96_tahimoana_filtered.fasta
- 97_tahimoana_filtered.fasta
- 98_tahimoana_filtered.fasta
- 99_tahimoana_filtered.fasta
- popmap_filtered.csv
- popmap_filtered.txt
- README.md
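The FASTA names above appear to follow a `<specimen>_<ventsite>[_filtered].fasta` pattern, as described in the README. The Python sketch below shows one way to recover the specimen ID, vent site and GC-filter flag from a file name and to count files per site. The helpers `parse_fasta_name` and `count_by_site` are hypothetical illustrations and are not part of the dataset; the popmap files remain the authoritative source for site and year assignments.

```python
import re
from collections import Counter

# Hypothetical pattern, assuming names like "45_ABE_filtered.fasta" or "1_tuimalila.fasta":
# a numeric specimen ID, a vent-site label, and an optional "_filtered" suffix.
NAME_RE = re.compile(
    r"^(?P<specimen>\d+)_(?P<site>[A-Za-z]+)(?P<filtered>_filtered)?\.fasta$"
)


def parse_fasta_name(name: str) -> dict:
    """Return specimen ID, vent site and GC-filter flag encoded in a FASTA file name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return {
        "specimen": int(m.group("specimen")),
        "site": m.group("site"),
        "gc_filtered": m.group("filtered") is not None,
    }


def count_by_site(names) -> Counter:
    """Count FASTA files per vent site, skipping non-FASTA files (VCF, popmap, README)."""
    return Counter(parse_fasta_name(n)["site"] for n in names if n.endswith(".fasta"))


if __name__ == "__main__":
    files = [
        "1_tuimalila.fasta",
        "45_ABE_filtered.fasta",
        "108_mangatolo.fasta",
        "86_tahimoana_filtered.fasta",
        "popmap_filtered.txt",
    ]
    print(count_by_site(files))
```

Applied to the full list above, this should report 24 tuimalila, 21 tahimoana, 64 mangatolo and 40 ABE files, 149 in total, which matches the number of individuals given in the Methods.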
Abstract
Copepoda is the most abundant taxon at deep-sea hydrothermal vents where hard substrate is available. Despite increasing interest in the exploitation of seafloor massive sulphides, no population genomic studies have been conducted on vent meiofauna, which are known to contribute over 50% of metazoan biodiversity at vents. To bridge this knowledge gap, restriction site-associated DNA sequencing, specifically 2b-RADseq, was used to retrieve thousands of genome-wide single nucleotide polymorphisms (SNPs) from abundant populations of the vent-obligate copepod Stygiopontius lauensis from the Lau Basin. SNPs were used to investigate population structure, demographic histories, and genotype-environment associations at a basin scale. The genetic analyses also helped to evaluate the suitability of tailored larval dispersal models and the parameterization of life history traits that better fit the population patterns observed in the genomic dataset for the target organism. Highly structured populations were observed on both spatial and temporal scales, with divergence between populations in the north, middle, and south of the basin estimated to have occurred after the formation of the major transform fault dividing the Australian and Niuafo’ou tectonic plates (350 kya), and with relatively recent secondary contact events (< 20 kya). Larval dispersal models were able to predict the high levels of structure and the strongly asymmetric, low-level northward gene flow observed in the genomic data. These results differ from those of most studies conducted on megafauna in the region, highlighting the need to include smaller-sized fauna when considering site prospecting for deep-sea exploitation of seafloor massive sulphides, and to create area-based management tools that protect areas at risk of local extinction, should mining occur.
README: Population genomics of deep-sea hydrothermal vent copepod Stygiopontius lauensis: from raw fasta files to filtered vcf file
https://doi.org/10.5061/dryad.zkh1893g2
Description of the data and file structure
This dataset contains FASTA files that are demultiplexed by specimen and named specimen_ventsite.fasta, with the suffix _filtered added where the sample has been filtered. Filtered samples were filtered by GC content: RAD tags with a GC content higher than 50% were removed. These files are the raw files used for all analyses in the study. Additionally, a popmap file (popmap_filtered.txt, also provided as popmap_filtered.csv) contains the mapping information needed to group the FASTA files, and the genotypes derived from them, by vent site and year of collection. For population genomics, either a global alignment method such as STACKS2 or a k-mer-based approach for calling single nucleotide polymorphisms (SNPs) such as DiscoSnpRAD++ can be applied to the raw FASTA files, following the instructions for each software on the webpages below. The output will be a Variant Call Format (VCF) file, or any other file type of your choice depending on the type of downstream analysis:
STACKS2:
https://catchenlab.life.illinois.edu/stacks/
DiscoSnpRAD++:
https://github.com/GATB/DiscoSnp
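The GC-content filter described in the README (removing RAD tags with more than 50% GC) can be sketched in Python as follows. This is a minimal illustration, not the authors' filtering script: it assumes each FASTA record is one RAD tag and that GC content is computed over unambiguous bases (A, C, G, T) only, since the README does not say how ambiguous bases were treated. The helper names `read_fasta`, `gc_fraction` and `filter_by_gc` are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """GC fraction over unambiguous bases (A, C, G, T); N and other codes are ignored (assumption)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def filter_by_gc(records, max_gc=0.5):
    """Keep records whose GC fraction is at most max_gc, i.e. drop tags with GC > 50%."""
    return [(h, s) for h, s in records if gc_fraction(s) <= max_gc]
```

To apply it to a file, pass an open file handle, for example `filter_by_gc(read_fasta(fh))` inside a `with open(path) as fh:` block.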
For simple analyses of population structure, the VCF file can be read into the R package SNPRelate. For demographic analyses, the following scripts can be applied directly to the VCF file:
https://github.com/Atranluy/Scripts-Ifremeria
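SNPRelate handles VCF import within R. For readers who want to inspect the VCF directly, the Python sketch below converts GT fields into per-sample alternate-allele counts (0, 1, 2, or None for a missing call), a common genotype coding for PCA-style structure analyses. It is not part of SNPRelate or of the linked scripts; the helpers `gt_to_dosage` and `read_vcf_dosages` are hypothetical, and the sketch assumes biallelic SNPs, as expected for a one-SNP-per-locus file.

```python
def gt_to_dosage(gt):
    """Convert a VCF GT string ('0/1', '1|1', './.') to an ALT-allele count, or None if missing.

    Assumes biallelic sites, so any non-zero allele index counts as ALT.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def read_vcf_dosages(lines):
    """Return (sample_names, {variant_id: [dosage per sample]}) from VCF text lines."""
    samples, dosages = [], {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        if len(fields) < 10:
            continue  # no genotype columns on this record
        gt_idx = fields[8].split(":").index("GT")
        # Use the ID column when present, otherwise CHROM:POS as the variant key.
        var_id = fields[2] if fields[2] != "." else f"{fields[0]}:{fields[1]}"
        dosages[var_id] = [gt_to_dosage(s.split(":")[gt_idx]) for s in fields[9:]]
    return samples, dosages
```

For example, `with open("1_snp_per_locus_het_0.6_maxmi.vcf") as fh: samples, dosages = read_vcf_dosages(fh)` would load the filtered VCF in this dataset, assuming it carries a GT field for each sample.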
Sharing/Access information
n/a
Code/Software
n/a
Methods
This dataset was generated by applying a version of the original 2b-RAD protocol (Wang et al., 2012), modified for the target species, to 149 individuals of the copepod Stygiopontius lauensis. These were collected from hydrothermal vents in the Lau Basin (Southwest Pacific Ocean). The pooled library was composed of samples collected in 2016 and in 2019. Samples from 2016 were sequenced on an Illumina NextSeq 500, while those from 2019 were sequenced on an Illumina NextSeq 2000 at the University Medical Centre Utrecht (UMC). Samples were demultiplexed by UMC. The raw FASTQ files were further demultiplexed by specimen using a modified script that runs Cutadapt to find specimen-specific barcodes. The same script then removed low-quality reads and duplicates, resulting in 149 FASTA files used for downstream analysis.
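The read-level filtering step above was done with the authors' modified Cutadapt-based script, which is not included in this dataset. As an illustration only, the Python sketch below removes reads with a low mean Phred score and exact duplicate sequences from FASTQ records, then writes the remaining reads as FASTA. The Phred+33 encoding, the mean-quality criterion, the threshold of 20 and the definition of a duplicate as an identical sequence are all assumptions rather than the study's settings, and barcode demultiplexing is not shown. The helper names are hypothetical.

```python
def read_fastq(lines):
    """Yield (name, seq, qual) from 4-line FASTQ records (assumes well-formed input)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_reads(records, min_mean_q=20):
    """Drop reads below min_mean_q and exact duplicate sequences (first copy kept).

    min_mean_q=20 is a hypothetical threshold, not the value used in the study.
    """
    seen = set()
    for name, seq, qual in records:
        if mean_phred(qual) < min_mean_q or seq in seen:
            continue
        seen.add(seq)
        yield name, seq


def to_fasta(records):
    """Format (name, seq) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Only reads that pass the quality check are recorded as seen, so a high-quality copy of a sequence is kept even if a low-quality copy appeared earlier.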
The FASTA files were run through DiscoSnpRAD++ using a k-mer length of 15, with the read-depth threshold set to auto. The resulting VCF file was then processed with the DiscoSnpRAD post-processing scripts to generate a VCF file containing one SNP per locus. This VCF file was filtered for missing data (cutoff = 0.3) and heterozygosity (cutoff = 0.6). Individuals with more than 50% missing data (> 0.5) were also removed.
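The locus and individual filters described above can be sketched on a genotype table as follows. This is a minimal Python illustration, not the DiscoSnpRAD post-processing code. It assumes that genotypes are coded as alternate-allele counts (0, 1, 2, or None for missing), that heterozygosity means the observed fraction of heterozygous calls at a locus, that the cutoffs are inclusive upper bounds, and that loci are filtered before individuals. The function names are hypothetical.

```python
def missing_rate(values):
    """Fraction of genotype calls that are missing (None)."""
    return sum(v is None for v in values) / len(values) if values else 1.0


def het_rate(values):
    """Observed heterozygosity: fraction of non-missing calls that are heterozygous (1)."""
    called = [v for v in values if v is not None]
    return sum(v == 1 for v in called) / len(called) if called else 0.0


def filter_loci(dosages, max_missing=0.3, max_het=0.6):
    """Keep loci with missing rate <= max_missing and observed heterozygosity <= max_het."""
    return {
        k: v
        for k, v in dosages.items()
        if missing_rate(v) <= max_missing and het_rate(v) <= max_het
    }


def filter_individuals(samples, dosages, max_missing=0.5):
    """Drop individuals whose fraction of missing genotypes exceeds max_missing."""
    keep = []
    for j in range(len(samples)):
        column = [v[j] for v in dosages.values()]  # all genotypes for sample j
        if column and missing_rate(column) <= max_missing:
            keep.append(j)
    new_samples = [samples[j] for j in keep]
    new_dosages = {k: [v[j] for j in keep] for k, v in dosages.items()}
    return new_samples, new_dosages
```

Filtering individuals after loci means an individual's missingness is judged only on the retained loci; the order actually used in the study is not stated, so this choice is an assumption.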