Antarctic krill (Euphausia superba Dana) is a keystone species in the Southern Ocean ecosystem, with both ecological and commercial significance. However, its vulnerability to climate change requires urgent investigation of its adaptive potential under future environmental conditions. Historical museum collections of krill from the early 20th century represent an ideal opportunity to investigate how krill have changed over time due to predation, fishing, and climate change. However, there is currently no cost-effective method for implementing population-scale collection genomics for krill given its genome size (48 Gbp). Here, we assessed the utility of two inexpensive methods for population genetics using historical krill samples: low-coverage shotgun sequencing (i.e., “genome skimming”) and exome capture. Two full-length transcriptomes were generated and used to identify 166 putative gene targets for exome capture bait design. A total of 20 historical krill samples were sequenced using shotgun sequencing and exome capture. Mitochondrial and nuclear ribosomal sequences were assembled from both low-coverage shotgun data and off-target exome capture data, demonstrating that endogenous DNA sequences can be assembled from historical collections. Although mitochondrial and ribosomal sequences are variable across individuals from different populations, phylogenetic analysis does not identify any population structure. We find that exome capture provides approximately 4,500-fold enrichment of targeted genes, suggesting that this approach can generate the sequencing depth required to identify a significant number of variants. Unlocking historical collections for genomic analyses using exome capture will provide valuable insights into the past and present biodiversity, resilience, and adaptability of krill populations to climate change.

Description of the data and file structure

This dataset includes the configuration files and reference data used for the analyses of genome skimming data from krill museum collections. In addition, mitochondrial and ribosomal sequences assembled from previously published raw sequence data are shared here. Finally, Sankey plots are provided to visualise where sequence data were retained or lost at each step of the pipeline.

Files and variables

File: gene2phylo_config.yaml

Description: Configuration file for the gene2phylo pipeline used in this study. This YAML file contains all parameter settings and input specifications required to reproduce the phylogenetic analyses presented in the paper. Key configuration options include input directory paths, sequence alignment parameters (realignment settings, missing data thresholds, trimming methods), outgroup specification, and output plot dimensions.

File: skim2mito_config.yaml

Description: Configuration file for the skim2mito pipeline used in this study. This YAML file specifies all parameters and settings required to extract, assemble, and analyse mitochondrial genomes from genome skimming data. The configuration includes sample sheet specifications, sequencing adapter sequences, read processing options (deduplication settings), mitochondrial genome assembly parameters (GetOrganelle reference database selection), gene annotation settings (MITOS reference database and genetic code), sequence alignment and trimming parameters, outgroup designation for phylogenetic analysis, and output plot dimensions.

File: skim2mito_samples.csv

Description: Sample sheet for the skim2mito pipeline containing metadata and file paths for all sequencing samples analysed in this study. The CSV file includes six columns: sample ID, forward read file path, reverse read file path, NCBI taxonomy ID (taxid), seed sequence reference, and target gene reference. Each row represents one paired-end sequencing sample with corresponding FASTQ file locations in the data directory. The taxonomy ID (6819) indicates the target taxonomic group for mitochondrial genome assembly. This sample sheet enables batch processing of multiple genome skimming datasets through the skim2mito pipeline.

Variables

ID: Sample ID
forward: Path to forward reads
reverse: Path to reverse reads
taxid: NCBI taxon ID
seed: Path to GetOrganelle seed reference
gene: Path to GetOrganelle gene reference
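Before running the pipeline, it can be useful to confirm that a sample sheet has the six columns listed above. The following is a minimal sketch using only the Python standard library. The function name `read_sample_sheet` and the example row (sample name and FASTQ paths) are hypothetical and are not taken from the dataset; the checks shown are illustrative and are not part of skim2mito itself.

```python
import csv
import io

# Column names taken from the Variables list above.
EXPECTED_COLUMNS = ["ID", "forward", "reverse", "taxid", "seed", "gene"]


def read_sample_sheet(handle):
    """Read a sample sheet and return its rows as a list of dicts.

    Raises ValueError if a column is missing, a sample ID is repeated,
    or a taxid is not a plain integer.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = list(reader)
    ids = [row["ID"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("sample IDs are not unique")
    for row in rows:
        if not row["taxid"].isdigit():
            raise ValueError(f"non-numeric taxid for sample {row['ID']}")
    return rows


# Hypothetical one-row sheet; the real file lists the study samples.
example = io.StringIO(
    "ID,forward,reverse,taxid,seed,gene\n"
    "sample_1,data/sample_1_R1.fastq.gz,data/sample_1_R2.fastq.gz,"
    "6819,skim2mito_seed.fasta,skim2mito_gene.fasta\n"
)
rows = read_sample_sheet(example)
print(rows[0]["ID"], rows[0]["taxid"])
```

To check the real file, the same function could be called as `read_sample_sheet(open("skim2mito_samples.csv", newline=""))`. The skim2rrna sample sheet is described with the same six columns, so the same check should apply there.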

File: skim2mito_gene.fasta

Description: Reference gene dataset used for mitochondrial genome assembly and annotation with GetOrganelle. This FASTA file was generated using go_fetch.py (https://github.com/o-william-white/go_fetch), which downloads sequences of related taxa from NCBI.

File: skim2mito_seed.fasta

Description: Reference seed dataset used for mitochondrial genome assembly and annotation with GetOrganelle. This FASTA file was generated using go_fetch.py (https://github.com/o-william-white/go_fetch), which downloads sequences of related taxa from NCBI.

File: skim2rrna_config.yaml

Description: Configuration file for the skim2rrna pipeline used in this study. This YAML file specifies all parameters and settings required to extract, assemble, and analyse ribosomal sequences from genome skimming data. The configuration includes sample sheet specifications, sequencing adapter sequences, read processing options (deduplication settings), gene annotation settings (barrnap kingdom), sequence alignment and trimming parameters, outgroup designation for phylogenetic analysis, and output plot dimensions.

File: skim2rrna_samples.csv

Description: Sample sheet for the skim2rrna pipeline containing metadata and file paths for all sequencing samples analysed in this study. The CSV file includes six columns: sample ID, forward read file path, reverse read file path, NCBI taxonomy ID (taxid), seed sequence reference, and target gene reference. Each row represents one paired-end sequencing sample with corresponding FASTQ file locations in the data directory. The taxonomy ID (6819) indicates the target taxonomic group for ribosomal gene assembly. This sample sheet enables batch processing of multiple genome skimming datasets through the skim2rrna pipeline.

Variables

ID: Sample ID
forward: Path to forward reads
reverse: Path to reverse reads
taxid: NCBI taxon ID
seed: Path to GetOrganelle seed reference
gene: Path to GetOrganelle gene reference

File: skim2rrna_gene.fasta

Description: Reference gene dataset used for ribosomal gene assembly and annotation with GetOrganelle. This FASTA file was generated using go_fetch.py (https://github.com/o-william-white/go_fetch), which downloads sequences of related taxa from NCBI.

File: skim2rrna_seed.fasta

Description: Reference seed dataset used for ribosomal gene assembly and annotation with GetOrganelle. This FASTA file was generated using go_fetch.py (https://github.com/o-william-white/go_fetch), which downloads sequences of related taxa from NCBI.

File: skim2rrna_shao_et_al.zip

Description: Zipped file containing ribosomal sequences assembled from the raw sequence data of Shao et al. (2023), https://doi.org/10.1016/j.cell.2023.02.005. The assembled sequences are in FASTA format, and samples are named following the sample names used by Shao et al. (2023). In total, there are 78 FASTA files.

File: skim2mito_shao_et_al.zip

Description: Zipped file containing mitochondrial sequences assembled from the raw sequence data of Shao et al. (2023), https://doi.org/10.1016/j.cell.2023.02.005. The assembled sequences are in FASTA format, and samples are named following the sample names used by Shao et al. (2023). In total, there are 78 FASTA files.
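The two Shao et al. archives can be inspected without unpacking them. The sketch below, using only the Python standard library, counts the sequence records and total bases in each FASTA member of a zip archive. The function name `summarise_fasta_zip`, the accepted file extensions, and the in-memory demonstration archive are assumptions made for illustration; the actual file extensions and internal folder layout of the archives are not stated here.

```python
import io
import zipfile

# Assumed FASTA extensions; adjust if the archives use a different one.
FASTA_SUFFIXES = (".fasta", ".fa", ".fas")


def summarise_fasta_zip(archive):
    """Map each FASTA member of a zip archive to (record count, total bases)."""
    summary = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(FASTA_SUFFIXES):
                continue  # skip folders and non-FASTA members
            records = 0
            bases = 0
            with zf.open(name) as member:
                for raw in member:
                    line = raw.decode("ascii", errors="replace").strip()
                    if line.startswith(">"):
                        records += 1  # header line starts a new record
                    else:
                        bases += len(line)  # sequence line
            summary[name] = (records, bases)
    return summary


# Small in-memory archive with made-up content, for demonstration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sample_a.fasta", ">contig_1\nACGT\nAC\n>contig_2\nGG\n")
    zf.writestr("notes.txt", "not a FASTA file\n")
buffer.seek(0)

demo = summarise_fasta_zip(buffer)
print(demo)
```

Applied to the real data, `len(summarise_fasta_zip("skim2mito_shao_et_al.zip"))` would be expected to return 78 if the archive's FASTA files use one of the assumed extensions.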

File: sankey_plots.zip

Description: Zipped file containing Sankey plots that visualise where sequence data were retained or lost at each step of the pipeline. The sample names are the same as those presented in our study. Each Sankey plot can be viewed in its HTML file, and the associated metadata used to build the plot are found in the folder sharing the same sample name. In total, there are 40 HTML files.
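Because each HTML plot is described as having a metadata folder with the same sample name, a quick consistency check is to pair the two. The sketch below assumes that a plot is named `<sample>.html` and that its metadata files sit somewhere under a folder called `<sample>`; this layout, the function name `pair_plots_with_metadata`, and the demonstration file names are assumptions, not details confirmed by the archive description.

```python
import io
import zipfile
from pathlib import PurePosixPath


def pair_plots_with_metadata(archive):
    """Map each sample with an HTML plot to True if a same-named folder
    in the archive contains at least one non-HTML file."""
    with zipfile.ZipFile(archive) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]

    html_samples = set()
    metadata_folders = set()
    for name in names:
        path = PurePosixPath(name)
        if path.suffix.lower() == ".html":
            html_samples.add(path.stem)
        else:
            # Record every folder on the path to this metadata file.
            metadata_folders.update(parent.name for parent in path.parents)
    return {s: s in metadata_folders for s in sorted(html_samples)}


# In-memory archive with hypothetical names, for demonstration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sample_a.html", "<html></html>")
    zf.writestr("sample_a/links.csv", "source,target,value\n")
    zf.writestr("sample_b.html", "<html></html>")
buffer.seek(0)

pairs = pair_plots_with_metadata(buffer)
print(pairs)
```

For the real archive, the returned dictionary would be expected to have 40 entries, one per HTML file, provided the naming assumption holds.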

Exome capture of Antarctic krill (Euphausia superba) for cost effective genotyping and population genetics with historical collections
