Genomes for halophilic bacteria with potential contamination, sampled from Northern California in 2021
Data files
Nov 25, 2024 version: 3 files, 18.12 MB total
- 231232_2ArWaCKT.fasta (9.10 MB)
- 231240_10ClHs.fasta (9.02 MB)
- README.md (838 B)
Abstract
UC Davis offers an undergraduate research course (BIS 23A/B) designed to teach new students the basics of wet-lab experimentation and to serve as a gentle introduction to academic research. In the 2021/2022 iteration of the course, students sampled, prepared, and sequenced the genomes of halophilic (salt-tolerant) bacteria collected from environmental samples in Northern California. One draft genome was sequenced without detected contamination and was submitted to the NIH's genetic sequence database (NCBI). Two other genome scaffolds were also sequenced successfully but showed high levels of suspected contamination and were not accepted by NCBI. These possibly contaminated scaffolds are provided as FASTA files in this Dryad dataset. Both samples were collected in September/October 2021. Researchers may freely access and use these scaffolds for any purpose, for example to apply contamination detection and removal methods more advanced than those within the scope of the course.
README: Additional genomes from BIS23 (2021)
https://doi.org/10.5061/dryad.dbrv15fbz
Description of the data and file structure
These genomes were sequenced from samples collected in Davis, CA, for an undergraduate research course (BIS 23A/BIS 23B) taught by the College of Biological Sciences at UC Davis in 2021.
Files and variables
File: 231232_2ArWaCKT.fasta
Description: Sample from a saltwater fish tank; shares close identity with existing Halomonas sp.
File: 231240_10ClHs.fasta
Description: Sample from arboretum soil; shares close identity with Bacillus sp.
Code/software
The data can be analyzed with any genome analysis software that supports FASTA files.
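As a minimal sketch of such an analysis, the Python below computes basic assembly statistics (contig count, total length, N50, and GC content) for FASTA files using only the standard library. The function names are hypothetical helpers written for illustration, not part of any published tool, and GC content is computed over unambiguous A/C/G/T bases only (an assumption: ambiguous bases such as N are ignored).

```python
import sys


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(records):
    """Summarize (header, sequence) records: contig count, size, N50, GC%."""
    lengths, gc, acgt = [], 0, 0
    for _, seq in records:
        lengths.append(len(seq))
        gc += seq.count("G") + seq.count("C")
        # Only unambiguous bases count toward the GC denominator (assumption).
        acgt += sum(seq.count(base) for base in "ACGT")
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
        "gc_percent": round(100 * gc / acgt, 2) if acgt else 0.0,
    }


if __name__ == "__main__":
    # Example: python fasta_stats.py 231232_2ArWaCKT.fasta 231240_10ClHs.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, assembly_stats(parse_fasta(handle)))
```

The script name in the usage comment is arbitrary. Established tools such as QUAST report these metrics and many more; this sketch is only meant as a quick first look at the files.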
Access information
Data was derived from environmental samples in Northern California.
Methods
These environmental samples were collected in Northern California, primarily in Davis, California. After the bacteria were cultured and incubated, their genomes were sequenced on Illumina MiSeq instruments. Reads were trimmed with Trimmomatic and assembled with SPAdes, and the assemblies were validated with CheckM, the NCBI Foreign Contamination Screen (FCS), and ContEst16S.
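CheckM estimates contamination from lineage-specific single-copy marker genes, which is far more reliable than anything shown here. As a rough first look only, one simple heuristic is to flag contigs whose GC content departs sharply from the rest of the assembly, since DNA from a different organism may have a different base composition. The sketch below was not part of the course's validation pipeline; it assumes records are (header, sequence) pairs, and the `min_length` and `max_delta` thresholds are illustrative assumptions, not validated cutoffs.

```python
import statistics


def gc_fraction(seq):
    """Fraction of unambiguous bases (A/C/G/T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(records, min_length=1000, max_delta=0.10):
    """Return (header, gc) for contigs whose GC fraction differs from the
    median contig GC fraction by more than max_delta.

    Contigs shorter than min_length are skipped because their GC estimate
    is noisy. Both thresholds are illustrative assumptions.
    """
    contigs = [(header, gc_fraction(seq)) for header, seq in records
               if len(seq) >= min_length]
    if not contigs:
        return []
    # Median (unweighted by length) is less sensitive to a few outliers than the mean.
    median_gc = statistics.median([gc for _, gc in contigs])
    return [(header, round(gc, 3)) for header, gc in contigs
            if abs(gc - median_gc) > max_delta]
```

With Biopython installed, records could be built with `from Bio import SeqIO` and `[(r.id, str(r.seq)) for r in SeqIO.parse("231240_10ClHs.fasta", "fasta")]`. A flagged contig is only a candidate for closer inspection, because GC content also varies within a single genuine genome.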