Genome assemblies for halophilic bacteria with potential contamination, sampled from Northern California in 2022
Data files
Nov 29, 2024 version files (22.07 MB total)
- B23F22_14_S8.fa (7 MB)
- B23F22_23_kncow_S11.fa (4.64 MB)
- B23F22_6_s5.fa (10.43 MB)
- README.md (1.65 KB)
Abstract
BIS 23 is an undergraduate research course offered at UC Davis, designed to teach new undergraduates the fundamentals of good science and academic research. The course is structured around the collection and sequencing of halophilic organisms (e.g., salt-tolerant bacteria) from samples taken by the students themselves. In the 2022/2023 iteration of the course, students took dozens of environmental samples from Northern California, primarily in Davis, CA. These samples were cultured and incubated, and the most successful cultures were sent for sequencing. Nine genome scaffolds were successfully created after sequencing on the Illumina MiSeq platform and processing with Trimmomatic 0.36 and SPAdes 3.7. Only 6 of the 9 scaffolds were accepted by the NCBI Genome database, due to suspected contamination in the remaining 3. This dataset provides the FASTA files for the 3 assemblies with suspected contamination. Researchers can freely access and use this dataset for any purpose, such as applying more advanced contamination detection and removal algorithms that were beyond the scope of the undergraduate course.
https://doi.org/10.5061/dryad.0000000d9
Description of the data and file structure
These genome assemblies were derived from environmental samples collected in Northern California. The samples were processed, cultured, and incubated in high-salinity media to isolate halophilic strains. After sequencing on Illumina MiSeq machines, 9 samples returned usable reads, which were trimmed with Trimmomatic and assembled into scaffolds with SPAdes. Six of the scaffolds were accepted by the NCBI Genome database; the remaining 3 were rejected because of suspected contamination. The FASTA files for those 3 possibly contaminated scaffolds are provided here.
Files and variables
File: B23F22_23_kncow_S11.fa
Description: BioSample (SAMN43907867), collected in Davis, California from dry soil. Likely taxonomy is Bacillus sp.
File: B23F22_6_s5.fa
Description: BioSample (SAMN43907873), collected in Davis, California from dry soil. Likely taxonomy is Piscibacillus halophilus.
File: B23F22_14_S8.fa
Description: BioSample (SAMN43907869), collected at Doran Beach, California from wet coastal sand. Likely taxonomy is Salimicrobium jeotgali.
Code/software
These FASTA files provide the genomic assemblies for three salt-tolerant organisms. They can be analyzed with any genomic software that can read FASTA files.
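As a starting point, the sketch below reads one of the FASTA files with only the Python standard library and reports basic assembly statistics (sequence count, total length, longest sequence, N50, and overall GC fraction). It is a minimal illustration, not part of the original course pipeline: the function names `read_fasta`, `gc_fraction`, `n50`, and `summarize` are hypothetical, and the code assumes the files are uncompressed, plain-text FASTA located in the current working directory.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def n50(lengths):
    """Length of the sequence at which the running total, taken from
    longest to shortest, first reaches half of the summed length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(path):
    """Return basic assembly statistics for one FASTA file."""
    records = list(read_fasta(path))
    lengths = [len(seq) for _, seq in records]
    all_seq = "".join(seq for _, seq in records)
    return {
        "sequences": len(records),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
        "gc_fraction": round(gc_fraction(all_seq), 4),
    }


if __name__ == "__main__":
    # File names as listed in this dataset; assumed to be in the working directory.
    for name in ("B23F22_14_S8.fa", "B23F22_23_kncow_S11.fa", "B23F22_6_s5.fa"):
        if Path(name).exists():
            print(name, summarize(name))
```

These summary numbers do not detect contamination by themselves. They are only a quick sanity check before running dedicated tools of the reader's choice.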
Access information
Data was derived from environmental samples collected in Northern California.
These samples were collected by undergraduates at UC Davis in September/October 2022. The samples were isolated, cultured, and incubated in high-salinity media. The cultures that showed the most growth and successfully proceeded through DNA extraction were sent for sequencing on the Illumina MiSeq platform. Of the 9 resulting assemblies, the 6 without suspected contamination were submitted to the NCBI Genome database. The remaining 3, with suspected contamination, are presented here as FASTA files.