Custom script for feature extraction from Genbank files

Cite this dataset

Liao, Li et al. (2024). Custom script for feature extraction from Genbank files [Dataset]. Dryad. https://doi.org/10.5061/dryad.gb5mkkwwj

Abstract

Microbes are thought to be distributed and circulated around the world, but the connection between marine and terrestrial microbiomes is largely unknown. We use Plantibacter, a representative plant-associated genus, as our research model to show the global distribution and adaptation of plant-related bacteria in plant-free environments, especially in the remote Southern Ocean and the deep Atlantic Ocean. The marine isolates and their plant-associated relatives shared over 98% whole-genome average nucleotide identity (ANI), indicating recent divergence and ongoing speciation from plant-related niches to marine environments. Comparative genomics revealed that the marine strains acquired new genes via horizontal gene transfer from non-Plantibacter species and refined existing genes through positive selection to improve adaptation to new habitats. Meanwhile, marine strains retained the ability to interact with plants, such as modifying root system architecture and promoting germination. Plantibacter species were further found to be widely distributed in marine environments, revealing an unrecognized phenomenon that plant-associated microbiomes have colonized the ocean, which could serve as a reservoir for plant growth-promoting microbes. This study demonstrates the presence of an active reservoir of terrestrial plant growth-promoting bacteria in remote marine systems and advances our understanding of the microbial connections between plant-associated and plant-free environments at the genome level.

README: Custom script for feature extraction from Genbank files

https://doi.org/10.5061/dryad.gb5mkkwwj

The script extracts the isolation source from GenBank files and can be adapted to extract other features as well. It processes records in batch, so the information does not have to be copied manually one record at a time. The script contains everything needed to perform the extraction. Users must supply the GenBank file (x.gb) to be processed. An example GenBank file for testing (16S_plantibacter_isolates.gb) is included in this dataset.
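
The published feature_extr.py is not reproduced here. As a minimal sketch of the kind of extraction described above, the logic might look like the following. It assumes that the file is read with Biopython's SeqIO GenBank parser and that the isolation source is stored as the isolation_source qualifier of each record's source feature. The function names, the column headers, and the tab-separated output layout are illustrative assumptions, not the authors' code.

```python
import csv


def extract_qualifier(records, qualifier="isolation_source", missing="NA"):
    """Yield (record id, qualifier value) taken from each record's "source" feature."""
    for record in records:
        value = missing
        for feature in record.features:
            if feature.type == "source":
                # Biopython stores qualifier values as lists of strings.
                value = feature.qualifiers.get(qualifier, [missing])[0]
                break
        yield record.id, value


def write_table(gb_path, out_path, qualifier="isolation_source"):
    """Parse a GenBank file and write one tab-separated row per record."""
    # Imported here so that extract_qualifier can be used without Biopython.
    from Bio import SeqIO

    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["record_id", qualifier])
        writer.writerows(extract_qualifier(SeqIO.parse(gb_path, "genbank"), qualifier))
```

Under these assumptions, calling write_table("16S_plantibacter_isolates.gb", "isolation_source.tsv") would produce one row per record (the output file name is hypothetical). Passing a different qualifier name, such as "country" or "host", is one way the same loop could be adjusted to extract other features.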

Description of the data and file structure

The script is written in Python and runs on Linux. Biopython must be installed to run it. To run the script on a Linux system:
1. Download the GenBank file (x.gb) into the same directory as the script.
2. Either rename the GenBank file to the file name used in the script, or edit the file name in the script to match your file.
3. Run the script by typing ./feature_extr.py. If the file is not executable, run it with the Python interpreter instead.
4. The output is written to the same directory.
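
Steps 1 and 2 assume that Biopython is importable and that the GenBank file sits next to the script. A small pre-flight check can confirm both before running. The sketch below is an optional helper that is not part of the published script, and the function name check_setup is hypothetical.

```python
import importlib.util
from pathlib import Path


def check_setup(gb_name, folder="."):
    """Return a list of problems that would stop the script from running."""
    problems = []
    # Biopython is imported under the package name "Bio".
    if importlib.util.find_spec("Bio") is None:
        problems.append("Biopython is not installed (try: pip install biopython)")
    # The GenBank file is expected in the same folder as the script.
    if not (Path(folder) / gb_name).is_file():
        problems.append(f"{gb_name} not found in {Path(folder).resolve()}")
    return problems


# Check the example file shipped with this dataset in the current directory.
for problem in check_setup("16S_plantibacter_isolates.gb"):
    print(problem)
```

If nothing is printed, both prerequisites are met and the script can be run as described in step 3.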

Sharing/Access information

NA

Code/Software

The script is written in Python and runs on Linux. Biopython must be installed to run it. The steps for running the script are listed under "Description of the data and file structure".

Methods

The script was written in Python to extract the isolation source from a GenBank file.

Funding

Ministry of Science and Technology of the People's Republic of China, Award: 2022YFC2807501

National Natural Science Foundation of China, Award: 41976224