Custom script for feature extraction from Genbank files
Cite this dataset
Liao, Li et al. (2024). Custom script for feature extraction from Genbank files [Dataset]. Dryad. https://doi.org/10.5061/dryad.gb5mkkwwj
Abstract
Microbes are thought to be distributed and circulated around the world, but the connection between marine and terrestrial microbiomes is largely unknown. We use Plantibacter, a representative plant-associated genus, as our research model to show the global distribution and adaptation of plant-related bacteria in plant-free environments, especially in the remote Southern Ocean and the deep Atlantic Ocean. The marine isolates and their plant-associated relatives shared over 98% whole-genome average nucleotide identity (ANI), indicating recent divergence and ongoing speciation from plant-related niches to marine environments. Comparative genomics revealed that the marine strains acquired new genes via horizontal gene transfer from non-Plantibacter species and refined existing genes through positive selection to improve adaptation to new habitats. Meanwhile, marine strains retained the ability to interact with plants, such as modifying root system architecture and promoting germination. Plantibacter species were further found to be widely distributed in marine environments, revealing an unrecognized phenomenon that plant-associated microbiomes have colonized the ocean, which could serve as a reservoir for plant growth-promoting microbes. This study demonstrates the presence of an active reservoir of terrestrial plant growth-promoting bacteria in remote marine systems and advances our understanding of the microbial connections between plant-associated and plant-free environments at the genome level.
README: Custom script for feature extraction from Genbank files
https://doi.org/10.5061/dryad.gb5mkkwwj
The script extracts the isolation source from GenBank files and can be adjusted to extract other features as well. It processes the records in batch, so the information does not have to be copied manually one record at a time. The script contains everything needed to perform the extraction. Users should provide the GenBank file (x.gb) to be processed. An example GenBank file for testing is included in this dataset (16S_plantibacter_isolates.gb).
Description of the data and file structure
The script is written in Python and runs on Linux. Biopython must be installed to run it. The steps to run the script on a Linux system are as follows.
1. Download the GenBank file (x.gb) into the same directory as the script.
2. Either rename the GenBank file to the file name used in the script, or change the file name in the script to match your file.
3. Run the script by typing ./feature_extr.py.
4. The output is written to the same directory.
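The kind of extraction described above can be illustrated with a minimal sketch. This is not the authors' feature_extr.py, which is distributed with the dataset. It only assumes that the isolation source is stored as the isolation_source qualifier of each record's source feature, as is usual in GenBank files. The function names extract_qualifier and extract_to_tsv and the tab-separated output layout are hypothetical choices made for illustration.

```python
import csv


def extract_qualifier(records, qualifier="isolation_source", feature_type="source"):
    """Yield (record_id, value) pairs, one per record.

    `records` is any iterable of objects shaped like Biopython SeqRecords
    (an `id` plus a list of `features`, each with `type` and `qualifiers`).
    Records that lack the qualifier are reported as "NA".
    """
    for record in records:
        values = []
        for feature in record.features:
            if feature.type == feature_type:
                # Qualifier values are stored as lists of strings.
                values.extend(feature.qualifiers.get(qualifier, []))
        value = "; ".join(values) if values else "NA"
        yield record.id, value


def extract_to_tsv(gb_path, out_path, qualifier="isolation_source"):
    """Parse a GenBank file and write one line per record to a TSV file.

    Hypothetical helper: the output layout is an assumption, not the
    format produced by the original script.
    """
    from Bio import SeqIO  # Biopython; imported here so it is only needed for file parsing

    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["record_id", qualifier])
        writer.writerows(extract_qualifier(SeqIO.parse(gb_path, "genbank"), qualifier))
```

With this sketch, calling extract_to_tsv("16S_plantibacter_isolates.gb", "isolation_source.tsv") would write one line per record in the example file. Passing a different qualifier name, for example qualifier="strain", shows how such a script can be adjusted to extract other features.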
Sharing/Access information
NA
Code/Software
The script is written in Python and runs on Linux. Biopython must be installed to run it. The steps to run the script are listed above under "Description of the data and file structure".
Methods
The script was written in Python to extract the isolation source from a GenBank file.
Funding
Ministry of Science and Technology of the People's Republic of China, Award: 2022YFC2807501
National Natural Science Foundation of China, Award: 41976224