Dipsacales four locus supermatrix assembled with PyNCBIminer
Data files
Nov 29, 2024 version; files total 23.62 MB
- Dipsacales.zip (23.62 MB)
- README.md (2.09 KB)
Abstract
PyNCBIminer is user-friendly software that automates the assembly of large DNA datasets from GenBank for phylogenetic reconstruction using the supermatrix method. To evaluate the sensitivity of PyNCBIminer to initial queries, we downloaded four DNA markers (ITS, matK, rbcL and trnL intron-trnF) of Dipsacales using two distinct initial query sets. The experiments were performed on a personal computer with a 13th Gen Intel(R) Core(TM) i5-1340P 1.90 GHz processor and 32.0 GB of RAM. Run 1 used the default initial queries in PyNCBIminer, which are six well-aligned sequences from six angiosperm orders, while Run 2 used six sequences from six genera within the order Dipsacales.
README: Dipsacales four locus supermatrix assembled with PyNCBIminer
https://doi.org/10.5061/dryad.xpnvx0kq3
Description of the data and file structure
To evaluate the sensitivity of PyNCBIminer to initial queries, we downloaded four DNA markers (ITS, matK, rbcL and trnL intron-trnF) of Dipsacales using two distinct initial query sets. The experiments were performed on a personal computer with a 13th Gen Intel(R) Core(TM) i5-1340P 1.90 GHz processor and 32.0 GB of RAM. Run 1 used the default initial queries in PyNCBIminer, which are six well-aligned sequences from six angiosperm orders, while Run 2 used six sequences from six genera within the order Dipsacales.
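One way to quantify how sensitive the result is to the initial queries is to compare the accession numbers retained in each run. The Python sketch below is not part of PyNCBIminer and is not the authors' analysis; it assumes the accession numbers of each run have already been read into plain lists of strings (the function name and the example accessions are made up for illustration).

```python
def compare_runs(run1_accessions, run2_accessions):
    """Summarise the overlap between two collections of accession numbers."""
    # Normalise whitespace and drop empty entries before comparing.
    a = {acc.strip() for acc in run1_accessions if acc.strip()}
    b = {acc.strip() for acc in run2_accessions if acc.strip()}
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_run1": len(a - b),
        "only_run2": len(b - a),
        # Jaccard index: 1.0 means both runs kept exactly the same accessions.
        # Two empty inputs are treated as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up accession numbers, for illustration only.
run1 = ["ACC0001", "ACC0002", "ACC0003"]
run2 = ["ACC0002", "ACC0003", "ACC0004"]
print(compare_runs(run1, run2))
```

A Jaccard index close to 1 would suggest that the two query sets lead to nearly the same set of sequences.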
The zip file contains two folders, one for Run 1 and one for Run 2, each of which contains the following subdirectories and files:
#original BLAST results for each marker, named after the marker (ITS, matK, rbcL, and trnL-trnF) and produced directly by the "Sequence Retrieving" module in PyNCBIminer.
#the following directories and files are the results generated by the "Supermatrix Module" of PyNCBIminer.
01_filtered_seqs: the species-level sequence data produced by the "Sequences filtering" function.
02_msa: the alignments of each marker's species-level sequences, produced by the "Sequences alignment" function.
03_msa_trimmed: the alignment matrices after trimming, generated by the "Alignments trimming" function.
04_supermatrix: the supermatrix obtained by concatenating the trimmed alignment matrices, produced by the "Alignments concatenation" function.
combined_records.txt: the accession numbers of all the sequences combined in the supermatrix.
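After unzipping, the layout described above can be checked with a short script. The following Python sketch is only an illustration: the entry names are taken from this README, while the function name and the path of the run folder are hypothetical, since the names of the Run 1 and Run 2 folders inside the zip file are not stated here.

```python
from pathlib import Path

# Entry names as listed in this README.
MARKER_ENTRIES = ["ITS", "matK", "rbcL", "trnL-trnF"]
SUPERMATRIX_ENTRIES = [
    "01_filtered_seqs",
    "02_msa",
    "03_msa_trimmed",
    "04_supermatrix",
    "combined_records.txt",
]


def missing_entries(run_dir):
    """Return the expected entries that are absent from one run folder."""
    run_dir = Path(run_dir)
    expected = MARKER_ENTRIES + SUPERMATRIX_ENTRIES
    # exists() is used on purpose: it makes no assumption about whether
    # an entry is a directory or a file.
    return [name for name in expected if not (run_dir / name).exists()]
```

For example, `missing_entries("Dipsacales/Run1")` (a hypothetical path) returns an empty list when every expected entry is present in that folder.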
Sharing/Access information
Data was derived from the following sources:
Code/Software
Source code, software executables and manuals are deposited on GitHub (https://github.com/Xiaoting-Xu/PyNCBIminer) and Gitee (https://gitee.com/xiaotingxu/PyNCBIminer).