Datasets used for Western Mediterranean Dugesia phylotranscriptomic analyses
Data files
Oct 28, 2022 version: files total 488.64 MB
Phylotranscriptome_Dugesia_Data.zip
README.md
Abstract
The Mediterranean is one of the most biodiverse areas of the Palearctic region. Here, based on large data sets of single-copy orthologs obtained from transcriptomic data, we investigated the evolutionary history of the genus Dugesia in the Western Mediterranean area. The results corroborated that the complex paleogeological history of the region was an important driver of diversification for the genus, with speciation occurring as microplates and islands were forming. These processes led to the differentiation of three main biogeographic clades: Iberia-Apennines-Alps, Corsica-Sardinia, and Iberia-Africa. The internal relationships of these major clades were analysed with several representative samples per species. The use of large data sets, in terms of both the number of loci and the number of samples, together with state-of-the-art phylogenomic inference methods, allowed us to answer several unresolved questions about the evolution of particular groups, such as the diversification path of D. subtentaculata in the Iberian Peninsula and its colonization of Africa. Additionally, our results support the differentiation of D. benazzii into two lineages, which could represent two species. Finally, we analysed here, for the first time, a comprehensive number of samples from several asexual Iberian populations whose species-level assignment has remained an enigma for years. The phylogenies obtained with different inference methods showed a branching topology of asexual individuals at the base of sexual clades. We hypothesize that this unexpected topology is related to long-term asexuality. This work represents the first phylotranscriptomic analysis of Tricladida, laying the first stone of the genomic era in phylogenetic studies of this taxonomic group.
Methods
These data were obtained from intermediate steps of the phylotranscriptomic workflow available at https://github.com/lisy87/dugesia-transcriptome, which includes all the scripts and commands needed to perform every step.
After filtering, 82 samples of Dugesia species (Platyhelminthes: Tricladida: Dugesiidae) from the Western Mediterranean region were analyzed.
Usage notes
Data Description:
These data were obtained from intermediate steps of the phylotranscriptomic workflow available at https://github.com/lisy87/dugesia-transcriptome. After filtering, 82 samples of Dugesia species (Platyhelminthes: Tricladida: Dugesiidae) from the Western Mediterranean region were analyzed. All files are in FASTA format.
Groups of files:
1) *_longiso_pep.fasta
Protein sequences of the longest isoforms
These files contain the longest isoforms obtained from the TransDecoder output (*.pep), which were used as input for the ortholog searches with OrthoFinder. One file is available per sample.
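The selection of one longest isoform per gene can be sketched as follows. This is a minimal illustration, not the authors' script (their actual commands are in the GitHub repository above). It assumes Trinity-style identifiers such as TRINITY_DN10_c0_g1_i2.p1, where the text before "_i<N>" identifies the gene, and it ranks isoforms by protein length; both the header convention and the ranking criterion are assumptions.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Assumption: Trinity-style IDs, e.g. TRINITY_DN10_c0_g1_i2.p1; the text
# before "_i<N>" is treated as the gene identifier.
GENE_RE = re.compile(r"^(\S+?)_i\d+")


def longest_isoforms(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return {gene: (header, protein)}, keeping the longest protein per gene.

    On ties, the first isoform encountered is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        match = GENE_RE.match(header)
        gene = match.group(1) if match else header.split()[0]
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Toy input standing in for a TransDecoder *.pep file.
demo = """>TRINITY_DN10_c0_g1_i1.p1
MKT
>TRINITY_DN10_c0_g1_i2.p1
MKTLLV
>TRINITY_DN11_c0_g1_i1.p1
MA
""".splitlines()

for gene, (header, seq) in longest_isoforms(demo).items():
    print(gene, header, len(seq))
```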
2) OG*_SC_**_prot.fasta
OG*_SC_**_nuc.fasta
Single-copy orthogroups (SC):
These files contain the nucleotide (*_nuc.fasta) and protein (*_prot.fasta) sequences of every SC orthogroup (OG*). Each file contains one representative sequence per sample.
**: “all”, “subte”, and “etru-ligu” denote the three ortholog searches performed, which yielded 717 SC (all), 4175 SC (subte), and 1984 SC (etru-ligu), respectively.
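The naming scheme of the SC files can be checked programmatically. The sketch below is hypothetical: it assumes orthogroup IDs contain no underscore (e.g. OG0001234), that "**" is exactly one of the three search labels, and that each FASTA header starts with the sample ID. It counts nucleotide files per search, which on the full data should match the SC totals listed above, and checks that no sample appears twice within a file.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed pattern: OG<id>_SC_<search>_<prot|nuc>.fasta, with no underscore in <id>.
NAME_RE = re.compile(r"^(OG[^_]+)_SC_(all|subte|etru-ligu)_(prot|nuc)\.fasta$")

# SC totals per search, as stated in this README.
EXPECTED_SC = {"all": 717, "subte": 4175, "etru-ligu": 1984}


def parse_name(filename: str) -> Optional[Tuple[str, ...]]:
    """Return (orthogroup, search, sequence type), or None if the name does not fit."""
    match = NAME_RE.match(filename)
    return match.groups() if match else None


def count_by_search(filenames: Iterable[str], seq_type: str = "nuc") -> Counter:
    """Count SC orthogroup files per search for one sequence type."""
    counts: Counter = Counter()
    for name in filenames:
        parsed = parse_name(name)
        if parsed and parsed[2] == seq_type:
            counts[parsed[1]] += 1
    return counts


def one_sequence_per_sample(fasta_text: str) -> bool:
    """True if no sample ID repeats; assumes each header starts with the sample ID."""
    ids = [line[1:].split()[0] for line in fasta_text.splitlines() if line.startswith(">")]
    return len(ids) == len(set(ids))


demo_names = [
    "OG0000001_SC_all_nuc.fasta",
    "OG0000001_SC_all_prot.fasta",
    "OG0000002_SC_subte_nuc.fasta",
    "README.md",
]
print(count_by_search(demo_names))
print(one_sequence_per_sample(">s1\nATG\n>s2\nATG\n"))
# On the unzipped data (data_dir is a hypothetical path), one might compare:
# count_by_search(os.listdir(data_dir)) == EXPECTED_SC
```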
3) Align_Dataset_*.fasta
Final alignments of nucleotide sequences. One alignment per dataset:
Dataset 1: All samples retained after filtering (82 samples, “all” orthogroups search, 717 SC)
Dataset 2: Reduced Dataset 1 (29 samples, “all” orthogroups search, 717 SC)
Dataset 3: Reduced Dataset 1 (13 samples, “all” orthogroups search, 717 SC)
Dataset 4: Subtentaculata group (23 samples, “subte” orthogroups search, 4175 SC)
Dataset 5: Etrusca-liguriensis group (36 samples, “etru-ligu” orthogroups search, 1984 SC)
Dataset 6: Dataset 5 without samples from Berga (31 samples, “etru-ligu” orthogroups search, 1984 SC)
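Before running an analysis, it may be useful to confirm that each alignment contains the number of samples listed above and that all sequences share the same length. The following is a minimal sketch under stated assumptions: the function names are hypothetical, headers are taken to start with the sample ID, and only "-" is counted as a gap (other missing-data symbols such as "?" or "N" are ignored).

```python
from typing import Dict, List

# Number of samples per dataset, as listed in this README.
EXPECTED_SAMPLES = {1: 82, 2: 29, 3: 13, 4: 23, 5: 36, 6: 31}


def read_alignment(text: str) -> Dict[str, str]:
    """Parse aligned FASTA text into {sample ID: aligned sequence}."""
    chunks: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif line and current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


def summarize(text: str, dataset: int) -> dict:
    """Report taxa, sites and '-' gap fraction, and compare taxa with the README."""
    seqs = read_alignment(text)
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected equal-length sequences, got lengths {sorted(lengths)}")
    n_sites = lengths.pop()
    gaps = sum(s.count("-") for s in seqs.values())
    return {
        "taxa": len(seqs),
        "sites": n_sites,
        "gap_fraction": gaps / (n_sites * len(seqs)) if n_sites else 0.0,
        "matches_readme": len(seqs) == EXPECTED_SAMPLES[dataset],
    }


toy = ">s1\nAC-T\n>s2\nACGT\n"
print(summarize(toy, dataset=3))
# {'taxa': 2, 'sites': 4, 'gap_fraction': 0.125, 'matches_readme': False}
# For real data (the file name is an assumed instance of Align_Dataset_*.fasta):
# summarize(open("Align_Dataset_1.fasta").read(), dataset=1)
```

The toy alignment has only two samples, so it does not match the 13 samples listed for Dataset 3; on the released files, "matches_readme" would be expected to be True.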