Phylogenomics and pervasive genome-wide phylogenetic discordance among fin whales (Balaenoptera physalus)
Data files
Aug 21, 2024 version files: 17.86 GB
- 100k_autosomal_astral_consensus.nwk
- 100k_X_chromosome_astral_consensus.nwk
- 100k_Y_chromosome_astral_consensus.nwk
- 100kbp_windows_autosomal_results_final.tsv
- 1mb_autosomal_astral_consensus.nwk
- 1mb_X_chromosome_astral_consensus.nwk
- 1mbp_windows_autosomal_results_final.tsv
- 50kb_autosomal_astral_consensus.nwk
- 50kb_windows_autosomal_results_final.tsv
- 50kb_X_chromosome_astral_consensus.nwk
- 50kb_Y_chromosome_astral_consensus.nwk
- all_mitochondrial_200.fa
- all_mitochondrial_200.nwk
- chrom_y.min4.phy.gz
- chromX.concat.support.nwk
- chromX.min4.phy.gz
- chromY.concat.support.nwk
- final_phylogeny_auto_filt_snps.min4.phy.gz
- only_new_mitochondria_alignment.fa
- only_new_mitochondria_alignment.nwk
- README.md
- whole_autosomal.concat.support.nwk
Abstract
Phylogenomics has the power to uncover complex phylogenetic scenarios across the genome. In most cases, no single topology is reflected across the entire genome, because the phylogenetic signal differs among genomic regions due to processes such as introgression and incomplete lineage sorting. Baleen whales are among the largest vertebrates on Earth, with a high dispersal potential in a relatively unrestricted habitat: the oceans. The fin whale (Balaenoptera physalus) is one of the most enigmatic baleen whale species and is currently divided into four subspecies. Whether phylogeographic patterns explain taxonomic variation in fin whales has been a matter of debate. Here we present a chromosome-level, whole-genome analysis of the phylogenetic relationships among fin whales from multiple ocean basins. First, we estimated concatenated and consensus phylogenies for both the mitochondrial and nuclear genomes. The consensus phylogenies based on the autosomal genome uncovered monophyletic clades associated with each ocean basin, consistent with the current understanding of subspecies division. Nevertheless, we detected discordances among the phylogenies based on the Y chromosome, the mitochondrial genome, the autosomal genome and the X chromosome. Furthermore, we detected signs of introgression and pervasive phylogenetic discordance across the autosomal genome. This complex phylogenetic scenario could be explained by a puzzle of introgressive events not yet documented in fin whales. Alternatively, incomplete lineage sorting and low phylogenetic signal could equally be the mechanisms leading to such phylogenetic discordances. Our study highlights the pitfalls of relying on concatenated or single-locus phylogenies to determine taxonomic relationships below the species level, by illustrating underlying nuances that some phylogenetic approaches may fail to capture.
We emphasize the significance of accurate taxonomic delineation in fin whales by exploring crucial information revealed through genome-wide assessments.
README: Phylogenomics and pervasive genome-wide phylogenetic discordance among fin whales (Balaenoptera physalus)
https://doi.org/10.5061/dryad.v6wwpzh24
Description of the data and file structure
DATA OVERVIEW
1. Alignments
File: final_phylogeny_auto_filt_snps.min4.phy.gz
Description: Whole autosomal genome alignment containing both monomorphic sites and SNPs.
Format: PHYLIP (gzip-compressed)
File: chromX.min4.phy.gz
Description: X chromosome alignment containing both monomorphic sites and SNPs.
Format: PHYLIP (gzip-compressed)
File: chrom_y.min4.phy.gz
Description: Y chromosome alignment containing both monomorphic sites and SNPs.
Format: PHYLIP (gzip-compressed)
File: all_mitochondrial_200.fa
Description: Whole mitochondrial genome alignment including both newly generated and published sequences.
Format: FASTA
File: only_new_mitochondria_alignment.fa
Description: Whole mitochondrial genome alignment including only the 38 newly generated sequences obtained from whole genomes.
Format: FASTA
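The nuclear alignments above are gzip-compressed PHYLIP files. As a minimal sketch for a first inspection in Python, the helpers below read one of them and count variable sites. They assume the relaxed sequential PHYLIP layout (a header line giving the number of sequences and sites, then one `name sequence` line per sample, with no spaces in names); that layout is an assumption about these files, and interleaved PHYLIP would need a different parser. `read_phylip` and `count_variable_sites` are hypothetical helpers, not part of the authors' scripts.

```python
import gzip
from typing import Dict, Tuple


def read_phylip(path: str) -> Tuple[Dict[str, str], int]:
    """Read a (gzipped) relaxed sequential PHYLIP file into {name: sequence}.

    Assumption: after the header, each line is 'name<whitespace>sequence'.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        ntax, nchar = (int(x) for x in handle.readline().split()[:2])
        seqs: Dict[str, str] = {}
        for line in handle:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # Join remaining fields in case the sequence is split into blocks.
            seqs[parts[0]] = "".join(parts[1:]).upper()
    if len(seqs) != ntax:
        raise ValueError(f"expected {ntax} sequences, found {len(seqs)}")
    return seqs, nchar


def count_variable_sites(seqs: Dict[str, str]) -> int:
    """Count columns carrying more than one unambiguous nucleotide (A/C/G/T)."""
    variable = 0
    for column in zip(*seqs.values()):
        states = {base for base in column if base in "ACGT"}
        if len(states) > 1:
            variable += 1
    return variable
```

For example, `seqs, nchar = read_phylip("chromX.min4.phy.gz")` followed by `count_variable_sites(seqs)` would report how many columns carry more than one unambiguous nucleotide. Because the whole alignment is held in memory, this suits the X and Y chromosome alignments; the autosomal file may call for a streaming approach.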
2. Output trees
Concatenated Data Trees
File: all_mitochondrial_200.nwk
Description: Best-fitting tree topology, with support values, estimated from the whole mitochondrial alignment including newly generated and published sequences.
Format: Newick
File: only_new_mitochondria_alignment.nwk
Description: Best-fitting tree topology, with support values, estimated from the whole mitochondrial alignment including only the newly generated sequences (from whole genomes).
Format: Newick
File: whole_autosomal.concat.support.nwk
Description: Best-fitting (maximum-likelihood) tree topology, with support values, estimated from the concatenated whole autosomal genome alignment.
Format: Newick
File: chromX.concat.support.nwk
Description: Best-fitting tree topology, with support values, estimated from the concatenated X chromosome alignment.
Format: Newick
File: chromY.concat.support.nwk
Description: Best-fitting tree topology, with support values, estimated from the concatenated Y chromosome alignment.
Format: Newick
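The tree files are Newick strings in which internal node labels hold support values. A minimal standard-library sketch for listing the sampled individuals (tip labels) in one of these trees follows; it assumes unquoted labels and a single tree per file, and `tip_labels` and `read_tips` are hypothetical helpers rather than part of the authors' scripts. Dedicated libraries such as Biopython's `Bio.Phylo` are more robust for real work.

```python
import re
from typing import List

# A tip label is a name directly following "(" or ","; internal node labels
# (e.g. support values) follow ")" instead and are therefore not matched.
_TIP = re.compile(r"[(,]\s*([^(),:;\s]+)")


def tip_labels(newick: str) -> List[str]:
    """Return tip labels, in order, from a Newick string with unquoted labels."""
    return _TIP.findall(newick)


def read_tips(path: str) -> List[str]:
    """Read a single-tree Newick file and return its tip labels."""
    with open(path) as handle:
        return tip_labels(handle.read())
```

Comparing `set(read_tips(...))` across two of the tree files is a quick way to check that they contain the same individuals before comparing topologies.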
3. Consensus Data Trees (ASTRAL)
Description: ASTRAL consensus trees for each window-based analysis (50 kb, 100 kb and 1 Mb windows).
Files:
100k_Y_chromosome_astral_consensus.nwk
100k_X_chromosome_astral_consensus.nwk
100k_autosomal_astral_consensus.nwk
50kb_Y_chromosome_astral_consensus.nwk
50kb_X_chromosome_astral_consensus.nwk
50kb_autosomal_astral_consensus.nwk
1mb_X_chromosome_astral_consensus.nwk
1mb_autosomal_astral_consensus.nwk
Format: Newick
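A recurring question for these consensus trees is whether the samples from one ocean basin form a clade. The sketch below checks this for a single Newick tree; it is an illustration, not necessarily how the authors computed monophyly. It assumes unquoted tip labels, treats the tree as unrooted, and roots it implicitly on a single outgroup tip (the label `MNG` and the sample names in the example are hypothetical). Any written node splits its tips from the rest, so a group that excludes the outgroup is a clade after outgroup rooting if it, or its complement, matches one of those nodes.

```python
import re
from typing import FrozenSet, List, Set


def clade_tip_sets(newick: str) -> List[FrozenSet[str]]:
    """Tip set of every node (single tips and internal clades) in a Newick string.

    Bracketed comments and branch lengths are stripped; labels written after
    ")" (e.g. support values) are ignored.
    """
    text = re.sub(r"\[[^\]]*\]", "", newick.strip().rstrip(";"))
    text = re.sub(r":[^,();]+", "", text)  # drop branch lengths
    stack: List[Set[str]] = []
    clades: List[FrozenSet[str]] = []
    label, after_close = "", False
    for ch in text:
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            name = label.strip()
            if name and not after_close:  # a tip label, not a support value
                stack[-1].add(name)
                clades.append(frozenset([name]))
            label, after_close = "", False
            if ch == ")":
                members = stack.pop()
                clades.append(frozenset(members))
                if stack:
                    stack[-1] |= members
                after_close = True
        else:
            label += ch
    return clades


def is_monophyletic(newick: str, group: Set[str], outgroup: str) -> bool:
    """True if `group` forms a clade once the tree is rooted on `outgroup`."""
    clades = clade_tip_sets(newick)
    all_tips = max(clades, key=len)
    target = frozenset(group)
    if not target or outgroup in target or outgroup not in all_tips or not target <= all_tips:
        raise ValueError("group must be a non-empty set of tips excluding the outgroup")
    complement = all_tips - target
    return any(clade in (target, complement) for clade in clades)
```

For instance, with the hypothetical tree `((NAT_1,NAT_2)98,(SOU_1,SOU_2)100,MNG);`, `is_monophyletic(tree, {"NAT_1", "NAT_2"}, "MNG")` is true, while `{"NAT_1", "SOU_1"}` is not a clade.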
4. Data Tables:
File: 1mbp_windows_autosomal_results_final.tsv
Description: Results for 1 Mbp genomic windows, including: ML trees with support values; window start and end positions on the chromosome; D-statistics, Z-scores, p-values and ABBA, BABA and BBAA counts from the Dsuite analysis; monophyly test results (TRUE or FALSE) based on ocean-basin origin; and adjusted p-values (Adjusted_P_Value).
Format: Tab-delimited
File: 100kbp_windows_autosomal_results_final.tsv
Description: Results for 100 kbp genomic windows, including: ML trees with support values; window start and end positions on the chromosome; D-statistics, Z-scores, p-values and ABBA, BABA and BBAA counts from the Dsuite analysis; monophyly test results (TRUE or FALSE) based on ocean-basin origin; and adjusted p-values (Adjusted_P_Value).
Format: Tab-delimited
File: 50kb_windows_autosomal_results_final.tsv
Description: Results for 50 kbp genomic windows, including: ML trees with support values; window start and end positions on the chromosome; D-statistics, Z-scores, p-values and ABBA, BABA and BBAA counts from the Dsuite analysis; and adjusted p-values (Adjusted_P_Value).
Format: Tab-delimited
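The window tables report Dsuite D-statistics together with the raw ABBA, BABA and BBAA counts. For orientation, Patterson's D is computed from the ABBA and BABA counts as D = (ABBA - BABA) / (ABBA + BABA); Dsuite derives its Z-scores and p-values from a block jackknife, which this sketch does not reproduce. The README does not state which multiple-testing correction produced `Adjusted_P_Value`, so the Benjamini-Hochberg procedure below is shown purely as an assumed example. Apart from `Adjusted_P_Value`, the column names used in the example are hypothetical, and non-numeric entries (e.g. `NA`) are treated as missing.

```python
import csv
from typing import Dict, List, Optional


def patterson_d(abba: float, baba: float) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA); NaN if both counts are 0."""
    total = abba + baba
    return (abba - baba) / total if total > 0 else float("nan")


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def _as_float(value: Optional[str]) -> float:
    """Parse a table cell, mapping missing or non-numeric values to NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


def significant_windows(path: str, alpha: float = 0.05) -> List[Dict[str, str]]:
    """Rows of a tab-delimited window table with Adjusted_P_Value below alpha."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # NaN < alpha is False, so rows with missing values are dropped.
    return [row for row in rows if _as_float(row.get("Adjusted_P_Value")) < alpha]
```

For example, `len(significant_windows("1mbp_windows_autosomal_results_final.tsv"))` would count the 1 Mbp windows with an adjusted p-value below 0.05, assuming the file has a single header row.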
5. Abbreviations:
SOU: Southern Ocean
NAT: North Atlantic
NOP: North Pacific
ANT: Antarctica
ENP: Eastern North Pacific
MNG: Megaptera novaeangliae
ACCESS INFORMATION
Published genomes used in this study are available on:
[https://www.ncbi.nlm.nih.gov/bioproject/PRJNA74029](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA74029)
[https://datadryad.org/stash/dataset/doi:10.5061/dryad.qt528n0](https://datadryad.org/stash/dataset/doi:10.5061/dryad.qt528n0)
[https://www.ncbi.nlm.nih.gov/sra/SRX323050](https://www.ncbi.nlm.nih.gov/sra/SRX323050)
SCRIPTS AND CODE
Scripts are available at:
https://github.com/fabriciofurni/phylo_fin/tree/main