Data from: Frequent allopolyploidy with distant progenitors in the moss genera Physcomitrium and Entosthodon (Funariaceae) identified via subgenome phasing of targeted nuclear genes
Data files
Version: Sep 19, 2023 (38.17 MB total)
- best.models.iqtree.txt (3.98 KB)
- collenchymatum_4780.fasta (2.46 KB)
- collenchymatum_5274_plastid_rps4_psba.fasta (1.24 KB)
- Collenchymatum_7379.fasta (1.92 KB)
- Folder_1_ALIGNMENTS.zip (1.17 MB)
- Folder_2_trimal_50genes.zip (992.03 KB)
- Folder_3_GENE_TREES.zip (99.41 KB)
- Folder_4_physco.50genes.zip (9.30 MB)
- Folder_5_phased_subgenomes.zip (7.06 MB)
- Folder_6_rerooted.zip (97.26 KB)
- Folder_7_treeshrink.zip (228.12 KB)
- Folder_8_Phyparts.zip (47.74 KB)
- Folder_9_triploid_homologizer.zip (19.13 MB)
- Funariaceae4loci_reduced_5274.nex.con.tre (25.21 KB)
- homologizing.sh (2.04 KB)
- README.md (5.15 KB)
- README.txt (1.72 KB)
Abstract
Polyploids represent a new frontier in species discovery among embryophytes. Within mosses, polyploid discovery is challenged by low morphological complexity. The rapid expansion of sub-genome sequencing approaches, together with computational methods that identify whole-genome duplication from allelic variation among nuclear markers, has increased polyploid discovery among mosses. We confirm the intergeneric hybrid nature of Entosthodon hungaricus and the allopolyploid origin of Physcomitrium eurystomum and of one population of P. collenchymatum. We also reveal that hybridization gave rise to P. immersum, as well as to as yet unrecognized lineages sharing the phenotype of P. pyriforme, P. sphaericum and P. collenchymatum. Our findings and methods demonstrate the utility of homologizer, a novel approach to allele phasing and subgenome assignment in polyploid genomes, and its value for identifying progenitor species from target capture data.
https://doi.org/10.5061/dryad.0p2ngf23f
Description of the data and file structure
Raw target capture data are available from the NCBI Sequence Read Archive (SRA) under BioProject PRJNA674709: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA674709
Folder 1: Alignments for each of the 50 genes sampled for all taxa, including individual sequences for each phased allele, labeled h1 or h2.
Folder 2: Trimmed alignments for each of the 50 genes sampled for all taxa, including individual sequences for each phased allele, labeled h1 or h2.
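Because the phased alleles in Folders 1 and 2 are distinguished only by their h1/h2 labels, a common first step when reusing these alignments is to separate the two allele sets. The sketch below assumes, hypothetically, that the label is a trailing "_h1" or "_h2" on the FASTA header; check the actual header format in the alignment files before relying on it. The taxon names in the example are placeholders, not records from the dataset.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_by_allele(records, suffixes=("_h1", "_h2")):
    """Group records by an assumed phased-allele suffix; others go to 'unphased'."""
    groups = defaultdict(list)
    for header, seq in records:
        label = next((s.lstrip("_") for s in suffixes if header.endswith(s)), "unphased")
        groups[label].append((header, seq))
    return dict(groups)


# Placeholder records, not taken from the dataset.
example = """>taxonA_h1
ACGT-A
>taxonA_h2
ACGTTA
>taxonB
ACGT-T
"""
groups = split_by_allele(read_fasta(example))
print(sorted(groups))  # ['h1', 'h2', 'unphased']
```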
Folder 3: Gene trees for each of the 50 sampled genes, inferred with RAxML and rerooted using the Physcomitrellopsis clade as the outgroup.
Folder 4: Homologizer output
- genelist.txt: list of the 50 genes used in the Homologizer analysis
- InitialPhase.rev and Phasemoves.rev: files used to direct Homologizer within the RevBayes script revbayes_template.txt
- physco.50genes_geneNN_phase.log: Homologizer phase output for each gene, one line per sampled generation
- physco.50genes.log: RevBayes output for substitution model parameters, one line per generation
- physco.50genes.trees: RevBayes phylogeny output for each generation
- physco.50genes.tree: maximum clade credibility tree summarized from the RevBayes output, with the first 10% of samples discarded as burn-in
- physco_genecopymap.csv: file used to generate the phase moves used by RevBayes
- log files recording each iteration of subgenome phasing for each of the 50 genes listed in genelist.txt
- the homologizing.sh batch file
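The 10% burn-in mentioned for physco.50genes.tree means the first tenth of the sampled generations is dropped before any summary is computed. A minimal sketch of that step for a parameter log, assuming a tab-delimited file with a header row and an "Iteration" column; the column name "alpha" and all values are hypothetical, and this is not the RevBayes summary command used for the dataset.

```python
import csv
import io
import statistics


def read_log(text):
    """Parse a tab-delimited MCMC log with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def discard_burnin(rows, fraction=0.10):
    """Drop the first `fraction` of sampled generations as burn-in."""
    return rows[int(len(rows) * fraction):]


def posterior_mean(rows, column):
    """Mean of one numeric column over the retained samples."""
    return statistics.mean(float(row[column]) for row in rows)


# Ten made-up samples; the first (9.0) stands in for an unconverged start.
values = [9.0, 1.0, 1.5, 0.5, 1.0, 1.25, 0.75, 1.0, 1.0, 1.0]
log_text = "Iteration\talpha\n" + "".join(f"{i * 100}\t{v}\n" for i, v in enumerate(values))
kept = discard_burnin(read_log(log_text))
print(len(kept), posterior_mean(kept, "alpha"))  # 9 1.0
```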
Folder 5: Phased subgenomes for each of the 50 genes, labeled “h1” or “h2” for the two subgenomes of each allodiploid genome. Phase assignments were taken from the Homologizer output from RevBayes when the posterior probability of the phase was greater than 80%.
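The 80% rule amounts to tallying how often each phase assignment was sampled and keeping the most frequent one only if its frequency exceeds 0.8. A minimal sketch, assuming each sampled generation's phase has already been reduced to a single string; the "A=h1,B=h2" encoding is hypothetical and is not the column layout of the *_phase.log files. How phases below the threshold were handled is not described here, so the sketch simply returns None for them.

```python
from collections import Counter


def phase_support(samples):
    """Return the most frequently sampled phase and its posterior frequency."""
    counts = Counter(samples)
    best, n = counts.most_common(1)[0]
    return best, n / len(samples)


def accept_phase(samples, threshold=0.80):
    """Keep the top phase only if its posterior probability exceeds the threshold."""
    best, prob = phase_support(samples)
    return (best if prob > threshold else None), prob


# Hypothetical post-burn-in samples for one allodiploid at one gene.
confident = ["A=h1,B=h2"] * 9 + ["A=h2,B=h1"]
ambiguous = ["A=h1,B=h2"] * 7 + ["A=h2,B=h1"] * 3
print(accept_phase(confident))  # ('A=h1,B=h2', 0.9)
print(accept_phase(ambiguous))  # (None, 0.7)
```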
Folder 6: Rerooted Homologizer trees. Gene trees containing the haploid samples and the allopolyploids with phased subgenome sequences.
Folder 7: Log files and output from TreeShrink.
In the output trees, sequences on unrealistically long branches have been removed.
subgenomePhased_genetrees_rerooted_collapsed_RS_0.05.txt contains a summary of the removed sequences.
Folder 7 also contains node-by-node PhyParts output for the ASTRAL species tree inferred from the TreeShrink gene trees; this output was used to generate the pie charts in pies.svg.
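To see which samples were pruned most often across the 50 gene trees, the removal summary can be tallied per sequence name. This sketch assumes, without having checked the file, one line per gene tree with removed sequence names separated by tabs (blank when nothing was removed); the names are placeholders.

```python
from collections import Counter


def count_removed(summary_text):
    """Tally removed sequence names, assuming one tab-separated line per gene tree."""
    counts = Counter()
    for line in summary_text.splitlines():
        counts.update(name.strip() for name in line.split("\t") if name.strip())
    return counts


# Placeholder summary: gene 1 lost two sequences, gene 2 none, gene 3 one.
summary = "taxonA_h2\ttaxonC\n\ntaxonA_h2\n"
print(count_removed(summary).most_common())  # [('taxonA_h2', 2), ('taxonC', 1)]
```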
Folder 8: PhyParts input
- subgenomePhased_genetrees_rerooted_collapsed.tre: gene trees in which nodes with low bootstrap support are collapsed
- subgenomePhased_astral_collapsed.tre: ASTRAL tree inferred from the collapsed gene trees
- subgenomePhased_astral_rerooted.tre: ASTRAL tree rerooted on the outgroup
PhyParts output can be found in Folder 7.
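Collapsing low-support nodes replaces each internal branch whose bootstrap value falls below a threshold with a polytomy. The threshold used for this dataset is not stated, so the value 10 below is purely illustrative. This is a minimal standard-library sketch, assuming support values are stored as internal node labels (as in RAxML bipartition output) and that the Newick has no quoted labels or [comments]; it is not necessarily how the dataset's collapsed trees were produced. Spliced-up children inherit the removed branch's length, so root-to-tip distances are preserved.

```python
import re


class Node:
    """Minimal rooted tree node: label, branch length, children."""

    def __init__(self):
        self.label = ""
        self.length = None
        self.children = []


def parse_newick(text):
    """Parse plain Newick (no quoted labels or [comments]) into Node objects."""
    tokens = re.findall(r"[(),;:]|[^(),;:\s]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def subtree():
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1
            node.children.append(subtree())
            while peek() == ",":
                pos += 1
                node.children.append(subtree())
            pos += 1  # consume ")"
        if peek() not in ("(", ")", ",", ";", ":", None):
            node.label = tokens[pos]  # leaf name or internal support value
            pos += 1
        if peek() == ":":
            node.length = float(tokens[pos + 1])
            pos += 2
        return node

    return subtree()


def collapse_low_support(node, threshold):
    """Post-order: splice out internal nodes whose support is below threshold."""
    kept = []
    for child in node.children:
        collapse_low_support(child, threshold)
        if child.children and child.label and float(child.label) < threshold:
            for grandchild in child.children:
                if grandchild.length is not None and child.length is not None:
                    grandchild.length += child.length  # keep root-to-tip distance
                kept.append(grandchild)
        else:
            kept.append(child)
    node.children = kept
    return node


def to_newick(node):
    """Serialize a Node back to Newick (without the trailing semicolon)."""
    text = ""
    if node.children:
        text += "(" + ",".join(to_newick(c) for c in node.children) + ")"
    text += node.label
    if node.length is not None:
        text += ":" + repr(node.length)  # repr() keeps full float precision
    return text


tree = parse_newick("((A:1,B:1)95:0.5,(C:1,D:1)5:0.5);")
# 10 is an illustrative threshold, not the value used for this dataset.
print(to_newick(collapse_low_support(tree, 10)) + ";")
# ((A:1.0,B:1.0)95:0.5,C:1.5,D:1.5);
```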
Folder 9: Triploid Homologizer
“Iteration Two” from the manuscript.
RevBayes log files for 10 independent runs in which two putatively triploid samples were coded as having three subgenomes, while the subgenome phases inferred in “Iteration One” (allodiploids only) were held fixed.
The astral subfolder contains:
- gene trees in which sequences are phased to subgenomes according to the RevBayes output
- ASTRAL tree inferred from these gene trees
- PhyParts output summarizing bipartition conflict among gene trees
- results of the “minority report” for alternative bipartitions at node 15 and node 34 (see tree_nodes.pdf for numbered nodes)
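Coding a sample with three subgenomes enlarges the space of phase assignments: for each gene, three recovered copies can be assigned to three subgenome labels in 3! = 6 ways, versus 2 ways for an allodiploid. The snippet only enumerates that space to make the combinatorics concrete; it assumes exactly one copy per subgenome and does not reproduce homologizer's MCMC sampling. Copy and label names are placeholders.

```python
from itertools import permutations
from math import factorial


def phase_assignments(copies, subgenomes):
    """Enumerate every one-to-one assignment of gene copies to subgenome labels."""
    return [dict(zip(subgenomes, perm)) for perm in permutations(copies)]


# Placeholder copy and subgenome names.
triploid = phase_assignments(["copy1", "copy2", "copy3"], ["h1", "h2", "h3"])
diploid = phase_assignments(["copy1", "copy2"], ["h1", "h2"])
print(len(diploid), len(triploid), factorial(3))  # 2 6 6
print(triploid[1])  # {'h1': 'copy1', 'h2': 'copy3', 'h3': 'copy2'}
```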
Collenchymatum_7379.fasta - Sanger sequences in FASTA format for three Physcomitrium collenchymatum accessions, containing forward and reverse reads covering approximately 360 bp of gene 7379.
collenchymatum_4780.fasta - Sanger sequences in FASTA format for three Physcomitrium collenchymatum accessions, containing forward and reverse reads covering approximately 360 bp of gene 4780.
collenchymatum_5274_plastid_rps4_psba.fasta - Sanger sequences in FASTA format for one Physcomitrium collenchymatum accession (5274), containing consensus sequences for the plastid markers psbA and rps4.
Funariaceae4loci_reduced_5274.nex.con.tre - Bayesian phylogeny based on plastid markers that includes Physcomitrium collenchymatum 5274, used to place its maternal genome within the Funariaceae.
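Comparing a forward Sanger read with a reverse read generally requires reverse-complementing the reverse read first, since it was sequenced from the opposite strand. Whether the reverse reads in the gene 7379 and 4780 files are stored as sequenced or already reverse-complemented is not stated, so this is a sketch of the general operation only; it handles IUPAC ambiguity codes and leaves gaps unchanged.

```python
# IUPAC complements; characters not listed (e.g. S, W, gaps) map to themselves.
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn", "TGCAYRMKVHDBNtgcayrmkvhdbn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence with IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACGTTG"))  # CAACGTT
print(reverse_complement("ACGR"))  # YCGT
```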
Code/Software
homologizing.sh - example command lines used to collect sequences and add the polyploid samples to the existing alignments, conduct subgenome phasing in homologizer, recover the phased subgenome sequences, and infer ASTRAL species trees
best.models.iqtree.txt - list of the best-fit substitution models, selected by BIC, for each gene tree
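BIC (Bayesian information criterion) scores each candidate model as k ln(n) - 2 ln(L), where L is the maximized likelihood, k the number of free parameters and n the number of alignment sites; the model with the lowest score is chosen, so a richer model wins only if its likelihood gain outweighs its parameter penalty. A sketch of that selection rule with invented numbers; it does not parse best.models.iqtree.txt or reproduce IQ-TREE's parameter counting.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """candidates maps model name -> (log-likelihood, number of free parameters)."""
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Made-up fits for a 1000-site alignment, for illustration only.
fits = {"JC": (-5200.0, 10), "HKY+G4": (-5000.0, 15), "GTR+G4": (-4995.0, 19)}
choice, scores = best_model(fits, n_sites=1000)
print(choice)  # HKY+G4
```

Here GTR+G4 has the highest likelihood, but its four extra parameters cost more than its likelihood gain, so HKY+G4 is selected.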