Knowing the closest relatives of land plants is key to understanding the complex adaptations to terrestrial life. Unfortunately, multi-gene analyses yield highly incongruent results, suggesting, for instance, Charales, Zygnematales, or Coleochaete as the sister-group of land plants. Such controversy may reflect the real history of life, in particular closely spaced speciation events, incomplete lineage sorting, gene duplication, or horizontal gene transfer. In such cases, the solution lies in improved taxon sampling and more sophisticated models of evolution. However, we show that the quality of the data used to infer the phylogeny may also play a major role, creating unnecessary controversy. In particular, the inclusion of contaminant sequences from other species and of genes with incomplete taxon sampling explains a large part of the discrepancies observed between studies. The use of a carefully checked and almost complete dataset suggests that land plants are closely related to a group composed of Zygnematales and Coleochaetales.
Figure-S3-54
Figures S3-54: Single-gene phylogenies used to validate the contamination of charophyte sequences detected by BLAST search.
FigS3-54-Final.pdf
Figure-S55-109
Figures S55-109: Single-gene phylogenies used for the congruence test.
FigS55-109-Final.pdf
Table-S2
Table S2: Summary of the contaminations detected in the Finet et al. (2010) dataset.
Table_S2-Final.pdf
Table-S3
Table S3: Impact of taxon sampling on the GTR+G and CATGTR+G inferences.
Table_S3-Final.pdf
Complete-22360
NEXUS file: 40 taxa, 119 proteins, 22,360 unambiguously aligned amino acid positions, 11.8% missing data.
Finet-99
NEXUS file: 77 ribosomal protein alignments of Finet et al. (2010), with 99 sequences deleted. Seventy-four contaminant sequences were detected by the congruence test; 99 sequences were removed in total because in 25 cases the correct sequence could not be identified.
Not-Complete-43300
NEXUS file: 40 taxa, 164 proteins, 43,300 unambiguously aligned amino acid positions, comprising a ribosomal dataset (11,571 positions, 4.7% missing data) and a non-ribosomal dataset (31,729 positions, 24.1% missing data).
Table-S1
Table S1: Contaminations detected in the alignments of Finet et al. (2010) by BLAST search and by congruence test.
Table_S1-Final.pdf
164-single-gene-before-SCaFoS
NEXUS files: 164 orthologous genes before the use of SCaFoS (selection of sequences, species and genes).
FINET-sineContam-Trees
Trees (RAxML and PhyloBayes) inferred from the cleaned dataset (Finet-99 NEXUS file).
PhyloBayes-Trees
20 PhyloBayes trees (CATGTR+G) obtained from five different partitions (missing data) with four different taxon samples.
Phylobayes-Trees.zip
RAxML-Trees
20 RAxML best trees (GTR+G) and 20 bipartition trees (GTR+G) obtained from five different partitions (missing data) with four different taxon samples.
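
The NEXUS file descriptions above report missing data as a percentage of the alignment. The following is a minimal sketch of how such a figure could be computed or pooled across partitions. It is not the authors' procedure. The function names, the choice of '?', '-' and 'X' as missing-data symbols, and the toy alignment are all assumptions made for illustration.

```python
def missing_fraction(alignment, missing_chars="?-X"):
    """Fraction of cells in a taxon-by-position matrix that are missing.

    `alignment` maps a taxon name to its aligned sequence. Which symbols
    count as missing is an assumption here, not taken from the dataset.
    """
    total = sum(len(seq) for seq in alignment.values())
    if total == 0:
        raise ValueError("empty alignment")
    missing = sum(
        seq.upper().count(char)
        for seq in alignment.values()
        for char in missing_chars
    )
    return missing / total


def pooled_missing_percent(parts):
    """Length-weighted missing-data percentage over several partitions.

    `parts` is a list of (n_positions, percent_missing) pairs. Weighting by
    position count is only valid when every partition has the same taxa.
    """
    total_positions = sum(n for n, _ in parts)
    return sum(n * pct for n, pct in parts) / total_positions


# Hypothetical two-taxon alignment: 2 missing cells out of 8.
toy = {"taxon_A": "MK?-", "taxon_B": "MKLV"}
toy_fraction = missing_fraction(toy)

# Partition sizes and percentages as listed for Not-Complete-43300.
parts = [(11571, 4.7), (31729, 24.1)]
total_positions = sum(n for n, _ in parts)
pooled = pooled_missing_percent(parts)
```

With the listed partition sizes, `total_positions` matches the 43,300 positions given for Not-Complete-43300, and `pooled` is only an approximation because it is derived from the rounded percentages.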