Plant pan-genomes are highly vulnerable to methodological considerations


Glick, Lior (2022), Plant pan-genomes are highly vulnerable to methodological considerations, Dryad, Dataset,


Pan-genomics is a promising approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data to a single reference genome, the construction of a pan-genome involves the direct comparison of multiple genomes to one another, thereby enabling the detection of nonreference genomic sequences and genes, as well as the analysis of gene content diversity. While multiple studies describing pan-genomes of various plant species have been published in recent years, the effect of the construction methodology is still poorly understood. Here we examine the effect of several key methodological factors on the obtained nonreference gene pool and on gene presence-absence detections by constructing and comparing multiple pan-genomes of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published pan-genomes. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe extreme differences between pan-genomes constructed using the two common procedures (De novo assembly and Map-to-pan), and find that the results depend on the extent of the input data. Specifically, we report low overlap between nonreference gene pools and low agreement in gene content inferences obtained using different procedures and input data. Our results raise questions regarding the reliability of recently published pan-genomes and indicate that current inferences should be treated with great caution.
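As a rough illustration of what "overlap between nonreference gene pools" and "agreement in gene content inferences" can mean in practice, the sketch below computes a Jaccard index between two gene sets and the fraction of matching presence-absence calls between two tables. It is not the procedure used in the study: the function names, the gene and sample identifiers, and the assumption that genes from different pan-genomes can be matched by a shared ID are all hypothetical (in practice, nonreference genes from independently built pan-genomes typically have to be matched by sequence similarity first).

```python
def jaccard(genes_a, genes_b):
    """Jaccard index of two gene ID collections: |A and B| / |A or B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    # Two empty pools are treated as identical (an arbitrary convention).
    return len(a & b) / len(union) if union else 1.0


def pav_agreement(pav_a, pav_b):
    """Fraction of identical presence-absence calls between two tables.

    Each table maps gene ID -> {sample ID -> 0 or 1}. Only (gene, sample)
    pairs present in both tables are compared. Returns None if the tables
    share no pairs.
    """
    shared = [
        (gene, sample)
        for gene in pav_a if gene in pav_b
        for sample in pav_a[gene] if sample in pav_b[gene]
    ]
    if not shared:
        return None
    same = sum(pav_a[g][s] == pav_b[g][s] for g, s in shared)
    return same / len(shared)


# Hypothetical nonreference gene pools from two construction procedures.
nonref_de_novo = {"g1", "g2", "g3", "g4"}
nonref_map_to_pan = {"g3", "g4", "g5"}
overlap = jaccard(nonref_de_novo, nonref_map_to_pan)

# Hypothetical presence (1) / absence (0) calls for the shared genes.
pav_de_novo = {"g3": {"s1": 1, "s2": 0}, "g4": {"s1": 1, "s2": 1}}
pav_map_to_pan = {"g3": {"s1": 1, "s2": 1}, "g4": {"s1": 1, "s2": 1}}
agreement = pav_agreement(pav_de_novo, pav_map_to_pan)

print(f"Jaccard overlap: {overlap:.2f}")
print(f"PAV agreement: {agreement:.2f}")
```

In this toy example the two pools share two of five distinct genes, and three of the four comparable presence-absence calls match. Other measures (for example, overlap relative to the smaller pool) would be equally reasonable choices.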

This DRYAD data set contains all pan-genomes (A. thaliana and G. max) constructed as part of this study. See further details regarding the data in the DRYAD_README.txt file.

Usage Notes

Please see the DRYAD_README.txt file for details.