Supplementary Information for: A cautionary note on the use of genotype callers in phylogenomics
Duchen, Pablo; Salamin, Nicolas (2020), Supplementary Information for: A cautionary note on the use of genotype callers in phylogenomics, Dryad, Dataset, https://doi.org/10.5061/dryad.fn2z34ts3
Next-generation-sequencing genotype callers are commonly used to call variants from newly sequenced species. However, given the currently limited availability of genomic resources, it is still common practice to use only one reference genome for a given genus, or even one reference for an entire clade of a higher taxon. Traditional genotype callers, such as the one from GATK, are optimized for variant calling at the population level, and when they are used at the phylogenetic level, the consequences for downstream analyses can be substantial. Here, we performed simulations to compare the performance of the genotype callers of GATK and ATLAS, and present their differences at various phylogenetic scales. We show that the genotype caller of GATK substantially underestimates the number of variants at the phylogenetic level, but not at the population level. We also found that the accuracy of heterozygote calls declines with increasing distance to the reference genome. We quantified this decline and found that it is very sharp in GATK, while ATLAS maintains high accuracy even for species that are moderately divergent from the reference. We further suggest that efforts should be made to acquire more reference genomes per species before pursuing large-scale phylogenomic studies.