Skip to main content

Data from: Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants

Cite this dataset

Lopez, Lua; Barreiro, Rodolfo; Fischer, Markus; Koch, Marcus A. (2016). Data from: Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants [Dataset]. Dryad.


Background: Simple Sequence Repeats (SSRs) are widely used in population genetic studies but their classical development is costly and time-consuming. The ever-increasing available DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSRs discovery. Expressed Sequence Tags (ESTs) have been widely used as SSR source for plants of economic relevance but their application to non-model species is still modest. Methods: Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information-NCBI) for SSRs development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also search two model genera with fully annotated genomes for EST-SSRs, Arabidopsis and Oryza, and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera which were mined for SSRsand their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by blasting the sequences with SSR against the Oryza sativa and Arabidopsis thaliana reference genomes implemented in the Basal Local Alignment Tool (BLAST) of the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. Results: We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identify a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types but the abundance of the various types of repeat differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions while dinucleotide repeats were largely associated with untranslated regions. Our results from the empirical test revealed considerable amplification success and transferability between congenerics. Conclusions: The present work represents the first large-scale study developing SSRs by utilizing publicly accessible EST databases in threatened plants. Here we provide a very large number of ready-to-test EST-SSR (17 102) for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons they might be useful in evolutionary and conservation studies. Altogether, our study highly supports the use of EST databases as an extremely affordable and fast alternative for SSR developing in threatened plants.

Usage notes