MicrosatNavigator: Exploring nonrandom distribution and lineage-specificity of microsatellite repeat motifs on vertebrate sex chromosomes across 186 whole genomes

Rasoarahona, Ryan; Srikulnath, Kornsorn

Published Aug 26, 2025 on Dryad. https://doi.org/10.5061/dryad.qz612jmm7

Data files

Aug 26, 2025 version files: 6.17 MB

README.md (2.86 KB)
Supplementary_File_1_GC_Category_Python_script.zip (636 B)
Supplementary_File_2_Simulated_fasta_files.zip (384.15 KB)
Supplementary_table_1–14.zip (5.78 MB)

Abstract

Microsatellites are short tandem DNA repeats, ubiquitous in genomes. They are believed to be under selection pressure, considering their high distribution and abundance beyond chance or random accumulation. However, limited analysis of microsatellites in single taxonomic groups makes it challenging to understand their evolutionary significance across taxonomic boundaries. Despite abundant genomic information, microsatellites have been studied in limited contexts and within a few species, warranting an unbiased examination of their genome-wide distribution in distinct versus closely related clades. Large-scale comparisons have revealed relevant trends, especially in vertebrates. Here, “MicrosatNavigator,” a new tool that allows quick and reliable investigation of perfect microsatellites in DNA sequences, was developed. This tool can identify microsatellites across entire genome sequences, considering their evolutionary variability and the simplicity of their evolutionary mechanisms. Using this tool, microsatellite repeat motifs in sex chromosomes of 186 vertebrates were identified. A significant positive correlation was noted between the abundance, density, length, and GC bias of microsatellites and specific lineages. The (AC)_n motif is the most prevalent in vertebrate genomes, showing distinct patterns in closely related species. Longer microsatellites were observed on sex chromosomes in birds and mammals but not on autosomes. Microsatellites on sex chromosomes of non-fish vertebrates have the lowest GC content, whereas high-GC microsatellites (≥50% GC) are preferred in bony and cartilaginous fishes. Thus, similar selective forces and mutational processes may constrain GC-rich microsatellites to different clades. These findings should facilitate investigations into the roles of microsatellites in sex chromosome differentiation and provide candidate microsatellites for functional analysis across the vertebrate evolutionary spectrum.

Dataset DOI: 10.5061/dryad.qz612jmm7

Description of the data and file structure

The dataset contains all density and abundance data on the distribution of microsatellites in 186 whole genome sequences. FASTA sequences were collected from public databases (NCBI, DDBJ, ENA) and then processed with MicrosatNavigator. The resulting microsatellite data are organized in TSV and Excel files.
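To make the term "perfect microsatellite" concrete, the following is a minimal sketch of how uninterrupted tandem repeats of 1 to 6 bp motifs can be located in a DNA string with regular-expression backreferences. It is not MicrosatNavigator's algorithm; the function name and the minimum repeat counts are assumptions chosen for illustration only.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp).
# These are illustrative values, not MicrosatNavigator's settings.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_perfect_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples, 0-based, end exclusive."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n,} requires n further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ACAC",
            # which is already reported under the dinucleotide class as "AC".
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    print(find_perfect_microsatellites("TTG" + "AC" * 8 + "GGT"))
```

Because the backreference demands exact copies of the captured motif, any interruption ends the match, which is what distinguishes perfect repeats from imperfect or compound ones.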

The dataset contains:

File S1: Supplementary_File_1_GC_Category_Python_script.zip: Python script that assigns microsatellites to GC% categories
File S2: Supplementary_File_2_Simulated_fasta_files.zip: Simulated FASTA dataset used for benchmark testing of MicrosatNavigator

Supplementary_table_1–14.zip:

Table S1: List of genome assemblies that can be downloaded from public data repositories, such as NCBI, DDBJ, and ENA. All assemblies are at least chromosome-level
Table S2: Relative abundance and relative density of microsatellite motif repeats
Table S3: Relative density and fold difference of microsatellite motif repeats in autosomes and sex chromosomes
Table S4: Summary of the top-five most significant microsatellites in the X, Y, Z, W, and autosomes
Table S5: Assessment of accuracy and number of reported microsatellites in five simulated datasets
Table S6: Pairwise comparison of software error rates using Tukey's honestly significant difference (HSD) test
Table S7: Comparison of the efficiency of MicrosatNavigator, KMER-SSR, PERF, and MISA for representative vertebrate genomes
Table S8: The distribution of microsatellite repeat motif classes (mono-, di-, tri-, tetra-, penta-, and hexanucleotide) across 186 vertebrate genomes
Table S9: Summary of the most common microsatellite repeat motifs in each class (mono-, di-, tri-, tetra-, penta-, and hexanucleotide)
Table S10: Comparison of length differences among microsatellite repeat motif classes (mono-, di-, tri-, tetra-, penta-, and hexanucleotide) using Tukey's honestly significant difference (HSD) test
Table S11: The length of individual microsatellites across 186 vertebrate genomes
Table S12: Differences in the length of individual microsatellites between autosomes and sex chromosomes, analyzed using Tukey's honestly significant difference (HSD) test
Table S13: The relative abundance of each microsatellite GC% category in autosomes and sex chromosomes
Table S14: Significance of each vertebrate lineage's preference for a specific microsatellite GC% category, assessed using Tukey's honestly significant difference (HSD) test
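
Tables S2 and S3 report relative abundance, relative density, and fold differences. This description does not define those quantities, so the sketch below assumes one widely used convention: relative abundance as loci per megabase, relative density as microsatellite base pairs per megabase, and fold difference as the sex-chromosome value divided by the autosome value. The function names and the direction of the ratio are assumptions, and the tables themselves may use different definitions.

```python
MB = 1_000_000  # base pairs per megabase


def relative_abundance(n_loci, seq_len_bp):
    """Microsatellite loci per Mb of analyzed sequence (assumed definition)."""
    return n_loci / (seq_len_bp / MB)


def relative_density(total_ssr_bp, seq_len_bp):
    """Base pairs covered by microsatellites per Mb of sequence (assumed definition)."""
    return total_ssr_bp / (seq_len_bp / MB)


def fold_difference(sex_chromosome_value, autosome_value):
    """Ratio of the sex-chromosome value to the autosome value (assumed direction)."""
    return sex_chromosome_value / autosome_value


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(relative_abundance(500, 2_000_000))
    print(relative_density(12_000, 2_000_000))
    print(fold_difference(6000.0, 3000.0))
```

Normalizing by sequence length is what makes values comparable between chromosomes and genomes of very different sizes.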

Code/Software

Microsatellite distribution data were generated with MicrosatNavigator and then processed with the Python script in File S1.
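
As an illustration of what GC-based categorization of motifs can look like, here is a minimal sketch. It is not the script shipped in File S1; the function names, the two category labels, and the 50% cutoff are assumptions made for this example.

```python
def gc_percent(motif):
    """Percentage of G and C bases in a repeat motif."""
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    return 100.0 * sum(base in "GC" for base in motif) / len(motif)


def gc_category(motif, high_cutoff=50.0):
    """Label a motif 'high-GC' or 'low-GC' (hypothetical two-way split)."""
    return "high-GC" if gc_percent(motif) >= high_cutoff else "low-GC"


if __name__ == "__main__":
    for motif in ("AC", "AAT", "GGC"):
        print(motif, gc_percent(motif), gc_category(motif))
```

The GC% of a perfect microsatellite equals the GC% of its motif, so the category can be assigned from the motif alone without scanning the full repeat.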
