Aquatic biotas of Sundaland are fragmented but not refugial

Delrieu-Trottin, Erwan1; Ben Chehida, Selim2; Sukmono, Tedjo3; Dahruddin, Hadi4; Sholihah, Arni5; Kustiati, Kustiati6; Fitriana, Yuli4; Muchlisin, Zainal Abidin7; Elvyra, Roza8; Wibowo, Arif4; Utama, Ilham Vemandra4; Nurhaman, Ujang4; Sauri, Sopian4; Risdawati, Renny9; Zein, Muhammad Syamsul Arifin4; Pouzadoux, Juliette10; Agnèse, Jean-François11; Tilak, Marie-Ka12; Page, Lawrence M.13; von Rintelen, Thomas14; Wowor, Daisy4; Steinke, Dirk15; Mona, Stefano1; Rüber, Lukas16; Hebert, Paul D.15; Hubert, Nicolas11

Published Feb 07, 2025 on Dryad. https://doi.org/10.5061/dryad.3xsj3txrf

Abstract

Tropical insular systems have long attracted biologists and have stimulated some important controversies in ecology and evolution. Eustatic fluctuations during the Pleistocene have been invoked to explain species dispersal and proliferation in these fragmented systems because they controlled the extent of landmasses and their temporary connections. In ancient archipelagos, however, the Pleistocene represents only a small slice of their history, so long-standing configurations might better explain insular diversity patterns. With a geological history of ca. 30 million years, the Sunda Shelf is old. Upon entering the Pleistocene, the islands of the Sunda Shelf repeatedly separated and merged; however, recent reappraisals of its paleoenvironments and evolutionary dynamics have questioned the biogeographic significance of these changes. Based on the molecular inventory of six common freshwater fish families, we used mitochondrial DNA sequences to explore the population fragmentation and demographic history of the most common species. Species delimitation methods applied to 1,062 sequences belonging to 37 species from 188 sites detected 95 Molecular Operational Taxonomic Units (MOTUs). Among the nine most widespread species, the number of MOTUs ranged from 1 to 11 and correlated with the time to the most recent common ancestor (TMRCA). Extended Bayesian Skyline Plots (EBSPs) applied to mitogenomes and cytochrome c oxidase I (COI) sequences detected no variation in past effective population size within MOTUs, and hierarchical Approximate Bayesian Computation (hABC) provided no evidence of congruent changes in effective population size across MOTUs. Fragmentation of an ancestral range is the most likely explanation for the rampant cryptic diversity observed, but the demographic inferences do not support MOTUs being refugial from an evolutionary perspective.

Combined use of mitogenome and cytochrome c oxidase I (COI) to explore the phylogeographic structure and past population demography of Sundaland freshwater fishes.

Description of the Data and file structure

Data are included as follows:

1: (in Related Works section) Excel spreadsheet with specimen collateral information (systematics, sample ID, GenBank accession number, and geographic origin) and results of the species delimitation analyses (species partitions obtained with BIN, sPTP, mPTP, sGMYC, mGMYC, ABGD, and the majority-rule consensus)
2: fasta file with the 1,062 COI sequences used for species delimitation analyses
3: alignment of COI sequences and mitogenomes of Anabantidae species used to reconstruct SpeciesTreeUCLN gene and species trees
4: alignment of COI sequences and mitogenomes of Barbodes and allied species used to reconstruct SpeciesTreeUCLN gene and species trees
5: alignment of COI sequences and mitogenomes of Channa species used to reconstruct SpeciesTreeUCLN gene and species trees
6: alignment of COI sequences and mitogenomes of Chitala species used to reconstruct SpeciesTreeUCLN gene and species trees
7: alignment of COI sequences and mitogenomes of Hampala species used to reconstruct SpeciesTreeUCLN gene and species trees
8: alignment of COI sequences and mitogenomes of Monopterus species used to reconstruct SpeciesTreeUCLN gene and species trees
9: alignment of COI sequences and mitogenomes of Parachela and allied species used to reconstruct SpeciesTreeUCLN gene and species trees
10: alignment of COI sequences and mitogenomes of Trigonopoma species used to reconstruct SpeciesTreeUCLN gene and species trees
11: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Anabantidae species with COI sequences and mitogenomes
12: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Barbodes and allied species with COI sequences and mitogenomes
13: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Channa species with COI sequences and mitogenomes
14: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Chitala species with COI sequences and mitogenomes
15: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Hampala species with COI sequences and mitogenomes
16: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Monopterus species with COI sequences and mitogenomes
17: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Parachela species with COI sequences and mitogenomes
18: BEAST xml file used to reconstruct SpeciesTreeUCLN gene and species trees of Trigonopoma species with COI sequences and mitogenomes
19: Anabantidae gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
20: Barbodes (and allied species) gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
21: Channa gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
22: Chitala gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
23: Hampala gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
24: Monopterus gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
25: Parachela gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
26: Trigonopoma gene tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes
27: Anabantidae species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority rule consensus of the species delimitation analyses
28: Barbodes (and allied species) species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority-rule consensus of the species delimitation analyses
29: Channa species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority rule consensus of the species delimitation analyses
30: Chitala species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority rule consensus of the species delimitation analyses
31: Hampala species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority rule consensus of the species delimitation analyses
32: Monopterus species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority rule consensus of the species delimitation analyses
33: Parachela species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority rule consensus of the species delimitation analyses
34: Trigonopoma species tree reconstructed with SpeciesTreeUCLN based on COI sequences and mitogenomes, and MOTUs according to the majority rule consensus of the species delimitation analyses
35: alignment of COI sequences and mitogenomes of Anabas testudineus used to perform EBSP analysis
36: alignment of COI sequences and mitogenomes of Barbodes binotatus used to perform EBSP analysis
37: alignment of COI sequences and mitogenomes of Channa striata used to perform EBSP analysis
38: alignment of COI sequences and mitogenomes of Chitala lopis used to perform EBSP analysis
39: alignment of COI sequences and mitogenomes of Hampala macrolepidota used to perform EBSP analysis
40: alignment of COI sequences and mitogenomes of Monopterus albus used to perform EBSP analysis
41: alignment of COI sequences and mitogenomes of Parachela hypophthalmus used to perform EBSP analysis
42: alignment of COI sequences and mitogenomes of Trichopodus trichopterus used to perform EBSP analysis
43: alignment of COI sequences and mitogenomes of Trigonopoma pauciperforatum used to perform EBSP analysis
44: BEAST xml file used to perform EBSP analysis for Anabas testudineus with COI sequences and mitogenomes
45: BEAST xml file used to perform EBSP analysis for Barbodes binotatus with COI sequences and mitogenomes
46: BEAST xml file used to perform EBSP analysis for Channa striata with COI sequences and mitogenomes
47: BEAST xml file used to perform EBSP analysis for Chitala lopis with COI sequences and mitogenomes
48: BEAST xml file used to perform EBSP analysis for Hampala macrolepidota with COI sequences and mitogenomes
49: BEAST xml file used to perform EBSP analysis for Monopterus albus with COI sequences and mitogenomes
50: BEAST xml file used to perform EBSP analysis for Parachela hypophthalmus with COI sequences and mitogenomes
51: BEAST xml file used to perform EBSP analysis for Trichopodus trichopterus with COI sequences and mitogenomes
52: BEAST xml file used to perform EBSP analysis for Trigonopoma pauciperforatum with COI sequences and mitogenomes
53: Input file of the hABC analysis, including the MOTU label (dataset), corresponding nominal species, number of individuals (n.ind), number of base pairs analyzed (n.sites), nucleotide diversity, Watterson's estimator of genetic diversity (theta.s), Tajima's D (Tajima D pegas), species generation time, mutation rate (substitution_rate * gen), and number of segregating sites (n.segregating.sites)
54: (in Related Works section) Supplementary material cited in the manuscript

Sharing/Access Information

Links to other publicly accessible locations of the data: in BOLD as dx.doi.org/10.5883/DS-BIFCOPY

Sequencing

Total genomic DNA was extracted from each specimen using QIAGEN DNeasy 96 tissue extraction kits following the manufacturer's protocol. For the 1,062 fish specimens corresponding to 37 species, a 652-bp segment from the 5′ region of the cytochrome c oxidase I gene was amplified using the C_FishF1t1/C_FishR1t1 primer cocktails, which include M13 tails [1]. PCR amplifications were performed on an ABI Veriti 96-well Fast thermocycler (Applied Biosystems) in a final reaction volume of 10.0 μl containing 5.0 μl of 2X buffer, 3.3 μl of ultrapure water, 1.0 μl of each primer (10 μM), 0.2 μl of Phire® Hot Start II DNA polymerase (5 U), and 0.5 μl of DNA template (~50 ng). The thermocycling regime was as follows: initial denaturation at 98 °C for 5 min; 30 cycles of denaturation at 98 °C for 5 s, annealing at 56 °C for 20 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. PCR products were purified with ExoSAP-IT™ (USB Corporation, Cleveland, OH, USA) and sequenced in both directions. Sequencing reactions used BigDye® Terminator v3.1 Cycle Sequencing Ready Reaction reagents, and sequencing was performed on an ABI 3130 DNA Analyzer or an ABI 3730XL DNA Analyzer (Applied Biosystems, Waltham, MA, USA). Sequences and related information (photographs, voucher collection numbers, and collection data) were deposited in BOLD [2] and in the GenBank database of the National Center for Biotechnology Information.
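As a quick check on the thermocycling regime above, the hold times can simply be summed. This is a minimal sketch that ignores ramp times between temperatures (an assumption, since ramp speeds are not given); the constant and function names are illustrative only.

```python
# Hold times (seconds) of the COI thermocycling regime; ramps are ignored (assumption).
INITIAL_DENATURATION_S = 5 * 60
CYCLE_STEPS_S = {"denaturation_98C": 5, "annealing_56C": 20, "extension_72C": 30}
N_CYCLES = 30
FINAL_EXTENSION_S = 5 * 60


def total_hold_time_s(initial_s, cycle_steps_s, n_cycles, final_s):
    """Summed hold time of a PCR program, excluding temperature ramps."""
    per_cycle = sum(cycle_steps_s.values())
    return initial_s + n_cycles * per_cycle + final_s


total = total_hold_time_s(INITIAL_DENATURATION_S, CYCLE_STEPS_S, N_CYCLES, FINAL_EXTENSION_S)
print(f"{total} s = {total / 60:.1f} min")  # 2250 s = 37.5 min
```

With ramps included, the actual run time on the instrument will be somewhat longer than this lower bound.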

Genomic libraries for mitogenome skimming [3,4] were prepared for 94 individuals belonging to the nine nominal species selected for investigation, following the protocol developed by Tilak et al. [5]. Genomic DNA was physically fragmented by ultrasound (35 kHz) for 10 to 20 min. We followed the Illumina library preparation procedure developed by Meyer and Kircher [6], with blunt-end repair, adapter ligation, adapter fill-in, and indexing PCR (13 cycles) steps. Each step was followed by a purification with SPRI beads (Agencourt® AMPure® XP), using 1.7 volumes of bead suspension per volume of sample, and elution in 25 µl of ultrapure water. The DNA libraries were quantified with a Nanodrop ND-800 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). Sequencing was performed by the Montpellier Genomix platform (MGX, Montpellier, France): indexed libraries were pooled according to their relative concentrations to ensure equimolarity, and a single pool was sequenced in single-read mode (150-bp reads) on one Illumina HiSeq 2500 lane. Mitogenomes were assembled by mapping reads onto reference mitogenomes of each species when available in GenBank (KJ808811 for Anabas testudineus, NC_034755 for Barbodes binotatus, NC_032037 for Channa striata, NC_012711 for Chitala lopis, KF670818 for Hampala macrolepidota, KP100265 for Trichopodus trichopterus) or of a close relative (Monopterus albus NC_003192 for M. javanensis and Metzia longinasus KF955011 for Parachela hypophthalmus), using the Geneious Mapper with default parameters in Geneious 9.0.5 [7]. Mitogenomes were annotated with the online MitoAnnotator tool [8] available at https://mitofish.aori.u-tokyo.ac.jp.
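Pooling libraries "using their relative concentrations to ensure equimolarity" amounts to converting each mass concentration into a molar one and taking volumes inversely proportional to molarity. The sketch below assumes a mean fragment length shared by all libraries and the usual approximation of about 660 g/mol per base pair of double-stranded DNA; the function names, the 400-bp fragment length, the readings, and the 100 fmol target are hypothetical and not taken from the MGX protocol.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration (ng/µl) to nM, assuming ~660 g/mol per bp."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes_ul(concs_ng_per_ul, mean_fragment_bp, fmol_per_library):
    """Volume (µl) of each library that contributes the same number of femtomoles."""
    volumes = {}
    for name, conc in concs_ng_per_ul.items():
        molarity = library_molarity_nM(conc, mean_fragment_bp)  # 1 nM = 1 fmol/µl
        volumes[name] = fmol_per_library / molarity
    return volumes


# Hypothetical spectrophotometer readings (ng/µl) for three indexed libraries.
readings = {"lib_A": 10.0, "lib_B": 5.0, "lib_C": 20.0}
volumes = equimolar_volumes_ul(readings, mean_fragment_bp=400, fmol_per_library=100.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µl")
```

A library at half the concentration of another simply contributes twice the volume; differences in fragment size between libraries would require per-library lengths.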

Genetic species delimitations and phylogenetic reconstructions

The inventory of MOTUs, defined as diagnosable mitochondrial lineages, was conducted by applying six species delimitation methods to the COI data set (1,062 fish specimens corresponding to 37 nominal species): (1) Refined Single Linkage (RESL), implemented in BOLD to produce Barcode Index Numbers (BINs) [9]; (2) Automatic Barcode Gap Discovery (ABGD) [10], available at https://bioinfo.mnhn.fr/abi/public/abgd/; (3) the single-rate (sPTP) and (4) multi-rate (mPTP) versions of the Poisson Tree Process (PTP) [11], available at https://mptp.h-its.org/; and (5) the single-threshold (sGMYC) and (6) multiple-threshold (mGMYC) versions of the Generalized Mixed Yule-Coalescent (GMYC), as implemented in the R package splits v 1.0-19 [12] in R v 4.2.1 [13]. The final delimitation scheme used to delineate MOTUs was established as a majority-rule consensus among the six methods [14,15].
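The exact consensus procedure is described in [14,15] rather than here, so the following is only a minimal sketch under a simplifying assumption: two specimens are placed in the same consensus MOTU when more than half of the methods assign them to the same group, and consensus MOTUs are the connected components of those pairwise links (computed with a small union-find). The function name, input layout, and toy partitions are hypothetical.

```python
from itertools import combinations


def majority_rule_consensus(partitions):
    """Consensus MOTUs from several species-delimitation partitions.

    partitions: {method: {specimen: group_label}}. Two specimens are linked when
    a strict majority of methods put them in the same group (assumption), and
    consensus MOTUs are the connected components of these links.
    """
    methods = list(partitions)
    specimens = sorted(next(iter(partitions.values())))
    parent = {s: s for s in specimens}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(specimens, 2):
        votes = sum(partitions[m][a] == partitions[m][b] for m in methods)
        if votes > len(methods) / 2:
            parent[find(a)] = find(b)

    motus = {}
    for s in specimens:
        motus.setdefault(find(s), []).append(s)
    return sorted(motus.values())


# Toy partitions for four hypothetical specimens.
toy = {
    "BIN":   {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
    "ABGD":  {"s1": 1, "s2": 1, "s3": 1, "s4": 1},
    "sPTP":  {"s1": 1, "s2": 1, "s3": 2, "s4": 3},
    "mPTP":  {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
    "sGMYC": {"s1": 1, "s2": 1, "s3": 2, "s4": 3},
    "mGMYC": {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
}
consensus = majority_rule_consensus(toy)
print(consensus)  # [['s1', 's2'], ['s3', 's4']]
```

Here ABGD lumps everything and two methods split s3 from s4, yet four of six methods keep s3 and s4 together, so they form one consensus MOTU. Because links are chained, a specimen bridging two groups can merge them; a stricter rule would require every pair within a MOTU to reach a majority.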

Both RESL and ABGD take DNA alignments as input. For PTP analyses, a maximum likelihood (ML) tree was inferred with IQ-TREE v 1.6.12 [16] under the best-fit substitution model (TPM2 + I + Γ) selected by ModelFinder [17] according to the Bayesian Information Criterion (BIC), as available on the IQ-TREE web server at http://iqtree.cibiv.univie.ac.at [18]. The ultrametric, fully resolved tree required by GMYC analyses was reconstructed with the Bayesian approach implemented in BEAST 2.4.8 [19]. Sequences were collapsed into haplotypes prior to Bayesian inference. We ran two Markov chains of 50 million generations each, using a Yule pure-birth tree prior, a strict clock with a substitution rate of 1.2% genetic distance per million years (Myr) [20], and a TPM2 + I + Γ substitution model. Trees were sampled every 10,000 states after an initial burn-in of 10 million generations. The two runs were combined with LogCombiner 2.4.8, and the maximum clade credibility tree with common ancestor heights was summarized with TreeAnnotator 2.4.7 [19].
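Two bookkeeping steps in this paragraph can be illustrated in a few lines: collapsing identical sequences into haplotypes before the BEAST run, and estimating how many post-burn-in trees each chain contributes to the combined sample. This sketch assumes haplotypes are defined by exact identity of the aligned sequences (gaps and ambiguity codes receive no special treatment), and the tree count is approximate: it can differ by one depending on whether the first and last states are logged. Specimen IDs are hypothetical.

```python
def collapse_haplotypes(sequences):
    """Map each distinct aligned sequence to the specimen IDs that carry it."""
    haplotypes = {}
    for specimen_id, seq in sequences.items():
        haplotypes.setdefault(seq.upper(), []).append(specimen_id)
    return haplotypes


def retained_samples(chain_length, sample_every, burn_in):
    """Approximate number of trees one chain logs after the burn-in."""
    return (chain_length - burn_in) // sample_every


# Toy alignment with hypothetical specimen IDs.
alignment = {"spec1": "ACGTAC", "spec2": "acgtac", "spec3": "ACGTTC"}
haps = collapse_haplotypes(alignment)
print(len(haps), "haplotypes")

per_chain = retained_samples(50_000_000, 10_000, 10_000_000)
print(per_chain, "trees per chain;", 2 * per_chain, "across both chains")
```

With the settings above, each chain contributes about 4,000 trees, so the combined sample summarized by TreeAnnotator holds about 8,000 trees.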

Once MOTUs were delimited, DNA barcodes and mitogenomes were used to reconstruct phylogenetic trees with the StarBEAST2 package [21] in BEAST 2.4.8. Because StarBEAST2 jointly infers gene trees and the MOTU tree, MOTUs were defined in StarBEAST2 according to the majority-rule consensus of the delimitation analyses. Analyses used a single partition including the protein-coding regions, a GTR + I + Γ substitution model, an uncorrelated lognormal (UCLN) species tree model, a relaxed lognormal molecular clock to account for rate variation among lineages, and an MCMC chain length of 50 million steps. A substitution rate of 1.2% per Myr [20] was applied for the final reconstructions, as this rate has provided age estimates consistent with the geology of the area in other freshwater fish groups [22–27]. Independent runs were combined with LogCombiner 1.10.4 [19]. Maximum clade credibility gene and MOTU trees, age estimates, and 95% highest posterior density (HPD) intervals were summarized with TreeAnnotator 1.10.4 [19].
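To see what a rate of 1.2% per Myr implies for node ages, note that a pairwise divergence rate accumulates along the two lineages descending from a common ancestor, so the corresponding per-lineage rate is half as large. The sketch below is simple strict-clock arithmetic on an uncorrected pairwise distance, not a substitute for the relaxed-clock StarBEAST2 estimates; whether the rate was entered in BEAST as a pairwise or per-lineage value is not stated here, and the 2.4% example distance is hypothetical.

```python
PAIRWISE_RATE_PER_MYR = 0.012  # 1.2% sequence divergence per Myr [20]


def per_lineage_rate(pairwise_rate):
    """Substitutions per site per Myr along a single lineage."""
    return pairwise_rate / 2


def divergence_time_myr(pairwise_distance, pairwise_rate=PAIRWISE_RATE_PER_MYR):
    """Strict-clock divergence time (Myr) from an uncorrected pairwise distance."""
    return pairwise_distance / pairwise_rate


# Hypothetical example: two MOTUs differing at 2.4% of COI sites.
print(f"per-lineage rate: {per_lineage_rate(PAIRWISE_RATE_PER_MYR):.3f} subst/site/Myr")
print(f"divergence time: {divergence_time_myr(0.024):.2f} Myr")
```

Under this approximation, a 2.4% distance corresponds to a split about 2 Myr ago, which falls within the Pleistocene (beginning about 2.58 Myr ago).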

We examined the strength and direction of the relationship between the TMRCA of each MOTU and its geographic range in two ways: first, with Kendall's nonparametric τ test between TMRCA and the area of the MOTU's geographic range; second, with a Wilcoxon–Mann–Whitney test assessing whether MOTUs that diverged during the Pleistocene have larger geographic ranges than MOTUs that diverged before the Pleistocene. The area of the geographic range occupied by each MOTU was estimated using convex hulls, i.e., the smallest convex envelopes containing all occurrences of a given MOTU. Convex hulls were computed with the 'CHullAreaEarth' function of the GeoRange R package v 0.1.0 [28] and visualized with ggplot2 v 3.5.1 [29] and ggConvexHull v 0.1.0 [30] in R v 4.2.1 [13]. When a MOTU was observed at a single site only, a conservative range area of 500 km² was adopted. Finally, the TMRCA of each MOTU was also compared with the number of islands on which it occurs.
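CHullAreaEarth computes hull areas on the Earth's surface; the sketch below is not that implementation but a simpler planar approximation, assuming sites are projected with an equirectangular projection centred on their mean latitude (adequate for small ranges away from the poles) before applying Andrew's monotone-chain hull and the shoelace formula. The 500 km² single-site rule follows the text; for two sites or collinear sites the hull is degenerate and the sketch returns 0 km², a case the text does not address. Function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean Earth radius
SINGLE_SITE_AREA_KM2 = 500.0   # conservative area for single-site MOTUs (from the text)


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def range_area_km2(sites):
    """Approximate convex-hull range area (km²) from (longitude, latitude) sites.

    Planar approximation (assumption): equirectangular projection centred on the
    mean latitude. This is not the CHullAreaEarth algorithm.
    """
    distinct = set(sites)
    if len(distinct) == 1:
        return SINGLE_SITE_AREA_KM2
    lat0 = math.radians(sum(lat for _, lat in distinct) / len(distinct))
    projected = [
        (EARTH_RADIUS_KM * math.radians(lon) * math.cos(lat0),
         EARTH_RADIUS_KM * math.radians(lat))
        for lon, lat in distinct
    ]
    return polygon_area(convex_hull(projected))


print(range_area_km2([(106.8, -6.2)]))  # 500.0
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
print(round(range_area_km2(square)))
```

The interior point of the toy square does not change the hull, and the one-degree square near the equator covers roughly 12,360 km² under this projection.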

Genetic diversity and species demographic inferences

Haplotype diversity (h), nucleotide diversity (π), number of segregating sites (s), and Tajima's D were calculated in R [31] using the packages pegas [32] (h, π, and Tajima's D) and PopGenome [33] (s). We investigated potential drivers of genetic diversity through multiple Kendall's rank correlation tests in R between diversity indices (number of MOTUs, h, and π) and species maximum standard length (a proxy for dispersal ability [34,35]), generation time, and time to the most recent common ancestor (TMRCA). Generation time and maximum standard length were extracted from FishBase [36].
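The indices computed with pegas and PopGenome follow standard formulas: Nei's unbiased haplotype diversity, nucleotide diversity as the mean number of pairwise differences per site, Watterson's estimator S/a1, and Tajima's D with its usual normalising constants. The Python sketch below re-implements these formulas for illustration only; it assumes gap-free, unambiguous aligned sequences, whereas pegas and PopGenome have their own rules for gaps and missing data, so values on real alignments may differ slightly. The function name and toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def diversity_summary(seqs):
    """h, pi (per site), S, Watterson's theta (per sequence) and Tajima's D.

    seqs: equal-length aligned sequences without gaps or ambiguity codes
    (simplifying assumption).
    """
    n = len(seqs)
    length = len(seqs[0])
    # Haplotype diversity (Nei's unbiased estimator).
    freqs = [count / n for count in Counter(seqs).values()]
    h = n / (n - 1) * (1 - sum(p * p for p in freqs))
    # Mean number of pairwise differences (k) and nucleotide diversity (pi).
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    k = sum(diffs) / len(diffs)
    pi = k / length
    # Segregating sites: alignment columns carrying more than one nucleotide.
    S = sum(len(set(column)) > 1 for column in zip(*seqs))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = S / a1
    if S == 0:
        return {"h": h, "pi": pi, "S": S, "theta_w": theta_w, "D": math.nan}
    # Tajima's (1989) normalising constants.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    D = (k - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return {"h": h, "pi": pi, "S": S, "theta_w": theta_w, "D": D}


stats = diversity_summary(["AAAA", "AAAT", "AATT", "AATT"])
print({key: round(value, 4) for key, value in stats.items()})
```

In this toy alignment, π slightly exceeds Watterson's θ per sequence (k = 7/6 versus S/a1 = 12/11), so Tajima's D is small and positive.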

Past population demography was explored with the Extended Bayesian Skyline Plot (EBSP) method in BEAST v 2.4.8 [37]. EBSP is a non-parametric model that estimates changes in effective population size (N_e) through time without specifying any prior hypothesis on the tempo and mode of demographic change. Reductions in N_e may result either from fragmentation and genetic structuring, which reduce the number of possible mating combinations, or from a reduction in census population size [38–42]. Because the demographic signature of fragmentation is more likely to be observed at the scale of the species range, whereas fluctuations in population size are expected to affect panmictic subpopulations, we performed EBSP for both species and MOTUs to disentangle the relative impact of fragmentation and of potential reductions in population size on N_e. EBSP was performed with mitochondrial genomes and COI sequences; we also performed Bayesian Skyline Plots (BSPs) [43] on COI sequences for each species to evaluate the potential effect of missing data on EBSP analyses. Because all mitochondrial genes are linked, we applied the same genealogy and molecular clock rate to the 13 protein-coding genes, while the EBSP framework allowed us to set a different substitution model for each gene. In BEAST v 2.4.8, we applied to each gene the best-fit substitution model selected under the BIC by ModelFinder on the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at). A rate of 1.2% sequence divergence per million years was used to set a relaxed lognormal clock for each analysis.
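Skyline plots calibrated with a clock rate in absolute time do not return N_e directly but the product of N_e and generation time, expressed in the time units of the clock. A minimal conversion is sketched below, assuming time units of Myr (as implied by a rate given per million years) and a generation time in years such as those taken from FishBase; for mitochondrial data the result refers to the female effective population size. The function name and example values are hypothetical.

```python
def effective_size(ne_times_generation, generation_time_years, time_unit_years=1e6):
    """Convert a skyline estimate of N_e * g into N_e.

    ne_times_generation: skyline population size in the clock's time units
        (1e6 years when the rate is expressed per Myr; an assumption).
    generation_time_years: generation time in years.
    """
    return ne_times_generation * time_unit_years / generation_time_years


# Hypothetical EBSP median of 0.5 (Myr units) for a fish with a 2-year generation time.
print(f"{effective_size(0.5, 2.0):,.0f}")  # 250,000
```

Because generation time enters as a simple divisor, uncertainty in FishBase generation times scales the absolute N_e but not the shape of the skyline through time.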

Community demographic inferences

We performed a hierarchical approximate Bayesian computation (hABC) analysis to detect potential concerted demographic changes across MOTUs [44]. This method combines datasets from different species to estimate whether changes in population size occurred, whether they were synchronous, and when such congruent demographic changes took place [44–46]. The hABC approach assumes that if different species responded similarly to environmental perturbations, their true demographic history should be more easily detected by exploiting comparative data than by considering each species independently. Unlike previous approaches, which focused on detecting the co-expansion of a group of species [44,46], we built our null hypothesis on previous results suggesting that most of the MOTUs under investigation have been stable through time. In other words, we tested whether a group of MOTUs is best described by a constant-size demographic model, while allowing the other MOTUs to vary in size according to a simple two-phase population model. Briefly, we first drew the number ζ of MOTUs with constant effective population size from a uniform prior distribution bounded between one and n (the total number of MOTUs). Each of these MOTUs was characterized by its own effective population size, drawn from an independent prior distribution. If ζ < n, the remaining n − ζ MOTUs followed a one-step population size change, characterized by three demographic parameters: an ancestral population of size N_anc changing instantaneously to size N_mod at time t. The prior distributions of the hierarchical and MOTU-specific parameters are shown in Table S2.
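One draw from the hierarchical prior described above can be sketched as follows. The prior bounds, the use of uniform (rather than, say, log-uniform) distributions for N and t, and the random choice of which MOTUs are held constant are all assumptions for illustration; the priors actually used are those of Table S2. Names are hypothetical.

```python
import random

# Hypothetical prior bounds; the priors actually used are given in Table S2.
NE_PRIOR = (1_000.0, 1_000_000.0)
T_PRIOR_GENERATIONS = (1_000.0, 500_000.0)


def draw_hierarchical_prior(motus, rng):
    """Draw one parameter set from the constant-size hierarchical model.

    zeta ~ Uniform{1, ..., n}; zeta randomly chosen MOTUs keep a constant N_e
    (assumption on how they are chosen), and the remaining n - zeta MOTUs change
    from N_anc to N_mod at time t.
    """
    n = len(motus)
    zeta = rng.randint(1, n)  # randint is inclusive on both ends
    constant = set(rng.sample(motus, zeta))
    params = {"zeta": zeta, "motus": {}}
    for motu in motus:
        if motu in constant:
            params["motus"][motu] = {"model": "constant", "N": rng.uniform(*NE_PRIOR)}
        else:
            params["motus"][motu] = {
                "model": "one_step",
                "N_anc": rng.uniform(*NE_PRIOR),
                "N_mod": rng.uniform(*NE_PRIOR),
                "t": rng.uniform(*T_PRIOR_GENERATIONS),
            }
    return params


rng = random.Random(42)
draw = draw_hierarchical_prior(["MOTU_1", "MOTU_2", "MOTU_3", "MOTU_4"], rng)
print(draw["zeta"], sorted(m for m, p in draw["motus"].items() if p["model"] == "constant"))
```

Each such draw parameterizes one set of coalescent simulations; repeating it 1,000,000 times yields the reference table used in the rejection and regression steps.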

We performed 1,000,000 coalescent simulations with fastsimcoal [47] and computed the mean, variance, skewness, and kurtosis, across simulated species, of the mean number of pairwise differences (θ_Π), Watterson's estimator of theta (θ_s), and Tajima's D. Local linear regressions were then applied to the 5,000 best simulations to estimate the posterior distribution of each parameter. We adopted a cross-validation strategy to test the power of our approach to detect groups of MOTUs with constant population size: we simulated 1,000 datasets with parameters drawn from the prior distributions and re-analyzed them under the same hABC conditions. Finally, we performed a cross-validation experiment on a simulated dataset of 5,000 loci, each with the same characteristics as the observed data. This power analysis helps to determine the amount of genetic data that would be needed to correctly estimate the number ζ of species with constant population size through time.
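The summary statistics used here are hyperstatistics: the four moments of each per-species statistic computed across species, so that every simulated or observed dataset is summarized by a fixed-length vector (3 statistics × 4 moments = 12 values). The sketch below shows that step and a plain rejection step keeping the simulations closest to the observed vector after scaling each statistic by its standard deviation. It assumes population (not sample) moments and excess kurtosis, neither of which is specified in the text, and it omits the local linear regression adjustment applied to the retained simulations. Names are hypothetical.

```python
import math


def moments(values):
    """Mean, variance, skewness and excess kurtosis (population moments; assumption)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return [mean, 0.0, 0.0, 0.0]
    sd = math.sqrt(var)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0
    return [mean, var, skew, kurt]


def hyperstats(per_species_stats):
    """per_species_stats: one (theta_pi, theta_s, tajima_d) tuple per species.

    Returns 12 values: the 4 moments of each statistic across species.
    """
    vector = []
    for column in zip(*per_species_stats):
        vector.extend(moments(list(column)))
    return vector


def rejection(observed, simulations, keep):
    """Keep the `keep` simulations whose hyperstatistics are closest to `observed`.

    simulations: list of (parameters, summary_vector). Each statistic is scaled
    by its standard deviation across simulations before the Euclidean distance.
    """
    columns = list(zip(*(summary for _, summary in simulations)))
    scales = []
    for column in columns:
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((c - mean) ** 2 for c in column) / len(column))
        scales.append(sd if sd > 0 else 1.0)

    def distance(summary):
        return math.sqrt(sum(((s - o) / w) ** 2 for s, o, w in zip(summary, observed, scales)))

    return sorted(simulations, key=lambda sim: distance(sim[1]))[:keep]
```

In the study, `keep` would be 5,000 out of 1,000,000 simulations, and the parameters of the retained simulations would then be corrected by local linear regression on their distances to the observed hyperstatistics.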

The hABC and cross-validation analyses were run with three substitution rates to explore the sensitivity of our hABC inferences to the pace of molecular evolution. As information about mitochondrial mutation rates is lacking for our focal species, per-generation mutation rates were approximated as the product of generation time and the substitution rate converted to a per-year value. Three substitution rates were used: (1) 1.2% per Myr, following the calibration derived from fish species pairs separated by the closure of the Isthmus of Panama [20]; (2) 2.4% per Myr, corresponding to the lower bound of an accelerated rate of substitution in recent time for fishes [48]; and (3) 7.2% per Myr, corresponding to the upper bound of that accelerated rate [48].
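Following the approximation described above, a substitution rate quoted in percent per Myr becomes a per-site, per-generation mutation rate by dividing by 100 and by 10^6 and multiplying by generation time; the hABC input file lists this product as substitution_rate * gen. The sketch assumes the rates are used as given (no halving of pairwise rates); the 2-year generation time and the per-locus scaling by the 652-bp COI fragment are illustrative assumptions rather than values from the study.

```python
RATES_PERCENT_PER_MYR = (1.2, 2.4, 7.2)  # the three rates explored in the hABC


def per_generation_rate(rate_percent_per_myr, generation_time_years):
    """Substitutions per site per generation: (rate / 100 / 1e6) * generation time."""
    per_site_per_year = rate_percent_per_myr / 100 / 1e6
    return per_site_per_year * generation_time_years


def per_locus_rate(rate_percent_per_myr, generation_time_years, n_sites):
    """Per-generation rate for the whole locus (per-site rate times n.sites; assumption)."""
    return per_generation_rate(rate_percent_per_myr, generation_time_years) * n_sites


for rate in RATES_PERCENT_PER_MYR:
    mu_site = per_generation_rate(rate, generation_time_years=2.0)
    mu_locus = per_locus_rate(rate, generation_time_years=2.0, n_sites=652)
    print(f"{rate}%/Myr -> {mu_site:.2e} per site, {mu_locus:.2e} per locus per generation")
```

Because the three rates differ by a factor of up to six, the inferred timing of any demographic change scales inversely with the chosen rate, which is why the analyses were repeated under each of them.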

References

1. Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN. 2007 Universal primer cocktails for fish DNA barcoding. Mol. Ecol. Notes 7, 544–548.

2. Ratnasingham S, Hebert PDN. 2007 BOLD: The Barcode of Life Data System (www.barcodinglife.org). Mol. Ecol. Notes 7, 355–364. (doi:10.1111/j.1471-8286.2006.01678.x)

3. Dodsworth S. 2015 Genome skimming for next-generation biodiversity analysis. Trends Plant Sci. 20, 525–527.

4. Straub SCK, Parks M, Weitemier K, Fishbein M, Cronn RC, Liston A. 2012 Navigating the tip of the genomic iceberg: Next‐generation sequencing for plant systematics. Am. J. Bot. 99, 349–364.

5. Tilak M-K, Justy F, Debiais-Thibaud M, Botero-Castro F, Delsuc F, Douzery EJP. 2015 A cost-effective straightforward protocol for shotgun Illumina libraries designed to assemble complete mitogenomes from non-model species. Conserv. Genet. Resour. 7, 37–40.

6. Meyer M, Kircher M. 2010 Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, pdb.prot5448.

7. Drummond AJ et al. 2011 Geneious, version 5.4. Geneious, Auckland, New Zealand.

8. Iwasaki W et al. 2013 MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol. Biol. Evol. 30, 2531–2540.

9. Ratnasingham S, Hebert PDN. 2013 A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System. PLoS One 8. (doi:10.1371/journal.pone.0066213)

10. Puillandre N, Lambert A, Brouillet S, Achaz G. 2012 ABGD, Automatic Barcode Gap Discovery for primary species delimitation. Mol. Ecol. 21, 1864–1877.

11. Zhang J, Kapli P, Pavlidis P, Stamatakis A. 2013 A general species delimitation method with applications to phylogenetic placements. Bioinformatics 29, 2869–2876. (doi:10.1093/bioinformatics/btt499)

12. Fujisawa T, Barraclough TG. 2013 Delimiting species using single-locus data and the generalized mixed Yule coalescent approach: A revised method and evaluation on simulated data sets. Syst. Biol. 62.

13. R Core Team. 2022 R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.

14. Shen Y, Hubert N, Huang Y, Wang X, Gan X, Peng Z, He S. 2019 DNA barcoding the ichthyofauna of the Yangtze River: Insights from the molecular inventory of a mega-diverse temperate fauna. Mol. Ecol. Resour. 19. (doi:10.1111/1755-0998.12961)

15. Sholihah A et al. 2020 Disentangling the taxonomy of the subfamily Rasborinae (Cypriniformes, Danionidae) in Sundaland using DNA barcodes. Sci. Rep. 10. (doi:10.1038/s41598-020-59544-9)

16. Nguyen L-T, Schmidt HA, Von Haeseler A, Minh BQ. 2015 IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274.

17. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017 ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589.

18. Trifinopoulos J, Nguyen L-T, von Haeseler A, Minh BQ. 2016 W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 44, W232–W235.

19. Bouckaert RR, Heled J, Kühnert D, Vaughan T, Wu C-H, Xie D, Suchard MA, Rambaut A, Drummond AJ. 2014 BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 10, e1003537.

20. Bermingham E, McCafferty S, Martin AP. 1997 Fish biogeography and molecular clocks: perspectives from the Panamanian isthmus. In Molecular systematics of fishes (eds TD Kocher, CA Stepien), pp. 113–128. San Diego, CA: Academic Press.

21. Ogilvie HA, Bouckaert RR, Drummond AJ. 2017 StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Mol. Biol. Evol. 34, 2101–2114. (doi:10.1093/molbev/msx126)

22. Jamonneau T et al. 2024 Jump dispersal drives the relationship between micro- and macroevolutionary dynamics in the Sicydiinae (Gobiiformes: Oxudercidae) of Sundaland and Wallacea. J. Evol. Biol., voae017.

23. Utami CY, Sholihah A, Condamine FL, Thébaud C, Hubert N. 2022 Cryptic diversity impacts model selection and macroevolutionary inferences in diversification analyses. Proc. R. Soc. B 289, 20221335.

24. Arida E et al. 2021 Exploring the vertebrate fauna of the Bird’s Head Peninsula (Indonesia, West Papua) through DNA barcodes. Mol. Ecol. Resour. 21, 2369–2387.

25. Sholihah A et al. 2021 Limited dispersal and in situ diversification drive the evolutionary history of Rasborinae fishes in Sundaland. J. Biogeogr. 48, 2153–2173. (doi:10.1111/jbi.14141)

26. Sholihah A, Delrieu-Trottin E, Condamine FL, Wowor D, Rüber L, Pouyaud L, Agnèse JF, Hubert N. 2021 Impact of Pleistocene Eustatic Fluctuations on Evolutionary Dynamics in Southeast Asian Biodiversity Hotspots. Syst. Biol. 70, 940–960. (doi:10.1093/sysbio/syab006)

27. Delrieu‐Trottin E et al. 2020 Biodiversity inventory of the grey mullets (Actinopterygii: Mugilidae) of the Indo‐Australian Archipelago through the iterative use of DNA‐based species delimitation and specimen assignment methods. Evol. Appl. 13, 1451–1467.

28. Boyle J. 2017 GeoRange: calculating geographic range from occurrence data. R package version 0.1.0.

29. Wickham H. 2016 ggplot2: elegant graphics for data analysis. New York, NY: Springer-Verlag.

30. Martin CA. 2017 ggConvexHull: Add a convex hull geom to ggplot2. R package version 0.1.0.

31. R Core Team. 2017 R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org/.

32. Paradis E. 2010 pegas: an R package for population genetics with an integrated–modular approach. Bioinformatics 26, 419–420.

33. Pfeifer B, Wittelsbürger U, Ramos-Onsins SE, Lercher MJ. 2014 PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31, 1929–1936.

34. Donati GFA et al. 2019 A process‐based model supports an association between dispersal and the prevalence of species traits in tropical reef fish assemblages. Ecography 42, 2095–2106.

35. Hubert N, Dettai A, Pruvost P, Cruaud C, Kulbicki M, Myers RF, Borsa P. 2017 Geography and life history traits account for the accumulation of cryptic diversity among Indo-West Pacific coral reef fishes. Mar. Ecol. Prog. Ser. 583, 179–193.

36. Froese R, Pauly D. 2023 FishBase. World Wide Web electronic publication. www.fishbase.org.

37. Heled J, Drummond AJ. 2008 Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 8, 1–15.

38. Maisano Delser P, Corrigan S, Hale M, Li C, Veuille M, Planes S, Naylor G, Mona S. 2016 Population genomics of C. melanopterus using target gene capture data: Demographic inferences and conservation perspectives. Sci. Rep. 6, 33753. (doi:10.1038/srep33753)

39. Mazet O, Rodríguez W, Grusea S, Boitard S, Chikhi L. 2016 On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? Heredity 116, 362–371.

40. Peter BM, Wegmann D, Excoffier L. 2010 Distinguishing between population bottleneck and population subdivision by a Bayesian model choice procedure. Mol. Ecol. 19, 4648–4660.

41. Heller R, Brüniche-Olsen A, Siegismund HR. 2012 Cape buffalo mitogenomics reveals a Holocene shift in the African human–megafauna dynamics. Mol. Ecol. 21, 3947–3959.

42. Heller R, Chikhi L, Siegismund HR. 2013 The Confounding Effect of Population Structure on Bayesian Skyline Plot Inferences of Demographic History. PLoS One 8. (doi:10.1371/journal.pone.0062992)

43. Drummond AJ, Rambaut A, Shapiro B, Pybus OG. 2005 Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22, 1185–1192.

44. Chan YL, Schanzenbach D, Hickerson MJ. 2014 Detecting concerted demographic response across community assemblages using hierarchical approximate Bayesian computation. Mol. Biol. Evol. 31, 2501–2515. (doi:10.1093/molbev/msu187)

45. Burbrink FT, Chan YL, Myers EA, Ruane S, Smith BT, Hickerson MJ. 2016 Asynchronous demographic responses to Pleistocene climate change in Eastern Nearctic vertebrates. Ecol. Lett. 19, 1457–1467.

46. Delrieu-Trottin E, Hubert N, Giles EC, Chifflet-Belle P, Suwalski A, Neglia V, Rapu-Edmunds C, Mona S, Saenz-Agudelo P. 2020 Coping with Pleistocene climatic fluctuations: Demographic responses in remote endemic reef fishes. Mol. Ecol. 29, 2218–2233. (doi:10.1111/mec.15478)

47. Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. 2013 Robust Demographic Inference from Genomic and SNP Data. PLoS Genet. 9. (doi:10.1371/journal.pgen.1003905)

48. Burridge CP, Craw D, Fletcher D, Waters JM. 2008 Geological dates and molecular rates: fish DNA sheds light on time dependency. Mol. Biol. Evol. 25, 624–633.
