Describing, understanding and predicting the spatial distribution of genetic diversity is a central issue in biological sciences. In river landscapes, it is generally predicted that neutral genetic diversity should increase downstream, but there have been few attempts to test and validate this assumption across taxonomic groups. Moreover, it remains unclear which evolutionary processes may generate this apparent spatial pattern of diversity. Here, we quantitatively synthesized published results from diverse taxa living in river ecosystems, and we performed a meta-analysis to show that a downstream increase in intraspecific genetic diversity (DIGD) constitutes a general spatial pattern of biodiversity that is repeatable across taxa. We further demonstrated that DIGD was stronger for strictly waterborne-dispersing species than for overland-dispersing species. However, for a restricted data set focusing on fishes, there was no evidence that DIGD was related to particular species traits. We then searched for general processes underlying DIGD by simulating genetic data in dendritic-like river systems. Simulations revealed that each of the three processes we considered (downstream-biased dispersal, a downstream increase in habitat availability, and upstream-directed colonization) might generate DIGD. Using random forest models, we identified from the simulations a set of highly informative summary statistics that discriminate among the processes causing DIGD. Finally, combining these discriminant statistics with approximate Bayesian computation on a set of twelve empirical case studies, we hypothesized that DIGD was most likely due to the interaction of two of these three processes and that, contrary to expectation, it was not solely caused by downstream-biased dispersal.
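To illustrate the general idea behind choosing among competing models with approximate Bayesian computation, here is a minimal rejection-sampling sketch in Python. It is not the procedure used in the article (the simulations were run with ABCSampler and SIMCOAL 2); the function name `abc_model_choice`, the toy statistics and the acceptance rule are all assumptions made for illustration. The returned proportions approximate posterior model probabilities only if every model has the same prior weight and the same number of simulations.

```python
import math
from collections import Counter


def abc_model_choice(observed, simulations, n_keep):
    """Rejection-ABC model choice (illustrative sketch, not the article's code).

    observed    -- list of observed summary statistics
    simulations -- list of (model_name, summary_statistics) pairs
    n_keep      -- number of closest simulations to retain
    """
    n_stats = len(observed)

    # Scale each statistic by its standard deviation across all simulations,
    # so that no single statistic dominates the distance.
    scales = []
    for j in range(n_stats):
        column = [stats[j] for _, stats in simulations]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        scales.append(sd if sd > 0 else 1.0)

    def distance(stats):
        # Scaled Euclidean distance between simulated and observed statistics.
        return math.sqrt(
            sum(((s - o) / sc) ** 2 for s, o, sc in zip(stats, observed, scales))
        )

    # Keep the n_keep simulations closest to the observed data.
    kept = sorted(simulations, key=lambda sim: distance(sim[1]))[:n_keep]

    # The share of each model among the retained simulations approximates
    # its posterior probability (assuming equal priors and equal sampling).
    counts = Counter(name for name, _ in kept)
    return {name: count / len(kept) for name, count in counts.items()}
```

In practice the retained fraction is kept small relative to the total number of simulations, and only the most informative summary statistics are passed in, which is the role played by the random forest step described above.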
Simulated data from the gene-flow model
Parameter values and summary statistics for simulations generated under the gene-flow model
Data_gene-flow_model.txt
Simulated data from the habitat availability model
Parameter values and summary statistics for simulations generated under the habitat availability model
Data_habitat-availability_model.txt
Simulated data from the colonization model
Parameter values and summary statistics for simulations generated under the colonization model
Data_colonization_model.txt
Simulated data from the gene-flow / habitat model
Parameter values and summary statistics for simulations generated under the gene-flow / habitat model
Data_gene-flow-habitat_model.txt
Simulated data from the gene-flow / colonization model
Parameter values and summary statistics for simulations generated under the gene-flow / colonization model
Data_gene-flow-colonization_model.txt
Simulated data from the habitat / colonization model
Parameter values and summary statistics for simulations generated under the habitat / colonization model
Data_habitat-colonization_model.txt
Simulated data from the gene-flow / habitat / colonization model
Parameter values and summary statistics for simulations generated under the gene-flow / habitat / colonization model
Data_gene-flow-habitat-colonization_model.txt
Simulated data from the NULL model
Parameter values and summary statistics for simulations generated under the NULL model
Data_NULL_model.txt
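The eight simulated-data files listed above are plain-text tables of parameter values and summary statistics. The sketch below shows one way they might be read in Python. The layout is an assumption, not something documented here: it supposes a single header row, tab-separated columns and purely numeric values, so check the files and the readme before relying on it. The function name `load_simulation_table` is hypothetical.

```python
import csv


def load_simulation_table(path, delimiter="\t"):
    """Read a simulated-data table into a list of {column: value} dicts.

    Assumptions (not confirmed by the dataset description): the file has
    one header row, uses `delimiter` between columns, and every cell is
    numeric. Adjust `delimiter` if the files are formatted differently.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for record in reader:
            # Convert every cell to float; a non-numeric cell raises ValueError.
            rows.append({name: float(value) for name, value in record.items()})
    return rows
```

Each returned row corresponds to one simulation, so a file can be loaded with, for example, `load_simulation_table("Data_NULL_model.txt")` once the delimiter has been verified.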
Scripts and data for meta-analyses
Scripts and data used to perform the meta-analyses
Scripts_meta-analyses_v2.zip
Scripts for simulating data under the eight models
This file contains the .est and .par input files used to simulate the genetic datasets analysed in this article. There are eight pairs of .est/.par files, one for each of the eight models presented in the article. The event at 40,000 generations before present (i.e., going backward in time, all genes in the network were sent back to a single deme at this date) was used to standardize coalescence times across simulations and models. Please check the readme file for additional guidance.
Scripts_par_est_MODELS.zip
Deme IDs and parameter-name equivalences between article and scripts
(i) Figure showing the ID of each deme according to its spatial position in the network. This figure helps readers determine which deme each summary statistic in the simulated datasets shared in DRYAD corresponds to. (ii) Table reporting the equivalences between the parameter names used in the article and those used in the .est and .par scripts shared in DRYAD (i.e., the scripts that were used to simulate genetic data with ABCSampler and SIMCOAL 2). We also report the description of each parameter and the prior parameter values we used.
Demes_and_parameter_equivalences.docx