Skip to main content

Regionally divergent drivers of historical diversification in the late Quaternary in a widely distributed generalist species, the common pheasant Phasianus colchicus

Cite this dataset

Liu, Yang et al. (2020). Regionally divergent drivers of historical diversification in the late Quaternary in a widely distributed generalist species, the common pheasant Phasianus colchicus [Dataset]. Dryad.


Aim: Pleistocene climate and associated environmental changes have influenced phylogeographic patterns of many species. These not only depend on a species’ life history but also vary regionally. Consequently, populations of widespread species that occur in several biomes might display different evolutionary trajectories. We aimed to identify regional drivers of diversification in the common pheasant, a widely distributed ecological generalist. 

Study location: Asia

Taxon: Common pheasant Phasianus colchicus

Methods: Using a comprehensive geographic sampling of 204 individuals from the species’ entire range genotyped at seven nuclear and two mitochondrial loci, we reconstructed spatio-temporal diversification and demographic history of the common pheasant. We applied Bayesian phylogenetic inference to describe phylogeographic structure, generated a species tree, and inferred demographic history within and migration between lineages. Moreover, to establish a taxonomic framework, we conducted a species delimitation analysis.

Results: The common pheasant diversified during the late Pleistocene into eight distinct lineages. It originated at the edge of the Qinghai-Tibetan plateau and spread to East and Central Asia. Only the widely distributed lowland lineage of East Asia displayed recent range expansion. Greater phylogeographic structure was identified elsewhere, with lineages showing no sign of recent demographic changes. One lineage in south-central China is the result of long-term isolation within a climatically stable but topographically complex region. In lineages from arid Central Asia and China, range expansions were impeded by repeated population fragmentation during dry glacial periods and by recent aridification. 

Main conclusions: Spatio-temporal phylogeographic frameworks of widespread taxa such as the common pheasant provide valuable opportunities to identify divergent drivers of regional diversification. Our results suggest that diversification and population histories in the eight distinct evolutionary lineages were shaped by regionally variable effects of past climate and associated environmental changes. The evolutionary history of the common pheasant is best reflected by its being split into three species.


1 | Phylogeographic analyses

Sequence alignment was performed using MUSCLE in MEGA v6 (Tamura et al., 2013). Genetic variation was visualized with a median-joining haplotype network for each locus separately in PopART v4.8.4 (Leigh & Bryant, 2015). Neighbor-net phylogenetic networks based on nuclear loci were generated in SplitsTree v4.14.4 (Huson & Bryant, 2006)

A dated gene tree using the mtDNA dataset was reconstructed in BEAST v2.4.7 (Bouckaert et al., 2014) with linked tree models for both loci, but unlinked site models and clock rates. The approach of Li et al(2010) was followed to estimate locus specific substitution rates. We computed the overall mean of genetic distances at each locus among all samples including outgroup in MEGA with 1000 bootstrap replicates. Then the ratio of genetic distance between each locus and cyt b was calculated. Finally, we multiplied the resulting ratio with the average substitution rate of cyt b for Galliformes of 0.0119 substitutions/site/million year (Weir & Schluter, 2008) to estimate locus-specific rates. Inferred best-fitting models of nucleotide evolution at each locus were used as site models. A relaxed uncorrelated lognormal distribution was selected for the clock model. A lognormal distribution with standard deviation of 0.05 was used as prior distribution of the clock rates with different mean values (in real space) for the locus-specific substitution rates (Tab. S5). MCMC analyses were performed three times independently using a coalescent exponential tree model with 50 million iterations storing every 1000 generations. 

We moreover performed a maximum likelihood (ML) tree search based on the mtDNA dataset using RAxML v8.2.1 (Stamatakis, 2014) with independent GTRCAT models of nucleotide substitution for both loci and 1,000 bootstrap runs. Node with bootstrap values ≥ 70 were considered as strongly supported (Hillis & Bull, 1993).

            A dated species tree for the combined dataset was reconstructed using *BEAST in BEAST (Bouckaert et al., 2014) based on the two mtDNA loci and the alleles of six nuclear introns. ALDOB located on the Z chromosome was excluded as information on sex was not available for all individuals. Tree models were linked for the two mtDNA loci but unlinked for the nuclear introns, while site models and clock rates were unlinked for all loci. Based on the results of the mtDNA gene tree (see below), eight different lineages with corresponding samples were set as ‘taxa’. Clock models, locus-specific best-fitting models of nucleotide evolution (site models) and locus-specific substitution rates were implemented as above. We selected a Yule model for the species tree prior and a linear with constant root prior for the population function. Four independent chains were run with 1 billion generations stored every 0.2 million generations. Utilizing the same dataset and parameter settings, *BEAST was employed to reconstruct the colonization routes taken by the eight lineages. Coordinates representing the central locality of each lineage were implemented as a trait using the GEO_SPhERE and BEAST-CLASSIC packages (Lemey et al., 2009). Three independent chains were run for two billion generations and sampled every 0.1 million generations. We used the combined MCC tree (see below) with spherical geography in SPREAD v1.0.7 (Bielejec et al., 2011) to infer the colonization routes, which were displayed in Google-Earth v2 (Yamagishi et al., 2010).

For both BEAST and *BEAST analyses, Tracer v1.6 (Rambaut et al., 2013) was used to check for adequate effective sample sizes (ESS) of the posterior distribution of each parameter and to assess convergence among the independent runs. The runs were then combined using LogCombiner v2.4.7 (Bouckaert et al., 2014). A maximum clade credibility (MCC) tree was generated in TreeAnnotator v1.8.2 (Drummond & Rambaut, 2007) based on the combined posterior distributions, employing 10,000 randomly chosen trees in the BEAST analyses. Analyses were performed on the CIPRES Science Gateway (Miller et al., 2010). Trees were visualized in FigTree v1.4 (Rambaut, 2008) and DensiTree v2.0 (Bouckaert & Heled, 2014).

2 | Demographic and migration analyses

Changes in effective population size (Ne) through time in resulting evolutionary lineages (see below) were reconstructed with extended Bayesian Skyline Plots (EBSP) in BEAST using the combined dataset. Due to low sample sizes, no inference was performed for the tarimensis and formosanus groups. Again, tree models were linked for the two mtDNA loci but unlinked for the nuclear introns, while site models and clock rates were unlinked at all loci. The same priors were implemented as for substitution models and clock rates. A coalescent extended Bayesian Skyline tree prior was used. The MCMC analysis of each group was run for 100 million generations with sampling every 1000 generations. The parameter sum (indicators.alltrees) was checked to infer for the most likely number of demographic changes. Convergence and ESS value were assessed again in Tracer v1.6. Skyline plots were generated using R scripts available at:

Because EBSP analyses assume no gene flow post-divergence between lineages, we additionally inferred non-equilibrium scenarios between parapatric lineages based on an isolation-with-migration (IM) model. This permitted us to assess migration rates, effective population sizes and divergence times. We ran IM analyses on each pair of parapatric lineages/populations using IMa2 (Hey, 2010ba). Population string was defined based on the results of the phylogeographic analyses (see below). Locus-specific substitution rates (estimated as above), and their standard deviation, were used as mutation rates and their ranges. A HKY model was set as a substitution model. Generation time was set to 2 to adjust resulting population genetic parameters. We ran multiple simulations to test for appropriate upper bounds of migration rates, population sizes and times of population splits based on the results of the posterior distribution. Each analysis was executed for 10 million generations, storing every 1000 MCMC steps.

| Species delimitation

A species delimitation analysis was conducted in BPP v3.4 (Yang, 2015). We used the A10 model (speciesdelimitation=1, speciestree=0) and the species tree of the *BEAST analyses (see below) as a user-specified guide tree (Yang & Rannala, 2010Rannala & Yang, 2013) treating the eight distinct evolutionary lineages and their different combinations as potential species with equal prior probabilities. We tested three different sets of gamma priors for the population size (θ) and divergence time (τ) parameters: Γ(1, 1000), θ~Γ(1, 10) and τ~Γ (1, 1000), as well as θ~Γ (1, 1000)  and τ~Γ (1, 10). Substitution rates as estimated above were used as locus rates with both mtDNA loci treated as a single recombination unit. Three independent runs were performed at 100,000 generations with sampling every 10 samples, discarding the first 10,000 samples as burn-in. 


The National Science Foundation of China , Award: 31572251

The National Science Foundation of China, Award: 31572251