Genomic and phenotypic divergence‐with‐gene‐flow across an ecological and elevational gradient in a neotropical bird
Rodríguez-Cajarville, María José et al. (2022), Genomic and phenotypic divergence‐with‐gene‐flow across an ecological and elevational gradient in a neotropical bird, Dryad, Dataset, https://doi.org/10.5061/dryad.c866t1g8h
Aim: Along with environmental gradients, some species show significant differences in morphological, ecological-related traits. Those differences are commonly related to past events of allopatry but, alternatively, could be caused by natural selection in the presence of gene flow. We aimed to explore the prevalence of the divergence-with-gene-flow model across the Chaco-Andes dry forest belt, testing competing models of evolution in a Neotropical bird.
Location: Central Andes Mountain range and Chaco region of Argentina and Bolivia.
Taxon: Phytotoma rutila (Aves, Cotingidae).
Methods: We studied ddRADseq loci (4,893 SNPs) of 21 tissue samples and body size variation of 146 specimens. We evaluated population genetic structure and tested the effects of altitude and distance on genomic divergence. To evaluate allopatry and divergence-with-gene-flow, we compared the divergence on phenotypic traits (bill, tarsus, and wing measurements) versus neutral genomic variation, conducted coalescent analyses to estimate gene flow and divergence time among populations, and calculated relative (FST) versus absolute (DXY) genomic divergence.
Results: a) there is a genomic and phenotypic differentiation in P. rutila matched the highland-lowland axis, where the altitude variation explains genomic variation; b) A larger phenotypic than neutral genomic variation was found. c) there is an asymmetric gene flow between populations; d) a pattern of relative and absolute genomic differentiation compatible with divergence-with-gene-flow.
Main conclusions: The mechanism behind the morphological and genomic diversification along the Chaco-Andes dry forest belt in P. rutila is divergence‐with‐gene‐flow. Far more complex than we traditionally thought, diversification in South America implicates gene flow between populations and also natural selection along with the environmental gradients, as well as vicariance, contrasting with the idea of tropical speciation primarily based on allopatric models.
We sampled tissue of 23 Phytotoma rutila (Table S1.1) collected through the species’ whole breeding and altitudinal range (Fig. 2a). We purified genomic DNA from muscle or blood with the DNeasy Blood & Tissue purification kit (Qiagen). We sampled genomic markers by double-digest restriction site-associated DNA sequencing (ddRADseq, (Peterson, Weber, Kay, Fisher, & Hoekstra, 2012) following the approach outlined by Peterson et al. (Peterson et al., 2012) and described in (Thrasher, Butcher, Campagna, Webster, & Lovette, 2018). Briefly, we digested samples with SbfI and MspI (New England Biolabs, MA) and ligated the digested DNA to adapters on both the 5’ and 3’ ends that allow multiplexing. The 5’ adaptors included barcodes and were unique to each sample, while the 3’ barcode was common to groups of 20 samples. We pooled samples with unique 3’ barcodes and selected DNA fragments in the 400-700 bp range. The libraries were enriched, and the TruSeq adapters were incorporated by performing 13 cycles of PCR. Finally, libraries were combined in equimolar proportions and sequenced on an Illumina HiSeq 2500 lane at the Cornell University Institute for Biotechnology, obtaining single-end 101 bp sequences.
We demultiplexed, trimmed, filtered reads, assembled loci, and called single nucleotide polymorphism (SNPs) with ipyrad 0.7.28 (Eaton & Overcast, 2020). Parameters of the ipyrad pipeline and further SNP filtering using VCFtools (Danecek et al., 2011) are in Table S1.2. We used the first dataset of 23 samples for exploratory analyses and the second dataset of 21 samples (see below) in the downstream analyses. The ipyrad outfiles of the 23 samples dataset were used for the exploratory analyses such as STRUCTURE, maximum likelihood reconstruction, and PCA. Then, we removed two samples with excessive missing data (> 78%, MACN-Or-Ct 4079 and MACN-Or-Ct 2766) and re-run ipyrad over the remaining 21 individuals. We proceeded with SNP filtering using VCFtools (Danecek et al., 2011),(Danecek et al., 2011) to further filtering of SNPs. Our criteria to obtain the final RADseq data-set are: (a) remove sites with a quality score less than 30; (b) exclude sites with more than 80% of missing data (N); (c) remove low-frequency SNPs setting a Minimum Allele frequency of 0.05; (d) keep only biallelic SNPs; (e) reduce the Maximum Depth coverage to 300x (mean depth coverage of SNP in our sample + 1 standard deviation), and Minimum Depth coverage of 6x. Then, we randomly selected one SNP per locus, obtaining a final data set of 4,893 unlinked SNP (n = 21 samples, maximum missing rate per individual of 31.6%, an average depth across all loci of 89.1). We used this final ddRADseq dataset for all the remaining genetic analyses, except for those previously stated. Also, we used .loci ipyrad outfile to carry out G-PhocS and DXY-FST analysis. This filter set was used to remove SNPs that might be the product of sequencing errors, as these can create biases in inferring demographic processes (Linck & Battey, 2019; O’Leary, Puritz, Willis, Hollenbeck, & Portnoy, 2018; Willis, Hollenbeck, Puritz, Gold, & Portnoy, 2017).
VCF_21ind_Prutila.vcf -> vcf with unlinked SNP used for all analyses that uses SNP data.
LOCI_21ind_Prutila.loci -> complete locus sequence (variant and invariant sites).
OUTLIERS_ID. txt -> list of outliers identified by pcadapt and Latent Factors Mixed Models.
Consejo Nacional de Investigaciones Científicas y Técnicas, Award: PIP 2015 0637, PIP 2020 3168, PUE 0098
Fondo para la Investigación Científica y Tecnológica, Award: PICT 2014 2154, PICT 2018 02689