Data from: Museomics help resolving the phylogeny of snowfinches (Aves, Passeridae, Montifringilla and allies)

Cite this dataset

Islam, Safiqul et al. (2024). Data from: Museomics help resolving the phylogeny of snowfinches (Aves, Passeridae, Montifringilla and allies) [Dataset]. Dryad. https://doi.org/10.5061/dryad.m905qfv9k

Abstract

Historical specimens from museum collections are a valuable source of material, including from remote areas or regions of conflict that are not easily accessible to scientists today. In this study, we provide a taxon-complete phylogeny of snowfinches using historical DNA from whole skins of an endemic species from Afghanistan, the Afghan snowfinch, Pyrgilauda theresae. To resolve the strong conflict between previous phylogenetic hypotheses, we generated novel mitogenome sequences for selected taxa and genome-wide SNP data from ddRAD sequencing for all extant snowfinch species endemic to the Qinghai-Tibet Plateau (QTP) and for an extended intraspecific sampling of the sole Central and Western Palearctic snowfinch species (Montifringilla nivalis).

README: Data from: Museomics help resolving the phylogeny of snowfinches (Aves, Passeridae, Montifringilla and allies)

https://doi.org/10.5061/dryad.m905qfv9k

This data package includes original data from: Islam, Safiqul et al. (2024). Data from: Museomics help resolving the phylogeny of snowfinches (Aves, Passeridae, Montifringilla and allies). Molecular Phylogenetics and Evolution, 198, 108135. https://doi.org/10.1016/j.ympev.2024.108135

The package comprises sequence alignments (all in FASTA format, aligned with MEGA v 10.1.8) and selected Bayesian tree files for a data set of genome-wide single-nucleotide polymorphisms (SNPs) inferred from ddRAD sequencing of 40 individuals of snowfinches (genera Montifringilla, Onychostruthus and Pyrgilauda), plus one sample of the rock sparrow (Petronia petronia). Specimen metadata are provided in supplementary Table S1 (Excel format). Accession numbers of sequence data used for analysis are also provided in the material list in "Islametal_supplementaryTable S1_sample_metadata.xlsx": cytochrome-b and mitogenome sequences are deposited at GenBank; raw FASTQ files from ddRAD sequencing are deposited at the European Nucleotide Archive (ENA).

Each of the three treefiles is a consensus treefile from three combined runs with BEAST (MCMC chain length: 50 million generations; trees sampled every 5000 generations; burnin: 30% of sampled trees). The SNP data sets were inferred from mapping to one of three reference genomes (each with 20% missing data allowed): 1) Onychostruthus taczanowskii, 2) Pyrgilauda ruficollis, 3) Passer domesticus.
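As a quick sanity check on the settings above, the number of trees behind each consensus tree can be worked out from the chain length, sampling interval and burnin. The sketch below is only bookkeeping, not the authors' script. It assumes that the burnin was removed from each run before the runs were combined and that the initial state 0 is not counted (BEAST normally logs it, which would add one tree per run). The function name is hypothetical.

```python
# Bookkeeping for the BEAST settings stated in this README (illustrative only).
CHAIN_LENGTH = 50_000_000  # generations per run
SAMPLE_EVERY = 5_000       # sampling interval in generations
BURNIN_PERCENT = 30        # percent of sampled trees discarded as burnin
N_RUNS = 3                 # independent runs that were combined


def retained_trees(chain_length, sample_every, burnin_percent, n_runs):
    """Return (sampled per run, discarded per run, retained across all runs)."""
    sampled = chain_length // sample_every        # assumption: state 0 not counted
    discarded = sampled * burnin_percent // 100   # assumption: burnin applied per run
    return sampled, discarded, (sampled - discarded) * n_runs


sampled, discarded, combined = retained_trees(
    CHAIN_LENGTH, SAMPLE_EVERY, BURNIN_PERCENT, N_RUNS
)
print(sampled, discarded, combined)  # 10000 3000 21000
```

Under these assumptions each run contributes 7,000 post-burnin trees, so each consensus tree summarizes roughly 21,000 trees.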

Description of the data and file structure

Data files included in this package

Sample metadata, sequencing success:

Filename: Islametal_supplementaryTable S1_sample_metadata.xlsx

Table in Excel format providing information on the geographic origin of samples (country, region, locality; n/a = missing information), data availability for sequences (cytochrome-b and whole mitogenomes, GenBank accession numbers; n/a = sample not sequenced) and raw read data from ddRAD sequencing (FASTQ files deposited at the European Nucleotide Archive [ENA]; n/a = sample not included in ddRAD sequencing). Additional information on sequencing success (n/a = sample not included in ddRAD seq, i.e. not analyzed) comprises absolute numbers of raw reads (total and reads that passed the filter), average read depth (in base pairs), numbers of reads mapped to each of the three reference genomes (Passer domesticus, Pyrgilauda ruficollis, Onychostruthus taczanowskii) and mapping rates for each of the three reference genomes (in %).

Mitochondrial data:

Filename: Islametal2024_snowfinches_mitogenomes_26ind_16964bp

Alignment of 26 mitogenome sequences (newly generated for this study: n= 7; GenBank sequences: n= 19) used for phylogenetic reconstructions; alignment length: 16,964 base pairs; FASTA format, aligned with MEGA v 10.1.8.

Filename: Islametal2024_snowfinches_cytochromeb_52ind

Alignment of 52 cytochrome-b sequences of snowfinches, rock sparrows and outgroup taxa used for phylogenetic reconstructions; alignment length: 1,144 base pairs; FASTA format, aligned with MEGA v 10.1.8.

Filename: Islametal2024_mitogenome_assembly

Annotated mitogenome assembly for three novel snowfinch mitogenomes and three mitogenome sequences from GenBank for comparison, including aligned sequences of primer pairs used for long-range PCR and sequencing; alignment length: 16,965 base pairs; FASTA format, aligned with MEGA v 10.1.8.
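The sequence counts and alignment lengths given for the mitochondrial files can be checked after download. The following is a minimal sketch using only the Python standard library. It assumes plain FASTA with ">" header lines; the function names are hypothetical, and the file extension of the deposited files is not stated here, so the path passed in must be adjusted.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict of {header: sequence}."""
    records, name = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_shape(records):
    """Return (number of sequences, alignment length); raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"not an alignment, sequence lengths found: {sorted(lengths)}")
    return len(records), lengths.pop()
```

If the mitogenome file matches its description, `alignment_shape(read_fasta(...))` on it would be expected to report 26 sequences of 16,964 positions. Note that duplicate headers would overwrite each other in this simple parser.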

Genome-wide SNP data:

Filename: Islametal_SNPalignment_Onychostruthus-reference_0%Missing

SNP alignment (FASTA) inferred from read mapping to the Onychostruthus taczanowskii reference genome, 0% missing data allowed.

Filename: Islametal_SNPalignment_Onychostruthus-reference_10%Missing

SNP alignment (FASTA) inferred from read mapping to the Onychostruthus taczanowskii reference genome, 10% missing data allowed.

Filename: Islametal_SNPalignment_Onychostruthus-reference_20%Missing

SNP alignment (FASTA) inferred from read mapping to the Onychostruthus taczanowskii reference genome, 20% missing data allowed.

Filename: Islametal_SNPalignment_Onychostruthus-reference_30%Missing

SNP alignment (FASTA) inferred from read mapping to the Onychostruthus taczanowskii reference genome, 30% missing data allowed.

Filename: Islametal_SNPalignment_Pyrgilauda-reference_0%Missing

SNP alignment (FASTA) inferred from read mapping to the Pyrgilauda ruficollis reference genome, 0% missing data allowed.

Filename: Islametal_SNPalignment_Pyrgilauda-reference_10%Missing

SNP alignment (FASTA) inferred from read mapping to the Pyrgilauda ruficollis reference genome, 10% missing data allowed.

Filename: Islametal_SNPalignment_Pyrgilauda-reference_20%Missing

SNP alignment (FASTA) inferred from read mapping to the Pyrgilauda ruficollis reference genome, 20% missing data allowed.

Filename: Islametal_SNPalignment_Pyrgilauda-reference_30%Missing

SNP alignment (FASTA) inferred from read mapping to the Pyrgilauda ruficollis reference genome, 30% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-autosomes_0%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only autosomal SNP data, 0% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-autosomes_10%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only autosomal SNP data, 10% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-autosomes_20%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only autosomal SNP data, 20% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-autosomes_30%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only autosomal SNP data, 30% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-Zchromosome_0%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only Z-chromosome SNP data, 0% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-Zchromosome_10%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only Z-chromosome SNP data, 10% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-Zchromosome_20%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only Z-chromosome SNP data, 20% missing data allowed.

Filename: Islametal_SNPalignment_Passer-reference-Zchromosome_30%Missing

SNP alignment (FASTA) inferred from read mapping to the house sparrow (Passer domesticus) reference genome, only Z-chromosome SNP data, 30% missing data allowed.
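To make the "X% missing data allowed" levels of the SNP alignments above concrete, the sketch below filters alignment columns by their share of missing calls. This is an illustration, not the authors' pipeline: it assumes the threshold applies per site across samples, and that missing data are coded as N, "-" or "?". Both assumptions should be checked against the files. The function name and toy data are hypothetical.

```python
MISSING = set("Nn-?")  # assumed codes for missing data; verify against the files


def filter_sites(records, max_missing):
    """Keep alignment columns whose share of missing calls is <= max_missing.

    records: dict of {sample name: aligned sequence}, all of equal length.
    max_missing: allowed fraction of samples without a call (0.2 for "20%").
    """
    names = list(records)
    columns = zip(*(records[name] for name in names))
    kept = [
        col for col in columns
        if sum(base in MISSING for base in col) / len(names) <= max_missing
    ]
    # Transpose the kept columns back into one sequence per sample.
    rows = zip(*kept) if kept else [() for _ in names]
    return {name: "".join(row) for name, row in zip(names, rows)}


toy = {"s1": "ACGT", "s2": "AN-T", "s3": "ANGT", "s4": "ACGN", "s5": "ACGT"}
print(filter_sites(toy, 0.0))  # only the first column has no missing calls
print(filter_sites(toy, 0.2))  # columns with at most 1 of 5 samples missing
```

In the toy data, the second column is missing in two of five samples (40%) and is dropped at both thresholds, which mirrors why the alignments grow as more missing data are allowed.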

Tree files:

Filename: Islametal_snowfinchesOT_20missing_50Mio3runscombined_TREE

Consensus treefile from three combined runs with BEAST, MCMC chain length: 50 million generations, trees sampled every 5000 generations, burnin: 30% of sampled trees; inferred from mapping to the Onychostruthus taczanowskii genome, unthinned data, 20% missing data allowed (matrix size: 39,952 bp; alignment: filename= Islametal_SNPalignment_Onychostruthus-reference_20%Missing).

Filename: Islametal_snowfinchesPR_20missing_50Mio3runscombined_TREE

Consensus treefile from three combined runs with BEAST, MCMC chain length: 50 million generations, trees sampled every 5000 generations, burnin: 30% of sampled trees; inferred from mapping to the Pyrgilauda ruficollis genome, unthinned data, 20% missing data allowed (matrix size: 40,695 bp; alignment: filename= Islametal_SNPalignment_Pyrgilauda-reference_20%Missing).

Filename: Islametal_snowfinchesPasserautosomes_20missing_50Mio3runscombined_TREE

Consensus treefile from three combined runs with BEAST, MCMC chain length: 50 million generations, trees sampled every 5000 generations, burnin: 30% of sampled trees; inferred from mapping to the Passer domesticus genome, unthinned autosomal data, 20% missing data allowed (matrix size: 30,550 bp; alignment: filename= Islametal_SNPalignment_Passer-reference-autosomes_20%Missing).

Sharing/Access information

Other publicly accessible locations of the data:

Accession numbers of sequence data used for analysis are provided in the material list in "Islametal_supplementaryTable S1_sample_metadata.xlsx": cytochrome-b and mitogenome sequences are deposited at GenBank; raw FASTQ files from ddRAD sequencing are deposited at the European Nucleotide Archive (ENA).

Methods

We extracted DNA from 40 samples of snowfinches from all eight species of the three genera (Montifringilla, Onychostruthus, and Pyrgilauda) and one further sample of the rock sparrow, Petronia petronia. All samples were either frozen blood or tissue samples preserved in ethanol or preserving buffer, except for two toe pad samples taken from two historical specimens of the Afghan snowfinch, P. theresae. We generated whole mitochondrial genomes for this species and for representatives of the two other snowfinch genera using specific protocols for museum material. To complete a previous single-marker data set, we amplified a 1079-bp-long cytochrome-b fragment for 19 samples using the primer pair O-L14851/O-H16065.

To infer a genome-wide SNP data set, we used double-digest restriction site associated DNA sequencing (ddRAD seq). Sequencing was performed at the Deep Sequencing Facility in the Center for Molecular and Cellular Bioengineering (CMCB) Dresden. We measured DNA concentrations with Qubit (Thermo Fisher Scientific, Waltham, MA, USA) dsDNA High-Sensitivity (HS) and Broad-Range (BR) assays following the manufacturer's protocol. According to our Qubit measurements, we selected 38 samples with sufficient DNA concentrations for ddRAD seq. For sample preparation, 50 ng gDNA were double-digested with SbfI and MspI (NEB) for 120 min at 37°C, followed by heat inactivation at 65°C for 20 min. SbfI-specific library barcodes carrying TruSeq-i5 Illumina adapters with cohesive ends were ligated to the cohesive ends of the SbfI restriction sites of the digested DNA fragments. The same was done for the MspI site with an MspI-specific truncated universal TruSeq-i7 adapter. Samples with different P5 barcodes were pooled and purified using XP beads (Beckman Coulter, Krefeld, Germany) at a ratio of 1:1 to remove non-ligated adapters. Libraries were pooled equimolarly before sequencing in single-end mode on an Illumina NextSeq 500 system to a read length of 75 bp and a depth of at least 1 million reads per sample.

We mapped our cleaned ddRAD seq read data to three different reference genomes: 1) Pyrgilauda ruficollis (NCBI acc. no: GCF_017590135.1), 2) Onychostruthus taczanowskii (NCBI acc. no: GCA_017590055.1) and 3) a house sparrow (Passer domesticus) genome assembled to chromosome level. We used ipyrad v.0.9.42 for data assembly and read mapping to the three reference genomes. We applied a clustering threshold of 85% and a minimum sequencing depth for clustering of ≥6X. We applied default parameter settings of the reference-based ipyrad pipeline with a maximum of 8 indels, 0.5 heterozygous sites, 20% SNPs per locus, and a minimum of four samples per locus. After these first filtering steps, the ipyrad pipeline (Eaton and Overcast, 2020) produced three independent VCF output files, one from mapping against each of the three reference genomes. From each of the VCF files we generated final SNP data sets allowing for 0%, 10%, 20% and 30% missing data, which were used as input data for phylogenetic analysis. For variant calling and for separating autosomal from Z-chromosomal data sets we used vcftools 1.1.5 and bcftools v1.8 with a quality value ≥30 (the separation was possible only for the data set inferred from alignment to the house sparrow reference genome, which was annotated to the chromosome level).

Funding

Deutsche Forschungsgemeinschaft, Award: PA1818/3-2

Ayudas de Incorporación Científico Titular, Award: 202230I042, CSIC