Skip to main content

A new, undescribed species of Melanocharis berrypecker from western New Guinea and the evolutionary history of the family Melanocharitidae

Cite this dataset

Milá, Borja et al. (2020). A new, undescribed species of Melanocharis berrypecker from western New Guinea and the evolutionary history of the family Melanocharitidae [Dataset]. Dryad.


Western New Guinea remains one of the last biologically underexplored regions of the world, and much remains to be learned regarding the diversity and evolutionary history of its fauna and flora. During a recent ornithological expedition to the Kumawa Mountains in West Papua, we encountered an undescribed species of Melanocharis berrypecker (Melanocharitidae) in cloud forest at an elevation of 1200 m asl. Its main characteristics are iridescent blue-black upperparts, satin-white underparts washed lemon yellow, and white outer edges to the external rectrices. Initially thought to represent a close relative of the Mid-mountain Berrypecker Melanocharis longicauda based on elevation and plumage color traits, a complete phylogenetic analysis of the genus based on full mitogenomes and genome-wide nuclear data revealed that the new species, which we name Satin Berrypecker Melanocharis citreola, is in fact sister to the phenotypically dissimilar Streaked Berrypecker M. striativentris. Phylogenetic relationships within the family Melanocharitidae, including all presently recognized genera (Toxorhamphus, Oedistoma, Rhamphocharis and Melanocharis), reveal that this family endemic to the island of New Guinea originated during the Miocene (~19 Mya), diversified during the main uplift of New Guinea in the Middle and Late Miocene (14-6 Mya), and represents an evolutionary radiation with substantial diversity in bill morphology and signalling traits across species. Rhamphocharis berrypeckers fall within the Melanocharis clade despite their larger beaks, and should be included in the latter genus. Interspecific genetic distances in Melanocharis are pronounced (average interspecific distance: 8.8% in COI, 12.4% in ND2), suggesting a long history of independent evolution of all lineages corresponding to currently recognized species, including the Satin Berrypecker, which shares a most recent common ancestor with its sister species in the early Pleistocene (~2.0 Mya).


We extracted genomic DNA from blood or tissue samples using a Qiagen DNeasy Blood and Tissue Kit (Qiagen Inc., Texas) and following the manufacturer’s instructions. DNA from dry toe-pad tissue samples from some museum specimens was extracted using the same kit with slight modifications to the protocol (e.g. longer proteinase K digestion) and strict conditions to avoid contamination by modern DNA, as described elsewhere (Bruxaux et al, 2018). Libraries were prepared from 54µL of eluted DNA using the Illumina TruSeq Nano DNA Sample Prep kit following the instructions of the supplier (Illumina Inc., San Diego). Samples were then sequenced with Illumina HiSeq technology to obtain millions of ~100-bp-long sequence reads. Library preparation and sequencing were performed at the GeT-PlaGe core facility, hosted by INRAE (Toulouse, France).

1. Nuclear dataset

To reconstruct a nuclear DNA dataset, we retrieved low copy DNA regions using read mapping onto reference sequences, following the strategy presented in Bruxaux et al (2018). Briefly, we used two large sets of independent markers that were assembled recently for large-scale phylogenetic inference in modern birds (McCormack et al. 2013; Prum et al. 2015). The first set of genes is composed of 259 nuclear loci with an average length of 1,501 bp, that were defined based on the data obtained in Corvus albus (Pied Crow; Corvidae) by Prum et al. (2015), for a total sequence length of almost 389 Kbp. The second set of markers is composed of 4,634 ultra-conserved elements (UCEs) with an average length of 610 bp that were characterized in Aphelocoma californica (California Scrub-jay; Corvidae), for a total sequence length of 2.8 Mbp (McCormack et al. 2016).

We first trimmed the raw reads with Trimmomatic v0.38 (Bolger et al., 2014) to remove the adapters and the bases at the beginning and the end of low quality reads (below a value of 3), and kept only the paired-end reads that reached 36 bp after trimming. We then mapped the reads against our reference sequences with bwa mem v0.7.17 (Li & Durbin, 2009), filtered the alignment to skip reads with mapping quality below 30 with samtools v1.9 (Li et al., 2009), sorted the bam file and removed read duplicates with the same software. We called SNPs with bcftools v1.10.2 without minimum base quality, normalized indels, and filtered SNPs in the 5-bp vicinity of these indels. Finally, we obtained the consensus sequence for each individual with the bcftools consensus command and its new option “-a” that deals better with missing data. For the nuclear DNA dataset, we mapped on average 271,360 reads per individual against the reference, ranging from 33,715 to 1,047,251 reads (Table S1). We aligned our Melanocharitidae data with their reference sequence as outgroup with MAFFT v7.313 (Katoh & Standley, 2013) and refined manually the alignment in the region were missing data did not allow a proper automatic alignment.

We concatenated the 4,893 nuclear DNA markers (3.2 Mbp) for the phylogenetic analysis and used the reference sequences from Corvus albus and Aphelocoma californica as outgroups in the phylogenetic analysis. We estimated a Maximum Likelihood tree with RAxML v8.2.11 (Stamatakis, 2014) using the most parameter-rich model of sequence evolution with invariable sites and a gamma distributed rate variation among sites (GTR+I+G model). Due to computing limitations with large datasets, we could not perform nucleotide substitution model selection on the nuclear DNA dataset. We ran the analysis for 20 alternative runs and performed 1,000 replicates of non-parametric bootstrapping.

2. Mitochondrial dataset, Bayesian divergence time estimation

We assembled the mitochondrial genomes from the raw reads with the ORGanelle ASeMbler v1.0.3 (, which removes adapters and takes read quality into account. When sequencing depth did not allow us to obtain a complete circular genome, we mapped reads against one of the mitochondrial genomes obtained for the same species, using the method described below for the nuclear data. To evaluate the quality of our final reconstruction, we mapped the reads from each samples against its own mitochondrial genome, again with the method described below. Finally, we annotated each mitochondrial genome with Geneious v.9.1.8 (Biomatters Ltd., Auckland, New Zealand), using Oedistoma iliolophus NC_024865 as reference, and checked manually for the reading frame of coding genes. We obtained on average 63.17 million reads per individual (ranging from 6.5 to 467 million; Table S1). We were able to reconstruct a complete circular mitochondrial genome for all individuals except five, for which we recovered > 80% of the estimated full length. The number of reads mapping to the final mitogenome assembly ranged from 217 to one million, with an average of 137,114 reads per sample (Table S1). We obtained a final alignment of 17,034 bp, which was used for the phylogenetic analysis.

Because the mitochondrial and nuclear DNA trees were fully congruent at the species level (see Results), and because fossils from members of Melanocharitidae (or closely related groups) are not available, we estimated divergence times using mitochondrial DNA data only, and secondary calibrations from Oliveros et al. (2019). We kept the complete mitochondrial dataset but removed the outgroups, and ran the same analysis as previously described to find the best partitions schemes and associated substitution models. We used BEAST v2.5.1 (Bouckaert et al., 2014) to estimate divergence time within Melanocharitidae, using a relaxed lognormal uncorrelated evolutionary model for each partition. We assumed either Yule or birth-death branching processes as tree priors and applied lognormal priors (M=1, S=1.25) for the mutations rates, and exponential priors (M=1) for the mean clock rates. We calibrated the Melanocharitidae crown node with a normal distribution (M=18.899, S=2.95) and the split between Toxorhamphus sp. and Oedistoma sp. with another normal distribution (M=17.4631, S=2.95) to follow Oliveros et al. (2019). We ran the analysis for 500 million generations, sampling every 500,000 generations, discarding the first 50 million generations as burn-in, and checking chain convergence with Tracer v1.7.1 (Rambaut et al., 2018).

Usage notes

1. Nuclear dataset

Melano_nuclear_reference.fasta : reference built from UCE from McCormack et al. (2013) and Prum et al. (2015), concatenated and separated by 2,000 N. Used to map reads before the phylogenetic analysis.

Melano_nuclear_alignment_reduced.phy : concatenated nuclear alignment following the removal of the positions containing only N, used for the phylogenetic analysis.

2. Mitochondrial dataset, Bayesian divergence time estimation

Melano_mitochondrial_BEAST2_Y.xml : file created by BEAUTi for the dating analysis performed by BEAST2, with a Yule evolutionary model.

Melano_mitochondrial_BEAST2_BD.xml : file created by BEAUTi for the dating analysis performed by BEAST2, with a Birth-Death evolutionary model.


Lengguru Project

Laboratoire d'Excellence TULIP, Award: ANR-10-LABX-41

Czech Science Foundation, Award: 18-23794Y

Lengguru Project