Data from: Museomics reveal origins of East African Pleophylla forest chafers and Miocene forest connectivity
Data files
Mar 28, 2025 version files (79.79 MB in total):
- pleophylla_areas_adjacency2.txt (91 B)
- Pleophylla_BAYAREALIKE_vs_BAYAREALIKE_J_M0_unconstrained_v1.pdf (159.04 KB)
- pleophylla_biogeo_table3.txt (280 B)
- Pleophylla_DEC_vs_DEC_J_M0_unconstrained_v1.pdf (159.68 KB)
- Pleophylla_DIVALIKE_vs_DIVALIKE_J_M0_unconstrained_v1.pdf (158.60 KB)
- pleophylla_eogs_skim.charset.nex (36.77 KB)
- pleophylla_eogs_skim2.fas (71.30 MB)
- pleophylla_gene_alignments.zip (3.46 MB)
- pleophylla_gene_trees.zip (999.52 KB)
- pleophylla_mcmctree_ahrens2_FigTree.tre (5.55 KB)
- pleophylla_mcmctree_mckenna2_FigTree.tre (5.56 KB)
- pleophylla_partitions.phy (3.12 MB)
- pleophylla_scarab_gene_alignments.zip (375.02 KB)
- pleophylla_skim_astral.tre (3.35 KB)
- pleophylla_skim_RUN_24.treefile (4.38 KB)
- README.md (7.72 KB)
- restable_AIC_rellike_formatted.txt (318 B)
- restable_AICc_rellike_formatted.txt (315 B)
Abstract
Here we present a nearly complete species-level phylogeny, including 23 of the 25 known species of the forest-dwelling herbivorous scarab chafer beetle genus Pleophylla (Coleoptera: Scarabaeidae: Sericinae), based on the analysis of 950 nuclear genes (metazoan-level universal single-copy orthologs; mzl-USCOs). DNA sequences were obtained from freshly collected, ethanol-preserved samples and from dried museum specimens by target enrichment or genome shotgun sequencing. Alignment completeness of the mzl-USCOs newly obtained here by target DNA enrichment of ethanol-preserved samples was very heterogeneous and lower (29-62%) than in Dietz et al. (2023), while that of sequences recovered from dried samples was even lower (~19%). Alignment completeness of the sequences obtained from low-coverage shotgun sequencing was highest (~92%). We used the resulting phylogeny to reconstruct the historical biogeography of the group. To estimate a time-calibrated tree, we combined the mzl-USCO data of Pleophylla with a nucleotide alignment from an available transcriptomic dataset of Scarabaeoidea and used two different sets of secondary calibration points. Despite the problems associated with the capture rate of mzl-USCO sequences from museum specimens, we were able to infer a well-resolved phylogeny of the genus Pleophylla that also provided reliable estimates of the phylogenetic position of species for which we had little sequence data. Our study clearly identified South Africa as the geographic origin of Pleophylla. Timing and biogeographic history confirm a persistent fragmentation of forests since the Eocene. The occurrence of only one long-distance dispersal event from southern Africa to the Eastern African Arc, even during the Miocene, highlights the limited dispersal possibilities for these forest-adapted chafers, which do not seem to have undergone major northward range expansions along hypothetical forest corridors during the Pleistocene.
Supplementary files for the study “Museomics reveal origins of East African Arc forest chafers and Miocene forest connectivity”, L. Dietz et al., submitted
For questions, contact Dirk Ahrens (d.ahrens@leibniz-lib.de) or Lars Dietz (l.dietz@leibniz-lib.de)
All alignments, except for the one used for the MCMCtree analysis (pleophylla_partitions.phy, which is in PHYLIP format), are in FASTA format and can be opened with standard alignment viewers.
Phylogenetic trees are in NEWICK format and can be opened in a standard phylogenetic tree viewer such as FigTree. Partition files for concatenated alignments are in NEXUS format for use in IQ-TREE.
Figures showing BioGeoBEARS results are in PDF format and can be opened e.g. with Adobe Acrobat Reader. All other files can be opened with a standard text editor.
Description of the data and file structure
pleophylla_gene_alignments.zip: DNA alignments of individual USCO loci of 95 Pleophylla individuals and three outgroups. In these alignments, ambiguity codes (R, Y, W, S, M, K) stand for positions inferred to be heterozygous.
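Because heterozygous sites are encoded with the six standard two-base IUPAC codes, they can be decoded or counted directly from an aligned sequence. A minimal Python sketch; the function names are hypothetical, and treating N, ? and gaps as uncalled sites is an assumption of this sketch, not necessarily how the authors counted heterozygosity.

```python
# Two-base IUPAC ambiguity codes, used in the gene alignments for heterozygous sites.
IUPAC_HET = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "W": ("A", "T"),
    "S": ("C", "G"),
    "M": ("A", "C"),
    "K": ("G", "T"),
}


def heterozygous_sites(seq):
    """Return (0-based position, code, allele pair) for each ambiguity code in seq."""
    return [(i, b, IUPAC_HET[b]) for i, b in enumerate(seq.upper()) if b in IUPAC_HET]


def heterozygosity(seq):
    """Share of called sites (A, C, G, T or a two-base code) that are heterozygous.

    Gaps and other characters such as N or ? are ignored (assumption of this sketch).
    """
    called = [b for b in seq.upper() if b in "ACGT" or b in IUPAC_HET]
    if not called:
        return 0.0
    return sum(b in IUPAC_HET for b in called) / len(called)
```

For example, `heterozygous_sites("ACRT-YN")` reports the R at position 2 (A/G) and the Y at position 5 (C/T).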
pleophylla_gene_trees.zip: Maximum-likelihood phylogenetic trees in NEWICK format based on DNA alignments of individual USCO loci of 95 Pleophylla individuals and three outgroups. In the underlying alignments, ambiguity codes (R, Y, W, S, M, K) stand for positions inferred to be heterozygous.
pleophylla_scarab_gene_alignments.zip: DNA alignments of individual USCO loci from 24 Pleophylla/Omaloplia individuals with transcriptomic sequences from 58 other scarabaeoid and outgroup taxa.
pleophylla_eogs_skim2.fas: Concatenated alignment in FASTA format of USCO genes of 95 Pleophylla individuals and three outgroups.
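Per-individual alignment completeness, the quantity the abstract reports for the different sample types, can be estimated from this file with a short Python sketch. The exact definition the authors used is not stated here; the sketch assumes completeness is the share of alignment columns that are neither gaps (-) nor missing data (?, N), and parse_fasta and completeness are hypothetical helpers.

```python
# Characters treated as "no data" (assumption of this sketch).
MISSING = set("-?Nn")


def parse_fasta(lines):
    """Minimal FASTA parser: {sequence name: sequence}, joining wrapped lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def completeness(seq):
    """Fraction of alignment columns holding a base rather than a gap or missing data."""
    if not seq:
        return 0.0
    return sum(c not in MISSING for c in seq) / len(seq)
```

Reading pleophylla_eogs_skim2.fas with parse_fasta (it accepts an open file handle) and applying completeness to each sequence gives one value per individual; values computed this way need not match the published figures if a different definition was used.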
pleophylla_eogs_skim.charset.nex: Partition file in NEXUS format for concatenated alignment of USCO genes of 95 Pleophylla individuals and three outgroups.
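Assuming the file uses the usual IQ-TREE NEXUS layout (a `begin sets;` block with one `charset name = start-end;` line per gene), the gene boundaries can be read back and used to cut single loci out of pleophylla_eogs_skim2.fas. This Python sketch only handles one contiguous range per charset, which is what a per-gene concatenation should produce; parse_charsets and slice_partition are hypothetical names.

```python
import re

# Matches e.g. "charset gene1 = 1-1245;" (case-insensitive, one contiguous range).
CHARSET_RE = re.compile(r"charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE)


def parse_charsets(nexus_text):
    """Return {partition name: (start, end)} with 1-based, inclusive coordinates."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in CHARSET_RE.finditer(nexus_text)}


def slice_partition(seq, start, end):
    """Extract one partition from a concatenated sequence (1-based, inclusive)."""
    return seq[start - 1:end]
```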
pleophylla_skim_RUN_24.treefile: Maximum-likelihood phylogenetic tree created with IQ-TREE from concatenated USCO dataset of 95 Pleophylla individuals and three outgroups.
pleophylla_skim_astral.tre: Coalescent-based tree created with ASTRAL based on phylogenetic trees of individual USCO loci of 95 Pleophylla individuals and three outgroups.
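Both the IQ-TREE and the ASTRAL tree should contain 98 terminals (95 Pleophylla individuals plus three outgroups), which can be checked by pulling the tip labels out of the NEWICK string. This is a regex-based Python sketch rather than a full NEWICK parser: it assumes unquoted tip labels without spaces or brackets, and newick_tips is a hypothetical helper. For anything beyond a quick check, a dedicated tree library is the safer choice.

```python
import re

# A tip label is the token directly after "(" or ","; internal-node labels
# (e.g. support values) follow ")" and are therefore not matched.
# Assumes unquoted labels without whitespace, brackets, commas or colons.
TIP_RE = re.compile(r"[(,]\s*([^\s(),:;\[\]]+)")


def newick_tips(newick):
    """Return tip labels in the order they appear in a NEWICK string."""
    return TIP_RE.findall(newick)
```

Comparing `len(newick_tips(text))` and `len(set(newick_tips(text)))` with 98 also reveals duplicated labels.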
pleophylla_partitions.phy: Partitioned alignment in PHYLIP format of USCO sequences from 24 Pleophylla/Omaloplia individuals with transcriptomic sequences from 58 other scarabaeoid and outgroup taxa.
pleophylla_mcmctree_ahrens2_FigTree.tre: Calibrated tree of Scarabaeoidea transcriptomic dataset including 24 Pleophylla/Omaloplia individuals calculated with MCMCtree according to calibration points from Ahrens et al. (2014).
pleophylla_mcmctree_mckenna2_FigTree.tre: Calibrated tree of Scarabaeoidea transcriptomic dataset including 24 Pleophylla/Omaloplia individuals calculated with MCMCtree according to calibration points from McKenna et al. (2019).
pleophylla_areas_adjacency2.txt: Area adjacency matrix used for BioGeoBEARS analysis. 1 indicates adjacency between areas, 0 indicates non-adjacency. Abbreviations: Ca: Cape, Na: Natal, Ka: Kalahari, ZS: Zambesia (South), ZN: Zambesia (North), Sh: Shaba.
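The matrix can be loaded and checked for symmetry before it is reused. This Python sketch assumes a layout in which the first line lists the area abbreviations and each following line holds one whitespace-separated 0/1 row in the same order (an optional END line is tolerated); the function names and the three-area matrix in the test are hypothetical, not the published values.

```python
def parse_adjacency(text):
    """Return (areas, rows) from an area header line followed by 0/1 rows."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    areas, rows = lines[0], []
    for fields in lines[1:]:
        if fields[0].upper() == "END":  # tolerate a trailing END marker
            break
        rows.append([int(x) for x in fields])
    if len(rows) != len(areas) or any(len(r) != len(areas) for r in rows):
        raise ValueError("matrix does not match the number of areas")
    return areas, rows


def is_symmetric(rows):
    """Adjacency should be mutual: rows[i][j] == rows[j][i] for all i, j."""
    n = len(rows)
    return all(rows[i][j] == rows[j][i] for i in range(n) for j in range(n))


def adjacent_pairs(areas, rows):
    """List each unordered pair of adjacent areas once."""
    n = len(areas)
    return [(areas[i], areas[j]) for i in range(n) for j in range(i + 1, n) if rows[i][j] == 1]
```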
pleophylla_biogeo_table3.txt: Table containing distribution ranges of Pleophylla species used for BioGeoBEARS analysis. Rows stand for species, columns for areas. 1 indicates occurrence of species within area, 0 indicates non-occurrence. Species are indicated by first three letters of their name, see Fig. 2 for full species names. Abbreviations for areas: Ca: Cape, Na: Natal, Ka: Kalahari, ZS: Zambesia (South), ZN: Zambesia (North), Sh: Shaba.
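The README describes the table only as rows for species and columns for areas. BioGeoBEARS reads species ranges from a PHYLIP-like geography file (a header line with the number of taxa, the number of areas and the area names in parentheses, followed by one line per taxon with a 0/1 string), so the Python sketch below assumes that layout; if the file is instead a plain whitespace-separated matrix, only the header parsing would need to change. parse_geography is a hypothetical helper, and the species codes and ranges in the test are made up.

```python
import re


def parse_geography(text):
    """Return (areas, {species: set of occupied areas}) from a PHYLIP-like range table."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = re.match(r"\s*(\d+)\s+(\d+)\s*\(([^)]*)\)", lines[0])
    if header is None:
        raise ValueError("unexpected header line")
    ntaxa, nareas = int(header.group(1)), int(header.group(2))
    areas = header.group(3).split()
    ranges = {}
    for ln in lines[1:]:
        name, code = ln.split()
        if len(code) != len(areas):
            raise ValueError(f"{name}: range string does not match the number of areas")
        # A "1" at position k means the species occurs in areas[k].
        ranges[name] = {a for a, bit in zip(areas, code) if bit == "1"}
    if len(areas) != nareas or len(ranges) != ntaxa:
        raise ValueError("header counts do not match the table")
    return areas, ranges
```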
Pleophylla_BAYAREALIKE_vs_BAYAREALIKE_J_M0_unconstrained_v1.pdf: Trees showing inferred ancestral ranges from BioGeoBEARS analysis using the models BAYAREALIKE and BAYAREALIKE+J.
Pleophylla_DEC_vs_DEC_J_M0_unconstrained_v1.pdf: Trees showing inferred ancestral ranges from BioGeoBEARS analysis using the models DEC and DEC+J.
Pleophylla_DIVALIKE_vs_DIVALIKE_J_M0_unconstrained_v1.pdf: Trees showing inferred ancestral ranges from BioGeoBEARS analysis using the models DIVALIKE and DIVALIKE+J.
restable_AIC_rellike_formatted.txt: Likelihood, estimated parameter values, and weight of each model used in BioGeoBEARS according to Akaike Information Criterion (AIC). Rows stand for models, columns for parameters. Abbreviations: LnL: logarithm of likelihood, numparams: number of parameters, d: dispersal parameter, e: extinction parameter, j: jump parameter, AIC: Akaike Information Criterion, AIC_wt: weight of model according to AIC.
restable_AICc_rellike_formatted.txt: Likelihood, estimated parameter values, and weight of each model used in BioGeoBEARS according to corrected Akaike Information Criterion (AICc). Rows stand for models, columns for parameters. Abbreviations: LnL: logarithm of likelihood, numparams: number of parameters, d: dispersal parameter, e: extinction parameter, j: jump parameter, AICc: corrected Akaike Information Criterion, AICc_wt: weight of model according to AICc.
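Both tables follow from standard formulas: AIC = 2k - 2 lnL, AICc = AIC + 2k(k + 1)/(n - k - 1), and the Akaike weight of model i is w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is that model's AIC (or AICc) minus the lowest value in the table. The Python sketch below recomputes these from the LnL and numparams columns; the sample size n needed for AICc is not given in this README and has to be supplied, and the lnL values in the test are invented for illustration.

```python
import math


def aic(lnl, k):
    """Akaike Information Criterion from log-likelihood lnl and k free parameters."""
    return 2 * k - 2 * lnl


def aicc(lnl, k, n):
    """Small-sample corrected AIC; n is the sample size (must exceed k + 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined for n <= k + 1")
    return aic(lnl, k) + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Map {model: AIC or AICc} to {model: weight}; weights sum to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-(s - best) / 2) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}
```

For example, with invented values lnL = -30 (k = 2) for DEC and lnL = -27 (k = 3) for DEC+J, the AIC values are 64 and 60, and DEC+J receives a weight of about 0.88.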
Sharing/Access information
Data were derived from the following sources:
- Raw reads from hybrid enrichment and genomic shotgun sequencing of Pleophylla/Omaloplia spp. (NCBI SRA)
- Transcriptome assemblies of scarabaeoid beetles from Dietz et al. (2023b): https://www.ncbi.nlm.nih.gov/bioproject/PRJNA906571/
Code/Software
trinity_longest_d.pl: This script creates filtered versions of Trinity assembly results that contain only the longest variant of each contig. Requires, in that order, the path to the input folder containing assemblies in FASTA format and the path to an output folder for the filtered assemblies. Names of assembly files must end in .fasta. Example: trinity_longest_d.pl input_folder/ output_folder/
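The filtering step can be sketched in Python, assuming standard Trinity headers such as `TRINITY_DN1000_c0_g1_i1 len=345 ...`, where the trailing `_i<number>` marks an isoform of the same gene. Whether the original Perl script groups variants exactly this way is an assumption; longest_isoforms is a hypothetical name, and it expects (header, sequence) pairs with the leading ">" already removed.

```python
import re

# Gene ID = sequence ID minus the trailing isoform suffix "_i<number>" (assumed Trinity style).
ISOFORM_RE = re.compile(r"^(?P<gene>\S+?)_i\d+$")


def longest_isoforms(records):
    """records: iterable of (header, sequence); returns {gene_id: (header, sequence)}."""
    best = {}
    for header, seq in records:
        seq_id = header.split()[0]
        m = ISOFORM_RE.match(seq_id)
        gene = m.group("gene") if m else seq_id
        # Keep the first-seen record unless a strictly longer variant turns up.
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best
```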
hmmalign_cut2_d.pl: This script removes all positions not covered by the HMM from hmmalign protein alignments (STOCKHOLM format) and from the nucleotide alignments (FASTA format) based on them. As part of this process, the protein alignments are converted from STOCKHOLM to FASTA format. Requires, in that order, the paths to the folder containing the input protein alignments, the folder containing the input nucleotide alignments, an output folder for protein alignments, and an output folder for nucleotide alignments. Protein and nucleotide alignments must have identical names, with the former ending in .sth and the latter in .fas. Example: hmmalign_cut2_d.pl input_prot/ input_nuc/ output_prot/ output_nuc/
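The core of this step can be sketched in Python. In hmmalign's Stockholm output, the `#=GC RF` line marks consensus (match) columns with `x` and insert columns, which the HMM does not cover, with `.`. The sketch keeps only match columns and, assuming the nucleotide alignment is codon-aligned to the protein alignment (three nucleotide columns per protein column), keeps the corresponding codons. All function names are hypothetical, and the Perl script may differ in details.

```python
def parse_stockholm(lines):
    """Minimal Stockholm reader (multi-block): ({name: aligned protein}, RF string)."""
    seqs, rf = {}, ""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("#=GC RF"):
            rf += line.split()[2]
        elif line.startswith("#"):
            continue  # header, #=GS, #=GR and other #=GC lines
        else:
            name, aligned = line.split()
            seqs[name] = seqs.get(name, "") + aligned
    return seqs, rf


def match_columns(rf):
    """0-based indices of consensus columns; '.' in RF marks insert columns."""
    return [i for i, c in enumerate(rf) if c != "."]


def cut_protein(seqs, cols):
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}


def cut_nucleotides(nuc, cols):
    """Keep the codon (three columns) belonging to each retained protein column."""
    return {name: "".join(s[3 * i:3 * i + 3] for i in cols) for name, s in nuc.items()}


def to_fasta(seqs):
    return "".join(f">{name}\n{s}\n" for name, s in seqs.items())
```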
extract_codpos_d.pl: This script removes the third codon position from all nucleotide alignments (FASTA format) in a folder. Requires, in that order, the paths to the folder containing input alignments and an output folder for modified alignments. Names of alignment files must end in .fas. Example: extract_codpos_d.pl input_folder/ output_folder/
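Dropping third positions amounts to keeping every alignment column whose 0-based index i satisfies i % 3 != 2. This Python sketch assumes each alignment starts at a first codon position and has a length that is a multiple of three; the function names are hypothetical.

```python
def drop_third_positions(seq):
    """Keep codon positions 1 and 2, i.e. 0-based indices i with i % 3 != 2."""
    return "".join(c for i, c in enumerate(seq) if i % 3 != 2)


def drop_third_positions_alignment(alignment):
    """Apply drop_third_positions to every sequence of {name: aligned sequence}."""
    if any(len(s) % 3 for s in alignment.values()):
        raise ValueError("alignment length is not a multiple of 3")
    return {name: drop_third_positions(s) for name, s in alignment.items()}
```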
concat_eogs_part_d.pl: This script creates a concatenated alignment FASTA file from all alignments (FASTA format) in a folder. It also creates a partition file in NEXUS format listing each alignment as a partition. Requires, in that order, the path to the folder containing input alignments, a name for the concatenated output alignment, and a name for the output partition file. Names of alignment files must end in .fas. Example: concat_eogs_part_d.pl input_folder/ concat.fas partition.nex
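A Python sketch of the same idea: genes are concatenated in order, each gene's column range is recorded, and a NEXUS `sets` block with one `charset` per gene is written. Filling taxa that lack a gene with gaps is an assumption (the original script may use `?` or `N`), and concatenate is a hypothetical name.

```python
def concatenate(alignments, missing="-"):
    """alignments: {gene: {taxon: aligned sequence}}, concatenated in dict order.

    Returns ({taxon: concatenated sequence}, NEXUS partition text).
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    charsets, start = [], 1
    for gene, aln in alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are empty or differ in length")
        length = lengths.pop()
        for t in taxa:
            # Taxa without this gene are padded with the missing-data character.
            parts[t].append(aln.get(t, missing * length))
        charsets.append(f"    charset {gene} = {start}-{start + length - 1};")
        start += length
    nexus = "#nexus\nbegin sets;\n" + "\n".join(charsets) + "\nend;\n"
    return {t: "".join(p) for t, p in parts.items()}, nexus
```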
sam_coverage_d.pl: This script calculates the average coverage of a set of genes, weighted by gene length, from the output of the samtools “coverage” command. Requires, in that order, a folder containing the samtools output files in .txt format and a name for the output table. Example: sam_coverage_d.pl input_folder/ output.txt
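The weighting can be sketched in Python. samtools coverage writes a tab-separated table whose header line is `#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq`; this sketch assumes "average coverage" means the mean read depth (`meandepth`) weighted by region length (`endpos - startpos + 1`), which may not be exactly what the Perl script reports. weighted_mean_depth is a hypothetical helper that processes one samtools output file at a time.

```python
def weighted_mean_depth(lines):
    """Length-weighted mean of 'meandepth' over all rows of one samtools coverage table."""
    header, total_len, weighted = None, 0, 0.0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            header = [f.lstrip("#") for f in fields]
            continue
        if header is None:
            raise ValueError("missing header line")
        row = dict(zip(header, fields))
        length = int(row["endpos"]) - int(row["startpos"]) + 1
        total_len += length
        weighted += length * float(row["meandepth"])
    return weighted / total_len if total_len else 0.0
```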
Sequencing was done via target enrichment or whole-genome shotgun sequencing of ethanol-preserved or dried museum specimens of Pleophylla beetles. Reads were then mapped against reference mzl-USCO sequences with bwa, and diploid consensus sequences were extracted with samtools. Maximum-likelihood phylogenetic trees were calculated with IQ-TREE, calibrated trees were created with MCMCtree, and biogeographic analyses were performed with BioGeoBEARS.