Genomic and acoustic biogeography of the iconic sulphur-crested cockatoo clarifies species limits and patterns of intraspecific diversity
Data files
Oct 16, 2024 version files: 59.05 GB
- ADMIXTURE1.3.zip (33.08 MB)
- BEAST2.7.5.zip (61.26 MB)
- BioGeoBEARS0.2.1_RASP4.3.zip (1.21 MB)
- db-RDA_IBD.zip (193.84 MB)
- DiversityIndices_SequenceDistance.zip (35.09 KB)
- EEMS.zip (1.89 MB)
- ESU-delimitation.zip (582 KB)
- GeneralInputs.zip (47.59 GB)
- IQ-TREE2.2.0.3.zip (219.05 MB)
- Plink2-PCAs.zip (58.49 MB)
- PseudochromosomeLayout_ReferenceGenome.zip (331.81 MB)
- PSMCs.zip (8.63 GB)
- README.md (7.92 KB)
- SMCpp.zip (1.92 GB)
Abstract
Many highly recognisable species lack genetic data important for conservation because their hyperabundance has led to their neglect. This likely applies to the Sulphur-crested Cockatoo (Cacatua galerita), one of the world’s most iconic parrots. The species is native to Australia, New Guinea and some surrounding Melanesian islands. Four subspecies are currently recognised based on morphology. Australian subspecies and populations are abundant, but several factors threaten those in New Guinea and Melanesia. Genetic data from natural populations are scarce – information that is vital to identifying evolutionarily significant units (ESUs) important for modern conservation planning. We used whole-genome resequencing to investigate patterns of differentiation, evolutionary affinities and demographic history across C. galerita’s range to assess whether currently recognised subspecies represent ESUs. We complement this with an assessment of bioacoustic variation across the species’ range. Our results point to C. galerita sensu lato (s.l.) comprising two species. We restrict C. galerita sensu stricto (s.s.) to populations in Australia and the Trans-Fly ecoregion of southern New Guinea. The second species, recognised here as Cacatua triton, likely occurs over much of the rest of New Guinea. Setting aside intraspecific diversity within C. triton, we show that two ESUs exist within C. galerita s.s., aligning with Cacatua galerita galerita in eastern Australia and southern New Guinea and Cacatua galerita fitzroyi in northern and north-western Australia. We suggest that the evolution of these species and ESUs is linked to Middle and Late Pleistocene glacial cycles and their effects on sea level and preferred habitats. We argue that conservation assessments need updating, that protection of preferred forest and woodland habitats is important and that reintroductions require careful management to avoid possible negative effects of hybridisation between non-complementary lineages.
README: Genomic and Acoustic Biogeography of the Iconic Sulphur-Crested Cockatoo Clarifies Species Limits and Patterns of Intraspecific Diversity
https://doi.org/10.5061/dryad.ghx3ffbxc
This dataset contains raw results and several supporting input data files of analyses for the research article by Sands et al. "Genomic and Acoustic Biogeography of the Iconic Sulphur-Crested Cockatoo Clarifies Species Limits and Patterns of Intraspecific Diversity". See the research article and appendices for more information.
Processing and analytical scripts (and notes thereof) pertaining to the data and results presented here can be found at:
https://github.com/AFSands/Genomic-and-Acoustic-Biogeography-of-the-Iconic-Sulphur-Crested-Cockatoo
The data and results presented here should be read in conjunction with the associated article and the processing and analytical scripts and notes.
File structure
Data and results are placed in the following zipped folders named according to the analytical programs/software or contents per the manuscript Sands et al. "Genomic and Acoustic Biogeography of the Iconic Sulphur-Crested Cockatoo Clarifies Species Limits and Patterns of Intraspecific Diversity":
#####################################
ADMIXTURE1.3.zip - This folder includes the Plink 1.9-generated input files (.bed, .bim, .fam, .log, .nosex) for the admixture analyses conducted in ADMIXTURE 1.3 for the associated article. The subfolder Raw_replicate_results contains the raw replicate results (.Q files of ancestry proportions) for each of the six numbers of genetic clusters (K) tested, organised under A-J, together with the associated .out files reporting the likelihood and support for each run. An Excel document in this subfolder summarises the collective results and their support.
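As a minimal sketch of how the .Q outputs described above might be read, the Python below pairs each row of a .Q file with the individual IDs in the matching .fam file (ADMIXTURE writes one row per individual, in .fam order) and assigns each individual to its highest-proportion cluster. It also pulls the last reported log-likelihood from an .out log, which could be used to compare replicates A-J. The function names are hypothetical, the "Loglikelihood:" line format is assumed from typical ADMIXTURE console output, and this is not the authors' processing (see the linked GitHub repository for that).

```python
from pathlib import Path


def read_q(path):
    """Read an ADMIXTURE .Q file: one row per individual, K ancestry proportions."""
    lines = Path(path).read_text().splitlines()
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def read_fam_ids(path):
    """Return individual IDs (column 2) from a PLINK .fam file, in file order."""
    lines = Path(path).read_text().splitlines()
    return [line.split()[1] for line in lines if line.strip()]


def majority_cluster(q_rows, ids):
    """Map each individual to the 1-based cluster with its largest ancestry proportion."""
    if len(q_rows) != len(ids):
        raise ValueError(".Q and .fam files list different numbers of individuals")
    return {iid: max(range(len(q)), key=q.__getitem__) + 1 for iid, q in zip(ids, q_rows)}


def final_loglikelihood(out_text):
    """Return the last 'Loglikelihood:' value in an ADMIXTURE log (assumed format), or None."""
    values = [float(line.split("Loglikelihood:")[1].split()[0])
              for line in out_text.splitlines() if "Loglikelihood:" in line]
    return values[-1] if values else None
```

For example, `majority_cluster(read_q("run.3.Q"), read_fam_ids("run.fam"))` would summarise one replicate at K = 3; the file names here are placeholders.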
BEAST2.7.5.zip - This folder includes BEAST 2.7.5 input files (.xml), the result files of each independent run (.log, .trees) and the combined final result files (.log, .trees, .tree) for both the PSMC-calibrated and the secondary-dating-calibrated SNAPP phylogenies, as explained in the associated article.
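The BEAST .log trace files mentioned above are tab-separated tables with comment lines starting with "#". A minimal sketch for summarising one parameter after discarding burn-in might look like the following; the function names and the 10% default burn-in are assumptions for illustration, not settings taken from the article. Combined files produced with LogCombiner usually have burn-in removed already, in which case `burnin_fraction=0` would be appropriate.

```python
import csv
import io
import statistics


def read_beast_log(text):
    """Parse a BEAST2 trace log: tab-separated, lines starting with '#' are comments."""
    lines = [line for line in text.splitlines() if line.strip() and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    # Skip empty/None keys, which appear when lines end with a trailing tab.
    return [{k: float(v) for k, v in row.items() if k} for row in reader]


def posterior_mean(rows, column, burnin_fraction=0.1):
    """Mean of `column` after dropping the first `burnin_fraction` of samples."""
    start = int(len(rows) * burnin_fraction)
    return statistics.fmean(row[column] for row in rows[start:])
```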
BioGeoBEARS0.2.1_RASP4.3.zip - This folder includes the raw results of ancestral area estimations for the two phylogenies of the associated article, as generated through RASP 4.3, partitioned into one subfolder per phylogeny (the maximum likelihood phylogeny and the dated SNAPP phylogeny). Specifically, these results include the raw RASP data outputs (.txt files), the graphical output of the best model (the DEC+J model) as a *tree.png file, a legend for the graphical output (Legend.png), a .csv state file (which can be opened in Excel) indicating the regional codes applied to tips, and an Excel .xls file summarising the likelihood and results of each of the six models tested for fit.
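Model fit across the six models summarised above is usually compared with AICc and Akaike weights. The sketch below shows that arithmetic from each model's log-likelihood and number of free parameters; it assumes the sample size n is the number of tips (a common convention for BioGeoBEARS) and the helper names are hypothetical. It does not reproduce the values in the provided .xls file.

```python
import math


def aicc(lnl, k, n):
    """Small-sample corrected AIC for log-likelihood `lnl`, `k` parameters, sample size `n`."""
    return -2.0 * lnl + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AIC or AICc score} into Akaike weights that sum to 1."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}
```

In the standard BioGeoBEARS set, DEC, DIVALIKE and BAYAREALIKE have two free parameters (d, e) and each "+J" variant adds one (j), so k is 2 or 3.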
db-RDA_IBD.zip - This folder contains three subfolders: "Genetic_distance", "Geographic_distance" and "Labels". The "Genetic_distance" subfolder includes the inputs (.vcf, .log, .geno.gz, .labels, .nosex, .traw) and the raw (.dist) and mean (.csv) results of genetic distance. The "Geographic_distance" subfolder includes the inputs for geographic distance, including specimen GPS coordinates, in Excel-readable .csv format. Finally, the "Labels" subfolder contains a text file (.labels) of the specimen labels used in the analyses.
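Geographic distances for isolation-by-distance analyses are often computed as great-circle distances from the GPS coordinates. A minimal haversine sketch is shown below; the column order of the provided .csv is not stated here, so the (latitude, longitude) ordering in decimal degrees, the Earth radius and the function names are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed; other values are in use)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_matrix(coords):
    """Pairwise distance matrix for a list of (latitude, longitude) tuples."""
    return [[haversine_km(*p, *q) for q in coords] for p in coords]
```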
DiversityIndices_SequenceDistance.zip - This folder is divided into two subfolders. The first, "Results_dxy", contains text result files (.txt) of dxy for the population comparisons described in the associated article. The second, "Results_pi_ThetaW_nsites_TajimaD", contains text result files (.txt) of pi, ThetaW, nsites and TajimaD for each population, as covered in the associated article.
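To show how the statistics listed above relate, here is a sketch of Watterson's theta and Tajima's D from their textbook definitions (Tajima 1989), using the number of sequences n, segregating sites S and mean pairwise differences pi summed over sites. The software used for the article may estimate these from genotype likelihoods or within windows, so values from this closed-form version are for orientation only; the function names are hypothetical.

```python
import math


def watterson_theta(S, n):
    """Watterson's estimator: S divided by the harmonic number a1 = sum_{i=1}^{n-1} 1/i."""
    a1 = sum(1.0 / i for i in range(1, n))
    return S / a1


def tajimas_d(pi, S, n):
    """Tajima's D from pi (summed over sites), S segregating sites and n sequences."""
    if S == 0:
        return float("nan")  # undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

For diploid samples, n counts chromosomes (twice the number of individuals); dividing pi or ThetaW by nsites gives per-site values, while D is computed from the unscaled totals.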
EEMS.zip - This folder includes three subfolders. Subfolder "Inputs" contains the three input files for the Estimated Effective Migration Surfaces (EEMS) analyses: a genetic differences file (.diffs), a file of sample coordinates (.coord) and a file of GPS coordinates outlining the range to be assessed (.outer). A further subfolder within "Inputs" holds the Plink 1.9 .bed, .bim and .fam files used to create the .diffs file. The two remaining subfolders, "_chain_1" and "_chain_2", contain the raw outputs (.pdf and .txt) of the two independent EEMS chains.
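Before rerunning EEMS on the inputs above, a quick consistency check can catch common formatting problems. The sketch below assumes the requirements as we understand them from the EEMS documentation: .diffs is a square, symmetric, non-negative matrix with a zero diagonal; .coord has one row per sample; .outer is a closed polygon listed counterclockwise. The function names are hypothetical.

```python
def _rows(text):
    """Parse whitespace-separated numeric rows, skipping blank lines."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def check_eems_inputs(diffs_text, coord_text, outer_text, tol=1e-9):
    """Return a list of problems found in .diffs/.coord/.outer contents (empty if none)."""
    problems = []
    d, coords, outer = _rows(diffs_text), _rows(coord_text), _rows(outer_text)
    n = len(d)
    if any(len(row) != n for row in d):
        problems.append(".diffs is not a square matrix")
    else:
        if any(abs(d[i][i]) > tol for i in range(n)):
            problems.append(".diffs has a non-zero diagonal")
        if any(abs(d[i][j] - d[j][i]) > tol for i in range(n) for j in range(i)):
            problems.append(".diffs is not symmetric")
        if any(x < 0 for row in d for x in row):
            problems.append(".diffs has negative entries")
    if len(coords) != n:
        problems.append(".coord row count does not match .diffs")
    if len(outer) < 4 or outer[0] != outer[-1]:
        problems.append(".outer is not a closed polygon")
    else:
        # Shoelace formula: twice the signed area is positive for counterclockwise order.
        area2 = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(outer, outer[1:]))
        if area2 <= 0:
            problems.append(".outer vertices are not counterclockwise")
    return problems
```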
ESU-delimitation.zip - This folder includes the raw input tree in Newick format (.newik) as well as a subfolder for each tree-based species delimitation approach applied: "bPTP_NoOutgroup", "bPTP_Outgroup", "GMYC_multi" and "GMYC_single". These subfolders contain the raw results of the bPTP and GMYC delimitation analyses, as downloaded from, and explained on, the online server (https://species.h-its.org/).
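To check which specimens enter the delimitation analyses, the tip labels can be listed directly from the Newick input tree. The minimal parser below treats any label directly after "(" or "," as a tip, skips branch lengths and [comments], and ignores internal-node labels such as support values. It assumes unquoted labels and is a sketch rather than a full Newick implementation; a dedicated tree library would be more robust.

```python
def newick_tip_labels(newick):
    """Return tip labels, in order, from an unquoted Newick string."""
    s = newick.strip()
    tips, prev, i = [], "", 0
    while i < len(s):
        c = s[i]
        if c in "(),;":
            prev = c
            i += 1
        elif c == ":":  # skip the branch length that follows
            i += 1
            while i < len(s) and s[i] not in "(),;":
                i += 1
        elif c == "[":  # skip a bracketed comment
            i = s.index("]", i) + 1
        else:
            j = i
            while j < len(s) and s[j] not in "(),;:[":
                j += 1
            label = s[i:j].strip()
            if label:
                # Labels after '(' or ',' are tips; labels after ')' are internal nodes.
                if prev in ("(", ",", ""):
                    tips.append(label)
                prev = "label"
            i = j
    return tips
```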
GeneralInputs.zip - This folder includes the VQSR-recalibrated and sorted .vcf.gz files of SNPs and indels mapped to chromosomes (see WGR-processing, Step 13) and the initial Plink inputs for autosomes only (see WGR-processing, Step 17).
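A quick way to confirm the SNP and indel content of the .vcf.gz files described above is to classify each record by its REF and ALT allele lengths. The sketch below does this with the standard library (bgzip-compressed VCFs can be read with Python's gzip module); the classification rule and function names are simplifying assumptions, and the authors' processing steps are in the linked WGR-processing scripts.

```python
import gzip


def classify_variant(ref, alt_field):
    """Label a record 'SNP' (all alleles 1 bp), 'indel' (lengths differ) or 'other'."""
    lengths = [len(ref)] + [len(a) for a in alt_field.split(",")]
    if all(length == 1 for length in lengths):
        return "SNP"
    if len(set(lengths)) > 1:
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variant_types(lines):
    """Count variant classes over VCF lines, skipping header lines."""
    counts = {"SNP": 0, "indel": 0, "other": 0}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t", 5)
        counts[classify_variant(fields[3], fields[4])] += 1
    return counts


def count_in_vcf_gz(path):
    """Open a (b)gzipped VCF as text and count its variant classes."""
    with gzip.open(path, "rt") as handle:
        return count_variant_types(handle)
```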
IQ-TREE2.2.0.3.zip - This folder contains two subfolders. The subfolder "VCF_to_Fasta" includes the starting VCF file (.vcf) and its conversion to FASTA format (.fasta). The subfolder "Results" contains three result files from IQ-TREE 2.2.0.3, namely the consensus tree (.contree), the best-fit model results (.model.gz) and the ultrafast bootstrap tree replicates (.ufboot) for the maximum likelihood phylogeny presented in the associated article.
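The VCF-to-FASTA conversion mentioned above typically encodes each diploid SNP genotype as a single character, using IUPAC ambiguity codes for heterozygotes. The tool used for the article is not named here, so the sketch below only illustrates that general approach; treating missing calls and non-SNP records as "N" is an assumption, and the function name is hypothetical.

```python
# Two-base IUPAC ambiguity codes for heterozygous SNP genotypes.
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def genotype_to_iupac(gt, ref, alts):
    """Map a diploid VCF GT string (e.g. '0/1' or '1|1') at a SNP to one character."""
    alleles = [ref.upper()] + [a.upper() for a in alts]
    if any(len(a) != 1 for a in alleles):
        return "N"  # not a simple SNP; left as missing in this sketch
    calls = gt.replace("|", "/").split("/")
    if len(calls) != 2 or "." in calls:
        return "N"  # missing or non-diploid call
    bases = {alleles[int(c)] for c in calls}
    if len(bases) == 1:
        return bases.pop()  # homozygous: the base itself
    return IUPAC.get(frozenset(bases), "N")
```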
Plink2-PCAs.zip - This folder contains two subfolders, "Interspecific" and "Intraspecific", which relate to the two PCAs generated for the associated article. Each contains two further subfolders, "Input_files" and "Results". The "Input_files" subfolders contain the Plink 1.9-generated input files for the PCAs (.bed, .bim, .fam, .log, .nosex), while the "Results" subfolders contain the raw PCA outputs from Plink2 (.eigenval, .eigenvec, .log).
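The Plink2 PCA outputs above can be read with a few lines of Python. In the sketch below, .eigenval holds one eigenvalue per line and .eigenvec has a header beginning with "#" (either "#FID IID PC1 ..." or "#IID PC1 ..."). Note that the proportions computed here are relative to the principal components Plink2 wrote out, not to total genetic variance; the function names are hypothetical.

```python
def read_eigenval(text):
    """Eigenvalues from a Plink2 .eigenval file (one per line)."""
    return [float(x) for x in text.split()]


def variance_share(eigenvals):
    """Share of each PC relative to the sum of the computed eigenvalues."""
    total = sum(eigenvals)
    return [v / total for v in eigenvals]


def read_eigenvec(text):
    """Map IID -> list of PC scores from a Plink2 .eigenvec file with a '#' header."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    header[0] = header[0].lstrip("#")
    id_col = header.index("IID")
    pc_start = next(i for i, name in enumerate(header) if name.startswith("PC"))
    return {row[id_col]: [float(x) for x in row[pc_start:]] for row in rows}
```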
PseudochromosomeLayout_ReferenceGenome.zip - This folder includes the pseudochromosome layout of the Palm Cockatoo reference genome contigs, as used for chromosome mapping/partitioning (see GenBank assembly GCA_013397665.1 for the raw genome assembly; Feng et al. 2020). Also see the supplementary information of the associated article for more information on its construction and associated files.
PSMCs.zip - This folder contains two subfolders, "Specimen_fastq_files" and "PSMC_results". The former contains zipped fastq sequence files (.fastq.gz) for each specimen used in the PSMC analyses. The latter includes the raw PSMC bootstrap results (.psmc, .txt) and the combined, summarised and viewable results for each target specimen (.eps, .gp, .par).
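Raw .psmc output is in coalescent units and must be scaled with a mutation rate (mu, per site per generation) and a generation time (g, years) before plotting. The sketch below follows the scaling used by the psmc_plot.pl utility, as we understand it: N0 = theta0 / (4 mu s), with s the bin size (100 bp by default), time in years = 2 N0 t_k g and Ne = lambda_k N0, reading theta0 from the "TR" line and (t_k, lambda_k) from the "RS" lines of the last iteration. The mu and g values in the example are placeholders, not those used in the article, and the function names are hypothetical.

```python
def parse_last_psmc_iteration(text):
    """Return theta0 and (t_k, lambda_k) pairs from the last iteration of a .psmc file."""
    theta0, rows = None, []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "RD":  # start of a new iteration: discard earlier rows
            rows = []
        elif fields[0] == "TR":
            theta0 = float(fields[1])
        elif fields[0] == "RS":
            rows.append((float(fields[2]), float(fields[3])))
    return theta0, rows


def psmc_scale(theta0, rs_rows, mu, gen_time, bin_size=100):
    """Convert (t_k, lambda_k) pairs to (years ago, effective population size)."""
    n0 = theta0 / (4.0 * mu * bin_size)
    return [(2.0 * n0 * t_k * gen_time, lam * n0) for t_k, lam in rs_rows]
```

For example, `psmc_scale(*parse_last_psmc_iteration(text), mu=2.5e-8, gen_time=10.0)` uses placeholder rates only; substitute the values stated in the associated article.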
SMCpp.zip - This folder includes the input .vcf.gz files as well as subfolders for the SMC++ runs on each group separately and on the groups combined. These subfolders contain the .smc.gz files and .json model files from the SMC++ analyses, and also include preliminary SMC++ plots in .csv and .pdf formats. Also see the supplementary information of the associated article and the analytical scripts for more information on construction and descriptions of included files.
Methods
The collection and processing of data follows that described within the associated research paper and its appendices.