Data from: Resolving a phylogenetic hypothesis for parrots: implications from systematics to conservation

Provost, Kaiya L.1; Joseph, Leo2; Smith, Brian Tilston1

Published Sep 14, 2018 on Dryad. https://doi.org/10.5061/dryad.m6n2t

Abstract

Advances in sequencing technology and phylogenetics have revolutionised avian biology by providing an evolutionary framework for studying natural groupings. In the parrots (Psittaciformes), DNA-based studies have led to a reclassification of clades, yet substantial gaps remain in the data gleaned from genetic information. Here we provide an overview of published genetic data of parrots, characterise sampling depth across the phylogeny, and evaluate support for existing systematic treatments. We inferred a concatenated tree with 307 species from a 30-gene supermatrix. We recovered well-supported relationships among recently proposed clades. Taxonomic groups were more stable towards the base of the tree and increased sampling will be required to clarify relationships at the tips, particularly below the generic level. Only a third of species have been sampled intraspecifically in population genetic or phylogeographic surveys. Intraspecific sampling has not been geographically or phylogenetically even across Psittaciformes, especially poor in the cockatoos, Southeast Asia, and parts of Australo-Papua. Threatened species are poorly sampled in the Neotropics. We highlight where effort should be focused to improve sampling based on geography and conservation status. In sum, phylogenetic relationships among the major parrot clades are robust, but relationships within and between genera and species provide opportunities for future investigations.

Map Making Scripts

This zip file contains three R scripts used to make the ASCII-formatted raster files in our manuscript. These rasters describe IUCN status, within-species sampling, GenBank sampling, and species richness of parrot species per spatial grid cell.

Figure Making Scripts

This zip file contains five R scripts which are used to create the figures in the main text and supplementary materials. Script names indicate which figures are made by which scripts.

GenBank Pipeline Main Scripts

This zip file contains three bash/shell scripts which are used to download files from GenBank, filter out unwanted loci or individuals, and construct alignments for use in phylogenetic analyses. These three shell scripts call many subscripts which are located in the GenBank Pipeline Subscripts zip file.

Genbank Pipeline Main Scripts.zip

GenBank Pipeline Subscripts

This zip file contains multiple Python scripts called by the scripts in the GenBank Pipeline Main Scripts zip file. These perform the specific steps needed to download sequences from GenBank and convert them into an aligned supermatrix of genes for use in phylogenetics. For details on individual scripts, see README.txt.

Genbank Pipeline Subscripts.zip

Subset_XX_Genes_100bp.fasta Alignment Files

This zip file contains 15 fasta-formatted alignment files. These are supermatrices produced from GenBank sequence data. They are subsets of the main supermatrix (a.k.a. Subset_01), where each subset retains only those species that have at least XX genes, with XX ranging from 01 to 15.
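As a rough illustration of the subsetting rule described above, the Python sketch below keeps only species that have data for at least a minimum number of genes. The function names, the per-gene dictionary layout, and the toy sequences are all assumptions made for this example; it is not the pipeline code shipped in the archive.

```python
def count_genes(per_gene, species):
    """Count genes in which a species has at least one non-gap, non-missing base."""
    missing = set("-?Nn")
    return sum(
        1
        for seqs in per_gene.values()
        if any(ch not in missing for ch in seqs.get(species, ""))
    )


def filter_by_gene_count(per_gene, min_genes):
    """Return the sorted species names that have data for at least min_genes genes."""
    species = {sp for seqs in per_gene.values() for sp in seqs}
    return sorted(sp for sp in species if count_genes(per_gene, sp) >= min_genes)


# Toy data (hypothetical): gene name -> {species -> aligned sequence}
toy = {
    "cytb": {"Ara_macao": "ACGT", "Nestor_notabilis": "ACGA", "Cacatua_alba": "----"},
    "ND2": {"Ara_macao": "TTGA", "Nestor_notabilis": "NNNN"},
    "RAG1": {"Ara_macao": "GGCC"},
}

subset_01 = filter_by_gene_count(toy, 1)  # species with at least 1 gene
subset_02 = filter_by_gene_count(toy, 2)  # species with at least 2 genes
```

In this toy case a species whose row is entirely gaps or Ns for a gene is treated as lacking that gene, which is one plausible reading of the rule rather than a documented detail.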

RunPartitionFinder_Subset_XX_Genes_rclusterf.cfg Config Files

This zip file contains config files for the program PartitionFinder2. They are associated with the gene subsets found in "Subset_XX_Genes_100bp.fasta Alignment Files.zip".

best_scheme_Subset_XX_Genes_rcluserf.part Partition files

This zip file contains our results from PartitionFinder2. It gives the nucleotide partitions for use in later phylogenetic analyses such as RAxML. Each file is associated with one of the gene subsets from "Subset_XX_Genes_100bp.fasta Alignment Files.zip".

COMBINED_Parrots_XXXX.asc ASCII raster files

This zip file contains multiple raster files in ASCII format. They are worldwide summaries of parrot species diversity, IUCN status, within-species sampling, GenBank sampling, and combinations of the above. Four of these files were used to make Figure 4 in the main manuscript, while the remainder were not used.
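For readers who want to inspect these rasters without GIS software, here is a minimal Python sketch of a reader. It assumes the files follow the common ESRI ASCII grid layout (a short keyword header followed by rows of cell values), which the record does not state explicitly; the function names and the toy grid are hypothetical.

```python
def read_ascii_grid(text):
    """Parse ESRI-style ASCII grid text into (header dict, list of value rows)."""
    header = {}
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        # Header lines look like "keyword value"; data lines start with a number.
        if parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
        else:
            rows.append([float(v) for v in parts])
    return header, rows


def count_data_cells(header, rows):
    """Count cells whose value differs from the NODATA marker."""
    nodata = header.get("nodata_value")
    return sum(1 for row in rows for v in row if v != nodata)


# Toy 2 x 3 grid (hypothetical values, not taken from the dataset)
example = """ncols 3
nrows 2
xllcorner -180
yllcorner -90
cellsize 1
NODATA_value -9999
-9999 2 5
1 -9999 3
"""

header, rows = read_ascii_grid(example)
n_cells = count_data_cells(header, rows)
```

The same two functions could be pointed at the contents of any COMBINED_Parrots_XXXX.asc file, provided the layout assumption holds.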

Intraspecific_Genetic_Sampling_Citations

This CSV file is the dataset we used to determine whether each parrot species has had intraspecific (within-species) genetic sampling done. Where within-species sampling was found, we cite the reference. We also note cases in which we are aware of ongoing but unpublished work on this subject. In some cases, taxonomic uncertainty makes it unclear whether a species has been sampled; these are indicated by a "?".

ConcatenatedGbFiles_Parrots_March2017

This large file contains the concatenated, GenBank-formatted sequences downloaded from GenBank in March 2017 for use in this publication. It forms the basis for the aligned supermatrix used in our phylogeny.

getUniqueGbAccession

This Python script is used to extract the unique GenBank accession numbers from Fasta-formatted alignment files produced by our GenBank pipeline.
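The idea can be sketched in a few lines of Python. This is not the archived script: it assumes that each FASTA header line contains an accession somewhere in its text, and it uses a deliberately loose pattern (one or two capital letters followed by five to eight digits, with an optional version suffix). The function name and the toy headers are hypothetical.

```python
import re

# Loose accession pattern: 1-2 capital letters, 5-8 digits, optional ".version".
# Lookarounds are used instead of \b because "_" counts as a word character.
ACCESSION = re.compile(
    r"(?<![A-Za-z0-9])([A-Z]{1,2}\d{5,8})(?:\.\d+)?(?![A-Za-z0-9])"
)


def unique_accessions(fasta_text):
    """Return sorted unique accessions found on FASTA header lines."""
    found = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            found.update(ACCESSION.findall(line))
    return sorted(found)


# Toy alignment (hypothetical headers and sequences)
example = """>Ara_macao_KJ123456.1
ACGTACGT
>Nestor_notabilis|AB012345
ACGAACGT
>Ara_macao_KJ123456 cytb
ACGTACGA
"""

accessions = unique_accessions(example)
```

Version suffixes are dropped, so KJ123456.1 and KJ123456 collapse to one entry.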

extractReferencesFromGenbank

This Python script is used to extract all of the reference information from a large GenBank-formatted (.gb) file and place it into a CSV for ease of access later on. This was used to create Supplementary Table 1.
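A minimal sketch of this kind of extraction, using only the Python standard library, is given below. It is an illustration rather than the archived script: it assumes the standard GenBank flat-file layout (top-level keywords at column 0, reference sub-fields indented, continuation lines indented by 12 spaces), and the function names, chosen columns, and toy record are hypothetical.

```python
import csv
import io

FIELDS = ("AUTHORS", "TITLE", "JOURNAL", "PUBMED")


def parse_references(gb_text):
    """Collect one dict per REFERENCE block, tagged with its record's accession."""
    refs, accession, current, field = [], "", None, None
    for line in gb_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent >= 12:
            # Continuation of the sub-field currently being read, if any.
            if current is not None and field:
                current[field] += " " + line.strip()
            continue
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if indent == 0:
            # A top-level keyword either starts a reference or ends the current one.
            field = None
            if keyword == "REFERENCE":
                current = dict.fromkeys(FIELDS, "")
                current["ACCESSION"] = accession
                refs.append(current)
            else:
                current = None
                if keyword == "ACCESSION":
                    accession = value.split()[0] if value else ""
        elif current is not None:
            # Indented sub-field inside a reference; ignore ones we do not keep.
            field = keyword if keyword in FIELDS else None
            if field:
                current[field] = value
    return refs


def references_to_csv(refs):
    """Write the reference dicts to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=("ACCESSION",) + FIELDS, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(refs)
    return out.getvalue()


# Toy record (hypothetical, heavily abbreviated)
example = "\n".join([
    "LOCUS       AB000001  12 bp  DNA  linear  VRT 01-JAN-2010",
    "ACCESSION   AB000001",
    "REFERENCE   1  (bases 1 to 12)",
    "  AUTHORS   Doe,J. and Roe,R.",
    "  TITLE     A hypothetical parrot",
    " " * 12 + "sequence",
    "  JOURNAL   Unpublished",
    "ORIGIN",
    "        1 acgtacgtac gt",
    "//",
])

refs = parse_references(example)
csv_text = references_to_csv(refs)
```

For real files, an established parser such as Biopython's GenBank reader would likely be more robust than this hand-rolled approach.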

RAxML_AllSubsets_BestTrees_WithBootstraps_100bp_Partitioned

This newick-formatted file contains 15 maximum-likelihood phylogenies produced in RAxML, one for each of the 15 gene subsets. These contain bootstrap values as well. This file incorporates nucleotide partitioning into the RAxML runs, and forms the basis for all of the trees in the manuscript.
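Because the file holds several trees, a first step in working with it is to split it into individual Newick strings. The Python sketch below shows one way to do that and to list tip labels. It assumes each tree ends with a semicolon and that labels are unquoted and contain no parentheses, commas, colons, or semicolons; the function names and toy trees are hypothetical.

```python
import re


def split_newick(text):
    """Split multi-tree Newick text into one string per tree."""
    return [t.strip() + ";" for t in text.split(";") if t.strip()]


def tip_labels(tree):
    """Return tip labels: names that directly follow '(' or ','."""
    # Internal labels such as bootstrap values follow ')' and are skipped.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", tree)


# Two toy trees with bootstrap values as internal node labels (hypothetical)
example = """((Ara_macao:0.1,Ara_militaris:0.2)95:0.05,Nestor_notabilis:0.3);
(Cacatua_alba:0.2,(Ara_macao:0.1,Nestor_notabilis:0.4)80:0.1);
"""

trees = split_newick(example)
tips_per_tree = [tip_labels(t) for t in trees]
```

Applied to the archived file, `len(trees)` would be expected to equal 15 if the layout assumption holds.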

RAxML_AllSubsets_BestTrees_WithBootstraps_100bp_NotParitioned

This newick-formatted file contains 15 maximum-likelihood phylogenies produced in RAxML, one for each of the 15 gene subsets. These contain bootstrap values as well. The RAxML runs behind this file did not use any nucleotide partitioning. None of these trees were used in the main manuscript, but they are provided for comparison with the partitioned trees.