Data from: Introgressed variants obscure phylogenetic relationships but are not subject to positive selection in Australasian long-tailed parrots
Abstract
Gene flow often obscures phylogenetic relationships, but the evolutionary significance of introgressed variants is unclear. Here, we examine the Australasian long-tailed parrots (Psittaculinae: Polytelini), in which an unexpected sister relationship between Polytelis alexandrae and the genus Aprosmictus, and not the other Polytelis species, has been observed. We tested whether this relationship was due to ancient introgression in whole genomes and found that the majority of gene trees had Ap. erythropterus and P. alexandrae as sister taxa, whereas network analysis indicated monophyly of Polytelis, and 48% of gene trees were in phylogenetic conflict due to introgression from Ap. erythropterus into P. alexandrae. Some 4–8% of the genome of P. alexandrae was introgressed from Ap. erythropterus, with signals of gene flow occurring throughout the genome. These findings indicate that topologies with P. alexandrae and the genus Ap. erythropterus as sister taxa were biased by gene flow and affirm that Polytelis is monophyletic. Next, we assessed the evolutionary outcomes for introgressed variants and found that, among introgressed protein-coding genes, only two (0.8%) were under positive selection, in comparison to 99 (1.7%) of non-introgressed genes. Our results indicate that, despite the ubiquity of genetic introgression across a given phylogeny, many genetic variants flowing between species may play a small role in molecular adaptations, with selection most frequently acting on existing variation.
README.md file was updated on Oct 24, 2024 (edited May 1, 2025; Oct 29, 2025) by Brian T. Smith & Agusto Luzuriaga-Neira.
Contacts: bsmith1@amnh.org, aluzuriaganeira@amnh.org.
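Scripts, lists of commands, and complementary files used in data analyses related to "Introgressed variants obscure phylogenetic relationships but are not subject to positive selection in Australasian long-tailed parrots."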
Authors:
Brian Tilston Smith
Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
Agusto Luzuriaga-Neira
Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
David Alvarez-Ponce
Biology Department, University of Nevada, Reno, Reno, NV 89557, USA
Kaiya L. Provost
Biology Department, Adelphi University, Garden City, NY 11530, USA
Gregory Thom
Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
Leo Joseph
Australian National Wildlife Collection, National Research Collections Australia, CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
DATA & FILE OVERVIEW
Description of the dataset
These data were collected to infer phylogenetic relationships among a group of Australasian long-tailed parrots.
This submission includes the files, scripts, and commands needed to reproduce the results presented in the study.
Additional files needed to reproduce the results can be found in NCBI BioProject PRJNA1157411 or obtained from the authors upon request.
The dataset is organized in the following hierarchical structure:
├── Data.tar.gz: Contains input files and some of the output files. Large input and output files were not uploaded; they are available upon request.
│ ├── abba-baba-test: Input files for the ABBA-BABA test using Dsuite, including a BED file for the gapped analysis.
│ │ ├── concatenatedtree.nwk
│ │ └── sets.txt
│ ├── aphid-analysis: Contains input and output files from the Aphid introgression analysis.
│ │ ├── aphid_out.csv
│ │ ├── aphid_out.log
│ │ ├── gaped-loci.bed
│ │ ├── ltailed_parrots.in
│ │ ├── ltailed_parrots.opt
│ │ ├── ltailed_parrots.tax
│ │ └── ltailed_parrots.trees
│ ├── astral-tree-est: Contains the list of trees used as ASTRAL input and the output files.
│ │ ├── 295503loci.trees
│ │ └── polytelis_295503_GT.astral.tre
│ ├── bpp-introg-analysis: Contains the imap and control files used as input for the BPP introgression analyses; input alignments are available upon request.
│ │ ├── A00-ale-Apr_bpp.ctl
│ │ ├── imap-ale-apr.txt
│ │ ├── imap.txt
│ │ └── polytelis-bpp-msci-100.ctl
│ ├── busco-whole-genome: Contains the image summarizing the BUSCO scores for all read-mapping-based genomes used in this study.
│ │ ├── busco_figure.png
│ │ └── fastas.list
│ ├── cds-prediction: Contains FASTA files of the read-mapping-based genomes and the list of NCBI accessions of the genomes used as references.
│ │ ├── ncbi_acc_ref_geno.list
│ │ └── reaad-mapped-based-genomes
│ │   ├── ale_PRS4773.fas
│ │   ├── ali_ANWCB3182.fas
│ │   ├── ant_ANWCB31719.fas
│ │   ├── apr_ANWCB43803.fas
│ │   └── swa_ANWCB53967.fas
│ ├── genome-processing: Contains the VCF file resulting from GATK haplotype calling and the list of NCBI SRA accessions for the sequences generated and used in this study. Also includes the BED file used to split the whole-genome alignment into 50-SNP windows.
│ │ ├── 1_50SNPs_windows.bed
│ │ ├── SNP-only_filtered_no_MD.recode_ren.vcf.gz
│ │ └── wgs_ncbi_acces.txt
│ ├── genome-stats: Contains the *.geno file and the population set file used to calculate genome statistics, including BED files for the gapped and non-gapped analyses.
│ │ ├── 295kregions.sort.bed
│ │ ├── gaped_loci.bed
│ │ ├── input.geno.gz
│ │ └── pops.txt
│ ├── paml-sel-test: Contains the CDSs, alignments, and control files used for the positive selection tests in PAML.
│ │ ├── 06-outFilteredPhy.tar.gz
│ │ ├── ale_PRS4773_filt.codingseq
│ │ ├── ali_ANWCB3182_filt.codingseq
│ │ ├── ant_ANWCB31719_filt.codingseq
│ │ ├── apr_ANWCB43803_filt.codingseq
│ │ ├── M1a.ctl
│ │ ├── M2a.ctl
│ │ ├── M7.ctl
│ │ ├── M8.ctl
│ │ ├── MA.ctl
│ │ ├── mel_GCF_012275295.1_filt.codingseq
│ │ ├── nullMA.ctl
│ │ ├── swa_ANWCB53967_filt.codingseq
│ │ ├── tree1.trees
│ │ └── tree2.trees
│ ├── SNaQ-analysis: Contains the list of trees used for SNaQ analysis.
│ │ ├── allpolytrees.unroot.txt
│ │ ├── gaped_loci.bed
│ │ └── species_tree.nwk
│ ├── snpeff-variant-an: Contains annotated VCF files for GF and non-GF SNPs, as well as their summaries in HTML format. The annotation was performed using SnpEff.
│ │ ├── ale-apr-gf-annon.vcf.gz
│ │ ├── gf_ale-apr_summary.html
│ │ ├── nointro_only_ale-apr.annon.vcf.gz
│ │ └── non-intro-ale-apr_snpEff.html
│ └── uces-processing: Contains the list of SRA accessions for the UCE data, plus the input and output files used in this study.
│   ├── mafft-nexus-clean-75p.charsets
│   ├── mafft-nexus-clean-75p.phylip
│   ├── mafft-nexus-clean-75p.phylip.contree
│   └── uces_ncbi_sra_acc.tsv
├── README.md: Describes the files in the repository.
├── Scripts.tar.gz: Contains the scripts and command lines used in this study. Bash scripts need to be adapted to the local configuration before they can be run directly from the command line; contact the authors with any questions. Scripts are organized by analysis in their respective folders.
│ ├── abba-baba-test
│ │ └── run_dsuite.sh
│ ├── aphid-analysis
│ │ └── runAphid.sh
│ ├── astral-tree-est
│ │ └── astral_tree_est_cmd.sh
│ ├── bpp-introg-analysis
│ │ └── run_bpp_cmd.sh
│ ├── busco-whole-genome
│ │ └── run_busco_loop.sh
│ ├── cds-prediction
│ │ └── run_gmoma.sh
│ ├── gene-tree-estimation
│ │ ├── 1_50SNPs_windows.bed
│ │ ├── 1_producing_50SNP_window _alignments.sh
│ │ └── loop2.sh
│ ├── genome-processing
│ │ ├── 1_producing_50SNP_window _alignments.sh
│ │ ├── A1.reheader_bams.sh
│ │ ├── fix_vcf_headers.py
│ │ ├── loop2.sh
│ │ ├── split_genetrees_directories.sh
│ │ ├── STEP01-BWA_alignment_filtZF_Parrots_combined.job
│ │ ├── STEP02.1-GATK_SNP_calling_parrots.job
│ │ ├── STEP02-GATK_SNP_calling_parrots.job
│ │ ├── STEP03-1-GATK_SNP_GVCF_parrots.job
│ │ ├── STEP03.1-Validade_Variants_parrots.job
│ │ ├── STEP04.1-GATK_Variant_Quality_parrots.job
│ │ ├── STEP04-GATK_SNP_GVCF_combine_parrots.job
│ │ ├── STEP05.0-GATK_SNP_filter_parrots.job
│ │ ├── STEP06-GATK_SNP_BQSR_parrots.job
│ │ └── STEP8.0-GATK_SNP_select_variants.job
│ ├── genome-stats
│ │ └── run_popgen_stats.sh
│ ├── paml-sel-test
│ │ └── run_paml_sel_test.sh
│ ├── SNaQ-analysis
│ │ ├── launch_snaq.job
│ │ ├── network_est_cmd.jl
│ │ └── snaq_boot.jl
│ ├── snpeff-variant-an
│ │ └── run_Snpeff.sh
│ └── uces-processing
│   ├── uces_proc_cmd.sh
│   └── uces_tree_cmd.sh
└── SupplementaryTablesAndFigures.tar.gz: Contains the supplementary tables and figures mentioned in the main text of the article.
  ├── Supplementary-Figures.pdf
  └── Supplementary-Tables.xlsx
29 directories, 91 files
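The archives above can be inspected without unpacking them in full. The following is a minimal Python sketch, not part of the deposited scripts: the function names `list_folder` and `read_text_member` are hypothetical, and it assumes that the folder names inside each archive match the tree above (the exact top-level prefix inside the archive is not assumed).

```python
import tarfile
from pathlib import PurePosixPath


def list_folder(archive_path, folder):
    """Return sorted paths of regular files stored under `folder` in a .tar.gz archive.

    `folder` is matched against any directory component of the member path,
    so it works whether or not the archive has a top-level prefix.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and folder in PurePosixPath(member.name).parts[:-1]
        )


def read_text_member(archive_path, member_name):
    """Read one small text file from the archive without extracting the rest."""
    with tarfile.open(archive_path, "r:gz") as tar:
        handle = tar.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file")
        return handle.read().decode("utf-8")


# Example usage (assumes Data.tar.gz is in the working directory):
#   for name in list_folder("Data.tar.gz", "abba-baba-test"):
#       print(name)
```

Listing members first makes it possible to confirm the real paths inside the archive before reading a small text file such as `sets.txt` or `pops.txt`; large files such as the VCF are better extracted to disk.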
The data were generated using the methods described herein and in the manuscript by Smith et al., "Introgressed variants obscure phylogenetic relationships but are not subject to positive selection in Australasian long-tailed parrots."
