Data from: Limited introgression between rock-wallabies with extensive chromosomal rearrangements

Data files

Version of Jan 14, 2026 (999.51 MB in total):

Dryad_archive_Petrogale_penicillata_complex_Potter_et_al_2022_2.zip (999.48 MB)
ReadMe_Petrogale_pencillata_complex.rtf (15.80 KB)
README.md (14.20 KB)

Abstract

Chromosome rearrangements can result in the rapid evolution of hybrid incompatibilities. Robertsonian fusions, particularly those with monobrachial homology, can drive reproductive isolation amongst recently diverged taxa. The recent radiation of rock-wallabies (genus Petrogale) is an important model to explore the role of Robertsonian fusions in speciation. Here we pursue that goal using an extensive sampling of populations and genomes of Petrogale from north-eastern Australia. In contrast to previous assessments using mitochondrial DNA or nuclear microsatellite loci, genomic data are able to separate the most closely related species and to resolve their divergence histories. Both phylogenetic and population genetic analyses indicate introgression between two species that differ by a single Robertsonian fusion. Based on the available data, there is also evidence for introgression between two species which share complex chromosomal rearrangements. However, the remaining results show no consistent signature of introgression amongst species pairs and where evident, indicate generally low introgression overall. X-linked loci have elevated divergence compared to autosomal loci indicating a potential role for genic evolution to produce reproductive isolation in concert with chromosome change. Our results highlight the value of genome scale data in identifying a strong role for Robertsonian fusions and structural variation in speciation.

https://doi.org/10.5061/dryad.6m905qg0d

28 June 2021

GENERAL INFORMATION

Title: Genomic data used to examine introgression between rock-wallabies with extensive chromosomal rearrangements

Principal Investigator: Dr Sally Potter
Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
sally.potter@mq.edu.au

Date of data collection: 2017-2021

Geographic location of data collection: northeast Australia (Queensland)

Keywords: chromosome rearrangement, introgression, marsupial, speciation, Robertsonian fusion

Citation: Potter S, Bragg JG, Turakulov R, Eldridge MDB, Deakin J, Kirkpatrick M, Edwards RJ, Moritz C (2022) Limited introgression between rock-wallabies with extensive chromosomal rearrangements. Molecular Biology and Evolution 39: msab333 https://doi.org/10.1093/molbev/msab333

DATA & FILE OVERVIEW

These data were generated to investigate the role of chromosome rearrangements in the evolutionary history of the penicillata group of rock-wallabies from northeast Queensland. Exon capture sequence data and Diversity Arrays Technology (DArT) genome reduction single nucleotide polymorphism (SNP) data were analysed with both population genomic and phylogenomic approaches to evaluate introgression between species, including comparisons among X chromosome loci and loci on rearranged and non-rearranged chromosomes. The archive contains three directories: 1) data used in the various analyses, 2) the design of the exon capture experiment, and 3) workflows used to assemble and clean the data.

1. data -------------------------

File 1 Name: lib_sample_Petrogale_penicillata_complex
File 1 Description: A Comma-Separated Values (CSV) file of sample information, including sample ID, Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/) information linking to the raw sequencing files, sex and geographical location. All samples are in SRA BioProjects PRJNA360868 and PRJNA741610.

Directory 1.1 Name: Datasets/
Directory 1.1 Description: data files used in analyses
File 1 Name: rw.2_3.recode.vcf
File 1 Description: A variant call format (VCF) file, the standard bioinformatics format for storing genetic variants, containing the final filtered set of DArT SNPs (22,724 SNPs) used for the population genetic analyses in this study.
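As a quick integrity check, the records in a VCF can be counted by skipping the "##" meta-information lines and the "#CHROM" header. The sketch below is a minimal, hypothetical helper (not part of the archive) that counts biallelic SNP records, assuming single-base REF and ALT alleles.

```python
import gzip
from typing import Iterable, Iterator, List


def iter_vcf_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each VCF data line, skipping header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def count_biallelic_snps(lines: Iterable[str]) -> int:
    """Count records whose REF and ALT are both a single base (one ALT allele only)."""
    n = 0
    for fields in iter_vcf_records(lines):
        ref, alt = fields[3], fields[4]
        # Multi-allelic ALT values such as "G,T" have length > 1 and are excluded.
        if len(ref) == 1 and len(alt) == 1 and alt != ".":
            n += 1
    return n


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For example, `with open_vcf("rw.2_3.recode.vcf") as fh: print(count_biallelic_snps(fh))`. If every record in the file is a biallelic SNP, the count would be expected to match the 22,724 SNPs stated above.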

File 2 Name: SAMIG_vcf_pop.csv
File 2 Description: A Comma-Separated Values (CSV) file of sample names and population assignments, used in R for principal coordinate analysis (PCoA) of the rw.2_3.recode.vcf data.

File 3 Name: SAMIG_final.gl6.csv
File 3 Description: A CSV file of SNP data filtered from rw.2_3.recode.vcf using dartR. This file was used to create the input files for the Structure (.struc) and TreeMix analyses.

File 4 Name: rw_gt_SAMIG.struc
File 4 Description: A text file of genetic variants used as input for Structure population genetic clustering analysis for all five species (Petrogale assimilis, Petrogale godmani, Petrogale mareeba, Petrogale inornata, Petrogale sharmani).

File 5 Name: rw_gt_SAM.struc
File 5 Description: A text file of genetic variants used as input for Structure population genetic clustering analysis for three SAM species (Petrogale assimilis, Petrogale mareeba, Petrogale sharmani).

File 6 Name: SAM_polymorphicsites_dapc.csv 
File 6 Description: A CSV file of individuals and their SNP genotypes, coded as 0 = homozygous reference, 1 = heterozygote and 2 = homozygous alternate allele, used for discriminant function analysis in R.
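The 0/1/2 coding is the number of alternate alleles an individual carries. The following sketch shows how such a value can be derived from a VCF genotype (GT) field; it illustrates the coding only and is not necessarily how this CSV was produced. It assumes biallelic, diploid sites and that GT is the first FORMAT key, as the VCF specification requires when GT is present.

```python
from typing import List, Optional


def gt_to_dosage(gt_field: str) -> Optional[int]:
    """Convert a VCF sample field such as '0/1' or '1|1:35' to an alternate-allele count.

    Returns 0 (homozygous reference), 1 (heterozygote), 2 (homozygous alternate),
    or None for a missing call such as './.'.
    """
    gt = gt_field.split(":")[0].replace("|", "/")  # keep GT only; treat phased like unphased
    alleles = gt.split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def row_to_dosages(sample_fields: List[str]) -> List[Optional[int]]:
    """Apply gt_to_dosage to every sample column of one VCF record."""
    return [gt_to_dosage(f) for f in sample_fields]
```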

File 7 Name: TreeMix_SAMIG.txt
File 7 Description: A text file of allele frequency data for each of the five species used as input for TreeMix analysis.
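TreeMix allele-count input consists of a header line of space-separated population names followed by one line per SNP, holding one comma-separated pair of allele counts per population (TreeMix itself normally reads this as a gzip-compressed file). A minimal sketch for turning such lines into per-population frequencies of the first listed allele, assuming TreeMix_SAMIG.txt follows this layout:

```python
from typing import Dict, List


def parse_treemix(lines: List[str]) -> Dict[str, List[float]]:
    """Parse TreeMix-style counts into the frequency of the first allele per population.

    Header: space-separated population names.
    Each further line: one 'a,b' count pair per population, in header order.
    Populations with zero total count at a SNP get NaN.
    """
    header = lines[0].split()
    freqs: Dict[str, List[float]] = {pop: [] for pop in header}
    for line in lines[1:]:
        if not line.strip():
            continue
        for pop, pair in zip(header, line.split()):
            a, b = (int(x) for x in pair.split(","))
            total = a + b
            freqs[pop].append(a / total if total else float("nan"))
    return freqs
```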

File 8 Name: SAMI_Gog_SETS.txt
File 8 Description: A text file assigning individuals to populations, used as input for the TreeMix analysis that estimated migration, with Petrogale godmani as the outgroup.
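Combining individual genotypes with a population assignment yields the per-population allele counts that TreeMix uses. The sketch below builds one TreeMix line for a single SNP; the `pop_of` dictionary is a hypothetical stand-in for the assignments in SAMI_Gog_SETS.txt, whose exact layout is not assumed here, and all sites are treated as diploid (which would not hold for hemizygous X loci in males).

```python
from typing import Dict, List, Optional


def to_treemix_line(
    dosages: Dict[str, Optional[int]],
    pop_of: Dict[str, str],
    pops: List[str],
) -> str:
    """Build one TreeMix line ('ref,alt' per population) for a single SNP.

    dosages: individual -> alternate-allele count (0, 1, 2) or None if missing.
    pop_of:  individual -> population label (hypothetical assignment mapping).
    pops:    population order, matching the TreeMix header line.
    """
    counts = {p: [0, 0] for p in pops}
    for ind, d in dosages.items():
        pop = pop_of.get(ind)
        if d is None or pop not in counts:
            continue  # skip missing calls and unassigned individuals
        counts[pop][0] += 2 - d  # reference allele copies
        counts[pop][1] += d      # alternate allele copies
    return " ".join(f"{counts[p][0]},{counts[p][1]}" for p in pops)
```

In TreeMix the outgroup population (here P. godmani) is specified with the `-root` option.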

File 9 Name: SAMIGC_h0_1215_concat.phy 
File 9 Description: A phylip sequence alignment of concatenated phased haplotypes for the penicillata species complex (n=1215 loci) used for phylogenetic analysis.

File 10 Name: SAMI_MIGRATE_1617_interleaved.phy 
File 10 Description: An interleaved Phylip alignment of concatenated unphased nuclear sequence data (n=1617 loci), used as input for MIGRATE to estimate theta and migration between Petrogale assimilis, Petrogale mareeba, Petrogale inornata and Petrogale sharmani.
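Both alignments above are in Phylip format: a first line giving the number of taxa and sites, then the sequences, either sequential or interleaved in blocks. A minimal reader for relaxed Phylip is sketched below, assuming taxon names contain no whitespace, only the first block carries names, and sequential files have each sequence on a single line.

```python
from typing import Dict, List


def read_phylip(text: str) -> Dict[str, str]:
    """Read a relaxed Phylip alignment (interleaved, or sequential with one line per taxon)."""
    lines = [ln.strip() for ln in text.splitlines()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    body = [ln for ln in lines[1:] if ln]  # drop blank lines between blocks
    names: List[str] = []
    chunks: List[List[str]] = []
    for i, ln in enumerate(body):
        if i < ntax:
            # First block: name followed by (possibly space-grouped) sequence.
            name, *seq = ln.split()
            names.append(name)
            chunks.append(["".join(seq)])
        else:
            # Continuation blocks: sequence only, taxa in the same order.
            chunks[i % ntax].append("".join(ln.split()))
    seqs = {n: "".join(c) for n, c in zip(names, chunks)}
    for n, s in seqs.items():
        if len(s) != nchar:
            raise ValueError(f"{n}: expected {nchar} sites, found {len(s)}")
    return seqs
```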

Directory 1.2 Name: R_code_files/
Directory 1.2 Description: R code used for principal coordinate and discriminant function analysis, as well as PopGenome analyses of Tajima’s D
File 1 Name: R_dapc_19PC_SAM.txt
File 1 Description: A text file of the R code used for discriminant function analysis for the SAM species complex (Petrogale sharmani, Petrogale assimilis, Petrogale mareeba), using packages dartR and poppr.

File 2 Name: R_pcoa_SAMIG_final.txt
File 2 Description: A text file of the R code used for principal coordinate analysis of the five penicillata complex species.

File 3 Name: PopGenome_code_SAMIG.txt
File 3 Description: A text file of the R code used to calculate dXY in PopGenome; the code was run on each chromosome to estimate average dXY.
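dXY is the average number of nucleotide differences per site between pairs of sequences drawn from two different populations. The plain-Python sketch below illustrates that definition; it is not PopGenome's implementation. It assumes aligned haplotype sequences and treats any character other than A, C, G or T (gaps, N, ambiguity codes) as missing for that pair.

```python
from itertools import product
from typing import List, Tuple

VALID = set("ACGT")


def pairwise_diff(a: str, b: str) -> Tuple[int, int]:
    """Return (differences, comparable sites) between two aligned sequences."""
    diffs = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in VALID or y not in VALID:
            continue  # skip sites missing in either sequence
        sites += 1
        diffs += x != y
    return diffs, sites


def dxy(pop_x: List[str], pop_y: List[str]) -> float:
    """Mean per-site divergence over all between-population sequence pairs."""
    per_pair = []
    for a, b in product(pop_x, pop_y):
        d, n = pairwise_diff(a, b)
        if n:
            per_pair.append(d / n)
    return sum(per_pair) / len(per_pair) if per_pair else float("nan")
```

Computing this per chromosome and then averaging corresponds to the per-chromosome use described above.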

Directory 1.3 Name: MIGRATE/
Directory 1.3 Description: A folder of PDF files of MIGRATE output estimating migration and theta, for all loci, X loci only, and subsets of rearranged (R) and non-rearranged (NR) loci
File 1 Name: SAMI_1617loci.pdf
File 1 Description: PDF of the MIGRATE results for the SAMI species complex using all unphased nuclear loci (n=1617)

File 2 Name: SAMI_NR.pdf
File 2 Description: PDF of the MIGRATE results for the SAMI species complex using a subset of 50 unphased nuclear loci mapped to non-rearranged chromosomes (1, 2, 7, 8)

File 3 Name: SAMI_R.pdf
File 3 Description: PDF of the MIGRATE results for the SAMI species complex using a subset of 50 unphased nuclear loci mapped to rearranged chromosomes (5, 6, 9, 10)

File 4 Name: SAMI_X.pdf
File 4 Description: PDF of the MIGRATE results for the SAMI species complex using all unphased nuclear loci mapped to the X chromosome (n=46 loci)

Directory 1.4 Name: Petrogale penicillata draft genome/
Directory 1.4 Description: folder containing the draft pseudohaplotype genome assembly of Petrogale penicillata, generated from a 10x Genomics library and used as the reference for mapping raw DArTseq reads and exons from the exon capture experiment

File 1 Name: wallaby10xv1.dipnr.pri.fasta
File 1 Description: FASTA file of the Petrogale penicillata genome, used as the reference for mapping exon loci

File 2 Name: wallaby10xv1.dipnr.pri.fasta.fai
File 2 Description: FASTA index (.fai) file with five columns: (1) contig name; (2) number of bases in the contig; (3) byte offset in the FASTA file at which the contig sequence begins; (4) bases per line in the FASTA file; (5) bytes per line in the FASTA file
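The last three .fai columns allow random access into the FASTA without reading it sequentially: base position p (0-based) of a contig sits at byte offset + (p // bases_per_line) * bytes_per_line + p % bases_per_line. A minimal sketch of region extraction using that rule (the helper names are illustrative, not part of the archive):

```python
from typing import Dict, Tuple

FaiEntry = Tuple[int, int, int, int]  # (length, offset, line_bases, line_bytes)


def read_fai(text: str) -> Dict[str, FaiEntry]:
    """Map contig name -> (length, offset, line_bases, line_bytes) from .fai content."""
    index: Dict[str, FaiEntry] = {}
    for line in text.splitlines():
        if line.strip():
            name, length, offset, lbases, lbytes = line.split("\t")[:5]
            index[name] = (int(length), int(offset), int(lbases), int(lbytes))
    return index


def fetch(fasta: bytes, entry: FaiEntry, start: int, end: int) -> str:
    """Return the 0-based half-open interval [start, end) of one contig."""
    length, offset, lbases, lbytes = entry
    end = min(end, length)  # clip to the contig end
    if start >= end:
        return ""

    def byte_at(pos: int) -> int:
        # Skip whole lines (including their newline bytes), then step within the line.
        return offset + (pos // lbases) * lbytes + pos % lbases

    raw = fasta[byte_at(start): byte_at(end - 1) + 1]
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

For a genome-sized file such as wallaby10xv1.dipnr.pri.fasta, the FASTA can be opened with Python's `mmap` module in read-only mode and passed to `fetch` in place of a `bytes` object, since slicing an mmap also returns bytes.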

Directory 1.5 Name: Chromosome_data/
Directory 1.5 Description: contains directories of phylip alignments of exon sequence data mapped to each chromosome (1-10, X), also grouped into autosomal loci, X loci, rearranged loci (R) and non-rearranged loci (NR). Note: Supplementary Table 2 of the manuscript lists the scaffolds to which loci were mapped to establish the chromosomal location of Petrogale penicillata loci, based on the tammar wallaby (Notamacropus eugenii).
Directory 1.5.1 Name: Chr1/
Directory 1.5.1 Description: directory of phylip sequence alignments with exon ID for chromosome 1

Directory 1.5.2 Name: Chr2/
Directory 1.5.2 Description: directory of phylip sequence alignments with exon ID for chromosome 2 

Directory 1.5.3 Name: Chr3/
Directory 1.5.3 Description: directory of phylip sequence alignments with exon ID for chromosome 3 

Directory 1.5.4 Name: Chr4/ 
Directory 1.5.4 Description: directory of phylip sequence alignments with exon ID for chromosome 4 

Directory 1.5.5 Name: Chr5/ 
Directory 1.5.5 Description: directory of phylip sequence alignments with exon ID for chromosome 5 

Directory 1.5.6 Name: Chr6/ 
Directory 1.5.6 Description: directory of phylip sequence alignments with exon ID for chromosome 6

Directory 1.5.7 Name: Chr7/ 
Directory 1.5.7 Description: directory of phylip sequence alignments with exon ID for chromosome 7

Directory 1.5.8 Name: Chr8/ 
Directory 1.5.8 Description: directory of phylip sequence alignments with exon ID for chromosome 8

Directory 1.5.9 Name: Chr9/ 
Directory 1.5.9 Description: directory of phylip sequence alignments with exon ID for chromosome 9

Directory 1.5.10 Name: Chr10/ 
Directory 1.5.10 Description: directory of phylip sequence alignments with exon ID for chromosome 10

Directory 1.5.11 Name: Autosomes/ 
Directory 1.5.11 Description: directory of concatenated phylip sequence alignments with exon ID for each autosomal chromosome of Petrogale penicillata

Directory 1.5.12 Name: X_46/ 
Directory 1.5.12 Description: directory of phylip sequence alignments with exon ID for the X chromosome

Directory 1.5.13 Name: R_50/ 
Directory 1.5.13 Description: directory of a subset of 50 phylip sequence alignments with exon ID for rearranged chromosomes (5, 6, 9, 10)

Directory 1.5.14 Name: R_all/ 
Directory 1.5.14 Description: directory of phylip sequence alignments with exon ID for all loci mapped to rearranged chromosomes (5, 6, 9, 10)

Directory 1.5.15 Name: NR_50/ 
Directory 1.5.15 Description: directory of a subset of 50 phylip sequence alignments with exon ID for non-rearranged chromosomes (1, 2, 7, 8)

Directory 1.5.16 Name: NR_all/ 
Directory 1.5.16 Description: directory of phylip sequence alignments with exon ID for all loci mapped to non-rearranged chromosomes (1, 2, 7, 8)
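The chromosome groupings used in Chromosome_data/ can be written as a small lookup: chromosomes 5, 6, 9 and 10 are rearranged (R), 1, 2, 7 and 8 are non-rearranged (NR), and X is treated separately, leaving autosomes 3 and 4 in neither subset. The sketch below applies that lookup to a hypothetical mapping of exon IDs to chromosome labels; the label style ("Chr5", "X") is an assumption.

```python
from typing import Dict, List, Optional

REARRANGED = {"5", "6", "9", "10"}     # R loci
NON_REARRANGED = {"1", "2", "7", "8"}  # NR loci


def chromosome_class(chrom: str) -> Optional[str]:
    """Classify a label such as 'Chr5' or 'X' as 'X', 'R' or 'NR'.

    Returns None for autosomes outside both subsets (3 and 4).
    """
    label = chrom.strip().upper()
    if label.startswith("CHR"):
        label = label[3:]
    if label == "X":
        return "X"
    if label in REARRANGED:
        return "R"
    if label in NON_REARRANGED:
        return "NR"
    return None


def group_loci(locus_to_chrom: Dict[str, str]) -> Dict[str, List[str]]:
    """Group locus IDs by chromosome class, dropping loci that fall in no class."""
    groups: Dict[str, List[str]] = {}
    for locus, chrom in locus_to_chrom.items():
        cls = chromosome_class(chrom)
        if cls is not None:
            groups.setdefault(cls, []).append(locus)
    return groups
```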

Directory 1.6 Name: Haplotype_h0_alignments/
Directory 1.6 Description: directory of nuclear h0 haplotype data as phylip sequence alignments with exon ID for all species in the penicillata complex; individuals are identified by library ID. This includes all exons analysed.

Directory 1.7 Name: Unphased_nuclear_alignments/
Directory 1.7 Description: directory of nuclear exon phylip sequence alignments with exon ID for all species in the penicillata complex; individuals are identified by library ID, and "ambig" denotes the ambiguity codes used in this unphased dataset. This includes all exons analysed and a concatenated alignment.
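In an unphased alignment, a heterozygous site carrying two different bases is written as a single IUPAC ambiguity code (for example A/G as R). The sketch below illustrates the coding by merging two hypothetical haplotypes of one individual; it is not the pipeline that produced these files, and treating a gap or missing base on either haplotype as N is an assumption.

```python
# Two-base IUPAC ambiguity codes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
# Reverse lookup: code -> the two bases it stands for, sorted alphabetically.
DECODE = {code: "".join(sorted(bases)) for bases, code in IUPAC.items()}


def merge_haplotypes(h1: str, h2: str) -> str:
    """Collapse two aligned haplotypes into one unphased sequence with ambiguity codes."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be the same length")
    out = []
    for a, b in zip(h1.upper(), h2.upper()):
        if a == b:
            out.append(a)  # homozygous (or identical gap/missing) site
        elif a in "ACGT" and b in "ACGT":
            out.append(IUPAC[frozenset((a, b))])  # heterozygous site
        else:
            out.append("N")  # gap/missing on one haplotype only (assumption)
    return "".join(out)
```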

2. design -----------------------

Information on the sequence capture kit

Transcriptome sequences of Petrogale xanthopus used for target identification are available in Dryad (doi: 10.5061/dryad.5606t)

File 1 Name: targetExons.fa
File 1 Description: FASTA file of the target exon sequences used for probe design

3. workflow --------------------

Folder of Perl scripts used to clean and assemble the raw sequence data.

External dependencies include bowtie2, samtools and GATK.

Directory 3.1 Name: clean/
Directory 3.1 Description: script used to clean reads (adaptor, duplicate and contaminant removal) and trim them

File 1 Name: scrubReads.pl
File 1 Description: copy of the script used to clean reads; see also the SCPP directory at https://github.com/MVZSEQ

Directory 3.2 Name: assembly/
Directory 3.2 Description: scripts used to assemble the exon capture data: a series of Perl scripts linked by shell wrapper scripts. The pipeline accepts three main inputs: target exon sequences, the reference proteome used during annotation, and captured sequence reads from each sample. Pipeline details are available at https://github.com/jasongbragg/exon-capture-phylo/

Directory 3.2.1 Name: pl/
Directory 3.2.1 Description: perl scripts for assembly workflow

Directory 3.2.2 Name: sh/
Directory 3.2.2 Description: shell scripts for assembly workflow

File 1 Name: exon_capture_phylo.sh
File 1 Description: main script for the assembly workflow; running exon_capture_phylo.sh executes the assembly using the shell and Perl scripts in the sh/ and pl/ directories (3.2.2 and 3.2.1)

File 2 Name: rw.all.config
File 2 Description: example configuration file

Directory 3.3 Name: GATK_docker_info/
Directory 3.3 Description: folder containing information about the GATK Docker image, which is available at https://hub.docker.com/r/trust1/gatk

File 1 Name: Overview_GATK_docker.txt
File 1 Description: text file describing the GATK Docker image, with instructions and background

File 2 Name: callingpipe_GATK.pl
File 2 Description: Perl script used to run GATK from the Docker image

Directory 3.4 Name: SNP_filtering/
Directory 3.4 Description: folder containing code and readme files for filtering the SNP dataset

File 1 Name: process.vcf.1.sh
File 1 Description: shell script used to filter SNPs by calling vcftools. It also calls scripts that compare the genotypes of technical replicates, largely using the R code in the r/ directory (3.4.1)
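A common summary when comparing technical replicates is genotype concordance: the fraction of SNPs called in both replicates that received the same genotype. The sketch below computes that statistic from 0/1/2 genotype dictionaries keyed by SNP ID; it is an illustration under that assumption and is not the authors' R code.

```python
from typing import Dict, Optional


def replicate_concordance(
    rep_a: Dict[str, Optional[int]],
    rep_b: Dict[str, Optional[int]],
) -> float:
    """Fraction of SNPs called (non-missing) in both replicates with identical genotypes."""
    shared = [
        snp for snp in rep_a
        if snp in rep_b and rep_a[snp] is not None and rep_b[snp] is not None
    ]
    if not shared:
        return float("nan")  # no SNP called in both replicates
    same = sum(rep_a[snp] == rep_b[snp] for snp in shared)
    return same / len(shared)
```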

Directory 3.4.1 Name: r/
Directory 3.4.1 Description: contains R code for reading VCF files and comparing technical replicates

Code/software

Scripts in the workflow directory are provided here primarily for archival purposes. For updated versions, see:

https://github.com/MVZSEQ

https://github.com/jasongbragg/exon-capture-phylo

https://github.com/MozesBlom/EAPhy

Access information

Other publicly accessible locations of the data:

NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/): all samples are in SRA BioProjects PRJNA360868 and PRJNA741610.


METHODOLOGICAL INFORMATION

Please refer to the manuscript for methods of data collection and processing https://doi.org/10.1093/molbev/msab333