Data from: Limited introgression between rock-wallabies with extensive chromosomal rearrangements
Data files
Jan 14, 2026 version files: 999.51 MB
- Dryad_archive_Petrogale_penicillata_complex_Potter_et_al_2022_2.zip (999.48 MB)
- ReadMe_Petrogale_pencillata_complex.rtf (15.80 KB)
- README.md (14.20 KB)
Abstract
Chromosome rearrangements can result in the rapid evolution of hybrid incompatibilities. Robertsonian fusions, particularly those with monobrachial homology, can drive reproductive isolation amongst recently diverged taxa. The recent radiation of rock-wallabies (genus Petrogale) is an important model to explore the role of Robertsonian fusions in speciation. Here we pursue that goal using an extensive sampling of populations and genomes of Petrogale from north-eastern Australia. In contrast to previous assessments using mitochondrial DNA or nuclear microsatellite loci, genomic data are able to separate the most closely related species and to resolve their divergence histories. Both phylogenetic and population genetic analyses indicate introgression between two species that differ by a single Robertsonian fusion. Based on the available data, there is also evidence for introgression between two species which share complex chromosomal rearrangements. However, the remaining results show no consistent signature of introgression amongst species pairs and, where evident, indicate generally low introgression overall. X-linked loci have elevated divergence compared to autosomal loci, indicating a potential role for genic evolution to produce reproductive isolation in concert with chromosome change. Our results highlight the value of genome scale data in identifying a strong role for Robertsonian fusions and structural variation in speciation.
https://doi.org/10.5061/dryad.6m905qg0d
28 June 2021
GENERAL INFORMATION
Title: Genomic data used to examine introgression between rock-wallabies with extensive chromosomal rearrangements
Principal Investigator: Dr Sally Potter
Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
sally.potter@mq.edu.au
Date of data collection: 2017-2021
Geographic location of data collection: northeast Australia (Queensland)
Keywords: chromosome rearrangement, introgression, marsupial, speciation, Robertsonian fusion
Citation: Potter S, Bragg JG, Turakulov R, Eldridge MDB, Deakin J, Kirkpatrick M, Edwards RJ, Moritz C (2022) Limited introgression between rock-wallabies with extensive chromosomal rearrangements. Molecular Biology and Evolution 39: msab333 https://doi.org/10.1093/molbev/msab333
DATA & FILE OVERVIEW
These data were generated to investigate the role of chromosome rearrangements in the evolutionary history of the penicillata group of rock-wallabies from northeast Queensland. We analysed exon capture sequence data and Diversity Arrays Technology (DArT) genome reduction single nucleotide polymorphism (SNP) data with both population genomic and phylogenomic approaches, to evaluate introgression between species, including at X chromosome loci and at loci from rearranged and non-rearranged chromosomes. The archive contains three directories: 1) data used in the various analyses, 2) the design of the exon capture experiment, and 3) workflows used to assemble and clean the data.
1. data -------------------------
File 1 Name: lib_sample_Petrogale_penicillata_complex
File 1 Description: A Comma-Separated Values (CSV) file with sample information including sample ID, Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/) information linking to raw sequencing files, sex and geographical location. All samples are in SRA Bioprojects PRJNA360868 and PRJNA741610.
Directory 1.1 Name: Datasets/
Directory 1.1 Description: data files used in analyses
File 1 Name: rw.2_3.recode.vcf
File 1 Description: A variant call format (VCF) file, the standard bioinformatics format for storing genetic variants, containing the final filtered set of 22,724 DArT SNPs used for population genetic analyses in this study.
File 2 Name: SAMIG_vcf_pop.csv
File 2 Description: A Comma-Separated Values (CSV) file of sample names and population assignments, used in R for principal coordinate analysis (PCoA) of the rw.2_3.recode.vcf data.
File 3 Name: SAMIG_final.gl6.csv
File 3 Description: A CSV file of SNP data filtered from rw.2_3.recode.vcf using dartR. This file was used to create the input files for the Structure (.struc) and TreeMix analyses.
File 4 Name: rw_gt_SAMIG.struc
File 4 Description: A text file of genetic variants used as input for Structure population genetic clustering analysis for all five species (Petrogale assimilis, Petrogale godmani, Petrogale mareeba, Petrogale inornata, Petrogale sharmani).
File 5 Name: rw_gt_SAM.struc
File 5 Description: A text file of genetic variants used as input for Structure population genetic clustering analysis for three SAM species (Petrogale assimilis, Petrogale mareeba, Petrogale sharmani).
File 6 Name: SAM_polymorphicsites_dapc.csv
File 6 Description: A CSV file of individuals and their SNP genotypes, coded 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate allele, used for discriminant function analysis in R.
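As an illustration of the 0/1/2 coding, the Python sketch below computes per-SNP alternate-allele frequencies from such a genotype table. It assumes a hypothetical layout (first column = individual ID, header row = SNP IDs, missing genotypes blank or "NA"); check the actual layout of SAM_polymorphicsites_dapc.csv before relying on it.

```python
import csv
import io


def alt_allele_frequencies(handle):
    """Return {snp_id: alternate-allele frequency} from a 0/1/2 genotype CSV.

    Hypothetical layout: header row = ID column followed by SNP IDs; each
    following row = one individual; cells hold the number of alternate
    alleles (0, 1 or 2), with blank or 'NA' treated as missing.
    """
    reader = csv.reader(handle)
    header = next(reader)
    snp_ids = header[1:]
    alt_counts = [0] * len(snp_ids)
    called = [0] * len(snp_ids)  # non-missing genotypes per SNP
    for row in reader:
        for j, cell in enumerate(row[1:]):
            cell = cell.strip()
            if cell in ("", "NA"):
                continue
            dosage = int(cell)
            if dosage not in (0, 1, 2):
                raise ValueError(f"unexpected genotype code {cell!r}")
            alt_counts[j] += dosage
            called[j] += 1
    # Each diploid genotype contributes two allele copies.
    return {
        snp: (alt_counts[j] / (2 * called[j]) if called[j] else float("nan"))
        for j, snp in enumerate(snp_ids)
    }


example = io.StringIO("ind,snp1,snp2\nA,0,2\nB,1,NA\nC,2,1\n")
print(alt_allele_frequencies(example))  # {'snp1': 0.5, 'snp2': 0.75}
```

Missing genotypes are excluded from the denominator, so each frequency is estimated only from the individuals genotyped at that SNP.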
File 7 Name: TreeMix_SAMIG.txt
File 7 Description: A text file of allele frequency data for each of the five species used as input for TreeMix analysis.
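TreeMix reads allele counts rather than individual genotypes: a header line of population names, then one line per SNP with a "count1,count2" pair for each population (the TreeMix manual expects this file to be gzipped). The sketch below shows how 0/1/2 genotypes and an individual-to-population assignment could be converted to that layout. The in-memory input structures are hypothetical, and this is not necessarily how TreeMix_SAMIG.txt was produced.

```python
from collections import defaultdict


def treemix_lines(genotypes, populations):
    """Build TreeMix-style allele-count lines.

    genotypes: {individual: [alt-allele dosage per SNP]} with 0/1/2 values
               and None for missing (hypothetical layout).
    populations: {individual: population name}
    Returns the header line followed by one line per SNP, in which each
    population contributes "ref_count,alt_count".
    """
    pops = sorted(set(populations.values()))
    n_snps = len(next(iter(genotypes.values())))
    lines = [" ".join(pops)]
    for j in range(n_snps):
        ref = defaultdict(int)
        alt = defaultdict(int)
        for ind, dosages in genotypes.items():
            g = dosages[j]
            if g is None:
                continue
            pop = populations[ind]
            alt[pop] += g
            ref[pop] += 2 - g  # diploid: remaining copies are reference
        lines.append(" ".join(f"{ref[p]},{alt[p]}" for p in pops))
    return lines


genos = {"a1": [0, 2], "a2": [1, None], "b1": [2, 0]}
pops = {"a1": "assimilis", "a2": "assimilis", "b1": "mareeba"}
print("\n".join(treemix_lines(genos, pops)))
# assimilis mareeba
# 3,1 0,2
# 0,2 2,0
```

A population with no called genotypes at a SNP is written as "0,0"; whether such SNPs should be dropped first is a filtering choice left to the user.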
File 8 Name: SAMI_Gog_SETS.txt
File 8 Description: A text file assigning individuals to populations, used as input for the TreeMix analysis that estimated migration with Petrogale godmani as the outgroup.
File 9 Name: SAMIGC_h0_1215_concat.phy
File 9 Description: A PHYLIP sequence alignment of concatenated phased haplotypes for the penicillata species complex (n=1215 loci), used for phylogenetic analysis.
File 10 Name: SAMI_MIGRATE_1617_interleaved.phy
File 10 Description: An interleaved PHYLIP alignment of concatenated unphased nuclear sequence data (n=1617 loci), used as input for the MIGRATE analysis of theta and migration among Petrogale assimilis, Petrogale mareeba, Petrogale inornata and Petrogale sharmani.
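Both alignment files above are in PHYLIP format. A minimal reader for relaxed PHYLIP is sketched below; it assumes taxon names contain no whitespace and, for sequential files, that each sequence sits on a single line. Interleaved continuation blocks are assigned to taxa in the order of the first block.

```python
import io


def read_phylip(handle):
    """Parse a relaxed PHYLIP alignment (sequential or interleaved).

    Assumes taxon names contain no whitespace; spaces inside sequence
    blocks are ignored. Returns {taxon: sequence}.
    """
    lines = [ln.rstrip("\n") for ln in handle]
    ntax, nchar = map(int, lines[0].split()[:2])
    body = [ln for ln in lines[1:] if ln.strip()]  # drop blank separator lines
    names, seqs = [], []
    # The first ntax non-blank lines carry the taxon names.
    for ln in body[:ntax]:
        name, *chunks = ln.split()
        names.append(name)
        seqs.append("".join(chunks))
    # Remaining lines are interleaved continuation blocks, in taxon order.
    for i, ln in enumerate(body[ntax:]):
        seqs[i % ntax] += "".join(ln.split())
    aln = dict(zip(names, seqs))
    for name, seq in aln.items():
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, got {len(seq)}")
    return aln


demo = io.StringIO("3 8\nt1 ACGT\nt2 ACGA\nt3 TCGT\n\nACGT\nAAAA\nCCCC\n")
print(read_phylip(demo))
```

The final length check catches the most common parsing problem (a name with spaces or a wrapped sequential record) instead of silently returning misaligned sequences.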
Directory 1.2 Name: R_code_files/
Directory 1.2 Description: R code used for principal coordinate and discriminant function analysis, as well as PopGenome analyses of Tajima’s D
File 1 Name: R_dapc_19PC_SAM.txt
File 1 Description: A text file of the R code used for discriminant function analysis for the SAM species complex (Petrogale sharmani, Petrogale assimilis, Petrogale mareeba), using packages dartR and poppr.
File 2 Name: R_pcoa_SAMIG_final.txt
File 2 Description: A text file of the R code used for principal coordinate analysis of the five penicillata complex species.
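For readers working outside R, classical principal coordinate analysis (metric multidimensional scaling) of a pairwise distance matrix can be sketched in Python with NumPy as below. This is a generic implementation of the technique, not a port of R_pcoa_SAMIG_final.txt; the genetic distance used in that script, and any handling of negative eigenvalues, may differ.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a distance matrix.

    dist: symmetric (n x n) matrix of pairwise distances.
    Returns (coordinates on the first n_axes axes,
             proportion of positive-eigenvalue variance per returned axis).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centring of -0.5 * squared distances.
    a = -0.5 * d ** 2
    centre = np.eye(n) - np.ones((n, n)) / n
    b = centre @ a @ centre
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-12
    # Scale each eigenvector by sqrt(eigenvalue); clip tiny negatives to 0.
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    explained = eigvals[:n_axes] / eigvals[positive].sum()
    return coords, explained


# Three points on a line at 0, 1 and 3: axis 1 recovers them (up to sign).
coords, explained = pcoa([[0, 1, 3], [1, 0, 2], [3, 2, 0]], n_axes=1)
print(coords[:, 0], explained)
```

The sign of each axis is arbitrary, so plots from different software can appear mirrored while conveying the same structure.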
File 3 Name: PopGenome_code_SAMIG.txt
File 3 Description: A text file of the R code used to calculate dXY in PopGenome; this was run on each chromosome to estimate average dXY.
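dXY (Nei's absolute divergence) is the mean number of nucleotide differences per site between sequences drawn from two populations. The sketch below computes it from aligned haplotypes. Skipping non-ACGT characters (gaps, N, ambiguity codes) pair by pair is an assumption made here; PopGenome's own handling of gaps and missing data may differ.

```python
from itertools import product

VALID = set("ACGT")


def dxy(pop_x, pop_y):
    """Mean pairwise divergence per site between two sets of aligned haplotypes.

    Every sequence in pop_x is compared with every sequence in pop_y.
    Sites where either sequence of a pair is not A/C/G/T are skipped for
    that pair. Returns NaN if no comparable sites exist.
    """
    diffs = 0
    compared = 0
    for sx, sy in product(pop_x, pop_y):
        if len(sx) != len(sy):
            raise ValueError("sequences must be aligned to equal length")
        for a, b in zip(sx.upper(), sy.upper()):
            if a in VALID and b in VALID:
                compared += 1
                diffs += a != b
    return diffs / compared if compared else float("nan")


# Two haplotypes from X, one from Y: 1 difference over 8 compared sites.
print(dxy(["AAAA", "AAAT"], ["AAAA"]))  # 0.125
```

With no missing data this equals the sum of pairwise differences divided by (n_x * n_y * sites), the usual definition; averaging per chromosome, as described above, would be done over the loci mapped to that chromosome.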
Directory 1.3 Name: MIGRATE/
Directory 1.3 Description: A folder of PDF files of MIGRATE output estimating migration and theta values for all data, X loci only, and subsets of rearranged (R) and non-rearranged (NR) loci
File 1 Name: SAMI_1617loci.pdf
File 1 Description: PDF file of the MIGRATE results for the SAMI species complex using all unphased nuclear loci (n=1617)
File 2 Name: SAMI_NR.pdf
File 2 Description: PDF file of the MIGRATE results for the SAMI species complex using a subset of 50 unphased nuclear loci mapped to non-rearranged chromosomes (1, 2, 7, 8)
File 3 Name: SAMI_R.pdf
File 3 Description: PDF file of the MIGRATE results for the SAMI species complex using a subset of 50 unphased nuclear loci mapped to rearranged chromosomes (5, 6, 9, 10)
File 4 Name: SAMI_X.pdf
File 4 Description: PDF file of the MIGRATE results for the SAMI species complex using all unphased nuclear loci mapped to the X chromosome (n=46 loci)
Directory 1.4 Name: Petrogale penicillata draft genome/
Directory 1.4 Description: folder containing the draft pseudohaplotype genome assembly of Petrogale penicillata, generated from a 10x Genomics library and used as the reference for mapping DArTseq raw reads and exon capture sequences
File 1 Name: wallaby10xv1.dipnr.pri.fasta
File 1 Description: FASTA file of the Petrogale penicillata genome assembly to which exon loci were mapped
File 2 Name: wallaby10xv1.dipnr.pri.fasta.fai
File 2 Description: FASTA index file with five tab-separated columns: 1) contig name, 2) number of bases in the contig, 3) byte offset in the FASTA file at which the contig sequence begins, 4) bases per line in the FASTA file, 5) bytes per line in the FASTA file (including the newline)
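The index columns are what make random access possible: the byte position of 0-based base p in a contig is offset + (p // linebases) * linewidth + p % linebases. The sketch below applies that formula to pull a region out of the FASTA. It assumes uniform line lengths within each contig, as required for a valid index (for example one produced by samtools faidx); the function names are illustrative.

```python
import io


def read_fai(handle):
    """Parse a .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    for line in handle:
        name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
        index[name] = (int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch(fasta, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) of contig `name`.

    `fasta` must be opened in binary mode so that seek() uses byte offsets.
    """
    length, offset, linebases, linewidth = index[name]
    end = min(end, length)
    if start >= end:
        return ""

    def pos(p):
        # Byte position of base p: full lines skipped plus column within line.
        return offset + (p // linebases) * linewidth + p % linebases

    fasta.seek(pos(start))
    raw = fasta.read(pos(end - 1) - pos(start) + 1)
    return raw.decode("ascii").replace("\n", "").replace("\r", "")


fa = io.BytesIO(b">ctg1\nACGTA\nCGTAC\nGT\n>ctg2\nTTTT\n")
idx = read_fai(io.StringIO("ctg1\t12\t6\t5\t6\nctg2\t4\t27\t4\t5\n"))
print(fetch(fa, idx, "ctg1", 3, 8))  # TACGT (spans a line break)
```

For the archived genome, the same calls would be made on the real .fai and an open("wallaby10xv1.dipnr.pri.fasta", "rb") handle.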
Directory 1.5 Name: Chromosome_data/
Directory 1.5 Description: contains directories of exon sequence alignments in phylip format, grouped by the chromosome they map to (1-10, X) and sorted into autosomal loci, X loci, rearranged loci (R) and non-rearranged loci (NR). Note: Supplementary Table 2 of the manuscript lists the scaffolds to which loci were mapped in order to establish the chromosomal location of Petrogale penicillata loci from the tammar wallaby (Notamacropus eugenii)
Directory 1.5.1 Name: Chr1/
Directory 1.5.1 Description: directory of phylip sequence alignments with exon ID for chromosome 1
Directory 1.5.2 Name: Chr2/
Directory 1.5.2 Description: directory of phylip sequence alignments with exon ID for chromosome 2
Directory 1.5.3 Name: Chr3/
Directory 1.5.3 Description: directory of phylip sequence alignments with exon ID for chromosome 3
Directory 1.5.4 Name: Chr4/
Directory 1.5.4 Description: directory of phylip sequence alignments with exon ID for chromosome 4
Directory 1.5.5 Name: Chr5/
Directory 1.5.5 Description: directory of phylip sequence alignments with exon ID for chromosome 5
Directory 1.5.6 Name: Chr6/
Directory 1.5.6 Description: directory of phylip sequence alignments with exon ID for chromosome 6
Directory 1.5.7 Name: Chr7/
Directory 1.5.7 Description: directory of phylip sequence alignments with exon ID for chromosome 7
Directory 1.5.8 Name: Chr8/
Directory 1.5.8 Description: directory of phylip sequence alignments with exon ID for chromosome 8
Directory 1.5.9 Name: Chr9/
Directory 1.5.9 Description: directory of phylip sequence alignments with exon ID for chromosome 9
Directory 1.5.10 Name: Chr10/
Directory 1.5.10 Description: directory of phylip sequence alignments with exon ID for chromosome 10
Directory 1.5.11 Name: Autosomes/
Directory 1.5.11 Description: directory of concatenated phylip sequence alignments with exon ID for each autosomal chromosome of Petrogale penicillata
Directory 1.5.12 Name: X_46/
Directory 1.5.12 Description: directory of phylip sequence alignments with exon ID for the X chromosome
Directory 1.5.13 Name: R_50/
Directory 1.5.13 Description: directory of a subset of 50 phylip sequence alignments with exon ID for rearranged chromosomes (5, 6, 9, 10)
Directory 1.5.14 Name: R_all/
Directory 1.5.14 Description: directory of phylip sequence alignments with exon ID for all mapped loci on rearranged chromosomes (5, 6, 9, 10)
Directory 1.5.15 Name: NR_50/
Directory 1.5.15 Description: directory of a subset of 50 phylip sequence alignments with exon ID for non-rearranged chromosomes (1, 2, 7, 8)
Directory 1.5.16 Name: NR_all/
Directory 1.5.16 Description: directory of phylip sequence alignments with exon ID for all mapped loci on non-rearranged chromosomes (1, 2, 7, 8)
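The directory descriptions define the chromosome groupings used for the R and NR sets. A small lookup along the lines below could sort loci by chromosome label. The "ChrN" label convention mirrors the directory names, and the {exon ID: chromosome} mapping passed to it is hypothetical. Chromosomes 3 and 4 are not assigned to either group in this README, so they are reported separately; how the 50-locus subsets were drawn is not described here and is not reproduced.

```python
from collections import defaultdict

# Groupings taken from the R_50/R_all and NR_50/NR_all descriptions.
REARRANGED = {"5", "6", "9", "10"}
NON_REARRANGED = {"1", "2", "7", "8"}


def chromosome_class(chrom):
    """Classify a chromosome label as 'X', 'R', 'NR' or 'other_autosome'."""
    label = str(chrom)
    if label.startswith("Chr"):
        label = label[3:]
    if label == "X":
        return "X"
    if label in REARRANGED:
        return "R"
    if label in NON_REARRANGED:
        return "NR"
    return "other_autosome"  # e.g. chromosomes 3 and 4


def group_loci(locus_chrom):
    """Group a hypothetical {exon_id: chromosome} mapping by class."""
    groups = defaultdict(list)
    for locus, chrom in sorted(locus_chrom.items()):
        groups[chromosome_class(chrom)].append(locus)
    return dict(groups)


print(group_loci({"e2": "Chr9", "e1": "Chr1", "e3": "X", "e4": "Chr3"}))
```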
Directory 1.6 Name: Haplotype_h0_alignments/
Directory 1.6 Description: directory of nuclear h0 haplotype data as phylip sequence alignments with exon ID for all species in the penicillata complex; individuals are identified by library ID. This includes all exons analysed
Directory 1.7 Name: Unphased_nuclear_alignments/
Directory 1.7 Description: directory of unphased nuclear exon phylip sequence alignments with exon ID for all species in the penicillata complex; individuals are identified by library ID plus "ambig", indicating that this unphased dataset uses ambiguity codes. This includes all exons analysed and a concatenated alignment
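In an unphased alignment, a heterozygous site appears as a single ambiguity code rather than as two haplotypes. The sketch below shows that correspondence by collapsing two aligned haplotypes into one sequence with standard IUPAC two-base codes. It is illustrative only: the unphased data were generated by the assembly workflow, not by merging haplotypes, and only h0 haplotypes are archived here, so any second haplotype used with it is hypothetical.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def unphase(hap0, hap1):
    """Collapse two aligned haplotypes into one sequence with ambiguity codes."""
    if len(hap0) != len(hap1):
        raise ValueError("haplotypes must be the same length")
    out = []
    for a, b in zip(hap0.upper(), hap1.upper()):
        if a == b:
            out.append(a)  # homozygous site, or shared gap/missing character
        elif {a, b} <= set("ACGT"):
            out.append(IUPAC[frozenset((a, b))])
        else:
            out.append("N")  # gap or missing data on only one haplotype
    return "".join(out)


print(unphase("ACGTA", "ATGCA"))  # AYGYA
```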
2. design -----------------------
Information on the sequence capture kit
The transcriptome sequence of Petrogale xanthopus used for target identification is available from Dryad (doi: 10.5061/dryad.5606t)
File 1 Name: targetExons.fa
File 1 Description: FASTA file of the target exon sequences used for probe design
3. workflow --------------------
folder of Perl scripts used to clean raw sequence data and assemble the exon capture data
External dependencies include bowtie2, samtools and GATK
Directory 3.1 Name: clean/
Directory 3.1 Description: script used to clean reads (adaptor, duplicate and contamination removal) and to trim them
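The exact logic of scrubReads.pl is documented in the script itself and the linked repository. As a generic illustration of one cleaning step, the sketch below removes exact-duplicate read pairs (identical R1 and R2 sequences), keeping the first occurrence. Real duplicate removal may also consider reverse complements or near-identical reads, so treat this as a simplified stand-in rather than the script's method.

```python
def remove_exact_duplicates(pairs):
    """Drop read pairs whose (R1, R2) sequences exactly match an earlier pair.

    pairs: iterable of (read_id, seq1, seq2) tuples.
    Returns (kept pairs in input order, number of duplicates removed).
    """
    seen = set()
    kept = []
    removed = 0
    for read_id, seq1, seq2 in pairs:
        key = (seq1, seq2)
        if key in seen:
            removed += 1
            continue
        seen.add(key)
        kept.append((read_id, seq1, seq2))
    return kept, removed


reads = [("r1", "ACGT", "TTGA"), ("r2", "ACGT", "TTGA"), ("r3", "ACGT", "TTGC")]
kept, removed = remove_exact_duplicates(reads)
print([r[0] for r in kept], removed)  # ['r1', 'r3'] 1
```

Keying on the pair rather than on R1 alone avoids discarding distinct fragments that happen to share one end.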
File 1 Name: scrubReads.pl
File 1 Description: copy of the script used to clean reads; see also the SCPP directory at https://github.com/MVZSEQ
Directory 3.2 Name: assembly/
Directory 3.2 Description: scripts used to assemble exon capture data using a series of Perl scripts that are linked with shell wrapper scripts. It accepts three main datasets: target exon sequences, the reference proteome used during annotation, and captured sequence reads from each sample. Pipeline details can be found at https://github.com/jasongbragg/exon-capture-phylo/
Directory 3.2.1 Name: pl/
Directory 3.2.1 Description: perl scripts for assembly workflow
Directory 3.2.2 Name: sh/
Directory 3.2.2 Description: shell scripts for assembly workflow
File 1 Name: exon_capture_phylo.sh
File 1 Description: main script for the assembly workflow; running exon_capture_phylo.sh executes the Perl and shell scripts in the pl/ and sh/ directories (3.2.1 and 3.2.2)
File 2 Name: rw.all.config
File 2 Description: example configuration script
Directory 3.3 Name: GATK_docker_info/
Directory 3.3 Description: folder containing information about the GATK Docker image, which is available at https://hub.docker.com/r/trust1/gatk
File 1 Name: Overview_GATK_docker.txt
File 1 Description: text file outlining the GATK docker, instructions and background
File 2 Name: callingpipe_GATK.pl
File 2 Description: perl script used to run GATK from the docker
Directory 3.4 Name: SNP_filtering/
Directory 3.4 Description: folder containing code and readme files for filtering SNP dataset
File 1 Name: process.vcf.1.sh
File 1 Description: shell script that filters SNPs by calling vcftools. It also calls scripts that compare genotypes of technical replicates, largely using R code from the r/ directory
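One common way to compare technical replicates is genotype concordance: the proportion of SNPs called in both replicates that receive the same genotype. The sketch below computes it from two genotype vectors. It is a generic example; the R code in r/ may use a different metric or missing-data rule.

```python
def replicate_concordance(genos_a, genos_b, missing=None):
    """Proportion of SNPs with identical genotype calls in two technical replicates.

    genos_a, genos_b: equal-length sequences of genotype codes (e.g. 0/1/2).
    Positions where either replicate equals `missing` are excluded.
    Returns (concordance, number of SNPs compared).
    """
    if len(genos_a) != len(genos_b):
        raise ValueError("replicates must cover the same SNPs")
    compared = matches = 0
    for a, b in zip(genos_a, genos_b):
        if a == missing or b == missing:
            continue
        compared += 1
        matches += a == b
    return (matches / compared if compared else float("nan")), compared


print(replicate_concordance([0, 1, 2, None, 1], [0, 2, 2, 1, 1]))  # (0.75, 4)
```

Reporting the number of SNPs compared alongside the proportion makes it clear when a high concordance rests on only a few jointly called sites.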
Directory 3.4.1 Name: r/
Directory 3.4.1 Description: contains R code for reading VCF files and comparing technical replicates
Code/software
Scripts in the workflow directory are provided here primarily for archival purposes. For updated versions, see:
https://github.com/jasongbragg/exon-capture-phylo
https://github.com/MozesBlom/EAPhy
Access information
Other publicly accessible locations of the data:
- NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/): sample information and geographical locations. All samples are in SRA BioProjects PRJNA360868 and PRJNA741610.
METHODOLOGICAL INFORMATION
Please refer to the manuscript for methods of data collection and processing https://doi.org/10.1093/molbev/msab333
We used the draft genome assembly of Petrogale penicillata as the reference for mapping reads and calling single nucleotide polymorphisms (SNPs). We generated three datasets: two exon sequence datasets for the six species in the penicillata group and one SNP dataset for five species (excluding P. coenensis). We refer to the SNP dataset as "DArT", after the genome reduction approach used; it comprises 22,724 SNPs genotyped in 77 individuals, with a mean of 15 per taxon (including four known F1 P. godmani x P. mareeba hybrids). The second dataset, "phased exons", consists of phased haplotype sequences (627,699 bp) from the exon capture experiments and resolves haplotypes at 1215 loci for 67 individuals (mean of 15 per taxon). The third dataset, "unphased exons", is based on sequence data (843,619 bp) from the exon capture experiments and includes 1617 loci for the same individuals.
Exon data were collected using a custom in-solution exon capture approach, with target sequences from a yellow-footed rock-wallaby (Petrogale xanthopus) transcriptome. Sequences were cleaned and assembled de novo using the scripts included here.
SNP data were generated using Diversity Arrays Technology, and the raw sequencing data were mapped to the Petrogale penicillata reference genome using the GATK Docker image (information also included here).
