Data for: New insights into Xenopus sex chromosome genomics from the Marsabit clawed frog, X. borealis
Evans, Ben (2022), Data for: New insights into Xenopus sex chromosome genomics from the Marsabit clawed frog, X. borealis, Dryad, Dataset, https://doi.org/10.5061/dryad.3tx95x6k0
In many groups, sex chromosomes change frequently but the drivers of their rapid evolution are varied and often poorly characterized. With an aim of further understanding sex chromosome turnover, we investigated the polymorphic sex chromosomes of the Marsabit clawed frog, Xenopus borealis, using genomic data and a new chromosome-scale genome assembly. We confirmed previous findings that 54.1 Mb of chromosome 8L is sex-linked in animals from east Kenya and a lab strain, but most (or all) of this region is not sex-linked in natural populations from west Kenya. Previous work suggests possible degeneration of the Z chromosomes in the east population because many sex-linked transcripts of this female heterogametic population have female-biased expression, and we therefore expected this chromosome to not be present in the west population. In contrast, our simulations support a model where the sex-linked portion of the Z chromosome from the east acquired autosomal segregation in the west, and where the W chromosome from the east was lost in the west. These recent changes are consistent with the hot potato model, wherein sex chromosome turnover is favoured by natural selection if it purges a (minimally) degenerate sex-specific sex chromosome, but counterintuitively suggest natural selection failed to purge a Z chromosome that has signs of more advanced and possibly more ancient regulatory degeneration. These findings highlight complex evolutionary dynamics of young, rapidly evolving Xenopus sex chromosomes, and set the stage for mechanistic work aimed at pinpointing additional sex-determining genes in this group.
A draft genome assembly for X. borealis
DNA was extracted from lysed blood of a female frog that was obtained from Nasco (Fort Atkinson, WI, USA). Short insert Nextera and TruSeq libraries were prepared, respectively, by Jessica B. Lyons and by the Functional Genomics Laboratory at the University of California Berkeley, and then sequenced on the Illumina HiSeq 2500 (NCBI-SRA:SRR18802894–SRR18802896) by the Vincent J. Coates Genomics Sequencing Laboratory at the University of California Berkeley (VCGSL). Nextera mate pair libraries were prepared and sequenced on the Illumina HiSeq 2500 (NCBI-SRA:SRR18802888 and SRR18802889) by the HudsonAlpha Institute for Biotechnology. A Chicago in vitro proximity ligation library was prepared by Dovetail Genomics and sequenced on the Illumina HiSeq 2500 (NCBI-SRA:SRR18802892 and SRR18802893) by the VCGSL. Using a liver sample from a male frog, also from Nasco, a DpnII Hi-C library was prepared by Dovetail Genomics and sequenced on the Illumina HiSeq 2500 (NCBI-SRA:SRR18802890 and SRR18802891) by the VCGSL. The short insert data were adapter trimmed with ea-utils fastq-mcf version 1.04.807-18-gbd148d4 (Aronesty, 2013). The mate pair data were adapter trimmed and split with NxTrim version 0.4.1-53c2193 (O'Connell et al., 2015) and filtered using nxtrim_pipeline.sh version 1.0 (Bredeson et al., 2021).
Trimmed data were then assembled with Meraculous version 2.2.4 (Chapman et al., 2011). This assembly was scaffolded with Chicago and Hi-C data using the Dovetail Genomics HiRise algorithm (Putnam et al., 2016). The mitochondrial genome was assembled from adapter trimmed data using organelle_pipeline.py version 1.0 (Bredeson et al., 2021) and NOVOPlasty version 2.6.3 (Dierckxsens et al., 2017), starting with other Pipidae mitochondrial assemblies available on NCBI as input seeds. The assembly was screened with general_decon.sh version 1.0 (Mudd et al., 2020) to identify archaea, bacteria, virus, and vector contaminants using the respective RefSeq and UniVec databases, queried using mt_decon.sh version 1.0 (Mudd et al., 2020) against the mitochondrial assembled sequence, and filtered using nt_decon.sh version 1.0 (Bredeson et al., 2021) against the NT database and other completed frog assemblies. The assembly was then run through align_pipeline.sh version 1.0 (Bredeson et al., 2021) to identify and remove duplicate haplotype sequences. Scaffolds smaller than one kb were removed from the final assembly with Seqtk version 1.3-r106 (https://github.com/lh3/seqtk). Chromosomes were named according to the corresponding chromosomes in X. tropicalis version 9 (Mitros et al., 2019) and X. laevis version 9 (Session et al., 2016) based on alignment using MUMmer version 4.0.0 (Marcais et al., 2018), and scaffolds were numbered in order of decreasing size using SeqKit version 0.7.2-dev (Shen et al., 2016). The above sequencing reads and draft assembly are deposited under NCBI BioProject PRJNA827809. Assembly statistics were calculated using the Genome Assembly Annotation Service (GAAS) (https://github.com/NBISweden/GAAS).
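The final length filter and the size-ordered scaffold numbering described above can be illustrated with a short, dependency-free sketch. This is not the Seqtk or SeqKit invocation used for the assembly. The function names, the `Scaffold` prefix, and the 1000 bp default are illustrative assumptions, and chromosome-scale sequences named by alignment to X. tropicalis and X. laevis would be handled separately.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first token of the header as the record name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_and_rename(
    records: Iterable[Tuple[str, str]],
    min_len: int = 1000,          # assumption: "one kb" taken as 1000 bp
    prefix: str = "Scaffold",     # hypothetical naming prefix
) -> List[Tuple[str, str]]:
    """Drop sequences shorter than min_len, then number the rest
    in order of decreasing length (1 = longest)."""
    kept = [(n, s) for n, s in records if len(s) >= min_len]
    kept.sort(key=lambda rec: len(rec[1]), reverse=True)
    return [(f"{prefix}{i}", seq) for i, (_, seq) in enumerate(kept, start=1)]
```

Given records of 500 bp, 2000 bp, and 1000 bp, the 500 bp record is dropped and the remaining two are returned as `Scaffold1` (2000 bp) and `Scaffold2` (1000 bp).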
RRGS data and analysis of sex-linkage
RRGS data from a lab strain and wild-caught individuals were obtained from GenBank, including 49 X. borealis lab strain individuals (PRJNA319044; Furman & Evans, 2016) and 54 X. borealis wild-caught individuals (PRJNA616217; Song et al., 2020). These data were trimmed using Trimmomatic version 0.39 (Bolger et al., 2014), and mapped to the draft X. borealis genome assembly using BWA version 0.7.17. The HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs functions of the Genome Analysis Toolkit (GATK) version 4.1 (McKenna et al., 2010) were used to call genotypes for each sample and combine them into a joint genotype file. The VariantFiltration and SelectVariants functions of GATK were used to filter low-quality genotypes. Positions with any of the following attributes were removed: QD < 2.0, QUAL < 20, SOR > 3.0, FS > 60.0, MQ < 30.0, where these acronyms respectively refer to variant confidence/quality by depth (QD), variant quality score (QUAL), Symmetric Odds Ratio of 2x2 contingency table to detect strand bias (SOR), Fisher exact test for strand bias (FS), and mapping quality (MQ). PLINK version 1.9 (Purcell et al., 2007) was used to assess sex-linkage of single nucleotide polymorphisms (SNPs) from the RRGS data that mapped to any of the 18 chromosome assemblies; data from unplaced scaffolds were excluded.
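The hard-filtering rule can be restated as a small predicate over per-site annotations. This is only a sketch of the logic, not the GATK VariantFiltration command used here. The function name and the dictionary-based input are hypothetical. It assumes the conventional direction for QD, where a low value (QD < 2.0) marks a site for removal, and it assumes a missing annotation does not fail a site.

```python
from typing import Callable, Dict, List, Tuple

# Thresholds as listed in the text; each lambda returns True when the
# annotation value marks the site for removal.
INFO_CHECKS: List[Tuple[str, Callable[[float], bool]]] = [
    ("QD", lambda v: v < 2.0),    # assumption: low quality-by-depth fails
    ("SOR", lambda v: v > 3.0),   # strand bias (symmetric odds ratio)
    ("FS", lambda v: v > 60.0),   # strand bias (Fisher exact test)
    ("MQ", lambda v: v < 30.0),   # mapping quality
]


def failed_filters(qual: float, info: Dict[str, float]) -> List[str]:
    """Return the names of the hard filters a site fails (empty = keep).

    `qual` is the VCF QUAL column; `info` maps INFO annotation names
    to numeric values. Missing annotations are treated as passing
    (an assumption of this sketch).
    """
    failed: List[str] = []
    if qual < 20:
        failed.append("QUAL")
    for key, is_bad in INFO_CHECKS:
        if key in info and is_bad(info[key]):
            failed.append(key)
    return failed
```

A site is retained only when `failed_filters` returns an empty list, so failing any single threshold is enough to remove the position.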
New whole genome sequencing (WGS) data from geographic isolates
New genomic data were generated from one male and one female X. borealis individual from east Kenya (field identification numbers BJE4536 and BJE4515, respectively, both from Wundanyi, Kenya) and one male and one female individual from west Kenya (BJE4442 and BJE4441, respectively, both from Lukhome, Kenya). Specimens and genetic samples for these individuals are deposited at the Museum of Comparative Zoology at Harvard University, USA (MCZ Herpetology A-153183, MCZ Herpetology A-153181, MCZ Herpetology A-153148, MCZ Herpetology A-153147, respectively). These data have been deposited in the NCBI-SRA (BioProject PRJNA616217). The new genomic data were obtained using PCR-free library preparation, with each library sequenced on 1/6th of a lane of a NovaSeq S4 machine with paired-end 150 base pair reads. We analyzed these new data along with published genomic data from a female and a male individual from our lab strain (BioProject PRJNA421481). We mapped these genomic data to the draft X. borealis genome and called genotypes using the same procedures detailed above for the RRGS data, except that a de-duplication step was performed for the genomic data using Picard (Development_team, 2019) before genotyping. Coverage of the six individuals ranged from 30–46X.
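For orientation, mean sequencing depth is commonly approximated as total sequenced bases divided by genome length. The sketch below shows that arithmetic for paired-end reads. It is not the procedure used to obtain the coverage values reported here, the function name is hypothetical, and the numbers in the usage line are made-up placeholders rather than values from this dataset.

```python
def mean_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Approximate mean depth for paired-end data.

    Each pair contributes two reads of `read_length` bases; the result
    is total bases divided by the genome (or assembly) length in bases.
    Duplicates, unmapped reads, and trimming are ignored in this sketch.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return (2 * read_pairs * read_length) / genome_size


# Placeholder inputs only: 100 million pairs of 150 bp reads
# against a hypothetical 3 Gb genome.
example_depth = mean_coverage(100_000_000, 150, 3_000_000_000)
```

Because de-duplication and mapping filters reduce the number of usable bases, depth computed from aligned, de-duplicated reads will generally be lower than this raw estimate.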
Please see manuscript.
Natural Sciences and Engineering Research Council of Canada, Award: RGPIN-2017-05770
Eunice Kennedy Shriver National Institute of Child Health and Human Development, Award: R01HD080708
National Institute of General Medical Sciences, Award: R01GM086321
Eunice Kennedy Shriver National Institute of Child Health and Human Development, Award: R01HD065705
National Institute of General Medical Sciences, Award: R35GM127069