Data from: High-resolution estimates of crossover and noncrossover recombination from a captive baboon colony

Citation

Wall, Jeffrey; Robinson, Jacqueline; Cox, Laura (2022), Data from: High-resolution estimates of crossover and noncrossover recombination from a captive baboon colony, Dryad, Dataset, https://doi.org/10.7272/Q6HH6H9D

Abstract

Homologous recombination has been extensively studied in humans and a handful of model organisms. Much less is known about recombination in other species, including non-human primates. Here we present a study of crossovers and non-crossover (NCO) recombination in olive baboons (Papio anubis) from two pedigrees containing a total of 20 paternal and 17 maternal meioses, and compare these results to linkage disequilibrium (LD)-based recombination estimates from 36 unrelated olive baboons. We demonstrate how crossovers, combined with LD-based recombination estimates, can be used to identify genome assembly errors. We also quantify sex-specific differences in recombination rates, including elevated male crossover and reduced female crossover rates near telomeres. Finally, we add to the increasing body of evidence suggesting that while most NCO recombination tracts in mammals are short (e.g., < 500 bp), there is a non-negligible fraction of longer (e.g., > 1 Kbp) NCO tracts. For NCO tracts shorter than 10 Kbp, we fit a model consisting of a mixture of two (truncated) geometric distributions to the NCO tract length distribution and estimate that >99% of all NCO tracts are very short (mean 24 bp), but the remaining tracts can be quite long (mean 4.3 Kbp). A single geometric distribution model for NCO tract lengths is incompatible with the data, suggesting that LD-based methods for estimating NCO recombination rates that make this assumption may need to be modified.
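
To make the two-component tract-length model concrete, here is a minimal Python sketch of a mixture of two geometric distributions, each truncated at 10 Kbp. It assumes tract lengths are integers of at least 1 bp, treats the quoted means (24 bp and 4.3 Kbp) as the means of the untruncated components, and uses an illustrative short-tract weight of 0.99 (the abstract reports only ">99%"). The function names are hypothetical, this is not the authors' fitting code, and it ignores the fact that only tracts overlapping informative sites can actually be observed.

```python
def truncated_geometric_pmf(length: int, mean: float, max_len: int) -> float:
    """P(L = length) for a geometric distribution on {1, 2, ...} whose
    untruncated mean is `mean`, conditioned on L <= max_len."""
    if not 1 <= length <= max_len:
        return 0.0
    p = 1.0 / mean                       # untruncated mean of this geometric is 1/p
    untruncated = p * (1.0 - p) ** (length - 1)
    norm = 1.0 - (1.0 - p) ** max_len    # P(L <= max_len) before truncation
    return untruncated / norm


def mixture_pmf(length: int, w_short: float, mean_short: float,
                mean_long: float, max_len: int = 10_000) -> float:
    """Two-component mixture: weight w_short on the short-tract component,
    1 - w_short on the long-tract component (both truncated at max_len)."""
    return (w_short * truncated_geometric_pmf(length, mean_short, max_len)
            + (1.0 - w_short) * truncated_geometric_pmf(length, mean_long, max_len))
```

For example, `sum(mixture_pmf(L, 0.99, 24, 4300) for L in range(1001, 10_001))` gives the probability, under these illustrative parameters, that a tract is longer than 1 Kbp. A single geometric component cannot produce both a large mass of very short tracts and a long tail of this kind, which is the intuition behind the comparison in the abstract.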

Methods

VCF files

VCF files contain raw unfiltered genotypes from 66 olive baboons (Papio anubis) from the Southwest National Primate Research Center (SNPRC). Genomes are aligned to the Panubis1.0 reference genome (GCA_008728515.1, Batra et al., 2020 (https://doi.org/10.1093/gigascience/giaa134)). Sequencing was performed on Illumina HiSeq 4000 and HiSeq X instruments (450 bp mean insert size, 150 bp x 150 bp paired-end reads) using DNA extracted from blood samples. Sequences generated for this study (n=23) were combined with previously generated sequence data from Robinson et al., 2019 (https://doi.org/10.1101/gr.247122.118) and Wu et al., 2020 (https://doi.org/10.1371/journal.pbio.3000838). All raw sequence data are available from the Sequence Read Archive under BioProject PRJNA433868. Median depth of coverage across samples is 35.6X.

Briefly, reads were trimmed with TrimGalore v0.6.4 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore) using the options -q 20 --stringency 1 --length 50, then aligned with BWA MEM v0.7.17 (Li, 2013 (https://arxiv.org/abs/1303.3997)); duplicate reads were marked with Picard v2.21.3 (https://broadinstitute.github.io/picard), and genotypes were called with GATK HaplotypeCaller v3.8-1-0-gf15c1c3ef (McKenna et al., 2010 (https://doi.org/10.1101/gr.107524.110)). Genotypes from each individual were merged into a joint call set using GATK GenotypeGVCFs, followed by GATK LeftAlignAndTrimVariants. VCF files were compressed with bgzip (Bonfield et al., 2021 (https://doi.org/10.1093/gigascience/giab007)).
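
Because bgzip output is a series of standard gzip blocks, the VCF files can be streamed with Python's built-in gzip module without extra dependencies. The sketch below (the function name is hypothetical; this is not part of the authors' pipeline) counts, per sample, the sites at which the GT field carries at least one non-reference allele. The files are raw and unfiltered, so a real analysis would apply quality filters first; this sketch applies none, and for region queries an indexed reader such as tabix would be the usual choice.

```python
import gzip


def count_nonref_genotypes(path):
    """Per-sample count of sites whose GT contains a non-reference allele."""
    samples, counts = [], {}
    with gzip.open(path, "rt") as fh:          # bgzip files are valid multi-member gzip
        for line in fh:
            if line.startswith("##"):
                continue                       # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                # Fixed columns CHROM..FORMAT occupy 0-8; sample names follow.
                samples = fields[9:]
                counts = {name: 0 for name in samples}
                continue
            gt_idx = fields[8].split(":").index("GT")   # raises if FORMAT lacks GT
            for name, sample_field in zip(samples, fields[9:]):
                gt = sample_field.split(":")[gt_idx]
                alleles = gt.replace("|", "/").split("/")
                # "0" is the reference allele and "." a missing call.
                if any(a not in ("0", ".") for a in alleles):
                    counts[name] += 1
    return counts
```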

Panubis1.0_mask.bed.gz

This BED file contains the coordinates of soft-masked bases in the Panubis1.0 genome assembly, which represent regions identified as repetitive by NCBI using WindowMasker (Morgulis et al., 2006, https://doi.org/10.1093/bioinformatics/bti774). Columns are: Chromosome, Start, End. File compressed with gzip.
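
BED coordinates are 0-based and half-open, so an interval (Start, End) covers bases Start through End - 1. As a minimal sketch (hypothetical function names; assumes the three-column, whitespace-delimited layout described above), the mask can be loaded into sorted, merged intervals per chromosome and queried by binary search:

```python
import bisect
import gzip
from collections import defaultdict


def load_mask(path):
    """Read a gzipped 3-column BED file into merged, sorted intervals per chromosome."""
    raw = defaultdict(list)
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            raw[chrom].append((int(start), int(end)))
    mask = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:             # overlapping or adjacent: extend
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        mask[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return mask


def is_masked(mask, chrom, pos):
    """True if 0-based position `pos` lies inside a masked interval."""
    if chrom not in mask:
        return False
    starts, ends = mask[chrom]
    i = bisect.bisect_right(starts, pos) - 1   # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def masked_bp(mask, chrom):
    """Total number of masked bases on `chrom`."""
    starts, ends = mask.get(chrom, ([], []))
    return sum(e - s for s, e in zip(starts, ends))
```

To check a VCF site, convert its 1-based POS first, e.g. `is_masked(mask, chrom, pos - 1)`.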

baboon_pyrho_chr1-20.rmap.bed.gz

Recombination map generated with pyrho v0.1.5 (Spence and Song, 2019, https://doi.org/10.1126/sciadv.aaw9206) using genotypes from 36 putatively unrelated baboons. Only the autosomes (chromosomes 1-20) are included. Raw rates generated by pyrho were multiplied by 5.577 to match the total map length we estimated from crossovers (2,293 cM). Columns are: Chromosome, Start, End, Recombination rate (per bp). File compressed with gzip.
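
Assuming the rate column is a per-base-pair, per-generation crossover probability (so that rate times interval width is in Morgans), the total map length and the genetic position of a site can be computed as below. This is a sketch with hypothetical function names, not the authors' code; it assumes intervals do not overlap and skips a header row if one is present. Under these assumptions, a rescaling factor such as the 5.577 above is the ratio of the target length (2,293 cM) to the length of the raw pyrho map, and `map_length_cM` applied to the distributed, already rescaled file would be expected to come out close to 2,293 cM.

```python
import gzip


def read_rmap(path):
    """Parse a gzipped 4-column map: Chromosome, Start, End, rate per bp."""
    rows = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 4 or not fields[1].isdigit():
                continue                       # blank line or header row
            chrom, start, end, rate = fields[:4]
            rows.append((chrom, int(start), int(end), float(rate)))
    return rows


def map_length_cM(rows):
    """Sum of rate * interval width (Morgans), converted to centiMorgans."""
    return 100.0 * sum(rate * (end - start) for _, start, end, rate in rows)


def rescale_factor(raw_rows, target_cM):
    """Multiplier that scales a raw map to a target total length."""
    return target_cM / map_length_cM(raw_rows)


def genetic_position_cM(rows, chrom, pos):
    """Cumulative cM on `chrom` up to 0-based `pos`, linear within intervals."""
    total = 0.0
    for c, start, end, rate in rows:
        if c != chrom or start >= pos:
            continue
        total += 100.0 * rate * (min(end, pos) - start)
    return total
```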

Funding

National Institutes of Health, Award: OD017859

National Institutes of Health, Award: GM115433