Range-wide whole-genome resequencing of the brown bear reveals drivers of intraspecies divergence

Citation

de Jong, Menno (2023), Range-wide whole-genome resequencing of the brown bear reveals drivers of intraspecies divergence, Dryad, Dataset, https://doi.org/10.5061/dryad.qbzkh18n6

Abstract

The brown bear is a textbook example of how Quaternary glaciation cycles shaped the present-day geographical distribution of mtDNA haplotypes. We compiled and analysed a range-wide whole-genome dataset of 128 brown bear individuals in order to re-evaluate brown bear population structure and genetic diversity using nuclear markers from autosomes and sex chromosomes. The file 'PLOTCOMMANDS.txt' contains detailed instructions on how to recreate the figures presented in the paper.

Methods

VCF-FILES WITH BIALLELIC SNPS

The four gzipped vcf-files contain information for biallelic single nucleotide polymorphisms (SNPs) found on autosomes, the Y chromosome, the X chromosome, or the mitogenome (mtDNA), for a collection of 128 brown bears (1 Norwegian individual sequenced in duplicate), 4 polar bears, and 2 American black bears. The datasets were generated by mapping whole-genome sequencing reads against a brown bear reference genome (either the nuclear or the mtDNA genome), followed by genotype calling with the samtools/bcftools pipeline. All samples were genotyped independently, i.e., without considering the data available for other individuals. The autosomal and X-chromosomal vcf-files were randomly thinned to retain at most one site per 20 kb and 10 kb, respectively, using the --thin option of the software vcftools. Ploidy levels in the vcf-files depend on the genomic region.

The haploid datasets (Y chromosome and mtDNA), as well as the haplodiploid dataset (X chromosome), can be converted into artificial diploid datasets (i.e., 0 becomes 0/0, and 1 becomes 1/1) using the script VCF_haploid2diploid.sh.
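The conversion rule described above (0 becomes 0/0, and 1 becomes 1/1) can be illustrated with a minimal Python sketch. This is not the script VCF_haploid2diploid.sh itself; the function names are hypothetical, and the sketch assumes tab-separated VCF lines in which the genotype (GT) is the first colon-separated subfield of each sample column. Calls that are already diploid, as expected for part of the X-chromosomal samples, are left unchanged.

```python
def haploid_to_diploid(genotype_field: str) -> str:
    """Duplicate a haploid genotype call; leave diploid calls unchanged."""
    # Assumption: GT is the first colon-separated subfield of the sample column.
    gt, sep, rest = genotype_field.partition(":")
    if "/" not in gt and "|" not in gt:
        gt = f"{gt}/{gt}"  # 0 -> 0/0, 1 -> 1/1, . -> ./.
    return gt + sep + rest


def convert_line(line: str) -> str:
    """Convert one VCF line; header lines (starting with '#') pass through."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    # The first nine columns are fixed VCF columns; samples start at index 9.
    fields[9:] = [haploid_to_diploid(f) for f in fields[9:]]
    return "\t".join(fields) + "\n"
```

Applied line by line to an unzipped vcf-file, convert_line would yield an artificial diploid dataset of the kind described above.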

Diploid data in gzipped vcf-files can be converted into binary plink files (RAW and BIM) using the script CONVERT_vcf2ped.sh.

UNCORRECTED GENETIC DISTANCES

The file 'Distances.brown135_auto.thin100.txt' contains the count data necessary to calculate uncorrected genetic distances (allele sharing distances) for all possible pairwise comparisons. Because every sample comparison is present twice within the file (i.e., x vs y, and y vs x), and self-comparisons are included, the total number of lines is 135*135 = 18225. The counts are based on a thinned autosomal dataset containing both monomorphic and polymorphic sites. The distance estimates can be calculated as follows: (n1*0.5+n2he*0.5+n2ho)/(n0+n1+n2he+n2ho). Here, n0, n1, n2he, and n2ho denote the number of sites with AA-AA, AA-AB, AB-AB, and AA-BB patterns respectively, where xx-yy denotes the genotypes of diploid individuals x and y.
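As a worked illustration of the distance formula, the following Python sketch computes the allele sharing distance from the four counts. The function name and the example counts are hypothetical and do not come from the dataset; the column layout of the distance file is not assumed here, so the counts are passed in directly.

```python
def allele_sharing_distance(n0: int, n1: int, n2he: int, n2ho: int) -> float:
    """Uncorrected genetic distance between two diploid individuals.

    n0   : sites with pattern AA-AA (distance contribution 0)
    n1   : sites with pattern AA-AB (distance contribution 0.5)
    n2he : sites with pattern AB-AB (distance contribution 0.5)
    n2ho : sites with pattern AA-BB (distance contribution 1)
    """
    total = n0 + n1 + n2he + n2ho
    if total == 0:
        raise ValueError("no sites available for this comparison")
    return (n1 * 0.5 + n2he * 0.5 + n2ho) / total


# Hypothetical counts, for illustration only.
distance = allele_sharing_distance(n0=900, n1=60, n2he=20, n2ho=20)
print(distance)
```

With these made-up counts the numerator is 30 + 10 + 20 = 60 and the denominator is 1000 sites.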

SLIDING-WINDOW HETEROZYGOSITY

The file 'mywindowhe.20000.Brown135He.txt' contains sliding window counts of homozygous and heterozygous sites, generated with 'Darwindow'. For comparison, genome-wide counts were also generated with 'bcftools stats -s -', and this output is stored in the file 'bcftools.stats.he.txt'. The file 'myvcfsamples.txt' lists the order of the samples within the vcf-file. For more information about Darwindow, see: https://github.com/mennodejong1986/Darwindow
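One common way to turn such counts into a heterozygosity estimate is to divide the number of heterozygous sites by the total number of called sites in the window. The Python sketch below shows this under stated assumptions: the function name is hypothetical, the homozygous count is assumed to include all called homozygous sites, and the actual column layout of the Darwindow output is not assumed. The exact definition used for the paper is given by Darwindow and 'PLOTCOMMANDS.txt'.

```python
from typing import Optional


def window_heterozygosity(n_hom: int, n_het: int) -> Optional[float]:
    """Proportion of heterozygous sites among all called sites in a window.

    Returns None for windows without any called sites, so that empty
    windows are not mistaken for windows with zero heterozygosity.
    """
    called = n_hom + n_het
    if called == 0:
        return None
    return n_het / called


# Hypothetical window with 19980 homozygous and 20 heterozygous sites.
he = window_heterozygosity(n_hom=19980, n_het=20)
print(he)
```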

MSC-BASED ANALYSES

The file 'haploblocktrees.3075loci.newick.txt' contains the phylogenies, one per line, for 3075 haploblocks, generated with the pipeline 'PopMSC'. The file 'haploblocktrees.3075loci.ASTRALsupertree.newick.txt' contains the supertree inferred from the 3075 gene trees with ASTRAL, using the commands:

java -jar astral.5.7.8.jar -i haploblocktrees.3075loci.newick.txt -o supertree.newick.txt 

java -jar astral.5.7.8.jar -i haploblocktrees.3075loci.newick.txt -q supertree.newick.txt -o haploblocktrees.3075loci.ASTRALsupertree.newick.txt

Unzipping the file 'Twisst.allscaffolds.quartetscores.zip' will create 36 files called 'twisst.HiC_scaffold_[1-36].quartetscores.txt', containing for each of the 36 chromosomes quartet topology counts aggregated per population, generated with the software 'Twisst'. The neighbour-joining tree 'Poptree.neiD.newick.txt' was generated from the autosomal dataset and is needed for subsequent analyses of the PopMSC pipeline. For more information, see the file 'PLOTCOMMANDS.txt', but also: https://github.com/mennodejong1986/PopMSC

Y-CHROMOSOME PHYLOGENIES

The files 'IQtree_ychrom.newick.txt' and 'RAxML_ychrom.newick.txt' contain phylogenies for Y-chromosomal data, generated with the commands:

raxmlHPC -f a -m GTRGAMMA -p 12345 -o Americanblack1 -x 12345 -# 100 -s Brown135_ychrom.mysnps.multiallelic.min4.phy -n bootstrap100

iqtree2 -s Brown135_ychrom.mysnps.multiallelic.min4.phy -m MFP -B 1000 --bnni --alrt 1000 -o Americanblack1 -T AUTO

The input phylip file, 'Brown135_ychrom.mysnps.multiallelic.min4.phy', was created from a vcf containing all variable sites (both biallelic and multiallelic), using scripts available from GitHub (edgardomortiz/vcf2phylip).

TREEMIX

The Treemix output files were generated using the commands:

gzip Treemixinput.snpsfilter.txt

./treemix-1.13/bin/treemix -i Treemixinput.snpsfilter.txt.gz -m 1 -root Black -o Treemixout.4

The input file 'Treemixinput.snpsfilter.txt' was generated from autosomal data using the SambaR command:

exportsambarfiles(do_pednumber=TRUE,do_pedletter=FALSE,do_treemix=TRUE,do_immanc=FALSE,per_pop=FALSE)

MICROSATELLITE DATA

The file 'Brown135_3000microsat_calls.stru' contains genotype information for approximately 3000 microsatellites, inferred based on genotype counts in sequencing data ('Brown135_3000microsat_scores.txt'). For more information, see: https://github.com/mennodejong1986/MicrosatelliteGenotypingFromBam

Usage notes

To reproduce the figures presented within the paper, download all files from this Dryad repository, and run the R-commands listed in the file 'PLOTCOMMANDS.txt'.  

Funding

Leibniz-Gemeinschaft

Estonian Ministry of Education and Research, Award: IUT20-32

Estonian Ministry of Education and Research, Award: PRG1209

Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz (LOEWE)