Dryad

RecView: An interactive R application for locating recombination positions using pedigree data

Cite this dataset

Zhang, Hongkai; Hansson, Bengt (2023). RecView: An interactive R application for locating recombination positions using pedigree data [Dataset]. Dryad. https://doi.org/10.5061/dryad.2fqz612w5

Abstract

We present RecView, an interactive R application and its homonymous R package, to facilitate locating recombination positions along chromosomes or scaffolds using whole-genome genotype data from a three-generation pedigree. We demonstrate the applicability of RecView using genotype data from two offspring, as well as their parents and grandparents, of the great reed warbler (Acrocephalus arundinaceus).
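To make the idea of "locating recombination positions using pedigree data" concrete, here is a minimal sketch of one generic approach. It is not RecView's documented algorithm. The approach works as follows. At SNPs where the father is heterozygous, his two grandparents are homozygous for opposite alleles, and the mother is homozygous, we can tell which grandparent transmitted the offspring's paternal allele. A switch in grandparent of origin between consecutive informative SNPs brackets a crossover. The function names, the "GF"/"GM" labels, and the restriction to paternal transmission are illustrative assumptions. The maternal side is symmetric.

```python
from typing import Iterable, List, Optional, Tuple


def dosage(gt: str) -> Optional[int]:
    """Convert an unphased genotype such as '0/1' to an alt-allele count (0, 1, 2)."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return None  # missing ('./.') or unexpected genotype
    return int(alleles[0]) + int(alleles[1])


def paternal_origin(gf: str, gm: str, father: str, mother: str, child: str) -> Optional[str]:
    """Return 'GF' or 'GM' for the grandparent that transmitted the child's
    paternal allele, or None if this SNP is uninformative (hypothetical helper)."""
    d_gf, d_gm, d_f, d_m, d_c = map(dosage, (gf, gm, father, mother, child))
    if None in (d_gf, d_gm, d_f, d_m, d_c):
        return None
    # Father heterozygous and paternal grandparents homozygous for opposite alleles.
    if d_f != 1 or {d_gf, d_gm} != {0, 2}:
        return None
    # Mother homozygous, so the child's maternal allele is known.
    if d_m not in (0, 2):
        return None
    paternal_allele = d_c - d_m // 2
    if paternal_allele not in (0, 1):
        return None  # Mendelian inconsistency, e.g. a genotyping error
    return "GF" if paternal_allele == d_gf // 2 else "GM"


def switch_intervals(rows: Iterable[Tuple[int, str, str, str, str, str]]) -> List[Tuple[int, int]]:
    """rows: (pos, grandfather, grandmother, father, mother, child), sorted by pos.
    Returns (last informative pos before a switch, first informative pos after it)."""
    intervals = []
    prev = None  # (pos, origin) of the last informative SNP
    for pos, *gts in rows:
        origin = paternal_origin(*gts)
        if origin is None:
            continue
        if prev is not None and origin != prev[1]:
            intervals.append((prev[0], pos))
        prev = (pos, origin)
    return intervals
```

With rows at positions 100 to 300 inherited from the paternal grandfather and rows at 400 onward from the paternal grandmother, `switch_intervals` returns `[(300, 400)]`. A real tool would also need to smooth isolated switches caused by genotyping errors, which this sketch does not attempt.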

README: Description of the files

The dataset is from a three-generation pedigree of the great reed warbler (Acrocephalus arundinaceus), including 4 grandparents (Aarun_H7-38, Aarun_H0-81, Aarun_V9-73, Aarun_H7-41), 2 parents (Aarun_H3-00, Aarun_H5-17) and 2 offspring (ID-256 and ID-258).

1 VCF files

The VCF files contain bi-allelic SNPs on chromosomes 1 and 21 of the great reed warbler.

1.1 VCF files list

  • full_dataset.vcf.gz: bi-allelic SNPs on chromosomes 1 and 21, referred to as the full dataset.
  • 10percent_downsampled_dataset.vcf.gz: the full dataset downsampled to 10% of its bi-allelic SNPs on chromosomes 1 and 21.
  • 1percent_downsampled_dataset.vcf.gz: the full dataset downsampled to 1% of its bi-allelic SNPs on chromosomes 1 and 21.

1.2 Variable explanation

  • CHROM: The column contains the scaffold labels in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
  • POS: The column contains the positions of the bi-allelic SNPs on the scaffolds.
  • Aarun_H7-38: Aarun_H7-38 is the label for the paternal grandfather in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.
  • Aarun_H0-81: Aarun_H0-81 is the label for the paternal grandmother in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.
  • Aarun_V9-73: Aarun_V9-73 is the label for the maternal grandfather in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.
  • Aarun_H7-41: Aarun_H7-41 is the label for the maternal grandmother in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.
  • Aarun_H3-00: Aarun_H3-00 is the label for the father in the F1 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.
  • Aarun_H5-17: Aarun_H5-17 is the label for the mother in the F1 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.
  • ID-256: ID-256 is the label for one offspring in the F2 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.
  • ID-258: ID-258 is the label for the other offspring in the F2 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the VCF header lines for details) for this individual.

2 Genotype files

The genotype files contain unphased genotypes extracted from the VCF files using the vcftools option --extract-FORMAT-info GT.
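For readers without vcftools, the extraction step can be approximated in Python. This is a minimal sketch and not the vcftools implementation. It assumes a standard tab-separated VCF in which column 9 is FORMAT and the sample columns follow it. It yields a header row (CHROM, POS, sample labels) followed by one row of GT values per SNP. The helper names `open_vcf` and `extract_gt` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, TextIO


def open_vcf(path: str) -> TextIO:
    """Open a gzip/bgzip-compressed VCF (e.g. full_dataset.vcf.gz) as text."""
    return gzip.open(path, "rt")


def extract_gt(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield ['CHROM', 'POS', sample...] first, then [chrom, pos, GT...] per SNP."""
    for line in lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            yield ["CHROM", "POS"] + fields[9:]  # sample labels start at column 10
            continue
        gt_idx = fields[8].split(":").index("GT")  # position of GT within FORMAT
        gts = []
        for sample in fields[9:]:
            values = sample.split(":")
            gts.append(values[gt_idx] if gt_idx < len(values) else ".")
        yield [fields[0], fields[1]] + gts
```

For example, `with open_vcf("full_dataset.vcf.gz") as fh:` followed by `for row in extract_gt(fh): print("\t".join(row))` writes a table laid out like the genotype files described below. Exact agreement with the vcftools output (for instance, the handling of missing data) is not guaranteed.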

2.1 Genotype files list

  • full_dataset.GT: genotypes extracted from full_dataset.vcf.gz.
  • 10percent_downsampled_dataset.GT: genotypes extracted from 10percent_downsampled_dataset.vcf.gz.
  • 1percent_downsampled_dataset.GT: genotypes extracted from 1percent_downsampled_dataset.vcf.gz.

2.2 Variable explanation

  • CHROM: The column contains the scaffold labels in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
  • POS: The column contains the positions of the bi-allelic SNPs on the scaffolds.
  • Aarun_H7-38: Aarun_H7-38 is the label for the paternal grandfather in the F0 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
  • Aarun_H0-81: Aarun_H0-81 is the label for the paternal grandmother in the F0 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
  • Aarun_V9-73: Aarun_V9-73 is the label for the maternal grandfather in the F0 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
  • Aarun_H7-41: Aarun_H7-41 is the label for the maternal grandmother in the F0 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
  • Aarun_H3-00: Aarun_H3-00 is the label for the father in the F1 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
  • Aarun_H5-17: Aarun_H5-17 is the label for the mother in the F1 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
  • ID-256: ID-256 is the label for one offspring in the F2 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
  • ID-258: ID-258 is the label for the other offspring in the F2 generation of the three-generation pedigree. The column contains this individual's genotypes at the bi-allelic SNPs.
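The column descriptions above map each sample label to a pedigree role. The sketch below shows one way to load a genotype file with those roles attached. It assumes the file is tab-separated with a header row naming CHROM, POS, and the eight sample labels listed above. It also assumes that missing calls appear as ".", "./." or ".|.". Rows with a missing genotype in any pedigree member are dropped. The function name `load_gt` and the role strings are illustrative choices, not part of the dataset.

```python
import csv
from typing import Any, Dict, Iterable, List

# Sample labels and pedigree roles, as listed in this README.
ROLES = {
    "Aarun_H7-38": "paternal_grandfather",
    "Aarun_H0-81": "paternal_grandmother",
    "Aarun_V9-73": "maternal_grandfather",
    "Aarun_H7-41": "maternal_grandmother",
    "Aarun_H3-00": "father",
    "Aarun_H5-17": "mother",
    "ID-256": "offspring_256",
    "ID-258": "offspring_258",
}

MISSING = {".", "./.", ".|."}  # assumed encodings of a missing genotype


def load_gt(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Read a tab-separated genotype table and return complete rows keyed by role."""
    records = []
    for row in csv.DictReader(lines, delimiter="\t"):
        genotypes = {role: row[label] for label, role in ROLES.items()}
        if any(gt in MISSING for gt in genotypes.values()):
            continue  # skip SNPs with a missing genotype in any pedigree member
        record: Dict[str, Any] = {"CHROM": row["CHROM"], "POS": int(row["POS"])}
        record.update(genotypes)
        records.append(record)
    return records
```

Usage: `with open("full_dataset.GT", newline="") as fh: records = load_gt(fh)`. Keying by role rather than by sample label keeps downstream code independent of the specific individuals.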

3 Scaffold file

The scaffold file contains the order and orientation of the scaffolds on chromosomes 1 and 21.

3.1 Scaffold file list

  • scaffold_file.csv: the order and orientation of the scaffolds on chromosomes 1 and 21, extracted from the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).

3.2 Variable explanation

  • scaffold: The column contains the scaffold labels in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
  • size: The column contains the size of each scaffold in base pairs.
  • CHR: The column contains the chromosomal assignment of each scaffold in the great reed warbler.
  • order: The column contains the order of the scaffolds on each chromosome.
  • orientaion: The column contains the orientation of the scaffolds on each chromosome.
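The order, size and orientation columns are what allow scaffold positions (CHROM/POS in the VCF and genotype files) to be placed along a chromosome. The sketch below shows one way to do this. It makes three assumptions: orientation is encoded as "+" (forward) or "-" (reverse), scaffolds are concatenated in `order` with no gap between them, and positions are 1-based. Check these against the actual values in scaffold_file.csv before relying on the result. The `Scaffold` type and both function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Scaffold(NamedTuple):
    name: str         # 'scaffold' column
    size: int         # 'size' column, in bp
    chrom: str        # 'CHR' column
    order: int        # 'order' column
    orientation: str  # orientation column; "+" or "-" is an assumption


Layout = Dict[str, Tuple[str, int, int, str]]


def build_layout(scaffolds: List[Scaffold]) -> Layout:
    """Map each scaffold to (chrom, offset, size, orientation), placing the
    scaffolds of each chromosome end to end in 'order' without gaps."""
    by_chrom: Dict[str, List[Scaffold]] = {}
    for sc in scaffolds:
        by_chrom.setdefault(sc.chrom, []).append(sc)
    layout: Layout = {}
    for chrom, members in by_chrom.items():
        offset = 0
        for sc in sorted(members, key=lambda m: m.order):
            layout[sc.name] = (chrom, offset, sc.size, sc.orientation)
            offset += sc.size
    return layout


def to_chrom_pos(layout: Layout, scaffold: str, pos: int) -> Tuple[str, int]:
    """Convert a 1-based scaffold position to a 1-based chromosome position."""
    chrom, offset, size, orientation = layout[scaffold]
    if orientation == "-":
        pos = size - pos + 1  # reversed scaffold: count from its other end
    return chrom, offset + pos
```

For a 100 bp forward scaffold followed by a 50 bp reversed scaffold, position 1 of the second scaffold maps to chromosome position 150 and position 50 maps to 101. Real assemblies often insert gaps between scaffolds, which this sketch ignores.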

Methods

We randomly selected a three-generation pedigree, including 4 grandparents (Aarun_H7-38, Aarun_H0-81, Aarun_V9-73, Aarun_H7-41), 2 parents (Aarun_H3-00, Aarun_H5-17) and 2 offspring (ID-256 and ID-258), from our long-term study population of great reed warblers at Lake Kvismaren, southern Central Sweden (59°10ʹ N, 15°24ʹ E). The birds were whole-genome sequenced using Illumina technology.

The sequence reads were trimmed and mapped to the great reed warbler genome assembly, and read duplicates were removed. A VCF file of called variants was then produced, and the genotypes at bi-allelic SNPs on chromosomes 1 and 21 were extracted. In addition to the full dataset, we downsampled the number of SNPs to 10% and 1% of the original number (referred to as the “10% downsampled dataset” and the “1% downsampled dataset”).
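The Methods do not state how the downsampling was performed. The sketch below illustrates one plausible approach: simple random sampling without replacement, with a fixed seed for reproducibility. It keeps the selected SNPs in their original genomic order. The function name and the seed are illustrative assumptions, and the sketch is not the procedure used to produce the 10% and 1% datasets.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(snps: Sequence[T], fraction: float, seed: int = 1) -> List[T]:
    """Keep round(fraction * len(snps)) randomly chosen SNPs, in original order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = round(fraction * len(snps))
    rng = random.Random(seed)  # seeded so the same subset is drawn on every run
    keep = sorted(rng.sample(range(len(snps)), k))  # indices, without replacement
    return [snps[i] for i in keep]
```

Applied to a list of SNP records, `downsample(records, 0.10)` and `downsample(records, 0.01)` return subsets of the sizes described above. Sorting the sampled indices keeps positions increasing along each scaffold, which matters for any method that scans SNPs in order.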

Funding

Swedish Research Council