RecView: An interactive R application for locating recombination positions using pedigree data
Data files
Nov 29, 2023 version files (106.83 MB total)
- 10percent_downsampled_dataset.GT (2.96 MB)
- 10percent_downsampled_dataset.vcf.gz (7.11 MB)
- 1percent_downsampled_dataset.GT (296.16 KB)
- 1percent_downsampled_dataset.vcf.gz (740.48 KB)
- full_dataset.GT (29.61 MB)
- full_dataset.vcf.gz (66.11 MB)
- README.md (5.89 KB)
- scaffold_file.csv (229 B)
Abstract
We present RecView, an interactive R application and its homonymous R package, to facilitate locating recombination positions along chromosomes or scaffolds using whole-genome genotype data of a three-generation pedigree. We demonstrate applicability of RecView using the genotype data from two offspring, as well as their grandparents and parents, of the great reed warbler (Acrocephalus arundinaceus).
The dataset comes from a three-generation pedigree of the great reed warbler (Acrocephalus arundinaceus) comprising 4 grandparents (Aarun_H7-38, Aarun_H0-81, Aarun_V9-73, Aarun_H7-41), 2 parents (Aarun_H3-00, Aarun_H5-17) and 2 offspring (ID-256 and ID-258).
1 VCF files
The VCF files contain bi-allelic SNPs on chromosomes 1 and 21 of the great reed warbler.
1.1 VCF files list
- full_dataset.vcf.gz: bi-allelic SNPs on chromosomes 1 and 21, referred to as the full dataset.
- 10percent_downsampled_dataset.vcf.gz: the bi-allelic SNPs on chromosomes 1 and 21 downsampled to 10% of their number in the full dataset.
- 1percent_downsampled_dataset.vcf.gz: the bi-allelic SNPs on chromosomes 1 and 21 downsampled to 1% of their number in the full dataset.
1.2 Variable explanation
- CHROM: The column contains the labels of scaffolds in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
- POS: The column contains the positions on the scaffolds for the bi-allelic SNPs.
- Aarun_H7-38: Aarun_H7-38 is the label for the paternal grandfather in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
- Aarun_H0-81: Aarun_H0-81 is the label for the paternal grandmother in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
- Aarun_V9-73: Aarun_V9-73 is the label for the maternal grandfather in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
- Aarun_H7-41: Aarun_H7-41 is the label for the maternal grandmother in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
- Aarun_H3-00: Aarun_H3-00 is the label for the father in the F1 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
- Aarun_H5-17: Aarun_H5-17 is the label for the mother in the F1 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
- ID-256: ID-256 is the label for one of the two offspring in the F2 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
- ID-258: ID-258 is the label for the other offspring in the F2 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file (see the headers of the VCF files for details) for this individual.
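As an illustration of how a per-individual column relates to the FORMAT column, the following minimal Python sketch pairs the colon-separated FORMAT keys of a VCF record with the values in one individual's column. The function name and the example record are hypothetical and are not taken from the dataset; the FORMAT keys actually present are defined in the VCF headers.

```python
def parse_sample_field(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys (e.g. 'GT:AD:DP') with one individual's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # The VCF format allows trailing fields to be dropped,
    # so pad with the missing-value marker ".".
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


# Hypothetical FORMAT string and sample entry, for illustration only.
example = parse_sample_field("GT:AD:DP", "0/1:7,5:12")
print(example["GT"])
```

The genotype (GT) entry returned here corresponds to the value stored for the same individual in the genotype files described in the next section.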
2 Genotype files
The genotype files contain unphased genotypes extracted from the VCF files with the vcftools option --extract-FORMAT-info GT.
2.1 Genotype files list
- full_dataset.GT: genotypes extracted from full_dataset.vcf.gz.
- 10percent_downsampled_dataset.GT: genotypes extracted from 10percent_downsampled_dataset.vcf.gz.
- 1percent_downsampled_dataset.GT: genotypes extracted from 1percent_downsampled_dataset.vcf.gz.
2.2 Variable explanation
- CHROM: The column contains the labels of scaffolds in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
- POS: The column contains the positions on the scaffolds for the bi-allelic SNPs.
- Aarun_H7-38: Aarun_H7-38 is the label for the paternal grandfather in the F0 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
- Aarun_H0-81: Aarun_H0-81 is the label for the paternal grandmother in the F0 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
- Aarun_V9-73: Aarun_V9-73 is the label for the maternal grandfather in the F0 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
- Aarun_H7-41: Aarun_H7-41 is the label for the maternal grandmother in the F0 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
- Aarun_H3-00: Aarun_H3-00 is the label for the father in the F1 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
- Aarun_H5-17: Aarun_H5-17 is the label for the mother in the F1 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
- ID-256: ID-256 is the label for one of the two offspring in the F2 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
- ID-258: ID-258 is the label for the other offspring in the F2 generation of the three-generation pedigree. The column contains genotypes at bi-allelic SNPs for this individual.
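The column layout above can be read with a few lines of Python. This is a minimal sketch that assumes the genotype files are tab-delimited with a header row (CHROM, POS, then one column per individual), which is the layout vcftools normally writes for --extract-FORMAT-info, and that genotypes are coded as 0/0, 0/1, 1/1 or ./. for missing calls. The function names and the two-row excerpt are hypothetical.

```python
import csv
import io


def gt_to_alt_count(gt: str):
    """Convert an unphased genotype such as '0/1' to its alternate-allele count.

    Returns None for missing calls ('./.' or '.').
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def read_gt_table(handle):
    """Yield (chrom, pos, {individual: alt_count}) from a tab-delimited GT table."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        chrom = row.pop("CHROM")
        pos = int(row.pop("POS"))
        yield chrom, pos, {ind: gt_to_alt_count(gt) for ind, gt in row.items()}


# Hypothetical excerpt with two individuals; not real data from the files.
demo = io.StringIO(
    "CHROM\tPOS\tAarun_H3-00\tID-256\n"
    "scaffold_x\t101\t0/1\t1/1\n"
    "scaffold_x\t250\t./.\t0/0\n"
)
rows = list(read_gt_table(demo))
print(rows[0])
```

To read one of the real files, an open file handle (for example `open("1percent_downsampled_dataset.GT", newline="")`) can be passed in place of the in-memory excerpt.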
3 Scaffold file
The scaffold file contains the order and orientation of the scaffolds on chromosomes 1 and 21.
3.1 Scaffold file list
- scaffold_file.csv: the order and orientation of the scaffolds on chromosomes 1 and 21, extracted from the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
3.2 Variable explanation
- scaffold: The column contains the labels of scaffolds in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
- size: The column contains the size in base pairs for the scaffolds.
- CHR: The column contains the chromosomal assignment for the scaffolds in the great reed warbler.
- order: The column contains the order of the scaffolds on each chromosome.
- orientaion: The column contains the orientation of the scaffolds on each chromosome.
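To show how order and orientation can be combined with scaffold size, the sketch below maps a position on a scaffold to a coordinate along the chromosome by concatenating scaffolds in their listed order and flipping positions on reversed scaffolds. It is an illustration only, not RecView's implementation. It assumes 1-based positions and that orientation is coded as "+" (forward) or "-" (reversed); the coding used in scaffold_file.csv should be checked before reuse. The class, function and scaffold names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    size: int          # length in base pairs
    order: int         # rank of the scaffold along the chromosome
    orientation: str   # assumed coding: "+" forward, "-" reversed


def chromosome_position(scaffolds, name, pos):
    """Map a 1-based scaffold position to a 1-based chromosome coordinate.

    `scaffolds` must contain the scaffolds of a single chromosome.
    """
    offset = 0
    for sc in sorted(scaffolds, key=lambda s: s.order):
        if sc.name == name:
            if not 1 <= pos <= sc.size:
                raise ValueError("position outside scaffold")
            # On a reversed scaffold the last base comes first.
            local = pos if sc.orientation == "+" else sc.size - pos + 1
            return offset + local
        offset += sc.size
    raise KeyError(name)


# Hypothetical scaffolds of one chromosome, for illustration only.
demo = [Scaffold("sc_b", 500, 2, "-"), Scaffold("sc_a", 1000, 1, "+")]
print(chromosome_position(demo, "sc_b", 1))
```

In this example the first base of the reversed scaffold sc_b lands at the far end of the concatenated chromosome, after the 1000 bp of sc_a.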
We randomly selected a three-generation pedigree, including 4 grandparents (Aarun_H7-38, Aarun_H0-81, Aarun_V9-73, Aarun_H7-41), 2 parents (Aarun_H3-00, Aarun_H5-17) and 2 offspring (ID-256 and ID-258), from our long-term study population of great reed warblers at Lake Kvismaren, southern Central Sweden (59°10ʹ N, 15°24ʹ E). The birds were whole-genome sequenced with Illumina.
The sequence reads were trimmed and mapped to the great reed warbler genome assembly, and read duplicates were removed. Then, a VCF file of called variants was produced, and the genotypes at bi-allelic SNPs on chromosomes 1 and 21 were extracted. In addition to the full dataset, we downsampled the number of SNPs to 10% and 1% of the original number (referred to as the “10% downsampled dataset” and “1% downsampled dataset”).
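The downsampling step can be sketched as follows. The description above does not state how the SNPs were selected, so this example assumes uniform random sampling without replacement and uses a fixed seed for reproducibility; the function name and the SNP list are hypothetical.

```python
import random


def downsample(records, fraction, seed=0):
    """Keep a random `fraction` of records, preserving their original order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = round(len(records) * fraction)
    rng = random.Random(seed)
    # Sample indices rather than records so the genomic order is retained.
    kept = sorted(rng.sample(range(len(records)), n_keep))
    return [records[i] for i in kept]


# Hypothetical list of 1000 SNP positions on one scaffold.
snps = [("scaffold_x", pos) for pos in range(1, 1001)]
ten_percent = downsample(snps, 0.10)
one_percent = downsample(snps, 0.01)
print(len(ten_percent), len(one_percent))
```

Sorting the sampled indices keeps the retained SNPs in their original genomic order, which matters for any analysis that walks along a chromosome.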