RecView: An interactive R application for locating recombination positions using pedigree data
Data files
Nov 29, 2023 version files 106.83 MB
- 10percent_downsampled_dataset.GT (2.96 MB)
- 10percent_downsampled_dataset.vcf.gz (7.11 MB)
- 1percent_downsampled_dataset.GT (296.16 KB)
- 1percent_downsampled_dataset.vcf.gz (740.48 KB)
- full_dataset.GT (29.61 MB)
- full_dataset.vcf.gz (66.11 MB)
- README.md (5.89 KB)
- scaffold_file.csv (229 B)
Abstract
We present RecView, an interactive R application and its homonymous R package, to facilitate locating recombination positions along chromosomes or scaffolds using whole-genome genotype data of a three-generation pedigree. We demonstrate applicability of RecView using the genotype data from two offspring, as well as their grandparents and parents, of the great reed warbler (Acrocephalus arundinaceus).
README: Description of the files
The dataset comes from a three-generation pedigree of the great reed warbler (Acrocephalus arundinaceus) comprising 4 grandparents (Aarun_H7-38, Aarun_H0-81, Aarun_V9-73, Aarun_H7-41), 2 parents (Aarun_H3-00, Aarun_H5-17) and 2 offspring (ID-256 and ID-258).
1 VCF files
The VCF files contain bi-allelic SNPs on chromosomes 1 and 21 of the great reed warbler.
1.1 VCF files list
- full_dataset.vcf.gz: bi-allelic SNPs on chromosomes 1 and 21, referred to as the full dataset.
- 10percent_downsampled_dataset.vcf.gz: the bi-allelic SNPs on chromosomes 1 and 21 from the full dataset, downsampled to 10% of their total number.
- 1percent_downsampled_dataset.vcf.gz: the bi-allelic SNPs on chromosomes 1 and 21 from the full dataset, downsampled to 1% of their total number.
1.2 Variable explanation
- CHROM: The column contains the labels of scaffolds in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
- POS: The column contains the positions on the scaffolds for the bi-allelic SNPs.
- Aarun_H7-38: The label for the paternal grandfather, in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
- Aarun_H0-81: The label for the paternal grandmother, in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
- Aarun_V9-73: The label for the maternal grandfather, in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
- Aarun_H7-41: The label for the maternal grandmother, in the F0 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
- Aarun_H3-00: The label for the father, in the F1 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
- Aarun_H5-17: The label for the mother, in the F1 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
- ID-256: The label for one of the two offspring, in the F2 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
- ID-258: The label for the other offspring, in the F2 generation of the three-generation pedigree. The column contains the FORMAT information of the VCF file for this individual (see the header of the VCF files for details).
2 Genotype files
The genotype files contain unphased genotypes extracted from the VCF files with the vcftools option --extract-FORMAT-info GT.
2.1 Genotype files list
- full_dataset.GT: genotypes extracted from full_dataset.vcf.gz.
- 10percent_downsampled_dataset.GT: genotypes extracted from 10percent_downsampled_dataset.vcf.gz.
- 1percent_downsampled_dataset.GT: genotypes extracted from 1percent_downsampled_dataset.vcf.gz.
2.2 Variable explanation
- CHROM: The column contains the labels of scaffolds in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
- POS: The column contains the positions on the scaffolds for the bi-allelic SNPs.
- Aarun_H7-38: The label for the paternal grandfather, in the F0 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
- Aarun_H0-81: The label for the paternal grandmother, in the F0 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
- Aarun_V9-73: The label for the maternal grandfather, in the F0 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
- Aarun_H7-41: The label for the maternal grandmother, in the F0 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
- Aarun_H3-00: The label for the father, in the F1 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
- Aarun_H5-17: The label for the mother, in the F1 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
- ID-256: The label for one of the two offspring, in the F2 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
- ID-258: The label for the other offspring, in the F2 generation of the three-generation pedigree. The column contains the genotypes at bi-allelic SNPs for this individual.
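As a quick sanity check on a genotype file, the column layout described above can be read and summarised per individual. The following is a minimal Python sketch, not part of RecView. It assumes the file is tab-separated (as vcftools output normally is) and that unphased genotypes are written as strings such as 0/0, 0/1, 1/1 or ./. for missing calls. The scaffold label and genotype values in the example are invented for illustration, and only three of the eight individual columns are shown.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in the layout described above: CHROM and POS first,
# then one genotype column per individual. Values are invented.
EXAMPLE_GT = (
    "CHROM\tPOS\tAarun_H7-38\tAarun_H0-81\tID-256\n"
    "scaffold_A\t101\t0/0\t0/1\t0/1\n"
    "scaffold_A\t250\t0/1\t1/1\t./.\n"
    "scaffold_A\t399\t0/0\t0/1\t0/0\n"
)


def genotype_counts(handle):
    """Count how often each genotype string occurs for each individual."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Every column other than CHROM and POS is treated as an individual.
    samples = [name for name in reader.fieldnames if name not in ("CHROM", "POS")]
    counts = {name: Counter() for name in samples}
    for row in reader:
        for name in samples:
            counts[name][row[name]] += 1
    return counts


counts = genotype_counts(io.StringIO(EXAMPLE_GT))
for name, counter in counts.items():
    print(name, dict(counter))
```

To run the same summary on a real file, the io.StringIO object can be replaced with an open file handle, for example open("1percent_downsampled_dataset.GT", newline="").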
3 Scaffold file
The scaffold file contains the order and orientation of the scaffolds on chromosomes 1 and 21.
3.1 Scaffold file list
- scaffold_file.csv: the order and orientation of the scaffolds on chromosomes 1 and 21, extracted from the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
3.2 Variable explanation
- scaffold: The column contains the labels of scaffolds in the great reed warbler reference genome acrAru1 (BioProject ID PRJNA765537).
- size: The column contains the size in base pairs for the scaffolds.
- CHR: The column contains the chromosomal assignment for the scaffolds in the great reed warbler.
- order: The column contains the order of the scaffolds on each chromosome.
- orientaion: The column contains the orientation of the scaffolds on each chromosome.
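To illustrate how the size, order and orientation columns can be combined, the sketch below converts a position on a scaffold into a position along its chromosome. It is an illustration under stated assumptions, not RecView's own procedure: orientation is assumed to be coded as "+" (forward) or "-" (reversed), positions are assumed to be 1-based, and the scaffold names and sizes are invented.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    size: int          # scaffold length in base pairs (size column)
    chrom: str         # chromosomal assignment (CHR column)
    order: int         # rank of the scaffold along its chromosome
    orientation: str   # assumed coding: "+" forward, "-" reversed


def chromosome_position(scaffolds, name, pos):
    """Map a 1-based position on a scaffold to a 1-based chromosome position."""
    target = next(s for s in scaffolds if s.name == name)
    # Total length of all scaffolds placed before the target on the same chromosome.
    offset = sum(
        s.size
        for s in scaffolds
        if s.chrom == target.chrom and s.order < target.order
    )
    # A reversed scaffold is read from its far end.
    local = pos if target.orientation == "+" else target.size - pos + 1
    return target.chrom, offset + local


# Invented example: two scaffolds on one chromosome, the second reversed.
example = [
    Scaffold("scaffold_A", 1000, "1", 1, "+"),
    Scaffold("scaffold_B", 500, "1", 2, "-"),
]
print(chromosome_position(example, "scaffold_A", 100))
print(chromosome_position(example, "scaffold_B", 100))
```

Position 100 on the reversed scaffold_B lies 401 bases into the flipped scaffold, so it is placed at 1000 + 401 = 1401 on the chromosome.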
Methods
We randomly selected a three-generation pedigree, including 4 grandparents (Aarun_H7-38, Aarun_H0-81, Aarun_V9-73, Aarun_H7-41), 2 parents (Aarun_H3-00, Aarun_H5-17) and 2 offspring (ID-256 and ID-258), from our long-term study population of great reed warblers at Lake Kvismaren, southern Central Sweden (59°10ʹ N, 15°24ʹ E). The birds were whole-genome sequenced with Illumina.
The sequence reads were trimmed and mapped to the great reed warbler genome assembly, and read duplicates were removed. A VCF file of called variants was then produced, and the genotypes at bi-allelic SNPs on chromosomes 1 and 21 were extracted. In addition to the full dataset, we downsampled the number of SNPs to 10% and 1% of the original number (referred to as the “10% downsampled dataset” and “1% downsampled dataset”).
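The text above does not say how the SNPs were chosen for downsampling. As one possible approach, the sketch below draws a random subset of SNP records without replacement and keeps them in their original order. The function name, the fixed seed and the invented SNP list are illustrative assumptions, not the procedure used to build the deposited files.

```python
import random


def downsample(records, fraction, seed=0):
    """Keep a random fraction of records, preserving their original order."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    keep = round(len(records) * fraction)     # number of records to retain
    chosen = sorted(rng.sample(range(len(records)), keep))
    return [records[i] for i in chosen]


# Invented example: 1000 SNP positions on a single hypothetical scaffold.
snps = [("scaffold_A", pos) for pos in range(1, 1001)]
ten_percent = downsample(snps, 0.10)
one_percent = downsample(snps, 0.01)
print(len(ten_percent), len(one_percent))
```

Sorting the sampled indices keeps the retained SNPs in positional order, which matters for any downstream step that walks along a scaffold.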