Data from: Ancient genomes reveal an extensive kinship network and endogamy in a Three-Kingdoms period society in Korea
Data files
Mar 04, 2026 version files 77.89 MB
1240K.KIN.bedfile.bed (30.64 MB)
ancIBD_codes_results.7z (6.67 MB)
calculate_PMR.7z (1.58 KB)
eigenstrat_1240K_Imdang_Joyeong.7z (25.23 MB)
hapROH.7z (7.56 KB)
KIN_script.bash (1.49 KB)
PCA.7z (2.19 MB)
ped-sim_inbreeding.7z (1.78 MB)
README.md (5.98 KB)
run_admixtools2_f3_f4_server.7z (204.83 KB)
run_admixtools2_qpAdm_server.7z (1.55 KB)
run_GLIMPSE_code.7z (4.60 KB)
run_ped-sim_compare_ancIBD.7z (11.15 MB)
Abstract
Dataset DOI: 10.5061/dryad.fj6q57489
Description of the data and file structure
This repository contains genotype data, scripts, and output files used for the analyses in "Ancient genomes reveal an extensive kinship network and endogamy in a Three-Kingdoms period society in Korea".
Files and variables
File: calculate_PMR.7z
Description: Code used to calculate the pairwise mismatch rate (PMR) from pseudohaploid data. Used for Supplementary Data S4 and S10.
[run_pmr.sh]: Wrapper for running "Calculate_PMR_matrix.R" in a Unix shell.
[Calculate_PMR_matrix.R]: Code for calculating PMR from EIGENSTRAT-format files.
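For readers unfamiliar with the statistic, the following is a minimal Python sketch of a pairwise mismatch rate calculation. It is not the code in Calculate_PMR_matrix.R. It assumes the standard EIGENSTRAT .geno layout (one line per SNP, one character per individual, with 9 marking a missing genotype), and the function name and toy data are hypothetical.

```python
from itertools import combinations

MISSING = "9"  # EIGENSTRAT code for a missing genotype


def pairwise_mismatch_rate(geno_lines, ind_ids):
    """Return {(id_a, id_b): (pmr, n_overlap)} for pseudohaploid genotypes.

    geno_lines: iterable of .geno lines, one character per individual.
    ind_ids:    individual IDs in the same order as the .ind file.
    pmr is None when a pair shares no covered SNPs.
    """
    pairs = list(combinations(range(len(ind_ids)), 2))
    mismatches = dict.fromkeys(pairs, 0)
    overlaps = dict.fromkeys(pairs, 0)

    for line in geno_lines:
        calls = line.strip()
        if not calls:
            continue
        for i, j in pairs:
            a, b = calls[i], calls[j]
            # Only SNPs covered in both individuals contribute to the rate.
            if a == MISSING or b == MISSING:
                continue
            overlaps[(i, j)] += 1
            if a != b:
                mismatches[(i, j)] += 1

    result = {}
    for i, j in pairs:
        n = overlaps[(i, j)]
        pmr = mismatches[(i, j)] / n if n else None
        result[(ind_ids[i], ind_ids[j])] = (pmr, n)
    return result


# Toy example (hypothetical data): three individuals, four SNPs.
toy_geno = ["202", "229", "002", "090"]
toy_pmr = pairwise_mismatch_rate(toy_geno, ["A", "B", "C"])
```

Each value pairs the mismatch rate with the number of overlapping SNPs, which is useful for judging how reliable a given estimate is.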
File: eigenstrat_1240K_Imdang_Joyeong.7z
Description: 1240K genotype data in EIGENSTRAT format for the ancient individuals from the Imdang-Joyeong burial complex newly reported in this study. Consists of a geno file, a snp file, and an ind file.
File: run_admixtools2_f3_f4_server.7z
Description: R scripts for running the f3 and f4 commands in admixtools 2 on an HPC server, together with the f3 and f4 statistics results for Figures S5 and S6 and Supplementary Data S6, S7, and S8.
[f4_compare.R], [outgroupf3.R]: R scripts for calculating f4 and f3 statistics, respectively.
csv files: f3 and f4 values calculated using [f4_compare.R] and [outgroupf3.R]; identical to the information in Supplementary Data S6, S7, and S8.
File: run_admixtools2_qpAdm_server.7z
Description: Scripts for running the qpAdm command in admixtools 2 on an HPC server, used for Figure 5 and Supplementary Data S9.
[commandline_qpadm.sh]: Main script for running the analysis from the command line.
[runqpadm.wrapper.sh]: Helper script for running qpadm_parallel_250331.R.
[qpadm_parallel_250331.R]: R script for running the qpadm function in the admixtools 2 R library.
File: PCA.7z
Description: Eigenvalues, eigenvectors, the populations used in the PC calculation, and the main code for running smartPCA for Figures 5 and S9.
[PCA_240516.eval.txt.gz]: Eigenvalues calculated with smartPCA.
[PCA_240516_X.evec.txt.gz]: Eigenvectors calculated with smartPCA. The number X corresponds to the matching _X.pops file, which specifies the populations used for the PC calculation.
[PCA_code.sh]: Script used to run smartPCA.
[PCA_240516_X.pops]: Modern population sets used for the smartPCA calculation.
File: run_GLIMPSE_code.7z
Description: Code for running GLIMPSE for the ancient DNA imputation required for our IBD analysis.
[commandline_GLIMPSE.sh]: Main code to run the GLIMPSE pipeline.
[wrapper_XXX.sh]: Helper scripts used to prepare and run the GLIMPSE pipeline.
[commandline_1000GP_panel.sh]: Script to prepare the 1000GP_Phase3 data for use as a reference panel for GLIMPSE.
File: run_ped-sim_compare_ancIBD.7z
Description: Code and results for simulating IBD sharing between close kin with ped-sim, for comparison with the ancIBD results. Used for Figure S6.
[ped-sim_commandline.sh]: Script used to run ped-sim in a Unix environment.
[merge_IBD_blocks.R]: R script to merge consecutive IBD1 and IBD2 blocks to match the ancIBD output.
[ancibd_background.seg]: Segment information (output) of ped-sim.
[ancibd_background.def]: Family relationships specified for ped-sim.
[ancibd_background.csv]: The seg file in csv format.
[1240K.240829.snp]: SNP file of 1240K sites used for filtering ped-sim results to match the ancIBD output.
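The merging step described for merge_IBD_blocks.R can be illustrated with a short Python sketch. This is not the R script itself: the tuple layout (pair, chromosome, start, end), the function name, and the max_gap parameter are hypothetical, and the sketch assumes that the IBD1/IBD2 label is ignored so that blocks of either type are joined whenever they touch.

```python
def merge_consecutive_segments(segments, max_gap=0):
    """Merge touching or overlapping segments per pair and chromosome.

    segments: iterable of (pair_id, chrom, start, end) tuples.
    max_gap:  largest distance between two blocks that still counts
              as consecutive (0 means the blocks must touch or overlap).
    """
    merged = []
    # Sorting groups blocks by pair and chromosome, ordered by start position.
    for pair_id, chrom, start, end in sorted(segments):
        if merged:
            last_pair, last_chrom, last_start, last_end = merged[-1]
            same_group = (pair_id, chrom) == (last_pair, last_chrom)
            if same_group and start - last_end <= max_gap:
                # Extend the previous block instead of starting a new one.
                merged[-1] = (last_pair, last_chrom, last_start, max(last_end, end))
                continue
        merged.append((pair_id, chrom, start, end))
    return merged


# Toy example (hypothetical coordinates).
toy_segments = [
    ("A-B", 1, 200, 350),  # touches the block below, so the two are merged
    ("A-B", 1, 100, 200),
    ("A-B", 1, 500, 600),  # separated by a gap, kept as its own block
    ("A-B", 2, 600, 700),  # different chromosome, never merged with chr 1
    ("C-D", 1, 150, 250),  # different pair
]
toy_merged = merge_consecutive_segments(toy_segments)
```

Sorting before the single pass keeps the merge linear after the sort and makes the result independent of the input order.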
File: ancIBD_codes_results.7z
Description: Code and results for ancIBD, a Cytoscape file for network analysis and visualization, and an R script for statistical analysis of the IBD network. Used for Figures 4, S6, S7, and S8.
[ancIBD_commandline.sh]: Code used to run ancIBD in a Unix environment.
[ancIBD_240720_res.tsv, ancIBD_240720_ch_all.tsv]: IBD sharing information inferred by ancIBD.
[Imd_Joy_240914.cys]: Cytoscape file used to analyze and visualize the IBD network.
[adultnode_network_statistics.csv]: CSV file exported from Cytoscape containing network statistics such as degree centrality. 1240Kcount is the number of 1240K SNPs that were genotyped, and all1240K is the total number of 1240K SNPs. All other values are unitless.
[network_analysis.R]: R script used to perform statistical analysis on the network.
File: KIN_script.bash
Description: Code used to run KIN for Figure 3 and Supplementary Data S4 and S10.
File: ped-sim_inbreeding.7z
Description: Code and results for simulating ROH under four inbreeding scenarios using ped-sim. Used for Figure S5.
[.def]: Files specifying the kinship scenarios for simulation with ped-sim.
[.seg]: Segment files of IBD and ROH information simulated with ped-sim.
[run_pedsim_commandline.sh]: Script used to run ped-sim in a Unix environment.
File: hapROH.7z
Description: Code and results for estimating ROH from pseudohaploid data using hapROH. Used for Figures 2 and S4.
[Commandline_Imdang_hapROH_script.bash]: Main script for running hapROH.
[run_hapROH.py, postprocess_hapROH.py]: Helper scripts for running hapROH.
[Imdang_hapROH.combined.tsv]: ROH information of individuals generated by hapROH.
[individual_ROH]: Tables containing the ROH output inferred with hapROH.
File: 1240K.KIN.bedfile.bed
Description: BED file used by KIN_script.bash. The file is tab-delimited with 5 columns: chromosome, 1240K SNP start position (0-based), 1240K SNP end position, reference allele, and alternative allele.
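As an illustration of this 5-column layout, the Python sketch below converts one line of an EIGENSTRAT .snp file into a BED record of this shape. It is not part of the deposited code. It assumes the usual .snp column order (SNP ID, chromosome, genetic position, physical position, reference allele, alternative allele) and the standard BED convention that the start is the 1-based position minus one. The function name and the example SNP are hypothetical.

```python
def snp_line_to_bed(snp_line):
    """Convert one EIGENSTRAT .snp line into a 5-column, tab-delimited BED line."""
    fields = snp_line.split()
    _snp_id, chrom, _genetic_pos, physical_pos, ref, alt = fields[:6]
    end = int(physical_pos)  # 1-based SNP position becomes the BED end
    start = end - 1          # BED start coordinates are 0-based
    return "\t".join([chrom, str(start), str(end), ref, alt])


# Hypothetical SNP at position 1000 on chromosome 1.
toy_bed_line = snp_line_to_bed("snp_example 1 0.001 1000 A G")
```

Under these assumptions, every record spans exactly one base, so the end column minus the start column is always 1.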
Code/software
All shell scripts were written with Bash in mind.
For high-performance computing, we used Slurm 22.05.11 as the scheduler.
All other program versions and software used are described in the Material and Methods section of our manuscript.
Access information
Other publicly accessible locations of the data:
