The original FLARE method provides computationally efficient and highly accurate local ancestry inference in cases where a closely-matched reference panel is available for each ancestry. In this work, we extend FLARE to incorporate a haplotype clustering algorithm that enables accurate local ancestry inference in scenarios where one or more ancestries do not have a closely-matched reference. This method retains the computational efficiency and accuracy of the original FLARE method while greatly extending its applicability. We apply the new method to data from the Mozabite population from the Human Genome Diversity Project. On the autosomes, we find that the Mozabite samples derive 67% of their ancestry from a population related to European and Middle Eastern populations, with the other 33% of their ancestry coming from a population related to West African populations, with an admixture time 48 generations ago. In contrast, on the X chromosome, we find that the individuals have 76% of their ancestry from a population related to European and Middle Eastern populations.

Dataset DOI: 10.5061/dryad.bk3j9kdrk

Description of the data and file structure

This dataset contains data used in the manuscript "Local ancestry inference with poorly-matched reference panels" by SR Browning, SD Temple, and BL Browning (2025/2026). This dataset includes data underlying the figures, phased genotype data, and scripts used to generate the results.

Simulated genotype data were created, local ancestry was called, and MDS plots were made. HGDP Mozabite genotype data were phased, local ancestry was called, and MDS plots were made.

The preprint for the paper is located at: https://www.biorxiv.org/content/10.1101/2025.10.13.681993

Files and variables

File: data_for_fig3a.tsv

Description: Data plotted in Figure 3A of the paper

Variables

POP/ANC: Reference population, or ancestry (for admixed population)
MDS1: First dimension of the multidimensional scaling (MDS)
MDS2: Second dimension of the MDS
MDS3: Third dimension of the MDS
MDS4: Fourth dimension of the MDS

File: data_for_fig3b.tsv

Description: Data plotted in Figure 3B of the paper

Variables

POP/ANC: Reference population, or ancestry (for admixed population)
MDS1: First dimension of the multidimensional scaling (MDS)
MDS2: Second dimension of the MDS
MDS3: Third dimension of the MDS
MDS4: Fourth dimension of the MDS

File: data_for_fig2.tsv

Description: Data plotted in Figure 2 of the paper

Variables

Scenario: Scenarios match those in the figure
Replicate: Three replicates of each scenario
Clust.FLARE: Accuracy for Clustered FLARE
Orig.FLARE: Accuracy for Original FLARE
MOSAIC: Accuracy for MOSAIC
RFMix: Accuracy for RFMix
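
As a minimal sketch of how this table might be summarized, the following averages each method's accuracy over the replicates within each scenario. It assumes the file is tab-delimited with a header row that uses exactly the variable names listed above; the function name and the toy rows (scenario label and accuracy values) are hypothetical placeholders, not values from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Column names as listed under "Variables" (assumed to match the file header).
METHODS = ["Clust.FLARE", "Orig.FLARE", "MOSAIC", "RFMix"]


def mean_accuracy_by_scenario(handle):
    """Return {scenario: {method: mean accuracy over replicates}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    values = defaultdict(lambda: defaultdict(list))
    for row in reader:
        for method in METHODS:
            values[row["Scenario"]][method].append(float(row[method]))
    return {
        scenario: {method: mean(vals) for method, vals in per_method.items()}
        for scenario, per_method in values.items()
    }


# Toy input with made-up numbers, standing in for data_for_fig2.tsv.
toy = io.StringIO(
    "Scenario\tReplicate\tClust.FLARE\tOrig.FLARE\tMOSAIC\tRFMix\n"
    "scenario_1\t1\t0.5\t0.25\t0.5\t0.5\n"
    "scenario_1\t2\t1.0\t0.75\t0.5\t0.5\n"
)
for scenario, per_method in mean_accuracy_by_scenario(toy).items():
    print(scenario, per_method)
```

To run this on the real file, the toy handle could be replaced with `open("data_for_fig2.tsv", newline="")`.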

File: hgdp_wgs.20190516.metadata.txt

Description: Metadata for each individual in the HGDP set. For this study, only the sample and population variables are relevant.

Variables

sample: Sample ID
library: Library
sample_accession: Sample Accession ID
source: Source of sequence data ("sanger", "sgdp", or "meyer2012")
library_type: Type of library ("PCR" or "PCRfree")
population: Population name, e.g. "Mozabite"
latitude: Latitude (degrees)
longitude: Longitude (degrees)
region: Continental region of the population, one of: AFRICA, AMERICA, CENTRAL_SOUTH_ASIA, EAST_ASIA, EUROPE, MIDDLE_EAST, OCEANIA
sex: F (female) or M (male)
coverage: Average sequencing coverage
freemix: Freemix score (estimated level of DNA contamination)
capmq: Capmq score (maximum allowable Mapping Quality)
insert_size_average: Average length of sequenced fragments
array_non_reference_discordance: Non-reference discordance with array data (NA for some individuals)
library_alias_ENA: Library alias in the European Nucleotide Archive
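
Because only the sample and population variables are used in this study, a typical first step is to pull out the sample IDs for one population. The sketch below assumes the metadata file is tab-delimited with a header row containing the variable names listed above; the function name and the sample IDs in the toy input are hypothetical placeholders.

```python
import csv
import io


def samples_in_population(handle, population):
    """Return the sample IDs whose 'population' column equals `population`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["sample"] for row in reader if row["population"] == population]


# Toy input with placeholder sample IDs, standing in for
# hgdp_wgs.20190516.metadata.txt (only two of the columns are shown).
toy = io.StringIO(
    "sample\tpopulation\n"
    "ID_A\tMozabite\n"
    "ID_B\tFrench\n"
    "ID_C\tMozabite\n"
)
mozabite_ids = samples_in_population(toy, "Mozabite")
print(mozabite_ids)
```

A list produced this way could, for example, be written one ID per line and used to subset the phased VCF files to the Mozabite samples.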

File: hgdp.readme

Description: Information about the phased HGDP data

File: hgdp.chrX.phased.vcf.gz

Description: Filtered and phased HGDP genotype data for Chr X

File: hgdp.chr1to22.vcf.gz

Description: Filtered and phased HGDP genotype data for Chr 1-22

File: data_for_fig4a.tsv

Description: Data for Figure 4A of the paper. There is no header. The first column is the population/ancestry, and the final four columns are the four MDS dimensions.

File: data_for_fig4b.tsv

Description: Data for Figure 4B of the paper, in the same format as for Figure 4A

File: data_for_fig4c.tsv

Description: Data for Figure 4C of the paper, in the same format as for Figure 4A
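
Since the Figure 4 files have no header, column names must be supplied when reading them. The sketch below takes the first field as the population/ancestry label and the last four fields as the MDS coordinates, so it does not depend on how many columns (if any) lie in between. It assumes the files are tab-delimited; the function name and the toy labels and coordinates are hypothetical.

```python
import csv
import io


def read_mds_rows(handle):
    """Return a list of (label, (mds1, mds2, mds3, mds4)) from a headerless TSV."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        label = fields[0]                               # population/ancestry
        coords = tuple(float(x) for x in fields[-4:])   # final four columns
        rows.append((label, coords))
    return rows


# Toy input with made-up labels and coordinates, standing in for
# data_for_fig4a.tsv.
toy = io.StringIO(
    "pop_1\t0.5\t-0.25\t0.0\t1.0\n"
    "anc_1\t-1.5\t0.75\t0.25\t-0.5\n"
)
for label, coords in read_mds_rows(toy):
    print(label, coords)
```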

File: sim_sample_map.txt

Description: Sample map for the simulated individuals. The first column gives the individual ID (IDs begin with "tsk"), and the second column gives the corresponding population identifier. See the sim.readme file for information about the populations.
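
A minimal sketch of reading the sample map into a population-to-IDs lookup is given below. It assumes two whitespace-separated columns (ID, then population) and keeps only rows whose ID begins with "tsk", so any other line is ignored. The function name and the toy IDs and population identifiers are hypothetical; the actual population identifiers are described in sim.readme.

```python
import io
from collections import defaultdict


def read_sample_map(handle):
    """Return {population identifier: [individual IDs]} from a two-column map."""
    pop_to_ids = defaultdict(list)
    for line in handle:
        parts = line.split()
        # Keep only well-formed rows whose ID starts with "tsk".
        if len(parts) < 2 or not parts[0].startswith("tsk"):
            continue
        pop_to_ids[parts[1]].append(parts[0])
    return dict(pop_to_ids)


# Toy input with placeholder IDs and populations, standing in for
# sim_sample_map.txt.
toy = io.StringIO(
    "tsk_0\tpop_a\n"
    "tsk_1\tpop_a\n"
    "tsk_2\tpop_b\n"
)
for population, ids in read_sample_map(toy).items():
    print(population, len(ids), ids)
```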

File: sim_gts_rep3_phased.vcf.gz

Description: Phased simulated data, 3rd replicate

File: sim.readme

Description: Information about the simulated data

File: chrXdiploid.mozabite.filtered.postcluster2.anc.vcf.gz

Description: Local ancestry calls for the HGDP Mozabite, Chr X

File: chr1to22.mozabite.filtered.postcluster2.anc.vcf.gz

Description: Local ancestry calls for the HGDP Mozabite, Chr 1-22

File: sim_gts_rep2_phased.vcf.gz

Description: Phased simulated data, 2nd replicate

File: sim_gts_rep1_phased.vcf.gz

Description: Phased simulated data, 1st replicate

Code/software

https://github.com/browning-lab/flare

Access information

Data were derived from the following sources:

HGDP data were obtained from another source; see the hgdp.readme file.

Data from: FLARE2: Local ancestry inference with poorly-matched reference panels
