Metagenomic and genomic data associated with the tooth-cavity hair and endogenous DNA of the Tsavo lions
Data files
Jun 30, 2025 version files 127.03 MB
- Alida_Tsavo_H1_GCAATTCC-CTCGAACA_L001_R1_001_k2_report.txt (1.23 MB)
- Alida_Tsavo_H2_CTCAGAAG-GGCAAGTT_L001_R1_001_k2_report.txt (1.37 MB)
- Alida_Tsavo_H3_GTCCTAAG-AGCTACCA_L001_R1_001_k2_report.txt (1.53 MB)
- Alida_Tsavo_H4_GCGTTAGA-CAGCATAC_L001_R1_001_k2_report.txt (1.41 MB)
- Alida_Tsavo_H5_CAAGGTAC-CGTATCTC_L001_R1_001_k2_report.txt (1.51 MB)
- Alida_Tsavo_H6_AGACCTTG-TTACGTGC_L001_R1_001_k2_report.txt (1.14 MB)
- Alida_Tsavo_H7_GTCGTTAC-AGCTAAGC_L001_R1_001_k2_report.txt (1.13 MB)
- Descriptive_Data.xlsx (15.74 KB)
- H4_human_nuclear_misincorporation.txt (5.54 MB)
- H5_zebra_misincorporation.txt (112.14 MB)
- H6_giraffe_mtDNA_misincorporation.txt (23.17 KB)
- README.md (10.02 KB)
Abstract
The synergistic advancement of molecular and computational technologies has pushed genomics into a new era; nuclear genome data for phylogenomic analyses can now be sequenced from minuscule quantities of DNA (Essel et al., 2023) and from specimens that are more than a million years old (van der Valk et al., 2021). DNA analysis from hair is a well-established approach (Higuchi et al., 1988) widely used in forensic science (Bisbing, 2020) and wildlife conservation (Phoebus et al., 2020). Hair samples are largely resistant to contamination by exogenous DNA sources or can be effectively decontaminated (Gilbert et al., 2006), and can be used to identify the mammalian species from which the hair was shed (Singh et al., 2020; Meiklejohn et al., 2021). We aimed to use ancient DNA and bioinformatic methodologies (Figure 1; STAR methods) optimized for degraded DNA to systematically identify dietary prey species from hair compacted in the teeth of two Tsavo lions that lived during the 1890s in Kenya (see Description of Samples for background on the Tsavo lion specimens and hair samples; and Patterson, 1907; Kerbis Peterhans and Gnoske, 2001 for general background on the Tsavo ‘man-eaters’). Analysis of hair DNA identified giraffe, human, oryx, waterbuck, wildebeest and zebra as prey, and also identified hair that originated from lion. DNA preservation allowed for analyses of complete mitogenome profiles of zebra, giraffe, and lion. Giraffe mitogenomes are phylogeographically partitioned, and we found that the lions ate at least two individuals that belong to a subspecies of Masai giraffe (Giraffa tippelskirchi tippelskirchi) typically found in southeast Kenya. The lion mitogenome from a hair sample was identical to the Tsavo lion endogenous mitogenome, and most closely matched other East African lions from Kenya and Tanzania. 
The protocol and approach reported here enable a better understanding of the hunting behaviors, diets, and ecology of historical individuals, populations, and species, and hold promise for elucidating these characteristics in extinct populations and species.
https://doi.org/10.5061/dryad.2fqz612zb
Description of the data and file structure
We used a four-step approach to taxonomically identify species from individual hairs and hair clumps that were extracted from the tooth cavities of two lion specimens housed at the Field Museum of Natural History, Chicago. Analysis of DNA from a single hair allows for the identification of a single species, while hair clumps can contain hair from multiple individuals and can therefore result in the identification of multiple individuals and/or species. During Step 1, hairs were extracted from the broken lower right canines of lions FMNH 23970 and FMNH 23969. Hairs were tightly compacted in the pulp cavity of each broken canine and were carefully removed to prevent damage to the teeth. A subset of the extracted hair was used for DNA analysis (Step 2; see https://github.com/adeflamingh/de_Flamingh_et_al_lionteeth_hair for protocol) and microscopy (Step 3). Extracted DNA from 4 individual hair samples and 3 clumps of hair (~0.001 g) was shotgun sequenced and analyzed using a metagenomic approach in which each sequencing read was compared to the NCBI non-redundant nucleotide database (Pruitt et al., 2007) using the software Kraken2 (Wood et al., 2019). We compiled a mitochondrial genome (mtDNA) reference database of potential prey species based on 1) the results of the metagenomic analyses, 2) species identified through microscopy, and 3) information on prey that would have been available to lions in that geographic region during the late 1890s. Bird mitogenomes (dove, guineafowl, pigeon) and porcupine were added based on historical knowledge and microscopy results. We also included the lion mitogenome to account for self-grooming, allogrooming, or cannibalism. For species that did not have complete mitogenome reference sequences available on GenBank, we used the closest relative in the same genus for which a complete mitogenome sequence was available. 
The mtDNA reference genome dataset consisted of 22 complete mitochondrial reference sequences to which shotgun sequencing data was aligned using the software Bowtie2. We used a comparative approach to identify potential prey species from which the hair originated by calculating the breadth of genome coverage (i.e., the percentage of the genome that has >1 X-fold read coverage) and the average depth of coverage (the average X-fold number of reads that mapped at any location across the genome) using SAMtools.
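The two coverage statistics defined above can be sketched in a few lines of Python. This is a minimal illustration and not the deposited workflow: it assumes a list of per-position read depths with one value for every reference position, zeros included (for example, the third column of `samtools depth -a` output). The function name `coverage_stats`, the `min_depth` parameter, and the toy depth values are hypothetical. The default threshold counts a position as covered when at least one read overlaps it, which is also an assumption; adjust `min_depth` to match the definition being applied.

```python
from typing import Iterable, Tuple


def coverage_stats(depths: Iterable[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (breadth_percent, mean_depth) for per-position read depths.

    `depths` must hold one value per reference position, including zeros.
    `min_depth` is the assumed threshold for calling a position covered.
    """
    depths = list(depths)
    if not depths:
        raise ValueError("no positions supplied")
    # Breadth: percentage of positions at or above the coverage threshold.
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = 100.0 * covered / len(depths)
    # Depth: mean number of reads over every position in the reference.
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Toy 10 bp reference (hypothetical values, not taken from the dataset).
toy_depths = [0, 0, 1, 2, 3, 3, 2, 1, 0, 0]
breadth, mean_depth = coverage_stats(toy_depths)
print(f"breadth = {breadth:.1f}%, mean depth = {mean_depth:.2f}X")
```

Comparing both values across all reference mitogenomes for one hair sample is what makes the approach comparative: a true source species is expected to show high breadth, whereas spurious matches tend to pile reads onto short conserved regions.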
Files and variables
File: Alida_Tsavo_H1_GCAATTCC-CTCGAACA_L001_R1_001_k2_report.txt
Description: Kraken metagenomic classification results associated with Hair 1.
Variables
- The output of kraken-report is tab-delimited, with one line per taxon. The fields of the output, from left-to-right, are as follows:
- Percentage of reads covered by the clade rooted at this taxon
- Number of reads covered by the clade rooted at this taxon
- Number of reads assigned directly to this taxon
- A rank code, indicating (U)nclassified, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. All other ranks are simply '-'.
- NCBI taxonomy ID
- indented scientific name
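As a rough sketch of how the six fields listed above might be read programmatically, the Python snippet below parses a report into dictionaries and lists species-level rows. It assumes exactly the six-column layout described here; the function names `read_kraken_report` and `species_hits`, the dictionary keys, and the toy report lines are hypothetical. It also assumes that the scientific name is indented by two spaces per taxonomic level, which should be checked against the actual files.

```python
import io
from typing import Dict, Iterator, List, TextIO, Union

Row = Dict[str, Union[float, int, str]]


def read_kraken_report(handle: TextIO) -> Iterator[Row]:
    """Yield one dict per taxon from a six-column, tab-delimited report."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        pct, clade_reads, direct_reads, rank, taxid, name = fields[:6]
        indent = len(name) - len(name.lstrip(" "))
        yield {
            "percent": float(pct),
            "clade_reads": int(clade_reads),
            "direct_reads": int(direct_reads),
            "rank": rank,
            "taxid": int(taxid),
            "name": name.strip(),
            "depth": indent // 2,  # assumes two spaces per level
        }


def species_hits(rows: Iterator[Row], min_reads: int = 1) -> List[Row]:
    """Return species-level rows with enough clade reads, largest first."""
    hits = [r for r in rows if r["rank"] == "S" and r["clade_reads"] >= min_reads]
    return sorted(hits, key=lambda r: r["clade_reads"], reverse=True)


# Toy report (hypothetical counts, not taken from the dataset).
toy_report = (
    " 40.00\t4\t4\tU\t0\tunclassified\n"
    " 60.00\t6\t1\t-\t1\troot\n"
    " 50.00\t5\t0\tG\t9688\t    Panthera\n"
    " 50.00\t5\t5\tS\t9689\t      Panthera leo\n"
)
for hit in species_hits(read_kraken_report(io.StringIO(toy_report))):
    print(hit["name"], hit["clade_reads"], f"{hit['percent']:.2f}%")
```

To use it on one of the deposited reports, replace the `io.StringIO` object with an open file handle. The same sketch applies to all seven `k2_report.txt` files, since they share one layout.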
File: Alida_Tsavo_H2_CTCAGAAG-GGCAAGTT_L001_R1_001_k2_report.txt
Description: Kraken metagenomic classification results associated with Hair 2.
Variables
- The output of kraken-report is tab-delimited, with one line per taxon. The fields of the output, from left-to-right, are as follows:
- Percentage of reads covered by the clade rooted at this taxon
- Number of reads covered by the clade rooted at this taxon
- Number of reads assigned directly to this taxon
- A rank code, indicating (U)nclassified, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. All other ranks are simply '-'.
- NCBI taxonomy ID
- indented scientific name
File: Alida_Tsavo_H4_GCGTTAGA-CAGCATAC_L001_R1_001_k2_report.txt
Description: Kraken metagenomic classification results associated with Hair 4.
Variables
- The output of kraken-report is tab-delimited, with one line per taxon. The fields of the output, from left-to-right, are as follows:
- Percentage of reads covered by the clade rooted at this taxon
- Number of reads covered by the clade rooted at this taxon
- Number of reads assigned directly to this taxon
- A rank code, indicating (U)nclassified, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. All other ranks are simply '-'.
- NCBI taxonomy ID
- indented scientific name
File: H6_giraffe_mtDNA_misincorporation.txt
Description: Nucleotide misincorporation rates (occurrences of each type of mutation and their positions relative to the read ends) for DNA from Hair clump 6 aligned to the giraffe mitogenome.
Variables
- Chr; chromosome label
- End; 3-prime or 5-prime
- Std; positive or negative strand
- Pos; position
- bases A C G T
- Total; base count
- G>A; G to A base change
- C>T; C to T base change
- A>G; A to G base change
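The columns above can be combined into a per-position substitution frequency, such as the 5' C>T rate commonly used as a signal of ancient DNA damage. The Python sketch below is illustrative only and rests on several assumptions: the file is tab-delimited with a header row naming the columns as listed, optional comment lines begin with `#`, the `End` column uses the labels `5p` and `3p`, and the `A`, `C`, `G`, `T` columns hold counts of each base that can serve as the denominator. The function name `substitution_frequency` and the toy counts are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def substitution_frequency(
    handle: TextIO, end: str = "5p", ref_base: str = "C", change: str = "C>T"
) -> Dict[int, float]:
    """Return {position: frequency} for one substitution type at one read end.

    Frequency is (count of `change`) / (count of `ref_base`), summed over
    all chromosomes and both strands at each position.
    """
    # Drop blank lines and '#' comment lines before the header row.
    data_lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    changed = defaultdict(int)
    reference = defaultdict(int)
    for row in reader:
        if row["End"] != end:
            continue
        pos = int(row["Pos"])
        changed[pos] += int(row[change])
        reference[pos] += int(row[ref_base])
    return {p: changed[p] / reference[p] for p in sorted(reference) if reference[p] > 0}


# Toy table (hypothetical counts, not taken from the dataset).
toy_table = (
    "# toy misincorporation table\n"
    "Chr\tEnd\tStd\tPos\tA\tC\tG\tT\tTotal\tG>A\tC>T\tA>G\n"
    "chrM\t5p\t+\t1\t10\t20\t10\t10\t50\t0\t4\t0\n"
    "chrM\t5p\t-\t1\t10\t20\t10\t10\t50\t0\t2\t0\n"
    "chrM\t5p\t+\t2\t10\t25\t10\t5\t50\t0\t1\t0\n"
    "chrM\t3p\t+\t1\t10\t10\t20\t10\t50\t5\t0\t0\n"
)
c_to_t = substitution_frequency(io.StringIO(toy_table))
print({pos: round(freq, 3) for pos, freq in c_to_t.items()})
```

Calling the function with `end="3p"`, `ref_base="G"`, and `change="G>A"` gives the complementary 3' G>A rate. Summing over strands and chromosomes before dividing avoids averaging ratios from rows with very different counts.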
File: Alida_Tsavo_H3_GTCCTAAG-AGCTACCA_L001_R1_001_k2_report.txt
Description: Kraken metagenomic classification results associated with Hair 3.
Variables
- The output of kraken-report is tab-delimited, with one line per taxon. The fields of the output, from left-to-right, are as follows:
- Percentage of reads covered by the clade rooted at this taxon
- Number of reads covered by the clade rooted at this taxon
- Number of reads assigned directly to this taxon
- A rank code, indicating (U)nclassified, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. All other ranks are simply '-'.
- NCBI taxonomy ID
- indented scientific name
File: Alida_Tsavo_H7_GTCGTTAC-AGCTAAGC_L001_R1_001_k2_report.txt
Description: Kraken metagenomic classification results associated with Hair clump 7.
Variables
- The output of kraken-report is tab-delimited, with one line per taxon. The fields of the output, from left-to-right, are as follows:
- Percentage of reads covered by the clade rooted at this taxon
- Number of reads covered by the clade rooted at this taxon
- Number of reads assigned directly to this taxon
- A rank code, indicating (U)nclassified, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. All other ranks are simply '-'.
- NCBI taxonomy ID
- indented scientific name
File: Alida_Tsavo_H5_CAAGGTAC-CGTATCTC_L001_R1_001_k2_report.txt
Description: Kraken metagenomic classification results associated with Hair clump 5.
Variables
- The output of kraken-report is tab-delimited, with one line per taxon. The fields of the output, from left-to-right, are as follows:
- Percentage of reads covered by the clade rooted at this taxon
- Number of reads covered by the clade rooted at this taxon
- Number of reads assigned directly to this taxon
- A rank code, indicating (U)nclassified, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. All other ranks are simply '-'.
- NCBI taxonomy ID
- indented scientific name
File: Alida_Tsavo_H6_AGACCTTG-TTACGTGC_L001_R1_001_k2_report.txt
Description: Kraken metagenomic classification results associated with Hair clump 6.
Variables
- The output of kraken-report is tab-delimited, with one line per taxon. The fields of the output, from left-to-right, are as follows:
- Percentage of reads covered by the clade rooted at this taxon
- Number of reads covered by the clade rooted at this taxon
- Number of reads assigned directly to this taxon
- A rank code, indicating (U)nclassified, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. All other ranks are simply '-'.
- NCBI taxonomy ID
- indented scientific name
File: Descriptive_Data.xlsx
Description: The read count, alignment statistics, read length and deamination rates for each hair mapped to all species in the mtDNA reference database.
Please note that the following variables were calculated only for hair alignments with sufficient data for DNA damage pattern estimation using mapDamage: Average read length (bp); Range of read length as shortest(bp)-longest(bp); mtDNA deamination 3'G>A pos1; mtDNA deamination 5'C>T pos1; nDNA deamination 3'G>A pos1; and nDNA deamination 5'C>T pos1. Empty cells therefore correspond to alignments for which damage patterns were not estimated.
Variables
- read count
- coverage depth
- coverage breadth
- average read length (bp)
- range of read length
- deamination on terminal ends
File: H4_human_nuclear_misincorporation.txt
Description: Nucleotide misincorporation rates (occurrences of each type of mutation and their positions relative to the read ends) for DNA from Hair 4 aligned to the human nuclear genome.
Variables
- Chr; chromosome label
- End; 3-prime or 5-prime
- Std; positive or negative strand
- Pos; position
- bases A C G T
- Total; base count
- G>A; G to A base change
- C>T; C to T base change
- A>G; A to G base change
File: H5_zebra_misincorporation.txt
Description: Nucleotide misincorporation rates (occurrences of each type of mutation and their positions relative to the read ends) for DNA from Hair clump 5 aligned to the zebra nuclear genome.
Variables
- Chr; chromosome label
- End; 3-prime or 5-prime
- Std; positive or negative strand
- Pos; position
- bases A C G T
- Total; base count
- G>A; G to A base change
- C>T; C to T base change
- A>G; A to G base change
