Context-dependent substitution matrices for coding regions of angiosperm cpDNA
Data files
Jan 03, 2023 version files 20.42 KB
- FFD_Matrices.txt
- README.md
Abstract
The dataset contains 4x4 substitution matrices generated from fourfold degenerate sites within coding regions of flowering plant chloroplast DNA. Substitution data were generated from multiple sets of three-taxon sequence comparisons, with one taxon serving as the outgroup to root each substitution. There is one matrix for each of the 192 possible tetranucleotide contexts, where a context consists of the two bases immediately flanking the substitution on the 5' side and the two bases immediately flanking it on the 3' side. These are count matrices, but they can be converted to Markov transition matrices. The analysis presented in the paper examines how substitution dynamics vary with context.
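As a minimal sketch of the conversion mentioned above, a count matrix can be row-normalized so that each row sums to 1. This assumes that rows index the ancestral base and columns the derived base, and that the diagonal holds counts of unchanged sites. Neither assumption is confirmed here, so check README.md for the actual layout. The function name `counts_to_transition` and the example counts are hypothetical and are not taken from FFD_Matrices.txt.

```python
def counts_to_transition(counts):
    """Row-normalize a square count matrix into a row-stochastic matrix.

    Assumes counts[i][j] is the number of sites inferred to have changed
    from base i (row) to base j (column); this orientation is an assumption.
    """
    transition = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # No observations for this ancestral base: probabilities are undefined.
            transition.append([float("nan")] * len(row))
        else:
            transition.append([count / total for count in row])
    return transition


# Hypothetical counts for illustration only (base order A, C, G, T assumed).
example_counts = [
    [900, 10, 80, 10],
    [5, 950, 5, 40],
    [60, 10, 920, 10],
    [10, 30, 10, 950],
]

example_P = counts_to_transition(example_counts)
```

If the matrices record substitutions only, with no counts of unchanged sites on the diagonal, the same normalization gives the probability of each derived base conditional on a substitution having occurred, rather than a full transition matrix.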
Methods
The dataset was compiled by recording substitutions in sequence data downloaded from NCBI. All procedures are described in the Materials and Methods section of the manuscript.
Usage notes
The files are plain text and can be opened with any text editor.