Overview:
---------

This repository contains the data used in the study "Rapid radiation in a highly diverse marine environment" by Hench et al., which covers the population genomic analysis of a set of 170 reef fish from the Caribbean.

The data set includes:

- phased genotypes (SNPs only; phased_mac2.vcf.gz)
- phenotype data (phenotypes.tsv)
- zipped population genetic and phylogenetic results:
  - sliding window analysis (fst, dxy, GxP, pi, rho; sliding_window_stats.zip)
  - genome summary statistics (average fst, fst outlier regions; genome_wide_summaries.zip)
  - permutation test results for genetic differentiation (fst_permutation.zip)
  - cross coalescence results (cross_coalescence.zip)
  - multiple sequentially Markovian coalescent results (msmc.zip)
  - allele aging results (geva.zip)
  - sample-level phylogenies (astral.zip)
  - population-level maximum likelihood trees (revPoMo.zip)
  - Fish Tree of Life Serraninae sub-sample phylogeny (fotl.zip)
  - topology weighting results (twisst.zip)
  - D-statistics results (dstats.zip)
  - identity by descent results (ibd.zip)
- checksums for all data files (MD5.txt)
- this README file (README.md)

```
├─ phased_mac2.vcf.gz     │ ████████████████████████████│ 48% 507.58 MiB
├─ sliding_window_stats.z │ █████████████████████│ 37% 388.90 MiB
├─ geva.zip               │ ███│ 7% 76.20 MiB
├─ ibd.zip                │ █│ 3% 37.99 MiB
├─ twisst.zip             │ │ 1% 19.26 MiB
├─ fst_permutation.zip    │ │ 0% 7.00 MiB
├─ cross_coalescence.zip  │ │ 0% 158.94 KiB
├─ dstats.zip             │ │ 0% 37.61 KiB
├─ msmc.zip               │ │ 0% 34.98 KiB
├─ astral.zip             │ │ 0% 5.76 KiB
├─ phenotypes.tsv         │ │ 0% 5.44 KiB
├─ README.md              │ │ 0% 2.84 KiB
├─ revPoMo.zip            │ │ 0% 1.76 KiB
├─ genome_wide_summaries. │ │ 0% 1.42 KiB
├─ fotl.zip               │ │ 0% 1.37 KiB
└─ MD5.txt                │ │ 0% 711 B
```

The integrity of the files can be checked against the checksums provided in the file MD5.txt using the following Unix command:

```sh
md5sum -c MD5.txt
```

File types:
-----------

After unpacking, all files are plain text files regardless of their suffix (e.g. .vcf) and can be opened in any text editor like any other .txt file. The (non-txt) suffix was kept for consistency between the analysis and the repository.

Unpacking example (using a Unix terminal):

```sh
unzip sliding_window_stats.zip   # .zip archives
gunzip phased_mac2.vcf.gz        # gzip-compressed genotypes
```

The genotypes (`phased_mac2.vcf.gz`) are the result of genotyping with the software `Genome Analysis Toolkit` (v. 4.0.8.1) and phasing with `shapeit` (v2.r837).

The file `phenotypes.tsv` contains the manually assigned presence/absence scores of color pattern traits (vertical bars, snout spot and dark saddle on the caudal peduncle) for the individuals used in the study.

All other files are output files from the population genomic analyses. For an in-depth description of the analysis pipeline, please also refer to the accompanying Zenodo repository (doi: 10.5281/zenodo.4709889).

Relevant software:
------------------

For reading the files:

- GNU gzip 1.6

Software originally used within the analysis:

- admixture (1.3.0)
- ASTRAL-III (5.7.5)
- BCFtools (1.9)
- BLAST (2.6.0)
- bwa (0.7.17-r1188)
- Dsuite (0.4 r38)
- gatk (4.0.8.1)
- GBlocks (0.91b)
- gemma (0.98)
- GEVA (v1beta)
- IQ-TREE (2.1.2)
- MAFFT (v7.475)
- msmc2 (2.0.0)
- nextflow (0.31.1.4886)
- PGDSpider (2.1.1.5)
- plink (v1.90b4)
- Python (2.7.15)
- R (3.5.2, for analysis) (60)
- RAxML-NG (v1.0.2)
- samtools (1.9)
- shapeit (v2.r837)
- truffle (v1.38)
- vcftools (0.1.14 & 0.1.15)

For the exact interplay and usage of this software in the creation of the data, please refer to the Zenodo repository (and to the detailed documentation in the GitHub repository linked to it).

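
On systems without `md5sum` (it is part of GNU coreutils and may be missing elsewhere), the checksum verification described in the overview could be approximated in Python. The following is a minimal sketch, not part of the original pipeline. It assumes that MD5.txt follows the usual `md5sum` output format (one `<hash>  <file name>` pair per line) and that the data files sit next to it; the function names `md5_of`, `parse_manifest` and `verify` are made up for this illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse md5sum-style lines ('<hash>  <name>' or '<hash> *<name>').

    Assumption: MD5.txt uses this layout; blank lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        # A leading '*' marks binary mode in md5sum output and is not part of the name.
        entries.append((expected.lower(), name.lstrip("*")))
    return entries


def verify(manifest_path="MD5.txt"):
    """Check every listed file; return {name: 'OK' | 'FAILED' | 'MISSING'}."""
    manifest = Path(manifest_path)
    results = {}
    for expected, name in parse_manifest(manifest.read_text()):
        target = manifest.parent / name
        if not target.is_file():
            results[name] = "MISSING"
        elif md5_of(target) == expected:
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

Calling `verify("MD5.txt")` from the download directory returns one status per listed file. Files are read in chunks so that the large genotype file does not have to fit into memory at once.
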
Abbreviations:
--------------

Location abbreviations correspond to Belize (bel), Florida (flo), Honduras (hon) and Panama (pan).

Species abbreviations correspond to *H. aberrans* (abe), *H. floridae* (flo), *H. gummigutta* (gum), *H. indigo* (ind), *H. randallorum* (ran), *S. tabacarius* (tab) and *S. tortugarum* (tor), where *H.* stands for *Hypoplectrus* and *S.* for *Serranus*.
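
The abbreviations listed above can be kept in a small lookup table when working with the result files in Python. The sketch below is illustrative only. It assumes a hypothetical six-letter label that concatenates a species code and a location code (for example `indbel`); this README does not specify how the codes are combined in the data files, so the label format and the function name `expand_label` are assumptions.

```python
# Codes taken from the abbreviation list in this README.
LOCATIONS = {
    "bel": "Belize",
    "flo": "Florida",
    "hon": "Honduras",
    "pan": "Panama",
}

SPECIES = {
    "abe": "Hypoplectrus aberrans",
    "flo": "Hypoplectrus floridae",
    "gum": "Hypoplectrus gummigutta",
    "ind": "Hypoplectrus indigo",
    "ran": "Hypoplectrus randallorum",
    "tab": "Serranus tabacarius",
    "tor": "Serranus tortugarum",
}


def expand_label(label):
    """Expand a hypothetical 'species + location' label into full names.

    Assumption: the first three letters are the species code and the last
    three the location code. Splitting by position matters because 'flo'
    is both a species code (H. floridae) and a location code (Florida).
    """
    if len(label) != 6:
        raise ValueError(f"expected a six-letter label, got {label!r}")
    species_code, location_code = label[:3].lower(), label[3:].lower()
    if species_code not in SPECIES:
        raise ValueError(f"unknown species code: {species_code!r}")
    if location_code not in LOCATIONS:
        raise ValueError(f"unknown location code: {location_code!r}")
    return SPECIES[species_code], LOCATIONS[location_code]
```

Under this assumed format, `expand_label("floflo")` resolves to *Hypoplectrus floridae* from Florida, which shows why the two tables are kept separate instead of being merged into one dictionary.
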