Data from: New genome assemblies for Poeciliidae: A foundation for adaptation studies
Data files
Oct 16, 2025 version files 5.53 GB
- Geury_kraken_filtered.fa (686.82 MB)
- Geury.aa (24.15 MB)
- Geury.codingseq (71.68 MB)
- Geury.gtf (125.67 MB)
- Geury.ragtag.scaffold.braker.gff (3.73 MB)
- Gsex_NS_kraken_filtered.fa (669.11 MB)
- Gsex_NS.aa (24.87 MB)
- Gsex_NS.codingseq (73.83 MB)
- Gsex_NS.gtf (131.51 MB)
- Gsex_NS.ragtag.scaffold.braker.gff (3.67 MB)
- Gsex_S_kraken_filtered.fa (669.99 MB)
- Gsex_S.aa (25.10 MB)
- Gsex_S.codingseq (74.49 MB)
- Gsex_S.gtf (132.32 MB)
- Gsex_S.ragtag.scaffold.braker.gff (3.73 MB)
- Pmex_NS_braker.aa (21.07 MB)
- Pmex_NS_braker.codingseq (62.50 MB)
- Pmex_NS_braker.gtf (109.84 MB)
- Pmex_NS_kraken_filtered.fa (728.09 MB)
- Pmex_NS.ragtag.scaffold.braker.gff (3.78 MB)
- Pmex_S_braker.aa (24.46 MB)
- Pmex_S_braker.codingseq (72.58 MB)
- Pmex_S_braker.gtf (122.01 MB)
- Pmex_S_kraken_filtered.fa (722 MB)
- Pmex_S.ragtag.scaffold.braker.gff (4.03 MB)
- Psulph_braker.aa (23.63 MB)
- Psulph_braker.codingseq (70.11 MB)
- Psulph_braker.gtf (120.79 MB)
- Psulph_kraken_filtered.fa (723.45 MB)
- Psulph.ragtag.scaffold.braker.gff (4.13 MB)
- README.md (3.03 KB)
Nov 04, 2025 version files 6.68 GB
- Geury_kraken_filtered.fa (686.82 MB)
- Geury.aa (24.15 MB)
- Geury.codingseq (71.68 MB)
- Geury.gtf (125.67 MB)
- Geury.ragtag.scaffold.braker.gff (3.73 MB)
- Gsex_NS_kraken_filtered.fa (669.11 MB)
- Gsex_NS.aa (24.87 MB)
- Gsex_NS.codingseq (73.83 MB)
- Gsex_NS.gtf (131.51 MB)
- Gsex_NS.ragtag.scaffold.braker.gff (3.67 MB)
- Gsex_S_kraken_filtered.fa (669.99 MB)
- Gsex_S.aa (25.10 MB)
- Gsex_S.codingseq (74.49 MB)
- Gsex_S.gtf (132.32 MB)
- Gsex_S.ragtag.scaffold.braker.gff (3.73 MB)
- Pmex_NS_braker.aa (21.07 MB)
- Pmex_NS_braker.codingseq (62.50 MB)
- Pmex_NS_braker.gtf (109.84 MB)
- Pmex_NS_kraken_filtered.fa (728.09 MB)
- Pmex_NS.ragtag.scaffold.braker.gff (3.78 MB)
- Pmex_NS.ragtag.scaffold.liftoff.ncbi.anno.polished.gff (384.01 MB)
- Pmex_S_braker.aa (24.46 MB)
- Pmex_S_braker.codingseq (72.58 MB)
- Pmex_S_braker.gtf (122.01 MB)
- Pmex_S_kraken_filtered.fa (722 MB)
- Pmex_S.ragtag.scaffold.braker.gff (4.03 MB)
- Pmex_S.ragtag.scaffold.liftoff.ncbi.anno.polished.gff (382.50 MB)
- Psulph_braker.aa (23.63 MB)
- Psulph_braker.codingseq (70.11 MB)
- Psulph_braker.gtf (120.79 MB)
- Psulph_kraken_filtered.fa (723.45 MB)
- Psulph.ragtag.scaffold.braker.gff (4.13 MB)
- Psulph.ragtag.scaffold.liftoff.ncbi.anno.polished.gff (382.73 MB)
- README.md (3.39 KB)
Abstract
Multiple lineages in the family Poeciliidae have independently adapted to hydrogen-sulfide-rich springs. These independent colonizations provide naturally replicated lineages that form a powerful model for studying adaptation and convergent evolution. However, genomic resources are limited for many genera and species across Poeciliidae. Here, we present six high-quality, chromosome-level, annotated genome assemblies for Poecilia and Gambusia populations. Five of these are the first for their species or ecotype, and the remaining assembly improves the contiguity of the current reference genome by more than 100-fold. We compare repeat content and model historical changes in effective population size using these new assemblies.
Dataset DOI: https://doi.org/10.5061/dryad.bg79cnpnh
Description of the Data and File Structure
This dataset includes genome assemblies, annotation files, and predicted coding and protein sequences for several species in the family Poeciliidae. These resources serve as a foundation for comparative and adaptive genomics studies across sulfidic and nonsulfidic populations.
Each genome assembly was annotated using the BRAKER2 pipeline, producing gene predictions in both GTF (contig-level) and GFF (scaffold-level) formats. Scaffold-level annotations were generated with Liftoff from the corresponding contig-level annotations. Predicted coding sequences and translated amino acid sequences are also included. In addition, the Kraken2-filtered FASTA files contain the final assemblies after contaminant removal.
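Because the annotations come in two formats, the ninth (attribute) column has to be read differently for the `.gtf` and `.gff` files. The following is a minimal Python sketch of that difference, based on the general GTF and GFF3 conventions rather than on inspection of these specific files. The function name `parse_attributes` and the example attribute strings are hypothetical. Some BRAKER/AUGUSTUS GTF lines may carry a bare identifier instead of key-value pairs, in which case this sketch returns an empty dictionary.

```python
import re


def parse_attributes(column9: str, fmt: str) -> dict[str, str]:
    """Parse the attribute column of one annotation line.

    fmt is "gtf" (key "value"; pairs) or "gff3" (key=value;key=value).
    Illustrative helper only; not part of the dataset.
    """
    attrs: dict[str, str] = {}
    if fmt == "gff3":
        # GFF3: semicolon-separated key=value pairs, e.g. ID=g1.t1;Parent=g1
        for field in column9.strip().split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = value.strip()
    elif fmt == "gtf":
        # GTF: space-separated key and quoted value, each pair ending in ";"
        for key, value in re.findall(r'(\w+)\s+"([^"]*)"', column9):
            attrs[key] = value
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return attrs


# Hypothetical attribute strings in each style
gtf_attrs = parse_attributes('gene_id "g1"; transcript_id "g1.t1";', "gtf")
gff_attrs = parse_attributes("ID=g1.t1;Parent=g1", "gff3")
```

In practice a dedicated library is usually preferable for whole-file parsing; the sketch only shows why the two file types cannot share a single attribute parser.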
Files and Descriptions
| File pattern | Description |
|---|---|
| `*.fa` | Filtered contig-level assemblies after removal of contaminants using Kraken2 |
| `*.aa` | Predicted amino acid (protein) sequences generated by BRAKER |
| `*.codingseq` | Predicted coding sequences (CDS) generated by BRAKER |
| `*.gtf` | Gene annotations (GTF format) of predicted protein-coding genes from the BRAKER pipeline applied to contig-level assemblies |
| `*scaffold.braker.gff` | Gene annotations (GFF3 format) of predicted genes lifted to scaffold-level assemblies using Liftoff |
| `*ncbi.anno.polished.gff` | Gene annotations (GFF3 format) of lifted-over RefSeq genes, renamed to match NCBI assembly scaffold names |
Included Files
Poecilia mexicana (nonsulfidic and sulfidic)
- Pmex_NS_kraken_filtered.fa
- Pmex_NS_braker.aa
- Pmex_NS_braker.codingseq
- Pmex_NS_braker.gtf
- Pmex_NS.ragtag.scaffold.braker.gff
- Pmex_NS.ragtag.scaffold.liftoff.ncbi.anno.polished.gff
- Pmex_S_kraken_filtered.fa
- Pmex_S_braker.aa
- Pmex_S_braker.codingseq
- Pmex_S_braker.gtf
- Pmex_S.ragtag.scaffold.braker.gff
- Pmex_S.ragtag.scaffold.liftoff.ncbi.anno.polished.gff
Poecilia sulphuraria
- Psulph_kraken_filtered.fa
- Psulph_braker.aa
- Psulph_braker.codingseq
- Psulph_braker.gtf
- Psulph.ragtag.scaffold.braker.gff
- Psulph.ragtag.scaffold.liftoff.ncbi.anno.polished.gff
Gambusia eurystoma
- Geury_kraken_filtered.fa
- Geury.aa
- Geury.codingseq
- Geury.gtf
- Geury.ragtag.scaffold.braker.gff
Gambusia sexradiata (sulfidic and nonsulfidic)
- Gsex_S_kraken_filtered.fa
- Gsex_S.aa
- Gsex_S.codingseq
- Gsex_S.gtf
- Gsex_S.ragtag.scaffold.braker.gff
- Gsex_NS_kraken_filtered.fa
- Gsex_NS.aa
- Gsex_NS.codingseq
- Gsex_NS.gtf
- Gsex_NS.ragtag.scaffold.braker.gff
Changes after Oct 16, 2025:
Added GFF files (`*ncbi.anno.polished.gff`) in which scaffolds are renamed to match NCBI scaffold names
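As a quick consistency check, the sizes of the three added GFF files can be compared against the difference between the two version totals. The Python sketch below uses only the sizes listed on this page and assumes 1 GB = 1000 MB, which is an assumption about how the totals were rounded. The small change in README.md size is ignored.

```python
# Sizes (MB) of the files present only in the Nov 04, 2025 version.
added_mb = {
    "Pmex_NS.ragtag.scaffold.liftoff.ncbi.anno.polished.gff": 384.01,
    "Pmex_S.ragtag.scaffold.liftoff.ncbi.anno.polished.gff": 382.50,
    "Psulph.ragtag.scaffold.liftoff.ncbi.anno.polished.gff": 382.73,
}

# Assumption: decimal units (1 GB = 1000 MB).
added_gb = sum(added_mb.values()) / 1000

# Version totals as listed: 5.53 GB (Oct 16) and 6.68 GB (Nov 04).
version_diff_gb = 6.68 - 5.53

# Compare at the two-decimal precision used for the listed totals.
consistent = round(added_gb, 2) == round(version_diff_gb, 2)
```

Under that assumption the added files account for the growth between the two versions.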
