Genome assembly variation and its implications for gene discovery in nematode species
Data files
May 15, 2024 version files 1.67 GB
- Cbovis_canu_asm.fsa.gz (19.80 MB)
- Cbovis_canu_braker3.fsa.gz (4.09 MB)
- Cbovis_canu_braker3.gtf.gz (3.44 MB)
- Cbovis_falcon_asm.fsa.gz (18.73 MB)
- Cbovis_falcon_braker3.fsa.gz (3.90 MB)
- Cbovis_falcon_braker3.gtf.gz (3.28 MB)
- Cbovis_falconunzip_asm.fsa.gz (18.61 MB)
- Cbovis_falconunzip_braker3.fsa.gz (3.89 MB)
- Cbovis_falconunzip_braker3.gtf.gz (3.27 MB)
- Cbovis_flye_asm.fsa.gz (18.24 MB)
- Cbovis_flye_braker3.fsa.gz (3.91 MB)
- Cbovis_flye_braker3.gtf.gz (3.26 MB)
- Cbovis_redbean2.5_asm.fsa.gz (18.22 MB)
- Cbovis_redbean2.5_braker3.fsa.gz (3.90 MB)
- Cbovis_redbean2.5_braker3.gtf.gz (3.27 MB)
- Cbovis_smartdenovo_asm.fsa.gz (18.37 MB)
- Cbovis_smartdenovo_braker3.fsa.gz (3.89 MB)
- Cbovis_smartdenovo_braker3.gtf.gz (3.25 MB)
- Cbovis_stevens_asm.fsa.gz (18.47 MB)
- Cbovis_stevens_braker3.fsa.gz (4.04 MB)
- Cbovis_stevens_braker3.gtf.gz (3.35 MB)
- Hbakeri_chow_asm.fsa.gz (207.98 MB)
- Hbakeri_chow_braker3.fsa.gz (5.18 MB)
- Hbakeri_chow_braker3.gtf.gz (6.78 MB)
- Hbakeri_flye_asm.fsa.gz (182.15 MB)
- Hbakeri_flye_braker3.fsa.gz (5.15 MB)
- Hbakeri_flye_braker3.gtf.gz (6.30 MB)
- Hbakeri_hicanu_asm.fsa.gz (184.82 MB)
- Hbakeri_hicanu_braker3.fsa.gz (5.05 MB)
- Hbakeri_hicanu_braker3.gtf.gz (6.21 MB)
- Hbakeri_hifiasm_asm.fsa.gz (182.67 MB)
- Hbakeri_hifiasm_braker3.fsa.gz (5.04 MB)
- Hbakeri_hifiasm_braker3.gtf.gz (6.21 MB)
- Hcontortus_canu_asm.fsa.gz (101.02 MB)
- Hcontortus_canu_braker3.fsa.gz (4.99 MB)
- Hcontortus_canu_braker3.gtf.gz (6.10 MB)
- Hcontortus_doyle_asm.fsa.gz (85.61 MB)
- Hcontortus_doyle_braker3.fsa.gz (4.17 MB)
- Hcontortus_doyle_braker3.gtf.gz (4.95 MB)
- Hcontortus_falcon_asm.fsa.gz (82.76 MB)
- Hcontortus_falcon_braker3.fsa.gz (4.27 MB)
- Hcontortus_falcon_braker3.gtf.gz (5.15 MB)
- Hcontortus_falconunzip_asm.fsa.gz (75.30 MB)
- Hcontortus_falconunzip_braker3.fsa.gz (4.23 MB)
- Hcontortus_falconunzip_braker3.gtf.gz (5.09 MB)
- Hcontortus_flye_asm.fsa.gz (94.22 MB)
- Hcontortus_flye_braker3.fsa.gz (4.87 MB)
- Hcontortus_flye_braker3.gtf.gz (6.08 MB)
- Hcontortus_redbean2.5_asm.fsa.gz (85.34 MB)
- Hcontortus_redbean2.5_braker3.fsa.gz (4.69 MB)
- Hcontortus_redbean2.5_braker3.gtf.gz (5.67 MB)
- Hcontortus_smartdenovo_asm.fsa.gz (89.70 MB)
- Hcontortus_smartdenovo_braker3.fsa.gz (4.80 MB)
- Hcontortus_smartdenovo_braker3.gtf.gz (5.82 MB)
- README.md (1.17 KB)
Oct 09, 2024 version files 1.67 GB
Abstract
Genome assemblers are a critical component of genome science, but the choice of assembly software and protocols can be daunting. Here, we investigate genome assembly variation and its implications for gene discovery across three nematode species (Caenorhabditis bovis, Haemonchus contortus, and Heligmosomoides bakeri), highlighting the critical interplay between assembly choice and downstream genomic analysis. Selecting commonly used genome assemblers, we generated multiple assemblies for each species and analyzed their structure, completeness, and effect on gene family analysis. Our findings demonstrate that assembly variations can significantly affect gene family composition, with notable differences in gene families important in anthelmintic discovery and immunomodulation. Despite broadly similar performance across various assembly metrics, comparisons of assemblies within a single species revealed underlying structural rearrangements and inconsistencies in gene content, which would affect downstream analyses. This emphasizes the need for continuous refinement of genome assemblies and their annotations.
README: Genome assembly variation and its implications for gene discovery in nematode species
https://doi.org/10.5061/dryad.p2ngf1vzh
We wanted to determine the effect of genome assembly variation on gene discovery. We used DNA sequence reads from three species of nematodes: Caenorhabditis bovis (Oxford Nanopore reads), Haemonchus contortus (PacBio RSII reads), and Heligmosomoides bakeri (PacBio HiFi reads). For each species, we ran several assembly algorithms and compared the resulting assemblies. We also generated gene models for each assembly using the BRAKER3 software.
Description of the data and file structure
All data files are compressed using gzip.
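Because the files are gzip-compressed, they can be read directly without first decompressing them to disk. The following is a minimal sketch using only the Python standard library, assuming the .fsa files are ordinary FASTA text; the function name `fasta_lengths` is illustrative and not part of the dataset.

```python
import gzip
from typing import Dict


def fasta_lengths(path: str) -> Dict[str, int]:
    """Return {sequence name: length} for a gzip-compressed FASTA file."""
    lengths: Dict[str, int] = {}
    name = None
    # "rt" opens the gzip stream in text mode, so each line arrives as str.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Header line: keep the first whitespace-delimited token as the name.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence line: add its length to the current record.
                lengths[name] += len(line)
    return lengths


# Example use (assumes the file has been downloaded to the working directory):
# lengths = fasta_lengths("Cbovis_flye_asm.fsa.gz")
# print(len(lengths), "sequences,", sum(lengths.values()), "bases")
```

Streaming line by line keeps memory use low, which matters for the larger assemblies in this dataset.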
The naming structure for most of the files is aaa_bbb_ccc.ddd (followed by .gz for the compression), where
- aaa --> the species, where Cbovis --> Caenorhabditis bovis; Hcontortus --> Haemonchus contortus; Hbakeri --> Heligmosomoides bakeri.
- bbb --> the assembly algorithm used, or for the published genomes, the first author of the paper (e.g., Stevens, Chow, Doyle).
- ccc --> asm for assembly or braker3 for gene models.
- ddd --> fsa for FASTA format or gtf for Gene Transfer Format (GTF).
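The naming convention above can be decoded programmatically. The sketch below is one possible approach, not a tool shipped with the dataset; `DataFile` and `parse_name` are hypothetical names, and the lookup tables simply restate the list above. Files that do not follow the convention (for example README.md) raise a `ValueError`.

```python
from typing import NamedTuple

# Lookup tables restating the naming convention described in this README.
SPECIES = {
    "Cbovis": "Caenorhabditis bovis",
    "Hcontortus": "Haemonchus contortus",
    "Hbakeri": "Heligmosomoides bakeri",
}
CONTENT = {"asm": "assembly", "braker3": "gene models"}
FORMATS = {"fsa": "FASTA", "gtf": "GTF"}


class DataFile(NamedTuple):
    species: str
    source: str   # assembler name, or first author for published genomes
    content: str
    fmt: str


def parse_name(filename: str) -> DataFile:
    """Split aaa_bbb_ccc.ddd(.gz) into its four components."""
    # Drop the trailing ".gz" added by compression, if present.
    stem = filename[:-3] if filename.endswith(".gz") else filename
    # Split the extension from the right: names such as "redbean2.5" contain a dot.
    base, ext = stem.rsplit(".", 1)
    species, source, content = base.split("_")
    return DataFile(SPECIES[species], source, CONTENT[content], FORMATS[ext])


print(parse_name("Cbovis_redbean2.5_braker3.gtf.gz"))
```

Splitting the extension from the right is the one design choice worth noting: a plain split on "." would break on the redbean2.5 files.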
Version change log
9-oct-2024: We have included the supplementary information associated with the publication.
- Mariene_n_Wasmuth_SOM_Figures --> Figures S1-S6.
- Mariene_n_Wasmuth_SOM_Tables --> Tables S1-S71.
- Mariene_n_Wasmuth_FileS1 --> Additional methods, including the commands used for various software.
Methods
The assemblies were generated with the following assemblers, listed by species.
Caenorhabditis bovis and Haemonchus contortus
- Redbean2.5
- Flye
- Canu
- SMARTdenovo
- Falcon
- Falcon-unzip
Heligmosomoides bakeri
- Flye
- HiCanu
- Hifiasm
The gene models were generated using the software BRAKER3.
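To get a quick overview of a gene model file, the feature types in a GTF can be tallied. This is a minimal sketch that relies only on the general GTF layout (tab-separated columns, with the feature type in the third column); it makes no assumption about which feature types a particular file contains, and `feature_counts` is an illustrative name.

```python
import gzip
from collections import Counter


def feature_counts(path: str) -> Counter:
    """Count rows per feature type (column 3) in a gzip-compressed GTF file."""
    counts: Counter = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts


# Example use (assumes the file has been downloaded to the working directory):
# for feature, n in feature_counts("Cbovis_flye_braker3.gtf.gz").most_common():
#     print(feature, n)
```

Comparing these tallies across the assemblies of one species gives a first, coarse view of how gene content differs between them.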