Progressive Cactus alignment of 298 drosophilid species
Data files
Dec 01, 2023 version files 68.23 GB
- drosophila.hal.00 (13.96 GB)
- drosophila.hal.01 (13.96 GB)
- drosophila.hal.02 (13.96 GB)
- drosophila.hal.03 (13.96 GB)
- drosophila.hal.04 (12.40 GB)
- README.md (893 B)
Abstract
Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. Whole-genome sequence alignments link evolution at the nucleotide level across species and are a critical but computationally intensive step for downstream genomic analyses. Progressive Cactus is a reference-free, whole-genome alignment tool designed to scale to alignments of thousands of species.
In the study associated with this dataset, we conducted Oxford Nanopore long-read sequencing of both inbred lines and single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections. We selected a set of 298 suitably high-quality drosophilid genomes from this study, from publicly available genomes assembled previously by us, and genomes assembled by other studies. Repeats were identified and soft-masked in each genome with RepeatModeler2 and RepeatMasker. A guide tree was constructed from 1,000 single-copy orthologs annotated by BUSCO v5 in all genomes. Individual gene trees were inferred with IQTREE2 and a species tree was estimated from the gene trees with ASTRAL-MP. The tree was scaled by the substitution rate at 4-fold degenerate sites and provided to Progressive Cactus as the guide tree for the alignment. Detailed methods are provided in the study.
The alignment is released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
https://doi.org/10.5061/dryad.x0k6djhrd
A reference-free, whole-genome Progressive Cactus alignment was built from the 298 drosophilid genomes listed in our preprint: https://doi.org/10.1101/2023.10.02.560517
The Cactus alignment file was split into five parts with GNU split to facilitate upload and download. Download all of the parts into one directory and concatenate them:
cat drosophila.hal.* > drosophila.hal
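Because each part is about 14 GB, an interrupted download is easy to miss. The following is a minimal sketch, not part of the dataset, of a helper that joins the parts and checks that the output is exactly as large as the parts combined. The function name `join_hal_parts` is hypothetical, and the sketch assumes the parts keep their two-digit numeric suffixes (00, 01, ...).

```shell
# Hypothetical helper (not shipped with the dataset): join the split parts
# and check that the result is as large as all of the parts combined.
# Usage: join_hal_parts PREFIX OUTPUT
join_hal_parts() (
    prefix=$1   # e.g. "drosophila.hal." (parts end in 00, 01, ...)
    out=$2      # e.g. "drosophila.hal"

    # The glob expands in sorted order, which matches the numeric suffixes.
    set -- "$prefix"[0-9][0-9]
    if [ ! -e "$1" ]; then
        echo "no parts found for prefix $prefix" >&2
        exit 1
    fi

    # Concatenate the parts in order.
    cat "$@" > "$out" || exit 1

    # Sum the part sizes in bytes and compare with the joined file.
    total=0
    for part in "$@"; do
        total=$(( total + $(wc -c < "$part") ))
    done
    joined=$(( $(wc -c < "$out") ))

    if [ "$joined" -ne "$total" ]; then
        echo "size mismatch: parts=$total joined=$joined" >&2
        exit 1
    fi
    echo "joined $# parts into $out ($joined bytes)"
)
```

With all five parts in the current directory, it could be called as `join_hal_parts drosophila.hal. drosophila.hal`. A matching size only shows that the concatenation completed; it does not prove that each part was downloaded intact.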
The Progressive Cactus tools must be installed to use the alignment.
https://github.com/ComparativeGenomicsToolkit/cactus
One representative genome is used for each species. Each genome is named in the alignment by the first letter of the genus, an underscore, and the full species name, all in capital letters. For example, “Drosophila melanogaster” is “D_MELANOGASTER”.
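The naming rule above can be sketched as a small shell function for converting names from the genome table into alignment names. The function name `hal_genome_name` is hypothetical and not part of the Cactus or HAL tools. Only the D_MELANOGASTER example is confirmed by this README, so other converted names should be checked against the alignment itself.

```shell
# Hypothetical helper: turn "Genus species" into the alignment genome name,
# i.e. first letter of the genus, underscore, species name, all upper case.
# Usage: hal_genome_name "Genus species"
hal_genome_name() (
    set -f          # no filename expansion while splitting the argument
    set -- $1       # split on whitespace: $1 = genus, $2 = species
    if [ $# -ne 2 ]; then
        echo "expected a name of the form 'Genus species'" >&2
        exit 1
    fi
    initial=$(printf '%s' "$1" | cut -c1)
    printf '%s_%s\n' "$initial" "$2" | tr '[:lower:]' '[:upper:]'
)

hal_genome_name "Drosophila melanogaster"   # prints D_MELANOGASTER
```

The function takes the binomial as a single quoted argument so that rows from the genome table can be passed through unchanged.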
A summary of the data in the alignment can be viewed with:
halStats drosophila.hal
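Before running a long analysis, it can help to confirm that a genome name is actually present in the alignment. The sketch below wraps the `--genomes` option of halStats, which prints the genome names in the file. The function name `hal_has_genome` is hypothetical, and the sketch assumes halStats is on the PATH and that the names are printed separated by whitespace.

```shell
# Hypothetical helper: succeed if GENOME is one of the genomes in HAL_FILE.
# Assumes halStats (from the HAL tools) is on PATH.
# Usage: hal_has_genome HAL_FILE GENOME
hal_has_genome() (
    hal=$1
    genome=$2
    # Put each name on its own line, then require an exact whole-line match.
    halStats --genomes "$hal" | tr -s '[:space:]' '\n' | grep -Fqx -- "$genome"
)

# Example use (requires the reassembled alignment):
#   hal_has_genome drosophila.hal D_MELANOGASTER && echo "found"
#   halStats --tree drosophila.hal    # prints the alignment tree in Newick format
```

The exact match prevents a partial name such as D_MELANO from being reported as present.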
The following table lists the set of genomes incorporated into the alignment. Note that accessions will be updated once genomes are processed through NCBI GenBank.
name | genome source | genome accession |
Stegana nigrithorax | NCBI GenBank | PENDING |
Leucophenga maculata | NCBI GenBank | PENDING |
Leucophenga montana | NCBI GenBank | PENDING |
Cacoxenus indagator | NCBI GenBank | PENDING |
Amiota minor | NCBI GenBank | PENDING |
Amiota communis | NCBI GenBank | PENDING |
Amiota mariae | NCBI GenBank | PENDING |
Chymomyza caudatula | NCBI GenBank | PENDING |
Chymomyza procnemis | NCBI GenBank | PENDING |
Chymomyza amoena | NCBI GenBank | PENDING |
Chymomyza costata | NCBI GenBank | GCA_018150985.1 |
Chymomyza fuscimana | NCBI GenBank | GCA_949987675.1 |
Scaptodrosophila lebanonensis | NCBI RefSeq | GCF_003285725.1 |
Scaptodrosophila latifasciaeformis | NCBI GenBank | PENDING |
Hirtodrosophila duncani | NCBI GenBank | PENDING |
Lordiphosa mommai | NCBI GenBank | GCA_018904225.1 |
Lordiphosa collinella | NCBI GenBank | PENDING |
Lordiphosa andalusiaca | NCBI GenBank | PENDING |
Lordiphosa fenestrarum | NCBI GenBank | PENDING |
Lordiphosa magnipectinata | NCBI GenBank | PENDING |
Lordiphosa clarofinis | NCBI GenBank | GCA_018904275.1 |
Drosophila sturtevanti | NCBI GenBank | GCA_018150375.1 |
Drosophila emarginata | NCBI GenBank | PENDING |
Drosophila neocordata | NCBI GenBank | GCA_018903615.1 |
Drosophila saltans | NCBI GenBank | GCA_018903575.1 |
Drosophila austrosaltans | NCBI GenBank | PENDING |
Drosophila prosaltans | NCBI GenBank | GCA_018151275.1 |
Drosophila sucinea | NCBI GenBank | GCA_018150745.1 |
Drosophila nebulosa | NCBI GenBank | GCA_024703675.1 |
Drosophila insularis | NCBI GenBank | GCA_018903935.1 |
Drosophila tropicalis | NCBI GenBank | GCA_018151085.1 |
Drosophila willistoni | NCBI RefSeq | GCF_018902025.1 |
Drosophila paulistorum | NCBI GenBank | GCA_018152135.1 |
Drosophila equinoxialis | NCBI GenBank | GCA_018150345.1 |
Drosophila guanche | NCBI RefSeq | GCF_900245975.1 |
Drosophila subobscura | NCBI RefSeq | GCF_008121235.1 |
Drosophila bifasciata | NCBI GenBank | GCA_009664405.1 |
Drosophila subsilvestris | NCBI GenBank | PENDING |
Drosophila obscura | NCBI RefSeq | GCF_018151105.1 |
Drosophila ambigua | NCBI GenBank | GCA_018150905.1 |
Drosophila tristis | NCBI GenBank | GCA_018150885.1 |
Drosophila lowei | NCBI GenBank | GCA_008121275.1 |
Drosophila miranda | NCBI RefSeq | NCBI GenBank |
Drosophila pseudoobscura | NCBI RefSeq | GCF_009870125.1 |
Drosophila persimilis | NCBI RefSeq | GCF_003286085.1 |
Drosophila helvetica | NCBI RefSeq | PENDING |
Drosophila azteca | NCBI GenBank | GCA_005876895.1 |
Drosophila affinis | NCBI GenBank | PENDING |
Drosophila algonquin | NCBI GenBank | PENDING |
Drosophila athabasca | NCBI GenBank | GCA_008121215.1 |
Drosophila setifemur | NCBI GenBank | GCA_021224005.1 |
Drosophila ironensis | NCBI GenBank | GCA_021223825.1 |
Drosophila varians | NCBI GenBank | GCA_018150405.1 |
Drosophila vallismaia | NCBI GenBank | PENDING |
Drosophila merina | NCBI GenBank | PENDING |
Drosophila ercepeae | NCBI GenBank | GCA_018150545.1 |
Drosophila pseudoananassae | NCBI GenBank | GCA_018153035.1 |
Drosophila malerkotliana | NCBI GenBank | GCA_018153235.1 |
Drosophila bipectinata | NCBI RefSeq | GCF_018153845.1 |
Drosophila parabipectinata | NCBI GenBank | GCA_018153455.1 |
Drosophila atripex | NCBI GenBank | PENDING |
Drosophila monieri | NCBI GenBank | PENDING |
Drosophila pandora | NCBI GenBank | GCA_021223865.1 |
Drosophila anomalata | NCBI GenBank | PENDING |
Drosophila ananassae | NCBI RefSeq | GCF_017639315.1 |
Drosophila pallidosa | NCBI GenBank | PENDING |
Drosophila oshimai | NCBI GenBank | GCA_018150695.1 |
Drosophila ficusphila | NCBI RefSeq | GCF_018152265.1 |
Drosophila gunungcola | NCBI RefSeq | GCF_025200985.1 |
Drosophila elegans | NCBI RefSeq | GCF_018152505.1 |
Drosophila fuyamai | NCBI GenBank | GCA_018153365.1 |
Drosophila kurseongensis | NCBI GenBank | GCA_018153305.1 |
Drosophila rhopaloa | NCBI RefSeq | GCF_018152115.1 |
Drosophila carrolli | NCBI GenBank | GCA_018152295.1 |
Drosophila biarmipes | NCBI RefSeq | GCF_025231255.1 |
Drosophila subpulchrella | NCBI RefSeq | GCF_014743375.2 |
Drosophila suzukii | NCBI RefSeq | GCF_013340165.1 |
Drosophila mimetica | NCBI GenBank | PENDING |
Drosophila takahashii | NCBI RefSeq | GCF_018152695.1 |
Drosophila lutescens | NCBI GenBank | PENDING |
Drosophila pseudotakahashii | NCBI GenBank | PENDING |
Drosophila prostipennis | NCBI GenBank | PENDING |
Drosophila eugracilis | NCBI RefSeq | GCF_018153835.1 |
Drosophila melanogaster | NCBI RefSeq | GCF_000001215.4 |
Drosophila simulans | NCBI RefSeq | GCF_016746395.2 |
Drosophila mauritiana | NCBI RefSeq | GCF_004382145.1 |
Drosophila sechellia | NCBI RefSeq | GCF_004382195.2 |
Drosophila orena | NCBI GenBank | GCA_005876975.1 |
Drosophila erecta | NCBI RefSeq | GCF_003286155.1 |
Drosophila teissieri | NCBI RefSeq | GCF_016746235.2 |
Drosophila santomea | NCBI RefSeq | GCF_016746245.2 |
Drosophila yakuba | NCBI RefSeq | GCF_016746365.2 |
Drosophila pectinifera | NCBI GenBank | GCA_008042775.1 |
Drosophila triauraria | NCBI GenBank | GCA_014170255.2 |
Drosophila auraria | NCBI GenBank | GCA_008042615.1 |
Drosophila tani | NCBI GenBank | GCA_008042535.1 |
Drosophila rufa | NCBI GenBank | GCA_018153105.1 |
Drosophila asahinai | NCBI GenBank | GCA_008042795.1 |
Drosophila lacteicornis | NCBI GenBank | GCA_008044355.1 |
Drosophila kanapiae | NCBI GenBank | GCA_008042475.1 |
Drosophila ogumai | NCBI GenBank | GCA_018904815.1 |
Drosophila kikkawai | NCBI RefSeq | GCF_018152535.1 |
Drosophila bocki | NCBI GenBank | GCA_008042715.1 |
Drosophila leontia | NCBI GenBank | GCA_008042735.1 |
Drosophila watanabei | NCBI GenBank | GCA_008042575.1 |
Drosophila punjabiensis | NCBI GenBank | GCA_008042585.1 |
Drosophila serrata | NCBI GenBank | PENDING |
Drosophila bunnanda | NCBI GenBank | PENDING |
Drosophila truncata | NCBI GenBank | GCA_008042515.1 |
Drosophila birchii | NCBI GenBank | PENDING |
Drosophila mayri | NCBI GenBank | GCA_008042485.1 |
Drosophila anomelani | NCBI GenBank | PENDING |
Drosophila jambulina | NCBI GenBank | GCA_018152175.1 |
Drosophila seguyi | NCBI GenBank | GCA_008042675.1 |
Drosophila vulcana | NCBI GenBank | GCA_008042555.1 |
Drosophila bakoue | NCBI GenBank | GCA_008044335.1 |
Drosophila tsacasi | NCBI GenBank | GCA_018904565.1 |
Drosophila nikananu | NCBI GenBank | GCA_008042635.1 |
Drosophila bocqueti | NCBI GenBank | GCA_018151655.1 |
Drosophila burlai | NCBI GenBank | GCA_008042655.1 |
Drosophila busckii | NCBI RefSeq | GCF_011750605.1 |
Drosophila repletoides | NCBI GenBank | GCA_018150835.1 |
Liodrosophila aerea | NCBI GenBank | PENDING |
Hypselothyrea guttata | NCBI GenBank | PENDING |
Zaprionus bogoriensis | NCBI GenBank | PENDING |
Zaprionus ghesquierei | NCBI GenBank | GCA_018904095.1 |
Zaprionus inermis | NCBI GenBank | GCA_018151445.1 |
Zaprionus kolodkinae | NCBI GenBank | GCA_018901885.1 |
Zaprionus tuberculatus | ||
Zaprionus tsacasi | NCBI GenBank | GCA_018904105.1 |
Zaprionus ornatus | NCBI GenBank | GCA_018904035.1 |
Zaprionus africanus | NCBI GenBank | GCA_018151435.1 |
Zaprionus gabonicus | NCBI GenBank | GCA_018903695.1 |
Zaprionus indianus | NCBI GenBank | GCA_018904595.1 |
Zaprionus capensis | NCBI GenBank | GCA_018903675.1 |
Zaprionus taronus | NCBI GenBank | GCA_018901805.1 |
Zaprionus davidi | NCBI GenBank | GCA_018903715.1 |
Zaprionus vittiger | NCBI GenBank | GCA_018904025.1 |
Zaprionus lachaisei | NCBI GenBank | GCA_018901815.1 |
Zaprionus nigranus | NCBI GenBank | GCA_018903425.1 |
Zaprionus camerounensis | NCBI GenBank | GCA_018904165.1 |
Drosophila quadrilineata | NCBI GenBank | GCA_018150725.1 |
Drosophila pruinosa | NCBI GenBank | GCA_018150935.1 |
Drosophila rubida | NCBI GenBank | PENDING |
Drosophila hypocausta | NCBI GenBank | PENDING |
Drosophila siamana | NCBI GenBank | PENDING |
Drosophila immigrans | NCBI GenBank | GCA_018153375.1 |
Drosophila formosana | NCBI GenBank | PENDING |
Drosophila ustulata | NCBI GenBank | PENDING |
Drosophila niveifrons | NCBI GenBank | PENDING |
Drosophila sulfurigasteralbostrigata | NCBI GenBank | GCA_023558435.1 |
Drosophila sulfurigasterbilimbata | NCBI GenBank | GCA_023558465.1 |
Drosophila sulfurigastersulfurigaster | NCBI GenBank | GCA_023558475.1 |
Drosophila kohkoa | NCBI GenBank | GCA_019972355.1 |
Drosophila pallidifrons | NCBI GenBank | GCA_023558445.1 |
Drosophila kepulauana | NCBI GenBank | GCA_023558455.1 |
Drosophila nasuta | NCBI GenBank | GCA_023558535.1 |
Drosophila albomicans | NCBI RefSeq | GCF_009650485.2 |
Drosophila tripunctata | NCBI GenBank | PENDING |
Drosophila cardini | NCBI GenBank | GCA_018903735.1 |
Drosophila parthenogenetica | NCBI GenBank | PENDING |
Drosophila acutilabella | NCBI GenBank | PENDING |
Drosophila dunni | NCBI GenBank | GCA_018152125.1 |
Drosophila nigrodunni | NCBI GenBank | GCA_020829145.1 |
Drosophila arawakana | NCBI GenBank | GCA_018151165.1 |
Drosophila macrospina | NCBI GenBank | PENDING |
Drosophila funebris | NCBI GenBank | GCA_958295475.1 |
Drosophila putrida | NCBI GenBank | PENDING |
Drosophila neotestacea | NCBI GenBank | PENDING |
Drosophila testacea | NCBI GenBank | PENDING |
Drosophila histrio | NCBI GenBank | GCA_958299025.1 |
Drosophila kuntzei | NCBI GenBank | PENDING |
Drosophila phalerata | NCBI GenBank | GCA_951394115.1 |
Drosophila innubila | NCBI GenBank | GCA_004354385.2 |
Drosophila falleni | NCBI GenBank | PENDING |
Drosophila rellima | NCBI GenBank | PENDING |
Drosophila quinaria | NCBI GenBank | PENDING |
Drosophila suboccidentalis | NCBI GenBank | PENDING |
Drosophila subquinaria | NCBI GenBank | PENDING |
Drosophila recens | NCBI GenBank | PENDING |
Zygothrica flavofinira | NCBI GenBank | PENDING |
Hirtodrosophila trivittata | NCBI GenBank | PENDING |
Hirtodrosophila cameraria | NCBI GenBank | GCA_949708635.1 |
Hirtodrosophila histrioides | NCBI GenBank | PENDING |
Hirtodrosophila confusa | NCBI GenBank | PENDING |
Hirtodrosophila alboralis | NCBI GenBank | PENDING |
Drosophila lacertosa | NCBI GenBank | GCA_004143845.1 |
Drosophila colorata | NCBI GenBank | PENDING |
Drosophila robusta | NCBI GenBank | PENDING |
Drosophila sordidula | NCBI GenBank | PENDING |
Drosophila micromelanica | NCBI GenBank | GCA_004143825.1 |
Drosophila nigromelanica | NCBI GenBank | GCA_004149465.1 |
Drosophila melanica | NCBI GenBank | GCA_004143765.1 |
Drosophila paramelanica | NCBI GenBank | PENDING |
Drosophila lacicola | NCBI GenBank | PENDING |
Drosophila montana | NCBI GenBank | PENDING |
Drosophila borealis | NCBI GenBank | PENDING |
Drosophila littoralis | NCBI GenBank | GCA_018903485.1 |
Drosophila kanekoi | NCBI GenBank | PENDING |
Drosophila ezoana | NCBI GenBank | PENDING |
Drosophila virilis | NCBI RefSeq | GCF_003285735.1 |
Drosophila novamexicana | NCBI RefSeq | GCF_003285875.2 |
Drosophila americana | NCBI GenBank | GCA_030788265.1 |
Drosophila pseudotalamancana | NCBI GenBank | PENDING |
Drosophila nannoptera | NCBI GenBank | GCA_020883555.1 |
Drosophila pachea | NCBI GenBank | GCA_020883565.1 |
Drosophila gaucha | NCBI GenBank | PENDING |
Drosophila pegasa | NCBI GenBank | PENDING |
Drosophila mettleri | NCBI GenBank | PENDING |
Drosophila hydei | NCBI RefSeq | GCF_003285905.1 |
Drosophila eohydei | NCBI GenBank | PENDING |
Drosophila anceps | NCBI GenBank | PENDING |
Drosophila leonis | NCBI GenBank | PENDING |
Drosophila nigricruria | NCBI GenBank | PENDING |
Drosophila fulvimacula | NCBI GenBank | PENDING |
Drosophila peninsularis | NCBI GenBank | PENDING |
Drosophila repleta | NCBI GenBank | PENDING |
Drosophila paranaensis | NCBI GenBank | PENDING |
Drosophila mercatorum | NCBI GenBank | GCA_961210405.1 |
Drosophila meridionalis | NCBI GenBank | PENDING |
Drosophila meridiana | NCBI GenBank | PENDING |
Drosophila stalkeri | NCBI GenBank | PENDING |
Drosophila buzzatii | NCBI GenBank | PENDING |
Drosophila koepferae | NCBI GenBank | GCA_023375715.1 |
Drosophila antonietae | NCBI GenBank | GCA_030914385.1 |
Drosophila borborema | NCBI GenBank | GCA_030914335.1 |
Drosophila hamatofila | NCBI GenBank | PENDING |
Drosophila mayaguana | NCBI GenBank | PENDING |
Drosophila aldrichi | NCBI GenBank | PENDING |
Drosophila mulleri | NCBI GenBank | PENDING |
Drosophila mojavensis | NCBI RefSeq | GCF_018153725.1 |
Drosophila arizonae | NCBI RefSeq | GCF_001654025.1 |
Drosophila flavopinicola | NCBI GenBank | PENDING |
Drosophila maculinotata | NCBI GenBank | PENDING |
Scaptomyza hsui | NCBI GenBank | GCA_018152825.1 |
Scaptomyza graminum | NCBI GenBank | GCA_018901835.1 |
Scaptomyza polygonia | NCBI GenBank | PENDING |
Scaptomyza flava | NCBI GenBank | GCA_030179655.1 |
Scaptomyza montana | NCBI GenBank | GCA_018904305.1 |
Scaptomyza caliginosa | NCBI GenBank | PENDING |
Scaptomyza parva | NCBI GenBank | PENDING |
Scaptomyza pallida | NCBI GenBank | GCA_018152965.1 |
Scaptomyza reducta | NCBI GenBank | PENDING |
Scaptomyza tumidula | NCBI GenBank | PENDING |
Scaptomyza cyrtandrae | NCBI GenBank | PENDING |
Drosophila setosimentum | NCBI GenBank | PENDING |
Drosophila picticornis | NCBI GenBank | PENDING |
Drosophila anomalipes | NCBI GenBank | PENDING |
Drosophila quasianomalipes | NCBI GenBank | PENDING |
Drosophila cyrtoloma | NCBI GenBank | PENDING |
Drosophila melanocephala | NCBI GenBank | PENDING |
Drosophila differens | NCBI GenBank | PENDING |
Drosophila planitibia | NCBI GenBank | PENDING |
Drosophila silvestris | NCBI GenBank | PENDING |
Drosophila heteroneura | NCBI GenBank | PENDING |
Drosophila basisetae | NCBI GenBank | PENDING |
Drosophila paucipuncta | NCBI GenBank | PENDING |
Drosophila glabriapex | NCBI GenBank | PENDING |
Drosophila macrothrix | NCBI GenBank | PENDING |
Drosophila hawaiiensis | NCBI GenBank | PENDING |
Drosophila crucigera | NCBI GenBank | PENDING |
Drosophila pullipes | NCBI GenBank | PENDING |
Drosophila grimshawi | NCBI RefSeq | GCF_018153295.1 |
Drosophila engyochracea | NCBI GenBank | PENDING |
Drosophila villosipedis | NCBI GenBank | PENDING |
Drosophila ochracea | NCBI GenBank | PENDING |
Drosophila sproati | NCBI GenBank | GCA_018904355.1 |
Drosophila murphyi | NCBI GenBank | GCA_018904325.1 |
Drosophila dives | NCBI GenBank | PENDING |
Drosophila multiciliata | NCBI GenBank | PENDING |
Drosophila demipolita | NCBI GenBank | PENDING |
Drosophila longiperda | NCBI GenBank | PENDING |
Drosophila fungiperda | NCBI GenBank | PENDING |
Drosophila melanosoma | NCBI GenBank | PENDING |
Drosophila mimica | NCBI GenBank | PENDING |
Drosophila kambysellisi | NCBI GenBank | PENDING |
Drosophila infuscata | NCBI GenBank | PENDING |
Drosophila cognata | NCBI GenBank | PENDING |
Drosophila yooni | NCBI GenBank | PENDING |
Drosophila tanythrix | NCBI GenBank | PENDING |
Drosophila kokeensis | NCBI GenBank | PENDING |
Drosophila nrfundita | NCBI GenBank | PENDING |
Drosophila cracens | NCBI GenBank | PENDING |
Drosophila paracracens | NCBI GenBank | PENDING |
Drosophila imparisetae | NCBI GenBank | PENDING |
Drosophila nigritarsus | NCBI GenBank | PENDING |
Drosophila nrmedialis2 | NCBI GenBank | PENDING |
Drosophila nrmedialis3 | NCBI GenBank | PENDING |
Drosophila seclusa | NCBI GenBank | PENDING |
Drosophila kupee | NCBI GenBank | PENDING |
Drosophila kuia | NCBI GenBank | PENDING |
Drosophila atroscutellata | NCBI GenBank | PENDING |
Drosophila trichaetosa | NCBI GenBank | PENDING |
Drosophila neutralis | NCBI GenBank | PENDING |
Drosophila percnosoma | NCBI GenBank | PENDING |
Drosophila incognita | NCBI GenBank | PENDING |
Drosophila conformis | NCBI GenBank | PENDING |
Drosophila sordidapex | NCBI GenBank | PENDING |
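Since many accessions are still listed as PENDING, a quick tally can be useful when checking the table against GenBank. The following is a minimal sketch that assumes the table above has been saved as pipe-delimited text. The function name `accession_summary` and the file name `genomes.txt` are hypothetical. Rows whose third column is neither an assembly accession (GCA_ or GCF_) nor PENDING are counted as "other".

```shell
# Hypothetical helper: count table rows by accession status.
# Reads pipe-delimited rows (name | genome source | genome accession |)
# from the given files, or from standard input if no file is given.
accession_summary() {
    awk -F'|' '
        NR == 1 && $1 ~ /^name/ { next }        # skip the header row
        {
            acc = $3
            gsub(/^[ \t]+|[ \t]+$/, "", acc)    # trim surrounding blanks
            if (acc ~ /^GC[AF]_[0-9]+\.[0-9]+$/) assigned++
            else if (acc == "PENDING")           pending++
            else                                 other++
        }
        END { printf "assigned=%d pending=%d other=%d\n", assigned, pending, other }
    ' "$@"
}

# Example use:
#   accession_summary genomes.txt
```

Rows counted as "other" are the ones to check by hand, because their accession field is empty or holds something other than an accession.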