An increasing number of phylogenetic studies have reported discordances between nuclear and mitochondrial markers. These discrepancies are highly relevant to widely used biodiversity assessment approaches, such as DNA barcoding, that rely almost exclusively on mitochondrial markers. Although the theoretical causes of mito-nuclear discordances are well understood, it is often extremely challenging to determine the principal underlying factor in a given study system. In this study, we uncovered significant mito-nuclear discordances in a pair of sibling caddisfly species. Application of genome sequencing, ddRAD, and DNA barcoding revealed ongoing hybridization, as well as historical hybridization in Pleistocene refugia, leading us to identify introgression as the ultimate cause of the observed discordance pattern. Our novel genomic data, the discovery of a Europe-wide hybrid zone, and the availability of established techniques for laboratory breeding make this species pair an ideal model system for studying species boundaries with ongoing gene flow.
Geneset of the Sericostoma genome assembly
Genes were predicted using ab initio and homology-based approaches. Augustus was used for ab initio predictions. For the homology-based approach, protein datasets from related species (Acyrthosiphon pisum, Camponotus floridanus, Nasonia vitripennis, Bombyx mori, and Tribolium castaneum) were aligned to the draft genome assemblies using BLASTp (E-value cutoff: 1e-5), and gene models were generated by GeneWise. A consensus gene set that merged ab initio and homology-based predictions was created using GLEAN.
The three resulting data files are a table with summary information per predicted gene (Sericostoma_personatum_gapClosed.gene.gff), the predicted mRNA sequences (Sericostoma_personatum_gapClosed.gene.cds), and the predicted protein sequences (Sericostoma_personatum_gapClosed.gene.pep).
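As a starting point for inspecting the gene table, the following minimal Python sketch tallies feature types. It assumes the .gff file follows the common tab-separated, nine-column GFF layout with the feature type in the third column; this layout is not stated in the description and should be checked against the file. The function name count_gff_features and the example records are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_gff_features(lines: Iterable[str]) -> Counter:
    """Count records per feature type (column 3) in GFF-style text.

    Assumption: tab-separated lines with at least nine columns, as in
    the common GFF layout. Comment lines, blank lines, and lines with
    fewer columns are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration only; not taken from the dataset.
example = [
    "##gff-version 3",
    "scaffold1\tGLEAN\tmRNA\t100\t900\t.\t+\t.\tID=gene1;",
    "scaffold1\tGLEAN\tCDS\t100\t400\t.\t+\t0\tParent=gene1;",
    "scaffold1\tGLEAN\tCDS\t600\t900\t.\t+\t2\tParent=gene1;",
]
example_counts = count_gff_features(example)
print(dict(example_counts))
```

To run it on the real file, pass an open file handle, for example count_gff_features(open("Sericostoma_personatum_gapClosed.gene.gff")).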
Functional annotation of the Sericostoma genome assembly
Gene functions were assigned to the gene set using BLASTp (E-value cutoff: 1e-5) against KEGG (release 58), non-redundant protein sequences (NCBI release 20150222), Swiss-Prot, and TrEMBL (UniProt release 201203). Conserved protein domains were assessed by InterPro and InterProScan. Separate files are provided for each database.
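The E-value cutoff described above can be illustrated with a short Python sketch that keeps, for each query gene, the highest-scoring hit passing the threshold. It assumes the standard 12-column tabular BLAST output (E-value in column 11, bit score in column 12). The provided annotation files may use a different layout, and neither best-hit selection nor an inclusive cutoff is confirmed by the description. The function name best_hits and the example lines are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(
    lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for the best hit per query.

    Assumption: 12-column tabular BLAST output. Hits with an E-value
    above max_evalue are discarded; among the rest, the hit with the
    highest bit score is kept for each query.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for illustration only; not taken from the dataset.
example_hits = [
    "gene1\tP1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-30\t120",
    "gene1\tP2\t60.0\t150\t60\t2\t1\t150\t5\t155\t1e-8\t55",
    "gene2\tP3\t40.0\t50\t30\t1\t1\t50\t1\t50\t0.01\t25",
]
annotation = best_hits(example_hits)
print(annotation)
```

In this example gene2 receives no annotation because its only hit exceeds the cutoff.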
Nuclear RNA of the Sericostoma genome assembly
Nuclear RNA sequences were identified in the draft genome assembly. tRNA genes were identified using tRNAscan-SE with default parameters. miRNA and snRNA were identified using the INFERNAL software by searching against the Rfam database (release 9.1) with default parameters. rRNA genes were identified by BLASTn (E-value cutoff: 1e-5) searches against conserved invertebrate rRNA sequences. Separate files are provided for each type of RNA.