Generation of a de novo low-redundancy transcriptome (aepLRv2) for Hydra vulgaris strain AEP and reference annotation

Three stranded transcriptomes were assembled using Trinity 2.1.1 with the following settings:

    Trinity --left readA.fastq --right readB.fastq --seqType fq --group_pairs_distance 452 --min_contig_length 200 --SS_lib_type RF

In the first two assemblies (A and B), error-corrected reads were assembled either with (A) or without (B) the flag --min_kmer_cov 2. In the third assembly (C), non-error-corrected reads were assembled with the flag --min_kmer_cov 2 after trimming the first 12 bp of every read to remove potential sequence bias (random hexamer bias). Assembly A comprised 83,512 Trinity transcripts, assembly B 92,474, and assembly C 109,749.

Because we considered only uniquely mapped reads in downstream expression analyses, the three assemblies were subsequently processed with the EvidentialGene script tr2aacds.pl (v2017.12.21; http://arthropods.eugenes.org/EvidentialGene/trassembly.html) to reduce redundancy:

1) We ran trformat.pl to regularize the IDs in the FASTA files, ensure unique IDs, and add prefixes for parameter sets.

    trformat.pl -output hydra_aep.tr -input A_assemE_alt2_Trinity.fasta B_assemE_Trinity.fasta C_assemE_alt1_Trinity.fasta

2) We ran tr2aacds.pl.

    tr2aacds.pl -NCPU=16 -logfile -MINCDS=20 -mrnaseq hydra_aep.tr

Transcripts in the file .okay.tr were searched with BLAST against the Hydra mitochondrial genome, and three sequences were removed from the reference. The final reference (aepLRv2) comprises 38,749 sequences with an assembly N50 of 1.54 kb. Using BUSCO_v1.1b1.py, we found a duplicate ratio of 15.3% (complete duplicated BUSCOs / total BUSCO groups searched), reduced from 46.6% in one of the starting transcriptomes. Because EvidentialGene introduces long transcript IDs, we simplified the naming scheme to tXXXXaep, numbering the transcripts in the reference from 1 through 38,749.
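The ID simplification and the N50 statistic described above can be illustrated with a minimal Python sketch. This is not the script used to build aepLRv2. The function names (read_fasta, rename_records, n50), the toy headers, and the choice of unpadded numbers (t1aep, t2aep, ...) are assumptions made for illustration only.

```python
import io
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rename_records(
    records: Iterable[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Number records in input order as t1aep, t2aep, ... (assumed: no zero padding).

    Returns the renamed records and a map from the old ID (first word of
    the old header) to the new ID, so the original names stay traceable.
    """
    renamed, id_map = [], {}
    for n, (header, seq) in enumerate(records, start=1):
        new_id = f"t{n}aep"
        id_map[header.split()[0]] = new_id
        renamed.append((new_id, seq))
    return renamed, id_map


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy input with made-up long headers; real EvidentialGene IDs look different.
    toy = io.StringIO(
        ">hypotheticalLongId_A_1 type=mRNA\nACGTACGT\n"
        ">hypotheticalLongId_B_7 type=mRNA\nACGTAC\n"
        ">hypotheticalLongId_C_3 type=mRNA\nACGT\n"
    )
    renamed, id_map = rename_records(read_fasta(toy))
    for new_id, seq in renamed:
        print(f">{new_id}\n{seq}")
    print("N50:", n50(len(seq) for _, seq in renamed))
```

Keeping the old-to-new ID map alongside the renamed FASTA makes it possible to trace any tXXXXaep entry back to its source assembly.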
Trinity_assemblies/
  - three stranded Trinity transcriptomes that were processed using EvidentialGene

aepLRv2/
  - aepLRv2.fasta: low-redundancy transcriptome with tXXXXaep IDs
  - aepLRv2_SP.fasta: low-redundancy transcriptome with the tXXXXaep ID and the SwissProt ID. This naming scheme is used throughout our analysis.

aepLRv2_annotation/
  We provide Pfam results from a HMMER search (HMMER suite 3.1b2, February 2015, http://www.hmmer.org/; Pfam v31.0 database; E-value cut-off 1e-6) and BLAST results from searches against the nr and SwissProt databases (E-value cut-off 1e-5 for both).
  - aepLRv2_annot.tsv: file with tab-separated values
  - aepLRv2_annot.xcl: Excel file