### Overview

This section details the steps used to produce a raw custom transposable element (TE) library for a bird genome.

### Required Input

This step requires a pseudohaploid genome assembly, which is archived on NCBI (PRJNA655929) and provided in the Dryad repository as "Willisornis_vidua_nigrigula_JTW1144_700mill.fasta.gz". It also requires the TE-free bird protein databases produced in the previous step (2.0_Reference_protein_databases.md).

### Output

The output is a raw transposable element library that can be used to repeat-mask the genome. Before it is analyzed on its own, though, it should be curated. The raw library is provided in the Dryad repository as "Willisornis_uncurated_TE_library.fasta".

# Pipeline

# Repeat Library Construction

For a genome annotation project to work well, the genome needs to be repeat-masked, which involves identifying transposable elements and other repetitive elements and removing or hiding them before running gene prediction programs. This is because the repetitive elements can make gene identification slower and less accurate, as they often contain genes or gene-like sequences. This protocol includes steps to build a repetitive element library which can be used by RepeatMasker to identify and mask repetitive elements in the genome. This is important for non-model organisms in which transposable elements have not yet been studied.

This protocol roughly follows the tutorial on the Maker2 wiki [here](http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced), with more details [here](http://i5k.github.io/webinar_slides/i5k_webinar_Jiang-03-07-2018.pdf), although it does not follow it exactly.

## Setup

First, set up some folders for this project.

```bash
mkdir -p Willisornis
cd Willisornis
```

We will go through several structure-based programs for identifying transposable elements specific to our organism using their similarity to known TE structures.
Finally, we will use a de novo approach with RepeatModeler to identify repetitive elements in the genome.

In the end, the headers of the RepeatMasker-bound files should be formatted [like this:](https://www.animalgenome.org/bioinfo/resources/manuals/RepeatMasker.html)

`>repeatname#class/subclass`
or
`>repeatname#class`

## MITEs

There are several programs to detect MITEs (Miniature Inverted repeat Transposable Elements), including [MITE-hunter](http://target.iplantcollaborative.org/mite_hunter.html) from 2010, which has been widely used, as well as some newer tools like [DetectMITE](https://www.nature.com/articles/srep19688) (which requires MATLAB) and [MITE Tracker](https://github.com/INTABiotechMJ/MITE-Tracker), which may be more efficient for large genomes. We will try MITE Tracker first, and then use MITE-Hunter.

```bash
#setup folder for output
cd Willisornis
mkdir MITE
mkdir MITE/MITE-Tracker

#setup the environment for MITE-tracker
cd ~/tools/MITE-Tracker #need to be in the MITE-Tracker folder to run it
export PATH=$PATH:~/tools/ncbi-blast-2.9.0+/bin #add blast+ to your path so the executables can be found
virtualenv -p python3 venv #create a python environment
source venv/bin/activate #activate python virtual environment

#run MITE-tracker
python3 -m MITETracker -g ~/Genomes/Willisornis_poecilinotus_JTW1144v2_2018March1/FASTAOUPUT/Willisornis_poecilinotus_JTW1144_FASTAOUTPUT.fasta -w 21 -j Willisornis
mv ./results/Willisornis ~/Willisornis/MITE/MITE-Tracker
```

* -g is the path to the genome
* -j is the job name
* -w is the number of processes to run simultaneously. I initially ran it with 28, but it became stuck for a few days after reaching 66% complete, using 100% of memory until I ended it. I then reran it with `-w 21` and it completed in 13 hours.

We can compare these results with MITE-Hunter, another program which is a bit older but has been widely used for MITE detection.

NOTE: MITE-Hunter can change the original sequence file!
It may be a good idea to save an archived copy of your genome, and then replace the used genome sequence with a fresh copy after you have run MITE-Hunter.

```bash
#setup
mkdir ~/Willisornis/MITE/MITE-Hunter
cd ~/Willisornis/MITE/MITE-Hunter
cp ~/Willisornis/Genome/Willisornis_poecilinotus_JTW1144_FASTAOUTPUT.fasta ./WillisornisMITE_genome.fasta #create a working copy of the genome to preserve the original

#run MITE-Hunter
perl ~/tools/MITE-hunter/MITE_Hunter_manager.pl -i ./WillisornisMITE_genome.fasta -g WillisornisMiteHunt -c 21 -n 21 -P 1 -S 12345678
```

* -i: input FASTA file of genome
* -g: run name, used as the prefix for the names of output files
* -c: number of computer cores to use
* -P: proportion of the genome that MITE-Hunter will search; can be reduced to save time
* -S: the steps of MITE-Hunter that you want to run. There are 8 steps, which you can do all at once or separately. If there is an error partway through, you can restart where you left off. For example, if it failed in step 3 but you have name_step2.fa complete, then you can use -S 345678 to do the remaining 6 steps without redoing the first two.

Other options include -w 1000 (max 1000 bp length, default 2000), -L 80 (sequences with 80% similarity will be grouped together, default 90), -m 1 (min copy number, default 3), and -l 2 (max unmatched bp in TIR, default 1).

### Results

| Program | Runtime | MITE families | MITEs | MITE candidates |
|------|--------|------|-------|-------|
| MITE-tracker | 13.63 hours | 9 | 38 | 968,622 |
| MITE-Hunter | >24 hours | 8 | 37 | NA |

Output of MITE-Tracker can be opened in Mesquite as a FASTA file. First delete the leading "------" line from the beginning of the file, as this will not be interpreted correctly by Mesquite.
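The dashed line can also be removed from the command line rather than by hand. The following is a minimal sketch, assuming the unwanted line consists only of dashes (optionally preceded by `>`). The function name `strip_dash_lines` and the demo file are hypothetical and are not part of MITE-Tracker.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MITE-Tracker): print a FASTA file without
# lines that consist only of dashes, optionally preceded by ">".
# Assumption: no real sequence line is made up entirely of "-" characters.
strip_dash_lines() {
  grep -Ev '^>?-+[[:space:]]*$' "$1"
}

# Demo on a small made-up file so the sketch runs anywhere.
demo=$(mktemp)
printf '>----------\n>family_1\nACGTACGT\n' > "$demo"
strip_dash_lines "$demo"   # prints the family_1 record only
rm -f "$demo"
```

On the real output this could be run as `strip_dash_lines families.fasta > families_edit.fasta` from inside `MITE-Tracker/Willisornis`.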
Then within Mesquite, go File -> Open File -> (select the .fasta file) -> FASTA (DNA/RNA) -> (name the nexus file) -> Save

Remove this line from MITE-Tracker/Willisornis/families.fasta and save the result as families_edit.fasta:

`>----------`

Now, create a library which will be used later to mask our genome sequence before gene annotation.

```bash
cd ~/Willisornis/MITE/MITE-Hunter
cat ./WillisornisMiteHunt_Step8_*.fa > MITE.lib #combine all MITE families into a single fasta file
cd ..
cat ./MITE-Hunter/MITE.lib MITE-Tracker/Willisornis/families_edit.fasta > WillisornisMITEs.fa
```

We could check which sequences from the two programs match each other with BLAST:

```bash
~/tools/ncbi-blast-2.9.0+/bin/makeblastdb -dbtype nucl -in MITE-Hunter/MITE.lib -out MITE.lib
~/tools/ncbi-blast-2.9.0+/bin/blastn -subject MITE-Hunter/MITE.lib -query MITE-Tracker/Willisornis/families.fasta -out MITE_methodscompare.out
```

They do not correspond perfectly; the two methods came up with slightly different consensus sequences.

### Format for RepeatMasker

Next, we use scripts from the EDTA package to filter and format the MITE results. To rename the sequences so that the file is RepeatMasker-readable, use this script from the EDTA package (it does not work if pasted directly into the terminal; it must be made into a script file). Type this command, copy-paste the script that follows it into the terminal, and then press ctrl-d to finish.
```bash
cat > format_name.pl #(ctrl-d to finish)
```

>#!/usr/bin/perl -w
`perl -i -nle \'s/MITEhunter//; print \$_ and next unless /^>/; my \$id = (split)[0]; print \"\${id}#MITE/unknown\"\' WillisornisMITEs.fa`;

```bash
perl format_name.pl #formats names to be RepeatMasker-readable
perl ~/tools/EDTA/util/rename_TE.pl WillisornisMITEs.fa > Willisornis.MITE.raw.fa.renamed #formats names to be RepeatMasker-readable
perl ~/tools/EDTA/util/cleanup_tandem.pl -misschar N -nc 50000 -nr 0.9 -minlen 80 -minscore 3000 -trf 1 -cleanN 1 -cleanT 1 -f Willisornis.MITE.raw.fa.renamed > Willisornis.MITE.fa
```

cleanup_tandem removed nothing, so there was nothing that needed to be cleaned up.

Result: 75 candidate MITEs

## Helitrons

Helitrons can be identified by HelitronScanner, which uses training sets of known helitron TEs to locate TEs in a genome. HelitronScanner runs as a series of separate commands to identify helitron heads and tails. The head scanner took around an hour with 23 threads; the tail scanner took 5 hours on 1 thread. I ran it separately on the draft assembly, but when I got the improved assembly I reran it as part of the EDTA package.

## Long Terminal Repeat Retrotransposons

There are a couple of pipelines available. [LTR_retriever](http://www.plantphysiol.org/content/176/2/1410#F2) ([manual](https://github.com/oushujun/LTR_retriever/blob/master/Manual.pdf)) looks excellent but was designed with plants in mind. [LocaTR](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3043-1) was designed for birds and so the results can be more directly compared there.

LTR Retriever takes the output from LTR Finder and LTR Harvest, filters it, and produces a library of LTR retrotransposons. I ran LTR_Retriever on my old genome assembly, but now I am rerunning it as part of the EDTA package with my new assembly.

LTR_Retriever requires 4 databases:

* DNA Tpases: default contains *Gallus* and *Taeniopygia* so we can keep the defaults for those.
* LINE Tpases: default contains *Gallus* and *Taeniopygia* so we can keep the defaults for those.
* Proteins: made from plants, not as appropriate for us (so I made a bird one; the process is described on the proteomes page)
* TEfam.hmm: made from rice, could make a bird one

## TIRs

TIRs are the larger, functional version of MITEs that can be identified by their terminal inverted repeat structure. The program TIR-learner is run from within the EDTA pipeline and is the slowest step, so we will give it the most threads.

# EDTA Pipeline

This is an excellent pipeline stringing together several current structural TE identification programs. Previously, I ran all these programs on the old version of my Willisornis genome. Now, with this new program, you can run them all in one go, plus it has some nice scripts to further filter your library for you! It is designed for plants, so the TEs are more planty (we are expecting mostly CR1 in our birds, and should not find many Helitrons), but we can provide bird protein databases to make it a bit more customized.

To customize slightly for birds, I am making these changes before running:

1) The program LTR_Retriever uses a plant protein library, which is not as useful for birds. Instead, I will replace it with the TE-free model bird protein database I made earlier in the proteome part of the pipeline (`~/Genetic_resources/RefAves_proteins_no_tes.fa`).
    * Edit the EDTA script line `perl $LTR_retriever -genome $genome -inharvest $genome.rawLTR.scn -threads $threads -noanno` to be `perl $LTR_retriever -genome $genome -inharvest $genome.rawLTR.scn -threads $threads -noanno -plantprolib /home/0_PROGRAMS/EDTA/bin/LTR_retriever/database/RefAves_proteins_no_tes.fa` so that it uses the bird protein database instead of the plant one.
```bash
cp ~/Genetic_resources/RefAves_proteins_no_tes.fa /home/0_PROGRAMS/EDTA/bin/LTR_retriever/database
nano /home/0_PROGRAMS/EDTA/EDTA_raw.pl #edit the LTR_retriever line (#155)
```

We also need a CDS library to filter out proteins. I made this earlier, but in the end it did not do anything in the pipeline.

Note that when I ran HelitronScanner before, I had issues with the FASTA headers: it does not expect any spaces in them. To fix this, I use a quick sed command (below, before running EDTA) that removes everything after the first space in each FASTA header. This of course expects that there is some unique number or word before the first space in the headers! This works fine for the output of Supernova, keeping only the number associated with the scaffold (like this `>123`) and removing extraneous info.

At the time, I found I needed two different conda environments to run EDTA, because TEsorter needed a python2 environment, unlike the rest of the pipeline, so we need to go in and change the code.
You can make them like this. After messaging the creator of EDTA, I learned that it can actually be done with a single conda environment, so I would not do it this way again, but here is what I did anyway:

```bash
conda create -n EDTA
conda activate EDTA
conda config --env --add channels anaconda --add channels conda-forge --add channels biocore --add channels bioconda --add channels cyclus
conda install -n EDTA -y cd-hit repeatmodeler muscle mdust repeatmasker=4.0.9_p2 blast-legacy java-jdk perl perl-text-soundex multiprocess regex tensorflow=1.14.0 keras=2.2.4 scikit-learn=0.19.0 biopython pandas glob2 python=3.6 trf
conda deactivate

conda create -n TEsorter python=2.7
conda activate TEsorter
conda install blast pp biopython drmaa
conda install -c biocore hmmer
conda install -c anaconda drmaa

#install DRMAA properly
#sudo -i
#apt install gridengine-drmaa-dev
```

```bash
nano /home/0_PROGRAMS/EDTA/util/cleanup_TE.pl
```

At line 36, before the call to python 2, I added `conda deactivate` and `conda activate TEsorter` so that it switches to a python2 environment, and afterwards it deactivates that environment and switches back with `conda activate EDTA`. Again, I would not do it this way again, but it is what I did and it worked.

```bash
`conda deactivate`;
`conda activate TEsorter`;
`python2 $TEsorter $cds -p $threads`;
`conda deactivate`;
`conda activate EDTA`;
```

### Run EDTA

```bash
mkdir /home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/EDTA
cd /home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/EDTA
cp /home/else/Genetic_resources/ref_aves_CDS.lib .
cp /hhome/else/Genetic_resources/RefAves_proteins_no_tes.fa .
#place a copy of your genome in this folder.
#Mine is called Willisornis700.fasta
sed 's/\s.*//g' Willisornis700.fasta > temp ; mv temp Willisornis700.fasta #rename the genome and simplify the chromosome names for the following script to work

#run EDTA
conda activate EDTA
time perl /home/0_PROGRAMS/EDTA/EDTA_raw.pl -genome Willisornis700.fasta -species others -threads 8 -type tir #should have used 16 threads
time perl /home/0_PROGRAMS/EDTA/EDTA_raw.pl -genome Willisornis700.fasta -species others -threads 8 -type helitron #should have used 6 threads
time perl /home/0_PROGRAMS/EDTA/EDTA_raw.pl -genome Willisornis700.fasta -species others -threads 8 -type ltr #should have used 2 threads
#perl /home/0_PROGRAMS/EDTA/EDTA_raw.pl -genome Willisornis700.fasta -species others -threads 8 -type mite #not implemented yet, maybe in development?

export DRMAA_LIBRARY_PATH=/usr/lib/gridengine-drmaa/lib/libdrmaa.so.1.0
time perl /home/0_PROGRAMS/EDTA/EDTA.pl -overwrite 0 -genome Willisornis700.fasta -species others -cds ref_aves_CDS.lib -protlib /home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/EDTA/RefAves_proteins_no_tes.fa -sensitive 0 -threads 23

perl /home/0_PROGRAMS/EDTA/util/output_by_list.pl 1 ref_aves_CDS.lib.mod 1 ref_aves_CDS.lib.mod.TE.list -FA -ex > ref_aves_CDS.lib.rmTE2
cds=ref_aves_CDS.lib.mod #assumed value of $cds inside EDTA.pl, inferred from the file names in the log output
RepeatMasker -pa 23 -q -no_is -norna -nolow -div 40 -lib $cds.TE -cutoff 225 $cds.rmTE.code
perl /home/0_PROGRAMS/EDTA/util/cleanup_tandem.pl -Nscreen 1 -nc 300 -nr 0.3 -minlen 300 -maxlen 300000 -cleanN 1 -cleanT 0 -trf 0 -f $cds.rmTE.code.masked > $cds.noTE
```

* `-genome`: [File] The genome FASTA
* `-species`: [Rice|Maize|others] Specify the species for identification of TIR candidates. Default: others
* `-step`: "[all|filter|final|anno] Specify which steps you want to run EDTA. all: run the entire pipeline (default). filter: start from raw TEs to the end. final: start from filtered TEs to finalizing the run. anno: perform whole-genome annotation/analysis after TE library construction."
* `-cds`: [File] Provide a FASTA file containing the coding sequence (no introns, UTRs, nor TEs) of this genome or its close relative.
* `-sensitive`: "Use RepeatModeler to identify remaining TEs (1) or not (0, default). This step is very slow and MAY help to recover some TEs."
* `-threads`: "Number of threads to run this script (default: 4)"
* `-overwrite`: "[0|1] If previous results are found, decide to overwrite (1, rerun) or not (0, default)."

**Timing**: getting the raw LTR candidates took 49m51s on 8 threads (81m16s user time, 8m3s sys time). The raw Helitron candidates took 605m16s on 8 threads but did not use all the threads, only 1-4 of them (621m34s user time, 4m2s sys time). The raw TIR candidates took 1425m38s on 8 threads (8438m38s user time, 181m39s sys time). Considering that the TIR search took much longer and benefited most from multithreading (user time far exceeded real time), in the future I would give 2 threads to LTR searching, 6 threads to Helitron searching, and 16 threads to TIR searching.

step|threads|Real time|User time|Number of TEs|
---|---|---|---|---|
Raw Helitron|8|605m16s|521m34s|146|
Raw TIR|8|1425m38s|8438m38s|5108|
Raw LTR|8|49m51s|81m16s|157|
Quality Control|23|79m27s|1334m50s|5278|

**What is happening inside**: First, it runs TIR Learner using maxint = 5000, reformats the names with a perl script, filters the candidates based on their flanking sequences, removes simple repeats, and filters out likely misclassified TIRs. Then, it runs HelitronScanner to identify Helitrons. These are also renamed, filtered based on their flanking sequences, cleaned of simple repeats, and screened for likely misclassified TIRs. They are also filtered to remove any with a score below 12.
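The thread allocation suggested in the timing notes can be checked by dividing user time by real time, which roughly estimates how many cores a step kept busy. The sketch below is an illustration under stated assumptions, not part of the original run: `to_seconds` and `effective_threads` are hypothetical helper names, the inputs are the `time`-style values from the timing paragraph, and the Helitron user time is the figure given in the prose (621m34s).

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a `time`-style duration (e.g. 1425m38s or
# 79m27.376s) to seconds.
to_seconds() {
  local t=$1 min sec
  min=${t%%m*}   # text before the first "m"
  sec=${t#*m}    # text after the first "m"
  sec=${sec%s}   # drop a trailing "s" if present
  awk -v m="$min" -v s="${sec:-0}" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Hypothetical helper: user time divided by real time, i.e. the approximate
# number of cores kept busy on average.
effective_threads() {
  local real user
  real=$(to_seconds "$1")
  user=$(to_seconds "$2")
  awk -v r="$real" -v u="$user" 'BEGIN { printf "%.2f\n", u / r }'
}

effective_threads 49m51s 81m16s       # raw LTR      -> 1.63
effective_threads 605m16s 621m34s     # raw Helitron -> 1.03
effective_threads 1425m38s 8438m38s   # raw TIR      -> 5.92
```

With 8 threads requested, only the TIR step kept most of them busy, which is consistent with giving it the largest share of threads in a future run.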
### Output

The final library is in **/home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/EDTA/Willisornis700.fasta.EDTA.TElib.fa**

```
########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.6.3 ####
##### Shujun Ou (shujun.ou.1@gmail.com) ####
########################################################

Sat Nov 30 18:00:29 UTC 2019 Dependency checking:
    All passed!
    A CDS file is provided via -cds. Please make sure there is no TE-related sequences in this file.
Sat Nov 30 18:00:42 UTC 2019 Obtain raw TE libraries using various structure-based programs:
Sat Nov 30 18:00:42 UTC 2019 EDTA_raw: Check files and dependencies, prepare working directories.
Sat Nov 30 18:00:42 UTC 2019 Start to find LTR candidates.
Sat Nov 30 18:00:42 UTC 2019 Existing result file Willisornis700.fasta.LTRlib.fa found! Will keep this file without rerunning this module.
    Please specify -overwrite 1 if you want to rerun this module.
Sat Nov 30 18:00:42 UTC 2019 Finish finding LTR candidates.
Sat Nov 30 18:00:42 UTC 2019 Start to find TIR candidates.
Sat Nov 30 18:00:42 UTC 2019 Existing result file Willisornis700.fasta.TIR.raw.fa found! Will keep this file without rerunning this module.
    Please specify -overwrite 1 if you want to rerun this module.
Finish finding TIR candidates.
Sat Nov 30 18:00:42 UTC 2019 Start to find Helitron candidates.
Sat Nov 30 18:00:42 UTC 2019 Existing result file Willisornis700.fasta.Helitron.raw.fa found! Will keep this file without rerunning this module.
    Please specify -overwrite 1 if you want to rerun this module.
Sat Nov 30 18:00:42 UTC 2019 Finish finding Helitron candidates.
Sat Nov 30 18:00:42 UTC 2019 Execution of EDTA_raw.pl is finished!
Sat Nov 30 18:00:42 UTC 2019 Obtain raw TE libraries finished.
Sat Nov 30 18:00:42 UTC 2019 Perform EDTA advcance filtering for raw TE candidates and generate the stage 1 library:
Sat Nov 30 18:42:08 UTC 2019 EDTA advcance filtering finished.
Sat Nov 30 18:42:08 UTC 2019 Perform EDTA final steps to generate a non-redundant comprehensive TE library:
    Skipping the RepeatModeler step (-sensitive 0).
    Run EDTA.pl -step final -sensitive 1 if you want to use RepeatModeler.
Sat Nov 30 18:42:08 UTC 2019 Remove CDS in the EDTA library:

CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'.
To initialize your shell, run

    $ conda init

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

2019-11-30 18:42:13,262 -INFO- VARS: {'seq_type': 'nucl', 'min_coverage': 20, 'disable_pass2': False, 'tmp_dir': './tmp', 'processors': 23, 'sequence': 'ref_aves_CDS.lib.mod', 'no_library': False, 'p2_identity': 80.0, 'no_cleanup': False, 'force_write_hmmscan': False, 'p2_length': 80.0, 'prefix': 'ref_aves_CDS.lib.mod.rexdb', 'max_evalue': 0.001, 'p2_coverage': 80.0, 'pass2_rule': '80-80-80', 'hmm_database': 'rexdb', 'no_reverse': False}
2019-11-30 18:42:13,262 -INFO- checking dependencies:
2019-11-30 18:42:13,275 -INFO- hmmer 3.2.1 OK
2019-11-30 18:42:13,366 -INFO- blastn 2.9.0+ OK
2019-11-30 18:42:13,367 -INFO- Start classifying pipeline
2019-11-30 18:42:16,634 -INFO- total 173967 sequences
2019-11-30 18:42:16,635 -INFO- translating `ref_aves_CDS.lib.mod` in six frames
/hhome/else/.local/lib/python2.7/site-packages/Bio/Seq.py:2748: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation.
This may become an error in future.
  BiopythonWarning)
2019-11-30 18:49:48,488 -INFO- HMM scanning against `/home/0_PROGRAMS/EDTA/bin/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm`
2019-11-30 18:50:04,155 -INFO- Creating server instance (pp-1.6.5)
2019-11-30 18:50:04,155 -INFO- Running on Python 2.7.15+ linux2
2019-11-30 18:50:04,825 -INFO- pp local server started with 23 workers
2019-11-30 18:50:04,828 -INFO- Task 0 started
2019-11-30 18:50:04,829 -INFO- Task 1 started
2019-11-30 18:50:04,830 -INFO- Task 2 started
2019-11-30 18:50:04,831 -INFO- Task 3 started
2019-11-30 18:50:04,831 -INFO- Task 4 started
2019-11-30 18:50:04,832 -INFO- Task 5 started
2019-11-30 18:50:04,832 -INFO- Task 6 started
2019-11-30 18:50:04,833 -INFO- Task 7 started
2019-11-30 18:50:04,833 -INFO- Task 8 started
2019-11-30 18:50:04,834 -INFO- Task 9 started
2019-11-30 18:50:04,835 -INFO- Task 10 started
2019-11-30 18:50:04,836 -INFO- Task 11 started
2019-11-30 18:50:04,836 -INFO- Task 12 started
2019-11-30 18:50:04,837 -INFO- Task 13 started
2019-11-30 18:50:04,837 -INFO- Task 14 started
2019-11-30 18:50:04,838 -INFO- Task 15 started
2019-11-30 18:50:04,839 -INFO- Task 16 started
2019-11-30 18:50:04,839 -INFO- Task 17 started
2019-11-30 18:50:04,840 -INFO- Task 18 started
2019-11-30 18:50:04,841 -INFO- Task 19 started
2019-11-30 18:50:04,841 -INFO- Task 20 started
2019-11-30 18:50:04,842 -INFO- Task 21 started
2019-11-30 18:50:04,843 -INFO- Task 22 started
2019-11-30 18:59:58,418 -INFO- generating gene anntations
2019-11-30 19:00:22,681 -INFO- 1825 sequences classified by HMM
2019-11-30 19:00:22,681 -INFO- see protein domain sequences in `ref_aves_CDS.lib.mod.rexdb.dom.faa` and annotation gff3 file in `ref_aves_CDS.lib.mod.rexdb.dom.gff3`
2019-11-30 19:00:22,681 -INFO- classifying the unclassified sequences by searching against the classified ones
2019-11-30 19:00:34,810 -INFO- using the 80-80-80 rule
2019-11-30 19:00:34,810 -INFO- run CMD: `makeblastdb -in
./tmp/pass1_classified.fa -dbtype nucl`
2019-11-30 19:00:35,072 -INFO- run CMD: `blastn -query ./tmp/pass1_unclassified.fa -db ./tmp/pass1_classified.fa -out ./tmp/pass1_unclassified.fa.blastout -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp sstrand' -num_threads 23`
2019-11-30 19:01:59,182 -INFO- 1631 sequences classified in pass 2
2019-11-30 19:01:59,182 -INFO- total 3456 sequences classified.
2019-11-30 19:01:59,182 -INFO- see classified sequences in `ref_aves_CDS.lib.mod.rexdb.cls.tsv`
2019-11-30 19:01:59,182 -INFO- writing library for RepeatMasker in `ref_aves_CDS.lib.mod.rexdb.cls.lib`
2019-11-30 19:02:09,144 -INFO- writing classified protein domains in `ref_aves_CDS.lib.mod.rexdb.cls.pep`
2019-11-30 19:02:09,231 -INFO- Summary of classifications:
Order           Superfamily      # of Sequences  # of Clade Sequences  # of Clades  # of full Domains
LTR             Bel-Pao          80              0                     0            0
LTR             Copia            932             221                   16           0
LTR             Gypsy            1098            516                   14           0
LTR             Retrovirus       162             0                     0            0
LTR             mixture          15              0                     0            0
pararetrovirus  unknown          6               0                     0            0
DIRS            unknown          14              0                     0            0
Penelope        unknown          7               0                     0            0
LINE            unknown          275             0                     0            0
TIR             Kolobok          19              0                     0            0
TIR             MuDR_Mutator     33              0                     0            0
TIR             P                20              0                     0            0
TIR             PIF_Harbinger    15              0                     0            0
TIR             PiggyBac         1               0                     0            0
TIR             Tc1_Mariner      5               0                     0            0
TIR             hAT              95              0                     0            0
Helitron        unknown          5               0                     0            0
Maverick        unknown          669             0                     0            0
mixture         mixture          5               0                     0            0
2019-11-30 19:02:09,238 -INFO- Pipeline done.
2019-11-30 19:02:09,238 -INFO- cleaning the temporary directory ./tmp

CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'.
To initialize your shell, run

    $ conda init

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

Warning...unknown stuff < >
Input file "ref_aves_CDS.lib.mod.rmTE.code.masked" not found!

Usage: perl cleanup_tandem.pl -f sample.fa [options] > sample.cln.fa
Options:
    -misschar [n|l] Define the letter representing unknown sequences; default: n. l: recognize lower case letters
    -Nscreen [0|1] Enable (1) or disable (0) the -nc parameter; default: 1
    -nc [int] Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
    -nr [0-1] Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
    -minlen [int] Minimum sequence length filter after clean up; default: 100 (bp)
    -maxlen [int] Maximum sequence length filter after clean up; default: 25000 (bp)
    -cleanN [0|1] Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
    -cleanT [0|1] Remove entire seq. if any terminal seq (20bp) has 15bp of N (1); disabled by default (0).
    -minrm [int] The minimum length of -misschar to be removed if -cleanN 1; default: 1.
    -trf [0|1] Enable (1) or disable (0) tandem repeat finder (trf); default: 1
    -trf_path path Path to the trf program

Warning: No CDS left after clean up (ref_aves_CDS.lib.mod.noTE.mod.noTE empty). Will not clean CDS in the raw lib.
Sat Nov 30 19:19:56 UTC 2019 EDTA final stage finished! Check out the final EDTA TE library: Willisornis700.fasta.EDTA.TElib.fa

real 79m27.376s
user 1334m49.987s
sys 80m28.839s
```

## Galluhop

The element galluhop has been characterized in several species in this [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5556988/), which gives the consensus sequences of the element in its Additional file 3. Is this element in our genome too? We will search for it in our genome using BLASTn.
To use cat to create the file, use this code and then copy paste the sequences below. Press ctr-d when you are done to finish writing the file. ```bash cd ~/Willisornis/Class_II cat > galluhop_consensus_Bertocchietal.fasta #(ctrl-d to finish) ``` ### >galluhop_Ggal tgagggtgctctgaaagtaatgcctcctattttattatgttggcccacaacatcagaggcagatgttggtggtatggcagtagagttgaaccttcccaccaatattccattacattttgttgctgtgtgacagatggcagcagaggggcagtctgacaaaatggcgtctgacatggaagtgcatatgaagcaaaggtgtgtcactgaattcctccatgtggaaaaaatggcacccactgacattcattgatgcttgctgaacgtttatggagaccaaacagtggatgtgagcacagtgaggcggtgggtggtgcatttcagcagtggtgacagtgacatgaaagacaagccatgttctggatggcctgcacagctgtcacaccacaaaatgaagagcatctcaatcagctcatccatgcaaatcagcagattatgaccagggaactgtgtacagagctgaatatcagcttcaatgcattggaaataatggtggcaatgttggaatatcacaaagtttgcaccaggtgggtcccacaaatgctcacacaggaacagaaagaacaccatatgcaagtttgtcaggacctattgaaccaatatgaggctgaaggtgacagtttcctggatcacatcattacagtgatgagatgtggtgtcaccactatgagccagagtcaaaacagcagtccatggagtggtgacatgtgaattccccatcaagaaaaagttcaagatgcagccctcagtgggtaaagtgatgtgcactgtcttttgggataggaaaggggtgatccttctggatttcctggaacccgacaaaccatcaactctgaccactacatcacgatgctgactaagctgaaggctcaaacttccagagtcaggccagagaagaagacaacctttctcttgcaacacgataacgccaggccccataccagtttgaagaccatggagcacattgccaatcttggctggactgtcctaccacacccactgtatagtctggatttggaccttctgacttccatctgtttgggctgatgaaagatggactgcatgggcaacattttcctagcaataatgccatcatagcagctgtgaaacagtgggtcacctccactggtgcagatttttatgagcatagcatgcaggctcttgttcatcgctggtgaaaatgcatagctaatggtggtgactatgttgaaaaatagtgttttgtagctgagaatttgctctatcaaatagtgttattgtgctctttgtatctggtagtttccatggaaataaataggaggcattactttcagagcaacctacata >galluhop_Mgal 
tgagggctgctctgaaagtaatgcctcctattttattatgttggcccacaacatcagaggcagatgttggtggtatggcagtagaggtgaaccttcccaccaatattccattacattttgttgctgtgtgacagatggcagcagaggggcagtctgacaaaatggcatctgacatggaagtgcatatgaagcaaaggtgtgtcactgaattcctccatgtggaaaaaaatggcacccactgacattcattgatgcttgctgaatgtttaggagaccaaacagtggatgtgagcacagtgaggtggtgggtggtgcatttcagcagtggtgacagtgacatgaaagacaagccacgttccagatggccatgcacagctgtcacaccacaaaatgaagagcatctcaatcagctcatccacgcaaatcagcagattatgaccagggaactgtgtacggagctgaatatcagcttcaatgcattggaaacaatggtggcaatgttggaatatcacaaagtttgtaccaggtgggtcccacaaatgctcacacaggaacagaaagaacaccatatgcaagtttgtcaggacctattgaaccaatatgaggctgaagggacagtttcctggatcacatcattaccagtgatgagatgtggtgtcaccactatgagctggagtcaaaacagcagtccatggagtggtgacatgtgaattccccattgaagaaaaagttcaagatgcagccctcagtgggtaaagtgatgtgcactgtcttttgggataggaaaggggtgatccttctggatttcctggaacccagacaaaccatcaactctgaccactacatcacaatgctgactaagctgaaggctcaaacttccagagtcaggccagagaagaagacaacctttctcttgcaacacaataataccaggccccataccagtttgaagaccatggagcacattgccaatcttggctggactgtcctaccacacccactatatagtccagatttggcaccttctgacttccatctgtttgggccaatgaaagatggactgcatgggcaacattttcctagcaataatgccatcatagcagctgtgtgaaacagtgggtcacctccactggtgcagatttttatgagcatagcatgcaggctcttgttcattgctggtgaaaatgatagctaatggtggtgactatgttgaaaaatagtgttttgtagctgagaatttgctctatcaaatagtgttattgtgctctttgtatctgttgtagtttccatggaaataaataggagcattactttcagagcaacctaac >galluhop_Ltet 
agtaatgcctcctattttattatgttggcccacaattcagaggcagatgttggtggtatggcagtagaggttgaaccttcccaccaatattccattacattttgttgccgtgtgacagatggcagcagagggggcagtctgacaaaatggcgtctgacatggaagtgcgtatgaagcaaaggtgtgtcactgaattcctccatgtggaaaaaatggcacccactgacattcatcaacacttgctgaacatttatggagaccaaacagtggatgtgagcacagtgaggcagtgggtggtgcatttcagcagtggtgacagtgacatgaaagacaagccacgttccagatggccatgcacagctgtcacaccacaaaatgaagagcatctcaatcagctcatccatgcaaatcagcggattatgaccagggaactgttacgagctgaatatggcttcagtgcattggaaatgatggtgacaatgttggaatatagcaagtttagccagatgggtcccacaaatgctcacacaggaacagaaagaacactgtatgcaagtttgtcaggacctattgaaccaatatgaggctgaaggtgacagtttcctggatcaccatcattaccagtgatgagagtggtgtcaccactatagcggagtcaaaacagcagtccatggagtggcaacatgtgaattccccattgaagaaaaagttcaagatacagctctcagtgggtaaagtgatgtgcactgtcttttgggataggaaaggggtgatccttctggatttcctggaacccagacaaaccatcaactctgacgctacatgcatgctgactaagctgaaggctcaaacttccagagtcaggccagagaagaagacaacctttctcttgcaacacaataacaccaggccccataccagtttgaagaccatggagcacattgccaatcttggctggactgtcctaccacacccaccatatagtccgatttggcccttctgacttccatctgtttgggccaatgaaagatggactgcatgggcaacattttcctagcaacaataccatcatagcagctgtgaaacagtgggtcacctcgctggtgcagatttttatgagccagcatgcaggctcttgttcattgctggcaaaaatgcatagctaatggtggtgactatgttgaaaaatagtgttttgtagctgagaatttgctctatcaaatagtgttattgtgctctttgtatctgttgtagtttccatggaaataaataggaggcat >gallohop_Cvir 
taagggctgctccaaaagtaatgcctcctattttattatgttggcccacaacatcagaggcagatgttggtggtatggcagtagaggttgaaccttcccaccaatattccattacattttgttgctgtgtgacagatggcagcagaggggcagtctgacaaaatgggtgtctgacatggaagtgtgtatgaagcaaaggtgtgtcactgattcctccatgtggggaaaaaaattggcacccactgacattcatcaatgcttgctgaatgtttatggagaccaaacagtggatgtgagcacagtgaggcagtgggtggtgcatttcagcagtggtgacagcaatgtgaaagacaagccacgttctggatggccatgcacagctgtcacaccaaaaatgaagaggtctcatcagctcatccatacaatagcagattatgaccagggaactgtgtacgagctgaatatcagcttcaatgcattggaaatgatggtggcaacattggaatatacaaagtttgtgccaggtgggtcccacaaatgctcacacaggaacagaaagaacaccatatgcaagtttgtcaggacctattgaaccaatatgaggctgaaggtgacagtttcctggatcacatcattaccagtgataagacatggtgtcaccactacgagctgagtcaaaacggcagtccatggagtggcaacatgtgaattccccattgaagaaaaagttcaagatagcagccctcagcgggtaaagtgatgtgcactgtcttttgggataggaaaggggtgatccttctggatttcctggaacccagacaaaccatcaactctgaccgctacatcatgacactgactaagctgaaggctcaaacttccagagtcaggccagagaagaagacaacctttctcttgcaacacaataacaccaggccccataccagtttgaagactatggagcacatgccaatcttggctggactgtcctaccacacccaccatatagtctggatttggcaccttctgacttccatctgtttgggctgatgaaagatggactgcatgggcaacattttcctagcaacaatgccatcatagcagctgtgaaacagtgggtcacctccactggtgcagatttttatgagtgtggcatgcagagctcttgttcattgctggcaaaaatgcatagctaatggtggtgactatgttgaaaaatagtgttttgtagctgagaatttgctctatcaaaatagtgttattgtgctcttttgtatctgttgtagtttccatggaaataaataggaggcattacttttagagcaacctatgta >galluhop_Brhi cgagggctgctctgaaagtaatgcctcctattttattatgttggcccacacgtcagaggcagatgttggtggtatggcagtagaggttgaaccttcccaccaatattccttaaattttgttgccgtgtgacagaggcagcagaggggcagtctgacaacatggcgtctgacatgggagtgtgtatgaagcaaaggggtgtaactgaattcctccatgtggaaaaaattgcacccactgacattcatcgacacttgctgaacatttatggagaccaaacagtggtgtcagcacagtgagggggtgggtggtgcgtttcagcagtggtgacagtgacggtgggtcccctccgctggtgcagatttttatgacgagcgcggcacgcaggctcttgttcatcgctggtgaaaatgcacagccagtggtggggactgtgtcaaaaaatagtgttttgtagctgagaatttgctctatcaaacagtgttattgtgctctttgtatctgttgtagtttccatggaaataaataggaggcattactttcggagcgacctacgt >gallohop_Cjap 
ctagggctgctctgaaagtaatgcttcctattttatatgctatcccaggatcagaggtaaatgttggtggtacggcagtagaggttgaaccttcccatcaatatccattacagtttgttgacgtgtgacagatggaagcataggggcagtctgagaaaatggtatctgatacggagtgtgatgaagcaaagctggtcatgaattcttcaatgtggaaaaaatggtacccactgacatccatcaacactagctgaaggtttatggagcccaaacagtggatgagagcacagtgaggcagtgggtggtgcatttcagcagtggtgacagtgacagtgggtcacctctgctggtgcagatttttacaagccggcatgcaggctcttgttcatggctggcgaagatatgtaattaatggtggtgactatgttgaaaaatagtgttttgtcattgagaatttctctatgaaagagcgttattgtgctctttgaaaattgtatttccatggaaataaatacaggattacttttggagcaaccat

### Now search for the above sequences with BLAST:

```bash
~/tools/ncbi-blast-2.9.0+/bin/blastn -subject ~/Willisornis/Genome/Willisornis_poecilinotus_JTW1144_FASTAOUTPUT.fasta -query galluhop_consensus_Bertocchietal.fasta -out galluhop.out
```

Output: No hits found

There is no recognizable galluhop sequence in the Willisornis genome.

## De Novo RepeatModeler Library

I already made a custom structure-based TE library. Now I need a *de novo* repeat library. I already ran RepeatModeler, but now I have a more contiguous genome, so I will run it again on the new genome. This time I will not repeat-mask the genome before running RepeatModeler, because the unmasked sequence may give it a better chance of building good consensus sequences.

```bash
mkdir repeatmodeler
cd /home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/repeatmodeler
#I moved a copy of the genome sequence here, called Willisornis700.fasta
export PERLBREW_ROOT=/opt/perl5 #perlbrew will be installed in opt
/opt/perl5/bin/perlbrew switch perl-5.30.0 #A sub-shell is launched with perl-5.30.0 as the activated perl. Run 'exit' to finish it.
#build the RepeatModeler database from the genome
time /home/0_PROGRAMS/RepeatModeler-2.0/BuildDatabase -name Willisornis700 -engine ncbi Willisornis700.fasta
#run RepeatModeler, including the LTR structural search
time /home/0_PROGRAMS/RepeatModeler-2.0/RepeatModeler -engine ncbi -pa 23 -database Willisornis700 -LTRStruct >& Willisornis700_RMod.log
#continue the run from its existing working directory (-recoverDir)
time /home/0_PROGRAMS/RepeatModeler-2.0/RepeatModeler -engine ncbi -pa 23 -database Willisornis700 -LTRStruct -recoverDir RM_20262.WedNov270641302019 >& Willisornis700_RMod_part2.log
tail -n 25 Willisornis700_RMod_part2.log #check the log to make sure all is ok
#Separate known and unknown repeats.
time perl ~/tools/CRL_Scripts1.0/repeatmodeler_parse.pl --fastafile Willisornis700-families.fa --unknowns Willisornis700_repeatmodeler_unknowns.fasta --identities Willisornis700_repeatmodeler_identities.fasta
#count output!
grep -c ">" Willisornis700_repeatmodeler_identities.fasta #205
grep -c ">" Willisornis700_repeatmodeler_unknowns.fasta #241
```

* `--unknowns`: file to place unknown repeats into
* `--identities`: file to place identified repeats into
* `--fastafile`: the families FASTA file produced by RepeatModeler (the input to this script)

**Timing**: building the database took 19 seconds. Running RepeatModeler took 33h39m according to the log; `time` reported 2019m39s (44537m41s user time, 23m32 sys time), but that covers only round 5.

Try to identify the unknown repeats by comparing them to a transposase library which is available from the Maker2 wiki.
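Before that comparison, it can be worth confirming the known/unknown split independently. The sketch below is not `repeatmodeler_parse.pl`; it is a rough stand-in that assumes RepeatModeler labels unclassified families with `#Unknown` in the FASTA header. The function name `split_by_class` and its argument order are hypothetical.

```shell
# Sketch only: split a families FASTA into classified and unclassified
# records. Assumes unclassified headers contain "#Unknown".
# Usage: split_by_class families.fa known_out.fa unknown_out.fa
split_by_class() {
    families="$1"
    known_out="$2"
    unknown_out="$3"
    # Start from empty output files so reruns do not append to old results.
    : > "$known_out"
    : > "$unknown_out"
    awk -v known="$known_out" -v unknown="$unknown_out" '
        # On each header line, choose the output file for this record.
        /^>/ { out = ($0 ~ /#Unknown/) ? unknown : known }
        # Write the header and its sequence lines to the chosen file;
        # anything before the first header is skipped.
        out != "" { print > out }
    ' "$families"
}
```

Running `split_by_class Willisornis700-families.fa known.fa unknown.fa` and then `grep -c "^>"` on each output should give counts comparable to those from the parse script, if the header assumption holds. The transposase comparison itself then runs on the unknowns file: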
```bash
wget http://www.hrt.msu.edu/uploads/535/78637/Tpases020812DNA.gz #2340 sequences
gunzip Tpases020812DNA.gz
#blastx searches a protein database, so the library is indexed with -dbtype prot
~/tools/ncbi-blast-2.9.0+/bin/makeblastdb -in Tpases020812DNA -dbtype prot
time ~/tools/ncbi-blast-2.9.0+/bin/blastx -query Willisornis700_repeatmodeler_unknowns.fasta -db Tpases020812DNA -evalue 1e-10 -num_descriptions 10 -out Willisornis700_modelerunknown_blast_results.txt
time perl ~/tools/CRL_Scripts1.0/transposon_blast_parse.pl --blastx Willisornis700_modelerunknown_blast_results.txt --modelerunknown Willisornis700_repeatmodeler_unknowns.fasta
#rename the output file to be more memorable
mv unknown_elements.txt WillisornisModelerUnknown.lib
#combine the newly identified TE file with the previously-identified TE file
cat identified_elements.txt Willisornis700_repeatmodeler_identities.fasta > WillisornisModelerID.lib
```

**Timing:** very fast; the BLAST search took 7s and parsing took 0.4s.

No hits were found, so we cannot classify any of the unknowns.

Finally, we need to clean our library by removing sequences that may come from host protein-coding genes rather than TEs. This is discussed in the proteomes page of this repository, but briefly: I made a reference protein library from a few well-studied model bird species and removed any proteins that seemed to match a database of TE proteins. We will use this TE-free database to remove host proteins from our TE library.

```bash
#get TE-free bird proteins and make a BLAST database
cp /hhome/else/Genetic_resources/RefAves_proteins_no_tes.fa .
#60780 sequences
time /home/0_PROGRAMS/ncbi-blast-2.9.0+/bin/makeblastdb -in RefAves_proteins_no_tes.fa -dbtype prot
#get the taxID database for BLAST
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
gunzip taxdb.tar.gz
tar -xvf taxdb.tar
#run blastx, matching the repeat library to the non-TE protein library that we used before
time /home/0_PROGRAMS/ncbi-blast-2.9.0+/bin/blastx -query WillisornisModelerUnknown.lib -db RefAves_proteins_no_tes.fa -outfmt '6 qseqid staxids bitscore std sscinames sskingdoms stitle' -max_target_seqs 25 -culling_limit 2 -num_threads 23 -evalue 1e-10 -out WillisornisModelerUnknown.vs.RefAves_proteins_no_tes.out
#remove the significant hits with fastaqual_select
~/tools/assemblage/fastaqual_select.pl -f WillisornisModelerUnknown.lib -e <(awk '{print $1}' WillisornisModelerUnknown.vs.RefAves_proteins_no_tes.out | sort | uniq) > WillisornisModelerUnknown_noprot.lib #removed 8
#run blastx, matching the repeat library to the non-TE protein library that we used before
time ~/tools/ncbi-blast-2.9.0+/bin/blastx -query WillisornisModelerID.lib -db RefAves_proteins_no_tes.fa -outfmt '6 qseqid staxids bitscore std sscinames sskingdoms stitle' -max_target_seqs 25 -culling_limit 2 -num_threads 23 -evalue 1e-10 -out WillisornisModelerID.vs.RefAves_proteins_no_tes.out
#remove the significant hits with fastaqual_select
~/tools/assemblage/fastaqual_select.pl -f WillisornisModelerID.lib -e <(awk '{print $1}' WillisornisModelerID.vs.RefAves_proteins_no_tes.out | sort | uniq) > WillisornisModelerID_noprot.lib #removed 32
#Now, concatenate our final RepeatModeler library of repeats!
cat WillisornisModelerID_noprot.lib WillisornisModelerUnknown_noprot.lib > Willisornis_RepeatModeler_library.lib
#clean up tandem repeats
time perl /home/0_PROGRAMS/EDTA/util/cleanup_tandem.pl -misschar N -nc 50000 -nr 0.8 -minlen 80 -minscore 3000 -trf 1 -trf_path /home/0_PROGRAMS/trf -cleanN 1 -cleanT 1 -f Willisornis_RepeatModeler_library.lib > Willisornis_RepeatModeler_library_clean.lib
```

**Timing**: makeblastdb took 2s; BLAST took 16s (5.31 user time) for the unknowns and 22s for the knowns; the other steps were nearly instant.

# Final Species-specific Library

```bash
#And don't forget our structure-based repeats!
#First I need to take out any proteins from the MITEs as I had not done that yet
cp /hhome/else/Willisornis/MITE/Willisornis.MITE.fa .
#run blastx, matching the TE DNA library to the non-TE protein library
/home/0_PROGRAMS/ncbi-blast-2.9.0+/bin/blastx -query Willisornis.MITE.fa -db RefAves_proteins_no_tes.fa -outfmt '6 qseqid staxids bitscore std sscinames sskingdoms stitle' -max_target_seqs 25 -culling_limit 2 -num_threads 23 -evalue 1e-10 -out Willisornis_MITEs_lib.vs.RefAves_proteins_no_tes.out #8 hits
#remove the significant hits with fastaqual_select
~/tools/assemblage/fastaqual_select.pl -f Willisornis.MITE.fa -e <(awk '{print $1}' Willisornis_MITEs_lib.vs.RefAves_proteins_no_tes.out | sort | uniq) > Willisornis_MITEs_noprot.lib #removed 4
cp /home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/EDTA/Willisornis700.fasta.EDTA.TElib.fa .
#run blastx, matching the TE DNA library to the non-TE protein library
/home/0_PROGRAMS/ncbi-blast-2.9.0+/bin/blastx -query Willisornis700.fasta.EDTA.TElib.fa -db RefAves_proteins_no_tes.fa -outfmt '6 qseqid staxids bitscore std sscinames sskingdoms stitle' -max_target_seqs 25 -culling_limit 2 -num_threads 23 -evalue 1e-10 -out Willisornis_EDTAs_lib.vs.RefAves_proteins_no_tes.out
#remove the significant hits with fastaqual_select
~/tools/assemblage/fastaqual_select.pl -f Willisornis700.fasta.EDTA.TElib.fa -e <(awk '{print $1}' Willisornis_EDTAs_lib.vs.RefAves_proteins_no_tes.out | sort | uniq) > Willisornis_EDTAs_noprot.lib #removed 22
#combine the RepeatModeler, EDTA, and MITE libraries
cat Willisornis_RepeatModeler_library_clean.lib Willisornis_EDTAs_noprot.lib Willisornis_MITEs_noprot.lib > Willisornis_repeat_library.lib #5709 sequences
```

Our final species-specific repeat library is found in ***/home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/repeatmodeler/Willisornis_repeat_library.lib***. I have also placed it in the Repeat_library folder of this repository.

# Hints from other species

To make our library even more comprehensive, we can concatenate it with previously published, curated bird TE libraries. Are there more repeats we could use for repeat-masking our genomes? Researchers often use the RepBase database; however, I personally do not have a subscription. There are, however, some previously published, high-quality passerine repeat libraries available. Two of them, from the Blue-capped Cordon Bleu (*Uraeginthus cyanocephalus*) and the Collared Flycatcher (*Ficedula albicollis*), look like they would be beneficial to add to our own library.
```bash
#download the sequences
cd ~/Genetic_resources
wget https://dfam.org/TE_repository/141/2019/5/uraCya_rm2.45.fasta #Blue-capped Cordon Bleu Uraeginthus
wget https://dfam.org/TE_repository/8/2017/11/68015-fAlb15_rm3.0.lib.gz #Ficedula albicollis
gunzip 68015-fAlb15_rm3.0.lib.gz
#concatenate with our own library
cat /home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/repeatmodeler/Willisornis_repeat_library.lib ~/Genetic_resources/68015-fAlb15_rm3.0.lib ~/Genetic_resources/uraCya_rm2.45.fasta > /home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/repeatmodeler/Willisornis_repeat_library_withFicalbUracya.lib
```

Our final repeat library to use for RepeatMasking is found in ***/home/0_GENOMES1/0_WEIRLAB_GENOMES_CHROMIUMX/Willisornis/repeatmodeler/Willisornis_repeat_library_withFicalbUracya.lib***.
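Because this library is concatenated from several sources, sequence names may collide, and some headers may not follow the `>repeatname#class/subclass` format mentioned earlier. The following is a minimal sketch of a header check, not part of the original pipeline; the function name `check_library_headers` is hypothetical, and it assumes the sequence name is everything in the header before the first space.

```shell
# Sketch only: report header problems in a RepeatMasker-style library.
# Usage: check_library_headers library.lib
check_library_headers() {
    lib="$1"
    # Sequence names (header text before the first space) seen more than once.
    grep '^>' "$lib" | awk '{print $1}' | sort | uniq -d | sed 's/^>/duplicate: /'
    # Sequence names that lack a "#class" suffix.
    grep '^>' "$lib" | awk '$1 !~ /#/ {print "unclassified: " substr($1, 2)}'
}
```

For example, `check_library_headers Willisornis_repeat_library_withFicalbUracya.lib` prints one line per problem and nothing when every name is unique and carries a class. Whether duplicates or unclassified names actually occur in these particular libraries has not been checked here.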