Department of Biotechnology and Bioinformatics

RetroPred: A tool for localizing retrotransposable elements in the genome and classifying them as LINEs and SINEs based on conserved patterns and an artificial neural network

Developers: Dr. P. K. Naik, Sumit Gupta, Vinay Kumar Mittal

Repeat analysis plays a key role in the study, analysis, and comparison of complete genomes. In the analysis of a single genome, a basic task is to characterize and locate its repetitive elements. Transposable elements are one of several categories of repetitive elements. They are normal, ubiquitous components of both prokaryotic and eukaryotic genomes. Transposable elements cause genetic changes and make important contributions to the evolution of genomes. Repetitive elements form a major fraction of eukaryotic genomes.

The tool "RetroPred" is an automated method that integrates results from PALS, PILER, MEME and an artificial neural network (ANN). The pipeline allows rapid detection of genomic repeats and their assignment as LINEs or SINEs based on conserved patterns. PALS and PILER are used to identify transposable DNA families. MEME is then run to discover short conserved patterns (50 bp long) in the identified repeats. Binary pattern files are generated from the discovered patterns, and these files are used as input to a trained artificial neural network that classifies the repeats as LINEs or SINEs. The results are parsed into a graphical representation indicating the location of LINEs and SINEs in the genome. Clicking a label extracts the corresponding repeat sequence. The tool is standalone and can be downloaded from the link provided. Users should follow the instructions below for installation and use.

User Instructions

Software downloads and Installation

  1. Download PALS

  2. Download PILER

  3. Install both programs on a Linux platform.
    (Note: On Windows, Cygwin can be used to install them.)

Instructions for using the model

Follow the steps below to identify and classify transposable DNA elements in a whole genome or a very large DNA sequence.

  1. Copy the FASTA-formatted DNA sequence file (say genome.fasta) into the PALS installation directory (say pals) and run the following command in a new terminal:

    pals -self genome.fasta -out hits.gff

  2. Copy the generated output file (hits.gff) and the genome file (genome.fasta) into the PILER installation directory (say piler) and run the following commands in a new terminal window:

    piler -trs hits.gff -out trs.gff
    mkdir fams
    piler -trs2fasta trs.gff -seq genome.fasta -path fams

  3. Go to the directory 'fams' and find the generated repeat family files. Each file contains a few highly similar repeat sequences that belong to the same sub-class.

  4. Check the output folder for the generated gene files. The number of files equals the number of gene sequences present in the input genome.

  5. Go to the Download page, where the files for running the tool on a standalone system are available along with guidelines for their use.
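The PALS and PILER commands from steps 1 and 2 can also be scripted. The following is a minimal sketch that uses only the command lines shown above; it assumes that `pals` and `piler` are available on the PATH and that all files live in the current directory, which differs from the copy-into-the-installation-directory workflow described in the steps. The helper names (`build_commands`, `run_pipeline`, `summarize_families`) are hypothetical and not part of RetroPred.

```python
import subprocess
from pathlib import Path


def build_commands(genome="genome.fasta", hits="hits.gff", trs="trs.gff", fams="fams"):
    """Return the PALS and PILER command lines from steps 1 and 2 as argument lists."""
    return [
        ["pals", "-self", genome, "-out", hits],
        ["piler", "-trs", hits, "-out", trs],
        ["piler", "-trs2fasta", trs, "-seq", genome, "-path", fams],
    ]


def run_pipeline(genome="genome.fasta", fams="fams"):
    """Run the commands in order; assumes pals and piler are on the PATH."""
    Path(fams).mkdir(exist_ok=True)  # same effect as 'mkdir fams'
    for command in build_commands(genome=genome, fams=fams):
        subprocess.run(command, check=True)  # stop at the first failing step


def count_sequences(fasta_path):
    """Count FASTA records in one family file by counting header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_families(fams="fams"):
    """Map each file in the family directory to its number of sequences."""
    return {
        path.name: count_sequences(path)
        for path in sorted(Path(fams).iterdir())
        if path.is_file()
    }


if __name__ == "__main__":
    # Only print the commands here; call run_pipeline() once the tools are installed.
    for command in build_commands():
        print(" ".join(command))
```

After a successful run, `summarize_families("fams")` gives a quick overview of how many repeat sequences each family file from step 3 contains.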

Working Team

  1. Dr. Pradeep K Naik, Sr. Lecturer, Dept. of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan

    E-mail :

  2. Sumit Gupta, B.Tech. Bioinformatics, Jaypee University of Information Technology, Solan

    E-mail :

  3. Vinay Kumar Mittal, B.Tech. Bioinformatics, Jaypee University of Information Technology, Solan

    E-mail :

Copyright JUIT, Waknaghat