Ubuntu Manpage: pscan - detection of transcription factor binding sites in DNA sequences

Provided by: pscan-tfbs_1.2.2-4build1_amd64

NAME

       pscan - detection of transcription factor binding sites in DNA sequences

SYNOPSIS

       pscan -q multifastafile -p multifastafile [options]
       pscan -p multifastafile [options]
       pscan -q multifastafile -M matrixfile [options]

DESCRIPTION

       Pscan  inspects  the upstream non-coding regions of many genes to derive subsequences that
       are characteristic for the binding of proteins, i.e. transcription factors,  that  control
       the  tissue-  and  situation-dependent expression of a gene.  The tool is supported by the
       JASPAR database and other data that is downloadable from the tool's home page.

       The command line tool pscan is meant for bulk submission. The tool is also offered with a
       web interface that has all auxiliary data updated.

OPTIONS

pscan options only have single dashes (`-') and are (with notable exceptions) followed by a
single letter. Options are case-sensitive. A summary of options is included below.

-h Show summary of options similar to this man page.

-v Show version of program.

-q file Specify the multifasta file containing the foreground sequences.

-p file Specify the multifasta file containing the background sequences.

-m file Use this option if the background data are already available in a file (see -g option).

-M file Scan the foreground sequences using only the Jaspar/Transfac matrix
contained in the specified file.

-l file Use the matrices contained in that file (for matrix file format see below).

-N name Use only the matrix with that name (usable only in association with -l).

-ss Perform single strand only analysis.

-rs Perform single strand only analysis on the reverse strand.

-split num1 num2 Sequences are scanned only from position num1 and for num2 nucleotides.

-trashn
Discards sequences containing "N".

-n Oligos containing "N" will not be discarded. Instead, an "N" will obtain an "average"
score.

-g If a background sequences file is used, then a file will be written containing the
data calculated for those background sequences and the current set of matrices.
From then on, one can use that file (-m option) instead of the sequences file for
faster processing.

-ui file
An index of the background file will be used to avoid duplicated sequences.

-bi Build an index of the background sequences file (to be used later with the -ui
option). This is useful when you have duplicated sequences in your background that
may introduce a bias in your results.

NOTES

       The sequences to be used with Pscan have to be promoter sequences. To obtain meaningful
       results, it is critical that the background and the foreground sequences are consistent
       with each other both in size and in position (with respect to the transcription start
       site). For optimal results, the foreground set should be a subset of the background set.
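
       The consistency requirements above can be checked before running Pscan. The following
       is a minimal sketch, not part of Pscan itself: read_fasta and check_sets are
       hypothetical helper names, and "consistent in size" is interpreted here as "all
       sequences have the same length", which is an assumption.

```python
def read_fasta(lines):
    """Parse multi-FASTA text lines into a {header: sequence} dict.

    Note: if a header occurs more than once, the last record wins.
    """
    records, header, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def check_sets(foreground, background):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # ASSUMPTION: "consistent in size" means every sequence has the same length.
    all_seqs = list(foreground.values()) + list(background.values())
    lengths = {len(seq) for seq in all_seqs}
    if len(lengths) > 1:
        problems.append(f"sequence lengths differ: {sorted(lengths)}")
    # The foreground should be a subset of the background (header and sequence).
    for header, seq in foreground.items():
        if background.get(header) != seq:
            problems.append(f"not in background: {header}")
    return problems
```

       With real files, the dicts could be built with read_fasta(open(path)) for each of the
       files later passed to -q and -p.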

       If the "-l" option is not used, Pscan will try to find Jaspar/Transfac matrix files in the
       current folder. Jaspar files have the ".pfm" extension while Transfac ones have the ".pro"
       extension. If Jaspar matrix files are used, then a file called "matrix_list.txt" must be
       present in the same folder. That file contains required info about the matrices in the
       ".pfm" files.
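
       The lookup rules above can be turned into a quick pre-flight check. This is only a
       sketch: check_matrix_files is a hypothetical helper, it looks at file names only, and
       it does not validate the contents of the matrix files or of "matrix_list.txt".

```python
def check_matrix_files(names):
    """Classify file names by matrix type and report missing prerequisites."""
    names = list(names)
    pfm = sorted(n for n in names if n.endswith(".pfm"))  # Jaspar matrices
    pro = sorted(n for n in names if n.endswith(".pro"))  # Transfac matrices
    problems = []
    if not pfm and not pro:
        problems.append("no .pfm or .pro matrix files found")
    if pfm and "matrix_list.txt" not in names:
        problems.append("Jaspar .pfm files present but matrix_list.txt is missing")
    return {"jaspar": pfm, "transfac": pro, "problems": problems}
```

       For the current folder, the function could be called as
       check_matrix_files(os.listdir(".")) after importing os.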

EXAMPLES

1) pscan -p human_450_50.fasta -bi

This command will scan the file "human_450_50.fasta" using the matrices in the current
folder. It is handy to use that command the first time one uses a set of matrices with a
given background sequences file. A file called human_450_50.short_matrix will be written
and it can be used from now on every time you want to use the same background sequences
with the same set of matrices. A file called human_450_50.index will be written too and
it will be useful every time you use the same background file.
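
When pscan is driven from a script, the file names written by this step can be derived from
the background file name. The sketch below assumes that the ".short_matrix" and ".index"
files are written next to the background file and share its base name, as in this example;
background_command and query_command are hypothetical helper names, and only options listed
in OPTIONS are used.

```python
from pathlib import Path


def background_command(background_fasta):
    """Argument list for the one-time background run (-p with -bi)."""
    return ["pscan", "-p", str(background_fasta), "-bi"]


def query_command(query_fasta, background_fasta):
    """Argument list for a later run that reuses the preprocessed files.

    ASSUMPTION: the .short_matrix and .index files share the base name of
    the background FASTA file and live in the same folder.
    """
    base = Path(background_fasta)
    return [
        "pscan",
        "-q", str(query_fasta),
        "-m", str(base.with_suffix(".short_matrix")),
        "-ui", str(base.with_suffix(".index")),
    ]
```

Either list can be passed to subprocess.run(cmd, check=True) to execute the command.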

2) pscan -q human_nfy_targets.fasta -m human_450_50.short_matrix -ui human_450_50.index

This command will scan the file human_nfy_targets.fasta searching for over-represented
binding sites (with respect to the preprocessed background contained in the
"human_450_50.short_matrix" file) using the matrices in the current folder. Please note
that the query file "human_nfy_targets.fasta" must be a subset of the sequences contained
in the background file "human_450_50.fasta" in order to use the index file with the "-ui"
option. This means that both the sequences and their FASTA headers used in the query file
must appear in the background file as well. Using the "-ui" option when the sequences
contained in the query file are not a subset of the background file will have
undefined/unpredictable outcomes. The output will be a file called
"human_nfy_targets.fasta.res" where you will find all the used matrices sorted by
ascending P-value. The lower the P-value obtained by a matrix, the higher the chance
that the transcription factor associated with that matrix is a regulator of the input
promoter sequences. The fields of the output are the following: "Transcription Factor
Name", "Matrix ID", "Z Score", "Pvalue", "Foreground Average", "Background Average".
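
The ".res" file can be post-processed with a short script. The sketch below assumes that the
columns are tab-separated and appear in the order listed above; the man page does not state
the delimiter, so check a real output file first. MatrixResult, read_res and significant are
hypothetical names, the 0.05 cutoff is only an illustration, and the sample rows are made up.

```python
import csv
import io
from typing import NamedTuple


class MatrixResult(NamedTuple):
    name: str
    matrix_id: str
    z_score: float
    p_value: float
    fg_avg: float
    bg_avg: float


def read_res(handle):
    """Read result rows. ASSUMES tab-separated columns in the documented order."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 6:
            continue  # blank or unexpected line
        try:
            numbers = [float(x) for x in fields[2:6]]
        except ValueError:
            continue  # non-numeric line, e.g. a header if the file has one
        rows.append(MatrixResult(fields[0], fields[1], *numbers))
    return rows


def significant(rows, threshold=0.05):
    """Keep matrices with a P-value below the cutoff, lowest P-value first."""
    return sorted((r for r in rows if r.p_value < threshold), key=lambda r: r.p_value)


# Made-up sample rows, for illustration only (not real Pscan output).
SAMPLE = "TF_A\tM1\t5.2\t1e-7\t0.81\t0.70\nTF_B\tM2\t0.3\t0.4\t0.66\t0.65\n"
for row in significant(read_res(io.StringIO(SAMPLE))):
    print(row.name, row.matrix_id, row.p_value)
```

For a real run, replace io.StringIO(SAMPLE) with open("human_nfy_targets.fasta.res").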

3) pscan -q human_nfy_targets.fasta -M MA0108.pfm

This command will scan the sequences file "human_nfy_targets.fasta" using the matrix
contained in "MA0108.pfm". The result will be written in a file called
"human_nfy_targets.fasta.ris" where you will find the input sequences sorted by
descending score (between 1 and 0). The higher the score, the better the oligo found
matches the matrix used. The fields of the output are the following: "Sequence
Header", "Score", "Position from the end of sequence", "Oligo that obtained the score",
"Strand where the oligo was found".

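The ".ris" file of Example 3 can be filtered by score in a similar way. This sketch again
assumes tab-separated columns in the documented order and an integer position value, neither
of which is confirmed by this page; Hit, read_ris and hits_above are hypothetical names.

```python
import csv
from typing import NamedTuple


class Hit(NamedTuple):
    header: str
    score: float
    position: int  # position from the end of the sequence, as reported
    oligo: str
    strand: str


def read_ris(handle):
    """Read per-sequence hits. ASSUMES tab-separated columns in documented order."""
    hits = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 5:
            continue  # blank or unexpected line
        try:
            hits.append(Hit(fields[0], float(fields[1]), int(fields[2]),
                            fields[3], fields[4]))
        except ValueError:
            continue  # non-numeric score or position, e.g. a header line
    return hits


def hits_above(hits, min_score):
    """Keep only the hits whose score is at least min_score (input order kept)."""
    return [h for h in hits if h.score >= min_score]
```

A call such as hits_above(read_ris(open("human_nfy_targets.fasta.ris")), 0.9) would keep only
the best-matching sequences; the 0.9 cutoff is an arbitrary illustration.
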
4) pscan -p human_450_50.fasta -bi -l matrixfile.wil

This command is like Example #1 with the difference that the matrix set to be used is
the one contained in the "matrixfile.wil" file. Please look at the
"example_matrix_file.wil" file included in this Pscan distribution to see the correct
format for matrix files.

5) pscan -q human_nfy_targets.fasta -l matrixfile.wil -N MATRIX1