Provided by: pscan-tfbs_1.2.2-4build1_amd64

NAME

       pscan - detection of transcription factor binding sites in DNA sequences

SYNOPSIS

       pscan -q multifastafile -p multifastafile [options]
       pscan -p multifastafile [options]
       pscan -q multifastafile -M matrixfile [options]

DESCRIPTION

       Pscan  inspects  the upstream non-coding regions of many genes to derive subsequences that
       are characteristic for the binding of proteins, i.e. transcription factors,  that  control
       the  tissue-  and  situation-dependent expression of a gene.  The tool is supported by the
       JASPAR database and other data that is downloadable from the tool's home page.

       The command line tool pscan is meant for bulk submission. The tool is also offered with a
       web interface whose auxiliary data are kept up to date.

OPTIONS

       pscan options begin with a single dash (`-') and, with a few exceptions, are followed by a
       single letter. Options are case-sensitive.  A summary of options is included below.

       -h     Show summary of options similar to this man page.

       -v     Show version of program.

       -q file
              Specify the multifasta file containing the foreground sequences.

       -p file
              Specify the multifasta file containing the background sequences.

       -m file
              Use this option if the background data are already available in a file (see the
              -g option).

       -M file
              Scan the foreground sequences using only the Jaspar/Transfac matrix contained in
              the specified file.

       -l file
              Use the matrices contained in the specified file (for the matrix file format, see
              EXAMPLES below).

       -N name
              Use only the matrix with that name (usable only in association with -l).

       -ss    Perform single strand only analysis.

       -rs    Perform single strand only analysis on the reverse strand.

       -split num1 num2
              Sequences are scanned only from position num1 and for num2 nucleotides.

       -trashn
              Discards sequences containing "N".

       -n     Oligos containing "N" will not be discarded. Instead, each "N" will obtain an
              "average" score.

       -g     If a background sequences file is used, then a file will be written containing the
              data calculated for those background sequences and the current set of matrices.
              From then on, that file can be used (-m option) instead of the sequences file for
              faster processing.

       -ui file
              An index of the background file will be used to avoid duplicated sequences.

       -bi    Build  an  index  of  the  background sequences file (to be used later with the -ui
              option).  This is useful when you have duplicated sequences in your background that
              may introduce a bias in your results.

NOTES

       The sequences to be used with Pscan have to be promoter sequences.  To obtain meaningful
       results it is critical that the background and the foreground sequences are consistent
       with each other both in size and in position (with respect to the transcription start
       site). For optimal results the foreground set should be a subset of the background set.
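
       One way to check size consistency before a run is to tabulate the sequence lengths in
       each file.  The sketch below is not part of Pscan: seq_lengths is a hypothetical helper
       name, and it assumes plain FASTA input whose header lines start with ">".

```shell
# seq_lengths FILE
# Hypothetical helper (not part of Pscan): print every distinct sequence
# length found in a FASTA file, followed by how many sequences have it.
seq_lengths() {
    awk '
        /^>/ {                       # a header line starts a new record
            if (seen) count[len]++   # store the length of the previous one
            len = 0; seen = 1
            next
        }
        {
            gsub(/[ \t\r]/, "")      # ignore whitespace and CR characters
            len += length($0)
        }
        END {
            if (seen) count[len]++
            for (l in count) print l, count[l]
        }
    ' "$1" | sort -n
}
```

       Running seq_lengths on the foreground and on the background file and comparing the two
       listings shows at a glance whether both sets were extracted with the same window size.
       It says nothing about position relative to the transcription start site.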

       If the "-l" option is not used, Pscan will try to find Jaspar/Transfac matrix files in
       the current folder.  Jaspar files have the ".pfm" extension while Transfac ones have the
       ".pro" extension.  If Jaspar matrix files are used, then a file called "matrix_list.txt"
       must be present in the same folder.  That file contains required info about the matrices
       in the ".pfm" files.

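       The Jaspar requirement above can be verified before starting a long run.  The following
       is a minimal sketch, not part of Pscan: check_matrix_dir is a hypothetical helper name,
       and it only tests that the file exists, not that its contents match the ".pfm" files.

```shell
# check_matrix_dir [DIR]
# Hypothetical helper (not part of Pscan): return non-zero if DIR (default:
# the current folder) holds ".pfm" files but no "matrix_list.txt".
check_matrix_dir() {
    dir=${1:-.}
    for f in "$dir"/*.pfm; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ] && [ ! -f "$dir/matrix_list.txt" ]; then
            echo "found .pfm files but no $dir/matrix_list.txt" >&2
            return 1
        fi
    done
    return 0
}
```

       For example, "check_matrix_dir && pscan -p human_450_50.fasta -bi" would start the scan
       only when the check passes.
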
EXAMPLES

       1) pscan -p human_450_50.fasta -bi

       This command will scan the file "human_450_50.fasta" using the matrices in the current
       folder.  It is handy to run this command the first time a set of matrices is used with a
       given background sequences file.  A file called "human_450_50.short_matrix" will be
       written, and it can be used from then on whenever the same background sequences are used
       with the same set of matrices.  A file called "human_450_50.index" will be written too,
       and it will be useful every time the same background file is used.
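
       The decision between building the preprocessed files and reusing them can be scripted.
       The sketch below is not part of Pscan: next_pscan_command is a hypothetical helper name,
       and it assumes the output files are named after the background file with its extension
       removed, as in this example.  It only prints the command line and does not run pscan.

```shell
# next_pscan_command QUERY BACKGROUND
# Hypothetical helper (not part of Pscan): print the command line to run next.
# Assumption: the preprocessed files are named after the background file with
# its extension removed, as in human_450_50.fasta -> human_450_50.short_matrix.
next_pscan_command() {
    query=$1
    background=$2
    base=${background%.*}
    if [ -f "$base.short_matrix" ] && [ -f "$base.index" ]; then
        # Both preprocessed files exist: reuse them for the query.
        echo "pscan -q $query -m $base.short_matrix -ui $base.index"
    else
        # First run with this background and matrix set: build the files.
        echo "pscan -p $background -bi"
    fi
}
```

       Note that the helper cannot tell whether an existing ".short_matrix" file was computed
       with the current set of matrices; if the matrices change, the file has to be rebuilt.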

       2) pscan -q human_nfy_targets.fasta -m human_450_50.short_matrix -ui human_450_50.index

       This command will scan the file "human_nfy_targets.fasta" searching for over-represented
       binding sites (with respect to the preprocessed background contained in the
       "human_450_50.short_matrix" file) using the matrices in the current folder.  Please note
       that the query file "human_nfy_targets.fasta" must be a subset of the sequences contained
       in the background file "human_450_50.fasta" in order to use the index file with the "-ui"
       option. This means that both the sequences and their FASTA headers used in the query file
       must appear in the background file as well. Using the "-ui" option when the sequences
       contained in the query file are not a subset of the background file will have
       undefined/unpredictable outcomes.

       The output will be a file called "human_nfy_targets.fasta.res" that lists all the
       matrices used, sorted by ascending P-value.  The lower the P-value obtained by a matrix,
       the higher the chances that the transcription factor associated with that matrix is a
       regulator of the input promoter sequences.  The fields of the output are the following:
       "Transcription Factor Name", "Matrix ID", "Z Score", "Pvalue", "Foreground Average",
       "Background Average".
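
       Because "-ui" requires the query to be a subset of the background, it can be worth
       comparing the FASTA headers of the two files first.  The sketch below is not part of
       Pscan: missing_headers is a hypothetical helper name, and it checks only the header
       lines, not the sequences themselves.

```shell
# missing_headers QUERY BACKGROUND
# Hypothetical helper (not part of Pscan): print the FASTA header lines that
# occur in QUERY but not in BACKGROUND. No output means every query header
# was found in the background file.
missing_headers() {
    query=$1
    background=$2
    bg_headers=$(mktemp) || return 1
    # Collect the sorted, de-duplicated header lines of the background.
    grep '^>' "$background" | sort -u > "$bg_headers"
    # comm -23 keeps the lines that appear only in the first input (the query).
    grep '^>' "$query" | sort -u | comm -23 - "$bg_headers"
    rm -f "$bg_headers"
}
```

       For example, "missing_headers human_nfy_targets.fasta human_450_50.fasta" prints nothing
       when all query headers are present in the background file.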

       3) pscan -q human_nfy_targets.fasta -M MA0108.pfm

       This command will scan the sequences file "human_nfy_targets.fasta" using the matrix
       contained in "MA0108.pfm".  The result will be written to a file called
       "human_nfy_targets.fasta.ris" that lists the input sequences sorted by descending score
       (between 1 and 0). The higher the score, the better the oligo found matches the matrix
       used.  The fields of the output are the following: "Sequence Header", "Score", "Position
       from the end of sequence", "Oligo that obtained the score", "Strand where the oligo was
       found".

       4) pscan -p human_450_50.fasta -bi -l matrixfile.wil

       This command is like Example #1, with the difference that the set of matrices to be used
       is the one contained in the "matrixfile.wil" file.  Please look at the
       "example_matrix_file.wil" file included in this Pscan distribution to see the correct
       format for a matrix file.

       5) pscan -q human_nfy_targets.fasta -l matrixfile.wil -N MATRIX1

       This  command  is like Example #3 but it will use the matrix called "MATRIX1" contained in
       the "matrixfile.wil" file.

SEE ALSO

       For info on how Pscan works please refer to the paper.

                                           May  3 2018                                   PSCAN(1)