Provided by: spaln_2.4.0+dfsg-2build1_amd64

NAME

       spaln  -  mapping  and alignment of cDNA/protein sequences onto genomic sequence or rapid homology search
       against protein sequence database and (semi)global alignment

SYNOPSIS

       spaln -WGenome.bk[n|p] -K[D|P] [W_Options] Genome.mfa(.gz)
       spaln -WProsDb.bka -KA [W_Options] ProteinSequenceDB.faa(.gz)
       spaln [-Q[0|1|2|3]] [R_options] Genome_segment [cDNA|protein]_queries
       spaln -Q[4|5|6|7] -dGenome [R_options] [cDNA|protein]_queries
       spaln -Q[4|5|6|7] -aProsDb [R_options] Genomic_segment
       spaln -Q[4|5|6|7] -aProsDb [R_options] protein_queries
       spaln -Q[4|5|6|7] [R_options] Genome.mfa [cDNA|protein]_queries
       spaln -Q[4|5|6|7] [R_options] ProsDb.faa protein_queries

DESCRIPTION

       spaln is a stand-alone program that maps and aligns a set of cDNA/protein sequences onto a genomic
       sequence. From version 1.4, spaln supports the combination of a protein sequence database and a given
       genomic segment. From version 2.2, it also supports rapid similarity search of protein sequences against
       a protein sequence database followed by ordinary alignment.

       Spaln  runs  with relatively small internal memory, making it feasible to search a whole mammalian genome
       in a single job on a conventional computer.

EXAMPLES

       spaln -Wddignm.bkn -KD -Xk10 -Xb8192 ddignm.mfa
              Make the block index table 'ddignm.bkn' of the genomic sequence 'ddignm.mfa' in multi-fasta format
              with the word size of 10 nucleotides and the block size of 8192.

       spaln -O1 'chr1.fa 10001 40000 <' cdna.fa
              Align  the cDNA sequence 'cdna.fa' onto the genomic segment of 'chr1.fa' within the range from the
              10001-th to the 40000-th nucleotide of the complementary strand in the semi-global mode.
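
              The quoted genomic-segment argument in this example can be assembled programmatically. The
              following is a minimal Python sketch, not part of spaln: the helper names segment_spec and
              spaln_align_command are hypothetical, and the 'file start end <' layout is inferred only from
              the example above. The command is built and printed, never executed.

```python
import shlex


def segment_spec(fasta, start, end, complement=False):
    """Build a 'file start end [<]' argument as used in the example above."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    parts = [fasta, str(start), str(end)]
    if complement:
        # '<' marks the complementary strand in the man page example
        parts.append("<")
    return " ".join(parts)


def spaln_align_command(segment, query, output_format=1):
    """Return an argument list suitable for subprocess.run(); nothing is run here."""
    return ["spaln", f"-O{output_format}", segment, query]


cmd = spaln_align_command(
    segment_spec("chr1.fa", 10001, 40000, complement=True), "cdna.fa"
)
# shlex.join quotes the segment so it reaches spaln as a single argument
print(shlex.join(cmd))
```

              Passing the list to subprocess.run() avoids shell quoting altogether, since the whole segment
              specification is already one list element.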

       spaln -Q7 -LS -t10 -dddignm ddicdna.mfa > ddi.exon
              Report the gene structures corresponding to individual cDNAs in 'ddicdna.mfa'. The local alignment
              mode  is  used  and  the  results  are  written  in  'ddi.exon'  in the exon-oriented format.  The
              computation proceeds with 10 threads in parallel.

       spaln -Q7 -O5 -XG1M -yX -TMus -ommu.intron -dmmugnm 'ratcdna.mfa (1 20)'
              Cross-species comparison between the genome 'mmugnm' and the cDNAs in 'ratcdna.mfa' from the first
              to  the  20-th  entries.   The  maximal  expected  gene size is reset to 1 Mbp.  The outputs go to
              'mmu.intron' in the intron-oriented format. The tetrapod-specific parameter set is used.

       spaln -Q7 -O0 -Tarabthal -aSwiss -oyour_genes.gff3 your_genomic_segment
              A set of ORFs in the genomic segment are translated  and  rapidly  searched  against  the  protein
              database, and then the best-hit sequence is used as the template of the spliced alignment to infer
              the organization of the gene located in the relevant genomic region. The output is written in GFF3
              gene format. The dicot-specific parameter set is used.

       spaln -Q4 -O0 -aSwiss -M4 -t10 aa_queries > output
              Find up to four SwissProt sequences most similar to each query sequence, and report global
              alignment statistics. To show the alignments themselves, use the -O1 option instead of -O0. The
              computation proceeds with 10 threads in parallel.

OPTIONS

       Conventions: #: number; $: string; default values in ()
        Multiple defaults are listed in the order (DNA query, DNA query with -yX option, protein query)

FORMATTING OPTIONS

       -K$    Format the genomic sequence for DNA ($=D) or protein ($=P) queries, or format the protein database
              sequences ($=A) for rapid search of template.

       -Xk#   Word size (11 for DNA, 5 for protein)

       -Xb#   Block size (4096)

       -XG#   Maximum gene size (262144)

       -Xa#   Abundance factor (10)

       -Xs#   Shifts between adjacent seeds (k)

RUN TIME OPTIONS

       -C#    NCBI transl_table number indicating the genetic code (1)

       -H#    Minimum alignment score for report (35)

       -LS    Smith-Waterman-type local alignment

       -M[#]  Multiple loci (0)
              #=empty: Multiple loci maximally up to 4
              #=0: Single locus
              #=1: Re-search unaligned parts
              #>1: Multiple loci maximally up to #
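
              The -M cases listed above can be summarised as a small lookup. This is an illustrative Python
              sketch only; the function name describe_multiple_loci is hypothetical and the returned strings
              simply restate the list above.

```python
def describe_multiple_loci(value=None):
    """Describe a -M setting.

    `value` is the text following -M on the command line,
    or None when the option is not given at all.
    """
    if value is None:
        return "single locus"  # option absent: the documented default is 0
    if value == "":
        return "up to 4 loci"  # bare -M
    n = int(value)
    if n < 0:
        raise ValueError("-M expects a non-negative number")
    if n == 0:
        return "single locus"
    if n == 1:
        return "re-search unaligned parts"
    return f"up to {n} loci"


for text in (None, "", "0", "1", "4"):
    print(repr(text), "->", describe_multiple_loci(text))
```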

       -O[#]  Output format (4)
              #=0: GFF3 format (gene) in genome vs [cDNA|protein] mode
              #=0: Alignment statistics in protein vs protein mode
              #=1: Alignment
              #=2: GFF3 format (match)
              #=3: Bed format
              #=4: Exon-oriented format similar to output of megablast -D 3
              #=5: Intron-oriented output
              #=6: Concatenated exon sequence
              #=7: Translated amino-acid sequence
              #=8: Mapping (block) information only. Use with -Q4
              #=12: Output the same information as -O4 in binary formats

       -Q[#]  Select algorithm (3)
              0<=#<=3: Genomic segment in the fasta format given by the first argument vs. cDNA/protein given by
              the second argument
              4<=#<=7: Genome mapping and alignment
              #=0,4: DP procedure without HSP search
              #=1-3,5-7: Recursive HSP searches up to the level of (# % 4)
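
              The -Q value therefore encodes two things: the kind of target (segment or whole genome) and the
              depth of the HSP search (# % 4). A minimal Python sketch of that decoding follows; the function
              name decode_algorithm and the dictionary keys are hypothetical and not part of spaln.

```python
def decode_algorithm(q):
    """Split a -Q value into target type and HSP search level, per the list above."""
    if not 0 <= q <= 7:
        raise ValueError("-Q expects a value from 0 to 7")
    target = "genome mapping and alignment" if q >= 4 else "genomic segment vs. query"
    hsp_level = q % 4  # 0 means DP procedure without HSP search
    return {
        "target": target,
        "hsp_level": hsp_level,
        "uses_hsp_search": hsp_level > 0,
    }


for q in range(8):
    info = decode_algorithm(q)
    print(f"-Q{q}: {info['target']}, HSP level {info['hsp_level']}")
```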

       -R$    Read block index table from the file $

       -S[#]  Specify the orientation of query (0)
              0: depend on the query annotation
              1: forward direction only
              2: reverse direction only
              3: both directions

       -T$    Specify the species (in combination with the -yS option for cDNA query)

       -U     Map/align without splicing

       -V#    Minimum space required to invoke Hirschberg's algorithm (16M)

       -W$    Write block index table to file $

       -i[a|p]
              Paired-end reads as the queries from a single or two files
              -ia: 5' and 3' matching pairs must appear alternately in a single file
              -ip: 5' and 3' matching pairs must appear in the same order in the two files
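
              The two layouts differ only in how mates are paired up. The Python sketch below illustrates the
              pairing on lists of read names; it does not parse sequence files, and the function names and read
              names are hypothetical.

```python
def pairs_from_interleaved(records):
    """-ia layout: 5' and 3' mates alternate within one file."""
    if len(records) % 2:
        raise ValueError("interleaved input must contain an even number of records")
    return list(zip(records[0::2], records[1::2]))


def pairs_from_two_files(five_prime, three_prime):
    """-ip layout: mates appear in the same order in two separate files."""
    if len(five_prime) != len(three_prime):
        raise ValueError("both files must contain the same number of records")
    return list(zip(five_prime, three_prime))


# Illustrative read names only
interleaved = ["r1/5", "r1/3", "r2/5", "r2/3"]
print(pairs_from_interleaved(interleaved))
print(pairs_from_two_files(["r1/5", "r2/5"], ["r1/3", "r2/3"]))
```

              Both calls yield the same list of (5', 3') pairs, which is why the record order in the input
              files matters for either option.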

       -o$    Destination of output file (stdout)

       -pa    Suppress trimming of terminal polyA or polyT sequence

       -pq    Suppress some outputs to stderr

       -pw    Report the result irrespective of the alignment score

       -u#    Gap-extension penalty (3, 2, 2)

       -v#    Gap-opening penalty (8, 6, 9)

       -xB$   Bit pattern of the seeds used for HSP search at level 1

       -xb$   Bit pattern of the seeds used for HSP search at level 3

       -ya#   Dinucleotide pairs at the ends of an intron (0)
              0: canonical only (GT..AG, GC..AG, AT..AC)
              1: relaxed to (GT..AG, GC..AG, AT..AN)
              2: #=1 + allow 1 mismatch from GT..AG
              3: any
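
              The four -ya levels can be read as progressively looser tests on the intron-end dinucleotides. The
              Python sketch below restates the list above; it is not spaln's implementation. The function name
              intron_ends_allowed is hypothetical, and treating "1 mismatch from GT..AG" as at most one
              differing base among the four is an assumption.

```python
CANONICAL = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def intron_ends_allowed(donor, acceptor, level=0):
    """Check intron-end dinucleotides against a -ya level (0 to 3)."""
    donor, acceptor = donor.upper(), acceptor.upper()
    if len(donor) != 2 or len(acceptor) != 2:
        raise ValueError("donor and acceptor must be dinucleotides")
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2 or 3")
    if level == 3:
        return True  # any pair is accepted
    if level == 0:
        return (donor, acceptor) in CANONICAL
    # Level 1: GT..AG, GC..AG, or AT..AN where N is any nucleotide
    relaxed = (donor, acceptor) in {("GT", "AG"), ("GC", "AG")} or (
        donor == "AT" and acceptor[0] == "A"
    )
    if level == 1 or relaxed:
        return relaxed
    # Level 2 (assumed reading): level 1 plus at most one mismatch against GT..AG
    mismatches = sum(a != b for a, b in zip(donor + acceptor, "GTAG"))
    return mismatches <= 1


for pair in (("GT", "AG"), ("AT", "AG"), ("GT", "AA"), ("CT", "AC")):
    print(pair, [intron_ends_allowed(*pair, level=n) for n in range(4)])
```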

       -yi#   Intron penalty (11, 8, 11)

       -yj#   Incline of long gap penalty (0.6)

       -yk#   Flex point where the incline of gap penalty changes (7)

       -yl#   Double affine gap penalty if #=3; affine penalty otherwise

       -ym#   Score for a nucleotide match (2, 2)

       -yn#   Penalty for a nucleotide mismatch (6, 2)

       -yo#   Penalty for in-frame termination codon (100)

       -yp#   PAM level used in the alignment (third) phase (150)

       -yq#   PAM level used in the second phase (50)

       -yx#   Penalty for a frame shift (100)

       -yy#   Relative contribution of splicing signal (8)

       -yz#   Relative contribution of coding potential (2)

       -yA#   Relative contribution of the translational initiation or termination signal (8)

       -yB#   Relative contribution of branch point signal (0)

       -yE#   Minimum exon length (2)

       -yI$   Intron distribution parameters

       -yJ#   Relative contribution of the bonus given to a conserved intron position

       -yL#   The minimum intron length (20)

       -yS#   Percent contribution of the species-specific splice signal. If #=0, only the ubiquitous signal
              given to the dinucleotide pair at the ends of an intron is used. By default #=0 for DNA and #=100
              for protein queries. The -yX option automatically sets #=100 for DNA and #=30 for protein.

       -yS    For cDNA queries, use species-specific exon-intron boundary signals. For protein queries, invoke
              the 'salvage' procedure in phase 1.

       -yX    For  a DNA query, this option sets parameter values for cross-species comparison. Conversely, this
              option specifies an intra-species mode for a protein query.

       -yY#   Relative contribution of length-dependent part of intron penalty (8)

       -yZ#   Relative contribution of oligomer composition within an intron (0)

REFERENCES

       (1) "A Space-Efficient and Accurate Method for Mapping and Aligning cDNA Sequences onto Genomic
       Sequence", Osamu Gotoh, Nucleic Acids Res., 36 (8), 2630-2638 (2008).
       (2) "Direct Mapping and Alignment of Protein Sequences onto Genomic Sequence", Osamu Gotoh,
       Bioinformatics, 24 (21), 2438-2444 (2008).
       (3) "Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that
       incorporates additional species-specific features", Iwata, H. and Gotoh, O., Nucleic Acids Res., 40 (20),
       e161 (2012).

AUTHOR

       Osamu Gotoh <o.gotoh@aist.go.jp>

                                                   2018-09-06                                           spaln(1)