Ubuntu Manpage: SOAPaligner/soap2 - Short Oligonucleotide Analysis Package aligner

NAME

       SOAPaligner/soap2 - Short Oligonucleotide Analysis Package aligner

SYNOPSIS

       soap reference.index short_reads.fast[a|q] alignment.out [options]

DESCRIPTION

       SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis Package). It is
       an updated version of the SOAP software for short oligonucleotide alignment. The new
       program features very fast and accurate alignment of huge amounts of short reads
       generated by the Illumina/Solexa Genome Analyzer. Compared to soap v1, it is one order
       of magnitude faster: it requires only 2 minutes to align one million single-end reads
       onto the human reference genome. Another notable improvement of SOAPaligner is that it
       now supports a wide range of read lengths.

       SOAPaligner gains its time and space efficiency from a complete redesign of the basic
       data structures and algorithms used. The core algorithms and the indexing data
       structures (2way-BWT) were developed by the algorithms research group of the
       Department of Computer Science, the University of Hong Kong (T.W. Lam, Alan Tam, Simon
       Wong, Edward Wu and S.M. Yiu).

COMMAND AND OPTIONS

       soap -D <in.fasta.index> -a <query.file.a> [-b <query.file.b>] -o  <alignment.output>  [-2
       <unpaired.output>] [options]

       OPTIONS:

              -D STR Prefix name for the reference index [*.index]. See APPENDIX for how to
                     build the reference index

              -a STR Query file, for SE reads alignment or one end of PE reads

              -b STR Query b file, one end of PE reads

              -o STR Output file for alignment results

              -2 STR Output file for mapped but unpaired reads when doing PE alignment

              -u STR Output file for unmapped reads, [none]

              -m INT Minimal insert size INT allowed for PE, [400]

              -x INT Maximal insert size INT allowed for PE, [600]

              -n INT Filter low quality reads containing more than INT bp Ns, [5]

              -t     Output read ID instead of read name, [none]

              -r INT How to report repeat hits, 0=none; 1=random one; 2=all, [1]

              -R     RF alignment for PE data with long insert size (>= 2k bps), [none] (FR
                     alignment)

              -l INT For long reads with a high error rate at the 3'-end that cannot be aligned
                     over their whole length, first align the 5' INT bp subsequence as a seed,
                     [256] (use the whole length of the read)

              -s INT Minimal alignment length (for soft clip)

              -v INT Total number of mismatches allowed in one read when a subsequence is used
                     as a seed, [5]

              -g INT Maximal gap size allowed in one read, [0]

              -M INT Match mode for each read or for the seed part of a read, which should not
                     contain more than 2 mismatches, [4]

                     0: exact match only

                     1: 1 mismatch match only

                     2: 2 mismatch match only

                     4: find the best hits

              -p INT Number of threads to use, [1]
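
       As an illustration of how the options above combine, the following sketch assembles a
       paired-end command line. The helper name soap_pe_cmd and all file names are
       hypothetical; the flag values are the documented defaults, apart from the thread
       count. The function only prints the command so that it can be inspected before it is
       run.

```shell
#!/bin/sh
# Sketch only: the helper name and the file names are hypothetical.
# Prints a paired-end soap command line; it does not run soap itself.
soap_pe_cmd() {
    index=$1        # index prefix passed to -D
    reads_a=$2      # one end of the PE reads (-a)
    reads_b=$3      # the other end of the PE reads (-b)
    out=$4          # alignment results (-o)
    unpaired=$5     # mapped but unpaired reads (-2)
    threads=${6:-1} # number of threads (-p), documented default 1
    # -m/-x: insert size window, -r 1: report one random repeat hit,
    # -M 4: find the best hits (all documented defaults).
    printf 'soap -D %s -a %s -b %s -o %s -2 %s -m 400 -x 600 -r 1 -M 4 -p %s\n' \
        "$index" "$reads_a" "$reads_b" "$out" "$unpaired" "$threads"
}

soap_pe_cmd ref.fa.index reads_1.fq reads_2.fq pe.soap se.soap 4
```

       Printing the command first makes it easy to check the insert size window and the
       output file names before a long alignment is started.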

OUTPUT FORMAT

       The SOAP2 output contains the following columns:

       1. read name / read ID (if -t is given)

       2. read sequence (if the read aligns to the reverse strand, this is the reverse
       sequence of the original read)

       3. quality sequence (if the input reads are in FASTA format, this column is all 'h';
       the sequence is reversed if the read maps to the reverse strand)

       4.

APPENDIX

       Before using soap2 for alignment, the reference index must be generated by 2bwt-builder.

              2bwt-builder <reference.fasta>

              NOTE: 1. the reference input must be in FASTA format; 2. the program will
              automatically generate the index files in the directory where the FASTA file is
              located, so confirm that you have write permission there first.
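
       The permission check mentioned in the note can be sketched as follows. The helper name
       index_dir_writable and the path ./reference.fasta are hypothetical; the sketch only
       tests the directory and prints the command it would run, it does not call
       2bwt-builder.

```shell
#!/bin/sh
# Sketch only: the helper name and the FASTA path are hypothetical.
# Succeeds if the directory holding the FASTA file exists and is writable,
# since that is where 2bwt-builder is documented to write the index files.
index_dir_writable() {
    fasta=$1
    dir=$(dirname "$fasta")
    [ -d "$dir" ] && [ -w "$dir" ]
}

ref=./reference.fasta
if index_dir_writable "$ref"; then
    echo "ok: would run: 2bwt-builder $ref"
else
    echo "cannot write index files next to $ref" >&2
fi
```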

ENVIRONMENT

       The data structure is incompatible with 32-bit systems, so the program cannot be ported
       to any 32-bit platform. Because MMX instructions are used to optimize parts of the
       code, the current version can only run on the x86_64 platform. We will provide a
       universal version for most 64-bit platforms later.

       HARDWARE REQUIREMENT
              1. 8 GB RAM (for a genome as large as human's)

              2. At least 8 GB hard disk to store the index (for a genome as large as human's)

       SYSTEM REQUIREMENT
              Linux x86_64
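
       A wrapper script could verify the system requirement before starting an alignment. The
       sketch below assumes that uname reports "Linux" and "x86_64" on a supported machine;
       the helper name soap_platform_ok is hypothetical.

```shell
#!/bin/sh
# Sketch only: the helper name is hypothetical.
# Succeeds only for the documented platform, Linux on x86_64.
# Optional arguments override the detected values, which helps with testing.
soap_platform_ok() {
    system=${1:-$(uname -s)}
    machine=${2:-$(uname -m)}
    [ "$system" = "Linux" ] && [ "$machine" = "x86_64" ]
}

if soap_platform_ok; then
    echo "platform supported"
else
    echo "unsupported platform: $(uname -s) $(uname -m)" >&2
fi
```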

AUTHOR

       BGI Shenzhen SOAP team. The core algorithm Bidirect-BWT was written by Prof. T.W. Lam
       and his team at the University of Hong Kong.

REPORT BUGS

       Report bugs to <soap@genomics.org.cn>

ACKNOWLEDGEMENTS

       We appreciate the prominent work of Prof. T.W. Lam, Alan Tam, Simon Wong, Edward Wu and
       S.M. Yiu on Bidirect-BWT.
