Provided by: biobambam2_2.0.95-1_amd64 bug

NAME

       fastqtobam - convert FastQ to unmapped BAM

SYNOPSIS

       fastqtobam [options]

DESCRIPTION

       fastqtobam  reads  one or two FastQ files and converts them to a BAM file in which each read is marked as
       unmapped. If no input file name is given, then a single FastQ file is read from standard  input.  If  one
       file name is given, then a single FastQ file is read from the given file. In both cases the read names in
       the file are parsed to determine whether the contained reads are paired or not if the name scheme is  not
       set  to pairedfiles.  If two file names are given, then the program assumes to find two FastQ files which
       are synchronous, i.e. where the first read in the first file is the mate of the first read in the  second
       file  etc. Input file names can be given either via the I key or after the key=value pairs on the command
       line. The program accepts read name formats as described below under the key namescheme.

       The following key=value pairs can be given:

       verbose=<[0|1]> print progress report. By default progress is not reported.

       I=<filename>: input file name (data is read from standard input if this option is not  given).  This  key
       can be given twice.

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If libmaus has been compiled with support for igzip (see https://software.intel.com/en-us/articles/igzip-
       a-high-performance-deflate-compressor-with-optimizations-for-genomic-data) then an additional valid value
       is

       11:    igzip compression

       md5=<0|1>: md5 checksum creation for output file. Valid values are

       0:     do not compute checksum. This is the default.

       1:     compute  checksum.  If the md5filename key is set, then the checksum is written to the given file.
              If md5filename is unset, then no checksum will be computed.

       md5filename file name for md5 checksum if md5=1.

       gz=<[0|1]> input is gzip compressed FastQ. By default input is assumed to be uncompressed FastQ.

       threads=<1> additional BAM encoding helper threads.

       PGID=<> read group identifier for reads. By default no read group identifer is set.  The fields  CN,  DS,
       DT,  FO,  KS, LB, PG, PI, PL, PU and SM of the corresponding @RG header line can be set by using the keys
       RGCN, RGDS, etc.  respectively.

       qualityoffset=<33> FastQ quality offset. This value is subtracted from the ASCII character representation
       to get the quality score value.

       qualitymax=<41> maximum valid quality value, 41 by default. Higher values may indicate a wrong setting of
       the qualityoffset parameter. BAM allows quality values up to the value of 94.

       qualityhist=<0> compute a quality histogram and print it on the standard error channel  after  processing
       has  finished  successfully.  Lines  for  the  quality  histogram  are  prefixed with [H] and contain tab
       separated values. The histogram enumerates quality scores from high to low values. The histogram has four
       columns (after the [H] marker). The first is the ASCII representation of the quality with offset 33, i.e.
       the symbol ! denotes quality 0. The second column gives the absolute frequency of the  value.  The  third
       column  stores  the  relative  frequency  of  the value, i.e. the fraction of all values assigned to this
       value.  The fourth column gives a cumulative relative frequency value over all quality  for  the  current
       line and those for higher quality values.

       checkquality=<1>  check  whether  quality  values  are  in  range  and  terminate  if an invalid value is
       encountered.

       namescheme=<generic> read name scheme. This determines how read names are parsed. There are four possible
       options:

       generic:
              the  first  sequence of non whitespace characters is extracted from the @ line of the FastQ record
              and the rest of the @ line is discarded. If the retained name ends in /1 or /2, then the  read  is
              part  of a read pair, otherwise it is the single read for the template. For a pair the part of the
              name before the /1 or /2 is considered  the  template  name.  For  a  single  the  whole  name  is
              considered the name of the template.

       c18s:  The  name  is  expected  to consist of two sequences of non white-space characters where the first
              contains seven colon separated fields and the second four colon separated fields. The first of the
              two  is  considered  to be the name of the template. It is assumed that this read is the only read
              for the template.

       c18pe: As for c18s, the name is expected to consist of two sequences of non white-space characters  where
              the  first  contains  seven colon separated fields and the second four colon separated fields. The
              first of the two is considered to be the name of the template. The read is assumed to be part of a
              read pair. The first field in the second non-whitespace sequence of the @ line designates, whether
              it is the first or second of the pair depending on whether the field stores  the  number  1  or  2
              respectively.

       pairedfiles:
              The  input  framgents are assumed to be paired. If there is a single input file then the pairs are
              expected consecutive in the file. If there are two input files then the read names in the two  are
              expected  to  be  synchronous.   All characters in read names beginning from the first white space
              character are discarded. If the two (so  reduced)  read  names  in  question  end  on  /1  and  /2
              respectively,  then  those suffixes will be clipped off also. The remaining read names are checked
              for equality. If they are not equal, then the program will reject the input and terminate.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <tischler@mpi-cbg.de>

COPYRIGHT

       Copyright © 2009-2014 German Tischler, © 2011-2014 Genome Research  Limited.   License  GPLv3+:  GNU  GPL
       version 3 <http://gnu.org/licenses/gpl.html>
       This  is  free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent
       permitted by law.