Provided by: fitsh_0.9.4-1_amd64

NAME

       grcollect - perform transposition on tabulated input data

SYNOPSIS

       grcollect [options] <input> [...] [-o <output>|-b <basename>]

DESCRIPTION

       The main purpose of the program `grcollect` is twofold. First, it is intended to perform
       data transposition: the input (which is read from files or from standard input) is sorted
       and split into separate files, where the splitting is based on a key taken from the input
       data. When the input comes from several files and each key is unique within a given file,
       this process is called data transposition, since it is analogous to transposing a
       two-dimensional data matrix that is stored with each row in a separate file, so that each
       column ends up in a separate file. The other feature of `grcollect` is to compute
       statistics on the data associated with the different keys. These statistics include
       averages (mean, median, mode) and scatter estimates (standard deviation or median
       deviance) with optional rejection of outlier points, as well as sums, counts and so on.
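
       The transposition described above can be sketched in a few lines of Python. This is only
       an in-memory illustration, not the implementation used by `grcollect`: the function name
       transpose, the zero-based key_col argument and the sample data are assumptions made for
       the example.

```python
from collections import defaultdict


def transpose(inputs, key_col=0):
    """Group whitespace-separated lines by the key found in column key_col.

    inputs is an iterable of line sequences, one sequence per input file.
    key_col is zero-based here; this is an assumption of the sketch.
    Returns a dict mapping each key to the lines carrying that key,
    in the order in which they were read.
    """
    by_key = defaultdict(list)
    for lines in inputs:
        for line in lines:
            fields = line.split()
            # Skip blank lines and lines too short to contain the key column.
            if len(fields) <= key_col:
                continue
            by_key[fields[key_col]].append(line.rstrip("\n"))
    return dict(by_key)


# Two "row" inputs, each holding one record per key.
frame1 = ["star1 10.1", "star2 12.5"]
frame2 = ["star1 10.3", "star2 12.4"]
columns = transpose([frame1, frame2])
for key in sorted(columns):
    print(key, columns[key])
```

       Each key ends up with one record per input, which is the "column" of the transposed
       matrix. A real run would write every group to its own file instead of keeping it in
       memory.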

OPTIONS

   General options:
       -h, --help
              Give a general summary of the command line options.

       --long-help, --help-long
              Give a detailed list of command line options.

       --wiki-help, --help-wiki, --mediawiki-help, --help-mediawiki
              Give a detailed list of command line options in Mediawiki format.

       --version, --version-short, --short-version
              Give some version information about the program.

       <input> [,<input>, ...]
              Name of the input file. At least one file must be specified. Reading from
              standard input can be forced by using a single dash "-" as the input file name.
              Additional dashes are silently ignored.

       -c, --col-base <key column index>
              Column index for the key.

   Data transposition specific options:
       -b, --basename <base-%b-name>
              Base name of the output files. The base name string should contain at least one
              "%b" tag, which is replaced by the respective key string on  the  creation  of  the
              file.

       -x, --extension <extension>, -p, --prefix <prefix>
              Equivalent  to  "-b|--basename  <prefix>%b.<extension>".  Note  that  in  practice,
              <prefix> might be some sort of directory name  and  extension  is  a  regular  file
              extension,  but  the  above  substitution  is  done literally. Therefore, the "dot"
              between the key and the <extension> is always inserted in the  final  name  of  the
              output  files  but a trailing slash is required at the end of <prefix> if the files
              are to be created in that particular directory. Note also that in this case the
              target directory must exist before the invocation of `grcollect`; otherwise the
              output files cannot be created.
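
              The literal substitution can be illustrated with a short Python sketch. The
              helper names basename_from and output_name are hypothetical and only mirror the
              rule stated above; they are not part of `grcollect`.

```python
def basename_from(prefix, extension):
    """Build the base name equivalent to -p <prefix> -x <extension>."""
    # The dot is always inserted; no slash is added after the prefix.
    return f"{prefix}%b.{extension}"


def output_name(basename, key):
    """Replace every "%b" tag in the base name by the key string."""
    if "%b" not in basename:
        raise ValueError('base name must contain at least one "%b" tag')
    return basename.replace("%b", key)


# With a trailing slash the file lands inside the directory "lc".
print(output_name(basename_from("lc/", "txt"), "star1"))
# Without it, the prefix is simply glued to the key.
print(output_name(basename_from("lc", "txt"), "star1"))
```

              The first call yields a path inside the directory, the second one a file named
              with the prefix attached directly to the key, which is why the trailing slash
              matters.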

       -C, --comment
              Insert a commented line  (starting  with  "#")  containing  information  about  the
              version  and  command line invocation syntax of `grcollect` to the beginning of the
              transposed files.

       -S, --additional-comment <...>
              Insert additional commented lines (starting with "#") to the beginning of the
              transposed files.

   Options for cumulative statistics:
       -d, --col-stat <>[,...]
              Comma-separated  list  of  column  indices  on  which  the  statistics  are  to  be
              calculated. Columns with non-numerical contents are ignored. Note that this option
              implies the cumulative statistics mode of `grcollect`.

       -o, --output <filename>
              The  name  of the output file to which the output statistics are written. The total
              number of columns in this file will be 1+C*N, where C is the number of columns (see
              -d|--col-stat)  on  which  the  statistics  are  calculated  and N is the number of
              statistic quantities (see --stat). The first column in the output file is the  key,
              which  is  followed  by the per-column list of statistics, in the same order as the
              user defined after -d|--col-stat and --stat.
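
              The resulting column layout can be sketched as follows. The function name
              output_layout and the labels it produces are assumptions made for this
              illustration, and the ordering (all statistics of the first column, then all
              statistics of the second, and so on) is one reading of the description above.

```python
def output_layout(stat_columns, statistics):
    """List the output columns: the key, then the statistics per column."""
    layout = ["key"]
    for column in stat_columns:
        for statistic in statistics:
            layout.append(f"{statistic}({column})")
    return layout


# C = 2 columns and N = 3 statistics give 1 + 2*3 = 7 output columns.
layout = output_layout([2, 3], ["count", "mean", "stddev"])
print(len(layout), layout)
```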

       -s, --stat <list of statistics>
              Comma-separated list of statistics to be estimated on the input data. These can  be
              one or more of the following:

       count  Total number of records, for the given key.

       rcount The  number  of records after rejecting outliers (i.e. it is always the same as the
              "count" value if no "--rejection" was used).

       mean, median, mode
              Mean, median or mode statistics of the data.

       rmean, rmedian, rmode
              Mean, median or mode, after rejecting outliers.

       {mean|median|mode}stddev, {mean|median|mode}meddev, stddev
              Scatter of the data around the mean, median or mode.  The  scatter  can  either  be
              standard  deviation  (stddev)  or median deviance (meddev). The literal "stddev" is
              the classic standard deviation, equivalent to "meanstddev".

       r{mean|median|mode}stddev, r{mean|median|mode}meddev, rstddev
              The same scatters as above but after rejecting outliers.

       sum, rsum
              Sum of the data: the total sum and the sum after rejecting outliers, respectively.

       sum2, rsum2
              Sum of the squares, total and after rejecting outliers.

       min, max
              Minimal and maximal data values.

       rmin, rmax
              Minimal and maximal data values after the rejection of outliers.
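
              A few of the statistics listed above can be sketched in Python for the values
              belonging to a single key. This is a minimal sketch under stated assumptions:
              the function name basic_stats is hypothetical, the standard deviation is taken
              as the population form derived from the sum and the sum of squares, and the
              mode and median deviance are left out because this page does not define how
              `grcollect` estimates them.

```python
import math
import statistics


def basic_stats(values):
    """Compute a subset of the per-key statistics for a list of numbers."""
    count = len(values)
    total = sum(values)
    total2 = sum(v * v for v in values)
    mean = total / count
    # Population variance from the running sums (an assumption of this
    # sketch); clamp small negative values caused by rounding.
    variance = max(total2 / count - mean * mean, 0.0)
    return {
        "count": count,
        "sum": total,
        "sum2": total2,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "stddev": math.sqrt(variance),
    }


stats = basic_stats([1.0, 2.0, 3.0, 4.0])
for name, value in stats.items():
    print(name, value)
```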

       -r, --rejection column=<index>,<rejection parameters>
              Comma-separated directives for outlier rejection  for  the  specified  column.  The
              rejection parameters are:

       iterations=<n>
              Maximum number of iterations to reject outliers.

       mean, median, mode
              Use the mean, median or mode for the center of the rejection.

       stddev, meddev, absolute=<limit>
              Use the standard deviation or the median deviance as the unit of the rejection
              limit, or define an absolute limit for the rejection level.
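
       The combination of a center, a scatter unit and an iteration limit corresponds to a
       conventional iterative clipping scheme, sketched below in Python. This is an
       illustration only and not necessarily the algorithm implemented in `grcollect`: the
       function name reject_outliers, the limit argument (a multiplier of the scatter) and its
       value are assumptions, and only the mean and median centers with a standard deviation
       unit are covered.

```python
import math
import statistics


def reject_outliers(values, center="median", limit=3.0, iterations=3):
    """Iteratively drop points farther than limit * scatter from the center."""
    kept = list(values)
    for _ in range(iterations):
        if len(kept) < 2:
            break
        if center == "median":
            mid = statistics.median(kept)
        else:
            mid = sum(kept) / len(kept)
        # Root mean square deviation around the chosen center.
        sigma = math.sqrt(sum((v - mid) ** 2 for v in kept) / len(kept))
        survivors = [v for v in kept if abs(v - mid) <= limit * sigma]
        if len(survivors) == len(kept):
            break  # nothing was rejected, so further iterations change nothing
        kept = survivors
    return kept


data = [10.0, 10.1, 9.9, 10.2, 9.8, 50.0]
kept = reject_outliers(data, center="median", limit=2.0, iterations=3)
print(len(data), len(kept), kept)
```

       In this sample the single discrepant point is removed in the first pass and the second
       pass rejects nothing, so the loop stops before the iteration limit is reached. The
       lengths of the two lists play the role of the "count" and "rcount" statistics.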

       Note that each column can have a different kind of rejection method, thus more than one
       "--rejection ..." command line option can be used at the invocation of `grcollect`.

   Other options:
       -m, --max-memory <memory>[kmg]
              Maximum amount of memory available for `grcollect`. The suffixes "k", "m" or "g"
              can be used for kilobytes, megabytes and gigabytes, respectively. On 32-bit
              systems, the maximum memory is limited to 3 gigabytes. Note that `grcollect` does
              not use any operating system specific method to determine the maximum amount of
              memory; it should always be set by the user. The default value of 8 megabytes is
              rather small, so for massive data transposition (tens or hundreds of gigabytes) it
              is worth setting this limit according to the physical memory available.

       -t, --tmpdir <directory>
              Directory for temporary file storage. Note that the default temporary directory is
              always the current one (which is equivalent to specifying "--tmpdir ./"), since in
              a usual configuration the /tmp directory is small; moreover, it may be a "tmpfs",
              i.e. a temporary file system mounted in physical memory itself.

REPORTING BUGS

       Report bugs to <apal@szofi.net>, see also https://fitsh.net/.

COPYRIGHT

       Copyright © 1996, 2002, 2004-2008, 2010-2016, 2018-2020; Pal, Andras <apal@szofi.net>