
NAME

       funcalc - Funtools calculator (for binary tables)

SYNOPSIS

       funcalc [-n] [-a argstr] [-e expr] [-f file] [-l link] [-p prog] <iname> [oname [columns]]

OPTIONS

         -a argstr    # user arguments to pass to the compiled program
         -e expr      # funcalc expression
         -f file      # file containing funcalc expression
         -l libs      # libs to add to link command
         -n           # output generated code instead of compiling and executing
         -p prog      # generate named program, no execution
         -u           # die if any variable is undeclared (don't auto-declare)

DESCRIPTION

       funcalc is a calculator program that allows arbitrary expressions to be constructed,
       compiled, and executed on columns in a Funtools table (FITS binary table or raw event
       file). It works by integrating user-supplied expression(s) into a template C program, then
       compiling and executing the program. funcalc expressions are C statements, although some
       important simplifications (such as automatic declaration of variables) are supported.

       funcalc expressions can be specified in three ways: on the command line using the -e
       [expression] switch, in a file using the -f [file] switch, or from stdin (if neither -e
       nor -f is specified). Of course a file containing funcalc expressions can be read from
       stdin.

       Each invocation of funcalc requires an input Funtools table file to be specified as the
       first command line argument.  The output Funtools table file is the second optional
       argument. It is needed only if an output FITS file is being created (i.e., in cases where
       the funcalc expression only prints values, no output file is needed). If input and output
       file are both specified, a third optional argument can specify the list of columns to
       activate (using FunColumnActivate()).  Note that funcalc determines whether or not to
       generate code for writing an output file based on the presence or absence of an output
       file argument.

       A funcalc expression executes on each row of a table and consists of one or more C
       statements that operate on the columns of that row (possibly using temporary variables).
       Within an expression, reference is made to a column of the current row using the C struct
       syntax cur->[colname], e.g. cur->x, cur->pha, etc.  Local scalar variables can be defined
       using C declarations at the very beginning of the expression, or else they can be defined
       automatically by funcalc (to be of type double). Thus, for example, a swap of columns x
       and y in a table can be performed using either of the following equivalent funcalc
       expressions:

         double temp;
         temp = cur->x;
         cur->x = cur->y;
         cur->y = temp;

       or:

         temp = cur->x;
         cur->x = cur->y;
         cur->y = temp;

       When this expression is executed using a command such as:

         funcalc -f swap.expr itest.ev otest.ev

       the resulting file will have values of the x and y columns swapped.

       By default, the data type of the variable for a column is the same as the data type of the
       column as stored in the file. This can be changed by appending ":[dtype]" to the first
       reference to that column. In the example above, to force x and y to be output as doubles,
       specify the type 'D' explicitly:

         temp = cur->x:D;
         cur->x = cur->y:D;
         cur->y = temp;

       Data type specifiers follow standard FITS table syntax for defining columns using TFORM:

       •   A: ASCII characters

       •   B: unsigned 8-bit char

       •   I: signed 16-bit int

       •   U: unsigned 16-bit int (not standard FITS)

       •   J: signed 32-bit int

       •   V: unsigned 32-bit int (not standard FITS)

       •   E: 32-bit float

       •   D: 64-bit float

       •   X: bits (treated as an array of chars)

       Note that only the first reference to a column should contain the explicit data type
       specifier.

       Of course, it is important to handle the data type of the columns correctly.  One of the
       most frequent causes of error in funcalc programming is the implicit use of the wrong data
       type for a column in an expression.  For example, the calculation:

         dx = (cur->x - cur->y)/(cur->x + cur->y);

       usually needs to be performed using floating point arithmetic. In cases where the x and y
       columns are integers, this can be done by reading the columns as doubles using an explicit
       type specification:

         dx = (cur->x:D - cur->y:D)/(cur->x + cur->y);

       Alternatively, it can be done using C type-casting in the expression:

         dx = ((double)cur->x - (double)cur->y)/((double)cur->x + (double)cur->y);

       In addition to accessing columns in the current row, reference also can be made to the
       previous row using prev->[colname], and to the next row using next->[colname].  Note that
       if prev->[colname] is specified in the funcalc expression, the very first row is not
       processed.  If next->[colname] is specified in the funcalc expression, the very last row
       is not processed. In this way, prev and next are guaranteed always to point to valid rows.
       For example, to print out the values of the current x column and the previous y column,
       use the C fprintf function in a funcalc expression:

         fprintf(stdout, "%d %d\n", cur->x, prev->y);

       New columns can be specified using the same cur->[colname] syntax by appending the column
       type (and optional tlmin/tlmax/binsiz specifiers), separated by colons. For example,
       cur->avg:D will define a new column of type double. Type specifiers are the same as those
       used above to specify new data types for existing columns.

       For example, to create and output a new column that is the average value of the x and y
       columns, a new "avg" column can be defined:

         cur->avg:D = (cur->x + cur->y)/2.0

       Note that the final ';' is not required for single-line expressions.

       As with FITS TFORM data type specification, the column data type specifier can be preceded
       by a numeric count to define an array, e.g., "10I" means a vector of 10 short ints, "2E"
       means two single precision floats, etc.  A new column only needs to be defined once in a
       funcalc expression, after which it can be used without re-specifying the type. This
       includes reference to elements of a column array:

         cur->avg[0]:2D = (cur->x + cur->y)/2.0;
         cur->avg[1] = (cur->x - cur->y)/2.0;

       The 'X' (bits) data type is treated as a char array of dimension (numeric_count/8), i.e.,
       16X is processed as a 2-byte char array. Each 8-bit array element is accessed separately:

         cur->stat[0]:16X  = 1;
         cur->stat[1]      = 2;

       Here, a 16-bit column is created with the MSB set to 1 and the LSB set to 2.

       By default, all processed rows are written to the specified output file. If you want to
       skip writing certain rows, simply execute the C "continue" statement at the end of the
       funcalc expression, since the writing of the row is performed immediately after the
       expression is executed. For example, to skip writing rows whose average is the same as the
       current x value:

         cur->avg[0]:2D = (cur->x + cur->y)/2.0;
         cur->avg[1] = (cur->x - cur->y)/2.0;
         if( cur->avg[0] == cur->x )
           continue;

       If no output file argument is specified on the funcalc command line, no output file is
       opened and no rows are written. This is useful in expressions that simply print output
       results instead of generating a new file:

         fpv = (cur->av3:D-cur->av1:D)/(cur->av1+cur->av2:D+cur->av3);
         fbv =  cur->av2/(cur->av1+cur->av2+cur->av3);
         fpu = ((double)cur->au3-cur->au1)/((double)cur->au1+cur->au2+cur->au3);
         fbu =  cur->au2/(double)(cur->au1+cur->au2+cur->au3);
         fprintf(stdout, "%f\t%f\t%f\t%f\n", fpv, fbv, fpu, fbu);

       In the above example, we use both explicit type specification (for "av" columns) and type
       casting (for "au" columns) to ensure that all operations are performed in double
       precision.

       When an output file is specified, the selected input table is processed and output rows
       are copied to the output file.  Note that the output file can be specified as "stdout" in
       order to write the output rows to the standard output.  If the output file argument is
       passed, an optional third argument also can be passed to specify which columns to process.

       In a FITS binary table, it sometimes is desirable to copy all of the other FITS extensions
       to the output file as well. This can be done by appending a '+' sign to the name of the
       extension in the input file name. See funtable for a related example.

       funcalc works by integrating the user-specified expression into a template C program
       called tabcalc.c.  The completed program then is compiled and executed. Variable
       declarations that begin the funcalc expression are placed in the local declaration section
       of the template main program.  All other lines are placed in the template main program's
       inner processing loop. Other details of program generation are handled automatically. For
       example, column specifiers are analyzed to build a C struct for processing rows, which is
       passed to FunColumnSelect() and used in FunTableRowGet().  If an unknown variable is used
       in the expression, resulting in a compilation error, the program build is retried after
       defining the unknown variable to be of type double.

       Normally, funcalc expression code is added to the funcalc row processing loop. It is possible
       to add code to other parts of the program by placing this code inside special directives
       of the form:

         [directive name]
           ... code goes here ...
         end

       The directives are:

       •   global: add code and declarations in global space, before the main routine.

       •   local: add declarations (and code) just after the local declarations in main

       •   before: add code just before entering the main row processing loop

       •   after: add code just after exiting the main row processing loop

       Thus, the following funcalc expression will declare global variables and make subroutine
       calls just before and just after the main processing loop:

         global
           double v1, v2;
           double init(void);
           double finish(double v);
         end
         before
           v1  = init();
         end
         ... process rows, with calculations using v1 ...
         after
           v2 = finish(v1);
           if( v2 < 0.0 ){
             fprintf(stderr, "processing failed %g -> %g\n", v1, v2);
             exit(1);
           }
         end

       Routines such as init() and finish() above are passed to the generated program for linking
       using the -l [link directives ...]  switch. The string specified by this switch will be
       added to the link line used to build the program (before the funtools library). For
       example, assuming that init() and finish() are in the library libmysubs.a in the
       /opt/special/lib directory, use:

         funcalc  -l "-L/opt/special/lib -lmysubs" ...

       User arguments can be passed to a compiled funcalc program using a string argument to the
       "-a" switch.  The string should contain all of the user arguments. For example, to pass
       the integers 1 and 2, use:

         funcalc -a "1 2" ...

       The arguments are stored in an internal array and are accessed as strings via the ARGV(n)
       macro.  For example, consider the following expression:

         local
           int pmin, pmax;
         end

         before
           pmin=atoi(ARGV(0));
           pmax=atoi(ARGV(1));
         end

         if( (cur->pha >= pmin) && (cur->pha <= pmax) )
           fprintf(stderr, "%d %d %d\n", cur->x, cur->y, cur->pha);

       This expression will print out x, y, and pha values for all rows in which the pha value is
       between the two user-input values:

         funcalc -a '1 12' -f foo snr.ev'[cir 512 512 .1]'
         512 512 6
         512 512 8
         512 512 5
         512 512 5
         512 512 8

         funcalc -a '5 6' -f foo snr.ev'[cir 512 512 .1]'
         512 512 6
         512 512 5
         512 512 5

       Note that it is the user's responsibility to ensure that the correct number of arguments
       is passed. The ARGV(n) macro returns NULL if a requested argument is outside the limits
       of the actual number of args, usually resulting in a SEGV if processed blindly.  To check
       the argument count, use the ARGC macro:

         local
           long int seed=1;
           double limit=0.8;
         end

         before
           if( ARGC >= 1 ) seed = atol(ARGV(0));
           if( ARGC >= 2 ) limit = atof(ARGV(1));
           srand48(seed);
         end

         if ( drand48() > limit ) continue;

       The macro WRITE_ROW expands to the FunTableRowPut() call that writes the current row. It
       can be used to write the row more than once.  In addition, the macro NROW expands to the
       row number currently being processed. Use of these two macros is shown in the following
       example:

         if( cur->pha:I == cur->pi:I ) continue;
         a = cur->pha;
         cur->pha = cur->pi;
         cur->pi = a;
         cur->AVG:E  = (cur->pha+cur->pi)/2.0;
         cur->NR:I = NROW;
         if( NROW < 10 ) WRITE_ROW;

       If the -p [prog] switch is specified, the expression is not executed. Rather, the
       generated executable is saved with the specified program name for later use.

       If the -n switch is specified, the expression is not executed. Rather, the generated code
       is written to stdout. This is especially useful if you want to generate a skeleton file
       and add your own code, or if you need to check compilation errors. Note that the comment
       at the start of the output gives the compiler command needed to build the program on that
       platform. (The command can change from platform to platform because of the use of
       different libraries, compiler switches, etc.)

       As mentioned previously, funcalc will declare a scalar variable automatically (as a
       double) if that variable has been used but not declared.  This facility is implemented
       using a sed script named funcalc.sed, which processes the compiler output to sense an
       undeclared variable error.  This script has been seeded with the appropriate error
       information for gcc, and for cc on Solaris, DecAlpha, and SGI platforms. If you find that
       automatic declaration of scalars is not working on your platform, check this sed script;
       it might be necessary to add to or edit some of the error messages it senses.

       In order to keep the lexical analysis of funcalc expressions (reasonably) simple, we chose
       to accept some limitations on how accurately C comments, spaces, and new-lines are placed
       in the generated program. In particular, comments associated with local variables declared
       at the beginning of an expression (i.e., not in a local...end block) will usually end up
       in the inner loop, not with the local declarations:

         /* this comment will end up in the wrong place (i.e., inner loop) */
         double a; /* also in wrong place */
         /* this will be in the right place (inner loop) */
         if( cur->x:D == cur->y:D ) continue; /* also in right place */
         a = cur->x;
         cur->x = cur->y;
         cur->y = a;
         cur->avg:E  = (cur->x+cur->y)/2.0;

       Similarly, spaces and new-lines sometimes are omitted or added in a seemingly arbitrary
       manner. Of course, none of these stylistic blemishes affect the correctness of the
       generated code.

       Because funcalc must analyze the user expression using the data file(s) passed on the
       command line, the input file(s) must be opened and read twice: once during program
       generation and once during execution. As a result, it is not possible to use stdin for the
       input file: funcalc cannot be used as a filter. We will consider removing this restriction
       at a later time.

       Along with C comments, funcalc expressions can have one-line internal comments that are
       not passed on to the generated C program. These internal comments start with the #
       character and continue up to the new-line:

         double a; # this is not passed to the generated C file
         # nor is this
         a = cur->x;
         cur->x = cur->y;
         cur->y = a;
         /* this comment is passed to the C file */
         cur->avg:E  = (cur->x+cur->y)/2.0;

       As previously mentioned, input columns normally are identified by their being used within
       the inner event loop. There are rare cases where you might want to read a column and
       process it outside the main loop. For example, qsort might use a column in its sort
       comparison routine that is not processed inside the inner loop (and therefore not
       implicitly specified as a column to be read).  To ensure that such a column is read by the
       event loop, use the explicit keyword.  The arguments to this keyword specify columns that
       should be read into the input record structure even though they are not mentioned in the
       inner loop. For example:

         explicit pi pha

       will ensure that the pi and pha columns are read for each row, even if they are not
       processed in the inner event loop. The explicit statement can be placed anywhere.

       Finally, note that funcalc currently works on expressions involving FITS binary tables and
       raw event files. We will consider adding support for image expressions at a later point,
       if there is demand for such support from the community.
