Ubuntu Manpage: FunColumnSelect - select Funtools columns

Provided by: libfuntools-dev_1.4.7-2_amd64

NAME

       FunColumnSelect - select Funtools columns

SYNOPSIS

         #include <funtools.h>

         int FunColumnSelect(Fun fun, int size, char *plist,
                             char *name1, char *type1, char *mode1, int offset1,
                             char *name2, char *type2, char *mode2, int offset2,
                             ...,
                             NULL);

         int FunColumnSelectArr(Fun fun, int size, char *plist,
                                char **names, char **types, char **modes,
                                int *offsets, int nargs);

DESCRIPTION

       The FunColumnSelect() routine is used to select the columns from a Funtools binary table extension or raw
       event file for processing. This routine allows you to specify how columns in a file are to be read into a
       user record structure or written from a user record structure to an output FITS file.

       The first argument is the Fun handle associated with this set of columns. The second argument specifies
       the size of the user record structure into which columns will be read.  Typically, the sizeof() macro is
       used to specify the size of a record structure.  The third argument allows you to specify keyword
       directives for the selection and is described in more detail below.

       Following the first three required arguments is a variable length list of column specifications.  Each
       column specification will consist of four arguments:

       •   name: the name of the column

       •   type: the data type of the column as it will be stored in the user record struct (not the data type
           of the input file). The following basic data types are recognized:

           •   A: ASCII characters

           •   B: unsigned 8-bit char

           •   I: signed 16-bit int

           •   U: unsigned 16-bit int (not standard FITS)

           •   J: signed 32-bit int

           •   V: unsigned 32-bit int (not standard FITS)

           •   E: 32-bit float

           •   D: 64-bit float

           The syntax used is similar to that which defines the TFORM parameter in FITS binary tables. That is,
           a numeric repeat value can precede the type character, so that "10I" means a vector of 10 short ints,
           "E" means a single precision float, etc.  Note that the column value from the input file will be
           converted to the specified data type as the data is read by FunTableRowGet().

           [ A short digression regarding bit-fields: Special attention is required when reading or writing the
           FITS bit-field type ("X"). Bit-fields almost always have a numeric repeat character preceding the 'X'
           specification. Usually this value is a multiple of 8 so that bit-fields fit into an integral number
           of bytes. For all cases, the byte size of the bit-field B is (N+7)/8, where N is the numeric repeat
           character.

           A bit-field is most easily declared in the user struct as an array of type char of size B as defined
           above. In this case, bytes are simply moved from the file to the user space.  If, instead, a short or
           int scalar or array is used, then the algorithm for reading the bit-field into the user space depends
           on the size of the data type used along with the value of the repeat character.  That is, if the user
           data size is equal to the byte size of the bit-field, then the data is simply moved (possibly with
           endian-based byte-swapping) from one to the other. If, on the other hand, the data storage is larger
           than the bit-field size, then a data type cast conversion is performed to move parts of the bit-field
           into elements of the array.  Examples will help make this clear:

           •   If the file contains a 16X bit-field and user space specifies a 2B char array[2], then the bit-
               field is moved directly into the char array.

           •   If the file contains a 16X bit-field and user space specifies a 1I scalar short int, then the
               bit-field is moved directly into the short int.

           •   If the file contains a 16X bit-field and user space specifies a 1J scalar int, then the bit-field
               is type-cast to unsigned int before being moved (use of unsigned avoids possible sign extension).

           •   If the file contains a 16X bit-field and user space specifies a 2J int array[2], then the bit-
               field is handled as 2 chars, each of which is type-cast to unsigned int before being moved (use
               of unsigned avoids possible sign extension).

           •   If the file contains a 16X bit-field and user space specifies a 1B char, then the bit-field is
               treated as a char, i.e., truncation will occur.

           •   If the file contains a 16X bit-field and user space specifies a 4J int array[4], then the results
               are undetermined.

           For all user data types larger than char, the bit-field is byte-swapped as necessary to convert to
           native format, so that bits in the resulting data in user space can be tested, masked, etc. in the
           same way regardless of platform.]
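
           The size rule and the "2J" widening case above can be sketched in plain C. This is only an
           illustration of the arithmetic described in the digression, not the Funtools implementation;
           the function names bitfield_bytes() and widen_bitfield() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte size B of an "NX" bit-field, using the (N+7)/8 rule with integer division. */
int bitfield_bytes(int nbits)
{
    assert(nbits >= 0);
    return (nbits + 7) / 8;
}

/* Widen each bit-field byte into its own unsigned int element, as in the
 * "16X into 2J int array[2]" case. Going through unsigned char and unsigned
 * int means a byte such as 0xFF becomes 255 and is never sign-extended. */
void widen_bitfield(const unsigned char *src, int nbytes, unsigned int *dst)
{
    int i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < nbytes; i++)
        dst[i] = (unsigned int)src[i];
}
```

           For a 16X column, bitfield_bytes(16) gives 2, which is the size of the char array[2] used in
           the first example above.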

           In addition to setting data type and size, the type specification allows a few ancillary parameters
           to be set, using the full syntax for type:

            [@][n]<type>[[['B']poff]][:[tlmin[:tlmax[:binsiz]]]]

           The special character "@" can be prepended to this specification to indicate that the data element
           is a pointer in the user record, rather than an array stored within the record.

           The [n] value is an integer that specifies the number of elements that are in this column (default is
           1). TLMIN, TLMAX, and BINSIZ values also can be specified for this column after the type, separated
           by colons. If only one such number is specified, it is assumed to be TLMAX, and TLMIN and BINSIZ are
           set to 1.

           The [poff] value can be used to specify the offset into an array. By default, this offset value is
           set to zero and the data specified starts at the beginning of the array. The offset usually is
           specified in terms of the data type of the column. Thus an offset specification of [5] means a
           20-byte offset if the data type is a 32-bit integer, and a 40-byte offset for a double. If you want
           to specify a byte offset instead of an offset tied to the column data type, precede the offset value
           with 'B', e.g. [B6] means a 6-byte offset, regardless of the column data type.

           The [poff] is especially useful in conjunction with the pointer @ specification, since it allows the
           data element to be stored anywhere in the allocated array.  For example, a specification such
           as "@I[2]" specifies the third (i.e., starting from 0) element in the array pointed to by the pointer
           value. A value of "@2I[4]" specifies the fifth and sixth values in the array. For example, consider
           the following specification:

             typedef struct EvStruct{
               short x[4], *atp, *atb;
             } *Event, EventRec;
             /* set up the (hardwired) columns */
             FunColumnSelect( fun, sizeof(EventRec), NULL,
                              "2i",    "2I  ",    "w", FUN_OFFSET(Event, x),
                              "2i2",   "2I[2]",   "w", FUN_OFFSET(Event, x),
                              "at2p",  "@2I",     "w", FUN_OFFSET(Event, atp),
                              "at2p4", "@2I[4]",  "w", FUN_OFFSET(Event, atp),
                              "atp9",  "@I[9]",   "w", FUN_OFFSET(Event, atp),
                              "atb20", "@I[B20]", "w", FUN_OFFSET(Event, atb),
                              NULL);

           Here we have specified the following columns:

           •   2i: two short ints in an array which is stored as part of the record

           •   2i2: the 3rd and 4th elements of an array which is stored as part of the record

           •   an array of at least 10 elements, not stored in the record but allocated elsewhere, and used by
               three different columns:

               •   at2p: 2 short ints which are the first 2 elements of the allocated array

               •   at2p4: 2 short ints which are the 5th and 6th elements of the allocated array

               •   atp9: a short int which is the 10th element of the allocated array

           •   atb20: a short int which is at byte offset 20 of another allocated array

           In this way, several columns can be specified, all of which are in a single array. NB: it is the
           programmer's responsibility to ensure that specification of a positive value for poff does not point
           past the end of valid data.

       •   read/write mode: "r" means that the column is read from an input file into user space by
           FunTableRowGet(), "w" means that the column is written to an output file. Both can be specified at
           the same time.

       •   offset: the offset into the user data to store this column. Typically, the macro FUN_OFFSET(recname,
           colname) is used to define the offset into a record structure.
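
       To make the type syntax described above concrete, here is a small stand-alone parser for the
       [@][n]<type>[[B]poff][:tlmin[:tlmax[:binsiz]]] form. It is a sketch under stated assumptions, not
       the parser Funtools uses: the names TypeSpec, parse_type_spec() and poff_bytes() are hypothetical,
       only the eight basic type letters are accepted (no "X" bit-fields), and a two-number range is
       assumed to mean tlmin:tlmax with BINSIZ set to 1.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int is_pointer;      /* 1 if the spec starts with '@' */
    int n;               /* repeat count, default 1 */
    char type;           /* one of A B I U J V E D */
    int poff;            /* value inside [...], default 0 */
    int poff_in_bytes;   /* 1 if written as [B...] */
    int has_range;       /* 1 if a :tlmin:tlmax:binsiz part was given */
    double tlmin, tlmax, binsiz;
} TypeSpec;

/* Size in bytes of one element of a basic type, or 0 if unknown. */
static int type_size(char t)
{
    switch (t) {
    case 'A': case 'B': return 1;
    case 'I': case 'U': return 2;
    case 'J': case 'V': case 'E': return 4;
    case 'D': return 8;
    default:  return 0;
    }
}

/* Parse a type specification; returns 0 on success, -1 on a syntax error. */
int parse_type_spec(const char *s, TypeSpec *out)
{
    char *end;

    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->n = 1;
    if (*s == '@') { out->is_pointer = 1; s++; }
    if (isdigit((unsigned char)*s)) { out->n = (int)strtol(s, &end, 10); s = end; }
    out->type = *s;
    if (type_size(out->type) == 0) return -1;
    s++;
    if (*s == '[') {
        s++;
        if (*s == 'B') { out->poff_in_bytes = 1; s++; }
        out->poff = (int)strtol(s, &end, 10);
        if (end == s || *end != ']') return -1;
        s = end + 1;
    }
    if (*s == ':') {
        double v[3];
        int k = 0;
        while (*s == ':' && k < 3) {
            s++;
            v[k] = strtod(s, &end);
            if (end == s) return -1;
            s = end;
            k++;
        }
        out->has_range = 1;
        out->binsiz = 1.0;
        if (k == 1) {            /* a single number is TLMAX; TLMIN and BINSIZ are 1 */
            out->tlmin = 1.0;
            out->tlmax = v[0];
        } else {                 /* assumption: two numbers mean tlmin:tlmax */
            out->tlmin = v[0];
            out->tlmax = v[1];
            if (k == 3) out->binsiz = v[2];
        }
    }
    while (isspace((unsigned char)*s)) s++;   /* tolerate trailing blanks */
    return (*s == '\0') ? 0 : -1;
}

/* Byte offset implied by [poff]: element-sized units unless [B...] was used. */
int poff_bytes(const TypeSpec *t)
{
    assert(t != NULL);
    return t->poff_in_bytes ? t->poff : t->poff * type_size(t->type);
}
```

       With this sketch, "J[5]" yields a 20-byte offset and "D[5]" a 40-byte offset, matching the
       description of [poff] above, while "@I[B20]" yields exactly 20 bytes.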

       When all column arguments have been specified, a final NULL argument must be added to signal the end of
       the column selection list.
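
       The role of the terminating NULL can be illustrated with a small variadic function that walks
       (name, type, mode, offset) groups until it meets a NULL name. This is a generic stdarg sketch,
       not the Funtools source; count_column_specs() is a hypothetical name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Count (name, type, mode, offset) groups in a NULL-terminated argument list. */
int count_column_specs(const char *name, ...)
{
    va_list ap;
    int n = 0;

    va_start(ap, name);
    while (name != NULL) {
        /* each name must be followed by exactly three more arguments */
        const char *type = va_arg(ap, char *);
        const char *mode = va_arg(ap, char *);
        int offset = va_arg(ap, int);

        assert(type != NULL && mode != NULL);
        (void)offset;                   /* not used in this sketch */
        n++;
        name = va_arg(ap, char *);      /* next column name, or the final NULL */
    }
    va_end(ap);
    return n;
}
```

       Without the final NULL, a loop of this kind has no way to know where the list ends and reads past
       the supplied arguments. In portable code the terminator is best written as (char *)NULL, since a
       bare NULL may be passed as an integer in a variable argument list.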

       As an alternative to the varargs FunColumnSelect() routine, a non-varargs routine called
       FunColumnSelectArr() also is available. The first three arguments (fun, size, plist) of this routine are
       the same as in FunColumnSelect().  Instead of a variable argument list, however, FunColumnSelectArr()
       takes 5 additional arguments. The first 4 array arguments contain the names, types, modes, and offsets,
       respectively, of the columns being selected. The final argument is the number of columns that are
       contained in these arrays. It is the user's responsibility to free string space allocated in these
       arrays.
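
       A minimal sketch of how the four parallel arrays might be prepared is shown below. It uses the
       standard offsetof() macro in place of FUN_OFFSET so that it compiles without funtools.h; the names
       col_names, col_types, col_modes and fill_offsets() are hypothetical. String literals are used for
       the column strings, so in this sketch there is no string space to free.

```c
#include <assert.h>
#include <stddef.h>

typedef struct evstruct {
    int status;
    float pi, pha;
    double energy;
} EvRec;

#define NCOLS 4

/* Parallel arrays: entry i of each array describes column i. */
char *col_names[NCOLS] = { "status", "pi", "pha", "energy" };
char *col_types[NCOLS] = { "J",      "E",  "E",   "D"      };
char *col_modes[NCOLS] = { "r",      "rw", "rw",  "w"      };

/* Fill the offsets array for EvRec and return the number of columns. */
int fill_offsets(int *offsets, int max)
{
    assert(offsets != NULL && max >= NCOLS);
    offsets[0] = (int)offsetof(EvRec, status);
    offsets[1] = (int)offsetof(EvRec, pi);
    offsets[2] = (int)offsetof(EvRec, pha);
    offsets[3] = (int)offsetof(EvRec, energy);
    return NCOLS;
}

/* With Funtools available, the arrays would then be passed as in the SYNOPSIS:
 *
 *   nargs = fill_offsets(offsets, NCOLS);
 *   FunColumnSelectArr(fun, sizeof(EvRec), NULL,
 *                      col_names, col_types, col_modes, offsets, nargs);
 */
```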

       Consider the following example:

         typedef struct evstruct{
           int status;
           float pi, pha, *phas;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "r",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "r",  FUN_OFFSET(Ev, pi),
           "pha",     "E",     "r",  FUN_OFFSET(Ev, pha),
           "phas",    "@9E",   "r",  FUN_OFFSET(Ev, phas),
           NULL);

       Each time a row is read into the Ev struct, the "status" column is converted to an int data type
       (regardless of its data type in the file) and stored in the status value of the struct.  Similarly, "pi"
       and "pha", and the phas vector are all stored as floats. Note that the "@" sign indicates that the "phas"
       vector is a pointer to a 9 element array, rather than an array allocated in the struct itself. The row
       record can then be processed as required:

         /* get rows -- let routine allocate the row array */
         while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){
           /* process all rows */
           for(i=0; i<got; i++){
             /* point to the i'th row */
             ev = ebuf+i;
             ev->pi = (ev->pi+.5);
             ev->pha = (ev->pi-.5);
           }
         }

       FunColumnSelect() can also be called to define "writable" columns in order to generate a FITS Binary
       Table, without reference to any input columns.  For example, the following will generate a 4-column FITS
       binary table when FunTableRowPut() is used to write Ev records:

         typedef struct evstruct{
           int status;
           float pi, pha;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "w",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "w",   FUN_OFFSET(Ev, pi),
           "pha",     "E",     "w",   FUN_OFFSET(Ev, pha),
           "energy",  "D",     "w",   FUN_OFFSET(Ev, energy),
           NULL);

       All columns are declared to be write-only, so presumably the column data is being generated or read from
       some other source.

       In addition, FunColumnSelect() can be called to define both "readable" and "writable" columns.  In this
       case, the "read" columns are associated with an input file, while the "write" columns are associated with
       the output file. Of course, columns can be specified as both "readable" and "writable", in which case
       they are read from input and (possibly modified data values are) written to the output.  The
       FunColumnSelect() call itself is made by passing the input Funtools handle, and it is assumed that the
       output file has been opened using this input handle as its Funtools reference handle.

       Consider the following example:

         typedef struct evstruct{
           int status;
           float pi, pha, *phas;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "r",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "rw",  FUN_OFFSET(Ev, pi),
           "pha",     "E",     "rw",  FUN_OFFSET(Ev, pha),
           "phas",    "@9E",   "rw",  FUN_OFFSET(Ev, phas),
           "energy",  "D",     "w",   FUN_OFFSET(Ev, energy),
           NULL);

       As in the "read" example above, each time a row is read into the Ev struct, the "status" column is
       converted to an int data type (regardless of its data type in the file) and stored in the status value of
       the struct.  Similarly, "pi" and "pha", and the phas vector are all stored as floats.  Since the "pi",
       "pha", and "phas" variables are declared as "writable" as well as "readable", they also will be written
       to the output file.  Note, however, that the "status" variable is declared as "readable" only, and hence
       it will not be written to an output file.  Finally, the "energy" column is declared as "writable" only,
       meaning it will not be read from the input file. In this case, it can be assumed that "energy" will be
       calculated in the program before being output along with the other values.
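
       The effect of the mode strings in this example can be modelled with a small helper that turns a
       mode string into read/write flags. The names COL_READ, COL_WRITE and parse_mode() are hypothetical,
       and accepting the letters in any order is an assumption of this sketch rather than documented
       Funtools behavior.

```c
#include <assert.h>
#include <stddef.h>

enum { COL_READ = 1, COL_WRITE = 2 };

/* Convert a mode string such as "r", "w" or "rw" into a set of flags.
 * Returns -1 if the string contains any other character. */
int parse_mode(const char *mode)
{
    int flags = 0;

    assert(mode != NULL);
    for (; *mode != '\0'; mode++) {
        if (*mode == 'r')
            flags |= COL_READ;
        else if (*mode == 'w')
            flags |= COL_WRITE;
        else
            return -1;
    }
    return flags;
}
```

       In the example above, "status" would carry only the read flag, "energy" only the write flag, and
       "pi", "pha" and "phas" both.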

       In these simple cases, only the columns specified as "writable" will be output using FunTableRowPut().
       However, it often is the case that you want to merge the user columns back in with the input columns,
       even in cases where not all of the input column names are explicitly read or even known. For this
       important case, the merge=[type] keyword is provided in the plist string.

       The merge=[type] keyword tells Funtools to merge the columns from the input file with user columns on
       output.  It is normally used when an input and output file are opened and the input file provides the
       Funtools reference handle for the output file. In this case, each time FunTableRowGet() is called, the
       raw input rows are saved in a special buffer. If FunTableRowPut() then is called (before another call to
       FunTableRowGet()), the contents of the raw input rows are merged with the user rows according to the
       value of type as follows:

       •   update: add new user columns, and update value of existing ones (maintaining the input data type)

       •   replace: add new user columns, and replace the data type and value of existing ones.  (Note that if
           tlmin/tlmax values are not specified in the replacing column, but are specified in the original
           column being replaced, then the original tlmin/tlmax values are used in the replacing column.)

       •   append: only add new columns, do not "replace" or "update" existing ones

       Consider the example above. If merge=update is specified in the plist string, then "energy" will be added
       to the input columns, and the values of "pi", "pha", and "phas" will be taken from the user space (i.e.,
       the values will be updated from the original values, if they were changed by the program).  The data type
       for "pi", "pha", and "phas" will be the same as in the original file.  If merge=replace is specified,
       both the data type and value of these three input columns will be changed to the data type and value in
       the user structure.  If merge=append is specified, none of these three columns will be updated, and only
       the "energy" column will be added. Note that in all cases, "status" will be written from the input data,
       not from the user record, since it was specified as read-only.
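
       The three merge types can be summarized as a small decision function that says, for one column,
       whether its output value and data type come from the input file or from the user record. This is
       a model of the rules described above, not Funtools code; MergeType, Source, MergeResult and
       merge_column() are hypothetical names.

```c
#include <assert.h>

typedef enum { MERGE_UPDATE, MERGE_REPLACE, MERGE_APPEND } MergeType;
typedef enum { FROM_NONE, FROM_INPUT, FROM_USER } Source;

typedef struct {
    Source value;   /* where the output value comes from */
    Source type;    /* where the output data type comes from */
} MergeResult;

/* in_input:      non-zero if the column exists in the input file
 * user_writable: non-zero if the user selected the column with "w" */
MergeResult merge_column(MergeType merge, int in_input, int user_writable)
{
    MergeResult r = { FROM_NONE, FROM_NONE };

    assert(merge == MERGE_UPDATE || merge == MERGE_REPLACE || merge == MERGE_APPEND);
    if (!in_input) {
        /* a new user column is added under every merge type */
        if (user_writable) {
            r.value = FROM_USER;
            r.type = FROM_USER;
        }
        return r;
    }
    /* existing input column: start from the raw input row */
    r.value = FROM_INPUT;
    r.type = FROM_INPUT;
    if (user_writable) {
        if (merge == MERGE_UPDATE) {
            r.value = FROM_USER;        /* new value, input data type kept */
        } else if (merge == MERGE_REPLACE) {
            r.value = FROM_USER;        /* new value and new data type */
            r.type = FROM_USER;
        }
        /* MERGE_APPEND leaves existing columns untouched */
    }
    return r;
}
```

       For the example above, "pi" is in the input and writable, "energy" is writable but not in the
       input, and "status" is in the input but read-only, so it always comes from the input data.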

       Standard applications will call FunColumnSelect() to define user columns. However, if this routine is not
       called, the default behavior is to transfer all input columns into user space. For this purpose a default
       record structure is defined such that each data element is properly aligned on a valid data type
       boundary.  This mechanism is used by programs such as fundisp and funtable to process columns without
       needing to know the specific names of those columns.  It is not anticipated that users will need such
       capabilities (contact us if you do!).

       By default, FunColumnSelect() reads/writes rows to/from an "array of structs", where each struct contains
       the column values for a single row of the table. This means that the returned values for a given column
       are not contiguous. You can set up the IO to return a "struct of arrays" so that each of the returned
       columns is contiguous by specifying org=structofarrays (abbreviation: org=soa) in the plist.  (The
       default case is org=arrayofstructs or org=aos.)

       For example, the default setup to retrieve rows from a table would be to define a record structure for a
       single event and then call FunColumnSelect() as follows:

         typedef struct evstruct{
           short region;
           double x, y;
           int pi, pha;
           double time;
         } *Ev, EvRec;

         got = FunColumnSelect(fun, sizeof(EvRec), NULL,
                               "x",       "D:10:10", mode, FUN_OFFSET(Ev, x),
                               "y",       "D:10:10", mode, FUN_OFFSET(Ev, y),
                               "pi",      "J",       mode, FUN_OFFSET(Ev, pi),
                               "pha",     "J",       mode, FUN_OFFSET(Ev, pha),
                               "time",    "1D",      mode, FUN_OFFSET(Ev, time),
                               NULL);

       Subsequently, each call to FunTableRowGet() will return an array of structs, one for each returned row.
       If instead you wanted to read columns into contiguous arrays, you specify org=soa:

         typedef struct aevstruct{
           short region[MAXROW];
           double x[MAXROW], y[MAXROW];
           int pi[MAXROW], pha[MAXROW];
           double time[MAXROW];
         } *AEv, AEvRec;

         got = FunColumnSelect(fun, sizeof(AEvRec), "org=soa",
                             "x",       "D:10:10", mode, FUN_OFFSET(AEv, x),
                             "y",       "D:10:10", mode, FUN_OFFSET(AEv, y),
                             "pi",      "J",       mode, FUN_OFFSET(AEv, pi),
                             "pha",     "J",       mode, FUN_OFFSET(AEv, pha),
                             "time",    "1D",      mode, FUN_OFFSET(AEv, time),
                             NULL);

       Note that the only modification to the call is in the plist string.

       Of course, instead of using statically allocated arrays, you also can specify dynamically allocated
       pointers:

         /* pointers to arrays of columns (used in struct of arrays) */
         typedef struct pevstruct{
           short *region;
           double *x, *y;
           int *pi, *pha;
           double *time;
         } *PEv, PEvRec;

         got = FunColumnSelect(fun, sizeof(PEvRec), "org=structofarrays",
                             "$region", "@I",       mode, FUN_OFFSET(PEv, region),
                             "x",       "@D:10:10", mode, FUN_OFFSET(PEv, x),
                             "y",       "@D:10:10", mode, FUN_OFFSET(PEv, y),
                             "pi",      "@J",       mode, FUN_OFFSET(PEv, pi),
                             "pha",     "@J",       mode, FUN_OFFSET(PEv, pha),
                             "time",    "@1D",      mode, FUN_OFFSET(PEv, time),
                             NULL);

       Here, the actual storage space is allocated either by the user or by the FunColumnSelect() call.

       In all of the above cases, the same call is made to retrieve rows, e.g.:

           buf = (void *)FunTableRowGet(fun, NULL, MAXROW, NULL, &got);

       However, the individual data elements are accessed differently.  For the default case of an "array of
       structs", the individual row records are accessed using:

         for(i=0; i<got; i++){
           ev = (Ev)buf+i;
           fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%21.8f\n",
                   ev->x, ev->y, ev->pi, ev->pha, ev->time);
         }

       For a struct of arrays or a struct of array pointers, we have a single struct through which we access
       individual columns and rows using:

         aev = (AEv)buf;
         for(i=0; i<got; i++){
           fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%21.8f\n",
                   aev->x[i], aev->y[i], aev->pi[i], aev->pha[i], aev->time[i]);
         }

       Support for struct of arrays in the FunTableRowPut() call is handled analogously.
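
       The difference between the two layouts can be seen without any Funtools calls. The sketch below
       holds the same rows as an array of structs and as a struct of arrays and sums one column in each
       form. The reduced record layout, the MAXROW value and the function names are hypothetical and
       chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAXROW 4   /* hypothetical small buffer size */

/* array of structs: one record per row */
typedef struct evstruct {
    double x, y;
    int pi;
} EvRec;

/* struct of arrays: one contiguous array per column */
typedef struct aevstruct {
    double x[MAXROW], y[MAXROW];
    int pi[MAXROW];
} AEvRec;

/* Copy rows from the array-of-structs layout into the struct-of-arrays layout. */
void aos_to_soa(const EvRec *rows, int got, AEvRec *cols)
{
    int i;

    assert(rows != NULL && cols != NULL && got <= MAXROW);
    for (i = 0; i < got; i++) {
        cols->x[i] = rows[i].x;
        cols->y[i] = rows[i].y;
        cols->pi[i] = rows[i].pi;
    }
}

/* Sum the x column; consecutive values are sizeof(EvRec) bytes apart. */
double sum_x_aos(const EvRec *rows, int got)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < got; i++)
        sum += rows[i].x;
    return sum;
}

/* Sum the x column; consecutive values are adjacent in memory. */
double sum_x_soa(const AEvRec *cols, int got)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < got; i++)
        sum += cols->x[i];
    return sum;
}
```

       Both functions return the same total; only the memory layout, and therefore the access pattern,
       differs.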

       See the evread example code and evmerge example code for working examples of how FunColumnSelect() is
       used.
