Ubuntu Manpage: FunColumnSelect - select Funtools columns

Provided by: libfuntools-dev_1.4.7-4_amd64

NAME

       FunColumnSelect - select Funtools columns

SYNOPSIS

         #include <funtools.h>

         int FunColumnSelect(Fun fun, int size, char *plist,
                             char *name1, char *type1, char *mode1, int offset1,
                             char *name2, char *type2, char *mode2, int offset2,
                             ...,
                             NULL);

         int FunColumnSelectArr(Fun fun, int size, char *plist,
                                char **names, char **types, char **modes,
                                int *offsets, int nargs);

DESCRIPTION

       The FunColumnSelect() routine is used to select the columns from a Funtools binary table
       extension or raw event file for processing. This routine allows you to specify how columns
       in a file are to be read into a user record structure or written from a user record
       structure to an output FITS file.

       The first argument is the Fun handle associated with this set of columns. The second
       argument specifies the size of the user record structure into which columns will be read.
       Typically, the sizeof() macro is used to specify the size of a record structure.  The
       third argument allows you to specify keyword directives for the selection and is described
       in more detail below.

       Following the first three required arguments is a variable length list of column
       specifications.  Each column specification will consist of four arguments:

       •   name: the name of the column

       •   type: the data type of the column as it will be stored in the user record struct (not
           the data type of the input file). The following basic data types are recognized:

           •   A: ASCII characters

           •   B: unsigned 8-bit char

           •   I: signed 16-bit int

           •   U: unsigned 16-bit int (not standard FITS)

           •   J: signed 32-bit int

           •   V: unsigned 32-bit int (not standard FITS)

           •   E: 32-bit float

           •   D: 64-bit float

           The syntax used is similar to that which defines the TFORM parameter in FITS binary
           tables. That is, a numeric repeat value can precede the type character, so that "10I"
           means a vector of 10 short ints, "E" means a single precision float, etc.  Note that
           the column value from the input file will be converted to the specified data type as
           the data is read by FunTableRowGet().

           [ A short digression regarding bit-fields: Special attention is required when reading
           or writing the FITS bit-field type ("X"). Bit-fields almost always have a numeric
           repeat character preceding the 'X' specification. Usually this value is a multiple of
           8 so that bit-fields fit into an integral number of bytes. For all cases, the byte
           size of the bit-field B is (N+7)/8, where N is the numeric repeat character.

           A bit-field is most easily declared in the user struct as an array of type char of
           size B as defined above. In this case, bytes are simply moved from the file to the
           user space.  If, instead, a short or int scalar or array is used, then the algorithm
           for reading the bit-field into the user space depends on the size of the data type
           used along with the value of the repeat character.  That is, if the user data size is
           equal to the byte size of the bit-field, then the data is simply moved (possibly with
           endian-based byte-swapping) from one to the other. If, on the other hand, the data
           storage is larger than the bit-field size, then a data type cast conversion is
           performed to move parts of the bit-field into elements of the array.  Examples will
           help make this clear:

           •   If the file contains a 16X bit-field and user space specifies a 2B char array[2],
               then the bit-field is moved directly into the char array.

           •   If the file contains a 16X bit-field and user space specifies a 1I scalar short
               int, then the bit-field is moved directly into the short int.

           •   If the file contains a 16X bit-field and user space specifies a 1J scalar int,
               then the bit-field is type-cast to unsigned int before being moved (use of
               unsigned avoids possible sign extension).

           •   If the file contains a 16X bit-field and user space specifies a 2J int array[2],
               then the bit-field is handled as 2 chars, each of which is type-cast to unsigned
               int before being moved (use of unsigned avoids possible sign extension).

           •   If the file contains a 16X bit-field and user space specifies a 1B char, then the
               bit-field is treated as a char, i.e., truncation will occur.

           •   If the file contains a 16X bit-field and user space specifies a 4J int array[4],
               then the results are undetermined.

           For all user data types larger than char, the bit-field is byte-swapped as necessary
           to convert to native format, so that bits in the resulting data in user space can be
           tested, masked, etc. in the same way regardless of platform.]

           In addition to setting data type and size, the type specification allows a few
           ancillary parameters to be set, using the full syntax for type:

            [@][n]<type>[[['B']poff]][:[tlmin[:tlmax[:binsiz]]]]

           The special character "@" can be prepended to this specification to indicate that the
           data element is a pointer in the user record, rather than an array stored within the
           record.

           The [n] value is an integer that specifies the number of elements that are in this
           column (default is 1). TLMIN, TLMAX, and BINSIZ values also can be specified for this
           column after the type, separated by colons. If only one such number is specified, it
           is assumed to be TLMAX, and TLMIN and BINSIZ are set to 1.

           The [poff] value can be used to specify the offset into an array. By default, this
           offset value is set to zero and the data specified starts at the beginning of the
           array. The offset usually is specified in terms of the data type of the column. Thus
           an offset specification of [5] means a 20-byte offset if the data type is a 32-bit
           integer, and a 40-byte offset for a double. If you want to specify a byte offset
           instead of an offset tied to the column data type, precede the offset value with 'B',
           e.g. [B6] means a 6-byte offset, regardless of the column data type.

           The [poff] is especially useful in conjunction with the pointer @ specification, since
           it allows the data element to be stored anywhere in the allocated array.  For
           example, a specification such as "@I[2]" specifies the third (i.e., starting from 0)
           element in the array pointed to by the pointer value. A value of "@2I[4]" specifies
           the fifth and sixth values in the array. For example, consider the following
           specification:

             typedef struct EvStruct{
               short x[4], *atp, *atb;
             } *Event, EventRec;
             /* set up the (hardwired) columns */
             FunColumnSelect( fun, sizeof(EventRec), NULL,
                              "2i",    "2I  ",    "w", FUN_OFFSET(Event, x),
                              "2i2",   "2I[2]",   "w", FUN_OFFSET(Event, x),
                              "at2p",  "@2I",     "w", FUN_OFFSET(Event, atp),
                              "at2p4", "@2I[4]",  "w", FUN_OFFSET(Event, atp),
                              "atp9",  "@I[9]",   "w", FUN_OFFSET(Event, atp),
                              "atb20", "@I[B20]", "w", FUN_OFFSET(Event, atb),
                              NULL);

           Here we have specified the following columns:

           •   2i: two short ints in an array which is stored as part of the record

           •   2i2: the 3rd and 4th elements of an array which is stored as part of the record

           •   an array of at least 10 elements, not stored in the record but allocated
               elsewhere, and used by three different columns:

               •   at2p: 2 short ints which are the first 2 elements of the allocated array

               •   at2p4: 2 short ints which are the 5th and 6th elements of the allocated array

               •   atp9: a short int which is the 10th element of the allocated array

           •   atb20: a short int which is at byte offset 20 of another allocated array

           In this way, several columns can be specified, all of which are in a single array. NB:
           it is the programmer's responsibility to ensure that specification of a positive value
           for poff does not point past the end of valid data.

       •   read/write mode: "r" means that the column is read from an input file into user space
           by FunTableRowGet(), "w" means that the column is written to an output file. Both can
           be specified at the same time.

       •   offset: the offset into the user data to store this column. Typically, the macro
           FUN_OFFSET(recname, colname) is used to define the offset into a record structure.
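
       The offset and size arithmetic described for the type specification can be sketched in
       plain C. This is an illustration only, not Funtools code: the helper names type_size(),
       poff_bytes(), and bitfield_bytes() are hypothetical, and the element sizes are taken
       directly from the bit widths in the type list above.

```c
#include <assert.h>
#include <stddef.h>

/* Byte size of one element of a basic type code, or 0 if the code is unknown.
   Hypothetical helper; sizes follow the bit widths listed for each code. */
size_t type_size(char code)
{
    switch (code) {
    case 'A': case 'B': return 1;   /* ASCII char, unsigned 8-bit char */
    case 'I': case 'U': return 2;   /* signed/unsigned 16-bit int */
    case 'J': case 'V': return 4;   /* signed/unsigned 32-bit int */
    case 'E':           return 4;   /* 32-bit float */
    case 'D':           return 8;   /* 64-bit float */
    default:            return 0;
    }
}

/* Byte offset implied by [poff]; is_byte is nonzero for the [Bpoff] form,
   in which case the offset is already expressed in bytes. */
size_t poff_bytes(char code, size_t poff, int is_byte)
{
    assert(type_size(code) != 0);
    return is_byte ? poff : poff * type_size(code);
}

/* Bytes needed to hold an NX bit-field: (N+7)/8 in integer arithmetic. */
size_t bitfield_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}
```

       With these helpers, poff_bytes('J', 5, 0) gives the 20-byte offset and
       poff_bytes('D', 5, 0) the 40-byte offset mentioned above, while bitfield_bytes(16)
       gives the 2 bytes occupied by a 16X bit-field.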

       When all column arguments have been specified, a final NULL argument must be added to
       signal the end of the column selection list.
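
       The role of the final NULL can be illustrated with a small varargs routine that walks
       an argument list laid out like the one FunColumnSelect() takes. This is a sketch under
       stated assumptions, not the Funtools implementation: count_column_specs() is a
       hypothetical name, and it only counts the (name, type, mode, offset) groups it finds.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper, not part of Funtools: walks a FunColumnSelect()-style
   argument list and returns the number of column specifications in it. */
int count_column_specs(const char *plist, ...)
{
    va_list ap;
    int ncol = 0;

    (void)plist;                    /* unused here; it anchors va_start() */
    va_start(ap, plist);
    for (;;) {
        char *name = va_arg(ap, char *);
        if (name == NULL)           /* the final NULL ends the list */
            break;
        char *type = va_arg(ap, char *);
        char *mode = va_arg(ap, char *);
        int offset = va_arg(ap, int);
        assert(type != NULL && mode != NULL && offset >= 0);
        ncol++;
    }
    va_end(ap);
    return ncol;
}
```

       Without the terminator, a routine of this kind has no way of knowing where the list
       stops and would read past the last argument. In strictly portable C the terminator in
       a variable argument list is best written as (char *)NULL, since a bare NULL may expand
       to an integer constant.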

       As an alternative to the varargs FunColumnSelect() routine, a non-varargs routine called
       FunColumnSelectArr() also is available. The first three arguments (fun, size, plist) of
       this routine are the same as in FunColumnSelect().  Instead of a variable argument list,
       however, FunColumnSelectArr() takes 5 additional arguments. The first 4 array arguments
       contain the names, types, modes, and offsets, respectively, of the columns being selected.
       The final argument is the number of columns that are contained in these arrays. It is the
       user's responsibility to free string space allocated in these arrays.
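
       A minimal sketch of how the four arrays might be prepared is shown here. It is
       deliberately independent of Funtools so that it compiles on its own: the standard
       offsetof() macro stands in for FUN_OFFSET() (on the assumption that both yield the same
       byte offset), and fill_offsets() and NCOL are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct evstruct {
    int status;
    float pi, pha;
    double energy;
} *Ev, EvRec;

enum { NCOL = 4 };

/* String literals need no freeing; strings obtained from malloc() or strdup()
   would have to be freed by the caller once they are no longer needed. */
char *names[NCOL] = { "status", "pi", "pha", "energy" };
char *types[NCOL] = { "J",      "E",  "E",   "D"      };
char *modes[NCOL] = { "r",      "rw", "rw",  "w"      };
int offsets[NCOL];

/* Fill the offsets array; returns the column count to pass as nargs. */
int fill_offsets(void)
{
    offsets[0] = (int)offsetof(EvRec, status);
    offsets[1] = (int)offsetof(EvRec, pi);
    offsets[2] = (int)offsetof(EvRec, pha);
    offsets[3] = (int)offsetof(EvRec, energy);
    assert(offsets[3] + (int)sizeof(double) <= (int)sizeof(EvRec));
    return NCOL;
}
```

       Given an open Funtools handle fun, the arrays would then be passed following the
       synopsis: FunColumnSelectArr(fun, sizeof(EvRec), NULL, names, types, modes, offsets,
       nargs), where nargs is the value returned by fill_offsets().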

       Consider the following example:

         typedef struct evstruct{
           int status;
           float pi, pha, *phas;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "r",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "r",  FUN_OFFSET(Ev, pi),
           "pha",     "E",     "r",  FUN_OFFSET(Ev, pha),
           "phas",    "@9E",   "r",  FUN_OFFSET(Ev, phas),
           NULL);

       Each time a row is read into the Ev struct, the "status" column is converted to an int
       data type (regardless of its data type in the file) and stored in the status value of the
       struct.  Similarly, "pi" and "pha", and the phas vector are all stored as floats. Note
       that the "@" sign indicates that the "phas" vector is a pointer to a 9 element array,
       rather than an array allocated in the struct itself. The row record can then be processed
       as required:

         /* get rows -- let routine allocate the row array */
         while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){
           /* process all rows */
           for(i=0; i<got; i++){
             /* point to the i'th row */
             ev = ebuf+i;
             ev->pi = (ev->pi+.5);
             ev->pha = (ev->pi-.5);
           }
         }

       FunColumnSelect() can also be called to define "writable" columns in order to generate a
       FITS Binary Table, without reference to any input columns.  For example, the following
       will generate a 4-column FITS binary table when FunTableRowPut() is used to write Ev
       records:

         typedef struct evstruct{
           int status;
           float pi, pha;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "w",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "w",   FUN_OFFSET(Ev, pi),
           "pha",     "E",     "w",   FUN_OFFSET(Ev, pha),
           "energy",  "D",     "w",   FUN_OFFSET(Ev, energy),
           NULL);

       All columns are declared to be write-only, so presumably the column data is being
       generated or read from some other source.

       In addition, FunColumnSelect() can be called to define both "readable" and "writable"
       columns.  In this case, the "read" columns are associated with an input file, while the
       "write" columns are associated with the output file. Of course, columns can be specified
       as both "readable" and "writable", in which case they are read from the input and
       written, with possibly modified data values, to the output.  The FunColumnSelect() call itself is
       made by passing the input Funtools handle, and it is assumed that the output file has been
       opened using this input handle as its Funtools reference handle.

       Consider the following example:

         typedef struct evstruct{
           int status;
           float pi, pha, *phas;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "r",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "rw",  FUN_OFFSET(Ev, pi),
           "pha",     "E",     "rw",  FUN_OFFSET(Ev, pha),
           "phas",    "@9E",   "rw",  FUN_OFFSET(Ev, phas),
           "energy",  "D",     "w",   FUN_OFFSET(Ev, energy),
           NULL);

       As in the "read" example above, each time a row is read into the Ev struct, the "status"
       column is converted to an int data type (regardless of its data type in the file) and
       stored in the status value of the struct.  Similarly, "pi" and "pha", and the phas vector
       are all stored as floats.  Since the "pi", "pha", and "phas" variables are declared as
       "writable" as well as "readable", they also will be written to the output file.  Note,
       however, that the "status" variable is declared as "readable" only, and hence it will not
       be written to an output file.  Finally, the "energy" column is declared as "writable"
       only, meaning it will not be read from the input file. In this case, it can be assumed
       that "energy" will be calculated in the program before being output along with the other
       values.

       In these simple cases, only the columns specified as "writable" will be output using
       FunTableRowPut().  However, it often is the case that you want to merge the user columns
       back in with the input columns, even in cases where not all of the input column names are
       explicitly read or even known. For this important case, the merge=[type] keyword is
       provided in the plist string.

       The merge=[type] keyword tells Funtools to merge the columns from the input file with user
       columns on output.  It is normally used when an input and output file are opened and the
       input file provides the Funtools reference handle for the output file. In this case, each
       time FunTableRowGet() is called, the raw input rows are saved in a special buffer. If
       FunTableRowPut() then is called (before another call to FunTableRowGet()), the contents of
       the raw input rows are merged with the user rows according to the value of type as
       follows:

       •   update: add new user columns, and update value of existing ones (maintaining the input
           data type)

       •   replace: add new user columns, and replace the data type and value of existing ones.
           (Note that if tlmin/tlmax values are not specified in the replacing column, but are
           specified in the original column being replaced, then the original tlmin/tlmax values
           are used in the replacing column.)

       •   append: only add new columns, do not "replace" or "update" existing ones

       Consider the example above. If merge=update is specified in the plist string, then
       "energy" will be added to the input columns, and the values of "pi", "pha", and "phas"
       will be taken from the user space (i.e., the values will be updated from the original
       values, if they were changed by the program).  The data type for "pi", "pha", and "phas"
       will be the same as in the original file.  If merge=replace is specified, both the data
       type and value of these three input columns will be changed to the data type and value in
       the user structure.  If merge=append is specified, none of these three columns will be
       updated, and only the "energy" column will be added. Note that in all cases, "status" will
       be written from the input data, not from the user record, since it was specified as
       read-only.
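
       The three merge rules can be summarized as a small decision function. This is a sketch
       of the behavior described above, not Funtools source code: the MergeType, Source, and
       ColumnPlan types and the plan_column() routine are hypothetical names introduced for
       illustration. For each output column it reports whether the value and the data type
       come from the raw input row or from the user record.

```c
#include <assert.h>

typedef enum { MERGE_UPDATE, MERGE_REPLACE, MERGE_APPEND } MergeType;
typedef enum { FROM_INPUT, FROM_USER } Source;

typedef struct {
    Source value;   /* where the output value comes from */
    Source type;    /* where the output data type comes from */
} ColumnPlan;

/* in_input: the column exists in the input file.
   writable: the column was selected with "w" in its mode.
   At least one of the two must be true for the column to be output. */
ColumnPlan plan_column(MergeType merge, int in_input, int writable)
{
    ColumnPlan p = { FROM_INPUT, FROM_INPUT };

    assert(in_input || writable);
    if (!writable)                  /* read-only or unselected: input passes through */
        return p;
    if (!in_input) {                /* new user column: added under every merge type */
        p.value = FROM_USER;
        p.type = FROM_USER;
        return p;
    }
    switch (merge) {
    case MERGE_UPDATE:              /* user value, input data type */
        p.value = FROM_USER;
        break;
    case MERGE_REPLACE:             /* user value and user data type */
        p.value = FROM_USER;
        p.type = FROM_USER;
        break;
    case MERGE_APPEND:              /* existing columns are left untouched */
        break;
    }
    return p;
}
```

       Applied to the example, "status" (in the input, not writable) always comes from the
       input, "energy" (writable, not in the input) always comes from the user record, and
       "pi", "pha", and "phas" follow the branch selected by the merge type.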

       Standard applications will call FunColumnSelect() to define user columns. However, if this
       routine is not called, the default behavior is to transfer all input columns into user
       space. For this purpose a default record structure is defined such that each data element
       is properly aligned on a valid data type boundary.  This mechanism is used by programs
       such as fundisp and funtable to process columns without needing to know the specific names
       of those columns.  It is not anticipated that users will need such capabilities (contact
       us if you do!).

       By default, FunColumnSelect() reads/writes rows to/from an "array of structs", where each
       struct contains the column values for a single row of the table. This means that the
       returned values for a given column are not contiguous. You can set up the IO to return a
       "struct of arrays" so that each of the returned columns is contiguous by specifying
       org=structofarrays (abbreviation: org=soa) in the plist.  (The default case is
       org=arrayofstructs or org=aos.)

       For example, the default setup to retrieve rows from a table would be to define a record
       structure for a single event and then call FunColumnSelect() as follows:

         typedef struct evstruct{
           short region;
           double x, y;
           int pi, pha;
           double time;
         } *Ev, EvRec;

         got = FunColumnSelect(fun, sizeof(EvRec), NULL,
                               "x",       "D:10:10", mode, FUN_OFFSET(Ev, x),
                               "y",       "D:10:10", mode, FUN_OFFSET(Ev, y),
                               "pi",      "J",       mode, FUN_OFFSET(Ev, pi),
                               "pha",     "J",       mode, FUN_OFFSET(Ev, pha),
                               "time",    "1D",      mode, FUN_OFFSET(Ev, time),
                               NULL);

       Subsequently, each call to FunTableRowGet() will return an array of structs, one for each
       returned row. If instead you wanted to read columns into contiguous arrays, you specify
       org=soa:

         typedef struct aevstruct{
           short region[MAXROW];
           double x[MAXROW], y[MAXROW];
           int pi[MAXROW], pha[MAXROW];
           double time[MAXROW];
         } *AEv, AEvRec;

         got = FunColumnSelect(fun, sizeof(AEvRec), "org=soa",
                             "x",       "D:10:10", mode, FUN_OFFSET(AEv, x),
                             "y",       "D:10:10", mode, FUN_OFFSET(AEv, y),
                             "pi",      "J",       mode, FUN_OFFSET(AEv, pi),
                             "pha",     "J",       mode, FUN_OFFSET(AEv, pha),
                             "time",    "1D",      mode, FUN_OFFSET(AEv, time),
                             NULL);

       Note that the only modification to the call is in the plist string.

       Of course, instead of using statically allocated arrays, you also can specify dynamically
       allocated pointers:

         /* pointers to arrays of columns (used in struct of arrays) */
         typedef struct pevstruct{
           short *region;
           double *x, *y;
           int *pi, *pha;
           double *time;
         } *PEv, PEvRec;

         got = FunColumnSelect(fun, sizeof(PEvRec), "org=structofarrays",
                             "$region", "@I",       mode, FUN_OFFSET(PEv, region),
                             "x",       "@D:10:10", mode, FUN_OFFSET(PEv, x),
                             "y",       "@D:10:10", mode, FUN_OFFSET(PEv, y),
                             "pi",      "@J",       mode, FUN_OFFSET(PEv, pi),
                             "pha",     "@J",       mode, FUN_OFFSET(PEv, pha),
                             "time",    "@1D",      mode, FUN_OFFSET(PEv, time),
                             NULL);

       Here, the actual storage space is allocated either by the user or by the
       FunColumnSelect() call.
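
       For the case where the user allocates the storage, a sketch of the allocation might
       look as follows. The helpers pev_alloc() and pev_free() are hypothetical names, not
       Funtools routines, and the text above does not spell out at which point Funtools
       expects the pointers to be valid, so this shows only the memory management side.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct pevstruct {
    short *region;
    double *x, *y;
    int *pi, *pha;
    double *time;
} *PEv, PEvRec;

/* Hypothetical helper: release every column array and reset the pointers. */
void pev_free(PEv p)
{
    free(p->region); free(p->x);   free(p->y);
    free(p->pi);     free(p->pha); free(p->time);
    p->region = NULL;
    p->x = p->y = NULL;
    p->pi = p->pha = NULL;
    p->time = NULL;
}

/* Hypothetical helper: allocate zero-filled room for maxrow values per column.
   Returns 0 on success, -1 on allocation failure (nothing is leaked). */
int pev_alloc(PEv p, size_t maxrow)
{
    assert(p != NULL && maxrow > 0);
    p->region = calloc(maxrow, sizeof *p->region);
    p->x      = calloc(maxrow, sizeof *p->x);
    p->y      = calloc(maxrow, sizeof *p->y);
    p->pi     = calloc(maxrow, sizeof *p->pi);
    p->pha    = calloc(maxrow, sizeof *p->pha);
    p->time   = calloc(maxrow, sizeof *p->time);
    if (!p->region || !p->x || !p->y || !p->pi || !p->pha || !p->time) {
        pev_free(p);
        return -1;
    }
    return 0;
}
```

       Each array should hold at least as many elements as the maximum number of rows
       requested per read (MAXROW in the examples here).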

       In all of the above cases, the same call is made to retrieve rows, e.g.:

           buf = (void *)FunTableRowGet(fun, NULL, MAXROW, NULL, &got);

       However, the individual data elements are accessed differently.  For the default case of
       an "array of structs", the individual row records are accessed using:

         for(i=0; i<got; i++){
           ev = (Ev)buf+i;
           fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%21.8f\n",
                   ev->x, ev->y, ev->pi, ev->pha, ev->time);
         }

       For a struct of arrays or a struct of array pointers, we have a single struct through
       which we access individual columns and rows using:

         aev = (AEv)buf;
         for(i=0; i<got; i++){
           fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%21.8f\n",
                   aev->x[i], aev->y[i], aev->pi[i], aev->pha[i],
                   aev->time[i]);
         }

       Support for struct of arrays in the FunTableRowPut() call is handled analogously.
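
       The difference in memory layout between the two organizations can be checked with
       plain C, independent of Funtools. In this sketch the helper names aos_x_stride() and
       soa_x_stride() are hypothetical, and MAXROW is given a small value purely for
       illustration. Each helper measures the distance in bytes between the x values of two
       consecutive rows.

```c
#include <assert.h>
#include <stddef.h>

#define MAXROW 4   /* small value for illustration only */

typedef struct evstruct {
    short region;
    double x, y;
    int pi, pha;
    double time;
} *Ev, EvRec;

typedef struct aevstruct {
    short region[MAXROW];
    double x[MAXROW], y[MAXROW];
    int pi[MAXROW], pha[MAXROW];
    double time[MAXROW];
} *AEv, AEvRec;

/* Array of structs: consecutive x values are one whole record apart. */
size_t aos_x_stride(const EvRec *rows)
{
    assert(rows != NULL);
    return (size_t)((const char *)&rows[1].x - (const char *)&rows[0].x);
}

/* Struct of arrays: consecutive x values are adjacent in memory. */
size_t soa_x_stride(const AEvRec *cols)
{
    assert(cols != NULL);
    return (size_t)((const char *)&cols->x[1] - (const char *)&cols->x[0]);
}
```

       For the array of structs the stride equals sizeof(EvRec), whereas for the struct of
       arrays it equals sizeof(double), which is what makes each column contiguous.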

       See the evread example code and evmerge example code for working examples of how
       FunColumnSelect() is used.
