Provided by: libgetdata-doc_0.11.0-14ubuntu1_all

NAME

       gd_getdata — retrieve data from a Dirfile database

SYNOPSIS

       #include <getdata.h>

       size_t gd_getdata(DIRFILE *dirfile, const char *field_code, off_t first_frame, off_t first_sample, size_t
              num_frames, size_t num_samples, gd_type_t return_type, void *data_out);

DESCRIPTION

       The gd_getdata() function queries a dirfile(5) database specified by dirfile for  the  field  field_code.
       It fetches num_frames frames plus num_samples samples from this field, starting first_sample samples past
       frame first_frame.  The data is converted to the data type specified by return_type, and  stored  in  the
       user-supplied buffer data_out.

       The  field_code  may contain one of the representation suffixes listed in dirfile-format(5).  If it does,
       gd_getdata() will compute the appropriate complex norm before returning the data.

       The dirfile argument must point to a valid DIRFILE object previously created by  a  call  to  gd_open(3).
       The  argument  data_out  must  point  to  a  valid  memory  location  of sufficient size to hold all data
       requested.

       Unless using GD_HERE (see below), the first sample returned will be

              first_frame * samples_per_frame + first_sample

       as measured from the start of the dirfile, where samples_per_frame is the number of samples per frame  as
       returned by gd_spf(3).  The number of samples fetched is, similarly,

              num_frames * samples_per_frame + num_samples.

       Although calling gd_getdata() using both samples and frames is possible, the function is typically called
       with either num_samples and first_sample, or num_frames and first_frame, equal to zero.
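
       The two expressions above can be checked with a small stand-alone sketch.  It does not call the GetData
       library: the helper names are hypothetical, the fixed-width integer types stand in for the off_t and
       size_t arguments of gd_getdata(), and spf stands in for the value reported by gd_spf(3).

```c
#include <stdint.h>

/* Absolute index of the first sample read, counted from the start of the
 * dirfile.  spf stands in for the value reported by gd_spf(3). */
int64_t first_sample_index(int64_t first_frame, int64_t first_sample,
                           unsigned int spf)
{
    return first_frame * (int64_t)spf + first_sample;
}

/* Total number of samples requested by a frames-plus-samples count. */
uint64_t samples_requested(uint64_t num_frames, uint64_t num_samples,
                           unsigned int spf)
{
    return num_frames * spf + num_samples;
}
```

       For a field with 20 samples per frame, a read starting 5 samples past frame 3 begins at sample 65, and a
       request for 2 frames plus 4 samples covers 44 samples.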

       Instead of explicitly specifying the origin of the read, the caller may pass the special  symbol  GD_HERE
       as  first_frame.   This  will result in the read occurring at the current position of the I/O pointer for
       the field (see GetData I/O Pointers below for a discussion of field I/O pointers).   In  this  case,  the
       value of first_sample is ignored.

       When  reading  a SINDIR field, return_type must be GD_STRING.  For all other field types, the return_type
       argument should be one of the following symbols, which indicates the desired return type of the data:

              GD_UINT8
                      unsigned 8-bit integer

              GD_INT8 signed (two's complement) 8-bit integer

              GD_UINT16
                      unsigned 16-bit integer

              GD_INT16
                      signed (two's complement) 16-bit integer

              GD_UINT32
                      unsigned 32-bit integer

              GD_INT32
                      signed (two's complement) 32-bit integer

              GD_UINT64
                      unsigned 64-bit integer

              GD_INT64
                      signed (two's complement) 64-bit integer

              GD_FLOAT32
                      IEEE-754 standard 32-bit single precision floating point number

              GD_FLOAT64
                      IEEE-754 standard 64-bit double precision floating point number

              GD_COMPLEX64
                      C99-conformant 64-bit single precision complex number

              GD_COMPLEX128
                      C99-conformant 128-bit double precision complex number

              GD_NULL the null type: the database is queried as usual, but no data is returned.  In  this  case,
                      data_out is ignored and may be NULL.
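
       The widths listed above determine how large data_out must be.  The following is a minimal sketch of
       that calculation, assuming nothing beyond the widths in the list.  The enum and both functions are
       hypothetical stand-ins; code built against the library would use the gd_type_t symbols from
       <getdata.h> together with GD_SIZE(3).

```c
#include <stddef.h>

/* Hypothetical local tags mirroring the return types listed above; real code
 * would use the gd_type_t symbols from <getdata.h> and GD_SIZE(3). */
enum sketch_type {
    SK_UINT8, SK_INT8, SK_UINT16, SK_INT16, SK_UINT32, SK_INT32,
    SK_UINT64, SK_INT64, SK_FLOAT32, SK_FLOAT64, SK_COMPLEX64,
    SK_COMPLEX128, SK_NULL
};

/* Width in bytes of one sample of the given type. */
size_t sketch_type_size(enum sketch_type t)
{
    switch (t) {
    case SK_UINT8:  case SK_INT8:
        return 1;
    case SK_UINT16: case SK_INT16:
        return 2;
    case SK_UINT32: case SK_INT32: case SK_FLOAT32:
        return 4;
    case SK_UINT64: case SK_INT64: case SK_FLOAT64: case SK_COMPLEX64:
        return 8;
    case SK_COMPLEX128:
        return 16;
    case SK_NULL:
        return 0;   /* no data is returned, so no buffer is needed */
    }
    return 0;
}

/* Bytes that data_out must provide for n_samples samples. */
size_t sketch_buffer_bytes(enum sketch_type t, size_t n_samples)
{
    return sketch_type_size(t) * n_samples;
}
```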

       The  return  type  of the data need not be the same as the type of the data stored in the database.  Type
       conversion will be performed as necessary to return the requested  type.   If  the  field_code  does  not
       indicate a representation, but conversion from a complex value to a purely real one is required, only the
       real portion of the requested vector will be returned.

       Upon successful completion, the I/O pointer of the field will be on the sample immediately following  the
       last  sample  returned, if possible.  On error, the position of the I/O pointer is not specified, and may
       not even be well defined.

   Behaviour While Reading Specific Field Types
       MPLEX: Reading an MPLEX field typically requires GetData to read data before the range returned in  order
              to determine the value of the first sample returned.  This can become expensive if the encoding of
              the underlying RAW data does not support seeking backwards (which  is  true  of  most  compression
              encodings).   How  much preceding data GetData searches for the initial value of the returned data
              can be adjusted, or the lookback disabled completely, using gd_mplex_lookback(3).  If the  initial
              value of the field is not found in the data searched, GetData will fill the returned vector, up to
              the next available sample of the multiplexed field, with zero for integer return types, or
              IEEE-754-conforming  NaN (not-a-number) for floating point return types, as it does when providing
              data before the beginning-of-field.

              GetData caches the value of the last sample from every MPLEX it reads so that a subsequent read of
              the  field starting from the following sample (either through an explicit starting sample given by
              the caller or else implicitly using GD_HERE) will not need to  scan  the  field  backwards.   This
              cache  is invalidated if a different return type is used, or if an intervening operation moves the
              field's I/O pointer.

       SINDIR:
              The only allowed return_type when reading SINDIR data is GD_STRING.  The data_out argument should
              be of type const char **, and be large enough to hold one pointer for each sample requested.  It
              will be filled with pointers to read-only string data.  The caller should not free the returned
              string pointers.  For convenience when allocating buffers, the GD_STRING constant has the
              property: GD_SIZE(GD_STRING) == sizeof(const char *).  On samples where the index vector is out of
              range of the SARRAY, and also on samples before the index vector's frame offset, the value stored
              in data_out will be the NULL pointer.

       PHASE: A forward-shifted PHASE field will always encounter the end-of-field marker before its input field
              does.    This   has  ramifications  when  reading  streaming  data  with  gd_getdata()  and  using
              gd_nframes(3) to gauge field lengths (that is: a forward-shifted PHASE field always has less  data
              in it than gd_nframes(3) implies that it does).  As with any other field, gd_getdata() will return
              a short count whenever a read from a PHASE field encounters the end-of-field marker.

              Backward-shifted PHASE fields do not suffer from this problem, since gd_getdata() pads reads  past
              the  beginning-of-field marker with NaN or zero as appropriate.  Database creators who wish to use
              the PHASE field type with streaming data are encouraged to work around  this  limitation  by  only
              using  backward-shifted PHASE fields, by writing RAW data at the maximal frame lag, and then back-
              shifting all data which should have been written earlier.   Another  possible  work-around  is  to
              write  systematically  less  data  to the reference RAW field in proportion to the maximal forward
              phase shift.  This method will work with applications which respect the database size reported  by
              gd_nframes(3)  resulting  in  these  applications  effectively  ignoring all frames past the frame
              containing the maximally forward-shifted PHASE field's end-of-field marker.
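
              The effect of the shift on the end-of-field can be sketched as a toy model.  It assumes that a
              PHASE field with shift s yields output[n] = input[n + s], which is how this sketch reads the
              PHASE definition in dirfile-format(5).  Both function names are hypothetical and no library
              calls are made.

```c
#include <stdint.h>

/* Toy model: a PHASE field with shift s yields out[n] = in[n + s], so its
 * end-of-field sits s samples before the input's end-of-field. */
int64_t phase_eof(int64_t input_eof, int64_t shift)
{
    return input_eof - shift;
}

/* Samples actually available when n_requested samples are asked for
 * starting at sample start (a short count near the end-of-field). */
int64_t phase_available(int64_t input_eof, int64_t shift,
                        int64_t start, int64_t n_requested)
{
    int64_t eof = phase_eof(input_eof, shift);

    if (start >= eof)
        return 0;
    return (eof - start < n_requested) ? eof - start : n_requested;
}
```

              With an input holding 1000 samples and a forward shift of 5, a request for 10 samples starting
              at sample 990 yields only 5 in this model, whereas a backward shift of 5 yields all 10.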

       WINDOW:
              The samples of a WINDOW for which the field conditional is false will be filled with  either  zero
              for  integer  return  types,  or  IEEE-754-conforming NaN (not-a-number) for floating point return
              types.
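
              A toy model of the integer case, with a hypothetical function name and a plain int array
              standing in for the evaluated field conditional, might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a WINDOW read with an integer return type: samples whose
 * condition is false are replaced by zero (NaN would be used for floats). */
void window_fill_int32(const int32_t *input, const int *condition,
                       int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = condition[i] ? input[i] : 0;
}
```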

RETURN VALUE

       In all cases, gd_getdata() returns the number of samples (not bytes) successfully read from the database.
       If  the  end-of-field is encountered before the requested number of samples have been read, a short count
       will result.  This is not an error.

       Requests for data before the beginning-of-field marker, which may have been shifted from frame zero by  a
       PHASE  field  or  /FRAMEOFFSET directive, will result in the data being padded at the front by NaN or
       zero, depending on whether the return type is of floating point or integral type.
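
       Front padding and the short count can be illustrated together with a toy model of a floating-point
       read.  The function is hypothetical and makes no library calls: it treats the stored samples as
       occupying absolute positions from bof up to, but not including, bof + len.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a floating-point read over a field whose stored samples
 * occupy absolute positions [bof, bof + len).  Positions before bof are
 * padded with NaN; the read stops early (short count) at end-of-field. */
size_t padded_read(const double *stored, size_t len, int64_t bof,
                   int64_t start, size_t n, double *out)
{
    size_t got = 0;

    for (; got < n; got++) {
        int64_t pos = start + (int64_t)got;

        if (pos < bof)
            out[got] = NAN;                 /* before beginning-of-field */
        else if ((uint64_t)(pos - bof) < len)
            out[got] = stored[pos - bof];   /* stored data */
        else
            break;                          /* end-of-field: short count */
    }
    return got;
}
```

       Requesting 8 samples from position 0 of a three-sample field whose beginning-of-field is at position 2
       returns 5 samples in this model: two NaN values followed by the three stored samples.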

       On error, this function returns zero and stores a negative-valued error code in the DIRFILE object  which
       may be retrieved by a subsequent call to gd_error(3).  Possible error codes are:

       GD_E_ALLOC
               The library was unable to allocate memory.

       GD_E_BAD_CODE
               The  field  specified by field_code, or one of the fields it uses for input, was not found in the
               database.

       GD_E_BAD_DIRFILE
               An invalid dirfile was supplied.

       GD_E_BAD_SCALAR
               A scalar field used in the definition of the field was not found, or was not of scalar type.

       GD_E_BAD_TYPE
               An invalid return_type was specified.

       GD_E_DIMENSION
               The supplied field_code referred to a CONST, CARRAY, or STRING field.  The caller should use
               gd_get_constant(3) or gd_get_string(3) instead.  Alternatively, a scalar field was found where a
               vector field was expected in the definition of field_code or one of its inputs.

       GD_E_DOMAIN
               An immediate read was attempted using GD_HERE, but the I/O pointer of  the  field  was  not  well
               defined  because  two  or  more of the field's inputs did not agree as to the location of the I/O
               pointer.

       GD_E_INTERNAL_ERROR
               An internal error occurred in the library while trying to perform the task.  This indicates a bug
               in the library.  Please report the incident to the maintainer.

       GD_E_IO An  error  occurred  while  trying  to open or read from a file on disk containing a raw field or
               LINTERP table.

       GD_E_LUT
               A LINTERP table was malformed.

       GD_E_RANGE
               An attempt was made to read data outside the addressable Dirfile range (more than  2**63  samples
               past the start of the dirfile).

       GD_E_RECURSE_LEVEL
               Too  many  levels of recursion were encountered while trying to resolve field_code.  This usually
               indicates a circular dependency in field specification in the dirfile.

       GD_E_UNKNOWN_ENCODING
               The encoding scheme of a RAW field could not be determined.  This  may  also  indicate  that  the
               binary file associated with the RAW field could not be found.

       GD_E_UNSUPPORTED
               Reading  from  dirfiles with the encoding scheme of the specified dirfile is not supported by the
               library.  See dirfile-encoding(5) for details on dirfile encoding schemes.

       A descriptive error string for the error may be obtained by calling gd_error_string(3).

NOTES

       To save memory, gd_getdata() uses the memory pointed to by data_out  as  scratch  space  while  computing
       derived  fields.   As  a  result, if an error is encountered during the computation, the contents of this
       memory buffer are unspecified, and may have been modified by this call,  even  though  gd_getdata()  will
       report zero samples returned on error.

       Reading slim-compressed data (see dirfile-encoding(5)) may cause unexpected memory usage.  This is
       because slimlib internally caches open decompressed files as they are read,  and  GetData  doesn't  close
       data files between gd_getdata() calls for efficiency's sake.  Memory used by this internal slimlib buffer
       can be reclaimed by calling gd_raw_close(3) on fields when finished reading them.

       When operating on a platform whose size_t is N-bytes wide, a  single  call  of  gd_getdata()  will  never
       return  more than (2**(N-1) - 1) samples.  The request will be truncated at (2**(N-M) - 1) samples, where
       M is the size, in bytes, of the largest data type used to calculate the  returned  field.   If  a  larger
       request is specified, less data than requested will be returned, without raising an error.  This limit is
       imposed even when return_type is GD_NULL or when reading from the INDEX field (i.e., even when no  actual
       I/O or calculation occurs).  In all cases, the actual amount of data is returned.

GETDATA I/O POINTERS

       This  is  a general discussion of field I/O pointers in the GetData library, and contains information not
       directly applicable to gd_getdata().

       Every RAW field in an open Dirfile has an I/O pointer which indicates  the  library's  current  read  and
       write position in the field.  These I/O pointers are useful when performing sequential reads or writes
       on Dirfile fields (see GD_HERE in the description above).  The value of the I/O pointer  of  a  field  is
       reported by gd_tell(3).

       Derived  fields  have  virtual  I/O  pointers arising from the I/O pointers of their input fields.  These
       virtual I/O pointers may be valid (when all input fields agree on  their  position  in  the  dirfile)  or
       invalid  (when  the input fields are not in agreement).  The I/O pointer of some derived fields is always
       invalid.  The usual reason for this is the derived field simultaneously reading from two different places
       in the same RAW field.  For example, given the following Dirfile metadata specification:

              a RAW UINT8 1
              b PHASE a 1
              c LINCOM 2 a 1 0 b 1 0

       the  derived  field c never has a valid I/O pointer, since any particular sample of c ultimately involves
       reading from more than one place in the RAW field a.  Attempting to perform sequential  reads  or  writes
       (with  GD_HERE) on a derived field when its I/O pointer is invalid will result in an error (specifically,
       GD_E_DOMAIN).
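
       The agreement rule can be sketched as a toy model.  It assumes that every input of the derived field
       reaches the same RAW field through PHASE shifts summing to shift[i], and that an input with total
       shift s places the derived field at the RAW pointer position minus s.  The function name and the -1
       return value (standing in for GD_E_DOMAIN) are hypothetical, and the sketch does not describe the
       library's internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: with the RAW pointer at raw_pos, an input whose PHASE shifts
 * sum to shift[i] places the derived field at raw_pos - shift[i].
 * Returns 0 and stores the position if every input agrees, or -1
 * (standing in for GD_E_DOMAIN) if they do not. */
int virtual_io_pointer(int64_t raw_pos, const int64_t *shift, size_t n_inputs,
                       int64_t *pos_out)
{
    if (n_inputs == 0)
        return -1;

    int64_t pos = raw_pos - shift[0];

    for (size_t i = 1; i < n_inputs; i++)
        if (raw_pos - shift[i] != pos)
            return -1;              /* inputs disagree: pointer is invalid */

    *pos_out = pos;
    return 0;
}
```

       For the field c above, the inputs a and b correspond to shifts of 0 and 1, so the model reports an
       invalid pointer for every RAW position.  The field b alone, with a single shift of 1, always has a
       valid pointer.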

       The implicit INDEX field has an effective I/O pointer that mostly behaves like a true RAW field I/O
       pointer,  except  that  it  permits  simultaneous reads from multiple locations.  So, given the following
       metadata specification:

              d PHASE INDEX 1
              e LINCOM 2 INDEX 1 0 d 1 0

       the I/O pointer of the derived field e will always be valid, unlike the similarly defined c  above.   The
       virtual  I/O  pointer  of  a  derived  field  will change in response to movement of the RAW I/O pointers
       underlying the derived field's inputs, and vice versa: moving the I/O pointer of a derived field will move
       the  I/O pointer of the RAW fields from which it ultimately derives.  As a result, the I/O pointer of any
       particular field may move in unexpected ways if multiple fields are manipulated at the same time.

       When a Dirfile is first opened, the I/O pointer of every RAW field is set to the beginning-of-field (the
       value returned by gd_bof(3)), as is the I/O pointer of any newly-created RAW field.

       The following library calls cause I/O pointers to move:

       gd_getdata() and gd_putdata(3)
              These  functions  move  the I/O pointer of affected fields to the sample immediately following the
              last sample read or written, both when performed at an  absolutely  specified  position  and  when
              called  for  a  sequential  read  or  write  using  GD_HERE.   When  reading a derived field which
              simultaneously reads from more than one place in a RAW field (such as c above),  the  position  of
              that  RAW  field's  I/O  pointer is unspecified (that is: it is not specified which input field is
              read first).

       gd_seek(3)
              This function is used to manipulate I/O pointers directly.

       gd_flush(3) and gd_raw_close(3)
              These functions set the I/O pointer of any RAW field which is closed  back  to  the  beginning-of-
              field.

       calls which result in modifications to raw data files:
              this   may   happen   when   calling   any   of:   gd_alter_encoding(3),   gd_alter_endianness(3),
              gd_alter_frameoffset(3), gd_alter_entry(3), gd_alter_raw(3), gd_alter_spec(3),  gd_malter_spec(3),
              gd_move(3),  or  gd_rename(3);  these functions close affected RAW fields before making changes to
              the raw data files, and so reset the corresponding I/O pointers to the beginning-of-field.

       In general, when these calls fail, the I/O pointers of affected fields  may  be  anything,  even  out-of-
       bounds or invalid.  After an error, the caller should issue an explicit gd_seek(3) to reposition I/O
       pointers before attempting further sequential operations.

HISTORY

       The function getdata() appeared in GetData-0.3.0.

       The GD_COMPLEX64 and GD_COMPLEX128 data types appeared in GetData-0.6.0.

       In GetData-0.7.0, this function was renamed to gd_getdata().

       The GD_HERE symbol used for sequential reads appeared in GetData-0.8.0.

       The GD_STRING data type appeared in GetData-0.10.0.

SEE ALSO

       GD_SIZE(3), gd_error(3), gd_error_string(3), gd_get_constant(3), gd_get_string(3),  gd_mplex_lookback(3),
       gd_nframes(3),  gd_open(3),  gd_raw_close(3),  gd_seek(3), gd_spf(3), gd_putdata(3), dirfile(5), dirfile-
       encoding(5)