Provided by: libgetdata-dev_0.7.3-6ubuntu1_amd64

NAME

       gd_cbopen, gd_open — open or create a dirfile

SYNOPSIS

       #include <getdata.h>

       DIRFILE* gd_cbopen(const char *dirfilename, unsigned long flags, gd_parser_callback_t
              sehandler, void *extra);

       DIRFILE* gd_open(const char *dirfilename, unsigned long flags);

DESCRIPTION

       The gd_cbopen() function opens or creates the dirfile specified by dirfilename,  returning
       a DIRFILE object associated with it.  Opening a dirfile will cause the library to read and
       parse the dirfile's format specification (see dirfile-format(5)).

       If not NULL, sehandler should be a pointer to a function which will be called  whenever  a
       syntax error is encountered while parsing the format specification.  Specify NULL for
       this parameter if no callback function is to be used.  The caller may use this function to
       correct  the  error  or modify the error handling of the format specification parser.  See
       The Callback Function section below for details on  this  function.   The  extra  argument
       allows  the  caller  to pass data to the callback function.  The pointer will be passed to
       the callback function verbatim.

       The gd_open() function is equivalent to gd_cbopen(), with sehandler and extra set to NULL.

       The flags argument should include one  of  the  access  modes:  GD_RDONLY  (read-only)  or
       GD_RDWR  (read-write),  and may also contain zero or more of the following flags, bitwise-
       or'd together:

       GD_ARM_ENDIAN
       GD_NOT_ARM_ENDIAN
              Specifies that double precision floating point raw data on  disk  is,  or  is  not,
              stored in the middle-endian format used by older ARM processors.

              These flags only set the default endianness, and will be overridden when an
              /ENDIAN directive specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is
              also specified.

              On  every  platform,  one of these flags (GD_NOT_ARM_ENDIAN on all but middle-ended
              ARM systems) indicates the native behaviour of  the  platform.   That  symbol  will
              equal zero, and may be omitted.

       GD_BIG_ENDIAN
       GD_LITTLE_ENDIAN
              Specifies  the  default byte sex of raw data stored on disk to be either big-endian
              (most significant byte first) or  little-endian  (least  significant  byte  first).
              Omitting  both  flags  indicates the default should be the native endianness of the
              platform.

              Unlike the ARM endianness flags above, neither  of  these  symbols  is  ever  zero.
              Specifying  both  these  flags  together  will cause the library to assume that the
              endianness of the data is opposite to that of  the  native  architecture,  whatever
              that might be.

              These flags only set the default endianness, and will be overridden when an
              /ENDIAN directive specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is
              also specified.

       GD_CREAT
              An  empty dirfile will be created, if one does not already exist.  This will create
              both the dirfile directory and an empty format specification  file  called  format.
              The directory will have mode S_IRWXU | S_IRWXG | S_IRWXO (0777), modified by
              the caller's umask value (see umask(2)).  The format file will have mode S_IRUSR  |
              S_IWUSR  |  S_IRGRP  |  S_IWGRP  |  S_IROTH  | S_IWOTH (0666), also modified by the
              caller's umask.

              The owner of the dirfile directory and format file will be the effective user ID of
              the caller.  Group ownership follows the rules outlined in mkdir(2).

       GD_EXCL
              Ensure  that  this  call creates a dirfile: when specified along with GD_CREAT, the
              call will fail if the dirfile specified by dirfilename already  exists.   Behaviour
              of this flag is undefined if GD_CREAT is not specified.  This flag suffers from all
              the limitations of the O_EXCL flag as indicated in open(2).

       GD_FORCE_ENCODING
              Specifies that /ENCODING directives (see dirfile-format(5)) found  in  the  dirfile
              format  specification  should  be  ignored.  The encoding scheme specified in flags
              will be used instead (see below).

       GD_FORCE_ENDIAN
              Specifies that /ENDIAN directives (see  dirfile-format(5))  found  in  the  dirfile
              format  specification  should be ignored.  All raw data will be assumed to have the
              byte sex indicated through the presence or absence of the GD_ARM_ENDIAN,
              GD_BIG_ENDIAN, GD_LITTLE_ENDIAN, and GD_NOT_ARM_ENDIAN flags.

       GD_IGNORE_DUPS
              If  the  dirfile  format metadata specifies more than one field with the same name,
              all but one of them will be ignored by the  parser.   Without  this  flag,  parsing
              would  fail  with  the  GD_E_FORMAT  error, possibly resulting in invocation of the
              registered callback function.  Which  of  the  duplicate  fields  is  kept  is  not
              specified.   As  a  result,  this  flag  is typically only useful in the case where
              identical copies of a field specification line are present.

              No indication is given of whether a duplicate field has been discarded.
              If    finer    grained    control   is   required,   the   caller   should   handle
              GD_E_FORMAT_DUPLICATE suberrors itself with an appropriate callback function.

       GD_PEDANTIC
              Reject dirfiles which don't conform to the Dirfile Standards.   See  the  Standards
              Compliance section below for full details.

       GD_PERMISSIVE
              Allow  non-compliant  format  specification  syntax,  even  when given along with a
              conflicting /VERSION directive.  See the Standards  Compliance  section  below  for
              full details.

       GD_PRETTY_PRINT
              When    dirfile    metadata   is   flushed   to   disk   (either   explicitly   via
              gd_metaflush(), gd_rewrite_fragment(), or gd_flush() or implicitly by  closing  the
              dirfile),  an  attempt  will be made to create a nicer looking format specification
              (from a human-readable standpoint).  What this explicitly means is not part of  the
              API,  and  any  particular  behaviour  should  not be relied on.  If the dirfile is
              opened read-only, this flag is ignored.

       GD_TRUNC
              If dirfilename specifies an already existing dirfile, it will be  truncated  before
              opening.   Since  gd_cbopen()  decides  whether  dirfilename  specifies an existing
              dirfile before attempting to  parse  the  dirfile,  dirfilename  is  considered  to
              specify  an  existing dirfile if it refers to a directory containing a regular file
              called format, regardless of the content or form of that file.

              Truncation occurs by deleting  every  regular  file  in  the  specified  directory,
              whether  the  files  were  referred  to  by  the  dirfile before truncation or not.
              Accordingly, this flag should  be  used  with  caution.   Subdirectories  are  left
              untouched.   Notably,  this operation does not consider the presence of subdirfiles
              declared by INCLUDE directives.  If the  dirfile  does  not  exist,  this  flag  is
              ignored.

       GD_VERBOSE
              Specifies  that  whenever an error is triggered by the library when working on this
              dirfile, the  corresponding  error  string,  which  can  be  retrieved  by  calling
              gd_error_string(3),  should  be  written on standard error by the library.  Without
              this flag, GetData writes nothing to standard  error.   (GetData  never  writes  to
              standard output.)
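
       The mode arithmetic described under GD_CREAT can be checked with a few lines of C.
       This is a minimal sketch using only POSIX headers, not GetData code; the names
       mode_after_umask, DIRFILE_DIR_MODE and FORMAT_FILE_MODE are invented for the
       illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Modes quoted in the GD_CREAT description (hypothetical macro names). */
#define DIRFILE_DIR_MODE  (S_IRWXU | S_IRWXG | S_IRWXO)            /* 0777 */
#define FORMAT_FILE_MODE  (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | \
                           S_IROTH | S_IWOTH)                      /* 0666 */

/* Mode a newly created file or directory ends up with: the requested
 * mode with every bit that is set in the umask cleared. */
static mode_t mode_after_umask(mode_t requested, mode_t mask)
{
    /* Only permission bits are expected here. */
    assert((requested & ~(mode_t)07777) == 0);
    return requested & ~mask;
}
```

       With the common umask of 022, mode_after_umask(DIRFILE_DIR_MODE, 022) gives 0755 for
       the dirfile directory and mode_after_umask(FORMAT_FILE_MODE, 022) gives 0644 for the
       format file.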

       The  flags  argument may also be bitwise or'd with one of the following symbols indicating
       the default encoding scheme of the dirfile.  Like the  endianness  flags,  the  choice  of
       encoding  here  is  ignored  if  the  encoding  is specified in the dirfile itself, unless
       GD_FORCE_ENCODING is also specified.  If none of these symbols is present, GD_AUTO_ENCODED
       is  assumed, unless the gd_cbopen() call results in creation or truncation of the dirfile.
       In that case, GD_UNENCODED is assumed.  See dirfile-encoding(5)  for  details  on  dirfile
       encoding schemes.

       GD_AUTO_ENCODED
              Specifies that the encoding type is not known in advance, but should be detected by
              the GetData library.  Detection is accomplished by searching  for  raw  data  files
              with  extensions appropriate to the encoding scheme.  This method will notably fail
              if the library is called via gd_putdata(3) to create a previously non-existent raw
              field  unless  a  read  is  first  successfully performed on the dirfile.  Once the
              library has determined the encoding scheme for the first time, it remembers it  for
              subsequent calls.

       GD_BZIP2_ENCODED
              Specifies  that  raw  data  files  are  compressed  using the Burrows-Wheeler block
              sorting text compression algorithm and Huffman coding, as implemented in the  bzip2
              format.

       GD_GZIP_ENCODED
              Specifies  that  raw  data  files  are compressed using Lempel-Ziv coding (LZ77) as
              implemented in the gzip format.

       GD_LZMA_ENCODED
              Specifies that raw data files are compressed  using  the  Lempel-Ziv  Markov  Chain
              Algorithm (LZMA) as implemented in the xz container format.

       GD_SLIM_ENCODED
              Specifies that raw data files are compressed using the slimlib library.

       GD_TEXT_ENCODED
              Specifies  that raw data files are encoded as text files containing one data sample
              per line.

       GD_UNENCODED
              Specifies that raw data files are not encoded, but written verbatim to disk.

   Standards Compliance
       The latest Dirfile Standards Version which this release of GetData understands is provided
       in  the  preprocessor macro GD_DIRFILE_STANDARDS_VERSION defined in getdata.h.  GetData is
       able to open and parse any dirfile which conforms to this Standards  Version,  or  to  any
       earlier  Version.   The  dirfile-format(5) manual page lists the changes between Standards
       Versions.

       The GetData parser can operate in two modes: a permissive mode, in which much
       non-Standards compliant syntax is allowed, and a pedantic mode, in which the parser adheres
       strictly to the Standards.  If GD_PEDANTIC is passed to gd_cbopen(), the parser will start
       parsing  the  format specification in pedantic mode, otherwise it will start in permissive
       mode.

       Permissive mode is provided primarily to allow  GetData  to  be  used  on  dirfiles  which
       conform  to  no single Standard, but which were accepted by the GetData parser in previous
       versions.  It is notably lax regarding reserved field names, and  field  name  characters,
       the  mixing  of  old  and  new data type specifiers, and generally ignores the presence of
       /VERSION directives.  In read-write mode, permissive mode should be used with caution,  as
       it  can  cause unintentional corruption of dirfile metadata on write, if the heuristics in
       the parser incorrectly guessed the intention of non-compliant syntax.  In permissive mode,
       actual syntax errors are still reported as such.

       In  pedantic  mode,  the  parser  conforms to one specified Standards Version. This target
       version may change any number  of  times  in  the  course  of  scanning  a  single  format
       specification.   If  invoked using the GD_PEDANTIC flag, the parser will start in pedantic
       mode with a target version equal to  GD_DIRFILE_STANDARDS_VERSION.   Whenever  a  /VERSION
       directive is encountered in the format specification, the target version is changed to the
       Standards Version specified.  When encountering a /VERSION directive in  permissive  mode,
       the  parser  will  switch  to  pedantic  mode, unless the GD_PERMISSIVE flag was passed to
       gd_cbopen(), in which case no mode switch will take place.
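
       The mode-switching rules above can be summarised as a small state machine.  The sketch
       below is a model of the text, not the parser's actual implementation: every identifier
       in it is hypothetical, and latest_version stands in for GD_DIRFILE_STANDARDS_VERSION.
       This page does not say what happens when both GD_PEDANTIC and GD_PERMISSIVE are given,
       so the model arbitrarily starts in pedantic mode in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the parser's compliance state; not library code. */
typedef enum { MODE_PERMISSIVE, MODE_PEDANTIC } parser_mode;

typedef struct {
    parser_mode mode;
    int target_version;     /* only meaningful in pedantic mode */
    bool permissive_flag;   /* caller passed GD_PERMISSIVE */
} parser_state;

/* State at the start of parsing a format specification. */
static parser_state parser_start(bool pedantic_flag, bool permissive_flag,
                                 int latest_version)
{
    parser_state s;
    s.mode = pedantic_flag ? MODE_PEDANTIC : MODE_PERMISSIVE;
    s.target_version = latest_version;
    s.permissive_flag = permissive_flag;
    return s;
}

/* Effect of meeting a /VERSION directive naming `version`. */
static void parser_see_version(parser_state *s, int version)
{
    assert(s != NULL);
    if (s->mode == MODE_PERMISSIVE) {
        if (s->permissive_flag)
            return;                 /* GD_PERMISSIVE: no mode switch */
        s->mode = MODE_PEDANTIC;    /* permissive -> pedantic */
    }
    s->target_version = version;    /* the target follows the directive */
}
```

       In this model a parser started without flags becomes pedantic at the first /VERSION
       directive, while one started with GD_PERMISSIVE stays permissive for the whole format
       specification.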

       Independent of the mode of the parser when parsing the format specification, GetData  will
       calculate a list of Standards Versions to which the parsed metadata conform.  The
       gd_dirfile_standards(3) function can  provide  this  information,  and  also  specify  the
       desired Standards Version for writing format metadata back to disk.

   The Callback Function
       The  caller-supplied sehandler function is called whenever the format specification parser
       encounters a syntax error (i.e.  whenever it would return the  GD_E_FORMAT  error).   This
       callback may be used to correct the error, or to tell the parser how to recover from it.

       This function should take two pointers as arguments, and return an int:

              int sehandler(gd_parser_data_t *pdata, void *extra);

       The  extra  parameter  is  the  pointer  supplied  to gd_cbopen(), passed verbatim to this
       function.  It can be used to pass caller data to the callback.  GetData does  not  inspect
       this  pointer, not even to check its validity.  If the caller needs to pass no data to the
       callback, it may be NULL.

       The gd_parser_data_t type is a structure with at least the following members:

           typedef struct {
             const DIRFILE* dirfile;
             int suberror;
             int linenum;
             const char* filename;
             char* line;
             size_t buflen;

             ...
           } gd_parser_data_t;

       The pdata->dirfile member will be a pointer to a DIRFILE object suitable only for  passing
       to gd_error_string().  Notably, the caller should not assume this pointer will be the same
       as the pointer eventually returned by gd_cbopen(), nor that it will  be  valid  after  the
       callback function returns.

       The pdata->suberror member will be one of the following symbols indicating the type of
       syntax error encountered:

       GD_E_FORMAT_BAD_LINE
              The line was indecipherable.  Typically this means that the line contained  neither
              a reserved word, nor a field type.

       GD_E_FORMAT_BAD_NAME
              The specified field name was invalid.

       GD_E_FORMAT_BAD_SPF
              The samples-per-frame of a RAW field was out-of-range.

       GD_E_FORMAT_BAD_TYPE
              The data type of a RAW field was unrecognised.

       GD_E_FORMAT_BITNUM
              The first bit of a BIT field was out-of-range.

       GD_E_FORMAT_BITSIZE
              The last bit of a BIT field was out-of-range.

       GD_E_FORMAT_CHARACTER
              An  invalid  character  was  found  in the line, or a character escape sequence was
              malformed.

       GD_E_FORMAT_DUPLICATE
              The specified field name already exists.

       GD_E_FORMAT_ENDIAN
              The byte sex specified by an /ENDIAN directive was unrecognised.

       GD_E_FORMAT_LITERAL
              An unexpected character was encountered in a complex literal.

       GD_E_FORMAT_LOCATION
              The parent of a metafield was defined in another fragment.

       GD_E_FORMAT_METARAW
              An attempt was made to add a RAW metafield.

       GD_E_FORMAT_N_FIELDS
              The number of fields of a LINCOM field was out-of-range.

       GD_E_FORMAT_N_TOK
              An insufficient number of tokens was found on the line.

       GD_E_FORMAT_NO_PARENT
              The parent of a metafield was not found.

       GD_E_FORMAT_NUMBITS
              The number of bits of a BIT field was out-of-range.

       GD_E_FORMAT_PROTECT
              The protection level specified by a PROTECT directive was unrecognised.

       GD_E_FORMAT_RES_NAME
              A field was specified with the reserved name  INDEX  (or  with  the  reserved  name
              FILEFRAM in a dirfile conforming to Standards Version 5 or earlier).

       GD_E_FORMAT_UNTERM
              The last token of the line was unterminated.

       The pdata->filename and pdata->linenum members contain the pathname of the fragment and
       the line number where the syntax error was encountered.  The first line in a fragment is
       line one.

       The pdata->line member contains a copy of the line containing the syntax error.  This line
       may be freely modified by the callback function.  It will then be reparsed if the callback
       function returns the symbol GD_SYNTAX_RESCAN (see below).  The size of the  memory  buffer
       (which  may be greater than the length of the actual string) is provided in pdata->buflen,
       and space is available for at least GD_MAX_LINE_LENGTH bytes.  A larger buffer may be used
       if desired, by assigning a pointer to the new buffer of the desired length to pdata->line.
       The new buffer should be allocated with malloc(3).  It will be freed by  the  parser.   Do
       not  call  free(3)  or  realloc(3)  on  the  original  pointer  passed  to the callback as
       pdata->line: it, too, will be freed by the parser.
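
       The buffer rules above can be illustrated with a callback that substitutes a corrected
       line supplied through extra.  This is a sketch under stated assumptions:
       use_replacement_line and the HAVE_GETDATA_H macro are hypothetical, and when the real
       header is not used the block declares stand-ins whose numeric values are placeholders
       rather than the library's.  This page does not say whether pdata->buflen must be
       updated after swapping buffers, so the sketch leaves it untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef HAVE_GETDATA_H               /* hypothetical build-system macro */
#include <getdata.h>
#else
/* Stand-ins so the sketch compiles without the library installed.
 * The numeric values are placeholders, not the real ones. */
typedef struct {
    const void *dirfile;
    int suberror;
    int linenum;
    const char *filename;
    char *line;
    size_t buflen;
} gd_parser_data_t;
#define GD_SYNTAX_ABORT  0
#define GD_SYNTAX_RESCAN 1
#endif

/* Callback: `extra` points to a NUL-terminated replacement line. */
static int use_replacement_line(gd_parser_data_t *pdata, void *extra)
{
    const char *fixed = extra;
    size_t need;

    assert(pdata != NULL);
    if (fixed == NULL)
        return GD_SYNTAX_ABORT;     /* nothing to substitute */

    need = strlen(fixed) + 1;
    if (need > pdata->buflen) {
        /* Too big for the supplied buffer: hand over a malloc'd one.
         * The old pdata->line is NOT freed here; the parser frees both. */
        char *bigger = malloc(need);
        if (bigger == NULL)
            return GD_SYNTAX_ABORT;
        pdata->line = bigger;
    }
    memcpy(pdata->line, fixed, need);
    return GD_SYNTAX_RESCAN;        /* ask the parser to re-read the line */
}
```

       Such a callback would be registered by passing it as sehandler, with the replacement
       string as extra, in the call to gd_cbopen().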

       The callback function should return one of the following symbols, which tells  the  parser
       how to subsequently handle the error:

       GD_SYNTAX_ABORT
              The  parser should immediately abort parsing the format specification and fail with
              the error GD_E_FORMAT.  This is the default behaviour, if no callback  function  is
              provided (or if the parser is invoked by calling gd_open()).

       GD_SYNTAX_CONTINUE
              The parser should continue parsing the format specification.  However, once parsing
              has finished, the parser will fail with the error GD_E_FORMAT, even if  no  further
              syntax  errors  are  encountered.   This  behaviour  may  be  used by the caller to
              identify all lines containing syntax errors in the format specification, instead of
              just the first one.

       GD_SYNTAX_IGNORE
              The parser should ignore the line containing the syntax error completely, and carry
              on parsing the format specification.  If no further  errors  are  encountered,  the
              dirfile will be successfully opened.

       GD_SYNTAX_RESCAN
              The  parser  should  rescan  the  line  argument,  which  replaces  the  line which
              originally contained the syntax error.  The line is assumed to have been  corrected
              by  the callback function.  If the line still contains a syntax error, the callback
              function will be called again.

              Note: the line is not corrected on  disk;  however,  the  caller  may  subsequently
              correct the fragment on disk by calling gd_rewrite_fragment(3).

       The  callback function handles only syntax errors.  The parser may still abort early, if a
       different kind of library error is encountered.  Furthermore, although a line may  contain
       more  than  one  syntax error, the parser will only ever report one syntax error per line,
       even if the callback function returns GD_SYNTAX_CONTINUE.
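
       As an illustration of the return codes, the following sketch shows one possible
       policy: duplicate field names are skipped with GD_SYNTAX_IGNORE, and every other
       syntax error is logged and answered with GD_SYNTAX_CONTINUE so that all bad lines are
       listed.  The policy, the names tally_errors and struct error_tally, and the
       HAVE_GETDATA_H macro are assumptions made for the example; when the real header is
       not used, the stand-in symbols carry placeholder values, not the library's.

```c
#include <assert.h>
#include <stdio.h>

#ifdef HAVE_GETDATA_H               /* hypothetical build-system macro */
#include <getdata.h>
#else
/* Stand-ins so the sketch compiles without the library installed.
 * The numeric values are placeholders, not the real ones. */
typedef struct {
    const void *dirfile;
    int suberror;
    int linenum;
    const char *filename;
    char *line;
    size_t buflen;
} gd_parser_data_t;
#define GD_E_FORMAT_DUPLICATE 1
#define GD_SYNTAX_CONTINUE    1
#define GD_SYNTAX_IGNORE      2
#endif

/* Caller data reached through the `extra` pointer (hypothetical type). */
struct error_tally {
    int ignored;    /* duplicate-field lines that were skipped */
    int reported;   /* other syntax errors that were logged */
};

static int tally_errors(gd_parser_data_t *pdata, void *extra)
{
    struct error_tally *tally = extra;

    assert(pdata != NULL && tally != NULL);

    if (pdata->suberror == GD_E_FORMAT_DUPLICATE) {
        tally->ignored++;
        return GD_SYNTAX_IGNORE;    /* drop the line, keep parsing */
    }

    fprintf(stderr, "%s:%d: syntax error (suberror %d)\n",
            pdata->filename, pdata->linenum, pdata->suberror);
    tally->reported++;
    return GD_SYNTAX_CONTINUE;      /* keep parsing; the open still fails */
}
```

       A caller would pass tally_errors as sehandler and the address of a zeroed struct
       error_tally as extra.  Per the descriptions above, the open succeeds only if no call
       returned GD_SYNTAX_CONTINUE, that is, only if reported is still zero.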

RETURN VALUE

       A call to gd_cbopen() or gd_open() always returns a pointer to a newly  allocated  DIRFILE
       object.  The DIRFILE object is an opaque structure containing the parsed dirfile metadata.
       If an error occurred, the dirfile error will be  set  to  a  non-zero  error  value.   The
       DIRFILE object will also be internally flagged as invalid.  Possible error values are:

       GD_E_ACCMODE
               The  library  was  asked  to  create  or truncate a dirfile opened read-only (i.e.
               GD_CREAT or GD_TRUNC was specified in flags along with GD_RDONLY).

       GD_E_ALLOC
               The library was unable to allocate memory.

       GD_E_BAD_REFERENCE
               The  reference  field  specified  by  a  /REFERENCE  directive   in   the   format
               specification (see dirfile-format(5)) was not found, or was not a RAW field.

       GD_E_CALLBACK
               The registered callback function, sehandler, returned an unrecognised response.

       GD_E_CREAT
               The  library  was  unable  to  create  the dirfile, or the dirfile exists and both
               GD_CREAT and GD_EXCL were specified.

       GD_E_FORMAT
               A syntax error occurred in  the  format  specification.   See  also  The  Callback
               Function section above.

       GD_E_INTERNAL_ERROR
               An  internal error occurred in the library while trying to perform the task.  This
               indicates a bug in the  library.   Please  report  the  incident  to  the  GetData
               developers.

       GD_E_LINE_TOO_LONG
               The  parser encountered a line in the format specification longer than it was able
               to deal with.  Lines are limited by  the  storage  size  of  ssize_t.   On  32-bit
               systems,  this  limits  format  specification  lines to 2**31 bytes.  The limit is
               larger on 64-bit systems.

       GD_E_OPEN
               The dirfile format specification could not be  opened,  or  dirfilename  does  not
               specify a valid dirfile.

       GD_E_OPEN_FRAGMENT
               A file specified in an /INCLUDE directive could not be opened.

       GD_E_TRUNC
               The library was unable to truncate the dirfile.

       The dirfile error may be retrieved by calling gd_error(3).  A descriptive error string for
       the last error encountered can be  obtained  from  a  call  to  gd_error_string(3).   When
       finished  with  it, a caller should de-allocate the DIRFILE object by calling gd_close(3),
       or gd_discard(3), even if the open failed.

BUGS

       When working with dirfiles conforming to Standards Versions  4  and  earlier  (before  the
       introduction  of  the  ENDIAN directive), GetData assumes the dirfile has native byte sex,
       even though, officially, these early Standards stipulated data to be little-endian.   This
       is necessary since, in the absence of an explicit /VERSION directive, it is often
       impossible to determine the intended Standards Version  of  a  dirfile,  and  the  current
       behaviour  is  to  assume native byte sex for modern dirfiles lacking /ENDIAN.  To read an
       old, little-ended dirfile on a big-ended platform, an /ENDIAN directive should be added to
       the format specification, or else GD_LITTLE_ENDIAN should be specified by the caller.
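
       Why the assumed byte sex matters can be seen by decoding the same four bytes in two
       ways.  The sketch below is plain C and does not call GetData; the function names are
       invented for the illustration.  It contrasts a decoder that always treats the bytes as
       little-endian with one that uses the host's native order, which is what happens to an
       old dirfile when neither an /ENDIAN directive nor an endianness flag is present.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Decode four bytes stored little-endian, whatever the host order is. */
static uint32_t decode_le32(const unsigned char *b)
{
    assert(b != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Decode the same bytes using the host's native byte order. */
static uint32_t decode_native32(const unsigned char *b)
{
    uint32_t v;

    assert(b != NULL);
    memcpy(&v, b, sizeof v);
    return v;
}
```

       For the bytes 0x78 0x56 0x34 0x12, decode_le32() yields 0x12345678 on every platform,
       whereas decode_native32() agrees with it only on a little-endian host.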

       GetData artificially limits the size of a CARRAY field to GD_MAX_CARRAY_LENGTH elements, to
       be certain it is always able to write the CARRAY back to disk without overrunning its
       maximum line length.  On 32-bit systems, GD_MAX_CARRAY_LENGTH is 2**24.  It is larger on
       64-bit systems.  Excess elements are silently truncated on dirfile open.

       GetData's  parser  assumes  it  is  running  on  an  ASCII-compatible  platform.    Format
       specification parsing will fail gloriously on an EBCDIC platform.

SEE ALSO

       dirfile(5),  dirfile-encoding(5), dirfile-format(5), gd_close(3), gd_dirfile_standards(3),
       gd_discard(3),    gd_error(3),    gd_error_string(3),    gd_getdata(3),     gd_include(3),
       gd_parser_callback(3)