Provided by: libgetdata-doc_0.10.0-3build2_all

NAME

       gd_open, gd_cbopen — open or create a Dirfile

SYNOPSIS

       #include <getdata.h>

       DIRFILE* gd_open(const char *dirfilename, unsigned long flags);

       DIRFILE* gd_cbopen(const char *dirfilename, unsigned long flags, gd_parser_callback_t sehandler, void
              *extra);

DESCRIPTION

       The gd_cbopen() function opens or creates the dirfile  specified  by  dirfilename,  returning  a  DIRFILE
       object  associated  with  it.   Opening  a dirfile will cause the library to read and parse the dirfile's
       format specification (see dirfile-format(5)).

       If not NULL, sehandler should be a pointer to a function which will be called whenever a syntax error is
       encountered while parsing the format specification.  Specify NULL for this parameter if no callback
       function is to be used.  The caller may use this function to  correct  the  error  or  modify  the  error
       handling of the format specification parser.  See The Callback Function section below for details on this
       function.  The extra argument allows the caller to pass data to the callback function.  The pointer  will
       be passed to the callback function verbatim.

       The gd_open() function is equivalent to gd_cbopen(), with sehandler and extra set to NULL.

       The flags argument should include one of the access modes: GD_RDONLY (read-only) or GD_RDWR (read-write),
       and may also contain zero or more of the following flags, bitwise-or'd together:

       GD_ARM_ENDIAN
       GD_NOT_ARM_ENDIAN
               Specifies that double precision floating point raw data on disk are, or are not,  stored  in  the
               middle-endian format used by older ARM processors.

               These flags only set the default endianness, and will be overridden when an /ENDIAN directive
               specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is also specified.

               On every platform, one of these flags (GD_NOT_ARM_ENDIAN on all  but  middle-ended  ARM  systems)
               indicates the native behaviour of the platform.  That symbol will equal zero, and may be omitted.

       GD_BIG_ENDIAN
       GD_LITTLE_ENDIAN
               Specifies  the  default  byte  sex  of  raw  data  stored  on  disk to be either big-endian (most
               significant byte first) or little-endian (least significant byte  first).   Omitting  both  flags
               indicates the default should be the native endianness of the platform.

               Unlike  the  ARM  endianness flags above, neither of these symbols is ever zero.  Specifying both
               these flags together will cause the library to assume that the endianness of the data is opposite
               to that of the native architecture, whatever that might be.

               These flags only set the default endianness, and will be overridden when an /ENDIAN directive
               specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is also specified.

       GD_CREAT
               An empty dirfile will be created, if one does not already  exist.   This  will  create  both  the
               dirfile  directory  and  an empty format specification file called format.  If the call creates a
               dirfile, then the specified access mode is ignored: a newly-created DIRFILE is always opened with
               access mode GD_RDWR, even if GD_RDONLY had been specified.

               The directory will have mode S_IRWXU | S_IRWXG | S_IRWXO (0777), modified by the caller's
               umask value (see umask(2)).  The format file will have mode S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP
               |  S_IROTH  |  S_IWOTH  (0666),  also  modified  by the caller's umask.  The owner of the dirfile
               directory and format file will be the effective user ID of the caller.  Group  ownership  follows
               the rules outlined in mkdir(2).

       GD_EXCL Ensure  that  this call creates a dirfile: when specified along with GD_CREAT, the call will fail
               if the dirfile specified by dirfilename already exists.  If GD_CREAT is not specified, this  flag
               is  ignored.   This  flag  suffers  from  all  the limitations of the O_EXCL flag as indicated in
               open(2).

       GD_FORCE_ENCODING
               Specifies  that  /ENCODING  directives  (see  dirfile-format(5))  found  in  the  dirfile  format
               specification  should  be  ignored.   The encoding scheme specified in flags will be used instead
               (see below).

       GD_FORCE_ENDIAN
               Specifies  that  /ENDIAN  directives  (see  dirfile-format(5))  found  in  the   dirfile   format
               specification  should  be  ignored.   All raw data will be assumed to have the byte sex indicated
               through the presence or  absence  of  the  GD_ARM_ENDIAN,  GD_BIG_ENDIAN,  GD_LITTLE_ENDIAN,  and
               GD_NOT_ARM_ENDIAN flags.

       GD_IGNORE_DUPS
               If  the  dirfile format metadata specifies more than one field with the same name, all but one of
               them will be ignored by the parser.  Without this flag, parsing would fail with  the  GD_E_FORMAT
               error,  possibly  resulting  in  invocation  of  the  registered callback function.  Which of the
               duplicate fields is kept is not specified.  As a result, this flag is typically  only  useful  in
               the case where identical copies of a field specification line are present.

               No indication is given of whether a duplicate field has been discarded.  If finer
               grained control is required, the caller should handle GD_E_FORMAT_DUPLICATE suberrors itself with
               an appropriate callback function.

       GD_PEDANTIC
               Reject  dirfiles  which  don't  conform  to  the Dirfile Standards.  See the Standards Compliance
               section below for full details.

       GD_PERMISSIVE
               Allow non-compliant format specification  syntax,  even  when  given  along  with  a  conflicting
               /VERSION directive.  See the Standards Compliance section below for full details.

       GD_PRETTY_PRINT
               When   dirfile   metadata   are   flushed   to   disk  (either  explicitly  via  gd_metaflush(3),
               gd_rewrite_fragment(3), or gd_flush(3) or implicitly by closing the dirfile), an attempt will  be
               made  to  create  a  nicer looking format specification (from a human-readable standpoint).  What
               this explicitly means is not part of the API, and any particular behaviour should not  be  relied
               on.  If the dirfile is opened read-only, this flag is ignored.

       GD_TRUNC
               If dirfilename specifies an already existing dirfile, it will be truncated before opening.  Since
               gd_cbopen() decides whether dirfilename specifies an existing dirfile before attempting to  parse
               the dirfile, dirfilename is considered to specify an existing dirfile if it refers to a directory
               containing a regular file called format, regardless of the content or form of that file.

               Truncation occurs by deleting every regular file and symlink in the specified directory,  whether
               the  files  were  referred  to  by  the dirfile before truncation or not.  Accordingly, this flag
               should be used with caution.  Unless GD_TRUNCSUB  is  also  specified,  subdirectories  are  left
               untouched.   Notably,  this  operation does not consider directories used in /INCLUDE directives.
               If the dirfile does not exist, this flag is ignored.

       GD_TRUNCSUB
               If specified along with GD_TRUNC, truncation  will  descend  into  subdirectories,  deleting  all
               regular  files  and  symlinks  recursively.   It  does not descend into directories pointed to by
               symbolic links: in these cases, just the symlink itself is  deleted.   If  specified  without  an
               accompanying GD_TRUNC, this flag is ignored.

       GD_VERBOSE
               Specifies  that  whenever  an error is triggered by the library when working on this dirfile, the
               corresponding error string, which can be  retrieved  by  calling  gd_error_string(3),  should  be
               written  on  the  caller's standard error stream (stderr(3)) by GetData.  The error string may be
               prefixed by a string specified by the  caller;  see  gd_verbose_prefix(3).   Without  this  flag,
               GetData writes nothing to standard error.  (GetData never writes to standard output.)

       Those flags which affect the operation of the library beyond this call itself may be modified later using
       the gd_flags(3) function.

       The flags argument may also be bitwise or'd with one of the  following  symbols  indicating  the  default
       encoding scheme of the dirfile.  Like the endianness flags, the choice of encoding here is ignored if the
       encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING is also specified.  If none of these
       symbols  is  present,  GD_AUTO_ENCODED  is  assumed,  unless  the gd_cbopen() call results in creation or
       truncation of the dirfile.  In that case, GD_UNENCODED is assumed.  See dirfile-encoding(5)  for  details
       on dirfile encoding schemes.

       GD_AUTO_ENCODED
               Specifies  that  the encoding type is not known in advance, but should be detected by the GetData
               library.  Detection is accomplished by searching for raw data files with  extensions  appropriate
               to the encoding scheme.  This method will notably fail if the library is called via
               gd_putdata(3) to create a previously non-existent raw field unless a read is first successfully
               performed  on  the  dirfile.   Once  the library has determined the encoding scheme for the first
               time, it remembers it for subsequent calls.

       GD_BZIP2_ENCODED
               Specifies that raw data files  are  compressed  using  the  Burrows-Wheeler  block  sorting  text
               compression algorithm and Huffman coding, as implemented in the bzip2 format.

       GD_FLAC_ENCODED
               Specifies that raw data files are compressed using the Free Lossless Audio Codec (FLAC).

       GD_GZIP_ENCODED
               Specifies that raw data files are compressed using Lempel-Ziv coding (LZ77) as implemented in the
               gzip format.

       GD_LZMA_ENCODED
               Specifies that raw data files are compressed using the Lempel-Ziv Markov Chain  Algorithm  (LZMA)
               as implemented in the xz container format.

       GD_SLIM_ENCODED
               Specifies that raw data files are compressed using the slimlib library.

       GD_SIE_ENCODED
               Specifies that raw data files are sample-index encoded, similar to run-length encoding, suitable
               for data that change rarely.

       GD_TEXT_ENCODED
               Specifies that raw data files are encoded as text files containing one data sample per line.

       GD_UNENCODED
               Specifies that raw data files are not encoded, but written as simple binary data to disk.

       GD_ZZIP_ENCODED
               Specifies that raw data files are compressed using the DEFLATE algorithm.  All raw data files for
               a given fragment are collected together and stored in a PKZIP archive called raw.zip.

       GD_ZZSLIM_ENCODED
               Specifies that raw data files are compressed using a combination of compression schemes: first
               files are slim-compressed, as with the  GD_SLIM_ENCODED  scheme,  and  then  they  are  collected
               together  and  compressed  (again) into a PKZIP archive called raw.zip, as in the GD_ZZIP_ENCODED
               scheme.

   Standards Compliance
       The latest Dirfile Standards Version which this  release  of  GetData  understands  is  provided  in  the
       preprocessor  macro GD_DIRFILE_STANDARDS_VERSION defined in getdata.h.  GetData is able to open and parse
       any dirfile which conforms to this Standards Version, or to any earlier Version.   The  dirfile-format(5)
       manual page lists the changes between Standards Versions.

       The  GetData  parser  can  operate in two modes: a permissive mode, in which much non-Standards-compliant
       syntax is allowed, and a pedantic mode, in which the parser adheres strictly to the Standards.  The mode
       may change during the parsing of a dirfile.  If GD_PEDANTIC is passed to gd_cbopen(), the parser will
       start parsing the format specification in pedantic mode, otherwise it will start in permissive mode.

       Permissive mode is provided primarily to allow GetData to be used on dirfiles which conform to no  single
       Standard,  but  which  were  accepted  by  the  GetData  parser  in previous versions.  It is notably lax
       regarding reserved field names, and  field  name  characters,  the  mixing  of  old  and  new  data  type
       specifiers,  and  generally  ignores the presence of /VERSION directives.  In read-write mode, permissive
       mode should be used with caution, as it can cause unintentional corruption of dirfile metadata on  write,
       if the heuristics in the parser incorrectly guessed the intention of non-compliant syntax.  In permissive
       mode, actual syntax errors are still reported as such.

       In pedantic mode, the parser conforms to one specific Standards Version. This target version  may  change
       any  number  of  times  in  the  course  of scanning a single format specification.  If invoked using the
       GD_PEDANTIC  flag,  the  parser  will  start  in  pedantic  mode  with  a   target   version   equal   to
       GD_DIRFILE_STANDARDS_VERSION.   Whenever a /VERSION directive is encountered in the format specification,
       the target version is changed to the Standards Version specified.  When encountering a /VERSION directive
       in  permissive mode, the parser will switch to pedantic mode, unless the GD_PERMISSIVE flag was passed to
       gd_cbopen(), in which case no mode switch will take place.

       Independent of the mode of the parser when parsing the format specification,  GetData  will  calculate  a
       list of Standards Versions to which the parsed metadata conform.  The gd_dirfile_standards(3) function
       can provide this information, and also specify the desired Standards Version for writing format  metadata
       back to disk.

   The Callback Function
       The  caller-supplied  sehandler  function is called whenever the format specification parser encounters a
       syntax error (i.e.  whenever it would return the GD_E_FORMAT  error).   This  callback  may  be  used  to
       correct the error, or to tell the parser how to recover from it.

       This function should take two pointers as arguments, and return an int:

              int sehandler(gd_parser_data_t *pdata, void *extra);

       The  extra parameter is the pointer supplied to gd_cbopen(), passed verbatim to this function.  It can be
       used to pass caller data to the callback.  GetData does not inspect this pointer, not even to  check  its
       validity.  If the caller needs to pass no data to the callback, it may be NULL.

       The gd_parser_data_t type is a structure with at least the following members:

           typedef struct {
             const DIRFILE* dirfile;
             int suberror;
             int linenum;
             const char* filename;
             char* line;
             size_t buflen;

             ...
           } gd_parser_data_t;

       The  pdata->dirfile  member  will  be  a  pointer  to  a  DIRFILE  object  suitable  only  for passing to
       gd_error_string().  Notably, the caller should not assume this pointer will be the same  as  the  pointer
       eventually returned by gd_cbopen(), nor that it will be valid after the callback function returns.

       The pdata->suberror member will be one of the following symbols indicating the type of syntax error
       encountered:

       GD_E_FORMAT_ALIAS
               The parent specified for a meta field was an alias.

       GD_E_FORMAT_BAD_LINE
               The line was indecipherable.  Typically this means that the line  contained  neither  a  reserved
               word, nor a field type.

       GD_E_FORMAT_BAD_NAME
               The specified field name was invalid.

       GD_E_FORMAT_BAD_SPF
               The samples-per-frame of a RAW field was out-of-range.

       GD_E_FORMAT_BAD_TYPE
               The data type of a RAW field was unrecognised.

       GD_E_FORMAT_BITNUM
               The first bit of a BIT field was out-of-range.

       GD_E_FORMAT_BITSIZE
               The last bit of a BIT field was out-of-range.

       GD_E_FORMAT_CHARACTER
               An invalid character was found in the line, or a character escape sequence was malformed.

       GD_E_FORMAT_DUPLICATE
               The specified field name already exists.

       GD_E_FORMAT_ENDIAN
               The byte sex specified by an /ENDIAN directive was unrecognised.

       GD_E_FORMAT_LITERAL
               An unexpected character was encountered in a complex literal.

       GD_E_FORMAT_LOCATION
               The parent of a metafield was defined in another fragment.

       GD_E_FORMAT_META_META
               An attempt was made to use a metafield as the parent to a new metafield.

       GD_E_FORMAT_METARAW
               An attempt was made to add a RAW metafield.

       GD_E_FORMAT_MPLEXVAL
               An MPLEX specification has a negative period.

       GD_E_FORMAT_N_FIELDS
               The number of fields of a LINCOM field was out-of-range.

       GD_E_FORMAT_N_TOK
               An insufficient number of tokens was found on the line.

       GD_E_FORMAT_NO_FIELD
               The parent of a metafield was not found.

       GD_E_FORMAT_NUMBITS
               The number of bits of a BIT field was out-of-range.

       GD_E_FORMAT_PROTECT
               The protection level specified by a /PROTECT directive was unrecognised.

       GD_E_FORMAT_RES_NAME
               A  field  was  specified  with  the  reserved name INDEX (or with the reserved name FILEFRAM in a
               dirfile conforming to Standards Version 5 or earlier).

       GD_E_FORMAT_UNTERM
               The last token of the line was unterminated.

       GD_E_FORMAT_WINDOP
               The operation in a WINDOW field was not recognised.

       The pdata->filename and pdata->linenum members contain the pathname of the fragment and line number where
       the syntax error was encountered.  The first line in a fragment is line one.

       The  pdata->line member contains a copy of the line containing the syntax error.  This line may be freely
       modified by the callback function.  It will then be reparsed if the callback function returns the  symbol
       GD_SYNTAX_RESCAN (see below).  The size of the memory buffer, which may be greater than the length of the
       actual string, is provided in pdata->buflen, and space  is  available  for  at  least  GD_MAX_LINE_LENGTH
       bytes.

       If  the  callback function returns GD_SYNTAX_RESCAN, then a different buffer, which may be larger, may be
       used to hold the new string, by assigning a pointer to the new buffer to pdata->line.  This  buffer  will
       be  deallocated  by  the  library  using  the  free function specified through gd_alloc_funcs(3), or else
       free(3) by default.  Do not deallocate the original buffer passed to the  callback  through  pdata->line:
       it, too, will be deallocated by the library.

       The  callback  function  should  return  one  of  the  following  symbols,  which tells the parser how to
       subsequently handle the error:

       GD_SYNTAX_ABORT
               The parser should immediately abort parsing the format specification  and  fail  with  the  error
               GD_E_FORMAT.   This  is  the  default  behaviour,  if no callback function is provided (or if the
               parser is invoked by calling gd_open()).

       GD_SYNTAX_CONTINUE
               The parser should continue parsing the format specification.  However, once parsing has finished,
               the  parser  will  fail  with  the  error  GD_E_FORMAT,  even  if  no  further  syntax errors are
               encountered.  This behaviour may be used by the caller to identify all  lines  containing  syntax
               errors in the format specification, instead of just the first one.

       GD_SYNTAX_IGNORE
               The  parser  should  ignore the line containing the syntax error completely, and carry on parsing
               the format specification.  If no further errors are encountered, the dirfile will be successfully
               opened.

       GD_SYNTAX_RESCAN
               The  parser  should  rescan the line argument, which replaces the line which originally contained
               the syntax error.  The line is assumed to have been corrected by the callback function.   If  the
               line still contains a syntax error, the callback function will be called again.

               Note:  the  line  is  not  corrected  on  disk;  however, the caller may subsequently correct the
               fragment on disk by calling gd_rewrite_fragment(3).

       The callback function handles only syntax errors.  The parser may still abort early, if a different  kind
       of  library  error  is encountered.  Furthermore, although a line may contain more than one syntax error,
       the parser will only ever report one syntax error  per  line,  even  if  the  callback  function  returns
       GD_SYNTAX_CONTINUE.

RETURN VALUE

       A  call  to gd_cbopen() or gd_open() always returns a pointer to a newly allocated DIRFILE object, except
       in instances when it is unable to allocate memory for the DIRFILE object itself, in which  case  it  will
       return NULL.  The DIRFILE object is an opaque structure containing the parsed dirfile metadata.

       If  an  error  occurred, these functions will store a negative-valued error code in the returned DIRFILE,
       which may be retrieved by a subsequent call to gd_error(3).  Possible error codes are:

       GD_E_ACCMODE
               The library was asked to truncate a dirfile opened read-only (i.e.   GD_TRUNC  was  specified  in
               flags along with GD_RDONLY).

       GD_E_ALLOC
               The library was unable to allocate memory.

       GD_E_BAD_REFERENCE
               The reference field specified by a /REFERENCE directive in the format specification (see dirfile-
               format(5)) was not found, or was not a RAW field.

       GD_E_CALLBACK
               The registered callback function, sehandler, returned an unrecognised response.

       GD_E_CREAT
               The library was unable to create the dirfile.

       GD_E_EXISTS
               The dirfile already exists and both GD_CREAT and GD_EXCL were specified.

       GD_E_FORMAT
               A syntax error occurred in the format specification.  See  also  The  Callback  Function  section
               above.

       GD_E_IO The dirfile format file, or another file that it includes, could not be read, or dirfilename does
               not specify a valid dirfile.

       GD_E_LINE_TOO_LONG
               The parser encountered a line in the format specification longer than it was able to  deal  with.
               Lines  are  limited  by  the  storage  size  of  ssize_t.   On 32-bit systems, this limits format
               specification lines to 2**31 bytes.  The limit is larger on 64-bit systems.

       A DIRFILE which is returned from a failed open is flagged as invalid, meaning most functions it is passed
       to will fail with the error GD_E_BAD_DIRFILE.  A descriptive error string for the error may be obtained
       by calling gd_error_string(3).

       When no longer needed, the caller should de-allocate any returned DIRFILE object by calling  gd_close(3),
       or gd_discard(3), even if the open failed.

BUGS

       When working with dirfiles conforming to Standards Versions 4 and earlier (before the introduction of the
       /ENDIAN directive), GetData assumes the dirfile has native byte sex, even though, officially, these early
       Standards  stipulated  data  to be little-endian.  This is necessary since, in the absence of an explicit
       /VERSION directive, it is often impossible to determine the intended Standards Version of a dirfile,  and
       the  current behaviour is to assume native byte sex for modern dirfiles lacking /ENDIAN.  To read an old,
       little-ended dirfile on a big-ended platform,  an  /ENDIAN  directive  should  be  added  to  the  format
       specification, or else GD_LITTLE_ENDIAN should be specified by the caller.

       GetData's  parser  assumes  it  is running on an ASCII-compatible platform.  Format specification parsing
       will fail gloriously on an EBCDIC platform.

HISTORY

       The dirfile_open() function appeared in GetData-0.3.0.  The  only  supported  flags  were  GD_BIG_ENDIAN,
       GD_CREAT, GD_EXCL, GD_FORCE_ENDIAN, GD_LITTLE_ENDIAN, GD_PEDANTIC, GD_RDONLY, GD_RDWR, and GD_TRUNC.

       The GD_AUTO_ENCODED, GD_FORCE_ENCODING, GD_SLIM_ENCODED, GD_TEXT_ENCODED, GD_UNENCODED, and GD_VERBOSE
       flags appeared in GetData-0.4.0.

       The dirfile_cbopen()  function  and  the  GD_BZIP2_ENCODED,  GD_GZIP_ENCODED,  and  GD_IGNORE_DUPS  flags
       appeared in GetData-0.5.0.

       The GD_PRETTY_PRINT and GD_LZMA_ENCODED flags appeared in GetData-0.6.0.

       In  GetData-0.7.0  these  functions  were  renamed  to  gd_open()  and  gd_cbopen().   The GD_ARM_ENDIAN,
       GD_NOT_ARM_ENDIAN, and GD_PERMISSIVE flags also appeared in this release.

       The GD_SIE_ENCODED, GD_TRUNCSUB, GD_ZZIP_ENCODED, and GD_ZZSLIM_ENCODED flags appeared in GetData-0.8.0.

       The GD_FLAC_ENCODED flag appeared in GetData-0.9.0.

SEE ALSO

       gd_alloc_funcs(3), gd_close(3), gd_dirfile_standards(3), gd_discard(3), gd_error(3),  gd_error_string(3),
       gd_flags(3),   gd_getdata(3),  gd_include(3),  gd_parser_callback(3),  gd_verbose_prefix(3),  dirfile(5),
       dirfile-encoding(5), dirfile-format(5)