Provided by: libgetdata-doc_0.11.0-14_all

NAME

       gd_open, gd_cbopen — open or create a Dirfile

SYNOPSIS

       #include <getdata.h>

       DIRFILE* gd_open(const char *dirfilename, unsigned long flags);

       DIRFILE* gd_cbopen(const char *dirfilename, unsigned long flags, gd_parser_callback_t
              sehandler, void *extra);

DESCRIPTION

       The gd_cbopen() function opens or creates the dirfile specified by dirfilename,  returning
       a DIRFILE object associated with it.  Opening a dirfile will cause the library to read and
       parse the dirfile's format specification (see dirfile-format(5)).

       If not NULL, sehandler should be a pointer to a function which will be called  whenever  a
       syntax  error  is  encountered  during parsing the format specification.  Specify NULL for
       this parameter if no callback function is to be used.  The caller may use this function to
       correct  the  error  or modify the error handling of the format specification parser.  See
       The Callback Function section below for details on  this  function.   The  extra  argument
       allows  the  caller  to pass data to the callback function.  The pointer will be passed to
       the callback function verbatim.

       The gd_open() function is equivalent to gd_cbopen(), with sehandler and extra set to NULL.

       The flags argument should include one  of  the  access  modes:  GD_RDONLY  (read-only)  or
       GD_RDWR  (read-write),  and may also contain zero or more of the following flags, bitwise-
       or'd together:

       GD_ARM_ENDIAN
       GD_NOT_ARM_ENDIAN
               Specifies that double precision floating point raw data on disk are, or  are  not,
               stored in the middle-endian format used by older ARM processors.

               These flags only set the default endianness, and will be overridden when an /ENDIAN
               directive specifies the byte sex of RAW fields,  unless  GD_FORCE_ENDIAN  is  also
               specified.

               On  every  platform, one of these flags (GD_NOT_ARM_ENDIAN on all but middle-ended
               ARM systems) indicates the native behaviour of the  platform.   That  symbol  will
               equal zero, and may be omitted.

       GD_BIG_ENDIAN
       GD_LITTLE_ENDIAN
               Specifies  the default byte sex of raw data stored on disk to be either big-endian
               (most significant byte first) or little-endian  (least  significant  byte  first).
               Omitting  both  flags indicates the default should be the native endianness of the
               platform.

               Unlike the ARM endianness flags above, neither of  these  symbols  is  ever  zero.
               Specifying  both  these  flags  together will cause the library to assume that the
               endianness of the data is opposite to that of the  native  architecture,  whatever
               that might be.

               These flags only set the default endianness, and will be overridden when an /ENDIAN
               directive specifies the byte sex of RAW fields,  unless  GD_FORCE_ENDIAN  is  also
               specified.

       GD_CREAT
               An empty dirfile will be created, if one does not already exist.  This will create
               both the dirfile directory and an empty format specification file  called  format.
               If the call creates a dirfile, then the specified access mode is ignored: a newly-
               created DIRFILE is always opened with access mode GD_RDWR, even if  GD_RDONLY  had
               been specified.

               The directory will have mode S_IRWXU | S_IRWXG | S_IRWXO (0777), modified by
               the caller's umask value (see umask(2)).  The format file will have mode S_IRUSR |
               S_IWUSR  |  S_IRGRP  |  S_IWGRP  |  S_IROTH | S_IWOTH (0666), also modified by the
               caller's umask.  The owner of the dirfile directory and format file  will  be  the
               effective  user  ID  of the caller.  Group ownership follows the rules outlined in
               mkdir(2).

       GD_EXCL Ensure that this call creates a dirfile: when specified along with  GD_CREAT,  the
               call  will  fail  if  the  dirfile  specified  by  dirfilename already exists.  If
               GD_CREAT is not specified, this flag is ignored.  This flag suffers from  all  the
               limitations of the O_EXCL flag as indicated in open(2).

       GD_FORCE_ENCODING
               Specifies  that  /ENCODING directives (see dirfile-format(5)) found in the dirfile
               format specification should be ignored.  The encoding scheme  specified  in  flags
               will be used instead (see below).

       GD_FORCE_ENDIAN
               Specifies  that  /ENDIAN  directives  (see dirfile-format(5)) found in the dirfile
               format specification should be ignored.  All raw data will be assumed to have  the
               byte  sex  indicated  through  the  presence  or  absence  of  the  GD_ARM_ENDIAN,
               GD_BIG_ENDIAN, GD_LITTLE_ENDIAN, and GD_NOT_ARM_ENDIAN flags.

       GD_IGNORE_DUPS
               If the dirfile format metadata specifies more than one field with the  same  name,
               all  but  one  of  them will be ignored by the parser.  Without this flag, parsing
               would fail with the GD_E_FORMAT error, possibly resulting  in  invocation  of  the
               registered  callback  function.   Which  of  the  duplicate  fields is kept is not
               specified.  As a result, this flag is typically only  useful  in  the  case  where
               identical copies of a field specification line are present.

               No indication is given of whether a duplicate field has been discarded.  If
               finer-grained control is required, the caller should handle GD_E_FORMAT_DUPLICATE
               suberrors itself with an appropriate callback function.

       GD_PEDANTIC
               Reject  dirfiles  which don't conform to the Dirfile Standards.  See the Standards
               Compliance section below for full details.

       GD_PERMISSIVE
               Allow non-compliant format specification syntax, even  when  given  along  with  a
               conflicting  /VERSION  directive.   See the Standards Compliance section below for
               full details.

       GD_PRETTY_PRINT
               When dirfile metadata are flushed to disk (either explicitly via  gd_metaflush(3),
               gd_rewrite_fragment(3),  or  gd_flush(3) or implicitly by closing the dirfile), an
               attempt will be made to create a nicer looking format specification (from a human-
               readable  standpoint).  What this explicitly means is not part of the API, and any
               particular behaviour should not be relied on.  If the dirfile is opened read-only,
               this flag is ignored.

       GD_TRUNC
               If  dirfilename specifies an already existing dirfile, it will be truncated before
               opening.  Since gd_cbopen() decides  whether  dirfilename  specifies  an  existing
               dirfile  before  attempting  to  parse  the  dirfile, dirfilename is considered to
               specify an existing dirfile if it refers to a directory containing a regular  file
               called format, regardless of the content or form of that file.

               Truncation  occurs  by  deleting  every  regular file and symlink in the specified
               directory, whether the files were referred to by the dirfile before truncation  or
               not.   Accordingly,  this flag should be used with caution.  Unless GD_TRUNCSUB is
               also specified, subdirectories are left untouched.  Notably, this  operation  does
               not  consider  directories  used  in /INCLUDE directives.  If the dirfile does not
               exist, this flag is ignored.

       GD_TRUNCSUB
               If specified along with GD_TRUNC, truncation  will  descend  into  subdirectories,
               deleting  all  regular  files  and symlinks recursively.  It does not descend into
               directories pointed to by symbolic links: in these cases, just the symlink  itself
               is deleted.  If specified without an accompanying GD_TRUNC, this flag is ignored.

       GD_VERBOSE
               Specifies  that whenever an error is triggered by the library when working on this
               dirfile, the corresponding  error  string,  which  can  be  retrieved  by  calling
               gd_error_string(3),  should  be  written  on  the  caller's  standard error stream
               (stderr(3)) by GetData.  The error string may be prefixed by a string specified by
               the  caller;  see gd_verbose_prefix(3).  Without this flag, GetData writes nothing
               to standard error.  (GetData never writes to standard output.)
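       The endianness flags above describe how the bytes of each raw sample are laid out on disk.
       As a standalone illustration (it does not call GetData, and all function names are
       hypothetical), the sketch below shows the two byte rearrangements involved: reversing all
       eight bytes of a 64-bit sample, which converts between big- and little-endian order, and
       exchanging the two 32-bit words of a double, which converts between little-endian order
       and the middle-endian layout of older ARM processors (assumed here to store each 32-bit
       word little-endian, most significant word first).

```c
#include <stdint.h>
#include <string.h>

/* Reverse all eight bytes of a 64-bit value: converts a sample between
 * big-endian and little-endian byte order (the operation is its own inverse). */
uint64_t swap_bytes64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xffu);  /* append the current low byte of x */
        x >>= 8;
    }
    return r;
}

/* Exchange the two 4-byte halves of an 8-byte buffer in place: converts a
 * double between little-endian order and the assumed ARM middle-endian
 * layout (little-endian 32-bit words, most significant word first). */
void swap_words64(unsigned char buf[8])
{
    unsigned char tmp[4];
    memcpy(tmp, buf, 4);
    memcpy(buf, buf + 4, 4);
    memcpy(buf + 4, tmp, 4);
}

/* Return 1 if the host stores the least significant byte first, else 0. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

       In terms of this sketch, passing both GD_BIG_ENDIAN and GD_LITTLE_ENDIAN corresponds to
       reversing the bytes of every sample (swap_bytes64() for 64-bit types) relative to the
       host's native order.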

       Those flags which affect the operation of the library  beyond  this  call  itself  may  be
       modified later using the gd_flags(3) function.

       The  flags  argument may also be bitwise or'd with one of the following symbols indicating
       the default encoding scheme of the dirfile.  Like the  endianness  flags,  the  choice  of
       encoding  here  is  ignored  if  the  encoding  is specified in the dirfile itself, unless
       GD_FORCE_ENCODING is also specified.  If none of these symbols is present, GD_AUTO_ENCODED
       is  assumed, unless the gd_cbopen() call results in creation or truncation of the dirfile.
       In that case, GD_UNENCODED is assumed.  See dirfile-encoding(5)  for  details  on  dirfile
       encoding schemes.

       GD_AUTO_ENCODED
               Specifies  that  the encoding type is not known in advance, but should be detected
               by the GetData library.  Detection is accomplished by searching for raw data files
               with extensions appropriate to the encoding scheme.  This method will notably fail
               if the library is called via gd_putdata(3) to create a previously non-existent
               raw  field unless a read is first successfully performed on the dirfile.  Once the
               library has determined the encoding scheme for the first time, it remembers it for
               subsequent calls.

       GD_BZIP2_ENCODED
               Specifies  that  raw  data  files  are  compressed using the Burrows-Wheeler block
               sorting text compression algorithm and Huffman coding, as implemented in the bzip2
               format.

       GD_FLAC_ENCODED
               Specifies that raw data files are compressed using the Free Lossless Audio Codec
               (FLAC).

       GD_GZIP_ENCODED
               Specifies that raw data files are compressed using  Lempel-Ziv  coding  (LZ77)  as
               implemented in the gzip format.

       GD_LZMA_ENCODED
               Specifies  that  raw  data  files are compressed using the Lempel-Ziv Markov Chain
               Algorithm (LZMA) as implemented in the xz container format.

       GD_SLIM_ENCODED
               Specifies that raw data files are compressed using the slimlib library.

       GD_SIE_ENCODED
               Specifies that raw data files are sample-index encoded, similar to run-length
               encoding, suitable for data that change rarely.

       GD_TEXT_ENCODED
               Specifies that raw data files are encoded as text files containing one data sample
               per line.

       GD_UNENCODED
               Specifies that raw data files are not encoded, but written as simple binary data
               to disk.

       GD_ZZIP_ENCODED
               Specifies that raw data files are compressed using the DEFLATE algorithm.  All raw
               data files for a given fragment are collected  together  and  stored  in  a  PKZIP
               archive called raw.zip.

       GD_ZZSLIM_ENCODED
               Specifies that raw data files are compressed using a combination of compression
               schemes: first, files are slim-compressed, as with the GD_SLIM_ENCODED scheme, and
               then  they  are  collected  together  and  compressed (again) into a PKZIP archive
               called raw.zip, as in the GD_ZZIP_ENCODED scheme.

   Standards Compliance
       The latest Dirfile Standards Version which this release of GetData understands is provided
       in  the  preprocessor macro GD_DIRFILE_STANDARDS_VERSION defined in getdata.h.  GetData is
       able to open and parse any dirfile which conforms to this Standards  Version,  or  to  any
       earlier  Version.   The  dirfile-format(5) manual page lists the changes between Standards
       Versions.

       The GetData parser can operate in two  modes:  a  permissive  mode,  in  which  much  non-
       Standards-compliant  syntax  is  allowed, and a pedantic mode, in which the parser adheres
       strictly to the Standards.  The mode may change during the parsing of a dirfile.  If
       GD_PEDANTIC   is  passed  to  gd_cbopen(),  the  parser  will  start  parsing  the  format
       specification in pedantic mode, otherwise it will start in permissive mode.

       Permissive mode is provided primarily to allow  GetData  to  be  used  on  dirfiles  which
       conform  to  no single Standard, but which were accepted by the GetData parser in previous
       versions.  It is notably lax regarding reserved field names, field name characters, and
       the mixing of old and new data type specifiers, and it generally ignores the presence of
       /VERSION directives.  In read-write mode, permissive mode should be used with caution, as
       it can cause unintentional corruption of dirfile metadata on write if the heuristics in
       the parser incorrectly guess the intention of non-compliant syntax.  In permissive mode,
       actual syntax errors are still reported as such.

       In  pedantic  mode,  the  parser  conforms  to one specific Standards Version. This target
       version may change any number  of  times  in  the  course  of  scanning  a  single  format
       specification.   If  invoked using the GD_PEDANTIC flag, the parser will start in pedantic
       mode with a target version equal to  GD_DIRFILE_STANDARDS_VERSION.   Whenever  a  /VERSION
       directive is encountered in the format specification, the target version is changed to the
       Standards Version specified.  When encountering a /VERSION directive in  permissive  mode,
       the  parser  will  switch  to  pedantic  mode, unless the GD_PERMISSIVE flag was passed to
       gd_cbopen(), in which case no mode switch will take place.

       Independent of the mode of the parser when parsing the format specification, GetData  will
       calculate a list of Standards Versions to which the parsed metadata conform.  The
       gd_dirfile_standards(3) function can  provide  this  information,  and  also  specify  the
       desired Standards Version for writing format metadata back to disk.

   The Callback Function
       The  caller-supplied sehandler function is called whenever the format specification parser
       encounters a syntax error (i.e.  whenever it would return the  GD_E_FORMAT  error).   This
       callback may be used to correct the error, or to tell the parser how to recover from it.

       This function should take two pointers as arguments, and return an int:

              int sehandler(gd_parser_data_t *pdata, void *extra);

       The  extra  parameter  is  the  pointer  supplied  to gd_cbopen(), passed verbatim to this
       function.  It can be used to pass caller data to the callback.  GetData does  not  inspect
       this  pointer, not even to check its validity.  If the caller needs to pass no data to the
       callback, it may be NULL.

       The gd_parser_data_t type is a structure with at least the following members:

           typedef struct {
             const DIRFILE* dirfile;
             int suberror;
             int linenum;
             const char* filename;
             char* line;
             size_t buflen;

             ...
           } gd_parser_data_t;

       The pdata->dirfile member will be a pointer to a DIRFILE object suitable only for  passing
       to gd_error_string().  Notably, the caller should not assume this pointer will be the same
       as the pointer eventually returned by gd_cbopen(), nor that it will  be  valid  after  the
       callback function returns.

       The pdata->suberror member will be one of the following symbols indicating the type of
       syntax error encountered:

       GD_E_FORMAT_ALIAS
               The parent specified for a meta field was an alias.

       GD_E_FORMAT_BAD_LINE
               The line was indecipherable.  Typically this means that the line contained neither
               a reserved word, nor a field type.

       GD_E_FORMAT_BAD_NAME
               The specified field name was invalid.

       GD_E_FORMAT_BAD_SPF
               The samples-per-frame of a RAW field was out-of-range.

       GD_E_FORMAT_BAD_TYPE
               The data type of a RAW field was unrecognised.

       GD_E_FORMAT_BITNUM
               The first bit of a BIT field was out-of-range.

       GD_E_FORMAT_BITSIZE
               The last bit of a BIT field was out-of-range.

       GD_E_FORMAT_CHARACTER
               An  invalid  character  was  found in the line, or a character escape sequence was
               malformed.

       GD_E_FORMAT_DUPLICATE
               The specified field name already exists.

       GD_E_FORMAT_ENDIAN
               The byte sex specified by an /ENDIAN directive was unrecognised.

       GD_E_FORMAT_LITERAL
               An unexpected character was encountered in a complex literal.

       GD_E_FORMAT_LOCATION
               The parent of a metafield was defined in another fragment.

       GD_E_FORMAT_META_META
               An attempt was made to use a metafield as the parent to a new metafield.

       GD_E_FORMAT_METARAW
               An attempt was made to add a RAW metafield.

       GD_E_FORMAT_MPLEXVAL
               A MPLEX specification has a negative period.

       GD_E_FORMAT_N_FIELDS
               The number of fields of a LINCOM field was out-of-range.

       GD_E_FORMAT_N_TOK
               An insufficient number of tokens was found on the line.

       GD_E_FORMAT_NO_FIELD
               The parent of a metafield was not found.

       GD_E_FORMAT_NUMBITS
               The number of bits of a BIT field was out-of-range.

       GD_E_FORMAT_PROTECT
               The protection level specified by a /PROTECT directive was unrecognised.

       GD_E_FORMAT_RES_NAME
               A field was specified with the reserved name INDEX  (or  with  the  reserved  name
               FILEFRAM in a dirfile conforming to Standards Version 5 or earlier).

       GD_E_FORMAT_UNTERM
               The last token of the line was unterminated.

       GD_E_FORMAT_WINDOP
               The operation in a WINDOW field was not recognised.

       The pdata->filename and pdata->linenum members contain the pathname of the fragment and the
       line number where the syntax error was encountered.  The first line in a fragment is line one.

       The pdata->line member contains a copy of the line containing the syntax error.  This line
       may be freely modified by the callback function.  It will then be reparsed if the callback
       function returns the symbol GD_SYNTAX_RESCAN (see below).  The size of the memory  buffer,
       which  may  be greater than the length of the actual string, is provided in pdata->buflen,
       and space is available for at least GD_MAX_LINE_LENGTH bytes.

       If the callback function returns GD_SYNTAX_RESCAN, then a different buffer, which  may  be
       larger,  may  be  used to hold the new string, by assigning a pointer to the new buffer to
       pdata->line.  This buffer will be deallocated by  the  library  using  the  free  function
       specified  through  gd_alloc_funcs(3),  or else free(3) by default.  Do not deallocate the
       original buffer passed to the callback through pdata->line: it, too, will  be  deallocated
       by the library.

       The  callback  function should return one of the following symbols, which tells the parser
       how to subsequently handle the error:

       GD_SYNTAX_ABORT
               The parser should immediately abort parsing the format specification and fail with
               the  error GD_E_FORMAT.  This is the default behaviour, if no callback function is
               provided (or if the parser is invoked by calling gd_open()).

       GD_SYNTAX_CONTINUE
               The parser should  continue  parsing  the  format  specification.   However,  once
               parsing  has finished, the parser will fail with the error GD_E_FORMAT, even if no
               further syntax errors are encountered.  This behaviour may be used by  the  caller
               to  identify  all  lines  containing  syntax  errors  in the format specification,
               instead of just the first one.

       GD_SYNTAX_IGNORE
               The parser should ignore the line containing  the  syntax  error  completely,  and
               carry  on parsing the format specification.  If no further errors are encountered,
               the dirfile will be successfully opened.

       GD_SYNTAX_RESCAN
               The parser should  rescan  the  line  argument,  which  replaces  the  line  which
               originally contained the syntax error.  The line is assumed to have been corrected
               by the callback function.  If the line still contains a syntax error, the callback
               function will be called again.

               Note:  the  line  is  not  corrected on disk; however, the caller may subsequently
               correct the fragment on disk by calling gd_rewrite_fragment(3).

       The callback function handles only syntax errors.  The parser may still abort early if a
       different  kind of library error is encountered.  Furthermore, although a line may contain
       more than one syntax error, the parser will only ever report one syntax  error  per  line,
       even if the callback function returns GD_SYNTAX_CONTINUE.

RETURN VALUE

       A  call  to gd_cbopen() or gd_open() always returns a pointer to a newly allocated DIRFILE
       object, except in instances when it is unable to allocate memory for  the  DIRFILE  object
       itself,  in  which  case  it  will return NULL.  The DIRFILE object is an opaque structure
       containing the parsed dirfile metadata.

       If an error occurred, these functions will store  a  negative-valued  error  code  in  the
       returned  DIRFILE,  which  may be retrieved by a subsequent call to gd_error(3).  Possible
       error codes are:

       GD_E_ACCMODE
               The library was asked to truncate a dirfile opened read-only (i.e.   GD_TRUNC  was
               specified in flags along with GD_RDONLY).

       GD_E_ALLOC
               The library was unable to allocate memory.

       GD_E_BAD_REFERENCE
               The   reference   field   specified  by  a  /REFERENCE  directive  in  the  format
               specification (see dirfile-format(5)) was not found, or was not a RAW field.

       GD_E_CALLBACK
               The registered callback function, sehandler, returned an unrecognised response.

       GD_E_CREAT
               The library was unable to create the dirfile.

       GD_E_EXISTS
               The dirfile already exists and both GD_CREAT and GD_EXCL were specified.

       GD_E_FORMAT
               A syntax error occurred in  the  format  specification.   See  also  The  Callback
               Function section above.

       GD_E_IO The  dirfile  format file, or another file that it includes, could not be read, or
               dirfilename does not specify a valid dirfile.

       GD_E_LINE_TOO_LONG
               The parser encountered a line in the format specification longer than it was  able
               to  deal  with.   Lines  are  limited  by  the storage size of ssize_t.  On 32-bit
               systems, this limits format specification lines to  2**31  bytes.   The  limit  is
               larger on 64-bit systems.

       A  DIRFILE  which  is  returned  from  a  failed  open is flagged as invalid, meaning most
       functions it is passed to will fail with the error GD_E_BAD_DIRFILE.  A descriptive error
       string for the error may be obtained by calling gd_error_string(3).

       When  no  longer  needed,  the  caller  should  de-allocate any returned DIRFILE object by
       calling gd_close(3), or gd_discard(3), even if the open failed.

BUGS

       When working with dirfiles conforming to Standards Versions  4  and  earlier  (before  the
       introduction  of  the /ENDIAN directive), GetData assumes the dirfile has native byte sex,
       even though, officially, these early Standards stipulated data to be little-endian.   This
       is  necessary  since,  in  the  absence  of  an  explicit  /VERSION directive, it is often
       impossible to determine the intended Standards Version  of  a  dirfile,  and  the  current
       behaviour  is  to  assume native byte sex for modern dirfiles lacking /ENDIAN.  To read an
       old, little-ended dirfile on a big-ended platform, an /ENDIAN directive should be added to
       the format specification, or else GD_LITTLE_ENDIAN should be specified by the caller.

       GetData's   parser  assumes  it  is  running  on  an  ASCII-compatible  platform.   Format
       specification parsing will fail gloriously on an EBCDIC platform.

HISTORY

       The dirfile_open() function appeared in GetData-0.3.0.   The  only  supported  flags  were
       GD_BIG_ENDIAN,   GD_CREAT,   GD_EXCL,   GD_FORCE_ENDIAN,   GD_LITTLE_ENDIAN,  GD_PEDANTIC,
       GD_RDONLY, GD_RDWR, and GD_TRUNC.

       The GD_AUTO_ENCODED, GD_FORCE_ENCODING, GD_SLIM_ENCODED, GD_TEXT_ENCODED, GD_UNENCODED, and
       GD_VERBOSE flags appeared in GetData-0.4.0.

       The   dirfile_cbopen()   function   and   the   GD_BZIP2_ENCODED,   GD_GZIP_ENCODED,   and
       GD_IGNORE_DUPS flags appeared in GetData-0.5.0.

       The GD_PRETTY_PRINT and GD_LZMA_ENCODED flags appeared in GetData-0.6.0.

       In  GetData-0.7.0  these  functions  were  renamed  to  gd_open()  and  gd_cbopen().   The
       GD_ARM_ENDIAN, GD_NOT_ARM_ENDIAN, and GD_PERMISSIVE flags also appeared in this release.

       The  GD_SIE_ENCODED, GD_TRUNCSUB, GD_ZZIP_ENCODED, and GD_ZZSLIM_ENCODED flags appeared in
       GetData-0.8.0.

       The GD_FLAC_ENCODED flag appeared in GetData-0.9.0.

SEE ALSO

       gd_alloc_funcs(3),  gd_close(3),  gd_dirfile_standards(3),   gd_discard(3),   gd_error(3),
       gd_error_string(3),   gd_flags(3),  gd_getdata(3),  gd_include(3),  gd_parser_callback(3),
       gd_verbose_prefix(3), dirfile(5), dirfile-encoding(5), dirfile-format(5)