
Provided by: libgetdata-doc_0.9.0-2.2_all

NAME

       gd_cbopen, gd_open — open or create a dirfile

SYNOPSIS

       #include <getdata.h>

       DIRFILE* gd_cbopen(const char *dirfilename, unsigned long flags, gd_parser_callback_t sehandler, void
              *extra);

       DIRFILE* gd_open(const char *dirfilename, unsigned long flags);

DESCRIPTION

       The gd_cbopen() function opens or creates the dirfile  specified  by  dirfilename,  returning  a  DIRFILE
       object  associated  with  it.   Opening  a dirfile will cause the library to read and parse the dirfile's
       format specification (see dirfile-format(5)).

       If not NULL, sehandler should be a pointer to a function which will be called whenever a syntax error  is
       encountered  while  parsing  the  format  specification.  Specify NULL for this parameter if no callback
       function is to be used.  The caller may use this function to  correct  the  error  or  modify  the  error
       handling of the format specification parser.  See The Callback Function section below for details on this
       function.  The extra argument allows the caller to pass data to the callback function.  The pointer  will
       be passed to the callback function verbatim.

       The gd_open() function is equivalent to gd_cbopen(), with sehandler and extra set to NULL.

       The flags argument should include one of the access modes: GD_RDONLY (read-only) or GD_RDWR (read-write),
       and may also contain zero or more of the following flags, bitwise-or'd together:

       GD_ARM_ENDIAN
       GD_NOT_ARM_ENDIAN
              Specifies that double precision floating point raw data on disk are, or are  not,  stored  in  the
              middle-endian format used by older ARM processors.

              These  flags  only  set  the  default endianness, and will be overridden when an /ENDIAN directive
              specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is also specified.

              On every platform, one of these flags (GD_NOT_ARM_ENDIAN on  all  but  middle-ended  ARM  systems)
              indicates the native behaviour of the platform.  That symbol will equal zero, and may be omitted.

       GD_BIG_ENDIAN
       GD_LITTLE_ENDIAN
              Specifies  the  default  byte  sex  of  raw  data  stored  on  disk  to be either big-endian (most
              significant byte first) or little-endian (least significant  byte  first).   Omitting  both  flags
              indicates the default should be the native endianness of the platform.

              Unlike  the  ARM  endianness  flags above, neither of these symbols is ever zero.  Specifying both
              these flags together will cause the library to assume that the endianness of the data is  opposite
              to that of the native architecture, whatever that might be.

              These  flags  only  set  the  default endianness, and will be overridden when an /ENDIAN directive
              specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is also specified.

       GD_CREAT
              An empty dirfile will be created, if one does not  already  exist.   This  will  create  both  the
              dirfile  directory  and  an  empty format specification file called format.  If the call creates a
              dirfile, then the specified access mode is ignored: a newly-created DIRFILE is always opened  with
              access mode GD_RDWR, even if GD_RDONLY had been specified.

              The  directory  will  have  mode  S_IRWXU | S_IRWXG | S_IRWXO (0777),  modified  by  the  caller's
              umask value (see umask(2)).  The format file will have mode S_IRUSR | S_IWUSR | S_IRGRP |  S_IWGRP
              |  S_IROTH  |  S_IWOTH  (0666),  also  modified  by  the caller's umask.  The owner of the dirfile
              directory and format file will be the effective user ID of the caller.   Group  ownership  follows
              the rules outlined in mkdir(2).

       GD_EXCL
              Ensure that this call creates a dirfile: when specified along with GD_CREAT, the call will fail if
              the dirfile specified by dirfilename already exists.  If GD_CREAT is not specified, this  flag  is
              ignored.  This flag suffers from all the limitations of the O_EXCL flag as indicated in open(2).

       GD_FORCE_ENCODING
              Specifies   that  /ENCODING  directives  (see  dirfile-format(5))  found  in  the  dirfile  format
              specification should be ignored.  The encoding scheme specified in flags will be used instead (see
              below).

       GD_FORCE_ENDIAN
              Specifies   that   /ENDIAN   directives  (see  dirfile-format(5))  found  in  the  dirfile  format
              specification should be ignored.  All raw data will be assumed to  have  the  byte  sex  indicated
              through  the  presence  or  absence  of  the  GD_ARM_ENDIAN,  GD_BIG_ENDIAN, GD_LITTLE_ENDIAN, and
              GD_NOT_ARM_ENDIAN flags.

       GD_IGNORE_DUPS
              If the dirfile format metadata specifies more than one field with the same name, all  but  one  of
              them  will  be  ignored by the parser.  Without this flag, parsing would fail with the GD_E_FORMAT
              error, possibly resulting in invocation  of  the  registered  callback  function.   Which  of  the
              duplicate fields is kept is not specified.  As a result, this flag is typically only useful in the
              case where identical copies of a field specification line are present.

              No indication is given of whether a duplicate field has been discarded.   If  finer-grained
              control is required, the caller should handle GD_E_FORMAT_DUPLICATE suberrors itself with
              an appropriate callback function.

       GD_PEDANTIC
              Reject dirfiles which don't conform to  the  Dirfile  Standards.   See  the  Standards  Compliance
              section below for full details.

       GD_PERMISSIVE
              Allow non-compliant format specification syntax, even when given along with a conflicting /VERSION
              directive.  See the Standards Compliance section below for full details.

       GD_PRETTY_PRINT
              When  dirfile  metadata  are   flushed   to   disk   (either   explicitly   via   gd_metaflush(3),
              gd_rewrite_fragment(3),  or  gd_flush(3) or implicitly by closing the dirfile), an attempt will be
              made to create a nicer looking format specification (from a human-readable standpoint).  What this
              explicitly means is not part of the API, and any particular behaviour should not be relied on.  If
              the dirfile is opened read-only, this flag is ignored.

       GD_TRUNC
              If dirfilename specifies an already existing dirfile, it will be truncated before opening.   Since
              gd_cbopen()  decides  whether dirfilename specifies an existing dirfile before attempting to parse
              the dirfile, dirfilename is considered to specify an existing dirfile if it refers to a  directory
              containing a regular file called format, regardless of the content or form of that file.

              Truncation  occurs  by deleting every regular file and symlink in the specified directory, whether
              the files were referred to by the dirfile before truncation or not.  Accordingly, this flag should
              be  used  with  caution.  Unless GD_TRUNCSUB is also specified, subdirectories are left untouched.
              Notably, this operation does not consider directories used in /INCLUDE directives.  If the dirfile
              does not exist, this flag is ignored.

       GD_TRUNCSUB
              If  specified  along  with  GD_TRUNC,  truncation  will  descend into subdirectories, deleting all
              regular files and symlinks recursively.  It does  not  descend  into  directories  pointed  to  by
              symbolic  links:  in  these  cases,  just  the symlink itself is deleted.  If specified without an
              accompanying GD_TRUNC, this flag is ignored.

       GD_VERBOSE
              Specifies that whenever an error is triggered by the library when working  on  this  dirfile,  the
              corresponding  error  string,  which  can  be  retrieved  by calling gd_error_string(3), should be
              written on the caller's standard error stream (stderr(3)) by GetData.  The  error  string  may  be
              prefixed  by  a  string  specified  by  the  caller; see gd_verbose_prefix(3).  Without this flag,
              GetData writes nothing to standard error.  (GetData never writes to standard output.)

       Those flags which affect the operation of the library beyond this call itself may be modified later using
       the gd_flags(3) function.
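
       As an illustration of the existence test described under GD_TRUNC above, the following sketch checks
       whether a path names a directory containing a regular file called format.  The function name
       looks_like_dirfile() is hypothetical, and the code only mirrors the documented criterion using
       stat(2); it is not GetData's implementation.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of GetData): returns 1 if `path` is a
 * directory containing a regular file named "format", 0 otherwise.
 * The content of the format file is deliberately not inspected.
 * Note: stat() follows symbolic links; whether GetData does the same
 * is not stated in this manual. */
int looks_like_dirfile(const char *path)
{
    struct stat st;
    char format_path[4096];
    int n;

    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;

    n = snprintf(format_path, sizeof format_path, "%s/format", path);
    if (n < 0 || (size_t)n >= sizeof format_path)
        return 0; /* path too long for this sketch */

    return stat(format_path, &st) == 0 && S_ISREG(st.st_mode);
}
```

       A caller might run such a check before passing GD_TRUNC, since truncation deletes every regular
       file and symlink in a directory that passes this test.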

       The  flags  argument  may  also  be bitwise-or'd with one of the following symbols indicating the default
       encoding scheme of the dirfile.  Like the endianness flags, the choice of encoding here is ignored if the
       encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING is also specified.  If none of these
       symbols is present, GD_AUTO_ENCODED is assumed, unless  the  gd_cbopen()  call  results  in  creation  or
       truncation  of  the dirfile.  In that case, GD_UNENCODED is assumed.  See dirfile-encoding(5) for details
       on dirfile encoding schemes.

       GD_AUTO_ENCODED
              Specifies that the encoding type is not known in advance, but should be detected  by  the  GetData
              library.  Detection is accomplished by searching for raw data files with extensions appropriate to
              the encoding scheme.  This method will notably fail if the library is called via  putdata(3)  to
              create  a  previously  non-existent raw field unless a read is first successfully performed on the
              dirfile.  Once the library has determined the encoding scheme for the first time, it remembers  it
              for subsequent calls.

       GD_BZIP2_ENCODED
              Specifies  that  raw  data  files  are  compressed  using  the  Burrows-Wheeler block sorting text
              compression algorithm and Huffman coding, as implemented in the bzip2 format.

       GD_GZIP_ENCODED
              Specifies that raw data files are compressed using Lempel-Ziv coding (LZ77) as implemented in  the
              gzip format.

       GD_LZMA_ENCODED
              Specifies that raw data files are compressed using the Lempel-Ziv Markov Chain Algorithm (LZMA) as
              implemented in the xz container format.

       GD_SLIM_ENCODED
              Specifies that raw data files are compressed using the slimlib library.

       GD_SIE_ENCODED
              Specifies that raw data files are sample-index encoded, similar to run-length  encoding,  suitable
              for data that change rarely.

       GD_TEXT_ENCODED
              Specifies that raw data files are encoded as text files containing one data sample per line.

       GD_UNENCODED
              Specifies that raw data files are not encoded, but are written to disk as plain binary data.

       GD_ZZIP_ENCODED
              Specifies  that raw data files are compressed using the DEFLATE algorithm.  All raw data files for
              a given fragment are collected together and stored in a PKZIP archive called raw.zip.

       GD_ZZSLIM_ENCODED
              Specifies that raw data files are compressed using a combination  of  compression  schemes:  first
              files  are  slim-compressed,  as  with  the  GD_SLIM_ENCODED  scheme,  and then they are collected
              together and compressed (again) into a PKZIP archive called raw.zip,  as  in  the  GD_ZZIP_ENCODED
              scheme.
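
       The precedence rules for the default encoding, stated before the list of encoding symbols above,
       can be summarised by the small model below.  The enum, its values and the function
       effective_encoding() are hypothetical stand-ins invented for this sketch; they are not GetData
       symbols, and the real library applies these rules internally.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, NOT the GD_*_ENCODED symbols from getdata.h. */
enum encoding { ENC_UNSPECIFIED, ENC_AUTO, ENC_UNENCODED, ENC_GZIP };

/* Model of the documented rules:
 *  - from_flags:     encoding symbol passed in flags (or ENC_UNSPECIFIED)
 *  - force:          whether GD_FORCE_ENCODING was passed
 *  - from_directive: encoding given by an /ENCODING directive
 *                    (or ENC_UNSPECIFIED if the dirfile has none)
 *  - created_or_truncated: whether the call created or truncated the dirfile */
enum encoding effective_encoding(enum encoding from_flags, bool force,
                                 enum encoding from_directive,
                                 bool created_or_truncated)
{
    enum encoding dflt = from_flags;

    /* No encoding symbol in flags: auto-detect, unless the dirfile is new
     * or was just truncated, in which case it is unencoded. */
    if (dflt == ENC_UNSPECIFIED)
        dflt = created_or_truncated ? ENC_UNENCODED : ENC_AUTO;

    /* An /ENCODING directive wins unless the caller forces the flag value. */
    if (from_directive != ENC_UNSPECIFIED && !force)
        return from_directive;

    return dflt;
}
```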

   Standards Compliance
       The  latest  Dirfile  Standards  Version  which  this  release  of GetData understands is provided in the
       preprocessor macro GD_DIRFILE_STANDARDS_VERSION defined in getdata.h.  GetData is able to open and  parse
       any  dirfile  which conforms to this Standards Version, or to any earlier Version.  The dirfile-format(5)
       manual page lists the changes between Standards Versions.

       The GetData parser can operate in two modes: a permissive mode,  in  which  much  non-Standards-compliant
       syntax  is allowed, and a pedantic mode, in which the parser adheres strictly to the Standards.  The mode
       may  change during the parsing of a dirfile.  If GD_PEDANTIC is passed to gd_cbopen(),  the  parser  will
       start parsing the format specification in pedantic mode, otherwise it will start in permissive mode.

       Permissive  mode is provided primarily to allow GetData to be used on dirfiles which conform to no single
       Standard, but which were accepted by the  GetData  parser  in  previous  versions.   It  is  notably  lax
       regarding  reserved  field  names,  and  field  name  characters,  the  mixing  of  old and new data type
       specifiers, and generally ignores the presence of /VERSION directives.  In  read-write  mode,  permissive
       mode  should be used with caution, as it can cause unintentional corruption of dirfile metadata on write,
       if the heuristics in the parser incorrectly guessed the intention of non-compliant syntax.  In permissive
       mode, actual syntax errors are still reported as such.

       In  pedantic  mode, the parser conforms to one specific Standards Version. This target version may change
       any number of times in the course of scanning a  single  format  specification.   If  invoked  using  the
       GD_PEDANTIC   flag,   the   parser   will  start  in  pedantic  mode  with  a  target  version  equal  to
       GD_DIRFILE_STANDARDS_VERSION.  Whenever a /VERSION directive is encountered in the format  specification,
       the target version is changed to the Standards Version specified.  When encountering a /VERSION directive
       in permissive mode, the parser will switch to pedantic mode, unless the GD_PERMISSIVE flag was passed  to
       gd_cbopen(), in which case no mode switch will take place.
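
       The mode rules in the preceding paragraphs can be read as a small state machine.  The sketch below
       models them; struct parser_mode and both functions are hypothetical names made up for illustration
       and do not correspond to anything in getdata.h.  The latest_version argument stands in for
       GD_DIRFILE_STANDARDS_VERSION.

```c
#include <stdbool.h>

/* Hypothetical model of the documented parser modes; not GetData internals. */
struct parser_mode {
    bool pedantic;       /* false means permissive mode */
    int  target_version; /* only meaningful while pedantic */
};

/* Initial state: pedantic only if GD_PEDANTIC was passed; the initial
 * target is the latest Standards Version the library understands. */
struct parser_mode mode_at_start(bool pedantic_flag, int latest_version)
{
    struct parser_mode m = { pedantic_flag, latest_version };
    return m;
}

/* Effect of a /VERSION directive: the target version follows the
 * directive, and a permissive parser turns pedantic unless
 * GD_PERMISSIVE was passed to gd_cbopen(). */
void on_version_directive(struct parser_mode *m, bool permissive_flag,
                          int version)
{
    m->target_version = version;
    if (!m->pedantic && !permissive_flag)
        m->pedantic = true;
}
```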

       Independent  of  the  mode  of the parser when parsing the format specification, GetData will calculate a
       list of Standards Versions to which the parsed metadata conform.  The  gd_dirfile_standards(3)  function
       can  provide this information, and also specify the desired Standards Version for writing format metadata
       back to disk.

   The Callback Function
       The caller-supplied sehandler function is called whenever the format specification  parser  encounters  a
       syntax  error  (i.e.   whenever  it  would  return  the GD_E_FORMAT error).  This callback may be used to
       correct the error, or to tell the parser how to recover from it.

       This function should take two pointers as arguments, and return an int:

              int sehandler(gd_parser_data_t *pdata, void *extra);

       The extra parameter is the pointer supplied to gd_cbopen(), passed verbatim to this function.  It can  be
       used  to  pass caller data to the callback.  GetData does not inspect this pointer, not even to check its
       validity.  If the caller needs to pass no data to the callback, it may be NULL.

       The gd_parser_data_t type is a structure with at least the following members:

           typedef struct {
             const DIRFILE* dirfile;
             int suberror;
             int linenum;
             const char* filename;
             char* line;
             size_t buflen;

             ...
           } gd_parser_data_t;

       The pdata->dirfile member  will  be  a  pointer  to  a  DIRFILE  object  suitable  only  for  passing  to
       gd_error_string().   Notably,  the  caller should not assume this pointer will be the same as the pointer
       eventually returned by gd_cbopen(), nor that it will be valid after the callback function returns.

       The pdata->suberror member will be one of the following symbols indicating the type  of  syntax  error
       encountered:

       GD_E_FORMAT_ALIAS
              The parent specified for a meta field was an alias.

       GD_E_FORMAT_BAD_LINE
              The  line  was  indecipherable.   Typically  this means that the line contained neither a reserved
              word, nor a field type.

       GD_E_FORMAT_BAD_NAME
              The specified field name was invalid.

       GD_E_FORMAT_BAD_SPF
              The samples-per-frame of a RAW field was out-of-range.

       GD_E_FORMAT_BAD_TYPE
              The data type of a RAW field was unrecognised.

       GD_E_FORMAT_BITNUM
              The first bit of a BIT field was out-of-range.

       GD_E_FORMAT_BITSIZE
              The last bit of a BIT field was out-of-range.

       GD_E_FORMAT_CHARACTER
              An invalid character was found in the line, or a character escape sequence was malformed.

       GD_E_FORMAT_DUPLICATE
              The specified field name already exists.

       GD_E_FORMAT_ENDIAN
              The byte sex specified by an /ENDIAN directive was unrecognised.

       GD_E_FORMAT_LITERAL
              An unexpected character was encountered in a complex literal.

       GD_E_FORMAT_LOCATION
              The parent of a metafield was defined in another fragment.

       GD_E_FORMAT_META_META
              An attempt was made to use a metafield as the parent to a new metafield.

       GD_E_FORMAT_METARAW
              An attempt was made to add a RAW metafield.

       GD_E_FORMAT_MPLEXVAL
              An MPLEX specification has a negative period.

       GD_E_FORMAT_N_FIELDS
              The number of fields of a LINCOM field was out-of-range.

       GD_E_FORMAT_N_TOK
              An insufficient number of tokens was found on the line.

       GD_E_FORMAT_NO_FIELD
              The parent of a metafield was not found.

       GD_E_FORMAT_NUMBITS
              The number of bits of a BIT field was out-of-range.

       GD_E_FORMAT_PROTECT
              The protection level specified by a /PROTECT directive was unrecognised.

       GD_E_FORMAT_RES_NAME
              A field was specified with the reserved name INDEX (or  with  the  reserved  name  FILEFRAM  in  a
              dirfile conforming to Standards Version 5 or earlier).

       GD_E_FORMAT_UNTERM
              The last token of the line was unterminated.

       GD_E_FORMAT_WINDOP
              The operation in a WINDOW field was not recognised.

       The pdata->filename and pdata->linenum members contain the pathname of the fragment and the  line  number
       where the syntax error was encountered.  The first line in a fragment is line one.

       The pdata->line member contains a copy of the line containing the syntax error.  This line may be  freely
       modified  by the callback function.  It will then be reparsed if the callback function returns the symbol
       GD_SYNTAX_RESCAN (see below).  The size of the memory buffer (which may be greater than the length of the
       actual  string)  is  provided  in  pdata->buflen,  and space is available for at least GD_MAX_LINE_LENGTH
       bytes.  A larger buffer may be used if desired, by assigning a pointer to the new buffer of  the  desired
       length  to  pdata->line.   The  new  buffer  should be allocated with malloc(3).  It will be freed by the
       parser.  Do not call free(3) or realloc(3) on the original pointer passed to the callback as pdata->line:
       it, too, will be freed by the parser.

       The  callback  function  should  return  one  of  the  following  symbols,  which tells the parser how to
       subsequently handle the error:

       GD_SYNTAX_ABORT
              The parser should immediately abort parsing the format  specification  and  fail  with  the  error
              GD_E_FORMAT.  This is the default behaviour, if no callback function is provided (or if the parser
              is invoked by calling gd_open()).

       GD_SYNTAX_CONTINUE
              The parser should continue parsing the format specification.  However, once parsing has  finished,
              the parser will fail with the error GD_E_FORMAT, even if no further syntax errors are encountered.
              This behaviour may be used by the caller to identify all lines containing  syntax  errors  in  the
              format specification, instead of just the first one.

       GD_SYNTAX_IGNORE
              The parser should ignore the line containing the syntax error completely, and carry on parsing the
              format specification.  If no further errors are encountered,  the  dirfile  will  be  successfully
              opened.

       GD_SYNTAX_RESCAN
              The parser should rescan pdata->line, which replaces the line which originally contained  the
              syntax error.  The line is assumed to have been corrected by the callback function.  If  the  line
              still contains a syntax error, the callback function will be called again.

              Note: the line is not corrected on disk; however, the caller may subsequently correct the fragment
              on disk by calling gd_rewrite_fragment(3).

       The callback function handles only syntax errors.  The parser may still abort early, if a different  kind
       of  library  error  is encountered.  Furthermore, although a line may contain more than one syntax error,
       the parser will only ever report one syntax error  per  line,  even  if  the  callback  function  returns
       GD_SYNTAX_CONTINUE.

RETURN VALUE

       A  call  to gd_cbopen() or gd_open() always returns a pointer to a newly allocated DIRFILE object, except
       in instances when it is unable to allocate memory for the DIRFILE object itself, in which  case  it  will
       return  NULL.   The  DIRFILE object is an opaque structure containing the parsed dirfile metadata.  If an
       error occurred, the dirfile error will be set to a non-zero error value.  The DIRFILE object will also be
       internally flagged as invalid.  Possible error values are:

       GD_E_ACCMODE
               The  library  was  asked  to truncate a dirfile opened read-only (i.e.  GD_TRUNC was specified in
               flags along with GD_RDONLY).

       GD_E_ALLOC
               The library was unable to allocate memory.

       GD_E_BAD_REFERENCE
               The reference field specified by a /REFERENCE directive in the format specification (see dirfile-
               format(5)) was not found, or was not a RAW field.

       GD_E_CALLBACK
               The registered callback function, sehandler, returned an unrecognised response.

       GD_E_CREAT
               The library was unable to create the dirfile.

       GD_E_EXISTS
               The dirfile already exists and both GD_CREAT and GD_EXCL were specified.

       GD_E_FORMAT
               A  syntax  error  occurred  in  the format specification.  See also The Callback Function section
               above.

       GD_E_IO The dirfile format file, or another file that it includes, could not be  opened,  or  dirfilename
               does not specify a valid dirfile.

       GD_E_LINE_TOO_LONG
               The  parser  encountered a line in the format specification longer than it was able to deal with.
               Lines are limited by the storage  size  of  ssize_t.   On  32-bit  systems,  this  limits  format
               specification lines to 2**31 bytes.  The limit is larger on 64-bit systems.

       The dirfile error may be retrieved by calling gd_error(3).  A descriptive error string for the last error
       encountered can be obtained from a call to gd_error_string(3).  When finished with it,  a  caller  should
       de-allocate the DIRFILE object by calling gd_close(3), or gd_discard(3), even if the open failed.

BUGS

       When working with dirfiles conforming to Standards Versions 4 and earlier (before the introduction of the
       /ENDIAN directive), GetData assumes the dirfile has native byte sex, even though, officially, these early
       Standards  stipulated  data  to be little-endian.  This is necessary since, in the absence of an explicit
       /VERSION directive, it is often impossible to determine the intended Standards Version of a dirfile,  and
       the  current behaviour is to assume native byte sex for modern dirfiles lacking /ENDIAN.  To read an old,
       little-ended dirfile on a big-ended platform,  an  /ENDIAN  directive  should  be  added  to  the  format
       specification, or else GD_LITTLE_ENDIAN should be specified by the caller.

       GetData's  parser  assumes  it  is running on an ASCII-compatible platform.  Format specification parsing
       will fail gloriously on an EBCDIC platform.

SEE ALSO

       dirfile(5), dirfile-encoding(5), dirfile-format(5), gd_close(3), gd_dirfile_standards(3),  gd_discard(3),
       gd_error(3),   gd_error_string(3),   gd_flags(3),  gd_getdata(3),  gd_include(3),  gd_parser_callback(3),
       gd_verbose_prefix(3)