Ubuntu Manpage: dirfile-format — the dirfile database format specification file

Provided by: libgetdata-doc_0.11.0-14_all

NAME

       dirfile-format — the dirfile database format specification file

DESCRIPTION

       The  dirfile  format  specification  fully  specifies the raw and derived time streams and
       auxiliary information for a dirfile(5) database.

       The format specification is contained in one or more case-sensitive text files located  in
       the  dirfile  tree.   Each  file is known as a fragment.  The primary fragment is the file
       called format located in the base dirfile directory.  This file may contain only  part  of
       the format specification, and may reference other fragments (using the /INCLUDE directive)
       containing  further  format  specification.   This  inclusion  mechanism  may  be   nested
       arbitrarily deep.

       The explicit text encoding of these files is not specified by these Standards, but it must
       be 7-bit ASCII compatible. Examples of acceptable  character  encodings  include  all  the
       ISO  8859  character  sets  (i.e.  Latin-1 through Latin-10, among others), as well as the
       UTF-8 encoding of Unicode and UCS.

       This document primarily describes the  latest  version  of  the  Standards  (Version  10);
       differences  with  previous versions are noted where relevant.  A complete list of changes
       between versions is given in the HISTORY section below.

SYNTAX

       The format specification is composed of field specification  lines  and  directive  lines,
       optionally  separated  by  blank  lines  or  lines  containing only whitespace.  Lines are
       separated by the line-feed character (0x0A).  Unless escaped (see below),  the  hash  mark
       (#)  is the comment delimiter; the comment delimiter, and any text following it to the end
       of the line, is ignored.

   Tokens
       Both field specification lines and directive lines consist of several tokens separated  by
       whitespace.   Whitespace  consists of one or more whitespace characters.  These are: space
       (0x20), horizontal tab (0x09), vertical tab (0x0B), form-feed (0x0C), and carriage  return
       (0x0D).   The  first token of a directive line is always a reserved word.  The first token
       of a field specification line is never a reserved word.   Any  amount  of  whitespace  may
       precede the first token on a line.

       Since tokens are separated by whitespace, to include a whitespace character in a token, it
       must either be escaped by preceding it with a backslash character (\),  or  be  replaced  by  a
       character  escape  sequence  (see  below), or else the token must be enclosed in quotation
       marks (").  The quotation marks themselves are stripped from  the  token.  The  null-token
       (that is, the token consisting of zero characters) may be specified by a pair of quotation
       marks with nothing between them ("").  To include a literal quotation mark in a token,  it
       must  be  escaped (\").  Similarly, a hash mark may be included in a token by including it
       in a quoted token or else by escaping it (\#), otherwise the hash mark  is  understood  as
       the comment delimiter.

       It  is a syntax error to have a line which contains unmatched quotation marks, or in which
       the last character is the backslash character.

       Several characters when escaped by a preceding  backslash  character  are  interpreted  as
       special characters in tokens.  The character escape sequences are:

              \a     an alert (bell) character (ASCII 0x07 / U+0007)

              \b     a backspace character (ASCII 0x08 / U+0008)

              \e     an escape character (ASCII 0x1B / U+001B)

              \f     a form-feed character (ASCII 0x0C / U+000C)

              \n     a line-feed character (ASCII 0x0A / U+000A)

              \r     a carriage return character (ASCII 0x0D / U+000D)

              \t     a horizontal tab character (ASCII 0x09 / U+0009)

              \v     a vertical tab character (ASCII 0x0B / U+000B)

              \\     a backslash character (ASCII 0x5C / U+005C)

              \ooo   the single byte given by the octal number ooo (1 to 3 octal digits).

              \xhh   the  single  byte  given  by  the  hexadecimal number hh (1 or 2 hexadecimal
                     digits).

              \uhhhhhhh
                     the UTF-8 byte sequence  encoding  the  Unicode  code  point  given  by  the
                     hexadecimal number hhhhhhh (1 to 7 hexadecimal digits).

       Any other character which is escaped is interpreted as the character itself.  (For example, \c is
       interpreted as c; also, as pointed out above, \" and \# are interpreted as simply " and #,
       without their special meanings.)

       No  token  may  contain  the  NULL character (ASCII 0x00 / U+0000).  Furthermore, although
       support is present to create UTF-8 byte sequences, tokens are not  required  to  be  valid
       UTF-8 sequences.  Any byte sequence not containing the NULL character forms a valid token.
       However, there may be further  restrictions  on  allowed  characters  for  a  token  in  a
       particular situation (for example, when used as a field name).

       Standards Version 5 and earlier do not recognise the character escape sequences, nor allow
       quoting of tokens. As a result, they prohibit both whitespace and  the  comment  delimiter
       from being used in tokens.
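The tokenizing rules above (whitespace separation, quoting, backslash escapes and the comment delimiter) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GetData parser: the names tokenize, _escape and _scan are hypothetical, the input is one line without its line feed, and points the text leaves open are settled by assumption (a \x or \u with no following hexadecimal digit is treated as a plain escaped character; octal values above 0377 and out-of-range \u code points raise an error).

```python
SIMPLE_ESCAPES = {"a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C, "n": 0x0A,
                  "r": 0x0D, "t": 0x09, "v": 0x0B, "\\": 0x5C}
WHITESPACE = " \t\v\f\r"
OCT_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def _scan(s: str, start: int, allowed: str, limit: int) -> int:
    """Return the index just past at most `limit` characters drawn from `allowed`."""
    end = start
    while end < len(s) and end - start < limit and s[end] in allowed:
        end += 1
    return end


def _escape(s: str, i: int, out: bytearray) -> int:
    """Decode the escape whose first character (after the backslash) is s[i]."""
    c = s[i]
    if c in SIMPLE_ESCAPES:
        out.append(SIMPLE_ESCAPES[c])
        return i + 1
    if c in OCT_DIGITS:                       # \ooo: one to three octal digits
        end = _scan(s, i, OCT_DIGITS, 3)
        out.append(int(s[i:end], 8))          # bytearray raises ValueError above 0o377
        return end
    if c in "xu":                             # \xhh or \uhhhhhhh
        end = _scan(s, i + 1, HEX_DIGITS, 2 if c == "x" else 7)
        if end > i + 1:
            value = int(s[i + 1:end], 16)
            out.extend(bytes([value]) if c == "x" else chr(value).encode("utf-8"))
            return end
    out.append(ord(c))                        # any other escaped character is itself
    return i + 1


def tokenize(line: bytes) -> list[bytes]:
    """Split one format-specification line (without its line feed) into tokens."""
    s = line.decode("latin-1")                # one character per byte, so no bytes are lost
    tokens: list[bytes] = []
    cur = bytearray()
    in_token = quoted = False
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\":
            if i + 1 == len(s):
                raise SyntaxError("line ends with a backslash")
            i = _escape(s, i + 1, cur)
            in_token = True
            continue
        i += 1
        if c == '"':                          # quotation marks are stripped from the token
            quoted, in_token = not quoted, True
        elif quoted or (c not in WHITESPACE and c != "#"):
            cur.append(ord(c))
            in_token = True
        elif c == "#":
            break                             # unquoted, unescaped comment delimiter
        elif in_token:                        # unquoted whitespace ends the current token
            tokens.append(bytes(cur))
            cur.clear()
            in_token = False
    if quoted:
        raise SyntaxError("unmatched quotation mark")
    if in_token:
        tokens.append(bytes(cur))
    if any(0 in tok for tok in tokens):
        raise ValueError("tokens may not contain the NUL byte")
    return tokens
```

For example, tokenize(b'"a b" c\\ d "" # note') yields the three tokens a b, c d and the null-token.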

DIRECTIVES

       There  are  eleven  directives,  each specified by a different reserved word, which cannot be
       used as field names in the dirfile.  As of Standards Version 8, all reserved  words  start
       with  an  initial  forward  slash  (/),  to  distinguish them from field names.  Standards
       Versions 5, 6, and 7 permitted the  omission  of  the  initial  forward  slash,  while  in
       Standards  Version  4  and  earlier, reserved words may not have an initial forward slash.
       Like the rest of the format specification, directives are case sensitive.

       A number of the directives have fragment scope.  A  directive  with  fragment  scope  only
       applies  to  the  fragment in which it is present, plus any sub-fragments indicated by the
       /INCLUDE directive, but only if those sub-fragments don't  have  their  own  corresponding
       directive.   Directives  which  have fragment scope are: /ENCODING, /ENDIAN, /FRAMEOFFSET,
       and /PROTECT.  Because of these scoping rules, different portions of the dirfile may  have
       different encodings, endiannesses, frame offsets, or protection levels.

       If  a  directive  with  fragment scope appears more than once in a fragment, only the last
       such directive is honoured, with the exception that the  effect  of  a  directive  is  not
       propagated  to  sub-fragments  if  the  directive  line  appears after the sub-fragment is
       included.  The scoping rules of the remaining directives are discussed below.

       /ALIAS The /ALIAS directive defines an alternate name for a field defined elsewhere in the
              format  specification (called the "target").  Aliases may not be used as the parent
              field in a /META directive, but are in most other ways indistinguishable  from  the
              target's  original,  canonical  name.   Aliases may be chained (that is, the target
              name appearing in an /ALIAS directive may itself be an alias).  In this  case,  the
              new  alias  is  another  name  for  the  target's  own target.  Just as there is no
              requirement that the input fields of a derived field exist, it is not an error  for
              the target of an alias to not exist.  Syntax is:

                     /ALIAS <name> <target>

              A metafield alias may be defined using the <parent-field>/<alias-name> syntax for name
              in the /ALIAS directive.  No restriction  is  placed  on  target;  specifically,  a
              metafield  alias  may  target a top-level field, or a metafield with a different
              parent; conversely, a top-level alias may target a metafield.

              A metafield alias may never appear as the parent part of a  metafield  field  code,
              even if it refers to a top-level field.  That is, given the valid format:

                     aaaa RAW UINT8 1
                     aaaa/bbbb CONST FLOAT64 0.0
                     cccc RAW UINT8 1
                     /ALIAS cccc/dddd aaaa

              the  metafield  aaaa/bbbb  may  not  be  referred to as cccc/dddd/bbbb, even though
              cccc/dddd is a valid field code referring to aaaa.

              This is not true of top-level aliases: if eeee is an alias of ffff, then ffff/gggg,
              a metafield of ffff, may be referred to as eeee/gggg as well.

              The  /ALIAS  directive  has  no scope: it is processed immediately.  It appeared in
              Standards Version 9.
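A sketch of how a reader might resolve alias chains together with the metafield rules above, assuming aliases are held in a plain dictionary mapping each alias name to its target (a hypothetical representation; resolve_alias and resolve_code are illustrative names). Loop detection is an addition of this sketch: the text above does not discuss alias loops.

```python
def resolve_alias(name: str, aliases: dict[str, str]) -> str:
    """Follow a chain of /ALIAS definitions down to a name that is not an alias."""
    seen: set[str] = set()
    while name in aliases:
        if name in seen:                      # loop handling is this sketch's choice
            raise ValueError(f"alias loop involving {name!r}")
        seen.add(name)
        name = aliases[name]
    return name                               # the final target need not exist


def resolve_code(code: str, aliases: dict[str, str]) -> str:
    """Resolve a field code that may name an alias, a metafield, or both."""
    if code in aliases:                       # the whole code is an alias (possibly a metafield alias)
        return resolve_alias(code, aliases)
    parent, slash, meta = code.partition("/")
    if not slash:
        return code
    if "/" in meta:                           # e.g. cccc/dddd/bbbb: a metafield alias as parent
        raise ValueError(f"{code!r}: a metafield alias cannot be a parent")
    # Only top-level aliases may stand in for the parent part of a metafield code.
    return resolve_alias(parent, aliases) + "/" + meta
```

With the example format above, resolve_code("cccc/dddd", ...) gives aaaa, while cccc/dddd/bbbb is rejected.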

       /ENCODING
              The /ENCODING directive specifies the encoding scheme used to encode  binary  files
              in  the  dirfile.   The  encoding  scheme may be one of the predefined names listed
              below, which are described in more detail  in  dirfile-encoding(5),  or  any  other
              site-specific encoding scheme.  The predefined scheme names are:

              none   The dirfile is unencoded.

              bzip2  The dirfile is compressed using the bzip2 compression scheme.

              flac   The dirfile is compressed using the flac compression scheme.

              gzip   The dirfile is compressed using the gzip compression scheme.

              lzma   The dirfile is compressed using the LZMA compression scheme.

              slim   The dirfile is compressed using the slim compression scheme.

              sie    The dirfile is sample-index encoded (a variant of run-length encoding).

              text   The dirfile is text encoded.

              zzip   The  dirfile  is  compressed  and  encapsulated  using  the zzip compression
                     scheme.

              zzslim The dirfile is compressed and encapsulated using a combination of  the  zzip
                     and slim compression schemes.

              Implementations  should  fail  gracefully  when  encountering  an  unknown encoding
              scheme.  If no encoding scheme is specified, behaviour is implementation dependent.
              Syntax is:

                     /ENCODING <scheme> [<enc-datum>]

              The  enc-datum  token  provides  additional  data for certain encoding schemes; see
              dirfile-encoding(5) for details.  The form of enc-datum is not specified.

              The /ENCODING directive has fragment scope.  It appeared in  Standards  Version  6.
              The  predefined  schemes  sie,  zzip, and zzslim, and the optional enc-datum token,
              appeared in Standards Version 9; the predefined scheme lzma appeared  in  Standards
              Version 7; all other predefined schemes appeared in Standards Version 6.

       /ENDIAN
              The  /ENDIAN  directive  specifies  the endianness of the raw data in the database.
              The assumed endianness of raw  data  in  dirfiles  which  omit  this  directive  is
              implementation dependent.  Syntax is:

                     /ENDIAN ( big | little ) [ arm ]

              where  the  "arm"  token should be included if double precision floating point data
              are stored in the ARM middle-endian format.  The  /ENDIAN  directive  has  fragment
              scope.   It  appeared  in  Standards Version 5.  The optional arm token appeared in
              Standards Version 8.
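As an illustration of the arm token, assuming the usual ARM FPA layout in which each 32-bit word is stored little-endian but the most significant word comes first, a double can be decoded by swapping its two words and reading the result as an ordinary little-endian double. The function name is hypothetical, and only this little-endian-word case is handled.

```python
import struct


def decode_arm_float64(raw: bytes) -> float:
    """Decode an 8-byte double stored in ARM middle-endian order (assumed FPA layout)."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    # Swap the two 32-bit words to obtain a plain little-endian IEEE 754 double.
    return struct.unpack("<d", raw[4:] + raw[:4])[0]
```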

       /FRAMEOFFSET
              The /FRAMEOFFSET directive specifies the frame number of the first frame for  which
              data exists in binary files associated with RAW fields.  Syntax is:

                     /FRAMEOFFSET <integer>

              The /FRAMEOFFSET directive has fragment scope.  It appeared in Standards Version 1.

       /HIDDEN
              The  /HIDDEN  directive  indicates  that  the  specified field name is hidden.  The
              difference (if any) between a field name which is hidden and one  that  is  not  is
              implementation  dependent.   Hiddenness  is  not  inherited  by  metafields  of the
              specified field.  Hiddenness applies to the name, not the field itself; it does not
              hide all aliases of the field-name, and if field-name is an alias, the alias is
              hidden, not its target.  Syntax is:

                     /HIDDEN <field-name>

              A /HIDDEN directive must appear  after  the  specification  of  field-name  (which
              occurs  either  in  a  field specification line, or an /ALIAS directive, or a /META
              directive) in the same fragment.

              The /HIDDEN directive has no scope: it is processed immediately.   It  appeared  in
              Standards Version 9.

       /INCLUDE
              The  /INCLUDE  directive  specifies  another  file (called a fragment) to parse for
              additional format specification  for  the  dirfile.   The  inclusion  is  processed
              immediately,  before  the  fragment  containing  the /INCLUDE directive (the parent
              fragment) is parsed further.  RAW fields specified in  the  included  fragment  are
              located  in  the  directory  containing the fragment file, and not in the directory
              containing the parent fragment, and the binary file encoding may be  different  for
              each fragment.  The fragment may be specified either with an absolute path, or else
              a path relative to the directory containing the parent fragment.

              The /INCLUDE directive may optionally specify a prefix and/or suffix  to  apply  to
              field  names  defined in the included fragment.  If present, affixes are applied to
              all field-names (including aliases)  defined  in  the  included  fragment  and  any
              fragments  it  further  includes.   Affixes  nest,  with the affixes of the deepest
              inclusion innermost.  Affixes  are  not  applied  to  the  names  of  binary  files
              associated with RAW fields.  Syntax is:

                     /INCLUDE <file> [<namespace>.][<prefix>] [<suffix>]

              To specify only suffix, the null-token ("") may be used as prefix.

              A  namespace  may  also  be  specified in an /INCLUDE directive by prepending it to
              prefix.  The namespace and prefix are separated by a dot (.).  The dot is  required
              whenever  a  namespace is specified: if the prefix is empty, the third token should
              be just the namespace followed by a trailing dot.  If  a  namespace  is  specified,
              that  namespace,  relative  to the including fragment's root namespace, becomes the
              root namespace of the included fragment.  If  no  namespace  is  specified  in  the
              /INCLUDE  directive, then the current namespace (specified by a previous /NAMESPACE
              directive) is used as the root namespace of the included fragment.  That is, if the
              current namespace is current_space, then the statement:

                     /INCLUDE file newspace.

              is equivalent to

                     /NAMESPACE newspace
                     /INCLUDE file
                     /NAMESPACE current_space

              As a result, if no namespace is provided, and there has been no previous /NAMESPACE
              directive, the included fragment will have the same root namespace as the including
              fragment.

              The  /INCLUDE  directive has no scope: it is processed immediately.  It appeared in
              Standards Version 3.  The optional prefix and suffix appeared in Standards  Version
              9.  The optional namespace appeared in Standards Version 10.
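The third /INCLUDE token and the nesting of affixes can be sketched as follows. This is a minimal illustration with hypothetical helper names; it relies on the fact that field-name characters exclude the dot, so the last dot in the token ends the namespace, and it does not model how affixes and namespaces combine.

```python
def split_third_token(token: str) -> tuple[str, str]:
    """Split '[<namespace>.][<prefix>]' into (namespace, prefix)."""
    # A prefix cannot contain '.', so everything before the last dot is the namespace.
    namespace, _, prefix = token.rpartition(".")
    return namespace, prefix


def apply_affixes(name: str, chain: list[tuple[str, str]]) -> str:
    """Apply (prefix, suffix) pairs listed from the outermost to the innermost /INCLUDE."""
    for prefix, suffix in reversed(chain):    # the deepest inclusion's affixes are innermost
        name = prefix + name + suffix
    return name
```

For instance, a field x included through an outer /INCLUDE with affixes A_ and _a and an inner one with B_ and _b ends up named A_B_x_b_a.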

       /META  The  /META  directive  specifies a metafield attached to a particular parent field.
              The field metadata may be of any allowed type except RAW.  Metafields are retrieved
              in  exactly  the  same  way  as  regular  field  data, but the field code specified
              consists of the parent and metafield names joined with a forward slash:

                     <parent-field>/<meta-field>

              META fields may not be specified before their parent field has been.  Syntax is:

                     /META <parent-field> {field specification line}

              The <parent-field> code may not be an alias.  As an illustration of this concept,

                     /META pfield meta CONST FLOAT64 3.291882

              provides a scalar metadatum called meta with value 3.291882 attached to  the  field
              pfield.    This  particular  metafield  may  be  referred  to  by  the  field  code
              "pfield/meta".  Note that different parent fields may have metafields with the same
              name,  since  all  references  to  metafields  must  include the parent field name.
              Metafields may not themselves have further sub-metafields.

              As an alternative to the /META directive, starting  with  Standards  Version  7,  a
              metafield may be specified by a standard field specification line, using

                     <parent-field>/<meta-field>

              as  the  field  name.   That  is,  the above example metafield could have also been
              specified as:

                     pfield/meta CONST FLOAT64 3.291882

              The /META directive has no scope: it is  processed  immediately.   It  appeared  in
              Standards Version 6.

       /NAMESPACE
              The /NAMESPACE directive changes the current namespace for subsequent field
              specification lines.  Syntax is:

                     /NAMESPACE <subspace>

              The subspace specified is relative to the current fragment's  root  namespace.   If
              subspace  is the null-token ("") the current namespace will be set back to the root
              namespace.  Otherwise, the current namespace will be changed to  the  concatenation
              of the root namespace with subspace, with the two parts separated by a dot:

                     rootspace.subspace

              If rootspace is empty, the intervening dot is omitted, and the current namespace is
              simply subspace.

              By default, all field codes, both field names for newly specified fields, and field
              codes  used  as  inputs to fields or targets for aliases, are placed in the current
              namespace, unless they start with  an  initial  dot,  in  which  case  the  current
              namespace  is ignored, and they're placed instead in the fragment's root namespace.
              See the Namespaces section for further details.

              The /NAMESPACE directive has no  scope:  it  is  processed  immediately.   For  the
              effects  of  changing the current namespace on included fragments, see the /INCLUDE
              directive above.  The effects of a /NAMESPACE directive never propagate upwards  to
              parent fragments.  It appeared in Standards Version 10.

       /PROTECT
              The  /PROTECT  directive  specifies  the  advisory  protection level of the current
              fragment and of the RAW fields defined therein.   The  protection  level  indicates
              whether  writing  to the fragment, or the binary data on disk is permitted.  Syntax
              is:

                     /PROTECT <level>

              Four advisory protection levels are defined:

              none   No protection at all: data and metadata may be freely changed.  This is  the
                     default, if no /PROTECT directive is present.

              format The  dirfile  metadata is protected from change, but RAW data on disk may be
                     modified.

              data   The RAW data on disk is protected from change, but metadata may be modified.

              all    Both metadata and data on disk are protected from change.

              The /PROTECT directive has fragment scope.  It appeared in Standards Version 6.

       /REFERENCE
              The /REFERENCE directive specifies the name of the field to use  as  the  dirfile's
              reference  field  (see  dirfile(5)).   If no /REFERENCE directive is specified, the
              first RAW field encountered  is  used  as  the  reference  field.   The  /REFERENCE
              directive must specify a RAW field.  Syntax is:

                     /REFERENCE <field-code>

              The /REFERENCE directive has global scope: if multiple /REFERENCE directives appear
              in the dirfile metadata, only the last such is honoured.  It appeared in  Standards
              Version 6.

       /VERSION
              The /VERSION directive specifies the particular version of the Dirfile Standards to
              which the dirfile format  specification  conforms.   This  directive  should  occur
              before  any version dependent syntax is encountered.  As of Standards Version 6, no
              such syntax exists, and this  directive  is  provided  primarily  to  ease  forward
              compatibility.  Syntax is:

                     /VERSION <integer>

              The /VERSION directive has immediate scope: its effect is immediate, and it applies
              only to metadata below it, including and  propagating  downwards  to  sub-fragments
              after the directive.

              In  Standards Version 8 and earlier, its effect also propagates upwards back to the
              parent fragment, and affects subsequent metadata.  Starting with Standards  Version
              9,  this  no  longer  happens.  As a result, a /VERSION directive which indicates a
              version of 9 or later never propagates upwards; additionally,  /VERSION  directives
              found  in  subfragments included in a Version 9 or later fragment aren't propagated
              upwards into that fragment, regardless of the Version  of  the  subfragments.   The
              /VERSION directive appeared in Standards Version 5.

FIELD SPECIFICATION LINES

Any line which does not start with a reserved word is assumed to be a field specification
line. A field specification line consists of at least two tokens. The first token is the
field name. The second token is the field type. Subsequent tokens are field parameters.
The meaning and number of these parameters depend on the field type specified.

Field Names
The first token in a field specification line is the field name. The field name consists
of one or more characters, excluding both ASCII control characters (the bytes 0x01 through
0x1F), and the characters

& / ; < > | .

which are reserved (but see below for the use of / to specify metafields). The dot (.)
is allowed in Standards Version 5 and earlier. The ampersand, semicolon, less-than sign,
greater-than sign, and vertical line (& ; < > |) are allowed in Standards Version 4 and
earlier. Furthermore, due to the lack of an escape or quoting mechanism (see Tokens
above), Standards Version 5 and earlier also prohibit whitespace and the comment delimiter
(#) in field names.

The field name may not be INDEX, which is a special, implicit field which contains the
integer frame index. Standards Version 5 and earlier also prohibit FILEFRAM, which was an
alias for INDEX. Field names are case sensitive. Standards Versions 3 and 4 restrict
field names to 50 characters. Standards Version 2 and earlier restrict field names to 16
characters. Additionally, the filesystem may put restrictions on the length and acceptable
characters of a RAW field name, regardless of Standards Version.
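The version-dependent character and length rules above can be collected into one check. A minimal sketch, assuming the name is a single component (no metafield slash and no namespace tag) and ignoring filesystem limits on RAW field names; valid_field_name is a hypothetical helper.

```python
CONTROL = {chr(b) for b in range(0x01, 0x20)}   # bytes 0x01 through 0x1F


def valid_field_name(name: str, version: int = 10) -> bool:
    """Check one field-name component against the rules of a given Standards Version."""
    if not name or name == "INDEX":
        return False
    if version <= 5 and name == "FILEFRAM":
        return False
    reserved = {"/"}
    if version >= 5:
        reserved |= set("&;<>|")              # allowed in Version 4 and earlier
    if version >= 6:
        reserved.add(".")                     # allowed in Version 5 and earlier
    else:
        reserved |= {" ", "#"}                # no quoting or escaping before Version 6
    if any(c in CONTROL or c in reserved for c in name):
        return False
    if 3 <= version <= 4 and len(name) > 50:
        return False
    if version <= 2 and len(name) > 16:
        return False
    return True
```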

Starting in Standards Version 7, if the field name beginning a field specification line
contains exactly one forward slash character (/), the line is assumed to specify a
metafield. See the /META directive above for further details. A field name may not
contain more than one forward slash.

Starting in Standards Version 10, any field name may be preceded by a namespace tag. The
namespace tag and the field name are separated by a dot (.). See the Namespaces section,
following, for details.

Namespaces
Beginning with Standards Version 10, every field in a Dirfile is contained in a namespace.
Every namespace is identified by a namespace tag which consists of the same restricted set
of characters used for field names. Namespaces nest arbitrarily deep. Subnamespaces are
identified by concatenating all namespace tags, separating tags by dots (.), with the
outermost namespace leftmost:

topspace.subspace.subsubspace

Each fragment has an immutable root namespace. The root namespace of the primary format
file is the null namespace, identified by the null-token (""). The root namespace of
other fragments is specified when they are introduced (see the /INCLUDE directive). Each
fragment also has a current namespace which may be changed as often as needed using the
/NAMESPACE directive, and defaults to the root namespace. The current namespace is always
either the root namespace or else a subspace under the root namespace.

If a field name or field code starts with a leading dot, then that name or code is taken
to be relative to the fragment's root space. If it does not start with a dot, it is taken
to be relative to the current namespace.

For example, if both the root namespace and current namespace of a fragment start off
as rootspace, then:

aaaa RAW UINT8 1
.bbbb RAW UINT8 1
cccc.dddd RAW UINT8 1
.eeee.ffff RAW UINT8 1
/NAMESPACE newspace
gggg RAW UINT8 1
.hhhh RAW UINT8 1
iiii.jjjj RAW UINT8 1
.kkkk.llll RAW UINT8 1

specifies, respectively, the fields:

rootspace.aaaa,
rootspace.bbbb,
rootspace.cccc.dddd,
rootspace.eeee.ffff,
rootspace.newspace.gggg,
rootspace.hhhh,
rootspace.newspace.iiii.jjjj, and
rootspace.kkkk.llll.

Note that a field may specify deeper subspaces under either the root namespace or the
current namespace (meaning it is never necessary to use the /NAMESPACE directive). Note
also that there is no way for metadata in a given fragment to refer to fields outside the
fragment's root space.

There is one exception to this namespace scoping rule: the implicit INDEX vector is always
in the null (top-level) namespace, and namespace tags specified with it, either explicitly
or implicitly, even a fragment root namespace, are ignored. So, in a fragment with root
namespace rootspace, and current namespace rootspace.subspace,

INDEX,
.INDEX,
namespace.INDEX, and
.namespace.INDEX,

all refer to the same INDEX field.
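The resolution rules of this section, including the INDEX exception, can be sketched in a few lines. This is an illustration under assumptions: join_ns and resolve_name are hypothetical helpers, root is the fragment's root namespace, and current is the full current namespace, which a /NAMESPACE directive would set to join_ns(root, subspace).

```python
def join_ns(*parts: str) -> str:
    """Join namespace tags with dots, omitting empty parts such as a null root."""
    return ".".join(p for p in parts if p)


def resolve_name(name: str, root: str, current: str) -> str:
    """Return the fully qualified form of a field name or code used in a fragment."""
    if name.rsplit(".", 1)[-1] == "INDEX":
        return "INDEX"                        # INDEX always lives in the null namespace
    if name.startswith("."):
        return join_ns(root, name[1:])        # leading dot: relative to the root namespace
    return join_ns(current, name)             # otherwise: relative to the current namespace
```

Applied to the example above, with root and current both set to rootspace, this reproduces the eight listed fields.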

Field Types
There are eighteen field types. Of these, fourteen are of vector type (BIT, DIVIDE,
INDIR, LINCOM, LINTERP, MPLEX, MULTIPLY, PHASE, POLYNOM, RAW, RECIP, SBIT, SINDIR, and
WINDOW) and four are of scalar type (CARRAY, CONST, SARRAY, and STRING). The thirteen
vector field types other than RAW fields are also called derived fields, since they derive
their value from one or more input vector fields. Any other vector field may be used as
an input vector, including the implicit INDEX field, but excluding SINDIR string vectors.

Five of these derived fields (DIVIDE, LINCOM, MPLEX, MULTIPLY, and WINDOW) have more than
one vector input field. In situations where these input fields have differing sample
rates, the sample rate of the derived field is the same as the sample rate of the first
(left-most) input field specified. Furthermore, the input fields are synchronised by
aligning them on frame boundaries, assuming equally-spaced sampling throughout a frame,
and using the last sample of each input field which did not occur after the sample of the
derived field being computed. That is, if the first and second input fields have sample
rates s1 and s2, the derived field also has sample rate s1 and, for every sample of the
derived field, n, the n'th sample of the first field is used (since they have the same
sample rate by definition), and the sample number used of the second field, m, is computed
as:

m = floor((n * s2) / s1).
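The alignment rule can be illustrated with a two-input derived field. A minimal sketch, assuming s1 and s2 are integer samples-per-frame values and both inputs start on the same frame boundary; input_sample and multiply are hypothetical helpers, and a product is used only as an example of a two-input computation.

```python
def input_sample(n: int, s1: int, s2: int) -> int:
    """Sample number m of the second input used for sample n of the derived field."""
    return (n * s2) // s1                     # integer floor division, for n >= 0


def multiply(field1: list[float], s1: int, field2: list[float], s2: int) -> list[float]:
    """Two-input product at field1's sample rate s1, aligned per the rule above."""
    return [field1[n] * field2[input_sample(n, s1, s2)] for n in range(len(field1))]
```

With s1 = 2 and s2 = 1, samples 0 and 1 of the first field both pair with sample 0 of the second, so multiply([1, 2, 3, 4], 2, [10, 20], 1) gives [10, 20, 60, 80].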

Starting in Standards Version 6, certain scalar field parameters in the field
specifications may be specified using CONST or CARRAY fields, instead of literal values.
A list of parameters for which this is allowed is given below in the Field Parameters
section.

The possible field types are:

BIT The BIT vector field type extracts one or more bits out of an input vector field as
an unsigned number. Syntax is:

<fieldname> BIT <input> <first-bit> [<num-bits>]

which specifies fieldname to be num-bits bits extracted from the input vector field
input starting with bit number first-bit (counting from the least-significant bit,
which is numbered zero), after input has been converted from its native type to an
(endianness corrected) unsigned 64-bit integer. If num-bits is omitted, it is
assumed to be one.

The extracted bits are interpreted as an unsigned integer; the SBIT field type is a
signed version of this field type. The optional num-bits parameter appeared in
Standards Version 1.
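A minimal sketch of the bit extraction, assuming integer input samples (floating-point conversion is not modeled). Masking with 2^64 - 1 mirrors converting a negative integer to an unsigned 64-bit value in two's complement. The sbit helper assumes SBIT reads the extracted bits as a two's-complement number; both function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def bit(x: int, first_bit: int, num_bits: int = 1) -> int:
    """Extract num_bits bits starting at first_bit (bit 0 is the least significant)."""
    u = x & MASK64                            # reinterpret as an unsigned 64-bit integer
    return (u >> first_bit) & ((1 << num_bits) - 1)


def sbit(x: int, first_bit: int, num_bits: int = 1) -> int:
    """Signed variant: interpret the extracted bits as two's complement."""
    v = bit(x, first_bit, num_bits)
    return v - (1 << num_bits) if v >> (num_bits - 1) else v
```

For example, bits 4 to 7 of 0b10110000 are 0b1011, which is 11 unsigned and -5 signed.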

CARRAY The CARRAY scalar field type is a list of constants fully specified in the format
specification metadata. Syntax is:

<fieldname> CARRAY <type> <value0> <value1> <value2> ...

where type may be any supported native data type (see the description of the RAW
field type below), and value0, value1, &c. are the values of successive elements in
the scalar list interpreted as indicated by type. No limit is placed on the number
of elements in a CARRAY. (Note: despite being multivalued, this is not considered
a vector field since the elements of the CARRAY are not indexed by frames.) CARRAY
appeared in Standards Version 8.

CONST The CONST scalar field type is a constant fully specified in the format
specification metadata. Syntax is:

<fieldname> CONST <type> <value>

where type may be any supported native data type (see the description of the RAW
field type below), and value is the numerical value of the constant interpreted as
indicated by type. CONST appeared in Standards Version 6.

DIVIDE The DIVIDE vector field type is the quotient of two vector fields. Syntax is:

<fieldname> DIVIDE <field1> <field2>

The derived field is computed as:

fieldname = field1 / field2.

It was introduced in Standards Version 8.

INDIR The INDIR vector field type performs an indirect translation of a CARRAY scalar
field to a derived vector field based on a vector index field. Syntax is:

<fieldname> INDIR <index> <array>

where index is the vector field, which is converted to an integer type, if
necessary, and array is the CARRAY field. The nth sample of the INDIR field is the
value of the mth element of array (counting from zero), where m is the value of the
nth sample of index. When index is not a valid element number of array, the
corresponding value of the INDIR is implementation dependent. INDIR appeared in
Standards Version 10.
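A minimal sketch of the INDIR lookup, assuming the index samples and CARRAY elements are available as Python sequences. Since out-of-range indices are implementation dependent, this sketch returns None for them, which is purely its own choice; indir is a hypothetical helper.

```python
from typing import Optional, Sequence


def indir(index: Sequence[float], array: Sequence[float]) -> list[Optional[float]]:
    """Return array[index[n]] for every sample n of the index vector."""
    out: list[Optional[float]] = []
    for sample in index:
        m = int(sample)                       # convert to an integer (truncates toward zero)
        # Invalid element numbers are implementation dependent; None is this sketch's choice.
        out.append(array[m] if 0 <= m < len(array) else None)
    return out
```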

LINCOM The LINCOM vector field type is the linear combination of one, two or three input
vector fields. Syntax is:

<fieldname> LINCOM [<n>] <field1> <a1> <b1> [<field2> <a2> <b2> [<field3> <a3> <b3>]]

where n, if present, indicates the number of input vector fields (1, 2, or 3). The
derived field is computed as:

fieldname = (a1 * field1 + b1) + (a2 * field2 + b2) + (a3 * field3 + b3)

with the field2 and field3 terms included only if specified.

If n is not specified, the number of fields is determined by looking at the
supplied parameters. Since it is possible to create a field code which is
identical to a literal number, the third token on the line is assumed to be n if the
entire token can be parsed as a literal number using the rules outlined in
strtod(3). That is, if the field code specifying field1 could be mistaken for a
literal number, n must be specified to prevent ambiguity. In standards Version 6
and earlier, n is mandatory.
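
The LINCOM formula can be evaluated sample by sample as sketched below. The function name lincom and its argument layout are hypothetical. The sketch assumes that all input fields have the same number of samples and ignores sample-rate differences between inputs. Because Python arithmetic accepts complex numbers, complex a or b values also work.

```python
from typing import Sequence

def lincom(inputs: Sequence[Sequence[float]], a: Sequence[float],
           b: Sequence[float]) -> list[float]:
    """Sketch of LINCOM: the sum over i of (a[i] * field_i + b[i])."""
    if not (1 <= len(inputs) <= 3 and len(inputs) == len(a) == len(b)):
        raise ValueError("LINCOM takes one, two or three (field, a, b) triples")
    # This sketch assumes every input field has the same number of samples.
    return [
        sum(ai * field[n] + bi for field, ai, bi in zip(inputs, a, b))
        for n in range(len(inputs[0]))
    ]
```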

LINTERP
The LINTERP vector field type specifies a table look up based on another vector
field. Syntax is:

<fieldname> LINTERP <input> <table>

where input is the input vector field for the table lookup, and table is the path
to the lookup table file for the field. If this path is relative, it is assumed to
be relative to the directory containing the fragment defining this field. The
lookup table file is an ASCII text file with two whitespace separated columns of x
and y values. Values are linearly interpolated between the points specified in the
lookup table.
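
The sketch below shows how a LINTERP table might be read and applied. The helpers load_table and linterp are hypothetical names. The description above does not say what happens when the input falls outside the table, so clamping to the end values is an assumption of this sketch. The sketch also sorts the rows by x and does not handle comments in the table file.

```python
import bisect
from typing import Sequence

def load_table(text: str) -> tuple[list[float], list[float]]:
    """Parse a LINTERP table: two whitespace-separated columns of x and y."""
    rows = sorted(
        (float(x), float(y))
        for x, y in (line.split()[:2] for line in text.splitlines() if line.strip())
    )
    return [r[0] for r in rows], [r[1] for r in rows]

def linterp(inputs: Sequence[float], xs: list[float], ys: list[float]) -> list[float]:
    """Linearly interpolate each input value in the (xs, ys) table."""
    out = []
    for v in inputs:
        if v <= xs[0]:
            out.append(ys[0])    # clamping below the table is an assumption
        elif v >= xs[-1]:
            out.append(ys[-1])   # clamping above the table is an assumption
        else:
            i = bisect.bisect_right(xs, v)  # xs[i - 1] <= v < xs[i]
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            out.append(y0 + (y1 - y0) * (v - x0) / (x1 - x0))
    return out
```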

MPLEX The MPLEX vector field type permits the multiplexing of several low sample rate
fields into a single data field of higher sample rate. Syntax is:

<fieldname> MPLEX <input> <index> <count> [<period>]

where input is the input vector containing the multiplexed fields, index is the
vector containing the multiplex index, count is the value of the multiplex index
when the computed field is stored in input, and period, if present and non-zero, is
the number of samples between successive occurrences of the value count in the
index vector. A period of zero (or, equivalently, its omission) indicates that
either the value count is not equally spaced in the index vector, or else that the
spacing is unknown. Both count and period are integers, and period may not be
negative.

At every sample n, the derived field is computed as:

fieldname[n] = (index[n] == count) ? input[n] : fieldname[n - 1]

The index vector is converted to an integer type for comparison. The value of the
derived field before the first sample where index equals count is implementation
dependent.

The values of count and period place no restrictions on values contained in index.
Specifically, particular values of index (including count) need not be equally
spaced (neither by period nor any other spacing); index need not ever take on the
value count (in which case the value of the entirety of the derived field is
implementation dependent). Different MPLEX field definitions which use the same
index vector may specify different periods. MPLEX appeared in Standards Version 9.
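
The recurrence above can be sketched as a simple "hold the last match" loop. The function name mplex is hypothetical. The sketch assumes input and index have the same number of samples. It returns None before the first match, where the Standards leave the value implementation dependent. It ignores period because period does not change the computed values.

```python
from typing import Optional, Sequence

def mplex(inp: Sequence[float], index: Sequence[float], count: int) -> list[Optional[float]]:
    """Sketch of MPLEX: hold the last input sample seen where index == count."""
    out: list[Optional[float]] = []
    last: Optional[float] = None  # value before the first match is implementation dependent
    for value, idx in zip(inp, index):
        if int(idx) == count:  # index is converted to an integer for comparison
            last = value
        out.append(last)
    return out
```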

MULTIPLY
The MULTIPLY vector field type is the product of two vector fields. Syntax is:

<fieldname> MULTIPLY <field1> <field2>

The derived field is computed as:

fieldname = field1 * field2.

MULTIPLY appeared in Standards Version 2.

PHASE The PHASE vector field type shifts an input vector field by the specified number of
samples. Syntax is:

<fieldname> PHASE <input> <shift>

which specifies fieldname to be the input vector field, input, shifted by shift
samples. A positive shift indicates a forward shift, towards the end-of-field.
Results of shifting past the beginning- or end-of-field are implementation
dependent. PHASE appeared in Standards Version 4.
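
The PHASE operation is sketched below. This sketch interprets the shift as an offset into input, so that fieldname[n] = input[n + shift]. That reading of "forward shift" is an assumption. The function name phase is hypothetical, and None marks samples shifted past either end, where the value is implementation dependent.

```python
from typing import Optional, Sequence

def phase(inp: Sequence[float], shift: int) -> list[Optional[float]]:
    """Sketch of PHASE, reading sample n of the derived field from input[n + shift]."""
    return [
        inp[n + shift] if 0 <= n + shift < len(inp) else None  # past either end: None (assumption)
        for n in range(len(inp))
    ]
```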

POLYNOM
The POLYNOM vector field type specifies a polynomial function of a single input
vector field. Syntax is:

<fieldname> POLYNOM <input> <a0> <a1> [<a2> [<a3> [<a4> [<a5>]]]]

where input is the input field code, and the order of the computed polynomial is
determined by how many co-efficients are present in the specification. The derived
field is computed as:

fieldname = a0 + a1 * input + a2 * input**2 + a3 * input**3 + a4 * input**4 + a5 * input**5

where ** is the element-wise exponentiation operator, and the higher order terms
are computed only if the corresponding co-efficients ai are specified. POLYNOM
appeared in Standards Version 7.
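
A POLYNOM field can be evaluated with Horner's rule, which is mathematically equal to the sum above but avoids explicit powers. This sketch is not taken from any implementation, and floating-point rounding may differ slightly from term-by-term evaluation. The function name polynom is hypothetical.

```python
from typing import Sequence

def polynom(inp: Sequence[float], coeffs: Sequence[float]) -> list[float]:
    """Sketch of POLYNOM: a0 + a1*x + ... + a5*x**5, evaluated by Horner's rule."""
    if not 2 <= len(coeffs) <= 6:
        raise ValueError("POLYNOM takes between two (a0, a1) and six coefficients")
    out = []
    for x in inp:
        acc = 0
        for a in reversed(coeffs):  # start from the highest-order coefficient
            acc = acc * x + a
        out.append(acc)
    return out
```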

RAW The RAW vector field type specifies raw time streams on disk. In this case, the
field name should correspond to the name of the file containing the time stream.
Syntax is:

<fieldname> RAW <type> <sample-rate>

where sample-rate is the number of samples per dirfile frame for the time stream
and type is a token specifying the native data type:

UINT8 unsigned 8-bit integer

INT8 two's complement signed 8-bit integer

UINT16 unsigned 16-bit integer

INT16 two's complement signed 16-bit integer

UINT32 unsigned 32-bit integer

INT32 two's complement signed 32-bit integer

UINT64 unsigned 64-bit integer

INT64 two's complement signed 64-bit integer

FLOAT32
IEEE-754 standard 32-bit single precision floating point number

FLOAT64
IEEE-754 standard 64-bit double precision floating point number

COMPLEX64
a 64-bit complex number consisting of two IEEE-754 standard 32-bit
single precision floating point numbers representing the real and
imaginary parts of the complex number (Standards Version 7 and later)

COMPLEX128
a 128-bit complex number consisting of two IEEE-754 standard 64-bit
double precision floating point numbers representing the real and
imaginary parts of the complex number (Standards Version 7 and
later).

For more information on the storage of complex valued data, see dirfile(5). Two
additional type names exist: FLOAT is equivalent to FLOAT32, and DOUBLE is
equivalent to FLOAT64. Standards Version 9 deprecates these two aliases, but still
allows them.

All these type names (except those for complex data, which came later) were
introduced in Standards Version 5. Earlier Standards Versions specified data types
with single-character type aliases:

c UINT8

u UINT16

s INT16

U UINT32

i, S INT32

f FLOAT32

d FLOAT64

Types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 are not supported before
Standards Version 5, so no single-character type aliases exist for these types.
These single-character type aliases were deprecated in Standards Version 5 and
removed in Standards Version 8.
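
The type rules above can be sketched with Python's struct module. The tables RAW_TYPES, LONG_ALIASES, and CHAR_ALIASES and the functions resolve_type and decode are hypothetical. The byte order is passed in by the caller here, whereas in a real dirfile it is governed by dirfile(5) and the /ENDIAN directive. Complex samples are decoded as a real part followed by an imaginary part.

```python
import struct

# Hypothetical table mapping RAW type tokens to Python struct codes.
# Complex types are stored as two floats (real part, then imaginary part).
RAW_TYPES = {
    "UINT8": "B", "INT8": "b", "UINT16": "H", "INT16": "h",
    "UINT32": "I", "INT32": "i", "UINT64": "Q", "INT64": "q",
    "FLOAT32": "f", "FLOAT64": "d", "COMPLEX64": "2f", "COMPLEX128": "2d",
}
LONG_ALIASES = {"FLOAT": "FLOAT32", "DOUBLE": "FLOAT64"}  # deprecated in Version 9
CHAR_ALIASES = {"c": "UINT8", "u": "UINT16", "s": "INT16", "U": "UINT32",
                "i": "INT32", "S": "INT32", "f": "FLOAT32", "d": "FLOAT64"}

def resolve_type(token: str, version: int) -> str:
    """Map a RAW type token to its canonical name under a given Standards Version."""
    if token in CHAR_ALIASES:
        if version >= 8:
            raise ValueError("single-character type aliases were removed in Version 8")
        return CHAR_ALIASES[token]
    if version < 5:
        raise ValueError("named types appeared in Standards Version 5")
    name = LONG_ALIASES.get(token, token)
    if name not in RAW_TYPES or (name.startswith("COMPLEX") and version < 7):
        raise ValueError(f"unsupported type {token!r} in Version {version}")
    return name

def decode(raw: bytes, name: str, little_endian: bool = True) -> list:
    """Decode a byte string of RAW samples of the named type."""
    fmt = ("<" if little_endian else ">") + RAW_TYPES[name]
    size = struct.calcsize(fmt)
    samples = [struct.unpack_from(fmt, raw, off) for off in range(0, len(raw) - size + 1, size)]
    if name.startswith("COMPLEX"):
        return [complex(re_part, im_part) for re_part, im_part in samples]
    return [s[0] for s in samples]
```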

RECIP The RECIP vector field type computes the reciprocal of a single input vector field.
Syntax is:

<fieldname> RECIP <input> <dividend>

where input is the input field code and dividend is a scalar quantity. The
derived field is computed as:

fieldname = dividend / input.

RECIP appeared in Standards Version 8.

SARRAY The SARRAY scalar field type is a list of strings fully specified in the format
file metadata. Syntax is:

<fieldname> SARRAY <string0> <string1> <string2> ...

Each string is a single token. To include whitespace in a string, enclose it in
quotation marks ("), or else escape the whitespace with the backslash character
(\). No limit is placed on the number of elements in a SARRAY. SARRAY appeared in
Standards Version 10.

SBIT The SBIT vector field type extracts one or more bits out of an input vector field
as a (two's-complement) signed number. Syntax is:

<fieldname> SBIT <input> <first-bit> [<num-bits>]

which specifies fieldname to be num-bits bits extracted from the input vector field
input starting with bit number first-bit (counting from the least-significant bit,
which is numbered zero), after input has been converted from its native type to an
(endianness corrected) two's complement signed 64-bit integer. If num-bits is
omitted, it is assumed to be one.

The extracted bits are interpreted as a two's complement signed integer of the
specified width. (So, if num-bits is, for example, one, then the field can take on
the value zero or negative one.) The BIT field type is an unsigned version of this
field type. SBIT appeared in Standards Version 7.
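
The SBIT extraction can be sketched with shifts and masks. The function name sbit is hypothetical. The sketch takes the input sample as a Python int that has already been converted to a signed 64-bit value. Python's right shift on negative integers is arithmetic, so it matches two's complement behaviour within 64 bits. Dropping the sign adjustment gives the unsigned BIT result.

```python
def sbit(value: int, first_bit: int, num_bits: int = 1) -> int:
    """Extract num_bits bits starting at first_bit as a two's complement number."""
    field = (value >> first_bit) & ((1 << num_bits) - 1)  # unsigned bits, as BIT would give
    if field & (1 << (num_bits - 1)):                     # top extracted bit is the sign bit
        field -= 1 << num_bits
    return field
```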

SINDIR The SINDIR vector field type performs an indirect translation of a SARRAY scalar
field to a derived vector field of strings based on a vector index field. Syntax
is:

<fieldname> SINDIR <index> <array>

where index is the vector field, which is converted to an integer type, if
necessary, and array is the SARRAY field. The nth sample of the SINDIR field is
the string value of the mth element of array (counting from zero), where m is the
value of the nth sample of index. When index is not a valid element number of
array, the corresponding value of the SINDIR is implementation dependent. SINDIR
appeared in Standards Version 10.

STRING The STRING scalar field type is a character string fully specified in the format
file metadata. Syntax is:

<fieldname> STRING <string>

where string is the string value of the field. Note that string is a single token.
To include whitespace in the string, enclose string in quotation marks ("), or else
escape the whitespace with the backslash character (\). STRING appeared in
Standards Version 6.

WINDOW The WINDOW vector field type isolates a portion of an input vector based on a
comparison. Syntax is:

<fieldname> WINDOW <input> <check> <op> <threshold>

where input is the vector containing the data to extract, check is the vector on
which to test the comparison, threshold is the value against which check is
compared, and op is one of the following tokens indicating the particular
comparison performed:

EQ data are extracted where check, converted to a 64-bit signed integer,
equals threshold,

GE data are extracted where check, converted to a 64-bit floating-point
number, is greater than or equal to threshold,

GT data are extracted where check, converted to a 64-bit floating-point
number, is strictly greater than threshold,

LE data are extracted where check, converted to a 64-bit floating-point
number, is less than or equal to threshold,

LT data are extracted where check, converted to a 64-bit floating-point
number, is strictly less than threshold,

NE data are extracted where check, converted to a 64-bit signed integer,
is not equal to threshold,

SET data are extracted where at least one bit set in threshold is also
set in check, when converted to a 64-bit unsigned integer,

CLR data are extracted where at least one bit set in threshold is not set
in check, when converted to a 64-bit unsigned integer.

The storage type of threshold depends on the operator, and follows the
interpretation of check. It may never be complex valued.

Outside the region extracted, the value of the derived field is implementation
dependent.

Note: with the EQ operator, this derived field type is very similar to the MPLEX
field type above. The primary difference is that MPLEX mandates the value of the
derived field outside the extracted region, while WINDOW does not. WINDOW appeared
in Standards Version 9.
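
The operator table above maps onto a small dispatch function. The names window_test and window are hypothetical. Conversion to a signed integer uses Python's int(), which truncates toward zero. Samples outside the extracted region are returned as None because the Standards leave them implementation dependent.

```python
import operator
from typing import Optional, Sequence

MASK64 = (1 << 64) - 1
_INT_OPS = {"EQ": operator.eq, "NE": operator.ne}
_FLOAT_OPS = {"GE": operator.ge, "GT": operator.gt, "LE": operator.le, "LT": operator.lt}

def window_test(c: float, op: str, threshold: float) -> bool:
    """Apply one WINDOW comparison to a single check sample."""
    if op in _INT_OPS:
        return _INT_OPS[op](int(c), int(threshold))        # signed integer comparison
    if op in _FLOAT_OPS:
        return _FLOAT_OPS[op](float(c), float(threshold))  # floating-point comparison
    bits, mask = int(c) & MASK64, int(threshold) & MASK64  # unsigned 64-bit view
    if op == "SET":
        return (bits & mask) != 0    # some threshold bit is set in check
    if op == "CLR":
        return (mask & ~bits) != 0   # some threshold bit is clear in check
    raise ValueError(f"unknown WINDOW operator {op!r}")

def window(inp: Sequence[float], check: Sequence[float], op: str,
           threshold: float) -> list[Optional[float]]:
    """Keep input where the comparison holds; None elsewhere (an assumption)."""
    return [x if window_test(c, op, threshold) else None for x, c in zip(inp, check)]
```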

Field Parameters
All input vector field parameters should be field codes (see below). Additionally, the
scalar field parameters listed may be either literal numbers or else the field code of a
CONST field containing the value, or the field code of a CARRAY followed by a left angle
bracket (<), then a non-negative integer used as the CARRAY element index, then a right
angle bracket (>), that is:

fieldcode<n>

If the angle brackets and element index are omitted from a CARRAY field code used as a
parameter, the first element in the field (index zero) is assumed.

Field parameters which may be specified using a scalar field code are:

BIT, SBIT
first-bit, num-bits

LINCOM any of the ai or bi

MPLEX count, period

PHASE shift

POLYNOM
any of the ai

RAW sample-rate

RECIP dividend

WINDOW threshold

Since it is possible to create a field code which is identical to a literal number, a
parameter is assumed to be the field code of a scalar field only if the entire token
cannot be parsed as a literal number using the rules outlined in strtod(3). For example,
a CONST field whose field code consists solely of digits can never be used as a parameter
in a field specification line.
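
The literal-versus-field-code rule and the fieldcode<n> form can be sketched as follows. The names parse_scalar, _DECIMAL, and _CARRAY_ELEM are hypothetical. The regular expression covers only decimal literals as a rough stand-in for strtod(3). The hexadecimal, INF, and NAN forms described below are deliberately left out.

```python
import re
from typing import Union

# Decimal-only approximation of what strtod(3) accepts for a whole token.
_DECIMAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
_CARRAY_ELEM = re.compile(r"(.+)<(\d+)>")

def parse_scalar(token: str) -> Union[float, tuple[str, int]]:
    """Return a literal number, or (field code, element index) for a scalar field."""
    if _DECIMAL.fullmatch(token):
        return float(token)          # a literal always wins over a field code
    m = _CARRAY_ELEM.fullmatch(token)
    if m:
        return m.group(1), int(m.group(2))
    return token, 0                  # CONST, or CARRAY with implicit element zero
```

For example, parse_scalar("123") is always the number 123.0, never a CONST field named 123.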

Starting in Standards Version 7, a literal complex number is specified as two real (floating
point) numbers separated by a semicolon (;) with no intervening whitespace. So, for
example, the tokens

1;0 0;1 4;0 0;5 9.313e2;74.1

represent, respectively, the real unit, the imaginary unit, the real number four, the
imaginary number 5i, and the complex number 931.3 + 74.1i. Because the semicolon
character cannot be used in field names, a complex valued literal can never be mistaken
for a field code. This allows, among other things, the composition of complex valued
fields from purely real input fields. For example, a complex valued field, z, may be
created from a real valued field re, representing the real part of the complex number, and
the real valued field im, representing the imaginary part of the complex number, with the
following LINCOM specification:

z LINCOM re 1 0 im 0;1 0
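
A minimal sketch of parsing complex literals and evaluating the z example above follows. The names parse_complex and compose are hypothetical. Python's float() stands in for strtod(3); the two are not identical, because float() accepts underscores and rejects hexadecimal.

```python
from typing import Sequence

def parse_complex(token: str) -> complex:
    """Parse a complex literal of the form "re;im"."""
    re_part, sep, im_part = token.partition(";")
    if not sep or not re_part or not im_part:
        raise ValueError(f"not a complex literal: {token!r}")
    return complex(float(re_part), float(im_part))

def compose(re_vals: Sequence[float], im_vals: Sequence[float]) -> list[complex]:
    """Evaluate "z LINCOM re 1 0 im 0;1 0" sample by sample."""
    a1, b1 = 1.0, 0.0
    a2, b2 = parse_complex("0;1"), 0.0   # the imaginary unit as a LINCOM factor
    return [(a1 * r + b1) + (a2 * i + b2) for r, i in zip(re_vals, im_vals)]
```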

Starting in Standards Version 9, in addition to decimal notation, literal integer
parameters may be specified as hexadecimal numbers, by prefixing the number (after an
optional '+' or '-' sign) with 0x or 0X, or as octal numbers, by prefixing the number with
0, as described in strtol(3). Similarly, floating point literal numbers (both purely real
ones and components of complex literals) may be specified in hexadecimal by prefixing them
with 0x or 0X, and using p or P as the binary exponent prefix, as described in the C99
standard. Both uppercase and lowercase hexadecimal digits may be used. In cases where a
literal floating point number may appear, the tokens INF or INFINITY, optionally preceded
by a '+' or '-' sign, and NAN, optionally immediately followed by '(', then a sequence of
characters, then ')', and all disregarding case, will be interpreted as the special
floating point values explained in strtod(3).
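
These Version 9 literal forms can be approximated in Python as sketched below. The functions parse_int_literal and parse_float_literal are hypothetical. Python's int(token, 0) is not used because it rejects leading-zero octal. float.fromhex handles C99 hexadecimal floats. Plain float() handles decimal, INF, INFINITY, and NAN, but not the NAN(...) form, which this sketch does not support.

```python
import re

_INT_LITERAL = re.compile(r"([+-]?)(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)")

def parse_int_literal(token: str) -> int:
    """Parse a whole-token integer literal: 0x hexadecimal, leading-0 octal, or decimal."""
    m = _INT_LITERAL.fullmatch(token)
    if m is None:
        raise ValueError(f"not an integer literal: {token!r}")
    sign, digits = m.groups()
    if digits[:2] in ("0x", "0X"):
        value = int(digits[2:], 16)   # hexadecimal
    elif digits.startswith("0"):
        value = int(digits, 8)        # octal (a lone "0" is also zero)
    else:
        value = int(digits, 10)       # decimal
    return -value if sign == "-" else value

def parse_float_literal(token: str) -> float:
    """Parse a real literal: C99 hexadecimal, decimal, INF/INFINITY or NAN."""
    if token.lstrip("+-")[:2].lower() == "0x":
        return float.fromhex(token)   # e.g. 0x1.8p1
    return float(token)               # decimal, INF, INFINITY, NAN (case-insensitive)
```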

Field Codes
When specifying the input to a field, either as a scalar parameter, or as an input vector
field to a non-RAW vector field, field codes are used. A field code consists of, in
order:

• (since Standards Version 10:) optionally, a leading dot (.), indicating this field code
is relative to the fragment's root namespace. Without the leading dot, the field code
is taken to be relative to the current namespace. (See the discussion in the
Namespaces section above for details.)

• (since Standards Version 10:) optionally, a non-null subnamespace followed by a dot
(.) indicating a subspace under the current or root namespace. The subnamespace may
be made up of any number of namespace tags separated by dots, to nest deeper in the
namespace tree.

• (since Standards Version 6:) if the field in question is a metafield (see the /META
directive above), the field name of the metafield's parent (which may be an alias)
followed by a forward slash (/).

• a simple field name, possibly an alias, indicating a vector or scalar field

• (since Standards Version 7:) optionally, a dot (.) followed by a representation
suffix.

A representation suffix may be used to extract a real number from a complex value.
The available suffixes (listed here with their preceding dot) and their meanings are:

.a the argument of the input, that is, the angle (in radians) between the positive
real axis and the input. The argument is in the range [-pi, pi], and a branch cut
exists along the negative real axis. At the branch cut, -pi is returned if the
imaginary part is -0, and pi is returned if the imaginary part is +0. If the input
is zero, zero is returned.

.i the imaginary part of the input (i.e. the projection of the input onto the
imaginary axis)

.m the modulus of the input (i.e. its absolute value).

.r the real part of the input (i.e. the projection of the input onto the real axis)

.z (since Standards Version 10:) the identity representation: it returns the full
complex value, equivalent to simply omitting the suffix completely. It is only
needed in certain cases to force the correct interpretation of a field code in the
presence of a namespace tag. To wit, the field code

name.r

may be interpreted as the real-part (via the .r representation suffix) of the field
called name (if such a field exists). To refer to a field called r in the name
namespace, the field code must be written:

name.r.z

NB: The first interpretation only occurs with valid representation suffixes; the
field code:

name.q

is interpreted as the field q in the name namespace because .q is not a valid
representation suffix. Furthermore, ambiguity arises only if both fields "name"
and "name.r" are defined. If the field "name" does not exist, but the field
"name.r" does, then the original field code is not ambiguous. The .z suffix is the
only representation suffix allowed on SARRAY, SINDIR, and STRING field codes.

If the specified field is purely real, representations are calculated as if the imaginary
part were equal to +0.
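
The suffix rules above can be sketched as two small functions. The names split_suffix and represent are hypothetical, and the set of defined field names is a stand-in for real dirfile metadata. Leading-dot namespaces and metafields are not handled. A trailing suffix is recognised only when it is valid and the remaining name is a defined field, which reproduces the name.r and name.r.z cases. For .a, cmath.phase follows the sign of a zero imaginary part at the branch cut, and zero input is mapped to zero explicitly.

```python
import cmath
from typing import Optional, Union

SUFFIXES = {"a", "i", "m", "r", "z"}

def split_suffix(code: str, defined: set[str]) -> tuple[str, Optional[str]]:
    """Split a trailing representation suffix off a field code, if one applies."""
    head, dot, tail = code.rpartition(".")
    if dot and tail in SUFFIXES and head in defined:
        return head, tail
    return code, None

def represent(value: Union[float, complex], suffix: Optional[str]) -> Union[float, complex]:
    """Apply a representation; a purely real value gets an imaginary part of +0."""
    if suffix is None:
        return value
    z = complex(value)
    if suffix == "z":
        return z
    if suffix == "r":
        return z.real
    if suffix == "i":
        return z.imag
    if suffix == "m":
        return abs(z)
    if suffix == "a":
        return 0.0 if z == 0 else cmath.phase(z)  # zero input gives zero
    raise ValueError(f"unknown representation {suffix!r}")
```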

HISTORY

This document describes Versions 10 and earlier of the Dirfile Standards.

Version 10 of the Standards (January 2017) added the INDIR, SARRAY, and SINDIR field
types, namespaces, the /NAMESPACE directive, the flac encoding scheme, and the .z
representation suffix.

Version 9 of the Standards (April 2012) added the MPLEX and WINDOW field types, the /ALIAS
and /HIDDEN directives, the affixes to /INCLUDE, the sie, zzip, and zzslim encoding
schemes, along with the optional enc_datum token to /ENCODING. It permitted specification
of integer literals in octal and hexadecimal. Finally, it deprecated the type aliases
FLOAT and DOUBLE.

Version 8 of the Standards (November 2010) added the DIVIDE, RECIP, and CARRAY field
types, made the forward slash on reserved words mandatory, and prohibited using the
single-character type aliases in the specification of RAW fields. It also introduced the
optional second (arm) token to the /ENDIAN directive.

Version 7 of the Standards (October 2009) added the SBIT and POLYNOM field types, and the
directive-less method of specifying metafields. It also introduced the data types
COMPLEX128 and COMPLEX64, along with the notion of representations, and the lzma encoding
scheme. Finally, it made the number of fields parameter for LINCOM optional.

Version 6 of the Standards (October 2008) added the /ENCODING, /META, /PROTECT, and
/REFERENCE directives, and the CONST and STRING field types. It permitted whitespace in
tokens and introduced the character escape sequences. It allowed CONST fields to be used
as parameters in field specification lines. It also removed FILEFRAM as an alias for
INDEX, and prohibited the period (.) in field names while allowing # and \ in them.

Version 5 of the Standards (August 2008) added VERSION and ENDIAN, slash demarcation of
reserved words, and removed the restriction on field name length. It introduced the data
types INT8, INT64, and UINT64, the new-style type specifiers, and increased the range of
the BIT field type from 32 to 64 bits. It also prohibited the characters &;<>\| in field
names.

Version 4 of the Standards (October 2006) added the PHASE field type.

Version 3 of the Standards (January 2006) added INCLUDE and increased the allowed length
of a field name from 16 to 50 characters.

Version 2 of the Standards (September 2005) added the MULTIPLY field type.

Version 1 of the Standards (November 2004) added FRAMEOFFSET and the optional fourth
argument to the BIT field type.

Version 0 of the Standards (before March 2003) refers to the dirfile standards supported
by the getdata(3) library originally introduced into the kst(1) sources, which contained
support for all other features covered by this document.

AUTHORS

       The dirfile specification was developed by C. B. Netterfield
       <netterfield@astro.utoronto.ca>.

       Since Standards Version 3, the dirfile specification has been maintained by D. V. Wiebe
       <getdata@ketiltrout.net>.