Provided by: libgetdata-tools_0.7.3-6ubuntu1_amd64 

NAME
dirfile-format — the dirfile database format specification file
DESCRIPTION
The dirfile format specification fully specifies the raw and derived time streams and auxiliary
information for a dirfile(5) database.
The format specification is contained in one or more case-sensitive text files located in the dirfile
tree. Each file is known as a fragment. The primary fragment is the file called format located in the
base dirfile directory. This file may contain only part of the format specification, and may reference
other fragments (using the /INCLUDE directive) containing further format specification. This inclusion
mechanism may be nested arbitrarily deep.
The explicit text encoding of these files is not specified by these standards, but must be 7-bit ASCII
compatible. Examples of acceptable character encodings include all the ISO 8859 character sets (i.e.
Latin-1 through Latin-10, among others), as well as the UTF-8 encoding of Unicode and UCS.
SYNTAX
The format specification is composed of field specification lines and directive lines, optionally
separated by blank lines or lines containing only whitespace. Lines are separated by the line-feed
character (0x0A). Unless escaped (see below), the hash mark (#) is the comment delimiter; the comment
delimiter, and any text following it to the end of the line, is ignored.
Tokens
Both field specification lines and directive lines consist of several tokens separated by whitespace.
Whitespace consists of one or more whitespace characters. These are: space (0x20), horizontal tab
(0x09), vertical tab (0x0B), form-feed (0x0C), and carriage return (0x0D). The first token of a
directive line is always a reserved word. The first token of a field specification line is never a
reserved word. Any amount of whitespace may precede the first token on a line.
Since tokens are separated by whitespace, to include a whitespace character in a token, it must either be
escaped by preceding it with a backslash character (\), or be replaced by a character escape sequence (see
below), or else the token must be enclosed in quotation marks ("). The quotation marks themselves will
be stripped from the token. The null-token (that is, the token consisting of zero characters) may be
specified by a pair of quotation marks with nothing between them (""). To include a literal quotation
mark in a token, it must be escaped (\"). Similarly, a hash mark may be included in a token by including
it in a quoted token or else by escaping it (\#), otherwise the hash mark will be understood as the
comment delimiter.
It is a syntax error to have a line which contains unmatched quotation marks, or in which the last
character is the backslash character.
Several characters when escaped by a preceding backslash character are interpreted as special characters
in tokens. The character escape sequences are:
\a an alert (bell) character (ASCII 0x07 / U+0007)
\b a backspace character (ASCII 0x08 / U+0008)
\e an escape character (ASCII 0x1B / U+001B)
\f a form-feed character (ASCII 0x0C / U+000C)
\n a line-feed character (ASCII 0x0A / U+000A)
\r a carriage return character (ASCII 0x0D / U+000D)
\t a horizontal tab character (ASCII 0x09 / U+0009)
\v a vertical tab character (ASCII 0x0B / U+000B)
\\ a backslash character (ASCII 0x5C / U+005C)
\ooo the single byte given by the octal number ooo.
\xhh the single byte given by the hexadecimal number hh.
\uhhhhhhh
the UTF-8 byte sequence encoding the Unicode code point given by the hexadecimal number
hhhhhhh.
Any other character which is escaped is interpreted as the character itself. (i.e. \c is interpreted as
c; also, as pointed out above, \" and \# are interpreted as simply " and #, without their special
meanings).
No token may contain the NULL character (ASCII 0x00 / U+0000). Furthermore, although support is present
to create UTF-8 byte sequences, tokens are not required to be valid UTF-8 sequences. Any byte sequence
not containing the NULL character forms a valid token. However, there may be further restrictions on
allowed characters for a token in a particular situation (for example, when used as a field name).
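The tokenisation rules above can be sketched in Python. This is a minimal illustration, not the parser of any particular implementation: the function names tokenize and _run are hypothetical, the line is assumed to be supplied as bytes with the trailing line-feed already removed, and rejecting an octal escape larger than 0xFF is an assumption, since the text above does not say how such a value is handled.

```python
WHITESPACE = b" \t\v\f\r"
# Escape letter (as a byte value) -> byte it produces.
SIMPLE = {ord(k): v for k, v in
          {"a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C,
           "n": 0x0A, "r": 0x0D, "t": 0x09, "v": 0x0B}.items()}
OCT = b"01234567"
HEX = b"0123456789abcdefABCDEF"


def _run(data, pos, allowed, limit):
    """Return up to `limit` consecutive bytes at `pos` that are in `allowed`."""
    end = pos
    while end < len(data) and end - pos < limit and data[end] in allowed:
        end += 1
    return data[pos:end]


def tokenize(line):
    """Split one line (bytes, no trailing line-feed) into a list of tokens."""
    tokens, cur, in_token, quoted = [], bytearray(), False, False
    i = 0
    while i < len(line):
        c = line[i]
        if c == 0x5C:                                # backslash escape
            if i + 1 == len(line):
                raise ValueError("line ends with a backslash")
            e, i, in_token = line[i + 1], i + 2, True
            if e in SIMPLE:
                cur.append(SIMPLE[e])
            elif e in OCT:                           # \ooo, one to three digits
                digits = _run(line, i - 1, OCT, 3)
                value = int(digits, 8)
                if value > 0xFF:                     # assumption: reject
                    raise ValueError("octal escape exceeds one byte")
                cur.append(value)
                i += len(digits) - 1
            elif e in b"xu" and _run(line, i, HEX, 1):
                is_hex = e == ord("x")               # \xhh or \uhhhhhhh
                digits = _run(line, i, HEX, 2 if is_hex else 7)
                value = int(digits, 16)
                cur += bytes([value]) if is_hex else chr(value).encode("utf-8")
                i += len(digits)
            else:                                    # \c is simply c
                cur.append(e)
        elif c == ord('"'):                          # quotes are stripped
            quoted, in_token = not quoted, True
            i += 1
        elif not quoted and c == ord("#"):           # comment: ignore the rest
            break
        elif not quoted and c in WHITESPACE:         # token separator
            if in_token:
                tokens.append(bytes(cur))
                cur, in_token = bytearray(), False
            i += 1
        else:
            cur.append(c)
            in_token = True
            i += 1
    if quoted:
        raise ValueError("unmatched quotation mark")
    if in_token:
        tokens.append(bytes(cur))
    if any(0 in t for t in tokens):
        raise ValueError("token contains the NUL character")
    return tokens
```

With this sketch, the line name STRING "hello world" # comment yields the three tokens name, STRING and hello world, and a lone pair of quotation marks yields the null-token.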
DIRECTIVES
There are eight reserved words, which cannot be used as field names in the dirfile. Instead, these
specify directives. All reserved words start with an initial forward slash (/), to distinguish them from
field names. Previous versions of the Standards permitted the omission of the slash. Like the rest of
the format specification, directives are case sensitive.
A number of the directives have fragment scope. A directive with fragment scope only applies to the
fragment in which it is present, plus any sub-fragments indicated by the /INCLUDE directive, but only if
those sub-fragments don't have their own corresponding directive. Directives which have fragment scope
are: /ENCODING, /ENDIAN, /FRAMEOFFSET, and /PROTECT. Because of these scoping rules, different portions
of the dirfile may have different encodings, endiannesses, frame offsets, or protection levels.
If a directive with fragment scope appears more than once in a fragment, only the last such directive
will be honoured, with the exception that the effect of a directive will not be propagated to sub-
fragments if the directive line appears after the sub-fragment is included. The scoping rules of the
remaining directives are discussed below.
/ENCODING
The /ENCODING directive specifies the encoding scheme used to encode binary files in the dirfile.
The encoding scheme may be one of the predefined names listed below, which are described in more
detail in dirfile-encoding(5), or any other site-specific encoding scheme. The predefined scheme
names are:
none The dirfile is unencoded.
bzip2 The dirfile is compressed using the bzip2 compression scheme.
gzip The dirfile is compressed using the gzip compression scheme.
lzma The dirfile is compressed using the LZMA compression scheme.
slim The dirfile is compressed using the slim compression scheme.
text The dirfile is text encoded.
Implementations should fail gracefully when encountering an unknown encoding scheme. If no
encoding scheme is specified, behaviour is implementation dependent. Syntax is:
/ENCODING <scheme>
The /ENCODING directive has fragment scope.
/ENDIAN
The /ENDIAN directive specifies the endianness of the raw data in the database. The assumed
endianness of raw data in dirfiles which omit this directive is implementation dependent. Syntax
is:
/ENDIAN ( big | little ) [ arm ]
where the "arm" token should be included if double precision floating point data are stored in the
ARM middle-endian format. The /ENDIAN directive has fragment scope.
/FRAMEOFFSET
The /FRAMEOFFSET directive specifies the frame number of the first frame for which data exists in
binary files associated with RAW fields. Syntax is:
/FRAMEOFFSET <integer>
The /FRAMEOFFSET directive has fragment scope.
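As a rough illustration of what the frame offset means for an unencoded binary file, the following Python sketch computes where a given frame would start. It assumes that samples are packed contiguously with no header and that the first stored sample belongs to frame number frame-offset; the function name raw_byte_offset and its parameters are hypothetical.

```python
def raw_byte_offset(frame, frame_offset, samples_per_frame, sample_size):
    """Byte position of the first sample of `frame` in an unencoded RAW file.

    Assumes contiguous, headerless storage starting at frame `frame_offset`.
    """
    if frame < frame_offset:
        raise ValueError("no data is stored before the frame offset")
    return (frame - frame_offset) * samples_per_frame * sample_size
```

Under these assumptions, with /FRAMEOFFSET 1000 and a 16-bit field sampled 20 times per frame, frame 1002 would start 80 bytes into the file.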
/INCLUDE
The /INCLUDE directive specifies another file (called a fragment) to parse for additional format
specification for the dirfile. The inclusion is treated as if the lines of the fragment were
pasted verbatim in place of the /INCLUDE directive line. The exception to this is that RAW fields
specified in the fragment are located in the directory containing the fragment and not in the
directory containing the parent fragment, and the binary file encoding may be different for each
fragment. The fragment may be specified either with an absolute path, or else a relative path
from the current file. Syntax is:
/INCLUDE <file>
The /INCLUDE directive has no scope: it is processed immediately and has no long-term effect.
/META The /META directive specifies a metafield attached to a particular parent field. The field
metadata may be of any allowed type except RAW. Metafields are retrieved in exactly the same way
as regular field data, but the field code specified consists of the parent and metafield names
joined with a forward slash:
<parent-field>/<meta-field>
META fields may not be specified before their parent field has been. Syntax is:
/META <parent-field> {field specification line}
As an illustration of this concept,
/META pfield meta CONST FLOAT64 3.291882
provides a scalar metadatum called meta with value 3.291882 attached to the field pfield. This
particular metafield may be referred to by the field code "pfield/meta". Note that different
parent fields may have metafields with the same name, since all references to metafields must
include the parent field name. Metafields may not themselves have further sub-metafields.
As an alternative to the /META directive, a metafield may be specified by a standard field
specification line, using
<parent-field>/<meta-field>
as the field name. That is, the above example metafield could have also been specified as:
pfield/meta CONST FLOAT64 3.291882
The /META directive has no scope: it is processed immediately and has no long-term effect.
/PROTECT
The /PROTECT directive specifies the advisory protection level of the current fragment and of the
RAW fields defined therein. The protection level indicates whether writing to the fragment, or
the binary data on disk is permitted. Syntax is:
/PROTECT <level>
Four advisory protection levels are defined:
none No protection at all: data and metadata may be freely changed. This is the default, if no
/PROTECT directive is present.
format The dirfile metadata is protected from change, but RAW data on disk may be modified.
data The RAW data on disk is protected from change, but metadata may be modified.
all Both metadata and data on disk are protected from change.
The /PROTECT directive has fragment scope.
/REFERENCE
The /REFERENCE directive specifies the name of the field to use as the dirfile's reference field
(see dirfile(5)). If no /REFERENCE directive is specified, the first RAW field encountered is
used as the reference field. The /REFERENCE directive must specify a RAW field. Syntax is:
/REFERENCE <field-code>
The /REFERENCE directive has global scope: if multiple /REFERENCE directives appear in the dirfile
metadata, only the last such will be honoured.
/VERSION
The /VERSION directive specifies the particular version of the Dirfile Standards to which the
dirfile format specification conforms. This directive should occur before any version dependent
syntax is encountered. As of Standards Version 6, no such syntax exists, and this directive is
provided primarily to ease forward compatibility. Syntax is:
/VERSION <integer>
The /VERSION directive has immediate scope: its effect is immediate, and it applies only to
metadata below it, including and propagating downwards to sub-fragments after the directive. Its
effect will also propagate upwards back to the parent fragment, and affect subsequent metadata.
FIELD SPECIFICATION LINES
Any line which does not start with a reserved word is assumed to be a field specification line. A field
specification line consists of at least two tokens. The first token is the field name. The second token
is the field type. Subsequent tokens are field parameters. The meaning and number of these parameters
depend on the field type specified.
Field Names
The first token in a field specification line is the field name. The field name consists of one or more
characters, excluding both ASCII control characters (the bytes 0x01 through 0x1F), and the characters
& / ; < > | .
which are reserved (but see below for the use of / to specify metafields). The field name may not be
INDEX, which is a special, implicit field which contains the integer frame index. Field names are case
sensitive.
If the field name beginning a field specification line does contain a / character, the line is assumed to
specify a metafield. See the /META directive above for further details.
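The restrictions on a simple (non-meta) field name can be checked with a few lines of Python. This is only a sketch: the name is assumed to be a token given as bytes, and the function name is_valid_field_name is hypothetical.

```python
# Reserved characters listed above, as byte values.
RESERVED = set(b"&/;<>|.")


def is_valid_field_name(name):
    """Return True if `name` (bytes) is acceptable as a simple field name."""
    if not name or name == b"INDEX":
        return False
    # Reject NUL and ASCII control bytes (0x00-0x1F) and reserved characters.
    return not any(b <= 0x1F or b in RESERVED for b in name)
```

A name containing a forward slash is rejected here because the slash is only meaningful as the separator between a parent field name and a metafield name, each of which would be checked separately.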
Field Types
There are thirteen field types. Of these, ten are of vector type (BIT, DIVIDE, LINCOM, LINTERP,
MULTIPLY, PHASE, POLYNOM, RAW, RECIP, and SBIT) and three are of scalar type (CONST, CARRAY, and STRING).
The possible field types are:
BIT The BIT vector field type extracts one or more bits out of an input vector field as an unsigned
number. Syntax is:
<field-name> BIT <input> <first-bit> [<bits>]
which specifies field-name to be the value of bits first-bit through first-bit+bits-1 of the input
vector field input, when input is converted from its native type to an (endianness corrected)
unsigned 64-bit integer. If bits is omitted, it is assumed to be 1. Both first-bit and bits may
be either literal numbers, or else the field code of a CONST or CARRAY field type containing their
values. The SBIT field type is a signed version of this field type.
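The extraction performed by BIT can be sketched in Python as a shift and a mask. The sketch assumes that bit zero is the least significant bit of the 64-bit integer, which the text above does not state explicitly; the function name extract_bits is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def extract_bits(value, first_bit, bits=1):
    """Return bits first_bit .. first_bit+bits-1 of value as an unsigned int.

    Assumes bit 0 is the least significant bit.
    """
    if first_bit < 0 or bits < 1 or first_bit + bits > 64:
        raise ValueError("bit range must lie within 64 bits")
    unsigned = value & MASK64            # view the input as unsigned 64-bit
    return (unsigned >> first_bit) & ((1 << bits) - 1)
```

For example, under this assumption, extracting three bits starting at bit 2 from the value 0b10110100 gives 0b101.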
CARRAY The CARRAY scalar field type is a list of constants fully specified in the format specification
metadata. Syntax is:
<field-name> CARRAY <type> <value0> <value1> <value2> ...
where type may be any supported native data type (see the description of the RAW field type
below), and value0, value1, &c. are the values of successive elements in the scalar list
interpreted as indicated by type. No limit is placed on the number of elements in a CARRAY.
(Note: despite being multivalued, this is not considered a vector field since the elements of the
CARRAY are not indexed by frames.)
CONST The CONST scalar field type is a constant fully specified in the format specification metadata.
Syntax is:
<field-name> CONST <type> <value>
where type may be any supported native data type (see the description of the RAW field type
below), and value is the numerical value of the constant interpreted as indicated by type.
DIVIDE The DIVIDE vector field type is the quotient of two vector fields. Syntax is:
<field-name> DIVIDE <field1> <field2>
The derived field will be computed as:
field-name[n] = field1[n] / field2[n2]
with the index n2 computed appropriately for the (potentially differing) sample rates of the input
fields. The resultant field will have the same sample rate as field1.
LINCOM The LINCOM vector field type is the linear combination of one, two or three input vector fields.
Syntax is:
<field-name> LINCOM [<n>] <field1> <a1> <b1> [<field2> <a2> <b2> [<field3> <a3> <b3>]]
where n, if present, indicates the number of input vector fields (1, 2, or 3). The derived field
will be computed as:
field-name[n] = (a1 * field1[n] + b1) + (a2 * field2[n2] + b2) + (a3 * field3[n3] + b3)
with the field2 and field3 terms included only if specified and the indices n2 and n3 computed
appropriately for the (potentially differing) sample rates of the input fields. The resultant
field will have the same sample rate as field1. Each supplied co-efficient (a1, b1, a2, &c.) may
be either a literal number, or else the field code of a CONST or CARRAY field type containing its
value.
If n is not specified, the number of fields is determined by looking at the supplied parameters.
Since it is possible to create a field code which is identical to a literal number, the third
token on the line is assumed to be n if the entire token can be parsed as a literal number
using the rules outlined in strtod(3). That is, if the field code specifying field1 could be
mistaken for a literal number, n must be specified to prevent ambiguity.
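The LINCOM computation, including the index mapping between fields of different sample rates, might look like the following Python sketch. The mapping n2 = n * spf2 // spf1 (integer division, where spf is the number of samples per frame) is an assumption about what "computed appropriately" means, all inputs are assumed to cover the same number of frames, and the function name lincom is hypothetical.

```python
def lincom(terms):
    """Evaluate a LINCOM over one to three input terms.

    `terms` is a list of (samples, samples_per_frame, a, b) tuples; the first
    term plays the role of field1 and sets the output sample rate.
    """
    if not 1 <= len(terms) <= 3:
        raise ValueError("LINCOM takes one, two or three input fields")
    first, spf1, _, _ = terms[0]
    out = []
    for n in range(len(first)):
        total = 0
        for samples, spf, a, b in terms:
            k = n * spf // spf1          # assumed index mapping between rates
            total += a * samples[k] + b
        out.append(total)
    return out
```

For instance, combining a field sampled twice per frame with a field sampled once per frame repeats each sample of the slower field for two consecutive output samples.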
LINTERP
The LINTERP vector field type specifies a table look up based on another vector field. Syntax is:
<field-name> LINTERP <input> <table>
where input is the input vector field for the table lookup, and table is the path to the lookup
table file for the field. If this path is relative, it is assumed to be relative to the directory
containing the fragment defining this field. The lookup table file is an ASCII text file with two
whitespace separated columns of x and y values. Values are linearly interpolated between the
points specified in the lookup table.
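A table lookup of this kind can be sketched in Python as follows. The sketch assumes that the table holds at least two points with distinct x values, and that inputs outside the table are extrapolated along the nearest end segment; the behaviour outside the table is not specified above, so that choice is purely illustrative. The function names load_table and linterp are hypothetical.

```python
from bisect import bisect_right


def load_table(text):
    """Parse two whitespace-separated columns of x and y values."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.append((float(fields[0]), float(fields[1])))
    points.sort()
    return points


def linterp(x, points):
    """Linearly interpolate y at x from a sorted list of (x, y) points."""
    xs = [p[0] for p in points]
    i = bisect_right(xs, x)
    i = min(max(i, 1), len(points) - 1)   # clamp to the first/last segment
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

Sorting the points once when the table is loaded allows each lookup to use a binary search rather than a linear scan.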
MULTIPLY
The MULTIPLY vector field type is the product of two vector fields. Syntax is:
<field-name> MULTIPLY <field1> <field2>
The derived field will be computed as:
field-name[n] = field1[n] * field2[n2]
with the index n2 computed appropriately for the (potentially differing) sample rates of the input
fields. The resultant field will have the same sample rate as field1.
PHASE The PHASE vector field type shifts an input vector field by the specified number of samples.
Syntax is:
<field-name> PHASE <input> <shift>
which specifies field-name to be the input vector field, input, shifted by shift samples. A
positive shift indicates a forward shift, towards the end-of-field. Results of shifting past the
beginning- or end-of-field are implementation dependent. The shift parameter may be either a
literal number, or else the field code of a CONST or CARRAY field type containing its value.
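One reading of the shift, sketched in Python below, is that output sample n is taken from input sample n + shift. That reading is an assumption, as is the use of a fill value for samples that would come from beyond either end of the field (the text above leaves this implementation dependent). The function name phase is hypothetical.

```python
def phase(samples, shift, fill=None):
    """Return `samples` shifted so that out[n] = samples[n + shift].

    Positions with no corresponding input sample are set to `fill`.
    """
    out = []
    for n in range(len(samples)):
        k = n + shift
        out.append(samples[k] if 0 <= k < len(samples) else fill)
    return out
```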
POLYNOM
The POLYNOM vector field type specifies a polynomial function of a single input vector field.
Syntax is:
<field-name> POLYNOM <input> <a0> <a1> [<a2> [<a3> [<a4> [<a5>]]]]
where <input> is the input field code, and the order of the computed polynomial is determined by
how many co-efficients are present in the specification. The derived field is computed as:
field-name[n] = a0 + a1 * input[n] + a2 * input[n]**2 + a3 * input[n]**3 + a4 * input[n]**4
+ a5 * input[n]**5
where ** is the exponentiation operator, and the higher order terms are computed only if the
corresponding co-efficients ai are specified. The coefficients, if specified, may be either
literal numbers, or else the field code of a CONST or CARRAY field type containing the value.
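The polynomial above can be evaluated with Horner's rule, as in this Python sketch. The function name polynom is hypothetical, and the co-efficients are assumed to have already been resolved to numbers.

```python
def polynom(samples, coefficients):
    """Evaluate a0 + a1*x + ... for each sample, using Horner's rule.

    `coefficients` is [a0, a1, ...] with between two and six entries.
    """
    if not 2 <= len(coefficients) <= 6:
        raise ValueError("POLYNOM takes between two and six co-efficients")
    out = []
    for x in samples:
        acc = 0
        for a in reversed(coefficients):
            acc = acc * x + a
        out.append(acc)
    return out
```

Horner's rule avoids computing each power of the input separately; the result is mathematically the same as the expanded formula.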
RECIP The RECIP vector field type computes the reciprocal of a single input vector field. Syntax is:
<field-name> RECIP <input> <dividend>
where <input> is the input field code and <dividend> is a scalar quantity. The derived field is
computed as:
field-name[n] = dividend / input[n].
The dividend may be either a literal number, or else the field code of a CONST or
CARRAY field type containing the value.
RAW The RAW vector field type specifies raw time streams on disk. In this case, the field name should
correspond to the name of the file containing the time stream. Syntax is:
<field-name> RAW <type> <sample-rate>
where sample-rate is the number of samples per dirfile frame for the time stream and type is a
token specifying the native data format type:
UINT8 unsigned 8-bit integer
INT8 signed (two's complement) 8-bit integer
UINT16 unsigned 16-bit integer
INT16 signed (two's complement) 16-bit integer
UINT32 unsigned 32-bit integer
INT32 signed (two's complement) 32-bit integer
UINT64 unsigned 64-bit integer
INT64 signed (two's complement) 64-bit integer
FLOAT32 or FLOAT
IEEE-754 standard 32-bit single precision floating point number
FLOAT64 or DOUBLE
IEEE-754 standard 64-bit double precision floating point number
COMPLEX64
a 64-bit complex number consisting of two IEEE-754 standard 32-bit single precision
floating point numbers representing the real and imaginary parts of the complex
number.
COMPLEX128
a 128-bit complex number consisting of two IEEE-754 standard 64-bit double precision
floating point numbers representing the real and imaginary parts of the complex
number.
For more information on the storage of complex valued data, see dirfile(5).
For backwards compatibility, implementations should also recognise the following single character
type aliases in use prior to Standards Version 5:
c UINT8
u UINT16
s INT16
U UINT32
i, S INT32
f FLOAT32
d FLOAT64
Types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 are not supported before Standards Version 5,
so no single character type aliases exist for these types. Standards Version 8 removed support
for these single character type codes.
The sample-rate parameter may be either a literal number, or else the field code of a CONST or CARRAY
field type containing its value.
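The native types listed above map naturally onto the format characters of Python's struct module, which gives a compact way to sketch how unencoded RAW data might be decoded. The sketch assumes that complex samples are stored as a real part followed by an imaginary part, and it does not handle the ARM middle-endian format; the names STRUCT_CODES and decode_raw are hypothetical.

```python
import struct

# Dirfile type name -> struct format character (standard sizes).
STRUCT_CODES = {
    "UINT8": "B", "INT8": "b", "UINT16": "H", "INT16": "h",
    "UINT32": "I", "INT32": "i", "UINT64": "Q", "INT64": "q",
    "FLOAT32": "f", "FLOAT": "f", "FLOAT64": "d", "DOUBLE": "d",
}
COMPLEX_PARTS = {"COMPLEX64": "f", "COMPLEX128": "d"}


def decode_raw(data, type_name, endian="little"):
    """Decode bytes of unencoded RAW data into a list of Python numbers."""
    prefix = "<" if endian == "little" else ">"
    is_complex = type_name in COMPLEX_PARTS
    code = COMPLEX_PARTS[type_name] if is_complex else STRUCT_CODES[type_name]
    size = struct.calcsize(prefix + code)
    count = len(data) // size
    values = struct.unpack(f"{prefix}{count}{code}", data[:count * size])
    if is_complex:
        # Assumed layout: real part followed by imaginary part.
        return [complex(re, im) for re, im in zip(values[0::2], values[1::2])]
    return list(values)
```

Using an explicit "<" or ">" prefix selects struct's standard sizes and no padding, so the decoding does not depend on the host platform.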
SBIT The SBIT vector field type extracts one or more bits out of an input vector field as a signed
number. Syntax is:
<field-name> SBIT <input> <first-bit> [<bits>]
which specifies field-name to be the value of bits first-bit through first-bit+bits-1 of the input
vector field input, when input is converted from its native type to a (endianness corrected)
signed 64-bit integer. If bits is omitted, it is assumed to be 1. Both first-bit and bits may be
either literal numbers, or else the field code of a CONST or CARRAY field type containing their
values. The BIT field type is an unsigned version of this field type.
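The signed extraction performed by SBIT can be sketched in Python as a shift, a mask, and a sign extension. The sketch assumes that bit zero is the least significant bit and that the highest extracted bit is treated as a two's complement sign bit; neither detail is spelled out above. The function name extract_signed_bits is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def extract_signed_bits(value, first_bit, bits=1):
    """Return bits first_bit .. first_bit+bits-1 of value as a signed int.

    Assumes bit 0 is least significant and two's complement sign extension.
    """
    if first_bit < 0 or bits < 1 or first_bit + bits > 64:
        raise ValueError("bit range must lie within 64 bits")
    raw = ((value & MASK64) >> first_bit) & ((1 << bits) - 1)
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw
```

Under these assumptions a single-bit SBIT field takes the values 0 and -1, rather than 0 and 1 as for BIT.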
STRING The STRING scalar field type is a character string fully specified in the format file metadata.
Syntax is:
<field-name> STRING <value>
where value is the string value of the field. Note that value is a single token. To include
whitespace in the string, enclose value in quotation marks ("), or else escape the whitespace with
the backslash character (\).
Field Parameters
All input vector field parameters should be field codes (see below). Additionally, some of the numerical
field parameters may be either literal numbers or else the field code of a CONST field containing the
value, or the field code of a CARRAY followed by a left angle bracket (<), then a non-negative integer
used as the CARRAY element index, then a right angle bracket (>), that is:
field_code<n>
Parameters which allow non-literal values are indicated above. If the angle brackets and element index
are omitted from a CARRAY field code used as a parameter, the first element in the field (index zero) is
assumed.
Since it is possible to create a field code which is identical to a literal number, a parameter is
assumed to be the field code of a scalar field only if the entire token cannot be parsed as a literal
number using the rules outlined in strtod(3). For example, a CONST field whose field code consists
solely of digits can never be used as a parameter in a field specification line.
A literal complex number is specified as two real (floating point) numbers separated by a semicolon (;)
with no intervening whitespace. So, for example, the tokens
1;0 0;1 4;0 0;5 9.313e2;74.1
represent, respectively, the real unit, the imaginary unit, the real number four, the imaginary number
5i, and the complex number 931.3 + 74.1i. Because the semicolon character cannot be used in field names,
a complex valued literal can never be mistaken for a field code. This allows, among other things, the
composition of complex valued fields from purely real input fields. For example, a complex valued field,
z, may be created from a real valued field re, representing the real part of the complex number, and the
real valued field im, representing the imaginary part of the complex number, with the following LINCOM
specification:
z LINCOM re 1 0 im 0;1 0
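The rules for telling literal numbers, complex literals and scalar field codes apart can be sketched in Python. Note that Python's float() is used here as a stand-in for strtod(3) and does not accept exactly the same set of strings (for example, strtod(3) accepts hexadecimal floating point and float() accepts underscores between digits), so this is an approximation. The function name parse_parameter and the shape of its return value are hypothetical.

```python
import re

# A field code followed by <n>, where n is a non-negative integer.
CARRAY_INDEX = re.compile(r"^(.*)<(\d+)>$")


def parse_parameter(token):
    """Classify a scalar parameter token.

    Returns ("literal", number) or ("field", field_code, element_index).
    """
    if ";" in token:                     # complex literal: real;imag
        real_part, imag_part = token.split(";", 1)
        return ("literal", complex(float(real_part), float(imag_part)))
    try:
        return ("literal", float(token)) # stand-in for strtod(3)
    except ValueError:
        pass
    match = CARRAY_INDEX.match(token)
    if match:
        return ("field", match.group(1), int(match.group(2)))
    return ("field", token, 0)           # omitted index means element zero
```

Because the literal test is tried first, a token such as 42 is always treated as a number, which mirrors the rule that a field code consisting solely of digits cannot be used as a parameter.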
Field Codes
When specifying the input to a field, either as a scalar parameter, or as an input vector field to a non-
RAW vector field, field codes are used. A field code is one of:
• a simple field name, indicating a vector or scalar field
• a parent field name, followed by a forward slash, followed by a metafield name, indicating a
metafield. See the description of the /META directive above for further details.
• either of the above, followed by a period, followed by a representation suffix, but only if the field
or metafield specified is not a STRING type field.
A representation suffix may be used to extract a real number from a complex value. The available
suffixes and their meanings are:
.a This representation indicates the angle (in radians) between the positive real axis and the value
(i.e. the complex argument). The argument is in the range [-pi, pi], and a branch cut exists along
the negative real axis. At the branch cut, -pi is returned if the imaginary part is -0, and pi is
returned if the imaginary part is +0. If z=0, zero is returned.
.i This representation indicates the projection of the value onto the imaginary axis (i.e. the
imaginary part of the number).
.m This representation indicates the modulus of the value (i.e. its absolute value).
.r This representation indicates the projection of the value onto the real axis (i.e. the real part
of the number).
If the specified field is purely real, the representations are calculated as if the imaginary part was
equal to +0. For example, given a complex valued vector, z, a vector containing the real part of
z, re_z, could be produced with:
re_z PHASE z.r 0
and similarly for the complex field's imaginary part, argument, and absolute value. (Although it should
be pointed out this simplistic an example isn't strictly necessary, since z.r could be used wherever re_z
would be.)
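The four representations correspond closely to operations in Python's cmath module, which makes for a short sketch of their behaviour, including the treatment of signed zero at the branch cut. The function name represent is hypothetical.

```python
import cmath


def represent(value, suffix):
    """Apply a representation suffix ("a", "i", "m" or "r") to a number."""
    z = complex(value)                   # a real input gets imaginary part +0
    if suffix == "r":
        return z.real
    if suffix == "i":
        return z.imag
    if suffix == "m":
        return abs(z)
    if suffix == "a":
        return cmath.phase(z)            # argument in [-pi, pi]
    raise ValueError("unknown representation suffix: " + repr(suffix))
```

Here cmath.phase distinguishes -0 from +0 in the imaginary part, so a value on the negative real axis yields pi or -pi as described above, and a purely real negative input yields pi.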
STANDARDS VERSIONS
This document describes Version 8 of the Dirfile Standards.
Version 8 of the Standards (November 2010) added the DIVIDE, RECIP, and CARRAY field types, made the
forward slash on reserved words mandatory, and prohibited using the single character data type aliases in
the specification of RAW fields. It also introduced the optional second (arm) token to the /ENDIAN
directive.
Version 7 of the Standards (October 2009) added the SBIT and POLYNOM field types, and the directive-less
method of specifying metafields. It also introduced the data types COMPLEX128 and COMPLEX64, along with
the notion of representations. Finally, it made the number of fields parameter for LINCOM optional.
Version 6 of the Standards (October 2008) added the /ENCODING, /META, /PROTECT, and /REFERENCE
directives, and the CONST and STRING field types. It permitted whitespace in tokens and introduced the
character escape sequences. It allowed CONST fields to be used as parameters in field specification
lines. It also removed FILEFRAM as an alias for INDEX, and prohibited . but allowed # and \ in field
names.
Version 5 of the Standards (August 2008) added VERSION and ENDIAN, slash demarcation of reserved words,
and removed the restriction on field name length. It introduced the data types INT8, INT64, and UINT64,
the new-style type specifiers, and increased the range of the BIT field type from 32 to 64 bits. It also
prohibited the characters &;<>\| in field names.
Version 4 of the Standards (October 2006) added the PHASE field type.
Version 3 of the Standards (January 2006) added INCLUDE and increased the allowed length of a field name
from 16 to 50 characters.
Version 2 of the Standards (September 2005) added the MULTIPLY field type.
Version 1 of the Standards (November 2004) added FRAMEOFFSET and the optional fourth argument to the BIT
field type.
Version 0 of the Standards (before March 2003) refers to the dirfile standards supported by the
getdata(3) library originally introduced into the kst(1) sources, which contained support for all other
features covered by this document.
AUTHORS
The dirfile specification was developed by C. B. Netterfield <netterfield@astro.utoronto.ca>.
Since Standards Version 3, the dirfile specification has been maintained by D. V. Wiebe
<getdata@ketiltrout.net>.
SEE ALSO
dirfile(5), dirfile-encoding(5)
Standards Version 8 23 October 2010 dirfile-format(5)