Provided by:
libgetdata-tools_0.7.3-6_i386 
NAME
dirfile-format -- the dirfile database format specification file
DESCRIPTION
The dirfile format specification fully specifies the raw and derived
time streams and auxiliary information for a dirfile(5) database.
The format specification is contained in one or more case-sensitive
text files located in the dirfile tree. Each file is known as a
fragment. The primary fragment is the file called format located in
the base dirfile directory. This file may contain only part of the
format specification, and may reference other fragments (using the
/INCLUDE directive) containing further format specification. This
inclusion mechanism may be nested arbitrarily deep.
The explicit text encoding of these files is not specified by these
standards, but must be 7-bit ASCII compatible. Examples of acceptable
character encodings include all the ISO 8859 character sets (i.e.
Latin-1 through Latin-10, among others), as well as the UTF-8 encoding
of Unicode and UCS.
SYNTAX
The format specification is composed of field specification lines and
directive lines, optionally separated by blank lines or lines
containing only whitespace. Lines are separated by the line-feed
character (0x0A). Unless escaped (see below), the hash mark (#) is the
comment delimiter; the comment delimiter, and any text following it to
the end of the line, is ignored.
Tokens
Both field specification lines and directive lines consist of several
tokens separated by whitespace. Whitespace consists of one or more
whitespace characters. These are: space (0x20), horizontal tab (0x09),
vertical tab (0x0B), form-feed (0x0C), and carriage return (0x0D). The
first token of a directive line is always a reserved word. The first
token of a field specification line is never a reserved word. Any
amount of whitespace may precede the first token on a line.
Since tokens are separated by whitespace, to include a whitespace
character in a token, it must either be escaped by preceding it with a
backslash character (\), or be replaced by a character escape sequence
(see below), or else the token must be enclosed in quotation marks (").
The quotation marks themselves will be stripped from the token. The
null-token (that is, the token consisting of zero characters) may be
specified by a pair of quotation marks with nothing between them ("").
To include a literal quotation mark in a token, it must be escaped
(\"). Similarly, a hash mark may be included in a token by including
it in a quoted token or else by escaping it (\#), otherwise the hash
mark will be understood as the comment delimiter.
It is a syntax error to have a line which contains unmatched quotation
marks, or in which the last character is the backslash character.
Several characters when escaped by a preceding backslash character are
interpreted as special characters in tokens. The character escape
sequences are:
\a an alert (bell) character (ASCII 0x07 / U+0007)
\b a backspace character (ASCII 0x08 / U+0008)
\e an escape character (ASCII 0x1B / U+001B)
\f a form-feed character (ASCII 0x0C / U+000C)
\n a line-feed character (ASCII 0x0A / U+000A)
\r a carriage return character (ASCII 0x0D / U+000D)
\t a horizontal tab character (ASCII 0x09 / U+0009)
\v a vertical tab character (ASCII 0x0B / U+000B)
\\ a backslash character (ASCII 0x5C / U+005C)
\ooo the single byte given by the octal number ooo.
\xhh the single byte given by the hexadecimal number hh.
\uhhhhhhh
the UTF-8 byte sequence encoding the Unicode code point
given by the hexadecimal number hhhhhhh.
Any other character which is escaped is interpreted as the character
itself. (i.e. \c is interpreted as c; also, as pointed out above, \"
and \# are interpreted as simply " and #, without their special
meanings).
No token may contain the NULL character (ASCII 0x00 / U+0000).
Furthermore, although support is present to create UTF-8 byte
sequences, tokens are not required to be valid UTF-8 sequences. Any
byte sequence not containing the NULL character forms a valid token.
However, there may be further restrictions on allowed characters for a
token in a particular situation (for example, when used as a field
name).
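The tokenisation rules above can be sketched in Python. This is a
minimal illustration, not the reference parser: the function name
tokenize is hypothetical, the numeric escapes (\ooo, \xhh and \u) are
deliberately left out, and the input is assumed to be a single line.

```python
def tokenize(line):
    """Split one format-file line into tokens (simplified sketch).

    Assumption: the numeric escapes (octal, hex, Unicode) are NOT
    handled here; they would be treated as an escaped ordinary character.
    """
    simple = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f", "n": "\n",
              "r": "\r", "t": "\t", "v": "\v", "\\": "\\"}
    whitespace = " \t\v\f\r"
    line = line.rstrip("\n")  # drop the line terminator, if present
    tokens, current = [], []
    in_token = in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            if i + 1 >= len(line):
                raise ValueError("line ends with a backslash")
            nxt = line[i + 1]
            # Named escapes map to control characters; any other escaped
            # character (including ", # and whitespace) stands for itself.
            current.append(simple.get(nxt, nxt))
            in_token = True
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
            in_token = True  # so that "" yields the null-token
        elif in_quotes:
            current.append(ch)
        elif ch == "#":
            break  # unescaped, unquoted hash: the rest is a comment
        elif ch in whitespace:
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if in_quotes:
        raise ValueError("unmatched quotation mark")
    if in_token:
        tokens.append("".join(current))
    return tokens
```

Calling tokenize on a field specification line returns its tokens with
quotation marks stripped and comments discarded; a line with unmatched
quotation marks or a trailing backslash raises ValueError.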
DIRECTIVES
There are eight reserved words, which cannot be used as field names in
the dirfile. Instead, these specify directives. All reserved words
start with an initial forward slash (/), to distinguish them from field
names. Previous versions of the Standards permitted the omission of
the slash. Like the rest of the format specification, directives are
case sensitive.
A number of the directives have fragment scope. A directive with
fragment scope only applies to the fragment in which it is present,
plus any sub-fragments indicated by the /INCLUDE directive, but only if
those sub-fragments don't have their own corresponding directive.
Directives which have fragment scope are:
/ENCODING, /ENDIAN, /FRAMEOFFSET, and /PROTECT. Because of these
scoping rules, different portions of the dirfile may have different
encodings, endiannesses, frame offsets, or protection levels.
If a directive with fragment scope appears more than once in a
fragment, only the last such directive will be honoured, with the
exception that the effect of a directive will not be propagated to sub-
fragments if the directive line appears after the sub-fragment is
included. The scoping rules of the remaining directives are discussed
below.
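One reading of these scoping rules can be sketched in Python for the
/ENCODING directive. The data layout (a dictionary mapping a fragment
name to its directives in file order) and the function name
effective_settings are assumptions made for illustration only.

```python
def effective_settings(fragments, name, inherited=None, result=None):
    """Return {fragment name: effective /ENCODING argument}.

    fragments maps a fragment name to a list of (directive, argument)
    pairs in the order they appear in that fragment (hypothetical layout).
    """
    if result is None:
        result = {}
    current = inherited
    for directive, argument in fragments[name]:
        if directive == "/ENCODING":
            current = argument  # a later directive overrides an earlier one
        elif directive == "/INCLUDE":
            # The sub-fragment inherits the value in force at this point;
            # directives appearing after the include do not reach it.
            effective_settings(fragments, argument, current, result)
    result[name] = current  # only the last directive applies to this fragment
    return result


example = {
    "format": [("/ENCODING", "gzip"), ("/INCLUDE", "a"),
               ("/ENCODING", "none"), ("/INCLUDE", "b")],
    "a": [],
    "b": [("/ENCODING", "bzip2")],
}
settings = effective_settings(example, "format")
```

In this example the fragment a inherits gzip because it is included
before the second /ENCODING line, the primary fragment ends up with
none, and b keeps its own bzip2 setting.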
/ENCODING
The /ENCODING directive specifies the encoding scheme used to
encode binary files in the dirfile. The encoding scheme may be
one of the predefined names listed below, which are described in
more detail in dirfile-encoding(5), or any other site-specific
encoding scheme. The predefined scheme names are:
none The dirfile is unencoded.
bzip2 The dirfile is compressed using the bzip2 compression
scheme.
gzip The dirfile is compressed using the gzip compression
scheme.
lzma The dirfile is compressed using the LZMA compression
scheme.
slim The dirfile is compressed using the slim compression
scheme.
text The dirfile is text encoded.
Implementations should fail gracefully when encountering an
unknown encoding scheme. If no encoding scheme is specified,
behaviour is implementation dependent. Syntax is:
/ENCODING <scheme>
The /ENCODING directive has fragment scope.
/ENDIAN
The /ENDIAN directive specifies the endianness of the raw data
in the database. The assumed endianness of raw data in dirfiles
which omit this directive is implementation dependent. Syntax
is:
/ENDIAN ( big | little ) [ arm ]
where the "arm" token should be included if double precision
floating point data are stored in the ARM middle-endian format.
The /ENDIAN directive has fragment scope.
/FRAMEOFFSET
The /FRAMEOFFSET directive specifies the frame number of the
first frame for which data exists in binary files associated
with RAW fields. Syntax is:
/FRAMEOFFSET <integer>
The /FRAMEOFFSET directive has fragment scope.
/INCLUDE
The /INCLUDE directive specifies another file (called a
fragment) to parse for additional format specification for the
dirfile. The inclusion is treated as if the lines of the
fragment were pasted verbatim in place of the /INCLUDE directive
line. The exception to this is that RAW fields specified in the
fragment are located in the directory containing the fragment
and not in the directory containing the parent fragment, and the
binary file encoding may be different for each fragment. The
fragment may be specified either with an absolute path, or else
a relative path from the current file. Syntax is:
/INCLUDE <file>
The /INCLUDE directive has no scope: it is processed immediately
and has no long-term effect.
/META The /META directive specifies a metafield attached to a
particular parent field. The field metadata may be of any
allowed type except RAW. Metafields are retrieved in exactly
the same way as regular field data, but the field code specified
consists of the parent and metafield names joined with a forward
slash:
<parent-field>/<meta-field>
META fields may not be specified before their parent field has
been. Syntax is:
/META <parent-field> {field specification line}
As an illustration of this concept,
/META pfield meta CONST FLOAT64 3.291882
provides a scalar metadatum called meta with value 3.291882
attached to the field pfield. This particular metafield may be
referred to by the field code "pfield/meta". Note that
different parent fields may have metafields with the same name,
since all references to metafields must include the parent field
name. Metafields may not themselves have further sub-
metafields.
As an alternative to the /META directive, a metafield may be
specified by a standard field specification line, using
<parent-field>/<meta-field>
as the field name. That is, the above example metafield could
have also been specified as:
pfield/meta CONST FLOAT64 3.291882
The /META directive has no scope: it is processed immediately
and has no long-term effect.
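The equivalence between the two ways of specifying a metafield can be
sketched as a rewrite on an already tokenised line. The function name
meta_to_field_line is hypothetical.

```python
def meta_to_field_line(tokens):
    """Rewrite a tokenised /META line as a plain field specification line."""
    if len(tokens) < 4 or tokens[0] != "/META":
        raise ValueError("not a complete /META directive line")
    parent, metafield = tokens[1], tokens[2]
    # The metafield's field code is <parent-field>/<meta-field>.
    return [parent + "/" + metafield] + tokens[3:]
```

Applied to the tokens of the /META example above, this yields the
tokens of the directive-less form.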
/PROTECT
The /PROTECT directive specifies the advisory protection level
of the current fragment and of the RAW fields defined therein.
The protection level indicates whether writing to the fragment,
or to the binary data on disk, is permitted. Syntax is:
/PROTECT <level>
Four advisory protection levels are defined:
none No protection at all: data and metadata may be freely
changed. This is the default, if no /PROTECT directive
is present.
format The dirfile metadata is protected from change, but RAW
data on disk may be modified.
data The RAW data on disk is protected from change, but
metadata may be modified.
all Both metadata and data on disk are protected from change.
The /PROTECT directive has fragment scope.
/REFERENCE
The /REFERENCE directive specifies the name of the field to use
as the dirfile's reference field (see dirfile(5)). If no
/REFERENCE directive is specified, the first RAW field
encountered is used as the reference field. The /REFERENCE
directive must specify a RAW field. Syntax is:
/REFERENCE <field-code>
The /REFERENCE directive has global scope: if multiple
/REFERENCE directives appear in the dirfile metadata, only the
last such will be honoured.
/VERSION
The /VERSION directive specifies the particular version of the
Dirfile Standards to which the dirfile format specification
conforms. This directive should occur before any version
dependent syntax is encountered. As of Standards Version 6, no
such syntax exists, and this directive is provided primarily to
ease forward compatibility. Syntax is:
/VERSION <integer>
The /VERSION directive has immediate scope: its effect is
immediate, and it applies only to metadata below it, including
and propagating downwards to sub-fragments after the directive.
Its effect will also propagate upwards back to the parent
fragment, and affect subsequent metadata.
FIELD SPECIFICATION LINES
Any line which does not start with a reserved word is assumed to be a
field specification line. A field specification line consists of at
least two tokens. The first token is the field name. The second token
is the field type. Subsequent tokens are field parameters. The
meaning and number of these parameters depend on the field type
specified.
Field Names
The first token in a field specification line is the field name. The
field name consists of one or more characters, excluding both ASCII
control characters (the bytes 0x01 through 0x1F), and the characters
& / ; < > | .
which are reserved (but see below for the use of / to specify
metafields). The field name may not be INDEX, which is a special,
implicit field which contains the integer frame index. Field names are
case sensitive.
If the field name beginning a field specification line does contain a /
character, the line is assumed to specify a metafield. See the /META
directive above for further details.
Field Types
There are thirteen field types. Of these, ten are of vector type (BIT,
DIVIDE, LINCOM, LINTERP, MULTIPLY, PHASE, POLYNOM, RAW, RECIP, and
SBIT) and three are of scalar type (CONST, CARRAY, and STRING). The
possible field types are:
BIT The BIT vector field type extracts one or more bits out of an
input vector field as an unsigned number. Syntax is:
<field-name> BIT <input> <first-bit> [<bits>]
which specifies field-name to be the value of bits first-bit
through first-bit+bits-1 of the input vector field input, when
input is converted from its native type to an (endianness
corrected) unsigned 64-bit integer. If bits is omitted, it is
assumed to be 1. Both first-bit and bits may be either literal
numbers, or else the field code of a CONST or CARRAY field type
containing their values. The SBIT field type is a signed
version of this field type.
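The extraction can be sketched in Python as a shift and a mask. Two
assumptions are made here that the text above does not state: bit 0 is
taken to be the least significant bit, and a negative input is viewed
through its two's complement 64-bit pattern. The name extract_bits is
hypothetical.

```python
def extract_bits(value, first_bit, bits=1):
    """Return bits first_bit .. first_bit+bits-1 of value, unsigned."""
    value &= 0xFFFFFFFFFFFFFFFF  # view the input as an unsigned 64-bit integer
    return (value >> first_bit) & ((1 << bits) - 1)
```

For instance, extract_bits(0b10110100, 2, 3) selects bits 2 through 4.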
CARRAY The CARRAY scalar field type is a list of constants fully
specified in the format specification metadata. Syntax is:
<field-name> CARRAY <type> <value0> <value1> <value2> ...
where type may be any supported native data type (see the
description of the RAW field type below), and value0, value1,
&c. are the values of successive elements in the scalar list
interpreted as indicated by type. No limit is placed on the
number of elements in a CARRAY. (Note: despite being
multivalued, this is not considered a vector field since the
elements of the CARRAY are not indexed by frames.)
CONST The CONST scalar field type is a constant fully specified in the
format specification metadata. Syntax is:
<field-name> CONST <type> <value>
where type may be any supported native data type (see the
description of the RAW field type below), and value is the
numerical value of the constant interpreted as indicated by
type.
DIVIDE The DIVIDE vector field type is the quotient of two vector
fields. Syntax is:
<field-name> DIVIDE <field1> <field2>
The derived field will be computed as:
field-name[n] = field1[n] / field2[n2]
with the index n2 computed appropriately for the (potentially
differing) sample rates of the input fields. The resultant
field will have the same sample rate as field1.
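How n2 is "computed appropriately" is not spelled out here. The sketch
below assumes one plausible rule: each sample of field1 is paired with
the sample of field2 at the same relative position within the frame,
rounding down. The parameter names spf1 and spf2 (samples per frame)
and the function name divide are hypothetical.

```python
def divide(field1, spf1, field2, spf2):
    """Element-wise quotient with the output at field1's sample rate."""
    out = []
    for n, numerator in enumerate(field1):
        n2 = n * spf2 // spf1  # assumed index mapping between sample rates
        out.append(numerator / field2[n2])
    return out
```

With field1 sampled twice per frame and field2 once per frame, each
field2 sample is reused for two consecutive field1 samples.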
LINCOM The LINCOM vector field type is the linear combination of one,
two or three input vector fields. Syntax is:
<field-name> LINCOM [<n>] <field1> <a1> <b1> [<field2>
<a2> <b2> [<field3> <a3> <b3>]]
where n, if present, indicates the number of input vector fields
(1, 2, or 3). The derived field will be computed as:
field-name[n] = (a1 * field1[n] + b1) + (a2 * field2[n2]
+ b2) + (a3 * field3[n3] + b3)
with the field2 and field3 terms included only if specified and
the indices n2 and n3 computed appropriately for the
(potentially differing) sample rates of the input fields. The
resultant field will have the same sample rate as field1. Each
supplied co-efficient (a1, b1, a2, &c.) may be either a literal
number, or else the field code of a CONST or CARRAY field type
containing its value.
If n is not specified, the number of fields is determined by
looking at the supplied parameters. Since it is possible to
create a field code which is identical to a literal number, the
third token on the line is assumed to be n if the entire
token can be parsed as a literal number using the rules outlined
in strtod(3). That is, if the field code specifying field1
could be mistaken for a literal number, n must be specified to
prevent ambiguity.
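The rule for deciding whether n is present can be sketched in Python.
Python's float() is used as a rough stand-in for strtod(3); the two do
not accept exactly the same strings (hexadecimal floats, for example),
so this is an approximation. Both function names are hypothetical.

```python
def is_literal_number(token):
    """Rough stand-in for a whole-token strtod(3) parse."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def lincom_input_count(tokens):
    """Return the number of input fields on a tokenised LINCOM line."""
    params = tokens[2:]  # everything after <field-name> and LINCOM
    if params and is_literal_number(params[0]):
        # The third token parses as a number, so it is taken to be n.
        count = float(params[0])
        if count not in (1.0, 2.0, 3.0):
            raise ValueError("n must be 1, 2 or 3")
        n = int(count)
        if len(params) != 1 + 3 * n:
            raise ValueError("parameter count does not match n")
        return n
    # Otherwise n is inferred: each input contributes three tokens.
    if len(params) not in (3, 6, 9):
        raise ValueError("expected 3, 6 or 9 parameters")
    return len(params) // 3
```

A line whose first input field has a purely numeric field code would be
misread by this rule, which is why n must be given in that case.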
LINTERP
The LINTERP vector field type specifies a table look up based on
another vector field. Syntax is:
<field-name> LINTERP <input> <table>
where input is the input vector field for the table lookup, and
table is the path to the lookup table file for the field. If
this path is relative, it is assumed to be relative to the
directory containing the fragment defining this field. The
lookup table file is an ASCII text file with two whitespace
separated columns of x and y values. Values are linearly
interpolated between the points specified in the lookup table.
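A minimal Python sketch of the lookup follows. It assumes the table
holds at least two points with distinct x values and no comment lines,
and it extrapolates linearly from the nearest segment outside the
table; none of these details is specified above. The names parse_table
and linterp are hypothetical.

```python
from bisect import bisect_right


def parse_table(text):
    """Read whitespace-separated x y pairs, sorted by x."""
    points = []
    for line in text.splitlines():
        columns = line.split()
        if len(columns) >= 2:
            points.append((float(columns[0]), float(columns[1])))
    return sorted(points)


def linterp(x, points):
    """Linearly interpolate y at x from sorted (x, y) points."""
    xs = [p[0] for p in points]
    i = bisect_right(xs, x)
    # Clamp to a valid segment; outside the table the end segment is reused.
    i = min(max(i, 1), len(points) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


table = parse_table("0 0\n10 100\n20 300\n")
```

Each sample of the input vector would be passed through linterp to
produce the corresponding sample of the derived field.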
MULTIPLY
The MULTIPLY vector field type is the product of two vector
fields. Syntax is:
<field-name> MULTIPLY <field1> <field2>
The derived field will be computed as:
field-name[n] = field1[n] * field2[n2]
with the index n2 computed appropriately for the (potentially
differing) sample rates of the input fields. The resultant
field will have the same sample rate as field1.
PHASE The PHASE vector field type shifts an input vector field by the
specified number of samples. Syntax is:
<field-name> PHASE <input> <shift>
which specifies field-name to be the input vector field, input,
shifted by shift samples. A positive shift indicates a forward
shift, towards the end-of-field. Results of shifting past the
beginning- or end-of-field are implementation dependent. The
shift parameter may be either a literal number, or else the
field code of a CONST or CARRAY field type containing its
value.
POLYNOM
The POLYNOM vector field type specifies a polynomial function of
a single input vector field. Syntax is:
<field-name> POLYNOM <input> <a0> <a1>
[<a2> [<a3> [<a4> [<a5>]]]]
where <input> is the input field code, and the order of the
computed polynomial is determined by how many co-efficients are
present in the specification. The derived field is computed as:
field-name[n] = a0 + a1 * input[n] + a2 * input[n]**2 +
a3 * input[n]**3 + a4 * input[n]**4 + a5 * input[n]**5
where ** is the exponentiation operator, and the higher order
terms are computed only if the corresponding co-efficients ai
are specified. The coefficients, if specified, may be either
literal numbers, or else the field code of a CONST or CARRAY
field type containing the value.
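The formula above can be sketched in Python. Horner's rule is used here
instead of explicit powers; it evaluates the same polynomial. The
function name polynom and its argument layout are hypothetical.

```python
def polynom(samples, coefficients):
    """Evaluate a0 + a1*x + ... for each sample x (two to six coefficients)."""
    if not 2 <= len(coefficients) <= 6:
        raise ValueError("POLYNOM takes between two and six coefficients")
    out = []
    for x in samples:
        acc = 0
        # Horner's rule: work from the highest-order coefficient down.
        for a in reversed(coefficients):
            acc = acc * x + a
        out.append(acc)
    return out
```

The order of the polynomial follows from the length of the coefficient
list, mirroring how the specification infers it from the token count.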
RECIP The RECIP vector field type computes the reciprocal of a single
input vector field. Syntax is:
<field-name> RECIP <input> <dividend>
where <input> is the input field code and <dividend> is a scalar
quantity. The derived field is computed as:
field-name[n] = dividend / input[n].
The dividend may be either a literal number, or else the field
code of a CONST or CARRAY field type containing the value.
RAW The RAW vector field type specifies raw time streams on disk.
In this case, the field name should correspond to the name of
the file containing the time stream. Syntax is:
<field-name> RAW <type> <sample-rate>
where sample-rate is the number of samples per dirfile frame for
the time stream and type is a token specifying the native data
format type:
UINT8 unsigned 8-bit integer
INT8 signed (two's complement) 8-bit integer
UINT16 unsigned 16-bit integer
INT16 signed (two's complement) 16-bit integer
UINT32 unsigned 32-bit integer
INT32 signed (two's complement) 32-bit integer
UINT64 unsigned 64-bit integer
INT64 signed (two's complement) 64-bit integer
FLOAT32 or FLOAT
IEEE-754 standard 32-bit single precision floating
point number
FLOAT64 or DOUBLE
IEEE-754 standard 64-bit double precision floating
point number
COMPLEX64
a 64-bit complex number consisting of two IEEE-754
standard 32-bit single precision floating point
numbers representing the real and imaginary parts
of the complex number.
COMPLEX128
a 128-bit complex number consisting of two
IEEE-754 standard 64-bit double precision floating
point numbers representing the real and imaginary
parts of the complex number.
For more information on the storage of complex valued data, see
dirfile(5).
For backwards compatibility, implementations should also
recognise the following single character type aliases in use
prior to Standards Version 5:
c UINT8
u UINT16
s INT16
U UINT32
i, S INT32
f FLOAT32
d FLOAT64
Types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 are not
supported before Standards Version 5, so no single character
type aliases exist for these types. Standards Version 8 removed
support for these single character type codes.
The sample-rate parameter may be either a literal number, or
else the name of a CONST or CARRAY field type containing its
value.
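The type names map naturally onto fixed-width binary formats. The
Python sketch below decodes one frame of an unencoded RAW file. It
assumes that samples are packed contiguously with no header, that the
first frame stored is the one given by /FRAMEOFFSET, and it omits the
complex types and the ARM middle-endian format. The names STRUCT_CODES
and decode_frame are hypothetical.

```python
import struct

# Dirfile type name -> struct format character (complex types omitted).
STRUCT_CODES = {
    "UINT8": "B", "INT8": "b", "UINT16": "H", "INT16": "h",
    "UINT32": "I", "INT32": "i", "UINT64": "Q", "INT64": "q",
    "FLOAT32": "f", "FLOAT": "f", "FLOAT64": "d", "DOUBLE": "d",
}


def decode_frame(data, type_name, sample_rate, frame,
                 frame_offset=0, big_endian=False):
    """Return the samples of one frame from the bytes of a RAW file."""
    code = STRUCT_CODES[type_name]
    size = struct.calcsize("<" + code)  # bytes per sample
    # Assumed layout: frame frame_offset starts at byte 0 of the file.
    start = (frame - frame_offset) * sample_rate * size
    if start < 0 or start + sample_rate * size > len(data):
        raise IndexError("frame lies outside the stored data")
    prefix = ">" if big_endian else "<"
    fmt = prefix + str(sample_rate) + code
    return list(struct.unpack_from(fmt, data, start))
```

The big_endian flag plays the role of the /ENDIAN directive, and
frame_offset that of /FRAMEOFFSET, for the fragment defining the field.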
SBIT The SBIT vector field type extracts one or more bits out of an
input vector field as a signed number. Syntax is:
<field-name> SBIT <input> <first-bit> [<bits>]
which specifies field-name to be the value of bits first-bit
through first-bit+bits-1 of the input vector field input, when
input is converted from its native type to a (endianness
corrected) signed 64-bit integer. If bits is omitted, it is
assumed to be 1. Both first-bit and bits may be either literal
numbers, or else the field code of a CONST or CARRAY field type
containing their values. The BIT field type is an unsigned
version of this field type.
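A Python sketch of the signed extraction follows. It assumes, beyond
what is stated above, that bit 0 is the least significant bit and that
the highest extracted bit acts as a two's complement sign bit. The name
extract_signed_bits is hypothetical.

```python
def extract_signed_bits(value, first_bit, bits=1):
    """Return bits first_bit .. first_bit+bits-1 of value, sign-extended."""
    raw = (value >> first_bit) & ((1 << bits) - 1)
    sign = 1 << (bits - 1)  # mask for the highest extracted bit
    # If the sign bit is set, subtract 2**bits to get the negative value.
    return raw - (1 << bits) if raw & sign else raw
```

Under these assumptions a one-bit SBIT field takes the values 0 and -1.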
STRING The STRING scalar field type is a character string fully
specified in the format file metadata. Syntax is:
<field-name> STRING <value>
where value is the string value of the field. Note that value
is a single token. To include whitespace in the string, enclose
value in quotation marks ("), or else escape the whitespace with
the backslash character (\).
Field Parameters
All input vector field parameters should be field codes (see below).
Additionally, some of the numerical field parameters may be either
literal numbers or else the field code of a CONST field containing the
value, or the field code of a CARRAY followed by a left angle bracket
(<), then a non-negative integer used as the CARRAY element index,
then a right angle bracket (>), that is:
field_code<n>
Parameters which allow non-literal values are indicated above. If the
angle brackets and element index are omitted from a CARRAY field code
used as a parameter, the first element in the field (index zero) is
assumed.
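Splitting such a parameter into a field code and an element index can
be sketched in Python. The function name split_scalar_code is
hypothetical, and the sketch assumes the token has already been found
not to be a literal number.

```python
import re

# A field code followed by <n>, where n is a non-negative decimal integer.
_INDEXED = re.compile(r"^(.*)<([0-9]+)>$", re.DOTALL)


def split_scalar_code(token):
    """Return (field_code, element_index) for a scalar parameter token."""
    match = _INDEXED.match(token)
    if match:
        return match.group(1), int(match.group(2))
    return token, 0  # no index given: the first element is assumed
```

For a CONST field the returned index is simply unused.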
Since it is possible to create a field code which is identical to a
literal number, a parameter is assumed to be the field code of a scalar
field only if the entire token cannot be parsed as a literal number
using the rules outlined in strtod(3). For example, a CONST field
whose field code consists solely of digits can never be used as a
parameter in a field specification line.
A literal complex number is specified as two real (floating point)
numbers separated by a semicolon (;) with no intervening whitespace.
So, for example, the tokens
1;0 0;1 4;0 0;5 9.313e2;74.1
represent, respectively, the real unit, the imaginary unit, the real
number four, the imaginary number 5i, and the complex number 931.3 +
74.1i. Because the semicolon character cannot be used in field names,
a complex valued literal can never be mistaken for a field code. This
allows, among other things, the composition of complex valued fields
from purely real input fields. For example, a complex valued field, z,
may be created from a real valued field re, representing the real part
of the complex number, and the real valued field im, representing the
imaginary part of the complex number, with the following LINCOM
specification:
z LINCOM re 1 0 im 0;1 0
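Parsing these literals can be sketched in Python. The function name
parse_literal is hypothetical, and float() only approximates the
strtod(3) rules used by the specification.

```python
def parse_literal(token):
    """Parse a real or complex literal token; raise ValueError otherwise."""
    real, separator, imaginary = token.partition(";")
    if separator:
        # Two real numbers joined by a semicolon form a complex literal.
        return complex(float(real), float(imaginary))
    return float(token)


def compose(re_sample, im_sample):
    """Compute one sample of z for the LINCOM line above."""
    return (parse_literal("1") * re_sample + parse_literal("0")
            + parse_literal("0;1") * im_sample + parse_literal("0"))
```

A token for which parse_literal raises ValueError would then be treated
as a field code rather than a number.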
Field Codes
When specifying the input to a field, either as a scalar parameter, or
as an input vector field to a non-RAW vector field, field codes are
used. A field code is one of:
o a simple field name, indicating a vector or scalar field
o a parent field name, followed by a forward slash, followed by a
metafield name, indicating a metafield. See the description of the
/META directive above for further details.
o either of the above, followed by a period, followed by a
representation suffix, but only if the field or metafield specified
is not a STRING type field.
A representation suffix may be used to extract a real number from
a complex value. The available suffixes and their meanings are:
.a This representation indicates the angle (in radians) between the
positive real axis and the value (i.e. the complex argument).
The argument is in the range [-pi, pi], and a branch cut exists
along the negative real axis. At the branch cut, -pi is
returned if the imaginary part is -0, and pi is returned if the
imaginary part is +0. If z=0, zero is returned.
.i This representation indicates the projection of the value onto
the imaginary axis (i.e. the imaginary part of the number).
.m This representation indicates the modulus of the value (i.e. its
absolute value).
.r This representation indicates the projection of the value onto
the real axis (i.e. the real part of the number).
If the specified field is purely real, the representations are
calculated as if the imaginary part was equal to +0. For example,
given a complex valued vector, z, a vector containing the real part of
z, re_z, could be produced with:
re_z PHASE z.r 0
and similarly for the complex field's imaginary part, argument, and
absolute value. (An example this simple isn't strictly necessary,
however, since z.r could be used wherever re_z would be.)
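The four representations correspond closely to functions in Python's
cmath module, which follows the same branch cut and signed-zero
conventions for the argument. The function name represent is
hypothetical.

```python
import cmath


def represent(value, suffix):
    """Apply a representation suffix (.r, .i, .m or .a) to one sample."""
    z = complex(value)  # a purely real value gets an imaginary part of +0
    if suffix == ".r":
        return z.real
    if suffix == ".i":
        return z.imag
    if suffix == ".m":
        return abs(z)
    if suffix == ".a":
        return cmath.phase(z)  # in [-pi, pi]; phase of zero is zero
    raise ValueError("unknown representation suffix: " + suffix)
```

Because a real input is given an imaginary part of +0, the argument of
a negative real number comes out as pi rather than -pi.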
STANDARDS VERSIONS
This document describes Version 8 of the Dirfile Standards.
Version 8 of the Standards (November 2010) added the DIVIDE, RECIP, and
CARRAY field types, made the forward slash on reserved words mandatory,
and prohibited using the single character data type aliases in the
specification of RAW fields. It also introduced the optional second
(arm) token to the /ENDIAN directive.
Version 7 of the Standards (October 2009) added the SBIT and POLYNOM
field types, and the directive-less method of specifying metafields.
It also introduced the data types COMPLEX128 and COMPLEX64, along with
the notion of representations. Finally, it made the number of fields
parameter for LINCOM optional.
Version 6 of the Standards (October 2008) added the
/ENCODING, /META, /PROTECT, and /REFERENCE directives, and the CONST
and STRING field types. It permitted whitespace in tokens and
introduced the character escape sequences. It allowed CONST fields to
be used as parameters in field specification lines. It also removed
FILEFRAM as an alias for INDEX, and prohibited . but allowed # and \
in field names.
Version 5 of the Standards (August 2008) added VERSION and ENDIAN,
slash demarcation of reserved words, and removed the restriction on
field name length. It introduced the data types INT8, INT64, and
UINT64, the new-style type specifiers, and increased the range of the
BIT field type from 32 to 64 bits. It also prohibited the characters
&;<>\| in field names.
Version 4 of the Standards (October 2006) added the PHASE field type.
Version 3 of the Standards (January 2006) added INCLUDE and increased
the allowed length of a field name from 16 to 50 characters.
Version 2 of the Standards (September 2005) added the MULTIPLY field
type.
Version 1 of the Standards (November 2004) added FRAMEOFFSET and the
optional fourth argument to the BIT field type.
Version 0 of the Standards (before March 2003) refers to the dirfile
standards supported by the getdata(3) library originally introduced
into the kst(1) sources, which contained support for all other features
covered by this document.
AUTHORS
The dirfile specification was developed by C. B. Netterfield
<netterfield@astro.utoronto.ca>.
Since Standards Version 3, the dirfile specification has been
maintained by D. V. Wiebe <getdata@ketiltrout.net>.
SEE ALSO
dirfile(5), dirfile-encoding(5)