Ubuntu Manpage: cfa - create aggregated CF datasets

Provided by: python-cf_1.3.2+dfsg1-4_amd64

NAME

       cfa - create aggregated CF datasets

SYNOPSIS

       cfa [-d dir] [-f format] [-h] [-i] [-n] [-o file] [-u] [-v] [-x] [OPTIONS] INPUTS

DESCRIPTION

The cfa tool creates and writes to disk the CF fields contained in the input files given by
INPUTS (which may include directories if the --recursive option is set).

Accepts CF-netCDF and CFA-netCDF files (or URLs if DAP access is enabled), Met Office (UK)
PP files and Met Office (UK) fields files as input. Multiple input files in a mixture of
formats may be given and normal UNIX file globbing rules apply.

Output files are in CF-netCDF or CFA-netCDF format (see the -f option). Both output types
are available in netCDF3 and netCDF4 formats. Note that the netCDF3 formats are generally
slower to write than the netCDF4 formats, by several orders of magnitude if files with
many data variables are involved. However, not all software can read netCDF4, so it is
advisable to check before writing in this format.

By default the contents of each input file are aggregated (i.e. combined) into as few
multi-dimensional CF fields as possible. Unaggregatable fields in the input files may be
omitted from the output (see the -x option). Information on which fields are
unaggregatable, and why, may be displayed (see the --info option). All aggregation may be
turned off with the -n option, in which case all input fields are output without
modification.

See the AGGREGATION section for details on the aggregation process and unaggregatable
fields.

By default one output file is created per input file. In this case there is no inter-file
aggregation and the contents of each file are aggregated independently of the others.
Output file names are created by removing the suffix .pp, .nc or .nca, if there is one,
from each input file name and then adding a new suffix of .nc or .nca for CF-netCDF and
CFA-netCDF output formats respectively. If the -d option is set then all output files will
be written to the specified directory, otherwise each output file will be written to the
same directory as its input file.
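
The naming rule above can be sketched in Python. This is only an illustration of the rule as described, not cfa's own code: the helper name output_name and its arguments are hypothetical, and the treatment of an unrecognised suffix (it is kept) is an assumption.

```python
import os

# Suffixes that the text says are removed from input file names
KNOWN_SUFFIXES = (".pp", ".nc", ".nca")

def output_name(input_path, fmt="NETCDF3_CLASSIC", directory=None):
    """Hypothetical helper mirroring the naming rule; not cfa's own code."""
    head, tail = os.path.split(input_path)
    root, ext = os.path.splitext(tail)
    if ext not in KNOWN_SUFFIXES:
        # Assumption: an unrecognised suffix is kept as part of the name
        root = tail
    # CFA-netCDF output formats get .nca, CF-netCDF formats get .nc
    new_ext = ".nca" if fmt in ("CFA3", "CFA4") else ".nc"
    # With -d all outputs go to one directory, otherwise next to the input
    out_dir = head if directory is None else directory
    return os.path.join(out_dir, root + new_ext)

print(output_name("data/run1.pp"))
print(output_name("data/run1.nc", fmt="CFA4", directory="out"))
```

Under this rule an input that is already called run1.nc, written as CF-netCDF without -d, maps onto its own name, which is the kind of name clash reported as an error.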

Alternatively, all of the input files may be treated collectively as a single CF dataset
and written to a single output file (see the -o option). In this case aggregation is
attempted within and between the input files.

An error occurs if an output file has the same full name as any of the input files or any
other output file.

AGGREGATION

       Aggregation of input fields into as few multi-dimensional CF fields as possible is carried
       out according to the aggregation rules documented in CF ticket #78
       (http://kitt.llnl.gov/trac/ticket/78). For each input field, the aggregation process
       creates a structural signature which is essentially a subset of the metadata of the field,
       including coordinate metadata and other domain information, but which contains no data
       values. The structural signature accounts for the following standard CF properties:

              add_offset, calendar, cell_methods, _FillValue, flag_masks, flag_meanings,
              flag_values, missing_value, scale_factor, standard_error_multiplier, standard_name,
              units, valid_max, valid_min, valid_range

       Aggregation is then attempted on each group of fields with the same, well defined
       structural signature, and will succeed where the coordinate data values imply a safe
       combination into a single dataset.

       Not all fields are aggregatable. Unaggregatable fields are those without a well defined
       structural signature; or those with the same structural signature when at least two of
       them 1) can't be unambiguously distinguished by coordinates or other domain information or
       2) contain coordinate reference fields or ancillary variable fields which themselves can't
       be unambiguously aggregated.
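
       The grouping step can be pictured with a much reduced Python sketch. It assumes that a
       field is a plain dictionary of CF properties and that the signature consists of the
       listed properties only; the real signature also covers coordinate metadata and other
       domain information. The names signature and group_by_signature are hypothetical and are
       not part of cf-python.

```python
from collections import defaultdict

# The standard CF properties listed above as part of the structural signature
SIGNATURE_PROPERTIES = (
    "add_offset", "calendar", "cell_methods", "_FillValue", "flag_masks",
    "flag_meanings", "flag_values", "missing_value", "scale_factor",
    "standard_error_multiplier", "standard_name", "units", "valid_max",
    "valid_min", "valid_range",
)

def signature(properties):
    """Reduced, property-only signature of one field (a plain dict here)."""
    # repr() keeps list-valued properties such as flag_values hashable
    return tuple(repr(properties.get(name)) for name in SIGNATURE_PROPERTIES)

def group_by_signature(fields):
    """Collect fields with identical signatures into candidate groups."""
    groups = defaultdict(list)
    for field in fields:
        groups[signature(field)].append(field)
    return list(groups.values())

fields = [
    {"standard_name": "air_temperature", "units": "K", "long_name": "run A"},
    {"standard_name": "air_temperature", "units": "K", "long_name": "run B"},
    {"standard_name": "air_pressure", "units": "Pa"},
]
groups = group_by_signature(fields)
print([len(group) for group in groups])
```

       In this sketch long_name is not part of the signature, so the two air_temperature fields
       fall into one group and are candidates for aggregation with each other.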

EXAMPLES

       Create a new netCDF3 classic file containing the aggregatable fields in all of the input
       files:

              cfa -o newfile.nc *.nc

       Create, in an existing directory and overwriting any existing files, new netCDF3 classic
       files containing the aggregatable fields in each input file:

              cfa -d directory --overwrite *.pp

       Create a new netCDF4 file containing all fields in all of the input files:

              cfa -f NETCDF4 -o newfile.nc *.nc

       Create a new CFA-netCDF4 file containing all fields in all of the input files and allow
       long names or netCDF variable names to identify fields and their components:

              cfa -i -f CFA4 -o newfile.nc *.nc

OPTIONS

--axis=property
Aggregation configuration: Create a new axis for each input field which has the given
property. If an input field has the property then, prior to aggregation, a new axis
is created with an auxiliary coordinate whose data array is the property's value.
This allows for the possibility of aggregation along the new axis. The property
itself is deleted from that field. No axis is created for input fields which do not
have the specified property.

Multiple axes may be created by specifying more than one --axis option.

For example, if you wish to aggregate an ensemble of model experiments that are
distinguished by the source property, you can use --axis=source to create an
ensemble axis which has an auxiliary coordinate variable containing the source
property values.

--cfa_base=[value]
For output CFA-netCDF files only. File names referenced by an output CFA-netCDF
file have relative, as opposed to absolute, paths or URL bases. This may be useful
when relocating a CFA-netCDF file together with the datasets referenced by it.

If set with no value (--cfa_base=) or the value is empty then file names are given
relative to the directory or URL base containing the output CFA-netCDF file. If set
with a non-empty value then file names are given relative to the directory or URL
base described by the value.

By default, file names within CFA-netCDF files are stored with absolute paths.
Ignored for output files of any other format.

--compress=N
Regulate the speed and efficiency of compression. Must be an integer between 0 and
9. By default N is 0, meaning no compression; 1 is the fastest, but has the lowest
compression ratio; 9 is the slowest, but has the best compression ratio.

--contiguous
Aggregation configuration: Requires that aggregated fields have adjacent dimension
coordinate cells which partially overlap or share common boundary values. Ignored
if the dimension coordinates do not have bounds.

-d dir, --directory=dir
Specify the output directory for all output files.

--double
Write 32-bit floats as 64-bit floats and 32-bit integers as 64-bit integers. By
default, input data types are preserved.

--equal=property
Aggregation configuration: Require that an input field may only be aggregated with
other fields if they all have the given CF property (standard or non-standard) with
equal values. Ignored for any input field which does not have this property, or if
the property is already accounted for in the structural signature.

Supersedes the behaviour for the given property that may be implied by the
--exist_all option.

Multiple properties may be set by specifying more than one --equal option.

--equal_all
Aggregation configuration: Require that an input field may only be aggregated with
other fields that have the same set of CF properties (excluding those already
accounted for in the structural signature) with equal sets of values.

The behaviour for individual properties may be overridden by the --exist and --ignore
options.

For example, to insist that a group of aggregated input fields must all have the
same CF properties (other than those accounted for in the structural signature)
with matching values, but allowing the long_name properties to have unequal values,
you can use --equal_all --exist=long_name

--exist=property
Aggregation configuration: Require that an input field may only be aggregated with
other fields if they all have the given CF property (standard or non-standard), but
not requiring the values to be the same. Ignored for any input field which does not
have this property, or if the property is already accounted for in the structural
signature.

Supersedes the behaviour for the given property that may be implied by the
--equal_all option.

Multiple properties may be set by specifying more than one --exist option.

--exist_all
Aggregation configuration: Require that an input field may only be aggregated with
other fields that have the same set of CF properties (excluding those already
accounted for in the structural signature), but not requiring the values to be the
same.

The behaviour for individual properties may be overridden by the --equal and --ignore
options.

For example, to insist that a group of aggregated input fields must all have the
same CF properties (other than those accounted for in the structural signature),
regardless of their values, but also insisting that the long_name properties have
equal values, you can use --exist_all --equal=long_name

-f format, --format=format
Set the format of the output file(s). Valid choices are NETCDF3_CLASSIC,
NETCDF3_64BIT, NETCDF4 and NETCDF4_CLASSIC for outputting CF-netCDF files in those
netCDF formats and CFA3 or CFA4 for outputting CFA-netCDF files in NETCDF3_CLASSIC
or NETCDF4 formats respectively. By default, NETCDF3_CLASSIC is assumed.

Note that the netCDF3 formats are generally slower to write than the netCDF4
formats, by several orders of magnitude if files with many data variables are
involved. However, not all software can read netCDF4, so it is advisable to check
before writing in this format.

-h, --help
Display this man page.

-i, --relaxed_identities
Aggregation configuration: In the absence of standard names, allow fields and their
components (such as coordinates) to be identified by their long_name CF properties
or else their netCDF file variable names.

--ignore=property
Aggregation configuration: An input field may be aggregated with other fields
regardless of whether or not they have the given CF property (standard or non-
standard) and regardless of its values. Ignored for any input field which does not
have this property, or if the property is already accounted for in the structural
signature.

This is the default behaviour in the absence of all of the --exist, --equal,
--exist_all and --equal_all options, and supersedes the behaviour for the given
property that may be implied if any of these options are set.

Multiple properties may be set by specifying more than one --ignore option.

For example, to insist that a group of aggregated input fields must all have the
same CF properties (other than those accounted for in the structural signature)
with the same values, but with no restrictions on the existence or values of the
long_name property you can use --equal_all --ignore=long_name
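
The way the per-property options supersede --equal_all and --exist_all can be sketched as a small lookup. The function rule_for is hypothetical and is not part of cfa. The order in which conflicting per-property options are checked, and the preference for --equal_all when both of the _all options are set, are assumptions made only for this illustration.

```python
def rule_for(prop, equal=(), exist=(), ignore=(),
             equal_all=False, exist_all=False):
    """Return 'equal', 'exist' or 'ignore' for one CF property."""
    # Per-property options supersede whatever the *_all options imply
    if prop in ignore:
        return "ignore"
    if prop in equal:
        return "equal"
    if prop in exist:
        return "exist"
    # No per-property option given: fall back to the *_all options
    if equal_all:
        return "equal"
    if exist_all:
        return "exist"
    # Default behaviour when none of the options are set
    return "ignore"

# Corresponds to: --equal_all --ignore=long_name
print(rule_for("long_name", ignore=["long_name"], equal_all=True))
print(rule_for("comment", ignore=["long_name"], equal_all=True))
```

Properties that are already part of the structural signature are outside the scope of this sketch, since the options are ignored for them.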

--fletcher32
Activate the Fletcher-32 HDF5 checksum algorithm to detect compression errors.
Ignored if there is no compression (see the --compress option).

--follow_symlinks
In combination with --recursive, also search for files in directories which are
reached via symbolic links. Files specified by the INPUTS which are symbolic links
are always followed. Note that setting --recursive --follow_symlinks can lead to
infinite recursion if a symbolic link points to a parent directory of itself.

--ignore_read_error
Ignore, without failing, any input file which causes an error whilst being read, as
would be the case for an empty file, unknown file format, etc. By default an error
occurs in this case.

--info=N
Aggregation configuration: Print information about the aggregation process. If N is
0 then no information is displayed. If N is 1 or more then display information on
which fields are unaggregatable, and why. If N is 2 or more then also display the
structural signature of each field and, when there is more than one field with the
same structural signature, their canonical first and last coordinate values. If N
is 3 or more then also display the complete aggregation metadata of each field.

By default N is 0.

--least_sig_digit=N
Truncate the input field data arrays. For a positive integer N the precision that
is retained in the compressed data is '10 to the power -N'. For example, if N is 2
then a precision of 0.01 is retained. In conjunction with compression this produces
'lossy', but significantly more efficient compression (see the --compress option).
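
One way to retain a precision of 10 to the power -N is to round each value onto a power-of-two grid that is at least as fine as that precision. The sketch below assumes this scheme purely for illustration; it is not confirmed to be what cfa does internally, and the function name quantize is hypothetical.

```python
import math

def quantize(value, least_sig_digit):
    """Round value so that a precision of 10**-least_sig_digit is retained."""
    # Smallest number of bits whose grid spacing is no coarser than 10**-N
    bits = math.ceil(math.log2(10 ** least_sig_digit))
    scale = 2 ** bits
    # Snap the value onto the grid of spacing 1/scale
    return round(value * scale) / scale

original = 3.14159
truncated = quantize(original, 2)
print(truncated, abs(truncated - original))
```

Values quantized like this share many trailing zero bits, which is why the subsequent compression tends to be more effective.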

--ncvar_identities
Aggregation configuration: Force fields and their components (such as coordinates)
to be identified by their netCDF file variable names.

-n, --no_aggregation
Aggregation configuration: Do not aggregate fields. Writes the input fields as they
exist in the input files.

--no_overlap
Aggregation configuration: Requires that aggregated fields have adjacent dimension
coordinate cells which do not overlap (but they may share common boundary values).
Ignored if the dimension coordinates do not have bounds.
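
The conditions imposed by --contiguous and --no_overlap can be compared with a short sketch. It assumes that each cell is a (lower, upper) pair of bounds and that the cells are already sorted in increasing order; the function names are hypothetical and are not part of cfa.

```python
def is_contiguous(bounds):
    """Adjacent cells overlap or share a boundary value (--contiguous)."""
    return all(upper >= next_lower
               for (_, upper), (next_lower, _) in zip(bounds, bounds[1:]))

def has_no_overlap(bounds):
    """Adjacent cells do not overlap, but may touch (--no_overlap)."""
    return all(upper <= next_lower
               for (_, upper), (next_lower, _) in zip(bounds, bounds[1:]))

touching = [(0, 10), (10, 20)]
gapped = [(0, 10), (12, 20)]
overlapping = [(0, 12), (10, 20)]
for cells in (touching, gapped, overlapping):
    print(cells, is_contiguous(cells), has_no_overlap(cells))
```

In this sketch only cells that share common boundary values satisfy both conditions at once.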

--no_shuffle
Turn off the HDF5 shuffle filter, which de-interlaces a block of data before
compression by reordering its bytes: the first byte of each of a variable's values
in the chunk is stored contiguously, followed by all the second bytes, and so on.
By default the filter is applied because, if the data array values are not all
wildly different, it can make the data more easily compressible. Ignored if there
is no compression (see the --compress option).
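
The byte reordering performed by a shuffle filter can be illustrated in pure Python. This is a sketch of the idea only, not the HDF5 implementation, and the function names shuffle and unshuffle are hypothetical.

```python
import struct

def shuffle(raw, itemsize):
    """Store byte 0 of every value, then byte 1 of every value, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))

def unshuffle(shuffled, itemsize):
    """Invert shuffle(), restoring the original byte order."""
    count = len(shuffled) // itemsize
    # Byte b of value v was moved to position b * count + v
    return bytes(shuffled[(i % itemsize) * count + i // itemsize]
                 for i in range(len(shuffled)))

# Four little-endian 32-bit integers with similar magnitudes
raw = struct.pack("<4i", 1, 2, 3, 4)
shuffled = shuffle(raw, 4)
print(raw.hex())
print(shuffled.hex())
```

After shuffling, the bytes that are zero in every value form one long run, which is the kind of pattern that a general-purpose compressor handles well.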

-o file, --outfile=file
Treat all input files collectively as a single CF dataset. In this case aggregation
is attempted within and between the input files and all outputs are written to the
specified file.

--overwrite
Allow pre-existing output files to be overwritten.

--promote=component
Promote field components to independent top-level fields. If component is ancillary
then ancillary data fields are promoted. If component is auxiliary then auxiliary
coordinate variables are promoted. If component is measure then cell measure
variables are promoted. If component is reference then fields pointed to from
formula_terms attributes are promoted. If component is field then all component
fields are promoted.

Multiple component types may be promoted by specifying more than one --promote
option.

For example, to promote ancillary data fields and cell measure variables to
independent, top-level fields you can use --promote=ancillary --promote=measure

--recursive
Allow directories to be specified by the INPUTS and recursively search the
directories for actual files to read. Set the --ignore_read_error option to bypass
any unreadable files and the --follow_symlinks option to allow directories to be
symbolic links.

--reference_datetime=datetime
Set the reference date-time of time coordinate units to an ISO 8601-like date-time.
Changing the reference date-time does not change the absolute date-times of the
coordinates. Ignored for non-reference date-time coordinates. Some examples of
valid date-times: 1830-12-1, "1830-12-09 2:34:45Z".
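
That a change of reference date-time leaves the absolute date-times untouched can be checked with the Python standard library. The sketch assumes units of days and the standard (Gregorian) calendar, whereas CF also allows other calendars; the function name rebase is hypothetical.

```python
from datetime import datetime, timedelta

def rebase(values, old_reference, new_reference):
    """Re-express 'days since old_reference' as 'days since new_reference'."""
    shift = (old_reference - new_reference) / timedelta(days=1)
    return [value + shift for value in values]

old_reference = datetime(1970, 1, 1)
new_reference = datetime(1830, 12, 1)
old_values = [0.0, 1.5]
new_values = rebase(old_values, old_reference, new_reference)

# Same absolute date-times, different numbers relative to the new reference
for old, new in zip(old_values, new_values):
    print(old_reference + timedelta(days=old),
          new_reference + timedelta(days=new))
```

Only the stored numbers and the units string change; each pair of printed date-times is identical.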

--respect_valid
Aggregation configuration: Take into account the CF properties valid_max, valid_min
and valid_range during aggregation. By default they are ignored for the purposes of
aggregation and deleted from any aggregated output CF fields.

--shared_nc_domain
Aggregation configuration: Match axes between a field and its contained ancillary
variable and coordinate reference fields via their netCDF dimension names and not
via their domains.

--single
Write 64-bit floats as 32-bit floats and 64-bit integers as 32-bit integers. By
default, input data types are preserved.

--squeeze
Remove size 1 axes from the output field data arrays. If a size 1 axis has any
one-dimensional coordinates then these are converted to CF scalar coordinates.

-u, --relaxed_units
Aggregation configuration: Assume that fields or their components (such as
coordinates) with the same standard name (or other identifiers, see the -i option)
but missing units all have equivalent (but unspecified) units, so that aggregation
may occur. This is the default for Met Office (UK) PP files and Met Office (UK)
fields files, but not for other formats.

--unsqueeze
Include size 1 axes in the output field data arrays. If a size 1 axis has any CF
scalar coordinates then these are converted to one-dimensional coordinates.

--um_version=version
For Met Office (UK) PP files and Met Office (UK) fields files only, the Unified
Model (UM) version to be used when decoding the header. Valid versions are, for
example, 4.2, 6.6.3 and 8.2. The default version is 4.5. In general, the given
version is ignored if it can be inferred from the header (which is usually the case
for files created by the UM at versions 5.3 and later). The exception to this is
when the given version has a third element (such as the 3 in 6.6.3), in which case
any version in the header is ignored. This option is ignored for input files which
are not Met Office (UK) PP files or Met Office (UK) fields files.
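
The rule for choosing between the given version and the version in the header can be written as a small function. This is a sketch of the rule as described above, not cfa's own code; the name effective_um_version is hypothetical, and header_version stands for whatever can be inferred from the file header (None if nothing can).

```python
def effective_um_version(given="4.5", header_version=None):
    """Pick the UM version used to decode a PP or fields file header."""
    # A third element (such as the 3 in 6.6.3) forces the given version
    if len(given.split(".")) >= 3:
        return given
    # Otherwise a version inferred from the header takes precedence
    if header_version is not None:
        return header_version
    return given

print(effective_um_version("6.6.3", header_version="8.2"))
print(effective_um_version("4.5", header_version="8.2"))
print(effective_um_version("4.5"))
```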

--unlimited=axis
Create an unlimited dimension (a dimension that can be appended to). A dimension is
identified by either a standard name; one of T, Z, Y, X denoting time, height or
horizontal axes (as defined by the CF conventions); or the value of an arbitrary CF
property preceded by the property name and a colon.

Multiple unlimited axes may be defined by specifying more than one --unlimited
option. Note, however, that only netCDF4 formats support multiple unlimited
dimensions. For example, to set the time and Z dimensions to be unlimited you could
use --unlimited=time --unlimited=Z

An example of defining an axis by an arbitrary CF property could be
--unlimited=long_name:pseudo_level
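
The three kinds of axis identifier can be told apart with a short sketch. The function parse_axis_identifier and the tuples it returns are hypothetical; how cfa itself interprets the value is not shown here.

```python
def parse_axis_identifier(value):
    """Classify an --unlimited value as an axis, a property or a standard name."""
    if value in ("T", "Z", "Y", "X"):
        return ("axis", value)
    if ":" in value:
        # Property name, then a colon, then the property value
        name, _, property_value = value.partition(":")
        return ("property", name, property_value)
    return ("standard_name", value)

for value in ("time", "Z", "long_name:pseudo_level"):
    print(value, parse_axis_identifier(value))
```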

-v, --verbose
Display a one-line summary of each output CF field.

-x, --exclude
Aggregation configuration: Omit unaggregatable fields from the output. Ignored if
the -n option is set. See the AGGREGATION section for the definition of an
unaggregatable field.

LIBRARY

       cf-python library version 1.3.1

BUGS

       Reports of bugs are welcome at http://cfpython.bitbucket.org/

LICENSE

       Open Source Initiative MIT License

AUTHOR

       David Hassell