Ubuntu Manpage: nccopy - Copy a netCDF file, optionally changing format, compression, or chunking in the output

NAME

       nccopy  -  Copy a netCDF file, optionally changing format, compression, or chunking in the
       output.

SYNOPSIS

       nccopy [-k  kind_name ] [-kind_code] [-d  n ] [-s] [-c   chunkspec  ]  [-u]  [-w]  [-[v|V]
              var1,...]   [-[g|G] grp1,...]  [-m  bufsize ] [-h  chunk_cache ] [-e  cache_elems ]
              [-r] [-F  filterspec ] [-L  n ] [-M  n ]  infile  outfile

DESCRIPTION

       The nccopy utility copies an input netCDF file in  any  supported  format  variant  to  an
       output  netCDF  file,  optionally  converting  the  output to any compatible netCDF format
       variant, compressing the data, or rechunking the data.  For example,  if  built  with  the
       netCDF-3  library,  a  netCDF  classic  file may be copied to a netCDF 64-bit offset file,
       permitting larger variables.  If built with the netCDF-4 library, a  netCDF  classic  file
       may  be  copied to a netCDF-4 file or to a netCDF-4 classic model file as well, permitting
       data compression, efficient schema changes,  larger  variable  sizes,  and  use  of  other
       netCDF-4 features.

       If  no output format is specified, with either -k kind_name or -kind_code, then the output
       will use the same format as the input, unless the input is classic or  64-bit  offset  and
       either  chunking  or  compression  is specified, in which case the output will be netCDF-4
       classic model format.  Attempting some kinds of format conversion will result in an error,
       if  the  conversion is not possible.  For example, an attempt to copy a netCDF-4 file that
       uses features of the enhanced model, such as groups or variable-length strings, to any  of
       the other kinds of netCDF formats that use the classic model will result in an error.
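
       The default-format rule described above can be sketched as a small bash function.
       This only illustrates the rule as stated here and is not nccopy's own code; the
       function name default_output_kind and its two arguments are hypothetical.

```shell
# Hypothetical helper illustrating the default output format rule.
# $1: input kind name (nc3, nc6, nc4, or nc7)
# $2: "yes" if chunking (-c) or compression (-d) was requested, else "no"
default_output_kind() {
  local in_kind=$1 wants_chunk_or_deflate=$2
  case "$in_kind" in
    nc3|nc6)
      if [ "$wants_chunk_or_deflate" = "yes" ]; then
        echo nc7            # classic or 64-bit offset input becomes netCDF-4 classic model
      else
        echo "$in_kind"     # otherwise the input format is kept
      fi
      ;;
    *)
      echo "$in_kind"       # netCDF-4 variants keep their format
      ;;
  esac
}

default_output_kind nc3 no
default_output_kind nc6 yes
default_output_kind nc4 yes
```

       An explicit -k kind_name or -kind_code on the command line bypasses this default.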

       nccopy  also  serves as an example of a generic netCDF-4 program, with its ability to read
       any valid netCDF file and handle nested groups, strings, and user-defined types, including
       arbitrarily  nested  compound types, variable-length types, and data of any valid netCDF-4
       type.

       If DAP support was enabled when nccopy was built, the file name may  specify  a  DAP  URL.
       This may be used to convert data on DAP servers to local netCDF files.

OPTIONS

-k kind_name
Use format name to specify the kind of file to be created and, by inference, the
data model (i.e. netCDF-3 (classic) or netCDF-4 (enhanced)). The possible
arguments are:

'nc3' or 'classic' => netCDF classic format

'nc6' or '64-bit offset' => netCDF 64-bit offset format

'nc4' or 'netCDF-4' => netCDF-4 format (enhanced data model)

'nc7' or 'netCDF-4 classic model' => netCDF-4 classic model format

Note: The old format numbers '1', '2', '3', '4', equivalent to the format names
'nc3', 'nc6', 'nc4', or 'nc7' respectively, are also still accepted but deprecated,
due to easy confusion between format numbers and format names.

-kind_code
Use format numeric code (instead of format name) to specify the kind of file to be
created and, by inference, the data model (i.e. netCDF-3 (classic) versus netCDF-4
(enhanced)). The numeric codes are:

3 => netCDF classic format

6 => netCDF 64-bit offset format

4 => netCDF-4 format (enhanced data model)

7 => netCDF-4 classic model format

The numeric code "7" is used because "7=3+4": it specifies the format that combines the
netCDF-3 data model, for compatibility, with the netCDF-4 storage format, for performance.
Credit is due to NCO for use of these numeric codes instead of the old and confusing
format numbers.
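
The correspondence between kind names and numeric codes can be written out as a lookup.
The sketch below is illustrative only: kind_flag is a hypothetical helper, in.nc and
out.nc are placeholder file names, and it assumes the numeric code is given directly
after the dash (for example -7), as the synopsis suggests.

```shell
# Hypothetical helper: map a -k kind name to the equivalent numeric-code flag.
kind_flag() {
  case "$1" in
    nc3|classic)                   printf '%s\n' -3 ;;
    nc6|'64-bit offset')           printf '%s\n' -6 ;;
    nc4|netCDF-4)                  printf '%s\n' -4 ;;
    nc7|'netCDF-4 classic model')  printf '%s\n' -7 ;;
    *) echo "unknown kind: $1" >&2; return 1 ;;
  esac
}

# Under the assumption above, these two command lines request the same output format.
echo "nccopy -k nc7 in.nc out.nc"
echo "nccopy $(kind_flag nc7) in.nc out.nc"
```

Kind names that contain blanks, such as 'netCDF-4 classic model', need to be quoted so
that the shell passes them to -k as a single argument.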

-d n
For netCDF-4 output, including netCDF-4 classic model, specify deflation level
(level of compression) for variable data output. 0 corresponds to no compression
and 9 to maximum compression, with higher levels of compression requiring
marginally more time to compress or uncompress than lower levels. Specifying a
compression level of 0 (via "-d 0") turns off deflation altogether. Compression
achieved may also depend on output chunking parameters.
If this option is specified for a classic format or 64-bit offset format input
file, it is not necessary to also specify that the output should be netCDF-4
classic model, as that will be the default. If this option is not specified and
the input file has compressed variables, the compression will still be preserved in
the output, using the same chunking as in the input by default.

Note that nccopy requires all variables to be compressed using the same compression
level, but the API has no such restriction. With a program you can customize
compression for each variable independently.

-s For netCDF-4 output, including netCDF-4 classic model, specify shuffling of
variable data bytes before compression or after decompression. Shuffling refers to
interlacing of bytes in a chunk so that the first bytes of all values are
contiguous in storage, followed by all the second bytes, and so on, which often
improves compression. This option is ignored unless a non-zero deflation level is
specified. Using -d0 to specify no deflation on input data that has been
compressed and shuffled turns off both compression and shuffling in the output.
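
To make the byte interlacing concrete, the sketch below shuffles three 4-byte values
written as hex strings. It is a toy illustration of the reordering described above, not
the library's implementation; the sample values and the most-significant-byte-first
notation are assumptions chosen for readability.

```shell
# Three 4-byte values written as hex, most significant byte first
# (illustrative data, not read from a real file).
values="00000001 00000002 00000003"

shuffled=""
for start in 1 3 5 7; do               # hex-digit offset of byte 1, 2, 3, 4
  for v in $values; do
    # Take the same byte position from every value in turn.
    byte=$(printf '%s' "$v" | cut -c "$start-$((start + 1))")
    shuffled="$shuffled$byte"
  done
done

echo "$shuffled"

# A copy that enables both deflation and shuffling might look like:
#   nccopy -d 1 -s in.nc out.nc
```

After shuffling, the bytes that the values have in common form long identical runs, which
is the kind of input a deflate compressor handles well.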

-u Convert any unlimited size dimensions in the input to fixed size dimensions in the
output. This can speed up variable-at-a-time access, but slow down record-at-a-
time access to multiple variables along an unlimited dimension.

-w Keep output in memory (as a diskless netCDF file) until output is closed, at which
time the output file is written to disk. This can greatly speed up operations such as
converting unlimited dimension to fixed size (-u option), chunking, rechunking, or
compressing the input. It requires that available memory is large enough to hold
the output file. This option may provide a larger speedup than careful tuning of
the -m, -h, or -e options, and it's certainly a lot simpler.

-c chunkspec
For netCDF-4 output, including netCDF-4 classic model, specify chunking
(multidimensional tiling) for variable data in the output. This is useful to
specify the units of disk access, compression, or other filters such as checksums.
Changing the chunking in a netCDF file can also greatly speed up access, by choosing
chunk shapes that are appropriate for the most common access patterns.

The chunkspec argument has several forms. The first form is the original,
deprecated form and is a string of comma-separated associations, each specifying a
dimension name, a '/' character, and optionally the corresponding chunk length for
that dimension. No blanks should appear in the chunkspec string, except possibly
escaped blanks that are part of a dimension name. A chunkspec names at least one
dimension, and may omit dimensions which are not to be chunked or for which the
default chunk length is desired. If a dimension name is followed by a '/'
character but no subsequent chunk length, the actual dimension length is assumed.
If copying a classic model file to a netCDF-4 output file and not naming all
dimensions in the chunkspec, unnamed dimensions will also use the actual dimension
length for the chunk length. An example of a chunkspec for variables that use 'm'
and 'n' dimensions might be 'm/100,n/200' to specify 100 by 200 chunks. To see the
chunking resulting from copying with a chunkspec, use the '-s' option of ncdump on
the output file.
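
The per-dimension form can be read as a list of name/length pairs. The bash sketch below
splits such a string and reports how each association would be interpreted according to
the description above. The helper name explain_chunkspec is hypothetical, and the sketch
does not handle escaped blanks or dimension names that themselves contain '/' or ','.

```shell
# Hypothetical helper: print how each association in a per-dimension
# chunkspec would be read. Every association is assumed to contain a '/'.
explain_chunkspec() {
  local spec=$1 assoc dim len
  local IFS=','                        # associations are comma-separated
  for assoc in $spec; do
    dim=${assoc%%/*}                   # text before the first '/'
    len=${assoc#*/}                    # text after the first '/', possibly empty
    echo "$dim: ${len:-full dimension length}"
  done
}

explain_chunkspec 'm/100,n/200'
explain_chunkspec 'time/,lat/40'
```

Quoting the chunkspec on the command line, as in nccopy -c 'm/100,n/200' in.nc out.nc,
keeps the shell from interpreting any special characters in dimension names.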

The chunkspec '/' that omits all dimension names and corresponding chunk lengths
specifies that no chunking is to occur in the output, so it can be used to unchunk
all the chunked variables.

As an I/O optimization, nccopy has a threshold for the minimum size of non-record
variables that get chunked, currently 8192 bytes. The -M flag can be used to
override this value.

Note that nccopy requires variables that share a dimension to also share the chunk
size associated with that dimension, but the programming interface has no such
restriction. If you need to customize chunking for variables independently, you
will need to use the second form of chunkspec. This second form of chunkspec has
this syntax: var:n1,n2,...,nn . This assumes that the variable named "var" has
rank n. The chunking to be applied to each dimension of the variable is specified
by the values of n1 through nn. This second form of chunking specification can be
repeated multiple times to specify the exact chunking for different variables. If
the variable is specified but no chunk sizes are specified (i.e. -c var: ) then
chunking is disabled for that variable. If the same variable is specified more
than once, the second and later specifications are ignored. Also, this second
form, per-variable chunking, takes precedence over any per-dimension chunking
except the bare "/" case.
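
Because the per-variable form is repeated once per variable, it can be convenient to
collect the -c arguments in a bash array. The sketch below only prints the resulting
command line; the variable names temp, precip, and mask, their ranks, and the file names
are hypothetical.

```shell
# Collect one -c option per variable; an empty size list ("mask:") disables
# chunking for that variable, as described above.
args=()
for spec in 'temp:1,180,360' 'precip:1,90,180' 'mask:'; do
  args+=(-c "$spec")
done

# Print the command rather than running it, since in.nc is a placeholder.
echo nccopy "${args[@]}" in.nc out.nc
```

Each variable should appear only once in the list, since second and later specifications
for the same variable are ignored.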

The third form of the chunkspec has the syntax: var:compact or var:contiguous.
This explicitly attempts to set the variable storage type as compact or contiguous,
respectively. These may be overridden if other flags require the variable to be
chunked.

-v var1,...
The output will include data values for the specified variables, in addition to the
declarations of all dimensions, variables, and attributes. One or more variables
must be specified by name in the comma-delimited list following this option. The
list must be a single argument to the command, hence cannot contain unescaped
blanks or other white space characters. The named variables must be valid netCDF
variables in the input-file. A variable within a group in a netCDF-4 file may be
specified with an absolute path name, such as '/GroupA/GroupA2/var'. Use of a
relative path name such as 'var' or 'grp/var' specifies all matching variable names
in the file. The default, without this option, is to include data values for all
variables in the output.

-V var1,...
The output will include only the specified variables, but all dimensions and global
or group attributes. One or more variables must be specified by name in the comma-
delimited list following this option. The list must be a single argument to the
command, hence cannot contain unescaped blanks or other white space characters. The
named variables must be valid netCDF variables in the input-file. A variable within
a group in a netCDF-4 file may be specified with an absolute path name, such as
'/GroupA/GroupA2/var'. Use of a relative path name such as 'var' or 'grp/var'
specifies all matching variable names in the file. The default, without this
option, is to include all variables in the output.

-g grp1,...
The output will include data values only for the specified groups. One or more
groups must be specified by name in the comma-delimited list following this option.
The list must be a single argument to the command. The named groups must be valid
netCDF groups in the input-file. The default, without this option, is to include
data values for all groups in the output.

-G grp1,...
The output will include only the specified groups. One or more groups must be
specified by name in the comma-delimited list following this option. The list must
be a single argument to the command. The named groups must be valid netCDF groups
in the input-file. The default, without this option, is to include all groups in
the output.

-m bufsize
An integer or floating-point number that specifies the size, in bytes, of the copy
buffer used to copy large variables. A suffix of K, M, G, or T multiplies the copy
buffer size by one thousand, million, billion, or trillion, respectively. The
default is 5 Mbytes, but will be increased if necessary to hold at least one chunk
of netCDF-4 chunked variables in the input file. You may want to specify a value
larger than the default for copying large files over high latency networks. Using
the '-w' option may provide better performance, if the output fits in memory.
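
The suffix arithmetic used by -m (and by -h and -e below) can be checked with a small
helper. This is a sketch of the decimal multipliers as described here, not nccopy's own
parser; the name to_bytes is hypothetical and only the upper-case suffixes listed above
are handled.

```shell
# Hypothetical helper: expand a size such as 5M or 4.194304M into a plain number,
# using the decimal multipliers described above.
to_bytes() {
  local size=$1 mult=1
  case "$size" in
    *K) mult=1000 ;;
    *M) mult=1000000 ;;
    *G) mult=1000000000 ;;
    *T) mult=1000000000000 ;;
  esac
  if [ "$mult" -ne 1 ]; then
    size=${size%?}                     # drop the suffix letter
  fi
  # awk handles the floating-point case, e.g. 4.194304M
  awk -v n="$size" -v m="$mult" 'BEGIN { printf "%.0f\n", n * m }'
}

to_bytes 5M
to_bytes 4.194304M
to_bytes 8192
```

With these multipliers, -m 5M and -m 5000000 request the same copy buffer size.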

-h chunk_cache
For netCDF-4 output, including netCDF-4 classic model, an integer or floating-point
number that specifies the size in bytes of chunk cache allocated for each chunked
variable. This is not a property of the file, but merely a performance tuning
parameter for avoiding compressing or decompressing the same data multiple times
while copying and changing chunk shapes. A suffix of K, M, G, or T multiplies the
chunk cache size by one thousand, million, billion, or trillion, respectively. The
default is 4.194304 Mbytes (or whatever was specified for the configure-time
constant CHUNK_CACHE_SIZE when the netCDF library was built). Ideally, the nccopy
utility should accept only one memory buffer size and divide it optimally between a
copy buffer and chunk cache, but no general algorithm for computing the optimum
chunk cache size has been implemented yet. Using the '-w' option may provide better
performance, if the output fits in memory.

-e cache_elems
For netCDF-4 output, including netCDF-4 classic model, specifies number of chunks
that the chunk cache can hold. A suffix of K, M, G, or T multiplies the number of
chunks that can be held in the cache by one thousand, million, billion, or
trillion, respectively. This is not a property of the file, but merely a
performance tuning parameter for avoiding compressing or decompressing the same
data multiple times while copying and changing chunk shapes. The default is 1009
(or whatever was specified for the configure-time constant CHUNK_CACHE_NELEMS when
the netCDF library was built). Ideally, the nccopy utility should determine an
optimum value for this parameter, but no general algorithm for computing the
optimum number of chunk cache elements has been implemented yet.

-r Read netCDF classic or 64-bit offset input file into a diskless netCDF file in
memory before copying. Requires that input file be small enough to fit into
memory. For nccopy, this doesn't seem to provide any significant speedup, so may
not be a useful option.

-L n Set the log level; only usable if nccopy supports netCDF-4 (enhanced).

-M n Set the minimum chunk size; only usable if nccopy supports netCDF-4 (enhanced).

-F filterspec
For netCDF-4 output, including netCDF-4 classic model, specify a filter to apply to
a specified set of variables in the output. As a rule, the filter is a
compression/decompression algorithm with a unique numeric identifier assigned by
the HDF Group (see https://support.hdfgroup.org/services/filters.html).

The filterspec argument has one of these general forms:

fqn1|fqn2...,filterid,param1,param2...paramn
*,filterid,param1,param2...paramn

An fqn (fully qualified name) is the name of a variable prefixed by its containing groups
with the group names separated by forward slash ('/'). An example might be /g1/g2/var.
Alternatively, just the variable name can be given if it is in the root group: e.g. var.
Backslash escapes may be used as needed. A note of warning: the '|' separator is a bash
reserved character, so you will probably need to put the filter spec in some kind of
quotes or otherwise escape it.

The filterid is a positive integer representing the id assigned by the HDF Group
to the filter. Following the id is a sequence of parameters defining the
operation of the filter. Each parameter is a 32-bit unsigned integer.

The -F option may be repeated multiple times with different variable names.
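
The quoting concern mentioned above can be handled by building the filterspec in a
variable and quoting it as a whole. In the sketch below the variable names are
hypothetical, and filter id 307 with a single parameter 9 is an assumption (307 is
believed to be the id registered for bzip2, which would also require the matching filter
plugin to be installed); the command is printed rather than run.

```shell
# Variables to filter, given as fully qualified names (hypothetical).
vars=('/g1/g2/var' 'temp')

# Join the names with '|' in a subshell so IFS is not changed for the rest of
# the script, then append the filter id and its parameters.
filterspec="$(IFS='|'; echo "${vars[*]}"),307,9"

# Single quotes keep the shell from treating '|' as a pipe.
echo "nccopy -F '$filterspec' in.nc out.nc"
```

Without the quotes, the shell would split the command at '|' and try to run the rest of
the filterspec as a separate program.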

EXAMPLES

       Make  a  copy of foo1.nc, a netCDF file of any type, to foo2.nc, a netCDF file of the same
       type:

              nccopy foo1.nc foo2.nc

       Note that the above copy will not be as fast as using cp or another simple copy utility,
       because  the  file is copied using only the netCDF API.  If the input file has extra bytes
       after the end of the netCDF  data,  those  will  not  be  copied,  because  they  are  not
       accessible  through the netCDF interface.  If the original file was generated in "No fill"
       mode so that fill values are not stored for padding for data alignment,  the  output  file
       may have different padding bytes.

       Convert a netCDF-4 classic model file, compressed.nc, that uses compression, to a netCDF-3
       file classic.nc:

              nccopy -k classic compressed.nc classic.nc

       Note that 'nc3' could be used instead of 'classic'.

       Download the variable 'time_bnds' and its associated attributes from an OPeNDAP server and
       copy the result to a netCDF file named 'tb.nc':

              nccopy 'http://test.opendap.org/opendap/data/nc/sst.mnmean.nc.gz?time_bnds' tb.nc

       Note  that URLs that name specific variables as command-line arguments should generally be
       quoted, to avoid the shell interpreting special characters such as '?'.

       Compress all the variables in the input file foo.nc, a netCDF file of  any  type,  to  the
       output file bar.nc:

              nccopy -d1 foo.nc bar.nc

       If  foo.nc  was  a classic or 64-bit offset netCDF file, bar.nc will be a netCDF-4 classic
       model netCDF file, because the classic and 64-bit offset  format  variants  don't  support
       compression.   If  foo.nc was a netCDF-4 file with some variables compressed using various
       deflation levels, the output will also be a netCDF-4 file of the same type,  but  all  the
       variables, including any uncompressed variables in the input, will now use deflation level
       1.

       Assume the input data includes gridded variables that use time, lat, lon dimensions,  with
       1000  times  by 1000 latitudes by 1000 longitudes, and that the time dimension varies most
       slowly.  Also assume that users want quick access to data at all times for a small set  of
       lat-lon points.  Accessing data for 1000 times would typically require accessing 1000 disk
       blocks, which may be slow.

       Reorganizing the data into chunks on disk that have all the time in each chunk for  a  few
       lat  and  lon  coordinates  would  greatly speed up such access.  To chunk the data in the
       input file slow.nc, a netCDF file of any type, to the output file fast.nc, you could use:

              nccopy -c time/1000,lat/40,lon/40 slow.nc fast.nc

       to specify data chunks of 1000 times, 40 latitudes, and 40 longitudes.  If you had  enough
       memory  to  contain  the  output  file,  you  could  speed  up  the  rechunking  operation
       significantly by creating the output in memory before writing it to disk on  close  (using
       the -w flag):

              nccopy -w -c time/1000,lat/40,lon/40 slow.nc fast.nc

       Alternatively, one could write this using the variable-specific chunking
       specification, assuming that time, lat, and lon are variables:

              nccopy -c time:1000 -c lat:40 -c lon:40 slow.nc fast.nc
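
       The effect of the 1000 by 40 by 40 chunk shape can be estimated with shell
       arithmetic. The sketch below counts the chunks per variable and the size of one
       chunk; the 4-byte value size is an assumption, since the example does not state
       the variable type.

```shell
ntime=1000; nlat=1000; nlon=1000       # dimension lengths from the example
ctime=1000; clat=40;   clon=40         # chunk lengths from the chunkspec
bytes_per_value=4                      # assumption: 4-byte values

# Round up when a chunk length does not divide the dimension evenly.
nchunks=$(( ((ntime + ctime - 1) / ctime) * ((nlat + clat - 1) / clat) * ((nlon + clon - 1) / clon) ))
chunk_bytes=$(( ctime * clat * clon * bytes_per_value ))

echo "chunks per variable: $nchunks"
echo "bytes per chunk:     $chunk_bytes"
```

       With this shape, all 1000 times for a given lat-lon point fall inside a single
       chunk. Under the 4-byte assumption one chunk is larger than the default chunk
       cache described under -h, so experimenting with a larger -h value, or with -w if
       memory allows, may be worthwhile.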

Chunking Rules

The complete set of chunking rules is captured here. As a rough summary, these rules
preserve all chunking properties from the input file. These rules apply only when the
selected output format supports chunking, i.e. for the netCDF-4 variants.

The variable-specific chunking specification should be obvious and translates directly to
the corresponding "nc_def_var_chunking" API call.

The original per-dimension chunking specification requires some interpretation by nccopy.
The following rules are applied in the given order independently for each variable to be
copied from input to output. The rules are written assuming we are trying to determine the
chunking for a given output variable Vout that comes from an input variable Vin.

1. If there is no '-c' option that applies to a variable and the corresponding input
variable is contiguous or the input is some netCDF-3 variant, then let the netCDF-C
library make all chunking decisions.

2. For each dimension of Vout explicitly specified on the command line (using the '-c'
option), apply the chunking value for that dimension regardless of input format or
input properties.

3. For dimensions of Vout not named on the command line in a '-c' option, preserve
chunk sizes from the corresponding input variable, if it is chunked.

4. If Vin is contiguous, and none of its dimensions are named on the command line, and
chunking is not mandated by other options, then make Vout be contiguous.

5. If the input variable is contiguous (or is some netCDF-3 variant) and there are no
options requiring chunking, or the '/' special case for the '-c' option is
specified, then the output variable Vout is marked as contiguous.
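
For a single dimension of a variable that is going to be chunked, rules 2 and 3 amount to
a simple precedence order. The sketch below covers only that part, plus the fallback to
the actual dimension length mentioned in the description of the '-c' option; whether the
variable is chunked at all (rules 1, 4, and 5) is outside its scope. The function name
and arguments are hypothetical, and this is one reading of the rules rather than
nccopy's code.

```shell
# Hypothetical helper for one dimension of an output variable Vout.
# $1: chunk length named for this dimension with -c, or empty if not named
# $2: chunk length of this dimension in a chunked Vin, or empty if Vin is not chunked
# $3: actual length of the dimension
out_chunk_len() {
  local cli_len=$1 in_len=$2 dim_len=$3
  if [ -n "$cli_len" ]; then
    echo "$cli_len"                    # rule 2: the command line takes priority
  elif [ -n "$in_len" ]; then
    echo "$in_len"                     # rule 3: preserve the input chunk length
  else
    echo "$dim_len"                    # fallback: use the actual dimension length
  fi
}

out_chunk_len 40 100 1000
out_chunk_len "" 100 1000
out_chunk_len "" "" 1000
```

The chunking actually produced by a copy can be inspected with the '-s' option of ncdump
on the output file.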