Provided by: gdal-bin_3.12.0+dfsg-1_amd64

NAME

       gdal-vector-partition - Partition a vector dataset into multiple files

       Added in version 3.12.

SYNOPSIS

          Usage: gdal vector partition [OPTIONS] <INPUT> <OUTPUT>

          Partition a vector dataset into multiple files.

          Positional arguments:
            -i, --input <INPUT>                                  Input vector datasets [required]
            -o, --output <OUTPUT>                                Output directory [required]

          Common Options:
            -h, --help                                           Display help message and exit
            --json-usage                                         Display usage as JSON document and exit
            --config <KEY>=<VALUE>                               Configuration option [may be repeated]
            -q, --quiet                                          Quiet mode (no progress bar)

          Options:
            --overwrite                                          Whether overwriting existing output is allowed
                                                                 Mutually exclusive with --append
            --append                                             Whether appending to existing layer is allowed
                                                                 Mutually exclusive with --overwrite
            -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format
            --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]
            --lco, --layer-creation-option <KEY>=<VALUE>         Layer creation option [may be repeated]
            --field <FIELD>                                      Field(s) on which to partition [may be repeated] [required]
            --scheme <SCHEME>                                    Partitioning scheme. SCHEME=hive|flat (default: hive)
            --pattern <PATTERN>                                  Filename pattern ('part_%010d' for scheme=hive, '{LAYER_NAME}_{FIELD_VALUE}_%010d' for scheme=flat)
            --feature-limit <FEATURE-LIMIT>                      Maximum number of features per file
            --max-file-size <MAX-FILE-SIZE>                      Maximum file size (MB or GB suffix can be used)
            --omit-partitioned-field                             Whether to omit partitioned fields from target layer definition
            --skip-errors                                        Skip errors when writing features

          Advanced Options:
            --if, --input-format <INPUT-FORMAT>                  Input formats [may be repeated]
            --oo, --open-option <KEY>=<VALUE>                    Open options [may be repeated]

DESCRIPTION

       gdal vector partition dispatches features into different files, depending on the values the features take
       on a subset of fields specified by the user.

       Two partitioning schemes are available:

       • hive, corresponding to Apache Hive partitioning, is the default one.

         Each partitioning field corresponds  to  a  nested  directory.  Let's  consider  a  layer  with  fields
         "continent"  and "country", chosen as partitioning fields.  All features where "continent" evaluates to
         "Europe" and "country" evaluates to "France", will be written in the "continent=Europe/country=France/"
         subdirectory of the output directory.

         NULL values for partitioning fields are encoded as __HIVE_DEFAULT_PARTITION__ in  the  directory  name.
         Non-ASCII  characters,  space, equal sign, or characters not compatible with directory name constraints
         are percent-encoded (e.g. %20 for space).

       • flat, where files are written directly under the output directory using a default filename pattern of
         {LAYER_NAME}_{FIELD_VALUE}_%010d.
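
       The hive layout described above can be sketched in POSIX shell. This is only an illustration, not the
       tool's implementation: the helper names encode and hive_dir are hypothetical, only the two characters
       named above (space and equal sign) are encoded, %3D is assumed as the encoding of the equal sign, and
       NULL values and non-ASCII characters are not handled.

```shell
# Hypothetical helper: percent-encode only space and '=' (the real tool covers more cases).
encode() {
    printf '%s' "$1" | sed -e 's/ /%20/g' -e 's/=/%3D/g'
}

# Hypothetical helper: hive_dir FIELD VALUE [FIELD VALUE ...]
# Prints one nested "field=value/" directory level per partitioning field, in argument order.
hive_dir() {
    path=""
    while [ "$#" -ge 2 ]; do
        path="${path}$1=$(encode "$2")/"
        shift 2
    done
    printf '%s\n' "$path"
}

hive_dir continent Europe country France
hive_dir country "United Kingdom"
```

       With these inputs the sketch prints continent=Europe/country=France/ and country=United%20Kingdom/.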

       By  default, the format of the input dataset will be used for the output, if it can be determined and the
       input driver supports writing. Otherwise, --format must be used.

       gdal vector partition can be used as the last step of a pipeline.

       The following options are available:

   Standard options
       --output <OUTPUT-DIRECTORY>
              Root of the output directory. [required]

       --field <FIELD-NAME>
              Field(s) on which to partition. [required]

              Only fields of type String, Integer and Integer64 are allowed. The order in which fields are
              specified matters, because it determines the directory hierarchy.

       -f, --of, --format, --output-format <OUTPUT-FORMAT>
              Which  output  vector format to use. Allowed values may be given by gdal --formats | grep vector |
              grep rw | sort

       --co <NAME>=<VALUE>
              Many formats have one or more optional dataset creation  options  that  can  be  used  to  control
              particulars  about the file created. For instance, the GeoPackage driver supports creation options
              to control the version.

              May be repeated.

              The dataset creation options available vary by format driver, and some simple formats have no
              creation options at all. The options supported by a format can be listed with the --formats
              command line option, but the documentation for the format is the definitive source of information
              on driver creation options. See Vector drivers format specific documentation for legal creation
              options for each format.

              Note that dataset creation options are different from layer creation options.

       --layer-creation-option <NAME>=<VALUE>
              Many  formats  have  one  or  more  optional  layer  creation  options that can be used to control
              particulars about the layer created. For instance, the GeoPackage driver supports  layer  creation
              options  to  control  the  feature  identifier  or geometry column name, setting the identifier or
              description, etc.

              May be repeated.

              The layer creation options available vary by format driver, and some simple formats have no layer
              creation options at all. The options supported by a format can be listed with the --formats
              command line option, but the documentation for the format is the definitive source of information
              on driver creation options. See Vector drivers format specific documentation for legal creation
              options for each format.

              Note that layer creation options are different from dataset creation options.

       --overwrite
              Allow  program  to  overwrite existing target file or dataset.  Otherwise, by default, gdal errors
              out if the target file or dataset already exists.

       --append
              Whether the output directory must be opened in append mode. Implies that  it  already  exists  and
              that the output format supports appending.

              This mode is useful when adding new features to an already existing partitioned dataset.

       --scheme hive|flat
              Partitioning scheme. Defaults to hive.

       --pattern <PATTERN>
              Filename pattern. User chosen string, with substitutions for:

              • {LAYER_NAME}, when found, is substituted with the layer name (percent encoded where needed).

              • {FIELD_VALUE},  when  found,  is  substituted with the partitioning field value (percent encoded
                where needed). If several partitioning fields are used, each value is  separated  by  underscore
                (_). Empty strings are substituted with __EMPTY__ and null fields with __NULL__.

              • %[0?][0-9]?[0]?d:  C-style  integer formatter for the part number.  Valid values are for example
                %d or %05d.  One and only one part number specifier must be present in the pattern.

              Default values for the pattern are part_%010d for the hive scheme, and
              {LAYER_NAME}_{FIELD_VALUE}_%010d for the flat scheme.
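
              The substitutions above can be tried out with the shell's own printf, which uses the same
              C-style integer formatting. The sketch below is an illustration only: expand_pattern is a
              hypothetical helper, the layer name, field value and part numbers are made-up inputs assumed
              to contain only plain alphanumeric characters, and the file extension added for the output
              format is not modelled.

```shell
# Hypothetical helper: expand_pattern PATTERN LAYER_NAME FIELD_VALUE PART_NUMBER
expand_pattern() {
    # Substitute the two placeholders first, then let printf format the part number.
    fmt=$(printf '%s' "$1" | sed -e "s/{LAYER_NAME}/$2/g" -e "s/{FIELD_VALUE}/$3/g")
    printf "${fmt}\n" "$4"
}

# Default pattern of the hive scheme: placeholders are absent, only the part number is formatted.
expand_pattern 'part_%010d' cities France 1

# Default pattern of the flat scheme.
expand_pattern '{LAYER_NAME}_{FIELD_VALUE}_%010d' cities France 2

# A user-chosen pattern with a narrower, zero-padded part number.
expand_pattern '{LAYER_NAME}_%05d' cities France 12
```

              These calls print part_0000000001, cities_France_0000000002 and cities_00012 respectively.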

       --feature-limit <FEATURE-LIMIT>
              Maximum  number  of  features  per  file. By default, unlimited. If the limit is exceeded, several
              parts are created.
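
              As a rough guide to how many parts to expect, the arithmetic can be sketched as a ceiling
              division. This assumes that the limit applies to each partition independently and that every
              part is filled up to the limit before a new one is started; parts_needed is a hypothetical
              helper, not part of gdal.

```shell
# Hypothetical helper: parts_needed FEATURE_COUNT FEATURE_LIMIT
# Ceiling division using POSIX shell integer arithmetic.
parts_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

parts_needed 2500 1000   # 3 parts: 1000 + 1000 + 500 features
parts_needed 1000 1000   # 1 part: the limit is reached but not exceeded
```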

       --max-file-size <MAX-FILE-SIZE>
              Maximum file size (MB or GB suffix can be used). By default, unlimited.  If the limit is exceeded,
              several parts are created.

              Note that the maximum file size is used as a hint, and might not be strictly respected, because
              the evaluation of the file size corresponding to a feature is based on a heuristic, as the file
              size itself cannot be reliably used while the file is being written. In particular, the heuristic
              does not assume any compression, so for compressed formats, the actual size of a part can be
              significantly smaller than the specified limit.

       --omit-partitioned-field
              Whether to omit partitioned fields from  the  target  layer  definition.   Automatically  set  for
              Parquet output format and Hive partitioning.

       --skip-errors
              Whether failures to write feature(s) should be ignored. Note that this option sets the size of the
              transaction  unit  to  one  feature at a time, which may cause severe slowdown when inserting into
              databases.

   Advanced options
       --oo <NAME>=<VALUE>
              Dataset open option (format specific).

              May be repeated.

       --if <format>
              Format/driver name to try when opening the input file(s). It is generally not necessary to
              specify it, but it can be used to skip automatic driver detection when that detection fails to
              select the appropriate driver. This option can be repeated several times to specify several
              candidate drivers. Note that it does not force those drivers to open the dataset. In particular,
              some drivers have requirements on file extensions.

              May be repeated.

EXAMPLES

   Example 1: Create a partition based on the "continent" and "country" fields
          $ gdal vector partition world_cities.gpkg out_directory --field continent,country --format Parquet

   Example 2: Create a partition based on the "country" field, filtering on cities with a population greater
       than 1 million, with a flat partitioning scheme
          $ gdal pipeline ! read world_cities.gpkg ! filter --where "pop > 1e6" ! partition out_directory --field country --format GPKG --scheme flat
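
       Downstream scripts that consume a hive-partitioned output directory, such as the one from Example 1,
       can recover the partitioning values from the directory names. The following is a minimal sketch:
       partition_keys is a hypothetical helper, the path passed to it is made up for illustration (the actual
       part file name produced by gdal is not asserted here), and percent-encoded values are printed without
       being decoded. Splitting at the first equal sign is safe because equal signs inside values are
       percent-encoded.

```shell
# Hypothetical helper: partition_keys PATH
# Prints one "field<TAB>value" line per "field=value" directory component of PATH.
partition_keys() {
    printf '%s\n' "$1" | tr '/' '\n' | while IFS= read -r component; do
        case "$component" in
            *=*) printf '%s\t%s\n' "${component%%=*}" "${component#*=}" ;;
        esac
    done
}

# Illustrative path only; components without '=' (root directory, file name) are skipped.
partition_keys "out_directory/continent=Europe/country=France/part_0000000001.parquet"
```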

AUTHOR

       Even Rouault <even.rouault@spatialys.com>

COPYRIGHT

       1998-2025

                                                  Nov 07, 2025                          GDAL-VECTOR-PARTITION(1)