Provided by: grass-doc_8.3.0-1_all

NAME

       i.cluster   -  Generates  spectral  signatures  for  land  cover types in an image using a
       clustering algorithm.
       The resulting signature file is used as input for i.maxlik, to  generate  an  unsupervised
       image classification.

KEYWORDS

       imagery, classification, signatures

SYNOPSIS

       i.cluster
       i.cluster --help
       i.cluster   group=name   subgroup=name   signaturefile=name  classes=integer   [seed=name]
       [sample=rows,cols]     [iterations=integer]     [convergence=float]     [separation=float]
       [min_size=integer]    [reportfile=name]    [--overwrite]  [--help]  [--verbose]  [--quiet]
       [--ui]

   Flags:
       --overwrite
           Allow output files to overwrite existing files

       --help
           Print usage summary

       --verbose
           Verbose module output

       --quiet
           Quiet module output

       --ui
           Force launching GUI dialog

   Parameters:
       group=name [required]
           Name of input imagery group

       subgroup=name [required]
           Name of input imagery subgroup

       signaturefile=name [required]
           Name for output file containing result signatures

       classes=integer [required]
           Initial number of classes
           Options: 1-255

       seed=name
           Name of file containing initial signatures

       sample=rows,cols
           Number of rows and columns over which a sample pixel is taken

       iterations=integer
           Maximum number of iterations
           Default: 30

       convergence=float
           Percent convergence
           Options: 0-100
           Default: 98.0

       separation=float
           Cluster separation
           Default: 0.0

       min_size=integer
           Minimum number of pixels in a class
           Default: 17

       reportfile=name
           Name for output file containing final report

DESCRIPTION

       i.cluster performs the first pass in the two-pass unsupervised classification of  imagery,
       while  the  GRASS  module i.maxlik executes the second pass.  Both commands must be run to
       complete the unsupervised classification.

       i.cluster implements a clustering algorithm (a modification of the k-means clustering
       algorithm) that reads through the (raster) imagery data and builds pixel clusters based on
       the spectral reflectances of the pixels (see Figure). The pixel clusters are imagery
       categories that can be related to land cover types on the ground. The spectral
       distributions of the clusters (i.e., the land cover spectral signatures) are influenced by
       six parameters set by the user. A key parameter is the initial number of clusters to be
       discriminated.

       Fig.:  Land  use/land  cover  clustering  of  LANDSAT  scene
       (simplified)

       i.cluster  starts  by  generating  spectral  signatures  for  this  number of clusters and
       "attempts" to end up with this number of clusters  during  the  clustering  process.   The
       resulting  number  of  clusters  and  their  spectral  distributions,  however,  are  also
       influenced by the range of the spectral values (category values) in the  image  files  and
       the  other  parameters  set by the user.  These parameters are:  the minimum cluster size,
       minimum cluster separation, the percent convergence, the maximum number of iterations, and
       the row and column sampling intervals.

       The resulting cluster spectral signatures are composed of cluster means and covariance
       matrices, which are used in the second pass (i.maxlik) to classify the image. The
       resulting clusters, or spectral classes, can be related to land cover types on the ground.

       The user has to specify the name of the group file, the name of the subgroup file, the
       name of a file to contain the resulting signatures, the initial number of clusters to be
       discriminated, and optionally other parameters (see below). The group should contain the
       imagery files that the user wishes to classify; the subgroup is a subset of this group.
       The user must create a group and subgroup by running the GRASS program i.group before
       running i.cluster. The subgroup should contain only the imagery band files that the user
       wishes to classify, and it must contain more than one band file. The purpose of the group
       and subgroup is to collect map layers for classification or analysis.

       The signaturefile is the file that will contain the resulting signatures, which can be
       used as input for i.maxlik. The classes value is the initial number of clusters to be
       discriminated. Any parameter values left unspecified are set to their default values.

       For all raster maps used to generate a signature file, it is recommended to have a
       semantic label set. Use r.support to set the semantic label of each member of the imagery
       group. Signatures generated for one scene are suitable for the classification of other
       scenes as long as they consist of the same raster bands (semantic labels match). If
       semantic labels are not set, the resulting signature file can be used to classify only
       the same imagery group that was used to generate the signatures.

   Parameters:
       group=name
           The name of the group file which contains the imagery files that the  user  wishes  to
           classify.

       subgroup=name
           The name of the subset of the group specified in the group option. It must contain
           only imagery band files, and more than one band file. The user must create a group
           and a subgroup by running the GRASS program i.group before running i.cluster.

       signaturefile=name
           The name assigned to the output signature file, which contains the signatures of the
           classes and can be used as the input file for the GRASS program i.maxlik for an
           unsupervised classification.

       classes=value
           The  number  of  clusters  that will initially be identified in the clustering process
           before the iterations begin.

       seed=name
           The name of a seed signature file; this parameter is optional. Seed signatures
           contain cluster means and covariance matrices that were calculated prior to the
           current run of i.cluster. They may be acquired from a previous run of i.cluster or
           from the training sites of a supervised classification (e.g., using the signature
           file output by g.gui.iclass). The purpose of seed signatures is to optimize the
           cluster decision boundaries (means) for the number of clusters specified.

       sample=rows,cols
           These numbers are optional, with default values based on the size of the data set
           such that the total number of pixels to be processed is approximately 10,000
           (subject to rounding). The smaller these numbers, the larger the sample size used to
           generate the signatures for the classes defined.

       iterations=value
           This parameter sets the maximum number of iterations, which should be greater than
           the number of iterations predicted to achieve the optimum percent convergence. The
           default value is 30. If the number of iterations reaches the maximum designated by
           the user, the user may want to rerun i.cluster with a higher number of iterations
           (see reportfile).
           Default: 30

       convergence=value
           The percent convergence is the level at which cluster means are considered stable
           during the iteration process.  The default value is 98.0 percent.  When clusters are being
           created, their means constantly change as pixels are assigned to them  and  the  means
           are  recalculated  to  include  the  new pixel.  After all clusters have been created,
           i.cluster begins iterations that change cluster  means  by  maximizing  the  distances
           between  them.   As  these means shift, a higher and higher convergence is approached.
           Because means will never become totally static, a percent convergence  and  a  maximum
           number  of  iterations  are  supplied  to  stop  the  iterative  process.  The percent
           convergence should be reached before the maximum number of iterations. If the  maximum
           number  of  iterations is reached, it is probable that the desired percent convergence
           was not reached. The number of iterations is reported in the cluster statistics in the
           report file (see reportfile).
           Default: 98.0

       separation=value
           This  is  the  minimum separation below which clusters will be merged in the iteration
           process. The default value is 0.0. This is an image-specific number (a "magic" number)
           that  depends on the image data being classified and the number of final clusters that
           are acceptable. Its determination requires experimentation. Note that as  the  minimum
           class  (or  cluster)  separation is increased, the maximum number of iterations should
           also be increased to achieve this separation with a  high  percentage  of  convergence
           (see convergence).
           Default: 0.0

       min_size=value
           This  is  the  minimum  number of pixels that will be used to define a cluster, and is
           therefore the minimum number of pixels for which means and covariance matrices will be
           calculated.
           Default: 17

       reportfile=name
           The reportfile is an optional parameter that names an output file containing the
           result, i.e., the statistics for each cluster. Also included are the resulting
           percent convergence for the clusters, the number of iterations that were required to
           achieve the convergence, and the separability matrix.

NOTES

   Sampling method
       i.cluster does not cluster all pixels, but only a sample (see parameter sample). The
       clustering therefore does not assign every pixel to a cluster; it only generates
       signatures that are representative of each cluster. When i.cluster is run on the same
       data with the same number of classes but with different sample sizes, slightly different
       signatures are likely to be obtained for each cluster at each run.
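
       The relationship between the sample intervals and the sample size can be sketched in
       Python. This is an illustration only, not the module's own code: both function names are
       hypothetical, and the default rule (aim for roughly 100 sampled rows and 100 sampled
       columns, hence about 10,000 pixels) is an assumption based on the description of the
       sample parameter.

```python
import math


def default_sample_intervals(nrows, ncols, target_per_axis=100):
    """Hypothetical default: choose intervals so that about
    target_per_axis rows and columns are sampled (assumption)."""
    row_interval = max(1, nrows // target_per_axis)
    col_interval = max(1, ncols // target_per_axis)
    return row_interval, col_interval


def sampled_pixel_count(nrows, ncols, row_interval, col_interval):
    """Pixels visited when every row_interval-th row and every
    col_interval-th column is taken."""
    return math.ceil(nrows / row_interval) * math.ceil(ncols / col_interval)
```

       For a region of 1000 rows by 2000 columns, this sketch gives intervals of 10 and 20 and
       a sample of 10,000 pixels; halving both intervals quadruples the sample.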

   Algorithm used for i.cluster
       The algorithm uses parameters set by the user: the initial number of clusters, the
       minimum distance between clusters, the desired correspondence between iterations, and
       the minimum size of each cluster. The user also chooses whether all pixels are clustered
       or only every "x"th row and "y"th column (sampling), and sets the maximum number of
       iterations to be carried out.

       In the 1st pass, initial cluster means for each band are defined by giving the first
       cluster a value equal to the band mean minus its standard deviation, and the last
       cluster a value equal to the band mean plus its standard deviation, with all other
       cluster means equally spaced between these. Each pixel is then assigned to the cluster
       to which it is closest, distance being measured as Euclidean distance. All clusters
       separated by less than the user-specified minimum distance are then merged. If a cluster
       has fewer than the user-specified minimum number of pixels, all of its pixels are
       reassigned to the next nearest cluster. New cluster means are calculated for each band
       as the average of the raster values in that band over all pixels in the cluster.
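
       The initialization of the cluster means in the 1st pass can be sketched for a single
       band as follows. This is a minimal Python illustration of the rule described above, not
       the module's implementation: the function name is hypothetical, and the use of the
       population standard deviation and the handling of a single class are assumptions.

```python
from statistics import mean, pstdev


def initial_cluster_means(band_values, n_classes):
    """Spread n_classes initial means evenly from (mean - std) to
    (mean + std) for one band."""
    m = mean(band_values)
    s = pstdev(band_values)  # assumption: population standard deviation
    if n_classes == 1:
        return [m]  # assumption: a single class starts at the band mean
    step = 2.0 * s / (n_classes - 1)
    return [m - s + k * step for k in range(n_classes)]
```

       For a band with mean 100 and standard deviation 20, five classes start at 80, 90, 100,
       110 and 120.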

       In the 2nd pass, pixels are reassigned to clusters based on the new cluster means, and
       the cluster means are then recalculated. This process is repeated until the
       correspondence between iterations reaches the user-specified level or until the
       specified maximum number of iterations is reached, whichever comes first.
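
       The iteration described above can be sketched in Python as a loop that reassigns pixels
       and recalculates means. This is a simplified illustration, not the module's code: all
       names are hypothetical, cluster merging and the minimum size check are omitted, and the
       correspondence between iterations is assumed to be the percentage of pixels that keep
       their cluster from one iteration to the next.

```python
def nearest(pixel, means):
    """Index of the cluster mean closest to pixel (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda c: sum((p - m) ** 2 for p, m in zip(pixel, means[c])))


def update_means(pixels, labels, means):
    """Recompute each cluster mean as the per-band average of its pixels."""
    new_means = []
    for c, old in enumerate(means):
        members = [px for px, lab in zip(pixels, labels) if lab == c]
        if not members:
            new_means.append(old)  # assumption: an empty cluster keeps its mean
        else:
            new_means.append(tuple(sum(band) / len(members)
                                   for band in zip(*members)))
    return new_means


def iterate_clusters(pixels, means, convergence=98.0, max_iterations=30):
    """Repeat reassignment until percent_stable >= convergence or the
    iteration limit is reached, whichever comes first."""
    labels = [nearest(px, means) for px in pixels]
    percent_stable = 0.0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        means = update_means(pixels, labels, means)
        new_labels = [nearest(px, means) for px in pixels]
        unchanged = sum(a == b for a, b in zip(labels, new_labels))
        percent_stable = 100.0 * unchanged / len(pixels)
        labels = new_labels
        if percent_stable >= convergence:
            break
    return means, labels, iteration, percent_stable
```

       If the returned iteration count equals max_iterations, the requested convergence was
       probably not reached, which mirrors the advice given for the iterations parameter.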

EXAMPLE

       Preparing the statistics for unsupervised classification of a LANDSAT scene within the
       North Carolina location:
       # Set computational region to match the scene
       g.region raster=lsat7_2002_10 -p
       # store VIZ, NIR, MIR into group/subgroup (leaving out TIR)
       i.group group=lsat7_2002 subgroup=res_30m \
         input=lsat7_2002_10,lsat7_2002_20,lsat7_2002_30,lsat7_2002_40,lsat7_2002_50,lsat7_2002_70
       # generate signature file and report
       i.cluster group=lsat7_2002 subgroup=res_30m \
         signaturefile=cluster_lsat2002 \
         classes=10 reportfile=rep_clust_lsat2002.txt
       To complete the unsupervised classification, i.maxlik is subsequently used. See the
       example in its manual page.

       The signature file obtained in the example above can be used to classify only the
       current imagery group (lsat7_2002). If the user would like to re-use the signature file
       for the classification of different imagery group(s), they can set semantic labels for
       each group member beforehand, i.e., before generating the signature file. Semantic
       labels are set by means of r.support as shown below:
       # Define semantic labels for all LANDSAT bands
       r.support map=lsat7_2002_10 semantic_label=TM7_1
       r.support map=lsat7_2002_20 semantic_label=TM7_2
       r.support map=lsat7_2002_30 semantic_label=TM7_3
       r.support map=lsat7_2002_40 semantic_label=TM7_4
       r.support map=lsat7_2002_50 semantic_label=TM7_5
       r.support map=lsat7_2002_61 semantic_label=TM7_61
       r.support map=lsat7_2002_62 semantic_label=TM7_62
       r.support map=lsat7_2002_70 semantic_label=TM7_7
       r.support map=lsat7_2002_80 semantic_label=TM7_8

SEE ALSO

           •   Image classification wiki page

           •   Historical reference: the GRASS GIS 4 Image Processing manual (PDF)

           •   Wikipedia article on k-means clustering (note that i.cluster uses  a  modification
               of the k-means clustering algorithm)

        r.support, g.gui.iclass, i.group, i.gensig, i.maxlik, i.segment, i.smap, r.kappa

AUTHORS

       Michael Shapiro, U.S. Army Construction Engineering Research Laboratory
       Tao Wen, University of Illinois at Urbana-Champaign, Illinois
       Semantic label support: Maris Nartiss, University of Latvia

SOURCE CODE

       Available at: i.cluster source code (history)

       Accessed: Tuesday Jun 27 11:14:13 2023

       © 2003-2023 GRASS Development Team, GRASS GIS 8.3.0 Reference Manual