Ubuntu Manpage: i.cluster - Generates spectral signatures for land cover types in an image using a

Provided by: grass-doc_8.2.0-1build1_all

NAME

       i.cluster   -  Generates  spectral  signatures  for  land  cover types in an image using a
       clustering algorithm.
       The resulting signature file is used as input for i.maxlik, to  generate  an  unsupervised
       image classification.

KEYWORDS

       imagery, classification, signatures

SYNOPSIS

       i.cluster
       i.cluster --help
       i.cluster   group=name   subgroup=name   signaturefile=name  classes=integer   [seed=name]
       [sample=rows,cols]     [iterations=integer]     [convergence=float]     [separation=float]
       [min_size=integer]    [reportfile=name]    [--overwrite]  [--help]  [--verbose]  [--quiet]
       [--ui]

   Flags:
       --overwrite
           Allow output files to overwrite existing files

       --help
           Print usage summary

       --verbose
           Verbose module output

       --quiet
           Quiet module output

       --ui
           Force launching GUI dialog

   Parameters:
       group=name [required]
           Name of input imagery group

       subgroup=name [required]
           Name of input imagery subgroup

       signaturefile=name [required]
           Name for output file containing result signatures

       classes=integer [required]
           Initial number of classes
           Options: 1-255

       seed=name
           Name of file containing initial signatures

       sample=rows,cols
           Number of rows and columns over which a sample pixel is taken

       iterations=integer
           Maximum number of iterations
           Default: 30

       convergence=float
           Percent convergence
           Options: 0-100
           Default: 98.0

       separation=float
           Cluster separation
           Default: 0.0

       min_size=integer
           Minimum number of pixels in a class
           Default: 17

       reportfile=name
           Name for output file containing final report

DESCRIPTION

i.cluster performs the first pass in the two-pass unsupervised classification of imagery,
while the GRASS module i.maxlik executes the second pass. Both commands must be run to
complete the unsupervised classification.

i.cluster is a clustering algorithm (a modification of the k-means clustering algorithm)
that reads through the (raster) imagery data and builds pixel clusters based on the
spectral reflectances of the pixels (see Figure). The pixel clusters are imagery
categories that can be related to land cover types on the ground. The spectral
distributions of the clusters (e.g., land cover spectral signatures) are influenced by six
parameters set by the user. A relevant parameter set by the user is the initial number of
clusters to be discriminated.

Fig.: Land use/land cover clustering of LANDSAT scene
(simplified)

i.cluster starts by generating spectral signatures for this number of clusters and
"attempts" to end up with this number of clusters during the clustering process. The
resulting number of clusters and their spectral distributions, however, are also
influenced by the range of the spectral values (category values) in the image files and
the other parameters set by the user. These parameters are: the minimum cluster size,
minimum cluster separation, the percent convergence, the maximum number of iterations, and
the row and column sampling intervals.

The cluster spectral signatures that result are composed of cluster means and covariance
matrices. These cluster means and covariance matrices are used in the second pass
(i.maxlik) to classify the image. The clusters or spectral classes result can be related
to land cover types on the ground. The user has to specify the name of group file, the
name of subgroup file, the name of a file to contain result signatures, the initial number
of clusters to be discriminated, and optionally other parameters (see below) where the
group should contain the imagery files that the user wishes to classify. The subgroup is
a subset of this group. The user must create a group and subgroup by running the GRASS
program i.group before running i.cluster. The subgroup should contain only the imagery
band files that the user wishes to classify. Note that this subgroup must contain more
than one band file. The purpose of the group and subgroup is to collect map layers for
classification or analysis. The signaturefile is the file to contain result signatures
which can be used as input for i.maxlik. The classes value is the initial number of
clusters to be discriminated; any parameter values left unspecified are set to their
default values.

For all raster maps used to generate signature file it is recommended to have semantic
label set. Use r.support to set semantc labels of each member of the imagery group.
Signatures generated for one scene are suitable for classification of other scenes as long
as they consist of same raster bands (semantic labels match). If semantic labels are not
set, it will be possible to use obtained signature file to classify only the same imagery
group used for generating signatures.

Parameters:
group=name
The name of the group file which contains the imagery files that the user wishes to
classify.

subgroup=name
The name of the subset of the group specified in group option, which must contain only
imagery band files and more than one band file. The user must create a group and a
subgroup by running the GRASS program i.group before running i.cluster.

signaturefile=name
The name assigned to output signature file which contains signatures of classes and
can be used as the input file for the GRASS program i.maxlik for an unsupervised
classification.

classes=value
The number of clusters that will initially be identified in the clustering process
before the iterations begin.

seed=name
The name of a seed signature file is optional. The seed signatures are signatures that
contain cluster means and covariance matrices which were calculated prior to the
current run of i.cluster. They may be acquired from a previously run of i.cluster or
from a supervised classification signature training site section (e.g., using the
signature file output by g.gui.iclass). The purpose of seed signatures is to optimize
the cluster decision boundaries (means) for the number of clusters specified.

sample=rows,cols
These numbers are optional with default values based on the size of the data set such
that the total pixels to be processed is approximately 10,000 (consider round up). The
smaller these numbers, the larger the sample size used to generate the signatures for
the classes defined.

iterations=value
This parameter determines the maximum number of iterations which is greater than the
number of iterations predicted to achieve the optimum percent convergence. The default
value is 30. If the number of iterations reaches the maximum designated by the user;
the user may want to rerun i.cluster with a higher number of iterations (see
reportfile).
Default: 30

convergence=value
A high percent convergence is the point at which cluster means become stable during
the iteration process. The default value is 98.0 percent. When clusters are being
created, their means constantly change as pixels are assigned to them and the means
are recalculated to include the new pixel. After all clusters have been created,
i.cluster begins iterations that change cluster means by maximizing the distances
between them. As these means shift, a higher and higher convergence is approached.
Because means will never become totally static, a percent convergence and a maximum
number of iterations are supplied to stop the iterative process. The percent
convergence should be reached before the maximum number of iterations. If the maximum
number of iterations is reached, it is probable that the desired percent convergence
was not reached. The number of iterations is reported in the cluster statistics in the
report file (see reportfile).
Default: 98.0

separation=value
This is the minimum separation below which clusters will be merged in the iteration
process. The default value is 0.0. This is an image-specific number (a "magic" number)
that depends on the image data being classified and the number of final clusters that
are acceptable. Its determination requires experimentation. Note that as the minimum
class (or cluster) separation is increased, the maximum number of iterations should
also be increased to achieve this separation with a high percentage of convergence
(see convergence).
Default: 0.0

min_size=value
This is the minimum number of pixels that will be used to define a cluster, and is
therefore the minimum number of pixels for which means and covariance matrices will be
calculated.
Default: 17

reportfile=name
The reportfile is an optional parameter which contains the result, i.e., the
statistics for each cluster. Also included are the resulting percent convergence for
the clusters, the number of iterations that was required to achieve the convergence,
and the separability matrix.

NOTES

Sampling method
i.cluster does not cluster all pixels, but only a sample (see parameter sample). The
result of that clustering is not that all pixels are assigned to a given cluster;
essentially, only signatures which are representative of a given cluster are generated.
When running i.cluster on the same data asking for the same number of classes, but with
different sample sizes, likely slightly different signatures for each cluster are obtained
at each run.

Algorithm used for i.cluster
The algorithm uses input parameters set by the user on the initial number of clusters, the
minimum distance between clusters, and the correspondence between iterations which is
desired, and minimum size for each cluster. It also asks if all pixels to be clustered, or
every "x"th row and "y"th column (sampling), the correspondence between iterations
desired, and the maximum number of iterations to be carried out.

In the 1st pass, initial cluster means for each band are defined by giving the first
cluster a value equal to the band mean minus its standard deviation, and the last cluster
a value equal to the band mean plus its standard deviation, with all other cluster means
distributed equally spaced in between these. Each pixel is then assigned to the class
which it is closest to, distance being measured as Euclidean distance. All clusters less
than the user-specified minimum distance are then merged. If a cluster has less than the
user-specified minimum number of pixels, all those pixels are again reassigned to the next
nearest cluster. New cluster means are calculated for each band as the average of raster
pixel values in that band for all pixels present in that cluster.

In the 2nd pass, pixels are then again reassigned to clusters based on new cluster means.
The cluster means are then again recalculated. This process is repeated until the
correspondence between iterations reaches a user-specified level, or till the maximum
number of iterations specified is over, whichever comes first.

EXAMPLE

       Preparing the statistics for unsupervised classification of a LANDSAT scene  within  North
       Carolina location:
       # Set computational region to match the scene
       g.region raster=lsat7_2002_10 -p
       # store VIZ, NIR, MIR into group/subgroup (leaving out TIR)
       i.group group=lsat7_2002 subgroup=res_30m \
         input=lsat7_2002_10,lsat7_2002_20,lsat7_2002_30,lsat7_2002_40,lsat7_2002_50,lsat7_2002_70
       # generate signature file and report
       i.cluster group=lsat7_2002 subgroup=res_30m \
         signaturefile=cluster_lsat2002 \
         classes=10 reportfile=rep_clust_lsat2002.txt
       To  complete  the unsupervised classification, i.maxlik is subsequently used.  See example
       in its manual page.

       The signature file obtained in the example  above  will  allow  to  classify  the  current
       imagery  group only (lsat7_2002).  If the user would like to re-use the signature file for
       the classification of different imagery group(s), they can set semantic  labels  for  each
       group member beforehand, i.e., before generating the signature files.  Semantic labels are
       set by means of r.support as shown below:
       # Define semantic labels for all LANDSAT bands
       r.support map=lsat7_2002_10 semantic_label=TM7_1
       r.support map=lsat7_2002_20 semantic_label=TM7_2
       r.support map=lsat7_2002_30 semantic_label=TM7_3
       r.support map=lsat7_2002_40 semantic_label=TM7_4
       r.support map=lsat7_2002_50 semantic_label=TM7_5
       r.support map=lsat7_2002_61 semantic_label=TM7_61
       r.support map=lsat7_2002_62 semantic_label=TM7_62
       r.support map=lsat7_2002_70 semantic_label=TM7_7
       r.support map=lsat7_2002_80 semantic_label=TM7_8

AUTHORS

       Michael Shapiro, U.S. Army Construction Engineering Research Laboratory
       Tao Wen, University of Illinois at Urbana-Champaign, Illinois
       Semantic label support: Maris Nartiss, University of Latvia

SOURCE CODE

       Available at: i.cluster source code (history)

       Accessed: Mon Jun 13 15:10:44 2022

       Main index | Imagery index | Topics index | Keywords index | Graphical index | Full index

       © 2003-2022 GRASS Development Team, GRASS GIS 8.2.0 Reference Manual