Provided by: libunicode-linebreak-perl_0.0.20110501-1build1_amd64

NAME

       Unicode::GCString - String as Sequence of UAX #29 Grapheme Clusters

SYNOPSIS

           use Unicode::GCString;
           $gcstring = Unicode::GCString->new($string);

DESCRIPTION

       Unicode::GCString treats a Unicode string as a sequence of extended grapheme clusters
       as defined by Unicode Standard Annex #29 [UAX #29].

       A grapheme cluster is a sequence of one or more Unicode characters that consists of one
       grapheme base and optional grapheme extender and/or "prepend" characters.  It is close
       to what people consider a character.

   Public Interface
       Constructors

       new (STRING, [LINEBREAK])
           Constructor.  Creates a new grapheme cluster string (Unicode::GCString object) from
           the Unicode string STRING.  The optional Unicode::LineBreak object LINEBREAK controls
           breaking features.

       copy
           Copy constructor.  Creates a copy of the grapheme cluster string.  The next position
           of the new string is set to the beginning.

       Sizes

       chars
           Instance method.  Returns the number of Unicode characters the grapheme cluster
           string contains, i.e. its length as a Unicode string.

       columns
           Instance method.  Returns the total number of columns of the grapheme clusters, as
           defined by the built-in character database.  For more details see "DESCRIPTION" in
           Unicode::LineBreak.

       length
           Instance method.  Returns the number of grapheme clusters contained in the grapheme
           cluster string.
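
       The difference between the three size methods is easiest to see on a string that
       contains a combining character.  The following is a minimal sketch; the sample string
       and variable names are illustrative only, and the value of columns() depends on the
       built-in character database, so it is printed rather than assumed.

```perl
use strict;
use warnings;
use Unicode::GCString;

# "e" + COMBINING ACUTE ACCENT (U+0301) + "a":
# three Unicode characters forming two grapheme clusters.
my $gcstring = Unicode::GCString->new("e\x{301}a");

my $chars   = $gcstring->chars;     # number of Unicode characters
my $length  = $gcstring->length;    # number of grapheme clusters
my $columns = $gcstring->columns;   # columns per the built-in database

print "chars=$chars length=$length columns=$columns\n";
```

       Here chars() counts the accent as a separate character, while length() counts it
       together with its base letter.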

       Operations as String

       as_string
       "OBJECT"
           Instance method.  Converts the grapheme cluster string to a Unicode string explicitly.

       cmp (STRING)
       STRING cmp STRING
           Instance method.  Compares strings.  There are no oddities.  Either STRING may be a
           Unicode string.

       concat (STRING)
       STRING . STRING
           Instance method.  Concatenates STRINGs.  Either STRING may be a Unicode string.
           Note that the number of columns (see columns()) or grapheme clusters (see length())
           of the resulting string is not always equal to the sum of those of both strings.
           The next position of the new string is the one set on the left operand.

       join ([STRING, ...])
           Instance method.  Joins STRINGs, inserting the grapheme cluster string between them.
           Any of the STRINGs may be a Unicode string.

       substr (OFFSET, [LENGTH, [REPLACEMENT]])
           Instance method.  Returns a substring of the grapheme cluster string.  OFFSET and
           LENGTH are counted in grapheme clusters.  If REPLACEMENT is specified, the substring
           is replaced by it.  REPLACEMENT may be a Unicode string.
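
       The string operations above can be sketched as follows.  This is an illustration
       only: the sample strings and variable names are assumptions, and the cluster count of
       the concatenated string is printed rather than assumed, since it need not equal the
       sum of the counts of its parts.

```perl
use strict;
use warnings;
use Unicode::GCString;

my $gcstring = Unicode::GCString->new("e\x{301}abc");

# OFFSET and LENGTH count grapheme clusters, so the accent stays with "e".
my $first = $gcstring->substr(0, 1)->as_string;   # "e" plus the accent
my $rest  = $gcstring->substr(1, 2)->as_string;   # the next two clusters

# The object on which join() is called acts as the separator.
my $joined = Unicode::GCString->new(", ")->join("a", "b", "c");

# Appending a combining mark to a base letter: the character count is the
# sum of both parts, but the cluster count may be smaller.
my $combined = Unicode::GCString->new("e")->concat("\x{301}");

printf "first has %d chars, rest is '%s'\n", length($first), $rest;
printf "joined is '%s'\n", "$joined";
printf "combined: chars=%d length=%d\n", $combined->chars, $combined->length;
```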

       Operations as Sequence of Grapheme Clusters

       as_array
       @{OBJECT}
       as_arrayref
           Instance method.  Convert grapheme cluster string to an array of grapheme clusters.

       eos Instance method.  Test if current position is at end of grapheme cluster string.

       flag ([OFFSET, [VALUE]])
           Instance method.  Gets or sets the flag value of the OFFSET-th grapheme cluster.  If
           OFFSET is not specified, returns the flag value of the next grapheme cluster.  The
           flag value is a non-negative integer not greater than 255; it is initially 0.

           Predefined flags are:

           Unicode::LineBreak::ALLOW_BEFORE
               Allow line breaking just before this grapheme cluster.

           Unicode::LineBreak::PROHIBIT_BEFORE
               Prohibit line breaking just before this grapheme cluster.

       item ([OFFSET])
           Instance method.  Returns OFFSET-th grapheme cluster.  If OFFSET was not specified,
           returns next grapheme cluster.

       lbclass ([OFFSET])
           Returns the Line Breaking Class (see Unicode::LineBreak) of the first character of
           the OFFSET-th grapheme cluster.  If OFFSET is not specified, returns the class of the
           next grapheme cluster.

       lbclass_ext ([OFFSET])
           Returns the Line Breaking Class (see Unicode::LineBreak) of the last grapheme
           extender of the OFFSET-th grapheme cluster.  If there are no grapheme extenders or
           its class is CM, the value of lbclass() is returned.

       next
       <OBJECT>
           Instance method, iterative.  Returns the next grapheme cluster and increments the
           next position.

       pos ([OFFSET])
           Instance method.  If the optional OFFSET is specified, sets the next position to it.
           Returns the next position of the grapheme cluster string.
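
       The iteration methods eos(), next() and pos() can be combined into a simple loop over
       the clusters.  The following is a minimal sketch under the assumption that each
       returned cluster can be stringified with "$cluster"; the sample string and variable
       names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::GCString;

my $gcstring = Unicode::GCString->new("e\x{301}ab");

# Walk the string one grapheme cluster at a time.
my @clusters;
until ($gcstring->eos) {
    my $cluster = $gcstring->next;   # returns the cluster and advances
    push @clusters, "$cluster";      # explicit conversion to a Unicode string
}

# After the loop the next position is at the end of the string.
my $end_pos = $gcstring->pos;

# Rewind so that the string can be iterated again.
$gcstring->pos(0);

printf "%d clusters, position after loop: %d\n", scalar(@clusters), $end_pos;
```

       Calling pos(0) afterwards rewinds the string; copy() yields a string whose next
       position is likewise at the beginning.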

CAVEAT

       ·   On Perl around 5.10.1, implicit conversion from a Unicode::GCString object to a
           Unicode string sometimes confuses the "utf8_mg_pos_cache_update" cache.

           For example, instead of doing

               $sub = substr($gcstring, $i, $j);

           do either

               $sub = substr("$gcstring", $i, $j);

           or

               $sub = substr($gcstring->as_string, $i, $j);

       ·   This module implements the default algorithm for determining grapheme cluster
           boundaries.  A tailoring mechanism is not yet supported.

VERSION

       Consult the $VERSION variable.

       Development versions of this module may be found at
       <http://hatuka.nezumi.nu/repos/Unicode-LineBreak/>.

SEE ALSO

       [UAX #29] Mark Davis (2009-2010).  Unicode Standard Annex #29: Unicode Text Segmentation,
       Revision 15-17.  <http://www.unicode.org/reports/tr29/>.

AUTHOR

       Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>

COPYRIGHT

       Copyright (C) 2009-2011 Hatuka*nezumi - IKEDA Soji.

       This program is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.