Provided by: libtickit-dev_0.3.4-1_amd64

NAME

       tickit_utf8_count, tickit_utf8_countmore - count characters in Unicode strings

SYNOPSIS

       #include <tickit.h>

       typedef struct {
           size_t bytes;
           int    codepoints;
           int    graphemes;
           int    columns;
       } TickitStringPos;

       size_t tickit_utf8_count(const char *str, TickitStringPos *pos,
           const TickitStringPos *limit);
       size_t tickit_utf8_countmore(const char *str, TickitStringPos *pos,
           const TickitStringPos *limit);

       size_t tickit_utf8_ncount(const char *str, size_t len,
           TickitStringPos *pos, const TickitStringPos *limit);
       size_t tickit_utf8_ncountmore(const char *str, size_t len,
           TickitStringPos *pos, const TickitStringPos *limit);

       Link with -ltickit.

DESCRIPTION

       tickit_utf8_count()  counts  characters  in the given Unicode string, which must be in UTF-8 encoding. It
       starts at the beginning of the string and counts forward over codepoints and graphemes, incrementing  the
       counters  in  pos  until  it  reaches a limit. It will not go further than any of the limits given by the
       limit structure (where the value -1 indicates no limit of that type). It will never split a codepoint in
       the  middle  of  a  UTF-8  sequence, nor will it split a grapheme between its codepoints; it is therefore
       possible that the function returns before any of the limits have been reached, if the next whole grapheme
       would  involve  going  past  at  least  one  of the specified limits. The function will also stop when it
       reaches the end of str. It returns the total number of bytes it has counted over.

       The bytes member counts UTF-8 bytes which encode individual codepoints. For example the Unicode character
       U+00E9  is  encoded by two bytes 0xc3, 0xa9; it would increment the bytes counter by 2 and the codepoints
       counter by 1.
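
       The relationship between the bytes and codepoints counters can be illustrated with a minimal sketch in
       plain C. It is not libtickit's implementation: the Utf8Tally type and the utf8_tally() helper are
       hypothetical names used only for illustration, and the input is assumed to be valid, NUL-terminated
       UTF-8.

```c
#include <stddef.h>

/* Hypothetical type, not part of libtickit: holds just the two
 * counters discussed above. */
typedef struct {
    size_t bytes;
    int    codepoints;
} Utf8Tally;

/* Hypothetical helper: counts bytes and codepoints in a NUL-terminated
 * string that is ASSUMED to be valid UTF-8 (no validation is done). */
static Utf8Tally utf8_tally(const char *str)
{
    Utf8Tally t = { 0, 0 };

    for (const unsigned char *p = (const unsigned char *)str; *p; p++) {
        t.bytes++;
        /* Continuation bytes have the bit pattern 10xxxxxx; any other
         * byte starts a new codepoint. */
        if ((*p & 0xC0) != 0x80)
            t.codepoints++;
    }
    return t;
}
```

       Called as utf8_tally("\xc3\xa9"), the sketch reports 2 bytes and 1 codepoint for U+00E9, matching the
       example above.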

       The codepoints member counts individual Unicode codepoints.

       The graphemes member counts whole composed graphical clusters  of  codepoints,  where  combining  accents
       which  count  as  individual  codepoints  do  not count as separate graphemes. For example, the codepoint
       sequence U+0065 U+0301 would increment the codepoints counter by 2 and the graphemes counter by 1.

       The columns member counts the number of screen columns consumed by the graphemes. Most graphemes  consume
       only 1 column, but some are defined in Unicode to consume 2.

       tickit_utf8_countmore()  is  similar  to  tickit_utf8_count() except it will not zero any of the counters
       before it starts. It can continue counting where a previous call finished. In particular, it will  assume
       that  it  is  starting at the beginning of a UTF-8 sequence that begins a new grapheme; it will not check
       these facts and the behavior is undefined if these assumptions do not hold. It will begin at  the  offset
       given by pos.bytes.

       The  tickit_utf8_ncount() and tickit_utf8_ncountmore() variants are similar except that they read no more
       than len bytes from the string and do not require it to be NUL terminated. They will still stop at a  NUL
       byte if one is found before len bytes have been read.

       These functions will all stop immediately if any C0 or C1 control byte other than NUL is encountered,
       returning the value -1 (which, because the return type is size_t, compares equal to (size_t)-1). In
       this circumstance, the pos structure will still be updated with the progress so far.

USAGE

       Typically, these functions would be used in either of two ways.

       When given a value in limit.bytes (or no limit and simply using string termination), tickit_utf8_count()
       will yield the width of the given string in terminal columns, in the pos.columns field.

       When given a value in limit.columns, tickit_utf8_count() will yield the number of bytes  of  that  string
       that will consume the given space on the terminal.

RETURN VALUE

       tickit_utf8_count() and tickit_utf8_countmore() return the number of bytes they have skipped over in
       this call, or -1 if they encounter a C0 or C1 byte other than NUL.

SEE ALSO

       tickit_stringpos_zero(3), tickit_stringpos_limit_bytes(3), tickit_utf8_mbswidth(3), tickit(7)

                                                                                            TICKIT_UTF8_COUNT(3)