Provided by: libterm-shellui-perl_0.92-4_all

NAME

       Text::Shellwords::Cursor - Parse a string into tokens

SYNOPSIS

        use Text::Shellwords::Cursor;
        my $parser = Text::Shellwords::Cursor->new();
        my $str = 'ab cdef "ghi"    j"k\"l "';
        my ($tok1) = $parser->parse_line($str);
        # $tok1 = ['ab', 'cdef', 'ghi', 'j', 'k"l ']
        my ($tok2, $tokno, $tokoff) = $parser->parse_line($str, cursorpos => 6);
        # $tok2 is as above, but $tokno=1, $tokoff=3  (under the 'f')

DESCRIPTION

       This module is very similar to Text::Shellwords and Text::ParseWords.  However, it has one
       very significant difference: it keeps track of a character position in the line it's
       parsing.  For instance, if you pass it ("zq fmgb", cursorpos=>6), it would return (['zq',
       'fmgb'], 1, 3).  The cursorpos parameter tells where in the input string the cursor
       resides (just before the 'b'), and the result tells you that the cursor was on token 1
       ('fmgb'), character 3 ('b').  This is very useful when computing command-line completions
       involving quoting, escaping, and tokenizing characters (like '(' or '=').
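
       To make the cursor bookkeeping concrete, here is a minimal sketch that handles only
       whitespace-separated words (no quoting, escaping, or token characters).  The helper
       simple_parse is hypothetical and is not part of this module; it only reproduces the
       ("zq fmgb", cursorpos=>6) example above.  Unlike parse_line, it never creates a
       zero-length token for a cursor sitting in whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Shellwords::Cursor: splits on
# whitespace only and reports which word $cursorpos falls in.
sub simple_parse {
    my ($line, $cursorpos) = @_;
    my (@tokens, $tokno, $tokoff);
    while ($line =~ /(\S+)/g) {
        my $start = $-[1];    # offset of the word's first character
        my $end   = $+[1];    # offset just past the word's last character
        push @tokens, $1;
        # The cursor belongs to the first word it sits on or directly after.
        if (defined $cursorpos && !defined $tokno
            && $cursorpos >= $start && $cursorpos <= $end) {
            $tokno  = $#tokens;
            $tokoff = $cursorpos - $start;
        }
    }
    return (\@tokens, $tokno, $tokoff);
}

my ($tokens, $tokno, $tokoff) = simple_parse("zq fmgb", 6);
print "tokens=@$tokens tokno=$tokno tokoff=$tokoff\n";
# prints: tokens=zq fmgb tokno=1 tokoff=3
```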

       A few helper utilities are included as well.  You can escape a string to ensure that
       parsing it will produce the original string (parse_escape).  You can also reassemble the
       tokens with a visually pleasing amount of whitespace between them (join_line).

       This module started out as an integral part of Term::GDBUI using code loosely based on
       Text::ParseWords.  However, it is now basically a ground-up reimplementation.  It was
       split out of Term::GDBUI for version 0.8.

METHODS

       new
          Creates a new parser.  Takes the following named arguments.

          keep_quotes
              Normally all unescaped, unnecessary quote marks are stripped.  If you specify
              "keep_quotes=>1", however, they are preserved.  This is useful if you need to know
              whether the string was quoted or not (string constants) or what type of quotes was
              around it (affecting variable interpolation, for instance).

          token_chars
              This argument specifies the characters that should be considered tokens all by
              themselves.  For instance, if I pass token_chars=>'=', then 'ab=123' would be
              parsed to ('ab', '=', '123').  Without token_chars, 'ab=123' remains a single
              string.

              NOTE: you cannot change token_chars after the constructor has been called!  The
              regexps that use it are compiled once (m//o).  Also, until the GNU Readline library
              can accept "=[]," without diving into an endless loop, we will not tell history
              expansion to use token_chars (it uses " \t\n()<>;&|" by default).
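
              The effect of token_chars on 'ab=123' can be sketched in a few lines of plain
              Perl.  The helper split_with_token_chars is hypothetical and ignores quoting and
              escaping entirely; it is an illustration of the idea, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's implementation: whitespace separates
# tokens, and each character listed in $token_chars is a token by itself.
sub split_with_token_chars {
    my ($line, $token_chars) = @_;
    return split ' ', $line unless length $token_chars;
    my $class = quotemeta $token_chars;    # make the characters safe inside [...]
    # The capture group makes split return the token characters as fields;
    # whitespace separators yield undef or empty fields, which grep drops.
    return grep { defined && length } split /\s+|([$class])/, $line;
}

my @with    = split_with_token_chars('ab=123', '=');
my @without = split_with_token_chars('ab=123', '');
print join('|', @with), "\n";       # prints: ab|=|123
print join('|', @without), "\n";    # prints: ab=123
```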

          debug
              Turns on rather copious debugging to try to show what the parser is thinking at
              every step.

          space_none
          space_before
          space_after
              These variables affect how whitespace in the line is normalized when it is
              reassembled into a string.  See the join_line routine.

          error
              This is a reference to a routine that should be called to display a parse error.
              The routine takes two arguments: a reference to the parser, and the error message
              to display as a string.

          parsebail(msg)
              If the parsel routine or any of its subroutines runs into a fatal error, it calls
              parsebail to present a very descriptive diagnostic.

          parsel
              This is the heinous routine that actually does the parsing.  You should never need
              to call it directly.  Call parse_line instead.

          parse_line(line, named args)
              This is the entrypoint to this module's parsing functionality.  It converts a line
              into tokens, respecting quoted text, escaped characters, etc.  It also keeps track
              of a cursor position on the input text, returning the token number and offset
              within the token where that position can be found in the output.

              This routine originally bore some resemblance to Text::ParseWords.  It has changed
              almost completely, however, to support keeping track of the cursor position.  It
              also has nicer failure modes, modular quoting, token characters (see token_chars in
              "new"), etc.  This routine now does much more.

              Arguments:

              line
                 This is a string containing the command-line to parse.

              This routine also accepts the following named parameters:

              cursorpos
                 This is the character position in the line to keep track of.  Pass undef (by not
                 specifying it) or the empty string to have the line processed with cursorpos
                 ignored.

                 Note that passing undef is not the same as passing some random number and
                 ignoring the result!  For instance, if you pass 0 and the line begins with
                 whitespace, you'll get a 0-length token at the beginning of the line to
                 represent the cursor in the middle of the whitespace.  This allows command
                 completion to work even when the cursor is not near any tokens.  If you pass
                 undef, all whitespace at the beginning and end of the line will be trimmed as
                 you would expect.

                 If it is ambiguous whether the cursor belongs to the previous token or to the
                 following one (for example, when it sits between two quoted strings such as
                 "a""b", or next to a token_char), it always gravitates to the previous token.
                 This makes more sense when completing.

              fixclosequote
                 Sometimes you want to try to recover from a missing close quote (for instance,
                 when calculating completions), but usually you want a missing close quote to be
                 a fatal error.  fixclosequote=>1 will implicitly insert the correct quote if
                 it's missing.  fixclosequote=>0 is the default.

              messages
                 parse_line is capable of printing very informative error messages.  However,
                 sometimes you don't care enough to print a message (like when calculating
                 completions).  Messages are printed by default, so pass messages=>0 to turn them
                 off.

              This function returns a list of three items:

              tokens
                 The tokens that the line was separated into (ref to an array of strings).

              tokno
                 The number of the token (index into the previous array) that contains cursorpos.

              tokoff
                 The character offset into tokno of cursorpos.

              If the cursor is at the end of the token, tokoff will point to 1 character past the
              last character in tokno, a nonexistent character.  If the cursor is between tokens
              (surrounded by whitespace), a zero-length token will be created for it.

          parse_escape(lines)
              Escapes characters that would be otherwise interpreted by the parser.  Will accept
              either a single string or an arrayref of strings (which will be modified in-place).

          join_line(tokens)
              This routine does a somewhat intelligent job of joining tokens back into a command
              line.  If token_chars (see "new") is empty (the default), then it just escapes
              backslashes and quotes, and joins the tokens with spaces.

              However, if token_chars is nonempty, it tries to insert a visually pleasing amount
              of space between the tokens.  For instance, rather than 'a ( b , c )', it tries to
              produce 'a (b, c)'.  It won't reformat any tokens that aren't found in
              $self->{token_chars}, of course.

              To change the formatting, you can redefine the variables $self->{space_none},
              $self->{space_before}, and $self->{space_after}.  Each variable is a string
              containing all characters that should not be surrounded by whitespace, should have
              whitespace before, and should have whitespace after, respectively.  Any character
              found in token_chars, but not in any of these space_ variables, will have space
              placed both before and after.
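
              The spacing rules above can be sketched as follows.  This is an illustration
              only: the helpers in_set, wants_space, and sketch_join are hypothetical, the
              settings in %cfg are chosen for this example and may differ from the module's
              defaults, and no escaping of backslashes or quotes is attempted.

```perl
use strict;
use warnings;

# Hypothetical settings for this sketch; the module's defaults may differ.
my %cfg = (
    token_chars  => '(),',
    space_none   => '',
    space_before => '(',
    space_after  => ',)',
);

# True if $tok is a single character that appears in the string $set.
sub in_set {
    my ($tok, $set) = @_;
    return length($tok) == 1 && index($set, $tok) >= 0;
}

# Does $tok want whitespace on the given side ('before' or 'after')?
sub wants_space {
    my ($tok, $side) = @_;
    return 1 unless in_set($tok, $cfg{token_chars});    # ordinary word
    return 0 if in_set($tok, $cfg{space_none});
    return $side eq 'before' ? 1 : 0 if in_set($tok, $cfg{space_before});
    return $side eq 'after'  ? 1 : 0 if in_set($tok, $cfg{space_after});
    return 1;    # token char in no space_ set: space on both sides
}

# Join tokens, adding a space only where both neighbours allow one.
sub sketch_join {
    my @tokens = @_;
    my $out = '';
    for my $i (0 .. $#tokens) {
        $out .= ' '
            if $i > 0
            && wants_space($tokens[$i - 1], 'after')
            && wants_space($tokens[$i], 'before');
        $out .= $tokens[$i];
    }
    return $out;
}

my $joined = sketch_join('a', '(', 'b', ',', 'c', ')');
print "$joined\n";    # prints: a (b, c)
```

              Moving '(' from space_before to space_none in %cfg would yield 'a(b, c)'
              instead.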

BUGS

       None known.

LICENSE

       Copyright (c) 2003-2011 Scott Bronson, all rights reserved.  This program is covered by
       the MIT license.

AUTHOR

       Scott Bronson <bronson@rinspin.com>