Provided by: tcllib_1.17-dfsg-1_all

NAME

       string::token::shell - Parsing of shell command line

SYNOPSIS

       package require Tcl  8.5

       package require string::token::shell  ?1.2?

       package require string::token  ?1?

       package require fileutil

       ::string token shell ?-indices? ?-partial? ?--? string

_________________________________________________________________________________________________

DESCRIPTION

       This package provides a command which parses a line of text written in basic sh syntax into
       a list of words.

       The complete set of procedures is described below.

       ::string token shell ?-indices? ?-partial? ?--? string
              This command parses the input string, assuming that it follows basic sh syntax.
              The result of the command is a list of the words in the string. An error is
              thrown if the input does not follow the allowed syntax. The behaviour can be
              modified by specifying either or both of the options -indices and -partial.

              --     When specified, option parsing stops at this point. This option is needed if
                     the input string may start with a dash. In other words, it is effectively
                     required whenever string is user input.

              -indices
                     When specified, the output is not a list of words but a list of 4-tuples
                     describing the words. Each tuple contains the type of the word, its start
                     and end indices in the input, and the actual text of the word.

                     Note that the length of the word as given by the indices can differ from the
                     length of the word found in the last element of the tuple. The indices
                     describe the word's extent in the input, including delimiters, intra-word
                     quoting, etc., whereas for the actual text of the word, delimiters are
                     stripped, intra-word quoting is decoded, etc.

                     The possible token types are:

                     PLAIN  Plain word, not quoted.

                     D:QUOTED
                            Word is delimited by double-quotes.

                     S:QUOTED
                            Word is delimited by single-quotes.

                     D:QUOTED:PART

                     S:QUOTED:PART
                             Like the previous types, but the word has no closing quote, i.e. it is
                             incomplete. These token types can occur only if the option -partial
                             was specified, and then only for the last word of the result. If the
                             option -partial was not specified, such an incomplete word causes the
                             command to throw an error instead.

              -partial
                     When specified, the parser accepts an incomplete quoted word (i.e. one
                     without a closing quote) at the end of the line as valid instead of throwing
                     an error.
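       The effect of -indices and -partial can be illustrated with a small Python sketch of a
       tokenizer for the word grammar specified below. It is a hypothetical re-implementation,
       not the tcllib code: the name tokenize_shell and its keyword arguments are invented for
       illustration, end indices are assumed to be inclusive, and only \" and \\ are treated as
       escapes, since the text names no others.

```python
import re

# Hypothetical Python sketch of the tokenizer described in this page; it is
# NOT the tcllib implementation. Assumptions: only \" and \\ are escapes,
# and end indices are inclusive.
_ESC = r'\\["\\]'
_TOKENS = [
    ("SPACE", re.compile(r"\s+")),
    ("S:QUOTED", re.compile(r"'([^']*)'")),
    ("D:QUOTED", re.compile(rf'"((?:[^"\\]|{_ESC})*)"')),
    ("PLAIN", re.compile(rf"""((?:[^\s'"\\]|{_ESC})+)""")),
    # Unterminated quoted words; \Z anchors them to the end of the input.
    ("S:QUOTED:PART", re.compile(r"'([^']*)\Z")),
    ("D:QUOTED:PART", re.compile(rf'"((?:[^"\\]|{_ESC})*)\Z')),
]


def tokenize_shell(line, indices=False, partial=False):
    """Split line into words; return 4-tuples (type, start, end, text) if indices."""
    pos, words, need_space = 0, [], False
    while pos < len(line):
        # Try each token pattern in order at the current position.
        for kind, rx in _TOKENS:
            m = rx.match(line, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected character at index {pos}")
        if kind == "SPACE":
            need_space = False
        else:
            # A word directly after another word (no whitespace) is an error.
            if need_space:
                raise ValueError(f"missing whitespace before index {pos}")
            if kind.endswith(":PART") and not partial:
                raise ValueError("unterminated quoted word")
            text = m.group(1)
            if not kind.startswith("S:"):
                # Decode \" and \\ in double-quoted and plain words.
                text = re.sub(r"\\(.)", r"\1", text)
            words.append((kind, m.start(), m.end() - 1, text))
            need_space = True
        pos = m.end()
    return words if indices else [w[3] for w in words]
```

       For example, tokenize_shell('  x "y\\"z"', indices=True) yields the tuples
       ('PLAIN', 2, 2, 'x') and ('D:QUOTED', 4, 9, 'y"z'): the second word spans six characters
       of input but its text is only three long. Because the options are keyword arguments in
       this sketch, the ambiguity that -- resolves for the Tcl command does not arise.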

       The basic shell syntax accepted here consists of unquoted, single-quoted, and double-quoted
       words, separated by whitespace. Leading and trailing whitespace is allowed too, and is
       stripped. Shell variables in their various forms are not recognized, nor are sub-shells. The
       recognized forms of words are specified in detail below.

              single-quoted word
                     A single-quoted word begins with a single-quote character, i.e. ' (ASCII
                     39), followed by zero or more Unicode characters other than a single-quote,
                     and is closed by a single-quote.

                     The word must be followed by either the end of the string or whitespace;
                     another word cannot directly follow it.

              double-quoted word
                     A double-quoted word begins with a double-quote character, i.e. " (ASCII
                     34), followed by zero or more Unicode characters other than a double-quote,
                     and is closed by a double-quote.

                     Unlike single-quoted words, a double-quoted word can contain an embedded
                     double-quote, by preceding (i.e. escaping) it with a backslash character \
                     (ASCII 92). Similarly, a backslash character must be escaped with another
                     backslash to be inserted literally.

              unquoted word
                     Unquoted words are not delimited by quotes and thus cannot contain
                     whitespace or single-quote characters. Double-quote and backslash characters
                     can be put into unquoted words by escaping them with a backslash, as in
                     double-quoted words.

              whitespace
                     Whitespace is any Unicode space character. This is equivalent to the Tcl
                     test string is space, or the regular expression \s.

                     Whitespace  may  occur  before  the  first  word,  or  after  the last word.
                     Whitespace must occur between adjacent words.
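       Taken together, the rules above describe a regular language, so the set of complete,
       well-formed lines can be sketched as a single regular expression. The Python below is
       illustrative only (is_well_formed is a hypothetical helper, not part of tcllib); it assumes
       \" and \\ are the only escapes and uses Python's Unicode-aware \s as an approximation of
       string is space. Like the command without -partial, it rejects unterminated quotes.

```python
import re

# Hypothetical regex rendering of the word grammar; not part of tcllib.
# Assumption: \" and \\ are the only escape sequences.
ESC = r'\\["\\]'
SINGLE = r"'[^']*'"                      # '...' with no embedded single-quote
DOUBLE = rf'"(?:[^"\\]|{ESC})*"'         # "..." allowing \" and \\ escapes
PLAIN = rf"""(?:[^\s'"\\]|{ESC})+"""     # no whitespace or '; " and \ escaped
WORD = rf"(?:{SINGLE}|{DOUBLE}|{PLAIN})"

# Optional leading/trailing whitespace; at least one whitespace between words.
LINE = re.compile(rf"\s*(?:{WORD}(?:\s+{WORD})*)?\s*")


def is_well_formed(line: str) -> bool:
    """Return True if line is a complete line under the rules above."""
    return LINE.fullmatch(line) is not None
```

       Requiring \s+ between words is what enforces the rule that a quoted word must be followed
       by whitespace or the end of the string: an input such as 'a'b has no match.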

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain bugs and other
       problems. Please report them under the category textutil of the Tcllib Trackers
       [http://core.tcl.tk/tcllib/reportlist]. Please also report any ideas for enhancements you
       may have for either the package or its documentation.

KEYWORDS

       bash, lexing, parsing, shell, string, tokenization

CATEGORY

       Text processing