Ubuntu Manpage: string::token::shell - Parsing of shell command line

NAME

       string::token::shell - Parsing of shell command line

SYNOPSIS

       package require Tcl  8.5

       package require string::token::shell  ?1.2?

       package require string::token  ?1?

       package require fileutil

       ::string token shell ?-indices? ?-partial? ?--? string

_________________________________________________________________________________________________

DESCRIPTION

This package provides a command which parses a line of text using basic sh-syntax into a
list of words.

The complete set of procedures is described below.

::string token shell ?-indices? ?-partial? ?--? string
This command parses the input string under the assumption that it follows basic sh
syntax. The result of the command is a list of the words in the string. An error is
thrown if the input does not follow the allowed syntax. The behaviour can be
modified with the options -indices and -partial, described below.
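A minimal usage sketch, assuming Tcllib (which provides string::token::shell) is installed. The command line being parsed is made up for illustration. Quoted words come back with their quotes removed, so each quoted phrase is a single list element.

```tcl
package require Tcl 8.5
package require string::token::shell

# Split a made-up command line into words. "--" guards against input
# that starts with a dash (see the -- option).
set words [string token shell -- {cp -r "My Documents" 'backup dir'}]
puts [llength $words]   ;# 4
foreach w $words {
    puts $w
}
# Prints cp, -r, My Documents and backup dir, one per line.
```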

-- When specified, option parsing stops at this point. This option is needed if
the input string may start with a dash. In other words, it is effectively
required whenever string is user input.
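A short sketch of the -- option, assuming Tcllib is installed. The value of input is a hypothetical piece of user input that happens to begin with a dash; placing -- before it ensures it is always treated as the string to parse.

```tcl
package require string::token::shell

# Hypothetical user input starting with a dash.
set input {-n "hello world"}

# Everything after "--" is the string to parse, never an option.
set words [string token shell -- $input]
puts [lindex $words 0]   ;# -n
puts [lindex $words 1]   ;# hello world
```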

-indices
When specified, the output is not a list of words but a list of 4-tuples
describing the words. Each tuple contains the type of the word, its start
and end indices in the input, and the actual text of the word.

Note that the length of the word as given by the indices can differ from the
length of the word found in the last element of the tuple. The indices
describe the word's extent in the input, including delimiters, intra-word
quoting, etc., whereas in the actual text of the word delimiters are
stripped, intra-word quoting is decoded, etc.
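The difference between the raw extent and the decoded text can be seen by printing both. This sketch assumes Tcllib is installed, that the tuple elements appear in the order listed above (type, start, end, text), and that the end index is inclusive, as string range expects; check the last assumption against your Tcllib version.

```tcl
package require string::token::shell

set input {echo "a b"}
foreach token [string token shell -indices -- $input] {
    # Assumed tuple order: type, start index, end index, word text.
    lassign $token type start end text
    # Raw slice of the input (quotes included) next to the decoded text.
    # Assumes an inclusive end index.
    puts [format {%-9s %2d..%-2d raw=<%s> text=<%s>} \
        $type $start $end [string range $input $start $end] $text]
}
```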

The possible token types are

PLAIN Plain word, not quoted.

D:QUOTED
Word is delimited by double-quotes.

S:QUOTED
Word is delimited by single-quotes.

D:QUOTED:PART

S:QUOTED:PART
Like the previous types, but the word has no closing quote, i.e. it is
incomplete. These token types can occur if and only if the option
-partial was specified, and only for the last word of the result. If
the option -partial was not specified, such incomplete words cause the
command to throw an error instead.
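Because the :PART types can only appear on the last token, checking that token is enough to tell whether a line ends inside an open quote, for example when driving interactive completion. A sketch assuming Tcllib is installed; openQuote is a hypothetical helper name, not part of the package.

```tcl
package require string::token::shell

# Hypothetical helper: report which kind of quote, if any, is still
# open at the end of a line.
proc openQuote {line} {
    set tokens [string token shell -indices -partial -- $line]
    if {[llength $tokens] == 0} {
        return none
    }
    # Only the last token can have one of the :PART types.
    switch -exact -- [lindex $tokens end 0] {
        D:QUOTED:PART { return double }
        S:QUOTED:PART { return single }
        default       { return none }
    }
}

puts [openQuote {echo "hello wor}]   ;# double
puts [openQuote {echo 'it}]          ;# single
puts [openQuote {echo done}]         ;# none
```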

-partial
When specified, the parser accepts an incomplete quoted word (i.e. one
without a closing quote) at the end of the line instead of throwing
an error.
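The effect of -partial on an unterminated quote, sketched under the assumption that Tcllib is installed. The exact error message is not reproduced here, since it is not specified by this page.

```tcl
package require string::token::shell

set line {grep "unfinished pattern}

# Without -partial the unterminated quote is a syntax error.
if {[catch {string token shell -- $line} msg]} {
    puts "rejected: $msg"
}

# With -partial the incomplete quoted word is accepted as the last word.
set words [string token shell -partial -- $line]
puts [llength $words]    ;# 2
puts [lindex $words 0]   ;# grep
```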

The basic shell syntax accepted here consists of unquoted, single-quoted and double-quoted
words, separated by whitespace. Leading and trailing whitespace is allowed, and stripped.
Shell variables in their various forms are not recognized, nor are sub-shells. The
recognized forms of words are specified in detail below.
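Since variables and sub-shells are not recognized, text such as $HOME or $(date) is not expanded. A sketch assuming Tcllib is installed; on this reading of the syntax below, such text is simply part of an unquoted word and comes back literally.

```tcl
package require string::token::shell

# No variable or command substitution takes place.
set words [string token shell -- {echo $HOME $(date)}]
foreach w $words {
    puts $w
}
# Prints echo, $HOME and $(date), one per line.
```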

single-quoted word
A single-quoted word begins with a single-quote character, ' (ASCII
39), followed by zero or more Unicode characters other than a single quote,
and is closed by a single quote.

The word must be followed by either the end of the string or whitespace.
Another word cannot directly follow it.
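Both rules for single-quoted words in a short sketch, assuming Tcllib is installed: the content between single quotes is taken verbatim, and two words without whitespace between them are rejected.

```tcl
package require string::token::shell

# Inside single quotes nothing is special: backslashes and double
# quotes are kept literally.
set words [string token shell -- {printf 'a\b "c"'}]
puts [lindex $words 1]   ;# a\b "c"

# A word may not directly follow a single-quoted word.
puts [catch {string token shell -- {'ab''cd'}}]   ;# 1
```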

double-quoted word
A double-quoted word begins with a double-quote character, " (ASCII
34), followed by zero or more Unicode characters other than a double quote,
and is closed by a double quote.

Unlike a single-quoted word, a double-quoted word can contain an embedded
double quote, by prefixing (i.e. escaping) it with a backslash character,
\ (ASCII 92). Similarly, a backslash character must be escaped with
another backslash to be inserted literally.
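The two escapes inside a double-quoted word, sketched under the assumption that Tcllib is installed. The input is braced in Tcl so that Tcl itself leaves the backslashes alone and they reach the parser unchanged.

```tcl
package require string::token::shell

# \" embeds a double quote and \\ a literal backslash.
set words [string token shell -- {say "she said \"hi\" in C:\\tmp"}]
puts [lindex $words 1]   ;# she said "hi" in C:\tmp
```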

unquoted word
Unquoted words are not delimited by quotes and thus cannot contain
whitespace or single-quote characters. Double-quote and backslash characters
can be put into unquoted words by escaping them with a backslash, as in double-quoted words.
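Escapes in unquoted words, and the rejection of a bare single quote, in a sketch assuming Tcllib is installed.

```tcl
package require string::token::shell

# Backslash-escaped double quotes are allowed inside an unquoted word.
set words [string token shell -- {echo say\"hi\"}]
puts [lindex $words 1]   ;# say"hi"

# A single quote inside an unquoted word is a syntax error.
puts [catch {string token shell -- {echo don't}}]   ;# 1
```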

whitespace
Whitespace is any Unicode space character. This is equivalent to string is
space, or the regular expression \s.

Whitespace may occur before the first word, or after the last word.
Whitespace must occur between adjacent words.
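The whitespace rules in a sketch assuming Tcllib is installed. The input is a Tcl double-quoted string so that \t and \n become a real tab and newline before parsing.

```tcl
package require string::token::shell

# Leading, trailing and repeated whitespace, including tabs and
# newlines, only separates words and never appears in the result.
set words [string token shell -- "  ls\t-la \n /tmp  "]
puts [llength $words]   ;# 3
puts [join $words ,]    ;# ls,-la,/tmp
```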

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain bugs and other
       problems. Please report such in the category textutil of the Tcllib Trackers
       [http://core.tcl.tk/tcllib/reportlist]. Please also report any ideas for enhancements you
       may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note further that attachments are strongly preferred over inlined patches. Attachments can
       be made by going to the Edit form of the ticket immediately after its creation, and then
       using the left-most button in the secondary navigation bar.

KEYWORDS

       bash, lexing, parsing, shell, string, tokenization
