Provided by: tcl8.6-doc_8.6.10+dfsg-1_all bug

NAME

       encoding - Manipulate encodings

SYNOPSIS

       encoding option ?arg arg ...?
_________________________________________________________________________________________________

INTRODUCTION

       Strings  in  Tcl are logically a sequence of 16-bit Unicode characters.  These strings are
       represented in memory as a sequence of bytes that may be  in  one  of  several  encodings:
       modified  UTF-8  (which  uses  1 to 3 bytes per character), 16-bit “Unicode” (which uses 2
       bytes per character, with an endianness that is dependent on the host  architecture),  and
       binary  (which  uses  a  single  byte per character but only handles a restricted range of
       characters).  Tcl does not guarantee to always use the same encoding for the same string.

       Different operating system interfaces  or  applications  may  generate  strings  in  other
       encodings such as Shift-JIS.  The encoding command helps to bridge the gap between Unicode
       and these other formats.

DESCRIPTION

       Performs one of several encoding related  operations,  depending  on  option.   The  legal
       options are:

       encoding convertfrom ?encoding? data
              Convert  data  to  Unicode from the specified encoding.  The characters in data are
              treated as binary data where the lower 8-bits of  each  character  is  taken  as  a
              single  byte.   The  resulting  sequence  of  bytes  is  treated as a string in the
              specified encoding.  If encoding is not specified, the current system  encoding  is
              used.

       encoding convertto ?encoding? string
              Convert string from Unicode to the specified encoding.  The result is a sequence of
              bytes that represents the converted string.  Each  byte  is  stored  in  the  lower
              8-bits  of  a Unicode character (indeed, the resulting string is a binary string as
              far as Tcl is concerned, at least initially).  If encoding is  not  specified,  the
              current system encoding is used.

       encoding dirs ?directoryList?
              Tcl  can  load  encoding  data  files from the file system that describe additional
              encodings for it to work with. This command sets the search path for *.enc encoding
              data  files  to  the list of directories directoryList. If directoryList is omitted
              then the command returns the current list of directories that make  up  the  search
              path.  It  is  an error for directoryList to not be a valid list. If, when a search
              for an encoding data file is happening, an element in directoryList does not  refer
              to a readable, searchable directory, that element is ignored.

       encoding names
              Returns  a  list  containing  the  names of all of the encodings that are currently
              available.  The encodings “utf-8” and “iso8859-1” are guaranteed to be  present  in
              the list.

       encoding system ?encoding?
              Set  the  system  encoding  to  encoding.  If  encoding is omitted then the command
              returns the current system encoding.  The system  encoding  is  used  whenever  Tcl
              passes strings to system calls.

EXAMPLE

       It  is  common  practice to write script files using a text editor that produces output in
       the euc-jp encoding, which represents the ASCII characters as  singe  bytes  and  Japanese
       characters  as  two bytes.  This makes it easy to embed literal strings that correspond to
       non-ASCII characters by simply typing the  strings  in  place  in  the  script.   However,
       because  the source command always reads files using the current system encoding, Tcl will
       only source such files correctly when the encoding used to write the  file  is  the  same.
       This  tends  not  to be true in an internationalized setting.  For example, if such a file
       was sourced in North America (where the ISO8859-1 is normally used), each byte in the file
       would  be  treated  as  a  separate  character  that  maps to the 00 page in Unicode.  The
       resulting Tcl strings will not contain the expected Japanese  characters.   Instead,  they
       will contain a sequence of Latin-1 characters that correspond to the bytes of the original
       string.  The encoding command can be used to convert this string to the expected  Japanese
       Unicode characters.  For example,

              set s [encoding convertfrom euc-jp "\xA4\xCF"]

       would return the Unicode string “\u306F”, which is the Hiragana letter HA.

SEE ALSO

       Tcl_GetEncoding(3tcl)

KEYWORDS

       encoding, unicode