Provided by: python-kitchen-doc_1.2.6-8_all

NAME

       kitchen - kitchen 1.2.6

       Author Toshio Kuratomi

       Date   19 March 2011

       Version
              1.2.6

       We've  all  done  it.   In the process of writing a brand new application we've discovered that we need a
       little bit of code that we've invented before.  Perhaps it's something to handle unicode  text.   Perhaps
       it's  something  to  make a bit of python-2.5 code run on python-2.4.  Whatever it is, it ends up being a
       tiny bit of code that seems too small to worry about pushing into its own module so it sits there, a part
       of your current project, waiting to be cut and pasted into your next project.  And  the  next.   And  the
       next.  And since that little bitty bit of code proved so useful to you, it's highly likely that it proved
       useful to someone else as well.  Useful enough that they've written it and copy and pasted  it  over  and
       over into each of their new projects.

       Well,  no  longer!  Kitchen aims to pull these small snippets of code into a few python modules which you
       can import and use within your project.  No more copy and paste!  Now you can let someone  else  maintain
       and release these small snippets so that you can get on with your life.

       This  package forms the core of Kitchen.  It contains some useful modules for using newer python standard
       library <http://docs.python.org/library> modules on older python versions,  text  manipulation,  PEP  386
       <https://peps.python.org/pep-0386/> versioning, and initializing gettext.  With this package we're trying
       to  provide  a  few  useful features that don't have too many dependencies outside of the python standard
       library <http://docs.python.org/library>.  We'll be releasing other modules that drop  into  the  kitchen
       namespace to add other features (possibly with larger deps) as time goes on.

REQUIREMENTS

       We've  tried  to  keep  the  core  kitchen module's requirements lightweight.  At the moment kitchen only
       requires

       python 2.4 or later

       Warning:
          Kitchen-1.1.0 was the last release to support python-2.3.x.

   Soft Requirements
        If found, these libraries will be used to improve the implementation of some parts of kitchen.  If they
        are not present, the API that they enable will still exist but may function in a different manner.

       chardet <http://pypi.python.org/pypi/chardet>
              Used in guess_encoding() <#kitchen.text.misc.guess_encoding> and guess_encoding_to_xml() <#kitchen
              .text.converters.guess_encoding_to_xml> to help guess encoding of byte  strings  being  converted.
               If not present, unknown encodings will be converted as if they were latin1.
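               For example, a minimal sketch of using it (the exact encoding name returned may vary; see the
               guess_encoding() API docs):

                  from kitchen.text.misc import guess_encoding

                  b_msg = 'caf\xc3\xa9'   # utf-8 bytes for u'café'
                  # With chardet installed the detection is smarter; without it,
                  # kitchen tries utf-8 and then falls back to latin1
                  encoding = guess_encoding(b_msg)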

OTHER RECOMMENDED LIBRARIES

        These libraries implement commonly used functionality that everyone seems to reinvent.  Rather than
        reinvent the wheel, I simply list the things that they do well for now.  Perhaps if people can't find
       them normally, I'll add them as requirements in setup.py or link them into kitchen's namespace.  For now,
       I just mention them here:

       bunch <http://pypi.python.org/pypi/bunch/>
               Bunch is a dictionary that you can access using attribute lookup as well as bracket notation.
               Setting it apart from most homebrewed implementations is the bunchify() function which will
               descend nested structures of lists and dicts, transforming the dicts into Bunches.

       hashlib <http://code.krypto.org/python/hashlib/>
              Python  2.5  and forward have a hashlib library that provides secure hash functions to python.  If
              you're developing for python2.4 though, you can install the standalone hashlib  library  and  have
              access to the same functions.

       iterutils <http://pypi.python.org/pypi/iterutils/>
              The python documentation for itertools has some examples of other nice iterable functions that can
              be built from the itertools functions.  This third-party module creates those recipes as a module.

       ordereddict <http://pypi.python.org/pypi/ordereddict/>
               Python 2.7 and forward have an OrderedDict that provides a dict whose items are ordered (and
               indexable) as well as named.  This third-party module provides the same class for earlier
               versions of python.

       unittest2 <http://pypi.python.org/pypi/unittest2>
              Python 2.7 has an updated unittest library with new functions not present in the  python  standard
              library  <http://docs.python.org/library>  for  Python  2.6 or less.  If you want to use those new
               functions but need your testing framework to be compatible with older Python, the unittest2
               library provides the update as an external module.

       nose <http://somethingaboutorange.com/mrl/projects/nose/>
              If you want to use a test discovery tool instead of the unittest framework, nosetests  provides  a
              simple to use way to do that.

LICENSE

       This  python  module is distributed under the terms of the GNU Lesser General Public License Version 2 or
       later <http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html>.

       Note:
          Some parts of this module are licensed under terms less restrictive than the LGPLv2+.  If you separate
          these files from the work as a whole you are allowed to use them under the less restrictive  licenses.
          The following is a list of the files that are known:

          Python 2 license <http://www.python.org/download/releases/2.4/license/>
                 _subprocess.py,   test_subprocess.py,   defaultdict.py,  test_defaultdict.py,  _base64.py,  and
                 test_base64.py

CONTENTS

   Using kitchen to write good code
       Kitchen's functions won't automatically make you a better programmer.  You have to learn when and how  to
       use  them  as  well.  This section of the documentation is intended to show you some of the ways that you
       can apply kitchen's functions to problems that may have arisen in your life.  The goal of this section is
       to give you enough information to understand what the kitchen API can do for you and where in the Kitchen
       API <#kitchenapi> docs to look for something that can help you with your next issue.  Along the way,  you
       might  pick up the knack for identifying issues with your code before you publish it.  And that will make
       you a better coder.

   Overcoming frustration: Correctly using unicode in python2
        In python-2.x, there are two types that deal with text.

       1. str is for strings of bytes.  These are very similar in nature to how strings are handled in C.

       2. unicode is for strings of unicode code points <#term-code-points>.

       Note:
          Just what the dickens is "Unicode"?

          One mistake that people encountering this issue for the first time make is confusing the unicode  type
          and  the  encodings of unicode stored in the str type.  In python, the unicode type stores an abstract
          sequence of code points <#term-code-points>.  Each code point <#term-code-point> represents a grapheme
          <#term-grapheme>.  By contrast, byte str stores a sequence of bytes which can  then  be  mapped  to  a
          sequence  of  code  points  <#term-code-points>.   Each  unicode encoding (UTF-8 <#term-UTF-8>, UTF-7,
          UTF-16, UTF-32, etc) maps different sequences of bytes to the unicode code points <#term-code-points>.

          What does that mean to you as a programmer?  When you're dealing with text manipulations (finding  the
          number  of  characters  in a string or cutting a string on word boundaries) you should be dealing with
          unicode strings as they abstract characters in a manner that's appropriate for thinking of them  as  a
          sequence of letters that you will see on a page.  When dealing with I/O, reading to and from the disk,
          printing  to  a  terminal, sending something over a network link, etc, you should be dealing with byte
          str as those devices are going to need to deal with concrete implementations of what  bytes  represent
          your abstract characters.
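        For example, here is a minimal sketch (python2) of that difference; the number of code points in a
        unicode string is fixed while the number of bytes depends on the encoding chosen:

           >>> u_entree = u'caf\xe9'
           >>> len(u_entree)                     # four code points: c, a, f, é
           4
           >>> len(u_entree.encode('utf-8'))     # é takes two bytes in UTF-8
           5
           >>> len(u_entree.encode('utf-16'))    # BOM plus two bytes per code point
           10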

        In the python2 world many APIs use these two classes interchangeably but there are several important APIs
       where only one or the other will do the right thing.  When you give the wrong type of string  to  an  API
       that  wants  the  other  type,  you  may  end  up  with  an exception being raised (UnicodeDecodeError or
       UnicodeEncodeError).  However, these exceptions aren't always raised because python  implicitly  converts
       between types... sometimes.

   Frustration #1: Inconsistent Errors
       Although  converting  when  possible  seems like the right thing to do, it's actually the first source of
       frustration.  A programmer can test out their program with a string like: The quick brown fox jumped over
       the lazy dog and not encounter any issues.  But when they release their software into the  wild,  someone
       enters  the  string:  I sat down for coffee at the café and suddenly an exception is thrown.  The reason?
       The mechanism that converts between the  two  types  is  only  able  to  deal  with  ASCII  <#term-ASCII>
       characters.   Once  you  throw  non-ASCII  <#term-ASCII>  characters into your strings, you have to start
       dealing with the conversion manually.
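        A minimal sketch of that behaviour (python2); mixing the two types works for ASCII-only data but blows
        up as soon as non-ASCII bytes are involved:

           >>> 'fox' + u' jumped'              # pure ASCII: implicit conversion succeeds
           u'fox jumped'
           >>> 'caf\xc3\xa9' + u' is open'     # non-ASCII bytes: implicit conversion fails
           Traceback (most recent call last):
             File "<stdin>", line 1, in <module>
           UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)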

       So, if I manually convert everything to either byte str or unicode strings, will I be okay?   The  answer
       is.... sometimes.

   Frustration #2: Inconsistent APIs
       The  problem  you  run  into  when converting everything to byte str or unicode strings is that you'll be
       using someone else's API quite often (this includes the APIs in the python standard library  <http://docs
       .python.org/library>) and find that the API will only accept byte str or only accept unicode strings.  Or
       worse,  that the code will accept either when you're dealing with strings that consist solely of ASCII <#
       term-ASCII> but throw an error when you give it a string that's got non-ASCII  <#term-ASCII>  characters.
       When you encounter these APIs you first need to identify which type will work better and then you have to
       convert your values to the correct type for that code.  Thus the programmer that wants to proactively fix
       all unicode errors in their code needs to do two things:

       1. You must keep track of what type your sequences of text are.  Does my_sentence contain unicode or str?
          If you don't know that then you're going to be in for a world of hurt.

       2. Anytime  you  call  a function you need to evaluate whether that function will do the right thing with
          str or unicode values.  Sending the wrong value here will lead to a UnicodeError being thrown when the
          string contains non-ASCII <#term-ASCII> characters.
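        For example, a short sketch of the bookkeeping that the second point implies (add_header() is
        illustrative):

           def add_header(msg):
               # this function only does the right thing with unicode input
               if not isinstance(msg, unicode):
                   raise TypeError('msg must be of type unicode')
               return u'Subject: %s' % msg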

       Note:
          There is one mitigating factor here.  The python community has been standardizing on using unicode  in
           all its APIs.  Although there are some APIs that you need to send byte str to in order to be safe
           (including things as ubiquitous as print() as we'll see in the next section), it's getting easier and
          easier to use unicode strings with most APIs.

   Frustration #3: Inconsistent treatment of output
       Alright,  since  the  python  community  is  moving to using unicode strings everywhere, we might as well
       convert everything to unicode strings and use that by default, right?  Sounds good most of the  time  but
       there's  at  least one huge caveat to be aware of.  Anytime you output text to the terminal or to a file,
       the text has to be converted into a byte str.  Python will try to implicitly convert from unicode to byte
       str... but it will throw an exception if the bytes are non-ASCII <#term-ASCII>:

          >>> string = unicode(raw_input(), 'utf8')
          café
          >>> log = open('/var/tmp/debug.log', 'w')
          >>> log.write(string)
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

       Okay, this is simple enough to solve:  Just convert to a byte str and we're all set:

          >>> string = unicode(raw_input(), 'utf8')
          café
          >>> string_for_output = string.encode('utf8', 'replace')
          >>> log = open('/var/tmp/debug.log', 'w')
          >>> log.write(string_for_output)
          >>>

       So that was simple, right?  Well... there's one gotcha that makes things a bit harder to debug sometimes.
       When you attempt to write non-ASCII <#term-ASCII> unicode  strings  to  a  file-like  object  you  get  a
        traceback every time.  But what happens when you use print()?  The terminal is a file-like object so it
       should raise an exception right?  The answer to that is....  sometimes:

          $ python
          >>> print u'café'
          café

       No exception.  Okay, we're fine then?

       We are until someone does one of the following:

       • Runs the script in a different locale:

            $ LC_ALL=C python
            >>> # Note: if you're using a good terminal program when running in the C locale
            >>> # The terminal program will prevent you from entering non-ASCII characters
            >>> # python will still recognize them if you use the codepoint instead:
            >>> print u'caf\xe9'
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

       • Redirects output to a file:

            $ cat test.py
            #!/usr/bin/python -tt
            # -*- coding: utf-8 -*-
            print u'café'
            $ ./test.py  >t
            Traceback (most recent call last):
              File "./test.py", line 4, in <module>
                print u'café'
            UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

       Okay, the locale thing is a pain but understandable: the  C  locale  doesn't  understand  any  characters
       outside  of  ASCII  <#term-ASCII>  so  naturally  attempting  to  display those won't work.  Now why does
       redirecting to a file cause problems?  It's because print() in python2 is treated specially.  Whereas the
       other file-like objects in  python  always  convert  to  ASCII  <#term-ASCII>  unless  you  set  them  up
       differently, using print() to output to the terminal will use the user's locale to convert before sending
       the  output to the terminal.  When print() is not outputting to the terminal (being redirected to a file,
       for instance), print() decides that it doesn't know what locale to use for that file and so it  tries  to
       convert to ASCII <#term-ASCII> instead.

       So  what  does  this  mean  for you, as a programmer?  Unless you have the luxury of controlling how your
       users use your code, you should always, always, always convert to a byte str before outputting strings to
       the terminal or to a file.  Python even provides you with a facility to do just this.  If you  know  that
       every unicode string you send to a particular file-like object (for instance, stdout) should be converted
       to a particular encoding you can use a codecs.StreamWriter object to convert from a unicode string into a
       byte  str.  In particular, codecs.getwriter() will return a StreamWriter class that will help you to wrap
       a file-like object for output.  Using our print() example:

          $ cat test.py
          #!/usr/bin/python -tt
          # -*- coding: utf-8 -*-
          import codecs
          import sys

          UTF8Writer = codecs.getwriter('utf8')
          sys.stdout = UTF8Writer(sys.stdout)
          print u'café'
          $ ./test.py  >t
          $ cat t
          café

   Frustrations #4 and #5 -- The other shoes
       In English, there's a saying "waiting for the other shoe to drop".  It means that when one event (usually
       bad) happens, you come to expect another event (usually worse) to come after.  In this case we  have  two
       other shoes.

   Frustration #4: Now it doesn't take byte strings?!
       If  you wrap sys.stdout using codecs.getwriter() and think you are now safe to print any variable without
       checking its type I am afraid I must inform you that you're not paying enough attention to  Murphy's  Law
       <#term-Murphy-s-Law>.   The  StreamWriter  that codecs.getwriter() provides will take unicode strings and
       transform them into byte str before they get to sys.stdout.  The problem is  if  you  give  it  something
       that's  already  a byte str it tries to transform that as well.  To do that it tries to turn the byte str
       you give it into unicode and then transform that back into a byte str...  and since it uses the ASCII  <#
       term-ASCII> codec to perform those conversions, chances are that it'll blow up when making them:

          >>> import codecs
          >>> import sys
          >>> UTF8Writer = codecs.getwriter('utf8')
          >>> sys.stdout = UTF8Writer(sys.stdout)
          >>> print 'café'
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
            File "/usr/lib64/python2.6/codecs.py", line 351, in write
              data, consumed = self.encode(object, self.errors)
          UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

       To  work around this, kitchen provides an alternate version of codecs.getwriter() that can deal with both
       byte  str  and  unicode  strings.   Use   kitchen.text.converters.getwriter()   <#kitchen.text.converters
       .getwriter> in place of the codecs version like this:

          >>> import sys
          >>> from kitchen.text.converters import getwriter
          >>> UTF8Writer = getwriter('utf8')
          >>> sys.stdout = UTF8Writer(sys.stdout)
          >>> print u'café'
          café
          >>> print 'café'
          café

   Frustration #5: Inconsistent APIs Part deux
       Sometimes  you  do  everything right in your code but other people's code fails you.  With unicode issues
       this happens more often than we want.  A glaring example of this is when  you  get  values  back  from  a
       function that aren't consistently unicode string or byte str.

       An  example  from  the  python standard library <http://docs.python.org/library> is gettext.  The gettext
       functions are used to help translate messages that you display to users in the users'  native  languages.
       Since  most  languages  contain  letters  outside  of  the ASCII <#term-ASCII> range, the values that are
       returned contain unicode characters.  gettext provides you with  ugettext()  and  ungettext()  to  return
       these  translations  as  unicode strings and gettext(), ngettext(), lgettext(), and lngettext() to return
       them as encoded byte str.  Unfortunately, even though they're documented  to  return  only  one  type  of
       string or the other, the implementation has corner cases where the wrong type can be returned.

       This  means  that  even  if  you separate your unicode string and byte str correctly before you pass your
       strings to a gettext function, afterwards, you might have to check that you have the right sort of string
       type again.

       Note:
          kitchen.i18n <#module-kitchen.i18n> provides alternate gettext translation objects  that  return  only
          byte str or only unicode string.
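        For example, a minimal sketch of setting those up:

           from kitchen.i18n import get_translation_object

           translations = get_translation_object('myprogram')
           _ = translations.ugettext     # always returns unicode strings
           b_ = translations.lgettext    # always returns byte str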

   A few solutions
       Now that we've identified the issues, can we define a comprehensive strategy for dealing with them?

   Convert text at the border
       If  you  get  some  piece  of  text  from a library, read from a file, etc, turn it into a unicode string
       immediately.  Since python is moving in the direction of unicode strings  everywhere  it's  going  to  be
       easier to work with unicode strings within your code.

       If  your  code  is heavily involved with using things that are bytes, you can do the opposite and convert
       all text into byte str at the border and only convert to unicode when you need it for passing to  another
       library or performing string operations on it.

       In  either  case,  the important thing is to pick a default type for strings and stick with it throughout
        your code.  When you mix the types it becomes much easier to mistakenly operate on a string with a
        function that can only handle the other type.
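        For example, a sketch of decoding at the border (read_config() is illustrative):

           from kitchen.text.converters import to_unicode

           def read_config(filename):
               config_file = open(filename, 'r')
               try:
                   # decode once, at the border; everything past this point is unicode
                   return to_unicode(config_file.read(), encoding='utf-8')
               finally:
                   config_file.close()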

       Note:
          In python3, the abstract unicode type becomes  much  more  prominent.   The  type  named  str  is  the
          equivalent  of  python2's  unicode and python3's bytes type replaces python2's str.  Most APIs deal in
           the unicode type of string with just some pieces that are low level dealing with bytes.  The implicit
           conversions between bytes and unicode are removed and whenever you want to make the conversion you
           need to do so explicitly.

   When the data needs to be treated as bytes (or unicode) use a naming convention
       Sometimes  you're  converting  nearly  all of your data to unicode strings but you have one or two values
       where you have to keep byte str around.  This is often the case when you need to use the  value  verbatim
       with some external resource.  For instance, filenames or key values in a database.  When you do this, use
       a  naming  convention  for the data you're working with so you (and others reading your code later) don't
       get confused about what's being stored in the value.

       If you need both a textual string to present to the user and a byte value for an  exact  match,  consider
       keeping  both versions around.  You can either use two variables for this or a dict whose key is the byte
       value.

       Note:
          You can use the naming convention used in  kitchen  as  a  guide  for  implementing  your  own  naming
          convention.  It prefixes byte str variables of unknown encoding with b_ and byte str of known encoding
          with  the  encoding  name  like:  utf8_.  If the default was to handle str and only keep a few unicode
          values, those variables would be prefixed with u_.
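        For example, a short sketch of the convention in practice (variable names are illustrative):

           from kitchen.text.converters import to_bytes

           title = u'caf\xe9 menu'                           # default type: unicode
           b_title = to_bytes(title)                         # byte str, encoding unknown to later readers
           utf8_title = to_bytes(title, encoding='utf-8')    # byte str of known encoding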

   When outputting data, convert back into bytes
       When you go to send your data back outside  of  your  program  (to  the  filesystem,  over  the  network,
       displaying  to  the  user,  etc)  turn the data back into a byte str.  How you do this will depend on the
       expected output format of the data.  For displaying to the user, you can use the user's default  encoding
        using locale.getpreferredencoding().  For entering into a file, your best bet is to pick a single
       encoding and stick with it.
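        For example, a minimal sketch of encoding for display using the user's locale:

           import locale

           u_msg = u'caf\xe9'
           encoding = locale.getpreferredencoding()
           # replace anything the user's encoding cannot represent rather than traceback
           print u_msg.encode(encoding, 'replace')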

       Warning:
           When using the encoding that the user has set (for instance, using locale.getpreferredencoding()),
           remember that they may have their encoding set to something that can't display every single unicode
          character.  That means when you convert from unicode to a byte str you  need  to  decide  what  should
          happen  if the byte value is not valid in the user's encoding.  For purposes of displaying messages to
          the user, it's usually okay to  use  the  replace  encoding  error  handler  to  replace  the  invalid
          characters with a question mark or other symbol meaning the character couldn't be displayed.

       You   can   use   kitchen.text.converters.getwriter()  <#kitchen.text.converters.getwriter>  to  do  this
       automatically for sys.stdout.  When creating exception messages be sure to convert to bytes manually.

   When writing unittests, include non-ASCII values and both unicode and str type
       Unless you know that a specific portion of your code will only deal with ASCII <#term-ASCII>, be sure  to
       include  non-ASCII  <#term-ASCII>  values  in  your  unittests.   Including a few characters from several
        different scripts is highly advised as well because some code may have special-cased accented roman
        characters but not know how to handle characters used in Asian alphabets.

        Similarly, unless you know that a given portion of your code will only be given unicode strings or only byte
       str  be  sure  to  try  variables  of  both types in your unittests.  When doing this, make sure that the
       variables are also non-ASCII <#term-ASCII> as python's implicit conversion will mask problems  with  pure
       ASCII  <#term-ASCII>  data.   In many cases, it makes sense to check what happens if byte str and unicode
       strings that won't decode in the present locale are given.
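        For example, a sketch of such tests (truncate() here is a hypothetical function under test that is
        expected to return unicode):

           import unittest
           from mylib import truncate   # hypothetical function under test

           class TestTruncate(unittest.TestCase):
               def test_ascii(self):
                   self.assertEqual(truncate(u'plain', 4), u'plai')

               def test_non_ascii_unicode(self):
                   self.assertEqual(truncate(u'caf\xe9s', 4), u'caf\xe9')

               def test_non_ascii_bytes(self):
                   # utf-8 bytes; pure ASCII here would let implicit conversion mask bugs
                   self.assertEqual(truncate('caf\xc3\xa9s', 4), u'caf\xe9')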

   Be vigilant about spotting poor APIs
       Make sure that the libraries you use return only unicode strings or byte str.   Unittests  can  help  you
       spot issues here by running many variations of data through your functions and checking that you're still
       getting the types of string that you expect.
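        For example, a sketch of spotting such a problem in a unittest (lookup_title() is a hypothetical
        library function):

           import unittest
           from some_library import lookup_title   # hypothetical

           class TestLookupTitleTypes(unittest.TestCase):
               def test_always_returns_unicode(self):
                   for value in (u'ascii only', u'caf\xe9', 'caf\xc3\xa9'):
                       self.assertTrue(isinstance(lookup_title(value), unicode))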

   Example: Putting this all together with kitchen
       The kitchen library provides a wide array of functions to help you deal with byte str and unicode strings
       in your program.  Here's a short example that uses many kitchen functions to do its work:

          #!/usr/bin/python -tt
          # -*- coding: utf-8 -*-
          import locale
          import os
          import sys
          import unicodedata

          from kitchen.text.converters import getwriter, to_bytes, to_unicode
          from kitchen.i18n import get_translation_object

          if __name__ == '__main__':
              # Setup gettext driven translations but use the kitchen functions so
              # we don't have the mismatched bytes-unicode issues.
              translations = get_translation_object('example')
              # We use _() for marking strings that we operate on as unicode
              # This is pretty much everything
              _ = translations.ugettext
              # And b_() for marking strings that we operate on as bytes.
              # This is limited to exceptions
              b_ = translations.lgettext

              # Setup stdout
              encoding = locale.getpreferredencoding()
              Writer = getwriter(encoding)
              sys.stdout = Writer(sys.stdout)

              # Load data.  Format is filename\0description
              # description should be utf-8 but filename can be any legal filename
              # on the filesystem
              # Sample datafile.txt:
              #   /etc/shells\x00Shells available on caf\xc3\xa9.lan
              #   /var/tmp/file\xff\x00File with non-utf8 data in the filename
              #
              # And to create /var/tmp/file\xff (under bash or zsh) do:
              #   echo 'Some data' > /var/tmp/file$'\377'
              datafile = open('datafile.txt', 'r')
              data = {}
              for line in datafile:
                  # We're going to keep filename as bytes because we will need the
                  # exact bytes to access files on a POSIX operating system.
                  # description, we'll immediately transform into unicode type.
                  b_filename, description = line.split('\0', 1)

                  # to_unicode defaults to decoding output from utf-8 and replacing
                  # any problematic bytes with the unicode replacement character
                  # We accept mangling of the description here knowing that our file
                  # format is supposed to use utf-8 in that field and that the
                  # description will only be displayed to the user, not used as
                  # a key value.
                  description = to_unicode(description, 'utf-8').strip()
                  data[b_filename] = description
              datafile.close()

              # We're going to add a pair of extra fields onto our data to show the
              # length of the description and the filesize.  We put those between
              # the filename and description because we haven't checked that the
              # description is free of NULLs.
              datafile = open('newdatafile.txt', 'w')

              # Name filename with a b_ prefix to denote byte string of unknown encoding
              for b_filename in data:
                  # Since we have the byte representation of filename, we can read any
                  # filename
                  if os.access(b_filename, os.F_OK):
                      size = os.path.getsize(b_filename)
                  else:
                      size = 0
                   # Because the descriptions are unicode type, we know the number
                   # of characters corresponds to the length of the normalized
                   # unicode string.
                   length = len(unicodedata.normalize('NFC', data[b_filename]))

                  # Print a summary to the screen
                   # Note that we do not let implicit type conversion from str to
                   # unicode transform b_filename into a unicode string.  That might
                   # fail as python would use the ASCII codec.  Instead we use
                   # to_unicode() to explicitly transform it in a way that we know
                   # will not traceback.
                  print _(u'filename: %s') % to_unicode(b_filename)
                  print _(u'file size: %s') % size
                  print _(u'desc length: %s') % length
                  print _(u'description: %s') % data[b_filename]

                  # First combine the unicode portion
                  line = u'%s\0%s\0%s' % (size, length, data[b_filename])
                  # Since the filenames are bytes, turn everything else to bytes before combining
                  # Turning into unicode first would be wrong as the bytes in b_filename
                  # might not convert
                  b_line = '%s\0%s\n' % (b_filename, to_bytes(line))

                  # Just to demonstrate that getwriter will pass bytes through fine
                  print b_('Wrote: %s') % b_line
                  datafile.write(b_line)
              datafile.close()

              # And just to show how to properly deal with an exception.
              # Note two things about this:
              # 1) We use the b_() function to translate the string.  This returns a
              #    byte string instead of a unicode string
              # 2) We're using the b_() function returned by kitchen.  If we had
              #    used the one from gettext we would need to convert the message to
              #    a byte str first
              message = u'Demonstrate the proper way to raise exceptions.  Sincerely,  \u3068\u3057\u304a'
              raise Exception(b_(message))

       See also:
          kitchen.text.converters <#module-kitchen.text.converters>

   Designing Unicode Aware APIs
       APIs  that  deal with byte str and unicode strings are difficult to get right.  Here are a few strategies
       with pros and cons of each.

   Contents
       • Designing Unicode Aware APIs

         • Take either bytes or unicode, output only unicode

         • Take either bytes or unicode, output the same type

         • Separate functions

         • Deciding whether to take str or unicode when no value is returned

           • Writing to external data

           • Updating data structures

         • APIs to Avoid

           • Returning unicode unless a conversion fails

           • Ignoring values with no chance of recovery

           • Raising a UnicodeException with no chance of recovery

         • Knowing your data

           • Do you need to operate on both bytes and unicode?

           • Can you restrict the encodings?

             • Single byte encodings

             • Multibyte encodings

               • Fixed width

               • Variable Width

                 • ASCII compatible

                 • Escaped

                 • Other

   Take either bytes or unicode, output only unicode
       In this strategy, you allow the user to enter either unicode strings or byte str but what you  give  back
        is always unicode.  This strategy is easy for novice end users to start using immediately as they will be
       able to feed either type of string into the function and get back a string that they  can  use  in  other
       places.

       However,  it  does lead to the novice writing code that functions correctly when testing it with ASCII <#
       term-ASCII>-only data but fails when given data that contains non-ASCII <#term-ASCII> characters.  Worse,
       if your API is not designed to be flexible, the consumer of your code won't be  able  to  easily  correct
       those problems once they find them.

       Here's a good API that uses this strategy:

          from kitchen.text.converters import to_unicode

          def truncate(msg, max_length, encoding='utf8', errors='replace'):
              msg = to_unicode(msg, encoding, errors)
              return msg[:max_length]

       The  call  to  truncate() starts with the essential parameters for performing the task.  It ends with two
       optional keyword arguments that define the encoding to use to transform from a byte str  to  unicode  and
       the  strategy  to  use  if undecodable bytes are encountered.  The defaults may vary depending on the use
       cases you have in mind.  When the output  is  generally  going  to  be  printed  for  the  user  to  see,
        errors='replace' is a good default.  If you are constructing keys to a database, raising an exception
       (with errors='strict') may be a better default.  In either case, having both parameters allows the person
       using your API to choose how they want to handle any problems.  Having the values is also a clue to  them
       that a conversion from byte str to unicode string is going to occur.

       Note:
          If  you're  targeting  python-3.1  and  above,  errors='surrogateescape'  may be a better default than
          errors='strict'.  You need to be mindful of a few things when using surrogateescape though:

           • surrogateescape will cause issues if a non-ASCII <#term-ASCII> compatible encoding is used (for
             instance, UTF-16 and UTF-32).  That makes it unhelpful in situations where a true general purpose
             method of encoding must be found.  PEP 383 <https://peps.python.org/pep-0383/> mentions that
             surrogateescape was specifically designed around the limitations of translating using system locales
             (where ASCII <#term-ASCII> compatibility is generally seen as inescapable) so you should keep that
             in mind.

          • If  you  use  surrogateescape  to decode from bytes to unicode you will need to use an error handler
            other than strict to encode as the lone surrogate that this error handler creates makes for  invalid
             unicode that must be handled when encoding.  In Python-3.1.2 or less, a bug in the encoder error
             handlers means that you can only use surrogateescape to encode; anything else will throw an error.

          Evaluate your usages of the variables in question to see what makes sense.

       Here's a bad example of using this strategy:

          from kitchen.text.converters import to_unicode

          def truncate(msg, max_length):
              msg = to_unicode(msg)
              return msg[:max_length]

       In this example, we don't have the optional keyword arguments for encoding and errors.  A user  who  uses
       this  function  is  more  likely  to miss the fact that a conversion from byte str to unicode is going to
        occur.  And once an error is reported, they will have to look through their traceback and think harder
       about  where  they want to transform their data into unicode strings instead of having the opportunity to
       control how the conversion takes place in the function itself.  Note that the user does have the  ability
       to make this work by making the transformation to unicode themselves:

          from kitchen.text.converters import to_unicode

          msg = to_unicode(msg, encoding='euc_jp', errors='ignore')
          new_msg = truncate(msg, 5)

   Take either bytes or unicode, output the same type
       This  strategy  is sometimes called polymorphic because the type of data that is returned is dependent on
       the type of data that is received.  The concept is that when you are given a byte  str  to  process,  you
       return  a  byte  str  in  your output.  When you are given unicode strings to process, you return unicode
       strings in your output.

       This can work well for end users as the ones that know about the difference between the two string  types
       will  already  have transformed the strings to their desired type before giving it to this function.  The
       ones that don't can remain blissfully ignorant (at least, as far as your function is  concerned)  as  the
       function does not change the type.

       In  cases  where  the encoding of the byte str is known or can be discovered based on the input data this
       works well.  If you can't figure out the input encoding, however, this strategy can fail in  any  of  the
       following cases:

       1. It needs to do an internal conversion between byte str and unicode string.

       2. It cannot return the same data as either a unicode string or byte str.

        3. You may need to deal with byte strings that are not byte-compatible with ASCII <#term-ASCII>.

       First, a couple examples of using this strategy in a good way:

          def translate(msg, table):
              replacements = table.keys()
              new_msg = []
              for index, char in enumerate(msg):
                  if char in replacements:
                      new_msg.append(table[char])
                  else:
                      new_msg.append(char)

              return ''.join(new_msg)

       In this example, all of the strings that we use (except the empty string which is okay because it doesn't
       have  any  characters to encode) come from outside of the function.  Due to that, the user is responsible
       for making sure that the msg, and the keys and values in table all match in terms  of  type  (unicode  vs
       str)  and  encoding  (You can do some error checking to make sure the user gave all the same type but you
       can't do the same for the user giving different encodings).  You do not  need  to  make  changes  to  the
        string that require you to know the encoding or type of the string; everything is a simple replacement of
        one element in the sequence of characters in msg with the character in table.

          import json
          from kitchen.text.converters import to_unicode, to_bytes

          def first_field_from_json_data(json_string):
              '''Return the first field in a json data structure.

              The format of the json data is a simple list of strings.
              '["one", "two", "three"]'
              '''
              if isinstance(json_string, unicode):
                  # On all python versions, json.loads() returns unicode if given
                  # a unicode string
                  return json.loads(json_string)[0]

              # Byte str: figure out which encoding we're dealing with
               if '\x00' not in json_string[:2]:
                   encoding = 'utf8'
               elif '\x00\x00\x00' == json_string[:3]:
                   encoding = 'utf-32-be'
               elif '\x00\x00\x00' == json_string[1:4]:
                   encoding = 'utf-32-le'
               elif '\x00' == json_string[0] and '\x00' == json_string[2]:
                   encoding = 'utf-16-be'
               else:
                   encoding = 'utf-16-le'

               data = json.loads(unicode(json_string, encoding))
               return data[0].encode(encoding)

       In  this  example  the  function takes either a byte str type or a unicode string that has a list in json
       format and returns the first field from it as the type of the input string.  The first section of code is
       very straightforward; we receive a unicode string, parse it with a function, and then  return  the  first
        field from our parsed data (which json.loads() has already given back to us as unicode).

       The second portion that deals with byte str is not so straightforward.  Before we can parse the string we
       have  to  determine what characters the bytes in the string map to.  If we didn't do that, we wouldn't be
       able to properly find which characters are present in the string.  In order to do that we have to  figure
       out  the  encoding  of the byte str.  Luckily, the json specification states that all strings are unicode
       and encoded with one of UTF32be, UTF32le, UTF16be, UTF16le, or UTF-8 <#term-UTF-8>.  It  further  defines
       the  format  such  that  the  first  two  characters are always ASCII <#term-ASCII>.  Each of these has a
       different sequence of NULLs when they encode an ASCII <#term-ASCII> character.  We can use that to detect
       which encoding was used to create the byte str.

        Finally, we encode the unicode result back into a byte str before returning it.

       As you can see, in this example we have to convert from byte str to unicode and back.  But we  know  from
       the  json  specification that byte str has to be one of a limited number of encodings that we are able to
       detect.  That ability makes this strategy work.

       Now for some examples of using this strategy in ways that fail:

          import unicodedata
          def first_char(msg):
              '''Return the first character in a string'''
              if not isinstance(msg, unicode):
                  try:
                      msg = unicode(msg, 'utf8')
                  except UnicodeError:
                      msg = unicode(msg, 'latin1')
              msg = unicodedata.normalize('NFC', msg)
              return msg[0]

       If you look at that code and think that there's something fragile and  prone  to  breaking  in  the  try:
       except: block you are correct in being suspicious.  This code will fail on multi-byte character sets that
       aren't  UTF-8  <#term-UTF-8>.   It  can  also  fail on data where the sequence of bytes is valid UTF-8 <#
        term-UTF-8> but the bytes are actually of a different encoding.  The reason this code fails is that we
        don't know what encoding the bytes are in and the code must convert from a byte str to a unicode string
       in order to function.

       In order to make this code robust we must know the encoding of msg.  The only way to know that is to  ask
       the user so the API must do that:

          import unicodedata
          def number_of_chars(msg, encoding='utf8', errors='strict'):
              if not isinstance(msg, unicode):
                  msg = unicode(msg, encoding, errors)
              msg = unicodedata.normalize('NFC', msg)
              return len(msg)

       Another example of failure:

          import os
          def listdir(directory):
              files = os.listdir(directory)
              if isinstance(directory, str):
                  return files
              # files could contain both bytes and unicode
              new_files = []
              for filename in files:
                  if not isinstance(filename, unicode):
                      # What to do here?
                      continue
                   new_files.append(filename)
              return new_files

       This  function  illustrates  the  second  failure  mode.   Here,  not  all  of the possible values can be
       represented as unicode without knowing more about the encoding of each of the filenames involved.   Since
        each filename could have a different encoding there are a few different options to pursue.  We could make
       this function always return byte str since that can accurately represent anything that could be returned.
       If we want to return unicode we need to at least allow the user to specify what to do in case of an error
       decoding the bytes to unicode.  We can also let the user specify  the  encoding  to  use  for  doing  the
       decoding  but  that  won't  help  in  all cases since not all files will be in the same encoding (or even
       necessarily in any encoding):

          import locale
          import os
          def listdir(directory, encoding=locale.getpreferredencoding(), errors='strict'):
              # Note: In python-3.1+, surrogateescape may be a better default
              files = os.listdir(directory)
              if isinstance(directory, str):
                  return files
              new_files = []
              for filename in files:
                  if not isinstance(filename, unicode):
                      filename = unicode(filename, encoding=encoding, errors=errors)
                  new_files.append(filename)
              return new_files

        Note that although in this example we use errors as what to pass to the codec that decodes to unicode, we
        could also have an errors argument that decides other things to do, like skip a filename entirely, return
        a placeholder (Nondisplayable filename), or raise an exception.

       This leaves us with one last failure to describe:

          def first_field(csv_string):
              '''Return the first field in a comma separated values string.'''
              try:
                  return csv_string[:csv_string.index(',')]
              except ValueError:
                  return csv_string

       This code looks simple enough.  The hidden error here is that we are searching for a comma character in a
       byte str but not all encodings will use the same sequence of bytes to represent the comma.  If you use an
       encoding that's not ASCII <#term-ASCII> compatible on the byte level, then the literal comma ','  in  the
       above code will match inappropriate bytes.  Some examples of how it can fail:

       • Will find the byte representing an ASCII <#term-ASCII> comma in another character

       • Will find the comma but leave trailing garbage bytes on the end of the string

       • Will not match the character that represents the comma in this encoding

       There  are  two ways to solve this.  You can either take the encoding value from the user or you can take
       the separator value from the user.  Of the two, taking the encoding is the better option for two reasons:

       1. Taking a separator argument doesn't clearly document for the API user that the reason they  must  give
          it  is  to  properly  match the encoding of the csv_string.  They're just as likely to think that it's
          simply a way to specify an alternate character (like ":" or "|") for the separator.

       2. It's possible for a variable width encoding to reuse the same byte sequence for  different  characters
          in multiple sequences.

          Note:
             UTF-8  <#term-UTF-8>  is  resistant  to  this  as any character's sequence of bytes will never be a
             subset of another character's sequence of bytes.

       With that in mind, here's how to improve the API:

           def first_field(csv_string, encoding='utf-8', errors='replace'):
               if not isinstance(csv_string, unicode):
                   u_string = unicode(csv_string, encoding, errors)
                   is_unicode = False
               else:
                   u_string = csv_string
                   is_unicode = True

               try:
                   field = u_string[:u_string.index(u',')]
               except ValueError:
                   return csv_string

               if not is_unicode:
                   field = field.encode(encoding, errors)
               return field

       Note:
          If you decide you'll never encounter a variable width encoding that reuses byte sequences you can  use
          this code instead:

              def first_field(csv_string, encoding='utf-8'):
                  try:
                      return csv_string[:csv_string.index(','.encode(encoding))]
                  except ValueError:
                      return csv_string

   Separate functions
       Sometimes  you  want to be able to take either byte str or unicode strings, perform similar operations on
       either one and then return data in the same format as was given.  Probably the easiest way to do that  is
       to  have  separate  functions for each and adopt a naming convention to show that one is for working with
       byte str and the other is for working with unicode strings:

          def translate_b(msg, table):
              '''Replace values in str with other byte values like unicode.translate'''
              if not isinstance(msg, str):
                  raise TypeError('msg must be of type str')
              str_table = [chr(s) for s in xrange(0,256)]
              delete_chars = []
              for chr_val in (k for k in table.keys() if isinstance(k, int)):
                  if chr_val > 255:
                       raise ValueError('Keys in table must not exceed 255')
                   if table[chr_val] is None:
                      delete_chars.append(chr(chr_val))
                  elif isinstance(table[chr_val], int):
                      if table[chr_val] > 255:
                          raise TypeError('table values cannot be more than 255 or less than 0')
                      str_table[chr_val] = chr(table[chr_val])
                  else:
                      if not isinstance(table[chr_val], str):
                          raise TypeError('character mapping must return integer, None or str')
                      str_table[chr_val] = table[chr_val]
              str_table = ''.join(str_table)
              delete_chars = ''.join(delete_chars)
              return msg.translate(str_table, delete_chars)

          def translate(msg, table):
              '''Replace values in a unicode string with other values'''
              if not isinstance(msg, unicode):
                  raise TypeError('msg must be of type unicode')
              return msg.translate(table)

        There are several things that we have to do in this API:

        • Because the function names alone might not make it clear to users which value types each function
          expects, we have to check that the types are correct.

       • We  keep  the  behaviour  of the two functions as close to the same as possible, just with byte str and
         unicode strings substituted for each other.

   Deciding whether to take str or unicode when no value is returned
       Not all functions have a return value.  Sometimes a function is there to interact with something external
       to python, for instance, writing a file out to disk or a method exists to update the internal state of  a
       data  structure.   One of the main questions with these APIs is whether to take byte str, unicode string,
       or both.  The answer depends on your use case but I'll give some examples here.

   Writing to external data
       When your information is going to an external data source like writing to  a  file  you  need  to  decide
       whether  to  take in unicode strings or byte str.  Remember that most external data sources are not going
       to be dealing with unicode directly.  Instead, they're going to be dealing with a sequence of bytes  that
       may  be  interpreted as unicode.  With that in mind, you either need to have the user give you a byte str
       or convert to a byte str inside the function.

       Next you need to think about the type of  data  that  you're  receiving.   If  it's  textual  data,  (for
       instance,  this  is  a chat client and the user is typing messages that they expect to be read by another
       person) it probably makes sense to take in unicode strings and do the conversion  inside  your  function.
       On  the  other  hand,  if  this  is  a lower level function that's passing data into a network socket, it
       probably should be taking byte str instead.

       Just as noted in the API notes above, you should specify an encoding and errors argument if you  need  to
       transform from unicode string to byte str and you are unable to guess the encoding from the data itself.
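        For example, a sketch of the higher level, text oriented case (write_report() is illustrative):

           def write_report(filename, u_text, encoding='utf-8', errors='replace'):
               '''Take unicode text; do the conversion to bytes internally'''
               report = open(filename, 'w')
               try:
                   report.write(u_text.encode(encoding, errors))
               finally:
                   report.close()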

   Updating data structures
       Sometimes  your  API  is  just  going  to  update  a  data structure and not immediately output that data
       anywhere.  Just as when writing external data, you should think about both what your function is going to
       do with the data eventually and what the caller of your function is thinking  that  they're  giving  you.
       Most  of  the time, you'll want to take unicode strings and enter them into the data structure as unicode
       when the data is textual in nature.  You'll want to take byte str and enter them into the data  structure
       as byte str when the data is not text.  Use a naming convention so the user knows what's expected.
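        For example, a sketch that combines both rules (the class is illustrative):

           class FileCatalog(object):
               '''Map files on disk to human readable descriptions'''
               def __init__(self):
                   self.descriptions = {}

               def add(self, b_filename, u_description):
                   # keys stay byte str so they can be used verbatim with the
                   # filesystem; descriptions are textual, so they stay unicode
                   self.descriptions[b_filename] = u_description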

   APIs to Avoid
       There  are  a  few  APIs that are just wrong.  If you catch yourself making an API that does one of these
       things, change it before anyone sees your code.

   Returning unicode unless a conversion fails
       This type of API usually deals with byte str at some point  and  converts  it  to  unicode  because  it's
       usually thought to be text.  However, there are times when the bytes fail to convert to a unicode string.
       When that happens, this API returns the raw byte str instead of a unicode string.  One example of this is
       present in the python standard library <http://docs.python.org/library>: python2's os.listdir():

          >>> import os
          >>> import locale
          >>> locale.getpreferredencoding()
          'UTF-8'
          >>> os.mkdir('/tmp/mine')
          >>> os.chdir('/tmp/mine')
          >>> open('nonsense_char_\xff', 'w').close()
          >>> open('all_ascii', 'w').close()
          >>> os.listdir(u'.')
          [u'all_ascii', 'nonsense_char_\xff']

       The  problem  with  APIs  like this is that they cause failures that are hard to debug because they don't
       happen where the variables are set.  For instance, let's say you take the filenames from os.listdir() and
       give it to this function:

          def normalize_filename(filename):
              '''Change spaces and dashes into underscores'''
               return filename.translate({ord(u' '): u'_', ord(u'-'): u'_'})

       When you test this, you use filenames that all are decodable in your preferred  encoding  and  everything
       seems  to  work.   But  when  this  code is run on a machine that has filenames in multiple encodings the
        filenames returned by os.listdir() suddenly include byte str.  And byte str has a translate() method
        that takes different arguments.  So the code raises an exception where it's not
       immediately obvious that os.listdir() is at fault.
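        If you are stuck consuming such an API, one defensive sketch is to normalize the values yourself (here
        with kitchen's to_unicode()):

           import os
           from kitchen.text.converters import to_unicode

           # force every filename to unicode, replacing undecodable bytes rather
           # than letting the type silently change from entry to entry
           filenames = [to_unicode(f, encoding='utf-8', errors='replace')
                        for f in os.listdir(u'.')]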

   Ignoring values with no chance of recovery
       An early version of python3 attempted to fix the os.listdir() problem pointed out in the last section  by
        returning all values that were decodable to unicode and omitting the filenames that were not.  This led
        to the following output:

          >>> import os
          >>> import locale
          >>> locale.getpreferredencoding()
          'UTF-8'
          >>> os.mkdir('/tmp/mine')
          >>> os.chdir('/tmp/mine')
          >>> open(b'nonsense_char_\xff', 'w').close()
          >>> open('all_ascii', 'w').close()
          >>> os.listdir('.')
          ['all_ascii']

       The issue with this type of code is that it is silently doing something surprising.  The  caller  expects
       to  get  a  full  list  of files back from os.listdir().  Instead, it silently ignores some of the files,
       returning only a subset.  This leads to code that doesn't do what is expected that may go unnoticed until
       the code is in production and someone notices that something important is being missed.

   Raising a UnicodeException with no chance of recovery
       Believe it or not, a few libraries exist that make it  impossible  to  deal  with  unicode  text  without
       raising  a  UnicodeError.   What seems to occur in these libraries is that the library has functions that
       expect to receive a unicode string.  However, internally,  those  functions  call  other  functions  that
       expect  to  receive  a  byte  str.   The programmer of the API was smart enough to convert from a unicode
       string to a byte str but they did not give the user the chance to specify the encodings to use or how  to
       deal  with  errors.   This  results  in exceptions when the user passes in a byte str because the initial
       function wants a unicode string and exceptions when the user passes  in  a  unicode  string  because  the
        function can't convert the string to bytes in the encoding that it has selected.

       Do not put the user in the position of not being able to use your API without raising a UnicodeError with
       certain  values.   If you can only safely take unicode strings, document that byte str is not allowed and
       vice versa.  If you have to convert internally, make sure to give the caller of your function  parameters
       to  control the encoding and how to treat errors that may occur during the encoding/decoding process.  If
       your code will raise a UnicodeError with non-ASCII  <#term-ASCII>  values  no  matter  what,  you  should
       probably rethink your API.
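
        A hypothetical sketch of an API that follows this advice; the function name and the transport_write()
        helper are made up for illustration:

           def send_message(msg, encoding='utf-8', errors='strict'):
               '''Encode a unicode message and hand the resulting bytes to the transport'''
               if not isinstance(msg, unicode):
                   raise TypeError('msg must be a unicode string')
               b_msg = msg.encode(encoding, errors)
               # transport_write() is a stand-in for whatever actually sends the bytes
               transport_write(b_msg)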

   Knowing your data
       If  you've  read  all the way down to this section without skipping you've seen several admonitions about
       the type of data you are processing affecting the viability of the various API choices.

        Here are a few things to consider in your data:

   Do you need to operate on both bytes and unicode?
       Much of the data in libraries, programs, and the general environment outside of python is  written  where
       strings  are sequences of bytes.  So when we interact with data that comes from outside of python or data
        that is about to leave python it may make sense to only operate on the data as a byte str.  There are
        two times when this may make sense:

       1. The  user is intended to hand the data to the function and then the function takes care of sending the
          data outside of python (to the filesystem, over the network, etc).

       2. The data is not representable as text.  For instance, writing a binary file format.

       Even when your code is operating in this area you still need to think a little more about your data.  For
       instance, it might make sense for the person using your API to  pass  in  unicode  strings  and  let  the
       function convert that into the byte str that it then sends over the wire.

       There  are  also  times  when it might make sense to operate only on unicode strings.  unicode represents
       text so anytime that you are working on textual data  that  isn't  going  to  leave  python  it  has  the
        potential to be a unicode-only API.  However, there are two things that you should consider when
        designing a unicode-only API:

       1. As your API gains popularity, people are going to use your API in places that you may not have thought
          of.  Corner cases in these other places may mean that processing bytes is desirable.

        2. In python2, byte str and unicode are often used interchangeably with each other.  That means that
          people  programming  against  your  API may have received str from some other API and it would be most
          convenient for their code if your API accepted it.

       Note:
           In python3, the separation between the text type and the byte type is clearer.  So in python3,
          there's less need to have all APIs take both unicode and bytes.

   Can you restrict the encodings?
       If  you  determine  that  you  have  to  deal with byte str you should realize that not all encodings are
       created equal.  Each has different properties that may make it possible to provide a simpler API provided
       that you can reasonably tell the users of your API that they cannot use certain classes of encodings.

       As one example, if you are required to find a comma (,) in a byte str you have different choices based on
       what encodings are allowed.  If you  can  reasonably  restrict  your  API  users  to  only  giving  ASCII
       compatible  <#term-ASCII-compatible>  encodings you can do this simply by searching for the literal comma
       character because that character will be represented by the same byte sequence in all ASCII compatible <#
       term-ASCII-compatible> encodings.
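
        For example, under that restriction a plain byte search is sufficient.  A brief sketch of the idea:

           >>> b_data = u'caf\xe9, bar'.encode('utf-8')
           >>> b_data.find(',')   # the comma is the single byte 0x2c in any ASCII compatible encoding
           5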

       The following are some classes of encodings to be aware of as you decide how generic your code  needs  to
       be.

   Single byte encodings
       Single  byte  encodings  can  only  represent  256  total  characters.   They  encode  the code points <#
       term-code-points> for a character to the equivalent number in a single byte.

       Most  single  byte  encodings  are  ASCII  compatible  <#term-ASCII-compatible>.   ASCII  compatible   <#
       term-ASCII-compatible> encodings are the most likely to be usable without changes to code so this is good
        news.  A notable exception to this is the EBCDIC <http://en.wikipedia.org/wiki/
        Extended_Binary_Coded_Decimal_Interchange_Code> family of encodings.

   Multibyte encodings
       Multibyte encodings use more than one byte to encode some characters.

   Fixed width
       Fixed width encodings have a set number of bytes to represent all of the characters in the character set.
        UTF-32 is an example of a fixed width encoding that uses four bytes per character and can express
        every unicode character.  There are a number of problems with writing APIs that need to operate on fixed
       width, multibyte characters.  To go back to our earlier example of finding a comma in a string,  we  have
       to realize that even in UTF-32 where the code point <#term-code-point> for ASCII <#term-ASCII> characters
       is the same as in ASCII <#term-ASCII>, the byte sequence for them is different.  So you cannot search for
       the  literal  byte  character  as  it may pick up false positives and may break a byte sequence in an odd
       place.
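
        A short interpreter session illustrates the false positive problem; note that the utf-32 codecs used
        here were added in python-2.6:

           >>> u','.encode('utf-32-be')
           '\x00\x00\x00,'
           >>> u'\u2c00'.encode('utf-32-be')   # GLAGOLITIC CAPITAL LETTER AZU; the text contains no comma
           '\x00\x00,\x00'
           >>> u'\u2c00'.encode('utf-32-be').find(',')   # a naive byte search "finds" one anyway
           2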

   Variable Width
   ASCII compatible
       UTF-8 <#term-UTF-8> and the EUC <http://en.wikipedia.org/wiki/Extended_Unix_Code> family of encodings are
       examples of ASCII  compatible  <#term-ASCII-compatible>  multi-byte  encodings.   They  achieve  this  by
       adhering to two principles:

       • All  of  the  ASCII  <#term-ASCII> characters are represented by the byte that they are in the ASCII <#
         term-ASCII> encoding.

       • None of the ASCII <#term-ASCII> byte sequences are reused in any other byte sequence  for  a  different
         character.
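
        Those two principles are what make the simple byte search from the earlier comma example safe.  A
        brief sketch with UTF-8 <#term-UTF-8>:

           >>> b_data = u'\u304f\u3089\u3068\u307f, hello'.encode('utf-8')
           >>> b_data.find(',')   # no false positives: none of the multibyte sequences contain the byte 0x2c
           12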

   Escaped
        Some multibyte encodings work by using only bytes from the ASCII <#term-ASCII> encoding but when a
        particular sequence of those bytes is found, they are interpreted as meaning something other than their
       ASCII <#term-ASCII> values.  UTF-7 is one such encoding that can encode all of the unicode code points <#
        term-code-points>.  For instance, here are some Japanese characters encoded as UTF-7:

          >>> a = u'\u304f\u3089\u3068\u307f'
          >>> print a
          くらとみ
           >>> print a.encode('utf-7')
           +ME8wiTBoMH8-

        These encodings can be used when you need to encode unicode data that may contain non-ASCII <#term-ASCII>
       characters for inclusion in an ASCII <#term-ASCII> only transport medium or file.

       However,  they are not ASCII compatible <#term-ASCII-compatible> in the sense that we used earlier as the
       bytes that represent a ASCII <#term-ASCII> character are being reused as part of  other  characters.   If
       you  were  to  search  for  a  literal  plus sign in this encoded string, you would run across many false
       positives, for instance.
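
        Continuing the UTF-7 example above, a literal search for a plus sign demonstrates the problem:

           >>> a.encode('utf-7').find('+')   # a contains no plus sign, yet the search "finds" one
           0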

   Other
       There are many other popular variable width encodings, for instance UTF-16 and shift-JIS.  Many of  these
       are  not ASCII compatible <#term-ASCII-compatible> so you cannot search for a literal ASCII <#term-ASCII>
       character without danger of false positives or false negatives.

   Kitchen API
       Kitchen is structured as a collection of modules.  In its current configuration, Kitchen ships  with  the
       following  modules.   Other  addon modules that may drag in more dependencies can be found on the project
        webpage <https://fedorahosted.org/kitchen>.

   Kitchen.i18n Module
       I18N <#term-I18N> is an important piece of  any  modern  program.   Unfortunately,  setting  up  i18n  <#
       term-I18N>  in  your  program  is often a confusing process.  The functions provided here aim to make the
       programming side of that a little easier.

       Most projects will be able to do something like this when they startup:

          # myprogram/__init__.py:

          import os
          import sys

          from kitchen.i18n import easy_gettext_setup

          _, N_  = easy_gettext_setup('myprogram', localedirs=(
                  os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
                  os.path.join(sys.prefix, 'lib', 'locale')
                  ))

       Then, in other files that have strings that need translating:

          # myprogram/commands.py:

          from myprogram import _, N_

          def print_usage():
              print _(u"""available commands are:
              --help              Display help
              --version           Display version of this program
              --bake-me-a-cake    as fast as you can
                  """)

          def print_invitations(age):
              print _('Please come to my party.')
              print N_('I will be turning %(age)s year old',
                  'I will be turning %(age)s years old', age) % {'age': age}

       See the documentation of easy_gettext_setup() and get_translation_object() for more details.

          See also:

              gettext
                     for details of how the python gettext facilities work

              babel <http://babel.edgewall.org>
                     The  babel  module  for  in   depth   information   on   gettext,   message   catalogs   <#
                     term-message-catalogs>,  and  translating  your app.  babel provides some nice features for
                     i18n <#term-I18N> on top of gettext

   Functions
       easy_gettext_setup() should satisfy the needs of most users.   get_translation_object()  is  designed  to
       ease the way for anyone that needs more control.

       kitchen.i18n.easy_gettext_setup(domain, localedirs=(), use_unicode=True)
              Setup translation functions for an application

               Parameters
                      • domain -- Name of the message domain.  This should be a unique name that can be used to
                        lookup the message catalog <#term-message-catalog> for this app.

                     • localedirs   --   Iterator   of   directories   to   look   for   message   catalogs   <#
                       term-message-catalogs> under.  The first directory to exist is used regardless of whether
                        messages for this domain are present.  If none of the directories exist, fall back on
                        sys.prefix + /share/locale.  Default: No directories to search, so we just use the fallback.

                      • use_unicode -- If True, return the gettext functions for str strings; otherwise, return
                        the functions for byte bytes for the translations.  Default is True.

              Returns
                     tuple of the gettext function and gettext function for plurals

              Setting  up  gettext  can be a little tricky because of lack of documentation.  This function will
               set up gettext using the Class-based API <http://docs.python.org/library/gettext.html#
              class-based-api> for you.  For the simple case, you can use the default arguments and call it like
              this:

                  _, N_ = easy_gettext_setup('myprogram')

              This  will  get  you two functions, _() and N_() that you can use to mark strings in your code for
              translation.  _() is used to mark strings that don't need to worry about plural  forms  no  matter
              what  the value of the variable is.  N_() is used to mark strings that do need to have a different
              form if a variable in the string is plural.

              See also:

                  Kitchen.i18n Module
                        This module's documentation has examples of using _() and N_()

                 get_translation_object()
                        for information on how  to  use  localedirs  to  get  the  proper  message  catalogs  <#
                        term-message-catalogs>  both  when  in  development  and when installed to FHS compliant
                        directories on Linux.

              Note:
                 The gettext functions returned from this function should be superior to the ones returned  from
                 gettext.   The  traits  that  make  them  better  are  described  in  the DummyTranslations and
                 NewGNUTranslations documentation.

              Changed in version kitchen-0.2.4: ; API kitchen.i18n 2.0.0 Changed easy_gettext_setup() to  return
              the lgettext functions instead of gettext functions when use_unicode=False.

       kitchen.i18n.get_translation_object(domain, localedirs=(), languages=None, class_=None, fallback=True,
       codeset=None, python2_api=True)
              Get a translation object bound to the message catalogs <#term-message-catalogs>

               Parameters
                      • domain -- Name of the message domain.  This should be a unique name that can be used to
                        lookup the message catalog <#term-message-catalog> for this app or library.

                     • localedirs   --   Iterator   of   directories   to   look   for   message   catalogs   <#
                       term-message-catalogs> under.  The directories are searched in order for message catalogs
                       <#term-message-catalogs>.   For  each  of  the directories searched, we check for message
                        catalogs in any language specified in languages.  The message catalogs <#
                       term-message-catalogs>  are  used  to  create the Translation object that we return.  The
                       Translation object will attempt to lookup the msgid in the first catalog that  we  found.
                       If  it's  not  in  there, it will go through each subsequent catalog looking for a match.
                       For this reason, the order in which you specify the localedirs may be important.   If  no
                       message  catalogs  <#term-message-catalogs>  are found, either return a DummyTranslations
                        object or raise an IOError depending on the value of fallback.  The default localedir
                        from gettext, which is os.path.join(sys.prefix, 'share', 'locale') on Unix, is implicitly
                       appended to the localedirs, making it the last directory searched.

                     • languages --

                       Iterator of language codes to check for message  catalogs  <#term-message-catalogs>.   If
                       unspecified, the user's locale settings will be used.

                       See also:
                          gettext.find() for information on what environment variables are used.

                      • class_ -- The class to use to extract translations from the message catalogs <#
                        term-message-catalogs>.  Defaults to NewGNUTranslations.

                      • fallback -- If set to False, raise an IOError if no message catalogs <#
                        term-message-catalogs> are found.  If True, the default, return a DummyTranslations
                        object.

                     • codeset -- Set the character encoding to use when returning byte bytes objects.  This  is
                       equivalent  to  calling output_charset() on the Translations object that is returned from
                       this function.

                      • python2_api -- When True (the default), return Translation objects that use the python2
                       gettext  api  (gettext() and lgettext() return byte bytes.  ugettext() exists and returns
                       str strings).  When False, return Translation objects that use the  python3  gettext  api
                       (gettext returns str strings and lgettext returns byte bytes.  ugettext does not exist.)

              Returns
                     Translation object to get gettext methods from

              If  you  need more flexibility than easy_gettext_setup(), use this function.  It sets up a gettext
              Translation object and returns it to you.  Then you can access any of the methods  of  the  object
              that you need directly.  For instance, if you specifically need to access lgettext():

                 translations = get_translation_object('foo')
                 translations.lgettext('My Message')

               This function is similar to the python standard library <http://docs.python.org/library>
               gettext.translation() but makes it better in two ways:

               1. It returns NewGNUTranslations or DummyTranslations objects by default.  These are superior to
                  the gettext.GNUTranslations and gettext.NullTranslations objects because they are consistent
                  in the string type they return and they fix several issues that can cause the python standard
                  library <http://docs.python.org/library> objects to throw UnicodeError.

               2. This function takes multiple directories to search for message catalogs
                  <#term-message-catalogs>.

              The  latter  is  important  when  setting  up gettext in a portable manner.  There is not a common
              directory for translations across operating systems so one needs to look in  multiple  directories
              for  the  translations.   get_translation_object() is able to handle that if you give it a list of
              directories to search for catalogs:

                 translations = get_translation_object('foo', localedirs=(
                      os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
                      os.path.join(sys.prefix, 'lib', 'locale')))

              This will search for several different directories:

              1. A  directory   named   locale   in   the   same   directory   as   the   module   that   called
                 get_translation_object(),

              2. In /usr/lib/locale

              3. In /usr/share/locale (the fallback directory)

              This  allows  gettext  to  work  on  Windows  and  in  development  (where the message catalogs <#
              term-message-catalogs> are typically in the toplevel module directory)  and  also  when  installed
              under   Linux   (where   the   message   catalogs   <#term-message-catalogs>   are   installed  in
              /usr/share/locale).  You (or the system packager) just need to install  the  message  catalogs  <#
              term-message-catalogs>  in  /usr/share/locale  and  remove the locale directory from the module to
              make this work.  ie:

                 In development:
                     ~/foo   # Toplevel module directory
                     ~/foo/__init__.py
                     ~/foo/locale    # With message catalogs below here:
                     ~/foo/locale/es/LC_MESSAGES/foo.mo

                 Installed on Linux:
                     /usr/lib/python2.7/site-packages/foo
                     /usr/lib/python2.7/site-packages/foo/__init__.py
                     /usr/share/locale/  # With message catalogs below here:
                     /usr/share/locale/es/LC_MESSAGES/foo.mo

              Note:
                 This function will setup Translation objects that attempt to lookup msgids in all of the  found
                 message  catalogs  <#term-message-catalogs>.   This  means  if you have several versions of the
                 message catalogs <#term-message-catalogs> installed in different directories that the  function
                 searches, you need to make sure that localedirs specifies the directories so that newer message
                 catalogs  <#term-message-catalogs>  are  searched first.  It also means that if a newer catalog
                 does not contain a translation for a msgid but an older one  that's  in  localedirs  does,  the
                 translation from that older catalog will be returned.

              Changed   in   version   kitchen-1.1.0:   ;   API   kitchen.i18n  2.1.0  Add  more  parameters  to
              get_translation_object() so it can more easily be used as a replacement for gettext.translation().
              Also change the way we use localedirs.  We cycle through them until we find a suitable locale file
              rather than simply cycling through until we find a directory that exists.  The new code  is  based
              heavily  on  the  python  standard  library <http://docs.python.org/library> gettext.translation()
              function.

              Changed in version kitchen-1.2.0: ; API kitchen.i18n 2.2.0 Add python2_api parameter

   Translation Objects
       The standard translation objects from the gettext module suffer from several problems:

       • They can throw UnicodeError

       • They can't find translations for non-ASCII <#term-ASCII> byte str messages

       • They may return either unicode string or byte str from the same function even though the functions  say
         they will only return unicode or only return byte str.

       DummyTranslations and NewGNUTranslations were written to fix these issues.

       class kitchen.i18n.DummyTranslations(fp=None, python2_api=True)
              Safer version of gettext.NullTranslations

              This  Translations  class  doesn't  translate the strings and is intended to be used as a fallback
              when  there  were  errors  setting   up   a   real   Translations   object.    It's   safer   than
              gettext.NullTranslations in its handling of byte bytes vs str strings.

              Unlike  NullTranslations,  this  Translation class will never throw a UnicodeError.  The code that
              you have around a call to DummyTranslations might throw a UnicodeError but at least that  will  be
              in  code  you control and can fix.  Also, unlike NullTranslations all of this Translation object's
              methods guarantee to return byte bytes except for ugettext() and ungettext()  which  guarantee  to
              return str strings.

              When byte bytes are returned, the strings will be encoded according to this algorithm:

              1. If  a  fallback  has been added, the fallback will be called first.  You'll need to consult the
                 fallback to see whether it performs any encoding changes.

              2. If a byte bytes was given, the same byte bytes will be returned.

              3. If a str string was given and set_output_charset() has been called then we  encode  the  string
                 using the output_charset

               4. If a str string was given and this is gettext() or ngettext() and _charset was set, output
                  in that charset.

              5. If a str string was given and this is gettext() or ngettext() we encode it using 'utf-8'.

              6. If a str string was given and this is lgettext() or lngettext() we encode using  the  value  of
                 locale.getpreferredencoding()

              For  ugettext()  and  ungettext(),  we  go  through  the  same  set  of  steps  with the following
              differences:

              • We transform byte bytes into str strings for these methods.

              • The encoding used to decode the byte bytes is taken from input_charset if it's set, otherwise we
                decode using UTF-8 <#term-UTF-8>.
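
               A minimal interpreter sketch of steps 2 and 5 of the algorithm above, assuming a freshly created
               object with no fallback and no charsets set:

                  >>> from kitchen.i18n import DummyTranslations
                  >>> translations = DummyTranslations()
                  >>> translations.gettext('caf\xc3\xa9')    # byte bytes pass through unchanged (step 2)
                  'caf\xc3\xa9'
                  >>> translations.gettext(u'caf\xe9')       # str strings are encoded using utf-8 (step 5)
                  'caf\xc3\xa9'
                  >>> translations.ugettext('caf\xc3\xa9')   # ugettext decodes byte bytes using UTF-8
                  u'caf\xe9'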

              input_charset
                     is an extension to the python  standard  library  <http://docs.python.org/library>  gettext
                     that  specifies  what charset a message is encoded in when decoding a message to str.  This
                     is used for two purposes:

              1. If the message string is a byte bytes, this is used to decode the string to a str string before
                 looking it up in the message catalog <#term-message-catalog>.

              2. In ugettext() and ungettext() methods, if  a  byte  bytes  is  given  as  the  message  and  is
                 untranslated  this  is  used  as  the  encoding  when  decoding to str.  This is different from
                 _charset which may be set when a message  catalog  <#term-message-catalog>  is  loaded  because
                 input_charset  is  used  to  describe  an  encoding used in a python source file while _charset
                 describes the encoding used in the message catalog <#term-message-catalog> file.

              Any characters that aren't able to be transformed from a byte bytes to str string  or  vice  versa
              will  be  replaced with a replacement character (ie: u'�' in unicode based encodings, '?' in other
              ASCII <#term-ASCII> compatible encodings).

              See also:

                 gettext.NullTranslations
                        For information about what methods are available and what they do.

               Changed in version kitchen-1.1.0: ; API kitchen.i18n 2.1.0

               • Although we had adapted gettext(), ngettext(), lgettext(), and lngettext() to always return
                 byte bytes, we hadn't forced those byte bytes to always be in a specified charset.  We now make
                 sure that gettext() and ngettext() return byte bytes encoded using output_charset if set,
                 otherwise charset, and if neither of those, UTF-8 <#term-UTF-8>.  With lgettext() and
                 lngettext(), output_charset if set, otherwise locale.getpreferredencoding().

               • Make setting input_charset and output_charset also set those attributes on any fallback
                 translation objects.

              Changed in version kitchen-1.2.0: ; API kitchen.i18n 2.2.0 Add python2_api parameter to __init__()

              output_charset()
                     Compatibility for python2.3 which doesn't have output_charset

              set_output_charset(charset)
                     Set the output charset

                     This  serves  two  purposes.  The normal gettext.NullTranslations.set_output_charset() does
                     not set the output on fallback objects.  On  python-2.3,  gettext.NullTranslations  objects
                     don't contain this method.

       class kitchen.i18n.NewGNUTranslations(fp=None, python2_api=True)
              Safer version of gettext.GNUTranslations

              gettext.GNUTranslations suffers from two problems that this class fixes.

              1. gettext.GNUTranslations  can  throw a UnicodeError in gettext.GNUTranslations.ugettext() if the
                 message being translated has non-ASCII <#term-ASCII> characters and there is no translation for
                 it.

               2. gettext.GNUTranslations can return byte bytes from gettext.GNUTranslations.ugettext() and str
                  strings from the other gettext() methods if the message being translated is the wrong type.

              When byte bytes are returned, the strings will be encoded according to this algorithm:

              1. If  a  fallback  has been added, the fallback will be called first.  You'll need to consult the
                 fallback to see whether it performs any encoding changes.

              2. If a byte bytes was given, the same byte bytes will be returned.

              3. If a str string was given and set_output_charset() has been called then we  encode  the  string
                 using the output_charset

              4. If  a  str string was given and this is gettext() or ngettext() and a charset was detected when
                 parsing the message catalog <#term-message-catalog>, output in that charset.

              5. If a str string was given and this is gettext() or ngettext()  we  encode  it  using  UTF-8  <#
                 term-UTF-8>.

              6. If  a  str  string was given and this is lgettext() or lngettext() we encode using the value of
                 locale.getpreferredencoding()

              For ugettext() and  ungettext(),  we  go  through  the  same  set  of  steps  with  the  following
              differences:

              • We transform byte bytes into str strings for these methods.

              • The encoding used to decode the byte bytes is taken from input_charset if it's set, otherwise we
                decode using UTF-8 <#term-UTF-8>

              input_charset
                     an  extension  to the python standard library <http://docs.python.org/library> gettext that
                     specifies what charset a message is encoded in when decoding a message  to  str.   This  is
                     used for two purposes:

              1. If the message string is a byte bytes, this is used to decode the string to a str string before
                 looking it up in the message catalog <#term-message-catalog>.

               2. In ugettext() and ungettext() methods, if a byte bytes is given as the message and is
                  untranslated, this is used as the encoding when decoding to str.  This is different from the
                 _charset  parameter  that  may  be set when a message catalog <#term-message-catalog> is loaded
                 because input_charset is used to describe an encoding  used  in  a  python  source  file  while
                 _charset describes the encoding used in the message catalog <#term-message-catalog> file.

              Any  characters  that  aren't able to be transformed from a byte bytes to str string or vice versa
              will be replaced with a replacement character (ie: u'�' in unicode based encodings, '?'  in  other
              ASCII <#term-ASCII> compatible encodings).

              See also:

                 gettext.GNUTranslations.gettext
                        For information about what methods this class has and what they do

              Changed  in  version  kitchen-1.1.0:  ;  API kitchen.i18n 2.1.0 Although we had adapted gettext(),
              ngettext(), lgettext(), and lngettext() to always return byte bytes, we hadn't forced  those  byte
              bytes  to always be in a specified charset.  We now make sure that gettext() and ngettext() return
              byte bytes encoded using output_charset if set, otherwise charset and if neither of  those,  UTF-8
              <#term-UTF-8>.     With    lgettext()   and   lngettext()   output_charset   if   set,   otherwise
              locale.getpreferredencoding().

   Kitchen.text: unicode and utf8 and xml oh my!
       The kitchen.text module contains functions that deal with text manipulation.

   Kitchen.text.converters
       Functions to handle conversion of byte bytes and str strings.

       Changed in version kitchen: 0.2a2 ; API kitchen.text 2.0.0 Added getwriter()

       Changed  in  version  kitchen:  0.2.2    ;   API   kitchen.text   2.1.0   Added   exception_to_unicode(),
       exception_to_bytes(), EXCEPTION_CONVERTERS, and BYTE_EXCEPTION_CONVERTERS

       Changed  in version kitchen: 1.0.1 ; API kitchen.text 2.1.1 Deprecated BYTE_EXCEPTION_CONVERTERS as we've
       simplified exception_to_unicode() and exception_to_bytes() to make it unnecessary

   Byte Strings and Unicode in Python2
       Python2 has two string types,  str  and  unicode.   unicode  represents  an  abstract  sequence  of  text
       characters.  It can hold any character that is present in the unicode standard.  str can hold any byte of
       data.   The  operating system and python work together to display these bytes as characters in many cases
       but you should always keep in mind that the information is really a sequence of bytes, not a sequence  of
       characters.   In python2 these types are interchangeable a large amount of the time.  They are one of the
       few pairs of types that automatically convert when used in equality:

          >>> # string is converted to unicode and then compared
          >>> "I am a string" == u"I am a string"
          True
          >>> # Other types, like int, don't have this special treatment
          >>> 5 == "5"
          False

       However, this automatic conversion tends to lull people into a false  sense  of  security.   As  long  as
       you're dealing with ASCII <#term-ASCII> characters the automatic conversion will save you from seeing any
       differences.  Once you start using characters that are not in ASCII <#term-ASCII>, you will start getting
       UnicodeError and UnicodeWarning as the automatic conversions between the types fail:

          >>> "I am an ñ" == u"I am an ñ"
          __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
          False

       Why  do  these  conversions  fail?   The  reason  is that the python2 unicode type represents an abstract
       sequence of unicode text known as code points  <#term-code-points>.   str,  on  the  other  hand,  really
       represents  a  sequence  of  bytes.   Those  bytes  are  converted  by your operating system to appear as
       characters on your screen using a particular encoding (usually with a default defined  by  the  operating
        system and customizable by the individual user).  Although ASCII <#term-ASCII> characters are fairly
       standard in what bytes represent each character, the bytes outside of the ASCII <#term-ASCII>  range  are
       not.  In general, each encoding will map a different character to a particular byte.  Newer encodings map
       individual  characters  to  multiple  bytes  (which  the  older  encodings will instead treat as multiple
       characters).  In the face of these differences, python refuses to guess at an encoding and instead issues
       a warning or exception and refuses to convert.
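
        A two-line interpreter session shows the mapping difference that underlies the failure:

           >>> u'ñ'.encode('utf-8')
           '\xc3\xb1'
           >>> u'ñ'.encode('latin-1')
           '\xf1'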

       See also:

          Overcoming frustration: Correctly using unicode in python2 <#overcoming-frustration>
                 For a longer introduction on this subject.

   Strategy for Explicit Conversion
       So what is the best method of dealing with this weltering babble  of  incoherent  encodings?   The  basic
       strategy is to explicitly turn everything into unicode when it first enters your program.  Then, when you
       send  it  to output, you can transform the unicode back into bytes.  Doing this allows you to control the
       encodings that are used and avoid getting tracebacks due to UnicodeError. Using the functions defined  in
       this module, that looks something like this:

          >>> from kitchen.text.converters import to_unicode, to_bytes
          >>> name = raw_input('Enter your name: ')
          Enter your name: Toshio くらとみ
          >>> name
          'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'
          >>> type(name)
          <type 'str'>
          >>> unicode_name = to_unicode(name)
          >>> type(unicode_name)
          <type 'unicode'>
          >>> unicode_name
          u'Toshio \u304f\u3089\u3068\u307f'
          >>> # Do a lot of other things before needing to save/output again:
          >>> output = open('datafile', 'w')
           >>> output.write(to_bytes(u'Name: %s\n' % unicode_name))

       A few notes:

       Looking  at  line  6,  you'll  notice  that  the input we took from the user was a byte str.  In general,
       anytime we're getting a value from outside of python (The filesystem,  reading  data  from  the  network,
       interacting  with  an  external  command,  reading  values  from the environment) we are interacting with
       something that will want to give us a byte str.  Some python  standard  library  <http://docs.python.org/
       library>  modules  and  third party libraries will automatically attempt to convert a byte str to unicode
       strings for you.  This is both a boon and a curse.  If the library can guess correctly about the encoding
       that the data is in, it will return unicode objects to you without you having to convert.  However, if it
       can't guess correctly, you may end up with one of several problems:

       UnicodeError
               The library attempted to decode a byte str into a unicode string, failed, and raised an exception.

       Garbled data
              If the library returns the data after decoding it with the wrong encoding, the characters you  see
              in the unicode string won't be the ones that you expect.

       A byte str instead of unicode string
              Some  libraries  will  return a unicode string when they're able to decode the data and a byte str
              when they can't.  This is generally the hardest problem to debug when it occurs.  Avoid it in your
              own code and try to avoid or open bugs against upstreams that do this. See Designing Unicode Aware
              APIs <#designingunicodeawareapis> for strategies to do this properly.

       On line 8, we convert from a byte str to a unicode string.  to_unicode() does this for us.  It  has  some
       error  handling  and  sane  defaults  that  make  this  a nicer function to use than calling str.decode()
       directly:

       • Instead of defaulting to the ASCII <#term-ASCII> encoding which fails with all but the simple  American
         English characters, it defaults to UTF-8 <#term-UTF-8>.

        • Instead of raising an error if it cannot decode a value, it will replace the value with the unicode
          "Replacement character" symbol (�).

        • If you happen to call this method with something that is not a str or unicode, it will return the
          object's simple unicode representation rather than raising a TypeError.

       All  three  of  these  can  be  overridden  using  different  keyword arguments to the function.  See the
       to_unicode() documentation for more information.

       On line 15 we push the data back out to a file.  Two things you should note here:

       1. We deal with the strings as unicode until the last instant.  The string format  that  we're  using  is
          unicode  and  the variable also holds unicode.  People sometimes get into trouble when they mix a byte
          str format with a variable that holds a unicode string (or vice versa) at this stage.

        2. to_bytes() does the reverse of to_unicode().  In this case, we're using the default values which turn
           unicode into a byte str using UTF-8 <#term-UTF-8>.  Any errors are replaced with a � and sending
           nonstring objects yields their simple representation as a byte str.  Just like to_unicode(), you can
           look at the documentation for to_bytes() to find out how to override any of these defaults.

   When to use an alternate strategy
       The default strategy of decoding to unicode strings when you take data in and encoding to a byte str when
       you send the data back out works great for most problems but there are a few times when you shouldn't:

       • The values aren't meant to be read as text

       • The values need to be byte-for-byte when you send them back out -- for instance if  they  are  database
         keys or filenames.

       • You are transferring the data between several libraries that all expect byte str.

        In each of these instances, there is a reason to keep around the byte str version of a value.  Here
        are a few hints to keep your sanity in these situations:

       1. Keep your unicode and str values separate.  Just like the pain caused when you  have  to  use  someone
          else's  library  that  returns  both unicode and str you can cause yourself pain if you have functions
          that can return both types or variables that could hold either type of value.

       2. Name your variables so that you can tell whether you're storing byte str or unicode  string.   One  of
          the first things you end up having to do when debugging is determine what type of string you have in a
          variable  and  what  type of string you are expecting.  Naming your variables consistently so that you
          can tell which type they are supposed to hold will save you from at least one of those steps.

       3. When you get values initially, make sure that you're dealing with the type of value that you expect as
          you save it.  You can use isinstance() or to_bytes() since to_bytes() doesn't do any modifications  of
          the string if it's already a str.  When using to_bytes() for this purpose you might want to use:

              from kitchen.text.converters import to_bytes

              try:
                  b_input = to_bytes(input_should_be_bytes_already, errors='strict', nonstring='strict')
              except (TypeError, UnicodeEncodeError):
                  handle_errors_somehow()

          The  reason  is  that  the  default  of to_bytes() will take characters that are illegal in the chosen
          encoding and transform them to replacement characters.  Since the point of keeping this data as a byte
          str is to keep the exact same bytes when you  send  it  outside  of  your  code,  changing  things  to
           replacement characters should be raising red flags that something is wrong.  Setting errors to strict
          will raise an exception which gives you an opportunity to fail gracefully.

       4. Sometimes you will want to print out the values that you have in your byte str.  When you do this  you
          will need to make sure that you transform unicode to str before combining them.  Also be sure that any
          other  function  calls  (including gettext) are going to give you strings that are the same type.  For
          instance:

             print to_bytes(_('Username: %(user)s'), 'utf-8') % {'user': b_username}

   Gotchas and how to avoid them
       Even when you have a good conceptual understanding of how python2 treats unicode and str there are  still
       some things that can surprise you.  In most cases this is because, as noted earlier, python or one of the
       python  libraries  you  depend  on  is  trying  to  convert  a value automatically and failing.  Explicit
       conversion at the appropriate place usually solves that.

   str(obj)
       One common idiom for getting a simple, string representation of an object is to use:

          str(obj)

       Unfortunately, this is not safe.  Sometimes str(obj) will return unicode.  Sometimes  it  will  return  a
       byte  str.   Sometimes, it will attempt to convert from a unicode string to a byte str, fail, and throw a
       UnicodeError.  To be safe from all of these, first decide whether you need unicode or str to be returned.
       Then use to_unicode() or to_bytes() to get the simple representation like this:

          u_representation = to_unicode(obj, nonstring='simplerepr')
          b_representation = to_bytes(obj, nonstring='simplerepr')

   print
       python has a builtin print() statement that outputs strings to the terminal.  This originated in  a  time
       when  python  only  dealt with byte str.  When unicode strings came about, some enhancements were made to
       the print() statement so that it could print those as well.  The enhancements make print() work  most  of
       the time.  However, the times when it doesn't work tend to make for cryptic debugging.

       The basic issue is that print() has to figure out what encoding to use when it prints a unicode string to
       the  terminal.  When python is attached to your terminal (ie, you're running the interpreter or running a
       script that prints to the screen) python is able to take the encoding value  from  your  locale  settings
       LC_ALL  or  LC_CTYPE and print the characters allowed by that encoding.  On most modern Unix systems, the
       encoding is utf-8 <#term-UTF-8> which means that you can print any unicode character without problem.

       There are two common cases of things going wrong:

       1. Someone has a locale set that does not accept all valid unicode characters.  For instance:

             $ LC_ALL=C python
             >>> print u'\ufffd'
             Traceback (most recent call last):
               File "<stdin>", line 1, in <module>
             UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

          This often happens when a script that you've written and debugged from the terminal  is  run  from  an
          automated  environment  like  cron.   It  also  occurs when you have written a script using a utf-8 <#
          term-UTF-8> aware locale and released it for consumption by people all over the internet.  Inevitably,
          someone is running with a locale that can't handle all unicode characters  and  you  get  a  traceback
          reported.

       2. You redirect output to a file.  Python isn't using the values in LC_ALL unconditionally to decide what
          encoding  to  use.  Instead it is using the encoding set for the terminal you are printing to which is
          set to accept different encodings by LC_ALL.  If you redirect to a file, you are no longer printing to
          the terminal so LC_ALL won't have any effect.  At this point, python will  decide  it  can't  find  an
          encoding and fallback to ASCII <#term-ASCII> which will likely lead to UnicodeError being raised.  You
          can see this in a short script:

             #! /usr/bin/python -tt
             print u'\ufffd'

          And then look at the difference between running it normally and redirecting to a file:

             $ ./test.py
             �
             $ ./test.py > t
             Traceback (most recent call last):
               File "test.py", line 3, in <module>
                   print u'\ufffd'
             UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

       The  short  answer  to  dealing with this is to always use bytes when writing output.  You can do this by
       explicitly converting to bytes like this:

          from kitchen.text.converters import to_bytes
          u_string = u'\ufffd'
          print to_bytes(u_string)

        or you can wrap stdout and stderr with a StreamWriter.  A StreamWriter is convenient in that you can
        assign an instance of it to sys.stdout or sys.stderr and have output automatically converted but it has
       the drawback of still being able to throw UnicodeError if the writer can't encode  all  possible  unicode
       codepoints.     Kitchen    provides    an    alternate    version    which    can   be   retrieved   with
       kitchen.text.converters.getwriter() which will not traceback in its standard configuration.
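
        A minimal sketch of that setup; the full behavior is shown in the getwriter() documentation below:

           import sys
           from kitchen.text.converters import getwriter

           UTF8Writer = getwriter('utf-8')
           sys.stdout = UTF8Writer(sys.stdout)
           print u'\ufffd'   # encoded to utf-8 on the way out instead of raising UnicodeError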

   Unicode, str, and dict keys
       The hash() of the ASCII <#term-ASCII> characters is the same for unicode and byte str.  When you use them
       in dict keys, they evaluate to the same dictionary slot:

          >>> u_string = u'a'
          >>> b_string = 'a'
          >>> hash(u_string), hash(b_string)
          (12416037344, 12416037344)
          >>> d = {}
          >>> d[u_string] = 'unicode'
          >>> d[b_string] = 'bytes'
          >>> d
          {u'a': 'bytes'}

       When you deal with key values outside of ASCII <#term-ASCII>, unicode and byte str evaluate unequally  no
       matter what their character content or hash value:

          >>> u_string = u'ñ'
          >>> b_string = u_string.encode('utf-8')
          >>> print u_string
          ñ
          >>> print b_string
          ñ
          >>> d = {}
          >>> d[u_string] = 'unicode'
          >>> d[b_string] = 'bytes'
          >>> d
           {u'\xf1': 'unicode', '\xc3\xb1': 'bytes'}
           >>> b_string2 = '\xf1'
           >>> hash(u_string), hash(b_string2)
           (30848092528, 30848092528)
           >>> d = {}
           >>> d[u_string] = 'unicode'
           >>> d[b_string2] = 'bytes'
           >>> d
           {u'\xf1': 'unicode', '\xf1': 'bytes'}

       How  do you work with this one?  Remember rule #1:  Keep your unicode and byte str values separate.  That
       goes for keys in a dictionary just like anything else.

       • For any given dictionary, make sure that all your keys are either unicode or str.  Do not mix the  two.
         If  you're  being  given  both unicode and str but you don't need to preserve separate keys for each, I
         recommend using to_unicode() or to_bytes() to convert all keys to one type or the other like this:

            >>> from kitchen.text.converters import to_unicode
            >>> u_string = u'one'
            >>> b_string = 'two'
            >>> d = {}
            >>> d[to_unicode(u_string)] = 1
            >>> d[to_unicode(b_string)] = 2
            >>> d
            {u'two': 2, u'one': 1}

        • These issues also apply to using dicts with tuple keys that contain a mixture of unicode and str.  Once
          again the best fix is to standardize on either str or unicode.

        • If you absolutely need to store values in a dictionary where the keys could be either unicode or str
          you can use StrictDict <#kitchen.collections.strictdict.StrictDict> which has separate entries for all
          unicode and byte str and deals correctly with any tuple containing mixed unicode and byte str, as
          sketched below.
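
        A brief sketch of StrictDict keeping mixed keys separate; the import path below is taken from the link
        above:

           >>> from kitchen.collections.strictdict import StrictDict
           >>> d = StrictDict()
           >>> d[u'a'] = 'unicode'
           >>> d['a'] = 'bytes'
           >>> len(d)   # a plain dict would have collapsed these into a single entry
           2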

   Functions
   Unicode and byte str conversion
       kitchen.text.converters.to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
       non_string=None)
              Convert an object into a str string

               Parameters
                      • obj -- Object to convert to a str string.  This should normally be a byte bytes

                      • encoding -- What encoding to try converting the byte bytes as.  Defaults to utf-8 <#
                        term-UTF-8>

                     • errors  --  If errors are found while decoding, perform this action.  Defaults to replace
                       which replaces the invalid bytes with a character that means the bytes were unable to  be
                       decoded.   Other  values  are  the  same  as the error handling schemes in the codec base
                       classes  <http://docs.python.org/library/codecs.html#codec-base-classes>.   For  instance
                       strict  which  raises  an  exception  and  ignore  which  simply  omits the non-decodable
                       characters.

                     • nonstring --

                       How to treat nonstring values.  Possible values are:

                       simplerepr
                              Attempt to call the object's "simple representation" method and return that value.
                              Python-2.3+  has  two  methods  that  try  to  return  a  simple   representation:
                              object.__unicode__()  and  object.__str__().   We  first try to get a usable value
                              from object.__unicode__().  If that fails we try the same with object.__str__().

                       empty  Return an empty str string

                       strict Raise a TypeError

                       passthru
                              Return the object unchanged

                       repr   Attempt to return a str string of the repr of the object

                        Default is simplerepr

                      • non_string -- Deprecated Use nonstring instead

               Raises
                      • TypeError -- if nonstring is strict and a non-basestring object is passed in or if
                        nonstring is set to an unknown value

                     • UnicodeDecodeError  --  if  errors  is  strict  and  obj is not decodable using the given
                       encoding

              Returns
                     str string or the original object depending on the value of nonstring.

              Usually this should be used on a byte bytes but it can  take  both  byte  bytes  and  str  strings
              intelligently.   Nonstring  objects  are handled in different ways depending on the setting of the
              nonstring parameter.

              The default values of this function are set so as to always return a str string and never raise an
              error when converting from a byte bytes to a str string.  However, when you do  not  pass  validly
              encoded  text  (or a nonstring object), you may end up with output that you don't expect.  Be sure
               you understand the requirements of your data rather than just ignoring errors by passing it
               through this function.

              Changed  in  version  0.2.1a2:  Deprecated  non_string in favor of nonstring parameter and changed
              default value to simplerepr

       kitchen.text.converters.to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
       non_string=None)
              Convert an object into a byte bytes

               Parameters
                      • obj -- Object to convert to a byte bytes.  This should normally be a str string.

                     • encoding -- Encoding to use to convert the str string into a  byte  bytes.   Defaults  to
                       utf-8 <#term-UTF-8>.

                     • errors --

                       If  errors  are  found  while  encoding,  perform this action.  Defaults to replace which
                       replaces the invalid bytes with a character that  means  the  bytes  were  unable  to  be
                       encoded.   Other  values  are  the  same  as the error handling schemes in the codec base
                       classes  <http://docs.python.org/library/codecs.html#codec-base-classes>.   For  instance
                       strict  which  raises  an  exception  and  ignore  which  simply  omits the non-encodable
                       characters.

                     • nonstring --

                       How to treat nonstring values.  Possible values are:

                       simplerepr
                              Attempt to call the object's "simple representation" method and return that value.
                              Python-2.3+  has  two  methods  that  try  to  return  a  simple   representation:
                              object.__unicode__()  and  object.__str__().   We  first try to get a usable value
                              from object.__str__().  If that fails we try the same with object.__unicode__().

                       empty  Return an empty byte bytes

                       strict Raise a TypeError

                       passthru
                              Return the object unchanged

                       repr   Attempt to return a byte bytes of the repr() of the object

                       Default is simplerepr.

                     • non_string -- Deprecated Use nonstring instead.

               Raises
                      • TypeError -- if nonstring is strict and a non-basestring object is passed in or if
                        nonstring is set to an unknown value.

                     • UnicodeEncodeError  --  if  errors is strict and all of the bytes of obj are unable to be
                       encoded using encoding.

              Returns
                     byte bytes or the original object depending on the value of nonstring.

              Warning:
                 If you pass a byte bytes into this function the byte bytes is returned unmodified.  It  is  not
                 re-encoded with the specified encoding.  The easiest way to achieve that is:

                     to_bytes(to_unicode(text), encoding='utf-8')

                 The  initial  to_unicode()  call  will ensure text is a str string.  Then, to_bytes() will turn
                 that into a byte bytes with the specified encoding.

              Usually, this should be used on a str string but it can take either a byte bytes or a  str  string
              intelligently.   Nonstring  objects  are handled in different ways depending on the setting of the
              nonstring parameter.

              The default values of this function are set so as to always return a byte bytes and never raise an
              error when converting from unicode to bytes.  However, when you do not pass an encoding  that  can
              validly  encode  the  object  (or  a non-string object), you may end up with output that you don't
               expect.  Be sure you understand the requirements of your data; don't just ignore errors by
               passing it through this function.

              Changed  in  version  0.2.1a2:  Deprecated  non_string in favor of nonstring parameter and changed
              default value to simplerepr

       kitchen.text.converters.getwriter(encoding)
               Return a codecs.StreamWriter that resists issuing tracebacks.

              Parameters
                     encoding -- Encoding to use for transforming str strings into byte bytes.

              Return type
                     codecs.StreamWriter

              Returns
                     StreamWriter that you can instantiate to wrap output streams to automatically translate str
                     strings into encoding.

               This is a reimplementation of codecs.getwriter() that returns a StreamWriter that resists issuing
              tracebacks.   The StreamWriter that is returned uses kitchen.text.converters.to_bytes() to convert
              str strings into byte bytes.  The departures from codecs.getwriter() are:

              1. The StreamWriter that is returned will take byte bytes as well as str strings.  Any byte  bytes
                 will be passed through unmodified.

               2. The default error handler for unknown bytes is to replace the bytes with the unknown character
                  (? in most ascii-based encodings, the replacement character � in the utf encodings) whereas
                  codecs.getwriter() defaults
                 to  strict.   Like  codecs.StreamWriter,  the  returned StreamWriter can have its error handler
                 changed in code by setting stream.errors = 'new_handler_name'

              Example usage:

                 $ LC_ALL=C python
                 >>> import sys
                 >>> from kitchen.text.converters import getwriter
                 >>> UTF8Writer = getwriter('utf-8')
                 >>> unwrapped_stdout = sys.stdout
                 >>> sys.stdout = UTF8Writer(unwrapped_stdout)
                 >>> print 'caf\xc3\xa9'
                 café
                 >>> print u'caf\xe9'
                 café
                 >>> ASCIIWriter = getwriter('ascii')
                 >>> sys.stdout = ASCIIWriter(unwrapped_stdout)
                 >>> print 'caf\xc3\xa9'
                 café
                 >>> print u'caf\xe9'
                 caf?

              See also:
                 API docs for codecs.StreamWriter and codecs.getwriter()  and  Print  Fails  <http://wiki.python
                 .org/moin/PrintFails> on the python wiki.

              Added in version kitchen: 0.2a2, API: kitchen.text 1.1.0

       kitchen.text.converters.to_str(obj)
              Deprecated

              This  function  converts  something  to  a byte bytes if it isn't one.  It's used to call str() or
              unicode() on the object to get its simple representation without danger of getting a UnicodeError.
              You should be using to_unicode() or to_bytes() explicitly instead.

              If you need str strings:

                 to_unicode(obj, nonstring='simplerepr')

              If you need byte bytes:

                 to_bytes(obj, nonstring='simplerepr')

       kitchen.text.converters.to_utf8(obj, errors='replace', non_string='passthru')
              Deprecated

              Convert str to an encoded utf-8 <#term-UTF-8> byte bytes.  You should be using to_bytes() instead:

                 to_bytes(obj, encoding='utf-8', non_string='passthru')

   Transformation to XML
       kitchen.text.converters.unicode_to_xml(string, encoding='utf-8', attrib=False, control_chars='replace')
              Take a str string and turn it into a byte bytes suitable for xml

               Parameters
                      • string -- str string to encode into an XML compatible byte bytes

                      • encoding -- encoding to use for the returned byte bytes.  Default is to encode to UTF-8
                        <#term-UTF-8>.  If some of the characters in string are not encodable in this encoding,
                        the unknown characters will be entered into the output string using xml character
                        references.

                     • attrib  --  If  True,  quote the string for use in an xml attribute.  If False (default),
                       quote for use in an xml text field.

                     • control_chars --

                       control characters <#term-control-characters> are not allowed in XML documents.  When  we
                       encounter those we need to know what to do.  Valid options are:

                       replace
                              (default) Replace the control characters with ?

                       ignore Remove the characters altogether from the output

                       strict Raise   an   XmlEncodeError   <#kitchen.text.exceptions.XmlEncodeError>   when  we
                              encounter a control character <#term-control-character>

              Raiseskitchen.text.exceptions.XmlEncodeError  <#kitchen.text.exceptions.XmlEncodeError>  --  If
                       control_chars  is  set  to  strict  and  the string to be made suitable for output to xml
                       contains control characters <#term-control-characters> or if string is not a  str  string
                       then we raise this exception.

                     • ValueError -- If control_chars is set to something other than replace, ignore, or strict.

              Return type
                     byte bytes

              Returns
                     representation of the str string as a valid XML byte bytes

              XML  files  consist mainly of text encoded using a particular charset.  XML also denies the use of
              certain bytes in the encoded text (example: ASCII Null).  There are also special  characters  that
              must be escaped if they are present in the input (example: <).  This function takes care of all of
              those issues for you.

              There  are  a  few  different  ways  to  use  this function depending on your needs.  The simplest
              invocation is like this:

                 unicode_to_xml(u'String with non-ASCII characters: <"á と">')

              This will return the following to you, encoded in utf-8 <#term-UTF-8>:

                 'String with non-ASCII characters: &lt;"á と"&gt;'

              Pretty straightforward.  Now, what if you need to encode your document  in  something  other  than
              utf-8 <#term-UTF-8>?  For instance, latin-1?  Let's see:

                 unicode_to_xml(u'String with non-ASCII characters: <"á と">', encoding='latin-1')
                 'String with non-ASCII characters: &lt;"á &#12392;"&gt;'

               Because the と character is not available in the latin-1 charset, it is replaced with &#12392; in
               our output.  This is an xml character reference which represents the character at unicode
               codepoint 12392, the と character.

              When you want to reverse this, use xml_to_unicode() which will turn a byte bytes into a str string
              and replace the xml character references with the unicode characters.

              XML  also  has  the  quirk  of  not  allowing control characters <#term-control-characters> in its
              output.  The control_chars parameter allows us to specify what to do with those.   For  use  cases
              that  don't need absolute character by character fidelity (example: holding strings that will just
              be used for display in a GUI app later), the default value of replace works well:

                 unicode_to_xml(u'String with disallowed control chars: \u0000\u0007')
                 'String with disallowed control chars: ??'

              If you do need to be able to reproduce all of the characters at a later  date  (examples:  if  the
              string  is a key value in a database or a path on a filesystem) you have many choices.  Here are a
              few  that  rely  on   utf-7,   a   verbose   encoding   that   encodes   control   characters   <#
              term-control-characters>  (as  well  as non-ASCII <#term-ASCII> unicode values) to characters from
              within the ASCII <#term-ASCII> printable characters.  The good thing about doing this is that  the
              code  is  pretty simple.  You just need to use utf-7 both when encoding the field for xml and when
              decoding it for use in your python program:

                  unicode_to_xml(u'String with unicode: と and control char: \u0007', encoding='utf7')
                  'String with unicode: +MGg- and control char: +AAc-'
                  # [...]
                  xml_to_unicode('String with unicode: +MGg- and control char: +AAc-', encoding='utf7')
                  u'String with unicode: と and control char: \u0007'

              As you can see, the utf-7 encoding will transform even characters that would be  representable  in
              utf-8  <#term-UTF-8>.   This  can  be  a drawback if you want unicode characters in the file to be
              readable without being decoded first.  You can work around this with increased complexity in  your
              application code:

                 encoding = 'utf-8'
                  u_string = u'String with unicode: と and control char: \u0007'
                 try:
                     # First attempt to encode to utf8
                     data = unicode_to_xml(u_string, encoding=encoding, errors='strict')
                 except XmlEncodeError:
                     # Fallback to utf-7
                     encoding = 'utf-7'
                     data = unicode_to_xml(u_string, encoding=encoding, errors='strict')
                 write_tag('<mytag encoding=%s>%s</mytag>' % (encoding, data))
                 # [...]
                  encoding = tag.attributes.encoding
                  u_string = xml_to_unicode(tag.text, encoding=encoding)  # tag.text: the tag's content, read back in

              Using  code  similar  to  that,  you  can have some fields encoded using your default encoding and
              fallback to utf-7 if there are control characters <#term-control-characters> present.

              Note:
                  If your goal is to preserve the control characters <#term-control-characters>, you cannot
                  simply save the entire file as utf-7 and set the xml encoding parameter to utf-7.  Because XML
                  doesn't allow control characters <#term-control-characters>, you have to encode those
                  separately from any encoding work that the XML parser itself knows about.

              See also:

                 bytes_to_xml()
                        if  you're  dealing with bytes that are non-text or of an unknown encoding that you must
                        preserve on a byte for byte level.

                 guess_encoding_to_xml()
                        if you're dealing with strings in unknown encodings that you don't  need  to  save  with
                        char-for-char fidelity.

       kitchen.text.converters.xml_to_unicode(byte_string, encoding='utf-8', errors='replace')
              Transform a byte bytes from an xml file into a str string

              Parametersbyte_string -- byte bytes to decode

                     • encoding -- encoding that the byte bytes is in

                     • errors  -- What to do if not every character is  valid in encoding.  See the to_unicode()
                       documentation for legal values.

              Return type
                     str string

              Returns
                     string decoded from byte_string

              This function attempts to reverse what unicode_to_xml() does.  It takes a byte  bytes  (presumably
              read  in  from  an xml file) and expands all the html entities into unicode characters and decodes
              the byte bytes into a str string.  One thing it cannot do is restore  any  control  characters  <#
              term-control-characters>  that were removed prior to inserting into the file.  If you need to keep
               such characters you need to use xml_to_bytes() and bytes_to_xml() or use one of the strategies
              documented in unicode_to_xml() instead.
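
               For example, a character reference written out by unicode_to_xml() comes back as the original
               character (a minimal sketch; the byte bytes here is assumed to be latin-1 encoded):

                  >>> from kitchen.text.converters import xml_to_unicode
                  >>> xml_to_unicode('String: &#12392;', encoding='latin-1')
                  u'String: \u3068'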

       kitchen.text.converters.byte_string_to_xml(byte_string, input_encoding='utf-8', errors='replace',
       output_encoding='utf-8', attrib=False, control_chars='replace')
              Make sure a byte bytes is validly encoded for xml output

              Parametersbyte_string -- Byte bytes to turn into valid xml output

                      • input_encoding -- Encoding of byte_string.  Default utf-8

                      • errors --

                        How to handle errors encountered while decoding the byte_string into str at the beginning
                        of the process.  Values are:

                       replace
                              (default) Replace the invalid bytes with a ?

                       ignore Remove the characters altogether from the output

                        strict Raise a UnicodeDecodeError when we encounter a non-decodable character

                      • output_encoding -- Encoding for the xml file that this string will go into.  Default is
                        utf-8.  If some of the characters in byte_string are not encodable in this encoding, the
                        unknown characters will be entered into the output string using xml character references.

                     • attrib  --  If  True,  quote the string for use in an xml attribute.  If False (default),
                       quote for use in an xml text field.

                     • control_chars --

                       XML does not allow control  characters  <#term-control-characters>.   When  we  encounter
                       those we need to know what to do.  Valid options are:

                       replace
                              (default) Replace the control characters <#term-control-characters> with ?

                       ignore Remove the characters altogether from the output

                       strict Raise an error when we encounter a control character <#term-control-character>

              RaisesXmlEncodeError  <#kitchen.text.exceptions.XmlEncodeError>  --  If control_chars is set to
                       strict and the string to be made suitable for output to xml contains  control  characters
                       <#term-control-characters> then we raise this exception.

                     • UnicodeDecodeError  -- If errors is set to strict and the byte_string contains bytes that
                       are not decodable using input_encoding, this error is raised

              Return type
                     byte bytes

              Returns
                     representation of the byte bytes  in  the  output  encoding  with  any  bytes  that  aren't
                     available in xml taken care of.

               Use this when you have a byte bytes representing text that you need to make suitable for output to
               xml.  There are several situations where this arises.  For instance, if you need to transform some
              strings encoded in latin-1 to utf-8 <#term-UTF-8> for output:

                 utf8_string = byte_string_to_xml(latin1_string, input_encoding='latin-1')

              If  you  already  have  strings  in the proper encoding you may still want to use this function to
              remove control characters <#term-control-characters>:

                 cleaned_string = byte_string_to_xml(string, input_encoding='utf-8', output_encoding='utf-8')

              See also:

                 unicode_to_xml()
                        for other ideas on using this function

       kitchen.text.converters.xml_to_byte_string(byte_string, input_encoding='utf-8', errors='replace',
       output_encoding='utf-8')
               Transform a byte bytes from an xml file into another byte bytes

              Parametersbyte_string -- byte bytes to decode

                     • input_encoding -- encoding that the byte bytes is in

                     • errors -- What to do if not every character is valid in encoding.  See  the  to_unicode()
                       docstring for legal values.

                     • output_encoding -- Encoding for the output byte bytes

               Returns
                      byte bytes decoded from byte_string and re-encoded using output_encoding

              This  function  attempts to reverse what unicode_to_xml() does.  It takes a byte bytes (presumably
              read in from an xml file) and expands all the html entities into unicode  characters  and  decodes
              the  byte  bytes  into  a str string.  One thing it cannot do is restore any control characters <#
              term-control-characters> that were removed prior to inserting into the file.  If you need to  keep
              such  characters  you  need  to use xml_to_bytes() and bytes_to_xml() or use one of the strategies
              documented in unicode_to_xml() instead.

       kitchen.text.converters.bytes_to_xml(byte_string, *args, **kwargs)
              Return a byte bytes encoded so it is valid inside of any xml file

              Parametersbyte_string -- byte bytes to transform

                      • *args, **kwargs --

                       extra arguments to this function are passed on to the function actually implementing  the
                       encoding.  You can use this to tweak the output in some cases but, as a general rule, you
                       shouldn't because the underlying encoding function is not guaranteed to remain the same.

              Return type
                     byte bytes consisting of all ASCII <#term-ASCII> characters

              Returns
                     byte bytes representation of the input.  This will be encoded using base64.

              This function is made especially to put binary information into xml documents.

              This  function  is intended for encoding things that must be preserved byte-for-byte.  If you want
              to encode a byte string that's text and don't mind losing the actual bytes you  probably  want  to
              try byte_string_to_xml() or guess_encoding_to_xml() instead.

              Note:
                  Although the current implementation uses base64.b64encode() and there are no plans to change
                  it, that isn't guaranteed.  If you want to make sure that you can encode and decode these
                  messages it's best to use xml_to_bytes() if you use this function to encode.

       kitchen.text.converters.xml_to_bytes(byte_string, *args, **kwargs)
              Decode a string encoded using bytes_to_xml()

              Parametersbyte_string  --  byte  bytes  to  transform.  This should be a base64 encoded sequence of
                       bytes originally generated by bytes_to_xml().

                      • *args, **kwargs --

                       extra arguments to this function are passed on to the function actually implementing  the
                       encoding.  You can use this to tweak the output in some cases but, as a general rule, you
                       shouldn't because the underlying encoding function is not guaranteed to remain the same.

              Return type
                     byte bytes

              Returns
                     byte bytes that's the decoded input

               If you've got fields in an xml document that were encoded with bytes_to_xml() then you want to use
               this function to decode them.  It converts a base64 encoded string back into a byte bytes.

              Note:
                  Although the current implementation uses base64.b64decode() and there are no plans to change
                  it, that isn't guaranteed.  If you want to make sure that you can encode and decode these
                  messages it's best to use bytes_to_xml() if you use this function to decode.
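
               A round trip through these two functions looks like this (a minimal sketch; the middle value is
               simply the base64 encoding of the input bytes):

                  >>> from kitchen.text.converters import bytes_to_xml, xml_to_bytes
                  >>> bytes_to_xml('\x00\x01\xff')
                  'AAH/'
                  >>> xml_to_bytes('AAH/')
                  '\x00\x01\xff'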

       kitchen.text.converters.guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
       control_chars='replace')
              Return a byte bytes suitable for inclusion in xml

              Parametersstring -- str or byte bytes to be transformed into a byte bytes suitable for inclusion in
                        xml.  If string is a byte bytes we attempt to guess the encoding.  If we cannot guess,
                        we fall back to latin-1.

                     • output_encoding -- Output encoding for the byte bytes.  This should match the encoding of
                       your xml file.

                     • attrib  --  If  True,  escape  the  item for use in an xml attribute.  If False (default)
                       escape the item for use in a text node.

              Returns
                     utf-8 <#term-UTF-8> encoded byte bytes

       kitchen.text.converters.to_xml(string, encoding='utf-8', attrib=False, control_chars='ignore')
              Deprecated: Use guess_encoding_to_xml() instead

   Working with exception messages
       kitchen.text.converters.EXCEPTION_CONVERTERS = (<function <lambda>>, <function <lambda>>)

              Tuple of functions to try to use to convert an exception into a string
                     representation.  Its main use is to extract a string  (str  or  bytes)  from  an  exception
                     object in exception_to_unicode() and exception_to_bytes().  The functions here will try the
                     exception's  args[0]  and  the  exception  itself (roughly equivalent to str(exception)) to
                     extract the message. This is only a default and can be easily overridden when calling those
                     functions.  There are several reasons you might wish to do that.  If  you  have  exceptions
                     where  the best string representing the exception is not returned by the default functions,
                     you can add another function to extract from a different field:

                        from kitchen.text.converters import (EXCEPTION_CONVERTERS,
                                exception_to_unicode)

                        class MyError(Exception):
                            def __init__(self, message):
                                self.value = message

                        c = [lambda e: e.value]
                        c.extend(EXCEPTION_CONVERTERS)
                        try:
                            raise MyError('An Exception message')
                        except MyError, e:
                            print exception_to_unicode(e, converters=c)

                     Another reason would be if you're converting to a byte bytes and you know the  bytes  needs
                     to  be  a  non-utf-8  <#term-UTF-8>  encoding.   exception_to_bytes()  defaults to utf-8 <#
                     term-UTF-8> but if you convert into a byte bytes explicitly using a converter then you  can
                     choose a different encoding:

                        from kitchen.text.converters import (EXCEPTION_CONVERTERS,
                                exception_to_bytes, to_bytes)
                        c = [lambda e: to_bytes(e.args[0], encoding='euc_jp'),
                                lambda e: to_bytes(e, encoding='euc_jp')]
                        c.extend(EXCEPTION_CONVERTERS)
                        try:
                            do_something()
                        except Exception, e:
                            log = open('logfile.euc_jp', 'a')
                             log.write('%s\n' % exception_to_bytes(e, converters=c))
                             log.close()

                     Each  function  in  this  list  should take the exception as its sole argument and return a
                     string containing the message representing the exception.  The  functions  may  return  the
                      message as a byte bytes, a str string, or even an object if you trust the object to
                     return a decent string representation.  The exception_to_unicode() and exception_to_bytes()
                     functions will make sure to convert the string to the proper type before returning.

                     Added in version 0.2.2.

       kitchen.text.converters.BYTE_EXCEPTION_CONVERTERS = (<function <lambda>>, <function to_bytes>)
              Deprecated: Use EXCEPTION_CONVERTERS instead.

              Tuple of functions to try to use to convert an exception into a string representation.  This tuple
              is similar to the one in EXCEPTION_CONVERTERS but it's  used  with  exception_to_bytes()  instead.
              Ideally,  these  functions should do their best to return the data as a byte bytes but the results
              will be run through to_bytes() before being returned.

              Added in version 0.2.2.

              Changed in version 1.0.1: Deprecated as simplifications allow EXCEPTION_CONVERTERS to perform  the
              same function.

       kitchen.text.converters.exception_to_unicode(exc, converters=(<function <lambda>>, <function <lambda>>))
              Convert an exception object into a unicode representation

              Parametersexc -- Exception object to convert

                     • converters  --  List  of  functions  to  use to convert the exception into a string.  See
                       EXCEPTION_CONVERTERS for the default value and an example of adding other  converters  to
                       the  defaults.   The  functions  in  the  list are tried one at a time to see if they can
                       extract a string from the exception.  The first one to do so without raising an exception
                       is used.

              Returns
                      str string representation of the exception.  The value extracted by the converters will be
                      converted into str before being returned, using the utf-8 <#term-UTF-8> encoding.  If you
                      know you need to use an alternate encoding, add a function that does that to the list of
                      functions in converters.

              Added in version 0.2.2.

       kitchen.text.converters.exception_to_bytes(exc, converters=(<function <lambda>>, <function <lambda>>))
               Convert an exception object into a bytes representation

              Parametersexc -- Exception object to convert

                     • converters  --  List  of  functions  to  use to convert the exception into a string.  See
                       EXCEPTION_CONVERTERS for the default value and an example of adding other  converters  to
                       the  defaults.   The  functions  in  the  list are tried one at a time to see if they can
                       extract a string from the exception.  The first one to do so without raising an exception
                       is used.

              Returns
                      byte bytes representation of the exception.  The value extracted by the converters will be
                      converted into bytes before being returned, using the utf-8 <#term-UTF-8> encoding.  If you
                      know you need to use an alternate encoding, add a function that does that to the list of
                      functions in converters.

              Added in version 0.2.2.

              Changed  in  version 1.0.1: Code simplification allowed us to switch to using EXCEPTION_CONVERTERS
              as the default value of converters.

   Format Text for Display
       Functions related to displaying unicode text.  Unicode characters don't all have the  same  width  so  we
       need helper functions for displaying them.

       Added in version 0.2: kitchen.display API 1.0.0

       kitchen.text.display.textual_width(msg, control_chars='guess', encoding='utf-8', errors='replace')
              Get the textual width <#term-textual-width> of a string

              Parametersmsg -- str string or byte bytes to get the width of

                     • control_chars --

                       specify  how to deal with control characters <#term-control-characters>.  Possible values
                       are:

                       guess  (default) will  take  a  guess  for  control  character  <#term-control-character>
                              widths.   Most  codes will return zero width.  backspace, delete, and clear delete
                              return -1.  escape currently returns -1 as well but this is not guaranteed as it's
                              not always correct

                       strict will  raise   kitchen.text.exceptions.ControlCharError   <#kitchen.text.exceptions
                              .ControlCharError> if a control character <#term-control-character> is encountered

                     • encoding  -- If we are given a byte bytes this is used to decode it into str string.  Any
                       characters that are not decodable in this encoding will get  a  value  dependent  on  the
                       errors parameter.

                      • errors -- How to treat errors decoding the byte bytes to str string.  Legal values are
                        the same as for kitchen.text.converters.to_unicode() <#kitchen.text.converters
                        .to_unicode>.  The default value of replace will cause undecodable byte sequences to have
                        a width of one.  ignore will give them a width of zero.

              Raises ControlCharError  <#kitchen.text.exceptions.ControlCharError>  -- if msg contains a control
                     character <#term-control-character> and control_chars is strict.

              Returns
                     Textual width <#term-textual-width> of the msg.  This is  the  amount  of  space  that  the
                     string  will consume on a monospace display.  It's measured in the number of cell positions
                     or columns it will take up on a monospace display.  This is not the number of  glyphs  that
                     are in the string.

              Note:
                 This  function can be wrong sometimes because Unicode does not specify a strict width value for
                 all of the code points  <#term-code-points>.   In  particular,  we've  found  that  some  Tamil
                 characters take up to four character cells but we return a lesser amount.
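
               For instance, double width characters consume two cells each (a short illustration; exact values
               assume the unicode data this kitchen release was generated against):

                  >>> from kitchen.text.display import textual_width
                  >>> textual_width(u'café')
                  4
                  >>> textual_width(u'一二三')
                  6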

       kitchen.text.display.textual_width_chop(msg, chop, encoding='utf-8', errors='replace')
              Given a string, return it chopped to a given textual width <#term-textual-width>

              Parametersmsg -- str string or byte bytes to chop

                     • chop -- Chop msg if it exceeds this textual width <#term-textual-width>

                     • encoding  --  If  we are given a byte bytes, this is used to decode it into a str string.
                       Any characters that are not decodable in this encoding will be assigned a width of one.

                      • errors -- How to treat errors decoding the byte bytes to str.  Legal values are the same
                        as for kitchen.text.converters.to_unicode() <#kitchen.text.converters.to_unicode>

              Return type
                     str string

              Returns
                     str string of the msg chopped at the given textual width <#term-textual-width>

              This is what you want to use instead of %.*s, as it does the "right" thing with regard to UTF-8 <#
              term-UTF-8>  sequences,  control  characters  <#term-control-characters>, and characters that take
              more than one cell position. Eg:

                 >>> # Wrong: only displays 8 characters because it is operating on bytes
                 >>> print "%.*s" % (10, 'café ñunru!')
                 café ñun
                 >>> # Properly operates on graphemes
                 >>> '%s' % (textual_width_chop('café ñunru!', 10))
                 café ñunru
                 >>> # takes too many columns because the kanji need two cell positions
                 >>> print '1234567890\n%.*s' % (10, u'一二三四五六七八九十')
                 1234567890
                 一二三四五六七八九十
                 >>> # Properly chops at 10 columns
                 >>> print '1234567890\n%s' % (textual_width_chop(u'一二三四五六七八九十', 10))
                 1234567890
                 一二三四五

       kitchen.text.display.textual_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='')
              Expand a str string to a specified textual width <#term-textual-width> or chop to same

              Parametersmsg -- str string to format

                     • fill -- pad string until the textual width <#term-textual-width> of the  string  is  this
                       length

                     • chop  -- before doing anything else, chop the string to this length.  Default: Don't chop
                       the string at all

                     • left -- If True (default) left justify the string and put the padding on the  right.   If
                       False, pad on the left side.

                     • prefix -- Attach this string before the field we're filling

                     • suffix -- Append this string to the end of the field we're filling

              Return type
                     str string

              Returns
                     msg  formatted  to  fill  the  specified  width.  If no chop is specified, the string could
                     exceed the fill length when completed.  If prefix or suffix are printable  characters,  the
                     string could be longer than the fill width.

              Note:
                 prefix  and  suffix should be used for "invisible" characters like highlighting, color changing
                 escape codes, etc.  The fill characters are appended outside of any prefix or suffix  elements.
                 This allows you to only highlight msg inside of the field you're filling.

              Warning:
                 msg,  prefix, and suffix should all be representable as unicode characters.  In particular, any
                 escape sequences in prefix and suffix need to be convertible to str.  If you need to  use  byte
                 sequences here rather than unicode characters, use byte_string_textual_width_fill() instead.

              This   function   expands   a   string   to  fill  a  field  of  a  particular  textual  width  <#
              term-textual-width>.  Use it instead of %*.*s, as it does the "right" thing with regard  to  UTF-8
              <#term-UTF-8>  sequences,  control characters <#term-control-characters>, and characters that take
              more than one cell position in a display.  Example usage:

                 >>> msg = u'一二三四五六七八九十'
                 >>> # Wrong: This uses 10 characters instead of 10 cells:
                 >>> u":%-*.*s:" % (10, 10, msg[:9])
                 :一二三四五六七八九 :
                 >>> # This uses 10 cells like we really want:
                 >>> u":%s:" % (textual_width_fill(msg[:9], 10, 10))
                 :一二三四五:

                 >>> # Wrong: Right aligned in the field, but too many cells
                 >>> u"%20.10s" % (msg)
                           一二三四五六七八九十
                 >>> # Correct: Right aligned with proper number of cells
                 >>> u"%s" % (textual_width_fill(msg, 20, 10, left=False))
                           一二三四五

                  >>> prefix = u'\x1b[7m'   # reverse video escape sequence
                  >>> suffix = u'\x1b[0m'   # reset attributes
                  >>> # Wrong: Adding some escape characters to highlight the line but too many cells
                  >>> u"%s%20.10s%s" % (prefix, msg, suffix)
                  u'\x1b[7m          一二三四五六七八九十\x1b[0m'
                  >>> # Correct highlight of the line
                  >>> u"%s%s%s" % (prefix, display.textual_width_fill(msg, 20, 10, left=False), suffix)
                  u'\x1b[7m          一二三四五\x1b[0m'

                 >>> # Correct way to not highlight the fill
                 >>> u"%s" % (display.textual_width_fill(msg, 20, 10, left=False, prefix=prefix, suffix=suffix))
                  u'          \x1b[7m一二三四五\x1b[0m'

       kitchen.text.display.wrap(text, width=70, initial_indent='', subsequent_indent='', encoding='utf-8',
       errors='replace')
               Works like we want textwrap.wrap() to work.

              Parameterstext -- str string or byte bytes to wrap

                     • width -- textual width <#term-textual-width> at which to wrap.  Default: 70

                     • initial_indent -- string to use to indent the first line.  Default: do not indent.

                      • subsequent_indent -- string to use to indent subsequent lines.  Default: do not indent

                      • encoding -- Encoding to use if text is a byte bytes

                      • errors -- error handler to use if text is a byte bytes and contains some undecodable
                        characters.

              Return type
                     list of str strings

              Returns
                     list of lines that have been text wrapped and indented.

              textwrap.wrap()   from  the  python  standard  library  <http://docs.python.org/library>  has  two
              drawbacks that this attempts to fix:

              1. It does not  handle  textual  width  <#term-textual-width>.   It  only  operates  on  bytes  or
                 characters which are both inadequate (due to multi-byte and double width characters).

              2. It malforms lists and blocks.
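
               For simple text without lists or blocks it behaves much like textwrap.wrap() (a minimal sketch;
               the exact break points shown are illustrative):

                  >>> from kitchen.text.display import wrap
                  >>> wrap(u'one two three four', width=10)
                  [u'one two', u'three four']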

       kitchen.text.display.fill(text, *args, **kwargs)
              Works like we want textwrap.fill() to work

              Parameters
                     text -- str string or byte bytes to process

              Returns
                     str string with each line separated by a newline

              See also:

                 kitchen.text.display.wrap()
                        for other parameters that you can give this command.

              This  function is a light wrapper around kitchen.text.display.wrap().  Where that function returns
              a list of lines, this function returns one string with each line separated by a newline.

       kitchen.text.display.byte_string_textual_width_fill(msg, fill, chop=None, left=True, prefix='',
       suffix='', encoding='utf-8', errors='replace')
              Expand a byte bytes to a specified textual width <#term-textual-width> or chop to same

              Parametersmsg -- byte bytes encoded in UTF-8 <#term-UTF-8> that we want formatted

                     • fill -- pad msg until the textual width <#term-textual-width> is this long

                     • chop -- before doing anything else, chop the string to this length.  Default: Don't  chop
                       the string at all

                     • left  --  If True (default) left justify the string and put the padding on the right.  If
                       False, pad on the left side.

                     • prefix -- Attach this byte bytes before the field we're filling

                     • suffix -- Append this byte bytes to the end of the field we're filling

              Return type
                     byte bytes

              Returns
                     msg formatted to fill the specified textual width <#term-textual-width>.   If  no  chop  is
                     specified, the string could exceed the fill length when completed.  If prefix or suffix are
                     printable characters, the string could be longer than fill width.

              Note:
                 prefix  and  suffix should be used for "invisible" characters like highlighting, color changing
                 escape codes, etc.  The fill characters are appended outside of any prefix or suffix  elements.
                 This allows you to only highlight msg inside of the field you're filling.

              See also:

                 textual_width_fill()
                        For example usage.  This function has only two differences.

                        1. it  takes  byte bytes for prefix and suffix so you can pass in arbitrary sequences of
                           bytes, not just unicode characters.

                        2. it returns a byte bytes instead of a str string.

   Internal Data
       There are a few internal functions and variables in this module.  Code outside of kitchen  shouldn't  use
       them but people coding on kitchen itself may find them useful.

       kitchen.text.display._COMBINING = ((768, 879), (1155, 1161), (1425, 1469), (1471, 1471), (1473, 1474),
       (1476, 1477), (1479, 1479), (1536, 1539), (1552, 1562), (1611, 1631), (1648, 1648), (1750, 1764), (1767,
       1768), (1770, 1773), (1807, 1807), (1809, 1809), (1840, 1866), (1958, 1968), (2027, 2035), (2045, 2045),
       (2070, 2073), (2075, 2083), (2085, 2087), (2089, 2093), (2137, 2139), (2259, 2273), (2275, 2303), (2305,
       2306), (2364, 2364), (2369, 2376), (2381, 2381), (2385, 2388), (2402, 2403), (2433, 2433), (2492, 2492),
       (2497, 2500), (2509, 2509), (2530, 2531), (2558, 2558), (2561, 2562), (2620, 2620), (2625, 2626), (2631,
       2632), (2635, 2637), (2672, 2673), (2689, 2690), (2748, 2748), (2753, 2757), (2759, 2760), (2765, 2765),
       (2786, 2787), (2817, 2817), (2876, 2876), (2879, 2879), (2881, 2883), (2893, 2893), (2902, 2902), (2946,
       2946), (3008, 3008), (3021, 3021), (3134, 3136), (3142, 3144), (3146, 3149), (3157, 3158), (3260, 3260),
       (3263, 3263), (3270, 3270), (3276, 3277), (3298, 3299), (3387, 3388), (3393, 3395), (3405, 3405), (3530,
       3530), (3538, 3540), (3542, 3542), (3633, 3633), (3636, 3642), (3655, 3662), (3761, 3761), (3764, 3772),
       (3784, 3789), (3864, 3865), (3893, 3893), (3895, 3895), (3897, 3897), (3953, 3966), (3968, 3972), (3974,
       3975), (3984, 3991), (3993, 4028), (4038, 4038), (4141, 4144), (4146, 4146), (4150, 4151), (4153, 4154),
       (4184, 4185), (4237, 4237), (4448, 4607), (4957, 4959), (5906, 5908), (5938, 5940), (5970, 5971), (6002,
       6003), (6068, 6069), (6071, 6077), (6086, 6086), (6089, 6099), (6109, 6109), (6155, 6157), (6313, 6313),
       (6432, 6434), (6439, 6440), (6450, 6450), (6457, 6459), (6679, 6680), (6752, 6752), (6773, 6780), (6783,
       6783), (6832, 6845), (6912, 6915), (6964, 6964), (6966, 6970), (6972, 6972), (6978, 6978), (6980, 6980),
       (7019, 7027), (7082, 7083), (7142, 7142), (7154, 7155), (7223, 7223), (7376, 7378), (7380, 7392), (7394,
       7400), (7405, 7405), (7412, 7412), (7416, 7417), (7616, 7673), (7675, 7679), (8203, 8207), (8234, 8238),
       (8288, 8291), (8298, 8303), (8400, 8432), (11503, 11505), (11647, 11647), (11744, 11775), (12330, 12335),
       (12441, 12442), (42607, 42607), (42612, 42621), (42654, 42655), (42736, 42737), (43014, 43014), (43019,
       43019), (43045, 43046), (43204, 43204), (43232, 43249), (43307, 43309), (43347, 43347), (43443, 43443),
       (43456, 43456), (43696, 43696), (43698, 43700), (43703, 43704), (43710, 43711), (43713, 43713), (43766,
       43766), (44013, 44013), (64286, 64286), (65024, 65039), (65056, 65071), (65279, 65279), (65529, 65531),
       (66045, 66045), (66272, 66272), (66422, 66426), (68097, 68099), (68101, 68102), (68108, 68111), (68152,
       68154), (68159, 68159), (68325, 68326), (68900, 68903), (69446, 69456), (69702, 69702), (69759, 69759),
       (69817, 69818), (69888, 69890), (69939, 69940), (70003, 70003), (70080, 70080), (70090, 70090), (70197,
       70198), (70377, 70378), (70459, 70460), (70477, 70477), (70502, 70508), (70512, 70516), (70722, 70722),
       (70726, 70726), (70750, 70750), (70850, 70851), (71103, 71104), (71231, 71231), (71350, 71351), (71467,
       71467), (71737, 71738), (72160, 72160), (72244, 72244), (72263, 72263), (72345, 72345), (72767, 72767),
       (73026, 73026), (73028, 73029), (73111, 73111), (92912, 92916), (92976, 92982), (113822, 113822),
       (119141, 119145), (119149, 119170), (119173, 119179), (119210, 119213), (119362, 119364), (122880,
       122886), (122888, 122904), (122907, 122913), (122915, 122916), (122918, 122922), (123184, 123190),
       (123628, 123631), (125136, 125142), (125252, 125258), (917505, 917505), (917536, 917631), (917760,
       917999))
              Internal table, provided by this module to list code points <#term-code-points> which combine with
              other  characters  and  therefore  should  have no textual width <#term-textual-width>.  This is a
              sorted tuple of non-overlapping intervals.  Each interval is a tuple listing a starting code point
              <#term-code-point> and ending code point <#term-code-point>.  Every code point  <#term-code-point>
              between the two end points is a combining character.

              See also:

                 _generate_combining_table()
                        for how this table is generated

              This table was last regenerated on python-3.8.0a3 with unicodedata.unidata_version 12.0.0

       kitchen.text.display._generate_combining_table()
              Combine Markus Kuhn's data with unicodedata to make combining char list

              Return type
                     tuple of tuples

              Returns
                      tuple of intervals of code points <#term-code-points> that are combining characters.  Each
                     interval is a 2-tuple of the starting code point <#term-code-point>  and  the  ending  code
                     point <#term-code-point> for the combining characters.

              In  normal  use,  this  function serves to tell how we're generating the combining char list.  For
              speed reasons, we use this to generate a static list and just use that later.

               Markus Kuhn's list of combining characters is more complete than what's in the python unicodedata
               library, but the python unicodedata is synced against later versions of the unicode database.

              This is used to generate the _COMBINING table.

       kitchen.text.display._print_combining_table()
              Print out a new _COMBINING table

              This will print a new _COMBINING table in the format used in kitchen/text/display.py.  It's useful
              for  updating  the _COMBINING table with updated data from a new python as the format won't change
              from what's already in the file.

       kitchen.text.display._interval_bisearch(value, table)
              Binary search in an interval table.

              Parametersvalue -- numeric value to search for

                     • table -- Ordered list of intervals.  This is a list of two-tuples.  The elements  of  the
                       two-tuple define an interval's start and end points.

              Returns
                     If value is found within an interval in the table return True.  Otherwise, False

              This  function  checks  whether a numeric value is present within a table of intervals.  It checks
              using a binary search algorithm, dividing the list of values in  half  and  checking  against  the
              values until it determines whether the value is in the table.
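
               For instance, checking code points against the _COMBINING table above (a small sketch; U+0301
               falls inside the (768, 879) interval):

                  >>> from kitchen.text.display import _interval_bisearch, _COMBINING
                  >>> _interval_bisearch(0x0301, _COMBINING)  # COMBINING ACUTE ACCENT
                  True
                  >>> _interval_bisearch(0x0041, _COMBINING)  # LATIN CAPITAL LETTER A
                  False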

       kitchen.text.display._ucp_width(ucs, control_chars='guess')
              Get the textual width <#term-textual-width> of a ucs character

              Parametersucs -- integer representing a single unicode code point <#term-code-point>

                     • control_chars --

                       specify  how to deal with control characters <#term-control-characters>.  Possible values
                       are:

                       guess  (default) will  take  a  guess  for  control  character  <#term-control-character>
                              widths.   Most  codes will return zero width.  backspace, delete, and clear delete
                              return -1.  escape currently returns -1 as well but this is not guaranteed as it's
                              not always correct

                       strict will  raise  ControlCharError  <#kitchen.text.exceptions.ControlCharError>  if   a
                              control character <#term-control-character> is encountered

              Raises ControlCharError  <#kitchen.text.exceptions.ControlCharError>  --  if  the  code  point  <#
                     term-code-point> is a unicode control character <#term-control-character> and control_chars
                     is set to 'strict'

              Returns
                     textual width <#term-textual-width> of the character.

              Note:
                 It's important to remember this is textual width <#term-textual-width> and not  the  number  of
                 characters or bytes.
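
               For example (a brief sketch; the widths follow the rules described above):

                  >>> from kitchen.text.display import _ucp_width
                  >>> _ucp_width(ord(u'a'))
                  1
                  >>> _ucp_width(ord(u'一'))
                  2
                  >>> _ucp_width(0x0301)  # combining characters occupy no cells
                  0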

       kitchen.text.display._textual_width_le(width, *args)
              Optimize the common case when deciding which textual width <#term-textual-width> is larger

              Parameterswidth -- textual width <#term-textual-width> to compare against.

                     • *args -- str strings to check the total textual width <#term-textual-width> of

              Returns
                     True if the total length of args are less than or equal to width.  Otherwise False.

               We often want to know "does X fit in Y".  It takes a while to use textual_width() to calculate
               this.  However, each canonically composed str character always has a textual width
               <#term-textual-width> of 1 or 2.  With this we can take the following shortcuts:

              1. If the number of canonically composed characters is more than width, the true textual width  <#
                 term-textual-width> cannot be less than width.

              2. If  the  number  of canonically composed characters * 2 is less than the width then the textual
                 width <#term-textual-width> must be ok.

               The textual width <#term-textual-width> of a canonically composed str string will always be
               greater than or equal to the number of str characters.  So we can first check whether twice the
               number of composed str characters is less than or equal to the asked for width.  If it is, we can
               return True immediately.  If not, we must do a full textual width <#term-textual-width> lookup.

   Miscellaneous functions for manipulating text
       Collection of text functions that don't fit in another category.

       Changed  in  version  kitchen:  1.2.0,  API: kitchen.text 2.2.0 Added isbasestring(), isbytestring(), and
       isunicodestring() to help tell which string type is which on python2 and python3

       kitchen.text.misc.byte_string_valid_encoding(byte_string, encoding='utf-8')
              Detect if a byte bytes is valid in a specific encoding

              Parametersbyte_string -- Byte bytes to test for bytes not valid in this encoding

                     • encoding -- encoding to test against.  Defaults to UTF-8 <#term-UTF-8>.

              Returns
                      True if there are no characters invalid in the encoding.  False if an invalid character
                      is detected.

              Note:
                 This function checks whether the byte bytes is valid in the specified encoding.   It  does  not
                 detect  whether the byte bytes actually was encoded in that encoding.  If you want that sort of
                 functionality, you probably want to use guess_encoding() instead.
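
               For instance (a quick sketch; the first byte bytes is valid utf-8, the second is latin-1
               encoded):

                  >>> from kitchen.text.misc import byte_string_valid_encoding
                  >>> byte_string_valid_encoding('caf\xc3\xa9')
                  True
                  >>> byte_string_valid_encoding('caf\xe9')
                  False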

       kitchen.text.misc.byte_string_valid_xml(byte_string, encoding='utf-8')
              Check that a byte bytes would be valid in xml

              Parametersbyte_string -- Byte bytes to check

                     • encoding -- Encoding of the xml file.  Default: UTF-8 <#term-UTF-8>

              Returns
                     True if the string is valid.  False if it would be invalid in the xml file

              In some cases you'll have a whole bunch of byte strings and rather than transforming them  to  str
              and  back  to byte bytes for output to xml, you will just want to make sure they work with the xml
              file you're constructing.  This function will help you do that.  Example:

                 ARRAY_OF_MOSTLY_UTF8_STRINGS = [...]
                 processed_array = []
                 for string in ARRAY_OF_MOSTLY_UTF8_STRINGS:
                     if byte_string_valid_xml(string, 'utf-8'):
                         processed_array.append(string)
                     else:
                         processed_array.append(guess_bytes_to_xml(string, encoding='utf-8'))
                 output_xml(processed_array)

       kitchen.text.misc.guess_encoding(byte_string, disable_chardet=False)
              Try to guess the encoding of a byte bytes

              Parametersbyte_string -- byte bytes to guess the encoding of

                     • disable_chardet -- If this is True,  we  never  attempt  to  use  chardet  to  guess  the
                       encoding.   This  is  useful  if  you  need  to  have  reproducibility whether chardet is
                       installed or not.  Default: False.

              Raises TypeError -- if byte_string is not a byte bytes type

              Returns
                     string containing a guess at the encoding of byte_string.  This is appropriate to  pass  as
                     the encoding argument when encoding and decoding unicode strings.

              We  start by attempting to decode the byte bytes as UTF-8 <#term-UTF-8>.  If this succeeds we tell
              the world it's UTF-8 <#term-UTF-8> text.  If it doesn't and chardet is installed on the system and
              disable_chardet is False this function will use it to try detecting the encoding  of  byte_string.
              If it is not installed or chardet cannot determine the encoding with a high enough confidence then
               we rather arbitrarily claim that it is latin-1.  Since every byte is valid in latin-1, decoding
               from latin-1 to str will not cause UnicodeErrors although the output might be mangled.
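
               For example (a minimal sketch; chardet is disabled so the fallback behavior is reproducible):

                  >>> from kitchen.text.misc import guess_encoding
                  >>> guess_encoding('caf\xc3\xa9', disable_chardet=True)
                  'utf-8'
                  >>> guess_encoding('caf\xe9', disable_chardet=True)
                  'latin-1'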

       kitchen.text.misc.html_entities_unescape(string)
              Substitute unicode characters for HTML entities

              Parameters
                     string -- str string to substitute out html entities

              Raises TypeError -- if something other than a str string is given

              Return type
                     str string

              Returns
                     The plain text without html entities
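
               For instance (a small sketch; both named and numeric entities are substituted):

                  >>> from kitchen.text.misc import html_entities_unescape
                  >>> html_entities_unescape(u'caf&eacute; &#12392;')
                  u'caf\xe9 \u3068'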

       kitchen.text.misc.isbasestring(obj)
              Determine if obj is a byte bytes or str string

               In python2 this is equivalent to isinstance(obj, basestring).  In python3 it checks whether the
               object is an instance of str, bytes, or bytearray.  This is an aid to porting code that needed to
               test whether an object was derived from basestring in python2 (commonly used in unicode-bytes
               conversion functions).

              Parameters
                     obj -- Object to test

              Returns
                     True if the object is a basestring.  Otherwise False.

               Added in version kitchen: 1.2.0, API: kitchen.text 2.2.0

       kitchen.text.misc.isbytestring(obj)
              Determine if obj is a byte bytes

              In python2 this is equivalent to isinstance(obj, str).  In python3 it checks whether the object is
              an instance of bytes or bytearray.

              Parameters
                     obj -- Object to test

              Returns
                     True if the object is a byte bytes.  Otherwise, False.

               Added in version kitchen: 1.2.0, API: kitchen.text 2.2.0

       kitchen.text.misc.isunicodestring(obj)
              Determine if obj is a str string

               In python2 this is equivalent to isinstance(obj, unicode).  In python3 it checks whether the
               object is an instance of str.

              Parameters
                     obj -- Object to test

              Returns
                     True if the object is a str string.  Otherwise, False.

               Added in version Kitchen: 1.2.0, API: kitchen.text 2.2.0
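
               A quick illustration of the three predicates (a sketch, shown with python3 types):

                  >>> from kitchen.text.misc import isbasestring, isbytestring, isunicodestring
                  >>> isbasestring(b'abc'), isbasestring(u'abc'), isbasestring(5)
                  (True, True, False)
                  >>> isbytestring(b'abc'), isbytestring(u'abc')
                  (True, False)
                  >>> isunicodestring(u'abc'), isunicodestring(b'abc')
                  (True, False)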

       kitchen.text.misc.process_control_chars(string, strategy='replace')
              Look for and transform control characters <#term-control-characters> in a string

               Parameters
                      • string -- string to search for  and  transform  control  characters  <#
                        term-control-characters> within

                     • strategy --

                       XML does not allow ASCII  <#term-ASCII>  control  characters  <#term-control-characters>.
                       When we encounter those we need to know what to do.  Valid options are:

                       replace
                              (default) Replace the control characters <#term-control-characters> with "?"

                       ignore Remove the characters altogether from the output

                       strict Raise   a  ControlCharError  <#kitchen.text.exceptions.ControlCharError>  when  we
                              encounter a control character

               Raises
                      • TypeError -- if string is not a unicode string.

                     • ValueError -- if the strategy is not one of replace, ignore, or strict.

                     • kitchen.text.exceptions.ControlCharError  <#kitchen.text.exceptions.ControlCharError>  --
                       if the strategy is strict and a control character <#term-control-character> is present in
                       the string

              Returns
                     str string with no control characters <#term-control-characters> in it.

              Changed  in version kitchen: 1.2.0, API: kitchen.text 2.2.0 Strip out the C1 control characters in
              addition to the C0 control characters.
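
               For example, each strategy handles an embedded control character like this (a sketch):

                  >>> from kitchen.text.misc import process_control_chars
                  >>> process_control_chars(u'a\x02b')
                  u'a?b'
                  >>> process_control_chars(u'a\x02b', strategy='ignore')
                  u'ab'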

       kitchen.text.misc.str_eq(str1, str2, encoding='utf-8', errors='replace')
               Compare two strings, converting to a byte string if one is a str string

               Parameters
                      • str1 -- First string to compare

                     • str2 -- Second string to compare

                      • encoding -- If we need to convert one string into a byte string to compare, the  encoding
                        to use.  Default is utf-8 <#term-UTF-8>.

                     • errors  --  What  to  do  if  we  encounter  errors  when  encoding  the string.  See the
                       kitchen.text.converters.to_bytes() <#kitchen.text.converters.to_bytes> documentation  for
                       possible values.  The default is replace.

               This function prevents UnicodeError (python-2.4 or  less)  and  UnicodeWarning  (python  2.5  and
               higher) when we compare a str string to a byte string.  The errors  normally  arise  because  the
              conversion  is done to ASCII <#term-ASCII>.  This function lets you convert to utf-8 <#term-UTF-8>
              or another encoding instead.

              Note:
                  When we need to convert one of the strings in order to compare them we  convert  the  str
                  string into a byte string.  That means that strings can compare differently if you use different
                  encodings for each.

              Note that str1 == str2 is faster than this function if you can accept the following limitations:

              • Limited to python-2.5+ (otherwise a UnicodeDecodeError may be thrown)

               • Will generate a UnicodeWarning if a non-ASCII <#term-ASCII> byte string is compared  to  a  str
                 string.
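
               For example (a sketch; both comparisons convert the str string using the given encoding):

                  >>> from kitchen.text.misc import str_eq
                  >>> str_eq(u'café', u'café'.encode('utf-8'))
                  True
                  >>> str_eq(u'café', u'café'.encode('latin-1'))
                  False
                  >>> str_eq(u'café', u'café'.encode('latin-1'), encoding='latin-1')
                  True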

   UTF-8
        Functions for operating on byte strings encoded as UTF-8 <#term-UTF-8>

       Note:
          In many cases, it is better to convert to str, operate on the strings, then convert back to  UTF-8  <#
          term-UTF-8>.  str type can handle many of these functions itself.  For those that it doesn't (removing
          control  characters from length calculations, for instance) the code to do so with a str type is often
          simpler.

       Warning:
          All of the functions in this module are deprecated.  Most of them have been  replaced  with  functions
          that    operate    on   unicode   values   in   kitchen.text.display   <#module-kitchen.text.display>.
          kitchen.text.utf8.utf8_valid() has been replaced with a function in kitchen.text.misc <#module-kitchen
          .text.misc>.

       kitchen.text.utf8.utf8_text_fill(text, *args, **kwargs)
              Deprecated Similar to textwrap.fill() but understands  utf-8  <#term-UTF-8>  strings  and  doesn't
              screw up lists/blocks/etc.

              Use kitchen.text.display.fill() <#kitchen.text.display.fill> instead.

       kitchen.text.utf8.utf8_text_wrap(text, width=70, initial_indent='', subsequent_indent='')
              Deprecated  Similar  to textwrap.wrap() but understands utf-8 <#term-UTF-8> data and doesn't screw
              up lists/blocks/etc

              Use kitchen.text.display.wrap() <#kitchen.text.display.wrap> instead

       kitchen.text.utf8.utf8_valid(msg)
              Deprecated Detect if a string is valid utf-8 <#term-UTF-8>

              Use kitchen.text.misc.byte_string_valid_encoding() <#kitchen.text.misc.byte_string_valid_encoding>
              instead.

       kitchen.text.utf8.utf8_width(msg)
              Deprecated Get the textual width <#term-textual-width> of a utf-8 <#term-UTF-8> string

              Use kitchen.text.display.textual_width() <#kitchen.text.display.textual_width> instead.

       kitchen.text.utf8.utf8_width_chop(msg, chop=None)
              Deprecated Return a string chopped to a given textual width <#term-textual-width>

              Use textual_width_chop() <#kitchen.text.display.textual_width_chop> and textual_width()  <#kitchen
              .text.display.textual_width> instead:

                 >>> msg = 'く ku ら ra と to み mi'
                 >>> # Old way:
                 >>> utf8_width_chop(msg, 5)
                 (5, 'く ku')
                  >>> # New way
                  >>> from kitchen.text.converters import to_bytes
                  >>> from kitchen.text.display import textual_width, textual_width_chop
                  >>> chopped = textual_width_chop(msg, 5)
                  >>> (textual_width(chopped), to_bytes(chopped))
                  (5, 'く ku')

       kitchen.text.utf8.utf8_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='')
              Deprecated Pad a utf-8 <#term-UTF-8> string to fill a specified width

              Use     byte_string_textual_width_fill()    <#kitchen.text.display.byte_string_textual_width_fill>
              instead

       converters <#module-kitchen.text.converters>
              deals with converting text for different encodings and to and from XML

       display <#module-kitchen.text.display>
              deals with issues with printing text to a screen

       misc <#module-kitchen.text.misc>
              is a catchall for text manipulation functions that don't seem to fit elsewhere

       utf8 <#module-kitchen.text.utf8>
              contains deprecated functions to manipulate utf8 byte strings

   Kitchen.collections
   StrictDict
       kitchen.collections.StrictDict provides a dictionary that treats bytes and str as distinct key values.

       kitchen.collections.strictdict.StrictDict
              alias of defaultdict
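
        A sketch of the intended behaviour (with a plain python2 dict, u'a' and b'a'  would  collide  into  a
        single key):

           >>> from kitchen.collections import StrictDict
           >>> d = StrictDict()
           >>> d[u'a'] = 1
           >>> d[b'a'] = 2
           >>> len(d)          # the two keys stay distinct
           2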

   Kitchen.iterutils Module
       Functions to manipulate iterables

        Added in version Kitchen: 0.2.1a1

        Module author: Toshio Kuratomi <toshio@fedoraproject.org>

        Module author: Luke Macken <lmacken@redhat.com>

       kitchen.iterutils.isiterable(obj, include_string=False)
              Check whether an object is an iterable

               Parameters
                      • obj -- Object to test whether it is an iterable

                      • include_string -- If True and obj is a byte string or str string this function will return
                        True.  If set to False, byte strings and str strings will cause this function  to  return
                        False.  Default False.

              Returns
                     True if obj is iterable, otherwise False.
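
               Example usage (a sketch, mirroring the iterate() examples below):

                  >>> from kitchen.iterutils import isiterable
                  >>> isiterable([1, 2, 3])
                  True
                  >>> isiterable('abc')
                  False
                  >>> isiterable('abc', include_string=True)
                  True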

       kitchen.iterutils.iterate(obj, include_string=False)
              Generator that can be used to iterate over anything

               Parameters
                      • obj -- The object to iterate over

                     • include_string  -- if True, treat strings as iterables.  Otherwise treat them as a single
                       scalar value.  Default False

               This function will create an iterator out of any scalar or iterable.  It is useful for turning  a
               value you are given into an iterable before operating on it.  Iterables have their items returned.
               Scalars are transformed into  iterables.   A  string  is  treated  as  a  scalar  value  unless  the
               include_string parameter is set to True.  Example usage:

                 >>> list(iterate(None))
                 [None]
                 >>> list(iterate([None]))
                 [None]
                 >>> list(iterate([1, 2, 3]))
                 [1, 2, 3]
                 >>> list(iterate(set([1, 2, 3])))
                 [1, 2, 3]
                 >>> list(iterate(dict(a='1', b='2')))
                 ['a', 'b']
                 >>> list(iterate(1))
                 [1]
                 >>> list(iterate(iter([1, 2, 3])))
                 [1, 2, 3]
                 >>> list(iterate('abc'))
                 ['abc']
                 >>> list(iterate('abc', include_string=True))
                 ['a', 'b', 'c']

   Helpers for versioning software
   PEP-386 compliant versioning
       PEP  386  <https://peps.python.org/pep-0386/> defines a standard format for version strings.  This module
       contains a function for creating strings in that format.

       kitchen.versioning.version_tuple_to_string(version_info)
              Return a PEP 386 <https://peps.python.org/pep-0386/> version string from a PEP  386  <https://peps
              .python.org/pep-0386/> style version tuple

              Parameters
                     version_info -- Nested set of tuples that describes the version.  See below for an example.

              Returns
                     a version string

              This function implements just enough of PEP 386 <https://peps.python.org/pep-0386/> to satisfy our
              needs.   PEP 386 <https://peps.python.org/pep-0386/> defines a standard format for version strings
              and refers to a function that will be merged into the python standard library  <http://docs.python
              .org/library> that transforms a tuple of version information into a standard version string.  This
              function  is  an  implementation  of  that  function.  Once that function becomes available in the
              python standard library <http://docs.python.org/library> we will start using it and deprecate this
              function.

              version_info   takes    the    form    that    PEP    386    <https://peps.python.org/pep-0386/>'s
              NormalizedVersion.from_parts() uses:

                 ((Major, Minor, [Micros]), [(Alpha/Beta/rc marker, version)],
                     [(post/dev marker, version)])

                 Ex: ((1, 0, 0), ('a', 2), ('dev', 3456))

              It generates a PEP 386 <https://peps.python.org/pep-0386/> compliant version string:

                 N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]

                 Ex: 1.0.0a2.dev3456
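
               Checking those examples interactively looks like this (a sketch):

                  >>> from kitchen.versioning import version_tuple_to_string
                  >>> version_tuple_to_string(((1, 0, 0), ('a', 2), ('dev', 3456)))
                  '1.0.0a2.dev3456'
                  >>> version_tuple_to_string(((0, 2, 1),))
                  '0.2.1'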

              Warning:
                 This function does next to no error checking.  It's up to the person defining the version tuple
                 to  make  sure  that the values make sense.  If the PEP 386 <https://peps.python.org/pep-0386/>
                 compliant version parser doesn't get released soon we'll look at  making  this  function  check
                 that the version tuple makes sense before transforming it into a string.

              It's  recommended  that  you  use  this  function to keep a __version_info__ tuple and __version__
              string in your modules.  Why do we need both a tuple and a string?  The string is often useful for
              putting into human readable locations like release announcements,  version  strings  in  tarballs,
              etc.  Meanwhile the tuple is very easy for a computer to compare. For example, kitchen sets up its
              version information like this:

                 from kitchen.versioning import version_tuple_to_string
                 __version_info__ = ((0, 2, 1),)
                 __version__ = version_tuple_to_string(__version_info__)

              Other  programs  that  depend  on  a  kitchen version between 0.2.1 and 0.3.0 can find whether the
              present version is okay with code like this:

                 from kitchen import __version_info__, __version__
                 if __version_info__ < ((0, 2, 1),) or __version_info__ >= ((0, 3, 0),):
                     print 'kitchen is present but not at the right version.'
                     print 'We need at least version 0.2.1 and less than 0.3.0'
                     print 'Currently found: kitchen-%s' % __version__

   Exceptions
       Kitchen has a hierarchy of exceptions that should make it easy to catch many errors  emitted  by  kitchen
       itself.

   Base kitchen exceptions
       Exception classes for kitchen and the root of the exception hierarchy for all kitchen modules.

       exception kitchen.exceptions.KitchenError
              Base exception class for any error thrown directly by kitchen.

   Kitchen.text exceptions
       Exception classes thrown by kitchen's text processing routines.

       exception kitchen.text.exceptions.ControlCharError
              Exception thrown when an ascii control character is encountered.

       exception kitchen.text.exceptions.XmlEncodeError
              Exception thrown by error conditions when encoding an xml string.
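
        Since the text exceptions derive from the base class, code that doesn't care exactly which kitchen error
        occurred can catch KitchenError (a sketch, assuming the hierarchy described above):

           from kitchen.exceptions import KitchenError
           from kitchen.text.misc import process_control_chars

           try:
               process_control_chars(u'a\x02b', strategy='strict')
           except KitchenError, e:
               # ControlCharError is a KitchenError, so this catches it
               print 'kitchen raised: %s' % e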

   1.0.0 Porting Guide
       The 0.1 through 1.0.0 releases focused on bringing in functions from yum and python-fedora.  This porting
       guide tells how to port from those APIs to their kitchen replacements.

   python-fedora
                      ┌───────────────────────────────┬───────────────────────────────────────┐
                      │ python-fedora                 │ kitchen replacement                   │
                      ├───────────────────────────────┼───────────────────────────────────────┤
                      │ fedora.iterutils.isiterable()kitchen.iterutils.isiterable()     <# │
                      │                               │ kitchen.iterutils.isiterable> [1]     │
                      ├───────────────────────────────┼───────────────────────────────────────┤
                      │ fedora.textutils.to_unicode()kitchen.text.converters.to_unicode()  │
                      │                               │ <#kitchen.text.converters.to_unicode> │
                      ├───────────────────────────────┼───────────────────────────────────────┤
                      │ fedora.textutils.to_bytes()kitchen.text.converters.to_bytes() <# │
                      │                               │ kitchen.text.converters.to_bytes>     │
                      └───────────────────────────────┴───────────────────────────────────────┘

        [1]  isiterable() <#kitchen.iterutils.isiterable> has changed slightly in  kitchen.   The  include_string
             parameter has switched its default value from True to False.  So you need to change code like:

          >>> # Old code
          >>> isiterable('abcdef')
          True
          >>> # New code
          >>> isiterable('abcdef', include_string=True)
          True

   yum
                   ┌─────────────────────────────┬────────────────────────────────────────────────┐
                   │ yum                         │ kitchen replacement                            │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.dummy_wrapper()kitchen.i18n.DummyTranslations.ugettext()      │
                   │                             │ [2]                                            │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                    │ yum.i18n.dummyP_wrapper()  │ kitchen.i18n.DummyTranslations.ungettext()     │
                   │                             │ [2]                                            │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.utf8_width()kitchen.text.display.textual_width()   <#      │
                   │                             │ kitchen.text.display.textual_width>            │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.utf8_width_chop()kitchen.text.display.textual_width_chop()      │
                   │                             │ <#kitchen.text.display                         │
                   │                             │ .textual_width_chop>                  and      │
                   │                             │ kitchen.text.display.textual_width()   <#      │
                   │                             │ kitchen.text.display.textual_width>   [3]      │
                   │                             │ [5]                                            │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.utf8_valid()kitchen.text.misc.byte_string_valid_encoding() │
                   │                             │ <#kitchen.text.misc                            │
                   │                             │ .byte_string_valid_encoding>                   │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.utf8_text_wrap()kitchen.text.display.wrap()     <#kitchen.text │
                   │                             │ .display.wrap> [4]                             │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.utf8_text_fill()kitchen.text.display.fill()     <#kitchen.text │
                   │                             │ .display.fill> [4]                             │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.to_unicode()kitchen.text.converters.to_unicode() <#kitchen │
                   │                             │ .text.converters.to_unicode> [6]               │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.to_unicode_maybe()kitchen.text.converters.to_unicode() <#kitchen │
                   │                             │ .text.converters.to_unicode> [6]               │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.to_utf8()kitchen.text.converters.to_bytes()   <#kitchen │
                   │                             │ .text.converters.to_bytes> [6]                 │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.to_str()kitchen.text.converters.to_unicode() <#kitchen │
                   │                             │ .text.converters.to_unicode>                or │
                   │                             │ kitchen.text.converters.to_bytes()   <#kitchen │
                   │                             │ .text.converters.to_bytes> [7]                 │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.str_eq()kitchen.text.misc.str_eq() <#kitchen.text.misc │
                   │                             │ .str_eq>                                       │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.misc.to_xml()kitchen.text.converters.unicode_to_xml()    <# │
                   │                             │ kitchen.text.converters.unicode_to_xml>     or │
                   │                             │ kitchen.text.converters.byte_string_to_xml()   │
                   │                             │ <#kitchen.text.converters.byte_string_to_xml>  │
                   │                             │ [8]                                            │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n._()                │ See: Initializing Yum i18n                     │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.P_()               │ See: Initializing Yum i18n                     │
                   ├─────────────────────────────┼────────────────────────────────────────────────┤
                   │ yum.i18n.exception2msg()kitchen.text.converters.exception_to_unicode() │
                   │                             │ <#kitchen.text.converters                      │
                   │                             │ .exception_to_unicode>                      or │
                    │                             │ kitchen.text.converters.exception_to_bytes()   │
                   │                             │ [9]                                            │
                   └─────────────────────────────┴────────────────────────────────────────────────┘

       [2]  These  yum methods provided fallback support for gettext functions in case either gaftonmode was set
            or gettext failed to return an object.  In kitchen, we can use the kitchen.i18n.DummyTranslations <#
            kitchen.i18n.DummyTranslations> object to fulfill that role.  Please see Initializing Yum  i18n  for
            more suggestions on how to do this.

       [3]  The  yum  version of these functions returned a byte str.  The kitchen version listed here returns a
            unicode string.  If you need a byte str  simply  call  kitchen.text.converters.to_bytes()  <#kitchen
            .text.converters.to_bytes> on the result.

       [4]  The  yum  version of these functions would return either a byte str or a unicode string depending on
            what the input value was.  The kitchen version always returns unicode strings.

       [5]  yum.i18n.utf8_width_chop() performed two functions.  It returned the piece of the message  that  fit
            in a specified width and the width of that message.  In kitchen, you need to call two functions, one
            for each action:

          >>> # Old way
          >>> utf8_width_chop(msg, 5)
          (5, 'く ku')
           >>> # New way
           >>> from kitchen.text.display import textual_width, textual_width_chop
           >>> chopped = textual_width_chop(msg, 5)
           >>> (textual_width(chopped), chopped)
           (5, u'く ku')

       [6]  If  the yum version of to_unicode() or to_utf8() is given an object that is not a string, it returns
            the object itself.  kitchen.text.converters.to_unicode()  <#kitchen.text.converters.to_unicode>  and
            kitchen.text.converters.to_bytes()  <#kitchen.text.converters.to_bytes>  default  to  returning  the
            simplerepr of the object instead.  If you want the yum behaviour, set  the  nonstring  parameter  to
            passthru:

          >>> from kitchen.text.converters import to_unicode
          >>> to_unicode(5)
          u'5'
          >>> to_unicode(5, nonstring='passthru')
          5

        [7]  yum.i18n.to_str() could return either a byte str or a unicode string.  In kitchen you can  get  the
             same effect but you get to choose whether you want a byte str or a unicode string.   Use  to_bytes()
             <#kitchen.text.converters.to_bytes>  for  str and to_unicode() <#kitchen.text.converters.to_unicode>
             for unicode.

       [8]  yum.misc.to_xml() was buggy as written.  I think the intention was for you to be able to pass a byte
            str or unicode string in and get out a byte str that was valid to use  in  an  xml  file.   The  two
            kitchen    functions    byte_string_to_xml()    <#kitchen.text.converters.byte_string_to_xml>    and
            unicode_to_xml() <#kitchen.text.converters.unicode_to_xml> do that for each string type.

       [9]  When porting yum.i18n.exception2msg() to use kitchen, you should setup two wrapper functions to  aid
            in your port.  They'll look like this:

          from kitchen.text.converters import EXCEPTION_CONVERTERS, \
              BYTE_EXCEPTION_CONVERTERS, exception_to_unicode, \
              exception_to_bytes
          def exception2umsg(e):
              '''Return a unicode representation of an exception'''
              c = [lambda e: e.value]
              c.extend(EXCEPTION_CONVERTERS)
              return exception_to_unicode(e, converters=c)
          def exception2bmsg(e):
              '''Return a utf8 encoded str representation of an exception'''
              c = [lambda e: e.value]
              c.extend(BYTE_EXCEPTION_CONVERTERS)
              return exception_to_bytes(e, converters=c)

       The  reason  to  define  this  wrapper is that many of the exceptions in yum put the message in the value
       attribute  of  the  Exception  instead  of  adding  it  to  the   args   attribute.    So   the   default
       EXCEPTION_CONVERTERS   <#kitchen.text.converters.EXCEPTION_CONVERTERS>  don't  know  where  to  find  the
       message.  The wrapper tells kitchen to check the value attribute for the message.  The reason  to  define
       two  wrappers  may  be  less obvious.  yum.i18n.exception2msg() can return a unicode string or a byte str
       depending on a combination of what attributes are present on the Exception and what locale  the  function
       is  being  run in.  By contrast, kitchen.text.converters.exception_to_unicode() <#kitchen.text.converters
       .exception_to_unicode> only returns unicode strings and  kitchen.text.converters.exception_to_bytes()  <#
       kitchen.text.converters.exception_to_bytes>  only  returns byte str.  This is much safer as it keeps code
       that can only handle unicode or only handle byte str correctly from getting the wrong type when an  input
       changes  but it means you need to examine the calling code when porting from yum.i18n.exception2msg() and
       use the appropriate wrapper.

   Initializing Yum i18n
       Previously, yum had several pieces of code to initialize i18n.  From the toplevel of yum/i18n.py:

           try:
              '''
              Setup the yum translation domain and make _() and P_() translation wrappers
              available.
              using ugettext to make sure translated strings are in Unicode.
              '''
              import gettext
              t = gettext.translation('yum', fallback=True)
              _ = t.ugettext
              P_ = t.ungettext
          except:
               '''
               Something went wrong so we make a dummy _() wrapper that just
               returns the same text
               '''
              _ = dummy_wrapper
              P_ = dummyP_wrapper

       With kitchen, this can be changed to this:

          from kitchen.i18n import easy_gettext_setup, DummyTranslations
          try:
              _, P_ = easy_gettext_setup('yum')
          except:
              translations = DummyTranslations()
              _ = translations.ugettext
              P_ = translations.ungettext

       Note:
          In Overcoming frustration:  Correctly  using  unicode  in  python2  <#overcoming-frustration>,  it  is
          mentioned  that  for  some  things (like exception messages), using the byte str oriented functions is
          more  appropriate.   If  this  is  desired,  the  setup   portion   is   only   a   second   call   to
          kitchen.i18n.easy_gettext_setup() <#kitchen.i18n.easy_gettext_setup>:

              b_, bP_ = easy_gettext_setup('yum', use_unicode=False)

        The second place where i18n is setup is in yum.YumBase._getConfig() in yum/__init__.py if gaftonmode is in
       effect:

          if startupconf.gaftonmode:
              global _
              _ = yum.i18n.dummy_wrapper

       This can be changed to:

          if startupconf.gaftonmode:
              global _
               _ = DummyTranslations().ugettext

   Conventions for contributing to kitchen
   Style
       • Strive to be PEP 8 <https://peps.python.org/pep-0008/> compliant

        • Run pylint over the code and try to resolve most of its nitpicking

   Python 2.4 compatibility
        At the moment, we're supporting python-2.4 and above.  Understand that there are a lot of python features
        that we cannot use because of this.

       Sometimes modules in the python standard library <http://docs.python.org/library> can be added to kitchen
       so that they're available.  When we do that we need to be careful of several things:

       1. Keep   the   module   in   sync    with    the    version    in    the    python-2.x    trunk.     Use
          maintainers/sync-copied-files.py for this.

       2. Sync the unittests as well as the module.

        3. Be aware that not all modules are written to remain compatible with python-2.4 and might  use  python
           language features that were not present then (generator  expressions,  relative  imports,  decorators,
           with, try: with both except: and finally:, etc.).  These are not good candidates  for  importing  into
           kitchen as they require more work to keep synced.

   Unittests
       • At least smoketest your code (make sure a function will return expected values for one set of inputs).

       • Note that even 100% coverage is not a guarantee of working code!  Good tests will realize that you need
         to also give multiple inputs that test the code paths of called functions  that  are  outside  of  your
         code.  Example:

            def to_unicode(msg, encoding='utf8', errors='replace'):
                return unicode(msg, encoding, errors)

            # Smoketest only.  This will give 100% coverage for your code (it
            # tests all of the code inside of to_unicode) but it leaves a lot of
            # room for errors as it doesn't test all combinations of arguments
            # that are then passed to the unicode() function.

            tools.ok_(to_unicode('abc') == u'abc')

             # Better -- tests now cover non-ascii characters and that error conditions
             # occur properly.  There's a lot of other permutations that can be
             # added along these same lines.
             tools.ok_(to_unicode(u'café'.encode('utf8'), 'utf8', 'replace') == u'café')
             tools.assert_raises(UnicodeError, to_unicode,
                     u'cafè ñunru'.encode('latin1'), 'utf8', 'strict')

       • We're  using  nose  for  unittesting.  Rather than depend on unittest2 functionality, use the functions
         that nose provides.

       • Remember to maintain python-2.4 compatibility even in unittests.

   Docstrings and documentation
       We use sphinx to build our documentation.  We use the sphinx autodoc extension to pull docstrings out  of
       the  modules for API documentation.  This means that docstrings for subpackages and modules should follow
       a certain pattern.  The general structure is:

       • Introductory material about a module in the module's top level docstring.

         • Introductory material should begin with a level two title: an overbar and underbar of '-'.

       • docstrings for every function.

         • The first line is a short summary of what the function does

         • This is followed by a blank line

          • The next lines are a field list  <http://sphinx.pocoo.org/markup/desc.html#info-field-lists>  giving
            information about the function's signature.  We use the keywords: arg, kwarg, raises,  returns,  and
            sometimes rtype.  Use these to describe all arguments,  keyword  arguments,  exceptions  raised,  and
            return values.

           • Parameters that are kwarg should specify what their default behaviour is.

   Kitchen versioning
       Currently  the  kitchen  library  is in early stages of development.  While we're in this state, the main
       kitchen library uses the following pattern for version information:

        • Versions look like this:

              __version_info__ = ((0, 1, 2),)
              __version__ = '0.1.2'

       • The Major version number remains at 0 until we decide to make the first 1.0  release  of  kitchen.   At
         that  point,  we're  declaring  that  we  have  some  confidence  that we won't need to break backwards
         compatibility for a while.

       • The Minor version increments for any backwards incompatible API changes.   When  this  is  updated,  we
         reset micro to zero.

       • The  Micro  version  increments for any other changes (backwards compatible API changes, pure bugfixes,
         etc).

        Note:
           Versioning is only updated for  releases  that  generate  sdists  and  new  uploads  to  the  download
           directory.  Usually we update the version information for the  library  just  before  release.   By
           contrast, the subpackage API version information (see Versioning) is updated as soon as an API change
           is made.  When in doubt, look at the version information in the last release.

   I18N
       All strings that are used as feedback for users need to be translated.  kitchen sets up several functions
       for this.  _() is used for marking things that are shown to users via print, GUIs,  or  other  "standard"
       methods.   Strings for exceptions are marked with b_().  This function returns a byte str which is needed
       for use with exceptions:

          from kitchen import _, b_

          def print_message(msg, username):
              print _('%(user)s, your message of the day is:  %(message)s') % {
                      'message': msg, 'user': username}

               raise Exception(b_('Test message'))

       This serves several purposes:

       • It marks the strings to be extracted by an xgettext-like program.

       • _() is a function that will substitute available translations at runtime.

       Note:
          By using the %()s with dict style of string formatting, we make this string  friendly  to  translators
          that may need to reorder the variables when they're translating the string.

        paver <http://www.blueskyonmars.com/projects/paver/> and babel <http://babel.edgewall.org/> are used  to
        extract the strings.

   API updates
       Kitchen  strives  to  have a long deprecation cycle so that people have time to switch away from any APIs
       that we decide to discard.  Discarded APIs should raise a DeprecationWarning and  clearly  state  in  the
       warning  message  and  the  docstring  how  to  convert old code to use the new interface.  An example of
       deprecating a function:

          import warnings

          from kitchen import _
          from  kitchen.text.converters import to_bytes, to_unicode
          from kitchen.text.new_module import new_function

          def old_function(param):
              '''**Deprecated**

              This function is deprecated.  Use
              :func:`kitchen.text.new_module.new_function` instead. If you want
               unicode strings as output, switch to::

                  >>> from kitchen.text.new_module import new_function
                  >>> output = new_function(param)

              If you want byte strings, use::

                  >>> from kitchen.text.new_module import new_function
                  >>> from kitchen.text.converters import to_bytes
                  >>> output = to_bytes(new_function(param))
              '''
              warnings.warn(_('kitchen.text.old_function is deprecated.  Use'
                  ' kitchen.text.new_module.new_function instead'),
                  DeprecationWarning, stacklevel=2)

              as_unicode = isinstance(param, unicode)
              message = new_function(to_unicode(param))
              if not as_unicode:
                  message = to_bytes(message)
              return message

       If a particular API change is very intrusive, it may be better to create a new version of the  subpackage
       and ship both the old version and the new version.

   NEWS file
       Update  the  NEWS file when you make a change that will be visible to the users.  This is not a ChangeLog
       file so we don't need to list absolutely everything but it should give the  user  an  idea  of  how  this
        version differs from prior versions.  API changes should be listed here explicitly.  Bugfixes can be more
       general:

          -----
          0.2.0
          -----
          * Relicense to LGPLv2+
          * Add kitchen.text.format module with the following functions:
            textual_width, textual_width_chop.
          * Rename the kitchen.text.utils module to kitchen.text.misc.  use of the
            old names is deprecated but still available.
          * bugfixes applied to kitchen.pycompat24.defaultdict that fixes some
            tracebacks

   Kitchen subpackages
       Kitchen itself is a namespace.  The kitchen sdist (tarball) provides certain useful subpackages.

       See also:

          Kitchen addon packages
                 For  information  about  subpackages not distributed in the kitchen sdist that install into the
                 kitchen namespace.

   Versioning
       Each subpackage should have its own version  information  which  is  independent  of  the  other  kitchen
       subpackages  and the main kitchen library version. This is used so that code that depends on kitchen APIs
       can check the version information.  The standard way to do this is to put  something  like  this  in  the
       subpackage's __init__.py:

          from kitchen.versioning import version_tuple_to_string

          __version_info__ = ((1, 0, 0),)
          __version__ = version_tuple_to_string(__version_info__)

       __version_info__  is  documented  in  kitchen.versioning <#module-kitchen.versioning>.  The values of the
       first tuple should describe API changes to the module.  There are at least three numbers present  in  the
       tuple:  (Major,  minor,  micro).   The  major  version  number is for backwards incompatible changes (For
       instance, removing a function, or adding a new mandatory argument to a function).  Whenever one of  these
       occurs,  you  should  increment the major number and reset minor and micro to zero.  The second number is
       the minor version.  Anytime new but backwards compatible changes are introduced  this  number  should  be
       incremented  and  the micro version number reset to zero.  The micro version should be incremented when a
       change is made that does not change the API at all.  This is a common case for bugfixes, for instance.

        Version information beyond the first three parts of the first tuple may  be  useful  for  versioning  but
        semantically has a meaning similar to the micro version.
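
        As an illustration of the scheme (a hypothetical version history, not an actual kitchen subpackage):

           __version_info__ = ((1, 0, 0),)   # starting point
           __version_info__ = ((1, 0, 1),)   # after a pure bugfix
           __version_info__ = ((1, 1, 0),)   # after a backwards compatible addition
           __version_info__ = ((2, 0, 0),)   # after removing a function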

       Note:
          We  update  the  __version_info__  tuple  when  the  API  is updated.  This way there's less chance of
          forgetting to update the API version when a new release is made.  However, we try  to  only  increment
          the  version  numbers a single step for any release.  So if kitchen-0.1.0 has kitchen.text.__version__
          == '1.0.1', kitchen-0.1.1 should have kitchen.text.__version__ == '1.0.2' or '1.1.0' or '2.0.0'.

   Criteria for subpackages in kitchen
        Subpackages within kitchen should meet these criteria:

       • Generally useful or needed for other pieces of kitchen.

       • No mandatory requirements outside of the python standard library <http://docs.python.org/library>.

         • Optional requirements from outside the python standard library  <http://docs.python.org/library>  are
           allowed.  Things with mandatory requirements are better placed in kitchen addon packages

        • Somewhat API stable -- this is not a hard requirement.  We can change the kitchen API.  However, it  is
          better not to as people may come to depend on it.

         See also:
            API Updates

   Kitchen addon packages
       Addon packages are very similar to subpackages integrated into the  kitchen  sdist.   This  section  just
       lists some of the differences to watch out for.

   setup.py
       Your setup.py should contain entries like this:

          # It's suggested to use a dotted name like this so the package is easily
          # findable on pypi:
          setup(name='kitchen.config',
              # Include kitchen in the keywords, again, for searching on pypi
              keywords=['kitchen', 'configuration'],
              # This package lives in the directory kitchen/config
              packages=['kitchen.config'],
              # [...]
          )

   Package directory layout
       Create a kitchen directory in the toplevel.  Place the addon subpackage in there.  For example:

          ./                     <== toplevel with README, setup.py, NEWS, etc
          kitchen/
          kitchen/__init__.py
          kitchen/config/        <== subpackage directory
          kitchen/config/__init__.py

   Fake kitchen module
        The __init__.py in the kitchen directory is special.  It won't be installed.  It just needs to  pull  in
        the kitchen package from the system so that you are able to test your module.  You should be able to use
        this boilerplate:

           # Fake module.  This is not installed.  It's just here to import the
           # real kitchen modules for testing this module.
          import pkgutil

          # Extend the __path__ with everything in the real kitchen module
          __path__ = pkgutil.extend_path(__path__, __name__)

       Note:
           kitchen needs to be findable by python for this to work.  Installing it in the site-packages directory
           or adding it to the PYTHONPATH will work.

       Your unittests should now be able to find both your submodule and the main kitchen module.

   Versioning
        It is recommended that addon packages follow  the  versioning  scheme  described  in  Versioning.   The
        __version_info__ tuple and __version__ string can be changed independently of the version exposed  by
        setup.py so that you have both an API version (__version_info__) and a release version that's easier for
        people to parse.  However, you aren't required to do this and you could follow a different  methodology
        if you want (for instance, Kitchen versioning).

   Glossary
       "Everything but the kitchen sink"
              An English idiom meaning to include nearly everything that you can think of.

       API version
              Version that is meant for computer consumption.   This  version  is  parsable  and  comparable  by
              computers.   It  contains  information  about a library's API so that computer software can decide
              whether it works with the software.

        ASCII  A character encoding that maps numbers to characters essential to American English.  It  maps  128
               characters using 7 bits.

              See also:
                 <http://en.wikipedia.org/wiki/ASCII>

       ASCII compatible
               An encoding in which the particular byte that maps to a character in the ASCII character  set  is
               only used to map to that character.  This excludes EBCDIC based encodings and  many  multi-byte
               fixed and variable width encodings since they reuse the bytes that make up the ASCII encoding  for
               other purposes.  UTF-8 is notable as a variable width encoding that is ASCII compatible.

              See also:

                 <http://en.wikipedia.org/wiki/Variable-width_encoding>
                        For  another  explanation  of  various ways bytes are mapped to characters in a possibly
                        incompatible manner.

        code points
               See code point.

       code point
              A number that maps to a particular abstract character.  Code points make it  so  that  we  have  a
              number  pointing to a character without worrying about implementation details of how those numbers
              are stored for the computer to read.  Encodings define how  the  code  points  map  to  particular
              sequences of bytes on disk  and in memory.

        control characters
               See control character.

       control character
               The set of characters in unicode that are used, not to display glyphs on the screen, but  to  tell
               the display or the program to do something.

              See also:
                 <http://en.wikipedia.org/wiki/Control_character>

       grapheme
               Characters or pieces of characters that you might write on a page to  make  words,  sentences,  or
               other pieces of text.

              See also:
                 <http://en.wikipedia.org/wiki/Grapheme>

        I18N   I18N is an abbreviation for  internationalization.   It's  often  used  to  signify  the  need  to
               translate words, number and date formats, and other pieces of data in a computer program  so  that
               it will work well for people who speak a language other than your own.

        message catalogs
               See message catalog.

       message catalog
              Message  catalogs  contain  translations  for  user-visible strings that are present in your code.
              Normally, you need to mark the strings to be translated by wrapping them in one of several gettext
              functions.  The function serves two purposes:

              1. It allows automated tools to find which strings are supposed to be extracted for translation.

              2. The functions perform the translation when the program is running.

              See also:
                 babel's documentation <http://babel.edgewall.org/wiki/Documentation/messages.html>
                     for one method of extracting message catalogs from source code.

       Murphy's Law
              "Anything that can go wrong, will go wrong."

              See also:
                 <http://en.wikipedia.org/wiki/Murphy%27s_Law>

       release version
              Version that is meant for human consumption.  This version is easy for  a  human  to  look  at  to
              decide how a particular version relates to other versions of the software.

       textual width
              The  amount of horizontal space a character takes up on a monospaced screen.  The units are number
              of character cells or columns that it takes the place of.

       UTF-8  A character encoding that maps all unicode code points to a sequence of bytes.  It  is  compatible
              with  ASCII.   It uses a variable number of bytes to encode all of unicode.  ASCII characters take
              one byte.  Characters from other parts of unicode take two to four bytes.  It is widespread as  an
              encoding on the internet and in Linux.

INDICES AND TABLES

        • Index

        • Module Index

        • Search Page

PROJECT PAGES

       More information about the project can be found on the project webpage <https://fedorahosted.org/kitchen>

       The  latest  published version of this documentation can be found on the documentation page <https://pypi
       .python.org/pypi/kitchen/docs>

Author

       Toshio Kuratomi

Copyright

       2012 Red Hat, Inc. and others

0.2                                               Sep 06, 2025                                        KITCHEN(1)