Provided by: python-kitchen-doc_1.2.6-4_all

NAME

       kitchen - kitchen 1.2.6

       Author Toshio Kuratomi

       Date   19 March 2011

       Version
              1.0.x

       We've  all  done  it.   In the process of writing a brand new application we've discovered
       that we need a little bit of code that we've invented before.  Perhaps it's  something  to
       handle  unicode  text.   Perhaps  it's  something  to make a bit of python-2.5 code run on
       python-2.4.  Whatever it is, it ends up being a tiny bit of code that seems too  small  to
       worry  about pushing into its own module so it sits there, a part of your current project,
       waiting to be cut and pasted into your next project.  And the next.  And  the  next.   And
       since that little bit of code proved so useful to you, it's highly likely that it
       proved useful to someone else as well.  Useful enough that they've written it and copied
       and pasted it over and over into each of their new projects.

       Well,  no  longer!   Kitchen  aims  to pull these small snippets of code into a few python
       modules which you can import and use within your project.  No more copy  and  paste!   Now
       you  can let someone else maintain and release these small snippets so that you can get on
       with your life.

       This package forms the core of Kitchen.  It contains some useful modules for  using  newer
       python  standard  library  modules  on  older  python versions, text manipulation, PEP 386
       versioning, and initializing gettext.  With this package we're trying  to  provide  a  few
       useful  features  that  don't  have  too  many dependencies outside of the python standard
       library.  We'll be releasing other modules that drop into the  kitchen  namespace  to  add
       other features (possibly with larger deps) as time goes on.

REQUIREMENTS

       We've  tried  to  keep  the core kitchen module's requirements lightweight.  At the moment
       kitchen only requires

       python 2.4 or later

       WARNING:
          Kitchen-1.1.0 was the last release to support python-2.3.x.

   Soft Requirements
       If found, these libraries will be used to make the implementation of some part of  kitchen
       better  in  some  way.  If they are not present, the API that they enable will still exist
       but may function in a different manner.

       chardet
              Used in guess_encoding() and guess_encoding_to_xml() to help guess encoding of byte
              strings being converted.  If not present, unknown encodings will be converted as if
              they were latin1.

OTHER RECOMMENDED LIBRARIES

       These libraries implement commonly used  functionality  that  everyone  seems  to  invent.
       Rather  than  reinvent  their  wheel,  I simply list the things that they do well for now.
       Perhaps if people can't find them normally, I'll add them as requirements in  setup.py  or
       link them into kitchen's namespace.  For now, I just mention them here:

       bunch  Bunch is a dictionary whose items you can access with attribute lookup as well as
              bracket notation.  Setting it apart from most homebrewed implementations is the
              bunchify() function, which will descend nested structures of lists and dicts,
              transforming the dicts into Bunch objects.

       hashlib
              Python 2.5 and forward have a hashlib library that provides secure  hash  functions
              to  python.   If  you're  developing  for  python2.4  though,  you  can install the
              standalone hashlib library and have access to the same functions.

       iterutils
              The python documentation for itertools has some examples  of  other  nice  iterable
              functions  that can be built from the itertools functions.  This third-party module
              creates those recipes as a module.

       ordereddict
              Python 2.7 and forward have an OrderedDict that provides a dict whose items are
              ordered (and indexable) as well as named.  For older python versions, the
              standalone ordereddict library provides the same class.

       unittest2
              Python  2.7  has  an updated unittest library with new functions not present in the
              python standard library for Python 2.6 or less.  If  you  want  to  use  those  new
              functions  but  need  your testing framework to be compatible with older Python the
              unittest2 library provides the update as an external module.

       nose   If you want to use a  test  discovery  tool  instead  of  the  unittest  framework,
              nosetests provides a simple to use way to do that.

LICENSE

       This python module is distributed under the terms of the GNU Lesser General Public License
       Version 2 or later.

       NOTE:
          Some parts of this module are licensed under terms less restrictive than  the  LGPLv2+.
          If  you separate these files from the work as a whole you are allowed to use them under
          the less restrictive licenses.  The following is a list of the files that are known:

          Python 2 license
                 _subprocess.py,   test_subprocess.py,    defaultdict.py,    test_defaultdict.py,
                 _base64.py, and test_base64.py

CONTENTS

   Using kitchen to write good code
       Kitchen's  functions  won't automatically make you a better programmer.  You have to learn
       when and how to use them as well.  This section of the documentation is intended  to  show
       you  some  of  the  ways  that you can apply kitchen's functions to problems that may have
       arisen in your life.  The goal of this section  is  to  give  you  enough  information  to
       understand  what  the kitchen API can do for you and where in the Kitchen API docs to look
       for something that can help you with your next issue.  Along the way, you  might  pick  up
       the knack for identifying issues with your code before you publish it.  And that will make
       you a better coder.

   Overcoming frustration: Correctly using unicode in python2
       In python-2.x, there are two types that deal with text.

       1. str is for strings of bytes.  These are very similar  in  nature  to  how  strings  are
          handled in C.

       2. unicode is for strings of unicode code points.

       NOTE:
          Just what the dickens is "Unicode"?

          One  mistake  that  people encountering this issue for the first time make is confusing
          the unicode type and the encodings of unicode stored in the str type.  In  python,  the
          unicode type stores an abstract sequence of code points.  Each code point roughly
          corresponds to a character, although some characters seen on the page are built from
          several code points.  By contrast, byte str stores a sequence of bytes which can then
          be mapped to
          a  sequence  of code points.  Each unicode encoding (UTF-8, UTF-7, UTF-16, UTF-32, etc)
          maps different sequences of bytes to the unicode code points.

          What  does  that  mean  to  you  as  a  programmer?   When  you're  dealing  with  text
          manipulations (finding the number of characters in a string or cutting a string on word
          boundaries) you should be dealing with unicode strings as they abstract characters in a
          manner  that's  appropriate for thinking of them as a sequence of letters that you will
          see on a page.  When dealing with I/O, reading to and from  the  disk,  printing  to  a
          terminal,  sending  something over a network link, etc, you should be dealing with byte
          str as those devices are going to need to deal with concrete  implementations  of  what
          bytes represent your abstract characters.
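
       To make the distinction concrete, here is a small sketch (not part of kitchen, standard
       library only) that encodes the same unicode string with three different encodings.  The
       variable names are our own.  It is written so that it runs the same way on python2 and
       python3:

```python
# -*- coding: utf-8 -*-
# One abstract string of four code points: c, a, f, e-with-acute-accent
text = u'caf\xe9'

# Three concrete byte representations of those same four code points
utf8_bytes = text.encode('utf-8')        # the accented e becomes two bytes: 0xc3 0xa9
latin1_bytes = text.encode('latin-1')    # the accented e becomes one byte: 0xe9
utf16_bytes = text.encode('utf-16-le')   # every code point here takes two bytes

# The number of characters stays the same; the number of bytes depends on the encoding
char_count = len(text)                   # 4
byte_counts = (len(utf8_bytes), len(latin1_bytes), len(utf16_bytes))   # (5, 4, 8)

# Decoding with the matching encoding gets the original code points back
round_trip = utf8_bytes.decode('utf-8')
```

       Counting characters or slicing on the unicode string gives the answer a reader would
       expect.  Doing the same on any of the byte strings gives an answer that depends on the
       encoding.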

       In the python2 world many APIs use these two classes interchangeably but there are several
       important APIs where only one or the other will do the right thing.   When  you  give  the
       wrong type of string to an API that wants the other type, you may end up with an exception
       being raised (UnicodeDecodeError or UnicodeEncodeError).  However, these exceptions aren't
       always raised because python implicitly converts between types... sometimes.

   Frustration #1: Inconsistent Errors
       Although  converting  when  possible  seems  like the right thing to do, it's actually the
       first source of frustration.  A programmer can test out their program with a string  like:
       The  quick brown fox jumped over the lazy dog and not encounter any issues.  But when they
       release their software into the wild, someone enters the string: I sat down for coffee  at
       the  café  and  suddenly an exception is thrown.  The reason?  The mechanism that converts
       between the two types is only able to deal with ASCII characters.  Once you throw
       non-ASCII characters into your strings, you have to start dealing with the conversion
       manually.
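
       The following sketch makes that hidden step visible.  It assumes python2's default
       encoding of ASCII, and it performs the decode explicitly so that the same code also runs
       on python3, where the implicit conversion no longer exists.  The helper name
       implicit_convert() is ours, purely for illustration:

```python
def implicit_convert(byte_string):
    """Mimic what python2 does behind the scenes when it needs to turn a
    byte str into unicode: a strict decode using the ASCII codec."""
    return byte_string.decode('ascii')

# Pure ASCII data converts without complaint, so testing with it finds no problems
ascii_ok = implicit_convert(b'The quick brown fox jumped over the lazy dog')

# The UTF-8 bytes for "cafe" with an accented e fail on the first non-ASCII byte
try:
    implicit_convert(b'I sat down for coffee at the caf\xc3\xa9')
    failed = False
except UnicodeDecodeError:
    failed = True
```

       The first call succeeds and the second raises UnicodeDecodeError, which is exactly the
       "works in testing, breaks in the wild" behaviour described above.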

       So, if I manually convert everything to either byte str or  unicode  strings,  will  I  be
       okay?  The answer is.... sometimes.

   Frustration #2: Inconsistent APIs
       The problem you run into when converting everything to byte str or unicode strings is that
       you'll be using someone else's API quite often (this  includes  the  APIs  in  the  python
       standard  library)  and find that the API will only accept byte str or only accept unicode
       strings.  Or worse, that the code will accept either when you're dealing with strings that
       consist  solely of ASCII but throw an error when you give it a string that's got non-ASCII
       characters.  When you encounter these APIs you first need to identify which type will work
       better  and  then you have to convert your values to the correct type for that code.  Thus
       the programmer that wants to proactively fix all unicode errors in their code needs to  do
       two things:

       1. You  must keep track of what type your sequences of text are.  Does my_sentence contain
          unicode or str?  If you don't know that then you're going to be in for a world of hurt.

       2. Anytime you call a function you need to evaluate whether  that  function  will  do  the
          right  thing  with  str or unicode values.  Sending the wrong value here will lead to a
          UnicodeError being thrown when the string contains non-ASCII characters.

       NOTE:
          There is one mitigating factor here.  The python community has  been  standardizing  on
          using unicode in all its APIs.  Although there are some APIs that you need to send byte
          str to in order to be safe, (including things as ubiquitous as print() as we'll see  in
          the  next  section),  it's  getting  easier and easier to use unicode strings with most
          APIs.

   Frustration #3: Inconsistent treatment of output
       Alright, since the python community is moving to  using  unicode  strings  everywhere,  we
       might  as  well  convert  everything  to  unicode  strings and use that by default, right?
       Sounds good most of the time but there's at least one huge caveat to be aware of.  Anytime
       you  output  text  to  the terminal or to a file, the text has to be converted into a byte
       str.  Python will try to implicitly convert from unicode to byte str... but it will  throw
       an exception if the string contains non-ASCII characters:

          >>> string = unicode(raw_input(), 'utf8')
          café
          >>> log = open('/var/tmp/debug.log', 'w')
          >>> log.write(string)
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

       Okay, this is simple enough to solve:  Just convert to a byte str and we're all set:

          >>> string = unicode(raw_input(), 'utf8')
          café
          >>> string_for_output = string.encode('utf8', 'replace')
          >>> log = open('/var/tmp/debug.log', 'w')
          >>> log.write(string_for_output)
          >>>

       So  that  was simple, right?  Well... there's one gotcha that makes things a bit harder to
       debug sometimes.  When you attempt to write  non-ASCII  unicode  strings  to  a  file-like
       object you get a traceback every time.  But what happens when you use print()?  The
       terminal is a file-like object so it should raise an exception right?  The answer to  that
       is....  sometimes:

          $ python
          >>> print u'café'
          café

       No exception.  Okay, we're fine then?

       We are until someone does one of the following:

       • Runs the script in a different locale:

            $ LC_ALL=C python
            >>> # Note: if you're using a good terminal program when running in the C locale
            >>> # The terminal program will prevent you from entering non-ASCII characters
            >>> # python will still recognize them if you use the codepoint instead:
            >>> print u'caf\xe9'
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

       • Redirects output to a file:

            $ cat test.py
            #!/usr/bin/python -tt
            # -*- coding: utf-8 -*-
            print u'café'
            $ ./test.py  >t
            Traceback (most recent call last):
              File "./test.py", line 3, in <module>
                print u'café'
            UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

       Okay,  the  locale thing is a pain but understandable: the C locale doesn't understand any
       characters outside of ASCII so naturally attempting to display those won't work.  Now  why
       does  redirecting  to  a  file cause problems?  It's because print() in python2 is treated
       specially.  Whereas the other file-like objects in python always convert to  ASCII  unless
       you  set  them up differently, using print() to output to the terminal will use the user's
       locale to convert before sending  the  output  to  the  terminal.   When  print()  is  not
       outputting  to  the  terminal  (being redirected to a file, for instance), print() decides
       that it doesn't know what locale to use for that file and so it tries to convert to  ASCII
       instead.
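
       One way to see this for yourself is to look at the encoding attribute of the stream.  In
       python2, sys.stdout.encoding is set when stdout is a terminal and is None when stdout is
       redirected.  The sketch below is a guess at a reasonable policy, not something kitchen
       provides: pick_output_encoding() and FakeRedirectedStream are hypothetical names of our
       own:

```python
import locale

def pick_output_encoding(stream, fallback='utf-8'):
    """Return the encoding to use when writing text to stream.

    Prefer the encoding the stream advertises.  If it has none (as with a
    redirected stdout in python2), fall back to the user's locale and then
    to the given fallback."""
    encoding = getattr(stream, 'encoding', None)
    if not encoding:
        encoding = locale.getpreferredencoding() or fallback
    return encoding

class FakeRedirectedStream(object):
    """Stand-in for a stdout that has been redirected to a file."""
    encoding = None

class FakeTerminalStream(object):
    """Stand-in for a stdout attached to a latin-1 terminal."""
    encoding = 'latin-1'

redirected_choice = pick_output_encoding(FakeRedirectedStream())
terminal_choice = pick_output_encoding(FakeTerminalStream())
```

       With a helper like this you make the decision once, explicitly, instead of leaving it to
       whatever print() happens to decide.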

       So  what  does  this  mean  for  you,  as  a  programmer?   Unless  you have the luxury of
       controlling how your users use your code, you should always, always, always convert  to  a
       byte str before outputting strings to the terminal or to a file.  Python even provides you
       with a facility to do just this.  If you know that every unicode  string  you  send  to  a
       particular  file-like  object  (for  instance, stdout) should be converted to a particular
       encoding you can use a codecs.StreamWriter object to convert from a unicode string into  a
       byte  str.   In  particular, codecs.getwriter() will return a StreamWriter class that will
       help you to wrap a file-like object for output.  Using our print() example:

          $ cat test.py
          #!/usr/bin/python -tt
          # -*- coding: utf-8 -*-
          import codecs
          import sys

          UTF8Writer = codecs.getwriter('utf8')
          sys.stdout = UTF8Writer(sys.stdout)
          print u'café'
          $ ./test.py  >t
          $ cat t
          café

   Frustrations #4 and #5 -- The other shoes
       In English, there's a saying "waiting for the other shoe to drop".  It means that when one
       event  (usually  bad)  happens,  you  come to expect another event (usually worse) to come
       after.  In this case we have two other shoes.

   Frustration #4: Now it doesn't take byte strings?!
       If you wrap sys.stdout using codecs.getwriter() and think you are now safe  to  print  any
       variable  without  checking  its type I am afraid I must inform you that you're not paying
       enough attention to Murphy's Law.  The StreamWriter that codecs.getwriter() provides  will
       take  unicode strings and transform them into byte str before they get to sys.stdout.  The
       problem is if you give it something that's already a byte str it tries to  transform  that
       as  well.   To  do  that  it  tries to turn the byte str you give it into unicode and then
       transform that back into a byte str...  and since it uses the ASCII codec to perform those
       conversions, chances are that it'll blow up when making them:

          >>> import codecs
          >>> import sys
          >>> UTF8Writer = codecs.getwriter('utf8')
          >>> sys.stdout = UTF8Writer(sys.stdout)
          >>> print 'café'
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
            File "/usr/lib64/python2.6/codecs.py", line 351, in write
              data, consumed = self.encode(object, self.errors)
          UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

       To  work around this, kitchen provides an alternate version of codecs.getwriter() that can
       deal with both byte str and unicode strings.  Use  kitchen.text.converters.getwriter()  in
       place of the codecs version like this:

          >>> import sys
          >>> from kitchen.text.converters import getwriter
          >>> UTF8Writer = getwriter('utf8')
          >>> sys.stdout = UTF8Writer(sys.stdout)
          >>> print u'café'
          café
          >>> print 'café'
          café

   Frustration #5: Inconsistent APIs Part deux
       Sometimes  you  do  everything right in your code but other people's code fails you.  With
       unicode issues this happens more often than we want.  A glaring example of  this  is  when
       you get values back from a function that aren't consistently unicode string or byte str.

       An example from the python standard library is gettext.  The gettext functions are used to
       help translate messages that you display to users in the users' native  languages.   Since
       most  languages  contain  letters outside of the ASCII range, the values that are returned
       contain unicode characters.  gettext provides  you  with  ugettext()  and  ungettext()  to
       return  these  translations  as unicode strings and gettext(), ngettext(), lgettext(), and
       lngettext() to return them as  encoded  byte  str.   Unfortunately,  even  though  they're
       documented  to  return only one type of string or the other, the implementation has corner
       cases where the wrong type can be returned.

       This means that even if you separate your unicode string and byte str correctly before you
       pass your strings to a gettext function, afterwards, you might have to check that you have
       the right sort of string type again.

       NOTE:
          kitchen.i18n provides alternate gettext translation objects that return only  byte  str
          or only unicode string.

   A few solutions
       Now  that  we've identified the issues, can we define a comprehensive strategy for dealing
       with them?

   Convert text at the border
       If you get some piece of text from a library, read from  a  file,  etc,  turn  it  into  a
       unicode  string  immediately.   Since python is moving in the direction of unicode strings
       everywhere it's going to be easier to work with unicode strings within your code.

       If your code is heavily involved with using things that are bytes, you can do the opposite
       and convert all text into byte str at the border and only convert to unicode when you need
       it for passing to another library or performing string operations on it.

       In either case, the important thing is to pick a default type for strings and  stick  with
       it  throughout  your  code.  When you mix the types it becomes much easier to operate on a
       string with a function that can only use the other type by mistake.
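
       A minimal sketch of converting at the border, using only the standard library, might look
       like this.  The to_text() helper is our own illustration; kitchen's to_unicode() fills
       the same role and handles more cases:

```python
def to_text(value, encoding='utf-8', errors='replace'):
    """Return value as a unicode string, decoding it if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value

# Pretend this line just came from a file or a socket: it arrives as bytes
raw_line = b'Shells available on caf\xc3\xa9.lan\n'

# Convert once, right at the border...
description = to_text(raw_line).strip()

# ...and from here on the rest of the program only ever sees unicode
shouted = description.upper()
```

       Everything after the call to to_text() can assume a single string type, which is the whole
       point of picking a default and sticking with it.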

       NOTE:
          In python3, the abstract unicode type becomes much more prominent.  The type named  str
          is the equivalent of python2's unicode and python3's bytes type replaces python2's str.
          Most APIs deal in the unicode type of string with just some pieces that are  low  level
          dealing with bytes.  The implicit conversions between bytes and unicode are removed and
          whenever you want to make the conversion you need to do so explicitly.

   When the data needs to be treated as bytes (or unicode) use a naming convention
       Sometimes you're converting nearly all of your data to unicode strings but you have one or
       two  values  where you have to keep byte str around.  This is often the case when you need
       to use the value verbatim with some external resource.  For  instance,  filenames  or  key
       values  in  a  database.   When  you  do this, use a naming convention for the data you're
       working with so you (and others reading your code later) don't get confused  about  what's
       being stored in the value.

       If  you  need  both  a textual string to present to the user and a byte value for an exact
       match, consider keeping both versions around.  You can either use two variables  for  this
       or a dict whose key is the byte value.

       NOTE:
          You  can use the naming convention used in kitchen as a guide for implementing your own
          naming convention.  It prefixes byte str variables of unknown encoding with b_ and byte
          str of known encoding with the encoding name like: utf8_.  If the default was to handle
          str and only keep a few unicode values, those variables would be prefixed with u_.

   When outputting data, convert back into bytes
       When you go to send your data back outside of your program (to the  filesystem,  over  the
       network, displaying to the user, etc) turn the data back into a byte str.  How you do this
       will depend on the expected output format of the data.  For displaying to  the  user,  you
       can  use  the  user's  default encoding using locale.getpreferredencoding().  For entering
       into a file, your best bet is to pick a single encoding and stick with it.

       WARNING:
          When using the encoding that the user has set (for instance, using
          locale.getpreferredencoding()), remember that they may have their encoding set to
          something that can't display every single unicode character.  That means when you
          convert from unicode to a byte str you need to decide what should happen if a
          character cannot be represented in the user's encoding.  For purposes of displaying
          messages to the user, it's usually okay to use the replace encoding error handler to
          replace the unrepresentable characters with a question mark or other symbol meaning the
          character couldn't be displayed.

       You  can  use kitchen.text.converters.getwriter() to do this automatically for sys.stdout.
       When creating exception messages be sure to convert to bytes manually.
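
       The effect of the replace error handler mentioned in the warning above can be seen with a
       few lines of standard library code.  The encode_for_user() helper is a hypothetical name
       of ours, sketched under the assumption that the user's locale names a codec python knows:

```python
import locale

def encode_for_user(text, encoding=None):
    """Encode text for display, replacing characters the encoding cannot represent."""
    if encoding is None:
        encoding = locale.getpreferredencoding()
    return text.encode(encoding, 'replace')

# An accented latin character followed by a Japanese hiragana character
message = u'caf\xe9 \u3068'

ascii_bytes = encode_for_user(message, 'ascii')      # neither character fits in ASCII
latin1_bytes = encode_for_user(message, 'latin-1')   # only the hiragana is replaced

# Without an error handler the same conversion raises instead of degrading
try:
    message.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

       The user sees a question mark in place of each character their encoding cannot show,
       which is usually preferable to a traceback.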

   When writing unittests, include non-ASCII values and both unicode and str type
       Unless you know that a specific portion of your code will only deal with ASCII, be sure to
       include  non-ASCII  values  in  your  unittests.   Including a few characters from several
       different scripts is highly advised as well because  some  code  may  have  special  cased
       accented roman characters but not know how to handle characters used in Asian alphabets.

       Similarly, unless you know that a portion of your code will only be given unicode
       strings or only byte str be sure to try variables of both types in your  unittests.   When
       doing  this,  make  sure  that  the  variables  are  also  non-ASCII  as python's implicit
       conversion will mask problems with pure ASCII data.  In many  cases,  it  makes  sense  to
       check what happens if byte str and unicode strings that won't decode in the present locale
       are given.
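
       A test case along these lines might look like the following sketch.  normalize_title() is
       a hypothetical function under test that we made up for the example; substitute your own
       code:

```python
import unittest

text_type = type(u'')   # unicode on python2, str on python3

def normalize_title(value, encoding='utf-8'):
    """Hypothetical function under test: always returns stripped unicode text."""
    if isinstance(value, bytes):
        value = value.decode(encoding, 'replace')
    return value.strip()

class NormalizeTitleTests(unittest.TestCase):
    # Samples from several scripts: accented latin, Japanese, Cyrillic
    u_samples = (u'caf\xe9', u'\u3068\u3057\u304a', u'\u0416\u0443\u043a')

    def test_unicode_input(self):
        for sample in self.u_samples:
            result = normalize_title(u'  ' + sample + u'\n')
            self.assertEqual(result, sample)
            self.assertTrue(isinstance(result, text_type))

    def test_bytes_input(self):
        for sample in self.u_samples:
            result = normalize_title((u' ' + sample).encode('utf-8'))
            self.assertEqual(result, sample)
            self.assertTrue(isinstance(result, text_type))

    def test_undecodable_bytes(self):
        # 0xff can never appear in valid UTF-8, so it must be replaced, not raise
        self.assertEqual(normalize_title(b'file\xff'), u'file\ufffd')
```

       Every test uses non-ASCII data, so python's implicit conversion has no chance to hide a
       mistake, and the return type is checked as well as the value.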

   Be vigilant about spotting poor APIs
       Make sure that the libraries you use return only unicode strings or byte  str.   Unittests
       can  help  you  spot issues here by running many variations of data through your functions
       and checking that you're still getting the types of string that you expect.

   Example: Putting this all together with kitchen
       The kitchen library provides a wide array of functions to help you deal with byte str  and
       unicode  strings in your program.  Here's a short example that uses many kitchen functions
       to do its work:

          #!/usr/bin/python -tt
          # -*- coding: utf-8 -*-
          import locale
          import os
          import sys
          import unicodedata

          from kitchen.text.converters import getwriter, to_bytes, to_unicode
          from kitchen.i18n import get_translation_object

          if __name__ == '__main__':
              # Setup gettext driven translations but use the kitchen functions so
              # we don't have the mismatched bytes-unicode issues.
              translations = get_translation_object('example')
              # We use _() for marking strings that we operate on as unicode
              # This is pretty much everything
              _ = translations.ugettext
              # And b_() for marking strings that we operate on as bytes.
              # This is limited to exceptions
              b_ = translations.lgettext

              # Setup stdout
              encoding = locale.getpreferredencoding()
              Writer = getwriter(encoding)
              sys.stdout = Writer(sys.stdout)

              # Load data.  Format is filename\0description
              # description should be utf-8 but filename can be any legal filename
              # on the filesystem
              # Sample datafile.txt:
              #   /etc/shells\x00Shells available on caf\xc3\xa9.lan
              #   /var/tmp/file\xff\x00File with non-utf8 data in the filename
              #
              # And to create /var/tmp/file\xff (under bash or zsh) do:
              #   echo 'Some data' > /var/tmp/file$'\377'
              datafile = open('datafile.txt', 'r')
              data = {}
              for line in datafile:
                  # We're going to keep filename as bytes because we will need the
                  # exact bytes to access files on a POSIX operating system.
                  # description, we'll immediately transform into unicode type.
                  b_filename, description = line.split('\0', 1)

                  # to_unicode defaults to decoding output from utf-8 and replacing
                  # any problematic bytes with the unicode replacement character
                  # We accept mangling of the description here knowing that our file
                  # format is supposed to use utf-8 in that field and that the
                  # description will only be displayed to the user, not used as
                  # a key value.
                  description = to_unicode(description, 'utf-8').strip()
                  data[b_filename] = description
              datafile.close()

              # We're going to add a pair of extra fields onto our data to show the
              # length of the description and the filesize.  We put those between
              # the filename and description because we haven't checked that the
              # description is free of NULLs.
              datafile = open('newdatafile.txt', 'w')

              # Name filename with a b_ prefix to denote byte string of unknown encoding
              for b_filename in data:
                  # Since we have the byte representation of filename, we can read any
                  # filename
                  if os.access(b_filename, os.F_OK):
                      size = os.path.getsize(b_filename)
                  else:
                      size = 0
                  # Because the description is unicode type,  we know the number of
                  # characters corresponds to the length of the normalized unicode
                  # string.  Look the description up by filename; the description
                  # variable left over from the loading loop only holds the last entry.
                  length = len(unicodedata.normalize('NFC', data[b_filename]))

                  # Print a summary to the screen
                  # Note that we do not let implicit type conversion from str to
                  # unicode transform b_filename into a unicode string.  That might
                  # fail as python would use the ASCII codec.  Instead we use
                  # to_unicode() to explicitly transform in a way that we know will
                  # not traceback.
                  print _(u'filename: %s') % to_unicode(b_filename)
                  print _(u'file size: %s') % size
                  print _(u'desc length: %s') % length
                  print _(u'description: %s') % data[b_filename]

                  # First combine the unicode portion
                  line = u'%s\0%s\0%s' % (size, length, data[b_filename])
                  # Since the filenames are bytes, turn everything else to bytes before combining
                  # Turning into unicode first would be wrong as the bytes in b_filename
                  # might not convert
                  b_line = '%s\0%s\n' % (b_filename, to_bytes(line))

                  # Just to demonstrate that getwriter will pass bytes through fine
                  print b_('Wrote: %s') % b_line
                  datafile.write(b_line)
              datafile.close()

              # And just to show how to properly deal with an exception.
              # Note two things about this:
              # 1) We use the b_() function to translate the string.  This returns a
              #    byte string instead of a unicode string
              # 2) We're using the b_() function returned by kitchen.  If we had
              #    used the one from gettext we would need to convert the message to
              #    a byte str first
              message = u'Demonstrate the proper way to raise exceptions.  Sincerely,  \u3068\u3057\u304a'
              raise Exception(b_(message))

       SEE ALSO:
          kitchen.text.converters

   Designing Unicode Aware APIs
       APIs that deal with byte str and unicode strings are difficult to get right.  Here  are  a
       few strategies with pros and cons of each.

   Contents
       • Designing Unicode Aware APIs

         • Take either bytes or unicode, output only unicode

         • Take either bytes or unicode, output the same type

         • Separate functions

         • Deciding whether to take str or unicode when no value is returned

           • Writing to external data

           • Updating data structures

         • APIs to Avoid

           • Returning unicode unless a conversion fails

           • Ignoring values with no chance of recovery

           • Raising a UnicodeException with no chance of recovery

         • Knowing your data

           • Do you need to operate on both bytes and unicode?

           • Can you restrict the encodings?

             • Single byte encodings

             • Multibyte encodings

               • Fixed width

               • Variable Width

                 • ASCII compatible

                 • Escaped

                 • Other

   Take either bytes or unicode, output only unicode
       In  this strategy, you allow the user to enter either unicode strings or byte str but what
       you give back is always unicode.  This strategy is easy for novice endusers to start using
       immediately  as  they will be able to feed either type of string into the function and get
       back a string that they can use in other places.

       However, it does lead to the novice writing code that functions correctly when testing  it
       with ASCII-only data but fails when given data that contains non-ASCII characters.  Worse,
       if your API is not designed to be flexible, the consumer of your code  won't  be  able  to
       easily correct those problems once they find them.

       Here's a good API that uses this strategy:

          from kitchen.text.converters import to_unicode

          def truncate(msg, max_length, encoding='utf8', errors='replace'):
              msg = to_unicode(msg, encoding, errors)
              return msg[:max_length]

       The  call  to truncate() starts with the essential parameters for performing the task.  It
       ends with two optional keyword arguments that define the encoding to use to transform from
       a  byte  str to unicode and the strategy to use if undecodable bytes are encountered.  The
       defaults may vary depending on the use cases  you  have  in  mind.   When  the  output  is
       generally going to be printed for the user to see, errors='replace' is a good default.  If
       you are constructing keys to a database, raising an exception (with errors='strict') may be
       a better default.  In either case, having both parameters allows the person using your API
       to choose how they want to handle any problems.  Having the values is also a clue to  them
       that a conversion from byte str to unicode string is going to occur.
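
       As a rough illustration of what the two keyword arguments buy the caller, here is a
       sketch written for python3 (where bytes is the byte type and str is the text type).
       The to_text() helper is a hypothetical stand-in for
       kitchen.text.converters.to_unicode() so that the example runs without kitchen
       installed; it is not kitchen's implementation.

```python
def to_text(msg, encoding='utf8', errors='replace'):
    # Hypothetical stand-in for kitchen.text.converters.to_unicode():
    # decode byte strings, pass text through unchanged.
    if isinstance(msg, bytes):
        return msg.decode(encoding, errors)
    return msg

def truncate(msg, max_length, encoding='utf8', errors='replace'):
    msg = to_text(msg, encoding, errors)
    return msg[:max_length]

latin1_bytes = u'caf\xe9 au lait'.encode('latin1')

# Wrong encoding with the default errors='replace': the undecodable byte
# is turned into U+FFFD, which is fine for display
replaced = truncate(latin1_bytes, 4)

# The caller can name the real encoding ...
correct = truncate(latin1_bytes, 4, encoding='latin1')

# ... or ask for an exception instead of a replacement character
try:
    truncate(latin1_bytes, 4, errors='strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```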

       NOTE:
          If  you're  targeting  python-3.1  and  above, errors='surrogateescape' may be a better
          default than errors='strict'.  You need to be  mindful  of  a  few  things  when  using
          surrogateescape though:

          • surrogateescape  will  cause  issues  if a non-ASCII compatible encoding is used (for
            instance, UTF-16 and UTF-32.)  That makes it unhelpful in  situations  where  a  true
            general   purpose   method  of  encoding  must  be  found.   PEP  383  mentions  that
            surrogateescape was specifically designed with the limitations of  translating  using
            system  locales  (where  ASCII compatibility is generally seen as inescapable) so you
            should keep that in mind.

          • If you use surrogateescape to decode from bytes to unicode you will need  to  use  an
            error  handler  other  than  strict  to  encode as the lone surrogate that this error
            handler creates makes for invalid unicode that must be  handled  when  encoding.   In
            Python-3.1.2  or less, a bug in the encoder error handlers means that you can only use
            surrogateescape to encode; anything else will throw an error.

          Evaluate your usages of the variables in question to see what makes sense.
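
          The second point can be seen in a short python3 session.  This is only a sketch
          of the round trip; the byte string is an arbitrary example.

```python
raw = b'nonsense_char_\xff'

# Undecodable bytes become lone surrogates (0xff -> U+DCFF)
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the strict handler rejects the lone surrogate
try:
    text.encode('utf-8', 'strict')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with surrogateescape restores the original bytes
round_tripped = text.encode('utf-8', 'surrogateescape')
```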

       Here's a bad example of using this strategy:

          from kitchen.text.converters import to_unicode

          def truncate(msg, max_length):
              msg = to_unicode(msg)
              return msg[:max_length]

       In this example, we don't have the optional keyword arguments for encoding and errors.   A
       user  who  uses  this function is more likely to miss the fact that a conversion from byte
       str to unicode is going to occur.  And once an error is reported, they will have  to  look
       through  their  backtrace  and  think harder about where they want to transform their data
       into unicode strings instead of having the opportunity to control how the conversion takes
       place  in the function itself.  Note that the user does have the ability to make this work
       by making the transformation to unicode themselves:

          from kitchen.text.converters import to_unicode

          msg = to_unicode(msg, encoding='euc_jp', errors='ignore')
          new_msg = truncate(msg, 5)

   Take either bytes or unicode, output the same type
       This strategy is sometimes called polymorphic because the type of data that is returned is
       dependent  on the type of data that is received.  The concept is that when you are given a
       byte str to process, you return a byte str in your output.  When  you  are  given  unicode
       strings to process, you return unicode strings in your output.

       This  can  work  well for end users as the ones that know about the difference between the
       two string types will already have transformed the strings to their  desired  type  before
       giving it to this function.  The ones that don't can remain blissfully ignorant (at least,
       as far as your function is concerned) as the function does not change the type.

       In cases where the encoding of the byte str is known or can be  discovered  based  on  the
       input  data  this  works  well.  If you can't figure out the input encoding, however, this
       strategy can fail in any of the following cases:

       1. It needs to do an internal conversion between byte str and unicode string.

       2. It cannot return the same data as either a unicode string or byte str.

       3. You may need to deal with byte strings that are not byte-compatible with ASCII.

       First, a couple examples of using this strategy in a good way:

          def translate(msg, table):
              replacements = table.keys()
              new_msg = []
              for index, char in enumerate(msg):
                  if char in replacements:
                      new_msg.append(table[char])
                  else:
                      new_msg.append(char)

              return ''.join(new_msg)

       In this example, all of the strings that we use (except the empty  string  which  is  okay
       because  it doesn't have any characters to encode) come from outside of the function.  Due
       to that, the user is responsible for making sure that the msg, and the keys and values  in
       table  all  match  in  terms  of type (unicode vs str) and encoding (You can do some error
       checking to make sure the user gave all the same type but you can't do the  same  for  the
       user  giving  different  encodings).   You  do not need to make changes to the string that
       require you to know the encoding or type of the string; everything is a simple replacement
       of one element in the array of characters in msg with the character in table.

          import json
          from kitchen.text.converters import to_unicode, to_bytes

          def first_field_from_json_data(json_string):
              '''Return the first field in a json data structure.

              The format of the json data is a simple list of strings.
              '["one", "two", "three"]'
              '''
              if isinstance(json_string, unicode):
                  # On all python versions, json.loads() returns unicode if given
                  # a unicode string
                  return json.loads(json_string)[0]

              # Byte str: figure out which encoding we're dealing with
              if '\x00' not in json_string[:2]:
                  encoding = 'utf8'
              elif '\x00\x00\x00' == json_string[:3]:
                  encoding = 'utf-32-be'
              elif '\x00\x00\x00' == json_string[1:4]:
                  encoding = 'utf-32-le'
              elif '\x00' == json_string[0] and '\x00' == json_string[2]:
                  encoding = 'utf-16-be'
              else:
                  encoding = 'utf-16-le'

              data = json.loads(unicode(json_string, encoding))
              return data[0].encode(encoding)

       In  this  example the function takes either a byte str type or a unicode string that has a
       list in json format and returns the first field from it as the type of the  input  string.
       The  first  section of code is very straightforward; we receive a unicode string, parse it
       with a function, and then return the first field from our parsed data (which our  function
       returned to us as json data).

       The  second  portion  that  deals  with byte str is not so straightforward.  Before we can
       parse the string we have to determine what characters the bytes in the string map to.   If
       we  didn't  do  that, we wouldn't be able to properly find which characters are present in
       the string.  In order to do that we have to figure out  the  encoding  of  the  byte  str.
       Luckily,  the  json specification states that all strings are unicode and encoded with one
       of UTF32be, UTF32le, UTF16be, UTF16le, or UTF-8.  It further defines the format such  that
       the  first  two  characters  are  always ASCII.  Each of these has a different sequence of
       NULLs when they encode an ASCII character.  We can use that to detect which  encoding  was
       used to create the byte str.

       Finally, we return the byte str by encoding the unicode back to a byte str.

       As you can see, in this example we have to convert from byte str to unicode and back.  But
       we know from the json specification that byte str has to be one of  a  limited  number  of
       encodings that we are able to detect.  That ability makes this strategy work.
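
       To see why the NULL patterns are enough, the detection step can be pulled out and
       run against the same data in each of the five encodings.  This is a python3 sketch;
       detect_json_encoding() is a name made up for this illustration.

```python
def detect_json_encoding(data):
    '''Guess the encoding of JSON bytes from where the NULL bytes fall.'''
    if b'\x00' not in data[:2]:
        return 'utf8'
    if data[:3] == b'\x00\x00\x00':
        return 'utf-32-be'
    if data[1:4] == b'\x00\x00\x00':
        return 'utf-32-le'
    # Slices are used instead of indexes so the comparison is bytes to bytes
    if data[0:1] == b'\x00' and data[2:3] == b'\x00':
        return 'utf-16-be'
    return 'utf-16-le'

sample = u'["one", "two", "three"]'
detected = {}
for encoding in ('utf8', 'utf-32-be', 'utf-32-le', 'utf-16-be', 'utf-16-le'):
    # Map the encoding we used to the encoding the function guessed
    detected[encoding] = detect_json_encoding(sample.encode(encoding))
```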

       Now for some examples of using this strategy in ways that fail:

          import unicodedata
          def first_char(msg):
              '''Return the first character in a string'''
              if not isinstance(msg, unicode):
                  try:
                      msg = unicode(msg, 'utf8')
                  except UnicodeError:
                      msg = unicode(msg, 'latin1')
              msg = unicodedata.normalize('NFC', msg)
              return msg[0]

       If you look at that code and think that there's something fragile and prone to breaking in
       the try: except: block you are correct in  being  suspicious.   This  code  will  fail  on
       multi-byte  character sets that aren't UTF-8.  It can also fail on data where the sequence
       of bytes is valid UTF-8 but the bytes are actually of a different encoding.  The reason
       this code fails is that we don't know what encoding the bytes are in and the code must
       convert from a byte str to a unicode string in order to function.
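
       Both failures are easy to reproduce.  The sketch below is python3 and isolates the
       guessing logic in a helper named guess_decode(), a name invented for this example.

```python
def guess_decode(msg):
    # Same guessing logic as the try/except block above
    try:
        return msg.decode('utf8')
    except UnicodeError:
        return msg.decode('latin1')

# Case 1: EUC-JP bytes are not valid UTF-8, so the latin1 fallback runs.
# latin1 accepts every byte, so we get six wrong characters and no error.
original = u'\u3068\u3057\u304a'
wrong_fallback = guess_decode(original.encode('euc_jp'))

# Case 2: these two latin1 characters encode to bytes that happen to be
# valid UTF-8, so the first branch "succeeds" with a single character
# that the sender never wrote.
sent = u'\xc3\xa9'
wrong_utf8 = guess_decode(sent.encode('latin1'))
```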

       In order to make this code robust we must know the encoding of msg.  The only way to  know
       that is to ask the user so the API must do that:

          import unicodedata
          def number_of_chars(msg, encoding='utf8', errors='strict'):
              if not isinstance(msg, unicode):
                  msg = unicode(msg, encoding, errors)
              msg = unicodedata.normalize('NFC', msg)
              return len(msg)

       Another example of failure:

          import os
          def listdir(directory):
              files = os.listdir(directory)
              if isinstance(directory, str):
                  return files
              # files could contain both bytes and unicode
              new_files = []
              for filename in files:
                  if not isinstance(filename, unicode):
                      # What to do here?
                      continue
                  new_files.append(filename)
              return new_files

       This  function  illustrates the second failure mode.  Here, not all of the possible values
       can be represented as unicode without knowing more about  the  encoding  of  each  of  the
       filenames  involved.   Since  each  filename could have a different encoding there are a few
       different options to pursue.  We could make this function always  return  byte  str  since
       that  can  accurately  represent  anything  that  could be returned.  If we want to return
       unicode we need to at least allow the user to specify what to  do  in  case  of  an  error
       decoding  the  bytes to unicode.  We can also let the user specify the encoding to use for
       doing the decoding but that won't help in all cases since not all files  will  be  in  the
       same encoding (or even necessarily in any encoding):

          import locale
          import os
          def listdir(directory, encoding=locale.getpreferredencoding(), errors='strict'):
              # Note: In python-3.1+, surrogateescape may be a better default
              files = os.listdir(directory)
              if isinstance(directory, str):
                  return files
              new_files = []
              for filename in files:
                  if not isinstance(filename, unicode):
                      filename = unicode(filename, encoding=encoding, errors=errors)
                  new_files.append(filename)
              return new_files

       Note that although we use errors in this example as what to pass to the codec that decodes
       to unicode we could also have an errors argument that decides other things to do like skip
       a  filename  entirely,  return  a  placeholder  (Nondisplayable  filename),  or  raise  an
       exception.
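
       One possible shape for such an argument is sketched below in python3.  The function
       name decode_filenames() and the on_error values 'strict', 'skip' and 'placeholder'
       are assumptions made for this example, not part of any library.  Skipping is only
       reasonable here because the caller asks for it explicitly.

```python
def decode_filenames(filenames, encoding='utf-8', on_error='strict'):
    '''Decode byte filenames; on_error is 'strict', 'skip' or 'placeholder'.'''
    decoded = []
    for filename in filenames:
        if isinstance(filename, bytes):
            try:
                filename = filename.decode(encoding)
            except UnicodeDecodeError:
                if on_error == 'skip':
                    continue
                if on_error == 'placeholder':
                    filename = u'(Nondisplayable filename)'
                else:
                    # 'strict': let the caller see the decoding error
                    raise
        decoded.append(filename)
    return decoded

names = [b'all_ascii', b'nonsense_char_\xff']
skipped = decode_filenames(names, on_error='skip')
marked = decode_filenames(names, on_error='placeholder')
```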

       This leaves us with one last failure to describe:

          def first_field(csv_string):
              '''Return the first field in a comma separated values string.'''
              try:
                  return csv_string[:csv_string.index(',')]
              except ValueError:
                  return csv_string

       This code looks simple enough.  The hidden error here is that we are searching for a comma
       character  in  a  byte  str  but  not all encodings will use the same sequence of bytes to
       represent the comma.  If you use an encoding that's  not  ASCII  compatible  on  the  byte
       level,  then the literal comma ',' in the above code will match inappropriate bytes.  Some
       examples of how it can fail:

       • Will find the byte representing an ASCII comma in another character

       • Will find the comma but leave trailing garbage bytes on the end of the string

       • Will not match the character that represents the comma in this encoding
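
       Each of the three failures can be triggered with a specific encoding.  The python3
       sketch below searches for the byte b',' the way the function above does; the
       encodings and sample strings were chosen only for demonstration.

```python
def first_field(csv_string):
    '''Byte-oriented search for a literal ASCII comma.'''
    try:
        return csv_string[:csv_string.index(b',')]
    except ValueError:
        return csv_string

# 1) The comma byte 0x2c occurs inside another character:
#    U+042C encodes to b',\x04' in UTF-16-LE, so the field comes back empty
inside_char = first_field(u'\u042c,x'.encode('utf-16-le'))

# 2) The comma is found, but its leading NULL byte is left on the field
trailing = first_field(u'ab,cd'.encode('utf-16-be'))

# 3) The comma is never matched: in EBCDIC (cp500) it is not byte 0x2c
ebcdic = u'ab,cd'.encode('cp500')
unmatched = first_field(ebcdic)
```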

       There are two ways to solve this.  You can either take the encoding value from the user or
       you  can  take  the separator value from the user.  Of the two, taking the encoding is the
       better option for two reasons:

       1. Taking a separator argument doesn't clearly document for the API user that  the  reason
          they must give it is to properly match the encoding of the csv_string.  They're just as
          likely to think that it's simply a way to specify an alternate character (like  ":"  or
          "|") for the separator.

       2. It's  possible  for  a  variable  width  encoding  to  reuse the same byte sequence for
          different characters in multiple sequences.

          NOTE:
             UTF-8 is resistant to this as any character's sequence of  bytes  will  never  be  a
             subset of another character's sequence of bytes.

       With that in mind, here's how to improve the API:

          def first_field(csv_string, encoding='utf-8', errors='replace'):
              if not isinstance(csv_string, unicode):
                  u_string = unicode(csv_string, encoding, errors)
                  is_unicode = False
              else:
                  u_string = csv_string
                  is_unicode = True

              try:
                  field = u_string[:u_string.index(u',')]
              except ValueError:
                  return csv_string

              if not is_unicode:
                  field = field.encode(encoding, errors)
              return field

       NOTE:
          If  you  decide  you'll  never  encounter  a  variable  width encoding that reuses byte
          sequences you can use this code instead:

              def first_field(csv_string, encoding='utf-8'):
                  try:
                      return csv_string[:csv_string.index(','.encode(encoding))]
                  except ValueError:
                      return csv_string

   Separate functions
       Sometimes you want to be able to take either byte str or unicode strings, perform  similar
       operations  on  either one and then return data in the same format as was given.  Probably
       the easiest way to do that is to have separate functions  for  each  and  adopt  a  naming
       convention to show that one is for working with byte str and the other is for working with
       unicode strings:

          def translate_b(msg, table):
              '''Replace values in str with other byte values like unicode.translate'''
              if not isinstance(msg, str):
                  raise TypeError('msg must be of type str')
              str_table = [chr(s) for s in xrange(0,256)]
              delete_chars = []
              for chr_val in (k for k in table.keys() if isinstance(k, int)):
                  if chr_val < 0 or chr_val > 255:
                      raise ValueError('Keys in table must be between 0 and 255')
                  if table[chr_val] is None:
                      delete_chars.append(chr(chr_val))
                  elif isinstance(table[chr_val], int):
                      if table[chr_val] < 0 or table[chr_val] > 255:
                          raise TypeError('table values cannot be more than 255 or less than 0')
                      str_table[chr_val] = chr(table[chr_val])
                  else:
                      if not isinstance(table[chr_val], str):
                          raise TypeError('character mapping must return integer, None or str')
                      str_table[chr_val] = table[chr_val]
              str_table = ''.join(str_table)
              delete_chars = ''.join(delete_chars)
              return msg.translate(str_table, delete_chars)

          def translate(msg, table):
              '''Replace values in a unicode string with other values'''
              if not isinstance(msg, unicode):
                  raise TypeError('msg must be of type unicode')
              return msg.translate(table)

       There are several things that we have to do in this API:

       • Because the function names alone might not tell the user which value types are
         expected, we have to check that the types are correct.

       • We  keep  the behaviour of the two functions as close to the same as possible, just with
         byte str and unicode strings substituted for each other.

   Deciding whether to take str or unicode when no value is returned
       Not all functions have a return value.  Sometimes a function is  there  to  interact  with
       something  external to python, for instance, writing a file out to disk or a method exists
       to update the internal state of a data structure.  One of the main  questions  with  these
       APIs is whether to take byte str, unicode string, or both.  The answer depends on your use
       case but I'll give some examples here.

   Writing to external data
       When your information is going to an external data source like writing to a file you  need
       to  decide  whether  to  take in unicode strings or byte str.  Remember that most external
       data sources are not going to be dealing with unicode directly.  Instead, they're going to
       be  dealing  with  a  sequence  of bytes that may be interpreted as unicode.  With that in
       mind, you either need to have the user give you a byte str or convert to a byte str inside
       the function.

       Next  you  need  to  think  about the type of data that you're receiving.  If it's textual
       data, (for instance, this is a chat client and the  user  is  typing  messages  that  they
       expect  to  be  read by another person) it probably makes sense to take in unicode strings
       and do the conversion inside your function.  On the other hand, if this is a  lower  level
       function  that's passing data into a network socket, it probably should be taking byte str
       instead.

       Just as noted in the API notes above, you should specify an encoding and  errors  argument
       if  you  need to transform from unicode string to byte str and you are unable to guess the
       encoding from the data itself.
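
       For the chat client case, a minimal python3 sketch might look like this.  The
       write_message() name is hypothetical and io.BytesIO stands in for the file or
       socket.

```python
import io

def write_message(fileobj, msg, encoding='utf-8', errors='strict'):
    '''Take text from the caller and convert to bytes at the boundary.'''
    fileobj.write(msg.encode(encoding, errors))

utf8_out = io.BytesIO()
write_message(utf8_out, u'\u3068\u3057\u304a\n')

# A caller who knows the other end only speaks ASCII can say how to degrade
ascii_out = io.BytesIO()
write_message(ascii_out, u'caf\xe9\n', encoding='ascii', errors='replace')
```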

   Updating data structures
       Sometimes your API is just going to update a data structure  and  not  immediately  output
       that  data anywhere.  Just as when writing external data, you should think about both what
       your function is going to do with the data eventually and what the caller of your function
       is  thinking  that  they're  giving  you.   Most  of the time, you'll want to take unicode
       strings and enter them into the data structure as unicode when  the  data  is  textual  in
       nature.   You'll  want to take byte str and enter them into the data structure as byte str
       when the data is not text.  Use a naming convention so the user knows what's expected.
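
       A small python3 sketch of such a naming convention follows.  The Attachment class
       and its method names are invented for this example; the _b suffix marks the method
       that expects bytes, and each method checks the type it is given.

```python
class Attachment(object):
    '''Hypothetical container: the description is text, the payload is bytes.'''

    def __init__(self):
        self.description = u''
        self.payload_b = b''

    def set_description(self, description):
        # Textual data is stored as text
        if isinstance(description, bytes):
            raise TypeError('description must be text, not bytes')
        self.description = description

    def set_payload_b(self, payload_b):
        # Non-textual data is stored as bytes
        if not isinstance(payload_b, bytes):
            raise TypeError('payload_b must be bytes')
        self.payload_b = payload_b

attachment = Attachment()
attachment.set_description(u'Holiday photo \u3068\u3057\u304a')
attachment.set_payload_b(b'\x89PNG\r\n')
```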

   APIs to Avoid
       There are a few APIs that are just wrong.  If you catch yourself making an API  that  does
       one of these things, change it before anyone sees your code.

   Returning unicode unless a conversion fails
       This  type  of  API  usually  deals with byte str at some point and converts it to unicode
       because it's usually thought to be text.  However, there are times when the bytes fail  to
       convert to a unicode string.  When that happens, this API returns the raw byte str instead
       of a unicode string.  One example of this is  present  in  the  python  standard  library:
       python2's os.listdir():

          >>> import os
          >>> import locale
          >>> locale.getpreferredencoding()
          'UTF-8'
          >>> os.mkdir('/tmp/mine')
          >>> os.chdir('/tmp/mine')
          >>> open('nonsense_char_\xff', 'w').close()
          >>> open('all_ascii', 'w').close()
          >>> os.listdir(u'.')
          [u'all_ascii', 'nonsense_char_\xff']

       The problem with APIs like this is that they cause failures that are hard to debug because
       they don't happen where the variables are set.  For  instance,  let's  say  you  take  the
       filenames from os.listdir() and give it to this function:

          def normalize_filename(filename):
              '''Change spaces and dashes into underscores'''
              return filename.translate({ord(u' '):u'_', ord(u'-'):u'_'})

       When  you  test  this, you use filenames that all are decodable in your preferred encoding
       and everything seems to work.  But when this code is run on a machine that  has  filenames
       in  multiple  encodings  the filenames returned by os.listdir() suddenly include byte str.
       And byte str has a different translate() method that takes different arguments.  So
       the  code  raises  an exception where it's not immediately obvious that os.listdir() is at
       fault.
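
       The failure can be reproduced without touching the filesystem by handing the
       function a mixed list like the one os.listdir(u'.') returned above.  The sketch is
       python3, and the list contents are made up for the demonstration.

```python
def normalize_filename(filename):
    '''Change spaces and dashes into underscores'''
    return filename.translate({ord(u' '): u'_', ord(u'-'): u'_'})

# One text filename and one byte filename, as a mixed-type API would return
filenames = [u'all ascii', b'nonsense_char_\xff']

results = []
for filename in filenames:
    try:
        results.append(normalize_filename(filename))
    except TypeError as e:
        # The byte version of translate() does not accept a dict, so the
        # error shows up here, far away from the call that mixed the types
        results.append(e)
```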

   Ignoring values with no chance of recovery
       An early version of python3 attempted to fix the os.listdir() problem pointed out  in  the
       last  section  by  returning  all  values  that were decodable to unicode and omitting the
       filenames that were not.  This led to the following output:

          >>> import os
          >>> import locale
          >>> locale.getpreferredencoding()
          'UTF-8'
          >>> os.mkdir('/tmp/mine')
          >>> os.chdir('/tmp/mine')
          >>> open(b'nonsense_char_\xff', 'w').close()
          >>> open('all_ascii', 'w').close()
          >>> os.listdir('.')
          ['all_ascii']

       The issue with this type of code is that it is silently doing something  surprising.   The
       caller  expects  to get a full list of files back from os.listdir().  Instead, it silently
       ignores some of the files, returning only a subset.  This leads to code  that  doesn't  do
       what is expected that may go unnoticed until the code is in production and someone notices
       that something important is being missed.

   Raising a UnicodeException with no chance of recovery
       Believe it or not, a few libraries exist that make it impossible to deal with unicode text
       without  raising  a  UnicodeError.   What  seems  to  occur in these libraries is that the
       library has functions that expect to receive a unicode string.  However, internally, those
       functions  call  other functions that expect to receive a byte str.  The programmer of the
       API was smart enough to convert from a unicode string to a byte str but they did not  give
       the  user  the  chance  to  specify the encodings to use or how to deal with errors.  This
       results in exceptions when the user passes in a byte  str  because  the  initial  function
       wants a unicode string and exceptions when the user passes in a unicode string because the
       function can't convert the string to bytes in the encoding that it's selected.

       Do not put the user in the position of not being able to use your API  without  raising  a
       UnicodeError  with  certain values.  If you can only safely take unicode strings, document
       that byte str is not allowed and vice versa.  If you have to convert internally, make sure
       to  give  the  caller of your function parameters to control the encoding and how to treat
       errors that may occur during the encoding/decoding process.  If your  code  will  raise  a
       UnicodeError with non-ASCII values no matter what, you should probably rethink your API.
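
       The difference between the two kinds of API is small in code.  Both functions in
       this python3 sketch are hypothetical; the first hard codes the conversion and the
       second hands control of it to the caller.

```python
def send_bad(text):
    # The caller has no way to choose the encoding or the error handling
    return text.encode('ascii')

def send(text, encoding='utf-8', errors='strict'):
    # The caller controls both
    return text.encode(encoding, errors)

try:
    send_bad(u'\u3068')
    bad_failed = False
except UnicodeEncodeError:
    bad_failed = True

as_utf8 = send(u'\u3068')
as_ascii = send(u'\u3068', encoding='ascii', errors='replace')
```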

   Knowing your data
       If  you've  read  all  the  way  down to this section without skipping you've seen several
       admonitions about the type of data you are  processing  affecting  the  viability  of  the
       various API choices.

       Here are a few things to consider in your data:

   Do you need to operate on both bytes and unicode?
       Much  of the data in libraries, programs, and the general environment outside of python is
       written where strings are sequences of bytes.  So when we interact with  data  that  comes
       from  outside  of  python  or data that is about to leave python it may make sense to only
       operate on the data as a byte str.  There are two times when this may make sense:

       1. The user is intended to hand the data to the function and then the function takes  care
          of sending the data outside of python (to the filesystem, over the network, etc).

       2. The data is not representable as text.  For instance, writing a binary file format.

       Even  when your code is operating in this area you still need to think a little more about
       your data.  For instance, it might make sense for the person using your  API  to  pass  in
       unicode  strings  and  let  the function convert that into the byte str that it then sends
       over the wire.

       There are also times when it might make sense to operate only on unicode strings.  unicode
       represents  text so anytime that you are working on textual data that isn't going to leave
       python it has the potential to be a unicode-only API.  However, there are two things that
       you should consider when designing a unicode-only API:

       1. As  your  API gains popularity, people are going to use your API in places that you may
          not have thought of.  Corner cases in these other places may mean that processing bytes
          is desirable.

       2. In  python2,  byte str and unicode are often used interchangeably with each other.  That
          means that people programming against your API may have received str  from  some  other
          API and it would be most convenient for their code if your API accepted it.

       NOTE:
          In  python3, the separation between the text type and the byte type is clearer.  So
          in python3, there's less need to have all APIs take both unicode and bytes.

   Can you restrict the encodings?
       If you determine that you have to deal with byte str  you  should  realize  that  not  all
       encodings  are  created equal.  Each has different properties that may make it possible to
       provide a simpler API provided that you can reasonably tell the users  of  your  API  that
       they cannot use certain classes of encodings.

       As  one  example, if you are required to find a comma (,) in a byte str you have different
       choices based on what encodings are allowed.  If you  can  reasonably  restrict  your  API
       users  to  only  giving ASCII compatible encodings you can do this simply by searching for
       the literal comma character because that character will be represented by  the  same  byte
       sequence in all ASCII compatible encodings.
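
       A quick python3 check of that claim, using a handful of codecs from the standard
       library.  The two lists are just a sample chosen for this example.

```python
ascii_compatible = ['ascii', 'utf-8', 'latin1', 'euc_jp']
not_compatible = ['utf-16-le', 'utf-32-be', 'cp500']

# Encodings in which the comma is the single byte 0x2c
same = [enc for enc in ascii_compatible if u','.encode(enc) == b',']

# Encodings in which it is some other byte sequence
different = [enc for enc in not_compatible if u','.encode(enc) != b',']
```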

       The  following are some classes of encodings to be aware of as you decide how generic your
       code needs to be.

   Single byte encodings
       Single byte encodings can only represent 256  total  characters.   They  encode  the  code
       points for a character to the equivalent number in a single byte.

       Most  single byte encodings are ASCII compatible.  ASCII compatible encodings are the most
       likely to be usable without changes to code so this is good news.  A notable exception  to
       this is the EBCDIC family of encodings.

   Multibyte encodings
       Multibyte encodings use more than one byte to encode some characters.

   Fixed width
       Fixed width encodings have a set number of bytes to represent all of the characters in the
       character set.  UTF-32 is an example of a fixed width encoding that uses  four  bytes  per
       character  and  can express every unicode character.  There are a number of problems with
       writing APIs that need to operate on fixed width, multibyte characters.  To go back to our
       earlier  example  of  finding  a comma in a string, we have to realize that even in UTF-32
       where the code point for ASCII characters is the same as in ASCII, the byte  sequence  for
       them  is different.  So you cannot search for the literal byte character as it may pick up
       false positives and may break a byte sequence in an odd place.

   Variable Width
   ASCII compatible
       UTF-8 and the EUC  family  of  encodings  are  examples  of  ASCII  compatible  multi-byte
       encodings.  They achieve this by adhering to two principles:

       • All  of  the  ASCII  characters  are  represented by the byte that they are in the ASCII
         encoding.

       • None of the ASCII byte sequences are reused in any other byte sequence for  a  different
         character.
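
       Both principles can be observed by looking at the individual bytes.  In this
       python3 sketch the sample text is two Japanese characters separated by a comma.

```python
text = u'\u3068,\u3057'
ascii_range_bytes = {}
for encoding in ('utf-8', 'euc_jp'):
    data = text.encode(encoding)
    # bytearray yields integers; keep only the bytes in the ASCII range
    ascii_range_bytes[encoding] = [b for b in bytearray(data) if b < 0x80]

# In both encodings the only byte below 0x80 is the comma itself (0x2c),
# so a search for b',' cannot land inside one of the other characters
```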

   Escaped
       Some  multibyte  encodings  work  by  using  only bytes from the ASCII encoding but when a
       particular sequence of those bytes is found, they  are  interpreted  as  meaning  something
       other  than  their  ASCII  values.   UTF-7 is one such encoding that can encode all of the
       unicode code points.  For instance, here are some Japanese characters encoded as UTF-7:

          >>> a = u'\u304f\u3089\u3068\u307f'
          >>> print a
          くらとみ
          >>> print a.encode('utf-7')
          +ME8wiTBoMH8-

       These encodings can be used when you need to encode unicode data that may contain
       non-ASCII characters for inclusion in an ASCII only transport medium or file.

       However, they are not ASCII compatible in the sense that we used earlier as the bytes that
       represent an ASCII character are being reused as part of other characters.  If you were  to
       search  for  a  literal  plus sign in this encoded string, you would run across many false
       positives, for instance.
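
       The plus sign problem can be checked directly.  This is a python3 sketch using the
       same four characters as the session above.

```python
text = u'\u304f\u3089\u3068\u307f'
encoded = text.encode('utf-7')

# The text has no plus sign, but the encoded bytes do
has_plus_in_text = u'+' in text
has_plus_in_bytes = b'+' in encoded

# A real plus sign is itself escaped, so even it is not a bare b'+'
escaped_plus = u'1+1'.encode('utf-7')
```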

   Other
       There are many other popular variable width encodings, for instance UTF-16 and  shift-JIS.
       Many  of these are not ASCII compatible so you cannot search for a literal ASCII character
       without danger of false positives or false negatives.
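
       One example of each kind of mismatch, as a python3 sketch.  The characters were
       picked because they are well known troublemakers; other characters in these
       encodings behave the same way.

```python
# False positive: in shift-JIS the second byte of U+30BD is 0x5c, the same
# byte that ASCII uses for a backslash
katakana_so = u'\u30bd'.encode('shift_jis')
false_positive = b'\\' in katakana_so

# False negative: the text contains 'ab' but the UTF-16 bytes do not
# contain b'ab' because a NULL byte follows each ASCII character
utf16 = u'ab,cd'.encode('utf-16-le')
false_negative = b'ab' not in utf16
```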

   Kitchen API
       Kitchen is structured as a collection of modules.  In its current  configuration,  Kitchen
       ships  with the following modules.  Other addon modules that may drag in more dependencies
       can be found on the project webpage.

   Kitchen.i18n Module
       I18N is an important piece of any modern program.  Unfortunately, setting up i18n in  your
       program  is  often  a  confusing  process.   The  functions  provided here aim to make the
       programming side of that a little easier.

       Most projects will be able to do something like this when they start up:

          # myprogram/__init__.py:

          import os
          import sys

          from kitchen.i18n import easy_gettext_setup

          _, N_  = easy_gettext_setup('myprogram', localedirs=(
                  os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
                  os.path.join(sys.prefix, 'lib', 'locale')
                  ))

       Then, in other files that have strings that need translating:

          # myprogram/commands.py:

          from myprogram import _, N_

          def print_usage():
              print _(u"""available commands are:
              --help              Display help
              --version           Display version of this program
              --bake-me-a-cake    as fast as you can
                  """)

          def print_invitations(age):
              print _('Please come to my party.')
              print N_('I will be turning %(age)s year old',
                  'I will be turning %(age)s years old', age) % {'age': age}

       See the  documentation  of  easy_gettext_setup()  and  get_translation_object()  for  more
       details.

          SEE ALSO:

              gettext
                     for details of how the python gettext facilities work

              babel  The babel module for in-depth information on gettext, message catalogs, and
                     translating your app.  babel provides some nice features for i18n on top of
                     gettext.

   Functions
       easy_gettext_setup()  should satisfy the needs of most users.  get_translation_object() is
       designed to ease the way for anyone that needs more control.

       kitchen.i18n.easy_gettext_setup(domain, localedirs=(), use_unicode=True)
              Setup translation functions for an application

              Parameters
                     • domain -- Name of the message domain.  This should be a unique name that
                       can be used to lookup the message catalog for this app.

                     • localedirs  -- Iterator of directories to look for message catalogs under.
                       The first directory to exist is used regardless of  whether  messages  for
                       this domain are present.  If none of the directories exist, fall back on
                       sys.prefix + /share/locale.  Default: No directories to search, so we just
                       use the fallback.

                     • use_unicode  --  If True return the gettext functions for str strings else
                       return the functions for byte bytes  for  the  translations.   Default  is
                       True.

              Returns
                     tuple of the gettext function and gettext function for plurals

              Setting up gettext can be a little tricky because of lack of documentation.  This
              function will set up gettext using the Class-based API for you.  For the simple
              case, you can use the default arguments and pass only the message domain:

                 _, N_ = easy_gettext_setup('myprogram')

              This  will  get you two functions, _() and N_() that you can use to mark strings in
              your code for translation.  _() is used to mark strings that don't  need  to  worry
              about  plural  forms  no matter what the value of the variable is.  N_() is used to
              mark strings that do need to have a different form if a variable in the  string  is
              plural.

              SEE ALSO:

                 Kitchen.i18n Module
                        This module's documentation has examples of using _() and N_()

                 get_translation_object()
                        for  information  on  how  to  use  localedirs  to get the proper message
                        catalogs both when in development and when  installed  to  FHS  compliant
                        directories on Linux.

              NOTE:
                 The gettext functions returned from this function should be superior to the ones
                 returned from gettext.  The traits that make them better are  described  in  the
                 DummyTranslations and NewGNUTranslations documentation.

              Changed    in   version   kitchen-0.2.4:   ;   API   kitchen.i18n   2.0.0   Changed
              easy_gettext_setup() to return the lgettext functions instead of gettext  functions
              when use_unicode=False.

       kitchen.i18n.get_translation_object(domain,  localedirs=(),  languages=None,  class_=None,
       fallback=True, codeset=None, python2_api=True)
              Get a translation object bound to the message catalogs

              Parameters
                     • domain -- Name of the message domain.  This should be a unique name that
                       can be used to lookup the message catalog for this app or library.

                     • localedirs  -- Iterator of directories to look for message catalogs under.
                       The directories are searched in order for message catalogs.  For  each  of
                       the  directories  searched,  we check for message catalogs in any language
                       specified in languages.  The message catalogs are used to create the
                       Translation object that we return.  The Translation object will attempt to
                       lookup the msgid in the first catalog that  we  found.   If  it's  not  in
                       there,  it  will  go  through each subsequent catalog looking for a match.
                       For this reason, the order in which you  specify  the  localedirs  may  be
                       important.    If   no   message   catalogs  are  found,  either  return  a
                       DummyTranslations object or raise an IOError depending  on  the  value  of
                       fallback.  The default localedir from gettext, which is
                       os.path.join(sys.prefix, 'share', 'locale') on Unix, is implicitly appended
                       to the localedirs, making it the last directory searched.

                     • languages --

                       Iterator of language codes to check for message catalogs.  If unspecified,
                       the user's locale settings will be used.

                       SEE ALSO:
                          gettext.find() for information on what environment variables are used.

                     • class_ -- The class to use to extract translations from the message
                       catalogs.  Defaults to NewGNUTranslations.

                     • fallback -- If set to False, raise an IOError if no message catalogs
                       are found.  If True, the default, return a DummyTranslations object.

                     • codeset -- Set the character encoding to use  when  returning  byte  bytes
                       objects.    This   is   equivalent  to  calling  output_charset()  on  the
                       Translations object that is returned from this function.

                     • python2_api -- When True (default), return Translation objects that use
                       the python2 gettext api (gettext() and lgettext() return byte bytes;
                       ugettext() exists and returns str strings).  When False, return
                       Translation objects that use the python3 gettext api (gettext() returns
                       str strings and lgettext() returns byte bytes; ugettext() does not exist).

              Returns
                     Translation object to get gettext methods from

              If you need more flexibility than easy_gettext_setup(), use this function.  It sets
              up  a gettext Translation object and returns it to you.  Then you can access any of
              the  methods  of  the  object  that  you  need  directly.   For  instance,  if  you
              specifically need to access lgettext():

                 translations = get_translation_object('foo')
                 translations.lgettext('My Message')

              This function is similar to the python standard library gettext.translation() but
              improves on it in two ways:

              1. It returns NewGNUTranslations or DummyTranslations objects by default.  These
                 are superior to the gettext.GNUTranslations and gettext.NullTranslations
                 objects because they are consistent in the string type they return and they
                 fix several issues that can cause the python standard library objects to throw
                 UnicodeError.

              2. This function takes multiple directories to search for message catalogs.

              The latter is important when setting up gettext in a portable manner.  There is not
              a common directory for translations across operating systems so one needs  to  look
              in  multiple directories for the translations.  get_translation_object() is able to
              handle that if you give it a list of directories to search for catalogs:

                 translations = get_translation_object('foo', localedirs=(
                      os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
                      os.path.join(sys.prefix, 'lib', 'locale')))

              This will search several different directories:

              1. A directory named locale in  the  same  directory  as  the  module  that  called
                 get_translation_object(),

              2. In /usr/lib/locale

              3. In /usr/share/locale (the fallback directory)

              This  allows  gettext  to  work  on  Windows  and in development (where the message
              catalogs are typically in the toplevel module directory) and  also  when  installed
              under  Linux  (where the message catalogs are installed in /usr/share/locale).  You
              (or  the  system  packager)  just  need  to  install  the   message   catalogs   in
              /usr/share/locale  and  remove  the  locale  directory from the module to make this
              work.  ie:

                 In development:
                     ~/foo   # Toplevel module directory
                     ~/foo/__init__.py
                     ~/foo/locale    # With message catalogs below here:
                     ~/foo/locale/es/LC_MESSAGES/foo.mo

                 Installed on Linux:
                     /usr/lib/python2.7/site-packages/foo
                     /usr/lib/python2.7/site-packages/foo/__init__.py
                     /usr/share/locale/  # With message catalogs below here:
                     /usr/share/locale/es/LC_MESSAGES/foo.mo
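
              The directory search can be approximated with the standard library alone.  The
              helper below is a hypothetical sketch and not kitchen's implementation; it only
              shows the ordering, with the default directory appended so that it is searched
              last:

```python
import gettext
import os
import sys


def find_catalogs(domain, localedirs=(), languages=None):
    """Return every matching .mo file, in the order the directories are given."""
    # Hypothetical helper: the default directory is appended so that it is
    # always searched last, as the text describes.
    search_path = list(localedirs)
    search_path.append(os.path.join(sys.prefix, 'share', 'locale'))

    found = []
    for localedir in search_path:
        # gettext.find() with all=True returns a list of catalog paths.
        found.extend(gettext.find(domain, localedir, languages, all=True))
    return found
```

              With the development layout above, a catalog under the module's own locale
              directory would be listed ahead of one installed system wide.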

              NOTE:
                 This function will set up Translation objects that attempt to look up msgids in
                 all  of  the found message catalogs.  This means if you have several versions of
                 the message catalogs  installed  in  different  directories  that  the  function
                 searches,  you  need  to  make sure that localedirs specifies the directories so
                 that newer message catalogs are searched first.  It also means that if  a  newer
                 catalog  does  not  contain a translation for a msgid but an older one that's in
                 localedirs does, the translation from that older catalog will be returned.
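
              The lookup order described in the note can be illustrated with the fallback
              chain that the standard library translation classes already provide.  In this
              sketch DictTranslations is a hypothetical stand-in for a message catalog, backed
              by a plain dictionary so that no .mo files are needed:

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a message catalog, backed by a dict."""

    def __init__(self, messages):
        gettext.NullTranslations.__init__(self)
        self._messages = messages

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # NullTranslations.gettext() asks the fallback if one was added,
        # otherwise it returns the message unchanged.
        return gettext.NullTranslations.gettext(self, message)


newer = DictTranslations({'Hello': 'Hola'})
older = DictTranslations({'Hello': 'Buenas', 'Goodbye': 'Adios'})
newer.add_fallback(older)
```

              Here newer.gettext('Hello') is answered by the first catalog, while
              newer.gettext('Goodbye') falls through to the older one.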

              Changed in version kitchen-1.1.0: ; API kitchen.i18n 2.1.0 Add more  parameters  to
              get_translation_object()  so  it  can  more  easily  be  used  as a replacement for
              gettext.translation().  Also change the way we use localedirs.   We  cycle  through
              them  until we find a suitable locale file rather than simply cycling through until
              we find a directory that exists.  The new code  is  based  heavily  on  the  python
              standard library gettext.translation() function.

              Changed  in  version  kitchen-1.2.0:  ;  API  kitchen.i18n  2.2.0  Add  python2_api
              parameter

   Translation Objects
       The standard translation objects from the gettext module suffer from several problems:

       • They can throw UnicodeError

       • They can't find translations for non-ASCII byte str messages

       • They may return either unicode string or byte str from the same function even though the
         functions say they will only return unicode or only return byte str.

       DummyTranslations and NewGNUTranslations were written to fix these issues.

       class kitchen.i18n.DummyTranslations(fp=None, python2_api=True)
              Safer version of gettext.NullTranslations

              This Translations class doesn't translate the strings and is intended to be used as
              a fallback when there were errors setting up  a  real  Translations  object.   It's
              safer than gettext.NullTranslations in its handling of byte bytes vs str strings.

              Unlike  NullTranslations,  this  Translation class will never throw a UnicodeError.
              The  code  that  you  have  around  a  call  to  DummyTranslations  might  throw  a
              UnicodeError  but  at  least  that  will be in code you control and can fix.  Also,
              unlike NullTranslations all of  this  Translation  object's  methods  guarantee  to
              return  byte  bytes except for ugettext() and ungettext() which guarantee to return
              str strings.

              When byte bytes are returned,  the  strings  will  be  encoded  according  to  this
              algorithm:

              1. If a fallback has been added, the fallback will be called first.  You'll need to
                 consult the fallback to see whether it performs any encoding changes.

              2. If a byte bytes was given, the same byte bytes will be returned.

              3. If a str string was given and  set_output_charset()  has  been  called  then  we
                 encode the string using the output_charset

              4. If a str string was given and this is gettext() or ngettext() and _charset was
                 set, output in that charset.

              5. If a str string was given and this is gettext() or ngettext() we encode it using
                 'utf-8'.

              6. If  a str string was given and this is lgettext() or lngettext() we encode using
                 the value of locale.getpreferredencoding()

              For ugettext() and ungettext(), we go through  the  same  set  of  steps  with  the
              following differences:

              • We transform byte bytes into str strings for these methods.

              • The  encoding  used  to decode the byte bytes is taken from input_charset if it's
                set, otherwise we decode using UTF-8.
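
              The order of precedence in steps 2 through 6 can be sketched as follows.  The
              function and parameter names are hypothetical, and step 1 (consulting a
              fallback object) is left out to keep the example short:

```python
import locale


def choose_charset(method, output_charset=None, catalog_charset=None):
    """Pick the charset for encoding, following steps 3 through 6 above."""
    if output_charset:
        return output_charset
    if method in ('gettext', 'ngettext'):
        return catalog_charset or 'utf-8'
    # lgettext() and lngettext() use the locale's preferred encoding.
    return locale.getpreferredencoding()


def encode_result(message, method, output_charset=None, catalog_charset=None):
    # Step 2: byte strings are passed through untouched.
    if isinstance(message, bytes):
        return message
    charset = choose_charset(method, output_charset, catalog_charset)
    return message.encode(charset, 'replace')
```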

              input_charset
                     is an extension to the python standard library gettext that  specifies  what
                     charset  a  message  is  encoded in when decoding a message to str.  This is
                     used for two purposes:

              1. If the message string is a byte bytes, this is used to decode the  string  to  a
                 str string before looking it up in the message catalog.

              2. In  ugettext()  and ungettext() methods, if a byte bytes is given as the message
                 and is untranslated, this is used as the encoding when decoding to str.  This is
                 different  from  _charset  which  may  be  set  when a message catalog is loaded
                 because input_charset is used to describe an encoding used in  a  python  source
                 file while _charset describes the encoding used in the message catalog file.

              Any  characters  that aren't able to be transformed from a byte bytes to str string
              or vice versa will be replaced with a replacement character (ie:  u'�'  in  unicode
              based encodings, '?' in other ASCII compatible encodings).
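
              The two replacement characters mentioned above are the ones produced by the
              'replace' error handler of the built-in codecs, as this short sketch shows:

```python
# Decoding invalid bytes: the undecodable byte becomes U+FFFD.
decoded = b'caf\xe9'.decode('utf-8', 'replace')

# Encoding to a charset that lacks the character: it becomes '?'.
encoded = u'caf\xe9'.encode('ascii', 'replace')
```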

              SEE ALSO:

                 gettext.NullTranslations
                        For information about what methods are available and what they do.

              Changed in version kitchen-1.1.0: ; API kitchen.i18n 2.1.0

              • Although we had adapted gettext(), ngettext(), lgettext(), and lngettext() to
                always return byte bytes, we hadn't forced those byte bytes to always be in a
                specified charset.  We now make sure that gettext() and ngettext() return byte
                bytes encoded using output_charset if set, otherwise charset and if neither of
                those, UTF-8.  With lgettext() and lngettext() output_charset if set, otherwise
                locale.getpreferredencoding().

              • Make setting input_charset and output_charset also set those attributes on any
                fallback translation objects.

              Changed  in  version  kitchen-1.2.0:  ;  API  kitchen.i18n  2.2.0  Add  python2_api
              parameter to __init__()

              set_output_charset(charset)
                     Set the output charset

                     This        serves         two         purposes.          The         normal
                     gettext.NullTranslations.set_output_charset()  does  not  set  the output on
                     fallback objects.  On  python-2.3,  gettext.NullTranslations  objects  don't
                     contain this method.

       class kitchen.i18n.NewGNUTranslations(fp=None, python2_api=True)
              Safer version of gettext.GNUTranslations

              gettext.GNUTranslations suffers from two problems that this class fixes.

              1. gettext.GNUTranslations       can       throw       a       UnicodeError      in
                 gettext.GNUTranslations.ugettext() if the message being translated has non-ASCII
                 characters and there is no translation for it.

              2. gettext.GNUTranslations       can       return       byte       bytes       from
                 gettext.GNUTranslations.ugettext() and str  strings  from  the  other  gettext()
                 methods if the message being translated is the wrong type.

              When  byte  bytes  are  returned,  the  strings  will  be encoded according to this
              algorithm:

              1. If a fallback has been added, the fallback will be called first.  You'll need to
                 consult the fallback to see whether it performs any encoding changes.

              2. If a byte bytes was given, the same byte bytes will be returned.

              3. If  a  str  string  was  given  and set_output_charset() has been called then we
                 encode the string using the output_charset

              4. If a str string was given and this is gettext() or ngettext() and a charset  was
                 detected when parsing the message catalog, output in that charset.

              5. If a str string was given and this is gettext() or ngettext() we encode it using
                 UTF-8.

              6. If a str string was given and this is lgettext() or lngettext() we encode  using
                 the value of locale.getpreferredencoding()

              For  ugettext()  and  ungettext(),  we  go  through  the same set of steps with the
              following differences:

              • We transform byte bytes into str strings for these methods.

              • The encoding used to decode the byte bytes is taken from  input_charset  if  it's
                set, otherwise we decode using UTF-8

              input_charset
                     is an extension to the python standard library gettext that specifies what
                     charset a message is encoded in when decoding a message  to  str.   This  is
                     used for two purposes:

              1. If  the  message  string is a byte bytes, this is used to decode the string to a
                 str string before looking it up in the message catalog.

              2. In ugettext() and ungettext() methods, if a byte bytes is given as  the  message
                 and is untranslated, this is used as the encoding when decoding to str.  This is
                 different from the _charset parameter that may be set when a message catalog  is
                 loaded  because  input_charset  is used to describe an encoding used in a python
                 source file while _charset describes the encoding used in  the  message  catalog
                 file.

              Any  characters  that aren't able to be transformed from a byte bytes to str string
              or vice versa will be replaced with a replacement character (ie:  u'�'  in  unicode
              based encodings, '?' in other ASCII compatible encodings).

              SEE ALSO:

                 gettext.GNUTranslations.gettext
                        For information about what methods this class has and what they do

              Changed  in version kitchen-1.1.0: ; API kitchen.i18n 2.1.0 Although we had adapted
              gettext(), ngettext(), lgettext(), and lngettext() to always return byte bytes,  we
              hadn't  forced  those  byte bytes to always be in a specified charset.  We now make
              sure that gettext() and ngettext() return byte bytes encoded  using  output_charset
              if  set,  otherwise  charset  and  if neither of those, UTF-8.  With lgettext() and
              lngettext() output_charset if set, otherwise locale.getpreferredencoding().

   Kitchen.text: unicode and utf8 and xml oh my!
       The kitchen.text module contains functions that deal with text manipulation.

   Kitchen.text.converters
       Functions to handle conversion of byte bytes and str strings.

       Changed in version kitchen: 0.2a2 ; API kitchen.text 2.0.0 Added getwriter()

       Changed in version kitchen: 0.2.2  ; API kitchen.text 2.1.0 Added  exception_to_unicode(),
       exception_to_bytes(), EXCEPTION_CONVERTERS, and BYTE_EXCEPTION_CONVERTERS

       Changed    in    version    kitchen:   1.0.1   ;   API   kitchen.text   2.1.1   Deprecated
       BYTE_EXCEPTION_CONVERTERS    as    we've     simplified     exception_to_unicode()     and
       exception_to_bytes() to make it unnecessary

   Byte Strings and Unicode in Python2
       Python2 has two string types, str and unicode.  unicode represents an abstract sequence of
       text characters.  It can hold any character that is present in the unicode standard.   str
       can hold any byte of data.  The operating system and python work together to display these
       bytes as characters in many cases but you should always keep in mind that the  information
       is  really  a sequence of bytes, not a sequence of characters.  In python2 these types are
       interchangeable a large amount of the time.  They are one of the few pairs of  types  that
       automatically convert when used in equality:

          >>> # string is converted to unicode and then compared
          >>> "I am a string" == u"I am a string"
          True
          >>> # Other types, like int, don't have this special treatment
          >>> 5 == "5"
          False

       However,  this  automatic  conversion tends to lull people into a false sense of security.
       As long as you're dealing with ASCII characters the automatic  conversion  will  save  you
       from  seeing  any differences.  Once you start using characters that are not in ASCII, you
       will start getting UnicodeError and UnicodeWarning as the  automatic  conversions  between
       the types fail:

          >>> "I am an ñ" == u"I am an ñ"
          __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
          False

       Why  do these conversions fail?  The reason is that the python2 unicode type represents an
       abstract sequence of unicode text known as code points.  str, on the  other  hand,  really
       represents  a  sequence  of  bytes.  Those bytes are converted by your operating system to
       appear as characters on your screen using a particular encoding (usually  with  a  default
       defined  by  the operating system and customizable by the individual user.) Although ASCII
       characters are fairly standard in what bytes represent each character, the  bytes  outside
       of the ASCII range are not.  In general, each encoding will map a different character to a
       particular byte.  Newer encodings map individual characters to multiple bytes  (which  the
       older  encodings  will  instead  treat  as  multiple  characters).   In  the face of these
       differences, python refuses to guess at an  encoding  and  instead  issues  a  warning  or
       exception and refuses to convert.
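
       The ambiguity is easy to demonstrate with the built-in codecs.  In this sketch the
       choice of latin-1 and koi8-r is arbitrary; any two legacy encodings that disagree about
       a byte would do:

```python
raw = b'\xf1'

# The same byte is a different character in different legacy encodings.
as_latin1 = raw.decode('latin-1')
as_koi8r = raw.decode('koi8-r')

# ASCII has no mapping for bytes above 0x7f, so decoding fails.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# A multi-byte UTF-8 character read as latin-1 turns into two characters.
misread = u'\xf1'.encode('utf-8').decode('latin-1')
```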

       SEE ALSO:

          Overcoming frustration: Correctly using unicode in python2
                 For a longer introduction on this subject.

   Strategy for Explicit Conversion
       So  what is the best method of dealing with this weltering babble of incoherent encodings?
       The basic strategy is to explicitly turn everything into unicode when it first enters your
       program.  Then, when you send it to output, you can transform the unicode back into bytes.
       Doing this allows you to control the encodings that are used and avoid getting  tracebacks
       due to UnicodeError. Using the functions defined in this module, that looks something like
       this:

          >>> from kitchen.text.converters import to_unicode, to_bytes
          >>> name = raw_input('Enter your name: ')
          Enter your name: Toshio くらとみ
          >>> name
          'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'
          >>> type(name)
          <type 'str'>
          >>> unicode_name = to_unicode(name)
          >>> type(unicode_name)
          <type 'unicode'>
          >>> unicode_name
          u'Toshio \u304f\u3089\u3068\u307f'
          >>> # Do a lot of other things before needing to save/output again:
          >>> output = open('datafile', 'w')
          >>> output.write(to_bytes(u'Name: %s\n' % unicode_name))

       A few notes:

       Looking at line 6, you'll notice that the input we took from the user was a byte str.   In
       general,  anytime  we're  getting  a value from outside of python (The filesystem, reading
       data from the network, interacting with an  external  command,  reading  values  from  the
       environment) we are interacting with something that will want to give us a byte str.  Some
       python standard library modules and third party libraries will  automatically  attempt  to
       convert  a  byte str to unicode strings for you.  This is both a boon and a curse.  If the
       library can guess correctly about the encoding that the data is in, it will return unicode
       objects  to  you without you having to convert.  However, if it can't guess correctly, you
       may end up with one of several problems:

       UnicodeError
              The library attempted to decode a byte str into a unicode string, failed, and
              raised an exception.

       Garbled data
              If  the  library  returns  the  data after decoding it with the wrong encoding, the
              characters you see in the unicode string won't be the ones that you expect.

       A byte str instead of unicode string
              Some libraries will return a unicode string when they're able to  decode  the  data
              and  a  byte  str  when they can't.  This is generally the hardest problem to debug
              when it occurs.  Avoid it in your own code and try to avoid or  open  bugs  against
              upstreams  that do this. See Designing Unicode Aware APIs for strategies to do this
              properly.

       On line 8, we convert from a byte str to a unicode string.  to_unicode() does this for us.
       It  has  some error handling and sane defaults that make this a nicer function to use than
       calling str.decode() directly:

       • Instead of defaulting to the ASCII encoding which fails with all but the simple American
         English characters, it defaults to UTF-8.

       • Instead of raising an error if it cannot decode a value, it will replace the value with
         the unicode "Replacement character" symbol (U+FFFD, �).

       • If you happen to call this method with something that is not a str or unicode,  it  will
         return an empty unicode string.

       All  three  of  these can be overridden using different keyword arguments to the function.
       See the to_unicode() documentation for more information.
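
       The three defaults listed above can be mimicked with a few lines of standard library
       code.  This is a hypothetical stand-in written for illustration, not the actual
       to_unicode() implementation, and it ignores the keyword arguments that the real
       function offers for overriding the behaviour:

```python
def to_unicode_sketch(obj, encoding='utf-8', errors='replace'):
    """Hypothetical stand-in mirroring the three defaults listed above."""
    text_type = type(u'')
    if isinstance(obj, text_type):
        # Already text: return it unchanged.
        return obj
    if isinstance(obj, bytes):
        # Decode as UTF-8 and substitute U+FFFD for undecodable bytes.
        return obj.decode(encoding, errors)
    # Anything that is not a string yields an empty text string.
    return text_type()
```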

       On line 15 we push the data back out to a file.  Two things you should note here:

       1. We deal with the strings as unicode until the last instant.   The  string  format  that
          we're  using is unicode and the variable also holds unicode.  People sometimes get into
          trouble when they mix a byte str format with a variable that holds a unicode string (or
          vice versa) at this stage.

       2. to_bytes() does the reverse of to_unicode().  In this case, we're using the default
          values which turn unicode into a byte str using UTF-8.  Any errors are replaced with a
          replacement character and sending nonstring objects yields empty strings.  Just like
          to_unicode(), you can look at the documentation for to_bytes() to find out how to
          override any of these defaults.
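
       The whole round trip (decode on the way in, work with unicode, encode on the way out)
       can be sketched with the built-in codecs.  An in-memory buffer stands in for the data
       file here, and the input bytes are hard-coded rather than read from a user:

```python
import io

# Bytes as they would arrive from outside the program.
raw_name = b'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'

# Decode once, at the boundary.
name = raw_name.decode('utf-8', 'replace')

# Both the format string and the variable are unicode at this point.
line = u'Name: %s\n' % name

# Encode once, at the last instant before writing.
output = io.BytesIO()
output.write(line.encode('utf-8', 'replace'))
written = output.getvalue()
```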

   When to use an alternate strategy
       The  default strategy of decoding to unicode strings when you take data in and encoding to
       a byte str when you send the data back out works great for most problems but there  are  a
       few times when you shouldn't:

       • The values aren't meant to be read as text

       • The  values need to be byte-for-byte when you send them back out -- for instance if they
         are database keys or filenames.

       • You are transferring the data between several libraries that all expect byte str.

       In each of these instances, there is a reason to keep around the byte  str  version  of  a
       value.  Here are a few hints to keep your sanity in these situations:

       1. Keep  your unicode and str values separate.  Just like the pain caused when you have to
          use someone else's library that returns both unicode and str  you  can  cause  yourself
          pain  if  you  have  functions  that can return both types or variables that could hold
          either type of value.

       2. Name your variables so that you can tell whether you're storing  byte  str  or  unicode
          string.   One  of  the first things you end up having to do when debugging is determine
          what type of string you have in a variable and what type of string you  are  expecting.
          Naming your variables consistently so that you can tell which type they are supposed to
          hold will save you from at least one of those steps.

       3. When you get values initially, make sure that you're dealing with  the  type  of  value
          that  you  expect  as  you  save  it.   You  can  use  isinstance() or to_bytes() since
          to_bytes() doesn't do any modifications of the string if  it's  already  a  str.   When
          using to_bytes() for this purpose you might want to use:

             try:
                 b_input = to_bytes(input_should_be_bytes_already, errors='strict', nonstring='strict')
             except Exception:
                 handle_errors_somehow()

          The  reason  is that the default of to_bytes() will take characters that are illegal in
          the chosen encoding and transform them to replacement characters.  Since the  point  of
          keeping  this  data  as  a  byte  str  is to keep the exact same bytes when you send it
          outside of your code, changing things to replacement characters should be raising red
          flags  that something is wrong.  Setting errors to strict will raise an exception which
          gives you an opportunity to fail gracefully.

       4. Sometimes you will want to print out the values that you have in your byte  str.   When
          you  do  this  you  will  need  to  make  sure that you transform unicode to str before
          combining them.  Also be sure that any other function  calls  (including  gettext)  are
          going to give you strings that are the same type.  For instance:

             print to_bytes(_('Username: %(user)s'), 'utf-8') % {'user': b_username}
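
       The difference that hint 3 relies on can be seen with nothing but the standard library.
       The following is a minimal sketch written for Python 3, where str plays the role of
       python2's unicode and bytes the role of byte str.  It does not import kitchen, and the
       helper name encode_or_fail is made up for illustration.

```python
def encode_or_fail(text, encoding='ascii'):
    """Encode text strictly; return None instead of silently losing data."""
    try:
        return text.encode(encoding, 'strict')
    except UnicodeEncodeError:
        # This is the place to fail gracefully (log, skip the record, ...).
        return None


# 'replace' hides the problem by substituting '?', so the bytes that leave
# the program no longer match the value that came in.
lossy = 'caf\xe9'.encode('ascii', 'replace')

# 'strict' surfaces the problem instead of papering over it.
strict_result = encode_or_fail('caf\xe9')
clean_result = encode_or_fail('cafe')
```

       Returning None is only one way to react; re-raising with more context or skipping the
       record may suit your program better.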

   Gotchas and how to avoid them
       Even  when  you have a good conceptual understanding of how python2 treats unicode and str
       there are still some things that can surprise you.  In most  cases  this  is  because,  as
       noted  earlier, python or one of the python libraries you depend on is trying to convert a
       value automatically and failing.  Explicit conversion at  the  appropriate  place  usually
       solves that.

   str(obj)
       One common idiom for getting a simple, string representation of an object is to use:

          str(obj)

       Unfortunately,  this  is  not safe.  Sometimes str(obj) will return unicode.  Sometimes it
       will return a byte str.  Sometimes, it will attempt to convert from a unicode string to  a
       byte  str,  fail,  and  throw  a UnicodeError.  To be safe from all of these, first decide
       whether you need unicode or str to be returned.  Then use to_unicode()  or  to_bytes()  to
       get the simple representation like this:

          u_representation = to_unicode(obj, nonstring='simplerepr')
          b_representation = to_bytes(obj, nonstring='simplerepr')

   print
       python  has  a  builtin  print()  statement  that  outputs  strings to the terminal.  This
       originated in a time when python only dealt with byte  str.   When  unicode  strings  came
       about,  some  enhancements were made to the print() statement so that it could print those
       as well.  The enhancements make print() work most of the time.  However, the times when it
       doesn't work tend to make for cryptic debugging.

       The  basic  issue  is that print() has to figure out what encoding to use when it prints a
       unicode string to the terminal.  When python is attached  to  your  terminal  (ie,  you're
       running  the  interpreter or running a script that prints to the screen) python is able to
       take the encoding value from your  locale  settings  LC_ALL  or  LC_CTYPE  and  print  the
       characters  allowed  by that encoding.  On most modern Unix systems, the encoding is utf-8
       which means that you can print any unicode character without problem.

       There are two common cases of things going wrong:

       1. Someone has a locale set that does  not  accept  all  valid  unicode  characters.   For
          instance:

             $ LC_ALL=C python
             >>> print u'\ufffd'
             Traceback (most recent call last):
               File "<stdin>", line 1, in <module>
             UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

          This  often happens when a script that you've written and debugged from the terminal is
          run from an automated environment like cron.  It also occurs when you  have  written  a
          script  using  a  utf-8 aware locale and released it for consumption by people all over
          the internet.  Inevitably, someone is running with  a  locale  that  can't  handle  all
          unicode characters and you get a traceback reported.

       2. You redirect output to a file.  Python isn't using the values in LC_ALL unconditionally
          to decide what encoding to use.  Instead it is using the encoding set for the  terminal
          you  are  printing  to  which  is  set to accept different encodings by LC_ALL.  If you
          redirect to a file, you are no longer printing to the terminal so LC_ALL won't have any
          effect.  At this point, python will decide it can't find an encoding and fall back to
          ASCII which will likely lead to UnicodeError being raised.  You can see this in a short
          script:

             #! /usr/bin/python -tt
             print u'\ufffd'

          And then look at the difference between running it normally and redirecting to a file:

             $ ./test.py
             �
             $ ./test.py > t
             Traceback (most recent call last):
               File "test.py", line 2, in <module>
                   print u'\ufffd'
             UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

       The short answer to dealing with this is to always use bytes when writing output.  You can
       do this by explicitly converting to bytes like this:

          from kitchen.text.converters import to_bytes
          u_string = u'\ufffd'
          print to_bytes(u_string)

       or you can wrap stdout and stderr with a StreamWriter.  A StreamWriter  is  convenient  in
       that  you  can  assign  it  to  encode  for  sys.stdout or sys.stderr and then have output
       automatically converted but it has the drawback of still being able to throw  UnicodeError
       if the writer can't encode all possible unicode codepoints.  Kitchen provides an alternate
       version which can be retrieved with  kitchen.text.converters.getwriter()  which  will  not
       traceback in its standard configuration.
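
       The behaviour of the stock StreamWriter can be tried out with the standard library alone.
       This is a sketch for Python 3 that uses io.BytesIO as a stand-in for a redirected
       sys.stdout.  It shows codecs.getwriter(), not kitchen's getwriter(), so the lenient
       behaviour has to be requested explicitly with errors='replace'.

```python
import codecs
import io

raw = io.BytesIO()  # stands in for a redirected sys.stdout

# The stock writer defaults to errors='strict' and raises on unencodable text.
strict_writer = codecs.getwriter('ascii')(raw)
try:
    strict_writer.write('caf\xe9\n')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Asking for errors='replace' trades the traceback for a '?' placeholder.
lenient_writer = codecs.getwriter('ascii')(raw, errors='replace')
lenient_writer.write('caf\xe9\n')
output = raw.getvalue()
```

       Nothing reaches the stream when the strict writer fails, because encoding happens before
       the underlying write.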

   Unicode, str, and dict keys
       The  hash()  of  the  ASCII characters is the same for unicode and byte str.  When you use
       them in dict keys, they evaluate to the same dictionary slot:

          >>> u_string = u'a'
          >>> b_string = 'a'
          >>> hash(u_string), hash(b_string)
          (12416037344, 12416037344)
          >>> d = {}
          >>> d[u_string] = 'unicode'
          >>> d[b_string] = 'bytes'
          >>> d
          {u'a': 'bytes'}

       When you deal with key values outside of ASCII, unicode and byte str evaluate unequally no
       matter what their character content or hash value:

          >>> u_string = u'ñ'
          >>> b_string = u_string.encode('utf-8')
          >>> print u_string
          ñ
          >>> print b_string
          ñ
          >>> d = {}
          >>> d[u_string] = 'unicode'
          >>> d[b_string] = 'bytes'
          >>> d
          {u'\xf1': 'unicode', '\xc3\xb1': 'bytes'}
          >>> b_string2 = '\xf1'
          >>> hash(u_string), hash(b_string2)
          (30848092528, 30848092528)
          >>> d = {}
          >>> d[u_string] = 'unicode'
          >>> d[b_string2] = 'bytes'
          >>> d
          {u'\xf1': 'unicode', '\xf1': 'bytes'}

       How  do  you work with this one?  Remember rule #1:  Keep your unicode and byte str values
       separate.  That goes for keys in a dictionary just like anything else.

       • For any given dictionary, make sure that all your keys are either unicode  or  str.   Do
         not  mix  the  two.   If  you're  being given both unicode and str but you don't need to
         preserve separate keys for each, I recommend using to_unicode() or to_bytes() to convert
         all keys to one type or the other like this:

            >>> from kitchen.text.converters import to_unicode
            >>> u_string = u'one'
            >>> b_string = 'two'
            >>> d = {}
            >>> d[to_unicode(u_string)] = 1
            >>> d[to_unicode(b_string)] = 2
            >>> d
            {u'two': 2, u'one': 1}

       • These issues also apply to using dicts with tuple keys that contain a mixture of unicode
         and str.  Once again the best fix is to standardise on either str or unicode.

       • If you absolutely need to store values in a dictionary where the keys  could  be  either
         unicode  or  str  you  can use StrictDict which has separate entries for all unicode and
         byte str and deals correctly with any tuple containing mixed unicode and byte str.
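
       The first recommendation above can be sketched as a small helper that converts every
       key to text before the dictionary is used.  This is an illustration for Python 3 (str
       for text, bytes for byte str) using only the standard library; normalize_keys is a
       hypothetical name, not part of kitchen.

```python
def normalize_keys(mapping, encoding='utf-8'):
    """Return a new dict whose keys are all text (str in Python 3)."""
    normalized = {}
    for key, value in mapping.items():
        if isinstance(key, bytes):
            # Decode byte keys so they compare equal to their text twins.
            key = key.decode(encoding, 'replace')
        normalized[key] = value
    return normalized


mixed = {'one': 1, b'two': 2, 'caf\xe9': 3, 'caf\xe9'.encode('utf-8'): 4}
clean = normalize_keys(mixed)
```

       Keys that differ only in type collapse into one entry and the later value wins, so this
       is only appropriate when you do not need to preserve separate keys for each type.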

   Functions
   Unicode and byte str conversion
       kitchen.text.converters.to_unicode(obj,        encoding='utf-8',         errors='replace',
       nonstring=None, non_string=None)
              Convert an object into a unicode string

              Parameters
                     • obj -- Object to convert to a unicode string.  This should normally be a
                       byte str.

                     • encoding -- What encoding to try converting the byte str as.  Defaults
                       to utf-8.

                     • errors -- If errors are found while decoding, perform this action.
                       Defaults to replace which replaces the invalid bytes with a character that
                       means the bytes were unable to be decoded.  Other values are the same as
                       the error handling schemes in the codec base classes.  For instance strict
                       which raises an exception and ignore which simply omits the non-decodable
                       characters.

                     • nonstring --

                       How to treat nonstring values.  Possible values are:

                       simplerepr
                              Attempt to call the object's  "simple  representation"  method  and
                              return  that value.  Python-2.3+ has two methods that try to return
                              a simple representation: object.__unicode__() and object.__str__().
                              We  first  try to get a usable value from object.__unicode__().  If
                              that fails we try the same with object.__str__().

                       empty  Return an empty unicode string

                       strict Raise a TypeError

                       passthru
                              Return the object unchanged

                       repr   Attempt to return a unicode string of the repr of the object

                       Default is simplerepr.

                     • non_string -- Deprecated.  Use nonstring instead.

              Raises
                     • TypeError -- if nonstring is strict and a non-basestring object is passed
                       in or if nonstring is set to an unknown value

                     • UnicodeDecodeError -- if errors is strict and obj is not decodable using
                       the given encoding

              Returns
                     unicode string or the original object depending on the value of nonstring.

              Usually this should be used on a byte str but it can take both byte str and unicode
              strings intelligently.  Nonstring objects are handled in different ways depending
              on the setting of the nonstring parameter.

              The default values of this function are set so as to always return a unicode string
              and never raise an error when converting from a byte str to a unicode string.
              However, when you do not pass validly encoded text (or a nonstring object), you may
              end up with output that you don't expect.  Be sure you understand the requirements
              of your data, not just ignore errors by passing it through this function.

              Changed in version 0.2.1a2: Deprecated non_string in favor of  nonstring  parameter
              and changed default value to simplerepr
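
              The nonstring strategies described above can be pictured with a small stand-in.
              This is a rough sketch for Python 3 built only on the standard library; to_text
              is a hypothetical name, and the code is not kitchen's implementation (for
              instance, simplerepr is reduced to a plain str() call).

```python
def to_text(obj, encoding='utf-8', errors='replace', nonstring='simplerepr'):
    """Illustrative stand-in for the documented to_unicode() behaviour."""
    if isinstance(obj, str):
        return obj                      # already text: returned untouched
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    # Everything below handles non-string objects.
    if nonstring == 'simplerepr':
        return str(obj)
    if nonstring == 'empty':
        return ''
    if nonstring == 'passthru':
        return obj
    if nonstring == 'repr':
        return repr(obj)
    if nonstring == 'strict':
        raise TypeError('expected a string, got %r' % (type(obj),))
    raise TypeError('unknown nonstring value: %r' % (nonstring,))
```

              With the defaults, undecodable bytes turn into the replacement character rather
              than raising, which is exactly the case the paragraph above warns about.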

       kitchen.text.converters.to_bytes(obj,  encoding='utf-8', errors='replace', nonstring=None,
       non_string=None)
              Convert an object into a byte str

              Parameters
                     • obj -- Object to convert to a byte str.  This should normally be a unicode
                       string.

                     • encoding -- Encoding to use to convert the unicode string into a byte str.
                       Defaults to utf-8.

                     • errors --

                       If errors are found while encoding, perform this action.  Defaults to
                       replace which replaces the unencodable characters with a character that
                       means they were unable to be encoded.  Other values are the same as the
                       error handling schemes in the codec base classes.  For instance strict
                       which raises an exception and ignore which simply omits the non-encodable
                       characters.

                     • nonstring --

                       How to treat nonstring values.  Possible values are:

                       simplerepr
                              Attempt  to  call  the  object's "simple representation" method and
                              return that value.  Python-2.3+ has two methods that try to  return
                              a simple representation: object.__unicode__() and object.__str__().
                              We first try to get a usable value from object.__str__().  If  that
                              fails we try the same with object.__unicode__().

                       empty  Return an empty byte str

                       strict Raise a TypeError

                       passthru
                              Return the object unchanged

                       repr   Attempt to return a byte str of the repr() of the object

                       Default is simplerepr.

                     • non_string -- Deprecated.  Use nonstring instead.

              Raises
                     • TypeError -- if nonstring is strict and a non-basestring object is passed
                       in or if nonstring is set to an unknown value.

                     • UnicodeEncodeError -- if errors is strict and obj cannot be encoded using
                       encoding.

              Returns
                     byte str or the original object depending on the value of nonstring.

              WARNING:
                 If you pass a byte str into this function the byte str is returned
                 unmodified.  It is not re-encoded with the specified encoding.  The easiest way
                 to achieve that is:

                     to_bytes(to_unicode(text), encoding='utf-8')

                 The initial to_unicode() call will ensure text is a unicode string.  Then,
                 to_bytes() will turn that into a byte str with the specified encoding.

              Usually, this should be used on a unicode string but it can take either a byte str
              or a unicode string intelligently.  Nonstring objects are handled in different ways
              depending on the setting of the nonstring parameter.

              The default values of this function are set so as to always return a byte str and
              never raise an error when converting from unicode to bytes.  However, when you do
              not pass an encoding that can validly encode the object (or a  non-string  object),
              you  may  end  up  with  output  that you don't expect.  Be sure you understand the
              requirements of your data, not just  ignore  errors  by  passing  it  through  this
              function.

              Changed  in  version 0.2.1a2: Deprecated non_string in favor of nonstring parameter
              and changed default value to simplerepr

       kitchen.text.converters.getwriter(encoding)
              Return a codecs.StreamWriter that resists tracing back.

              Parameters
                     encoding -- Encoding to use for transforming unicode strings into byte str.

              Return type
                     codecs.StreamWriter

              Returns
                     StreamWriter that you can instantiate to wrap output streams to
                     automatically translate unicode strings into encoding.

              This is a reimplementation of codecs.getwriter() that returns a StreamWriter that
              resists issuing tracebacks.  The StreamWriter that is returned uses
              kitchen.text.converters.to_bytes() to convert unicode strings into byte str.  The
              departures from codecs.getwriter() are:

              1. The StreamWriter that is returned will take byte str as well as unicode strings.
                 Any byte str will be passed through unmodified.

              2. The default error handler for unknown bytes is to replace the bytes with the
                 unknown character (? in most ascii-based encodings, � in the utf encodings)
                 whereas codecs.getwriter() defaults to strict.  Like codecs.StreamWriter, the
                 returned StreamWriter can have its error handler changed in code by setting
                 stream.errors = 'new_handler_name'

              Example usage:

                 $ LC_ALL=C python
                 >>> import sys
                 >>> from kitchen.text.converters import getwriter
                 >>> UTF8Writer = getwriter('utf-8')
                 >>> unwrapped_stdout = sys.stdout
                 >>> sys.stdout = UTF8Writer(unwrapped_stdout)
                 >>> print 'caf\xc3\xa9'
                 café
                 >>> print u'caf\xe9'
                 café
                 >>> ASCIIWriter = getwriter('ascii')
                 >>> sys.stdout = ASCIIWriter(unwrapped_stdout)
                 >>> print 'caf\xc3\xa9'
                 café
                 >>> print u'caf\xe9'
                 caf?

              SEE ALSO:
                 API  docs  for codecs.StreamWriter and codecs.getwriter() and Print Fails on the
                 python wiki.

              New in version kitchen: 0.2a2, API: kitchen.text 1.1.0

       kitchen.text.converters.to_str(obj)
              Deprecated

              This function converts something to a byte str if it isn't one.  It's used to
              call str() or unicode() on the object to get its simple representation without
              danger of getting a UnicodeError.  You should be using to_unicode() or to_bytes()
              explicitly instead.

              If you need unicode strings:

                 to_unicode(obj, nonstring='simplerepr')

              If you need byte str:

                 to_bytes(obj, nonstring='simplerepr')

       kitchen.text.converters.to_utf8(obj, errors='replace', non_string='passthru')
              Deprecated

              Convert unicode to an encoded utf-8 byte str.  You should be using to_bytes()
              instead:

                 to_bytes(obj, encoding='utf-8', nonstring='passthru')

   Transformation to XML
       kitchen.text.converters.unicode_to_xml(string,       encoding='utf-8',       attrib=False,
       control_chars='replace')
              Take a unicode string and turn it into a byte str suitable for xml

              Parameters
                     • string -- unicode string to encode into an XML compatible byte str

                     • encoding -- encoding to use for the returned byte str.  Default is to
                       encode to UTF-8.  If some of the characters in string are not encodable in
                       this encoding, the unknown characters will be entered into the output
                       string using xml character references.

                     • attrib -- If True, quote the string for use in an xml attribute.  If False
                       (default), quote for use in an xml text field.

                     • control_chars --

                       control  characters  are  not allowed in XML documents.  When we encounter
                       those we need to know what to do.  Valid options are:

                       replace
                              (default) Replace the control characters with ?

                       ignore Remove the characters altogether from the output

                       strict Raise an XmlEncodeError  when we encounter a control character

              Raises
                     • kitchen.text.exceptions.XmlEncodeError -- If control_chars is set to
                       strict and the string to be made suitable for output to xml contains
                       control characters or if string is not a unicode string then we raise this
                       exception.

                     • ValueError -- If control_chars is set to something other than replace,
                       ignore, or strict.

              Return type
                     byte str

              Returns
                     representation of the unicode string as a valid XML byte str

              XML files consist mainly of text encoded using  a  particular  charset.   XML  also
              denies  the  use of certain bytes in the encoded text (example: ASCII Null).  There
              are also special characters that must be escaped if they are present in  the  input
              (example: <).  This function takes care of all of those issues for you.

              There  are  a few different ways to use this function depending on your needs.  The
              simplest invocation is like this:

                 unicode_to_xml(u'String with non-ASCII characters: <"á と">')

              This will return the following to you, encoded in utf-8:

                 'String with non-ASCII characters: &lt;"á と"&gt;'

              Pretty straightforward.  Now, what if you need to encode your document in something
              other than utf-8?  For instance, latin-1?  Let's see:

                 unicode_to_xml(u'String with non-ASCII characters: <"á と">', encoding='latin-1')
                 'String with non-ASCII characters: &lt;"á &#12392;"&gt;'

              Because the と character is not available in the latin-1 charset, it is replaced
              with &#12392; in our output.  This is an xml character reference which represents
              the character at unicode codepoint 12392, the と character.

              When you want to reverse this, use xml_to_unicode() which will turn a byte str
              into a unicode string and replace the xml character references with the unicode
              characters.
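
              The two mechanisms at work here, escaping markup characters and falling back
              to character references, both exist in the standard library.  The sketch below
              is for Python 3 and does not call kitchen; it only shows how escape() from
              xml.sax.saxutils and the xmlcharrefreplace error handler combine.

```python
from xml.sax.saxutils import escape

text = '<"\xe1 \u3068">'

# escape() handles the markup characters &, < and >.  The codec error handler
# 'xmlcharrefreplace' then covers anything the target charset cannot hold.
as_utf8 = escape(text).encode('utf-8', 'xmlcharrefreplace')
as_latin1 = escape(text).encode('latin-1', 'xmlcharrefreplace')
```

              In utf-8 every character is representable, so no character references appear;
              in latin-1 only the character outside the charset is rewritten.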

              XML  also  has  the  quirk  of  not allowing control characters in its output.  The
              control_chars parameter allows us to specify what to do with those.  For use  cases
              that  don't need absolute character by character fidelity (example: holding strings
              that will just be used for display in a  GUI  app  later),  the  default  value  of
              replace works well:

                 unicode_to_xml(u'String with disallowed control chars: \u0000\u0007')
                 'String with disallowed control chars: ??'
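
              The three control_chars strategies can be pictured as follows.  This is a sketch
              for Python 3 under stated assumptions: the set of characters treated as control
              characters is a guess (C0 controls other than tab, newline and carriage return),
              process_controls is a hypothetical name, and it raises ValueError where the
              documented function raises XmlEncodeError.

```python
import re

# Assumption: "control characters" means C0 controls other than tab, newline
# and carriage return.  The exact set kitchen uses may differ.
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def process_controls(text, strategy='replace'):
    """Apply one of the documented strategies to control characters in text."""
    if strategy == 'replace':
        return CONTROL_CHARS.sub('?', text)
    if strategy == 'ignore':
        return CONTROL_CHARS.sub('', text)
    if strategy == 'strict':
        if CONTROL_CHARS.search(text):
            raise ValueError('control character found')
        return text
    raise ValueError('unknown strategy: %r' % (strategy,))
```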

              If  you  do  need  to  be  able  to reproduce all of the characters at a later date
              (examples: if the string is a key value in a database or a path  on  a  filesystem)
              you  have many choices.  Here are a few that rely on utf-7, a verbose encoding that
              encodes control characters (as well as non-ASCII unicode values) to characters from
              within the ASCII printable characters.  The good thing about doing this is that the
              code is pretty simple.  You just need to use utf-7 both when encoding the field for
              xml and when decoding it for use in your python program:

                 unicode_to_xml(u'String with unicode: と and control char: \u0007', encoding='utf7')
                 'String with unicode: +MGg and control char: +AAc-'
                 # [...]
                 xml_to_unicode('String with unicode: +MGg and control char: +AAc-', encoding='utf7')
                 u'String with unicode: と and control char: \u0007'
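
              The property being relied on is that utf-7 output stays within ASCII and
              survives a round trip.  A quick way to convince yourself, using only the codec
              that ships with Python (shown here for Python 3, without kitchen):

```python
original = 'unicode: \u3068 and control char: \x07'

encoded = original.encode('utf-7')   # bytes, every one of them below 128
decoded = encoded.decode('utf-7')    # back to the original text
```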

              As  you  can  see,  the utf-7 encoding will transform even characters that would be
              representable in utf-8.  This can be a drawback if you want unicode  characters  in
              the file to be readable without being decoded first.  You can work around this with
              increased complexity in your application code:

                 encoding = 'utf-8'
                 u_string = u'String with unicode: と and control char: \u0007'
                 try:
                     # First attempt to encode to utf8
                     data = unicode_to_xml(u_string, encoding=encoding, errors='strict')
                 except XmlEncodeError:
                     # Fallback to utf-7
                     encoding = 'utf-7'
                     data = unicode_to_xml(u_string, encoding=encoding, errors='strict')
                 write_tag('<mytag encoding=%s>%s</mytag>' % (encoding, data))
                 # [...]
                 encoding = tag.attributes.encoding
                 u_string = xml_to_unicode(u_string, encoding=encoding)

              Using code similar to that, you can have some fields  encoded  using  your  default
              encoding and fallback to utf-7 if there are control characters present.

              NOTE:
                 If your goal is to preserve the control characters, you cannot simply save the
                 entire file as utf-7 and set the xml encoding parameter to utf-7.  Because XML
                 doesn't allow control characters, you have to encode those separately from any
                 encoding work that the XML parser itself knows about.

              SEE ALSO:

                 bytes_to_xml()
                        if  you're dealing with bytes that are non-text or of an unknown encoding
                        that you must preserve on a byte for byte level.

                 guess_encoding_to_xml()
                        if you're dealing with strings in unknown encodings that you  don't  need
                        to save with char-for-char fidelity.

       kitchen.text.converters.xml_to_unicode(byte_string, encoding='utf-8', errors='replace')
              Transform a byte str from an xml file into a unicode string

              Parameters
                     • byte_string -- byte str to decode

                     • encoding -- encoding that the byte str is in

                     • errors -- What to do if not every character is valid in encoding.  See
                       the to_unicode() documentation for legal values.

              Return type
                     unicode string

              Returns
                     string decoded from byte_string

              This function attempts to reverse what unicode_to_xml() does.  It takes a byte
              str (presumably read in from an xml file) and expands all the html entities into
              unicode characters and decodes the byte str into a unicode string.  One thing it
              cannot do is restore any control characters that were removed prior to inserting
              into the file.  If you need to keep such characters you need to use xml_to_bytes()
              and bytes_to_xml() or use one of the strategies documented in unicode_to_xml()
              instead.
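
              The reverse direction can be approximated with the standard library as well.
              This sketch is for Python 3 and uses html.unescape(), which is an assumption
              made for illustration and not necessarily how kitchen expands entities.

```python
import html

raw = b'&lt;"\xe1 &#12392;"&gt;'   # latin-1 bytes as they might sit in a file

# Decode the bytes first, then expand entities and character references.
text = html.unescape(raw.decode('latin-1', 'replace'))
```

              Note that html.unescape() recognises the full set of HTML named entities, which
              is broader than the five that XML predefines.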

       kitchen.text.converters.byte_string_to_xml(byte_string,            input_encoding='utf-8',
       errors='replace', output_encoding='utf-8', attrib=False, control_chars='replace')
              Make sure a byte str is validly encoded for xml output

              Parameters
                     • byte_string -- Byte str to turn into valid xml output

                     • input_encoding -- Encoding of byte_string.  Default utf-8

                     • errors --

                       How to handle errors encountered while decoding the byte_string into unicode
                       at the beginning of the process.  Values are:

                       replace
                              (default) Replace the invalid bytes with a ?

                       ignore Remove the characters altogether from the output

                       strict Raise an  UnicodeDecodeError  when  we  encounter  a  non-decodable
                              character

                     • output_encoding  --  Encoding  for  the  xml file that this string will go
                       into.  Default is utf-8.  If all the characters  in  byte_string  are  not
                       encodable  in  this  encoding, the unknown characters will be entered into
                       the output string using xml character references.

                     • attrib -- If True, quote the string for use in an xml attribute.  If False
                       (default), quote for use in an xml text field.

                     • control_chars --

                       XML does not allow control characters.  When we encounter those we need to
                       know what to do.  Valid options are:

                       replace
                              (default) Replace the control characters with ?

                       ignore Remove the characters altogether from the output

                       strict Raise an error when we encounter a control character

              Raises
                     • XmlEncodeError -- If control_chars is set to strict and the string to be
                       made suitable for output to xml contains control characters then we raise
                       this exception.

                     • UnicodeDecodeError -- If errors is set to strict and the byte_string
                       contains bytes that are not decodable using input_encoding, this error is
                       raised

              Return type
                     byte str

              Returns
                     representation of the byte str in the output encoding with any bytes that
                     aren't available in xml taken care of.

              Use this when you have a byte str representing text that you need to make
              suitable for output to xml.  There are several cases where this is useful.  For
              instance, if you need to transform some strings encoded in latin-1 to utf-8 for
              output:

                 utf8_string = byte_string_to_xml(latin1_string, input_encoding='latin-1')

              If you already have strings in the proper encoding you may still want to  use  this
              function to remove control characters:

                 cleaned_string = byte_string_to_xml(string, input_encoding='utf-8', output_encoding='utf-8')
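
              Conceptually this is a decode, escape, re-encode pipeline.  The sketch below
              shows that shape for Python 3 with only the standard library.  The name
              transcode_for_xml is hypothetical, and control character handling is left out,
              so it is not a substitute for the function documented here.

```python
from xml.sax.saxutils import escape


def transcode_for_xml(data, input_encoding='utf-8', output_encoding='utf-8'):
    """Decode bytes, escape markup characters, then re-encode for output."""
    text = data.decode(input_encoding, 'replace')
    return escape(text).encode(output_encoding, 'xmlcharrefreplace')


utf8_bytes = transcode_for_xml(b'caf\xe9 <b>', input_encoding='latin-1')
```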

              SEE ALSO:

                 unicode_to_xml()
                        for other ideas on using this function

       kitchen.text.converters.xml_to_byte_string(byte_string,            input_encoding='utf-8',
       errors='replace', output_encoding='utf-8')
              Transform a byte str from an xml file into a byte str in output_encoding

              Parameters
                     • byte_string -- byte str to decode

                     • input_encoding -- encoding that the byte str is in

                     • errors -- What to do if not every character is valid in encoding.  See the
                       to_unicode() docstring for legal values.

                     • output_encoding -- Encoding for the output byte str

              Returns
                     byte str, in output_encoding, decoded from byte_string

              This function attempts to reverse what unicode_to_xml() does.  It takes a byte
              str (presumably read in from an xml file), expands all the html entities into
              unicode characters, and re-encodes the result as a byte str using output_encoding.
              One thing it cannot do is restore any control characters that were removed prior to
              inserting into the file.  If you need to keep such characters you need to use
              xml_to_bytes() and bytes_to_xml() or use one of the strategies documented in
              unicode_to_xml() instead.

       kitchen.text.converters.bytes_to_xml(byte_string, *args, **kwargs)
              Return a byte string encoded so it is valid inside of any xml file

              Parameters
                     • byte_string -- byte string to transform

                     • *args, **kwargs -- extra arguments to this function are passed on  to  the
                       function actually implementing the encoding.  You can use  this  to  tweak
                       the output in some cases but, as a general rule, you shouldn't because the
                       underlying encoding function is not guaranteed to remain the same.

              Return type
                     byte string consisting of all ASCII characters

              Returns
                     byte string representation of the input.  This will be encoded using base64.

              This function is made especially to put binary information into xml documents.

              This function is intended for encoding things that must be preserved byte-for-byte.
              If  you  want  to encode a byte string that's text and don't mind losing the actual
              bytes you probably want  to  try  byte_string_to_xml()  or  guess_encoding_to_xml()
              instead.

              NOTE:
                 Although the current implementation uses base64.b64encode() and  there  are  no
                 plans to change it, that isn't guaranteed.  If you want to make sure  that  you
                 can encode and decode these messages it's best to use xml_to_bytes() if you use
                 this function to encode.

       kitchen.text.converters.xml_to_bytes(byte_string, *args, **kwargs)
              Decode a string encoded using bytes_to_xml()

              Parameters
                     • byte_string -- byte string to transform.  This should be a base64  encoded
                       sequence of bytes originally generated by bytes_to_xml().

                     • *args, **kwargs -- extra arguments to this function are passed on  to  the
                       function actually implementing the decoding.  You can use  this  to  tweak
                       the output in some cases but, as a general rule, you shouldn't because the
                       underlying decoding function is not guaranteed to remain the same.

              Return type
                     byte string

              Returns
                     byte string that's the decoded input

              If you've got fields in an xml document that were encoded with bytes_to_xml()  then
              you  want  to  use  this  function  to  decode  them.  It converts a base64 encoded
              string into a byte string.

              NOTE:
                 Although the current implementation uses base64.b64decode() and  there  are  no
                 plans to change it, that isn't guaranteed.  If you want to make sure  that  you
                 can encode and decode these messages it's best to use bytes_to_xml() to  encode
                 if you use this function to decode.
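
              The round trip described above can be sketched with the standard library alone.
              This is a minimal illustration that assumes the base64 behaviour mentioned in the
              notes; the *_sketch function names are hypothetical stand-ins and are not part of
              kitchen.

```python
import base64


def bytes_to_xml_sketch(byte_string):
    # Hypothetical stand-in: base64 output is pure ASCII, so it can be
    # embedded in an xml document without further escaping.
    return base64.b64encode(byte_string)


def xml_to_bytes_sketch(byte_string):
    # Reverse of the stand-in above: recover the original bytes exactly.
    return base64.b64decode(byte_string)


payload = b'\x00\x01binary\xff'
encoded = bytes_to_xml_sketch(payload)
decoded = xml_to_bytes_sketch(encoded)
```

              Because the encoded form contains only ASCII bytes, the payload survives being
              written to and read back from an xml file byte-for-byte.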

       kitchen.text.converters.guess_encoding_to_xml(string,             output_encoding='utf-8',
       attrib=False, control_chars='replace')
              Return a byte string suitable for inclusion in xml

              Parameters
                     • string -- str or byte string to be transformed into a byte string suitable
                       for inclusion in xml.  If string is a byte string we attempt to guess  the
                       encoding.  If we cannot guess, we fall back to latin-1.

                     • output_encoding -- Output encoding for the byte string.  This should match
                       the encoding of your xml file.

                     • attrib  -- If True, escape the item for use in an xml attribute.  If False
                       (default) escape the item for use in a text node.

              Returns
                     utf-8 encoded byte string

       kitchen.text.converters.to_xml(string,           encoding='utf-8',           attrib=False,
       control_chars='ignore')
              Deprecated: Use guess_encoding_to_xml() instead

   Working with exception messages
       kitchen.text.converters.EXCEPTION_CONVERTERS = (<function <lambda>>, <function <lambda>>)

              Tuple of functions to try to use to convert an exception into a string
                     representation.   Its main use is to extract a string (str or bytes) from an
                     exception object in exception_to_unicode()  and  exception_to_bytes().   The
                     functions  here  will  try  the exception's args[0] and the exception itself
                     (roughly equivalent to str(exception)) to extract the message. This is  only
                     a  default and can be easily overridden when calling those functions.  There
                     are several reasons you might wish to do that.  If you have exceptions where
                     the  best  string  representing the exception is not returned by the default
                     functions, you can add another function to extract from a different field:

                        from kitchen.text.converters import (EXCEPTION_CONVERTERS,
                                exception_to_unicode)

                        class MyError(Exception):
                            def __init__(self, message):
                                self.value = message

                        c = [lambda e: e.value]
                        c.extend(EXCEPTION_CONVERTERS)
                        try:
                            raise MyError('An Exception message')
                        except MyError as e:
                            print(exception_to_unicode(e, converters=c))

                     Another reason would be if you're converting to a byte string and  you  know
                     the  bytes  need to be in a non-utf-8 encoding.  exception_to_bytes() defaults
                     to utf-8 but if you convert into a byte string explicitly using a  converter
                     then you can choose a different encoding:

                        from kitchen.text.converters import (EXCEPTION_CONVERTERS,
                                exception_to_bytes, to_bytes)
                        c = [lambda e: to_bytes(e.args[0], encoding='euc_jp'),
                                lambda e: to_bytes(e, encoding='euc_jp')]
                        c.extend(EXCEPTION_CONVERTERS)
                        try:
                            do_something()
                        except Exception as e:
                            # Binary mode, because exception_to_bytes() returns bytes
                            log = open('logfile.euc_jp', 'ab')
                            log.write(exception_to_bytes(e, converters=c) + b'\n')
                            log.close()

                     Each  function  in  this list should take the exception as its sole argument
                     and return a string containing the message representing the exception.   The
                     functions  may  return  the message as a byte string, a str string, or
                     even  an  object  if  you  trust  the  object  to  return  a  decent  string
                     representation.    The   exception_to_unicode()   and   exception_to_bytes()
                     functions will make sure to convert the string to  the  proper  type  before
                     returning.

                     New in version 0.2.2.

       kitchen.text.converters.BYTE_EXCEPTION_CONVERTERS   =   (<function   <lambda>>,  <function
       to_bytes>)
              Deprecated: Use EXCEPTION_CONVERTERS instead.

              Tuple  of  functions  to  try  to  use  to  convert  an  exception  into  a  string
              representation.   This tuple is similar to the one in EXCEPTION_CONVERTERS but it's
              used with exception_to_bytes() instead.  Ideally, these functions should  do  their
              best  to  return  the  data  as  a  byte  string but the results will be run through
              to_bytes() before being returned.

              New in version 0.2.2.

              Changed in version 1.0.1: Deprecated as simplifications allow  EXCEPTION_CONVERTERS
              to perform the same function.

       kitchen.text.converters.exception_to_unicode(exc,     converters=(<function     <lambda>>,
       <function <lambda>>))
              Convert an exception object into a unicode representation

              Parameters
                     • exc -- Exception object to convert

                     • converters -- List of functions to use to convert  the  exception  into  a
                       string.   See EXCEPTION_CONVERTERS for the default value and an example of
                       adding other converters to the defaults.  The functions in  the  list  are
                       tried  one  at  a  time  to  see  if  they  can  extract a string from the
                       exception.  The first one to do so without raising an exception is used.

              Returns
                     str string representation of the exception.   The  value  extracted  by  the
                     converters  will be converted into str before being returned using the utf-8
                     encoding.  If you know you need to use an alternate encoding, add a function
                     that does that to the list of functions in converters.

              New in version 0.2.2.
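
              The "try each converter in turn" rule can be sketched as follows.  This is  an
              illustration only: first_message() is a hypothetical helper, the two default
              lambdas merely mimic the args[0] and str(exception) behaviour described for
              EXCEPTION_CONVERTERS, and the final conversion to str or bytes is left out.

```python
def first_message(exc, converters):
    # Try each converter in order; the first one that does not raise wins.
    for func in converters:
        try:
            return func(exc)
        except Exception:
            continue
    # Hypothetical fallback used by this sketch when every converter fails.
    return ''


class MyError(Exception):
    def __init__(self, message):
        self.value = message


# Stand-ins for the defaults: the exception's args[0], then the exception itself.
defaults = (lambda e: e.args[0], lambda e: e)
converters = [lambda e: e.value] + list(defaults)

from_value = first_message(MyError('An Exception message'), converters)
from_args = first_message(ValueError('bad input'), converters)
```

              ValueError has no value attribute, so the first converter raises AttributeError
              and the lookup falls through to args[0].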

       kitchen.text.converters.exception_to_bytes(exc, converters=(<function <lambda>>, <function
       <lambda>>))
              Convert an exception object into a byte string representation

              Parameters
                     • exc -- Exception object to convert

                     • converters -- List of functions to use to convert  the  exception  into  a
                       string.   See EXCEPTION_CONVERTERS for the default value and an example of
                       adding other converters to the defaults.  The functions in  the  list  are
                       tried  one  at  a  time  to  see  if  they  can  extract a string from the
                       exception.  The first one to do so without raising an exception is used.

              Returns
                     byte string representation of the exception.  The  value  extracted  by  the
                     converters  will  be  converted  into  bytes before being returned using the
                     utf-8 encoding.  If you know you need to use an alternate  encoding,  add  a
                     function that does that to the list of functions in converters.

              New in version 0.2.2.

              Changed  in  version  1.0.1:  Code  simplification  allowed  us  to switch to using
              EXCEPTION_CONVERTERS as the default value of converters.

   Format Text for Display
       Functions related to displaying unicode text.  Unicode characters don't all have the  same
       width so we need helper functions for displaying them.

       New in version 0.2: kitchen.display API 1.0.0

       kitchen.text.display.textual_width(msg,      control_chars='guess',      encoding='utf-8',
       errors='replace')
              Get the textual width of a string

              Parameters
                     • msg -- str string or byte string to get the width of

                     • control_chars --

                       specify how to deal with control characters.  Possible values are:

                       guess  (default) will take a guess for  control  character  widths.   Most
                              codes  will return zero width.  backspace, delete, and clear delete
                              return -1.  escape currently returns -1 as well  but  this  is  not
                              guaranteed as it's not always correct

                       strict will  raise  kitchen.text.exceptions.ControlCharError  if a control
                              character is encountered

                     • encoding -- If we are given a byte string this is used to decode  it  into
                       a str string.  Any characters that are not decodable in this encoding will
                       get a value dependent on the errors parameter.

                     • errors -- How to treat errors decoding the byte  string  to  a  str  string.
                       Legal  values  are  the  same as for kitchen.text.converters.to_unicode().
                       The default value of replace will cause undecodable byte sequences to have
                       a width of one.  With ignore they will have a width of zero.

              Raises ControlCharError -- if msg contains a control character and control_chars is
                     strict.

              Returns
                     Textual width of the msg.  This is the amount of space that the string  will
                     consume  on  a  monospace  display.   It's  measured  in  the number of cell
                     positions or columns it will take up on a monospace display.   This  is  not
                     the number of glyphs that are in the string.

              NOTE:
                 This  function  can be wrong sometimes because Unicode does not specify a strict
                 width value for all of the code points.  In particular, we've  found  that  some
                 Tamil characters take up to four character cells but we return a lesser amount.
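
              To make the difference between characters and cell positions concrete, here is  a
              rough approximation built only on the standard unicodedata module.  It is  not
              kitchen's implementation: approx_width() is a hypothetical helper that ignores
              control characters and relies on unicodedata.combining() rather than kitchen's
              own combining table.

```python
import unicodedata


def approx_width(text):
    # Rough column count: 0 for combining marks, 2 for East Asian
    # wide/fullwidth characters, 1 for everything else.
    width = 0
    for char in text:
        if unicodedata.combining(char):
            continue
        if unicodedata.east_asian_width(char) in ('W', 'F'):
            width += 2
        else:
            width += 1
    return width


latin = approx_width('caf\u00e9')           # four single-cell characters
kanji = approx_width('\u4e00\u4e8c\u4e09')  # three double-cell characters
combined = approx_width('e\u0301')          # base letter plus combining accent
```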

       kitchen.text.display.textual_width_chop(msg, chop, encoding='utf-8', errors='replace')
              Given a string, return it chopped to a given textual width

              Parameters
                     • msg -- str string or byte string to chop

                     • chop -- Chop msg if it exceeds this textual width

                     • encoding -- If we are given a byte string, this is used to decode it into a
                       str string.  Any characters that are not decodable in this  encoding  will
                       be assigned a width of one.

                     • errors  --  How  to  treat  errors  decoding the byte string to str.  Legal
                       values are the same as for kitchen.text.converters.to_unicode()

              Return type
                     str string

              Returns
                     str string of the msg chopped at the given textual width

              This is what you want to use instead of %.*s, as it does  the  "right"  thing  with
              regard  to  UTF-8 sequences, control characters, and characters that take more than
              one cell position. Eg:

                 >>> # Wrong: only displays 8 characters because it is operating on bytes
                 >>> print "%.*s" % (10, 'café ñunru!')
                 café ñun
                 >>> # Properly operates on graphemes
                 >>> '%s' % (textual_width_chop('café ñunru!', 10))
                 café ñunru
                 >>> # takes too many columns because the kanji need two cell positions
                 >>> print '1234567890\n%.*s' % (10, u'一二三四五六七八九十')
                 1234567890
                 一二三四五六七八九十
                 >>> # Properly chops at 10 columns
                 >>> print '1234567890\n%s' % (textual_width_chop(u'一二三四五六七八九十', 10))
                 1234567890
                 一二三四五
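
              The chopping behaviour shown above can be approximated with the standard library.
              This is a sketch, not kitchen's code: cell_width() and chop_to_width() are
              hypothetical names, and the width rule is a simplification that ignores control
              characters.

```python
import unicodedata


def cell_width(char):
    # Simplified rule: combining marks take no cells, East Asian
    # wide/fullwidth characters take two, everything else takes one.
    if unicodedata.combining(char):
        return 0
    return 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1


def chop_to_width(text, limit):
    # Keep characters until the next one would push us past the limit.
    used = 0
    kept = []
    for char in text:
        width = cell_width(char)
        if used + width > limit:
            break
        kept.append(char)
        used += width
    return ''.join(kept)


ten_cells = chop_to_width('\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d\u5341', 10)
```

              A double-width character that would straddle the limit is dropped entirely, so
              the result may occupy one cell fewer than the limit.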

       kitchen.text.display.textual_width_fill(msg,  fill,   chop=None,   left=True,   prefix='',
       suffix='')
              Expand a str string to a specified textual width or chop to same

              Parameters
                     • msg -- str string to format

                     • fill -- pad string until the textual width of the string is this length

                     • chop  --  before  doing  anything  else,  chop  the string to this length.
                       Default: Don't chop the string at all

                     • left -- If True (default) left justify the string and put the  padding  on
                       the right.  If False, pad on the left side.

                     • prefix -- Attach this string before the field we're filling

                     • suffix -- Append this string to the end of the field we're filling

              Return type
                     str string

              Returns
                     msg  formatted  to  fill  the specified width.  If no chop is specified, the
                     string could exceed the fill length when completed.  If prefix or suffix are
                     printable characters, the string could be longer than the fill width.

              NOTE:
                 prefix  and  suffix should be used for "invisible" characters like highlighting,
                 color changing escape codes, etc.  The fill characters are appended  outside  of
                 any  prefix or suffix elements.  This allows you to only highlight msg inside of
                 the field you're filling.

              WARNING:
                 msg, prefix, and suffix should all be representable as unicode  characters.   In
                 particular,  any escape sequences in prefix and suffix need to be convertible to
                 str.  If you need to use byte sequences here rather than unicode characters, use
                 byte_string_textual_width_fill() instead.

              This  function expands a string to fill a field of a particular textual width.  Use
              it instead of %*.*s, as it does the "right" thing with regard to  UTF-8  sequences,
              control  characters,  and  characters  that  take  more than one cell position in a
              display.  Example usage:

                 >>> msg = u'一二三四五六七八九十'
                 >>> # Wrong: This uses 10 characters instead of 10 cells:
                 >>> u":%-*.*s:" % (10, 10, msg[:9])
                 :一二三四五六七八九 :
                 >>> # This uses 10 cells like we really want:
                 >>> u":%s:" % (textual_width_fill(msg[:9], 10, 10))
                 :一二三四五:

                 >>> # Wrong: Right aligned in the field, but too many cells
                 >>> u"%20.10s" % (msg)
                           一二三四五六七八九十
                 >>> # Correct: Right aligned with proper number of cells
                 >>> u"%s" % (textual_width_fill(msg, 20, 10, left=False))
                           一二三四五

                 >>> # Wrong: Adding some escape characters to highlight the line but too many cells
                 >>> u"%s%20.10s%s" % (prefix, msg, suffix)
                 u'\x1b[7m          一二三四五六七八九十\x1b[0m'
                 >>> # Correct highlight of the line
                 >>> u"%s%s%s" % (prefix, display.textual_width_fill(msg, 20, 10, left=False), suffix)
                 u'\x1b[7m          一二三四五\x1b[0m'

                 >>> # Correct way to not highlight the fill
                 >>> u"%s" % (display.textual_width_fill(msg, 20, 10, left=False, prefix=prefix, suffix=suffix))
                 u'          \x1b[7m一二三四五\x1b[0m'
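
              The placement of the padding outside of prefix and suffix can be sketched like
              this.  The helper names are hypothetical, chopping is omitted, and the width rule
              is a standard-library approximation rather than kitchen's own.

```python
import unicodedata


def approx_width(text):
    # Rough rule: combining marks take no cells, East Asian wide/fullwidth
    # characters take two, everything else takes one.
    return sum(
        0 if unicodedata.combining(char)
        else 2 if unicodedata.east_asian_width(char) in ('W', 'F')
        else 1
        for char in text
    )


def fill_to_width(msg, fill, left=True, prefix='', suffix=''):
    # Only msg counts towards the width; prefix and suffix are assumed to be
    # invisible escape sequences, so the padding goes outside of them.
    padding = ' ' * max(fill - approx_width(msg), 0)
    body = prefix + msg + suffix
    return body + padding if left else padding + body


highlighted = fill_to_width('\u4e00\u4e8c\u4e09\u56db\u4e94', 20, left=False,
                            prefix='\x1b[7m', suffix='\x1b[0m')
```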

       kitchen.text.display.wrap(text,   width=70,    initial_indent='',    subsequent_indent='',
       encoding='utf-8', errors='replace')
              Works like we want textwrap.wrap() to work.

              Parameters
                     • text -- str string or byte string to wrap

                     • width -- textual width at which to wrap.  Default: 70

                     • initial_indent -- string to use to indent the first line.  Default: do not
                       indent.

                     • subsequent_indent -- string to use to indent subsequent lines.  Default: do
                       not indent

                     • encoding -- Encoding to use if text is a byte string

                     • errors  --  error handler to use if text is a byte string and contains some
                       undecodable characters.

              Return type
                     list of str strings

              Returns
                     list of lines that have been text wrapped and indented.

              textwrap.wrap() from the python  standard  library  has  two  drawbacks  that  this
              attempts to fix:

              1. It does not handle textual width.  It only operates on bytes or characters which
                 are both inadequate (due to multi-byte and double width characters).

              2. It malforms lists and blocks.

       kitchen.text.display.fill(text, *args, **kwargs)
              Works like we want textwrap.fill() to work

              Parameters
                     text -- str string or byte string to process

              Returns
                     str string with each line separated by a newline

              SEE ALSO:

                 kitchen.text.display.wrap()
                        for other parameters that you can give this command.

              This function is a light wrapper around  kitchen.text.display.wrap().   Where  that
              function  returns  a list of lines, this function returns one string with each line
              separated by a newline.
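
              The same list-versus-string relationship exists between textwrap.wrap()  and
              textwrap.fill() in the standard library, which makes for a quick illustration.
              Note that this sketch uses textwrap, which counts characters rather than textual
              width, so it only demonstrates the shape of the return values.

```python
import textwrap

text = 'Kitchen aims to pull small snippets of code into a few python modules.'

# wrap() returns a list of lines, each at most `width` characters long.
lines = textwrap.wrap(text, width=30)

# fill() returns the same lines joined into a single string.
joined = textwrap.fill(text, width=30)
```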

       kitchen.text.display.byte_string_textual_width_fill(msg,   fill,   chop=None,   left=True,
       prefix='', suffix='', encoding='utf-8', errors='replace')
              Expand a byte string to a specified textual width or chop to same

              Parameters
                     • msg -- byte string encoded in UTF-8 that we want formatted

                     • fill -- pad msg until the textual width is this long

                     • chop  --  before  doing  anything  else,  chop  the string to this length.
                       Default: Don't chop the string at all

                     • left -- If True (default) left justify the string and put the  padding  on
                       the right.  If False, pad on the left side.

                     • prefix -- Attach this byte string before the field we're filling

                     • suffix -- Append this byte string to the end of the field we're filling

              Return type
                     byte string

              Returns
                     msg formatted to fill the specified textual width.  If no chop is specified,
                     the string could exceed the fill length when completed.  If prefix or suffix
                     are printable characters, the string could be longer than fill width.

              NOTE:
                 prefix  and  suffix should be used for "invisible" characters like highlighting,
                 color changing escape codes, etc.  The fill characters are appended  outside  of
                 any  prefix or suffix elements.  This allows you to only highlight msg inside of
                 the field you're filling.

              SEE ALSO:

                 textual_width_fill()
                        For example usage.  This function has only two differences.

                        1. it takes byte strings for prefix and suffix so you can pass in arbitrary
                           sequences of bytes, not just unicode characters.

                        2. it returns a byte string instead of a str string.

   Internal Data
       There  are a few internal functions and variables in this module.  Code outside of kitchen
       shouldn't use them but people coding on kitchen itself may find them useful.

       kitchen.text.display._COMBINING = ((768, 879), (1155, 1161), (1425, 1469),  (1471,  1471),
       (1473, 1474), (1476, 1477), (1479, 1479), (1536, 1539), (1552, 1562), (1611, 1631), (1648,
       1648), (1750, 1764), (1767, 1768), (1770, 1773), (1807, 1807), (1809, 1809), (1840, 1866),
       (1958, 1968), (2027, 2035), (2045, 2045), (2070, 2073), (2075, 2083), (2085, 2087), (2089,
       2093), (2137, 2139), (2259, 2273), (2275, 2303), (2305, 2306), (2364, 2364), (2369, 2376),
       (2381, 2381), (2385, 2388), (2402, 2403), (2433, 2433), (2492, 2492), (2497, 2500), (2509,
       2509), (2530, 2531), (2558, 2558), (2561, 2562), (2620, 2620), (2625, 2626), (2631, 2632),
       (2635, 2637), (2672, 2673), (2689, 2690), (2748, 2748), (2753, 2757), (2759, 2760), (2765,
       2765), (2786, 2787), (2817, 2817), (2876, 2876), (2879, 2879), (2881, 2883), (2893, 2893),
       (2902, 2902), (2946, 2946), (3008, 3008), (3021, 3021), (3134, 3136), (3142, 3144), (3146,
       3149), (3157, 3158), (3260, 3260), (3263, 3263), (3270, 3270), (3276, 3277), (3298, 3299),
       (3387, 3388), (3393, 3395), (3405, 3405), (3530, 3530), (3538, 3540), (3542, 3542), (3633,
       3633), (3636, 3642), (3655, 3662), (3761, 3761), (3764, 3772), (3784, 3789), (3864, 3865),
       (3893, 3893), (3895, 3895), (3897, 3897), (3953, 3966), (3968, 3972), (3974, 3975), (3984,
       3991), (3993, 4028), (4038, 4038), (4141, 4144), (4146, 4146), (4150, 4151), (4153, 4154),
       (4184, 4185), (4237, 4237), (4448, 4607), (4957, 4959), (5906, 5908), (5938, 5940), (5970,
       5971), (6002, 6003), (6068, 6069), (6071, 6077), (6086, 6086), (6089, 6099), (6109, 6109),
       (6155, 6157), (6313, 6313), (6432, 6434), (6439, 6440), (6450, 6450), (6457, 6459), (6679,
       6680), (6752, 6752), (6773, 6780), (6783, 6783), (6832, 6845), (6912, 6915), (6964, 6964),
       (6966, 6970), (6972, 6972), (6978, 6978), (6980, 6980), (7019, 7027), (7082, 7083), (7142,
       7142), (7154, 7155), (7223, 7223), (7376, 7378), (7380, 7392), (7394, 7400), (7405, 7405),
       (7412, 7412), (7416, 7417), (7616, 7673), (7675, 7679), (8203, 8207), (8234, 8238), (8288,
       8291), (8298, 8303), (8400, 8432), (11503, 11505), (11647, 11647), (11744, 11775), (12330,
       12335),  (12441,  12442),  (42607, 42607), (42612, 42621), (42654, 42655), (42736, 42737),
       (43014, 43014), (43019, 43019), (43045, 43046), (43204, 43204),  (43232,  43249),  (43307,
       43309),  (43347,  43347),  (43443, 43443), (43456, 43456), (43696, 43696), (43698, 43700),
       (43703, 43704), (43710, 43711), (43713, 43713), (43766, 43766),  (44013,  44013),  (64286,
       64286),  (65024,  65039),  (65056, 65071), (65279, 65279), (65529, 65531), (66045, 66045),
       (66272, 66272), (66422, 66426), (68097, 68099), (68101, 68102),  (68108,  68111),  (68152,
       68154),  (68159,  68159),  (68325, 68326), (68900, 68903), (69446, 69456), (69702, 69702),
       (69759, 69759), (69817, 69818), (69888, 69890), (69939, 69940),  (70003,  70003),  (70080,
       70080),  (70090,  70090),  (70197, 70198), (70377, 70378), (70459, 70460), (70477, 70477),
       (70502, 70508), (70512, 70516), (70722, 70722), (70726, 70726),  (70750,  70750),  (70850,
       70851),  (71103,  71104),  (71231, 71231), (71350, 71351), (71467, 71467), (71737, 71738),
       (72160, 72160), (72244, 72244), (72263, 72263), (72345, 72345),  (72767,  72767),  (73026,
       73026),  (73028, 73029), (73111, 73111), (92912, 92916), (92976, 92982), (113822, 113822),
       (119141, 119145), (119149, 119170), (119173, 119179), (119210, 119213), (119362,  119364),
       (122880,  122886), (122888, 122904), (122907, 122913), (122915, 122916), (122918, 122922),
       (123184, 123190), (123628, 123631), (125136, 125142), (125252, 125258), (917505,  917505),
       (917536, 917631), (917760, 917999))
              Internal  table,  provided  by  this  module to list code points which combine with
              other characters and therefore should have no textual  width.   This  is  a  sorted
              tuple  of  non-overlapping  intervals.  Each interval is a tuple listing a starting
              code point and ending code point.  Every code point between the two end points is a
              combining character.

              SEE ALSO:

                 _generate_combining_table()
                        for how this table is generated

              This  table was last regenerated on python-3.8.0a3 with unicodedata.unidata_version
              12.0.0

       kitchen.text.display._generate_combining_table()
              Combine Markus Kuhn's data with unicodedata to make combining char list

              Return type
                     tuple of tuples

              Returns
                     tuple of intervals of  code  points  that  are  combining  characters.  Each
                     interval  is  a 2-tuple of the starting code point and the ending code point
                     for the combining characters.

              In normal use, this function serves to tell how we're generating the combining char
              list.   For  speed reasons, we use this to generate a static list and just use that
              later.

              Markus Kuhn's list of combining characters is more  complete  than  what's  in  the
              python  unicodedata  library  but  the  python  unicodedata is synced against later
              versions of the unicode database.

              This is used to generate the _COMBINING table.
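
              As a rough idea of how such an interval table can be built, the sketch below
              collapses runs of combining code points into (start, end) tuples.  It is  an
              assumption-laden illustration: combining_intervals() is a hypothetical name and
              it uses only unicodedata.combining(), without Markus Kuhn's additional data, so
              its output will not match _COMBINING exactly.

```python
import unicodedata


def combining_intervals(limit=0x110000):
    # Walk the code points in order and record each run of consecutive
    # combining characters as an inclusive (start, end) interval.
    intervals = []
    start = None
    for code in range(limit):
        if unicodedata.combining(chr(code)):
            if start is None:
                start = code
        elif start is not None:
            intervals.append((start, code - 1))
            start = None
    if start is not None:
        intervals.append((start, limit - 1))
    return tuple(intervals)


# Scan only the lower code points to keep the example quick.
table = combining_intervals(0x3000)
```

              Because the code points are visited in ascending order, the resulting tuple  is
              already sorted and its intervals cannot overlap.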

       kitchen.text.display._print_combining_table()
              Print out a new _COMBINING table

              This   will   print   a   new   _COMBINING   table   in   the   format   used    in
              kitchen/text/display.py.   It's  useful  for  updating  the  _COMBINING  table with
              updated data from a new python as the format won't change from  what's  already  in
              the file.

       kitchen.text.display._interval_bisearch(value, table)
              Binary search in an interval table.

              Parameters
                     • value -- numeric value to search for

                     • table  --  Ordered  list of intervals.  This is a list of two-tuples.  The
                       elements of the two-tuple define an interval's start and end points.

              Returns
                     If value is found within an interval in the table return  True.   Otherwise,
                     False

              This  function  checks  whether  a  numeric  value  is  present  within  a table of
              intervals.  It checks using a binary search algorithm, dividing the list of  values
              in half and checking against the values until it determines whether the value is in
              the table.
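
              A binary search over sorted, non-overlapping intervals might look like  the
              following.  This is a sketch of the technique rather than kitchen's source;
              interval_bisearch() is a hypothetical name and the sample intervals are the first
              three entries of the _COMBINING table.

```python
def interval_bisearch(value, table):
    # Classic binary search, comparing against interval bounds instead of
    # single values.  `table` must be sorted and non-overlapping.
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        start, end = table[mid]
        if value < start:
            high = mid - 1
        elif value > end:
            low = mid + 1
        else:
            return True
    return False


sample = ((768, 879), (1155, 1161), (1425, 1469))
inside = interval_bisearch(800, sample)
outside = interval_bisearch(880, sample)
```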

       kitchen.text.display._ucp_width(ucs, control_chars='guess')
              Get the textual width of a ucs character

              Parameters
                     • ucs -- integer representing a single unicode code point

                     • control_chars --

                       specify how to deal with control characters.  Possible values are:

                       guess  (default) will take a guess for  control  character  widths.   Most
                              codes  will return zero width.  backspace, delete, and clear delete
                              return -1.  escape currently returns -1 as well  but  this  is  not
                              guaranteed as it's not always correct

                       strict will raise ControlCharError if a control character is encountered

              Raises ControlCharError  --  if  the  code point is a unicode control character and
                     control_chars is set to 'strict'

              Returns
                     textual width of the character.

              NOTE:
                 It's important to  remember  this  is  textual  width  and  not  the  number  of
                 characters or bytes.

       kitchen.text.display._textual_width_le(width, *args)
              Optimize the common case when deciding which textual width is larger

              Parameters
                     • width -- textual width to compare against.

                     • *args -- str strings to check the total textual width of

              Returns
                     True if the total textual width of args is less than or equal  to  width.
                     Otherwise False.

              We often want to know "does X fit in Y".  It takes a while to  use  textual_width()
              to  calculate  this.   However, we know that each canonically composed str character
              has a textual width of either 1 or 2.  With this we can take the following
              shortcuts:

              1. If  the  number  of canonically composed characters is more than width, the true
                 textual width cannot be less than width.

              2. If the number of canonically composed characters * 2 is less than the width then
                 the textual width must be ok.

              The textual width of a canonically composed str string will always be greater than
              or equal to the number of str characters.  So we can first check if the number of
              composed str characters is less than the asked for width.  If it is we can return
              True immediately.  If not, then we must do a full textual width lookup.
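
              The two shortcuts listed above can be sketched roughly as follows.  This is not
              kitchen's implementation: fits_within() and approx_textual_width() are
              hypothetical names, and the width lookup is a stand-in built on the standard
              unicodedata module that assumes every composed character is 1 or 2 columns wide.

```python
import unicodedata


def approx_textual_width(text):
    """Stand-in for a full width lookup (assumption: wide/fullwidth = 2, else 1)."""
    return sum(
        2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
        for char in text
    )


def fits_within(width, *args):
    """Return True if the joined strings fit within ``width`` columns."""
    text = unicodedata.normalize("NFC", "".join(args))
    if len(text) > width:
        # Shortcut 1: every character needs at least one column.
        return False
    if len(text) * 2 <= width:
        # Shortcut 2: every character needs at most two columns.
        return True
    # Neither shortcut applies, so do the slower full calculation.
    return approx_textual_width(text) <= width
```

              Only strings whose character count falls between width / 2 and width reach the
              slower calculation.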

   Miscellaneous functions for manipulating text
       Collection of text functions that don't fit in another category.

       Changed  in  version  kitchen:  1.2.0,  API:  kitchen.text  2.2.0  Added   isbasestring(),
       isbytestring(),  and  isunicodestring() to help tell which string type is which on python2
       and python3

       kitchen.text.misc.byte_string_valid_encoding(byte_string, encoding='utf-8')
              Detect if a byte string is valid in a specific encoding

              Parameters
                     • byte_string -- Byte string to test for bytes not valid in this encoding

                     • encoding -- encoding to test against.  Defaults to UTF-8.

              Returns
                     True if there are no characters that are invalid in the specified encoding.
                     False if an invalid character is detected.

              NOTE:
                 This function checks whether the byte string is valid in the specified encoding.
                 It does not detect whether the byte string actually was encoded in that
                 encoding.  If you want that sort of functionality, you probably want to use
                 guess_encoding() instead.
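
              A minimal sketch of this kind of check, assuming it amounts to attempting a
              decode; is_valid_in_encoding() is a hypothetical name and kitchen's own
              implementation may differ.

```python
def is_valid_in_encoding(byte_string, encoding="utf-8"):
    """Return True if ``byte_string`` decodes cleanly with ``encoding``."""
    try:
        byte_string.decode(encoding)
    except UnicodeError:
        return False
    return True


utf8_bytes = b"caf\xc3\xa9"    # "café" encoded as UTF-8
latin1_bytes = b"caf\xe9"      # "café" encoded as latin-1

print(is_valid_in_encoding(utf8_bytes))               # valid UTF-8
print(is_valid_in_encoding(latin1_bytes))             # not valid UTF-8
print(is_valid_in_encoding(utf8_bytes, "latin-1"))    # valid, but not what was intended
```

              The last call illustrates the note above: the UTF-8 bytes are valid latin-1
              even though they were never encoded as latin-1.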

       kitchen.text.misc.byte_string_valid_xml(byte_string, encoding='utf-8')
              Check that a byte string would be valid in xml

              Parameters
                     • byte_string -- Byte string to check

                     • encoding -- Encoding of the xml file.  Default: UTF-8

              Returns
                     True if the string is valid.  False if it would be invalid in the xml file

              In some cases you'll have a whole bunch of byte strings and rather than
              transforming them to str and back to byte strings for output to xml, you will just
              want to make sure they work with the xml file you're constructing.  This function
              will help you do that.  Example:

                 ARRAY_OF_MOSTLY_UTF8_STRINGS = [...]
                 processed_array = []
                 for string in ARRAY_OF_MOSTLY_UTF8_STRINGS:
                     if byte_string_valid_xml(string, 'utf-8'):
                         processed_array.append(string)
                     else:
                         processed_array.append(guess_bytes_to_xml(string, encoding='utf-8'))
                 output_xml(processed_array)

       kitchen.text.misc.guess_encoding(byte_string, disable_chardet=False)
              Try to guess the encoding of a byte string

              Parameters
                     • byte_string -- byte string to guess the encoding of

                     • disable_chardet  --  If  this  is True, we never attempt to use chardet to
                       guess the encoding.  This is useful if you need  to  have  reproducibility
                       whether chardet is installed or not.  Default: False.

              Raises TypeError -- if byte_string is not a byte string type

              Returns
                     string  containing  a  guess  at  the  encoding  of  byte_string.   This  is
                     appropriate to pass as the encoding  argument  when  encoding  and  decoding
                     unicode strings.

              We start by attempting to decode the byte string as UTF-8.  If this succeeds we
              tell the world it's UTF-8 text.  If it doesn't and chardet is installed on the
              system and disable_chardet is False this function will use it to try detecting the
              encoding of byte_string.  If it is not installed or chardet cannot determine the
              encoding with a high enough confidence then we rather arbitrarily claim that it is
              latin-1.  Since every byte value maps to a character in latin-1, decoding from
              latin-1 to str will not cause UnicodeErrors although the output might be mangled.
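
              The fallback chain described above might look roughly like this.  It is a sketch
              rather than kitchen's code: guess_encoding_sketch() is a hypothetical name and
              the 0.6 confidence threshold is an assumed value.

```python
def guess_encoding_sketch(byte_string, disable_chardet=False, threshold=0.6):
    """Guess an encoding: UTF-8 first, then chardet (if allowed), then latin-1."""
    if not isinstance(byte_string, (bytes, bytearray)):
        raise TypeError("byte_string must be a byte string")

    # Step 1: anything that decodes as UTF-8 is reported as UTF-8.
    try:
        byte_string.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 2: ask chardet, but only if it is installed and not disabled.
    if not disable_chardet:
        try:
            import chardet
        except ImportError:
            chardet = None
        if chardet is not None:
            result = chardet.detect(bytes(byte_string))
            # threshold is an assumption for this sketch.
            if result["encoding"] and result["confidence"] >= threshold:
                return result["encoding"]

    # Step 3: latin-1 can decode any byte sequence, so it is the last resort.
    return "latin-1"
```

              Passing disable_chardet=True makes the result depend only on the input bytes,
              which is the reproducibility case the parameter description mentions.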

       kitchen.text.misc.html_entities_unescape(string)
              Substitute unicode characters for HTML entities

              Parameters
                     string -- str string to substitute out html entities

              Raises TypeError -- if something other than a str string is given

              Return type
                     str string

              Returns
                     The plain text without html entities

       kitchen.text.misc.isbasestring(obj)
              Determine if obj is a byte string or str string

              In python2 this is equivalent to isinstance(obj, basestring).  In python3 it checks
              whether the object is an instance of str, bytes, or bytearray.  This is an aid to
              porting code that needed to test whether an object was derived from basestring in
              python2 (commonly used in unicode-bytes conversion functions)

              Parameters
                     obj -- Object to test

              Returns
                     True if the object is a basestring.  Otherwise False.

              New in version kitchen: 1.2.0, API: kitchen.text 2.2.0

       kitchen.text.misc.isbytestring(obj)
              Determine if obj is a byte string

              In python2 this is equivalent  to  isinstance(obj,  str).   In  python3  it  checks
              whether the object is an instance of bytes or bytearray.

              Parameters
                     obj -- Object to test

              Returns
                     True if the object is a byte string.  Otherwise, False.

              New in version kitchen: 1.2.0, API: kitchen.text 2.2.0

       kitchen.text.misc.isunicodestring(obj)
              Determine if obj is a str string

              In python2 this is equivalent to isinstance(obj, unicode).  In python3 it checks
              whether the object is an instance of str.

              Parameters
                     obj -- Object to test

              Returns
                     True if the object is a str string.  Otherwise, False.

              New in version kitchen: 1.2.0, API: kitchen.text 2.2.0
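
              On python3 the three checks reduce to plain isinstance() calls.  The sketch below
              uses hypothetical names and covers python3 only; on python3 a unicode string is
              a str instance.

```python
def is_base_string(obj):
    """True for any string-like object: text or bytes."""
    return isinstance(obj, (str, bytes, bytearray))


def is_byte_string(obj):
    """True for byte strings only."""
    return isinstance(obj, (bytes, bytearray))


def is_unicode_string(obj):
    """True for text (unicode) strings only."""
    return isinstance(obj, str)


for value in ("text", b"bytes", bytearray(b"mutable"), 42):
    print(repr(value), is_base_string(value), is_byte_string(value),
          is_unicode_string(value))
```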

       kitchen.text.misc.process_control_chars(string, strategy='replace')
              Look for and transform control characters in a string

              Parameters
                     • string -- string to search for and transform control characters within

                     • strategy --

                       XML does not allow ASCII control characters.  When we encounter  those  we
                       need to know what to do.  Valid options are:

                       replace
                              (default) Replace the control characters with "?"

                       ignore Remove the characters altogether from the output

                       strict Raise a ControlCharError when we encounter a control character

              Raises
                     • TypeError -- if string is not a unicode string.

                     • ValueError -- if the strategy is not one of replace, ignore, or strict.

                     • kitchen.text.exceptions.ControlCharError  -- if the strategy is strict and
                       a control character is present in the string

              Returns
                     str string with no control characters in it.

              Changed in version kitchen: 1.2.0, API: kitchen.text 2.2.0 Strip out the C1 control
              characters in addition to the C0 control characters.
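
              The three strategies can be sketched with str.translate().  Everything named
              here is hypothetical: ControlCharFound stands in for ControlCharError, and the
              exact set of code points (C0 except tab, newline, and carriage return, plus C1)
              is an assumption rather than kitchen's actual table.

```python
class ControlCharFound(Exception):
    """Stand-in for ControlCharError in this sketch."""


# Assumed set: C0 controls except tab (9), newline (10), and carriage
# return (13), plus the C1 controls (128-159).
CONTROL_CODES = frozenset(
    list(range(0, 9)) + [11, 12] + list(range(14, 32)) + list(range(128, 160))
)


def process_control_chars_sketch(string, strategy="replace"):
    """Replace, drop, or reject control characters in a str."""
    if not isinstance(string, str):
        raise TypeError("string must be a unicode string")
    if strategy == "replace":
        table = dict.fromkeys(CONTROL_CODES, "?")
    elif strategy == "ignore":
        table = dict.fromkeys(CONTROL_CODES, None)
    elif strategy == "strict":
        if any(ord(char) in CONTROL_CODES for char in string):
            raise ControlCharFound("control character in string")
        return string
    else:
        raise ValueError("strategy must be replace, ignore, or strict")
    return string.translate(table)
```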

       kitchen.text.misc.str_eq(str1, str2, encoding='utf-8', errors='replace')
              Compare two strings, converting to byte string if one is str

              Parameters
                     • str1 -- First string to compare

                     • str2 -- Second string to compare

                     • encoding -- If we need to convert one string into a byte string to
                       compare, the encoding to use.  Default is utf-8.

                     • errors -- What to do if we encounter errors when encoding the string.  See
                       the  kitchen.text.converters.to_bytes() documentation for possible values.
                       The default is replace.

              This function prevents UnicodeError (python-2.4 or less) and UnicodeWarning (python
              2.5 and higher) when we compare a str string to a byte string.  The errors normally
              arise because the conversion is done to ASCII.  This function lets you convert to
              utf-8 or another encoding instead.

              NOTE:
                 When we need to convert one of the strings from str in order to compare them we
                 convert the str string into a byte string.  That means that strings can compare
                 differently if you use different encodings for each.

              Note that str1 == str2 is faster than this function if you can accept the following
              limitations:

              • Limited to python-2.5+ (otherwise a UnicodeDecodeError may be thrown)

              • Will generate a UnicodeWarning if a non-ASCII byte string is compared to a str
                string.
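
              A rough sketch of the comparison, assuming the str side is simply encoded before
              comparing; str_eq_sketch() is a hypothetical name and this is not kitchen's
              implementation.

```python
def str_eq_sketch(str1, str2, encoding="utf-8", errors="replace"):
    """Compare two strings, encoding the str one if the other is bytes."""
    if isinstance(str1, str) and not isinstance(str2, str):
        str1 = str1.encode(encoding, errors)
    elif isinstance(str2, str) and not isinstance(str1, str):
        str2 = str2.encode(encoding, errors)
    return str1 == str2


text = "caf\u00e9"
print(str_eq_sketch(text, text.encode("utf-8")))
print(str_eq_sketch(text, text.encode("latin-1")))
print(str_eq_sketch(text, text.encode("latin-1"), encoding="latin-1"))
```

              The second and third calls show the effect described in the note: the same pair
              of strings compares differently depending on the encoding passed in.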

   UTF-8
       Functions for operating on byte strings encoded as UTF-8

       NOTE:
          In many cases, it is better to convert to str, operate on  the  strings,  then  convert
          back  to UTF-8.  str type can handle many of these functions itself.  For those that it
          doesn't (removing control characters from length calculations, for instance)  the  code
          to do so with a str type is often simpler.

       WARNING:
          All  of  the  functions in this module are deprecated.  Most of them have been replaced
          with   functions   that   operate   on   unicode   values   in    kitchen.text.display.
          kitchen.text.utf8.utf8_valid() has been replaced with a function in kitchen.text.misc.

       kitchen.text.utf8.utf8_text_fill(text, *args, **kwargs)
              Deprecated  Similar  to  textwrap.fill()  but understands utf-8 strings and doesn't
              screw up lists/blocks/etc.

              Use kitchen.text.display.fill() instead.

       kitchen.text.utf8.utf8_text_wrap(text, width=70, initial_indent='', subsequent_indent='')
              Deprecated Similar to textwrap.wrap() but understands utf-8 data and doesn't  screw
              up lists/blocks/etc

              Use kitchen.text.display.wrap() instead

       kitchen.text.utf8.utf8_valid(msg)
              Deprecated Detect if a string is valid utf-8

              Use kitchen.text.misc.byte_string_valid_encoding() instead.

       kitchen.text.utf8.utf8_width(msg)
              Deprecated Get the textual width of a utf-8 string

              Use kitchen.text.display.textual_width() instead.

       kitchen.text.utf8.utf8_width_chop(msg, chop=None)
              Deprecated Return a string chopped to a given textual width

              Use textual_width_chop() and textual_width() instead:

                 >>> msg = 'く ku ら ra と to み mi'
                 >>> # Old way:
                 >>> utf8_width_chop(msg, 5)
                 (5, 'く ku')
                 >>> # New way
                 >>> from kitchen.text.converters import to_bytes
                 >>> from kitchen.text.display import textual_width, textual_width_chop
                 >>> (textual_width(msg), to_bytes(textual_width_chop(msg, 5)))
                 (5, 'く ku')

       kitchen.text.utf8.utf8_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='')
              Deprecated Pad a utf-8 string to fill a specified width

              Use byte_string_textual_width_fill() instead

       converters
              deals with converting text for different encodings and to and from XML

       display
              deals with issues with printing text to a screen

       misc   is a catchall for text manipulation functions that don't seem to fit elsewhere

       utf8   contains deprecated functions to manipulate utf8 byte strings

   Kitchen.collections
   StrictDict
       kitchen.collections.StrictDict provides a dictionary that treats bytes and str as distinct
       key values.

       kitchen.collections.strictdict.StrictDict
              alias of collections.defaultdict

   Kitchen.iterutils Module
       Functions to manipulate iterables

       New in version kitchen: 0.2.1a1

       Module author: Toshio Kuratomi <toshio@fedoraproject.org>

       Module author: Luke Macken <lmacken@redhat.com>

       kitchen.iterutils.isiterable(obj, include_string=False)
              Check whether an object is an iterable

              Parameters
                     • obj -- Object to test whether it is an iterable

                     • include_string -- If True and obj is a byte string or str string this
                       function will return True.  If set to False, byte strings and str strings
                       will cause this function to return False.  Default False.

              Returns
                     True if obj is iterable, otherwise False.
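
              A minimal sketch of such a check, assuming strings are singled out with
              isinstance() and everything else is probed with iter(); is_iterable_sketch() is
              a hypothetical name.

```python
def is_iterable_sketch(obj, include_string=False):
    """Return True if ``obj`` can be iterated over."""
    if isinstance(obj, (str, bytes, bytearray)):
        # Strings are technically iterable, so the caller decides.
        return include_string
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable_sketch([1, 2, 3]))
print(is_iterable_sketch("abc"))
print(is_iterable_sketch("abc", include_string=True))
print(is_iterable_sketch(42))
```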

       kitchen.iterutils.iterate(obj, include_string=False)
              Generator that can be used to iterate over anything

              Parameters
                     • obj -- The object to iterate over

                     • include_string -- if True, treat strings as  iterables.   Otherwise  treat
                       them as a single scalar value.  Default False

              This  function will create an iterator out of any scalar or iterable.  It is useful
              for making a value given to you an iterable before operating on it.  Iterables have
              their items returned.  Scalars are transformed into iterables.  A string is treated
              as a scalar value unless the include_string parameter  is  set  to  True.   Example
              usage:

                 >>> list(iterate(None))
                 [None]
                 >>> list(iterate([None]))
                 [None]
                 >>> list(iterate([1, 2, 3]))
                 [1, 2, 3]
                 >>> list(iterate(set([1, 2, 3])))
                 [1, 2, 3]
                 >>> list(iterate(dict(a='1', b='2')))
                 ['a', 'b']
                 >>> list(iterate(1))
                 [1]
                 >>> list(iterate(iter([1, 2, 3])))
                 [1, 2, 3]
                 >>> list(iterate('abc'))
                 ['abc']
                 >>> list(iterate('abc', include_string=True))
                 ['a', 'b', 'c']
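
              The behaviour shown in these examples can be reproduced with a small generator.
              This is a sketch under the assumption that scalars are simply yielded once;
              iterate_sketch() is a hypothetical name, not kitchen's code.

```python
def iterate_sketch(obj, include_string=False):
    """Yield the items of an iterable, or the object itself if it is a scalar."""
    if isinstance(obj, (str, bytes, bytearray)) and not include_string:
        # Strings count as scalars unless the caller asks otherwise.
        yield obj
        return
    try:
        iterator = iter(obj)
    except TypeError:
        # Not iterable at all: treat as a single value.
        yield obj
        return
    for item in iterator:
        yield item
```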

   Helpers for versioning software
   PEP-386 compliant versioning
       PEP  386  defines  a standard format for version strings.  This module contains a function
       for creating strings in that format.

       kitchen.versioning.version_tuple_to_string(version_info)
              Return a PEP 386 version string from a PEP 386 style version tuple

              Parameters
                     version_info -- Nested set of tuples that describes the version.  See  below
                     for an example.

              Returns
                     a version string

              This  function  implements  just  enough  of PEP 386 to satisfy our needs.  PEP 386
              defines a standard format for version strings and refers to a function that will be
              merged  into  the  python  standard  library  that  transforms  a  tuple of version
              information into a standard version string.  This function is an implementation  of
              that function.  Once that function becomes available in the python standard library
              we will start using it and deprecate this function.

              version_info takes the form that PEP 386's NormalizedVersion.from_parts() uses:

                 ((Major, Minor, [Micros]), [(Alpha/Beta/rc marker, version)],
                     [(post/dev marker, version)])

                 Ex: ((1, 0, 0), ('a', 2), ('dev', 3456))

              It generates a PEP 386 compliant version string:

                 N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]

                 Ex: 1.0.0a2.dev3456
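
              The mapping from tuple to string can be sketched as follows.  The function name
              is hypothetical and, like the real function, the sketch does next to no error
              checking.

```python
def version_tuple_to_string_sketch(version_info):
    """Build a PEP 386 style version string from nested tuples."""
    parts = []
    for values in version_info:
        marker = values[0]
        if isinstance(marker, int):
            # Release segment such as (1, 0, 0) becomes "1.0.0".
            parts.append(".".join(str(v) for v in values))
        elif marker in ("a", "b", "c", "rc"):
            # Pre-release segment such as ('a', 2) becomes "a2".
            parts.append(marker + ".".join(str(v) for v in values[1:]))
        else:
            # post/dev segment such as ('dev', 3456) becomes ".dev3456".
            parts.append("." + marker + ".".join(str(v) for v in values[1:]))
    return "".join(parts)


print(version_tuple_to_string_sketch(((1, 0, 0), ("a", 2), ("dev", 3456))))
```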

              WARNING:
                 This function does next to no error checking.  It's up to  the  person  defining
                 the  version  tuple  to  make  sure  that the values make sense.  If the PEP 386
                 compliant version parser doesn't get released soon we'll  look  at  making  this
                 function  check that the version tuple makes sense before transforming it into a
                 string.

              It's recommended that you use this function to keep a  __version_info__  tuple  and
              __version__ string in your modules.  Why do we need both a tuple and a string?  The
              string is often useful for putting  into  human  readable  locations  like  release
              announcements,  version strings in tarballs, etc.  Meanwhile the tuple is very easy
              for a computer to compare. For example, kitchen sets  up  its  version  information
              like this:

                 from kitchen.versioning import version_tuple_to_string
                 __version_info__ = ((0, 2, 1),)
                 __version__ = version_tuple_to_string(__version_info__)

              Other  programs  that  depend on a kitchen version between 0.2.1 and 0.3.0 can find
              whether the present version is okay with code like this:

                 from kitchen import __version_info__, __version__
                 if __version_info__ < ((0, 2, 1),) or __version_info__ >= ((0, 3, 0),):
                     print('kitchen is present but not at the right version.')
                     print('We need at least version 0.2.1 and less than 0.3.0')
                     print('Currently found: kitchen-%s' % __version__)

   Exceptions
       Kitchen has a hierarchy of exceptions that should  make  it  easy  to  catch  many  errors
       emitted by kitchen itself.

   Base kitchen exceptions
       Exception  classes  for  kitchen  and  the root of the exception hierarchy for all kitchen
       modules.

       exception kitchen.exceptions.KitchenError
              Base exception class for any error thrown directly by kitchen.

   Kitchen.text exceptions
       Exception classes thrown by kitchen's text processing routines.

       exception kitchen.text.exceptions.ControlCharError
              Exception thrown when an ascii control character is encountered.

       exception kitchen.text.exceptions.XmlEncodeError
              Exception thrown by error conditions when encoding an xml string.

   1.0.0 Porting Guide
       The  0.1  through  1.0.0  releases  focused  on  bringing  in  functions  from   yum   and
       python-fedora.   This  porting  guide  tells  how to port from those APIs to their kitchen
       replacements.

   python-fedora
                 ───────────────────────────────────────────────────────────────────────
                  python-fedora                   kitchen replacement
                 ───────────────────────────────────────────────────────────────────────
                  fedora.iterutils.isiterable()   kitchen.iterutils.isiterable() [1]
                 ───────────────────────────────────────────────────────────────────────
                  fedora.textutils.to_unicode()   kitchen.text.converters.to_unicode()
                 ───────────────────────────────────────────────────────────────────────
                  fedora.textutils.to_bytes()     kitchen.text.converters.to_bytes()
                 ───────────────────────────────────────────────────────────────────────
--

PROJECT PAGES

       More information about the project can be found on the project webpage

       The latest published version of this documentation can be found on the documentation page

AUTHOR

       unknown

COPYRIGHT

       2022 Red Hat, Inc. and others