
NAME
kitchen - kitchen 1.2.5
Author: Toshio Kuratomi
Date: 19 March 2011
Version: 1.0.x
We’ve all done it. In the process of writing a brand new application we’ve discovered that we need a
little bit of code that we’ve invented before. Perhaps it’s something to handle unicode text. Perhaps
it’s something to make a bit of python-2.5 code run on python-2.4. Whatever it is, it ends up being a
tiny bit of code that seems too small to worry about pushing into its own module so it sits there, a part
of your current project, waiting to be cut and pasted into your next project. And the next. And the
next. And since that little bitty bit of code proved so useful to you, it’s highly likely that it proved
useful to someone else as well. Useful enough that they’ve written it and copy and pasted it over and
over into each of their new projects.
Well, no longer! Kitchen aims to pull these small snippets of code into a few python modules which you
can import and use within your project. No more copy and paste! Now you can let someone else maintain
and release these small snippets so that you can get on with your life.
This package forms the core of Kitchen. It contains some useful modules for using newer python standard
library modules on older python versions, text manipulation, PEP 386 versioning, and initializing
gettext. With this package we’re trying to provide a few useful features that don’t have too many
dependencies outside of the python standard library. We’ll be releasing other modules that drop into the
kitchen namespace to add other features (possibly with larger deps) as time goes on.
REQUIREMENTS
We’ve tried to keep the core kitchen module’s requirements lightweight. At the moment kitchen only
requires
python 2.4 or later
WARNING:
Kitchen-1.1.0 was the last release that supported python-2.3.x
Soft Requirements
If found, these libraries will be used to make the implementation of some part of kitchen better in some
way. If they are not present, the API that they enable will still exist but may function in a different
manner.
chardet
Used in guess_encoding() and guess_encoding_to_xml() to help guess encoding of byte strings being
converted. If not present, unknown encodings will be converted as if they were latin1.
OTHER RECOMMENDED LIBRARIES
These libraries implement commonly used functionality that everyone seems to invent. Rather than
reinvent their wheel, I simply list the things that they do well for now. Perhaps if people can’t find
them normally, I’ll add them as requirements in setup.py or link them into kitchen’s namespace. For now,
I just mention them here:
bunch Bunch is a dictionary that supports attribute lookup as well as bracket notation for access.
Setting it apart from most homebrewed implementations is the bunchify() function, which will
descend nested structures of lists and dicts, transforming the dicts into Bunches.
hashlib
Python 2.5 and forward have a hashlib library that provides secure hash functions to python. If
you’re developing for python2.4 though, you can install the standalone hashlib library and have
access to the same functions.
iterutils
The python documentation for itertools has some examples of other nice iterable functions that can
be built from the itertools functions. This third-party module creates those recipes as a module.
ordereddict
Python 2.7 and forward have an OrderedDict that provides a dict whose items are ordered (and
indexable) as well as named.
unittest2
Python 2.7 has an updated unittest library with new functions not present in the python standard
library for Python 2.6 or less. If you want to use those new functions but need your testing
framework to be compatible with older Python the unittest2 library provides the update as an
external module.
nose If you want to use a test discovery tool instead of the unittest framework, nose provides a
simple-to-use way to do that.
LICENSE
This python module is distributed under the terms of the GNU Lesser General Public License Version 2 or
later.
NOTE:
Some parts of this module are licensed under terms less restrictive than the LGPLv2+. If you separate
these files from the work as a whole you are allowed to use them under the less restrictive licenses.
The following is a list of the files that are known:
Python 2 license
_subprocess.py, test_subprocess.py, defaultdict.py, test_defaultdict.py, _base64.py, and
test_base64.py
CONTENTS
Using kitchen to write good code
Kitchen’s functions won’t automatically make you a better programmer. You have to learn when and how to
use them as well. This section of the documentation is intended to show you some of the ways that you
can apply kitchen’s functions to problems that may have arisen in your life. The goal of this section is
to give you enough information to understand what the kitchen API can do for you and where in the
Kitchen API docs to look for something that can help you with your next issue. Along the way, you might
pick up the knack for identifying issues with your code before you publish it. And that will make you a
better coder.
Overcoming frustration: Correctly using unicode in python2
In python-2.x, there are two types that deal with text.
1. str is for strings of bytes. These are very similar in nature to how strings are handled in C.
2. unicode is for strings of unicode code points.
NOTE:
Just what the dickens is “Unicode”?
One mistake that people encountering this issue for the first time make is confusing the unicode type
and the encodings of unicode stored in the str type. In python, the unicode type stores an abstract
sequence of code points. Each code point represents a grapheme. By contrast, byte str stores a
sequence of bytes which can then be mapped to a sequence of code points. Each unicode encoding
(UTF-8, UTF-7, UTF-16, UTF-32, etc) maps different sequences of bytes to the unicode code points.
What does that mean to you as a programmer? When you’re dealing with text manipulations (finding the
number of characters in a string or cutting a string on word boundaries) you should be dealing with
unicode strings as they abstract characters in a manner that’s appropriate for thinking of them as a
sequence of letters that you will see on a page. When dealing with I/O, reading to and from the disk,
printing to a terminal, sending something over a network link, etc, you should be dealing with byte
str as those devices are going to need to deal with concrete implementations of what bytes represent
your abstract characters.
In the python2 world many APIs use these two classes interchangeably but there are several important APIs
where only one or the other will do the right thing. When you give the wrong type of string to an API
that wants the other type, you may end up with an exception being raised (UnicodeDecodeError or
UnicodeEncodeError). However, these exceptions aren’t always raised because python implicitly converts
between types… sometimes.
Frustration #1: Inconsistent Errors
Although converting when possible seems like the right thing to do, it’s actually the first source of
frustration. A programmer can test out their program with a string like: The quick brown fox jumped over
the lazy dog and not encounter any issues. But when they release their software into the wild, someone
enters the string: I sat down for coffee at the café and suddenly an exception is thrown. The reason?
The mechanism that converts between the two types is only able to deal with ASCII characters. Once you
throw non-ASCII characters into your strings, you have to start dealing with the conversion manually.
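You can watch this happen in a quick python2 session; the pure-ASCII concatenation implicitly converts,
while the one containing the bytes for é blows up:
>>> 'The quick brown fox' + u' jumped over the lazy dog'
u'The quick brown fox jumped over the lazy dog'
>>> 'I sat down for coffee at the caf\xc3\xa9' + u' and ordered a drink'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)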
So, if I manually convert everything to either byte str or unicode strings, will I be okay? The answer
is…. sometimes.
Frustration #2: Inconsistent APIs
The problem you run into when converting everything to byte str or unicode strings is that you’ll be
using someone else’s API quite often (this includes the APIs in the python standard library) and find
that the API will only accept byte str or only accept unicode strings. Or worse, that the code will
accept either when you’re dealing with strings that consist solely of ASCII but throw an error when you
give it a string that’s got non-ASCII characters. When you encounter these APIs you first need to
identify which type will work better and then you have to convert your values to the correct type for
that code. Thus the programmer that wants to proactively fix all unicode errors in their code needs to
do two things:
1. You must keep track of what type your sequences of text are. Does my_sentence contain unicode or str?
If you don’t know that then you’re going to be in for a world of hurt.
2. Anytime you call a function you need to evaluate whether that function will do the right thing with
str or unicode values. Sending the wrong value here will lead to a UnicodeError being thrown when the
string contains non-ASCII characters.
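A concrete illustration of why tracking the type matters (assuming a UTF-8 terminal): the same four
characters have different lengths depending on which type holds them, because the é occupies two bytes:
>>> len(u'café')
4
>>> len('café')
5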
NOTE:
There is one mitigating factor here. The python community has been standardizing on using unicode in
all its APIs. Although there are some APIs that you need to send byte str to in order to be safe
(including things as ubiquitous as print(), as we’ll see in the next section), it’s getting easier and
easier to use unicode strings with most APIs.
Frustration #3: Inconsistent treatment of output
Alright, since the python community is moving to using unicode strings everywhere, we might as well
convert everything to unicode strings and use that by default, right? Sounds good most of the time but
there’s at least one huge caveat to be aware of. Anytime you output text to the terminal or to a file,
the text has to be converted into a byte str. Python will try to implicitly convert from unicode to byte
str… but it will throw an exception if the bytes are non-ASCII:
>>> string = unicode(raw_input(), 'utf8')
café
>>> log = open('/var/tmp/debug.log', 'w')
>>> log.write(string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
Okay, this is simple enough to solve: Just convert to a byte str and we’re all set:
>>> string = unicode(raw_input(), 'utf8')
café
>>> string_for_output = string.encode('utf8', 'replace')
>>> log = open('/var/tmp/debug.log', 'w')
>>> log.write(string_for_output)
>>>
So that was simple, right? Well… there’s one gotcha that makes things a bit harder to debug sometimes.
When you attempt to write non-ASCII unicode strings to a file-like object you get a traceback every time.
But what happens when you use print()? The terminal is a file-like object so it should raise an
exception right? The answer to that is…. sometimes:
$ python
>>> print u'café'
café
No exception. Okay, we’re fine then?
We are until someone does one of the following:
• Runs the script in a different locale:
$ LC_ALL=C python
>>> # Note: if you're using a good terminal program when running in the C locale
>>> # The terminal program will prevent you from entering non-ASCII characters
>>> # python will still recognize them if you use the codepoint instead:
>>> print u'caf\xe9'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
• Redirects output to a file:
$ cat test.py
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
print u'café'
$ ./test.py >t
Traceback (most recent call last):
File "./test.py", line 4, in <module>
print u'café'
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
Okay, the locale thing is a pain but understandable: the C locale doesn’t understand any characters
outside of ASCII so naturally attempting to display those won’t work. Now why does redirecting to a file
cause problems? It’s because print() in python2 is treated specially. Whereas the other file-like
objects in python always convert to ASCII unless you set them up differently, using print() to output to
the terminal will use the user’s locale to convert before sending the output to the terminal. When
print() is not outputting to the terminal (being redirected to a file, for instance), print() decides
that it doesn’t know what locale to use for that file and so it tries to convert to ASCII instead.
So what does this mean for you, as a programmer? Unless you have the luxury of controlling how your
users use your code, you should always, always, always convert to a byte str before outputting strings to
the terminal or to a file. Python even provides you with a facility to do just this. If you know that
every unicode string you send to a particular file-like object (for instance, stdout) should be converted
to a particular encoding you can use a codecs.StreamWriter object to convert from a unicode string into a
byte str. In particular, codecs.getwriter() will return a StreamWriter class that will help you to wrap
a file-like object for output. Using our print() example:
$ cat test.py
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)
print u'café'
$ ./test.py >t
$ cat t
café
Frustrations #4 and #5 – The other shoes
In English, there’s a saying “waiting for the other shoe to drop”. It means that when one event (usually
bad) happens, you come to expect another event (usually worse) to come after. In this case we have two
other shoes.
Frustration #4: Now it doesn’t take byte strings?!
If you wrap sys.stdout using codecs.getwriter() and think you are now safe to print any variable without
checking its type I am afraid I must inform you that you’re not paying enough attention to Murphy’s Law.
The StreamWriter that codecs.getwriter() provides will take unicode strings and transform them into byte
str before they get to sys.stdout. The problem is if you give it something that’s already a byte str it
tries to transform that as well. To do that it tries to turn the byte str you give it into unicode and
then transform that back into a byte str… and since it uses the ASCII codec to perform those conversions,
chances are that it’ll blow up when making them:
>>> import codecs
>>> import sys
>>> UTF8Writer = codecs.getwriter('utf8')
>>> sys.stdout = UTF8Writer(sys.stdout)
>>> print 'café'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/codecs.py", line 351, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
To work around this, kitchen provides an alternate version of codecs.getwriter() that can deal with both
byte str and unicode strings. Use kitchen.text.converters.getwriter() in place of the codecs version
like this:
>>> import sys
>>> from kitchen.text.converters import getwriter
>>> UTF8Writer = getwriter('utf8')
>>> sys.stdout = UTF8Writer(sys.stdout)
>>> print u'café'
café
>>> print 'café'
café
Frustration #5: Exceptions
Okay, so we’ve gotten ourselves this far. We convert everything to unicode strings. We’re aware that we
need to convert back into byte str before we write to the terminal. We’ve worked around the inability of
the standard getwriter() to deal with both byte str and unicode strings. Are we all set? Well, there’s
at least one more gotcha: raising exceptions with a unicode message. Take a look:
>>> class MyException(Exception):
...     pass
...
>>> raise MyException(u'Cannot do this')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
__main__.MyException: Cannot do this
>>> raise MyException(u'Cannot do this while at a café')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
__main__.MyException:
>>>
No, I didn’t truncate that last line; raising an exception really cannot handle non-ASCII characters in a
unicode message and will output the exception without the message if the message contains them. What
happens if we try to use the handy dandy getwriter() trick to work around this?
>>> import sys
>>> from kitchen.text.converters import getwriter
>>> sys.stderr = getwriter('utf8')(sys.stderr)
>>> raise MyException(u'Cannot do this')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
__main__.MyException: Cannot do this
>>> raise MyException(u'Cannot do this while at a café')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
__main__.MyException>>>
Not only did this also fail, it even swallowed the trailing newline that’s normally there…. So how to
make this work? Transform from unicode strings to byte str manually before outputting:
>>> from kitchen.text.converters import to_bytes
>>> raise MyException(to_bytes(u'Cannot do this while at a café'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
__main__.MyException: Cannot do this while at a café
>>>
WARNING:
If you use codecs.getwriter() on sys.stderr, you’ll find that raising an exception with a byte str is
broken by the default StreamWriter as well. Don’t do that or you’ll have no way to output non-ASCII
characters. If you want to use a StreamWriter to encode other things on stderr while still having
working exceptions, use kitchen.text.converters.getwriter().
Frustration #6: Inconsistent APIs Part deux
Sometimes you do everything right in your code but other people’s code fails you. With unicode issues
this happens more often than we want. A glaring example of this is when you get values back from a
function that aren’t consistently unicode string or byte str.
An example from the python standard library is gettext. The gettext functions are used to help translate
messages that you display to users in the users’ native languages. Since most languages contain letters
outside of the ASCII range, the values that are returned contain unicode characters. gettext provides
you with ugettext() and ungettext() to return these translations as unicode strings and gettext(),
ngettext(), lgettext(), and lngettext() to return them as encoded byte str. Unfortunately, even though
they’re documented to return only one type of string or the other, the implementation has corner cases
where the wrong type can be returned.
This means that even if you separate your unicode string and byte str correctly before you pass your
strings to a gettext function, afterwards, you might have to check that you have the right sort of string
type again.
NOTE:
kitchen.i18n provides alternate gettext translation objects that return only byte str or only unicode
string.
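For instance (this mirrors the larger example near the end of this document), the translation objects
from kitchen.i18n hand you methods with consistent return types:
from kitchen.i18n import get_translation_object
translations = get_translation_object('example')
_ = translations.ugettext    # always returns unicode strings
b_ = translations.lgettext   # always returns byte str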
A few solutions
Now that we’ve identified the issues, can we define a comprehensive strategy for dealing with them?
Convert text at the border
If you get some piece of text from a library, read from a file, etc, turn it into a unicode string
immediately. Since python is moving in the direction of unicode strings everywhere it’s going to be
easier to work with unicode strings within your code.
If your code is heavily involved with using things that are bytes, you can do the opposite and convert
all text into byte str at the border and only convert to unicode when you need it for passing to another
library or performing string operations on it.
In either case, the important thing is to pick a default type for strings and stick with it throughout
your code. When you mix the types it becomes much easier to operate on a string with a function that can
only use the other type by mistake.
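Here’s a minimal sketch of converting at the border using kitchen’s to_unicode() (read_config() and the
utf-8 default are illustrative assumptions, not part of any API):
from kitchen.text.converters import to_unicode

def read_config(filename):
    # Decode at the border; everything past this point deals in unicode
    raw = open(filename, 'r').read()
    return to_unicode(raw, encoding='utf-8', errors='replace')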
NOTE:
In python3, the abstract unicode type becomes much more prominent. The type named str is the
equivalent of python2’s unicode and python3’s bytes type replaces python2’s str. Most APIs deal in
the unicode type of string with just some pieces that are low level dealing with bytes. The implicit
conversions between bytes and unicode are removed and whenever you want to make the conversion you need
to do so explicitly.
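Under python3 the same flow looks like this minimal sketch; both directions are explicit:
# python3: no implicit conversion; both directions are explicit
text = b'caf\xc3\xa9'.decode('utf-8')    # bytes -> str (unicode text)
data = text.encode('utf-8')              # str -> bytes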
When the data needs to be treated as bytes (or unicode) use a naming convention
Sometimes you’re converting nearly all of your data to unicode strings but you have one or two values
where you have to keep byte str around. This is often the case when you need to use the value verbatim
with some external resource. For instance, filenames or key values in a database. When you do this, use
a naming convention for the data you’re working with so you (and others reading your code later) don’t
get confused about what’s being stored in the value.
If you need both a textual string to present to the user and a byte value for an exact match, consider
keeping both versions around. You can either use two variables for this or a dict whose key is the byte
value.
NOTE:
You can use the naming convention used in kitchen as a guide for implementing your own naming
convention. It prefixes byte str variables of unknown encoding with b_ and byte str of known encoding
with the encoding name like: utf8_. If the default was to handle str and only keep a few unicode
values, those variables would be prefixed with u_.
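Applied to a single value, that convention looks something like this (raw_value_from_api is a
hypothetical placeholder for bytes you received from elsewhere):
from kitchen.text.converters import to_unicode

b_filename = raw_value_from_api             # byte str of unknown encoding
u_filename = to_unicode(b_filename)         # unicode string
utf8_filename = u_filename.encode('utf-8')  # byte str of a known encoding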
When outputting data, convert back into bytes
When you go to send your data back outside of your program (to the filesystem, over the network,
displaying to the user, etc) turn the data back into a byte str. How you do this will depend on the
expected output format of the data. For displaying to the user, you can use the user’s default encoding
using locale.getpreferredencoding(). For entering into a file, your best bet is to pick a single
encoding and stick with it.
WARNING:
When using the encoding that the user has set (for instance, with locale.getpreferredencoding()),
remember that they may have their encoding set to something that can’t display every single unicode
character. That means when you convert from unicode to a byte str you need to decide what should
happen if the byte value is not valid in the user’s encoding. For purposes of displaying messages to
the user, it’s usually okay to use the replace encoding error handler to replace the invalid
characters with a question mark or other symbol meaning the character couldn’t be displayed.
You can use kitchen.text.converters.getwriter() to do this automatically for sys.stdout. When creating
exception messages be sure to convert to bytes manually.
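As a minimal sketch of the display case described above:
import locale

encoding = locale.getpreferredencoding()
u_msg = u'I sat down for coffee at the caf\xe9'
# 'replace' swaps in a marker for anything the user's encoding cannot
# express instead of raising UnicodeEncodeError
print u_msg.encode(encoding, 'replace')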
When writing unittests, include non-ASCII values and both unicode and str types
Unless you know that a specific portion of your code will only deal with ASCII, be sure to include
non-ASCII values in your unittests. Including a few characters from several different scripts is highly
advised as well because some code may have special cased accented roman characters but not know how to
handle characters used in Asian alphabets.
Similarly, unless you know that a portion of your code will only be given unicode strings or only byte
str, be sure to try variables of both types in your unittests. When doing this, make sure that the
variables are also non-ASCII as python’s implicit conversion will mask problems with pure ASCII data. In
many cases, it makes sense to check what happens if byte str and unicode strings that won’t decode in the
present locale are given.
Be vigilant about spotting poor APIs
Make sure that the libraries you use return only unicode strings or byte str. Unittests can help you
spot issues here by running many variations of data through your functions and checking that you’re still
getting the types of string that you expect.
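Here’s a sketch of what such a unittest might look like; it exercises kitchen’s to_unicode() with
non-ASCII data in both string types and checks the returned type, not just the value:
# -*- coding: utf-8 -*-
import unittest
from kitchen.text.converters import to_unicode

class TestToUnicode(unittest.TestCase):
    def test_non_ascii_bytes_and_unicode(self):
        # Pure ASCII data would let implicit conversion mask type problems
        self.assertEqual(to_unicode('café'), u'café')
        self.assertEqual(to_unicode(u'café'), u'café')
        # Verify the type that comes back, not just the value
        self.assertTrue(isinstance(to_unicode('café'), unicode))

if __name__ == '__main__':
    unittest.main()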
Example: Putting this all together with kitchen
The kitchen library provides a wide array of functions to help you deal with byte str and unicode strings
in your program. Here’s a short example that uses many kitchen functions to do its work:
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import locale
import os
import sys
import unicodedata
from kitchen.text.converters import getwriter, to_bytes, to_unicode
from kitchen.i18n import get_translation_object

if __name__ == '__main__':
    # Setup gettext driven translations but use the kitchen functions so
    # we don't have the mismatched bytes-unicode issues.
    translations = get_translation_object('example')
    # We use _() for marking strings that we operate on as unicode
    # This is pretty much everything
    _ = translations.ugettext
    # And b_() for marking strings that we operate on as bytes.
    # This is limited to exceptions
    b_ = translations.lgettext

    # Setup stdout
    encoding = locale.getpreferredencoding()
    Writer = getwriter(encoding)
    sys.stdout = Writer(sys.stdout)

    # Load data.  Format is filename\0description
    # description should be utf-8 but filename can be any legal filename
    # on the filesystem
    # Sample datafile.txt:
    #   /etc/shells\x00Shells available on caf\xc3\xa9.lan
    #   /var/tmp/file\xff\x00File with non-utf8 data in the filename
    #
    # And to create /var/tmp/file\xff (under bash or zsh) do:
    #   echo 'Some data' > /var/tmp/file$'\377'
    datafile = open('datafile.txt', 'r')
    data = {}
    for line in datafile:
        # We're going to keep filename as bytes because we will need the
        # exact bytes to access files on a POSIX operating system.
        # description, we'll immediately transform into unicode type.
        b_filename, description = line.split('\0', 1)

        # to_unicode defaults to decoding output from utf-8 and replacing
        # any problematic bytes with the unicode replacement character.
        # We accept mangling of the description here knowing that our file
        # format is supposed to use utf-8 in that field and that the
        # description will only be displayed to the user, not used as
        # a key value.
        description = to_unicode(description, 'utf-8').strip()
        data[b_filename] = description
    datafile.close()

    # We're going to add a pair of extra fields onto our data to show the
    # length of the description and the filesize.  We put those between
    # the filename and description because we haven't checked that the
    # description is free of NULLs.
    datafile = open('newdatafile.txt', 'w')

    # Name filename with a b_ prefix to denote byte string of unknown encoding
    for b_filename in data:
        # Since we have the byte representation of filename, we can read
        # any filename
        if os.access(b_filename, os.F_OK):
            size = os.path.getsize(b_filename)
        else:
            size = 0
        # Because the description is unicode type, we know the number of
        # characters corresponds to the length of the normalized unicode
        # string.
        length = len(unicodedata.normalize('NFC', data[b_filename]))

        # Print a summary to the screen.
        # Note that we do not let implicit type conversion from str to
        # unicode transform b_filename into a unicode string.  That might
        # fail as python would use the ASCII codec.  Instead we use
        # to_unicode() to explicitly transform in a way that we know will
        # not traceback.
        print _(u'filename: %s') % to_unicode(b_filename)
        print _(u'file size: %s') % size
        print _(u'desc length: %s') % length
        print _(u'description: %s') % data[b_filename]

        # First combine the unicode portion
        line = u'%s\0%s\0%s' % (size, length, data[b_filename])
        # Since the filenames are bytes, turn everything else to bytes
        # before combining.  Turning into unicode first would be wrong as
        # the bytes in b_filename might not convert
        b_line = '%s\0%s\n' % (b_filename, to_bytes(line))

        # Just to demonstrate that getwriter will pass bytes through fine
        print b_('Wrote: %s') % b_line
        datafile.write(b_line)
    datafile.close()

    # And just to show how to properly deal with an exception.
    # Note two things about this:
    # 1) We use the b_() function to translate the string.  This returns a
    #    byte string instead of a unicode string
    # 2) We're using the b_() function returned by kitchen.  If we had
    #    used the one from gettext we would need to convert the message to
    #    a byte str first
    message = u'Demonstrate the proper way to raise exceptions.  Sincerely, \u3068\u3057\u304a'
    raise Exception(b_(message))
SEE ALSO:
kitchen.text.converters
Designing Unicode Aware APIs
APIs that deal with byte str and unicode strings are difficult to get right. Here are a few strategies
with pros and cons of each.
Contents
• Designing Unicode Aware APIs
• Take either bytes or unicode, output only unicode
• Take either bytes or unicode, output the same type
• Separate functions
• Deciding whether to take str or unicode when no value is returned
• Writing to external data
• Updating data structures
• APIs to Avoid
• Returning unicode unless a conversion fails
• Ignoring values with no chance of recovery
• Raising a UnicodeException with no chance of recovery
• Knowing your data
• Do you need to operate on both bytes and unicode?
• Can you restrict the encodings?
• Single byte encodings
• Multibyte encodings
• Fixed width
• Variable Width
• ASCII compatible
• Escaped
• Other
Take either bytes or unicode, output only unicode
In this strategy, you allow the user to enter either unicode strings or byte str but what you give back
is always unicode. This strategy is easy for novice end users to start using immediately as they will be
able to feed either type of string into the function and get back a string that they can use in other
places.
However, it does lead to the novice writing code that functions correctly when testing it with ASCII-only
data but fails when given data that contains non-ASCII characters. Worse, if your API is not designed to
be flexible, the consumer of your code won’t be able to easily correct those problems once they find
them.
Here’s a good API that uses this strategy:
from kitchen.text.converters import to_unicode
def truncate(msg, max_length, encoding='utf8', errors='replace'):
    msg = to_unicode(msg, encoding, errors)
    return msg[:max_length]
The call to truncate() starts with the essential parameters for performing the task. It ends with two
optional keyword arguments that define the encoding to use to transform from a byte str to unicode and
the strategy to use if undecodable bytes are encountered. The defaults may vary depending on the use
cases you have in mind. When the output is generally going to be printed for the user to see,
errors='replace' is a good default. If you are constructing keys to a database, raising an exception
(with errors='strict') may be a better default. In either case, having both parameters allows the person
using your API to choose how they want to handle any problems. Having the values is also a clue to them
that a conversion from byte str to unicode string is going to occur.
NOTE:
If you’re targeting python-3.1 and above, errors='surrogateescape' may be a better default than
errors='strict'. You need to be mindful of a few things when using surrogateescape though:
• surrogateescape will cause issues if a non-ASCII compatible encoding is used (for instance, UTF-16
and UTF-32.) That makes it unhelpful in situations where a true general purpose method of encoding
must be found. PEP 383 mentions that surrogateescape was specifically designed with the limitations
of translating using system locales (where ASCII compatibility is generally seen as inescapable) so
you should keep that in mind.
• If you use surrogateescape to decode from bytes to unicode you will need to use an error handler
other than strict to encode, as the lone surrogate that this error handler creates makes for invalid
unicode that must be handled when encoding. In Python-3.1.2 or less, a bug in the encoder error
handlers means that you can only use surrogateescape to encode; anything else will throw an error.
Evaluate your usages of the variables in question to see what makes sense.
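For example, under python-3.1+ an undecodable byte round-trips through a lone surrogate like this:
>>> b = b'caf\xff'
>>> s = b.decode('utf-8', 'surrogateescape')
>>> s
'caf\udcff'
>>> s.encode('utf-8', 'surrogateescape')
b'caf\xff'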
Here’s a bad example of using this strategy:
from kitchen.text.converters import to_unicode
def truncate(msg, max_length):
    msg = to_unicode(msg)
    return msg[:max_length]
In this example, we don’t have the optional keyword arguments for encoding and errors. A user who uses
this function is more likely to miss the fact that a conversion from byte str to unicode is going to
occur. And once an error is reported, they will have to look through their backtrace and think harder
about where they want to transform their data into unicode strings instead of having the opportunity to
control how the conversion takes place in the function itself. Note that the user does have the ability
to make this work by making the transformation to unicode themselves:
from kitchen.text.converters import to_unicode
msg = to_unicode(msg, encoding='euc_jp', errors='ignore')
new_msg = truncate(msg, 5)
Take either bytes or unicode, output the same type
This strategy is sometimes called polymorphic because the type of data that is returned is dependent on
the type of data that is received. The concept is that when you are given a byte str to process, you
return a byte str in your output. When you are given unicode strings to process, you return unicode
strings in your output.
This can work well for end users as the ones that know about the difference between the two string types
will already have transformed the strings to their desired type before giving it to this function. The
ones that don’t can remain blissfully ignorant (at least, as far as your function is concerned) as the
function does not change the type.
In cases where the encoding of the byte str is known or can be discovered based on the input data this
works well. If you can’t figure out the input encoding, however, this strategy can fail in any of the
following cases:
1. It needs to do an internal conversion between byte str and unicode string.
2. It cannot return the same data as either a unicode string or byte str.
3. You may need to deal with byte strings that are not byte-compatible with ASCII.
First, a couple examples of using this strategy in a good way:
def translate(msg, table):
    replacements = table.keys()
    new_msg = []
    for index, char in enumerate(msg):
        if char in replacements:
            new_msg.append(table[char])
        else:
            new_msg.append(char)
    return ''.join(new_msg)
In this example, all of the strings that we use (except the empty string, which is okay because it
doesn’t have any characters to encode) come from outside of the function. Due to that, the user is
responsible for making sure that msg and the keys and values in table all match in terms of type
(unicode vs str) and encoding (you can do some error checking to make sure the user gave all the same
type but you can’t do the same for the user giving different encodings). You do not need to make changes
to the string that require you to know the encoding or type of the string; everything is a simple
replacement of one element in the sequence of characters in msg with the character in table.
import json
from kitchen.text.converters import to_unicode, to_bytes

def first_field_from_json_data(json_string):
    '''Return the first field in a json data structure.

    The format of the json data is a simple list of strings.
    '["one", "two", "three"]'
    '''
    if isinstance(json_string, unicode):
        # On all python versions, json.loads() returns unicode if given
        # a unicode string
        return json.loads(json_string)[0]

    # Byte str: figure out which encoding we're dealing with
    if '\x00' not in json_string[:2]:
        encoding = 'utf8'
    elif '\x00\x00\x00' == json_string[:3]:
        encoding = 'utf-32-be'
    elif '\x00\x00\x00' == json_string[1:4]:
        encoding = 'utf-32-le'
    elif '\x00' == json_string[0] and '\x00' == json_string[2]:
        encoding = 'utf-16-be'
    else:
        encoding = 'utf-16-le'

    data = json.loads(unicode(json_string, encoding))
    return data[0].encode(encoding)
In this example the function takes either a byte str type or a unicode string that has a list in json
format and returns the first field from it as the type of the input string. The first section of code is
very straightforward; we receive a unicode string, parse it with a function, and then return the first
field from our parsed data (which json.loads() has already converted to unicode strings).
The second portion that deals with byte str is not so straightforward. Before we can parse the string we
have to determine what characters the bytes in the string map to. If we didn’t do that, we wouldn’t be
able to properly find which characters are present in the string. In order to do that we have to figure
out the encoding of the byte str. Luckily, the json specification states that all strings are unicode
and encoded with one of UTF32be, UTF32le, UTF16be, UTF16le, or UTF-8. It further defines the format such
that the first two characters are always ASCII. Each of these has a different sequence of NULLs when
they encode an ASCII character. We can use that to detect which encoding was used to create the byte
str.
Finally, we return the byte str by encoding the unicode back to a byte str.
As you can see, in this example we have to convert from byte str to unicode and back. But we know from
the json specification that byte str has to be one of a limited number of encodings that we are able to
detect. That ability makes this strategy work.
Now for some examples of using this strategy in ways that fail:
import unicodedata

def first_char(msg):
    '''Return the first character in a string'''
    if not isinstance(msg, unicode):
        try:
            msg = unicode(msg, 'utf8')
        except UnicodeError:
            msg = unicode(msg, 'latin1')
    msg = unicodedata.normalize('NFC', msg)
    return msg[0]
If you look at that code and think that there’s something fragile and prone to breaking in the try:
except: block you are correct in being suspicious. This code will fail on multi-byte character sets that
aren’t UTF-8. It can also fail on data where the sequence of bytes is valid UTF-8 but the bytes are
actually of a different encoding. The reason this code fails is that we don’t know what encoding the
bytes are in and the code must convert from a byte str to a unicode string in order to function.
In order to make this code robust we must know the encoding of msg. The only way to know that is to ask
the user so the API must do that:
import unicodedata

def number_of_chars(msg, encoding='utf8', errors='strict'):
    if not isinstance(msg, unicode):
        msg = unicode(msg, encoding, errors)
    msg = unicodedata.normalize('NFC', msg)
    return len(msg)
Another example of failure:
import os

def listdir(directory):
    files = os.listdir(directory)
    if isinstance(directory, str):
        return files
    # files could contain both bytes and unicode
    new_files = []
    for filename in files:
        if not isinstance(filename, unicode):
            # What to do here?
            continue
        new_files.append(filename)
    return new_files
This function illustrates the second failure mode. Here, not all of the possible values can be
represented as unicode without knowing more about the encoding of each of the filenames involved. Since
each filename could have a different encoding there are a few different options to pursue. We could make
this function always return byte str since that can accurately represent anything that could be returned.
If we want to return unicode we need to at least allow the user to specify what to do in case of an error
decoding the bytes to unicode. We can also let the user specify the encoding to use for doing the
decoding but that won’t help in all cases since not all files will be in the same encoding (or even
necessarily in any encoding):
import locale
import os

def listdir(directory, encoding=locale.getpreferredencoding(), errors='strict'):
    # Note: In python-3.1+, surrogateescape may be a better default
    files = os.listdir(directory)
    if isinstance(directory, str):
        return files
    new_files = []
    for filename in files:
        if not isinstance(filename, unicode):
            filename = unicode(filename, encoding=encoding, errors=errors)
        new_files.append(filename)
    return new_files
Note that although we use errors in this example as what to pass to the codec that decodes to unicode we
could also have an errors argument that decides other things to do like skip a filename entirely, return
a placeholder (Nondisplayable filename), or raise an exception.
This leaves us with one last failure to describe:
def first_field(csv_string):
    '''Return the first field in a comma separated values string.'''
    try:
        return csv_string[:csv_string.index(',')]
    except ValueError:
        return csv_string
This code looks simple enough. The hidden error here is that we are searching for a comma character in a
byte str but not all encodings will use the same sequence of bytes to represent the comma. If you use an
encoding that’s not ASCII compatible on the byte level, then the literal comma ',' in the above code will
match inappropriate bytes. Some examples of how it can fail:
• Will find the byte representing an ASCII comma in another character
• Will find the comma but leave trailing garbage bytes on the end of the string
• Will not match the character that represents the comma in this encoding
There are two ways to solve this. You can either take the encoding value from the user or you can take
the separator value from the user. Of the two, taking the encoding is the better option for two reasons:
1. Taking a separator argument doesn’t clearly document for the API user that the reason they must give
it is to properly match the encoding of the csv_string. They’re just as likely to think that it’s
simply a way to specify an alternate character (like “:” or “|”) for the separator.
2. It’s possible for a variable width encoding to reuse the same byte sequence for different characters
in multiple sequences.
NOTE:
UTF-8 is resistant to this as any character’s sequence of bytes will never be a subset of another
character’s sequence of bytes.
With that in mind, here’s how to improve the API:
def first_field(csv_string, encoding='utf-8', errors='replace'):
    if not isinstance(csv_string, unicode):
        u_string = unicode(csv_string, encoding, errors)
        is_unicode = False
    else:
        u_string = csv_string
        is_unicode = True
    try:
        field = u_string[:u_string.index(u',')]
    except ValueError:
        return csv_string
    if not is_unicode:
        field = field.encode(encoding, errors)
    return field
NOTE:
If you decide you’ll never encounter a variable width encoding that reuses byte sequences you can use
this code instead:
def first_field(csv_string, encoding='utf-8'):
    try:
        return csv_string[:csv_string.index(','.encode(encoding))]
    except ValueError:
        return csv_string
Separate functions
Sometimes you want to be able to take either byte str or unicode strings, perform similar operations on
either one and then return data in the same format as was given. Probably the easiest way to do that is
to have separate functions for each and adopt a naming convention to show that one is for working with
byte str and the other is for working with unicode strings:
def translate_b(msg, table):
    '''Replace values in str with other byte values like unicode.translate'''
    if not isinstance(msg, str):
        raise TypeError('msg must be of type str')
    str_table = [chr(s) for s in xrange(0, 256)]
    delete_chars = []
    for chr_val in (k for k in table.keys() if isinstance(k, int)):
        if chr_val > 255:
            raise ValueError('Keys in table must not exceed 255')
        if table[chr_val] is None:
            delete_chars.append(chr(chr_val))
        elif isinstance(table[chr_val], int):
            if table[chr_val] > 255 or table[chr_val] < 0:
                raise TypeError('table values cannot be more than 255 or less than 0')
            str_table[chr_val] = chr(table[chr_val])
        else:
            if not isinstance(table[chr_val], str):
                raise TypeError('character mapping must return integer, None or str')
            str_table[chr_val] = table[chr_val]
    str_table = ''.join(str_table)
    delete_chars = ''.join(delete_chars)
    return msg.translate(str_table, delete_chars)

def translate(msg, table):
    '''Replace values in a unicode string with other values'''
    if not isinstance(msg, unicode):
        raise TypeError('msg must be of type unicode')
    return msg.translate(table)
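For instance, used with mappings that follow each function’s conventions:
>>> translate_b('abc', {ord('a'): ord('z')})
'zbc'
>>> translate(u'abc', {ord(u'a'): u'z'})
u'zbc'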
There are several things that we have to do in this API:
• Because the function names might not be enough of a clue to the user of the functions of the value
types that are expected, we have to check that the types are correct.
• We keep the behaviour of the two functions as close to the same as possible, just with byte str and
unicode strings substituted for each other.
Deciding whether to take str or unicode when no value is returned
Not all functions have a return value. Sometimes a function is there to interact with something external
to python, for instance, writing a file out to disk or a method exists to update the internal state of a
data structure. One of the main questions with these APIs is whether to take byte str, unicode string,
or both. The answer depends on your use case but I’ll give some examples here.
Writing to external data
When your information is going to an external data source like writing to a file you need to decide
whether to take in unicode strings or byte str. Remember that most external data sources are not going
to be dealing with unicode directly. Instead, they’re going to be dealing with a sequence of bytes that
may be interpreted as unicode. With that in mind, you either need to have the user give you a byte str
or convert to a byte str inside the function.
Next you need to think about the type of data that you’re receiving. If it’s textual data, (for
instance, this is a chat client and the user is typing messages that they expect to be read by another
person) it probably makes sense to take in unicode strings and do the conversion inside your function.
On the other hand, if this is a lower level function that’s passing data into a network socket, it
probably should be taking byte str instead.
Just as noted in the API notes above, you should specify an encoding and errors argument if you need to
transform from unicode string to byte str and you are unable to guess the encoding from the data itself.
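A sketch of such a function (write_message() and its defaults are illustrative assumptions; to_bytes()
is the kitchen converter, and it passes byte str through unchanged):
from kitchen.text.converters import to_bytes

def write_message(msg, stream, encoding='utf-8', errors='replace'):
    # Convert at the border; textual unicode comes in, bytes go out
    stream.write(to_bytes(msg, encoding=encoding, errors=errors))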
Updating data structures
Sometimes your API is just going to update a data structure and not immediately output that data
anywhere. Just as when writing external data, you should think about both what your function is going to
do with the data eventually and what the caller of your function is thinking that they’re giving you.
Most of the time, you’ll want to take unicode strings and enter them into the data structure as unicode
when the data is textual in nature. You’ll want to take byte str and enter them into the data structure
as byte str when the data is not text. Use a naming convention so the user knows what’s expected.
APIs to Avoid
There are a few APIs that are just wrong. If you catch yourself making an API that does one of these
things, change it before anyone sees your code.
Returning unicode unless a conversion fails
This type of API usually deals with byte str at some point and converts it to unicode because it’s
usually thought to be text. However, there are times when the bytes fail to convert to a unicode string.
When that happens, this API returns the raw byte str instead of a unicode string. One example of this is
present in the python standard library: python2’s os.listdir():
>>> import os
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> os.mkdir('/tmp/mine')
>>> os.chdir('/tmp/mine')
>>> open('nonsense_char_\xff', 'w').close()
>>> open('all_ascii', 'w').close()
>>> os.listdir(u'.')
[u'all_ascii', 'nonsense_char_\xff']
The problem with APIs like this is that they cause failures that are hard to debug because they don’t
happen where the variables are set. For instance, let’s say you take the filenames from os.listdir() and
give them to this function:
def normalize_filename(filename):
    '''Change spaces and dashes into underscores'''
    return filename.translate({ord(u' '): u'_', ord(u'-'): u'_'})
When you test this, you use filenames that all are decodable in your preferred encoding and everything
seems to work. But when this code is run on a machine that has filenames in multiple encodings the
filenames returned by os.listdir() suddenly include byte str. And byte str has a different
translate() method that takes different arguments. So the code raises an exception where it’s not
immediately obvious that os.listdir() is at fault.
Ignoring values with no chance of recovery
An early version of python3 attempted to fix the os.listdir() problem pointed out in the last section by
returning all values that were decodable to unicode and omitting the filenames that were not. This led
to the following output:
>>> import os
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> os.mkdir('/tmp/mine')
>>> os.chdir('/tmp/mine')
>>> open(b'nonsense_char_\xff', 'w').close()
>>> open('all_ascii', 'w').close()
>>> os.listdir('.')
['all_ascii']
The issue with this type of code is that it is silently doing something surprising. The caller expects
to get a full list of files back from os.listdir(). Instead, it silently ignores some of the files,
returning only a subset. This leads to code that doesn’t do what is expected, and the omission may go
unnoticed until the code is in production and someone notices that something important is being missed.
Raising a UnicodeException with no chance of recovery
Believe it or not, a few libraries exist that make it impossible to deal with unicode text without
raising a UnicodeError. What seems to occur in these libraries is that the library has functions that
expect to receive a unicode string. However, internally, those functions call other functions that
expect to receive a byte str. The programmer of the API was smart enough to convert from a unicode
string to a byte str but they did not give the user the chance to specify the encodings to use or how to
deal with errors. This results in exceptions when the user passes in a byte str because the initial
function wants a unicode string, and exceptions when the user passes in a unicode string because the
function can’t convert the string to bytes in the encoding that it has selected.
Do not put the user in the position of not being able to use your API without raising a UnicodeError with
certain values. If you can only safely take unicode strings, document that byte str is not allowed and
vice versa. If you have to convert internally, make sure to give the caller of your function parameters
to control the encoding and how to treat errors that may occur during the encoding/decoding process. If
your code will raise a UnicodeError with non-ASCII values no matter what, you should probably rethink
your API.
Knowing your data
If you’ve read all the way down to this section without skipping, you’ve seen several admonitions about
how the type of data you are processing affects the viability of the various API choices.
Here are a few things to consider in your data:
Do you need to operate on both bytes and unicode?
Much of the data in libraries, programs, and the general environment outside of python is written where
strings are sequences of bytes. So when we interact with data that comes from outside of python or data
that is about to leave python it may make sense to only operate on the data as a byte str. There’s two
times when this may make sense:
1. The user is intended to hand the data to the function and then the function takes care of sending the
data outside of python (to the filesystem, over the network, etc).
2. The data is not representable as text. For instance, writing a binary file format.
Even when your code is operating in this area you still need to think a little more about your data. For
instance, it might make sense for the person using your API to pass in unicode strings and let the
function convert that into the byte str that it then sends over the wire.
There are also times when it might make sense to operate only on unicode strings. unicode represents
text so anytime that you are working on textual data that isn’t going to leave python it has the
potential to be a unicode-only API. However, there are two things that you should consider when designing
a unicode-only API:
1. As your API gains popularity, people are going to use your API in places that you may not have thought
of. Corner cases in these other places may mean that processing bytes is desirable.
2. In python2, byte str and unicode are often used interchangeably with each other. That means that
people programming against your API may have received str from some other API and it would be most
convenient for their code if your API accepted it.
NOTE:
In python3, the separation between the text type and the byte type is clearer. So in python3,
there’s less need to have all APIs take both unicode and bytes.
Can you restrict the encodings?
If you determine that you have to deal with byte str you should realize that not all encodings are
created equal. Each has different properties that may make it possible to provide a simpler API provided
that you can reasonably tell the users of your API that they cannot use certain classes of encodings.
As one example, if you are required to find a comma (,) in a byte str you have different choices based on
what encodings are allowed. If you can reasonably restrict your API users to only giving ASCII
compatible encodings you can do this simply by searching for the literal comma character because that
character will be represented by the same byte sequence in all ASCII compatible encodings.
The following are some classes of encodings to be aware of as you decide how generic your code needs to
be.
Single byte encodings
Single byte encodings can only represent 256 total characters. They encode each character’s code point
as the single byte with the equivalent numeric value.
Most single byte encodings are ASCII compatible. ASCII compatible encodings are the most likely to be
usable without changes to code so this is good news. A notable exception to this is the EBCDIC family of
encodings.
Multibyte encodings
Multibyte encodings use more than one byte to encode some characters.
Fixed width
Fixed width encodings have a set number of bytes to represent all of the characters in the character set.
UTF-32 is an example of a fixed width encoding that uses four bytes per character and can express every
unicode character. There are a number of problems with writing APIs that need to operate on fixed
width, multibyte characters. To go back to our earlier example of finding a comma in a string, we have
to realize that even in UTF-32, where the code point for ASCII characters is the same as in ASCII, the
byte sequence for them is different. So you cannot search for the literal comma byte as it may pick
up false positives and may break a byte sequence in an odd place.
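To make that concrete, here’s the comma and another character (U+2C41) in UTF-32 big endian; the comma’s
byte, 0x2c, appears inside the other character’s sequence, so a naive byte search finds a false positive:
>>> u','.encode('utf-32-be')
'\x00\x00\x00,'
>>> u'\u2c41'.encode('utf-32-be')
'\x00\x00,A'
>>> ',' in u'\u2c41'.encode('utf-32-be')
True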
Variable Width
ASCII compatible
UTF-8 and the EUC family of encodings are examples of ASCII compatible multi-byte encodings. They
achieve this by adhering to two principles:
• All of the ASCII characters are represented by the byte that they are in the ASCII encoding.
• None of the ASCII byte sequences are reused in any other byte sequence for a different character.
Escaped
Some multibyte encodings work by using only bytes from the ASCII encoding but when a particular sequence
of those bytes is found, they are interpreted as meaning something other than their ASCII values. UTF-7
is one such encoding that can encode all of the unicode code points. For instance, here are some
Japanese characters encoded as UTF-7:
>>> a = u'\u304f\u3089\u3068\u307f'
>>> print a
くらとみ
>>> print a.encode('utf-7')
+ME8wiTBoMH8-
These encodings can be used when you need to encode unicode data that may contain non-ASCII characters
for inclusion in an ASCII only transport medium or file.
However, they are not ASCII compatible in the sense that we used earlier, as the bytes that represent an
ASCII character are being reused as part of other characters. If you were to search for a literal plus
sign in this encoded string, you would run across many false positives, for instance.
Other
There are many other popular variable width encodings, for instance UTF-16 and shift-JIS. Many of these
are not ASCII compatible so you cannot search for a literal ASCII character without danger of false
positives or false negatives.
Kitchen API
Kitchen is structured as a collection of modules. In its current configuration, Kitchen ships with the
following modules. Other addon modules that may drag in more dependencies can be found on the project
webpage.
Kitchen.i18n Module
I18N is an important piece of any modern program. Unfortunately, setting up i18n in your program is
often a confusing process. The functions provided here aim to make the programming side of that a little
easier.
Most projects will be able to do something like this when they startup:
# myprogram/__init__.py:
import os
import sys
from kitchen.i18n import easy_gettext_setup
_, N_ = easy_gettext_setup('myprogram', localedirs=(
    os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
    os.path.join(sys.prefix, 'lib', 'locale')
))
Then, in other files that have strings that need translating:
# myprogram/commands.py:
from myprogram import _, N_
def print_usage():
    print _(u"""available commands are:
    --help              Display help
    --version           Display version of this program
    --bake-me-a-cake    as fast as you can
    """)

def print_invitations(age):
    print _('Please come to my party.')
    print N_('I will be turning %(age)s year old',
             'I will be turning %(age)s years old', age) % {'age': age}
See the documentation of easy_gettext_setup() and get_translation_object() for more details.
SEE ALSO:
gettext
for details of how the python gettext facilities work
babel The babel module for in depth information on gettext, message catalogs, and translating
your app. babel provides some nice features for i18n on top of gettext
Functions
easy_gettext_setup() should satisfy the needs of most users. get_translation_object() is designed to
ease the way for anyone that needs more control.
kitchen.i18n.easy_gettext_setup(domain, localedirs=(), use_unicode=True)
Setup translation functions for an application
Parameters
• domain – Name of the message domain. This should be a unique name that can be used to
lookup the message catalog for this app.
• localedirs – Iterator of directories to look for message catalogs under. The first
directory to exist is used regardless of whether messages for this domain are present.
If none of the directories exist, fallback on sys.prefix + /share/locale Default: No
directories to search so we just use the fallback.
• use_unicode – If True return the gettext functions for unicode strings else return the
functions for byte str for the translations. Default is True.
Returns
tuple of the gettext function and gettext function for plurals
Setting up gettext can be a little tricky because of a lack of documentation. This function will
setup gettext using the Class-based API for you. For the simple case, you can use the default
values for the optional arguments and call it like this:
_, N_ = easy_gettext_setup('myprogram')
This will get you two functions, _() and N_() that you can use to mark strings in your code for
translation. _() is used to mark strings that don’t need to worry about plural forms no matter
what the value of the variable is. N_() is used to mark strings that do need to have a different
form if a variable in the string is plural.
SEE ALSO:
api-i18n
This module’s documentation has examples of using _() and N_()
get_translation_object()
for information on how to use localedirs to get the proper message catalogs both when in
development and when installed to FHS compliant directories on Linux.
NOTE:
The gettext functions returned from this function should be superior to the ones returned from
gettext. The traits that make them better are described in the DummyTranslations and
NewGNUTranslations documentation.
Changed in version kitchen-0.2.4 (API kitchen.i18n 2.0.0): Changed easy_gettext_setup() to return
the lgettext functions instead of gettext functions when use_unicode=False.
kitchen.i18n.get_translation_object(domain, localedirs=(), languages=None, class_=None, fallback=True,
codeset=None, python2_api=True)
Get a translation object bound to the message catalogs
Parameters
• domain – Name of the message domain. This should be a unique name that can be used to
lookup the message catalog for this app or library.
• localedirs – Iterator of directories to look for message catalogs under. The directories
are searched in order for message catalogs. For each of the directories searched, we
check for message catalogs in any language specified in languages. The message
catalogs are used to create the Translation object that we return. The Translation
object will attempt to lookup the msgid in the first catalog that we found. If it’s not
in there, it will go through each subsequent catalog looking for a match. For this
reason, the order in which you specify the localedirs may be important. If no message
catalogs are found, either return a DummyTranslations object or raise an IOError
depending on the value of fallback. The default localedir from gettext, which is
os.path.join(sys.prefix, 'share', 'locale') on Unix, is implicitly appended to the
localedirs, making it the last directory searched.
• languages –
Iterator of language codes to check for message catalogs. If unspecified, the user’s
locale settings will be used.
SEE ALSO:
gettext.find() for information on what environment variables are used.
• class – The class to use to extract translations from the message catalogs. Defaults to
NewGNUTranslations.
• fallback – If set to False, raise an IOError if no message catalogs are found. If
True, the default, return a DummyTranslations object.
• codeset – Set the character encoding to use when returning byte str objects. This is
equivalent to calling output_charset() on the Translations object that is returned from
this function.
• python2_api – When True (default), return Translation objects that use the python2
gettext api (gettext() and lgettext() return byte str. ugettext() exists and returns
unicode strings). When False, return Translation objects that use the python3 gettext
api (gettext returns unicode strings and lgettext returns byte str. ugettext does not
exist.)
Returns
Translation object to get gettext methods from
If you need more flexibility than easy_gettext_setup(), use this function. It sets up a gettext
Translation object and returns it to you. Then you can access any of the methods of the object
that you need directly. For instance, if you specifically need to access lgettext():
translations = get_translation_object('foo')
translations.lgettext('My Message')
This function is similar to the python standard library gettext.translation() but makes it better
in two ways:
1. It returns NewGNUTranslations or DummyTranslations objects by default. These are superior to
the gettext.GNUTranslations and gettext.NullTranslations objects because they are consistent in
the string type they return and they fix several issues that can cause the python standard
library objects to throw UnicodeError.
2. This function takes multiple directories to search for message catalogs.
The latter is important when setting up gettext in a portable manner. There is not a common
directory for translations across operating systems so one needs to look in multiple directories
for the translations. get_translation_object() is able to handle that if you give it a list of
directories to search for catalogs:
translations = get_translation_object('foo', localedirs=(
    os.path.join(os.path.realpath(os.path.dirname(__file__)), 'locale'),
    os.path.join(sys.prefix, 'lib', 'locale')))
This will search for several different directories:
1. A directory named locale in the same directory as the module that called
get_translation_object(),
2. In /usr/lib/locale
3. In /usr/share/locale (the fallback directory)
This allows gettext to work on Windows and in development (where the message catalogs are
typically in the toplevel module directory) and also when installed under Linux (where the message
catalogs are installed in /usr/share/locale). You (or the system packager) just need to install
the message catalogs in /usr/share/locale and remove the locale directory from the module to make
this work. For example:
In development:
~/foo # Toplevel module directory
~/foo/__init__.py
~/foo/locale # With message catalogs below here:
~/foo/locale/es/LC_MESSAGES/foo.mo
Installed on Linux:
/usr/lib/python2.7/site-packages/foo
/usr/lib/python2.7/site-packages/foo/__init__.py
/usr/share/locale/ # With message catalogs below here:
/usr/share/locale/es/LC_MESSAGES/foo.mo
NOTE:
This function will setup Translation objects that attempt to lookup msgids in all of the found
message catalogs. This means if you have several versions of the message catalogs installed in
different directories that the function searches, you need to make sure that localedirs
specifies the directories so that newer message catalogs are searched first. It also means
that if a newer catalog does not contain a translation for a msgid but an older one that’s in
localedirs does, the translation from that older catalog will be returned.
Changed in version kitchen-1.1.0 (API kitchen.i18n 2.1.0): Added more parameters to
get_translation_object() so it can more easily be used as a replacement for gettext.translation().
Also changed the way localedirs is used: we cycle through them until we find a suitable locale
file rather than simply cycling through until we find a directory that exists. The new code is
based heavily on the python standard library gettext.translation() function.
Changed in version kitchen-1.2.0 (API kitchen.i18n 2.2.0): Added the python2_api parameter
Translation Objects
The standard translation objects from the gettext module suffer from several problems:
• They can throw UnicodeError
• They can’t find translations for non-ASCII byte str messages
• They may return either unicode string or byte str from the same function even though the functions say
they will only return unicode or only return byte str.
DummyTranslations and NewGNUTranslations were written to fix these issues.
class kitchen.i18n.DummyTranslations(fp=None, python2_api=True)
Safer version of gettext.NullTranslations
This Translations class doesn’t translate the strings and is intended to be used as a fallback
when there were errors setting up a real Translations object. It’s safer than
gettext.NullTranslations in its handling of byte str vs unicode strings.
Unlike NullTranslations, this Translation class will never throw a UnicodeError. The code that
you have around a call to DummyTranslations might throw a UnicodeError but at least that will be
in code you control and can fix. Also, unlike NullTranslations all of this Translation object’s
methods guarantee to return byte str except for ugettext() and ungettext() which guarantee to
return unicode strings.
When byte str are returned, the strings will be encoded according to this algorithm:
1. If a fallback has been added, the fallback will be called first. You’ll need to consult the
fallback to see whether it performs any encoding changes.
2. If a byte str was given, the same byte str will be returned.
3. If a unicode string was given and set_output_charset() has been called then we encode the
string using the output_charset
4. If a unicode string was given and this is gettext() or ngettext() and _charset was set, output
in that charset.
5. If a unicode string was given and this is gettext() or ngettext() we encode it using ‘utf-8’.
6. If a unicode string was given and this is lgettext() or lngettext() we encode using the value
of locale.getpreferredencoding()
For ugettext() and ungettext(), we go through the same set of steps with the following
differences:
• We transform byte str into unicode strings for these methods.
• The encoding used to decode the byte str is taken from input_charset if it’s set, otherwise we
decode using UTF-8.
input_charset
is an extension to the python standard library gettext that specifies what charset a
message is encoded in when decoding a message to unicode. This is used for two purposes:
1. If the message string is a byte str, this is used to decode the string to a unicode string
before looking it up in the message catalog.
2. In the ugettext() and ungettext() methods, if a byte str is given as the message and is
untranslated, this is used as the encoding when decoding to unicode. This is different from
_charset which may be set when a message catalog is loaded because input_charset is used to
describe an encoding used in a python source file while _charset describes the encoding used in
the message catalog file.
Any characters that aren’t able to be transformed from a byte str to unicode string or vice versa
will be replaced with a replacement character (ie: u'�' in unicode based encodings, '?' in other
ASCII compatible encodings).
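A minimal interactive sketch of these rules with a default-constructed DummyTranslations (the exact
byte values assume the documented utf-8 defaults):
>>> from kitchen.i18n import DummyTranslations
>>> translations = DummyTranslations()
>>> translations.gettext('caf\xc3\xa9')    # byte str in: same byte str out
'caf\xc3\xa9'
>>> translations.gettext(u'caf\xe9')       # unicode in: utf-8 encoded byte str out
'caf\xc3\xa9'
>>> translations.ugettext('caf\xc3\xa9')   # byte str in: decoded to unicode
u'caf\xe9'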
SEE ALSO:
gettext.NullTranslations
For information about what methods are available and what they do.
Changed in version kitchen-1.1.0 (API kitchen.i18n 2.1.0):
• Although we had adapted gettext(), ngettext(), lgettext(), and lngettext() to always return
byte str, we hadn't forced those byte str to always be in a specified charset. We now make sure
that gettext() and ngettext() return byte str encoded using output_charset if set, otherwise
charset, and if neither of those, UTF-8. With lgettext() and lngettext(), we use output_charset
if set, otherwise locale.getpreferredencoding().
• Setting input_charset and output_charset now also sets those attributes on any fallback
translation objects.
Changed in version kitchen-1.2.0 (API kitchen.i18n 2.2.0): Added the python2_api parameter to __init__()
set_output_charset(charset)
Set the output charset
This serves two purposes. The normal gettext.NullTranslations.set_output_charset() does
not set the output on fallback objects. On python-2.3, gettext.NullTranslations objects
don’t contain this method.
class kitchen.i18n.NewGNUTranslations(fp=None, python2_api=True)
Safer version of gettext.GNUTranslations
gettext.GNUTranslations suffers from two problems that this class fixes.
1. gettext.GNUTranslations can throw a UnicodeError in gettext.GNUTranslations.ugettext() if the
message being translated has non-ASCII characters and there is no translation for it.
2. gettext.GNUTranslations can return byte str from gettext.GNUTranslations.ugettext() and unicode
strings from the other gettext() methods if the message being translated is the wrong type.
When byte str are returned, the strings will be encoded according to this algorithm:
1. If a fallback has been added, the fallback will be called first. You’ll need to consult the
fallback to see whether it performs any encoding changes.
2. If a byte str was given, the same byte str will be returned.
3. If a unicode string was given and set_output_charset() has been called then we encode the
string using the output_charset
4. If a unicode string was given and this is gettext() or ngettext() and a charset was detected
when parsing the message catalog, output in that charset.
5. If a unicode string was given and this is gettext() or ngettext() we encode it using UTF-8.
6. If a unicode string was given and this is lgettext() or lngettext() we encode using the value
of locale.getpreferredencoding()
For ugettext() and ungettext(), we go through the same set of steps with the following
differences:
• We transform byte str into unicode strings for these methods.
• The encoding used to decode the byte str is taken from input_charset if it’s set, otherwise we
decode using UTF-8
input_charset
an extension to the python standard library gettext that specifies what charset a message
is encoded in when decoding a message to unicode. This is used for two purposes:
1. If the message string is a byte str, this is used to decode the string to a unicode string
before looking it up in the message catalog.
2. In the ugettext() and ungettext() methods, if a byte str is given as the message and is
untranslated, this is used as the encoding when decoding to unicode. This is different from the
_charset parameter that may be set when a message catalog is loaded because input_charset is
used to describe an encoding used in a python source file while _charset describes the encoding
used in the message catalog file.
Any characters that aren’t able to be transformed from a byte str to unicode string or vice versa
will be replaced with a replacement character (ie: u'�' in unicode based encodings, '?' in other
ASCII compatible encodings).
SEE ALSO:
gettext.GNUTranslations.gettext
For information about what methods this class has and what they do
Changed in version kitchen-1.1.0 (API kitchen.i18n 2.1.0): Although we had adapted gettext(),
ngettext(), lgettext(), and lngettext() to always return byte str, we hadn't forced those byte str
to always be in a specified charset. We now make sure that gettext() and ngettext() return byte
str encoded using output_charset if set, otherwise charset, and if neither of those, UTF-8. With
lgettext() and lngettext(), we use output_charset if set, otherwise locale.getpreferredencoding().
Kitchen.text: unicode and utf8 and xml oh my!
The kitchen.text module contains functions that deal with text manipulation.
Kitchen.text.converters
Functions to handle conversion of byte str and unicode strings.
Changed in version kitchen-0.2a2 (API kitchen.text 2.0.0): Added getwriter()
Changed in version kitchen-0.2.2 (API kitchen.text 2.1.0): Added exception_to_unicode(),
exception_to_bytes(), EXCEPTION_CONVERTERS, and BYTE_EXCEPTION_CONVERTERS
Changed in version kitchen-1.0.1 (API kitchen.text 2.1.1): Deprecated BYTE_EXCEPTION_CONVERTERS as
we've simplified exception_to_unicode() and exception_to_bytes() to make it unnecessary
Byte Strings and Unicode in Python2
Python2 has two string types, str and unicode. unicode represents an abstract sequence of text
characters. It can hold any character that is present in the unicode standard. str can hold any byte of
data. The operating system and python work together to display these bytes as characters in many cases
but you should always keep in mind that the information is really a sequence of bytes, not a sequence of
characters. In python2 these types are interchangeable much of the time. They are one of the
few pairs of types that automatically convert when used in equality:
>>> # string is converted to unicode and then compared
>>> "I am a string" == u"I am a string"
True
>>> # Other types, like int, don't have this special treatment
>>> 5 == "5"
False
However, this automatic conversion tends to lull people into a false sense of security. As long as
you’re dealing with ASCII characters the automatic conversion will save you from seeing any differences.
Once you start using characters that are not in ASCII, you will start getting UnicodeError and
UnicodeWarning as the automatic conversions between the types fail:
>>> "I am an ñ" == u"I am an ñ"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
Why do these conversions fail? The reason is that the python2 unicode type represents an abstract
sequence of unicode text known as code points. str, on the other hand, really represents a sequence of
bytes. Those bytes are converted by your operating system to appear as characters on your screen using a
particular encoding (usually with a default defined by the operating system and customizable by the
individual user.) Although ASCII characters are fairly standard in what bytes represent each character,
the bytes outside of the ASCII range are not. In general, each encoding will map a different character
to a particular byte. Newer encodings map individual characters to multiple bytes (which the older
encodings will instead treat as multiple characters). In the face of these differences, python refuses
to guess at an encoding and instead issues a warning or exception and refuses to convert.
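For instance, the single character ñ maps to different byte sequences depending on the encoding, so
python cannot know which bytes you meant (an illustrative snippet, not part of kitchen):
>>> u'\xf1'.encode('utf-8')     # ñ as two bytes in utf-8
'\xc3\xb1'
>>> u'\xf1'.encode('latin-1')   # ñ as one byte in latin-1
'\xf1'
>>> '\xc3\xb1' == '\xf1'
False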
SEE ALSO:
overcoming-frustration
For a longer introduction on this subject.
Strategy for Explicit Conversion
So what is the best method of dealing with this weltering babble of incoherent encodings? The basic
strategy is to explicitly turn everything into unicode when it first enters your program. Then, when you
send it to output, you can transform the unicode back into bytes. Doing this allows you to control the
encodings that are used and avoid getting tracebacks due to UnicodeError. Using the functions defined in
this module, that looks something like this:
>>> from kitchen.text.converters import to_unicode, to_bytes
>>> name = raw_input('Enter your name: ')
Enter your name: Toshio くらとみ
>>> name
'Toshio \xe3\x81\x8f\xe3\x82\x89\xe3\x81\xa8\xe3\x81\xbf'
>>> type(name)
<type 'str'>
>>> unicode_name = to_unicode(name)
>>> type(unicode_name)
<type 'unicode'>
>>> unicode_name
u'Toshio \u304f\u3089\u3068\u307f'
>>> # Do a lot of other things before needing to save/output again:
>>> output = open('datafile', 'w')
>>> output.write(to_bytes(u'Name: %s\n' % unicode_name))
A few notes:
Looking at line 6, you’ll notice that the input we took from the user was a byte str. In general,
anytime we’re getting a value from outside of python (The filesystem, reading data from the network,
interacting with an external command, reading values from the environment) we are interacting with
something that will want to give us a byte str. Some python standard library modules and third party
libraries will automatically attempt to convert a byte str to unicode strings for you. This is both a
boon and a curse. If the library can guess correctly about the encoding that the data is in, it will
return unicode objects to you without you having to convert. However, if it can’t guess correctly, you
may end up with one of several problems:
UnicodeError
The library attempted to decode a byte str into a unicode string, failed, and raised an exception.
Garbled data
If the library returns the data after decoding it with the wrong encoding, the characters you see
in the unicode string won’t be the ones that you expect.
A byte str instead of unicode string
Some libraries will return a unicode string when they’re able to decode the data and a byte str
when they can’t. This is generally the hardest problem to debug when it occurs. Avoid it in your
own code and try to avoid or open bugs against upstreams that do this. See
DesigningUnicodeAwareAPIs for strategies to do this properly.
On line 8, we convert from a byte str to a unicode string. to_unicode() does this for us. It has some
error handling and sane defaults that make this a nicer function to use than calling str.decode()
directly:
• Instead of defaulting to the ASCII encoding which fails with all but the simple American English
characters, it defaults to UTF-8.
• Instead of raising an error if it cannot decode a value, it will replace the value with the unicode
“Replacement character” symbol (�).
• If you happen to call this function with something that is not a str or unicode, it will, by
default, return the object's simple string representation instead of raising an error.
All three of these can be overridden using different keyword arguments to the function. See the
to_unicode() documentation for more information.
On line 15 we push the data back out to a file. Two things you should note here:
1. We deal with the strings as unicode until the last instant. The string format that we’re using is
unicode and the variable also holds unicode. People sometimes get into trouble when they mix a byte
str format with a variable that holds a unicode string (or vice versa) at this stage.
2. to_bytes() does the reverse of to_unicode(). In this case, we're using the default values which turn
unicode into a byte str using UTF-8. Any errors are replaced with a � and nonstring objects are
converted to their simple string representations. Just like to_unicode(), you can look at the
documentation for to_bytes() to find out how to override any of these defaults.
When to use an alternate strategy
The default strategy of decoding to unicode strings when you take data in and encoding to a byte str when
you send the data back out works great for most problems but there are a few times when you shouldn’t:
• The values aren’t meant to be read as text
• The values need to be byte-for-byte when you send them back out – for instance if they are database
keys or filenames.
• You are transferring the data between several libraries that all expect byte str.
In each of these instances, there is a reason to keep around the byte str version of a value. Here are
a few hints to keep your sanity in these situations:
1. Keep your unicode and str values separate. Just like the pain caused when you have to use someone
else’s library that returns both unicode and str you can cause yourself pain if you have functions
that can return both types or variables that could hold either type of value.
2. Name your variables so that you can tell whether you're storing byte str or unicode string. One of
the first things you end up having to do when debugging is determine what type of string you have in a
variable and what type of string you are expecting. Naming your variables consistently so that you
can tell which type they are supposed to hold will save you from at least one of those steps (see the
sketch after this list).
3. When you get values initially, make sure that you’re dealing with the type of value that you expect as
you save it. You can use isinstance() or to_bytes() since to_bytes() doesn’t do any modifications of
the string if it’s already a str. When using to_bytes() for this purpose you might want to use:
try:
    b_input = to_bytes(input_should_be_bytes_already, errors='strict', nonstring='strict')
except (TypeError, UnicodeEncodeError):
    handle_errors_somehow()
The reason is that the default of to_bytes() will take characters that are illegal in the chosen
encoding and transform them to replacement characters. Since the point of keeping this data as a byte
str is to keep the exact same bytes when you send it outside of your code, changing things to
replacement characters should be raising red flags that something is wrong. Setting errors to strict
will raise an exception instead, which gives you an opportunity to fail gracefully.
4. Sometimes you will want to print out the values that you have in your byte str. When you do this you
will need to make sure that you transform unicode to str before combining them. Also be sure that any
other function calls (including gettext) are going to give you strings that are the same type. For
instance:
print to_bytes(_('Username: %(user)s'), 'utf-8') % {'user': b_username}
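As promised in hint #2, here is one hypothetical naming convention, matching the u_ and b_ prefixes
used elsewhere in this document:
from kitchen.text.converters import to_unicode, to_bytes

# Prefix variables so the expected string type is visible at a glance
b_name = raw_input('Enter your name: ')   # bytes arrive from outside of python
u_name = to_unicode(b_name)               # unicode while inside the program
b_output = to_bytes(u_name)               # bytes again at the output boundary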
Gotchas and how to avoid them
Even when you have a good conceptual understanding of how python2 treats unicode and str there are still
some things that can surprise you. In most cases this is because, as noted earlier, python or one of the
python libraries you depend on is trying to convert a value automatically and failing. Explicit
conversion at the appropriate place usually solves that.
str(obj)
One common idiom for getting a simple, string representation of an object is to use:
str(obj)
Unfortunately, this is not safe. Sometimes str(obj) will return unicode. Sometimes it will return a
byte str. Sometimes, it will attempt to convert from a unicode string to a byte str, fail, and throw a
UnicodeError. To be safe from all of these, first decide whether you need unicode or str to be returned.
Then use to_unicode() or to_bytes() to get the simple representation like this:
u_representation = to_unicode(obj, nonstring='simplerepr')
b_representation = to_bytes(obj, nonstring='simplerepr')
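For example, a hypothetical class whose __str__() returns a unicode string with non-ASCII characters
will blow up under str() but works with the simplerepr strategy (a sketch based on the documented
behavior):
>>> from kitchen.text.converters import to_unicode
>>> class Pastry(object):
...     def __str__(self):
...         return u'caf\xe9'
...
>>> str(Pastry())
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
>>> to_unicode(Pastry(), nonstring='simplerepr')
u'caf\xe9'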
print
python has a builtin print() statement that outputs strings to the terminal. This originated in a time
when python only dealt with byte str. When unicode strings came about, some enhancements were made to
the print() statement so that it could print those as well. The enhancements make print() work most of
the time. However, the times when it doesn’t work tend to make for cryptic debugging.
The basic issue is that print() has to figure out what encoding to use when it prints a unicode string to
the terminal. When python is attached to your terminal (ie, you’re running the interpreter or running a
script that prints to the screen) python is able to take the encoding value from your locale settings
LC_ALL or LC_CTYPE and print the characters allowed by that encoding. On most modern Unix systems, the
encoding is utf-8 which means that you can print any unicode character without problem.
There are two common cases of things going wrong:
1. Someone has a locale set that does not accept all valid unicode characters. For instance:
$ LC_ALL=C python
>>> print u'\ufffd'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
This often happens when a script that you’ve written and debugged from the terminal is run from an
automated environment like cron. It also occurs when you have written a script using a utf-8 aware
locale and released it for consumption by people all over the internet. Inevitably, someone is
running with a locale that can’t handle all unicode characters and you get a traceback reported.
2. You redirect output to a file. Python isn't using the values in LC_ALL unconditionally to decide what
encoding to use. Instead it is using the encoding set for the terminal you are printing to, which is
set to accept different encodings by LC_ALL. If you redirect to a file, you are no longer printing to
the terminal so LC_ALL won’t have any effect. At this point, python will decide it can’t find an
encoding and fallback to ASCII which will likely lead to UnicodeError being raised. You can see this
in a short script:
#! /usr/bin/python -tt
print u'\ufffd'
And then look at the difference between running it normally and redirecting to a file:
$ ./test.py
�
$ ./test.py > t
Traceback (most recent call last):
File "test.py", line 3, in <module>
print u'\ufffd'
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
The short answer to dealing with this is to always use bytes when writing output. You can do this by
explicitly converting to bytes like this:
from kitchen.text.converters import to_bytes
u_string = u'\ufffd'
print to_bytes(u_string)
or you can wrap stdout and stderr with a StreamWriter. A StreamWriter is convenient in that you can
assign it to encode for sys.stdout or sys.stderr and then have output automatically converted but it has
the drawback of still being able to throw UnicodeError if the writer can’t encode all possible unicode
codepoints. Kitchen provides an alternate version which can be retrieved with
kitchen.text.converters.getwriter() which will not traceback in its standard configuration.
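A minimal sketch of wrapping sys.stdout this way (it mirrors the getwriter() example later in this
document):
import sys
from kitchen.text.converters import getwriter

# Wrap stdout once at program startup; unicode strings printed afterwards
# are encoded with to_bytes() semantics and won't traceback when redirected
UTF8Writer = getwriter('utf-8')
sys.stdout = UTF8Writer(sys.stdout)
print u'\ufffd'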
Unicode, str, and dict keys
The hash() of the ASCII characters is the same for unicode and byte str. When you use them in dict keys,
they evaluate to the same dictionary slot:
>>> u_string = u'a'
>>> b_string = 'a'
>>> hash(u_string), hash(b_string)
(12416037344, 12416037344)
>>> d = {}
>>> d[u_string] = 'unicode'
>>> d[b_string] = 'bytes'
>>> d
{u'a': 'bytes'}
When you deal with key values outside of ASCII, unicode and byte str evaluate unequally no matter what
their character content or hash value:
>>> u_string = u'ñ'
>>> b_string = u_string.encode('utf-8')
>>> print u_string
ñ
>>> print b_string
ñ
>>> d = {}
>>> d[u_string] = 'unicode'
>>> d[b_string] = 'bytes'
>>> d
{u'\xf1': 'unicode', '\xc3\xb1': 'bytes'}
>>> b_string2 = '\xf1'
>>> hash(u_string), hash(b_string2)
(30848092528, 30848092528)
>>> d = {}
>>> d[u_string] = 'unicode'
>>> d[b_string2] = 'bytes'
>>> d
{u'\xf1': 'unicode', '\xf1': 'bytes'}
How do you work with this one? Remember rule #1: Keep your unicode and byte str values separate. That
goes for keys in a dictionary just like anything else.
• For any given dictionary, make sure that all your keys are either unicode or str. Do not mix the two.
If you’re being given both unicode and str but you don’t need to preserve separate keys for each, I
recommend using to_unicode() or to_bytes() to convert all keys to one type or the other like this:
>>> from kitchen.text.converters import to_unicode
>>> u_string = u'one'
>>> b_string = 'two'
>>> d = {}
>>> d[to_unicode(u_string)] = 1
>>> d[to_unicode(b_string)] = 2
>>> d
{u'two': 2, u'one': 1}
• These issues also apply to using dicts with tuple keys that contain a mixture of unicode and str. Once
again the best fix is to standardise on either str or unicode.
• If you absolutely need to store values in a dictionary where the keys could be either unicode or str
you can use StrictDict which has separate entries for all unicode and byte str and deals correctly with
any tuple containing mixed unicode and byte str.
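A brief interactive sketch, assuming StrictDict is imported from kitchen.collections:
>>> from kitchen.collections import StrictDict
>>> d = StrictDict()
>>> d[u'\xf1'] = 'unicode'
>>> d[u'\xf1'.encode('utf-8')] = 'bytes'
>>> len(d)   # unlike dict, the unicode and byte str keys stay separate
2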
Functions
Unicode and byte str conversion
kitchen.text.converters.to_unicode(obj, encoding='utf-8', errors='replace', nonstring=None,
non_string=None)
Convert an object into a unicode string
Parameters
• obj – Object to convert to a unicode string. This should normally be a byte str
• encoding – What encoding to try converting the byte str as. Defaults to utf-8
• errors – If errors are found while decoding, perform this action. Defaults to replace
which replaces the invalid bytes with a character that means the bytes were unable to be
decoded. Other values are the same as the error handling schemes in the codec base
classes. For instance strict which raises an exception and ignore which simply omits the
non-decodable characters.
• nonstring –
How to treat nonstring values. Possible values are:
simplerepr
Attempt to call the object’s “simple representation” method and return that value.
Python-2.3+ has two methods that try to return a simple representation:
object.__unicode__() and object.__str__(). We first try to get a usable value
from object.__unicode__(). If that fails we try the same with object.__str__().
empty Return an empty unicode string
strict Raise a TypeError
passthru
Return the object unchanged
repr Attempt to return a unicode string of the repr of the object
Default is simplerepr
• non_string – Deprecated Use nonstring instead
Raises
• TypeError – if nonstring is strict and a non-basestring object is passed in or if
nonstring is set to an unknown value
• UnicodeDecodeError – if errors is strict and obj is not decodable using the given
encoding
Returns
unicode string or the original object depending on the value of nonstring.
Usually this should be used on a byte str but it can take both byte str and unicode strings
intelligently. Nonstring objects are handled in different ways depending on the setting of the
nonstring parameter.
The default values of this function are set so as to always return a unicode string and never
raise an error when converting from a byte str to a unicode string. However, when you do not
pass validly encoded text (or a nonstring object), you may end up with output that you don't
expect. Be sure you understand the requirements of your data rather than simply ignoring errors
by passing it through this function.
Changed in version 0.2.1a2: Deprecated non_string in favor of nonstring parameter and changed
default value to simplerepr
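A few illustrative calls showing the defaults described above (the exact replacement behavior assumes
the documented utf-8 and replace defaults):
>>> from kitchen.text.converters import to_unicode
>>> to_unicode('caf\xc3\xa9')                  # valid utf-8 decodes cleanly
u'caf\xe9'
>>> to_unicode('caf\xe9')                      # latin-1 bytes are not valid utf-8
u'caf\ufffd'
>>> to_unicode('caf\xe9', encoding='latin-1')  # specify the right encoding
u'caf\xe9'
>>> to_unicode(5)                              # nonstring: simple representation
u'5'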
kitchen.text.converters.to_bytes(obj, encoding='utf-8', errors='replace', nonstring=None,
non_string=None)
Convert an object into a byte str
Parameters
• obj – Object to convert to a byte str. This should normally be a unicode string.
• encoding – Encoding to use to convert the unicode string into a byte str. Defaults to
utf-8.
• errors –
If errors are found while encoding, perform this action. Defaults to replace which
replaces the invalid bytes with a character that means the bytes were unable to be
encoded. Other values are the same as the error handling schemes in the codec base
classes. For instance strict which raises an exception and ignore which simply omits the
non-encodable characters.
• nonstring –
How to treat nonstring values. Possible values are:
simplerepr
Attempt to call the object’s “simple representation” method and return that value.
Python-2.3+ has two methods that try to return a simple representation:
object.__unicode__() and object.__str__(). We first try to get a usable value
from object.__str__(). If that fails we try the same with object.__unicode__().
empty Return an empty byte str
strict Raise a TypeError
passthru
Return the object unchanged
repr Attempt to return a byte str of the repr() of the object
Default is simplerepr.
• non_string – Deprecated Use nonstring instead.
Raises
• TypeError – if nonstring is strict and a non-basestring object is passed in or if
nonstring is set to an unknown value.
• UnicodeEncodeError – if errors is strict and obj contains characters that cannot be
encoded using encoding.
Returns
byte str or the original object depending on the value of nonstring.
WARNING:
If you pass a byte str into this function the byte str is returned unmodified. It is not
re-encoded with the specified encoding. The easiest way to achieve that is:
to_bytes(to_unicode(text), encoding='utf-8')
The initial to_unicode() call will ensure text is a unicode string. Then, to_bytes() will turn
that into a byte str with the specified encoding.
Usually, this should be used on a unicode string but it can take either a byte str or a unicode
string intelligently. Nonstring objects are handled in different ways depending on the setting of
the nonstring parameter.
The default values of this function are set so as to always return a byte str and never raise an
error when converting from unicode to bytes. However, when you do not pass an encoding that can
validly encode the object (or a nonstring object), you may end up with output that you don't
expect. Be sure you understand the requirements of your data rather than simply ignoring errors
by passing it through this function.
Changed in version 0.2.1a2: Deprecated non_string in favor of nonstring parameter and changed
default value to simplerepr
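A few illustrative calls showing these defaults (assuming the documented utf-8 and replace defaults):
>>> from kitchen.text.converters import to_bytes
>>> to_bytes(u'caf\xe9')                    # default: encode to utf-8
'caf\xc3\xa9'
>>> to_bytes(u'caf\xe9', encoding='ascii')  # unencodable character replaced
'caf?'
>>> to_bytes('caf\xe9')                     # byte str passed through unmodified
'caf\xe9'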
kitchen.text.converters.getwriter(encoding)
Return a codecs.StreamWriter that resists tracing back.
Parameters
encoding – Encoding to use for transforming unicode strings into byte str.
Return type
codecs.StreamWriter
Returns
StreamWriter that you can instantiate to wrap output streams to automatically translate
unicode strings into encoding.
This is a reimplementation of codecs.getwriter() that returns a StreamWriter that resists issuing
tracebacks. The StreamWriter that is returned uses kitchen.text.converters.to_bytes() to convert
unicode strings into byte str. The departures from codecs.getwriter() are:
1. The StreamWriter that is returned will take byte str as well as unicode strings. Any byte str
will be passed through unmodified.
2. The default error handler for unknown bytes is to replace the bytes with the unknown character
(? in most ascii-based encodings, � in the utf encodings) whereas codecs.getwriter() defaults
to strict. Like codecs.StreamWriter, the returned StreamWriter can have its error handler
changed in code by setting stream.errors = 'new_handler_name'
Example usage:
$ LC_ALL=C python
>>> import sys
>>> from kitchen.text.converters import getwriter
>>> UTF8Writer = getwriter('utf-8')
>>> unwrapped_stdout = sys.stdout
>>> sys.stdout = UTF8Writer(unwrapped_stdout)
>>> print 'caf\xc3\xa9'
café
>>> print u'caf\xe9'
café
>>> ASCIIWriter = getwriter('ascii')
>>> sys.stdout = ASCIIWriter(unwrapped_stdout)
>>> print 'caf\xc3\xa9'
café
>>> print u'caf\xe9'
caf?
SEE ALSO:
API docs for codecs.StreamWriter and codecs.getwriter() and Print Fails on the python wiki.
New in version kitchen: 0.2a2, API: kitchen.text 1.1.0
kitchen.text.converters.to_str(obj)
Deprecated
This function converts something to a byte str if it isn’t one. It’s used to call str() or
unicode() on the object to get its simple representation without danger of getting a UnicodeError.
You should be using to_unicode() or to_bytes() explicitly instead.
If you need unicode strings:
to_unicode(obj, nonstring='simplerepr')
If you need byte str:
to_bytes(obj, nonstring='simplerepr')
kitchen.text.converters.to_utf8(obj, errors='replace', non_string='passthru')
Deprecated
Convert unicode to an encoded utf-8 byte str. You should be using to_bytes() instead:
to_bytes(obj, encoding='utf-8', non_string='passthru')
Transformation to XML
kitchen.text.converters.unicode_to_xml(string, encoding='utf-8', attrib=False, control_chars='replace')
Take a unicode string and turn it into a byte str suitable for xml
Parameters
• string – unicode string to encode into an XML compatible byte str
• encoding – encoding to use for the returned byte str. Default is to encode to UTF-8. If
some of the characters in string are not encodable in this encoding, the unknown
characters will be entered into the output string using xml character references.
• attrib – If True, quote the string for use in an xml attribute. If False (default),
quote for use in an xml text field.
• control_chars –
control characters are not allowed in XML documents. When we encounter those we need to
know what to do. Valid options are:
replace
(default) Replace the control characters with ?
ignore Remove the characters altogether from the output
strict Raise an XmlEncodeError when we encounter a control character
Raises
• kitchen.text.exceptions.XmlEncodeError – If control_chars is set to strict and the string
to be made suitable for output to xml contains control characters or if string is not a
unicode string then we raise this exception.
• ValueError – If control_chars is set to something other than replace, ignore, or strict.
Return type
byte str
Returns
representation of the unicode string as a valid XML byte str
XML files consist mainly of text encoded using a particular charset. XML also denies the use of
certain bytes in the encoded text (example: ASCII Null). There are also special characters that
must be escaped if they are present in the input (example: <). This function takes care of all of
those issues for you.
There are a few different ways to use this function depending on your needs. The simplest
invocation is like this:
unicode_to_xml(u'String with non-ASCII characters: <"á と">')
This will return the following to you, encoded in utf-8:
'String with non-ASCII characters: &lt;"á と"&gt;'
Pretty straightforward. Now, what if you need to encode your document in something other than
utf-8? For instance, latin-1? Let's see:
unicode_to_xml(u'String with non-ASCII characters: <"á と">', encoding='latin-1')
'String with non-ASCII characters: &lt;"á &#12392;"&gt;'
Because the と character is not available in the latin-1 charset, it is replaced with &#12392; in
our output. This is an xml character reference which represents the character at unicode
codepoint 12392, the と character.
When you want to reverse this, use xml_to_unicode() which will turn a byte str into a unicode
string and replace the xml character references with the unicode characters.
XML also has the quirk of not allowing control characters in its output. The control_chars
parameter allows us to specify what to do with those. For use cases that don’t need absolute
character by character fidelity (example: holding strings that will just be used for display in a
GUI app later), the default value of replace works well:
unicode_to_xml(u'String with disallowed control chars: \u0000\u0007')
'String with disallowed control chars: ??'
If you do need to be able to reproduce all of the characters at a later date (examples: if the
string is a key value in a database or a path on a filesystem) you have many choices. Here are a
few that rely on utf-7, a verbose encoding that encodes control characters (as well as non-ASCII
unicode values) to characters from within the ASCII printable characters. The good thing about
doing this is that the code is pretty simple. You just need to use utf-7 both when encoding the
field for xml and when decoding it for use in your python program:
unicode_to_xml(u'String with unicode: と and control char: \u0007', encoding='utf7')
'String with unicode: +MGg- and control char: +AAc-'
# [...]
xml_to_unicode('String with unicode: +MGg- and control char: +AAc-', encoding='utf7')
u'String with unicode: と and control char: \u0007'
As you can see, the utf-7 encoding will transform even characters that would be representable in
utf-8. This can be a drawback if you want unicode characters in the file to be readable without
being decoded first. You can work around this with increased complexity in your application code:
encoding = 'utf-8'
u_string = u'String with unicode: と and control char: \u0007'
try:
    # First attempt to encode to utf-8, raising on control characters
    data = unicode_to_xml(u_string, encoding=encoding, control_chars='strict')
except XmlEncodeError:
    # Fallback to utf-7, which can encode the control characters
    encoding = 'utf-7'
    data = unicode_to_xml(u_string, encoding=encoding)
write_tag('<mytag encoding=%s>%s</mytag>' % (encoding, data))
# [...]
encoding = tag.attributes.encoding
u_string = xml_to_unicode(u_string, encoding=encoding)
Using code similar to that, you can have some fields encoded using your default encoding and
fall back to utf-7 if there are control characters present.
NOTE:
If your goal is to preserve the control characters, you cannot simply save the entire file as
utf-7 and set the xml encoding parameter to utf-7. Because XML doesn't allow control
characters, you have to encode those separately from any encoding work that the XML parser
itself knows about.
SEE ALSO:
bytes_to_xml()
if you’re dealing with bytes that are non-text or of an unknown encoding that you must
preserve on a byte for byte level.
guess_encoding_to_xml()
if you’re dealing with strings in unknown encodings that you don’t need to save with
char-for-char fidelity.
kitchen.text.converters.xml_to_unicode(byte_string, encoding='utf-8', errors='replace')
Transform a byte str from an xml file into a unicode string
Parameters
• byte_string – byte str to decode
• encoding – encoding that the byte str is in
• errors – What to do if not every character is valid in encoding. See the to_unicode()
documentation for legal values.
Return type
unicode string
Returns
string decoded from byte_string
This function attempts to reverse what unicode_to_xml() does. It takes a byte str (presumably
read in from an xml file) and expands all the html entities into unicode characters and decodes
the byte str into a unicode string. One thing it cannot do is restore any control characters that
were removed prior to inserting into the file. If you need to keep such characters you need to
use xml_to_bytes() and bytes_to_xml() or use one of the strategies documented in unicode_to_xml()
instead.
kitchen.text.converters.byte_string_to_xml(byte_string, input_encoding='utf-8', errors='replace',
output_encoding='utf-8', attrib=False, control_chars='replace')
Make sure a byte str is validly encoded for xml output
Parameters
• byte_string – Byte str to turn into valid xml output
• input_encoding – Encoding of byte_string. Default utf-8
• errors –
How to handle errors encountered while decoding the byte_string into unicode at the
beginning of the process. Values are:
replace
(default) Replace the invalid bytes with a ?
ignore Remove the characters altogether from the output
strict Raise a UnicodeDecodeError when we encounter a non-decodable character
• output_encoding – Encoding for the xml file that this string will go into. Default is
utf-8. If some of the characters in byte_string are not encodable in this encoding, the
unknown characters will be entered into the output string using xml character references.
• attrib – If True, quote the string for use in an xml attribute. If False (default),
quote for use in an xml text field.
• control_chars –
XML does not allow control characters. When we encounter those we need to know what to
do. Valid options are:
replace
(default) Replace the control characters with ?
ignore Remove the characters altogether from the output
strict Raise an error when we encounter a control character
Raises
• XmlEncodeError – If control_chars is set to strict and the string to be made suitable for
output to xml contains control characters then we raise this exception.
• UnicodeDecodeError – If errors is set to strict and the byte_string contains bytes that
are not decodable using input_encoding, this error is raised
Return type
byte str
Returns
representation of the byte str in the output encoding with any bytes that aren’t available
in xml taken care of.
Use this when you have a byte str representing text that you need to make suitable for output to
xml. There are several cases where this is the case. For instance, if you need to transform some
strings encoded in latin-1 to utf-8 for output:
utf8_string = byte_string_to_xml(latin1_string, input_encoding='latin-1')
If you already have strings in the proper encoding you may still want to use this function to
remove control characters:
cleaned_string = byte_string_to_xml(string, input_encoding='utf-8', output_encoding='utf-8')
SEE ALSO:
unicode_to_xml()
for other ideas on using this function
kitchen.text.converters.xml_to_byte_string(byte_string, input_encoding='utf-8', errors='replace',
output_encoding='utf-8')
Transform a byte str from an xml file into a byte str in a given encoding
Parameters
• byte_string – byte str to decode
• input_encoding – encoding that the byte str is in
• errors – What to do if not every character is valid in encoding. See the to_unicode()
docstring for legal values.
• output_encoding – Encoding for the output byte str
Returns
byte str decoded from byte_string and re-encoded using output_encoding
This function attempts to reverse what unicode_to_xml() does. It takes a byte str (presumably
read in from an xml file), expands all the html entities into unicode characters, decodes the
byte str into a unicode string, and then encodes the result using output_encoding. One thing it
cannot do is restore any control characters that were removed prior to inserting into the file.
If you need to keep such characters you need to use xml_to_bytes() and bytes_to_xml() or use one
of the strategies documented in unicode_to_xml() instead.
kitchen.text.converters.bytes_to_xml(byte_string, *args, **kwargs)
Return a byte str encoded so it is valid inside of any xml file
Parameters
• byte_string – byte str to transform
• *args, **kwargs – extra arguments to this function are passed on to the function
actually implementing the encoding. You can use this to tweak the output in some cases
but, as a general rule, you shouldn't because the underlying encoding function is not
guaranteed to remain the same.
Return type
byte str consisting of all ASCII characters
Returns
byte str representation of the input. This will be encoded using base64.
This function is made especially to put binary information into xml documents.
This function is intended for encoding things that must be preserved byte-for-byte. If you want
to encode a byte string that’s text and don’t mind losing the actual bytes you probably want to
try byte_string_to_xml() or guess_encoding_to_xml() instead.
NOTE:
Although the current implementation uses base64.b64encode() and there are no plans to change it,
that isn’t guaranteed. If you want to make sure that you can encode and decode these messages
it’s best to use xml_to_bytes() if you use this function to encode.
kitchen.text.converters.xml_to_bytes(byte_string, *args, **kwargs)
Decode a string encoded using bytes_to_xml()
Parameters
• byte_string – byte str to transform. This should be a base64 encoded sequence of bytes
originally generated by bytes_to_xml().
• *args, **kwargs – extra arguments to this function are passed on to the function
actually implementing the encoding. You can use this to tweak the output in some cases
but, as a general rule, you shouldn't because the underlying encoding function is not
guaranteed to remain the same.
Return type
byte str
Returns
byte str that’s the decoded input
If you’ve got fields in an xml document that were encoded with bytes_to_xml() then you want to use
this function to undecode them. It converts a base64 encoded string into a byte str.
NOTE:
Although the current implementation uses base64.b64decode() and there are no plans to change it,
that isn’t guaranteed. If you want to make sure that you can encode and decode these messages
it’s best to use bytes_to_xml() if you use this function to decode.
kitchen.text.converters.guess_encoding_to_xml(string, output_encoding='utf-8', attrib=False,
control_chars='replace')
Return a byte str suitable for inclusion in xml
Parameters
• string – unicode or byte str to be transformed into a byte str suitable for inclusion in
xml. If string is a byte str we attempt to guess the encoding. If we cannot guess, we
fall back to latin-1.
• output_encoding – Output encoding for the byte str. This should match the encoding of
your xml file.
• attrib – If True, escape the item for use in an xml attribute. If False (default) escape
the item for use in a text node.
Returns
byte str encoded using output_encoding
kitchen.text.converters.to_xml(string, encoding='utf-8', attrib=False, control_chars='ignore')
Deprecated: Use guess_encoding_to_xml() instead
Working with exception messages
kitchen.text.converters.EXCEPTION_CONVERTERS = (<function <lambda>>, <function <lambda>>)
Tuple of functions to try to use to convert an exception into a string
representation. Its main use is to extract a string (unicode or str) from an exception
object in exception_to_unicode() and exception_to_bytes(). The functions here will try the
exception’s args[0] and the exception itself (roughly equivalent to str(exception)) to
extract the message. This is only a default and can be easily overridden when calling those
functions. There are several reasons you might wish to do that. If you have exceptions
where the best string representing the exception is not returned by the default functions,
you can add another function to extract from a different field:
from kitchen.text.converters import (EXCEPTION_CONVERTERS,
                                     exception_to_unicode)
class MyError(Exception):
    def __init__(self, message):
        self.value = message

c = [lambda e: e.value]
c.extend(EXCEPTION_CONVERTERS)
try:
    raise MyError('An Exception message')
except MyError, e:
    print exception_to_unicode(e, converters=c)
Another reason would be if you're converting to a byte str and you know the str needs to be
in a non-utf-8 encoding. exception_to_bytes() defaults to utf-8 but if you convert into a
byte str explicitly using a converter then you can choose a different encoding:
from kitchen.text.converters import (EXCEPTION_CONVERTERS,
                                     exception_to_bytes, to_bytes)
c = [lambda e: to_bytes(e.args[0], encoding='euc_jp'),
     lambda e: to_bytes(e, encoding='euc_jp')]
c.extend(EXCEPTION_CONVERTERS)
try:
    do_something()
except Exception, e:
    log = open('logfile.euc_jp', 'a')
    log.write('%s\n' % exception_to_bytes(e, converters=c))
    log.close()
Each function in this list should take the exception as its sole argument and return a
string containing the message representing the exception. The functions may return the
message as a byte str, a unicode string, or even an object if you trust the object
to return a decent string representation. The exception_to_unicode() and
exception_to_bytes() functions will make sure to convert the string to the proper type
before returning.
New in version 0.2.2.
kitchen.text.converters.BYTE_EXCEPTION_CONVERTERS = (<function <lambda>>, <function to_bytes>)
Deprecated: Use EXCEPTION_CONVERTERS instead.
Tuple of functions to try to use to convert an exception into a string representation. This tuple
is similar to the one in EXCEPTION_CONVERTERS but it’s used with exception_to_bytes() instead.
Ideally, these functions should do their best to return the data as a byte str but the results
will be run through to_bytes() before being returned.
New in version 0.2.2.
Changed in version 1.0.1: Deprecated as simplifications allow EXCEPTION_CONVERTERS to perform the
same function.
kitchen.text.converters.exception_to_unicode(exc, converters=(<function <lambda>>, <function <lambda>>))
Convert an exception object into a unicode representation
Parameters
• exc – Exception object to convert
• converters – List of functions to use to convert the exception into a string. See
EXCEPTION_CONVERTERS for the default value and an example of adding other converters to
the defaults. The functions in the list are tried one at a time to see if they can
extract a string from the exception. The first one to do so without raising an exception
is used.
Returns
unicode string representation of the exception. The value extracted by the converters will
be converted into unicode before being returned, using the utf-8 encoding. If you know you
need to use an alternate encoding, add a function that does that to the list of functions
in converters.
New in version 0.2.2.
kitchen.text.converters.exception_to_bytes(exc, converters=(<function <lambda>>, <function <lambda>>))
Convert an exception object into a str representation
Parameters
• exc – Exception object to convert
• converters – List of functions to use to convert the exception into a string. See
EXCEPTION_CONVERTERS for the default value and an example of adding other converters to
the defaults. The functions in the list are tried one at a time to see if they can
extract a string from the exception. The first one to do so without raising an exception
is used.
Returns
byte str representation of the exception. The value extracted by the converters will be
converted into str before being returned, using the utf-8 encoding. If you know you need to
use an alternate encoding, add a function that does that to the list of functions in
converters.
New in version 0.2.2.
Changed in version 1.0.1: Code simplification allowed us to switch to using EXCEPTION_CONVERTERS
as the default value of converters.
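A short interactive sketch of converting an exception with the default converters (the byte str
message is assumed to be utf-8 encoded):
>>> from kitchen.text.converters import exception_to_unicode
>>> try:
...     raise ValueError('caf\xc3\xa9 not found')
... except ValueError, e:
...     msg = exception_to_unicode(e)
...
>>> msg
u'caf\xe9 not found'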
Format Text for Display
Functions related to displaying unicode text. Unicode characters don’t all have the same width so we
need helper functions for displaying them.
New in version 0.2: kitchen.display API 1.0.0
kitchen.text.display.textual_width(msg, control_chars='guess', encoding='utf-8', errors='replace')
Get the textual width of a string
Parameters
• msg – unicode string or byte str to get the width of
• control_chars –
specify how to deal with control characters. Possible values are:
guess (default) will take a guess for control character widths. Most codes will return
zero width. backspace, delete, and clear delete return -1. escape currently
returns -1 as well but this is not guaranteed as it’s not always correct
strict will raise kitchen.text.exceptions.ControlCharError if a control character is
encountered
• encoding – If we are given a byte str this is used to decode it into a unicode string.
Any characters that are not decodable in this encoding will get a value dependent on the
errors parameter.
• errors – How to treat errors decoding the byte str to a unicode string. Legal values are
the same as for kitchen.text.converters.to_unicode(). The default value of replace will
cause undecodable byte sequences to have a width of one. ignore will have a width of
zero.
Raises ControlCharError – if msg contains a control character and control_chars is strict.
Returns
Textual width of the msg. This is the amount of space that the string will consume on a
monospace display. It’s measured in the number of cell positions or columns it will take
up on a monospace display. This is not the number of glyphs that are in the string.
NOTE:
This function can be wrong sometimes because Unicode does not specify a strict width value for
all of the code points. In particular, we’ve found that some Tamil characters take up to four
character cells but we return a lesser amount.
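A quick illustration; wide CJK characters occupy two cells each, so character count and textual
width can differ:
>>> from kitchen.text.display import textual_width
>>> textual_width(u'abc')
3
>>> textual_width(u'一二三')
6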
kitchen.text.display.textual_width_chop(msg, chop, encoding='utf-8', errors='replace')
Given a string, return it chopped to a given textual width
Parameters
• msg – unicode string or byte str to chop
• chop – Chop msg if it exceeds this textual width
• encoding – If we are given a byte str, this is used to decode it into a unicode string.
Any characters that are not decodable in this encoding will be assigned a width of one.
• errors – How to treat errors decoding the byte str to unicode. Legal values are the same
as for kitchen.text.converters.to_unicode()
Return type
unicode string
Returns
unicode string of the msg chopped at the given textual width
This is what you want to use instead of %.*s, as it does the “right” thing with regard to UTF-8
sequences, control characters, and characters that take more than one cell position. Eg:
>>> # Wrong: only displays 8 characters because it is operating on bytes
>>> print "%.*s" % (10, 'café ñunru!')
café ñun
>>> # Properly operates on graphemes
>>> '%s' % (textual_width_chop('café ñunru!', 10))
café ñunru
>>> # takes too many columns because the kanji need two cell positions
>>> print '1234567890\n%.*s' % (10, u'一二三四五六七八九十')
1234567890
一二三四五六七八九十
>>> # Properly chops at 10 columns
>>> print '1234567890\n%s' % (textual_width_chop(u'一二三四五六七八九十', 10))
1234567890
一二三四五
kitchen.text.display.textual_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='')
Expand a unicode string to a specified textual width or chop to same
Parameters
• msg – unicode string to format
• fill – pad string until the textual width of the string is this length
• chop – before doing anything else, chop the string to this length. Default: Don’t chop
the string at all
• left – If True (default) left justify the string and put the padding on the right. If
False, pad on the left side.
• prefix – Attach this string before the field we’re filling
• suffix – Append this string to the end of the field we’re filling
Return type
unicode string
Returns
msg formatted to fill the specified width. If no chop is specified, the string could
exceed the fill length when completed. If prefix or suffix are printable characters, the
string could be longer than the fill width.
NOTE:
prefix and suffix should be used for “invisible” characters like highlighting, color changing
escape codes, etc. The fill characters are appended outside of any prefix or suffix elements.
This allows you to only highlight msg inside of the field you’re filling.
WARNING:
msg, prefix, and suffix should all be representable as unicode characters. In particular, any
escape sequences in prefix and suffix need to be convertible to unicode. If you need to use
byte sequences here rather than unicode characters, use byte_string_textual_width_fill()
instead.
This function expands a string to fill a field of a particular textual width. Use it instead of
%*.*s, as it does the “right” thing with regard to UTF-8 sequences, control characters, and
characters that take more than one cell position in a display. Example usage:
>>> msg = u'一二三四五六七八九十'
>>> # Wrong: This uses 10 characters instead of 10 cells:
>>> u":%-*.*s:" % (10, 10, msg[:9])
:一二三四五六七八九 :
>>> # This uses 10 cells like we really want:
>>> u":%s:" % (textual_width_fill(msg[:9], 10, 10))
:一二三四五:
>>> # Wrong: Right aligned in the field, but too many cells
>>> u"%20.10s" % (msg)
          一二三四五六七八九十
>>> # Correct: Right aligned with proper number of cells
>>> u"%s" % (textual_width_fill(msg, 20, 10, left=False))
          一二三四五
>>> # prefix and suffix are assumed to hold terminal escape sequences, eg:
>>> prefix = u'\x1b[7m'  # reverse video
>>> suffix = u'\x1b[0m'  # reset
>>> # Wrong: Adding some escape characters to highlight the line but too many cells
>>> u"%s%20.10s%s" % (prefix, msg, suffix)
u'\x1b[7m          一二三四五六七八九十\x1b[0m'
>>> # Correct highlight of the line
>>> u"%s%s%s" % (prefix, textual_width_fill(msg, 20, 10, left=False), suffix)
u'\x1b[7m          一二三四五\x1b[0m'
>>> # Correct way to not highlight the fill
>>> u"%s" % (textual_width_fill(msg, 20, 10, left=False, prefix=prefix, suffix=suffix))
u'          \x1b[7m一二三四五\x1b[0m'
kitchen.text.display.wrap(text, width=70, initial_indent=u'', subsequent_indent=u'', encoding='utf-8',
errors='replace')
Works like we want textwrap.wrap() to work
Parameters
• text – unicode string or byte str to wrap
• width – textual width at which to wrap. Default: 70
• initial_indent – string to use to indent the first line. Default: do not indent.
• subsequent_indent – string to use to wrap subsequent lines. Default: do not indent
• encoding – Encoding to use if text is a byte str
• errors – error handler to use if text is a byte str and contains some undecodable
characters.
Return type
list of unicode strings
Returns
list of lines that have been text wrapped and indented.
textwrap.wrap() from the python standard library has two drawbacks that this attempts to fix:
1. It does not handle textual width. It only operates on bytes or characters which are both
inadequate (due to multi-byte and double width characters).
2. It malforms lists and blocks.
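For plain ASCII input the result should match textwrap.wrap(); a minimal sketch:
>>> from kitchen.text.display import wrap
>>> wrap(u'one two three four', width=10)
[u'one two', u'three four']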
kitchen.text.display.fill(text, *args, **kwargs)
Works like we want textwrap.fill() to work
Parameters
text – unicode string or byte str to process
Returns
unicode string with each line separated by a newline
SEE ALSO:
kitchen.text.display.wrap()
for other parameters that you can give this command.
This function is a light wrapper around kitchen.text.display.wrap(). Where that function returns
a list of lines, this function returns one string with each line separated by a newline.
kitchen.text.display.byte_string_textual_width_fill(msg, fill, chop=None, left=True, prefix='',
suffix='', encoding='utf-8', errors='replace')
Expand a byte str to a specified textual width or chop to same
Parameters
• msg – byte str encoded in UTF-8 that we want formatted
• fill – pad msg until the textual width is this long
• chop – before doing anything else, chop the string to this length. Default: Don’t chop
the string at all
• left – If True (default) left justify the string and put the padding on the right. If
False, pad on the left side.
• prefix – Attach this byte str before the field we’re filling
• suffix – Append this byte str to the end of the field we’re filling
Return type
byte str
Returns
msg formatted to fill the specified textual width. If no chop is specified, the string
could exceed the fill length when completed. If prefix or suffix are printable characters,
the string could be longer than fill width.
NOTE:
prefix and suffix should be used for “invisible” characters like highlighting, color changing
escape codes, etc. The fill characters are appended outside of any prefix or suffix elements.
This allows you to only highlight msg inside of the field you’re filling.
SEE ALSO:
textual_width_fill()
For example usage. This function has only two differences.
1. it takes byte str for prefix and suffix so you can pass in arbitrary sequences of
bytes, not just unicode characters.
2. it returns a byte str instead of a unicode string.
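A minimal illustration of the simple ASCII case (three cells of text padded with two spaces to
fill five):
>>> from kitchen.text.display import byte_string_textual_width_fill
>>> byte_string_textual_width_fill('abc', 5) == 'abc  '
True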
Internal Data
There are a few internal functions and variables in this module. Code outside of kitchen shouldn’t use
them but people coding on kitchen itself may find them useful.
kitchen.text.display._COMBINING = ((768, 879), (1155, 1161), (1425, 1469), (1471, 1471), (1473, 1474),
(1476, 1477), (1479, 1479), (1536, 1539), (1552, 1562), (1611, 1631), (1648, 1648), (1750, 1764), (1767,
1768), (1770, 1773), (1807, 1807), (1809, 1809), (1840, 1866), (1958, 1968), (2027, 2035), (2070, 2073),
(2075, 2083), (2085, 2087), (2089, 2093), (2137, 2139), (2260, 2273), (2275, 2303), (2305, 2306), (2364,
2364), (2369, 2376), (2381, 2381), (2385, 2388), (2402, 2403), (2433, 2433), (2492, 2492), (2497, 2500),
(2509, 2509), (2530, 2531), (2561, 2562), (2620, 2620), (2625, 2626), (2631, 2632), (2635, 2637), (2672,
2673), (2689, 2690), (2748, 2748), (2753, 2757), (2759, 2760), (2765, 2765), (2786, 2787), (2817, 2817),
(2876, 2876), (2879, 2879), (2881, 2883), (2893, 2893), (2902, 2902), (2946, 2946), (3008, 3008), (3021,
3021), (3134, 3136), (3142, 3144), (3146, 3149), (3157, 3158), (3260, 3260), (3263, 3263), (3270, 3270),
(3276, 3277), (3298, 3299), (3393, 3395), (3405, 3405), (3530, 3530), (3538, 3540), (3542, 3542), (3633,
3633), (3636, 3642), (3655, 3662), (3761, 3761), (3764, 3769), (3771, 3772), (3784, 3789), (3864, 3865),
(3893, 3893), (3895, 3895), (3897, 3897), (3953, 3966), (3968, 3972), (3974, 3975), (3984, 3991), (3993,
4028), (4038, 4038), (4141, 4144), (4146, 4146), (4150, 4151), (4153, 4154), (4184, 4185), (4237, 4237),
(4448, 4607), (4957, 4959), (5906, 5908), (5938, 5940), (5970, 5971), (6002, 6003), (6068, 6069), (6071,
6077), (6086, 6086), (6089, 6099), (6109, 6109), (6155, 6157), (6313, 6313), (6432, 6434), (6439, 6440),
(6450, 6450), (6457, 6459), (6679, 6680), (6752, 6752), (6773, 6780), (6783, 6783), (6832, 6845), (6912,
6915), (6964, 6964), (6966, 6970), (6972, 6972), (6978, 6978), (6980, 6980), (7019, 7027), (7082, 7083),
(7142, 7142), (7154, 7155), (7223, 7223), (7376, 7378), (7380, 7392), (7394, 7400), (7405, 7405), (7412,
7412), (7416, 7417), (7616, 7669), (7675, 7679), (8203, 8207), (8234, 8238), (8288, 8291), (8298, 8303),
(8400, 8432), (11503, 11505), (11647, 11647), (11744, 11775), (12330, 12335), (12441, 12442), (42607,
42607), (42612, 42621), (42654, 42655), (42736, 42737), (43014, 43014), (43019, 43019), (43045, 43046),
(43204, 43204), (43232, 43249), (43307, 43309), (43347, 43347), (43443, 43443), (43456, 43456), (43696,
43696), (43698, 43700), (43703, 43704), (43710, 43711), (43713, 43713), (43766, 43766), (44013, 44013),
(64286, 64286), (65024, 65039), (65056, 65071), (65279, 65279), (65529, 65531), (66045, 66045), (66272,
66272), (66422, 66426), (68097, 68099), (68101, 68102), (68108, 68111), (68152, 68154), (68159, 68159),
(68325, 68326), (69702, 69702), (69759, 69759), (69817, 69818), (69888, 69890), (69939, 69940), (70003,
70003), (70080, 70080), (70090, 70090), (70197, 70198), (70377, 70378), (70460, 70460), (70477, 70477),
(70502, 70508), (70512, 70516), (70722, 70722), (70726, 70726), (70850, 70851), (71103, 71104), (71231,
71231), (71350, 71351), (71467, 71467), (72767, 72767), (92912, 92916), (92976, 92982), (113822, 113822),
(119141, 119145), (119149, 119170), (119173, 119179), (119210, 119213), (119362, 119364), (122880,
122886), (122888, 122904), (122907, 122913), (122915, 122916), (122918, 122922), (125136, 125142),
(125252, 125258), (917505, 917505), (917536, 917631), (917760, 917999))
Internal table, provided by this module to list code points which combine with other characters
and therefore should have no textual width. This is a sorted tuple of non-overlapping intervals.
Each interval is a tuple listing a starting code point and ending code point. Every code point
between the two end points is a combining character.
SEE ALSO:
_generate_combining_table()
for how this table is generated
This table was last regenerated on python-3.6.0-rc1 with unicodedata.unidata_version 9.0.0
kitchen.text.display._generate_combining_table()
Combine Markus Kuhn’s data with unicodedata to make combining char list
Return type
tuple of tuples
Returns
tuple of intervals of code points that are combining character. Each interval is a 2-tuple
of the starting code point and the ending code point for the combining characters.
In normal use, this function serves to tell how we’re generating the combining char list. For
speed reasons, we use this to generate a static list and just use that later.
Markus Kuhn’s list of combining characters is more complete than what’s in the python unicodedata
library, but the python unicodedata is synced against later versions of the unicode database.
This is used to generate the _COMBINING table.
kitchen.text.display._print_combining_table()
Print out a new _COMBINING table
This will print a new _COMBINING table in the format used in kitchen/text/display.py. It’s useful
for updating the _COMBINING table with updated data from a new python as the format won’t change
from what’s already in the file.
kitchen.text.display._interval_bisearch(value, table)
Binary search in an interval table.
Parameters
• value – numeric value to search for
• table – Ordered list of intervals. This is a list of two-tuples. The elements of the
two-tuple define an interval’s start and end points.
Returns
If value is found within an interval in the table return True. Otherwise, False
This function checks whether a numeric value is present within a table of intervals. It checks
using a binary search algorithm, dividing the list of values in half and checking against the
values until it determines whether the value is in the table.
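For example, probing the _COMBINING table defined above (an internal API, so subject to change):
>>> from kitchen.text.display import _interval_bisearch, _COMBINING
>>> _interval_bisearch(0x301, _COMBINING)  # U+0301 COMBINING ACUTE ACCENT
True
>>> _interval_bisearch(0x41, _COMBINING)   # U+0041 LATIN CAPITAL LETTER A
False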
kitchen.text.display._ucp_width(ucs, control_chars='guess')
Get the textual width of a ucs character
Parameters
• ucs – integer representing a single unicode code point
• control_chars –
specify how to deal with control characters. Possible values are:
guess (default) will take a guess for control character widths. Most codes will return
zero width. backspace, delete, and clear delete return -1. escape currently
returns -1 as well but this is not guaranteed as it’s not always correct
strict will raise ControlCharError if a control character is encountered
Raises ControlCharError – if the code point is a unicode control character and control_chars is
set to ‘strict’
Returns
textual width of the character.
NOTE:
It’s important to remember this is textual width and not the number of characters or bytes.
kitchen.text.display._textual_width_le(width, *args)
Optimize the common case when deciding which textual width is larger
Parameters
• width – textual width to compare against.
• *args – unicode strings to check the total textual width of
Returns
True if the total textual width of args is less than or equal to width. Otherwise False.
We often want to know “does X fit in Y”. It takes a while to use textual_width() to calculate
this. However, we know that every canonically composed unicode character has a textual width of
either 1 or 2 cells. With this we can take the following shortcuts:
1. If the number of canonically composed characters is more than width, the true textual width
cannot be less than width.
2. If the number of canonically composed characters * 2 is less than the width then the textual
width must be ok.
The textual width of a canonically composed unicode string will always be between one and two
times the number of unicode characters. So we can first check whether twice the number of
composed unicode characters is less than or equal to the asked-for width. If it is, we can
return True immediately. If not, we must do a full textual width lookup.
Miscellaneous functions for manipulating text
Collection of text functions that don’t fit in another category.
Changed in version 1.2.0 (API: kitchen.text 2.2.0): Added isbasestring(), isbytestring(), and
isunicodestring() to help tell which string type is which on python2 and python3.
kitchen.text.misc.byte_string_valid_encoding(byte_string, encoding='utf-8')
Detect if a byte str is valid in a specific encoding
Parameters
• byte_string – Byte str to test for bytes not valid in this encoding
• encoding – encoding to test against. Defaults to UTF-8.
Returns
True if the byte str contains no byte sequences invalid in the given encoding. False if an
invalid sequence is detected.
NOTE:
This function checks whether the byte str is valid in the specified encoding. It does not
detect whether the byte str actually was encoded in that encoding. If you want that sort of
functionality, you probably want to use guess_encoding() instead.
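For example (the latin-1 encoding of é is a lone 0xe9 byte, which is not a valid UTF-8 sequence):
>>> from kitchen.text.misc import byte_string_valid_encoding
>>> byte_string_valid_encoding('abc')
True
>>> byte_string_valid_encoding(u'café'.encode('latin-1'))
False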
kitchen.text.misc.byte_string_valid_xml(byte_string, encoding='utf-8')
Check that a byte str would be valid in xml
Parameters
• byte_string – Byte str to check
• encoding – Encoding of the xml file. Default: UTF-8
Returns
True if the string is valid. False if it would be invalid in the xml file
In some cases you’ll have a whole bunch of byte strings and rather than transforming them to
unicode and back to byte str for output to xml, you will just want to make sure they work with the
xml file you’re constructing. This function will help you do that. Example:
ARRAY_OF_MOSTLY_UTF8_STRINGS = [...]
processed_array = []
for string in ARRAY_OF_MOSTLY_UTF8_STRINGS:
    if byte_string_valid_xml(string, 'utf-8'):
        processed_array.append(string)
    else:
        processed_array.append(guess_bytes_to_xml(string, encoding='utf-8'))
output_xml(processed_array)
kitchen.text.misc.guess_encoding(byte_string, disable_chardet=False)
Try to guess the encoding of a byte str
Parameters
• byte_string – byte str to guess the encoding of
• disable_chardet – If this is True, we never attempt to use chardet to guess the encoding.
This is useful if you need to have reproducibility whether chardet is installed or not.
Default: False.
Raises TypeError – if byte_string is not a byte str type
Returns
string containing a guess at the encoding of byte_string. This is appropriate to pass as
the encoding argument when encoding and decoding unicode strings.
We start by attempting to decode the byte str as UTF-8. If this succeeds we tell the world it’s
UTF-8 text. If it doesn’t and chardet is installed on the system and disable_chardet is False
this function will use it to try detecting the encoding of byte_string. If it is not installed or
chardet cannot determine the encoding with a high enough confidence then we rather arbitrarily
claim that it is latin-1. Since latin-1 maps every possible byte to a character, decoding from
latin-1 to unicode will not cause UnicodeError exceptions although the output might be mangled.
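A short illustration; the exact encoding names shown are an assumption based on the description
above:
>>> from kitchen.text.misc import guess_encoding
>>> guess_encoding(u'café'.encode('utf-8'))
'utf-8'
>>> guess_encoding(u'café'.encode('latin-1'), disable_chardet=True)
'latin-1'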
kitchen.text.misc.html_entities_unescape(string)
Substitute unicode characters for HTML entities
Parameters
string – unicode string to substitute out html entities
Raises TypeError – if something other than a unicode string is given
Return type
unicode string
Returns
The plain text without html entities
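For instance (a sketch of the expected behaviour):
>>> from kitchen.text.misc import html_entities_unescape
>>> html_entities_unescape(u'caf&eacute; &lt;b&gt;')
u'caf\xe9 <b>'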
kitchen.text.misc.isbasestring(obj)
Determine if obj is a byte str or unicode string
In python2 this is equivalent to isinstance(obj, basestring). In python3 it checks whether the
object is an instance of str, bytes, or bytearray. This is an aid to porting code that needed to
test whether an object was derived from basestring in python2 (commonly used in unicode-bytes
conversion functions)
Parameters
obj – Object to test
Returns
True if the object is a basestring. Otherwise False.
New in version 1.2.0 (API: kitchen.text 2.2.0).
kitchen.text.misc.isbytestring(obj)
Determine if obj is a byte str
In python2 this is equivalent to isinstance(obj, str). In python3 it checks whether the object is
an instance of bytes or bytearray.
Parameters
obj – Object to test
Returns
True if the object is a byte str. Otherwise, False.
New in version 1.2.0 (API: kitchen.text 2.2.0).
kitchen.text.misc.isunicodestring(obj)
Determine if obj is a unicode string
In python2 this is equivalent to isinstance(obj, unicode). In python3 it checks whether the
object is an instance of str.
Parameters
obj – Object to test
Returns
True if the object is a unicode string. Otherwise, False.
New in version 1.2.0 (API: kitchen.text 2.2.0).
kitchen.text.misc.process_control_chars(string, strategy='replace')
Look for and transform control characters in a string
Parameters
• string – string to search for and transform control characters within
• strategy –
XML does not allow ASCII control characters. When we encounter those we need to know
what to do. Valid options are:
replace (default) Replace the control characters with "?"
ignore Remove the characters altogether from the output
strict Raise a ControlCharError when we encounter a control character
Raises
• TypeError – if string is not a unicode string.
• ValueError – if the strategy is not one of replace, ignore, or strict.
• kitchen.text.exceptions.ControlCharError – if the strategy is strict and a control
character is present in the string
Returns
unicode string with no control characters in it.
Changed in version 1.2.0 (API: kitchen.text 2.2.0): Strip out the C1 control characters in
addition to the C0 control characters.
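For example:
>>> from kitchen.text.misc import process_control_chars
>>> process_control_chars(u'one\x02two')
u'one?two'
>>> process_control_chars(u'one\x02two', strategy='ignore')
u'onetwo'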
kitchen.text.misc.str_eq(str1, str2, encoding='utf-8', errors='replace')
Compare two strings, converting to byte str if one is unicode
Parameters
• str1 – First string to compare
• str2 – Second string to compare
• encoding – If we need to convert one string into a byte str to compare, the encoding to
use. Default is utf-8.
• errors – What to do if we encounter errors when encoding the string. See the
kitchen.text.converters.to_bytes() documentation for possible values. The default is
replace.
This function prevents UnicodeError (python-2.4 or less) and UnicodeWarning (python 2.5 and
higher) when we compare a unicode string to a byte str. The errors normally arise because the
conversion is done to ASCII. This function lets you convert to utf-8 or another encoding instead.
NOTE:
When we need to convert one of the strings from unicode in order to compare them we convert the
unicode string into a byte str. That means that strings can compare differently if you use
different encodings for each.
Note that str1 == str2 is faster than this function if you can accept the following limitations:
• Limited to python-2.5+ (otherwise a UnicodeDecodeError may be thrown)
• Will generate a UnicodeWarning if non-ASCII byte str is compared to unicode string.
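A short example using the default utf-8 encoding:
>>> from kitchen.text.misc import str_eq
>>> str_eq(u'café', u'café'.encode('utf-8'))
True
>>> str_eq(u'café', u'café'.encode('latin-1'))
False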
UTF-8
Functions for operating on byte str encoded as UTF-8
NOTE:
In many cases, it is better to convert to unicode, operate on the strings, then convert back to UTF-8.
The unicode type can handle many of these operations itself. For those that it doesn’t (removing
control characters from length calculations, for instance) the code to do so with a unicode type is
often simpler.
WARNING:
All of the functions in this module are deprecated. Most of them have been replaced with functions
that operate on unicode values in kitchen.text.display. kitchen.text.utf8.utf8_valid() has been
replaced with a function in kitchen.text.misc.
kitchen.text.utf8.utf8_text_fill(text, *args, **kwargs)
Deprecated Similar to textwrap.fill() but understands utf-8 strings and doesn’t screw up
lists/blocks/etc.
Use kitchen.text.display.fill() instead.
kitchen.text.utf8.utf8_text_wrap(text, width=70, initial_indent='', subsequent_indent='')
Deprecated Similar to textwrap.wrap() but understands utf-8 data and doesn’t screw up
lists/blocks/etc
Use kitchen.text.display.wrap() instead
kitchen.text.utf8.utf8_valid(msg)
Deprecated Detect if a string is valid utf-8
Use kitchen.text.misc.byte_string_valid_encoding() instead.
kitchen.text.utf8.utf8_width(msg)
Deprecated Get the textual width of a utf-8 string
Use kitchen.text.display.textual_width() instead.
kitchen.text.utf8.utf8_width_chop(msg, chop=None)
Deprecated Return a string chopped to a given textual width
Use textual_width_chop() and textual_width() instead:
>>> msg = 'く ku ら ra と to み mi'
>>> # Old way:
>>> utf8_width_chop(msg, 5)
(5, 'く ku')
>>> # New way
>>> from kitchen.text.converters import to_bytes
>>> from kitchen.text.display import textual_width, textual_width_chop
>>> (textual_width(msg), to_bytes(textual_width_chop(msg, 5)))
(5, 'く ku')
kitchen.text.utf8.utf8_width_fill(msg, fill, chop=None, left=True, prefix='', suffix='')
Deprecated Pad a utf-8 string to fill a specified width
Use byte_string_textual_width_fill() instead
converters
deals with converting text for different encodings and to and from XML
display
deals with issues with printing text to a screen
misc is a catchall for text manipulation functions that don’t seem to fit elsewhere
utf8 contains deprecated functions to manipulate utf8 byte strings
Kitchen.collections
StrictDict
kitchen.collections.StrictDict provides a dictionary that treats str and unicode as distinct key values.
class kitchen.collections.strictdict.StrictDict
Map class that considers unicode and str different keys
Ordinarily when you are dealing with a dict keyed on strings you want to have keys that have the
same characters end up in the same bucket even if one key is unicode and the other is a byte str.
The normal dict type does this for ASCII characters (but not for anything outside of the ASCII
range.)
Sometimes, however, you want to keep the two string classes strictly separate, for instance, if
you’re creating a single table that can map from unicode characters to str characters and vice
versa. This class will help you do that by making all unicode keys evaluate to a different key
than all str keys.
SEE ALSO:
dict for documentation on this class’s methods. This class implements all the standard dict
methods. Its treatment of unicode and str keys as separate is the only difference.
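A minimal sketch, assuming StrictDict is importable directly from kitchen.collections as the text
above implies:
>>> from kitchen.collections import StrictDict
>>> d = StrictDict()
>>> d[u'a'] = 'set with a unicode key'
>>> d['a'] = 'set with a byte str key'
>>> len(d)
2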
Kitchen.iterutils Module
Functions to manipulate iterables
New in version 0.2.1a1.
Module author: Toshio Kuratomi <toshio@fedoraproject.org>
Module author: Luke Macken <lmacken@redhat.com>
kitchen.iterutils.isiterable(obj, include_string=False)
Check whether an object is an iterable
Parameters
• obj – Object to test whether it is an iterable
• include_string – If True and obj is a byte str or unicode string this function will
return True. If set to False, byte str and unicode strings will cause this function to
return False. Default False.
Returns
True if obj is iterable, otherwise False.
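For example:
>>> from kitchen.iterutils import isiterable
>>> isiterable([1, 2, 3])
True
>>> isiterable('abc')
False
>>> isiterable('abc', include_string=True)
True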
kitchen.iterutils.iterate(obj, include_string=False)
Generator that can be used to iterate over anything
Parameters
• obj – The object to iterate over
• include_string – if True, treat strings as iterables. Otherwise treat them as a single
scalar value. Default False
This function will create an iterator out of any scalar or iterable. It is useful for making a
value given to you an iterable before operating on it. Iterables have their items returned.
Scalars are transformed into iterables. A string is treated as a scalar value unless the
include_string parameter is set to True. Example usage:
>>> list(iterate(None))
[None]
>>> list(iterate([None]))
[None]
>>> list(iterate([1, 2, 3]))
[1, 2, 3]
>>> list(iterate(set([1, 2, 3])))
[1, 2, 3]
>>> list(iterate(dict(a='1', b='2')))
['a', 'b']
>>> list(iterate(1))
[1]
>>> list(iterate(iter([1, 2, 3])))
[1, 2, 3]
>>> list(iterate('abc'))
['abc']
>>> list(iterate('abc', include_string=True))
['a', 'b', 'c']
Helpers for versioning software
PEP-386 compliant versioning
PEP 386 defines a standard format for version strings. This module contains a function for creating
strings in that format.
kitchen.versioning.version_tuple_to_string(version_info)
Return a PEP 386 version string from a PEP 386 style version tuple
Parameters
version_info – Nested set of tuples that describes the version. See below for an example.
Returns
a version string
This function implements just enough of PEP 386 to satisfy our needs. PEP 386 defines a standard
format for version strings and refers to a function that will be merged into the python standard
library that transforms a tuple of version information into a standard version string. This
function is an implementation of that function. Once that function becomes available in the
python standard library we will start using it and deprecate this function.
version_info takes the form that PEP 386’s NormalizedVersion.from_parts() uses:
((Major, Minor, [Micros]), [(Alpha/Beta/rc marker, version)],
[(post/dev marker, version)])
Ex: ((1, 0, 0), ('a', 2), ('dev', 3456))
It generates a PEP 386 compliant version string:
N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]
Ex: 1.0.0a2.dev3456
WARNING:
This function does next to no error checking. It’s up to the person defining the version tuple
to make sure that the values make sense. If the PEP 386 compliant version parser doesn’t get
released soon we’ll look at making this function check that the version tuple makes sense
before transforming it into a string.
It’s recommended that you use this function to keep a __version_info__ tuple and __version__
string in your modules. Why do we need both a tuple and a string? The string is often useful for
putting into human readable locations like release announcements, version strings in tarballs,
etc. Meanwhile the tuple is very easy for a computer to compare. For example, kitchen sets up its
version information like this:
from kitchen.versioning import version_tuple_to_string
__version_info__ = ((0, 2, 1),)
__version__ = version_tuple_to_string(__version_info__)
Other programs that depend on a kitchen version between 0.2.1 and 0.3.0 can find whether the
present version is okay with code like this:
from kitchen import __version_info__, __version__
if __version_info__ < ((0, 2, 1),) or __version_info__ >= ((0, 3, 0),):
    print 'kitchen is present but not at the right version.'
    print 'We need at least version 0.2.1 and less than 0.3.0'
    print 'Currently found: kitchen-%s' % __version__
Exceptions
Kitchen has a hierarchy of exceptions that should make it easy to catch many errors emitted by kitchen
itself.
Base kitchen exceptions
Exception classes for kitchen and the root of the exception hierarchy for all kitchen modules.
exception kitchen.exceptions.KitchenError
Base exception class for any error thrown directly by kitchen.
Kitchen.text exceptions
Exception classes thrown by kitchen’s text processing routines.
exception kitchen.text.exceptions.XmlEncodeError
Exception thrown by error conditions when encoding an xml string.
exception kitchen.text.exceptions.ControlCharError
Exception thrown when an ascii control character is encountered.
1.0.0 Porting Guide
The 0.1 through 1.0.0 releases focused on bringing in functions from yum and python-fedora. This porting
guide tells how to port from those APIs to their kitchen replacements.
python-fedora
┌───────────────────────────────┬──────────────────────────────────────┐
│ python-fedora │ kitchen replacement │
├───────────────────────────────┼──────────────────────────────────────┤
│ fedora.iterutils.isiterable() │ kitchen.iterutils.isiterable() [1] │
├───────────────────────────────┼──────────────────────────────────────┤
│ fedora.textutils.to_unicode() │ kitchen.text.converters.to_unicode() │
├───────────────────────────────┼──────────────────────────────────────┤
│ fedora.textutils.to_bytes() │ kitchen.text.converters.to_bytes() │
└───────────────────────────────┴──────────────────────────────────────┘
[1] isiterable() has changed slightly in kitchen. The include_string attribute has switched its default
value from True to False. So you need to change code like:
>>> # Old code
>>> isiterable('abcdef')
True
>>> # New code
>>> isiterable('abcdef', include_string=True)
True
yum
┌─────────────────────────────┬────────────────────────────────────────────────┐
│ yum │ kitchen replacement │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.dummy_wrapper() │ kitchen.i18n.DummyTranslations.ugettext() │
│ │ [2] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.dummyP_wrapper()   │ kitchen.i18n.DummyTranslations.ungettext()     │
│ │ [2] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.utf8_width() │ kitchen.text.display.textual_width() │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.utf8_width_chop() │ kitchen.text.display.textual_width_chop() │
│ │ and kitchen.text.display.textual_width() │
│ │ [3] [5] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.utf8_valid() │ kitchen.text.misc.byte_string_valid_encoding() │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.utf8_text_wrap() │ kitchen.text.display.wrap() [4] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.utf8_text_fill() │ kitchen.text.display.fill() [4] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.to_unicode() │ kitchen.text.converters.to_unicode() [6] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.to_unicode_maybe() │ kitchen.text.converters.to_unicode() [6] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.to_utf8() │ kitchen.text.converters.to_bytes() [6] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.to_str() │ kitchen.text.converters.to_unicode() or │
│ │ kitchen.text.converters.to_bytes() [7] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.str_eq() │ kitchen.text.misc.str_eq() │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.misc.to_xml() │ kitchen.text.converters.unicode_to_xml() or │
│ │ kitchen.text.converters.byte_string_to_xml() │
│ │ [8] │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n._() │ See: Initializing Yum i18n │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.P_() │ See: Initializing Yum i18n │
├─────────────────────────────┼────────────────────────────────────────────────┤
│ yum.i18n.exception2msg() │ kitchen.text.converters.exception_to_unicode() │
│ │ or kitchen.text.converter.exception_to_bytes() │
│ │ [9] │
└─────────────────────────────┴────────────────────────────────────────────────┘
[2] These yum methods provided fallback support for gettext functions in case either gaftonmode was set
or gettext failed to return an object. In kitchen, we can use the kitchen.i18n.DummyTranslations
object to fulfill that role. Please see Initializing Yum i18n for more suggestions on how to do
this.
[3] The yum version of these functions returned a byte str. The kitchen version listed here returns a
unicode string. If you need a byte str simply call kitchen.text.converters.to_bytes() on the
result.
[4] The yum version of these functions would return either a byte str or a unicode string depending on
what the input value was. The kitchen version always returns unicode strings.
[5] yum.i18n.utf8_width_chop() performed two functions. It returned the piece of the message that fit
in a specified width and the width of that message. In kitchen, you need to call two functions, one
for each action:
>>> # Old way
>>> utf8_width_chop(msg, 5)
(5, 'く ku')
>>> # New way
>>> from kitchen.text.display import textual_width, textual_width_chop
>>> (textual_width(msg), textual_width_chop(msg, 5))
(5, u'く ku')
[6] If the yum version of to_unicode() or to_utf8() is given an object that is not a string, it returns
the object itself. kitchen.text.converters.to_unicode() and kitchen.text.converters.to_bytes()
default to returning the simplerepr of the object instead. If you want the yum behaviour, set the
nonstring parameter to passthru:
>>> from kitchen.text.converters import to_unicode
>>> to_unicode(5)
u'5'
>>> to_unicode(5, nonstring='passthru')
5
[7] yum.i18n.to_str() could return either a byte str or a unicode string. In kitchen you can get the
same effect but you get to choose whether you want a byte str or a unicode string. Use to_bytes()
for str and to_unicode() for unicode.
[8] yum.misc.to_xml() was buggy as written. I think the intention was for you to be able to pass a byte
str or unicode string in and get out a byte str that was valid to use in an xml file. The two
kitchen functions byte_string_to_xml() and unicode_to_xml() do that for each string type.
[9] When porting yum.i18n.exception2msg() to use kitchen, you should setup two wrapper functions to aid
in your port. They’ll look like this:
from kitchen.text.converters import EXCEPTION_CONVERTERS, \
    BYTE_EXCEPTION_CONVERTERS, exception_to_unicode, \
    exception_to_bytes
def exception2umsg(e):
    '''Return a unicode representation of an exception'''
    c = [lambda e: e.value]
    c.extend(EXCEPTION_CONVERTERS)
    return exception_to_unicode(e, converters=c)
def exception2bmsg(e):
    '''Return a utf8 encoded str representation of an exception'''
    c = [lambda e: e.value]
    c.extend(BYTE_EXCEPTION_CONVERTERS)
    return exception_to_bytes(e, converters=c)
The reason to define this wrapper is that many of the exceptions in yum put the message in the value
attribute of the Exception instead of adding it to the args attribute. So the default
EXCEPTION_CONVERTERS don’t know where to find the message. The wrapper tells kitchen to check the value
attribute for the message. The reason to define two wrappers may be less obvious.
yum.i18n.exception2msg() can return a unicode string or a byte str depending on a combination of what
attributes are present on the Exception and what locale the function is being run in. By contrast,
kitchen.text.converters.exception_to_unicode() only returns unicode strings and
kitchen.text.converters.exception_to_bytes() only returns byte str. This is much safer as it keeps code
that can only handle unicode or only handle byte str correctly from getting the wrong type when an input
changes but it means you need to examine the calling code when porting from yum.i18n.exception2msg() and
use the appropriate wrapper.
Initializing Yum i18n
Previously, yum had several pieces of code to initialize i18n. From the toplevel of yum/i18n.py:
try:
    '''
    Setup the yum translation domain and make _() and P_() translation wrappers
    available.
    using ugettext to make sure translated strings are in Unicode.
    '''
    import gettext
    t = gettext.translation('yum', fallback=True)
    _ = t.ugettext
    P_ = t.ungettext
except:
    '''
    Something went wrong so we make a dummy _() wrapper that just
    returns the same text
    '''
    _ = dummy_wrapper
    P_ = dummyP_wrapper
With kitchen, this can be changed to this:
from kitchen.i18n import easy_gettext_setup, DummyTranslations
try:
    _, P_ = easy_gettext_setup('yum')
except:
    translations = DummyTranslations()
    _ = translations.ugettext
    P_ = translations.ungettext
NOTE:
In overcoming-frustration, it is mentioned that for some things (like exception messages), using the
byte str oriented functions is more appropriate. If this is desired, the only additional setup
needed is a second call to kitchen.i18n.easy_gettext_setup():
b_, bP_ = easy_gettext_setup('yum', use_unicode=False)
The second place where i18n is setup is in yum.YumBase._getConfig() in yum/__init_.py if gaftonmode is in
effect:
if startupconf.gaftonmode:
    global _
    _ = yum.i18n.dummy_wrapper
This can be changed to:
if startupconf.gaftonmode:
    global _
    _ = DummyTranslations().ugettext
Conventions for contributing to kitchen
Style
• Strive to be PEP 8 compliant
• Run pylint over the code and try to resolve most of its nitpicking
Python 2.4 compatibility
At the moment, we’re supporting python-2.4 and above. Understand that there’s a lot of python features
that we cannot use because of this.
Sometimes modules in the python standard library can be added to kitchen so that they’re available. When
we do that we need to be careful of several things:
1. Keep the module in sync with the version in the python-2.x trunk. Use
maintainers/sync-copied-files.py for this.
2. Sync the unittests as well as the module.
3. Be aware that not all modules are written to remain compatible with Python-2.4 and might use python
language features that were not present then (generator expressions, relative imports, decorators,
with, try: with both except: and finally:, etc) These are not good candidates for importing into
kitchen as they require more work to keep synced.
Unittests
• At least smoketest your code (make sure a function will return expected values for one set of inputs).
• Note that even 100% coverage is not a guarantee of working code! Good tests will realize that you need
to also give multiple inputs that test the code paths of called functions that are outside of your
code. Example:
def to_unicode(msg, encoding='utf8', errors='replace'):
    return unicode(msg, encoding, errors)
# Smoketest only. This will give 100% coverage for your code (it
# tests all of the code inside of to_unicode) but it leaves a lot of
# room for errors as it doesn't test all combinations of arguments
# that are then passed to the unicode() function.
tools.ok_(to_unicode('abc') == u'abc')
# Better -- tests now cover non-ascii characters and that error conditions
# occur properly. There's a lot of other permutations that can be
# added along these same lines.
tools.ok_(to_unicode('café', 'utf8', 'replace') == u'café')
tools.assert_raises(UnicodeError, to_unicode, u'cafè ñunru'.encode('latin1'), 'ascii', 'strict')
• We’re using nose for unittesting. Rather than depend on unittest2 functionality, use the functions
that nose provides.
• Remember to maintain python-2.4 compatibility even in unittests.
Docstrings and documentation
We use sphinx to build our documentation. We use the sphinx autodoc extension to pull docstrings out of
the modules for API documentation. This means that docstrings for subpackages and modules should follow
a certain pattern. The general structure is:
• Introductory material about a module in the module’s top level docstring.
• Introductory material should begin with a level two title: an overbar and underbar of ‘-‘.
• docstrings for every function.
• The first line is a short summary of what the function does
• This is followed by a blank line
• The next lines are a field list (see http://sphinx.pocoo.org/markup/desc.html#info-field-lists)
giving information about the function’s signature. We use the keywords: arg, kwarg, raises,
returns, and sometimes rtype. Use these to describe all arguments, keyword arguments, exceptions
raised, and return values.
• Parameters that are kwarg should specify what their default behaviour is.
Kitchen versioning
Currently the kitchen library is in early stages of development. While we’re in this state, the main
kitchen library uses the following pattern for version information:
• Versions look like this:
  __version_info__ = ((0, 1, 2),)
  __version__ = '0.1.2'
• The Major version number remains at 0 until we decide to make the first 1.0 release of kitchen. At
that point, we’re declaring that we have some confidence that we won’t need to break backwards
compatibility for a while.
• The Minor version increments for any backwards incompatible API changes. When this is updated, we
reset micro to zero.
• The Micro version increments for any other changes (backwards compatible API changes, pure bugfixes,
etc).
NOTE:
Versioning is only updated for releases that generate sdists and new uploads to the download
directory. Usually we update the version information for the library just before release. By
contrast, we update kitchen Versioning when an API change is made. When in doubt, look at the version
information in the last release.
I18N
All strings that are used as feedback for users need to be translated. kitchen sets up several functions
for this. _() is used for marking things that are shown to users via print, GUIs, or other “standard”
methods. Strings for exceptions are marked with b_(). This function returns a byte str which is needed
for use with exceptions:
from kitchen import _, b_
def print_message(msg, username):
    print _('%(user)s, your message of the day is: %(message)s') % {
        'message': msg, 'user': username}
    raise Exception(b_('Test message'))
This serves several purposes:
• It marks the strings to be extracted by an xgettext-like program.
• _() is a function that will substitute available translations at runtime.
NOTE:
By using the %()s with dict style of string formatting, we make this string friendly to translators
that may need to reorder the variables when they’re translating the string.
paver (http://www.blueskyonmars.com/projects/paver/) and babel (http://babel.edgewall.org/) are used
to extract the strings.
API updates
Kitchen strives to have a long deprecation cycle so that people have time to switch away from any APIs
that we decide to discard. Discarded APIs should raise a DeprecationWarning and clearly state in the
warning message and the docstring how to convert old code to use the new interface. An example of
deprecating a function:
import warnings
from kitchen import _
from kitchen.text.converters import to_bytes, to_unicode
from kitchen.text.new_module import new_function
def old_function(param):
    '''**Deprecated**
    This function is deprecated. Use
    :func:`kitchen.text.new_module.new_function` instead. If you want
    unicode strings as output, switch to::
        >>> from kitchen.text.new_module import new_function
        >>> output = new_function(param)
    If you want byte strings, use::
        >>> from kitchen.text.new_module import new_function
        >>> from kitchen.text.converters import to_bytes
        >>> output = to_bytes(new_function(param))
    '''
    warnings.warn(_('kitchen.text.old_function is deprecated. Use'
                    ' kitchen.text.new_module.new_function instead'),
                  DeprecationWarning, stacklevel=2)
    as_unicode = isinstance(param, unicode)
    message = new_function(to_unicode(param))
    if not as_unicode:
        message = to_bytes(message)
    return message
If a particular API change is very intrusive, it may be better to create a new version of the subpackage
and ship both the old version and the new version.
NEWS file
Update the NEWS file when you make a change that will be visible to the users. This is not a ChangeLog
file so we don’t need to list absolutely everything but it should give the user an idea of how this
version differs from prior versions. API changes should be listed here explicitly. bugfixes can be more
general:
-----
0.2.0
-----
* Relicense to LGPLv2+
* Add kitchen.text.format module with the following functions:
textual_width, textual_width_chop.
* Rename the kitchen.text.utils module to kitchen.text.misc. use of the
old names is deprecated but still available.
* bugfixes applied to kitchen.pycompat24.defaultdict that fixes some
tracebacks
Kitchen subpackages
Kitchen itself is a namespace. The kitchen sdist (tarball) provides certain useful subpackages.
SEE ALSO:
Kitchen addon packages
For information about subpackages not distributed in the kitchen sdist that install into the
kitchen namespace.
Versioning
Each subpackage should have its own version information which is independent of the other kitchen
subpackages and the main kitchen library version. This is used so that code that depends on kitchen APIs
can check the version information. The standard way to do this is to put something like this in the
subpackage’s __init__.py:
from kitchen.versioning import version_tuple_to_string
__version_info__ = ((1, 0, 0),)
__version__ = version_tuple_to_string(__version_info__)
__version_info__ is documented in kitchen.versioning. The values of the first tuple should describe API
changes to the module. There are at least three numbers present in the tuple: (Major, minor, micro).
The major version number is for backwards incompatible changes (For instance, removing a function, or
adding a new mandatory argument to a function). Whenever one of these occurs, you should increment the
major number and reset minor and micro to zero. The second number is the minor version. Anytime new but
backwards compatible changes are introduced this number should be incremented and the micro version
number reset to zero. The micro version should be incremented when a change is made that does not change
the API at all. This is a common case for bugfixes, for instance.
Version information beyond the first three parts of the first tuple may be useful for versioning but
semantically have similar meaning to the micro version.
NOTE:
We update the __version_info__ tuple when the API is updated. This way there’s less chance of
forgetting to update the API version when a new release is made. However, we try to only increment
the version numbers a single step for any release. So if kitchen-0.1.0 has kitchen.text.__version__
== ‘1.0.1’, kitchen-0.1.1 should have kitchen.text.__version__ == ‘1.0.2’ or ‘1.1.0’ or ‘2.0.0’.
Criteria for subpackages in kitchen
Subpackages within kitchen should meet these criteria:
• Generally useful or needed for other pieces of kitchen.
• No mandatory requirements outside of the python standard library.
• Optional requirements from outside the python standard library are allowed. Things with mandatory
requirements are better placed in kitchen addon packages
• Somewhat API stable – this is not a hard requirement. We can change the kitchen api. However, it is
better not to as people may come to depend on it.
SEE ALSO:
API Updates
Kitchen addon packages
Addon packages are very similar to subpackages integrated into the kitchen sdist. This section just
lists some of the differences to watch out for.
setup.py
Your setup.py should contain entries like this:
# It's suggested to use a dotted name like this so the package is easily
# findable on pypi:
setup(name='kitchen.config',
      # Include kitchen in the keywords, again, for searching on pypi
      keywords=['kitchen', 'configuration'],
      # This package lives in the directory kitchen/config
      packages=['kitchen.config'],
      # [...]
      )
Package directory layout
Create a kitchen directory in the toplevel. Place the addon subpackage in there. For example:
./ <== toplevel with README, setup.py, NEWS, etc
kitchen/
kitchen/__init__.py
kitchen/config/ <== subpackage directory
kitchen/config/__init__.py
Fake kitchen module
The __init__.py file in the kitchen directory is special. It won’t be installed. It just needs to
pull in the kitchen package from the system so that you are able to test your module. You should be
able to use this boilerplate:
# Fake module. This is not installed. It's just made to import the real
# kitchen modules for testing this module
import pkgutil
# Extend the __path__ with everything in the real kitchen module
__path__ = pkgutil.extend_path(__path__, __name__)
NOTE:
kitchen needs to be findable by python for this to work. Installed in the site-packages directory or
adding it to the PYTHONPATH will work.
Your unittests should now be able to find both your submodule and the main kitchen module.
Versioning
It is recommended that addon packages version similarly to Versioning. The __version_info__ and
__version__ strings can be changed independently of the version exposed by setup.py so that you have
both an API version (__version_info__) and release version that’s easier for people to parse. However,
you aren’t required to do this and you could follow a different methodology if you want (for instance,
Kitchen versioning)
Glossary
“Everything but the kitchen sink”
An English idiom meaning to include nearly everything that you can think of.
API version
Version that is meant for computer consumption. This version is parsable and comparable by
computers. It contains information about a library’s API so that computer software can decide
whether it works with the software.
ASCII A character encoding that maps numbers to characters essential to American English. It maps 128
characters using 7 bits.
SEE ALSO:
http://en.wikipedia.org/wiki/ASCII
ASCII compatible
An encoding in which the particular byte that maps to a character in the ASCII character set is
only used to map to that character. This excludes EBCDIC based encodings and many multi-byte fixed
and variable width encodings since they reuse the bytes that make up the ASCII encoding for other
purposes. UTF-8 is notable as a variable width encoding that is ASCII compatible.
SEE ALSO:
http://en.wikipedia.org/wiki/Variable-width_encoding
For another explanation of various ways bytes are mapped to characters in a possibly
incompatible manner.
code points
code point
A number that maps to a particular abstract character. Code points make it so that we have a
number pointing to a character without worrying about implementation details of how those numbers
are stored for the computer to read. Encodings define how the code points map to particular
sequences of bytes on disk and in memory.
control characters
control character
The set of characters in unicode that are used, not to display glyphs on the screen, but to tell
the display or program to do something.
SEE ALSO:
http://en.wikipedia.org/wiki/Control_character
grapheme
characters or pieces of characters that you might write on a page to make words, sentences, or
other pieces of text.
SEE ALSO:
http://en.wikipedia.org/wiki/Grapheme
I18N I18N is an abbreviation for internationalization. It’s often used to signify the need to
translate words, number and date formats, and other pieces of data in a computer program so that
it will work well for people who speak a language other than your own.
message catalogs
message catalog
Message catalogs contain translations for user-visible strings that are present in your code.
Normally, you need to mark the strings to be translated by wrapping them in one of several gettext
functions. The function serves two purposes:
1. It allows automated tools to find which strings are supposed to be extracted for translation.
2. The functions perform the translation when the program is running.
SEE ALSO:
babel’s documentation
for one method of extracting message catalogs from source code.
Murphy’s Law
“Anything that can go wrong, will go wrong.”
SEE ALSO:
http://en.wikipedia.org/wiki/Murphy%27s_Law
release version
Version that is meant for human consumption. This version is easy for a human to look at to
decide how a particular version relates to other versions of the software.
textual width
The amount of horizontal space a character takes up on a monospaced screen. The units are number
of character cells or columns that it takes the place of.
UTF-8 A character encoding that maps all unicode code points to a sequence of bytes. It is compatible
with ASCII. It uses a variable number of bytes to encode all of unicode. ASCII characters take
one byte. Characters from other parts of unicode take two to four bytes. It is widespread as an
encoding on the internet and in Linux.
INDICES AND TABLES
• genindex
• modindex
• search
PROJECT PAGES
More information about the project can be found on the project webpage
The latest published version of this documentation can be found on the documentation page
COPYRIGHT
2017 Red Hat, Inc. and others
0.2 Aug 28, 2017 KITCHEN(1)