Unicode

Pulling your hair out over some i18n bug, or you fixed it but can't explain why? Here is a little help in getting a fair idea of Unicode, codecs, encoding, decoding, etc.

Quick tips:

  1. It does not make sense to have a string without knowing what encoding it uses.
  2. UTF-8 is one way of storing a string of Unicode code points as bytes.
  3. Encoding: transforming a unicode object into a sequence of bytes.
  4. Decoding: recreating the unicode object from that sequence of bytes. There are many different methods for doing this transformation in either direction (these methods are also called encodings, e.g. UTF-8 or Latin-1).
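The tips above fit in a few lines of code. This is a minimal sketch assuming Python 3, where `str` is the Unicode type and `bytes` is the byte-sequence type (the Python 2.x links below call these `unicode` and `str`); the sample string is arbitrary. It shows code points, an encode/decode round trip, and what happens when bytes are decoded with the wrong encoding.

```python
# Sketch assuming Python 3: `str` holds code points, `bytes` holds raw bytes.
# Escapes are used so the source file's own encoding does not matter.
text = "na\u00efve caf\u00e9"  # "naïve café", 10 code points

# Tip 2: a string is a sequence of code points.
code_points = [hex(ord(ch)) for ch in text]
# ['0x6e', '0x61', '0xef', '0x76', '0x65', '0x20', '0x63', '0x61', '0x66', '0xe9']

# Tip 3: encoding turns code points into bytes; the byte count depends
# on the encoding chosen (UTF-8 uses 2 bytes for each accented letter here).
utf8_bytes = text.encode("utf-8")      # 12 bytes
latin1_bytes = text.encode("latin-1")  # 10 bytes

# Tip 4: decoding with the SAME encoding recreates the original string.
round_trip = utf8_bytes.decode("utf-8")
assert round_trip == text

# Tip 1: bytes alone are ambiguous. Decoding UTF-8 bytes as Latin-1
# does not fail; it silently produces garbage ("mojibake").
wrong = utf8_bytes.decode("latin-1")   # "naÃ¯ve cafÃ©"

# Going the other way, Latin-1 bytes are not valid UTF-8, so a strict
# decode raises UnicodeDecodeError instead of guessing.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(code_points)
print(utf8_bytes, len(utf8_bytes))
print(latin1_bytes, len(latin1_bytes))
print(wrong, decode_failed)
```

The silent Latin-1 case is the one behind many "i18n bugs": nothing crashes, the text just comes out wrong.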

Now, the essentials:

Must Read

  1. http://www.joelonsoftware.com/articles/Unicode.html
  2. http://stackoverflow.com/questions/447107/whats-the-difference-between-encode-decode-python-2-x

Continue reading

  1. http://farmdev.com/talks/unicode/
  2. http://diveintopython.org/xml_processing/unicode.html
  3. http://stackoverflow.com/questions/440320/unicode-vs-str-decode-for-a-utf8-encoded-byte-string-python-2-x