
Unicode

Pulling your hair out over some i18n bug? Or did you fix one but can't explain why the fix works? This is a little help in getting an idea of Unicode, codecs, encoding, decoding, etc.

Quick tips:

a. It does not make sense to have a string without knowing what encoding it uses.

b. UTF-8 is one way of storing a string of Unicode code points as bytes.

c. Encoding: transforming a Unicode object into a sequence of bytes.

d. Decoding: recreating the Unicode object from a sequence of bytes. There are many different methods for doing these transformations (these methods are also called encodings).
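A minimal sketch of tips a to d, assuming Python 3. The links below discuss Python 2.x, where the text type is `unicode` and the byte type is `str`; in Python 3 those roles belong to `str` and `bytes`. The sample string is arbitrary, chosen only because it contains non-ASCII characters.

```python
# Assumes Python 3: `str` holds Unicode code points, `bytes` holds raw bytes.
# (In Python 2.x these roles were played by `unicode` and `str`.)

text = "na\u00efve caf\u00e9"  # "naïve café": 10 code points

# b. UTF-8 stores each code point in 1 to 4 bytes.
#    U+00E9 ("é") is a single code point but two bytes in UTF-8.
print(len("\u00e9"), "\u00e9".encode("utf-8"))  # 1 b'\xc3\xa9'

# c. Encoding: Unicode text -> bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 10 12

# d. Decoding: bytes -> Unicode text, using the same encoding.
assert utf8_bytes.decode("utf-8") == text

# Another encoding turns the same text into different bytes.
latin1_bytes = text.encode("latin-1")
print(len(latin1_bytes))  # 10

# a. The bytes don't record their encoding. Decoding UTF-8 bytes as
#    Latin-1 raises no error; it silently yields mojibake.
wrong = utf8_bytes.decode("latin-1")
assert wrong == "na\u00c3\u00afve caf\u00c3\u00a9"  # "naÃ¯ve cafÃ©"

# Guessing wrong the other way round can fail loudly instead:
# these Latin-1 bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Each `encode`/`decode` call above names its encoding explicitly; that habit is the practical side of tip a.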

Now, some reading.

Must Read

  1. http://www.joelonsoftware.com/articles/Unicode.html
  2. http://stackoverflow.com/questions/447107/whats-the-difference-between-encode-decode-python-2-x

Continue reading

  1. http://farmdev.com/talks/unicode/
  2. http://diveintopython.org/xml_processing/unicode.html
  3. http://stackoverflow.com/questions/440320/unicode-vs-str-decode-for-a-utf8-encoded-byte-string-python-2-x