[Python-talk] unicode handling in older Python versions

Arc Riley arcriley at gmail.com
Sat Oct 3 10:10:06 EDT 2009


We've been focusing on 3.1.1, and what we found is that the attached script returns:
'\ud801\udc51'
'\U00010451'

This was attached to ensure it transfers properly over the email list :-)

And, sadly, the workaround is to append .encode('utf-16').decode('utf-16')
to the string.  It appears that utf-8 support is buggy.
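
For anyone trying this on a much newer interpreter, here is a minimal sketch of the same round trip. It assumes Python 3.4 or later, where (as far as I know) a plain .encode('utf-16') rejects surrogates, so the 'surrogatepass' error handler is needed. The helper name combine_surrogates is made up for illustration and is not part of the attached script.

```python
def combine_surrogates(text):
    """Merge UTF-16 surrogate pairs in a str into single code points.

    Hypothetical helper, assuming Python 3.4+.
    """
    # 'surrogatepass' lets the encoder emit surrogate code units as-is;
    # the UTF-16 decoder then reads each valid pair as one character.
    return text.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')


fixed = combine_surrogates('\ud801\udc51')
print(ascii(fixed))   # '\U00010451'
print(len(fixed))     # 1
```

The explicit little-endian codec only keeps a byte order mark out of the intermediate bytes. Input containing an unpaired surrogate would still fail at the decode step.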

Make sure that you have a "wide" Python build for this.  You can test that
with:
>>> import sys
>>> sys.maxunicode
1114111

A narrow build will report 65535.
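
The same check can be put in a script. This is only a sketch: is_wide_build is a hypothetical name, and it relies on the highest code point of a narrow build being 0xFFFF (65535) and that of a wide build being 0x10FFFF (1114111).

```python
import sys


def is_wide_build(maxunicode=sys.maxunicode):
    """Return True when the interpreter can hold non-BMP code points directly."""
    # Narrow builds cap code points at 0xFFFF; wide builds go up to 0x10FFFF.
    return maxunicode > 0xFFFF


print(hex(sys.maxunicode), is_wide_build())
```

The maxunicode parameter exists only so the function can be exercised with both values without needing two interpreters.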
-------------- next part --------------
A non-text attachment was scrubbed...
Name: u.py
Type: text/x-python
Size: 71 bytes
Desc: not available
URL: <http://dlslug.org/pipermail/python-talk/attachments/20091003/ced8d0ad/attachment.py>

