[Python-talk] unicode handling in older Python versions

Arc Riley arcriley at gmail.com
Sat Oct 3 01:26:59 EDT 2009


If anyone has a Python older than 2.5, can you please try the following and
report back on whether it worked?  I have verified this as broken, in
different ways, on 2.5, 2.6, 3.0, and 3.1.1.

It appears that every version of Python to date has a serious bug in its
handling of UTF-8 characters outside plane 0 (the Basic Multilingual Plane),
and that it has gone unnoticed until now.  It would be useful to learn
whether the bug was introduced at some point in antiquity and we just lacked
a unit test for it.

$ python2.5
Python 2.5.4 (r254:67916, Jan 24 2009, 01:30:20)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> line = u'𐑑𐑧𐑕𐑑𐑦𐑙'
>>> first = u'𐑑'
>>> first
u'\ud801\udc51'

first should be either u'\U00010451' or, obviously, u'𐑑'.
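
For anyone wondering where \ud801\udc51 comes from: it is the UTF-16
surrogate pair for U+10451, which is what I would expect to see if the
interpreter stores strings as 16-bit code units (a "narrow" build; that is my
assumption about the build above, not something I have confirmed).  A minimal
sketch of the arithmetic, with function names made up for illustration:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside plane 0")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10451)
print("\\u%04x\\u%04x" % (high, low))              # \ud801\udc51
print("U+%X" % from_surrogate_pair(high, low))     # U+10451
```

So the two code units carry the right character; the complaint is that they
show up as two separate items instead of one.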