[Python-talk] unicode handling in older Python versions

Lloyd Kvam python at venix.com
Sat Oct 3 10:02:14 EDT 2009


On Sat, 2009-10-03 at 01:26 -0400, Arc Riley wrote:
> If anyone has Python <2.5, can you please try the following and report
> back on whether it worked?  I have verified this as broken in
> different ways on 2.5, 2.6, 3.0, and 3.1.1
> 
> It appears that every version of Python to date has a serious UTF-8
> bug for characters above plane 0 that has gone unnoticed until now.  It may be
> useful to learn if the bug was introduced at some point in antiquity
> and we just lacked the unit test for it.
> 
> $ python2.5
> Python 2.5.4 (r254:67916, Jan 24 2009, 01:30:20) 
> [GCC 4.3.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> line = u'𐑑𐑧𐑕𐑑𐑦𐑙'
> >>> first = u'𐑑'
> >>> first
> u'\ud801\udc51'
> 
> first should either be u'\U00010451' or obviously u'𐑑'

Emailing Unicode text directly can be a little problematic.  Here's the
raw (quoted-printable) email source I received:
"""""""""""""""""""""""""""""""""""""
Type "help", "copyright", "credits" or "license" for more information.
>>> line =3D
u'=F0=90=91=91=F0=90=91=A7=F0=90=91=95=F0=90=91=91=F0=90=91=A6=
=F0=90=91=99'
>>> first =3D u'=F0=90=91=91'
>>> first
u'\ud801\udc51'

first should either be u'\U00010451' or obviously u'=F0=90=91=91'
"""""""""""""""""""""""""""""""""""""""""
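The =F0=90=91=91 sequences in the source above are quoted-printable escapes
for the raw UTF-8 bytes, and a trailing = marks a soft line break.  A minimal
sketch of recovering the original string from them with the standard quopri
module follows.  The variable names are illustrative only, and the b"" literal
assumes Python 2.6 or later (on older versions, drop the b prefix):

```python
import quopri

# Quoted-printable text copied from the email source; the "=" at the end
# of the first line is a soft line break and is removed by the decoder.
qp_source = (b"=F0=90=91=91=F0=90=91=A7=F0=90=91=95=F0=90=91=91=F0=90=91=A6=\n"
             b"=F0=90=91=99")

raw_bytes = quopri.decodestring(qp_source)  # the UTF-8 bytes as sent
line = raw_bytes.decode("utf-8")            # the Unicode string itself

print(repr(raw_bytes[:4]))  # bytes of the first character
print(repr(line))
```

How repr(line) is displayed depends on the interpreter, so no expected
output is shown here.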

I could not paste the string from the email into my Python window, so I
tried to build it up from the raw bytes instead.  That attempt fails to
demonstrate the bug.  I saw Kent's email showing the bug, so I assume I
am fouling up the test scenario somehow.  I have Python 2.3 and 2.4
available for testing, but need a more reliable way to create the
problem string.

IPython 0.8.4   [on Py 2.5.2] ## Fedora 10

[~]|20> first_ords = [0xf0,0x90,0x91,0x91]
[~]|21> first_str = ''.join(chr(n) for n in first_ords)
[~]|22> first_uni = first_str.decode('utf8')
[~]|23> first_uni
   <23> u'\U00010451'

So either I'm not seeing the bug, or I'm not building first in the
proper way to demonstrate it.

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug


