[Python-talk] unicode handling in older Python versions
Kent Johnson
kent37 at tds.net
Sat Oct 3 07:31:19 EDT 2009
On Sat, Oct 3, 2009 at 1:26 AM, Arc Riley <arcriley at gmail.com> wrote:
> If anyone has Python <2.5, can you please try the following and report back
> on whether it worked? I have verified this as broken in different ways on
> 2.5, 2.6, 3.0, and 3.1.1
>
> It appears that every version of Python to date has a serious utf-8 >plane0
> bug that has gone unnoticed until now. It may be
> useful to learn if the bug was introduced at some point in antiquity and we
> just lacked the unit test for it.
>
> $ python2.5
> Python 2.5.4 (r254:67916, Jan 24 2009, 01:30:20)
> [GCC 4.3.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> line = u'𐑑𐑧𐑕𐑑𐑦𐑙'
> >>> first = u'𐑑'
> >>> first
> u'\ud801\udc51'
>
> first should either be u'\U00010451' or obviously u'𐑑'
On Mac OS X:
$ python2.4
Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
In [1]: line = u'𐑑𐑧𐑕𐑑𐑦𐑙'
In [2]: first = u'𐑑'
In [3]: first
Out[3]: u'\xf0\x90\x91\x91'
and I get the same output with
Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
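
For what it's worth, both results can be tied back to U+10451 by hand. The
u'\ud801\udc51' in the quoted session is the UTF-16 surrogate pair for that
code point, which is what a narrow (UCS-2) build would store, and the
u'\xf0\x90\x91\x91' I get looks like its four UTF-8 bytes each taken as a
separate character, possibly because of how my terminal session decodes
input. A minimal sketch of the arithmetic, assuming Python 3 (the helper
name to_surrogates is mine, not anything from the standard library):

```python
# Sketch only: relate the two reprs in this thread to U+10451.
# Assumes Python 3; to_surrogates is a hypothetical helper name.

def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    offset = cp - 0x10000                 # 20 bits remain for planes 1-16
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low surrogate
    return high, low

cp = 0x10451                              # the code point named in the quoted mail
high, low = to_surrogates(cp)
utf8 = chr(cp).encode("utf-8")            # UTF-8 byte sequence for the same code point

print(hex(high), hex(low))
print(utf8)
```

If that reading is right, neither session shows the string as a single
code point, but for two different reasons.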
Kent