207134 – codec lookup fails due to unsafe case conversion

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	python
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	James Antill
QA Contact:	Brock Organ
Blocks:	F8Blocker

Reported:	2006-09-19 16:52 UTC by Sertaç Ö. Yıldız
Modified:	2007-11-30 22:11 UTC
Fixed In Version:	2.5-15.fc7
Last Closed:	2007-11-22 03:30:10 UTC
Type:	---

Description Sertaç Ö. Yıldız 2006-09-19 16:52:33 UTC

Description of problem:
Python/codecs.c uses tolower() for case normalization in codec lookup. This
doesn't work in the Turkish locale when normalizing the encoding name "ISO-8859-9".

Version-Release number of selected component (if applicable):
$ rpm -q python
python-2.4.3-15.fc6

How reproducible:
always

Steps to Reproduce:
(with tr_TR.UTF-8 locale)
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'tr_TR.UTF-8'
>>> unicode("test", "ISO-8859-9")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding: ISO-8859-9

Additional info:
Turkish i18n of Fedora tools written in Python is almost always broken due to
this bug, which is triggered by rhpl and ISO-8859-9 encoded .po files. I'm using
this hack to make things work:

--- Python-2.4.3/Python/codecs.c.orig	2006-03-28 00:47:54.000000000 +0300
+++ Python-2.4.3/Python/codecs.c	2006-09-19 17:54:30.000000000 +0300
@@ -69,6 +69,8 @@
         register char ch = string[i];
         if (ch == ' ')
             ch = '-';
+       else if (ch == 'I')
+            ch = 'i';
         else
             ch = tolower(ch);
        p[i] = ch;
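
The special case for 'I' in the patch is needed because tolower() consults the current LC_CTYPE locale rather than assuming ASCII. A minimal sketch of a probe for that behaviour follows; the function name is hypothetical and does not come from the report or from Python.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical probe, not from the bug report: returns 1 if tolower('I')
 * yields 'i' under the named locale, 0 if it does not, and -1 if the
 * locale is not available. The process's LC_CTYPE is left changed. */
int upper_i_lowers_to_ascii_i(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return tolower('I') == 'i' ? 1 : 0;
}
```

On an affected system, a call such as upper_i_lowers_to_ascii_i("tr_TR.UTF-8") would be expected to return 0, which is consistent with the LookupError shown in the steps to reproduce. That result is inferred from the report and has not been verified here.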

Comment 1 Baris Cicek 2007-06-13 18:38:31 UTC

A better approach is to set the locale to 'C', as codec names are in ASCII
letters anyway, so it should be something like:

#include <locale.h>

setlocale (LC_ALL, "C");

...

ch = tolower (ch);
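
A slightly fuller sketch of this suggestion, assuming the caller's locale should be restored afterwards (the snippet above does not say so). The helper name is hypothetical. Note that setlocale() changes process-wide state, so this pattern is not safe if other threads run locale-dependent code at the same time.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Python: lowercases s in place using the
 * "C" locale, then restores the caller's LC_CTYPE. Returns 0 on success,
 * -1 on failure. */
int lower_in_c_locale(char *s)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    size_t i;

    if (current == NULL)
        return -1;
    /* Copy the name: a later setlocale() call may overwrite the buffer. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, "C") == NULL) {
        free(saved);
        return -1;
    }
    for (i = 0; s[i] != '\0'; i++)
        s[i] = (char)tolower((unsigned char)s[i]);

    setlocale(LC_CTYPE, saved);   /* put the caller's locale back */
    free(saved);
    return 0;
}
```

Restricting the switch to LC_CTYPE instead of LC_ALL keeps the change as narrow as possible, since only character classification matters for tolower().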

Comment 2 Sertaç Ö. Yıldız 2007-06-13 18:53:57 UTC

(In reply to comment #1)
> A better approach is to set the locale to 'C', as codec names are in ASCII
> letters anyway,

Your code is kind of a "walk around" rather than a "work around". It suggests
using the C locale instead of Turkish :)

Comment 3 James Antill 2007-06-13 22:10:21 UTC

Baris means that the tolower() in codecs.c should be in the C locale ... and
personally I'd lean towards doing the tolower() by hand assuming ASCII, thus
not having to swap locales. But I'll have to see what upstream says about this.

Also, all the examples I see of people using unicode()/encode()/etc. have the
encoding argument already lowercased, like:

unicode("test", "iso-8859-9")

...can you find documentation that the ASCII toupper()'d argument is supposed to
work?

Comment 4 Sertaç Ö. Yıldız 2007-06-13 22:43:13 UTC

(In reply to comment #3)
> Baris means that the tolower() in codecs.c should be in the C locale ...

I misunderstood then, sorry.

>  Also all the examples I see of people using unicode()/encode()/etc. have the
> code argument lowered already, like:
> 
> unicode("test", "iso-8859-9")

Uppercase is also used in the wild, as was the case for the .po files I mentioned
in the original report. Fortunately, UTF-8 is preferred nowadays.

I guess the installation problems with the Turkish locale are also due to the
sqlite package using ISO-8859-1 in the source file coding declaration.

> ...can you find documentation that the ASCII toupper()'d argument is supposed to
> work?
> 

From python-docs, section 4.8.3 Standard Encodings:
> [...] Notice that spelling alternatives that only differ in case or use a 
> hyphen instead of an underscore are also valid aliases.

Also, there are aliases that contain the letter 'I', like IBM037 and EBCDIC,
which will trigger this bug.

Comment 5 Baris Cicek 2007-06-14 12:42:55 UTC

Lowering is troublesome as well, since the lowercase of 'I' is 'ı' in Turkish.

A manual tolower without changing the locale would also be much more efficient
in terms of memory, as it won't touch locale data. Sample code is trivial:

else if (ch >= 'A' && ch <= 'Z') { 
  ch += 35;  // makes ascii lowercase char
} else {
  // not an ascii letter, bork!
}

Comment 6 Baris Cicek 2007-06-14 12:45:03 UTC

Sorry, 

> else if (ch >= 'A' && ch <= 'Z') { 
>   ch += 35;  // makes ascii lowercase char

That's 
    ch += 32;

Comment 7 Sertaç Ö. Yıldız 2007-06-16 15:13:08 UTC

(In reply to comment #6)
>     ch += 32; 

ch |= 32; /* would be even better :) */

But they might think about adding a utility function, as this is not the only
place with a similar error in the Python code.
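
Such a utility function could look roughly like the sketch below. Both function names are hypothetical and this is not the fix that was actually shipped. It assumes an ASCII-compatible character set and mirrors the loop in the patch from the description. The `|= 32` trick is only correct inside the range check, because applied to other characters it changes them (for example '_' is 0x5F and becomes 0x7F).

```c
#include <stddef.h>

/* Hypothetical utility, not Python's actual fix: lowercases one character
 * using ASCII rules only, ignoring the current locale. */
static char ascii_tolower(char ch)
{
    /* The range check matters: OR-ing 32 into '_' (0x5F) gives 0x7F. */
    if (ch >= 'A' && ch <= 'Z')
        ch = (char)(ch | 32);   /* 'A' (0x41) -> 'a' (0x61) */
    return ch;
}

/* Spaces become hyphens, ASCII letters are lowercased, and everything
 * else is copied unchanged. Output is truncated to fit out_size and is
 * always NUL-terminated when out_size > 0. */
void normalize_encoding_name(const char *in, char *out, size_t out_size)
{
    size_t i;

    if (out_size == 0)
        return;
    for (i = 0; in[i] != '\0' && i + 1 < out_size; i++)
        out[i] = (in[i] == ' ') ? '-' : ascii_tolower(in[i]);
    out[i] = '\0';
}
```

Because neither function calls into the C library's locale machinery, the result is the same whether the process runs under tr_TR.UTF-8 or the C locale.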

Comment 8 Jeremy Katz 2007-10-30 15:56:34 UTC

This breaks Turkish installs (related to bug 191096) and needs to be fixed for F8.

Comment 9 Jeremy Katz 2007-10-30 18:10:13 UTC

Fix committed, built, and requested to be tagged.

Comment 10 Jesse Keating 2007-10-30 19:34:10 UTC

Tagged for release.

Comment 11 Jeremy Katz 2007-11-01 02:56:32 UTC

Turkish install from DVD completed, and I also did a quick verification with the
specific case mentioned in the first comment.

Comment 12 Devrim GUNDUZ 2007-11-01 21:24:31 UTC

This problem continues in rc2 iso (Oct 30). Was your fix in that iso?

Comment 13 Will Woods 2007-11-01 21:41:20 UTC

The fix is in python-2.5.1-15.fc8, which AFAIK was built after the rc2 isos were
made. Check the python version in rc2 to be sure.

I've also verified the fix by installing in Turkish from the rc3 iso.

Comment 14 Devrim GUNDUZ 2007-11-01 21:51:02 UTC

Ok, rc2 has python-2.5.1-14.

Thanks.

Comment 15 Devrim GUNDUZ 2007-11-03 00:34:54 UTC

I double-checked the Turkish installation today, with every detail that I can
think of. I again confirm that it works.

Thanks everyone.

Regards, Devrim

Comment 16 Fedora Update System 2007-11-09 23:51:18 UTC

python-2.5-15.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update python'

Comment 17 Fedora Update System 2007-11-22 03:30:00 UTC

python-2.5-15.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.
