Description of problem:
Python/codecs.c uses tolower() for case normalization in codec lookup. This doesn't work in the Turkish locale when normalizing the encoding name "ISO-8859-9".

Version-Release number of selected component (if applicable):
$ rpm -q python
python-2.4.3-15.fc6

How reproducible:
always

Steps to Reproduce:
(with tr_TR.UTF-8 locale)
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'tr_TR.UTF-8'
>>> unicode("test", "ISO-8859-9")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding: ISO-8859-9

Additional info:
Turkish i18n of Fedora tools written in Python is almost always broken by this bug, which is triggered by rhpl and ISO-8859-9-encoded .po files. I'm using this hack to make things work:

--- Python-2.4.3/Python/codecs.c.orig	2006-03-28 00:47:54.000000000 +0300
+++ Python-2.4.3/Python/codecs.c	2006-09-19 17:54:30.000000000 +0300
@@ -69,6 +69,8 @@
         register char ch = string[i];
         if (ch == ' ')
             ch = '-';
+        else if (ch == 'I')
+            ch = 'i';
         else
             ch = tolower(ch);
         p[i] = ch;
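To make the failure mechanism concrete, here is a minimal standalone sketch of the loop in the hunk's context lines, without the patch. The function name normalize_name and the malloc'd return value are assumptions for illustration; this is not the actual codecs.c code. Because tolower() consults the current LC_CTYPE locale, the result is only predictable in the "C" locale; after setlocale(LC_ALL, "") in tr_TR.UTF-8, 'I' does not reliably map to 'i'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Standalone sketch of the normalization loop in the hunk above (without
 * the patch): spaces become hyphens and every other character goes
 * through tolower(). Returns a malloc'd string the caller must free,
 * or NULL on allocation failure. */
static char *normalize_name(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        char ch = name[i];
        if (ch == ' ') {
            ch = '-';
        } else {
            /* tolower() consults the current LC_CTYPE locale; the cast
             * avoids undefined behaviour for negative char values. */
            ch = (char)tolower((unsigned char)ch);
        }
        out[i] = ch;
    }
    out[len] = '\0';
    return out;
}
```

A C program that never calls setlocale() runs in the "C" locale, so there normalize_name("ISO 8859-9") yields "iso-8859-9". The Turkish failure only appears once the process adopts the user's locale, as in the interactive session above.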
A better approach is to set the locale to 'C', as codec names are in ASCII letters anyway, so it should be something like:

#include <locale.h>
setlocale (LC_ALL, "C");
...
ch = tolower (ch);
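A minimal sketch of the locale-swap idea, assuming it is acceptable to touch process-wide state. The helper lower_in_c_locale is hypothetical. It saves the caller's setting, lowercases under "C", and restores the original locale afterwards; it only switches LC_CTYPE, the category tolower() depends on, rather than LC_ALL.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lowercase s in place under the "C" locale, then
 * put the caller's LC_CTYPE back. Returns 0 on success, -1 if the old
 * locale name could not be saved (s is left unchanged in that case).
 * Caveat: setlocale() changes process-wide state and is not thread-safe. */
static int lower_in_c_locale(char *s)
{
    assert(s != NULL);

    /* The string returned by setlocale() may be overwritten by the next
     * call, so copy it before switching locales. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return -1;
    size_t n = strlen(cur) + 1;
    char *saved = malloc(n);
    if (saved == NULL)
        return -1;
    memcpy(saved, cur, n);

    setlocale(LC_CTYPE, "C");
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);

    setlocale(LC_CTYPE, saved); /* restore the caller's locale */
    free(saved);
    return 0;
}
```

The save/restore dance and the thread-safety caveat are the main costs of this approach, which is part of why lowering by hand is attractive.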
(In reply to comment #1)
> Better approach is to set locale to 'C' as codecs are in ascii letters anyways,

Your code is kind of a "walk around" rather than a "work around". It suggests using the C locale instead of Turkish :)
Baris means that the tolower() in codecs.c should be done in the C locale ... and personally I'd lean towards doing the tolower() by hand assuming ASCII, thus not having to swap locales. But I'll have to see what upstream says about this.

Also, all the examples I see of people using unicode()/encode()/etc. have the codec argument lowercased already, like:

unicode("test", "iso-8859-9")

...can you find documentation that the ASCII toupper()'d argument is supposed to work?
(In reply to comment #3)
> Baris means that the tolower() in codecs.c should be in the C locale ...

I misunderstood, then; sorry.

> Also all the examples I see of people using unicode()/encode()/etc. have the
> code argument lowered already, like:
>
> unicode("test", "iso-8859-9")

Uppercase is also used in the wild, as was the case for the .po files I mentioned in the original report. Fortunately, UTF-8 is preferred nowadays. I guess the installation problems with the Turkish locale are also due to the sqlite package using ISO-8859-1 in its source-file coding declaration.

> ...can you find documentation that the ASCII toupper()'d argument is supposed to
> work?
>

From python-docs, section 4.8.3 Standard Encodings:

> [...] Notice that spelling alternatives that only differ in case or use a
> hyphen instead of an underscore are also valid aliases.

There are also aliases containing the letter 'I', like IBM037 and EBCDIC, which will trigger this bug.
Lowering is troublesome as well, since lowercase 'I' is 'ı' (dotless i) in Turkish. A manual tolower without changing the locale would also be much more efficient in terms of memory, as it won't touch locale data, and the sample code is trivial:

else if (ch >= 'A' && ch <= 'Z') {
    ch += 35; // makes ascii lowercase char
} else {
    // not an ascii letter, bork!
}
Sorry,
> else if (ch >= 'A' && ch <= 'Z') {
>     ch += 35; // makes ascii lowercase char

That's ch += 32;
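With the corrected offset, the by-hand version can be checked against tolower() itself. This sketch (function names are hypothetical) counts the byte values 0..255 on which the manual mapping disagrees with tolower() under the current locale. In the "C" locale, which a C program uses until it calls setlocale(), the count should be zero, because only 'A'..'Z' have lowercase mappings there. Unlike the sketch in the comment, non-letters are passed through rather than rejected, since digits and '-' appear in codec names.

```c
#include <assert.h>
#include <ctype.h>

/* By-hand ASCII lowercase with the corrected offset of 32 ('a' - 'A').
 * Expects an unsigned char value, like tolower(); anything outside
 * 'A'..'Z' (digits, '-', non-ASCII bytes) is returned unchanged. */
static int ascii_tolower_add(int c)
{
    assert(c >= 0 && c <= 255);
    if (c >= 'A' && c <= 'Z')
        c += 32;
    return c;
}

/* Count the byte values on which the manual mapping disagrees with
 * tolower() under the current LC_CTYPE locale. */
static int count_mismatches_with_tolower(void)
{
    int mismatches = 0;
    for (int c = 0; c <= 255; c++) {
        if (ascii_tolower_add(c) != tolower(c))
            mismatches++;
    }
    return mismatches;
}
```

In other words, the manual version behaves exactly like tolower() in the "C" locale, but without depending on whatever locale the process happens to be in.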
(In reply to comment #6)
> ch += 32;

ch |= 32; /* would be even better :) */

But they might consider adding a utility function, as this is not the only place in the Python code with a similar error.
This breaks Turkish installs (related to bug 191096) and needs to be fixed for F8.
Fix committed, built, and requested to be tagged.
Tagged for release.
Turkish install from DVD completed, and I also did a quick verification with the specific case mentioned in the first comment.
This problem persists in the rc2 ISO (Oct 30). Was your fix in that ISO?
The fix is in python-2.5.1-15.fc8, which AFAIK was built after the rc2 isos were made. Check the python version in rc2 to be sure. I've also verified the fix by installing in Turkish from the rc3 iso.
Ok, rc2 has python-2.5.1-14. Thanks.
I double-checked the Turkish installation today, with every detail I could think of, and I again confirm that it works. Thanks, everyone.

Regards,
Devrim
python-2.5-15.fc7 has been pushed to the Fedora 7 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update python'
python-2.5-15.fc7 has been pushed to the Fedora 7 stable repository. If problems still persist, please make note of it in this bug report.