Bug 249675

Summary: w3m charset declaration parser fails with turkish locale
Product: [Fedora] Fedora
Component: w3m
Version: 17
Hardware: i386
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Keywords: Reopened
Reporter: Sertaç Ö. Yıldız <sertacyildiz>
Assignee: Parag Nemade <pnemade>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: psatpute, triage
Doc Type: Bug Fix
Last Closed: 2013-08-01 18:32:27 UTC
Attachments: replace tolower() with bit flip (flags: none)

Description Sertaç Ö. Yıldız 2007-07-26 12:36:40 UTC
Description of problem:
The charset parser uses tolower(), which is not locale-safe. With a Turkish locale,
charset names that contain 'I' (e.g., all uppercase ISO* charset declarations) do not
work.

Version-Release number of selected component (if applicable):
0.5.2-1


Steps to Reproduce:
1. export LANG=tr_TR.UTF-8
2. w3m -I ISO-8859-9 iso8859-9_encoded_document

Comment 1 Sertaç Ö. Yıldız 2007-07-26 12:36:41 UTC
Created attachment 160014 [details]
replace tolower() with bit flip
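
The attachment itself is not reproduced in this report. As a rough sketch of what "replacing tolower() with a bit flip" could look like, assuming ASCII charset names, the following lowercases a name without consulting the locale. The function names ascii_tolower_bitflip and ascii_lower_inplace are hypothetical and are not taken from w3m or from the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from w3m): lowercase one ASCII byte without
 * consulting the locale. In ASCII, 'A'..'Z' (0x41..0x5A) and 'a'..'z'
 * (0x61..0x7A) differ only in bit 0x20, so setting that bit turns an
 * uppercase letter into its lowercase form. Other bytes pass through. */
int ascii_tolower_bitflip(int c)
{
    return (c >= 'A' && c <= 'Z') ? (c | 0x20) : c;
}

/* Lowercase a NUL-terminated charset name in place, ASCII letters only. */
void ascii_lower_inplace(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)ascii_tolower_bitflip((unsigned char)*s);
}
```

Because the range check is done on plain character values, the result for 'I' does not depend on LANG or LC_CTYPE.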

Comment 2 Parag Nemade 2007-07-30 09:35:04 UTC
I am not familiar with the Turkish locale. Can you give me brief steps to reproduce
the problem you faced?

Comment 3 Sertaç Ö. Yıldız 2007-07-30 14:32:07 UTC
Briefly, in the Turkish alphabet the lowercase of 'I' is 'ı' and the uppercase of 'i' is
'İ'.  Therefore, assuming tolower('I') to be 'i' does not work in a Turkish locale.

When parsing the charset declaration of an HTML page or the arguments given to the w3m
command (e.g. when passing the charset declared in an email header to w3m via mailcap),
the parser fails if the charset is given in uppercase and contains the letter 'I' (as in
ISO-xxxx-x, WINDOWS-xxxx, etc.).
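
To illustrate the point, here is a minimal sketch of a charset-name comparison that ignores case using ASCII rules only, so that "ISO-8859-9" and "iso-8859-9" match under any locale. This is not w3m's actual parser; ascii_fold and charset_name_equal are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Fold one byte to lowercase using ASCII rules only (no locale lookup). */
int ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Return 1 if two charset names match ignoring ASCII case, else 0. */
int charset_name_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (ascii_fold((unsigned char)*a) != ascii_fold((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* At least one string has ended; they match only if both have. */
    return *a == *b;
}
```

Charset names are plain ASCII identifiers, so folding them with ASCII rules rather than the user's language rules is a reasonable design choice here.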

Comment 4 Parag Nemade 2007-07-31 10:10:49 UTC
OK, I will build a new package by tomorrow.

Comment 5 Parag Nemade 2007-08-01 03:57:32 UTC
Can you provide an ISO-8859-9 encoded document so I can verify that the patch works?
Without sample test cases or screenshots I can't proceed further.
Alternatively, you can attach screenshots of the problem you faced.

Comment 6 Sertaç Ö. Yıldız 2007-08-01 05:30:24 UTC
You can paste my name into a text file and save it in iso-8859-9, or type a euro
sign and save it in iso-8859-15; it shouldn't matter. Here's what I get (note the iSO
and ISO difference):

$ rpm -q w3m
w3m-0.5.2-1.fc7
$ LANG=en_US.UTF-8 w3m -I ISO-8859-9 -dump iso8859-9.txt 
Encoding should be iso-8859-9 to see this properly: Sertaç Ö. Yıldız
$ LANG=tr_TR.UTF-8 w3m -I ISO-8859-9 -dump iso8859-9.txt 
Encoding should be iso-8859-9 to see this properly: Serta? ?. Y?ld?z
$ LANG=tr_TR.UTF-8 w3m -I iSO-8859-9 -dump iso8859-9.txt 
Encoding should be iso-8859-9 to see this properly: Sertaç Ö. Yıldız

[rebuild the rpm with the patch attached above]
$ rpm -q w3m
w3m-0.5.2-1.sy
$ LANG=en_US.UTF-8 w3m -I ISO-8859-9 -dump iso8859-9.txt
Encoding should be iso-8859-9 to see this properly: Sertaç Ö. Yıldız
$ LANG=tr_TR.UTF-8 w3m -I ISO-8859-9 -dump iso8859-9.txt
Encoding should be iso-8859-9 to see this properly: Sertaç Ö. Yıldız
$ LANG=tr_TR.UTF-8 w3m -I iSO-8859-9 -dump iso8859-9.txt
Encoding should be iso-8859-9 to see this properly: Sertaç Ö. Yıldız


Comment 7 Parag Nemade 2007-08-01 06:37:49 UTC
Thanks. The tests above give me the same positive results you got.
However, I saw that only iso8859-9 is affected; the other encodings, iso8859-7 and
iso8859-8, gave me the same results with and without the patch applied.


Comment 8 Sertaç Ö. Yıldız 2007-08-01 06:54:14 UTC
Hmm, I copied some Hebrew characters from Wikipedia and it seems to work with
iso8859-8 encoding:

$ rpm -q w3m
w3m-0.5.2-1.fc7
$ LANG=en_US.UTF-8 w3m -I ISO-8859-8 -dump iso8859-8.txt
Saved in iso-8859-8: בגדהוזחטיךכל
$ LANG=tr_TR.UTF-8 w3m -I ISO-8859-8 -dump iso8859-8.txt
Saved in iso-8859-8: ????????????


$ rpm -q w3m
w3m-0.5.2-1.sy
$ LANG=en_US.UTF-8 w3m -I ISO-8859-8 -dump iso8859-8.txt
Saved in iso-8859-8: בגדהוזחטיךכל
$ LANG=tr_TR.UTF-8 w3m -I ISO-8859-8 -dump iso8859-8.txt
Saved in iso-8859-8: בגדהוזחטיךכל


Comment 9 Parag Nemade 2007-08-01 06:58:43 UTC
So is the problem related to the Turkish locale, or is it a font problem?

Comment 10 Sertaç Ö. Yıldız 2007-08-01 07:11:10 UTC
Sorry, I meant I can reproduce the bug with iso8859-8, i.e. it does _not_ work
with other encodings either.

I get question marks with the Turkish locale and proper glyphs with other locales,
because the parser cannot parse encoding names with a capital I in the Turkish locale.

Comment 11 Bug Zapper 2008-05-14 13:41:10 UTC
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Bug Zapper 2008-06-17 01:58:22 UTC
Fedora 7 changed to end-of-life (EOL) status on June 13, 2008. 
Fedora 7 is no longer maintained, which means that it will not 
receive any further security or bug fix updates. As a result we 
are closing this bug. 

If you can reproduce this bug against a currently maintained version 
of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 13 Sertaç Ö. Yıldız 2008-07-16 03:47:19 UTC
$ rpm -q w3m
w3m-0.5.2-10.fc9.i386
$ echo uüoösş |iconv -t iso-8859-9 | LANG=tr_TR.UTF-8 w3m -I ISO-8859-9 -dump
u?o?s?
$ echo uüoösş |iconv -t iso-8859-9 | LANG=tr_TR.UTF-8 w3m -I iso-8859-9 -dump 
uüoösş


Comment 14 Parag Nemade 2009-05-06 06:30:20 UTC
Can you check whether this bug is still present, perhaps in F11? I am revisiting this bug in order to fix it.

Comment 15 Sertaç Ö. Yıldız 2009-05-06 13:52:52 UTC
I don't have F11 to test the package, but AFAIK it is just a rebuild of the same source as w3m-0.5.2-10, so I think this problem isn't solved in F11.

Is the explanation in comment #3 clear? For further explanation of why this happens only in Turkish locale, see <http://www.i18nguy.com/unicode/turkish-i18n.html>

Comment 16 Bug Zapper 2009-06-09 22:43:34 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Comment 17 Sertaç Ö. Yıldız 2009-06-10 19:31:03 UTC
w3m-0.5.2-13.fc11.i586 is still buggy.

Comment 18 Bug Zapper 2010-04-27 11:45:06 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Comment 19 Pravin Satpute 2010-05-25 06:49:56 UTC
(In reply to comment #3)
> Briefly, in the Turkish alphabet the lowercase of 'I' is 'ı' and the uppercase of 'i' is
> 'İ'.  Therefore, assuming tolower('I') to be 'i' does not work in a Turkish locale.
> 
> When parsing the charset declaration of an HTML page or the arguments given to the w3m
> command (e.g. when passing the charset declared in an email header to w3m via mailcap),
> the parser fails if the charset is given in uppercase and contains the letter 'I' (as in
> ISO-xxxx-x, WINDOWS-xxxx, etc.).

This is already handled in the tr_TR locale.

See the tolower and toupper pairs there:

http://sourceware.org/cgi-bin/cvsweb.cgi/~checkout~/libc/localedata/locales/tr_TR?rev=1.18.2.3&content-type=text/plain&cvsroot=glibc

IMO, if the locale is set properly, this conversion should happen correctly.

Comment 20 Sertaç Ö. Yıldız 2010-05-25 09:56:12 UTC
(In reply to comment #19)
> This is already handled in the tr_TR locale.

Yes, but I haven't filed the bug against glibc locale data.

> IMO, if the locale is set properly, this conversion should happen correctly.

The problem is that "tolower('I') == 'i'" is locale-dependent. It is TRUE for most locales but FALSE for some.

Check the output of these two (from comment #13):
echo uüoösş |iconv -t iso-8859-9 | LANG=tr_TR.UTF-8 w3m -I iso-8859-9 -dump
echo uüoösş |iconv -t iso-8859-9 | LANG=tr_TR.UTF-8 w3m -I Iso-8859-9 -dump
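
The locale dependence can also be probed directly from C. The sketch below is only an illustration: tolower_I_is_i is a hypothetical helper, and its result for a given locale depends on the C library and on which locales are installed. Going by this thread, one would expect 0 for tr_TR.UTF-8 on the affected Fedora systems, but that is not verified here.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical probe: returns 1 if tolower('I') == 'i' under the named
 * locale, 0 if it does not, and -1 if the locale cannot be selected
 * (for example because it is not installed).
 * LC_CTYPE is reset to "C" afterwards, which is the state every C
 * program starts in. */
int tolower_I_is_i(const char *locale_name)
{
    int result;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    result = (tolower('I') == 'i');
    setlocale(LC_CTYPE, "C");
    return result;
}
```

Calling tolower_I_is_i("C") and tolower_I_is_i("tr_TR.UTF-8") from a small main() and printing both values shows whether a given system treats the two locales differently.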

Comment 21 Bug Zapper 2010-11-04 12:08:47 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Comment 22 Bug Zapper 2011-06-02 18:40:47 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Comment 23 Sertaç Ö. Yıldız 2011-06-02 21:15:29 UTC
same with w3m-0.5.2-20.fc15.i686

Comment 24 Parag Nemade 2012-07-06 11:56:13 UTC
There has been no upstream reply on this problem. The bug is being kept open and is still present in F17.

Comment 25 Fedora End Of Life 2013-07-04 06:52:57 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Comment 26 Fedora End Of Life 2013-08-01 18:32:32 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.