Bug 70518
Summary: | problems with 8-bit characters (such as German umlauts)
---|---
Product: | [Retired] Red Hat Public Beta
Component: | pine
Version: | limbo
Hardware: | i386
OS: | Linux
Status: | CLOSED DUPLICATE
Severity: | high
Priority: | medium
Reporter: | Michael Schwendt <bugs.michael>
Assignee: | Mike A. Harris <mharris>
QA Contact: | Ben Levenson <benl>
CC: | djh, hippytrail, jakub, jpf+redhat, menthos, me
Doc Type: | Bug Fix
Last Closed: | 2006-02-21 18:49:19 UTC
Bug Blocks: | 67218
Description
Michael Schwendt
2002-08-01 23:16:04 UTC
Please paste your /etc/sysconfig/i18n file, and an example of German input that does this. Also, I do not know exactly what an umlaut is.

Created attachment 69025
German umlauts in xterm

Created attachment 69026
example mail message (.gz!)

Created attachment 69027
Example screenshot of PINE Message Index

$ cat /etc/sysconfig/i18n
LANG="en_GB.UTF-8"
SUPPORTED="en_GB.UTF-8:en_GB:en:en_US.UTF-8:en_US:en:de_DE.UTF-8:de_DE:de"
SYSFONT="LatArCyrHeb-16"

It makes no difference with de_DE.UTF-8, nor with "konsole" or another terminal.

An "umlaut" (which is the English word for the German word "Umlaut" according to a dictionary) is a mutated vowel. German has three lower-case and three upper-case umlauts, for a, o, u and A, O, U respectively. See the attached picture from within xterm, where these display fine and can also be typed without problems. "jed" is another application where they are broken (separate bug report).

In PINE, quoted-printable ISO 8859-1 encoded umlauts are invisible in the "Message Index" and disturb subsequent characters in subject lines or cause the whole line to disappear. For instance, the rest of a line following an umlaut is not displayed. Or they suddenly appear in the "Compose Message" view. Find attached a gzipped example mail and another screenshot. In PINE's message index you should see the sender's full name and the mail's subject line.

I've mentioned this problem on the PINE user mailing list to see if anyone is aware of an existing bugfix for this. If not, I will report it to the PINE team also.

Thanks for the screenshots and the mail folder. Various UTF fixes have gone into glibc and other infrastructure. Can you update your limbo to the latest stuff from rawhide soon, and see if anything has changed, in case it isn't a PINE bug? TIA

I can reproduce this with pine4.44 on RedHat 7.3, but it seems to me that the problem is most likely with glibc, and in particular sprintf.
In mailindx.c, line 3791 (pine4.44), there is the following line:

sprintf(p, "%-*.*s", width, width, cdesc->string);

which, in the person's example email, works out to be something like this:

sprintf(p, "%-*.*s", 18, 18, "J\366rg Linn...");

When setting the system font to the same one as in the person's environment, sprintf returns -1 with an empty string for p (the man page doesn't say anything about a return value of -1 for sprintf). It works in my default environment, and it is the presence of the \366 char that causes it to return -1. Trying various combinations in gdb, it seems like the bug has to do with the ".*" (precision) part. I tried using what appears to be the latest glibc rpm:

glibc-kernheaders-2.4-7.14
glibc-2.2.5-39
glibc-devel-2.2.5-39
glibc-common-2.2.5-39
...

And the problem still exists. Let me know if you need any other information.

Jeff
--
Jeff Franklin <jpf.edu>
Networks and Distributed Computing
University of Washington

I just spoke with Jakub, and he says \366 isn't valid in a UTF-8 string. I'm not sure where that leaves us currently though. Looks like the string isn't being encoded properly for UTF-8.

The string is not properly encoded UTF-8, as pine has yet to take the UTF-8 plunge. Should it really be up to sprintf to enforce UTF-8? Granted, the text will eventually not display correctly on a Unicode xterm, but a change like this in sprintf is bound to break in other places.

There was no change in sprintf; this behaviour is mandated by lots of standards, starting with ISO C99, SuS etc. If pine wants to use sprintf on strings not encoded in the currently set locale, it needs to temporarily setlocale to the one in which the string is encoded and then back (see glibc 2.3+ uselocale(3) for a faster way to do this), or it should use strcpy/memcpy etc. functions which don't depend on the locale.

We are having a hard time convincing ourselves that there is not a problem with sprintf.
Indeed, the problem only arises when we do a setlocale(LC_CTYPE, "");, which we rely upon for isspace(). Our use of this predates the C99 spec, and it seems peculiar that CTYPE would be interpreted broadly enough to cover the 's' specifier for a format string in sprintf. Going over our C99 reference, printf defines the 's' specifier's argument as being "characters ... to be copied to the output". It mentions "multibyte characters" as being used with wprintf. Wouldn't wsprintf be the place to do UTF-8 validation?

Another peculiarity is that sprintf only fails when specifying a precision. I.e., sprintf(p, "%s", "J\366rg"); works, whereas sprintf(p, "%.*s", 15, "J\366rg"); fails. This behavior is inconsistent, independent of whether one were to say that multibyte chars must be valid or not. At the very least, there is a documentation bug, because every other dependency on locale seems to be mentioned in the man page.

Any light that you could shed on this matter would be greatly appreciated.

Thanks!
Jeff

Characters in multibyte locales (such as UTF-8 locales) are not necessarily equal to bytes. So, if a precision is given, then *printf has to determine which characters to print and which should not be printed, and if it fails to grok a character while determining the length because the character is invalid, you get the error. When precision is not specified, it knows it can print the whole string, so no checking is necessary. wsprintf prints wide characters, not multibyte ones; otherwise they are the same.

We've decided that there's not much that we can do to fix the problem until we decide to pursue adding Unicode support. One option would be to remove our call of setlocale(LC_CTYPE, "") in file pico/unix, which will fix the index line display problem, but may break for someone else who depends on their locale's isspace() behavior. Also, this "fix" is not a complete solution; it only fixes the sprintf problem, and 8-bit chars will still fail to display correctly.
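
For illustration, the temporary locale switch suggested earlier in this thread could be sketched roughly as follows. This is only a sketch under assumptions, not PINE code: the function name format_field_c_locale is hypothetical, and the "C" locale stands in for "the locale in which the string is encoded" because it is always available and treats each byte as one character. It would let the program keep setlocale(LC_CTYPE, "") for isspace() while formatting 8-bit strings.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wrapper (not from the PINE sources): format src into a
 * left-justified field of exactly `width` bytes while LC_CTYPE is
 * temporarily set to "C", then restore the caller's locale.
 * Returns snprintf's result, or -1 if the locale could not be switched.
 */
int format_field_c_locale(char *dst, size_t dstsize, const char *src, int width)
{
    assert(dst != NULL && src != NULL && width >= 0);

    /* A NULL second argument only queries the current locale name. */
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return -1;

    /* Copy the name: a later setlocale call may overwrite the buffer. */
    size_t len = strlen(current);
    char *saved = malloc(len + 1);
    if (saved == NULL)
        return -1;
    memcpy(saved, current, len + 1);

    int n = -1;
    if (setlocale(LC_CTYPE, "C") != NULL) {
        /* Same format string as the mailindx.c call quoted above. */
        n = snprintf(dst, dstsize, "%-*.*s", width, width, src);
        setlocale(LC_CTYPE, saved);   /* back to the caller's locale */
    }
    free(saved);
    return n;
}
```

A call such as format_field_c_locale(buf, sizeof buf, "J\366rg Linn", 18) would be the counterpart of the failing sprintf call. Note that setlocale affects the whole process, so this approach is not safe in threaded code; uselocale(3), mentioned above for glibc 2.3+, is the per-thread alternative.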

One thing that I'm curious about is whether Red Hat is setting UTF-8 as the default charset in a lot of situations. And where does one decide to use UTF-8? Also, am I reading it right that the original poster set some sort of ISO 8859-1 locale where it also didn't work? Could someone tell me how I go about testing this? This sounds like something that should work to me.

Thanks,
Jeff

Upon composing a new message, sometimes it displays German special characters, sometimes not. For instance:

$ LANG="de_DE.UTF-8"
$ konsole -e pine

When I compose a new message and type German characters into the "To:" header line, PINE displays a four-byte crap sequence for each key-press. However, when I move the cursor down to the next header line, it changes the "To:" line into something perfectly readable. Still, going back and editing that line reveals some of the crap again and corrupts the line.

[...]

If I'm correctly informed, the next official release of Red Hat Linux will default to UTF-8 locales, such as LANG=en_US.UTF-8 or LANG=de_DE.UTF-8.

[...]

When, in the current beta called "(null)", I set a non-UTF locale, such as LANG=de_DE or LANG=en_IE@euro (English with Euro symbol), I still cannot get PINE to print or accept non-ASCII characters, not even with an ISO 8859-1 or ISO 8859-15 font. It either doesn't display them at all or displays them as white space. It doesn't correctly decode ISO 8859-1 encoded Base64 or quoted-printable subject lines either. And upon composing a new message and typing German special characters into the "To:" header line, for instance, it either prints crap character sequences or two-byte/three-byte whitespace sequences as described above.

[...]

So, overall, my favourite mail client (PINE) is broken badly. I'm getting accustomed to Sylpheed as an alternative.

mschwendt: You are correct that the Null beta release, as well as the final release of our next OS product, defaults to using UTF-8 locales everywhere.
Since the problem has been determined to be due to lack of Unicode support in PINE, there is not much that we at Red Hat can do about this problem until upstream PINE supports Unicode directly. I'm closing this bug report as WONTFIX currently, since we can't really do anything about it right now. Once upstream PINE supports UTF-8 properly, please reopen this bug report. I may do an enhancement update of pine for all supported OS releases at that point in time perhaps.

Thanks.

*** This bug has been marked as a duplicate of 91232 ***

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
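
The other route mentioned in the discussion, using strcpy/memcpy-style functions that do not depend on the locale, might look something like the sketch below. It is a byte-wise stand-in for sprintf(p, "%-*.*s", width, width, string), written under assumptions: the helper name pad_copy_bytes is hypothetical and does not come from the PINE sources, and the caller is assumed to supply a buffer of at least width + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PINE code): copy at most `width` bytes of src
 * into dst, pad with spaces up to `width`, and NUL-terminate.
 * dst must have room for width + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
size_t pad_copy_bytes(char *dst, const char *src, size_t width)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (len > width)
        len = width;                      /* truncate, like the ".*" precision */

    memcpy(dst, src, len);                /* bytes are copied as-is, no locale */
    memset(dst + len, ' ', width - len);  /* left-justify, like the "-" flag */
    dst[width] = '\0';
    return width;
}
```

Because it never interprets the bytes, a call such as pad_copy_bytes(buf, "J\366rg Linn", 18) cannot fail on an ISO 8859-1 byte in a UTF-8 locale. The trade-off is that it counts bytes rather than characters, so it could cut a genuine multibyte sequence in half, and it does nothing for the display problems described in the thread.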