Red Hat Bugzilla – Bug 70518
problems with 8-bit characters (such as German umlauts)
Last modified: 2007-04-18 12:45:07 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020724
Description of problem:
Whether in xterm or virtual console, PINE does not display 8-bit characters
(such as German umlauts), neither ISO 8859-1 encoded nor with the default UTF-8
locale. Some of those characters (e.g. the German small u-umlaut) cause the
remainder of an entire line to become invisible.
Version-Release number of selected component (if applicable):
Please paste your /etc/sysconfig/i18n file, and an example of German
input that does this.
Also, I do not know exactly what an umlaut is.
Created attachment 69025
German umlauts in xterm
Created attachment 69026
example mail message (.gz!)
Created attachment 69027
Example screenshot of PINE Message Index
$ cat /etc/sysconfig/i18n
Doesn't make a difference with de_DE.UTF-8. Doesn't make a difference with
"konsole" or another terminal.
An "umlaut" (which is the English word for the German word "Umlaut" according to
a dictionary), is a mutated vowel. The German language has three lower-case and
three upper-case umlauts for a, o, u and A, O, U, respectively. See attached pic
from within xterm where these display fine and can also be typed without
problems. "jed" is another application where they are broken (separate bug report).
In PINE, quoted-printable ISO 8859-1 encoded umlauts are invisible in the
"Message Index" and disturb subsequent characters in subject lines or cause the
whole line to disappear. For instance, the rest of a line following an umlaut is
not displayed. Or they appear in the "Compose Message" view suddenly. Find
attached a gzipped example mail and another screenshot. In PINE's message index
you should see the sender's full name and the mail's subject line.
I've mentioned this problem on the PINE user mailing list to see if
anyone is aware of an existing bugfix for this. If not, I will report
it to the PINE team also.
Thanks for the screenshots, mail folder.
Various UTF fixes have gone into glibc and other infrastructure.
Can you update your limbo to the latest stuff from rawhide soon,
and see if anything has changed, in case it isn't a PINE bug?
I can reproduce this with pine4.44 on Red Hat 7.3, but it seems to me that
the problem is most likely with glibc, and in particular sprintf.
In mailindx.c, line 3791 (pine4.44), there is the following line:
sprintf(p, "%-*.*s", width, width, cdesc->string);
which, in the person's example email, works out to be something like this:
sprintf(p, "%-*.*s", 18, 18, "J\366rg Linn...");
When setting the system font to the same one as in the person's
environment, sprintf returns -1 with an empty string for p (the man page
doesn't say anything about a return value of -1 for sprintf). It works in
my default environment, and it is the presence of the \366 char that
causes it to return -1.
Trying various combinations in gdb, it seems like the bug has to do with
the ".*" (precision) part.
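For anyone who wants to try this outside of PINE, here is a minimal sketch of the same call in isolation. The helper names format_index_field and format_in_locale are hypothetical (they are not from mailindx.c), and snprintf is used instead of sprintf only to bound the output buffer. Whether the UTF-8 case actually returns -1 depends on the C library version in use; the result described above is what was observed with the glibc of that time.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: formats s the way the index line does, left-justified
   and cut to exactly `width`, and returns snprintf's result. */
int format_index_field(char *out, size_t outsz, int width, const char *s)
{
    return snprintf(out, outsz, "%-*.*s", width, width, s);
}

/* Hypothetical helper: runs the same call under the given LC_CTYPE locale.
   Returns snprintf's result, or -2 if that locale is not installed. */
int format_in_locale(const char *locale_name, char *out, size_t outsz,
                     int width, const char *s)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;
    int rc = format_index_field(out, outsz, width, s);
    setlocale(LC_CTYPE, "C");   /* go back to the program's startup default */
    return rc;
}
```

Calling format_in_locale("C", buf, sizeof buf, 18, "J\366rg Linn...") and then the same with "de_DE.UTF-8" (assuming that locale is installed) lets the two return values be compared directly. A result of -2 only means the locale could not be selected.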
I tried using what appears to be the latest glibc rpm:
... And the problem still exists.
Let me know if you need any other information.
Jeff Franklin <email@example.com>
Networks and Distributed Computing University of Washington
I just spoke with Jakub, and he says \366 isn't valid in a UTF-8 string.
I'm not sure where that leaves us currently though. Looks like the
string isn't being encoded properly for UTF-8.
The string is not properly encoded UTF-8, as pine has yet to take the UTF-8
plunge. Should it really be up to sprintf to enforce UTF-8? Granted, the text
will eventually not display correctly on a Unicode xterm, but a change like this
in sprintf is bound to break in other places.
There was no change in sprintf, this behaviour is mandated by lots of standards,
starting with ISO C99, SuS etc.
If pine wants to use sprintf on strings not encoded in currently set locale,
it needs to temporarily setlocale to the one in which the string is encoded
and then back (see glibc 2.3+ uselocale(3) for a faster way to do this), or
it should use strcpy/memcpy etc. functions which don't depend on the locale.
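A rough sketch of both suggestions, for illustration only. The function names are hypothetical and this is not code from pine. The first variant assumes a system that provides the POSIX 2008 newlocale/uselocale/freelocale calls and switches only the calling thread to the "C" locale around the formatting call. The second avoids *printf entirely and does the padding and truncation byte by byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Variant 1 (hypothetical helper): format under a temporary "C" LC_CTYPE
   so the precision is applied to bytes, then restore the previous locale. */
int format_bytes_in_c_locale(char *out, size_t outsz, int width, const char *s)
{
    locale_t c_loc = newlocale(LC_CTYPE_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;
    locale_t saved = uselocale(c_loc);   /* affects only the calling thread */
    int rc = snprintf(out, outsz, "%-*.*s", width, width, s);
    uselocale(saved);
    freelocale(c_loc);
    return rc;
}

/* Variant 2 (hypothetical helper): byte-wise equivalent of "%-*.*s".
   Copies at most `width` bytes of s and pads with spaces up to `width`.
   Returns the number of bytes written, not counting the terminator. */
size_t pad_field(char *out, size_t outsz, size_t width, const char *s)
{
    if (outsz == 0)
        return 0;
    if (width > outsz - 1)
        width = outsz - 1;               /* leave room for the terminator */
    size_t n = strlen(s);
    if (n > width)
        n = width;
    memcpy(out, s, n);
    memset(out + n, ' ', width - n);
    out[width] = '\0';
    return width;
}
```

Neither variant makes the 8-bit characters display correctly on a UTF-8 terminal. They only keep the formatting step from rejecting the string.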
We are having a hard time convincing ourselves that there is not a problem with
sprintf. Indeed, the problem only arises when we do a setlocale(LC_CTYPE, "");,
which we rely upon for isspace(). Our use of this predates the C99 spec, and it
seems peculiar that CTYPE would be interpreted broadly enough to cover the 's'
specifier for a format string in sprintf.
Going over our C99 reference, printf defines the 's' specifier's argument as
being "characters ... to be copied to the output". It mentions "multibyte
characters" as being used with wprintf. Wouldn't wsprintf be the place to do
Another peculiarity is that sprintf only fails when specifying a precision. I.e.,
sprintf(p, "%s", "J\366rg"); works, whereas sprintf(p, "%.*s", 15, "J\366rg");
fails. This behavior is inconsistent, independent of whether one were to say that
multiple-byte chars must be valid or not.
At the very least, there is a documentation bug, because every other dependency
on locale seems to be mentioned in the man page.
Any light that you could shed on this matter would be greatly appreciated. Thanks!
Characters in multibyte locales (such as UTF-8 locales) are not necessarily
equal to bytes. So, if a precision is given, *printf has to determine which
characters to print and which to leave out, and if it fails to parse a character
while determining the length because that character is invalid, you get the
error. When precision is not specified, it knows it can print the whole string,
so no checking is necessary.
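To make that concrete, here is a small illustration of what "counting characters up to the precision" involves for UTF-8. This is not glibc's implementation (which would go through the locale's own conversion routines); utf8_prefix_bytes is a hypothetical, simplified decoder that checks only lead and continuation bytes and does not reject overlong forms or surrogates.

```c
#include <stddef.h>

/* Returns the number of bytes occupied by the first max_chars characters of
   the UTF-8 string s, or -1 if an invalid sequence is met before that. */
long utf8_prefix_bytes(const char *s, size_t max_chars)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t chars = 0;

    while (*p != '\0' && chars < max_chars) {
        size_t len;
        if (*p < 0x80)
            len = 1;                     /* plain ASCII */
        else if ((*p & 0xE0) == 0xC0)
            len = 2;
        else if ((*p & 0xF0) == 0xE0)
            len = 3;
        else if ((*p & 0xF8) == 0xF0)
            len = 4;
        else
            return -1;                   /* stray continuation or bad lead byte */

        for (size_t i = 1; i < len; i++)
            if ((p[i] & 0xC0) != 0x80)
                return -1;               /* continuation byte missing */

        p += len;
        chars++;
    }
    return (long)(p - (const unsigned char *)s);
}
```

With the properly encoded "J\303\266rg" the count succeeds, while the ISO 8859-1 bytes "J\366rg" fail as soon as the scan reaches \366. If the limit is reached before the bad byte (for example a limit of 1), the scan never looks at it, which matches the point that the error only shows up when the string has to be examined.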
wsprintf prints wide characters, not multibyte ones, otherwise they are the same.
We've decided that there's not much that we can do to fix the problem until we
decide to pursue adding Unicode support. One option would be to remove our call
of setlocale(LC_CTYPE, "") in file pico/unix, which will fix the index line
display problem, but may break for someone else who depends on their locale's
isspace() behavior. Also, this "fix" is not a complete solution; it only fixes
the sprintf problem, and 8-bit chars will still fail to display correctly.
One thing that I'm curious about is if Red Hat is setting UTF-8 as the default
charset in a lot of situations. And where does one decide to use UTF-8?
Also, am I reading it right that the original poster set some sort of ISO 8859-1
locale where it also didn't work? Could someone tell me how I go about testing
this? This sounds like something that should work to me.
Upon composing a new message, sometimes it displays German special characters,
sometimes not. For instance:
$ konsole -e pine
When I compose a new message and type in German characters into the "To:" header
line, PINE displays a four-byte crap sequence for each key-press. However, when
I move down the cursor to the next header line, it changes the "To:" line into
something perfectly readable. Still, going back and editing that line reveals
some of the crap again and corrupts the line.
If I'm correctly informed, the next official release of Red Hat Linux will
default to UTF-8 locales, such as LANG=en_US.UTF-8 or LANG=de_DE.UTF-8.
When, in the current beta called "(null)", I set a non-UTF locale, such as LANG=de_DE
or LANG=en_IE@euro (English with Euro symbol), I still cannot get PINE to print
or accept non-ASCII characters, not even with an ISO 8859-1 or ISO 8859-15 font.
It either doesn't display them at all or displays them as white space. It
doesn't correctly decode ISO 8859-1 encoded Base64 or quoted-printable subject
lines either. And upon composing a new message and typing German special
characters into the "To:" header line, for instance, it either prints crap
character sequences or two-byte/three-byte whitespace sequences as described above.
So, overall, my favourite mail client (PINE) is broken badly. I'm getting
accustomed to Sylpheed as an alternative.
mschwendt: You are correct that the Null beta release, as well as the
final release of our next OS product, default to using UTF-8 locales.
Since the problem has been determined to be due to lack of Unicode
support in PINE, there is not much that we at Red Hat can do about
this problem until upstream PINE supports Unicode directly.
I'm closing this bug report as WONTFIX currently, since we can't really
do anything about it right now. Once upstream PINE supports UTF-8
properly, please reopen this bug report. I may do an enhancement
update of pine for all supported OS releases at that point in time.
*** This bug has been marked as a duplicate of 91232 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.