Bug 71170 - [patch] uninterpreted utf-8 in log with LANG=ru_RU.UTF-8
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: sysklogd
Version: 9
Hardware: i386 Linux
Priority: medium, Severity: medium
Assigned To: Bill Nottingham
QA Contact: Brian Brock
Keywords: EasyFix, i18n, Patch
Reported: 2002-08-09 11:32 EDT by Leonid Kanter
Modified: 2014-03-16 22:29 EDT
Doc Type: Bug Fix
Last Closed: 2004-06-28 23:37:15 EDT


Attachments
redhat-logviewer snapshot (60.22 KB, image/png)
2002-08-09 11:33 EDT, Leonid Kanter
my /var/log/messages (5.92 KB, application/octet-stream)
2002-08-12 06:05 EDT, Leonid Kanter
CJK example log files (579 bytes, application/octet-stream)
2002-08-28 14:03 EDT, Tammy Fox
/var/log/messages from (null) fresh install, booted once. LANG=ru_RU.UTF-8 (38.38 KB, text/plain)
2002-08-28 15:16 EDT, Eugene Kanter
Cyrillic letters in UTF-8 encoding. (142 bytes, text/plain)
2002-08-29 16:11 EDT, Eugene Kanter

Description Leonid Kanter 2002-08-09 11:32:06 EDT
Description of Problem:

If the system is running in the ru_RU.UTF-8 locale, some messages in the system
logs appear in UTF-8 encoding, but redhat-logviewer shows them as plain Latin-1
8-bit text, so it is impossible to read them.

Version-Release number of selected component (if applicable):

0.7-1

How Reproducible:

always

Additional Information:
	
see screenshot
Comment 1 Leonid Kanter 2002-08-09 11:33:30 EDT
Created attachment 69744
redhat-logviewer snapshot
Comment 2 Tammy Fox 2002-08-09 11:38:43 EDT
I thought I was translating everything to UTF-8 before displaying it.
I must have missed it somewhere. Please attach the log file with
the UTF-8 strings so I can verify that the fix works.
Comment 3 Tammy Fox 2002-08-12 00:45:02 EDT
This should be fixed in version 0.8.1-3. Please test when it hits rawhide.
Comment 4 Leonid Kanter 2002-08-12 06:05:41 EDT
Created attachment 69985
my /var/log/messages
Comment 5 Leonid Kanter 2002-08-12 06:11:44 EDT
It looks like the problem isn't in logviewer but in syslogd or initscripts: the
UTF-8 in /var/log/messages is broken. Look at what happens with this file:

[leon@omnibook leon]$ LANG=C iconv -f utf-8 -t koi8-r messages
Aug 11 04:02:02 leon syslogd 1.4.1: restart.
Aug 12 12:41:03 localhost syslogd 1.4.1: restart.
Aug 12 12:41:03 localhost iconv: illegal input sequence at position 121

So the component may need to be changed to syslog or initscripts. A possible
reason is that the initscripts translation is still in koi8-r; I'll fix that today.
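
The iconv check above can be approximated in Python. This is only a sketch: the function name is hypothetical, the sample bytes are made up for illustration rather than taken from the attached file, and it uses Python 3 rather than the Python 2 that redhat-logviewer ran on at the time.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        return exc.start
    return None


# Illustrative input: a lead byte 0xD1 followed by the text "\200"
# (a backslash and three digits) instead of the raw byte 0x80.
sample = b"ok \xd0\xbf\xd1\\200 tail"
print(first_invalid_utf8_offset(sample))
```

Like iconv, this stops at the first bad sequence, so it only says where the corruption starts, not how much of the file is affected.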
Comment 6 Tammy Fox 2002-08-14 00:24:55 EDT
Do you still think it is a syslog or initscripts problem? I can open
translated po files, and they display fine. Unfortunately, those
are the only non-English files I have except the one you have attached.
The one you attached doesn't seem to work in redhat-logviewer.
Comment 7 Tammy Fox 2002-08-22 12:04:55 EDT
I think this should work with 0.8.2-3. If not, please reopen the bug.
Comment 8 Jay Turner 2002-08-28 12:04:56 EDT
I'm still seeing the odd output from redhat-logviewer-0.8.2-3.  This is looking
at the log file that the user provided, so maybe it is just that logfile. 
Actually, just tried this with a new install and sure enough, the log file can't
be viewed.
Comment 9 Tammy Fox 2002-08-28 14:02:09 EDT
I think the log file attached by leon is corrupt. The gtk.TextBuffer requires
UTF-8 input, but the log files are encoded in the native encoding. So, the
best solution pgampe, others, and I could come up with was to try to convert it
to UTF-8 from the native encoding based on $LANG. I think leon attached a ko_KR
log file, but iconv can't convert it properly because I think it is encoded
incorrectly. I am about to attach a tarball of sample log files in the native
encoding for the ko_KR, ja_JP, zh_TW, and zh_CN langs. If you set $LANG to
the proper lang and try to open the corresponding log file, it works fine.
Jay, if the attached files work for you as I have described, is that enough
verification?
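
A minimal sketch of the "convert from the native encoding based on $LANG" idea described above. The helper names are hypothetical and this is not the redhat-logviewer code; it assumes the codeset is spelled out after the dot in $LANG and that Python knows a codec by that name.

```python
def charset_from_lang(lang: str, default: str = "ascii") -> str:
    """Extract the codeset from a locale string such as 'ru_RU.UTF-8'."""
    if "." not in lang:
        # No codeset given (e.g. "C" or "ko_KR"); fall back to an assumed default.
        return default
    codeset = lang.split(".", 1)[1].split("@", 1)[0]  # drop any @modifier
    return codeset or default


def to_utf8(raw: bytes, lang: str) -> bytes:
    """Re-encode raw log bytes from the locale's codeset to UTF-8."""
    return raw.decode(charset_from_lang(lang)).encode("utf-8")


koi8_line = "привет".encode("koi8-r")
print(to_utf8(koi8_line, "ru_RU.KOI8-R").decode("utf-8"))
```

This approach only works if the log really is written in the locale's encoding, which is the point under dispute in this bug.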
Comment 10 Tammy Fox 2002-08-28 14:03:46 EDT
Created attachment 73564
CJK example log files
Comment 11 Eugene Kanter 2002-08-28 14:29:26 EDT
I do not think the file that leon sent was made on a system with anything other
than LANG=ru_RU.UTF-8. I have exactly the same problem: iconv cannot recognize
/var/log/messages, and my LANG=ru_RU.UTF-8 (default install with Russian as the
primary language). If you found the messages file to be in some encoding other
than UTF-8, then there is a problem with the log file creation process.
Comment 12 Eugene Kanter 2002-08-28 15:16:41 EDT
Created attachment 73584
/var/log/messages from (null) fresh install, booted once. LANG=ru_RU.UTF-8
Comment 13 Eugene Kanter 2002-08-28 15:37:13 EDT
Please see line #448 in fresh_null_messages.txt, between the words "firstboot"
and "SESSION_MANAGER", as an example. I found some valid UTF-8 Russian letters
interlaced with bogus character sequences. Could something be wrong with the
messages? Are all translations in UTF-8 now?
Comment 14 Tammy Fox 2002-08-28 15:53:37 EDT
ekanter@rcn.com: Are you saying the rest of the file is displayed properly
except for line #448 where there is UTF-8 text? Sorry, I don't know what Russian
is supposed to look like.
Comment 15 Eugene Kanter 2002-08-28 16:11:58 EDT
No, the rest of the file contains errors as well. You do not really have to know
what Cyrillic should look like. Just examine the byte sequence using, for example,
od -c. Here is a fragment:
0000060   f   i   r   s   t   b   o   o   t   :     320 241 320 261 320
0000100 276 320 271     320 277 321   \   2   0   0 320 270     320 276
0000120 321   \   2   0   2 320 272 321   \   2   0   0 321   \   2   1
0000140   3 321   \   2   0   2 320 270 320 270     321   \   2   0   1
0000160 320 276 320 265 320 264 320 270 320 275 320 265 320 275 320 270
....
0000400   2   0   5 321   \   2   0   0 320 260 320 275 320 265 320 275
0000420 320 260   :       S   E   S   S   I   O   N   _   M   A   N   A

I do not have a UTF-8 table in front of me, but in this example a valid Cyrillic
two-byte sequence starts with 320. In the middle of the second line you'll find
byte 321, then "\200", which is four characters instead of (apparently) the single
byte 0200. Then you'll see byte 0321 and the four characters "\211". It looks like
all UTF-8 pairs that start with byte 0321 are corrupted the same way. Reading the
valid UTF-8 letters, I can see that the message being logged makes sense. It is
something like "window position is not saved, error connecting to the window manager".
Comment 16 Tammy Fox 2002-08-28 23:55:13 EDT
It looks like not everything is logging in UTF-8. If LANG=ru_RU.UTF-8, I changed
it to read each line and try to convert from UTF-8 to UTF-8. If that fails, it
tries to convert from koi8-r to UTF-8. However, the file still doesn't look right.
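
The per-line fallback described above might look roughly like this. It is a sketch in Python 3 with a hypothetical function name, not the actual logviewer change.

```python
def decode_line(raw: bytes, encodings=("utf-8", "koi8-r")) -> str:
    """Decode one log line, trying each encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort for encoding lists that can all fail: Latin-1 accepts any byte.
    return raw.decode("latin-1")


print(decode_line("тест".encode("utf-8")))
print(decode_line("тест".encode("koi8-r")))
```

One likely reason the file still looks wrong with this scheme: KOI8-R assigns a character to every byte value, so a line that is mostly valid UTF-8 but contains a single corrupt sequence is decoded entirely as KOI8-R and comes out as gibberish.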
Comment 17 Eugene Kanter 2002-08-29 11:04:57 EDT
Agreed, the file does not look right. Every byte that follows 0321 gets converted
to a four-byte string "\uuu", where "uuu" is the octal representation of the byte
that is supposed to be there. If I manually replace every "\uuu" with the
corresponding byte, the resulting file becomes readable. At this point I don't
believe that logviewer is at fault; the log generator definitely is.
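
The claim that the escapes always follow byte 0321 can be checked mechanically. The sketch below tallies, per UTF-8 lead byte, how often that byte is followed by a backslash and three octal digits. The function name is hypothetical, and the sample bytes are transcribed from the od -c fragment in comment 15.

```python
import re
from collections import Counter

# A UTF-8 lead byte (0xC2-0xF4) followed by a literal backslash
# and three octal digits, e.g. the bytes D1 5C 32 30 30 for "\xd1\200".
ESCAPE_AFTER_LEAD = re.compile(rb"([\xc2-\xf4])\\([0-7]{3})")


def escapes_by_lead_byte(data: bytes) -> Counter:
    """Count octal escapes, grouped by the lead byte that precedes them."""
    tally = Counter()
    for match in ESCAPE_AFTER_LEAD.finditer(data):
        tally[match.group(1)[0]] += 1
    return tally


# Bytes from offsets 0000100-0000157 of the od -c dump above.
sample = (
    b"\xbe\xd0\xb9 \xd0\xbf\xd1\\200\xd0\xb8 \xd0\xbe"
    b"\xd1\\202\xd0\xba\xd1\\200\xd1\\213"
    b"\xd1\\202\xd0\xb8\xd0\xb8 \xd1\\201"
)
for lead, count in escapes_by_lead_byte(sample).items():
    print(oct(lead), count)
```

On this fragment every escape follows 0321 and none follows 0320, which matches the observation in this comment.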

Comment 18 Tammy Fox 2002-08-29 12:18:58 EDT
OK. For what it is worth, I rebuilt redhat-logviewer with the algorithm
mentioned in my last post (UTF-8, then koi8-r); it is on people.redhat.com/tfox.
I am deferring this bug until the log generator does the right thing. Then
we can see if redhat-logviewer needs any more tweaks.
Comment 19 Eugene Kanter 2002-08-29 12:33:34 EDT
Is there an open bug against a component responsible for log creation?
Comment 20 Havoc Pennington 2002-08-29 15:26:40 EDT
The right thing here is to definitively say what encoding the logs are in.
The options are /etc/sysconfig/i18n encoding, or UTF-8, or ASCII only.

Then convert from that encoding to UTF-8 for display, changing any 
invalid bytes to "?" or the like.
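
A sketch of that display-side conversion, assuming the log encoding has already been decided and is passed in by the caller. The function name is hypothetical; Python's "replace" error handler produces U+FFFD, which is then mapped to "?" to match the suggestion above.

```python
def to_display_text(raw: bytes, log_encoding: str) -> str:
    """Decode raw log bytes for display, turning undecodable bytes into '?'."""
    text = raw.decode(log_encoding, errors="replace")
    # errors="replace" yields U+FFFD for each invalid sequence.
    return text.replace("\ufffd", "?")


print(to_display_text(b"ok \xff!", "utf-8"))
```

This keeps the viewer from failing on bad input, but it does not recover the text that the log writer already damaged.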
Comment 21 Havoc Pennington 2002-08-29 15:27:29 EDT
Apps should probably always pass locale encoding to syslog(), so if 
you say the logs should be in UTF-8, presumably syslogd should do that conversion?
Comment 22 Bill Nottingham 2002-08-29 15:50:44 EDT
Um, the apps log in Whatever Encoding They Like. The eventual solution is to fix
all the apps. That's sort of non-trivial.
Comment 23 Eugene Kanter 2002-08-29 16:09:50 EDT
Applications seem to log in UTF-8, but half of the UTF-8 Russian characters
(those starting with 0321) do not get to /var/log/messages properly: the second
byte gets written as a string "\uuu", where "uuu" is the octal representation of
the byte that is supposed to be there.

I found a UTF-8 text which lists the Cyrillic letters. See attached. In vi, this
file shows the Cyrillic alphabet on the Linux console. All capital letters and
about half of the lower-case letters start with 0320; the rest of the lower-case
letters start with 0321. All of those that start with 0321 get corrupted in
/var/log/messages.
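
The split between the two lead bytes can be reproduced without the attachment. This sketch groups the basic Russian letters А..я (U+0410 to U+044F, an assumed stand-in for the attached list) by the first byte of their UTF-8 encoding; the function name is hypothetical.

```python
def group_by_lead_byte(chars):
    """Map each UTF-8 lead byte to the characters that start with it."""
    groups = {}
    for ch in chars:
        groups.setdefault(ch.encode("utf-8")[0], []).append(ch)
    return groups


alphabet = [chr(code) for code in range(0x410, 0x450)]  # А..я
groups = group_by_lead_byte(alphabet)
for lead, letters in sorted(groups.items()):
    print(oct(lead), "".join(letters))
```

The letters under 0321 are the lower-case letters from "р" onward, and their second bytes all fall in the range 0200-0217.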

logviewer might need a workaround to avoid abnormal behavior if a bad encoding is
found, but the problem should be resolved in the log writer.
Comment 24 Eugene Kanter 2002-08-29 16:11:22 EDT
Created attachment 73784
Cyrillic letters in UTF-8 encoding.
Comment 25 Eugene Kanter 2002-08-30 11:40:59 EDT
If syslog cannot be fixed in time, would it be possible to add a workaround to
logviewer? The workaround would be to repair the log file in memory (since it is
very clear what is broken) into proper UTF-8 before showing it.
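
Such an in-memory repair could be sketched as follows. The function name is hypothetical, and the sketch assumes that every backslash followed by an octal number in the range 200-277 stands for one escaped UTF-8 continuation byte. Log text that legitimately contains such a sequence would be altered as well, so this is a heuristic, not a safe general fix.

```python
import re

# Backslash followed by octal 200-277, i.e. a byte in 0x80-0xBF
# (the range of UTF-8 continuation bytes).
OCTAL_ESCAPE = re.compile(rb"\\(2[0-7]{2})")


def unescape_octal(data: bytes) -> bytes:
    """Turn each '\\ooo' escape back into the single byte it stands for."""
    return OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1).decode("ascii"), 8)]), data
    )


damaged = b"\xd0\xbf\xd1\\200\xd0\xb8"  # lead byte 0xD1 followed by the text "\200"
print(unescape_octal(damaged).decode("utf-8"))
```

The repaired bytes can then be decoded as UTF-8 in the usual way before being handed to the text buffer.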
Comment 26 Leonid Kanter 2002-09-10 12:04:54 EDT
In a freshly installed null I'm unable to start redhat-logviewer at all. Here is
the traceback.

[root@leon root]# redhat-logviewer
Traceback (most recent call last):
  File "/usr/share/redhat-logviewer/redhat-logviewer.py", line 30, in ?
    import LogViewerGui
  File "/usr/share/redhat-logviewer/LogViewerGui.py", line 128, in ?
    bootClass = LogFileClass.LogFileClass("BOOTLOG")
  File "/usr/share/redhat-logviewer/LogFileClass.py", line 69, in __init__
    self.read_log(self.prefName)
  File "/usr/share/redhat-logviewer/LogFileClass.py", line 119, in read_log
    self.buffer.insert_into_buffer_at_offset(iter, line)
  File "/usr/share/redhat-logviewer/LogBuffer.py", line 57, in insert_into_buffer_at_offset
    self.insert(iter, unicode(new_line,'utf-8'), -1)
UnicodeError: UTF-8 decoding error: invalid data

redhat-logviewer-0.8.1-3
Comment 27 Tammy Fox 2002-09-10 14:11:41 EDT
leon, you have an old version; version 0.8.3-2 should fix the traceback.
Comment 28 Leonid Kanter 2003-04-09 10:41:06 EDT
The actual cause of the problem is that syslogd escapes some characters in the
127-160 region. This patch fixes the problem:
http://hosting.micom.net.ru/~corwin/files/sysklogd-1.4.1rh-decode_str.patch
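
The effect described here can be imitated in a few lines. This is not the sysklogd code or the patch: it is a sketch that assumes the escaped range is bytes 0x7F through 0x9F (decimal 127-159) and that each such byte is written as a backslash plus three octal digits, as seen in the od dump earlier in this report.

```python
def escape_like_syslogd(data: bytes) -> bytes:
    """Imitate the observed escaping: bytes 0x7F-0x9F become '\\ooo' text."""
    out = bytearray()
    for byte in data:
        if byte == 0x7F or 0x80 <= byte <= 0x9F:  # assumed range
            out += b"\\%03o" % byte
        else:
            out.append(byte)
    return bytes(out)


print(escape_like_syslogd("р".encode("utf-8")))  # U+0440 is D1 80 in UTF-8
print(escape_like_syslogd("п".encode("utf-8")))  # U+043F is D0 BF in UTF-8
```

Under that assumption, "р" loses its second byte to an escape while "п" passes through untouched, which is consistent with the corruption pattern reported in the comments above.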
Comment 29 Bill Nottingham 2004-06-28 23:37:15 EDT
Fixed in 1.4.1-21.
Comment 30 Jay Turner 2004-09-01 23:29:41 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-335.html
