Bug 71170
Summary: [patch] uninterpreted utf-8 in log with LANG=ru_RU.UTF-8
Product: [Retired] Red Hat Linux
Reporter: Leonid Kanter <leon>
Component: sysklogd
Assignee: Bill Nottingham <notting>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 9
CC: aleksey, ekanter, notting, otaylor, rvokal
Keywords: EasyFix, i18n, Patch
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-06-29 03:37:15 UTC
Description
Leonid Kanter
2002-08-09 15:32:06 UTC
Created attachment 69744 [details]
redhat-logviewer snapshot
I thought I was translating everything to UTF-8 before displaying it. I must have missed it somewhere. Please attach the log file with the UTF-8 strings so I can verify that the fix works.

This should be fixed in version 0.8.1-3. Please test when it hits rawhide.

Created attachment 69985 [details]
my /var/log/messages
It looks like the problem isn't in logviewer but in syslogd or initscripts: the UTF-8 in /var/log/messages is broken. Look what happens with this file:

    [leon@omnibook leon]$ LANG=C iconv -f utf-8 -t koi8-r messages
    Aug 11 04:02:02 leon syslogd 1.4.1: restart.
    Aug 12 12:41:03 localhost syslogd 1.4.1: restart.
    Aug 12 12:41:03 localhost iconv: illegal input sequence at position 121

So the component may be changed to syslog or initscripts. A possible reason is that the initscripts translation is still in koi8-r; I'll fix it today.

Do you still think it is a syslog or initscripts problem? I can open translated po files, and they display fine. Unfortunately, those are the only non-English files I have except the one you have attached. The one you attached doesn't seem to work in redhat-logviewer.

I think this should work with 0.8.2-3. If not, please reopen the bug.

I'm still seeing the odd output from redhat-logviewer-0.8.2-3. This is looking at the log file that the user provided, so maybe it is just that logfile.

Actually, I just tried this with a new install and sure enough, the log file can't be viewed.

I think the log file attached by leon is corrupt. The gtk.TextBuffer requires UTF-8 input, but the log files are encoded in the native encoding. So the best solution pgampe, others, and I could come up with was to try to convert it to UTF-8 from the native encoding based on $LANG. I think leon attached a ko_KR log file, but iconv can't convert it properly because I think it is encoded incorrectly. I am about to attach a tarball of sample log files in the native encoding for the ko_KR, ja_JP, zh_TW, and zh_CN langs. If you set $LANG to the proper lang and try to open the corresponding log file, it works fine. Jay, if the attached files work for you as I have described, is that enough verification?

Created attachment 73564 [details]
CJK example log files
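The "convert from the native encoding based on $LANG" approach described above could look roughly like the following. This is only a sketch, not the redhat-logviewer code: the function names `codeset_from_lang` and `decode_log_line` are made up for illustration, and it assumes the codeset is spelled out in the LANG value (as in `ru_RU.UTF-8`), falling back to ASCII otherwise.

```python
import codecs


def codeset_from_lang(lang: str, default: str = "ascii") -> str:
    """Return the codeset part of a LANG value such as 'ru_RU.UTF-8'.

    Hypothetical helper: falls back to `default` when LANG carries no
    codeset or names one that Python does not know.
    """
    # Drop an optional '@modifier', then take whatever follows the first '.'.
    locale_part = lang.split("@", 1)[0]
    _, dot, codeset = locale_part.partition(".")
    if not dot or not codeset:
        return default
    try:
        return codecs.lookup(codeset).name
    except LookupError:
        return default


def decode_log_line(raw: bytes, lang: str) -> str:
    """Decode one raw log line using the codeset named in LANG.

    Bytes that are invalid in that codeset become U+FFFD instead of
    raising, so a text widget that requires valid Unicode still gets input.
    """
    return raw.decode(codeset_from_lang(lang), errors="replace")
```

For example, `decode_log_line(raw_line, os.environ.get("LANG", "C"))` would pick the encoding from the current environment. As later comments in this report show, this cannot help when the log file itself is not valid in any single encoding.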
I do not think the file that leon sent was made on a system with anything other than LANG=ru_RU.UTF-8. I have exactly the same problem: iconv cannot recognize /var/log/messages, and my LANG=ru_RU.UTF-8 (default install with Russian as the primary language). If you found the messages file to be in some encoding other than UTF-8, then there is a problem with the log file creation process.

Created attachment 73584 [details]
/var/log/messages from (null) fresh install, booted once. LANG=ru_RU.UTF-8
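For anyone who wants to locate the bad lines in such a file without iconv, a small check along these lines may help. It is a sketch under the assumption that the whole file fits in memory; `invalid_utf8_lines` is an illustrative name, not part of any tool mentioned in this report.

```python
def invalid_utf8_lines(data: bytes):
    """Return (line_number, byte_offset) for each line that is not valid UTF-8.

    Line numbers start at 1; the offset is the position, within that line,
    of the first byte the UTF-8 decoder rejects.
    """
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((number, exc.start))
    return problems
```

Typical use would be `invalid_utf8_lines(open("/var/log/messages", "rb").read())`, which lists every line where decoding stops, similar in spirit to the position iconv reports.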
Please see line #448 in fresh_null_messages.txt, between the words "firstboot" and "SESSION_MANAGER", as an example. I found some valid UTF-8 Russian letters interlaced with bogus character sequences.

Could something be wrong with the messages? Are all translations in UTF-8 now?

ekanter: Are you saying the rest of the file is displayed properly except for line #448, where there is UTF-8 text? Sorry, I don't know what Russian is supposed to look like.

No, the rest of the file contains errors as well. You do not really have to know what Cyrillic should look like. Just examine the byte sequence using, for example, od -c. Here is the fragment:

    0000060 f i r s t b o o t : 320 241 320 261 320
    0000100 276 320 271 320 277 321 \ 2 0 0 320 270 320 276
    0000120 321 \ 2 0 2 320 272 321 \ 2 0 0 321 \ 2 1
    0000140 3 321 \ 2 0 2 320 270 320 270 321 \ 2 0 1
    0000160 320 276 320 265 320 264 320 270 320 275 320 265 320 275 320 270
    ....
    0000400 2 0 5 321 \ 2 0 0 320 260 320 275 320 265 320 275
    0000420 320 260 : S E S S I O N _ M A N A

I do not have a table of UTF-8 in front of me, but in this example a valid Cyrillic two-byte sequence starts with 320. In the middle of the second line you'll find byte 321, then "\200", which is four chars instead of (apparently) one byte 0200. Then you'll see byte 0321 and the four chars "\211". It looks like all UTF-8 pairs that start with byte 0321 are corrupted the same way. Reading the valid UTF-8 letters, I see that the message about to be displayed makes sense. It is something like "window position is not saved, error connecting to the window manager".

It looks like not everything is logging in UTF-8. If LANG=ru_RU.UTF-8, I changed it to read each line and try to convert from UTF-8 to UTF-8. If that fails, it tries to convert from koi8-r to UTF-8. However, the file still doesn't look right.

Agreed, the file does not look right. Every byte that follows 0321 gets converted to a four-byte string "\uuu", where "uuu" is the octal representation of what is supposed to be there. If I manually replace every "\uuu" with the corresponding byte, the resulting file becomes readable. At this point I don't believe that logviewer is out of order; the log generator definitely is.

OK. For what it is worth, I rebuilt redhat-logviewer with the algorithm mentioned in my last post (UTF-8, then koi8-r); it is at people.redhat.com/tfox. I am deferring this bug until the log generator does the right thing. Then we can see if redhat-logviewer needs any more tweaks.

Is there an open bug against a component responsible for log creation?

The right thing here is to definitively say what encoding the logs are in. The options are the /etc/sysconfig/i18n encoding, or UTF-8, or ASCII only. Then convert from that encoding to UTF-8 for display, changing any invalid bytes to "?" or the like. Apps should probably always pass the locale encoding to syslog(), so if you say the logs should be in UTF-8, presumably syslogd should do that conversion?

Um, the apps log in Whatever Encoding They Like. The eventual solution is to fix all the apps. That's sort of non-trivial.

Applications seem to log in UTF-8, but half of the UTF-8 Russian characters (those starting with 0321) do not get to /var/log/messages properly: the second byte gets written as a string "\uuu", where "uuu" is the octal representation of the byte that is supposed to be there. I found a UTF-8 text which lists the Cyrillic letters. See attached. Using vi, this file shows the Cyrillic alphabet in the Linux console. All capital letters and about half of the lower-case letters start with 0320; the rest of the lower-case letters start with 0321. All of those that start with 0321 get corrupted in /var/log/messages. logviewer might need a workaround to avoid abnormal behavior if bad encoding is found, but the problem should be resolved in the log writer.

Created attachment 73784 [details]
Cyrillic letters in UTF-8 encoding.
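The manual repair described above (replacing each "\uuu" with the byte it stands for) can be sketched in a few lines. This is an illustration only, with a made-up function name; it assumes that every backslash followed by three octal digits with a value of 0200 or higher is a damaged byte, so a log message that legitimately contains such text would be altered too.

```python
import re

# Backslash followed by three octal digits whose value is 0200..0377,
# i.e. a byte with the high bit set.
_OCTAL_ESCAPE = re.compile(rb"\\([23][0-7]{2})")


def repair_octal_escapes(raw: bytes) -> bytes:
    """Replace each '\\uuu' octal escape with the single byte it encodes."""
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
```

With the fragment from the dump, the bytes 0321 followed by the four characters `\200` collapse back to the two bytes 0321 0200, which is valid UTF-8 again.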
If syslog cannot be fixed in time, would it be possible to add a workaround to logviewer? The workaround would be to repair the logfile in memory to proper UTF-8 before showing it (since it is very clear what is broken).

In a freshly installed null I'm unable to start redhat-logviewer at all. Here is the traceback.

    [root@leon root]# redhat-logviewer
    Traceback (most recent call last):
      File "/usr/share/redhat-logviewer/redhat-logviewer.py", line 30, in ?
        import LogViewerGui
      File "/usr/share/redhat-logviewer/LogViewerGui.py", line 128, in ?
        bootClass = LogFileClass.LogFileClass("BOOTLOG")
      File "/usr/share/redhat-logviewer/LogFileClass.py", line 69, in __init__
        self.read_log(self.prefName)
      File "/usr/share/redhat-logviewer/LogFileClass.py", line 119, in read_log
        self.buffer.insert_into_buffer_at_offset(iter, line)
      File "/usr/share/redhat-logviewer/LogBuffer.py", line 57, in insert_into_buffer_at_offset
        self.insert(iter, unicode(new_line,'utf-8'), -1)
    UnicodeError: UTF-8 decoding error: invalid data

redhat-logviewer-0.8.1-3

leon, you have an old version; version 0.8.3-2 should fix the traceback.

The actual reason for the problem is that syslogd escapes some characters in the 127-160 region. This patch fixes the problem: http://hosting.micom.net.ru/~corwin/files/sysklogd-1.4.1rh-decode_str.patch

Fixed in 1.4.1-21.

An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-335.html
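To see why escaping bytes "in the 127-160 region" damages UTF-8 Cyrillic, the behavior can be modeled as below. This is a hypothetical model written for illustration, not the sysklogd source: it assumes that bytes 0x7F through 0x9F are written as a backslash plus three octal digits and that everything else passes through unchanged.

```python
def escape_like_syslogd(raw: bytes) -> bytes:
    """Model of the escaping described in this report (an assumption).

    Bytes 0x7F..0x9F are replaced by a backslash and three octal digits;
    all other bytes are copied as they are.
    """
    out = bytearray()
    for byte in raw:
        if 0x7F <= byte <= 0x9F:
            out += b"\\%03o" % byte
        else:
            out.append(byte)
    return bytes(out)
```

Under this model the letter "р" (UTF-8 bytes 0321 0200) comes out as 0321 followed by the four characters `\200`, which matches the od fragment quoted earlier, while "С" (0320 0241) is left alone because its second byte lies above the escaped range.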