Red Hat Bugzilla – Bug 71170
[patch] uninterpreted utf-8 in log with LANG=ru_RU.UTF-8
Last modified: 2014-03-16 22:29:56 EDT
Description of Problem:
If the system is running in the ru_RU.UTF-8 locale, some messages in the system
logs are written in UTF-8, but redhat-logviewer shows them as plain Latin-1
8-bit text, so they are impossible to read.
Version-Release number of selected component (if applicable):
Created attachment 69744 [details]
I thought I was translating everything to UTF-8 before displaying it.
I must have missed it somewhere. Please attach the log file with
the UTF-8 strings so I can verify that the fix works.
This should be fixed in version 0.8.1-3. Please test when it hits rawhide.
Created attachment 69985 [details]
It looks like the problem isn't in logviewer but in syslogd or initscripts: the
UTF-8 in /var/log/messages is broken. Look at what happens with this file:
[leon@omnibook leon]$ LANG=C iconv -f utf-8 -t koi8-r messages
Aug 11 04:02:02 leon syslogd 1.4.1: restart.
Aug 12 12:41:03 localhost syslogd 1.4.1: restart.
Aug 12 12:41:03 localhost iconv: illegal input sequence at position 121
So the component may need to be changed to syslog or initscripts. A possible
reason is that the initscripts translation is still in koi8-r; I'll fix it today.
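For anyone who wants to reproduce the check above without iconv, here is a minimal Python sketch. It assumes the log is read as raw bytes; `first_invalid_utf8_offset` is a hypothetical helper name, not part of redhat-logviewer or syslogd, and the reported offset is not guaranteed to match the position iconv prints.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


if __name__ == "__main__":
    # Hypothetical sample: a lead byte 0xD1 followed by a literal backslash
    # escape instead of a continuation byte.
    sample = b"syslogd: \xd1\\200"
    print(first_invalid_utf8_offset(sample))
```

Running it over the bytes of /var/log/messages would show whether the file is valid UTF-8 and roughly where it first goes wrong.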
Do you still think it is a syslog or initscripts problem? I can open
translated po files, and they display fine. Unfortunately, those
are the only non-English files I have except the one you have attached.
The one you attached doesn't seem to work in redhat-logviewer.
I think this should work with 0.8.2-3. If not, please reopen the bug.
I'm still seeing the odd output from redhat-logviewer-0.8.2-3. This is looking
at the log file that the user provided, so maybe it is just that logfile.
Actually, just tried this with a new install and sure enough, the log file can't
I think the log file attached by leon is corrupt. The gtk.TextBuffer requires
UTF-8 input, but the log files are encoded in the native encoding. So, the
best solution pgampe, others, and I could come up with was to try to convert it
to UTF-8 from the native encoding based on $LANG. I think leon attached a ko_KR
log file, but iconv can't convert it properly because I think it is encoded
incorrectly. I am about to attach a tarball of sample log files in the native
encoding for the ko_KR, ja_JP, zh_TW, and zh_CN langs. If you set $LANG to
the proper lang and try to open the corresponding log file, it works fine.
Jay, if the attached files work for you as I have described, is that enough
Created attachment 73564 [details]
CJK example log files
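The conversion "from the native encoding based on $LANG" described above could look roughly like the following Python sketch. It is not the actual redhat-logviewer code. The helper names `codeset_from_lang` and `to_utf8` are hypothetical, and falling back to ASCII when $LANG carries no codeset is an assumption made only for this example.

```python
import codecs


def codeset_from_lang(lang: str, default: str = "ascii") -> str:
    """Extract the codeset from a POSIX locale name such as 'ja_JP.eucJP'."""
    # Drop an optional '@modifier' suffix, then take the part after the dot.
    name = lang.split("@", 1)[0]
    if "." not in name:
        # Assumption for this sketch: no codeset in $LANG means plain ASCII.
        return default
    return name.split(".", 1)[1]


def to_utf8(raw: bytes, lang: str) -> bytes:
    """Re-encode raw log bytes from the locale's codeset to UTF-8."""
    # codecs.lookup raises LookupError if Python does not know the codeset.
    codec = codecs.lookup(codeset_from_lang(lang))
    # Undecodable bytes become U+FFFD instead of raising an exception.
    text = raw.decode(codec.name, errors="replace")
    return text.encode("utf-8")
```

For example, `to_utf8(raw, os.environ.get("LANG", "C"))` would convert a buffer read from a log file, provided the file really is in the locale's encoding.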
I do not think the file that leon sent was made on a system with anything other
than LANG=ru_RU.UTF-8. I have exactly the same problem: iconv cannot process
/var/log/messages, and my LANG=ru_RU.UTF-8 (default install with Russian as the
primary language). If you found the messages file to be in some encoding other
than UTF-8, then there is a problem with the log file creation process.
Created attachment 73584 [details]
/var/log/messages from (null) fresh install, booted once. LANG=ru_RU.UTF-8
Please see line #448 in fresh_null_messages.txt between words "firstboot" and
"SESSION_MANAGER" as an example. I found some valid UTF-8 Russian letters
interlaced with bogus character sequences. Could be something wrong with the
messages? Are all translations in UTF-8 now?
email@example.com: Are you saying the rest of the file is displayed properly
except for line #448 where there is UTF-8 text? Sorry, I don't know what Russian
is supposed to look like.
No, the rest of the file contains errors as well. You do not really have to know
what Cyrillic should look like. Just examine the byte sequence using, for
example, od -c. Here is the fragment:
0000060 f i r s t b o o t : 320 241 320 261 320
0000100 276 320 271 320 277 321 \ 2 0 0 320 270 320 276
0000120 321 \ 2 0 2 320 272 321 \ 2 0 0 321 \ 2 1
0000140 3 321 \ 2 0 2 320 270 320 270 321 \ 2 0 1
0000160 320 276 320 265 320 264 320 270 320 275 320 265 320 275 320 270
0000400 2 0 5 321 \ 2 0 0 320 260 320 275 320 265 320 275
0000420 320 260 : S E S S I O N _ M A N A
I do not have a UTF-8 table in front of me, but in this example a valid Cyrillic
two-byte sequence starts with 320. In the middle of the second line you'll find
byte 321, then "\200", which is four chars instead of (apparently) one byte 0200.
Then you'll see byte 0321 and four chars "\211". It looks like all UTF-8 pairs
that start with byte 0321 are corrupted the same way. Reading the valid UTF-8
letters, I see that the message being logged makes sense. It is something like
"window position is not saved, error connecting to the window manager".
It looks like not everything is logged in UTF-8. If LANG=ru_RU.UTF-8, I changed
it to read each line and try to convert it from UTF-8 to UTF-8. If that fails, it
tries to convert from koi8-r to UTF-8. However, the file still doesn't look right.
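The per-line fallback described above could be sketched in Python as follows. This is an illustration under assumptions, not the code shipped in redhat-logviewer; `decode_log_line` is a hypothetical name.

```python
def decode_log_line(raw: bytes, fallback: str = "koi8-r") -> str:
    """Decode one log line as UTF-8, falling back to a legacy encoding."""
    try:
        # Strict decoding doubles as a validity check ("UTF-8 to UTF-8").
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The line is not valid UTF-8, so treat it as the fallback encoding.
        return raw.decode(fallback, errors="replace")
```

A line that is mostly UTF-8 but contains a few damaged sequences fails the first step and is then read as koi8-r in its entirety, which would be consistent with the file still not looking right.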
Agreed, the file does not look right. Every byte that follows 0321 gets converted
to a four-byte string "\uuu", where "uuu" is the octal representation of the byte
that is supposed to be there. If I manually replace every "\uuu" with the
corresponding byte, the resulting file becomes readable. At this point I don't
believe that logviewer is out of order; the log generator definitely is.
OK. For what it is worth, I rebuilt redhat-logviewer with the algorithm
mentioned in my last post (UTF-8 then koi8-r) on people.redhat.com/tfox.
I am deferring this bug until the log generator does the right thing. Then
we can see if redhat-logviewer needs any more tweaks.
Is there an open bug against a component responsible for log creation?
The right thing here is to definitively say what encoding the logs are in.
The options are /etc/sysconfig/i18n encoding, or UTF-8, or ASCII only.
Then convert from that encoding to UTF-8 for display, changing any
invalid bytes to "?" or the like.
Apps should probably always pass locale encoding to syslog(), so if
you say the logs should be in UTF-8, presumably syslogd should do that conversion?
Um, the apps log in Whatever Encoding They Like. The eventual solution is to fix
all the apps. That's sort of non-trivial.
Applications seem to log in UTF-8, but half of the UTF-8 Russian characters
(those starting with 0321) do not get to /var/log/messages properly: the second
byte gets written as a string "\uuu", where "uuu" is the octal representation of
the byte that is supposed to be there.
I found a UTF-8 text that lists the Cyrillic letters. See attached. Using vi,
this file shows the Cyrillic alphabet on the Linux console. All capital letters
and about half of the lowercase letters start with 0320; the rest of the
lowercase letters start with 0321. All of those that start with 0321 get
corrupted in /var/log/messages.
logviewer might need a workaround to avoid abnormal behavior if bad encoding is
found, but the problem should be resolved in the log writer.
Created attachment 73784 [details]
Cyrillic letters in UTF-8 encoding.
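The split between 0320 and 0321 lead bytes can be checked without the attachment. The Python sketch below groups the basic Russian letters by the first byte of their UTF-8 encoding; `lead_byte_octal` is a hypothetical helper, and the ranges cover only U+0410 to U+044F, so the letters Ё and ё are left out.

```python
def lead_byte_octal(ch: str) -> str:
    """Return the first UTF-8 byte of ch as a zero-padded octal string."""
    return format(ch.encode("utf-8")[0], "04o")


# Basic Russian alphabet blocks (Ё/ё live outside these ranges).
upper = [chr(c) for c in range(0x0410, 0x0430)]  # capital letters
lower = [chr(c) for c in range(0x0430, 0x0450)]  # lowercase letters

if __name__ == "__main__":
    for group_name, letters in (("upper", upper), ("lower", lower)):
        by_lead = {}
        for ch in letters:
            by_lead.setdefault(lead_byte_octal(ch), []).append(ch)
        # Print each lead byte together with the letters that use it.
        for lead, chars in sorted(by_lead.items()):
            print(group_name, lead, "".join(chars))
```

All 32 capital letters and the first 16 lowercase letters share the lead byte 0320, and the remaining 16 lowercase letters use 0321.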
If syslog cannot be fixed in time, would it be possible to add a workaround to
logviewer? The workaround would be to repair the log file in memory (since it is
very clear what is broken) into proper UTF-8 before showing it.
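A rough Python sketch of such an in-memory repair is below. It is only an illustration of the idea, not an existing logviewer feature; `repair_escaped_utf8` is a hypothetical name. It assumes the damage is exactly what was observed in this report: the byte 0321 followed by a backslash and three octal digits that stand for a UTF-8 continuation byte (0200 to 0277). Other escaped bytes, if any, are left untouched.

```python
import re

# Lead byte 0xD1 (octal 0321), a literal backslash, then three octal digits
# in the continuation-byte range 200..277.
_ESCAPED = re.compile(rb"\xd1\\(2[0-7]{2})")


def repair_escaped_utf8(raw: bytes) -> bytes:
    """Turn b'\\xd1' + b'\\ooo' escapes back into two-byte UTF-8 sequences."""

    def _restore(match):
        # Convert the octal digits back into the single byte they stand for.
        return b"\xd1" + bytes([int(match.group(1), 8)])

    return _ESCAPED.sub(_restore, raw)
```

The repaired bytes can then be decoded as UTF-8 before they are inserted into the text buffer. Ordinary text that merely contains a backslash followed by digits is not changed, because the pattern requires the preceding 0321 byte.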
In a freshly installed null I'm unable to start redhat-logviewer at all. Here is
[root@leon root]# redhat-logviewer
Traceback (most recent call last):
File "/usr/share/redhat-logviewer/redhat-logviewer.py", line 30, in ?
File "/usr/share/redhat-logviewer/LogViewerGui.py", line 128, in ?
bootClass = LogFileClass.LogFileClass("BOOTLOG")
File "/usr/share/redhat-logviewer/LogFileClass.py", line 69, in __init__
File "/usr/share/redhat-logviewer/LogFileClass.py", line 119, in read_log
File "/usr/share/redhat-logviewer/LogBuffer.py", line 57, in
self.insert(iter, unicode(new_line,'utf-8'), -1)
UnicodeError: UTF-8 decoding error: invalid data
leon, you have an old version, version 0.8.3-2 should fix the traceback.
The actual cause of the problem is that syslogd escapes some characters in the
127-160 range. This patch fixes the problem:
Fixed in 1.4.1-21.
An errata has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.