Red Hat Bugzilla – Bug 174007
Viewer does not display some iso-8859-2 chars
Last modified: 2013-07-02 19:11:46 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; cs-CZ; rv:1.7.12) Gecko/20050919 Firefox/1.0.7
Description of problem:
The embedded viewer does not show some iso-8859-2 chars.
Locales is LANG="cs_CZ.ISO-8859-2"
Version-Release number of selected component (if applicable):
Steps to Reproduce:
View test file with iso-8859-2 in embedded viewer.
Actual Results: Mcedit work OK. See attachment.
mc-4.6.1a-0.13.FC4 is OK.
Created attachment 121402 [details]
Test iso-8859-2 file
Created attachment 121403 [details]
file in mc viewer
Created attachment 121404 [details]
file in mc editor
Created attachment 121405 [details]
file in mc hex viewer
The same problem with russian characters.
My investigations show that is_printable() function in non-UTF-8 locales is not
correct handles unicode symbols. UTF8 define is set, but rountine is thinking
that is simple 8bit symbol.
is_printable (int c)
return iswprint (c);
c &= 0xff;
My mark shows inoperate case.
Jindrich, please, consider this undesired behaviour, because I can not porpose
anything but dirty hack like following:
return iswprint (c);
c &= 0xff;
Oh, I forgot addition several words.
In russian case for KOI8-R (for example) the problem range is
0x400 - 0x41f.
Created attachment 121448 [details]
Dirty hack for drop described issue. It works for me (russian case) but is
needed to be tested on other cases and locales.
Andy, thanks for the investigation.
Just a question, why do you consider 127 and 155 unprintable?
It's cut-and-past from is_8bit_printable().
Due to above that question is not for me. Sorry.
Hack from Comment #7 work ok for cs_CZ.ISO-8859-2. mc-4.6.1a-4.FC4 have bug
Created attachment 122602 [details]
The problem is in src/view.c:view_display_text(): a "wc" wide character is
always used regardless of whether locale is UTF8 or not. In such a situation we
should invoke iswprint(wc) rather than is_printable(wc).
is_printable() always checks SLsmg_Is_Unicode, and if it is not set, then
assumes that its argument is normal (not wide) char, but actually it is a wide
BTW there is no any SLsmg_Is_Unicode checking/fork in the src/view.c file.
Perhaps it should be added?
Also, probably something shoild be done for src/view.c:view_display_hex(), as
national characters are not printed in the text area...
Dmitry, thanks for the investigation. Some SLsmg_Is_Unicode checks seem really
to be missing there. The fixed version is now built in rawhide.
Regarding the hex mode I disabled the displaying of national characters in the
hex editor as the hex editor should provide per-byte insight into the
edited/viewed file and user can switch with F4 back to see the text correctly
Created attachment 122614 [details]
final patch (with hex mode too)
Ooops, I am late a little.... :(
It is my completed patch.
- is_printable() --> iswprint() as above
- remove convert_to_display_c (wc), as "wc" is already computed with call to
- When UTF8, change tty_print_char() to SLsmg_write_nchars(). Slang-utf8 always
consider that the agrument for tty_print_char() is wide char. As it is not
true, we should not print non-wide char under #ifdef UTF8 with tty_print_char()
Please, take a look for it again...
Dmitry, can you describe your second part of changes fully?
I think it useless at all (flexibility is decreases, but compiler optimizes it
> I think it useless at all
Did you mean this one:
+ char ch = c & 0xff;
+ SLsmg_write_nchars (&ch, 1);
SLsmg_write_nchars() has a prototype:
"void SLsmg_write_nchars (char *, unsigned int)"
i.e. it requires a "char *" as its first argument.
"c" variable is actually "int". If we do "SLsmg_write_nchars (&c, 1)", the "&c"
is "int *", not "char *" .
Let "c" has a value of 1 ("int" 0x00000001 or "char" 0x01) . On the
little-endian machines, the "int" value is stored in the memory as "0x01, 0x00,
0x00, 0x00", i.e. *((char *) &c) == 0x01 (== c)
But on the big-endian machines (ppc, riscs etc.), the "int" is stored as "0x00,
0x00, 0x00, 0x01", therefore *((char *) &c) == "0x00" (!= c)
It is the reason for the work-around present.
Considering "c & 0xff" -- yep, it seems like a paranoia :) (but "char"
theoretically can be more than one byte ;))
Dmitry, regarding the SLsmg_write_nchars(&c,1): doesn't the SLsmg_write_char(c)
seem to be more appropriate here? Do you see problems when you use
SLsmg_write_char? Note, there's no need to bother with the endianess problems
related to int ptr -> char ptr conversion then since tty_print_char() does the
SLsmg_write_char call actually when S-Lang is enabled.
> Do you see problems when you use SLsmg_write_char?
See on the utf8-patched slang (slang-1.4.9-13 on my FC3 for example) -- i.e.,
not tarball slang, but "rpmbuild -bp slang.spec":
in the src/slang.h you can see:
extern void SLsmg_write_char (wchar_t);
extern void SLsmg_write_nwchars (wchar_t *, unsigned int);
extern void SLsmg_write_char (char);
#endif /* UTF8 */
extern void SLsmg_write_nchars (char *, unsigned int);
i.e. when UTF8, the only way to write non-wide char is to use SLsmg_write_nchars()
SLsmg_write_char() always expects a wide character, it is the reason for MC's
Yeah, that's odd. In slang-2.0.5 we have in rawhide for a while is
SLsmg_write_char declared like this:
SL_EXTERN void SLsmg_write_char (SLwchar_Type ch);
where there's no such problem any more. Please consider to file a bug against
slang to have it fixed in FC3/FC4, because this bug is actually caused by slang
and you probably don't want to introduce hacks in mc to workaround this bug.
slang-2.0.5 comes with native UTF-8 support so there's no need to apply the ugly
debian slang patch any more. Thus the bug is fixed in rawhide/FC5. I know the
bug was in slang for rather long time, but now it's gone with slang-2.0.5. You
can Cc me on the new slang bug so that I can cooperate with slang maintainer to
have it fixed ASAP in FC3/4.
Thanks for the investigation.
> Please consider to file a bug against slang
I am not sure whether it is a bug or a feature...
> don't want to introduce hacks in mc to workaround this bug.
What about FC3/FC4 CVS branches only?
Or maybe compile for FC3/FC4 --with-screen=mcslang, as bundled slang (AFAIK) is
All FC3/FC4/rawhide mcs are almost the same so that I'll enable the internal
slang-2.0.5 in mc for FC3/FC4. It sounds good to me as there's no need to touch
the system slang then.
I've tried to compile with the internal slang. Unfortunately, the national
characters are completely garbled with it...
Perhaps it is because the internal slang actually is a stripped version
(According to its slang/README file).
IMHO it would be better either apply my patch for FC3/FC4 only, or just ignore
the part related to SLsmg_write_nchars(), as it is not too critical.
Using internal "stripped" slang can cause a lot of yet unknown side effects etc...
Ok, I applied your fix without the SLsmg_write_nchars() stuff in rawhide
already, so that I'll sync FC3/FC4 to rawhide mc and release updates after a while.