Bug 174007
Summary: | Viewer does not display some iso-8859-2 chars | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Rudolf Ulc <rudolf.ulc> | ||||||||||||||||
Component: | mc | Assignee: | Jindrich Novy <jnovy> | ||||||||||||||||
Status: | CLOSED RAWHIDE | QA Contact: | |||||||||||||||||
Severity: | medium | Docs Contact: | |||||||||||||||||
Priority: | medium | ||||||||||||||||||
Version: | 4 | CC: | andy, dmitry, leonard-rh-bugzilla, pknirsch | ||||||||||||||||
Target Milestone: | --- | ||||||||||||||||||
Target Release: | --- | ||||||||||||||||||
Hardware: | i386 | ||||||||||||||||||
OS: | Linux | ||||||||||||||||||
Whiteboard: | |||||||||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||||||
Clone Of: | Environment: | ||||||||||||||||||
Last Closed: | 2005-12-28 14:19:50 UTC | Type: | --- | ||||||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||||||
Documentation: | --- | CRM: | |||||||||||||||||
Verified Versions: | Category: | --- | |||||||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||||
Embargoed: | |||||||||||||||||||
Attachments: |
|
Description
Rudolf Ulc
2005-11-23 16:09:26 UTC
Created attachment 121402 [details]
Test iso-8859-2 file
Created attachment 121403 [details]
file in mc viewer
Created attachment 121404 [details]
file in mc editor
Created attachment 121405 [details]
file in mc hex viewer
The same problem with russian characters.
My investigations show that is_printable() function in non-UTF-8 locales is not
correct handles unicode symbols. UTF8 define is set, but rountine is thinking
that is simple 8bit symbol.
Problem here:
util.c:
is_printable (int c)
{
#ifdef UTF8
if (SLsmg_Is_Unicode)
return iswprint (c);
>>>> !!!
#endif
c &= 0xff;
My mark shows inoperate case.
Jindrich, please, consider this undesired behaviour, because I can not porpose
anything but dirty hack like following:
#ifdef UTF8
if (SLsmg_Is_Unicode)
return iswprint (c);
#else
c &= 0xff;
#endif
Oh, I forgot addition several words. In russian case for KOI8-R (for example) the problem range is 0x400 - 0x41f. Created attachment 121448 [details]
Dirty hack
Dirty hack for drop described issue. It works for me (russian case) but is
needed to be tested on other cases and locales.
Andy, thanks for the investigation. Just a question, why do you consider 127 and 155 unprintable? It's cut-and-past from is_8bit_printable(). Due to above that question is not for me. Sorry. Hack from Comment #7 work ok for cs_CZ.ISO-8859-2. mc-4.6.1a-4.FC4 have bug permanent. Created attachment 122602 [details]
clean patch
Jindrich,
The problem is in src/view.c:view_display_text(): a "wc" wide character is
always used regardless of whether locale is UTF8 or not. In such a situation we
should invoke iswprint(wc) rather than is_printable(wc).
is_printable() always checks SLsmg_Is_Unicode, and if it is not set, then
assumes that its argument is normal (not wide) char, but actually it is a wide
char :)
BTW there is no any SLsmg_Is_Unicode checking/fork in the src/view.c file.
Perhaps it should be added?
Also, probably something shoild be done for src/view.c:view_display_hex(), as
national characters are not printed in the text area...
Dmitry, thanks for the investigation. Some SLsmg_Is_Unicode checks seem really to be missing there. The fixed version is now built in rawhide. Regarding the hex mode I disabled the displaying of national characters in the hex editor as the hex editor should provide per-byte insight into the edited/viewed file and user can switch with F4 back to see the text correctly UTF-8ized. Created attachment 122614 [details]
final patch (with hex mode too)
Ooops, I am late a little.... :(
It is my completed patch.
- is_printable() --> iswprint() as above
- remove convert_to_display_c (wc), as "wc" is already computed with call to
it.
- When UTF8, change tty_print_char() to SLsmg_write_nchars(). Slang-utf8 always
consider that the agrument for tty_print_char() is wide char. As it is not
true, we should not print non-wide char under #ifdef UTF8 with tty_print_char()
...
Jindrich,
Please, take a look for it again...
Dmitry, can you describe your second part of changes fully? I think it useless at all (flexibility is decreases, but compiler optimizes it anyway). > I think it useless at all
Did you mean this one:
+ {
+ char ch = c & 0xff;
+ SLsmg_write_nchars (&ch, 1);
+ }
?
Well,
SLsmg_write_nchars() has a prototype:
"void SLsmg_write_nchars (char *, unsigned int)"
i.e. it requires a "char *" as its first argument.
"c" variable is actually "int". If we do "SLsmg_write_nchars (&c, 1)", the "&c"
is "int *", not "char *" .
Let "c" has a value of 1 ("int" 0x00000001 or "char" 0x01) . On the
little-endian machines, the "int" value is stored in the memory as "0x01, 0x00,
0x00, 0x00", i.e. *((char *) &c) == 0x01 (== c)
But on the big-endian machines (ppc, riscs etc.), the "int" is stored as "0x00,
0x00, 0x00, 0x01", therefore *((char *) &c) == "0x00" (!= c)
It is the reason for the work-around present.
Considering "c & 0xff" -- yep, it seems like a paranoia :) (but "char"
theoretically can be more than one byte ;))
Dmitry, regarding the SLsmg_write_nchars(&c,1): doesn't the SLsmg_write_char(c) seem to be more appropriate here? Do you see problems when you use SLsmg_write_char? Note, there's no need to bother with the endianess problems related to int ptr -> char ptr conversion then since tty_print_char() does the SLsmg_write_char call actually when S-Lang is enabled. > Do you see problems when you use SLsmg_write_char?
Sure!
See on the utf8-patched slang (slang-1.4.9-13 on my FC3 for example) -- i.e.,
not tarball slang, but "rpmbuild -bp slang.spec":
in the src/slang.h you can see:
#ifdef UTF8
extern void SLsmg_write_char (wchar_t);
extern void SLsmg_write_nwchars (wchar_t *, unsigned int);
#else
extern void SLsmg_write_char (char);
#endif /* UTF8 */
extern void SLsmg_write_nchars (char *, unsigned int);
i.e. when UTF8, the only way to write non-wide char is to use SLsmg_write_nchars()
SLsmg_write_char() always expects a wide character, it is the reason for MC's
view.c issues...
Yeah, that's odd. In slang-2.0.5 we have in rawhide for a while is SLsmg_write_char declared like this: SL_EXTERN void SLsmg_write_char (SLwchar_Type ch); where there's no such problem any more. Please consider to file a bug against slang to have it fixed in FC3/FC4, because this bug is actually caused by slang and you probably don't want to introduce hacks in mc to workaround this bug. slang-2.0.5 comes with native UTF-8 support so there's no need to apply the ugly debian slang patch any more. Thus the bug is fixed in rawhide/FC5. I know the bug was in slang for rather long time, but now it's gone with slang-2.0.5. You can Cc me on the new slang bug so that I can cooperate with slang maintainer to have it fixed ASAP in FC3/4. Thanks for the investigation. > Please consider to file a bug against slang I am not sure whether it is a bug or a feature... > don't want to introduce hacks in mc to workaround this bug. What about FC3/FC4 CVS branches only? Or maybe compile for FC3/FC4 --with-screen=mcslang, as bundled slang (AFAIK) is 2.0.5 ... All FC3/FC4/rawhide mcs are almost the same so that I'll enable the internal slang-2.0.5 in mc for FC3/FC4. It sounds good to me as there's no need to touch the system slang then. I've tried to compile with the internal slang. Unfortunately, the national characters are completely garbled with it... Perhaps it is because the internal slang actually is a stripped version (According to its slang/README file). Jindrich, IMHO it would be better either apply my patch for FC3/FC4 only, or just ignore the part related to SLsmg_write_nchars(), as it is not too critical. Using internal "stripped" slang can cause a lot of yet unknown side effects etc... Ok, I applied your fix without the SLsmg_write_nchars() stuff in rawhide already, so that I'll sync FC3/FC4 to rawhide mc and release updates after a while. |