Bug 174007

Summary: Viewer does not display some iso-8859-2 chars
Product: [Fedora] Fedora Reporter: Rudolf Ulc <rudolf.ulc>
Component: mcAssignee: Jindrich Novy <jnovy>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 4CC: andy, dmitry, leonard-rh-bugzilla, pknirsch
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-12-28 14:19:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Description Flags
Test iso-8859-2 file
file in mc viewer
file in mc editor
file in mc hex viewer
Dirty hack
clean patch
final patch (with hex mode too) none

Description Rudolf Ulc 2005-11-23 16:09:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; cs-CZ; rv:1.7.12) Gecko/20050919 Firefox/1.0.7

Description of problem:
The embedded viewer does not show some iso-8859-2 chars. 

Locales is LANG="cs_CZ.ISO-8859-2"

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
View test file with iso-8859-2 in embedded viewer. 

Actual Results:  Mcedit work OK. See attachment.

Additional info:

mc-4.6.1a-0.13.FC4 is OK.

Comment 1 Rudolf Ulc 2005-11-23 16:10:49 UTC
Created attachment 121402 [details]
Test iso-8859-2 file

Comment 2 Rudolf Ulc 2005-11-23 16:12:31 UTC
Created attachment 121403 [details]
file in mc viewer

Comment 3 Rudolf Ulc 2005-11-23 16:15:03 UTC
Created attachment 121404 [details]
file in mc editor

Comment 4 Rudolf Ulc 2005-11-23 16:15:36 UTC
Created attachment 121405 [details]
file in mc hex viewer

Comment 5 Andy Shevchenko 2005-11-24 12:03:03 UTC
The same problem with russian characters.

My investigations show that is_printable() function in non-UTF-8 locales is not 
correct handles unicode symbols. UTF8 define is set, but rountine is thinking 
that is simple 8bit symbol.

Problem here:
is_printable (int c)
#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
>>>> !!!
    c &= 0xff;

My mark shows inoperate case.

Jindrich, please, consider this undesired behaviour, because I can not porpose 
anything but dirty hack like following:

#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
        c &= 0xff;

Comment 6 Andy Shevchenko 2005-11-24 12:06:21 UTC
Oh, I forgot addition several words.

In russian case for KOI8-R (for example) the problem range is
0x400 - 0x41f.

Comment 7 Andy Shevchenko 2005-11-24 12:17:00 UTC
Created attachment 121448 [details]
Dirty hack

Dirty hack for drop described issue. It works for me (russian case) but is
needed to be tested on other cases and locales.

Comment 8 Jindrich Novy 2005-11-24 14:43:26 UTC
Andy, thanks for the investigation.

Just a question, why do you consider 127 and 155 unprintable?

Comment 9 Andy Shevchenko 2005-11-24 16:16:42 UTC
It's cut-and-past from is_8bit_printable().

Due to above that question is not for me. Sorry.

Comment 10 Rudolf Ulc 2005-12-14 07:08:12 UTC
Hack from Comment #7 work ok for cs_CZ.ISO-8859-2. mc-4.6.1a-4.FC4 have bug

Comment 11 Dmitry Butskoy 2005-12-27 18:21:22 UTC
Created attachment 122602 [details]
clean patch


The problem is in src/view.c:view_display_text(): a "wc" wide character is
always used regardless of whether locale is UTF8 or not. In such a situation we
should invoke iswprint(wc) rather than is_printable(wc).

is_printable() always checks SLsmg_Is_Unicode, and if it is not set, then
assumes that its argument is normal (not wide) char, but actually it is a wide
char :)

BTW there is no any SLsmg_Is_Unicode checking/fork in the src/view.c file.
Perhaps it should be added?
Also, probably something shoild be done for src/view.c:view_display_hex(), as
national characters are not printed in the text area...

Comment 12 Jindrich Novy 2005-12-28 14:19:50 UTC
Dmitry, thanks for the investigation. Some SLsmg_Is_Unicode checks seem really
to be missing there. The fixed version is now built in rawhide.

Regarding the hex mode I disabled the displaying of national characters in the
hex editor as the hex editor should provide per-byte insight into the
edited/viewed file and user can switch with F4 back to see the text correctly

Comment 13 Dmitry Butskoy 2005-12-28 14:50:02 UTC
Created attachment 122614 [details]
final patch (with hex mode too)

Ooops, I am late a little.... :(

It is my completed patch. 
- is_printable() --> iswprint()  as above
- remove convert_to_display_c (wc), as "wc" is already computed with call to
- When UTF8, change tty_print_char() to SLsmg_write_nchars(). Slang-utf8 always
consider that the agrument for tty_print_char() is wide char. As it is not
true, we should not print non-wide char under #ifdef UTF8 with tty_print_char()

Please, take a look for it again...

Comment 14 Andy Shevchenko 2005-12-29 10:34:26 UTC
Dmitry, can you describe your second part of changes fully?
I think it useless at all (flexibility is decreases, but compiler optimizes it 

Comment 15 Dmitry Butskoy 2005-12-29 13:19:32 UTC
> I think it useless at all
Did you mean this one:
+	    {
+		char ch = c & 0xff;
+		SLsmg_write_nchars (&ch, 1);
+	    }

SLsmg_write_nchars() has a prototype:
"void SLsmg_write_nchars (char *, unsigned int)"
i.e. it requires a "char *" as its first argument.

"c" variable is actually "int". If we do "SLsmg_write_nchars (&c, 1)", the "&c"
is "int *", not "char *" . 

Let "c" has a value of 1 ("int" 0x00000001 or "char" 0x01) . On the
little-endian machines, the "int" value is stored in the memory as "0x01, 0x00,
0x00, 0x00", i.e. *((char *) &c) == 0x01 (== c)
But on the big-endian machines (ppc, riscs etc.), the "int" is stored as "0x00,
0x00, 0x00, 0x01", therefore *((char *) &c) == "0x00" (!= c)

It is the reason for the work-around present.

Considering "c & 0xff" -- yep, it seems like a paranoia :) (but "char"
theoretically can be more than one byte ;))

Comment 16 Jindrich Novy 2005-12-29 18:00:44 UTC
Dmitry, regarding the SLsmg_write_nchars(&c,1): doesn't the SLsmg_write_char(c)
seem to be more appropriate here? Do you see problems when you use
SLsmg_write_char? Note, there's no need to bother with the endianess problems
related to int ptr -> char ptr conversion then since tty_print_char() does the
SLsmg_write_char call actually when S-Lang is enabled.

Comment 17 Dmitry Butskoy 2005-12-30 11:44:36 UTC
> Do you see problems when you use SLsmg_write_char?

See on the utf8-patched slang (slang-1.4.9-13 on my FC3 for example) -- i.e.,
not tarball slang, but "rpmbuild -bp slang.spec":

in the src/slang.h you can see:

#ifdef UTF8
extern void SLsmg_write_char (wchar_t);
extern void SLsmg_write_nwchars (wchar_t *, unsigned int);
extern void SLsmg_write_char (char);
#endif /* UTF8 */
extern void SLsmg_write_nchars (char *, unsigned int);

i.e. when UTF8, the only way to write non-wide char is to use SLsmg_write_nchars()

SLsmg_write_char() always expects a wide character, it is the reason for MC's
view.c issues...

Comment 18 Jindrich Novy 2005-12-30 13:19:46 UTC
Yeah, that's odd. In slang-2.0.5 we have in rawhide for a while is
SLsmg_write_char declared like this:

SL_EXTERN void SLsmg_write_char (SLwchar_Type ch);

where there's no such problem any more. Please consider to file a bug against
slang to have it fixed in FC3/FC4, because this bug is actually caused by slang
and you probably don't want to introduce hacks in mc to workaround this bug.
slang-2.0.5 comes with native UTF-8 support so there's no need to apply the ugly
debian slang patch any more. Thus the bug is fixed in rawhide/FC5. I know the
bug was in slang for rather long time, but now it's gone with slang-2.0.5. You
can Cc me on the new slang bug so that I can cooperate with slang maintainer to
have it fixed ASAP in FC3/4.

Thanks for the investigation.

Comment 19 Dmitry Butskoy 2005-12-30 13:30:41 UTC
> Please consider to file a bug against slang 
I am not sure whether it is a bug or a feature...

> don't want to introduce hacks in mc to workaround this bug.
What about FC3/FC4 CVS branches only?

Or maybe compile for FC3/FC4 --with-screen=mcslang, as bundled slang (AFAIK) is
2.0.5 ...

Comment 20 Jindrich Novy 2005-12-30 13:50:43 UTC
All FC3/FC4/rawhide mcs are almost the same so that I'll enable the internal
slang-2.0.5 in mc for FC3/FC4. It sounds good to me as there's no need to touch
the system slang then.

Comment 21 Dmitry Butskoy 2005-12-30 14:26:44 UTC
I've tried to compile with the internal slang. Unfortunately, the national
characters are completely garbled with it...
Perhaps it is because the internal slang actually is a stripped version
(According to its slang/README file).

IMHO it would be better either apply my patch for FC3/FC4 only, or just ignore
the part related to SLsmg_write_nchars(), as it is not too critical.

Using internal "stripped" slang can cause a lot of yet unknown side effects etc...

Comment 22 Jindrich Novy 2005-12-30 14:33:33 UTC
Ok, I applied your fix without the SLsmg_write_nchars() stuff in rawhide
already, so that I'll sync FC3/FC4 to rawhide mc and release updates after a while.