Bug 174007

Summary: Viewer does not display some iso-8859-2 chars
Product: [Fedora] Fedora Reporter: Rudolf Ulc <rudolf.ulc>
Component: mcAssignee: Jindrich Novy <jnovy>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 4CC: andy, dmitry, leonard-rh-bugzilla, pknirsch
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-12-28 14:19:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
Test iso-8859-2 file
none
file in mc viewer
none
file in mc editor
none
file in mc hex viewer
none
Dirty hack
none
clean patch
none
final patch (with hex mode too) none

Description Rudolf Ulc 2005-11-23 16:09:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; cs-CZ; rv:1.7.12) Gecko/20050919 Firefox/1.0.7

Description of problem:
The embedded viewer does not show some iso-8859-2 chars. 

Locales is LANG="cs_CZ.ISO-8859-2"


Version-Release number of selected component (if applicable):
mc-4.6.1a-0.15.FC4

How reproducible:
Always

Steps to Reproduce:
View test file with iso-8859-2 in embedded viewer. 

Actual Results:  Mcedit work OK. See attachment.

Additional info:

mc-4.6.1a-0.13.FC4 is OK.

Comment 1 Rudolf Ulc 2005-11-23 16:10:49 UTC
Created attachment 121402 [details]
Test iso-8859-2 file

Comment 2 Rudolf Ulc 2005-11-23 16:12:31 UTC
Created attachment 121403 [details]
file in mc viewer

Comment 3 Rudolf Ulc 2005-11-23 16:15:03 UTC
Created attachment 121404 [details]
file in mc editor

Comment 4 Rudolf Ulc 2005-11-23 16:15:36 UTC
Created attachment 121405 [details]
file in mc hex viewer

Comment 5 Andy Shevchenko 2005-11-24 12:03:03 UTC
The same problem with russian characters.

My investigations show that is_printable() function in non-UTF-8 locales is not 
correct handles unicode symbols. UTF8 define is set, but rountine is thinking 
that is simple 8bit symbol.

Problem here:
util.c:
is_printable (int c)
{
#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
>>>> !!!
#endif
    c &= 0xff;

My mark shows inoperate case.

Jindrich, please, consider this undesired behaviour, because I can not porpose 
anything but dirty hack like following:

#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
#else
        c &= 0xff;
#endif


Comment 6 Andy Shevchenko 2005-11-24 12:06:21 UTC
Oh, I forgot addition several words.

In russian case for KOI8-R (for example) the problem range is
0x400 - 0x41f.


Comment 7 Andy Shevchenko 2005-11-24 12:17:00 UTC
Created attachment 121448 [details]
Dirty hack

Dirty hack for drop described issue. It works for me (russian case) but is
needed to be tested on other cases and locales.

Comment 8 Jindrich Novy 2005-11-24 14:43:26 UTC
Andy, thanks for the investigation.

Just a question, why do you consider 127 and 155 unprintable?

Comment 9 Andy Shevchenko 2005-11-24 16:16:42 UTC
It's cut-and-past from is_8bit_printable().

Due to above that question is not for me. Sorry.

Comment 10 Rudolf Ulc 2005-12-14 07:08:12 UTC
Hack from Comment #7 work ok for cs_CZ.ISO-8859-2. mc-4.6.1a-4.FC4 have bug
permanent.

Comment 11 Dmitry Butskoy 2005-12-27 18:21:22 UTC
Created attachment 122602 [details]
clean patch

Jindrich,

The problem is in src/view.c:view_display_text(): a "wc" wide character is
always used regardless of whether locale is UTF8 or not. In such a situation we
should invoke iswprint(wc) rather than is_printable(wc).

is_printable() always checks SLsmg_Is_Unicode, and if it is not set, then
assumes that its argument is normal (not wide) char, but actually it is a wide
char :)

BTW there is no any SLsmg_Is_Unicode checking/fork in the src/view.c file.
Perhaps it should be added?
Also, probably something shoild be done for src/view.c:view_display_hex(), as
national characters are not printed in the text area...

Comment 12 Jindrich Novy 2005-12-28 14:19:50 UTC
Dmitry, thanks for the investigation. Some SLsmg_Is_Unicode checks seem really
to be missing there. The fixed version is now built in rawhide.

Regarding the hex mode I disabled the displaying of national characters in the
hex editor as the hex editor should provide per-byte insight into the
edited/viewed file and user can switch with F4 back to see the text correctly
UTF-8ized.

Comment 13 Dmitry Butskoy 2005-12-28 14:50:02 UTC
Created attachment 122614 [details]
final patch (with hex mode too)

Ooops, I am late a little.... :(

It is my completed patch. 
- is_printable() --> iswprint()  as above
- remove convert_to_display_c (wc), as "wc" is already computed with call to
it.
- When UTF8, change tty_print_char() to SLsmg_write_nchars(). Slang-utf8 always
consider that the agrument for tty_print_char() is wide char. As it is not
true, we should not print non-wide char under #ifdef UTF8 with tty_print_char()
...

Jindrich,
Please, take a look for it again...

Comment 14 Andy Shevchenko 2005-12-29 10:34:26 UTC
Dmitry, can you describe your second part of changes fully?
I think it useless at all (flexibility is decreases, but compiler optimizes it 
anyway).


Comment 15 Dmitry Butskoy 2005-12-29 13:19:32 UTC
> I think it useless at all
Did you mean this one:
+	    {
+		char ch = c & 0xff;
+		SLsmg_write_nchars (&ch, 1);
+	    }
?

Well,
SLsmg_write_nchars() has a prototype:
"void SLsmg_write_nchars (char *, unsigned int)"
i.e. it requires a "char *" as its first argument.

"c" variable is actually "int". If we do "SLsmg_write_nchars (&c, 1)", the "&c"
is "int *", not "char *" . 

Let "c" has a value of 1 ("int" 0x00000001 or "char" 0x01) . On the
little-endian machines, the "int" value is stored in the memory as "0x01, 0x00,
0x00, 0x00", i.e. *((char *) &c) == 0x01 (== c)
But on the big-endian machines (ppc, riscs etc.), the "int" is stored as "0x00,
0x00, 0x00, 0x01", therefore *((char *) &c) == "0x00" (!= c)

It is the reason for the work-around present.

Considering "c & 0xff" -- yep, it seems like a paranoia :) (but "char"
theoretically can be more than one byte ;))




Comment 16 Jindrich Novy 2005-12-29 18:00:44 UTC
Dmitry, regarding the SLsmg_write_nchars(&c,1): doesn't the SLsmg_write_char(c)
seem to be more appropriate here? Do you see problems when you use
SLsmg_write_char? Note, there's no need to bother with the endianess problems
related to int ptr -> char ptr conversion then since tty_print_char() does the
SLsmg_write_char call actually when S-Lang is enabled.

Comment 17 Dmitry Butskoy 2005-12-30 11:44:36 UTC
> Do you see problems when you use SLsmg_write_char?
Sure!

See on the utf8-patched slang (slang-1.4.9-13 on my FC3 for example) -- i.e.,
not tarball slang, but "rpmbuild -bp slang.spec":

in the src/slang.h you can see:

#ifdef UTF8
extern void SLsmg_write_char (wchar_t);
extern void SLsmg_write_nwchars (wchar_t *, unsigned int);
#else
extern void SLsmg_write_char (char);
#endif /* UTF8 */
extern void SLsmg_write_nchars (char *, unsigned int);

i.e. when UTF8, the only way to write non-wide char is to use SLsmg_write_nchars()

SLsmg_write_char() always expects a wide character, it is the reason for MC's
view.c issues...


Comment 18 Jindrich Novy 2005-12-30 13:19:46 UTC
Yeah, that's odd. In slang-2.0.5 we have in rawhide for a while is
SLsmg_write_char declared like this:

SL_EXTERN void SLsmg_write_char (SLwchar_Type ch);

where there's no such problem any more. Please consider to file a bug against
slang to have it fixed in FC3/FC4, because this bug is actually caused by slang
and you probably don't want to introduce hacks in mc to workaround this bug.
slang-2.0.5 comes with native UTF-8 support so there's no need to apply the ugly
debian slang patch any more. Thus the bug is fixed in rawhide/FC5. I know the
bug was in slang for rather long time, but now it's gone with slang-2.0.5. You
can Cc me on the new slang bug so that I can cooperate with slang maintainer to
have it fixed ASAP in FC3/4.

Thanks for the investigation.

Comment 19 Dmitry Butskoy 2005-12-30 13:30:41 UTC
> Please consider to file a bug against slang 
I am not sure whether it is a bug or a feature...

> don't want to introduce hacks in mc to workaround this bug.
What about FC3/FC4 CVS branches only?

Or maybe compile for FC3/FC4 --with-screen=mcslang, as bundled slang (AFAIK) is
2.0.5 ...
 

Comment 20 Jindrich Novy 2005-12-30 13:50:43 UTC
All FC3/FC4/rawhide mcs are almost the same so that I'll enable the internal
slang-2.0.5 in mc for FC3/FC4. It sounds good to me as there's no need to touch
the system slang then.

Comment 21 Dmitry Butskoy 2005-12-30 14:26:44 UTC
I've tried to compile with the internal slang. Unfortunately, the national
characters are completely garbled with it...
Perhaps it is because the internal slang actually is a stripped version
(According to its slang/README file).

Jindrich,
IMHO it would be better either apply my patch for FC3/FC4 only, or just ignore
the part related to SLsmg_write_nchars(), as it is not too critical.

Using internal "stripped" slang can cause a lot of yet unknown side effects etc...

Comment 22 Jindrich Novy 2005-12-30 14:33:33 UTC
Ok, I applied your fix without the SLsmg_write_nchars() stuff in rawhide
already, so that I'll sync FC3/FC4 to rawhide mc and release updates after a while.