Bug 174007 - Viewer does not display some iso-8859-2 chars
Viewer does not display some iso-8859-2 chars
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: mc (Show other bugs)
4
i386 Linux
medium Severity medium
: ---
: ---
Assigned To: Jindrich Novy
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2005-11-23 11:09 EST by Rudolf Ulc
Modified: 2013-07-02 19:11 EDT (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-12-28 09:19:50 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
Test iso-8859-2 file (16 bytes, text/plain)
2005-11-23 11:10 EST, Rudolf Ulc
no flags Details
file in mc viewer (36.22 KB, image/jpeg)
2005-11-23 11:12 EST, Rudolf Ulc
no flags Details
file in mc editor (35.53 KB, image/jpeg)
2005-11-23 11:15 EST, Rudolf Ulc
no flags Details
file in mc hex viewer (37.73 KB, image/jpeg)
2005-11-23 11:15 EST, Rudolf Ulc
no flags Details
Dirty hack (363 bytes, patch)
2005-11-24 07:17 EST, Andy Shevchenko
no flags Details | Diff
clean patch (378 bytes, patch)
2005-12-27 13:21 EST, Dmitry Butskoy
no flags Details | Diff
final patch (with hex mode too) (790 bytes, patch)
2005-12-28 09:50 EST, Dmitry Butskoy
no flags Details | Diff

  None (edit)
Description Rudolf Ulc 2005-11-23 11:09:26 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; cs-CZ; rv:1.7.12) Gecko/20050919 Firefox/1.0.7

Description of problem:
The embedded viewer does not show some iso-8859-2 chars. 

Locales is LANG="cs_CZ.ISO-8859-2"


Version-Release number of selected component (if applicable):
mc-4.6.1a-0.15.FC4

How reproducible:
Always

Steps to Reproduce:
View test file with iso-8859-2 in embedded viewer. 

Actual Results:  Mcedit work OK. See attachment.

Additional info:

mc-4.6.1a-0.13.FC4 is OK.
Comment 1 Rudolf Ulc 2005-11-23 11:10:49 EST
Created attachment 121402 [details]
Test iso-8859-2 file
Comment 2 Rudolf Ulc 2005-11-23 11:12:31 EST
Created attachment 121403 [details]
file in mc viewer
Comment 3 Rudolf Ulc 2005-11-23 11:15:03 EST
Created attachment 121404 [details]
file in mc editor
Comment 4 Rudolf Ulc 2005-11-23 11:15:36 EST
Created attachment 121405 [details]
file in mc hex viewer
Comment 5 Andy Shevchenko 2005-11-24 07:03:03 EST
The same problem with russian characters.

My investigations show that is_printable() function in non-UTF-8 locales is not 
correct handles unicode symbols. UTF8 define is set, but rountine is thinking 
that is simple 8bit symbol.

Problem here:
util.c:
is_printable (int c)
{
#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
>>>> !!!
#endif
    c &= 0xff;

My mark shows inoperate case.

Jindrich, please, consider this undesired behaviour, because I can not porpose 
anything but dirty hack like following:

#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
#else
        c &= 0xff;
#endif
Comment 6 Andy Shevchenko 2005-11-24 07:06:21 EST
Oh, I forgot addition several words.

In russian case for KOI8-R (for example) the problem range is
0x400 - 0x41f.
Comment 7 Andy Shevchenko 2005-11-24 07:17:00 EST
Created attachment 121448 [details]
Dirty hack

Dirty hack for drop described issue. It works for me (russian case) but is
needed to be tested on other cases and locales.
Comment 8 Jindrich Novy 2005-11-24 09:43:26 EST
Andy, thanks for the investigation.

Just a question, why do you consider 127 and 155 unprintable?
Comment 9 Andy Shevchenko 2005-11-24 11:16:42 EST
It's cut-and-past from is_8bit_printable().

Due to above that question is not for me. Sorry.
Comment 10 Rudolf Ulc 2005-12-14 02:08:12 EST
Hack from Comment #7 work ok for cs_CZ.ISO-8859-2. mc-4.6.1a-4.FC4 have bug
permanent.
Comment 11 Dmitry Butskoy 2005-12-27 13:21:22 EST
Created attachment 122602 [details]
clean patch

Jindrich,

The problem is in src/view.c:view_display_text(): a "wc" wide character is
always used regardless of whether locale is UTF8 or not. In such a situation we
should invoke iswprint(wc) rather than is_printable(wc).

is_printable() always checks SLsmg_Is_Unicode, and if it is not set, then
assumes that its argument is normal (not wide) char, but actually it is a wide
char :)

BTW there is no any SLsmg_Is_Unicode checking/fork in the src/view.c file.
Perhaps it should be added?
Also, probably something shoild be done for src/view.c:view_display_hex(), as
national characters are not printed in the text area...
Comment 12 Jindrich Novy 2005-12-28 09:19:50 EST
Dmitry, thanks for the investigation. Some SLsmg_Is_Unicode checks seem really
to be missing there. The fixed version is now built in rawhide.

Regarding the hex mode I disabled the displaying of national characters in the
hex editor as the hex editor should provide per-byte insight into the
edited/viewed file and user can switch with F4 back to see the text correctly
UTF-8ized.
Comment 13 Dmitry Butskoy 2005-12-28 09:50:02 EST
Created attachment 122614 [details]
final patch (with hex mode too)

Ooops, I am late a little.... :(

It is my completed patch. 
- is_printable() --> iswprint()  as above
- remove convert_to_display_c (wc), as "wc" is already computed with call to
it.
- When UTF8, change tty_print_char() to SLsmg_write_nchars(). Slang-utf8 always
consider that the agrument for tty_print_char() is wide char. As it is not
true, we should not print non-wide char under #ifdef UTF8 with tty_print_char()
...

Jindrich,
Please, take a look for it again...
Comment 14 Andy Shevchenko 2005-12-29 05:34:26 EST
Dmitry, can you describe your second part of changes fully?
I think it useless at all (flexibility is decreases, but compiler optimizes it 
anyway).
Comment 15 Dmitry Butskoy 2005-12-29 08:19:32 EST
> I think it useless at all
Did you mean this one:
+	    {
+		char ch = c & 0xff;
+		SLsmg_write_nchars (&ch, 1);
+	    }
?

Well,
SLsmg_write_nchars() has a prototype:
"void SLsmg_write_nchars (char *, unsigned int)"
i.e. it requires a "char *" as its first argument.

"c" variable is actually "int". If we do "SLsmg_write_nchars (&c, 1)", the "&c"
is "int *", not "char *" . 

Let "c" has a value of 1 ("int" 0x00000001 or "char" 0x01) . On the
little-endian machines, the "int" value is stored in the memory as "0x01, 0x00,
0x00, 0x00", i.e. *((char *) &c) == 0x01 (== c)
But on the big-endian machines (ppc, riscs etc.), the "int" is stored as "0x00,
0x00, 0x00, 0x01", therefore *((char *) &c) == "0x00" (!= c)

It is the reason for the work-around present.

Considering "c & 0xff" -- yep, it seems like a paranoia :) (but "char"
theoretically can be more than one byte ;))


Comment 16 Jindrich Novy 2005-12-29 13:00:44 EST
Dmitry, regarding the SLsmg_write_nchars(&c,1): doesn't the SLsmg_write_char(c)
seem to be more appropriate here? Do you see problems when you use
SLsmg_write_char? Note, there's no need to bother with the endianess problems
related to int ptr -> char ptr conversion then since tty_print_char() does the
SLsmg_write_char call actually when S-Lang is enabled.
Comment 17 Dmitry Butskoy 2005-12-30 06:44:36 EST
> Do you see problems when you use SLsmg_write_char?
Sure!

See on the utf8-patched slang (slang-1.4.9-13 on my FC3 for example) -- i.e.,
not tarball slang, but "rpmbuild -bp slang.spec":

in the src/slang.h you can see:

#ifdef UTF8
extern void SLsmg_write_char (wchar_t);
extern void SLsmg_write_nwchars (wchar_t *, unsigned int);
#else
extern void SLsmg_write_char (char);
#endif /* UTF8 */
extern void SLsmg_write_nchars (char *, unsigned int);

i.e. when UTF8, the only way to write non-wide char is to use SLsmg_write_nchars()

SLsmg_write_char() always expects a wide character, it is the reason for MC's
view.c issues...
Comment 18 Jindrich Novy 2005-12-30 08:19:46 EST
Yeah, that's odd. In slang-2.0.5 we have in rawhide for a while is
SLsmg_write_char declared like this:

SL_EXTERN void SLsmg_write_char (SLwchar_Type ch);

where there's no such problem any more. Please consider to file a bug against
slang to have it fixed in FC3/FC4, because this bug is actually caused by slang
and you probably don't want to introduce hacks in mc to workaround this bug.
slang-2.0.5 comes with native UTF-8 support so there's no need to apply the ugly
debian slang patch any more. Thus the bug is fixed in rawhide/FC5. I know the
bug was in slang for rather long time, but now it's gone with slang-2.0.5. You
can Cc me on the new slang bug so that I can cooperate with slang maintainer to
have it fixed ASAP in FC3/4.

Thanks for the investigation.
Comment 19 Dmitry Butskoy 2005-12-30 08:30:41 EST
> Please consider to file a bug against slang 
I am not sure whether it is a bug or a feature...

> don't want to introduce hacks in mc to workaround this bug.
What about FC3/FC4 CVS branches only?

Or maybe compile for FC3/FC4 --with-screen=mcslang, as bundled slang (AFAIK) is
2.0.5 ...
 
Comment 20 Jindrich Novy 2005-12-30 08:50:43 EST
All FC3/FC4/rawhide mcs are almost the same so that I'll enable the internal
slang-2.0.5 in mc for FC3/FC4. It sounds good to me as there's no need to touch
the system slang then.
Comment 21 Dmitry Butskoy 2005-12-30 09:26:44 EST
I've tried to compile with the internal slang. Unfortunately, the national
characters are completely garbled with it...
Perhaps it is because the internal slang actually is a stripped version
(According to its slang/README file).

Jindrich,
IMHO it would be better either apply my patch for FC3/FC4 only, or just ignore
the part related to SLsmg_write_nchars(), as it is not too critical.

Using internal "stripped" slang can cause a lot of yet unknown side effects etc...
Comment 22 Jindrich Novy 2005-12-30 09:33:33 EST
Ok, I applied your fix without the SLsmg_write_nchars() stuff in rawhide
already, so that I'll sync FC3/FC4 to rawhide mc and release updates after a while.

Note You need to log in before you can comment on or make changes to this bug.