174007 – Viewer does not display some iso-8859-2 chars

Bug 174007 - Viewer does not display some iso-8859-2 chars

Summary: Viewer does not display some iso-8859-2 chars

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mc
Sub Component:
Version:	4
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jindrich Novy
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2005-11-23 16:09 UTC by Rudolf Ulc
Modified:	2013-07-02 23:11 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-12-28 14:19:50 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Test iso-8859-2 file (16 bytes, text/plain) 2005-11-23 16:10 UTC, Rudolf Ulc	no flags	Details
file in mc viewer (36.22 KB, image/jpeg) 2005-11-23 16:12 UTC, Rudolf Ulc	no flags	Details
file in mc editor (35.53 KB, image/jpeg) 2005-11-23 16:15 UTC, Rudolf Ulc	no flags	Details
file in mc hex viewer (37.73 KB, image/jpeg) 2005-11-23 16:15 UTC, Rudolf Ulc	no flags	Details
Dirty hack (363 bytes, patch) 2005-11-24 12:17 UTC, Andy Shevchenko	no flags	Details \| Diff
clean patch (378 bytes, patch) 2005-12-27 18:21 UTC, Dmitry Butskoy	no flags	Details \| Diff
final patch (with hex mode too) (790 bytes, patch) 2005-12-28 14:50 UTC, Dmitry Butskoy	no flags	Details \| Diff
Show Obsolete (2) View All

Description Rudolf Ulc 2005-11-23 16:09:26 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; cs-CZ; rv:1.7.12) Gecko/20050919 Firefox/1.0.7

Description of problem:
The embedded viewer does not show some iso-8859-2 chars. 

Locales is LANG="cs_CZ.ISO-8859-2"


Version-Release number of selected component (if applicable):
mc-4.6.1a-0.15.FC4

How reproducible:
Always

Steps to Reproduce:
View test file with iso-8859-2 in embedded viewer. 

Actual Results:  Mcedit work OK. See attachment.

Additional info:

mc-4.6.1a-0.13.FC4 is OK.

Comment 1 Rudolf Ulc 2005-11-23 16:10:49 UTC

Created attachment 121402 [details]
Test iso-8859-2 file

Comment 2 Rudolf Ulc 2005-11-23 16:12:31 UTC

Created attachment 121403 [details]
file in mc viewer

Comment 3 Rudolf Ulc 2005-11-23 16:15:03 UTC

Created attachment 121404 [details]
file in mc editor

Comment 4 Rudolf Ulc 2005-11-23 16:15:36 UTC

Created attachment 121405 [details]
file in mc hex viewer

Comment 5 Andy Shevchenko 2005-11-24 12:03:03 UTC

The same problem with russian characters.

My investigations show that is_printable() function in non-UTF-8 locales is not 
correct handles unicode symbols. UTF8 define is set, but rountine is thinking 
that is simple 8bit symbol.

Problem here:
util.c:
is_printable (int c)
{
#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
>>>> !!!
#endif
    c &= 0xff;

My mark shows inoperate case.

Jindrich, please, consider this undesired behaviour, because I can not porpose 
anything but dirty hack like following:

#ifdef UTF8
    if (SLsmg_Is_Unicode)
        return iswprint (c);
#else
        c &= 0xff;
#endif

Comment 6 Andy Shevchenko 2005-11-24 12:06:21 UTC

Oh, I forgot addition several words.

In russian case for KOI8-R (for example) the problem range is
0x400 - 0x41f.

Comment 7 Andy Shevchenko 2005-11-24 12:17:00 UTC

Created attachment 121448 [details]
Dirty hack

Dirty hack for drop described issue. It works for me (russian case) but is
needed to be tested on other cases and locales.

Comment 8 Jindrich Novy 2005-11-24 14:43:26 UTC

Andy, thanks for the investigation.

Just a question, why do you consider 127 and 155 unprintable?

Comment 9 Andy Shevchenko 2005-11-24 16:16:42 UTC

It's cut-and-past from is_8bit_printable().

Due to above that question is not for me. Sorry.

Comment 10 Rudolf Ulc 2005-12-14 07:08:12 UTC

Hack from Comment #7 work ok for cs_CZ.ISO-8859-2. mc-4.6.1a-4.FC4 have bug
permanent.

Comment 11 Dmitry Butskoy 2005-12-27 18:21:22 UTC

Created attachment 122602 [details]
clean patch

Jindrich,

The problem is in src/view.c:view_display_text(): a "wc" wide character is
always used regardless of whether locale is UTF8 or not. In such a situation we
should invoke iswprint(wc) rather than is_printable(wc).

is_printable() always checks SLsmg_Is_Unicode, and if it is not set, then
assumes that its argument is normal (not wide) char, but actually it is a wide
char :)

BTW there is no any SLsmg_Is_Unicode checking/fork in the src/view.c file.
Perhaps it should be added?
Also, probably something shoild be done for src/view.c:view_display_hex(), as
national characters are not printed in the text area...

Comment 12 Jindrich Novy 2005-12-28 14:19:50 UTC

Dmitry, thanks for the investigation. Some SLsmg_Is_Unicode checks seem really
to be missing there. The fixed version is now built in rawhide.

Regarding the hex mode I disabled the displaying of national characters in the
hex editor as the hex editor should provide per-byte insight into the
edited/viewed file and user can switch with F4 back to see the text correctly
UTF-8ized.

Comment 13 Dmitry Butskoy 2005-12-28 14:50:02 UTC

Created attachment 122614 [details]
final patch (with hex mode too)

Ooops, I am late a little.... :(

It is my completed patch. 
- is_printable() --> iswprint()  as above
- remove convert_to_display_c (wc), as "wc" is already computed with call to
it.
- When UTF8, change tty_print_char() to SLsmg_write_nchars(). Slang-utf8 always
consider that the agrument for tty_print_char() is wide char. As it is not
true, we should not print non-wide char under #ifdef UTF8 with tty_print_char()
...

Jindrich,
Please, take a look for it again...

Comment 14 Andy Shevchenko 2005-12-29 10:34:26 UTC

Dmitry, can you describe your second part of changes fully?
I think it useless at all (flexibility is decreases, but compiler optimizes it 
anyway).

Comment 15 Dmitry Butskoy 2005-12-29 13:19:32 UTC

> I think it useless at all
Did you mean this one:
+	    {
+		char ch = c & 0xff;
+		SLsmg_write_nchars (&ch, 1);
+	    }
?

Well,
SLsmg_write_nchars() has a prototype:
"void SLsmg_write_nchars (char *, unsigned int)"
i.e. it requires a "char *" as its first argument.

"c" variable is actually "int". If we do "SLsmg_write_nchars (&c, 1)", the "&c"
is "int *", not "char *" . 

Let "c" has a value of 1 ("int" 0x00000001 or "char" 0x01) . On the
little-endian machines, the "int" value is stored in the memory as "0x01, 0x00,
0x00, 0x00", i.e. *((char *) &c) == 0x01 (== c)
But on the big-endian machines (ppc, riscs etc.), the "int" is stored as "0x00,
0x00, 0x00, 0x01", therefore *((char *) &c) == "0x00" (!= c)

It is the reason for the work-around present.

Considering "c & 0xff" -- yep, it seems like a paranoia :) (but "char"
theoretically can be more than one byte ;))

Comment 16 Jindrich Novy 2005-12-29 18:00:44 UTC

Dmitry, regarding the SLsmg_write_nchars(&c,1): doesn't the SLsmg_write_char(c)
seem to be more appropriate here? Do you see problems when you use
SLsmg_write_char? Note, there's no need to bother with the endianess problems
related to int ptr -> char ptr conversion then since tty_print_char() does the
SLsmg_write_char call actually when S-Lang is enabled.

Comment 17 Dmitry Butskoy 2005-12-30 11:44:36 UTC

> Do you see problems when you use SLsmg_write_char?
Sure!

See on the utf8-patched slang (slang-1.4.9-13 on my FC3 for example) -- i.e.,
not tarball slang, but "rpmbuild -bp slang.spec":

in the src/slang.h you can see:

#ifdef UTF8
extern void SLsmg_write_char (wchar_t);
extern void SLsmg_write_nwchars (wchar_t *, unsigned int);
#else
extern void SLsmg_write_char (char);
#endif /* UTF8 */
extern void SLsmg_write_nchars (char *, unsigned int);

i.e. when UTF8, the only way to write non-wide char is to use SLsmg_write_nchars()

SLsmg_write_char() always expects a wide character, it is the reason for MC's
view.c issues...

Comment 18 Jindrich Novy 2005-12-30 13:19:46 UTC

Yeah, that's odd. In slang-2.0.5 we have in rawhide for a while is
SLsmg_write_char declared like this:

SL_EXTERN void SLsmg_write_char (SLwchar_Type ch);

where there's no such problem any more. Please consider to file a bug against
slang to have it fixed in FC3/FC4, because this bug is actually caused by slang
and you probably don't want to introduce hacks in mc to workaround this bug.
slang-2.0.5 comes with native UTF-8 support so there's no need to apply the ugly
debian slang patch any more. Thus the bug is fixed in rawhide/FC5. I know the
bug was in slang for rather long time, but now it's gone with slang-2.0.5. You
can Cc me on the new slang bug so that I can cooperate with slang maintainer to
have it fixed ASAP in FC3/4.

Thanks for the investigation.

Comment 19 Dmitry Butskoy 2005-12-30 13:30:41 UTC

> Please consider to file a bug against slang 
I am not sure whether it is a bug or a feature...

> don't want to introduce hacks in mc to workaround this bug.
What about FC3/FC4 CVS branches only?

Or maybe compile for FC3/FC4 --with-screen=mcslang, as bundled slang (AFAIK) is
2.0.5 ...

Comment 20 Jindrich Novy 2005-12-30 13:50:43 UTC

All FC3/FC4/rawhide mcs are almost the same so that I'll enable the internal
slang-2.0.5 in mc for FC3/FC4. It sounds good to me as there's no need to touch
the system slang then.

Comment 21 Dmitry Butskoy 2005-12-30 14:26:44 UTC

I've tried to compile with the internal slang. Unfortunately, the national
characters are completely garbled with it...
Perhaps it is because the internal slang actually is a stripped version
(According to its slang/README file).

Jindrich,
IMHO it would be better either apply my patch for FC3/FC4 only, or just ignore
the part related to SLsmg_write_nchars(), as it is not too critical.

Using internal "stripped" slang can cause a lot of yet unknown side effects etc...

Comment 22 Jindrich Novy 2005-12-30 14:33:33 UTC

Ok, I applied your fix without the SLsmg_write_nchars() stuff in rawhide
already, so that I'll sync FC3/FC4 to rawhide mc and release updates after a while.

Note You need to log in before you can comment on or make changes to this bug.