Bug 154516 - UTF8 patch and single byte codesets
UTF8 patch and single byte codesets
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: mc (Show other bugs)
3
All Linux
medium Severity medium
: ---
: ---
Assigned To: Jindrich Novy
:
Depends On:
Blocks: 155974
  Show dependency treegraph
 
Reported: 2005-04-12 09:29 EDT by Andrew V. Samoilov
Modified: 2013-07-02 19:07 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-04-21 09:18:37 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
There are the same strings in UTF8 and KOI8-R (251 bytes, application/octet-stream)
2005-04-12 11:02 EDT, Andrew V. Samoilov
no flags Details
This patch fixes charset conversion in viewer (971 bytes, patch)
2005-04-13 06:23 EDT, Andrew V. Samoilov
no flags Details | Diff
Warnings, memory leak an possible use of unitialized variable (5.65 KB, patch)
2005-04-14 13:27 EDT, Andrew V. Samoilov
no flags Details | Diff
The non-UTF8 related fixes moved to separate patch. (2.62 KB, patch)
2005-04-15 06:58 EDT, Jindrich Novy
no flags Details | Diff
The complete FC3 UTF8 patch after the update. (31.17 KB, patch)
2005-04-15 07:00 EDT, Jindrich Novy
no flags Details | Diff

  None (edit)
Description Andrew V. Samoilov 2005-04-12 09:29:15 EDT
Description of problem:

UTF8ized mc show single byte codeset characters (>128) as dots '.'
Charset conversion feature is also bkoken.

Version-Release number of selected component (if applicable): 4.6.1-0.12.FC3

How reproducible: Always


Steps to Reproduce:
1. Set locale to en_US.UTF-8
2. Look at any Cyrillic single byte codeset text with mcview/mcedit.
  
Actual results:
Cyryllic characters are shown as dots.

Expected results: Readable text.
Comment 1 Jindrich Novy 2005-04-12 10:16:39 EDT
Hello Andrew,

could you please attach here a sample message in a single byte Cyrillic codeset
to let me see?
Comment 2 Andrew V. Samoilov 2005-04-12 11:02:20 EDT
Created attachment 113022 [details]
There are the same strings in UTF8 and KOI8-R
Comment 3 Leonard den Ottolander 2005-04-12 19:47:46 EDT
Probably the terminal you are using is involved in this. Without more
information on the terminal you are using we can't investigate this report.

When using a gnome-terminal, setting terminal char coding to KOI8-R and running
mc with LANG=ru_RU.KOI8-R things are as expected: UTF-8 string shows up as
double length garbage and KOI8-R shows up nicely. Nothing wrong there.

I expect this to be a limitation of the terminal you are using. I am personnaly
using xterm-192-1 on FC1 for its UTF-8 support or I couldn't use the utf8
patches with xterm at all.

Starting an xterm with LANG=ru_RU.KOI8-R xterm +u8 & and running mc(edit) with
LANG=ru_RU.KOI8-R gives me output that is *somewhat* expected (i.e. input
behaves strangely but string is shown "correctly" (in the sense that the 8 bit
chars are shown correctly)).

Again, I expect the limited support of charsets by the used terminal to be the
culprit, not so much the patch itself. Please provide more information.
Comment 4 Leonard den Ottolander 2005-04-12 19:55:30 EDT
My interpretation of the "correctness" of xterm when invoked with +u8 is
somewhat optimistic. Actually the only result is it shows the second string as
8bit, but still as a Latin charset.
Comment 5 Leonard den Ottolander 2005-04-12 20:01:05 EDT
Steps to Reproduce:
1. Set locale to en_US.UTF-8

Dang. Overlooked this. When using 8 bit charsets you should set your locale
accordingly. There is no magic in this patch to catch these kind of user errors.

Of course it would be nice if the user could override the current locale via a
menu option but that is of later concern.

Anyway, your example will *always* show only 2 of the 4 lines correctly. You
can't expect to be able to mix character sets in *any* application without
explicitely tagging them.

Closing NOTABUG.
Comment 6 Andrew V. Samoilov 2005-04-13 02:45:57 EDT
Hello, Leonard and Jindrich!

Anyway charset conversion feature is broken.

Steps to Reproduce:

1. configure --enable-charset ; make
2. Set locale to ru_RU.KOI8-R
3. Set "Input / display codepage:"  to KOI8-R: F9 -> Options... -> Display bits
4. Open attached ru-utf8-koi8r.txt.gz with mcedit/mcview
5. Try to covert text to CP1251: Ctrl-T

Actual results:

Nothing happens

Expected results:

Converted text.
Comment 7 Andrew V. Samoilov 2005-04-13 06:23:17 EDT
Created attachment 113087 [details]
This patch fixes charset conversion in viewer
Comment 8 Jindrich Novy 2005-04-14 11:00:30 EDT
Hello Andrew,

I'm still unable to see any difference following the steps to reproduce after
applying your fix to the UTF8 patch and having mc compiled with
--enable-charsets feature.

The problem is likely at my side, so please check what's wrong with this:

1. LANG=ru_RU.KOI8-R xterm +u8       (xterm-192-1)
2. export LANG=ru_RU.KOI8-R
3. mc (having display/codepage set to KOI8-R)
4. The UTF8 text turned to two byte mess, KOI8-R text is displayed as latin, not   
   cyrillic one
5. ctrl-T and selecting Windows 1251 translates the latin text to capital 
   letters.
Comment 9 Andrew V. Samoilov 2005-04-14 12:04:12 EDT
(In reply to comment #8)

> 4. The UTF8 text turned to two byte mess,

It's ok for now.

> KOI8-R text is displayed as latin, not cyrillic one

It's possible cyrillic fonts is not installed at your system.
Also KOI8-R was designed to be transliterated to phonetically like
English characters if 8 bit is cleaned.

> 5. ctrl-T and selecting Windows 1251 translates the latin text to capital 
>    letters.

It is possible for reasons above.
Does KOI8-R text on this step without last fix?

Comment 10 Andrew V. Samoilov 2005-04-14 12:11:44 EDT
(In reply to comment #9)
> Does KOI8-R text on this step without last fix?

Is KOI8-R text changed after C-T cp1251 without this fix at your system?
Comment 11 Andrew V. Samoilov 2005-04-14 13:27:44 EDT
Created attachment 113170 [details]
Warnings, memory leak an possible use of unitialized variable
Comment 12 Jindrich Novy 2005-04-15 03:49:06 EDT
wrt comment #10: Nope, the text wasn't changed at all before applying your
patch. It seems like your patch fixes the conversion, thanks!

comment #11:
Yes, there's a memory leak in edit_complete_word_cmd () introduced by the .utf8
patch when UTF8 support is disabled.

Andrew, please consider sending the patch for elimination of the redundant
variable in src/widget.c/backward_delete () also to mc-devel as upstream is also
affected by this.

Leonard, could you please take care of committing this fix in upstream after
it's sent to mc-devel?

In case of edit/editcmd.c/edit_replace_cmd () I replaced

-                           t = (unsigned char *) &sargs[k - 1][0];
+                           t = (mc_wchar_t *) &sargs[k - 1][0];

with

-                           t = (unsigned char *) &sargs[k - 1][0];
+#ifndef UTF8
+                           t = (unsigned char *) &sargs[k - 1][0];
+#else /* UTF8 */
+                           t = (mc_wchar_t *) &sargs[k - 1][0];
+#endif /* UTF8 */

Comment 13 Jindrich Novy 2005-04-15 04:11:18 EDT
I merged the UTF8 related fixes with the .utf8 patch and the rest I moved to
.miscfix patch.
Comment 14 Andrew V. Samoilov 2005-04-15 04:48:16 EDT
(In reply to comment #12)
> In case of edit/editcmd.c/edit_replace_cmd () I replaced
> 
> -                           t = (unsigned char *) &sargs[k - 1][0];
> +                           t = (mc_wchar_t *) &sargs[k - 1][0];
> 
> with
> 
> -                           t = (unsigned char *) &sargs[k - 1][0];
> +#ifndef UTF8
> +                           t = (unsigned char *) &sargs[k - 1][0];
> +#else /* UTF8 */
> +                           t = (mc_wchar_t *) &sargs[k - 1][0];
> +#endif /* UTF8 */

What's a reason for this replace?  

(In reply to comment #13)
> I merged the UTF8 related fixes with the .utf8 patch and the rest I moved to
> .miscfix patch.

Where these patches are available?
Comment 15 Jindrich Novy 2005-04-15 05:35:31 EDT
I replaced this because it's good to separate the non-UTF8ized variant from the
UTF8 one in order to find out that the change was done as UTF8 related just by
looking into the code. This makes the code more comprehensive from that point of
view if you consider we have to rewrite the UTF8 patches in future in order to
have some unified interface for the whole mc. Note when UTF8 is undefined
mc_wchar_t is unsigned char anyway.

The best way how to check the patches is cvsweb interface:
http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/rpms/mc/FC-3/
Comment 16 Andrew V. Samoilov 2005-04-15 06:52:10 EDT
(In reply to comment #15)
> http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/rpms/mc/FC-3/

Unable to determine IP address from host name for cvs.devel.redhat.com
Comment 17 Jindrich Novy 2005-04-15 06:54:41 EDT
Ok, I'll add the patches here as an attachment.
Comment 18 Jindrich Novy 2005-04-15 06:58:30 EDT
Created attachment 113217 [details]
The non-UTF8 related fixes moved to separate patch.
Comment 19 Jindrich Novy 2005-04-15 07:00:25 EDT
Created attachment 113218 [details]
The complete FC3 UTF8 patch after the update.
Comment 20 Jindrich Novy 2005-04-15 07:02:15 EDT
Note that the attached UTF8 patch is bzip2'd.
Comment 21 Leonard den Ottolander 2005-04-16 16:48:02 EDT
 static void
 backward_delete (WInput *in)
 {
-    int i;

?? i is used in the for loop just after it's being declared. Do you suggest we
move the declaration inside the for loop or after the "if (!in->point)"?
Comment 22 Jindrich Novy 2005-04-18 03:34:48 EDT
My fault. I saw this code in the CVS checkout from 24th Mar in belief that I'm
in a freshly checked out directory, where I saw:

static void
backward_delete (WInput *in)
{
    int i;

    if (!in->point)
        return;

    move_buffer_backward(in, in->point - 1);
    in->charpoint = 0;
    in->need_push = 1;
    in->point--;
}

the move_buffer_backward call is now replaced with for loop that uses the i
variable. Sorry for confusion.
Comment 23 Andy Shevchenko 2005-04-21 06:24:23 EDT
The Andrew's patch affects only for 8-bit locales. I try to recode koi8-r coded 
file into utf-8 (under ru_RU.UTF-8 locale) now and have got nothing.

My proposed patch (just for view mode) as UTF-8 part I shall attached some later 
(after clearness).
Comment 24 Andy Shevchenko 2005-04-21 07:59:08 EDT
Another comment to a small changes in the given patch (in menu.c)

Diff:
@@ -112,8 +112,7 @@ create_menu (const char *name, menu_entr
       int len;
 
       menu->wentries = (wchar_t **)
-                       g_malloc (sizeof (wchar_t *) * menu->count
-                                 + wlen * sizeof (wchar_t));
+       g_malloc ((menu->count + wlen) * sizeof (wchar_t));


Is it correct fix? I think we need to use sizeof(wchar_t *) instead of 
sizeof(wchar_t) due to conversation to (wchar_t **) for array.

Comment 25 Andrew V. Samoilov 2005-04-21 08:27:11 EDT
(In reply to comment #24)

Hello, Andy!

> Is it correct fix? I think we need to use sizeof(wchar_t *) instead of 
> sizeof(wchar_t) due to conversation to (wchar_t **) for array.

You are right, it must to be rewritten as 

g_new (wchar_t *, (menu->count + wlen))

Thanks!
Comment 26 Jindrich Novy 2005-04-21 08:36:22 EDT
Andrew, Andy, yes, I've noticed that and it's fixed already, please see bug 155468.
Comment 27 Andy Shevchenko 2005-04-21 08:53:07 EDT
Oh, my mistake - I have no applied patch for viewing & Ctrl-T.
Jindrich, why do you miss that patch in 4.6.1-0.13 release?

P.S> But with applied patch my comment #23 is truth anyway.
Comment 28 Andrew V. Samoilov 2005-04-21 09:08:03 EDT
Jindrich, I think http://mail.gnome.org/archives/mc-devel/2005-April/msg00039.html
will be usefull for you.  It fixes rare segfault in vfs (ftpfs/fish).
Comment 29 Jindrich Novy 2005-04-21 09:18:37 EDT
Andrew, could you please file a separate bug for the ftpfs/fish issue? Yes, this
patch seems to be helpful, thanks!

Andy, which patch do you mean? The Andrew's patch is applied in 4.6.1-0.13 (one
part is merged to UTF8 patch and second part is in separate miscfix patch).

Closing this rawhide as the update is released.
Comment 30 Andy Shevchenko 2005-04-21 09:32:13 EDT
I mean the https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=113087 patch.
In the last FC3 update this attachment is applied w/o any rejections. It seems 
to be missed patch. Am I wrong?
Comment 31 Jindrich Novy 2005-04-22 03:33:24 EDT
Andy, yes I was too much focused to the second patch that I forgot to apply the
first one.

Andrew, please don't hesitate to create new bugs for different issues with
separate patches, this makes tracking much easier and actually prevents me from
forgetting patches. ;)

The first patch is now applied and will occur in the next update after some time
(merged with the UTF8 patch), the 

http://mail.gnome.org/archives/mc-devel/2005-April/msg00039.html

I merged with miscfix patch.

Note You need to log in before you can comment on or make changes to this bug.