Red Hat Bugzilla – Bug 188079
mc with utf8 patch displays random garbage on non-unicode terminals
Last modified: 2013-07-02 19:15:11 EDT
Description of problem:
Using mcview (whether as the internal viewer or as an external program) on a file not
containing proper UTF-8 data makes it display random-looking garbage (e.g.
"Za|Ã³B^ g^[l^ jazD." (actually mcedit reveals it's trying to display "Za|Ã³B^G
g^Y[l^E jazD.") instead of "ZaÅ¼Ã³ÅÄ geÅlÄ
jaÅºÅ." in ISO-8859-2) on non-unicode
terminals (or rather just having LANG without .UTF-8).
Version-Release number of selected component (if applicable):
Always since one of the FC3 updates when it broke (the first version from FC3
was safe, all earlier versions too). Then I just installed official mc instead
of the broken one shipped with Fedora, but enough is enough, it's time to fix it! :)
When I have LANG pl_PL.UTF-8, my ISO-8859-2 characters show as dots (even if I
set ISO-8859-2 as the "Input / display codepage", that option is still a mystery
to me). But that's okay up to a point; personally I wouldn't even expect mc to
convert codepages/charsets on the fly if not for the existence of the mentioned option.
Besides, I don't use UTF-8 mc other than for today's tests.
I don't use UTF-8 in terminal (I even proposed a patch for one of the main
reasons, which went ignored, I'm still waiting though :)) and expect mcview to
properly display things I write in files.
It seems to me mbrtowc is somewhat broken (it depends on LC_CTYPE as the man
page says) when I have LANG set to pl_PL. The fix is not to use it, I applied it
to view.c when the terminal is not UTF-8-ready. The "codepage" option doesn't
work anyhow [I tried to dig up the reason and saw: Breakpoint 1,
init_translation_table (cpsource=-1, cpdisplay=2), don't want to think why
cpsource is -1 when I have], so it's better to allow full 8-bit display than
show something worse than the dots from Unicode view :)
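To illustrate the workaround described above (this is not the attached patch, whose contents are not shown here), a minimal C sketch might decode with mbrtowc() only when the locale's codeset is UTF-8 and otherwise pass each byte through untouched. The helper names codeset_is_utf8(), locale_is_utf8() and next_display_char() are hypothetical and are not part of mc.

```c
#include <langinfo.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: nonzero when a codeset name denotes UTF-8. */
int codeset_is_utf8(const char *codeset)
{
    return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}

/* Hypothetical helper: asks the current LC_CTYPE locale for its codeset.
 * Only meaningful after the program has called setlocale(LC_ALL, ""). */
int locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}

/* Hypothetical helper: fetch the next character to display from buf.
 * In UTF-8 mode a multibyte sequence is decoded with mbrtowc(); in
 * 8-bit mode the raw byte is returned unchanged so the terminal can
 * render it in its own charset. Returns the number of bytes consumed. */
size_t next_display_char(const char *buf, size_t len, int utf8_mode,
                         wchar_t *out, mbstate_t *st)
{
    if (len == 0)
        return 0;

    if (!utf8_mode) {
        /* No decoding at all: mbrtowc() is bypassed on 8-bit terminals. */
        *out = (wchar_t)(unsigned char)buf[0];
        return 1;
    }

    size_t n = mbrtowc(out, buf, len, st);
    if (n == (size_t)-1 || n == (size_t)-2) {
        /* Invalid or incomplete sequence: reset state, show a dot. */
        memset(st, 0, sizeof *st);
        *out = L'.';
        return 1;
    }
    return n == 0 ? 1 : n;   /* an embedded NUL still consumes one byte */
}
```

In 8-bit mode the caller would hand the returned value to a byte-oriented output routine rather than a wide-character one, which is the point of skipping the conversion.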
The patch is very simple and uses an indent of 2 spaces to better show where it goes
and what it does. It's for view.c patched with the mc-utf8.patch. I can make a
patch against mc-utf8.patch or a new mc-utf8.patch, but my approach seems easier
to look at and understand in a second instead of digging it out from somewhere.
I'm not going to deal with mcedit now because it's much harder and I don't even
use it, and you may say you don't give a ... about mc broken somewhere
non-UTF-8. Please comment on this issue, I may supply some info and commit some
time if you're willing to fix it.
Created attachment 127370 [details]
the simplest patch for mcview
Leszek, thanks for the patch. You wrote that you propose some patch, but it's
ignored. Could you please point me to the patch or better: could you attach the
patch here so that I can review/commit it?
I was talking about a reason not to use UTF-8 terminal in the first place. It's
in the bug #186961 and it's not rejected, so I'm just waiting :)
I hope I don't seem rude talking about being ignored; it's not hard to see my
English is poor, so please be tolerant.
The same problem under my KOI8-R locale.
See my variant of a patch below (I use the SLsmg_is_utf8_mode () call, as the "is_utf8"
variable is not upstream code...).
Also some notes:
IMHO, the convert_to_display_c() seems to be unusable with wide characters (wc),
as the source data for mbrtowc() was already converted ("mbbuf =
It seems that tty_print_char() (which is actually SLsmg_write_char()) has a
problem when it writes WCHAR_T characters under a non-UTF8 locale. There was no such
problem under FC3, so it is probably an FC5 slang issue.
As mbrtowc() actually uses glibc's iconv features (i.e., it looks like just a
call to "iconv -f CURR_LOCALE -t WCHAR_T"), maybe it is possible to find a way to
avoid "convert_to_display_c" altogether? I.e., call
"setlocale(LC_CTYPE,<src_codepage>)", do mbrtowc(), and then go back to the
previous LC_CTYPE ?.. Or even just use glibc's iconv_* calls instead of
This way we can provide a possibility to read non-UTF8 texts under UTF8 locale,
and vice versa.
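A rough C sketch of the temporary LC_CTYPE switch suggested above follows. The function name decode_in_locale() is hypothetical, and the locale name passed in must be one that is actually installed on the system. Note that setlocale() changes process-wide state, so this approach is not safe in a multithreaded program.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode one character from buf using the charset
 * of src_locale, then restore the previous LC_CTYPE.
 * Returns the number of bytes consumed (0 for a NUL), or -1 on error. */
int decode_in_locale(const char *src_locale, const char *buf,
                     size_t len, wchar_t *out)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);

    /* Copy the current name: later setlocale() calls may overwrite it. */
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, src_locale) == NULL)
        return -1;                      /* locale not available */

    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t n = mbrtowc(out, buf, len, &st);

    setlocale(LC_CTYPE, saved);         /* go back to the previous locale */

    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;                      /* invalid or incomplete input */
    return (int)n;
}
```

A caller could, for example, pass a locale name such as "pl_PL.ISO-8859-2" (an assumption about what is installed) to read Latin-2 text while the program itself runs under a UTF-8 locale.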
Created attachment 128659 [details]
Some another variant of a patch
See "man 3 iconv", "man iconv_open" etc. It seems that it is possible to use
iconv() instead of mbrtowc() (at least on Linux). This way we can "eat" all the
data into some wc array, and do charset conversions at the same time (even between
UTF-8 and non-UTF-8 etc...).
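The iconv() idea can be sketched in C roughly as below. The function name to_wide() is hypothetical. The "WCHAR_T" encoding name is accepted by glibc's iconv_open(); other iconv implementations may not support it, so treat that as an assumption.

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: convert src (encoded in from_charset) into the
 * wide-character array dst, which has room for dst_cap elements.
 * Returns the number of wide characters written, or -1 on failure
 * (unknown charset, invalid input, or dst too small). */
long to_wide(const char *from_charset, const char *src, size_t src_len,
             wchar_t *dst, size_t dst_cap)
{
    iconv_t cd = iconv_open("WCHAR_T", from_charset);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;             /* iconv() does not modify input */
    size_t in_left = src_len;
    char *out = (char *)dst;
    size_t out_left = dst_cap * sizeof(wchar_t);

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(dst_cap - out_left / sizeof(wchar_t));
}
```

Because the source charset is just a parameter, the same routine would cover both cases mentioned in the comment: non-UTF-8 text under a UTF-8 locale and the reverse.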
------- Additional Comments From firstname.lastname@example.org 2006-06-09 11:09 EST -------
Dmitry, the is_utf8 variable is introduced by the UTF-8 patch, that's true. But
remember that our patches fix a bug in the UTF-8 patch, so there's no point in
trying to fix the fix [:)] using a function that is present even without the buggy patch.
We got mc-4.6.1a-13.FC5 as an update yesterday. %changelog mentions some UTF-8
fixes, but I don't see any changes in mc*utf*patch.
BTW, I looked into mcedit and it looks like it can edit files without problems,
only the displaying part is broken similar to the viewer. So it should be as
easy to fix.
Jindrich, who is the person behind mc-utf8.patch? Can he/she tell us, how can we
help with getting it done? Non-working "display bits" menu looks like someone is
working on it, so maybe we're duplicating someone's efforts?
Personally I'm tired of patching every update :)
------- Additional Comments From email@example.com 2006-06-09 12:04 EST -------
Leszek, the changelog line about UTF-8 fixes is not valid; it was caused by
my cut&paste mistake while backporting to the FC5 branch (the fix is applied in
devel for now). The plan is to incorporate more fixes into the UTF-8 patch (I'm
now looking at Egmont's, Arpi's and Vladimir's fixes) and make the UTF-8 patch
look less ugly. I'll include your patch in the review pool as well, and a
new update is to be expected soon. The only person in charge of the UTF-8 patch
is me at the moment, but the authors of most of the fixes are Jakub Jelinek and
Vladimir Nadvornik. I'm not currently working on support for non-UTF8
locales, so feel free to contribute.
------- Additional Comments From firstname.lastname@example.org 2006-06-13 06:38 EST -------
(for comment #7):
> the is_utf8 variable is introduced by the UTF-8 patch, that's true. But
> remember that our patches fix a bug in the UTF-8 patch, so there's no point in
> trying to fix the fix
"is_utf8" was introduced by "mc-utf8-look-and-feel.patch", not "mc-utf8.patch" ...
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.
If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
Thanks for your help, and we apologize again that we haven't handled
these issues to this point.
We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers
This bug is still present in F8, with mc-4.6.1a-50.20070604cvs.fc8.
By the way, is it just me, or is "Zażółć gęślą jaźń" garbled in my first
comment? It wasn't that way 2 years ago...
Ouch! I was making the comment before logging in and Bugzilla said it will
automatically get rid of NEEDINFO, but didn't. Now it gives me a separate
option. Fixing :)
Seems nothing changed for this in the source code...
Dmitry, when we were writing that, your patch was on its way to F9, check out
*** This bug has been marked as a duplicate of 426756 ***