Description of problem:
Using mcview (whether as the internal viewer or as an external program) on a file that does not contain proper UTF-8 data makes it display random-looking garbage on non-Unicode terminals (or rather, whenever LANG is set without .UTF-8). For example, it shows "Za|óB^ g^[l^ jazD." (mcedit reveals it is actually trying to display "Za|óB^G g^Y[l^E jazD.") instead of "ZażóÅÄ geÅlÄ jaźÅ." in ISO-8859-2.

Version-Release number of selected component (if applicable):
mc-4.6.1a-12.FC5

How reproducible:
Always, since one of the FC3 updates broke it (the first version shipped with FC3 was safe, as were all earlier versions). Back then I just installed the official mc instead of the broken one shipped with Fedora, but enough is enough, it's time to fix it! :)

When I have LANG set to pl_PL.UTF-8, my ISO-8859-2 characters show as dots (even if I set ISO-8859-2 as the "Input / display codepage"; that option is still a mystery to me). That's acceptable up to a point: personally I wouldn't even expect mc to convert codepages/charsets on the fly, if it weren't for the existence of that option. Besides, I don't use mc under UTF-8 other than for today's tests. I don't use UTF-8 in the terminal (I even proposed a patch for one of the main reasons, which went ignored; I'm still waiting though :)) and I expect mcview to properly display what I write in files.

It seems to me that mbrtowc is somewhat broken when I have LANG set to pl_PL (it depends on LC_CTYPE, as the man page says). The fix is not to use it: I changed view.c to skip it when the terminal is not UTF-8-ready. The "codepage" option doesn't work anyway [I tried to dig up the reason and saw: Breakpoint 1, init_translation_table (cpsource=-1, cpdisplay=2), don't want to think why cpsource is -1 when I have], so it's better to allow full 8-bit display than to show something worse than the dots from the Unicode view :)

The patch is very simple and uses an indent of 2 spaces to better show where it goes and what it does. It's for view.c already patched with mc-utf8.patch. I can make a patch against mc-utf8.patch itself, or a new mc-utf8.patch, but my approach seems easier to look at and understand in a second than digging it out of somewhere. I'm not going to deal with mcedit now, because it's much harder and I don't even use it, and you may say you don't give a ... about mc being broken somewhere non-UTF-8. Please comment on this issue; I can supply more info and commit some time if you're willing to fix it.
Created attachment 127370 [details] the simplest patch for mcview
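The attachment is not reproduced in this thread. As a rough illustration of the idea described in the report (skip mbrtowc() when the terminal is not UTF-8 and pass the raw 8-bit byte through), here is a minimal sketch. The names decode_for_display, display_unit and term_is_utf8 are hypothetical and are not taken from view.c or from the patch; the dot shown for undecodable input is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result type: either a wide character or a raw byte. */
typedef struct {
    int is_wide;            /* 1: use wc, 0: use byte */
    wchar_t wc;
    unsigned char byte;
} display_unit;

/* Hypothetical helper (not from view.c): decide what to hand to the
 * screen layer for the bytes at buf[0..len). Returns bytes consumed. */
static size_t
decode_for_display (const char *buf, size_t len, int term_is_utf8,
                    mbstate_t *state, display_unit *out)
{
    assert (buf != NULL && state != NULL && out != NULL && len > 0);

    out->is_wide = 0;
    out->wc = 0;
    out->byte = 0;

    if (!term_is_utf8) {
        /* Non-UTF-8 terminal: do not call mbrtowc() at all,
         * just pass the 8-bit byte through unchanged. */
        out->byte = (unsigned char) buf[0];
        return 1;
    }

    wchar_t wc;
    size_t n = mbrtowc (&wc, buf, len, state);
    if (n == (size_t) -1 || n == (size_t) -2) {
        /* Invalid or incomplete sequence: show a dot (an assumption),
         * reset the shift state and consume one byte. */
        memset (state, 0, sizeof *state);
        out->byte = '.';
        return 1;
    }
    if (n == 0)
        n = 1;              /* embedded NUL occupies one byte */

    out->is_wide = 1;
    out->wc = wc;
    return n;
}
```

With term_is_utf8 set to 0, a byte such as 0xB1 (an ISO-8859-2 letter) comes back untouched, which is the "full 8-bit display" the report asks for.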
Leszek, thanks for the patch. You wrote that you propose some patch, but it's ignored. Could you please point me to the patch or better: could you attach the patch here so that I can review/commit it?
I was talking about a reason not to use a UTF-8 terminal in the first place. It's in bug #186961 and it hasn't been rejected, so I'm just waiting :) I hope I don't seem rude when speaking about being ignored. It's not so hard to see that my English is poor, so please be tolerant.
The same problem occurs under my KOI8-R locale. See my variant of the patch below (I use the SLsmg_is_utf8_mode () call, as the "is_utf8" variable is not upstream code...).

Also some notes:

IMHO, convert_to_display_c() seems to be unusable with wide characters (wc), as the source data for mbrtowc() has already been converted ("mbbuf[0] = convert_to_display_c(c) ...").

It seems that tty_print_char() (which is actually SLsmg_write_char()) has a problem when it writes WCHAR_T characters under a non-UTF-8 locale. There was no such problem under FC3, so it is probably an issue in FC5's slang.

As mbrtowc() actually uses glibc's iconv features (i.e., it looks like just a call to "iconv -f CURR_LOCALE -t WCHAR_T"), maybe it is possible to avoid convert_to_display_c() altogether? I.e., call "setlocale(LC_CTYPE,<src_codepage>)", do mbrtowc(), and then go back to the previous LC_CTYPE?.. Or even just use glibc's iconv_* calls instead of mbrtowc()??... This way we could make it possible to read non-UTF-8 texts under a UTF-8 locale, and vice versa.
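The setlocale() idea from the comment above could look roughly like the sketch below. It is only an illustration: decode_with_locale is a hypothetical name, the locale passed in (for example "ru_RU.KOI8-R") must actually be installed on the system, and setlocale() changes process-wide state, so this is not safe with threads.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode src[0..src_len) with the LC_CTYPE named
 * by src_locale, then restore the previous LC_CTYPE.
 * Returns the number of wide characters written, or (size_t) -1. */
static size_t
decode_with_locale (const char *src_locale, const char *src, size_t src_len,
                    wchar_t *dst, size_t dst_cap)
{
    assert (src_locale != NULL && src != NULL && dst != NULL);

    /* Save a copy of the current setting; the returned string may be
     * overwritten by the next setlocale() call. */
    const char *cur = setlocale (LC_CTYPE, NULL);
    if (cur == NULL)
        return (size_t) -1;
    size_t cur_len = strlen (cur) + 1;
    char *saved = malloc (cur_len);
    if (saved == NULL)
        return (size_t) -1;
    memcpy (saved, cur, cur_len);

    if (setlocale (LC_CTYPE, src_locale) == NULL) {
        free (saved);
        return (size_t) -1;     /* locale not available */
    }

    mbstate_t st;
    memset (&st, 0, sizeof st);
    size_t in = 0, count = 0;
    int failed = 0;
    while (in < src_len && count < dst_cap) {
        size_t n = mbrtowc (&dst[count], src + in, src_len - in, &st);
        if (n == (size_t) -1 || n == (size_t) -2) {
            failed = 1;         /* invalid or truncated input */
            break;
        }
        if (n == 0)
            n = 1;              /* embedded NUL occupies one byte */
        in += n;
        count++;
    }

    /* Go back to the previous LC_CTYPE, as suggested in the comment. */
    setlocale (LC_CTYPE, saved);
    free (saved);
    return failed ? (size_t) -1 : count;
}
```

Switching the locale for every chunk of text is likely to be slow, which is one reason the iconv_* alternative mentioned in the same comment may be more attractive.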
Created attachment 128659 [details] Some another variant of a patch
Yep, see "man 3 iconv", "man iconv_open" etc. It seems that it is possible to use iconv() instead of mbrtowc() (at least on Linux). This way we can "eat" all the data into a wc array and do the charset conversion at the same time (even between UTF-8 and non-UTF-8 etc...).
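A minimal sketch of the iconv() variant follows, assuming an iconv implementation that accepts "WCHAR_T" as a target encoding name (glibc does). bytes_to_wchars is a hypothetical helper, not mc code, and it handles only the simple case where the whole input fits into the output buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: convert src[0..src_len) from from_charset
 * (e.g. "KOI8-R" or "UTF-8") straight into a wchar_t array.
 * Returns the number of wide characters written, or (size_t) -1. */
static size_t
bytes_to_wchars (const char *from_charset, const char *src, size_t src_len,
                 wchar_t *dst, size_t dst_cap)
{
    assert (from_charset != NULL && src != NULL && dst != NULL);

    iconv_t cd = iconv_open ("WCHAR_T", from_charset);
    if (cd == (iconv_t) -1)
        return (size_t) -1;     /* conversion not supported */

    char *in = (char *) src;    /* iconv() does not modify the input */
    size_t in_left = src_len;
    char *out = (char *) dst;
    size_t out_left = dst_cap * sizeof (wchar_t);

    size_t rc = iconv (cd, &in, &in_left, &out, &out_left);
    iconv_close (cd);
    if (rc == (size_t) -1)
        return (size_t) -1;     /* invalid input or output too small */

    return dst_cap - out_left / sizeof (wchar_t);
}
```

Because the source charset is an explicit argument, the same call covers both directions mentioned above: non-UTF-8 files under a UTF-8 locale and the reverse, without touching LC_CTYPE.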
------- Additional Comments From lam 2006-06-09 11:09 EST -------
Dmitry, the is_utf8 variable is introduced by the UTF-8 patch, that's true. But remember that our patches fix a bug in the UTF-8 patch, so there's no point in trying to fix the fix [:)] using a function that is present even without the buggy patch.

We got mc-4.6.1a-13.FC5 as an update yesterday. The %changelog mentions some UTF-8 fixes, but I don't see any changes in mc*utf*patch.

BTW, I looked into mcedit and it looks like it can edit files without problems; only the display part is broken, similar to the viewer. So it should be just as easy to fix.

Jindrich, who is the person behind mc-utf8.patch? Can he/she tell us how we can help with getting it done? The non-working "display bits" menu looks like someone is working on it, so maybe we're duplicating someone's efforts? Personally I'm tired of patching every update :)
------- Additional Comments From jnovy 2006-06-09 12:04 EST -------
Leszek, the changelog line related to UTF-8 fixes is not valid; it was caused by my cut&paste mistake while backporting to the FC5 branch (the fix is applied only in devel for now).

The plan is to incorporate more fixes into the UTF-8 patch (I'm now looking at Egmont's, Arpi's and Vladimir's fixes) and to make the UTF-8 patch look less ugly. I'll include your patch in the review pool as well, and a new update is to be expected soon.

The only person in charge of the UTF-8 patch at the moment is me, but the authors of most of the fixes are Jakub Jelinek and Vladimir Nadvornik. I'm not currently working on support for non-UTF-8 locales, so feel free to contribute.
------- Additional Comments From dmitry 2006-06-13 06:38 EST -------
(for comment #7):
> the is_utf8 variable is introduced by the UTF-8 patch, that's true. But
> remember that our patches fix a bug in the UTF-8 patch, so there's no point in
> trying to fix the fix

"is_utf8" was introduced by "mc-utf8-look-and-feel.patch", not "mc-utf8.patch" ...
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out
http://fedoraproject.org/wiki/BugZappers
This bug is still present in F8, with mc-4.6.1a-50.20070604cvs.fc8. By the way, is it just me, or is "Zażółć gęślą jaźń" garbled in my first comment? It wasn't that way 2 years ago...
Ouch! I was making the comment before logging in and Bugzilla said it will automatically get rid of NEEDINFO, but didn't. Now it gives me a separate option. Fixing :)
Seems nothing changed for this in the source code...
Dmitry, while we were writing that, your patch was on its way to F9; check out bug #426756.
*** This bug has been marked as a duplicate of 426756 ***