Description of problem:
$ LC_MESSAGES=ja_JP.utf8 man xxd
The page comes out garbled, full of broken UTF-8 byte sequences (it looked to me like UTF-8 surrogate-pair encoding). I noticed this while reading the xxd manpage. The same issue occurs with the other Japanese-locale manpages included in the vim-common package. Other Japanese manpages look fine (e.g. LC_MESSAGES=ja_JP.utf8 man nkf).

Version-Release number of selected component (if applicable):
vim-common-7.4.027-2.fc19.x86_64

How reproducible:
Always; just run the commands below.

Steps to Reproduce:
1. $ LC_MESSAGES=ja_JP.utf8 man xxd
2. $ LC_MESSAGES=ja_JP.utf8 man evim
3. $ LC_MESSAGES=ja_JP.utf8 man vim
4. $ LC_MESSAGES=ja_JP.utf8 man vimdiff
5. $ LC_MESSAGES=ja_JP.utf8 man vimtutor
(An example that works: LC_MESSAGES=ja_JP.utf8 man nkf)

Actual results:
The manpage is displayed as garbled UTF-8 byte sequences.

Expected results:
The Japanese characters are displayed correctly.

Additional info:
A broken manpage looks like this:
$ zcat /usr/share/man/ja/man1/xxd.1.gz | file -
/dev/stdin: troff or preprocessor input, UTF-8 Unicode text, with LF, NEL line terminators
$
A valid manpage looks like this:
$ zcat /usr/share/man/ja/man1/nkf.1.gz | file -
/dev/stdin: troff or preprocessor input, UTF-8 Unicode text
$
I am sorry for my poor English.
Anybody else?
Hi, thank you for filing a bug. This project is not itself part of the Fedora Project; it is one of the upstream projects for Fedora distributions. It appears to have its own separate localization team, and it is best to contact them about any bug: http://vim-jp.org/
Thanks, Noriko-san. I intend to contact the vim-jp team. Cheers.
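Re-open request. The explanation below was originally written in Japanese, because it goes beyond what I can express in English.

I found it suspicious that this is not a problem on other distributions, so before reporting it upstream I decided to investigate as far as I could myself, and I found time to look into it again. My conclusion is that vim.spec needs to be fixed. Details follow.

Around lines 494-497, vim.spec converts the files shipped upstream from latin1 to UTF8. For the natural languages other than Japanese that this step processes (French, Italian, Polish, Russian, in processing order):
LANG=fr_FR.utf8 man vim
LANG=it_IT.utf8 man vim
LANG=pl_PL.utf8 man vim
LANG=ru_RU.utf8 man vim
none of these shows a problem, because in each case the step "treats a latin1 source file as latin1 and converts it to UTF8". For Japanese, however, the step wrongly "treats a UTF8 source file as latin1 and converts it to UTF8", so the converted file comes out garbled. The odd line-terminator characters that file reports also appear in this converted file. I will attach an excerpt of the intermediate rpmbuild output that I checked (vim_ja_man_got_garbled.txt) later.

The processing at lines 494-497 "generates a UTF8 file from a file written in latin1". The Japanese files, which are already UTF8, therefore apparently need to skip the processing at lines 494-497.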
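The mis-conversion described above can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original report; it assumes that Python's "latin-1" codec behaves like iconv's latin1 (each byte maps to the code point of the same value), and all variable names are illustrative. It uses the heading of the Japanese xxd page as sample text.

```python
# Sketch: what "iconv -f latin1 -t UTF8" does to a file that is already UTF-8.
original = "\u540d\u524d"  # the Japanese heading for "NAME" on the xxd page

utf8_bytes = original.encode("utf-8")      # bytes E5 90 8D E5 89 8D
garbled = utf8_bytes.decode("latin-1")     # each byte is misread as one character
double_encoded = garbled.encode("utf-8")   # what ends up in the converted file

print([f"U+{ord(c):04X}" for c in garbled])
# -> ['U+00E5', 'U+0090', 'U+008D', 'U+00E5', 'U+0089', 'U+008D']

# Some UTF-8 sequences contain the byte 0x85 (U+5143 encodes as E5 85 83).
# Misread as latin1, that byte becomes U+0085 (NEL), which would explain the
# "NEL line terminators" that file reports for the broken page.
has_nel = "\u0085" in "\u5143".encode("utf-8").decode("latin-1")

# The damage is reversible as long as nothing else touched the bytes.
recovered = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
```

The printed code points correspond to the å, U+0090, U+008D pattern that man displays for the broken page.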
Created attachment 834318 [details] rpmbuild with modified vim.spec logs
This log is an excerpt from "rpmbuild -bb vim.spec", run with a vim.spec that I modified by hand to add file commands.
French, Italian, Polish, Russian: from ISO-8859 text to UTF-8 Unicode text. "iconv -f latin1 -t UTF8 blar" works fine.
Japanese: from UTF-8 Unicode text to UTF-8 Unicode text. "iconv -f latin1 -t UTF8 blar" garbles the page.
Created attachment 834320 [details] A patch for vim.spec
I wrote a patch for vim.spec.

$ diff -Naur vim.spec-7.4.027-2 vim.spec > vim-manpagefixes-ja-1035606.patch
$
$ cat vim-manpagefixes-ja-1035606.patch
--- vim.spec-7.4.027-2 2013-09-11 12:22:45.000000000 +0000
+++ vim.spec 2013-12-09 11:56:07.074830794 +0000
@@ -491,7 +491,7 @@
 rm -rf %{buildroot}/%{_datadir}/vim/%{vimdir}/doc/vim2html.pl
 rm -f %{buildroot}/%{_datadir}/vim/%{vimdir}/tutor/tutor.gr.utf-8~
 ( cd %{buildroot}/%{_mandir}
-  for i in `find ??/ -type f`; do
+  for i in `find ??/ \( -path "ja/*" -prune \) -o -type f -print`; do
     bi=`basename $i`
     iconv -f latin1 -t UTF8 $i > %{buildroot}/$bi
     mv -f %{buildroot}/$bi $i
$

To apply the patch:
$ cd your-spec-dir
$
$ patch -p0 < vim-manpagefixes-ja-1035606.patch
$
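The selection rule in this patch (convert every file under a two-character locale directory, except anything under ja/) can be sketched as follows. This is an illustrative Python equivalent, not the code used in vim.spec; the function name pages_to_convert and the skip_locales parameter are hypothetical, and the demo directory tree is made up for the example.

```python
import tempfile
from pathlib import Path


def pages_to_convert(mandir, skip_locales=("ja",)):
    """List files under two-character locale directories, minus skipped locales."""
    selected = []
    for locale_dir in sorted(mandir.glob("??")):  # same pattern as `find ??/`
        if not locale_dir.is_dir() or locale_dir.name in skip_locales:
            continue  # plays the role of the -prune on ja/*
        selected.extend(sorted(p for p in locale_dir.rglob("*") if p.is_file()))
    return selected


# Demo on a throwaway tree that mimics %{_mandir}.
with tempfile.TemporaryDirectory() as tmp:
    mandir = Path(tmp)
    for locale in ("fr", "it", "ja", "pl", "ru"):
        page = mandir / locale / "man1" / "vim.1"
        page.parent.mkdir(parents=True)
        page.write_bytes(b".TH VIM 1\n")
    (mandir / "man1").mkdir()  # not a locale directory; "??" does not match it
    selected_locales = [p.relative_to(mandir).parts[0] for p in pages_to_convert(mandir)]

print(selected_locales)  # -> ['fr', 'it', 'pl', 'ru']
```

A path-based rule like this is simple, but it hard-codes the assumption that ja is the only locale whose pages are already UTF-8.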
Hi, it has now been established that this bug is specific to Fedora and that the cause is in vim.spec (lines 494-497). The lines in question work fine for French, Italian, Polish and Russian (latin1 sources), but they do not work for Japanese (UTF-8 sources). This needs the package maintainer's attention and a fix. Please see the attachments above. The product and component of this bug have now been reset to 'Fedora' and 'vim'.
Created attachment 840637 [details] A patch for the current newest version, vim.spec 7.4.131-1
I added a check using the 'file' command so that iconv is skipped for manpages that are already UTF-8. As a result, the vim-common package now has a new build dependency on the 'file' command. Master Karsten, could you please accept this humble pull request?

Diff:
$ LANG=en_US.utf8 TZ=UTC0 diff -Naur vim.spec-7.4.131-1 vim.spec > vim-manpage-ja-1035606.patch
$
$ cat vim-manpage-ja-1035606.patch
--- vim.spec-7.4.131-1 2013-12-17 14:19:06.000000000 +0000
+++ vim.spec 2013-12-22 16:43:39.523683741 +0000
@@ -20,7 +20,7 @@
 URL: http://www.vim.org/
 Name: vim
 Version: %{baseversion}.%{patchlevel}
-Release: 1%{?dist}
+Release: 2%{?dist}
 License: Vim
 Group: Applications/Editors
 Source0: ftp://ftp.vim.org/pub/vim/unix/vim-%{baseversion}.tar.bz2
@@ -222,6 +222,7 @@
 Conflicts: man-pages-fr < 0.9.7-14
 Conflicts: man-pages-it < 0.3.0-17
 Conflicts: man-pages-pl < 0.24-2
+BuildRequires: file
 Requires: %{name}-filesystem
 
 %description common
@@ -700,6 +701,9 @@
 rm -f %{buildroot}/%{_datadir}/vim/%{vimdir}/tutor/tutor.gr.utf-8~
 ( cd %{buildroot}/%{_mandir}
   for i in `find ??/ -type f`; do
+    if [[ "`file $i`" == *UTF-8\ Unicode\ text* ]]; then
+      continue
+    fi
     bi=`basename $i`
     iconv -f latin1 -t UTF8 $i > %{buildroot}/$bi
     mv -f %{buildroot}/$bi $i
@@ -954,6 +958,9 @@
 %{_datadir}/icons/hicolor/*/apps/*
 
 %changelog
+* Sun Dec 22 2013 Masayuki Oshima <moshima.web> 7.4.131-2
+- fix xxd manpage got garbled in Japanese locale (#1035606)
+
 * Tue Dec 17 2013 Karsten Hopp <karsten> 7.4.131-1
 - patchlevel 131
 
$

To apply the patch:
$ cd your-spec-dir
$ patch -p0 < vim-manpage-ja-1035606.patch
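The idea behind this second patch, deciding by content rather than by path, can be sketched in Python. Note the substitution: the patch asks the file command whether a page is "UTF-8 Unicode text", whereas this sketch uses a strict UTF-8 decode as a stand-in for that check. The function name to_utf8 and the sample strings are hypothetical.

```python
def to_utf8(data):
    """Return data as UTF-8 bytes, converting from latin1 only when needed."""
    try:
        data.decode("utf-8")  # strict decode: raises if data is not valid UTF-8
        return data           # already UTF-8 (or plain ASCII): leave untouched
    except UnicodeDecodeError:
        return data.decode("latin-1").encode("utf-8")


french_latin1 = "\u00e9dition".encode("latin-1")  # a lone byte E9 is not valid UTF-8
japanese_utf8 = "\u540d\u524d".encode("utf-8")    # heading of the Japanese xxd page

converted_fr = to_utf8(french_latin1)  # converted from latin1 to UTF-8
converted_ja = to_utf8(japanese_utf8)  # returned unchanged
```

A content check of this kind makes the conversion idempotent, so running it twice over the same tree does no further damage. The trade-off is that a latin1 file whose bytes happen to form valid UTF-8 would be skipped, which is unlikely for real text but not impossible.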
Created attachment 840638 [details] A part of rpmbuild -ba vim.spec log (7.4.131-2)
An excerpt from the 'rpmbuild -ba vim.spec' log (7.4.131-2). Compare lines 4, 11, 18, 25 and 32 with lines 74, 78, 82, 86 and 90.
Steps to Reproduce:
1. $ LC_MESSAGES=ja_JP.utf8 man xxd
with vim-7.4.131-1 or earlier

Actual results:
XXD(1) General Commands Manual XXD(1)
å<U+0090>^H<U+0090><U+008D>^H<U+008D>å<U+0089>^H<U+0089><U+008D>^H<U+008D>
(snip)

Expected results:
XXD(1) General Commands Manual XXD(1)
名前
xxd - 16 進ダンプを作成したり、元に戻したり。
(snip)

Additional info:
With vim-7.4.131-2 (vim-manpage-ja-1035606.patch applied), the expected results are obtained. The patch fixes the same problem in ja/man1/{vim.1,evim.1,xxd.1,vimtutor.1,vimdiff.1}.* and has no effect on the existing manpages for other natural languages.
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle. Changing version to '22'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22
I've added your patch to the git repository, the next vim update will have this fix. Thanks !