Bug 1035606

Summary: xxd manpage got garbled in Japanese locale
Product: [Fedora] Fedora
Component: vim
Version: 22
Hardware: x86_64
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: unspecified
Priority: unspecified
Keywords: Reopened
Reporter: Masa Oshima <moshima.web>
Assignee: Karsten Hopp <karsten>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: karsten, kealthou, noriko
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-09-22 13:29:01 UTC
Attachments (description, flags):
  rpmbuild with modified vim.spec logs (none)
  A patch for vim.spec (none)
  A patch for current newest version vim.spec 7.4.131-1 (none)
  A part of rpmbuild -ba vim.spec log (7.4.131-2) (none)

Description Masa Oshima 2013-11-28 08:07:48 UTC
Description of problem:
  $ LC_MESSAGES=ja_JP.utf8 man xxd
  The page comes out garbled: instead of Japanese text it shows runs of accented Latin letters and control characters.

  I first noticed this on the xxd manpage.
  The same problem affects the other Japanese manpages included in the vim-common package.
  Other Japanese manpages look fine (e.g. LC_MESSAGES=ja_JP.utf8 man nkf).

Version-Release number of selected component (if applicable):
  vim-common-7.4.027-2.fc19.x86_64


How reproducible:
  Always. Just run the commands below.

Steps to Reproduce:
1. $ LC_MESSAGES=ja_JP.utf8 man xxd
2. $ LC_MESSAGES=ja_JP.utf8 man evim
3. $ LC_MESSAGES=ja_JP.utf8 man vim
4. $ LC_MESSAGES=ja_JP.utf8 man vimdiff
5. $ LC_MESSAGES=ja_JP.utf8 man vimtutor

(An example that displays correctly: LC_MESSAGES=ja_JP.utf8 man nkf)

Actual results:
  The manpage is displayed as garbled text instead of Japanese.

Expected results:
  The Japanese text is displayed correctly.

Additional info:
  A garbled manpage is reported by file like this:
  $ zcat /usr/share/man/ja/man1/xxd.1.gz | file -
  /dev/stdin: troff or preprocessor input, UTF-8 Unicode text, with LF, NEL line terminators
  $ 

  A correct manpage is reported by file like this:
  $ zcat /usr/share/man/ja/man1/nkf.1.gz | file -
  /dev/stdin: troff or preprocessor input, UTF-8 Unicode text
  $

Sorry for my poor English.

Comment 1 Masa Oshima 2013-11-30 09:20:03 UTC
Anybody else?

Comment 2 Noriko Mizumoto 2013-12-01 22:12:20 UTC
Hi

Thank you for filing a bug.
This project is not part of the Fedora Project itself; it is one of the upstream projects for the Fedora distribution. It seems to have its own localization team, so it is best to report any bug to them:
http://vim-jp.org/

Comment 3 Masa Oshima 2013-12-02 02:53:25 UTC
Thanks Noriko-san.

I will contact the vim-jp team.

Cheers.

Comment 4 Masa Oshima 2013-12-09 12:53:19 UTC
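Requesting that this bug be reopened.

The explanation below went beyond what I could write in English, so I originally wrote it in Japanese.

I found it suspicious that this is not a problem on other distributions, so before reporting upstream
I decided to investigate as far as I could on my own, and I found time to look into it again.
My conclusion is that vim.spec needs to be fixed.

Details follow.
Around lines 494-497, vim.spec converts the shipped manpage files from latin1 to UTF8.
For the languages other than Japanese that this step processes (French, Italian, Polish, Russian, in processing order):

LANG=fr_FR.utf8 man vim
LANG=it_IT.utf8 man vim
LANG=pl_PL.utf8 man vim
LANG=ru_RU.utf8 man vim

there is no problem, because in each case the step treats a latin1 source file as latin1 and converts it to UTF8.
For Japanese, however, the step wrongly treats a UTF8 source file as latin1 and converts it to UTF8,
so the converted file is garbled. The unusual line-terminator characters (the NEL reported by file in the description) also appear in this converted file.

I will attach an excerpt of the intermediate rpmbuild output that I checked (vim_ja_man_got_garbled.txt) shortly.

The step at lines 494-497 generates UTF8 files from files written in latin1.
The Japanese files, which are already UTF8, therefore apparently need to skip the step at lines 494-497.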
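
The mis-conversion described in this comment can be reproduced on a tiny sample. The following is only a sketch, assuming an iconv that accepts the names latin1 and UTF-8 (glibc's does); hex, sample and garble are hypothetical helper names, not part of vim.spec. It feeds the UTF-8 bytes of 名前, the first heading of the Japanese xxd page, through the same latin1 to UTF-8 conversion that the spec loop runs.

```shell
#!/bin/sh
# Sketch only: reproduces the double encoding outside of rpmbuild.

# Print stdin as space-separated hex bytes (input of up to 16 bytes).
hex() { od -An -tx1 | tr -s ' ' | sed 's/^ //; s/ $//'; }

# The UTF-8 bytes of the heading, e5 90 8d e5 89 8d, written as octal
# escapes so that the script does not depend on its own encoding.
sample() { printf '\345\220\215\345\211\215'; }

# What the spec loop does to every page: read the bytes as latin1.
garble() { iconv -f latin1 -t UTF-8; }

sample | hex            # e5 90 8d e5 89 8d
sample | garble | hex   # c3 a5 c2 90 c2 8d c3 a5 c2 89 c2 8d

# A character whose UTF-8 form contains the byte 0x85 (here e5 85 83)
# ends up containing c2 85, which is U+0085 (NEL).
printf '\345\205\203' | garble | hex   # c3 a5 c2 85 c2 83
```

Every byte at or above 0x80 is re-encoded as a two-byte sequence, so e5 becomes the letter å and 90, 8d and 89 become C1 control characters. That is the pattern visible in the garbled page, and the c2 85 sequence is a plausible source of the NEL line terminators that file reports.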

Comment 5 Masa Oshima 2013-12-09 13:05:48 UTC
Created attachment 834318 [details]
rpmbuild with modified vim.spec logs

This log is an excerpt from "rpmbuild -bb vim.spec", run with a vim.spec that I modified by hand to add calls to the file command.

In French, Italian, Polish, Russian:
Converting ISO-8859 text to UTF-8 Unicode text with "iconv -f latin1 -t UTF8 blar" works fine.

In Japanese:
Converting UTF-8 Unicode text to UTF-8 Unicode text with "iconv -f latin1 -t UTF8 blar" garbles the text.

Comment 6 Masa Oshima 2013-12-09 13:30:33 UTC
Created attachment 834320 [details]
A patch for vim.spec

I wrote a patch for vim.spec.

$ diff -Naur vim.spec-7.4.027-2 vim.spec > vim-manpagefixes-ja-1035606.patch
$ 
$ cat vim-manpagefixes-ja-1035606.patch 
--- vim.spec-7.4.027-2	2013-09-11 12:22:45.000000000 +0000
+++ vim.spec	2013-12-09 11:56:07.074830794 +0000
@@ -491,7 +491,7 @@
 rm -rf %{buildroot}/%{_datadir}/vim/%{vimdir}/doc/vim2html.pl
 rm -f %{buildroot}/%{_datadir}/vim/%{vimdir}/tutor/tutor.gr.utf-8~
 ( cd %{buildroot}/%{_mandir}
-  for i in `find ??/ -type f`; do
+  for i in `find ??/ \( -path "ja/*" -prune \) -o -type f -print`; do
     bi=`basename $i`
     iconv -f latin1 -t UTF8 $i > %{buildroot}/$bi
     mv -f %{buildroot}/$bi $i
$ 

Patch:
$ cd your-spec-dir
$ 
$ patch -p0 < vim-manpagefixes-ja-1035606.patch
$
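
The effect of the modified find expression in this patch can be checked on a throwaway directory tree. This is a sketch under assumptions: GNU find, a temporary directory standing in for %{buildroot}/%{_mandir}, and a hypothetical helper name pages_to_convert.

```shell
#!/bin/sh
# Sketch only: shows which files the patched loop would still convert.

# List the files under the two-letter language directories of "$1",
# skipping everything below ja/.
pages_to_convert() (
  cd "$1" || exit 1
  # ??/ expands to the language directories (fr/ and ja/ in the demo).
  # Paths matching ja/* are pruned; other regular files are printed.
  find ??/ \( -path "ja/*" -prune \) -o -type f -print
)

demo=$(mktemp -d)
mkdir -p "$demo/fr/man1" "$demo/ja/man1"
: > "$demo/fr/man1/vim.1"
: > "$demo/ja/man1/vim.1"

pages_to_convert "$demo"   # fr/man1/vim.1
rm -rf "$demo"
```

Only the French page is listed, so the iconv step never touches the files under ja/. The trade-off is that the language is hard-coded: any other directory that ships UTF-8 pages would need its own exclusion.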

Comment 7 Noriko Mizumoto 2013-12-11 01:03:49 UTC
Hi

It is now identified that this bug is specific to Fedora, and that the cause is in vim.spec (lines 494-497). The lines in question work fine for French, Italian, Polish and Russian (whose files are latin1), but they do not work for Japanese (whose files are already UTF-8). This needs the package maintainer's attention and a fix.
Please see the attachments above.

I have reset this bug's product and component to 'Fedora' and 'vim'.

Comment 8 Masa Oshima 2013-12-23 04:09:05 UTC
Created attachment 840637 [details]
A patch for current newest version vim.spec 7.4.131-1

I added a check with the 'file' command so that iconv is skipped for manpages that are already UTF-8.
As a result, the build now depends on the 'file' command (a new BuildRequires in the vim-common section).
Karsten, could you please accept this patch?


Diff:
$ LANG=en_US.utf8 TZ=UTC0 diff -Naur vim.spec-7.4.131-1 vim.spec > vim-manpage-ja-1035606.patch
$
$ cat vim-manpage-ja-1035606.patch 
--- vim.spec-7.4.131-1	2013-12-17 14:19:06.000000000 +0000
+++ vim.spec	2013-12-22 16:43:39.523683741 +0000
@@ -20,7 +20,7 @@
 URL:     http://www.vim.org/
 Name: vim
 Version: %{baseversion}.%{patchlevel}
-Release: 1%{?dist}
+Release: 2%{?dist}
 License: Vim
 Group: Applications/Editors
 Source0: ftp://ftp.vim.org/pub/vim/unix/vim-%{baseversion}.tar.bz2
@@ -222,6 +222,7 @@
 Conflicts: man-pages-fr < 0.9.7-14
 Conflicts: man-pages-it < 0.3.0-17
 Conflicts: man-pages-pl < 0.24-2
+BuildRequires: file
 Requires: %{name}-filesystem
 
 %description common
@@ -700,6 +701,9 @@
 rm -f %{buildroot}/%{_datadir}/vim/%{vimdir}/tutor/tutor.gr.utf-8~
 ( cd %{buildroot}/%{_mandir}
   for i in `find ??/ -type f`; do
+    if [[ "`file $i`" == *UTF-8\ Unicode\ text* ]]; then
+      continue
+    fi
     bi=`basename $i`
     iconv -f latin1 -t UTF8 $i > %{buildroot}/$bi
     mv -f %{buildroot}/$bi $i
@@ -954,6 +958,9 @@
 %{_datadir}/icons/hicolor/*/apps/*
 
 %changelog
+* Sun Dec 22 2013 Masayuki Oshima <moshima.web> 7.4.131-2
+- fix xxd manpage got garbled in Japanese locale (#1035606)
+
 * Tue Dec 17 2013 Karsten Hopp <karsten> 7.4.131-1
 - patchlevel 131
 
$

Patch:
$ cd your-spec-dir
$ patch -p0 < vim-manpage-ja-1035606.patch
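
The patch above decides whether to skip a page by matching the wording of file's description. A possible alternative, shown here only as a sketch and not what was applied, is to ask iconv itself whether the file already decodes as UTF-8. It assumes an iconv that exits with a non-zero status on invalid input (glibc's does); is_utf8 and to_utf8 are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: an alternative to matching the output of file(1).

# Succeeds when the file already decodes cleanly as UTF-8.
is_utf8() { iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; }

# Convert a latin1 page in place, leaving valid UTF-8 pages untouched.
to_utf8() {
  if is_utf8 "$1"; then
    return 0            # already UTF-8 (or plain ASCII): leave it alone
  fi
  tmp=$(mktemp) || return 1
  if iconv -f latin1 -t UTF-8 "$1" > "$tmp"; then
    cat "$tmp" > "$1"   # overwrite in place, keeping the file's permissions
  fi
  rm -f "$tmp"
}
```

In the spec loop this would be called as to_utf8 $i in place of the iconv and mv lines. It avoids the extra build dependency and does not rely on the exact text that file prints, which is not guaranteed to stay the same across versions. Like the original loop, it still assumes that every page that is not UTF-8 is latin1.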

Comment 9 Masa Oshima 2013-12-23 04:21:15 UTC
Created attachment 840638 [details]
A part of rpmbuild -ba vim.spec log (7.4.131-2)

An excerpt from the log of 'rpmbuild -ba vim.spec' (7.4.131-2).

Compare lines 4, 11, 18, 25 and 32 of the log
with lines 74, 78, 82, 86 and 90.

Comment 10 Masa Oshima 2013-12-23 04:44:16 UTC
Steps to Reproduce:
1. $ LC_MESSAGES=ja_JP.utf8 man xxd

With vim-7.4.131-1 or earlier:
Actual results:
XXD(1)                                           General Commands Manual                                          XXD(1)

å<U+0090>^H<U+0090><U+008D>^H<U+008D>å<U+0089>^H<U+0089><U+008D>^H<U+008D>
(snip)

Expected results:
XXD(1)                                           General Commands Manual                                          XXD(1)

名前
       xxd - 16 進ダンプを作成したり、元に戻したり。
(snip)

Additional info:
With vim-7.4.131-2 (vim-manpage-ja-1035606.patch applied), the expected results are obtained.
The patch fixes the same problem in ja/man1/{vim.1,evim.1,xxd.1,vimtutor.1,vimdiff.1}.* .
It has no effect on the existing manpages for other languages.
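
As a cross-check that the actual output is the expected text encoded twice, the conversion can be run in reverse on the garbled heading. This is only a sketch, assuming an iconv that accepts the names latin1 and UTF-8 (glibc's does); ungarble, hex and garbled_heading are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: undoes one round of "latin1 to UTF-8" on a small sample.

# Reverse the mis-conversion: map each character back to a single byte.
ungarble() { iconv -f UTF-8 -t latin1; }

# Print stdin as space-separated hex bytes (input of up to 16 bytes).
hex() { od -An -tx1 | tr -s ' ' | sed 's/^ //; s/ $//'; }

# The first heading of the garbled page without the ^H overstrikes:
# å U+0090 U+008D å U+0089 U+008D, as UTF-8 bytes in octal escapes.
garbled_heading() {
  printf '\303\245\302\220\302\215\303\245\302\211\302\215'
}

garbled_heading | ungarble | hex   # e5 90 8d e5 89 8d
```

The recovered bytes e5 90 8d e5 89 8d are the UTF-8 encoding of 名前, the heading shown in the expected results. The same filter could in principle be applied to an installed page, for example zcat /usr/share/man/ja/man1/xxd.1.gz | ungarble | file -, but that has not been tried here.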

Comment 11 Jaroslav Reznik 2015-03-03 15:14:36 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 12 Karsten Hopp 2015-09-22 13:29:01 UTC
I've added your patch to the git repository; the next vim update will include this fix. Thanks!