Bug 117053 - utf8 man pages on non-utf8 terminal
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: man
Version: rawhide
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Eido Inoue
QA Contact: Ben Levenson
Reported: 2004-02-27 19:35 UTC by Ville Herva
Modified: 2007-11-30 22:10 UTC (History)

Doc Type: Bug Fix
Last Closed: 2004-02-27 20:08:43 UTC

Description Ville Herva 2004-02-27 19:35:57 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; X11; Linux i686) Opera 7.23  [en]

Description of problem:
> * Tue Feb 10 2004 Adrian Havill <havill> 1.5m2-2
> 
> - add all locale man pages
> - convert all msgs and manpages to utf-8
> - iconv patch no longer needed now that utf-8-to-legacy conversion is not
>   needed

I use man with non-utf8 terminals quite a lot (local rxvt, through an ssh connection from 
Windows). After this change, all man pages have their special characters garbled on 
non-utf8 terminals. 

Are they supposed to work any more? I don't quite follow the logic of why the iconv patch 
was dropped (if I understand what it was doing): not all terminals are, or will be, utf8. 
Think of a shell server: most people log on from non-utf8 terminals (Windows PuTTY etc.).

One of the failing configurations:
man-1.5m2-3
less-382-1
rxvt-2.7.8-4

COLORTERM=rxvt-xpm
LC_ALL=en_US
LESSCHARSET=latin1
TERM=rxvt

Version-Release number of selected component (if applicable):
1.5m2

How reproducible:
Always

Steps to Reproduce:
1. open up any man page on non-utf8 terminal

Actual Results:  
------------------------------------------------------------------------------------------------
NAME
       man - format and display the on-line manual pages
       manpath - determine user�<80><99>s search path for man pages
------------------------------------------------------------------------------------------------



Expected Results:
------------------------------------------------------------------------------------------------
NAME
       man - format and display the on-line manual pages
       manpath - determine user's search path for man pages
------------------------------------------------------------------------------------------------

Comment 1 Eido Inoue 2004-02-27 20:08:43 UTC
The "iconv patch" referred to in the changelog was a special corner
case that only applied to Cyrillic character sets. It was a bad hack
to deal with the transition period (if there was
one) where Western European locales were in UTF-8 but everything else
(especially Cyrillic and CJK) was in local character sets, yet
Cyrillic man pages were mixed UTF-8 and KOI8-R.

The problem with supporting more than UTF-8 is that man itself pipes
through nroff, nroff pipes through groff, and groff pipes through the
pager. Through all those levels of indirection, it is impossible for man
to know what the target terminal's character set is, which is why
everything is output as UTF-8. (It is still possible to have man pages
in non-UTF-8 for the purpose of backwards compatibility, but they will
get normalized to UTF-8. We can do this because we CAN determine
whether a man page is UTF-8 or not; but we cannot determine whether
the terminal is UTF-8 or not.)
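
As an illustration of why the page side is detectable, here is a minimal
sketch that asks iconv to decode a file as UTF-8 and looks only at the exit
status. The function name is_utf8 is made up for this example, and this is
not necessarily how man itself performs the check.

```shell
# Succeeds (exit status 0) if the file decodes cleanly as UTF-8.
# Pure ASCII counts as valid UTF-8 too.
# is_utf8 is a hypothetical helper name, not part of man or iconv.
is_utf8() {
    iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1
}
```

Usage would be something like: is_utf8 some-page.1 && echo "already UTF-8".
This is a heuristic: a legacy-encoded file can occasionally happen to be
valid UTF-8, and installed pages are often compressed and would need to be
unpacked first.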

There is a simple workaround to your problem; simply redefine the
environment variable for PAGER (or better, MANPAGER) so that it
includes an 'iconv -f utf-8 -t your-favorite-charset |'.

Or you can create a shell script wrapper for the pager that performs
the iconv conversion and pipes it to the pager.
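
A rough sketch of such a wrapper follows. The install path and the
MAN_WRAPPER, MAN_CHARSET and MAN_REAL_PAGER variables are names invented
for this example (nothing in man knows about them); the defaults assume a
latin1 terminal and less as the real pager.

```shell
# Hypothetical install location; any path you like would do.
wrapper="${MAN_WRAPPER:-$HOME/bin/man-iconv-pager}"
mkdir -p "$(dirname "$wrapper")"

# Write the wrapper script. The quoted 'EOF' keeps the variables
# unexpanded so they are evaluated when the wrapper runs, not now.
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Convert man's UTF-8 output to the terminal's charset, then page it.
# MAN_CHARSET and MAN_REAL_PAGER are made-up knobs for this sketch.
iconv -f utf-8 -t "${MAN_CHARSET:-latin1}" | ${MAN_REAL_PAGER:-less}
EOF
chmod +x "$wrapper"

# Tell man to use the wrapper instead of calling the pager directly.
MANPAGER="$wrapper"
export MANPAGER
```

Keeping the target charset in a variable means the same script can serve
several terminals, for example by setting MAN_CHARSET differently per login.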

Hope these suggestions help.


Comment 2 Ville Herva 2004-02-27 20:15:40 UTC
Yes it helps. Thank you very much.

I think I now understand the limitations.

I can live with the workaround.

(Perhaps it was good to bring this up, since I fear it will be asked a lot, once fc2 ships.)

Comment 3 Ville Herva 2004-02-27 20:27:04 UTC
Umm, there still seems to be an issue. If I do

MANPAGER='iconv -f utf-8 -t latin1 | less' man man

I get

--------------------------------------------------------------------------------------
man(1)                                                                  man(1)


NAME
       man - format and display the on-line manual pages
       manpath - determine user
iconv: illegal input sequence at position 183
--------------------------------------------------------------------------------------

I.e. it halts when encountering the first non-plain-ASCII character.

rpm -qf =iconv
glibc-common-2.3.3-10



Comment 4 Eido Inoue 2004-02-27 20:49:52 UTC
That's because the apostrophe that man is using for "user's" is a
"pretty apostrophe" (woohoo! the benefits of unicode! pretty
apostrophes and quotes! ;) ) that iconv thinks doesn't exist in latin-1.

Force iconv to "dumb down" the source by adding "//translit" (for
"transliterate into approximate equivalent characters") to the end of
"latin1".
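
Spelled out, that would look roughly like the following. This is a sketch
assuming glibc's iconv and a latin1 terminal; to_latin1 is a helper name
made up here, and the suffix is written in the uppercase form that GNU
iconv documents.

```shell
# Transliterate characters latin1 cannot represent instead of aborting.
# to_latin1 is a hypothetical helper, handy for testing outside of man.
to_latin1() {
    iconv -f utf-8 -t latin1//TRANSLIT
}

# Same pipeline as in comment 3, with the suffix added to the charset.
MANPAGER='iconv -f utf-8 -t latin1//TRANSLIT | less'
export MANPAGER
```

To try it without man, something like printf 'user\342\200\231s\n' | to_latin1
feeds a UTF-8 right single quotation mark through the conversion; on a glibc
system it should come out as a plain apostrophe rather than an error.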

Comment 5 Ville Herva 2004-02-27 21:28:22 UTC
Oh, ok. The //translit trick works. 

Now I'll just have to ponder whether or not a man can live without *pretty* 
apostrophes.

Thanks again. 

Ps: after some serious pondering I took the manly stance that I like my apostrophes 
rough. If the command was called woman, then everything would have to be pretty.

