Bug 39139 - man fails in UTF-8 locales
Summary: man fails in UTF-8 locales
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: less
Version: 7.1
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Karsten Hopp
QA Contact: Aaron Brown
URL: http://mail.nl.linux.org/linux-utf8/2...
Whiteboard:
Depends On:
Blocks:
Reported: 2001-05-04 21:40 UTC by Markus Kuhn
Modified: 2007-04-18 16:33 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-03-08 13:42:53 UTC
Embargoed:


Description Markus Kuhn 2001-05-04 21:40:55 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.3-0.3.cl i686; Nav)

Description of problem:
The display of man pages in UTF-8 locales fails, because
groff is not called with the -Tutf8 option.

How reproducible:
Always

Steps to Reproduce:
1. Open a UTF-8 shell, for instance with

  LANG=en_GB.UTF-8 xterm -fn \
    '-Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1'

2. Enter

  man man
  man groff_char

	

Actual Results:  Soft hyphens at line ends get ISO 8859-1 encoded, even
though we are in a UTF-8 locale, which garbles them. Most of the non-ASCII
characters listed on the groff_char page get garbled, as one would expect
if the encoding used by groff and the locale don't match.
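
The mismatch can be illustrated without man or groff. The following Python
sketch is an illustration added for clarity, not part of the original report,
and the helper name is_valid_utf8 is hypothetical. It shows that the
ISO 8859-1 encoding of the soft hyphen is the single byte 0xAD, which is not a
valid UTF-8 sequence on its own, so a UTF-8 terminal cannot render it as a
soft hyphen.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# What a latin1 output device emits for a soft hyphen.
latin1_bytes = SOFT_HYPHEN.encode("iso-8859-1")
# What a UTF-8 terminal expects for the same character.
utf8_bytes = SOFT_HYPHEN.encode("utf-8")

print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

A lone 0xAD byte is a UTF-8 continuation byte with no lead byte, which is why
the terminal falls back to its default character.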

Expected Results:  The man and groff_char pages should have been displayed
flawlessly, without occurrences of the default character (dashed box) that
signals a malformed UTF-8 sequence.

Additional info:

The fix is simple:

In file /etc/man.config, change the line

  NROFF           /usr/bin/groff -Tlatin1 -mandoc

to

  NROFF           /usr/bin/nroff -mandoc

Note that nroff is already a shell script that tests the encoding used in
the current locale (using "locale charmap") and then calls groff with the
appropriate -T option.
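
The selection logic described above can be sketched roughly as follows. This
is a minimal Python illustration, not the actual nroff shell script: the
function names, the two-entry mapping table, and the fallback to the ascii
device are assumptions made for the example. Only "locale charmap", the -T
option, and -mandoc come from this report.

```python
import subprocess

# Assumed mapping from locale charmap names to groff output devices.
# The real nroff script may recognise more encodings than these two.
CHARMAP_TO_DEVICE = {
    "UTF-8": "utf8",
    "ISO-8859-1": "latin1",
}


def groff_device(charmap: str) -> str:
    """Pick a groff -T device for a charmap; fall back to ascii (assumption)."""
    return CHARMAP_TO_DEVICE.get(charmap.strip(), "ascii")


def current_charmap() -> str:
    """Ask 'locale charmap' for the encoding of the active locale."""
    try:
        result = subprocess.run(
            ["locale", "charmap"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""  # 'locale' not available; caller falls back to ascii
    return result.stdout.strip()


def nroff_command(charmap: str) -> list:
    """Build the groff invocation that such a wrapper would run."""
    return ["groff", "-T" + groff_device(charmap), "-mandoc"]


if __name__ == "__main__":
    print(nroff_command(current_charmap()))
```

With a hard-coded -Tlatin1 in /etc/man.config this choice is never made, which
is why the output encoding does not follow the locale.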

Note that the less version that you ship has a bug with bold and
underlining in UTF-8 locales. A fix has been posted:

  http://mail.nl.linux.org/linux-utf8/2001-05/msg00023.html

and will hopefully soon be integrated into a new release of "less".

Comment 1 Bernhard Rosenkraenzer 2001-05-17 13:31:24 UTC
Fixed in man 1.5i-5; assigning to less so the patch you mentioned can be 
included in our next build.


Comment 2 Karsten Hopp 2001-06-25 13:13:34 UTC
The patch isn't necessary as we already have an i18n patch for that.
The garbled characters seem to be caused by incomplete fonts.
Try 'export PAGER=cat' and you'll see the same result.


Comment 3 Markus Kuhn 2001-06-25 15:00:15 UTC
Reply to comment by karsten:

My bug report was definitely *not* related to garbled fonts. The xterm I use is
perfectly able to display standard UTF-8 test files such as

  http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

The problem with "less" that was fixed in the quoted patch is a bug in the
original UTF-8 support in less, which caused a misinterpretation of
bold-by-backspace in UTF-8 mode. This is a problem of less, not of my terminal
emulator or the font. (There might be an additional problem with your fonts, but
that is a separate issue; use the xterm command line options shown above.)

The less problem can be reproduced as follows:

Test case:
 
   perl -e 'use utf8; print "a\ba_\bb\n"' | less
 
correctly shows a bold "a" and an underlined "b", but
 
   perl -e 'use utf8; print "\x{20ac}\b\x{20ac}_\b\x{2203}\n"' | less

fails to show either a bold euro sign or an underlined there-exists sign.
(Perl 5.6 or newer required here).
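
For readers unfamiliar with the overstrike convention, here is a small Python
sketch of what the test input contains and how it is meant to be read. It is
an illustration only, not the code of less or of the posted patch, and all
function names are hypothetical. The point is that the backspace pairs up
whole characters, so a multi-byte UTF-8 sequence such as the euro sign must be
treated as one unit.

```python
def overstrike_bold(ch: str) -> str:
    """Bold by backspace: the character, a backspace, the character again."""
    return ch + "\b" + ch


def overstrike_underline(ch: str) -> str:
    """Underline by backspace: an underscore, a backspace, the character."""
    return "_\b" + ch


def parse_overstrike(text: str) -> list:
    """Return (character, style) pairs, working on code points, not bytes."""
    out = []
    i = 0
    while i < len(text):
        # Look for the pattern X, backspace, Y starting at position i.
        if i + 2 < len(text) and text[i + 1] == "\b":
            first, second = text[i], text[i + 2]
            if first == second:
                out.append((second, "bold"))
                i += 3
                continue
            if first == "_":
                out.append((second, "underline"))
                i += 3
                continue
        out.append((text[i], "plain"))
        i += 1
    return out


# Same content as the second Perl test case above.
sample = overstrike_bold("\u20ac") + overstrike_underline("\u2203") + "\n"
raw = sample.encode("utf-8")  # the byte stream a pager receives on its input

print(raw)
print(parse_overstrike(sample))
```

A pager that applies the same rule byte by byte would compare only part of the
three-byte euro sequence around the backspace, which is consistent with the
malformed output described in this report.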

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


Comment 4 Markus Kuhn 2002-05-01 18:45:06 UTC
I just confirmed that this bug is still present in the less-358-21 RPM that
comes with Red Hat Linux 7.2. It is still a bug and you didn't fix it yet.

To demonstrate to Karsten again why the earlier reply was incorrect:

It is *not* a font problem, because

  perl -e 'use utf8; print "\x{20ac}\x{2203}\n"' | less

correctly shows the euro and there-exists signs on my xterm in UTF-8 mode, but

  perl -e 'use utf8; print "\x{20ac}\b\x{20ac}_\b\x{2203}\n"' | less

shows that less garbles bold and underlined UTF-8 characters and sends
malformed UTF-8 sequences to the terminal instead, because of a simple
bug that is fixed by the patch posted at

  http://mail.nl.linux.org/linux-utf8/2001-05/msg00023.html



Comment 5 Karsten Hopp 2002-06-18 23:05:38 UTC
fixed in less-358-27 


Comment 6 John Levon 2003-09-09 02:13:27 UTC
What's the story here? RH8 still appears to have this bug (i.e.
with LANG=en_GB.UTF-8, man pages are junk where the boldification should be).

Comment 7 Miloslav Trmac 2004-03-08 13:42:53 UTC
Fix confirmed in less-378-11.1

