Bug 39139 - man fails in UTF-8 locales
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: less
Version: 7.1
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Karsten Hopp
QA Contact: Aaron Brown
URL: http://mail.nl.linux.org/linux-utf8/2...
Reported: 2001-05-04 17:40 EDT by Markus Kuhn
Modified: 2007-04-18 12:33 EDT (History)
Last Closed: 2004-03-08 08:42:53 EST
Description Markus Kuhn 2001-05-04 17:40:55 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.3-0.3.cl i686; Nav)

Description of problem:
The display of man pages in UTF-8 locales fails, because
groff is not called with the -Tutf8 option.

How reproducible:
Always

Steps to Reproduce:
1. Open a UTF-8 shell, for instance with

  LANG=en_GB.UTF-8 xterm -fn '-Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1'

2. Enter

  man man
  man groff_char

	

Actual Results:  Soft hyphens at line ends get ISO 8859-1 encoded, even
though we are in a UTF-8 locale, which garbles them. Most of the non-ASCII
characters listed on the groff_char page get garbled, as one would expect
if the encoding used by groff and the locale don't match.

Expected Results:  The man and groff_char pages should have been displayed
flawlessly, without occurrences of the default character (dashed box) that
signals a malformed UTF-8 sequence.

Additional info:

The fix is simple:

In file /etc/man.config, change the line

  NROFF           /usr/bin/groff -Tlatin1 -mandoc

to

  NROFF           /usr/bin/nroff -mandoc

Note that nroff is already a shell script that tests the encoding used in
the current locale (using "locale charmap") and then calls groff with the
appropriate -T option.
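
The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the nroff script as shipped: the function name pick_groff_device is hypothetical, and only the two encodings mentioned in this report plus an ASCII fallback are handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of groff): map a locale charmap name
# to the name of a groff output device.
pick_groff_device() {
    case "$1" in
        UTF-8)       echo utf8 ;;
        ISO-8859-1)  echo latin1 ;;
        *)           echo ascii ;;   # assumed fallback for anything else
    esac
}

# Ask the current locale for its character encoding, as the report
# says nroff does, and show which groff invocation would follow.
charmap=$(locale charmap 2>/dev/null)
device=$(pick_groff_device "$charmap")
echo "charmap=$charmap -> groff -T$device -mandoc"
```

With LANG=en_GB.UTF-8 set, "locale charmap" should report UTF-8, so the sketch would pick the utf8 device rather than the hard-coded latin1.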

Note that the less version that you ship has a bug with bold and
underlining in UTF-8 locales. A fix has been posted:

  http://mail.nl.linux.org/linux-utf8/2001-05/msg00023.html

and will hopefully soon be integrated in a new release of "less".
Comment 1 Bernhard Rosenkraenzer 2001-05-17 09:31:24 EDT
Fixed in man 1.5i-5; assigning to less so the patch you mentioned can be 
included in our next build.
Comment 2 Karsten Hopp 2001-06-25 09:13:34 EDT
The patch isn't necessary as we already have an i18n patch for that.
The garbled characters seem to be caused by incomplete fonts.
Try 'export PAGER=cat' and you'll see the same result.
Comment 3 Markus Kuhn 2001-06-25 11:00:15 EDT
Reply to comment by karsten:

My bug report was definitely *not* related to garbled fonts. The xterm I use is
perfectly able to display standard UTF-8 test files such as

  http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

The problem with "less" that was fixed in the quoted patch is a bug in the
original UTF-8 support in less, which caused a misinterpretation of
bold-by-backspace in UTF-8 mode. This is a problem of less, not my terminal
emulator or the font. (There might be an additional problem with your fonts, but
that is a separate issue; use the xterm command line options shown above.)

The less problem can be reproduced as follows:

Test case:
 
   perl -e 'use utf8; print "a\ba_\bb\n"' | less
 
correctly shows a bold "a" and an underlined "b", but
 
   perl -e 'use utf8; print "\x{20ac}\b\x{20ac}_\b\x{2203}\n"' | less

fails to show either a bold euro sign or an underlined there-exists sign.
(Perl 5.6 or newer required here).
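
For anyone without a suitable Perl, the same bytes can be produced with printf and inspected with od. This is only a sketch of the input, not of the less patch; the function names emit_test_line and hex_dump are made up for illustration.

```shell
#!/bin/sh
# Emit the second test line as raw UTF-8 bytes, using octal escapes:
# EURO SIGN, backspace, EURO SIGN, "_", backspace, THERE EXISTS, newline.
emit_test_line() {
    printf '\342\202\254\b\342\202\254_\b\342\210\203\n'
}

# Dump stdin as one continuous string of hex byte values.
hex_dump() {
    od -An -tx1 | tr -d ' \n'
}

emit_test_line | hex_dump
echo
# prints e282ac08e282ac5f08e288830a
```

Each overstruck character here is three bytes long (e2 82 ac and e2 88 83), so a pager has to compare whole characters on either side of each backspace (08), not single bytes. Piping emit_test_line into less instead of hex_dump gives the same test as the Perl one-liner.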

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
Comment 4 Markus Kuhn 2002-05-01 14:45:06 EDT
I just confirmed that this bug is still present in the less-358-21 RPM that
comes with Red Hat Linux 7.2. It has not been fixed yet.

To demonstrate to Karsten again why the reply in comment 2 was incorrect:

It is *not* a font problem, because

  perl -e 'use utf8; print "\x{20ac}\x{2203}\n"' | less

does show on my xterm in UTF-8 mode the euro and exists sign correctly, but

  perl -e 'use utf8; print "\x{20ac}\b\x{20ac}_\b\x{2203}\n"' | less

shows that less messes up the display of bold and underlined UTF-8 characters
and sends malformed UTF-8 sequences to the terminal instead, because of a simple
bug that is fixed by the patch posted in

  http://mail.nl.linux.org/linux-utf8/2001-05/msg00023.html

Comment 5 Karsten Hopp 2002-06-18 19:05:38 EDT
Fixed in less-358-27.
Comment 6 John Levon 2003-09-08 22:13:27 EDT
What's the story here? RH8 still appears to have this bug (i.e.
with LANG=en_GB.UTF-8, man pages are junk where the bold text should be).
Comment 7 Miloslav Trmac 2004-03-08 08:42:53 EST
Fix confirmed in less-378-11.1
