From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.3-0.3.cl i686; Nav)

Description of problem:
Man pages are displayed incorrectly in UTF-8 locales, because groff is not called with the -Tutf8 option.

How reproducible:
Always

Steps to Reproduce:
1. Open a UTF-8 shell, for instance with
     LANG=en_GB.UTF-8 xterm -fn '-Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1'
2. Enter
     man man
     man groff_char

Actual Results:
Soft hyphens at line ends are ISO 8859-1 encoded even though we are in a UTF-8 locale, which garbles them. Most of the non-ASCII characters listed on the groff_char page are garbled as well, as one would expect when the encoding used by groff does not match the locale.

Expected Results:
The man and groff_char pages should have been displayed flawlessly, without occurrences of the default character (dashed box) that signals a malformed UTF-8 sequence.

Additional info:
The fix is simple: in the file /etc/man.config, change the line
     NROFF /usr/bin/groff -Tlatin1 -mandoc
to
     NROFF /usr/bin/nroff -mandoc

Note that nroff is already a shell script that tests the encoding used in the current locale (using "locale charmap") and then calls groff with the appropriate -T option.

Note also that the version of less that you ship has a bug with bold and underlining in UTF-8 locales. A fix has been posted at
     http://mail.nl.linux.org/linux-utf8/2001-05/msg00023.html
and will hopefully soon be integrated in a new release of "less".
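For illustration, the locale test that the report attributes to the nroff wrapper could look roughly like the following. This is a minimal sketch, not the script shipped with groff: the function name pick_device, the list of charmaps handled, and the ascii fallback are assumptions. Only the use of "locale charmap" and the -Tutf8 and -Tlatin1 devices come from the report.

```shell
#!/bin/sh
# Hypothetical helper: map a locale charmap name to a groff output device.
pick_device() {
    case "$1" in
        UTF-8)      echo utf8 ;;
        ISO-8859-1) echo latin1 ;;
        *)          echo ascii ;;   # assumed conservative fallback
    esac
}

# Example use (commented out so that sourcing the sketch has no side effects):
# groff -T"$(pick_device "$(locale charmap 2>/dev/null)")" -mandoc "$@"
```

With a wrapper of this kind, man.config does not need to hard-code an output device, which is the point of the NROFF change proposed in the report.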
Fixed in man 1.5i-5; assigning to less so the patch you mentioned can be included in our next build.
The patch isn't necessary as we already have an i18n patch for that. The garbled characters seem to be caused by incomplete fonts. Try 'export PAGER=cat' and you'll see the same result.
Reply to comment by karsten:

My bug report was definitely *not* related to garbled fonts. The xterm I use is perfectly able to display standard UTF-8 test files such as
     http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

The problem with "less" that was fixed in the quoted patch is a bug in the original UTF-8 support in less, which caused a misinterpretation of bold-by-backspace in UTF-8 mode. This is a problem of less, not of my terminal emulator or the font. (There might be an additional problem with your fonts, but that is a separate issue; use the xterm command line options shown above.)

The less problem can be reproduced as follows.

Test case:
     perl -e 'use utf8; print "a\ba_\bb\n"' | less
correctly shows a bold "a" and an underlined "b", but
     perl -e 'use utf8; print "\x{20ac}\b\x{20ac}_\b\x{2203}\n"' | less
fails to show either a bold euro sign or an underlined there-exists sign. (Perl 5.6 or newer is required here.)

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
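Where Perl 5.6 is not available, the same two test inputs can be produced with printf. This is a minimal sketch: the function names are made up for illustration, and the octal escapes are the UTF-8 encodings of U+20AC (bytes E2 82 AC) and U+2203 (bytes E2 88 83).

```shell
#!/bin/sh
# Hypothetical helpers that emit the two test inputs as raw bytes.

# ASCII case: bold "a" (a BS a) followed by underlined "b" (_ BS b).
ascii_sample() {
    printf 'a\ba_\bb\n'
}

# UTF-8 case: bold euro sign followed by underlined there-exists sign.
utf8_sample() {
    printf '\342\202\254\b\342\202\254_\b\342\210\203\n'
}

# Usage: ascii_sample | less
#        utf8_sample | less
```

Each backspace in the second input follows a three-byte character, so a pager that pairs the backspace with only the preceding byte rather than the whole character would split the sequence.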
I just confirmed that this bug is still present in the less-358-21 RPM that comes with Red Hat Linux 7.2. It is still a bug and you haven't fixed it yet.

To demonstrate to karsten again why your reply was incorrect: it is *not* a font problem, because
     perl -e 'use utf8; print "\x{20ac}\x{2203}\n"' | less
does show the euro and there-exists signs correctly on my xterm in UTF-8 mode, but
     perl -e 'use utf8; print "\x{20ac}\b\x{20ac}_\b\x{2203}\n"' | less
shows that less messes up the display of bold and underlined UTF-8 characters and sends malformed UTF-8 sequences to the terminal instead, because of a simple bug that is fixed by the patch posted in
     http://mail.nl.linux.org/linux-utf8/2001-05/msg00023.html
Fixed in less-358-27.
What's the story here? RH8 still appears to have this bug (i.e. with LANG=en_GB.UTF-8, man pages show junk where the bold text should be).
Fix confirmed in less-378-11.1