Red Hat Bugzilla – Bug 108991
non UTF-8 encoding of man-pages
Last modified: 2007-11-30 17:10:33 EST
Version-Release number of selected component (if applicable):
man-pages-1.53-3 (RedHat 9, fully up2date system)
How reproducible: Always
Description of problem:
The files listed by the following command:
rpm -q --filesbypkg man-pages | sed "s@.*/usr@/usr@" | while read -r i;
do zcat "$i" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1 ||
echo "$i"; done
fail conversion from UTF-8 to <anything> because they are not UTF-8
encoded. This matters because man performs this conversion step, which
results in "iconv: illegal input sequence at position ####" error
messages from e.g. "man 2 close" and others. I've also seen this error
in Polish-language manual pages. Furthermore, the above list may be
incomplete: it only catches man pages containing byte sequences that
are invalid in UTF-8 (and thus obviously wrong), not pages whose bytes
happen to be valid UTF-8 even though they were written in another
encoding.
Either /usr/bin/nroff should be changed to not use iconv -f UTF-8 or
all man-pages should be converted to UTF-8 (or some auto-detection
This is very annoying as it makes many man pages useless. Changing
LC_ALL and related settings makes no difference, because the problem
lies in the input encoding, which isn't UTF-8 as expected, and not in
the output encoding, which can be changed via locale settings.
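
One possible reading of the auto-detection option mentioned above is a fallback at read time: validate the page as UTF-8 and, only if that fails, assume ISO-8859-1. The sketch below is an illustration, not what /usr/bin/nroff actually does. The function name decode_manpage is hypothetical, and the ISO-8859-1 fallback is an assumption that only fits Western European pages (it would be wrong for, say, Polish pages in ISO-8859-2). It uses gzip -dc, which is equivalent to zcat for .gz files.

```shell
# Hypothetical helper: write a gzipped man page to stdout as UTF-8.
# Assumption: a page that is not valid UTF-8 is ISO-8859-1.
decode_manpage() {
    # iconv exits non-zero on the first byte sequence that is invalid UTF-8;
    # the pipeline's exit status is that of iconv, the last command.
    if gzip -dc "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        # Already valid UTF-8 (plain ASCII also lands here): pass through.
        gzip -dc "$1"
    else
        # Fallback: treat the bytes as ISO-8859-1 and convert.
        gzip -dc "$1" | iconv -f ISO-8859-1 -t UTF-8
    fi
}
```

As noted in the description, this kind of check cannot catch a page whose bytes happen to form valid UTF-8 although it was written in another encoding.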
Still present in Fedora Core 1
To reproduce the bug, one can run "man iso_8859_1" in a
gnome-terminal. The character table is just full of question marks
instead of different characters.
The way encoding is handled has changed from RHL 9, which expected the
source character set of man pages for Western European language
locales to be ISO-8859-1 (which was then converted to UTF-8).
They do need to be re-encoded.
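
A minimal sketch of such a re-encoding pass for a single file, assuming the page is gzip-compressed and that any page failing UTF-8 validation is ISO-8859-1 (as described for Western European locales above). The function name reencode_manpage is hypothetical, and this is not the packaging fix itself, only an illustration of the step.

```shell
# Hypothetical helper: re-encode one gzipped man page to UTF-8 in place.
# Assumption: pages that are not valid UTF-8 are ISO-8859-1.
reencode_manpage() {
    page=$1
    # Leave pages alone if they already validate as UTF-8 (ASCII included).
    if gzip -dc "$page" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Decompress, convert, recompress into a temporary file first so a
    # failure does not truncate the original page.
    if gzip -dc "$page" | iconv -f ISO-8859-1 -t UTF-8 | gzip -c >"$tmp"; then
        # Overwrite the contents rather than moving the temp file, so the
        # original file's ownership and permissions are kept.
        cat "$tmp" >"$page"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It could be applied to each path printed by the check in the description, for example by calling reencode_manpage "$i" in place of echo "$i". Running it twice is harmless, since a converted page passes the UTF-8 check and is skipped.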