Red Hat Bugzilla – Bug 189956
Bad UTF-8 encoding in charset-related man pages
Last modified: 2007-11-30 17:07:24 EST
A number of man pages on character sets installed by man-pages-1.67-7.EL4 in the
directory /usr/share/man/en/man7 have bad UTF-8 encodings and incorrect text.
The affected pages are:
There are two problems. It appears that originally these pages were
encoded using the charset they documented. Each page has a sample
table of characters from the charset, and the man page (incorrectly)
claims they'll display properly only when the locale is set to use
the charset the page documents.
At some point, someone attempted to make these pages display
properly in the default UTF-8 environment by converting the sample
column in the table to UTF-8. However, they did not update the text
of the man page to indicate this. In addition, the job was botched:
the man pages were converted under the incorrect assumption that
they were *ALL* encoded in the ISO 8859-1 charset. This means that the
only man page that shows the correct characters in the sample table
is iso_8859-1.7.gz; the rest *always* show the wrong characters now.
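To make the failure concrete, here is a minimal Python sketch of what the botched conversion appears to have done to a single table cell. The byte 0xA1 and the iso_8859-2 page are taken from this report; the variable names are only illustrative, and the exact tool used for the original conversion is not known.

```python
import unicodedata

# Byte 0xA1 as it would appear in the original, charset-encoded
# iso_8859-2 page (assumption: one byte per sample character).
raw = b"\xa1"

# Correct reading: decode with the charset the page documents.
correct = raw.decode("iso8859_2")

# Botched reading: assume every page is ISO 8859-1.
botched = raw.decode("latin_1")

# What each path would write into the UTF-8 version of the page.
correct_utf8 = correct.encode("utf-8")
botched_utf8 = botched.encode("utf-8")

print(unicodedata.name(correct), correct_utf8)
print(unicodedata.name(botched), botched_utf8)
```

Both results are valid UTF-8, which is why the damage is not flagged by any decoder; only the character identity is wrong.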
I've re-encoded the man pages so that the sample tables display the
correct characters when viewed as a UTF-8 document, and I've fixed
the documentation to indicate that the characters are displayed
correctly only when viewed in a *UTF-8*-based locale.
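For anyone who wants to reproduce the fix rather than use the attachment, one possible approach is sketched below in Python. It assumes the shipped sample characters are exactly the result of decoding the original bytes as ISO 8859-1 and writing them out as UTF-8; if the conversion was done some other way (for example via Windows-1252), this will not round-trip. The function name repair_sample is hypothetical and this is not necessarily how the attached pages were produced.

```python
def repair_sample(botched_utf8: bytes, real_charset: str) -> bytes:
    """Undo a Latin-1-assumed conversion for one sample cell.

    Assumption: botched_utf8 holds only characters in U+0000..U+00FF,
    i.e. it came from bytes that were misread as ISO 8859-1.
    """
    # Step 1: read the shipped (wrong) characters.
    wrong_text = botched_utf8.decode("utf-8")
    # Step 2: map them back to the original single bytes.
    original_bytes = wrong_text.encode("latin_1")
    # Step 3: decode with the charset the page actually documents,
    # then write the result as UTF-8.
    return original_bytes.decode(real_charset).encode("utf-8")


# The 0xA1 cell of the iso_8859-2 page, as currently shipped.
fixed = repair_sample(b"\xc2\xa1", "iso8859_2")
print(fixed.decode("utf-8"))
```

This should be applied only to the sample column; running it over text that was already correct UTF-8 outside the Latin-1 range would raise UnicodeEncodeError in step 2.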
Corrected man pages attached. Please use these or something similar
instead of the ones currently shipped in the man-pages RPM.
Created attachment 128238
Tar file containing corrected man pages
This bug also occurs on Fedora Core 5
Note that on some (all?) installs there are correctly UTF-8-encoded manual pages
installed in the /en/ language-specific directory, but they are not used, or at
least they aren't used on my system.
In response to comment #8, look more closely at the table; the characters
displayed may appear correct for iso_8859-1, but not for the other man pages.
Compare the CHAR column to the DESCRIPTION column in the table for
iso_8859-2.7.gz, for example. 0xA1 is supposed to be a "LATIN
CAPITAL LETTER A WITH OGONEK", but it's displayed in CHAR as an "INVERTED
EXCLAMATION MARK", at least on my test Fedora Core 5 system. The same
issue on the man page for iso_8859-7 (Greek) may be even more obvious.
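The CHAR versus DESCRIPTION comparison can also be done mechanically. The Python sketch below lists, for a given charset, every code in 0xA0..0xFF whose expected character differs from what a Latin-1-assumed conversion would display. The helper name wrong_rows is hypothetical, and the Unicode character names may not match the wording of the man page's DESCRIPTION column exactly.

```python
import unicodedata


def wrong_rows(charset: str) -> dict:
    """Map code -> (expected name, displayed name) for mismatched rows."""
    rows = {}
    for code in range(0xA0, 0x100):
        raw = bytes([code])
        try:
            expected = raw.decode(charset)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this charset
        # What the page shows if the byte was misread as ISO 8859-1.
        shown = raw.decode("latin_1")
        if expected != shown:
            rows[code] = (
                unicodedata.name(expected, "?"),
                unicodedata.name(shown, "?"),
            )
    return rows


for code, (expected, shown) in sorted(wrong_rows("iso8859_2").items()):
    print(f"0x{code:02X}: expected {expected}, shown as {shown}")
```

Running it with "iso8859_7" gives the corresponding list for the Greek page.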
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.