A number of man pages on character sets installed by man-pages-1.67-7.EL4 in the directory /usr/share/man/en/man7 have bad UTF-8 encodings and incorrect text. The affected pages are:

  iso_8859-1.7.gz
  iso_8859-2.7.gz
  iso_8859-7.7.gz
  iso_8859-9.7.gz
  iso_8859-15.7.gz
  iso_8859-16.7.gz
  koi8-r.7.gz

There are two problems.

It appears that originally these pages were encoded using the charset they documented. Each page has a sample table of characters from the charset, and the man page (incorrectly) claims they'll only display properly when viewed in a locale that uses the charset the man page describes. At some point, someone attempted to make these pages display properly in the default UTF-8 environment by converting the sample column in the table to UTF-8. However, they did not update the text of the man page to indicate this.

In addition, the job was botched: the man pages were converted under the incorrect assumption that they were *ALL* encoded in the ISO 8859-1 charset. This means that the only man page that shows the correct characters in the sample table is iso_8859-1.7.gz; the rest *always* show the wrong characters now.

I've re-encoded the man pages so that the sample tables display the correct characters when viewed as a UTF-8 document, and I've fixed the documentation to indicate that the characters are displayed correctly only when viewed in a *UTF-8*-based locale.

Corrected man pages attached. Please use these or something similar instead of the ones currently shipped in the man-pages RPM.
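For illustration, the mis-conversion described in this report can in principle be reversed mechanically. The following is a minimal Python sketch, not necessarily how the attached pages were produced; the function name `undo_latin1_misconversion` is hypothetical, and it assumes the affected text was decoded as ISO 8859-1 when it was really in another charset.

```python
def undo_latin1_misconversion(text: str, real_charset: str) -> str:
    """Recover text that was decoded as ISO 8859-1 but really used real_charset.

    Hypothetical helper for illustration only.
    """
    # ISO 8859-1 maps each byte 0x00-0xFF to the code point of the same value,
    # so encoding back to latin-1 restores the original bytes exactly.
    original_bytes = text.encode("latin-1")
    # Decode those bytes with the charset the page actually documents.
    return original_bytes.decode(real_charset)


# Simulate the botched conversion of byte 0xA1 from an ISO 8859-2 page.
shipped = b"\xa1".decode("latin-1")
fixed = undo_latin1_misconversion(shipped, "iso8859_2")
print(repr(shipped), "->", repr(fixed))
```

This only works on text that really went through the wrong conversion: characters outside ISO 8859-1 raise UnicodeEncodeError, and bytes that the target charset leaves undefined raise UnicodeDecodeError, so in practice it would have to be applied to the sample column only.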
Created attachment 128238: Tar file containing corrected man pages
This bug also occurs on Fedora Core 5. Note that on some (all?) installs there are correctly UTF-8-encoded manual pages installed in the /en/ language-specific directory, but they are not used, or at least they aren't used on my system.
In response to comment #8, look more closely at the table; the characters displayed may appear correct for iso_8859-1, but not for the other man pages. Compare the CHAR column to the DESCRIPTION column in the table for iso_8859-2.7.gz, for example. 0xA1 is supposed to be a "LATIN CAPITAL LETTER A WITH OGONEK", but it's displayed in CHAR as an "INVERTED EXCLAMATION MARK", at least on my test Fedora Core 5 system. The same issue on the man page for iso_8859-7 (Greek) may be even more obvious.
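The 0xA1 mismatch is easy to reproduce outside of man. This is a small Python sketch using only the standard codecs and unicodedata modules; the helper name `char_name` is made up for this example.

```python
import unicodedata


def char_name(byte_value: int, charset: str) -> str:
    """Return the Unicode name of a single byte decoded in the given charset."""
    ch = bytes([byte_value]).decode(charset)
    return unicodedata.name(ch)


# The same byte means different characters in different charsets.
for charset in ("iso8859_1", "iso8859_2"):
    print(charset, char_name(0xA1, charset))
```

The name reported for iso8859_2 is the one the DESCRIPTION column expects, while the name reported for iso8859_1 is what the CHAR column of the shipped iso_8859-2 page actually shows.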
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0647.html