Bug 189956 - Bad UTF-8 encoding in charset-related man pages
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: man-pages
Version: 4.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Ivana Varekova
QA Contact: Ben Levenson
Depends On:
Blocks: FAST4.5APPROVED
Reported: 2006-04-26 01:16 EDT by Steve Bonneville
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHBA-2006-0647
Doc Type: Bug Fix
Last Closed: 2006-09-14 00:45:06 EDT


Attachments
Tar file containing corrected man pages (20.63 KB, application/octet-stream)
2006-04-26 01:17 EDT, Steve Bonneville

Description Steve Bonneville 2006-04-26 01:16:06 EDT
A number of man pages on character sets installed by man-pages-1.67-7.EL4 in the
directory /usr/share/man/en/man7 have bad UTF-8 encodings and incorrect text.

The affected pages are:
  iso_8859-1.7.gz
  iso_8859-2.7.gz
  iso_8859-7.7.gz
  iso_8859-9.7.gz
  iso_8859-15.7.gz
  iso_8859-16.7.gz
  koi8-r.7.gz

There are two problems.  It appears that originally these pages were
encoded using the charset they documented.  Each page has a sample
table of characters from the charset, and the man page (incorrectly)
claims they'll only display properly when the viewing locale is
set to use the charset the man page references.

At some point in time, someone attempted to make these pages display
properly in the default UTF-8 environment by converting the sample
column in the table to UTF-8.  However, they did not update the text
of the man page to indicate this.  In addition, the job was botched,
and the man pages were converted under the incorrect assumption that
they *ALL* were encoded in ISO 8859-1 charset.  This means that the
only man page that shows the correct characters in the sample table
is iso_8859-1.7.gz; the rest *always* show the wrong characters now.
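
The effect of that blanket assumption can be reproduced in a few lines.
This is only a minimal sketch of the failure mode, not the script that
was actually used on the pages (that is unknown); the function names are
made up for illustration, and the single byte 0xA1 stands in for one
cell of a sample table.

```python
import unicodedata


def botched_conversion(raw: bytes) -> bytes:
    """Convert to UTF-8 while wrongly assuming the input is ISO 8859-1."""
    return raw.decode("iso8859_1").encode("utf-8")


def correct_conversion(raw: bytes, charset: str) -> bytes:
    """Convert to UTF-8 using the charset the page actually documents."""
    return raw.decode(charset).encode("utf-8")


# Byte 0xA1 as it would appear in a page stored in ISO 8859-2.
raw = b"\xa1"

shown = botched_conversion(raw).decode("utf-8")
intended = correct_conversion(raw, "iso8859_2").decode("utf-8")

print(unicodedata.name(shown))     # what the converted page displays
print(unicodedata.name(intended))  # what the table is meant to display
```

For a page that really is ISO 8859-1 the two functions agree, which is
why only that page came out right.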

I've reencoded the man pages so that the sample tables display the
correct characters when viewed as a UTF-8 document, and I've fixed
the documentation so that it states that the characters are only
displayed correctly when viewed in a *UTF-8*-based locale.

Corrected man pages attached.  Please use these or something similar
instead of the ones currently shipped in the man-pages RPM.
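
For anyone who wants to redo the conversion rather than use the tar
file, one possible approach is sketched here.  It assumes the damage is
exactly "decoded as ISO 8859-1, then written out as UTF-8"; because
ISO 8859-1 maps every byte value to a character, that step can be undone
and the bytes decoded again with the right charset.  The helper name
repair_text is hypothetical, and this is not necessarily how the
attached pages were produced.

```python
def repair_text(text: str, charset: str) -> str:
    """Undo a wrong ISO 8859-1 decode and redo it with the real charset.

    Raises UnicodeEncodeError if text contains characters outside
    ISO 8859-1, i.e. text that was not damaged in the assumed way.
    """
    original_bytes = text.encode("iso8859_1")  # recover the raw bytes
    return original_bytes.decode(charset)      # decode them properly


# One damaged table cell per charset, as it appears in the shipped page.
print(repair_text("\u00a1", "iso8859_2"))  # from the ISO 8859-2 page
print(repair_text("\u00e1", "iso8859_7"))  # from the ISO 8859-7 page
print(repair_text("\u00c1", "koi8_r"))     # from the KOI8-R page
```

ASCII text passes through unchanged, since all of the affected charsets
agree with ISO 8859-1 in the ASCII range.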
Comment 1 Steve Bonneville 2006-04-26 01:17:48 EDT
Created attachment 128238
Tar file containing corrected man pages
Comment 8 Nick Lamb 2006-08-17 06:58:39 EDT
This bug also occurs on Fedora Core 5.

Note that on some (all?) installs there are correctly UTF-8 encoded manual pages
installed in the /en/ language specific directory, but they are not used, or at
least they aren't used on my system.
Comment 9 Steve Bonneville 2006-08-17 13:47:23 EDT
In response to comment #8, look more closely at the table; the characters
displayed may appear correct for iso_8859-1, but not for the other man pages.
Compare the CHAR column to the DESCRIPTION column on the table for the
iso_8859-2.7.gz, for example.  0xA1 is supposed to be a "LATIN
CAPITAL LETTER A WITH OGONEK", but it's displayed in CHAR as an "INVERTED
EXCLAMATION MARK", at least on my test Fedora Core 5 system.  The same
issue on the man page for iso_8859-7 (Greek) may be even more obvious.
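
The CHAR versus DESCRIPTION comparison can also be done mechanically.
The sketch below assumes the table rows have already been extracted into
(code, char, description) tuples and that each DESCRIPTION is the exact
Unicode character name; both are assumptions, and the function name
mismatched_rows is hypothetical.

```python
import unicodedata


def mismatched_rows(rows):
    """Return the rows whose CHAR does not carry the name in DESCRIPTION."""
    bad = []
    for code, char, description in rows:
        # unicodedata.name() returns "" here for unnamed characters.
        if unicodedata.name(char, "") != description:
            bad.append((code, char, description))
    return bad


rows = [
    # 0xA1 on the iso_8859-2 page as displayed, per this report.
    (0xA1, "\u00a1", "LATIN CAPITAL LETTER A WITH OGONEK"),
    # The same row with the character the description calls for.
    (0xA1, "\u0104", "LATIN CAPITAL LETTER A WITH OGONEK"),
]

for code, char, description in mismatched_rows(rows):
    print(f"0x{code:02X}: shows {unicodedata.name(char, '?')}, "
          f"expected {description}")
```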
Comment 13 Red Hat Bugzilla 2006-09-14 00:45:06 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0647.html
