Bug 189956 - Bad UTF-8 encoding in charset-related man pages
Summary: Bad UTF-8 encoding in charset-related man pages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: man-pages
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ivana Varekova
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks: FAST4.5APPROVED
 
Reported: 2006-04-26 05:16 UTC by Steve Bonneville
Modified: 2007-11-30 22:07 UTC
CC List: 0 users

Fixed In Version: RHBA-2006-0647
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-09-14 04:45:06 UTC
Target Upstream Version:
Embargoed:


Attachments
Tar file containing corrected man pages (20.63 KB, application/octet-stream)
2006-04-26 05:17 UTC, Steve Bonneville
no flags


Links
System                  ID              Private  Priority  Status        Summary                   Last Updated
Red Hat Product Errata  RHBA-2006:0647  0        normal    SHIPPED_LIVE  man-pages bug fix update  2007-05-01 14:31:32 UTC

Description Steve Bonneville 2006-04-26 05:16:06 UTC
A number of man pages on character sets installed by man-pages-1.67-7.EL4 in the
directory /usr/share/man/en/man7 have bad UTF-8 encodings and incorrect text.

The affected pages are:
  iso_8859-1.7.gz
  iso_8859-2.7.gz
  iso_8859-7.7.gz
  iso_8859-9.7.gz
  iso_8859-15.7.gz
  iso_8859-16.7.gz
  koi8-r.7.gz

There are two problems.  It appears that originally these pages were
encoded using the charset they documented.  Each page has a sample
table of characters from the charset, and the man page (incorrectly)
claims that they will display properly only when the locale is set to
the charset the page documents.

At some point, someone attempted to make these pages display
properly in the default UTF-8 environment by converting the sample
column in the table to UTF-8.  However, they did not update the text
of the man page to reflect this.  In addition, the job was botched:
the man pages were converted under the incorrect assumption that
they were *ALL* encoded in the ISO 8859-1 charset.  As a result, the
only man page that shows the correct characters in the sample table
is iso_8859-1.7.gz; the rest now *always* show the wrong characters.
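
The mechanism described above can be reproduced with a short Python sketch. It assumes the botched conversion amounted to decoding each page's raw bytes as ISO 8859-1 and writing the result out as UTF-8; the page-to-codec mapping and the helper names are illustrative, not taken from the man-pages package. Comparing the upper half of each code table under the wrong and the right decoding shows that only iso_8859-1 comes through unchanged.

```python
# Sketch: reproduce the suspected botched conversion (assumption: every
# page's bytes were decoded as ISO 8859-1, then written out as UTF-8).

# Hypothetical mapping from man page name to Python codec name.
PAGES = {
    "iso_8859-1": "iso8859_1",
    "iso_8859-2": "iso8859_2",
    "iso_8859-7": "iso8859_7",
    "iso_8859-9": "iso8859_9",
    "iso_8859-15": "iso8859_15",
    "iso_8859-16": "iso8859_16",
    "koi8-r": "koi8_r",
}

# Bytes 0xA0-0xFF: the region where these charsets visibly differ.
SAMPLE = bytes(range(0xA0, 0x100))


def botched_decode(raw: bytes) -> str:
    """The broken conversion: always assume ISO 8859-1."""
    return raw.decode("iso8859_1")


def correct_decode(raw: bytes, codec: str) -> str:
    """The intended conversion: use the page's own charset.

    errors="replace" because ISO 8859-7 leaves a few positions
    (0xAE, 0xD2, 0xFF) unassigned, and Python rejects them by default.
    """
    return raw.decode(codec, errors="replace")


def pages_still_correct() -> list[str]:
    """Pages whose sample table survives the ISO 8859-1 assumption."""
    return [
        page
        for page, codec in PAGES.items()
        if botched_decode(SAMPLE) == correct_decode(SAMPLE, codec)
    ]


if __name__ == "__main__":
    print(pages_still_correct())  # prints ['iso_8859-1']
```

Note that a single byte can be misleading: 0xA1 happens to mean the same character in ISO 8859-1, 8859-9 and 8859-15, which is why the whole range is compared.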

I've re-encoded the man pages so that the sample tables display the
correct characters when viewed as a UTF-8 document, and I've fixed
the text to say that the characters display correctly only when
viewed in a *UTF-8*-based locale.
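
Because ISO 8859-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, a page damaged this way can in principle be repaired mechanically: encode the mangled text back to ISO 8859-1 to recover the original bytes, then decode them with the page's real charset. The Python sketch below assumes the damaged page contains only characters from that range (i.e. nothing outside Latin-1 was added after the bad conversion); `repair_text` and `repair_page` are hypothetical helpers, not necessarily how the attached pages were produced. It only fixes the encoding; the misleading text about locales still has to be edited by hand.

```python
import gzip


def repair_text(mangled: str, codec: str) -> str:
    """Undo an ISO 8859-1 -> UTF-8 conversion that should have used `codec`.

    Assumes every character in `mangled` is in U+0000..U+00FF, so that
    encoding as ISO 8859-1 recovers the page's original bytes exactly.
    """
    original_bytes = mangled.encode("iso8859_1")
    # Re-decode with the charset the page actually documents.
    return original_bytes.decode(codec, errors="replace")


def repair_page(src: str, dst: str, codec: str) -> None:
    """Read a gzipped, mis-converted man page and write a corrected copy."""
    with gzip.open(src, "rt", encoding="utf-8") as f:
        mangled = f.read()
    with gzip.open(dst, "wt", encoding="utf-8") as f:
        f.write(repair_text(mangled, codec))
```

For example, `repair_text("\u00a1", "iso8859_2")` turns the INVERTED EXCLAMATION MARK back into U+0104 (LATIN CAPITAL LETTER A WITH OGONEK), and `repair_page("iso_8859-2.7.gz", "fixed.7.gz", "iso8859_2")` would apply the same fix to a whole page. Plain ASCII, which makes up the groff markup, passes through unchanged.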

Corrected man pages attached.  Please use these or something similar
instead of the ones currently shipped in the man-pages RPM.

Comment 1 Steve Bonneville 2006-04-26 05:17:48 UTC
Created attachment 128238
Tar file containing corrected man pages

Comment 8 Nick Lamb 2006-08-17 10:58:39 UTC
This bug also occurs on Fedora Core 5.

Note that on some (all?) installs, correctly UTF-8-encoded manual pages
are installed in the /en/ language-specific directory, but they are not
used, or at least they aren't used on my system.

Comment 9 Steve Bonneville 2006-08-17 17:47:23 UTC
In response to comment #8, look more closely at the table; the characters
displayed may appear correct for iso_8859-1, but not for the other man pages.
Compare the CHAR column to the DESCRIPTION column of the table in
iso_8859-2.7.gz, for example.  0xA1 is supposed to be "LATIN
CAPITAL LETTER A WITH OGONEK", but it's displayed in CHAR as an "INVERTED
EXCLAMATION MARK", at least on my test Fedora Core 5 system.  The same
problem may be even more obvious on the man page for iso_8859-7 (Greek).
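
The CHAR-versus-DESCRIPTION comparison suggested here can be automated with Python's `unicodedata` module, since the descriptions quoted above match Unicode character names. A minimal sketch, assuming the two cells have already been extracted from a table row (parsing the groff table is not shown); `row_is_consistent` and `expected_char` are hypothetical helpers.

```python
import unicodedata


def row_is_consistent(char_cell: str, description: str) -> bool:
    """True if the CHAR cell is the character that DESCRIPTION names.

    Compares against the Unicode character name, ignoring case and
    surrounding whitespace in the description. The CHAR cell is not
    stripped, because U+00A0 NO-BREAK SPACE is itself a table entry.
    """
    if len(char_cell) != 1:
        return False
    try:
        return unicodedata.name(char_cell) == description.strip().upper()
    except ValueError:  # code point has no name (e.g. a control character)
        return False


def expected_char(byte: int, codec: str) -> str:
    """The character a given byte should show in the CHAR column."""
    return bytes([byte]).decode(codec)


if __name__ == "__main__":
    # Row 0xA1 of iso_8859-2 as displayed on the reporter's system
    print(row_is_consistent("\u00a1", "LATIN CAPITAL LETTER A WITH OGONEK"))  # prints False
    # The same row with the character the table ought to show
    print(row_is_consistent(expected_char(0xA1, "iso8859_2"),
                            "LATIN CAPITAL LETTER A WITH OGONEK"))  # prints True
```

Running such a check over every row of each page would flag all the mismatched entries at once, rather than relying on spotting them by eye.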

Comment 13 Red Hat Bugzilla 2006-09-14 04:45:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0647.html


