108991 – non UTF-8 encoding of man-pages

Summary: non UTF-8 encoding of man-pages

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	man-pages
Sub Component:
Version:	1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Eido Inoue
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-11-04 02:42 UTC by Maciej Żenczykowski
Modified:	2007-11-30 22:10 UTC
CC List:	2 users
Fixed In Version:	1.64-1
Clone Of:
Environment:
Last Closed:	2003-12-15 22:12:56 UTC
Type:	---
Embargoed:
Dependent Products:

Description Maciej Żenczykowski 2003-11-04 02:42:42 UTC

Version-Release number of selected component (if applicable):
man-pages-1.53-3 (RedHat 9, fully up2date system)

How reproducible: Always

Description of problem:

The following files:
# generated via
rpm -q --filesbypkg man-pages | sed "s@.*/usr@/usr@" | while read -r i; do
    zcat "$i" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1 || echo "$i"
done

/usr/share/man/man2/close.2.gz
/usr/share/man/man2/getdomainname.2.gz
/usr/share/man/man2/getrlimit.2.gz
/usr/share/man/man2/madvise.2.gz
/usr/share/man/man2/sysinfo.2.gz
/usr/share/man/man2/time.2.gz
/usr/share/man/man2/umask.2.gz
/usr/share/man/man3/encrypt.3.gz
/usr/share/man/man3/fclose.3.gz
/usr/share/man/man3/fcloseall.3.gz
/usr/share/man/man3/fflush.3.gz
/usr/share/man/man3/lockf.3.gz
/usr/share/man/man3/printf.3.gz
/usr/share/man/man3/rand.3.gz
/usr/share/man/man3/strtok.3.gz
/usr/share/man/man3/toupper.3.gz
/usr/share/man/man3/updwtmp.3.gz
/usr/share/man/man4/st.4.gz
/usr/share/man/man5/environ.5.gz
/usr/share/man/man5/utmp.5.gz
/usr/share/man/man7/glob.7.gz
/usr/share/man/man7/hier.7.gz
/usr/share/man/man7/iso_8859-1.7.gz
/usr/share/man/man7/iso_8859-15.7.gz
/usr/share/man/man7/iso_8859-2.7.gz
/usr/share/man/man7/iso_8859-7.7.gz
/usr/share/man/man7/koi8-r.7.gz
/usr/share/man/man7/suffixes.7.gz

fail conversion from UTF-8 to <anything> because they are not UTF-8
encoded.  This matters because man performs this conversion step, which
results in "iconv: illegal input sequence at position ####" error
messages from e.g. "man 2 close".  I've also seen this error in
Polish-language manual pages.  Furthermore, the above list may be
incomplete: it only catches man pages containing byte sequences that
are invalid in UTF-8 (and thus obviously wrong), not pages in another
encoding whose bytes happen to form valid UTF-8.
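
The check described above can be sketched as a pair of small shell functions. This is only an illustration, assuming gzip and iconv are installed; the function names is_utf8_gz and list_non_utf8 are made up for this example and are not part of man or man-pages. Decoding to UTF-16 is used purely to force iconv to validate every input byte.

```shell
# Hypothetical helper: succeed only if the gzipped file decodes as UTF-8.
is_utf8_gz() {
    # Check the archive first, so a corrupt or missing file is not
    # mistaken for valid (empty) input.
    gzip -t -- "$1" 2>/dev/null || return 2
    # Converting to UTF-16 makes iconv decode, and so validate, every byte.
    gzip -dc -- "$1" | iconv -f UTF-8 -t UTF-16 >/dev/null 2>&1
}

# Hypothetical helper: read file names on stdin, print those that fail
# the check (unreadable or corrupt archives are printed as well).
list_non_utf8() {
    while IFS= read -r page; do
        is_utf8_gz "$page" || printf '%s\n' "$page"
    done
}
```

One possible invocation would be: rpm -ql man-pages | grep '\.gz$' | list_non_utf8. As noted above, a page in another encoding whose bytes happen to be valid UTF-8 still passes this check.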

Either /usr/bin/nroff should be changed to not use iconv -f UTF-8 or
all man-pages should be converted to UTF-8 (or some auto-detection
code?)...
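
As a rough idea of what such auto-detection could look like, the sketch below tries UTF-8 first and falls back to a legacy encoding. It is not what /usr/bin/nroff actually does; the function name page_to_utf8 and the ISO-8859-1 default are assumptions made for illustration only.

```shell
# Hypothetical helper: print a gzipped man page as UTF-8.
# $1 = gzipped page, $2 = fallback encoding (assumed default: ISO-8859-1).
page_to_utf8() {
    fallback=${2:-ISO-8859-1}
    if gzip -dc -- "$1" | iconv -f UTF-8 -t UTF-16 >/dev/null 2>&1; then
        # Input already decodes as UTF-8: pass it through unchanged.
        gzip -dc -- "$1"
    else
        # Otherwise assume the fallback encoding and convert.
        gzip -dc -- "$1" | iconv -f "$fallback" -t UTF-8
    fi
}
```

The fallback is a guess: a page in, say, ISO-8859-2 or KOI8-R would be converted without error but with the wrong characters unless the right encoding is passed as the second argument.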

This is very annoying, as it makes many man pages useless.  LC_ALL and
similar settings change nothing: the problem lies in the input encoding,
which is not UTF-8 as expected, and not in the output encoding, which
can be changed via locale settings.

Comment 1 Maciej Żenczykowski 2003-11-06 15:02:22 UTC

Still present in Fedora Core 1

Comment 2 Roozbeh Pournader 2003-11-26 18:54:43 UTC

To reproduce the bug, one can run "man iso_8859_1" in a
gnome-terminal. The character table is just full of question marks
instead of different characters.

Comment 3 Eido Inoue 2003-11-27 02:11:49 UTC

The way encoding is handled has changed since RHL 9, which expected the
source character set of man pages for Western European language
locales to be ISO-8859-1 (which was then converted to UTF-8).

They do need to be re-encoded.
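
A minimal sketch of such a re-encoding step, assuming the affected pages are ISO-8859-1 and that gzip, iconv and mktemp are available. The function name reencode_gz is hypothetical, and this is not the change that actually shipped in the fixed package.

```shell
# Hypothetical helper: re-encode a gzipped man page to UTF-8 in place.
# $1 = gzipped page, $2 = source encoding (assumed default: ISO-8859-1).
reencode_gz() {
    src=$1
    from=${2:-ISO-8859-1}
    work=$(mktemp -d) || return 1
    status=0
    if ! gzip -dc -- "$src" >"$work/raw" 2>/dev/null; then
        status=1    # could not decompress the page
    elif iconv -f UTF-8 -t UTF-16 <"$work/raw" >/dev/null 2>&1; then
        :           # already valid UTF-8: leave the file untouched
    elif iconv -f "$from" -t UTF-8 <"$work/raw" >"$work/utf8" &&
         gzip -9c <"$work/utf8" >"$work/new.gz"; then
        # Overwrite the contents so the original file mode is kept.
        cat "$work/new.gz" >"$src" || status=1
    else
        status=1    # conversion or compression failed
    fi
    rm -rf "$work"
    return "$status"
}
```

Because pages that already decode as UTF-8 are skipped, running it twice on the same file is harmless. The source encoding has to be chosen per page; ISO-8859-1 is only the assumed default here.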
