Bug 98969 - man 2 close fails with iconv: illegal input sequence at position 1722
Summary: man 2 close fails with iconv: illegal input sequence at position 1722
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: man-pages
Version: 9
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Eido Inoue
QA Contact: Ben Levenson
URL:
Whiteboard:
Duplicates: 88148 89203 90784 96943 99014
Depends On:
Blocks:
 
Reported: 2003-07-11 01:08 UTC by Marc MERLIN
Modified: 2007-04-18 16:55 UTC
CC List: 6 users

Fixed In Version: 1.58-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-08-08 16:22:59 UTC
Embargoed:



Description Marc MERLIN 2003-07-11 01:08:37 UTC
man 2 close fails with
iconv: illegal input sequence at position 1722

LANG is set to en_US.ISO-8859-1.
The problem comes from this line:
.\" Modified 2000-07-22 by Nicolás Lichtmaier <nick>

iconv dies on á, and the man page works fine if I replace it with 'a'.
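A minimal Python sketch of why the conversion stops at exactly that character. It assumes, as the error message suggests, that the page is decoded as UTF-8 while the file actually holds ISO-8859-1 (Latin-1) bytes: in Latin-1, á is the single byte 0xE1, which in UTF-8 announces a three-byte sequence, so the plain 's' that follows is rejected. The helper name is hypothetical.

```python
from typing import Optional

# Assumption: the page is decoded as UTF-8 but was saved as ISO-8859-1.

def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # offset of the offending byte
    return None

line = '.\\" Modified 2000-07-22 by Nicolás Lichtmaier <nick>\n'
latin1 = line.encode("iso-8859-1")  # á becomes the single byte 0xE1
utf8 = line.encode("utf-8")         # á becomes the two bytes 0xC3 0xA1

print(first_invalid_utf8_offset(latin1))  # offset of the á byte
print(first_invalid_utf8_offset(utf8))    # None: valid UTF-8
```

The same line stored as UTF-8 decodes cleanly, which is why replacing the letter (or re-encoding the file) makes the error go away.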

I don't know enough about man pages to say whether it's a bug in the man page itself
(i.e. whether 8-bit chars are forbidden in English man pages) or whether the problem
lies elsewhere.

Can you advise?

Comment 1 Eido Inoue 2003-07-11 15:35:39 UTC
The man pages need to be cleaned of Latin-1 and converted to UTF-8.
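A minimal sketch of that cleanup for one gzip-compressed page such as close.2.gz. It assumes every page is either already valid UTF-8 or entirely Latin-1; mixed files would need manual review. The helper names are hypothetical, and this is not the script the package actually used.

```python
import gzip

def latin1_to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, re-encoding from Latin-1 if it is not UTF-8 yet."""
    try:
        data.decode("utf-8")
        return data  # already clean (pure ASCII also lands here); leave untouched
    except UnicodeDecodeError:
        # Every byte value is a valid Latin-1 character, so this cannot fail.
        return data.decode("iso-8859-1").encode("utf-8")

def convert_page(path: str) -> bool:
    """Rewrite a .gz man page as UTF-8 in place; return True if it changed."""
    with gzip.open(path, "rb") as f:
        original = f.read()
    converted = latin1_to_utf8(original)
    if converted == original:
        return False
    with gzip.open(path, "wb") as f:
        f.write(converted)
    return True
```

The "already valid UTF-8" check is a heuristic: a Latin-1 byte pair that happens to form valid UTF-8 would be left alone, which is unlikely in man page text but not impossible.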

Comment 2 Eido Inoue 2003-07-11 18:04:02 UTC
*** Bug 99014 has been marked as a duplicate of this bug. ***

Comment 3 Eido Inoue 2003-07-11 18:11:42 UTC
*** Bug 88148 has been marked as a duplicate of this bug. ***

Comment 4 Eido Inoue 2003-07-11 18:12:35 UTC
*** Bug 89203 has been marked as a duplicate of this bug. ***

Comment 5 Eido Inoue 2003-07-11 18:13:20 UTC
*** Bug 96943 has been marked as a duplicate of this bug. ***

Comment 6 Eido Inoue 2003-07-11 18:14:17 UTC
*** Bug 90784 has been marked as a duplicate of this bug. ***

Comment 7 acount closed by user 2003-07-11 18:15:29 UTC
Maybe this UTF problem is gone with the latest man-pages, 1.56 :-?

Comment 8 Eido Inoue 2003-07-11 18:19:46 UTC
*** Bug 89629 has been marked as a duplicate of this bug. ***

Comment 9 acount closed by user 2003-07-18 22:17:45 UTC
man-pages has a new version, 1.57, with more pages:

Differences from version 1.56:

    The man pages

        epoll_create.2 epoll_ctl.2 epoll_wait.2 getresuid.2 ioctl_list.2
        lookup_dcookie.2 mmap.2 open.2 poll.2 semop.2 semtimedop.2

        cabs.3 cabsf.3 cabsl.3 cacos.3 cacosh.3 cacoshf.3 cacoshl.3 carg.3
        cargf.3 cargl.3 casin.3 casinf.3 casinh.3 casinhf.3 casinhl.3
        casinl.3 catan.3 catanf.3 catanh.3 catanhf.3 catanhl.3 catanl.3
        cbrt.3 cbrtf.3 cbrtl.3 ccos.3 ccosf.3 ccosh.3 ccoshf.3 ccoshl.3
        ccosl.3 cerf.3 cerfc.3 cerfcf.3 cerfcl.3 cerff.3 cerfl.3 cexp2.3
        cexp2f.3 cexp2l.3 cexp.3 cexpf.3 cexpl.3 cimag.3 cimagf.3 cimagl.3
        clog10.3 clog10f.3 clog10l.3 clog2.3 clog2f.3 clog2l.3 clog.3
        clogf.3 clogl.3 conj.3 conjf.3 conjl.3 cpow.3 cpowf.3 cpowl.3
        cproj.3 cprojf.3 cprojl.3 creal.3 crealf.3 creall.3 csin.3 csinf.3
        csinh.3 csinhf.3 csinhl.3 csinl.3 csqrt.3 csqrtf.3 csqrtl.3 ctan.3
        ctanf.3 ctanh.3 ctanhf.3 ctanhl.3 ctanl.3 dlopen.3 encrypt.3 lockf.3
        mtrace.3 rtime.3

        epoll.4

        complex.5 proc.5

        iso_8859-16.7 ip.7

    are new or have been updated. Typographical or grammatical errors
    have been corrected in several other places.


Comment 10 Eido Inoue 2003-08-08 16:22:59 UTC
Confirmed that this problem is no longer present with 1.58-1.

Comment 11 claw 2003-09-22 19:26:20 UTC
It's not fixed. Try:
export LANG=en_US
man 2 close

The man page for close was not cleaned in /usr/share/man/en/man2, at least not
as of man-pages-1.60-3.noarch.rpm.
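To see which pages in a directory such as /usr/share/man/en/man2 still contain non-UTF-8 bytes, a scan along these lines could help. It is a sketch that assumes pages are either plain files or gzip-compressed (.gz), and it reports the first bad byte offset in the decompressed text, comparable to the position iconv prints. All function names are hypothetical.

```python
import gzip
import os
from typing import Iterator, Optional, Tuple

def read_page(path: str) -> bytes:
    """Read a man page, transparently decompressing .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return f.read()

def bad_offset(data: bytes) -> Optional[int]:
    """Byte offset of the first invalid UTF-8 sequence, or None if clean."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

def find_unclean_pages(directory: str) -> Iterator[Tuple[str, int]]:
    """Yield (file name, offset) for every page that is not valid UTF-8."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        offset = bad_offset(read_page(path))
        if offset is not None:
            yield name, offset
```

For example, looping over `find_unclean_pages("/usr/share/man/en/man2")` and printing each pair would list close.2.gz if it still carried the Latin-1 á.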

Also, why not fix it in Shrike (RH9)? Not applicable to the plain-jane distribution?

Comment 12 Eido Inoue 2003-09-25 22:10:09 UTC
comment 11:

Use release 4 in Rawhide. Release 3 wasn't fixed, true.


Comment 13 Dan Harkless 2003-10-05 08:19:24 UTC
No, it's still broken in 1.60-4:

% rpm -q man-pages
man-pages-1.60-4
% rpm -V man-pages
% man 2 close | head -1
iconv: illegal input sequence at position 1729
% echo $LANG
en_US
% env LANG=C man 2 close | head -1
CLOSE(2)                   Linux Programmer's Manual                  CLOSE(2)
% ls -oF /usr/share/man{/,/en/}man2/close.2.gz
-rw-r--r--    1 root         1811 Dec 13  2001 /usr/share/man/en/man2/close.2.gz
-rw-r--r--    1 root         1809 Sep 24 08:40 /usr/share/man/man2/close.2.gz
% zdiff -u /usr/share/man{/,/en/}man2/close.2.gz
--- -   2003-10-05 01:04:09.309207000 -0700
+++ /tmp/close2.gz.XXXXhuhMPP   2003-10-05 01:04:09.000000000 -0700
@@ -29,7 +29,7 @@
 .\"   corrected description of effect on locks (thanks to
 .\"   Tigran Aivazian <tigran>).
 .\" Modified Fri Jan 31 16:21:46 1997 by Eric S. Raymond <esr>
-.\" Modified 2000-07-22 by Nicol?s Lichtmaier <nick>
+.\" Modified 2000-07-22 by Nicolás Lichtmaier <nick>
 .\"   added note about close(2) not guaranteeing that data is safe on close.
 .\"
 .TH CLOSE 2 2001-12-13 "" "Linux Programmer's Manual"

Is the suggested solution to replace 'man' with an alias of 'env LANG=C man', or
what?

Also, pretty goofy fix for the C version of close.2.  I'd think it'd make more
sense to replace 'á' with 'a' rather than '?'.

Finally, claw's question about why this isn't being fixed with an RH9 RPM was
not addressed...

Comment 14 Eido Inoue 2003-10-06 17:53:07 UTC
> Is the suggested solution to replace 'man' with an alias of 'env LANG=C man', or
> what?

No. The solution and rationale are described in bug 103214.

> Also, pretty goofy fix for the C version of close.2.  I'd think it'd make more
> sense to replace 'á' with 'a' rather than '?'.

No, this does not make more sense. Depending on the language and country, how a
non-ASCII letter is transliterated into "non-accented English" is ambiguous. For
example, certain single letters in German become "ss" or "oe", but they DON'T
become this if the same letters are used by another language. There is no
context as to the original language of each non-ASCII word in the man pages.
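A small illustration of the ambiguity described above. The two tables are hypothetical and deliberately tiny: German convention writes ü as "ue" and ß as "ss", while a generic accent-stripping rule writes plain u, and without knowing each word's language a filter cannot pick one. The last line reproduces the language-neutral '?' substitution seen in the C-locale close.2, here using Python's "replace" error handler.

```python
# Hypothetical per-language tables, for illustration only.
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
GENERIC = {"ä": "a", "ö": "o", "ü": "u", "á": "a"}

def transliterate(text: str, table: dict) -> str:
    """Replace each character found in table; leave the rest unchanged."""
    return "".join(table.get(ch, ch) for ch in text)

print(transliterate("Müller", GERMAN))   # Mueller
print(transliterate("Müller", GENERIC))  # Muller

# Language-neutral fallback: every non-ASCII character becomes '?'.
print("Nicolás".encode("ascii", errors="replace").decode("ascii"))  # Nicol?s
```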

> Finally, claw's question about why this isn't being fixed with an RH9 RPM was
> not addressed...

The rawhide package will install on RH9 without changing the dependencies

Comment 15 Dan Harkless 2003-10-06 21:52:15 UTC
> No. The solution and rationale are described in bug 103214.

Okay, I've now read all of bug 103214, and I still don't know what the solution
is supposed to be.  Considering this bug is marked "CLOSED" yet people are still
getting the failure, could you please be more specific?

One thing that you may not have picked up on is that claw and I and presumably a
whole lot of other users are using LANG=en_US, not the RH9 default setting of
en_US.UTF-8.  With the UTF-8 setting, I was getting serious problems in Perl, my
shell, my terminal emulator, and other programs, so I backed off to en_US.

If I do:

% env LANG=en_US.UTF-8 man 2 close | head -1
CLOSE(2)                   Linux Programmerâs Manual                  CLOSE(2)

then, again, man works, just like if I do LANG=C.  Well, not "just like".  My
terminal program doesn't handle UTF-8, so the three non-ASCII characters that
appear after "Programmer" get shown as an a-circumflex, as above.

The problem is that there's a single /usr/share/man/en directory that assumes
everyone uses en_US.UTF-8, not en_US (or the equivalent for the English locales
of other countries).  Can /usr/bin/nroff be fixed to properly detect if a
non-UTF-8 locale is being used, and call iconv appropriately?
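A rough Python sketch of the detection being asked for, assuming the formatter produces UTF-8 text: read the codeset of the user's locale (typically ISO-8859-1 for glibc's en_US, UTF-8 for en_US.UTF-8) and convert to it, substituting '?' for characters the target charset cannot represent instead of aborting. The function names are hypothetical; this is not how nroff or man actually behave, and `locale.nl_langinfo` is only available on Unix-like systems.

```python
import locale

def locale_codeset() -> str:
    """Return the character set of the user's locale, e.g. 'UTF-8'."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # honour LC_ALL / LC_CTYPE / LANG
    except locale.Error:
        return "ASCII"  # locale not installed: assume plain ASCII, as in C
    return locale.nl_langinfo(locale.CODESET)

def to_terminal(utf8_text: bytes, codeset: str) -> bytes:
    """Convert UTF-8 output to codeset, using '?' for unrepresentable chars."""
    text = utf8_text.decode("utf-8", errors="replace")
    return text.encode(codeset, errors="replace")
```

Under this scheme, the typographic apostrophe in "Programmer's" (U+2019, three bytes in UTF-8) would reach an ISO-8859-1 terminal as a single '?' rather than as an a-circumflex followed by two stray bytes, and á would survive as the Latin-1 byte 0xE1.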

> No this does not make more sense.

Hmm, that conflicts with your bug 103214 comment 4:

> 2) be "transliterated" for the POSIX locale. That is, convert the "acute a"s 
> and the umlauts into plain ASCII "a" and "u" respectively. Yes, I know that an
> umlaut and a "u" are entirely different things and this is going to upset some
> people who will get their name mangled, but...

Is the difference that the LANG=C man pages in the man-pages RPM are getting
filtered with no user intervention?  If there is a human involved, then
non-ASCII characters should be mapped to a "best-effort" equivalent, just as you
described for u-umlaut.

Even if we are talking pure machine translation, using a "most likely"
translation like á -> a will still convey the most information in the average
case.  I don't see why we should all have to suffer with '?' because there might
be some obscure language with a different transliteration.
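The "most likely" mapping argued for here can be approximated without per-language tables by Unicode decomposition: split each character into a base letter plus combining accents (NFKD), drop the accents, and fall back to '?' only for what is still non-ASCII. This is a sketch using Python's standard unicodedata module, not what the package does; note that it yields "Muller" rather than the German "Mueller", and that ß, which has no decomposition, still becomes '?'.

```python
import unicodedata

def best_effort_ascii(text: str) -> str:
    """Strip accents via NFKD; replace anything still non-ASCII with '?'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", errors="replace").decode("ascii")

print(best_effort_ascii("Nicolás Lichtmaier"))  # Nicolas Lichtmaier
print(best_effort_ascii("Müller"))              # Muller
print(best_effort_ascii("Straße"))              # Stra?e
```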

> The rawhide package will install on RH9 without changing the dependencies

It wasn't a question of compatibility.  The point is that the man pages are
BROKEN in RH9, so why isn't an update RPM being issued?  Most users aren't
sophisticated enough to find this bug on Bugzilla, understand the vague
references to "rawhide", and go find and install the rawhide RPM.  (And then, as
I've stated, it still doesn't fix the bug.)

