Bug 146849

Summary: makewhatis from cron - "zcat: stdout: Broken pipe"
Product: [Fedora] Fedora Reporter: Michal Jaegermann <michal>
Component: man Assignee: Ivana Varekova <varekova>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3 CC: 13640887, bakers, bookreviewer, dch, dean, grdetil, ihok, kjetilho, mail, mattdm, mwigge, nobody+pnasrat, orion, rhbgz, se, simon.andrews, trevor
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-06-06 09:22:46 UTC Type: ---
Attachments:
Description                                                 Flags
Changed makewhatis script.                                  none
Changed makewhatis script.                                  none
corrected script                                            none
Debug output from makewhatis                                none
/usr/share/man/man1/ddate.1.gz                              none
/usr/share/man/man1/perluts.1.gz                            none
list of files which triggered "Broken pipe" on a test run   none
The latest makewhatis script.                               none
another variant of makewhatis                               none
redirected output from makewhatis in cron.daily             none
Output from updated makewhatis                              none
/var/cache/man/whatis                                       none
Used patch.                                                 none
Updated patch to silence broken pipe messages               none
The last version of makewhatis.                             none

Description Michal Jaegermann 2005-02-02 02:24:31 UTC
Description of problem:

Observed already on a number of machines.  After a weekly
run of makewhatis from cron I am getting in mail:

Subject: Cron <root@zeno> run-parts /etc/cron.weekly
X-Cron-Env: <SHELL=/bin/bash>
X-Cron-Env: <PATH=/sbin:/bin:/usr/sbin:/usr/bin>
X-Cron-Env: <MAILTO=root>
X-Cron-Env: <HOME=/>
X-Cron-Env: <LOGNAME=root>
X-Cron-Env: <USER=root>

/etc/cron.weekly/00-makewhatis.cron:


zcat: stdout: Broken pipe

zcat: stdout: Broken pipe

... and so on, 3560 times in all.

Curiously enough, I am not able to reproduce that from a command
line, even with a wrapper which sets PATH as above, i.e. to
/sbin:/bin:/usr/sbin:/usr/bin.  Also LANG="" or LANG=C, which
look like other likely candidates for the culprit, do not seem
to have any ill effects.  Does something expect a tty?

I do not have anything in /etc/cron.weekly/ but
00-makewhatis.cron and 0anacron.  The latter does nothing,
as the machine is up all the time.

Version-Release number of selected component (if applicable):
man-1.5o1-7

Comment 1 Eido Inoue 2005-02-02 18:51:47 UTC

*** This bug has been marked as a duplicate of 64836 ***

Comment 2 Michal Jaegermann 2005-02-05 16:41:34 UTC
Bug #64836 was closed, but only for RHEL3, and I was told by
the person who closed it to reopen this bug.

Comment 3 Trevor Cordes 2005-02-06 18:22:56 UTC
This bug just started happening to me about 2 weeks ago as well.  I'm on
FC3, and it must be some recent update (I keep fully updated at least
every 2 weeks) that did it, as I've never seen this before in my life.
It is also occurring on another machine I administer which is nearly
identical to my own.

/etc/cron.weekly/00-makewhatis.cron:
zcat: stdout: Broken pipe

man-1.5o1-7.  Strange, but this does not yet seem to occur on other
machines I administer that are also nearly identical (and are using
man-1.5o1-7).  But those machines may not have been rebooted since
updating that package.



Comment 4 Simon Andrews 2005-02-10 11:08:16 UTC
Just wanted to add a "me too" to this bug.  All of our 4 FC3 servers are producing these errors in the 
cron logs.  These are diverse machines, but are all fully updated and recently rebooted.

I also can't reproduce this from the command line where makewhatis works fine.

Does anyone know what was done in RHEL to fix this bug?  The linked report doesn't say.

Comment 5 Jan Houtsma 2005-02-27 10:53:33 UTC
Another "me too" on FC3, on 2 different, independent and up-to-date machines.

Comment 6 Niki W. Waibel 2005-02-28 10:23:36 UTC
(another me too, on 2 FC3 servers)

Maybe it happens because, with
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
the daily and weekly jobs are running
at the same time...

Starting
/etc/cron.weekly/00-makewhatis.cron & /etc/cron.daily/00-makewhatis.cron &
a couple of times in a row shows that the locking
mechanism is really *not* working properly:
===
# ps -ef | grep mak
root     15608  3779  0 11:15 pts/0    00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root     15612 15608  6 11:15 pts/0    00:00:02 /bin/bash /usr/bin/makewhatis -w
root     21150  3779  0 11:16 pts/0    00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root     21155 21150  5 11:16 pts/0    00:00:00 /bin/bash /usr/bin/makewhatis -w
root     24037  3779  0 11:16 pts/0    00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root     24048 24037  5 11:16 pts/0    00:00:00 /bin/bash /usr/bin/makewhatis -w
root     25842  3779  0 11:16 pts/0    00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root     25849 25842  4 11:16 pts/0    00:00:00 /bin/bash /usr/bin/makewhatis -w
root     31874 25849  0 11:16 pts/0    00:00:00 /bin/bash /usr/bin/makewhatis -w
root     31876  3779  0 11:16 pts/0    00:00:00 grep mak
root     31885 24048  0 11:16 pts/0    00:00:00 /bin/bash /usr/bin/makewhatis -w
===

Strangely enough, I could not reproduce the zcat errors
which I also see in the weekly mails.

It could also have something to do with the file descriptors opened by the
process:
===
# ls -al /proc/12611/fd/
total 5
dr-x------  2 root root  0 Feb 28 11:21 .
dr-xr-xr-x  3 root root  0 Feb 28 11:21 ..
lrwx------  1 root root 64 Feb 28 11:21 0 -> /dev/pts/0
l-wx------  1 root root 64 Feb 28 11:21 1 -> /tmp/whatis.j12613
lrwx------  1 root root 64 Feb 28 11:21 10 -> /dev/pts/0
lrwx------  1 root root 64 Feb 28 11:21 2 -> /dev/pts/0
lr-x------  1 root root 64 Feb 28 11:21 255 -> /usr/bin/makewhatis
===
I am not really sure what cron does... but then it would happen every day,
I'd guess.

Comment 7 Trevor Cordes 2005-03-05 16:13:18 UTC
Strange, but I haven't seen this bug for a couple of weeks now, and
I've been watching pretty closely for it.  Perhaps it got fixed by
some other update?


Comment 8 Michal Jaegermann 2005-03-05 17:03:11 UTC
I do not think so.  I am seeing that regularly on various machines
although I still have no idea what is really the cause and I never
managed to reproduce it from a command line.  A recent comment
on bug #64836 seems to indicate that the matter is open there too,
despite its formally CLOSED status.

I guess I will add a test like this at the top of 00-makewhatis

running=$(pgrep -f makewhatis); [ "$running" ] && exit 0

to see if the suggestion from comment #6 really pinpoints the case.

The only real difference between /etc/cron.daily/00-makewhatis.cron
and /etc/cron.weekly/00-makewhatis.cron is that the former
uses 'makewhatis -u -w' and the latter 'makewhatis -w'.  Why is
/etc/cron.weekly/00-makewhatis.cron needed at all?


Comment 9 Michal Jaegermann 2005-03-05 18:00:35 UTC
Er.... make the above "running=$(pgrep makewhatis)" or we will pick
up 00-makewhatis too.  This will "fix" the problem but not exactly
in a desired way. :-)

Comment 10 Orion Poplawski 2005-03-07 16:38:57 UTC
Me too on lots of up to date FC3 machines.

Comment 11 Michal Jaegermann 2005-03-07 17:08:38 UTC
Could somebody explain to me why /etc/cron.weekly/00-makewhatis.cron
is needed at all?  It looks to me more and more that this is
what is really causing the problem in conjunction with
/etc/cron.daily/00-makewhatis.cron.  If different options are indeed
needed from time to time, then a construct like

[ "$(date +%u)" = 7 ] && opts="some opts" || opts="some other ones"

should do, regardless of the locale.

A suggestion to those who are hit by this: if you turn off
/etc/cron.weekly/00-makewhatis.cron ('exit 0' as the first executable
line will do nicely), what happens?

Comment 12 Trevor Cordes 2005-03-13 14:40:23 UTC
An update: I still swear that I have not received any indication this
bug still exists on my main system (for several weeks now), even
though it was doing it before that.  Besides reboots and normal yum
updates, I have not done anything to try to fix the bug.

However, 3 of the servers I administer just sent me emails showing the
bug.  They are all timestamped Sunday 4:22am and come from
cron.weekly, so I think you guys are on the right track.  I just wish
I could figure out what I've done differently on my own system!

I'm going to tweak 00-makewhatis.cron on one of these buggy systems to
see what happens.  I'll report back next week. (note to self: DufDen)


Comment 13 Michal Jaegermann 2005-03-13 16:10:17 UTC
> I still swear that I have not received any indication this
> bug still exists on my main system ...

That would support the idea that there is a not-so-subtle race there.
Where and why is somewhat of a mystery, as there seems to be enough
of a time gap between the "daily" and "weekly" runs, and
   [ -f $LOCKFILE ] && exit 0
is supposed to prevent exactly that.  For some reason it does not seem
to be effective.

I turned off /etc/cron.weekly/00-makewhatis.cron on the few
machines I can watch, as described in comment #11, and so far
I have not observed any recurrence of the problem.  Not surprisingly,
the timestamps on /var/cache/man/whatis are current.
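If the lock file is created by a separate command after the '[ -f $LOCKFILE ] && exit 0' test, two runs can both pass the test before either one creates the file. Whether or not that race is what produces the broken-pipe noise, an atomic lock closes it. A minimal sketch, assuming bash; 'acquire_lock', 'release_lock' and the lock path in the usage comment are hypothetical names, not part of makewhatis:

```shell
#!/bin/bash
# Sketch only: an atomic lock for a cron job. mkdir either creates the
# directory or fails because it already exists, in a single system call,
# so two concurrent runs cannot both succeed (unlike "[ -f $LOCKFILE ]"
# followed later by a separate command that creates the file).

acquire_lock() {
    # $1: lock directory path (hypothetical; pick one per job)
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

# Hypothetical usage in a cron script:
#   LOCKDIR=/var/lock/makewhatis.lock
#   acquire_lock "$LOCKDIR" || exit 0       # another run holds the lock
#   trap 'release_lock "$LOCKDIR"' EXIT
#   makewhatis -w
```

One trade-off is that a lock directory left behind by a killed run must be removed by hand. flock(1) from util-linux avoids this because it ties the lock to an open file descriptor, which the kernel releases when the process exits.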

Comment 14 Ivana Varekova 2005-03-18 13:34:14 UTC
Hello,
I tried to reproduce this bug, but I was unsuccessful.
Could somebody post the versions of the gzip, man and vixie-cron packages (and of
anacron, if it is installed and used) on a system which shows this bug?

/etc/cron.weekly/00-makewhatis.cron is a necessary file.
/etc/cron.daily/00-makewhatis.cron uses the command "makewhatis -u -w", which
registers the changes made during the last day (there is a mistake in the makewhatis
man page; it will be corrected soon). If your computer is not running at the
right time on some day, cron.daily will not be executed and the whatis database
will not be complete.
There was a bug in makewhatis
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140207): with option
-u, this command updated only changes which were less than 1 minute old. If you
use an older version of the man package, your daily makewhatis update is probably
wrong (incomplete), so the weekly update is necessary. This problem is fixed in man-1.5p-2.

Ivana Varekova
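The explanation above depends on which pages count as "new" for -u. One way to express "changed since the database was last written" is to compare modification times against the database file itself instead of using a fixed time window. This is a sketch only, not makewhatis's actual -u logic, and 'pages_newer_than' is a hypothetical helper:

```shell
#!/bin/bash
# Sketch only: list man page files modified after a reference file
# (for example the whatis database). pages_newer_than is hypothetical,
# not a makewhatis function.
pages_newer_than() {
    local ref=$1 mandir=$2
    # -newer compares modification times against $ref. If $ref is rewritten
    # on every run, a day on which the job did not run is picked up by the
    # next run instead of being lost.
    find "$mandir" -type f -newer "$ref" | sort
}
```

For example, 'pages_newer_than /var/cache/man/whatis /usr/share/man/man1' would list the section 1 pages changed since the database file was last modified.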

Comment 15 Trevor Cordes 2005-03-18 14:47:29 UTC
On a system that still shows the bug:
#rpm -q gzip man vixie-cron
gzip-1.3.3-13
man-1.5o1-7
vixie-cron-4.1-24_FC3
anacron-2.3-32

On my system that used to but no longer shows the bug:
#rpm -q gzip man vixie-cron
gzip-1.3.3-13
man-1.5o1-7
vixie-cron-4.1-24_FC3
anacron-2.3-32

As you can see, they are the same packages.  I'll see this Sunday what happens
with my test computer where I modified the cron.weekly script to exit without
doing anything.


Comment 16 Ivana Varekova 2005-04-13 08:27:49 UTC
Trevor,
thank you for your comment.
I can't reproduce this bug, but I think the attached makewhatis should be correct.
Can anybody test it?
Thank you.
Ivana Varekova

Comment 17 Ivana Varekova 2005-04-13 08:31:03 UTC
Created attachment 113083
Changed makewhatis script.

Comment 18 Trevor Cordes 2005-04-13 14:19:13 UTC
I'll test it out on the last couple machines that are showing the bug.

FYI, the test machine where I edited the cron.weekly script to exit did not show
the bug symptoms after the edit.


Comment 19 Steve Fox 2005-04-13 14:52:22 UTC
I'm still seeing this error too, so I can test it as well. I have not modified
any cron scripts so I should be able to provide a "clean" system test.

Comment 20 Steve Fox 2005-04-14 14:46:07 UTC
I now see this nightly instead of weekly.

/etc/cron.daily/00-makewhatis.cron:


zcat: stdout: Broken pipe

zcat: stdout: Broken pipe

zcat: stdout: Broken pipe



Comment 21 Michal Jaegermann 2005-04-15 00:51:49 UTC
Hm. It appears that the only change between the released 'makewhatis' and the one
from comment #17 which could be responsible for what is described in comment
#20 is this one:

-            if ${cat} ${x} | iconv -f utf-8 -t utf-8 -o /dev/null 2>/dev/null
+            if (${cat} ${x} | iconv -f utf-8 -t utf-8 -o /dev/null) 2>/dev/null

It is a semantic change, and cron seems to be doing something unexpected
to stdio.  But I do not understand why it would have such an effect.
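The two lines differ in what the redirection covers. Without parentheses, '2>/dev/null' applies only to the last command of the pipeline (iconv). '( ... ) 2>/dev/null' applies to the whole subshell, so the stderr of ${cat} is discarded too. The following is a small sketch in which 'producer' and 'consumer' are hypothetical stand-ins for the two commands:

```shell
#!/bin/bash
# Sketch: which commands does 2>/dev/null cover in a pipeline?
# producer/consumer are hypothetical stand-ins for "${cat} ${x}" and
# "iconv -f utf-8 -t utf-8 -o /dev/null"; each writes one line to stderr.
producer() { echo "producer: noise" >&2; echo data; }
consumer() { echo "consumer: noise" >&2; cat >/dev/null; }

# Released form: only consumer's stderr is discarded.
form_unparenthesized() { producer | consumer 2>/dev/null; }

# Comment #17 form: both commands run in a subshell whose stderr is discarded.
form_parenthesized() { (producer | consumer) 2>/dev/null; }
```

'form_unparenthesized 2>&1' prints "producer: noise" and 'form_parenthesized 2>&1' prints nothing. If anything, the parenthesized form hides more of ${cat}'s messages at that spot, not fewer.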


Comment 22 Michal Jaegermann 2005-04-15 00:54:24 UTC
I should add that since I turned off makewhatis in cron.weekly
I have not seen "zcat: stdout: Broken pipe" on any machine where I did that,
with the current /usr/bin/makewhatis.

Comment 23 Ivana Varekova 2005-04-15 08:17:59 UTC
Hello Steve,
which man version do you use? This problem may be caused by new man version. 
(In older versions (man-1.5p-1 and older) there was bug 140207. It causes that
cron.daily (it uses command makewhatis -u -w) updates whatis database with that
man-pages which was "really" new. In most cases this command does nothing. In
fixed version this command updates database with one day old man pages. So it
should produce the same error as weekly update.)
If you update man makewhatis script was rewriten and you don't use the version
from comment 17.
Can you please write which man version do you use and wheather you use
makewhatis script from comment 17.
Thank you.

Ivana Varekova

Comment 24 Ivana Varekova 2005-04-15 08:23:53 UTC
Hello Michal,
which man version do you use?
If you use man-1.5p-1 or an older version, your whatis database is not updated. If
you turned off makewhatis in cron.weekly, the whatis database is updated only by
cron.daily, which uses makewhatis -u -w, and that command contains the bug (see
comment 23 or bug 140207).
Ivana

Comment 25 Steve Fox 2005-04-15 14:51:57 UTC
I use man-1.5o1-7, which should be the latest FC3 errata version. My system is pretty
much unmodified. Yes, I am using the new makewhatis from comment 17.

Comment 26 Michal Jaegermann 2005-04-15 15:35:39 UTC
> Hello Michal, which man version do you use?

I also have man-1.5o1-7 installed, i.e. the current one for FC3.  As for
/var/cache/man/whatis not getting updated, I indeed did not check whether
new manpages added to the mix get reflected there.  OTOH the timestamps
on this file do change every morning, and its content, which is plain text,
looks quite sane.  Also, responses from 'man -k' are what you would
expect.

Comment 27 Michal Jaegermann 2005-04-17 17:27:10 UTC
How about, instead of the clearly racy arrangement of
/etc/cron.daily/00-makewhatis.cron and /etc/cron.weekly/00-makewhatis.cron,
which assumes that some lock file will be created in time so these scripts will
not stomp on each other, having only one script in /etc/cron.daily/
which does something like this:

#!/bin/bash

[ 7 = "$(date +%u)" ] && mopts="-w" || mopts="-u -w"
....
makewhatis $mopts
exit 0

Regardless of other possible issues this should clear at least this mess.
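Here is a minimal sketch of the single-script idea from comments #11 and #27, assuming bash. 'makewhatis_opts' is a hypothetical helper name, and the option strings are the ones the two existing cron scripts use:

```shell
#!/bin/bash
# Sketch: choose makewhatis options from the ISO day of week
# (1 = Monday ... 7 = Sunday). date +%u prints a digit, so the
# choice does not depend on the locale.
makewhatis_opts() {
    local day=${1:-$(date +%u)}     # optional argument, mainly for testing
    case $day in
        7) printf '%s\n' "-w" ;;    # Sunday: the options cron.weekly uses
        *) printf '%s\n' "-u -w" ;; # other days: the options cron.daily uses
    esac
}

# Hypothetical use in /etc/cron.daily/00-makewhatis.cron
# (word splitting of the unquoted result is intentional):
#   makewhatis $(makewhatis_opts)
```

With a single script there is only ever one makewhatis started by cron per day, so the daily and weekly runs cannot overlap no matter how long either takes.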

Comment 28 Trevor Cordes 2005-04-18 02:44:00 UTC
Comment #17 didn't fix it on one of the boxes I administer (I double-checked the
new script was in place).  However, on another test box it did seem to fix it
(this week, at least).

Strange, but my personal workstation started giving me the errors again, even
though it hadn't been for a while.  The only difference I can think of is I ran
a bunch of updates mid-week, including a new glibc and kernel, though I have yet
to reboot (way too many windows open).  Admittedly, upgrading glibc and not
rebooting isn't the brightest thing in the world, but the info may help point to
the root cause of this error.


Comment 29 Ivana Varekova 2005-04-18 15:22:13 UTC
Michal,
the command makewhatis -w -u does change your /var/cache/man/whatis file, but it
adds only information about man pages which were changed in the last second
(and the command is executed daily). So the timestamp is updated and the file looks
sane, but information about new man pages is missing.

Comment 30 Michal Jaegermann 2005-04-18 15:59:40 UTC
See comment #27.  On a machine where I left the cron setup intact I was hit
again by "thousands of broken pipes".  As this is "on again, off again", it
really looks like a race, and AFAIK nobody has reported it when only one of these
cron scripts is operational.  Closing a race window looks like a no-brainer,
and the changes from comment #17 do not really have much to do with that.

'man makewhatis' says:

       -u     Update database with new pages.
       -w     Use manpath obtained from 'man --path'

which does not really make the difference between '-w' and '-u -w' clear to me
(possible bugs notwithstanding).


Comment 31 Ivana Varekova 2005-04-28 13:54:05 UTC
Created attachment 113779
Changed makewhatis script.

I still can't reproduce this problem, and I need some more information about
makewhatis behaviour. Could anybody who sees this problem replace man's
makewhatis file with the attached one? (This file produces more detailed
output.) Then please attach (or send me) the cron error output.
Thank you very much.
Ivana Varekova

Comment 32 Trevor Cordes 2005-04-28 14:36:48 UTC
I'll test it.  The bug always occurs on Sunday so I'll report back then.  BTW,
diff against your new script shows that it's not much different.


Comment 33 Ivana Varekova 2005-04-28 14:56:07 UTC
Trevor, thank you. The new makewhatis has only one new command, which
displays more information about the script's behavior.
Ivana Varekova

Comment 34 Michal Jaegermann 2005-04-28 15:39:38 UTC
A script from an attachment to comment #31 has a bug.  Namely it does
'${cat} ${x} --quiet' where ${cat} can be either zcat, bzcat or cat; at
least for now.  The problem is that cat does not have an option --quiet.

This can be fixed by putting this option, and/or any other desired option
which makes sense for a given utility, into a 'cat=...' assignment.  Like that:

               if [ "${x%.gz}" != "${x}" ]
               then
                  cat="zcat --quiet"
               elif [ "${x%.bz2}" != "${x}" ]
               then
                  cat="bzcat --quiet"
               else
                  cat=cat
               fi

               if ${cat} "${x}" | .....
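The same idea can be sketched with a bash array instead of a string. This keeps the options attached to the command while the file name stays quoted and is never word-split. 'pick_cat', 'show_page' and 'cat_cmd' are hypothetical names, not taken from makewhatis:

```shell
#!/bin/bash
# Sketch: choose a decompressor by file suffix (mirroring the current
# script), keeping the command and its options in an array. Only zcat
# and bzcat get --quiet; plain cat has no such option.
pick_cat() {
    case $1 in
        *.gz)  cat_cmd=(zcat --quiet) ;;
        *.bz2) cat_cmd=(bzcat --quiet) ;;
        *)     cat_cmd=(cat) ;;
    esac
}

show_page() {
    pick_cat "$1"
    "${cat_cmd[@]}" "$1"    # quoted: safe for names containing spaces
}
```

The 'case' patterns do the same job as the '${x%.gz}' comparisons. The suffix-based guess itself is kept here only to match the script under discussion.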

Comment 35 Ivana Varekova 2005-04-29 08:09:59 UTC
Created attachment 113822
corrected script

Michal, you are right, I forgot this change. There should not be --quiet. Thank
you. Here is the correct script.
Trevor, can you please use this script or delete --quiet? Thank you very much.
Sorry for my error.
Ivana Varekova

Comment 36 Michal Jaegermann 2005-04-29 17:17:55 UTC
On one of my machines I replaced 'makewhatis -u -w' by 'makewhatis -w' in
/etc/cron.daily/00-makewhatis.cron.  The effect was that today I found in
mail from cron "zcat: stdout: Broken pipe" repeated 74 times.  Of course
I cannot reproduce that feat if I am running that directly from a command
line instead of cron.  At least 74 is better than the original 3560. :-)

It appears in any case that, whatever this is, it is not a race as I originally
suspected, or at least not in this place.  Also, for those who are trying to
hunt it down, this seems to give a way to reproduce it more often
than once per week.

Comment 37 Ivana Varekova 2005-05-02 12:53:13 UTC
Trevor, can you send me your last cron.weekly message please? 
Ivana Varekova


Comment 38 Ivana Varekova 2005-05-02 13:41:33 UTC
Hi Michal,
can you please add the command
set -xET
to your makewhatis script (as its second line) (see attachment 113822; you can add
this command yourself or use script 113822) and send me the part of your cron.daily
report concerning makewhatis?
(The set command only prints more information about the makewhatis run.)
Ivana Varekova

Comment 39 Orion Poplawski 2005-05-02 15:15:01 UTC
Created attachment 113924
Debug output from makewhatis

This shows the zcat: stdout: Broken pipe error.

Comment 40 Trevor Cordes 2005-05-02 16:33:50 UTC
Sorry, I updated my script but forgot to chmod 755, so the cron execution failed.
I've 755'd it now but will have to wait a week for the results.

Comment 41 Ivana Varekova 2005-05-03 12:37:54 UTC
Orion,
can you attach your man-pages:
/usr/share/man/man1/ddate.1.gz
and 
/usr/share/man/man1/perluts.1.gz.
Thank you.
Ivana Varekova


Comment 42 Orion Poplawski 2005-05-03 14:34:25 UTC
Created attachment 113962
/usr/share/man/man1/ddate.1.gz

Comment 43 Orion Poplawski 2005-05-03 14:35:28 UTC
Created attachment 113963
/usr/share/man/man1/perluts.1.gz

Comment 44 Michal Jaegermann 2005-05-03 16:03:29 UTC
Created attachment 113975
list of files which triggered "Broken pipe" on a test run

I put '*/10 * * * * /etc/cron.weekly/00-makewhatis.cron' in my crontab
and 'set -xET' in /usr/bin/makewhatis.  Within roughly fifteen minutes
I had two traces from these runs, each 74M in size, after
which I turned that crontab entry off. :-)  The third trace with
'set -xET' I got from a regular overnight run where I used
'makewhatis -w' in /etc/cron.daily/00-makewhatis.cron instead of '-u -w'.

Each of those runs reported 'zcat: stdout: Broken pipe', but the number
of occurrences was different each time: 72, 76 and 73 respectively.
All traces, apart from their size, are very similar to what was already
attached to comment #39, and this 'Broken pipe' always happens in the
same place.  A list of the manpages from the first run _after_ which this
"Broken pipe" on an awk program (apparently) was reported is attached.

Despite the differences in count, this list is quite stable.  Here are the
differences between the lists from the first and the second runs:

--- list1	2005-05-03 09:38:28.000000000 -0600
+++ list2	2005-05-03 09:38:36.000000000 -0600
@@ -17,6 +17,7 @@
 /usr/share/man/man1/zshcompsys.1.gz
 /usr/share/man/man1/perlport.1.gz
 /usr/share/man/man1/gpg.1.gz
+/usr/share/man/man1/perlcall.1.gz
 /usr/share/man/man1/perlmodlib.1.gz
 /usr/share/man/man1/perlos2.1.gz
 /usr/share/man/man1/screen.1.gz
@@ -25,6 +26,7 @@
 /usr/share/man/man1/tcsh.1.gz
 /usr/share/man/man1/groffer.1.gz
 /usr/share/man/man1/cdrecord.1.gz
+/usr/share/man/man1/curl.1.gz
 /usr/share/man/man1/cvs.1.gz
 /usr/share/man/man1/perltoc.1.gz
 /usr/share/man/man1/perlfunc.1.gz
@@ -48,6 +50,7 @@
 /usr/share/man/man3/Math::BigInt.3pm.gz
 /usr/share/man/man3/DBI::DBD.3pm.gz
 /usr/share/man/man3/Config.3pm.gz
+/usr/share/man/man3/Tcl_Eof.3.gz
 /usr/share/man/man4/ethereal-filter.4.gz
 /usr/share/man/man5/terminfo.5.gz
 /usr/share/man/man5/muttrc.5.gz
@@ -70,3 +73,4 @@
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
+/usr/share/man/manl/dhsein.l.gz

and here are the same differences between the first and the third:

--- list1	2005-05-03 09:38:28.000000000 -0600
+++ list3	2005-05-03 09:38:43.000000000 -0600
@@ -17,6 +17,7 @@
 /usr/share/man/man1/zshcompsys.1.gz
 /usr/share/man/man1/perlport.1.gz
 /usr/share/man/man1/gpg.1.gz
+/usr/share/man/man1/perlcall.1.gz
 /usr/share/man/man1/perlmodlib.1.gz
 /usr/share/man/man1/perlos2.1.gz
 /usr/share/man/man1/screen.1.gz
@@ -25,6 +26,7 @@
 /usr/share/man/man1/tcsh.1.gz
 /usr/share/man/man1/groffer.1.gz
 /usr/share/man/man1/cdrecord.1.gz
+/usr/share/man/man1/zshcontrib.1.gz
 /usr/share/man/man1/cvs.1.gz
 /usr/share/man/man1/perltoc.1.gz
 /usr/share/man/man1/perlfunc.1.gz
@@ -53,10 +55,8 @@
 /usr/share/man/man5/muttrc.5.gz
 /usr/share/man/man5/smb.conf.5.gz
 /usr/share/man/man7/groff_mdoc.7.gz
-/usr/share/man/man7/mdoc.samples.7.gz
 /usr/share/man/man7/groff_diff.7.gz
 /usr/share/man/man8/pppd.8.gz
-/usr/share/man/man8/smartd.8.gz
 /usr/share/man/man8/tcpdump.8.gz
 /usr/share/man/man8/lsof.8.gz
 /usr/share/man/man8/mkisofs.8.gz
@@ -70,3 +70,4 @@
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
+/usr/share/man/manl/dhsein.l.gz

Yes, this 'dhsein.l.gz' indeed shows up in traces multiple times.

In case somebody makes similar traces, here is a filter which extracts file
names from output produced with '-x' set in the shell:

#!/usr/bin/perl
use strict;
use warnings;

my $saved = '';
while (<>) {
  # Print the file name echoed just before each broken-pipe message.
  if (m{zcat: stdout: Broken pipe}) {
    $saved =~ s{^\+ echo }{};
    print $saved;
  }
  $saved = $_ if m{echo /usr/share/man/man};
}

Comment 45 Michal Jaegermann 2005-05-03 16:56:20 UTC
I still do not know what the culprit for the observed trouble is,
but while searching for it I looked closer at /usr/bin/makewhatis,
and it includes a rather opaque and baroque piece of awk which repeats
work already done a few lines earlier, uses undocumented and therefore
not-guaranteed-to-work awk features, and tries hard to parse
various variants of manpages, which is bound to break one day
no matter how big a "conditional mess" gets heaped over
there in time.

OTOH there is already a program which is designed to deal with
manpages.  It is called 'man'.  AFAICS the code in question really
tries hard to get the content of the "NAME" section and print it.  The
following piece of shell and awk handles this task:

for m in "$@" ; do
    man -- "$m" | awk '
	        $1 ~ /^XXX/ {next;}
		NF > 0 && flag == 1 { print;
			   while (getline > 0) {
				   if (NF == 0) {
					   exit;
				   }
				   print;
			   }
			 }
	        NF == 1 && $1 ~ /^[[:alpha:]]/ {flag += 1;}
	     ' 
done  2>/dev/null

You can pass as an argument a man page compressed in any
way you want, or uncompressed.  The language does not matter.  If 'man'
can handle it, so can this script (assuming that it
really is a manpage).

This is not a complete solution.  One needs to handle 'cat' pages too,
but this is basically the same thing, only you do not need to invoke
'man'.  Also, multiple lines have to be folded correctly, i.e. paying
attention to whether the previous line ended with a hyphen or not,
and pages which result in "multi-name" output like

       uri,  url, urn - uniform resource identifier (URI), including
       a URL or URN

have to be handled properly.  That is quite simple.

Beyond "regular" manpages, I also ran the above on collections of
manpages in Polish and Russian, and on the output of
'find /usr/share/man/pt_BR/ -type f' (which gives a few files on my
box); so far it works as expected.

And one more thing.  Guessing a file compression type by looking
at its suffix is broken beyond words.  There is a 'file' utility
for such checks.
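The following is a sketch of content-based detection. It assumes the standard magic numbers: gzip streams begin with the bytes 1f 8b, and bzip2 streams begin with the ASCII text "BZh". It reads the bytes with od instead of parsing the output of 'file', whose wording can vary between versions. 'detect_compression' is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: identify compression from a file's leading bytes, not its name.
detect_compression() {
    local magic
    # First three bytes as hex, e.g. "1f8b08" for a typical gzip file.
    magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)  echo gzip ;;
        425a68) echo bzip2 ;;   # "BZh"
        *)      echo none ;;
    esac
}
```

A plain-text file misleadingly named 'something.1.gz' is reported as 'none'. That is exactly the case a suffix check gets wrong.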

Comment 46 Michal Jaegermann 2005-05-03 20:45:19 UTC
Here is a more workable version of the script from the previous comment.
The variable 'handler' can be set to 'man' for troff-source man pages, regardless
of whether they are compressed, or to 'cat', 'zcat' or 'bzcat' for "cat" pages,
depending on how they are compressed.

#! /bin/sh

handler=man

for m in "$@" ; do
    $handler -- "$m" | col -b | awk '
	function fold_lines(add) {
		sub(/^[[:blank:]]+/, "", add);
		if ( sub(/-$/, "", line) > 0) {
			line = line add;
		}
		else {
			line = line " " add;
		}
	}

	$1 ~ /^XXX/ {next;}
	NF > 0 && n == 1 { line = $0;
			   if (match(line, "[[:blank:]]-[[:blank:]]") == 0) {
				   exit;
			   }
			   sub(/^[[:blank:]]+/, "", line);
		           while (getline > 0) {
				   if (NF == 0) {
					   gsub(/[[:blank:]]+/, " ", line);
					   print line;
					   exit;
				   }
				   fold_lines($0);
			   }
			 }
	NF == 1 && $1 ~ /^[[:alpha:]]/ {n += 1;}
	' 
done  2>/dev/null


Comment 47 Ivana Varekova 2005-05-04 11:49:53 UTC
Created attachment 114000
The latest makewhatis script.

This version should fix this problem. This version uses the set command, which
produces quite large output.
Can anybody test this version of the makewhatis script and attach (or send me) the
cron error output and the resulting /var/cache/man/whatis file?
Thank you.
Ivana Varekova

Comment 48 Ivana Varekova 2005-05-04 12:08:15 UTC
Michal, 
can you test the latest makewhatis script?
Thank you.
Ivana Varekova

Comment 49 Michal Jaegermann 2005-05-04 16:30:28 UTC
> Michal, can you test the latest makewhatis script?

Yes, I can.  On two runs with this version, ten minutes apart, I got 'zcat: stdout:
Broken pipe' 74 times on the first run and 76 times on the second one.
Compared with the list posted in comment #44, here are the new files which
showed up with "Broken pipe" in the results:

/usr/share/man/man1/perlcall.1.gz
/usr/share/man/man1/perllocale.1.gz
/usr/share/man/man3/DBI::FAQ.3pm.gz  (only the second time)

while /usr/share/man/man1/perlre.1.gz went through smoothly on one run
and showed up with "Broken pipe" on the other.


Comment 50 Michal Jaegermann 2005-05-04 16:54:04 UTC
Looking at the traces again, it seems possible that 'makewhatis' triggers an
obscure bug in 'awk'.  This is only a hypothesis, and what that bug may be
I have no idea.  An overflow somewhere?

Turning what I put in comment #46 into a full replacement of the relevant
/usr/bin/makewhatis fragment seems to be a rather simple and quite short job.

Comment 51 Michal Jaegermann 2005-05-05 01:40:47 UTC
Created attachment 114047
another variant of makewhatis

I replaced /usr/bin/makewhatis with the attached script and ran it with
'makewhatis -w' from cron five times in a row.  I did not see a single
"zcat: stdout: Broken pipe".

This script actually creates a considerably bigger /var/cache/man/whatis
file than the original one, but 'man -k' works as expected.  It may also
have some bugs.  I really only tried to check whether "Broken pipe" would vanish.
AFAICS it did.

A note:  instead of 'mkw_line man --  $(find ${d}/man$i -type f)' one
can do

   find "${d}/man$i" -type f | while IFS= read -r arg ; do
       mkw_line man -- "$arg"
   done

but the first way is likely faster and my bash did not complain about very
long command lines so I did not bother.
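For completeness, here is a sketch of a loop form that avoids both word splitting and very long argument lists. It assumes bash and a find that supports -print0 (GNU find does). 'handle_page' is a hypothetical stand-in for mkw_line in the attached script:

```shell
#!/bin/bash
# Sketch: run a per-file shell function over find results.
# handle_page is hypothetical; here it just prints the name it receives.
handle_page() {
    printf '%s\n' "$1"
}

for_each_page() {
    local f
    # -print0 / read -d '' keep names with spaces or newlines intact;
    # IFS= and -r stop read from trimming blanks or eating backslashes.
    while IFS= read -r -d '' f; do
        handle_page "$f"
    done < <(find "$1" -type f -print0)
}
```

find's own '-exec cmd {} +' also batches arguments below the system limit, but it can only run external programs, not shell functions.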

I am not sure if two spaces which precede '-' in results was really intended
or just an artifact of how the original was written.  I left it the same way.

Comment 52 Trevor Cordes 2005-05-07 21:16:52 UTC
Created attachment 114131
redirected output from makewhatis in cron.daily

Not sure if you guys still need this, but here's the complete output from a run
of makewhatis -w >/tmp/makewhatis-daily-`date +"%Y-%m-%d"` 2>&1 in cron.daily

I think Michal is correct in that all that awk should be replaced, or we need
to bring in an awk expert to figure out what is going on.

Comment 53 Michal Jaegermann 2005-05-07 21:48:27 UTC
Nice!  "Broken pipe" 1431 times. :-)

You can actually kill that noise (and I am pretty sure by now that it is
really only noise, caused by buffering and possibly lost signals when
running from cron) by replacing, in /usr/bin/makewhatis, the line

   ' pages=$pages section=$section verbose=$verbose curdir=$curdir

with

   ' pages=$pages section=$section verbose=$verbose curdir=$curdir 2>/dev/null

Also 'makewhatis -w >/dev/null 2>&1', and similar, in both scripts
/etc/cron.*/00-makewhatis.cron should have the same effect.

Not that this changes very much what I think about the current state of that
script.  Every time I look closer I find something new.  The latest one
is this: "AWK=/usr/bin/gawk", but 'gawk' is really located in
/bin.  Luckily we also have a symlink /usr/bin/gawk.

Comment 54 Orion Poplawski 2005-05-09 14:53:50 UTC
Created attachment 114157
Output from updated makewhatis

Still seeing lots of zcat: stdout: Broken pipe

Comment 55 Orion Poplawski 2005-05-09 14:57:41 UTC
Created attachment 114161
/var/cache/man/whatis

From affected machine.

Comment 56 Ivana Varekova 2005-05-10 12:58:01 UTC
Created attachment 114201
Used patch.

I built a new version (man-1.5p-5), which should not produce the "Broken pipe"
error message. I can't reproduce this problem myself, but Michal helped with it
(thank you very much). I think there should not be any other problem. Thank you
for your help and your tests.
Please update your man package and reopen this bug, if you find any problem. 
Ivana Varekova

Comment 57 Gilles Detillieux 2005-05-10 17:02:40 UTC
Ivana and Michal, I don't think redirecting ALL of gawk's stderr output to
/dev/null is such a good idea.  It kills any legitimate error reporting from
gawk, including the -v output of the list of files added.  If the real problem
is the broken pipe error messages from zcat or bzcat, why not just redirect the
stderr output of these two commands, right within the pipe_cmd command string,
and leave the rest of gawk's stderr output alone?
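The per-command redirection suggested here can be sketched as follows. This is a minimal illustration assuming a POSIX awk, not the code of the actual patch; 'decompress' and 'scan_page' are hypothetical names, and the decompressor is faked so that the example does not depend on timing:

```shell
#!/bin/sh
# Hypothetical stand-in for the decompressor: data on stdout plus a fake
# "Broken pipe" complaint on stderr. In makewhatis the string would instead
# name zcat or bzcat and a man page file.
decompress='echo "ls (1) - list directory contents"; echo "zcat: stdout: Broken pipe" >&2'

scan_page() {
    awk -v decompress="$decompress" 'BEGIN {
        # Redirect stderr of the piped command only; the braces make
        # 2>/dev/null cover every command in the string run by sh -c.
        pipe_cmd = "{ " decompress "; } 2>/dev/null"
        while ((pipe_cmd | getline line) > 0)
            print line
        close(pipe_cmd)
        # Diagnostics printed by awk itself still reach stderr.
        print "adding ls.1.gz" > "/dev/stderr"
    }'
}

scan_page
```

Only the decompressor's complaint is dropped; the verbose-style message from awk still appears.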

As for the cause of the problem, I think the race condition between the weekly
and daily scripts was a red herring.  I'm getting the broken pipe messages on
one of my two FC3 boxes, and the daily script is clearly ending well before the
weekly script begins its run on that box.  I can also reproduce the problem from
an interactive shell, simply by typing "trap '' PIPE" followed by "makewhatis -w".

Why does this happen?  Because when the gawk script sets the "done" flag, it
stops reading its input from the pipe, even though zcat is still stuffing bytes
into that pipe.  If zcat is still writing to the pipe after gawk closes its end
of it (when it exits), zcat will either be killed with a SIGPIPE, or if that
signal is ignored (via the trap command or via cron) it will get a write error
and report it.
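The mechanism can be reproduced outside makewhatis by letting a reader quit early while zcat still has data to write, with SIGPIPE ignored as under cron. A minimal sketch, assuming GNU gzip, coreutils seq and a POSIX awk; here the shell starts zcat directly rather than awk doing so, and the temporary file is purely illustrative:

```shell
#!/bin/sh
# Assumes GNU gzip (zcat), coreutils seq and a POSIX awk.
tmp=$(mktemp -d) || exit 1
seq 1 1000000 | gzip -c > "$tmp/big.gz"   # roughly 6.9 MB once decompressed

trap '' PIPE   # the SIGPIPE disposition cron handed down on affected boxes

# awk stops after line 3, much as makewhatis does once "done" is set, and
# exits while zcat still has megabytes left to write into the pipe.
err=$( { zcat "$tmp/big.gz" | awk 'NR == 3 { exit }'; } 2>&1 )
echo "zcat said: $err"   # something like "gzip: stdout: Broken pipe"
rm -r "$tmp"
```

Older systems name the program zcat in the message; newer ones, where zcat is a wrapper around gzip, may say gzip instead.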

Why doesn't this happen consistently?  I think it's a timing issue.  If zcat
runs through the file faster than gawk does, it will go away quietly.  I suspect
that on faster processors, though, gawk will have less trouble keeping up with
zcat, and if the file is large enough, zcat won't be done writing it to the pipe
before gawk exits.  So it's a race between zcat and gawk, not the daily and
weekly scripts.
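The size part of this argument can be checked directly: a short file is written in a single write() that lands before the reader can possibly exit, while a large one outlasts the reader. A minimal sketch, assuming GNU gzip, coreutils seq, a POSIX awk and Linux pipe buffering; 'run_reader' is a hypothetical helper, and real man pages fall between these two extremes, where timing decides:

```shell
#!/bin/sh
# Assumes GNU gzip (zcat), coreutils seq and a POSIX awk.
tmp=$(mktemp -d) || exit 1
seq 1 10      | gzip -c > "$tmp/small.gz"   # 21 bytes decompressed
seq 1 1000000 | gzip -c > "$tmp/large.gz"   # far more than a 64 KiB Linux pipe buffer
trap '' PIPE                                # as under cron

# Hypothetical helper: decompress into a reader that quits after one line,
# and return whatever either side wrote to stderr.
run_reader() {
    { zcat "$1" | awk 'NR == 1 { exit }'; } 2>&1
}

small_err=$(run_reader "$tmp/small.gz")
large_err=$(run_reader "$tmp/large.gz")
echo "small file: ${small_err:-no message}"
echo "large file: ${large_err:-no message}"
rm -r "$tmp"
```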

Comment 58 Gilles Detillieux 2005-05-10 17:06:40 UTC
Created attachment 114211 [details]
Updated patch to silence broken pipe messages

Here's my replacement for Ivana's latest patch, which redirects stderr to
/dev/null only at the points where it's really necessary.

Comment 59 Michal Jaegermann 2005-05-10 19:33:47 UTC
> Ivana and Michal, I don't think redirecting ALL of gawk's stderr output to
> /dev/null is such a good idea.

Well, at least as a stopgap to tide you over, it is not bad.

> As for the cause of the problem, I think the race condition between the weekly
> and daily scripts was a red herring. 

See comment #36 from 2005-04-29.  Actually, once I looked closer, the reason
there are no such messages when you run 'makewhatis -u -w', as 'cron.daily'
does, is that on the current system this command is nothing more than an
elaborate copy of the whatis database to /tmp and back.
At least nearly always.

I actually had the same idea as in comment #58 and even tested it with
multiple runs from cron.  In trial runs I did not get "Broken pipe" messages
on a machine where I can reproduce these reliably with the original.

This still does not change my opinion about /usr/bin/makewhatis.
I am now aware of other issues in this script and I would be really surprised
if what I have found so far were all of them.

Comment 60 Ivana Varekova 2005-05-11 09:53:38 UTC
Hello Gilles,
I can't test your patch, but you are right: the solution currently in
makewhatis is rather drastic. If anybody tests this patch, I will change the
makewhatis patch to use this solution.
Thank you.
Ivana

Comment 61 Michal Jaegermann 2005-05-11 14:43:52 UTC
> If anybody test this patch ...
I wrote in comment #59 that I had already tested such a solution and it works for me.
A confirmation from somebody else would be nice.

Comment 62 Gilles Detillieux 2005-05-11 17:16:43 UTC
Well, for what it's worth, I did test the patch I posted in comment #58, but I guess it would be nice 
to get confirmation from a third party. Ivana, are you not able to reproduce the error on your 
system with the trap command I suggested?  I.e.:

    trap "" PIPE
    makewhatis -w

I think as long as you have a reasonably fast processor (at least 1.8 GHz P4) and at least 256 MB of 
RAM, you're likely to run into this at least occasionally on large man pages.

Comment 63 Trevor Cordes 2005-05-12 09:39:51 UTC
I can test these patches but the attachments here are getting way too confusing.
 Can someone post a *patched* makewhatis script so I can just plunk that in?


Comment 64 Ivana Varekova 2005-05-12 13:46:22 UTC
Created attachment 114294 [details]
The last version of makewhatis.

Hello,
this is the latest version of the makewhatis script (with the changes from
comment 58). I think these changes should be right, but I can't reproduce the
problem, so it would be nice if someone else tested it. Trevor, can you test it?
Gilles, I'm sorry, but I still can't reproduce it. Thank you for your help and tests.

Michal, thank you for your help and tests too.
Ivana

Comment 65 Trevor Cordes 2005-05-15 11:22:21 UTC
Attachment from comment 64 has not produced any errors on my test system (the
one that usually shows errors) for the past 3+ nights.  It seems to have fixed
the problem.

Comment 66 Ivana Varekova 2005-05-17 07:21:28 UTC
Thank you for your excellent help.  
This solution is in the latest version, man-1.5p-6.
If you find any problem, please report it.
Ivana Varekova

Comment 67 Trevor Cordes 2005-06-05 01:00:54 UTC
I have not seen this error since May 15 on any boxes so it appears to have been
fixed.

Comment 68 Jack Tanner 2005-07-17 23:00:16 UTC
I got hit by this tonight on FC3 w/ all updates; specifically, man-1.5o1-7.

I can reproduce this as per comment 62. If the version in rawhide really does
fix this, could you push it out as an update?

Comment 69 Kjetil T. Homme 2005-07-18 07:13:50 UTC
Ivana, your patch, although innocuous, is actually missing the point.  The bug
isn't in makewhatis, but in rpm!  The default behaviour for SIGPIPE is to
terminate the process, with no noise.  The process has to install a signal
handler itself (or ignore the signal) to override it.  Not many programs need
to do this.

However, in the environment vixie-cron _sometimes_ sets up for its children,
SIGPIPE is set to be ignored (SIG_IGN).  "Ignore" sounds good, but then the
call to write() returns -1 and errno is set to EPIPE, or "Broken pipe".
Sound familiar?
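The difference between the default and the ignored disposition is easy to see with any program that writes forever. A minimal sketch, assuming coreutils 'yes' and 'head', and assuming the script itself was not started with SIGPIPE already ignored (otherwise both runs behave like the second one):

```shell
#!/bin/sh
# Default disposition: the kernel kills yes with SIGPIPE, silently;
# the shell reports status 128 + 13 = 141.
( yes; echo "default: yes exited with status $?" >&2 ) | head -n 1 >/dev/null

# Ignored disposition: write() fails with EPIPE, so yes prints an error
# such as "yes: standard output: Broken pipe" and exits with status 1.
ignored=$(
    {
        ( trap '' PIPE; yes; echo "ignored: yes exited with status $?" >&2 ) |
            head -n 1 >/dev/null
    } 2>&1
)
echo "$ignored"
```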

As I mentioned, the cause of this is RPM.  The problem occurs after the
vixie-cron package has been updated by up2date or rpm.

In ./rpmdb/rpmdb.c, SIGPIPE is set to be ignored before opening the RPM
database, and it is not reset until the database has been closed.  In the
meantime, some packages will spawn scripts to do pre- or post-install work,
and these will inherit the SIGPIPE setting.  In our case, this is crond, which
in turn passes it on to its children.
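On Linux, whether a process inherited an ignored SIGPIPE can be read from the SigIgn field of /proc/PID/status, a hex bitmask in which signal N sets bit N-1. A minimal sketch, assuming Linux /proc and the usual SIGPIPE number 13; 'sigpipe_ignored' is a hypothetical helper, and 'sleep' merely stands in for crond or any of its children:

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if the given /proc/PID/status file shows
# SIGPIPE in the SigIgn mask (signal 13 -> bit 12 -> 0x1000).
sigpipe_ignored() {
    mask=0
    while read -r key val; do
        if [ "$key" = "SigIgn:" ]; then mask=$val; fi
    done < "$1"
    if [ $(( 0x$mask & 0x1000 )) -ne 0 ]; then echo yes; else echo no; fi
}

trap '' PIPE      # like a parent that leaves SIGPIPE ignored
sleep 30 &        # stand-in for a daemon started from that parent
child=$!
result=$(sigpipe_ignored "/proc/$child/status")
echo "child ignores SIGPIPE: $result"
kill "$child"
wait "$child" 2>/dev/null
```

The same check against the PID of a running crond, or of a makewhatis started by it, shows whether it inherited the setting.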

Hope this helps!

Comment 70 Ivana Varekova 2005-07-18 09:15:36 UTC
Paul, could you please give your judgement on this problem (especially on the
previous comment)?
Ivana Varekova

Comment 71 Markus Wigge 2005-07-29 07:32:32 UTC
Wow, with 'trap "" PIPE' I am able to reproduce this bug too at least...

If it is in fact an rpm bug, has anybody filed it there?


Comment 72 Markus Wigge 2005-07-29 07:36:15 UTC
By the way, this also appears on FC4 machines. On FC4 I have sometimes noticed
it in the cron.daily makewhatis script too, not only in the weekly one.

Comment 73 Kjetil T. Homme 2005-07-29 10:31:29 UTC
I was wrong in my initial analysis; I think rpm does the right thing and the
real culprit is rhnsd.  See bug 163483.  I also filed a bug against vixie-cron
for not restoring the SIGPIPE handler on startup, bug 163484.

Comment 74 Trevor Cordes 2005-08-14 11:30:57 UTC
This is strange.  I thought this bug was solved.  I recently ran a full yum
update on a box that had been fully updated about 2 months earlier.  I had not
gotten this bug on that box in those 2 months.  Just tonight I got the bug again.
This is the first box (of about 14) where I've seen it since this bug was "fixed".
The only unusual things I can think of are a) I didn't update the kernel (I need
to run a 2.6.11 one for now), and b) I didn't reboot yet.  I'll reboot the box
and see if it recurs.

Also strange, it only gave me 50 of the "broken pipe" error lines, whereas
normally (i.e., before the bug was "fixed") I see hundreds.


Comment 75 Ivana Varekova 2005-08-23 12:41:01 UTC
Trevor,
could you please tell us your man version?
Thank you.


Comment 76 Trevor Cordes 2005-08-24 06:30:36 UTC
man-1.5o1-7

Oh, reading back, it appears this bug is NOT fixed in the current FC3/4 errata
but is instead still sitting in rawhide?  This bug didn't warrant an erratum?
Strange, then, that I never see this bug anymore (except on this rare occasion)
on the 20 machines I administer.


Comment 77 Trevor Cordes 2005-09-03 13:14:55 UTC
Did it again this week.  44 "broken pipe" lines of output.  Same machine.  I'm
going to upgrade that machine to kernel 1376 to see if that helps any.  None of
my other boxes are reporting this error anymore, and all are nearly identical.


Comment 78 Ivana Varekova 2005-09-05 07:31:40 UTC
Trevor, 
is the problematic machine's man version man-1.5o1-7? That version does not
have the patch for this bug.


Comment 79 Trevor Cordes 2005-09-05 11:56:00 UTC
Yes, man-1.5o1-7.  Sorry, I thought errata for FC3 had been issued as I don't
see the bug on any other machine anymore.


Comment 81 Shiraz Esat 2005-12-28 10:36:52 UTC
I have the same problem on my box.

What I want to know is: how important is the whatis database? Is it only used by
man, apropos and whatis? I'm happy to just use Google to get manual pages.

Alternatively, can't I just live with the weekly updates and remove the daily 
ones?

Finally, if I desperately need an updated whatis db, why can't I just update it 
from the command line?

Comment 82 Kjetil T. Homme 2005-12-28 14:10:16 UTC
If you use Google to find the manual pages, you won't necessarily get a manual
page which corresponds to the version you are running on your system.

While you may be content with having outdated manual page information on your
system, this is not really an option for Red Hat.

Anyway, I don't understand why it has taken so long to fix the bug; it's still
in RHEL 4.  All that is needed is to add

   trap PIPE

at the top of the script to reset the SIGPIPE handler.

(Never mind that the bug should really be fixed elsewhere, see comment #69 and
comment #73; adding that line is innocuous enough.)


Comment 83 Shiraz Esat 2005-12-28 18:08:20 UTC
OK - I'm sure RH are working on a fix as we scribble away. It's surely bad for 
their corporate image to have such lame issues hanging around.

BUT: is there a temporary quickfix that doesn't get too technical? How about 
removing /etc/cron.daily/00-makewhatis?

Comment 85 Ivana Varekova 2006-01-02 10:27:01 UTC
This bug report is against FC (where it is fixed now); the RHEL 4 version of
this bug is bug 170402. In the 170402 bug report there are test versions of the
man package (comment 4) and the corresponding patch (comment 7), so you can
simply apply that patch if you want. Could you please test this version (the
patch is similar to the one in FC) and add your comments to bug 170402?
Thank you.

If you remove the /etc/cron.daily/00-makewhatis.cron file, your whatis database
will not be updated properly (the weekly update alone is not sufficient). You can
update this database from the command line, but that is less convenient. Daily
updates are not strictly necessary if you can live without them. But I think the
best fix is to apply the patch from comment 7 or to use the fixed test versions
(comment 4).

Comment 86 Markus Wigge 2006-02-13 08:15:36 UTC
Why did this start to reappear last week with FC4?

Comment 87 Ivana Varekova 2006-02-13 08:38:37 UTC
Markus,
which version of man is installed?

Comment 88 Markus Wigge 2006-02-13 08:46:40 UTC
vixie-cron-4.1-41.FC4
man-1.5p-4
man-pages-1.67-8

Daily yum updates are activated on all machines.

Comment 89 Ivana Varekova 2006-02-13 10:26:16 UTC
man-1.5p-4 is not a fixed version. The first fixed version is man-1.5p-6. Could
you update your man package? (The devel version is man-1.6c-1.2 now.) If there is
any problem with man-1.5p-6 or newer, please reopen this bug.

Comment 90 Markus Wigge 2006-02-14 08:47:39 UTC
man-1.5p-6 seems to be in the updates repository now, and some machines have
already updated to it. So let's wait for the weekly run to see if it fixed the bug.