Bug 146849
Description
Michal Jaegermann
2005-02-02 02:24:31 UTC
*** This bug has been marked as a duplicate of 64836 ***

Bug #64836 was closed, but only for RHEL3, and I was told by the person responsible for closing it to reopen this bug.

This bug just started happening to me about 2 weeks ago as well. I'm on FC3 and it must be some recent update (I keep fully updated at least every 2 weeks) that did it, as I've never seen this before in my life. It is also occurring on another machine I administer which is nearly identical to my own.

/etc/cron.weekly/00-makewhatis.cron:

zcat: stdout: Broken pipe

man-1.5o1-7.

Strange, but this does not yet seem to occur on other machines I administer that are also nearly identical (and are using man-1.5o1-7). But those machines may not have been rebooted since updating that package.

Just wanted to add a "me too" to this bug. All 4 of our FC3 servers are producing these errors in the cron logs. These are diverse machines, but are all fully updated and recently rebooted. I also can't reproduce this from the command line, where makewhatis works fine.

Does anyone know what was done in RHEL to fix this bug? The linked report doesn't say.

Another me too, on FC3, on 2 different independent and up-to-date machines.

(another me too, on 2 FC3 servers)

maybe it happens because

02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly

the daily and weekly stuff is running at the same time...

starting

/etc/cron.weekly/00-makewhatis.cron &
/etc/cron.daily/00-makewhatis.cron &

a couple of times after each other makes sure that the locking mechanism is really *not* working properly:

===
# ps -ef | grep mak
root 15608 3779 0 11:15 pts/0 00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root 15612 15608 6 11:15 pts/0 00:00:02 /bin/bash /usr/bin/makewhatis -w
root 21150 3779 0 11:16 pts/0 00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root 21155 21150 5 11:16 pts/0 00:00:00 /bin/bash /usr/bin/makewhatis -w
root 24037 3779 0 11:16 pts/0 00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root 24048 24037 5 11:16 pts/0 00:00:00 /bin/bash /usr/bin/makewhatis -w
root 25842 3779 0 11:16 pts/0 00:00:00 /bin/bash /etc/cron.weekly/00-makewhatis.cron
root 25849 25842 4 11:16 pts/0 00:00:00 /bin/bash /usr/bin/makewhatis -w
root 31874 25849 0 11:16 pts/0 00:00:00 /bin/bash /usr/bin/makewhatis -w
root 31876 3779 0 11:16 pts/0 00:00:00 grep mak
root 31885 24048 0 11:16 pts/0 00:00:00 /bin/bash /usr/bin/makewhatis -w
===

not strange enough it seems --- i could not reproduce the zcat errors which i can also see in the weekly mails. it could also have something to do with the file descriptors opened by the process:

===
# ls -al /proc/12611/fd/
total 5
dr-x------ 2 root root 0 Feb 28 11:21 .
dr-xr-xr-x 3 root root 0 Feb 28 11:21 ..
lrwx------ 1 root root 64 Feb 28 11:21 0 -> /dev/pts/0
l-wx------ 1 root root 64 Feb 28 11:21 1 -> /tmp/whatis.j12613
lrwx------ 1 root root 64 Feb 28 11:21 10 -> /dev/pts/0
lrwx------ 1 root root 64 Feb 28 11:21 2 -> /dev/pts/0
lr-x------ 1 root root 64 Feb 28 11:21 255 -> /usr/bin/makewhatis
===

not really sure what cron does... but then it would happen every day i'd guess.

Strange, but I haven't seen this bug for a couple of weeks now, and I've been watching pretty closely for it. Perhaps it got fixed by some other update?

I do not think so. I am seeing that regularly on various machines, although I still have no idea what the cause really is and I never managed to reproduce it from a command line. A recent comment on bug #64836 seems to indicate that the matter is open there too, despite a formally CLOSED status.

I guess that I will add at the top of 00-makewhatis a test like this

running=$(pgrep -f makewhatis); [ "$running" ] && exit 0

to see if the suggestion from comment #6 really pinpoints the case.

The only real difference between /etc/cron.daily/00-makewhatis.cron and /etc/cron.weekly/00-makewhatis.cron is that the former is using 'makewhatis -u -w' and the latter 'makewhatis -w'.
Why is /etc/cron.weekly/00-makewhatis.cron needed at all?

Er.... make the above "running=$(pgrep makewhatis)" or we will pick up 00-makewhatis too. This will "fix" the problem but not exactly in a desired way. :-)

Me too on lots of up to date FC3 machines.

Could somebody explain to me why /etc/cron.weekly/00-makewhatis.cron is needed at all? It looks to me more and more that this is what is really causing the problem, in conjunction with /etc/cron.daily/00-makewhatis.cron. If different options are indeed needed from time to time then a construct like

[ "$(date +%u)" = 7 ] && opts="some opts" || opts="some other ones"

should do regardless of locale.

A proposition to those who are hit by this: if you turn off /etc/cron.weekly/00-makewhatis.cron ('exit 0' as the first executable line will do nicely), then what happens?

An update: I still swear that I have not received any indication this bug still exists on my main system (for several weeks now), even though it was doing it before that. Besides reboots and normal yum updates, I have not done anything to try to fix the bug. However, 3 of the servers I administer just sent me emails showing the bug. They all have the time of Sunday 4:22am and it says from cron.weekly, so I think you guys are on the right track. I just wish I could figure out what I've done differently on my own system! I'm going to tweak 00-makewhatis.cron on one of these buggy systems to see what happens. I'll report back next week. (note to self: DufDen)

> I still swear that I have not received any indication this
> bug still exists on my main system ...

That would support the idea that there is a not so subtle race there. Where and why is somewhat of a mystery, as there seems to be enough of a time gap between the "daily" and "weekly" runs and

[ -f $LOCKFILE ] && exit 0

is supposed to prevent exactly that. For some reason it does not seem to be effective.
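A side note on the lock test quoted above: a check of the form "[ -f $LOCKFILE ] && exit 0" followed later by creating the file is not atomic, so two scripts started close together can both pass the check. Whether that is what happens here is not established in this report. The following is only a minimal sketch of an atomic alternative; the lock path and the function names are made up for illustration and are not taken from the shipped cron scripts.

```shell
#!/bin/sh
# Hypothetical lock location, chosen only for this demonstration.
LOCKDIR="$(mktemp -d)/makewhatis.lock"

# mkdir either creates the directory or fails, in one step, so there is
# no gap between "check" and "create" as with a test-then-touch sequence.
acquire_lock() { mkdir "$LOCKDIR" 2>/dev/null; }
release_lock() { rmdir "$LOCKDIR" 2>/dev/null; }

# The first attempt succeeds; a second attempt while the lock is held fails.
if acquire_lock; then first=acquired; else first=busy; fi
if acquire_lock; then second=acquired; else second=busy; fi
release_lock

echo "first=$first second=$second"
```

In a real cron script the second branch would simply be "exit 0", and the release would go into a trap on EXIT.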
I turned off /etc/cron.weekly/00-makewhatis.cron on the few machines I can watch, as described in comment #11, and so far I have not observed any recurrence of the problem. Not surprisingly, timestamps on /var/cache/man/whatis are current.

Hello, I tried to reproduce this bug, but I was unsuccessful. Can somebody post the versions of the gzip, man and vixie-cron packages (and of the anacron package, if it is installed and used) from a system which shows this bug?

/etc/cron.weekly/00-makewhatis.cron is a necessary file. /etc/cron.daily/00-makewhatis.cron uses the command "makewhatis -u -w", which registers the changes that happened during the last day (there is a fault in the makewhatis man page; it will be repaired soon). If your computer is not running at the right time on some day, cron.daily will not be executed and the whatis database will not be complete.

There was a bug in makewhatis (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140207). With option -u the command picks up only changes which are less than 1 minute old. If you use an older version of the man package your daily makewhatis update is probably wrong (incomplete), so the weekly update is necessary. This problem is fixed in man-1.5p-2.
Ivana Varekova

On a system that still shows the bug:

#rpm -q gzip man vixie-cron
gzip-1.3.3-13
man-1.5o1-7
vixie-cron-4.1-24_FC3
anacron-2.3-32

On my system that used to but no longer shows the bug:

#rpm -q gzip man vixie-cron
gzip-1.3.3-13
man-1.5o1-7
vixie-cron-4.1-24_FC3
anacron-2.3-32

As you can see, they are the same packages. I'll see this Sunday what happens with my test computer where I modified the cron.weekly script to exit without doing anything.

Trevor, thank you for your comment. I can't reproduce this bug, but I think the attached makewhatis should be correct. Can anybody test it? Thank you. Ivana Varekova

Created attachment 113083 [details]
Changed makewhatis script.
I'll test it out on the last couple of machines that are showing the bug. FYI, the test machine where I edited the cron.weekly script to exit did not show the bug symptoms after the edit.

I'm still seeing this error too, so I can test it as well. I have not modified any cron scripts so I should be able to provide a "clean" system test.

I now see this nightly instead of weekly.

/etc/cron.daily/00-makewhatis.cron:

zcat: stdout: Broken pipe
zcat: stdout: Broken pipe
zcat: stdout: Broken pipe

Hm. It appears that the only change between what is released and the 'makewhatis' from comment #17 which could be responsible for what is described in comment #20 is this one:

- if ${cat} ${x} | iconv -f utf-8 -t utf-8 -o /dev/null 2>/dev/null
+ if (${cat} ${x} | iconv -f utf-8 -t utf-8 -o /dev/null) 2>/dev/null

It is a semantic change and cron seems to be doing something unexpected to stdio. But I do not understand why it would have such an effect.

I should add that from the time I turned off makewhatis in cron.weekly I did not see "zcat: stdout: Broken pipe" on any machine where I did that and with the current /usr/bin/makewhatis

Hello Steve, which man version do you use? This problem may be caused by a new man version. (In older versions (man-1.5p-1 and older) there was bug 140207. Because of it cron.daily (which uses the command makewhatis -u -w) updates the whatis database only with those man pages which are "really" new. In most cases this command does nothing. In the fixed version this command updates the database with man pages up to one day old, so it should produce the same error as the weekly update.) If you updated man, the makewhatis script was overwritten and you are not using the version from comment 17. Can you please write which man version you use and whether you use the makewhatis script from comment 17? Thank you. Ivana Varekova

Hello Michal, which man version do you use? If you use man-1.5p-1 or an older version your whatis database is not updated. If you turned off makewhatis in cron.weekly the whatis database is updated only by cron.daily, and this uses makewhatis -u -w which contains a bug (see comment 23 or bug 140207). Ivana

I use man-1.5o1-7, which should be the latest FC3 errata version. My system is pretty much unmodified. Yes, I am using the new makewhatis from comment 17.

> Hello Michal, which man version do you use?
I also have installed man-1.5o1-7, i.e. current for FC3. As for
/var/cache/man/whatis not getting updated I indeed did not check if
putting new manpages into a fray will get reflected there. OTOH timestamps
on this file do change every morning and its content, which is a plain text,
seem to look quite sane. Also responses from 'man -k' are what you would
expect.
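On the "semantic change" quoted earlier in this thread (moving 2>/dev/null outside a parenthesized pipeline): in the unparenthesized form the redirection applies only to the last command of the pipeline, while in the parenthesized form it applies to every command in it. The sketch below illustrates only that difference; the function "writer" is a made-up stand-in for zcat and nothing here is taken from the makewhatis script itself.

```shell
#!/bin/sh
# Stand-in for a decompressor that complains on stderr while producing output.
writer() { echo "complaint from writer" >&2; echo "data"; }

# Form 1: 2>/dev/null binds to 'cat' only; the writer's stderr still escapes.
out1=$( { writer | cat >/dev/null 2>/dev/null; } 2>&1 )

# Form 2: the redirection covers the whole subshell, writer included.
out2=$( { (writer | cat >/dev/null) 2>/dev/null; } 2>&1 )

echo "form 1 leaked: [$out1]"
echo "form 2 leaked: [$out2]"
```

With the first form a "Broken pipe" complaint printed by the left-hand command would still reach cron's mail; with the second it would be discarded.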
How about, instead of a clearly racy arrangement of /etc/cron.daily/00-makewhatis.cron and /etc/cron.weekly/00-makewhatis.cron, which assumes that some lock file will be created in time so these scripts will not stomp on each other, having only one script in /etc/cron.daily/ which does something like this:

#!/bin/bash
[ 7 = "$(date +%u)" ] && mopts="-w" || mopts="-u -w"
....
makewhatis $mopts
exit 0

Regardless of other possible issues this should clear up at least this mess.

Comment #17 didn't fix it on one of the boxes I administer (I double-checked the new script was in place). However, on another test box it did seem to fix it (this week, at least).

Strange, but my personal workstation started giving me the errors again, even though it hadn't been for a while. The only difference I can think of is I ran a bunch of updates mid-week, including a new glibc and kernel, though I have yet to reboot (way too many windows open). Admittedly, upgrading glibc and not rebooting isn't the brightest thing in the world, but the info may help point to the root cause of this error.

Michal, the command makewhatis -w -u does change your /var/cache/man/whatis file, but it adds only information about those man pages which were changed in the last second (this command is executed daily). So the timestamp is updated and the file looks sane, but information about new man pages is missing. See comment #27.

On a machine where I left the cron setup intact I was hit again by "thousands of broken pipes". As this is "on again, off again" this really looks like a race, and AFAIK nobody has reported it when only one of these cron scripts is operational. Closing a race window looks like a no-brainer and the changes from comment #17 do not really have much to do with that.

'man makewhatis' says:

-u Update database with new pages.
-w Use manpath obtained from 'man --path'

which to me does not make the difference between '-w' and '-u -w' really clear (possible bugs notwithstanding).

Created attachment 113779 [details]
Changed makewhatis script.
I still can't reproduce this problem. I need some more information about
makewhatis behaviour. Can anybody who sees this problem replace man's
makewhatis file with the attached makewhatis file (it produces more detailed
output) and attach (or send me) the cron output error message?
Thank you very much.
Ivana Varekova
I'll test it. The bug always occurs on Sunday so I'll report back then. BTW, a diff against your new script shows that it's not much different.

Trevor, thank you. In the new makewhatis there is only one new command, which displays more information about the script's behavior. Ivana Varekova

The script from the attachment to comment #31 has a bug. Namely, it does '${cat} ${x} --quiet' where ${cat} can be either zcat, bzcat or cat; at least for now. The problem is that cat does not have an option --quiet. This can be fixed by putting this option, and/or any other desired option which makes sense for a given utility, into the 'cat=...' assignment. Like this:

if [ ${x%.gz} != ${x} ]
then cat="zcat --quiet"
elif [ ${x%.bz2} != ${x} ]
then cat="bzcat --quiet"
else cat=cat
fi
if ${cat} ${x} | .....

Created attachment 113822 [details]
corrected script
Michal, you are right, I forgot this change. There should not be --quiet. Thank
you. Here is the correct script.
Trevor, can you please use this script or delete --quiet. Thank you very much.
Sorry for my error.
Ivana Varekova
On one of my machines I replaced 'makewhatis -u -w' by 'makewhatis -w' in /etc/cron.daily/00-makewhatis.cron. The effect was that today I found in the mail from cron "zcat: stdout: Broken pipe" repeated 74 times. Of course I cannot reproduce that feat if I run it directly from a command line instead of cron. At least 74 is better than the original 3560. :-)

It appears in any case that, whatever this is, it is not a race as I originally suspected; or at least not in this place. Also, for those who are trying to hunt it down, this seems to give a way to reproduce it more often than once per week.

Trevor, can you send me your last cron.weekly message please? Ivana Varekova

Hi Michal,
can you please add the command:
set -xET
to your makewhatis script (as the second line) (see attachment 113822 [details]; you can add
this command or use script 113822) and send me the part of your cron.daily
report concerning makewhatis.
(The set command only prints more information about the makewhatis run.)
Ivana Varekova
Created attachment 113924 [details]
Debug output from makewhatis
This shows the zcat: stdout: Broken pipe error.
Sorry, I updated my script but forgot to chmod 755 so the cron execution failed. I've 755'd it now but will have to wait a week for the results.

Orion, can you attach your man pages /usr/share/man/man1/ddate.1.gz and /usr/share/man/man1/perluts.1.gz? Thank you. Ivana Varekova

Created attachment 113962 [details]
/usr/share/man/man1/ddate.1.gz
Created attachment 113963 [details]
/usr/share/man/man1/perluts.1.gz
Created attachment 113975 [details]
list of files which triggered "Broken pipe" on a test run

I put in my crontab '*/10 * * * * /etc/cron.weekly/00-makewhatis.cron' and 'set -xET' in /usr/bin/makewhatis. In roughly fifteen minutes after that I had two traces, each 74M in size, from these runs, after which I turned that crontab entry off. :-) The third trace with 'set -xET' I got from a regular overnight run where I used 'makewhatis -w' in /etc/cron.daily/00-makewhatis.cron instead of '-u -w'.

Each of those runs reported 'zcat: stdout: Broken pipe' but the number of occurrences was different each time: 72, 76 and 73 respectively. All traces, apart from size, are very similar to what was already attached to comment #39, and this 'Broken pipe' always happens in the same place.

A list of manpages from the first run _after_ which this "Broken pipe" on an awk program (apparently) was reported is attached. Despite the count differences this list is quite stable. Here are the differences between the list from the first run and the second

--- list1 2005-05-03 09:38:28.000000000 -0600
+++ list2 2005-05-03 09:38:36.000000000 -0600
@@ -17,6 +17,7 @@
 /usr/share/man/man1/zshcompsys.1.gz
 /usr/share/man/man1/perlport.1.gz
 /usr/share/man/man1/gpg.1.gz
+/usr/share/man/man1/perlcall.1.gz
 /usr/share/man/man1/perlmodlib.1.gz
 /usr/share/man/man1/perlos2.1.gz
 /usr/share/man/man1/screen.1.gz
@@ -25,6 +26,7 @@
 /usr/share/man/man1/tcsh.1.gz
 /usr/share/man/man1/groffer.1.gz
 /usr/share/man/man1/cdrecord.1.gz
+/usr/share/man/man1/curl.1.gz
 /usr/share/man/man1/cvs.1.gz
 /usr/share/man/man1/perltoc.1.gz
 /usr/share/man/man1/perlfunc.1.gz
@@ -48,6 +50,7 @@
 /usr/share/man/man3/Math::BigInt.3pm.gz
 /usr/share/man/man3/DBI::DBD.3pm.gz
 /usr/share/man/man3/Config.3pm.gz
+/usr/share/man/man3/Tcl_Eof.3.gz
 /usr/share/man/man4/ethereal-filter.4.gz
 /usr/share/man/man5/terminfo.5.gz
 /usr/share/man/man5/muttrc.5.gz
@@ -70,3 +73,4 @@
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
+/usr/share/man/manl/dhsein.l.gz

and here the same between the first and the third

--- list1 2005-05-03 09:38:28.000000000 -0600
+++ list3 2005-05-03 09:38:43.000000000 -0600
@@ -17,6 +17,7 @@
 /usr/share/man/man1/zshcompsys.1.gz
 /usr/share/man/man1/perlport.1.gz
 /usr/share/man/man1/gpg.1.gz
+/usr/share/man/man1/perlcall.1.gz
 /usr/share/man/man1/perlmodlib.1.gz
 /usr/share/man/man1/perlos2.1.gz
 /usr/share/man/man1/screen.1.gz
@@ -25,6 +26,7 @@
 /usr/share/man/man1/tcsh.1.gz
 /usr/share/man/man1/groffer.1.gz
 /usr/share/man/man1/cdrecord.1.gz
+/usr/share/man/man1/zshcontrib.1.gz
 /usr/share/man/man1/cvs.1.gz
 /usr/share/man/man1/perltoc.1.gz
 /usr/share/man/man1/perlfunc.1.gz
@@ -53,10 +55,8 @@
 /usr/share/man/man5/muttrc.5.gz
 /usr/share/man/man5/smb.conf.5.gz
 /usr/share/man/man7/groff_mdoc.7.gz
-/usr/share/man/man7/mdoc.samples.7.gz
 /usr/share/man/man7/groff_diff.7.gz
 /usr/share/man/man8/pppd.8.gz
-/usr/share/man/man8/smartd.8.gz
 /usr/share/man/man8/tcpdump.8.gz
 /usr/share/man/man8/lsof.8.gz
 /usr/share/man/man8/mkisofs.8.gz
@@ -70,3 +70,4 @@
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
 /usr/share/man/manl/dhsein.l.gz
+/usr/share/man/manl/dhsein.l.gz

Yes, this 'dhsein.l.gz' indeed shows up in traces multiple times.

In case somebody does similar traces, here is a filter which gets file names from an output produced with '-x' set in the shell:

#!/usr/bin/perl
while (<>) {
    if (m{zcat: stdout: Broken pipe}) {
        $saved =~ s{^\+ echo }{};
        print $saved;
    }
    $saved = $_ if m{echo /usr/share/man/man};
}

I still do not know what the culprit for the observed trouble is, but searching for it I looked closer at /usr/bin/makewhatis. It includes a rather opaque and baroque piece of awk which repeats work already done a few lines earlier, uses undocumented (and therefore not guaranteed to work) awk features, and tries hard to parse various variations of manpages, which is bound to screw up one day no matter how big a "conditional mess" you heap over there in time.
OTOH there is already a program which is designed to deal with manpages. It is called 'man'. AFAICS the stuff in question really tries hard to get the content of a "NAME" section and print it. The following piece of shell and awk deals with this task:

for m in "$@" ; do
    man -- "$m" | awk '
        $1 ~ /^XXX/ {next;}
        NF > 0 && flag == 1 {
            print;
            while (getline > 0) {
                if (NF == 0) { exit; }
                print;
            }
        }
        NF == 1 && $1 ~ /^[[:alpha:]]/ {flag += 1;}
    '
done 2>/dev/null

You can feed it as an argument a man page compressed in any way you want, or uncompressed. The language does not matter. If 'man' can handle it, so can this script (on the assumption that it really is a manpage).

This is not a complete solution. One needs to handle 'cat' pages too, but this is basically the same thing, only you do not need to invoke 'man'. Also, multiple lines have to be folded correctly, i.e. paying attention to whether the previous line was terminated with a hyphen or not, and pages which result in a "multi-name" output like

uri, url, urn - uniform resource identifier (URI), including a URL or URN

have to be handled properly. That is quite simple.

Beyond "regular manpages" I ran the above also on a collection of manpages in Polish and Russian and on the output of 'find /usr/share/man/pt_BR/ -type f', as this gives a few files on my box, and so far it works as expected.

And one more thing. Guessing a file compression type by looking at its suffix is broken beyond words. There is a 'file' utility for such checks.

Here is a more workable version of the script from the previous comment. The variable 'handler' can be set to 'man' for troff source man pages, regardless of whether they are compressed or not, or to 'cat', 'zcat' or 'bzcat' for "cat" pages depending on how they are compressed.

#! /bin/sh
handler=man
for m in "$@" ; do
    $handler -- "$m" | col -b | awk '
        function fold_lines(add) {
            sub(/^[[:blank:]]+/, "", add);
            if ( sub(/-$/, "", line) > 0) { line = line add; }
            else { line = line " " add; }
        }
        $1 ~ /^XXX/ {next;}
        NF > 0 && n == 1 {
            line = $0;
            if (match(line, "[[:blank:]]-[[:blank:]]") == 0) { exit; }
            sub(/^[[:blank:]]+/, "", line);
            while (getline > 0) {
                if (NF == 0) {
                    gsub(/[[:blank:]]+/, " ", line);
                    print line;
                    exit;
                }
                fold_lines($0);
            }
        }
        NF == 1 && $1 ~ /^[[:alpha:]]/ {n += 1;}
    '
done 2>/dev/null

Created attachment 114000 [details]
The latest makewhatis script.
This version should fix this problem. It uses the set command
and produces quite large output.
Can anybody test this version of the makewhatis script and attach or send me the cron
output error message and the file /var/cache/man/whatis?
Thank you.
Ivana Varekova
Michal, can you test the latest makewhatis script? Thank you. Ivana Varekova

> Michal, can you test the latest makewhatis script?

Yes, I can. On two runs with this version, ten minutes apart, I got 'zcat: stdout: Broken pipe' 74 times on the first run and 76 times on the second one. Compared with the list posted in comment #44, here are the new files which showed up with "Broken pipe" in the results:

/usr/share/man/man1/perlcall.1.gz
/usr/share/man/man1/perllocale.1.gz
/usr/share/man/man3/DBI::FAQ.3pm.gz (only the second time)

while /usr/share/man/man1/perlre.1.gz went through smoothly on one run and showed "Broken pipe" on the other.

Looking again at the traces, it looks possible that 'makewhatis' triggers an obscure bug in 'awk'. This is only a hypothesis and what that bug may be I have no idea. An overflow somewhere?

Finishing what I put in comment #46 into a full replacement of the relevant /usr/bin/makewhatis fragment seems to be a rather simple and quite short job.

Created attachment 114047 [details]
another variant of makewhatis
I replaced /usr/bin/makewhatis by an attached script and I run with
'makewhatis -w' from cron five times in a row. I did not see a single
"zcat: stdout: Broken pipe".
This script actually creates a quite a bit bigger /var/cache/man/whatis
file than the original one, but 'man -k' works as expected. It may also
have some bugs. I really only tried to check if "Broken pipe" will vanish.
AFAICS it did.
A note: instead of 'mkw_line man -- $(find ${d}/man$i -type f)' one
can do
find ${d}/man$i -type f | while read arg ; do
mkw_line man -- "$arg"
done
but the first way is likely faster and my bash did not complain about very
long command lines so I did not bother.
I am not sure if two spaces which precede '-' in results was really intended
or just an artifact of how the original was written. I left it the same way.
Created attachment 114131 [details]
redirected output from makewhatis in cron.daily
Not sure if you guys still need this, but here's the complete output from a run
of makewhatis -w >/tmp/makewhatis-daily-`date +"%Y-%m-%d"` 2>&1 in cron.daily
I think Michal is correct in that all that awk should be replaced, or we need
to bring in an awk expert to figure out what is going on.
Nice! "Broken pipe" 1431 times. :-)

You may actually kill that noise, and I am pretty sure by now that this is really only noise caused by buffering and possibly lost signals when running from cron, by replacing in /usr/bin/makewhatis the line

' pages=$pages section=$section verbose=$verbose curdir=$curdir

with

' pages=$pages section=$section verbose=$verbose curdir=$curdir 2>/dev/null

Also 'makewhatis -w >/dev/null 2>&1', and similar, in both /etc/cron.*/00-makewhatis.cron scripts should have the same effect.

Not that this changes very much what I think about the current state of that script. Every time I look closer I find something new. The last one happens to be this: "AWK=/usr/bin/gawk" but 'gawk' is really located in /usr/bin. Luckily we have also a symlink /usr/bin/gawk.

Created attachment 114157 [details]
Output from updated makewhatis
Still seeing lots of zcat: stdout: Broken pipe
Created attachment 114161 [details]
/var/cache/man/whatis
From affected machine.
Created attachment 114201 [details]
Used patch.
I built a new version (man-1.5p-5); this version should not produce the error message
"Broken pipe". I can't test this problem myself, but Michal helped with it (thank you
very much). I think there should not be any other problem. Thank you for your
help and your tests.
Please update your man package and reopen this bug if you find any problem.
Ivana Varekova
Ivana and Michal, I don't think redirecting ALL of gawk's stderr output to /dev/null is such a good idea. It kills any legitimate error reporting from gawk, including the -v output of the list of files added. If the real problem is the broken pipe error messages from zcat or bzcat, why not just hit the stderr output of these two commands, right within the pipe_cmd command string, and leave the rest of gawk's stderr output alone?

As for the cause of the problem, I think the race condition between the weekly and daily scripts was a red herring. I'm getting the broken pipe messages on one of my two FC3 boxes, and the daily script is clearly ending well before the weekly script begins its run on that box. I can also reproduce the problem from an interactive shell, simply by typing "trap '' PIPE" followed by "makewhatis -w".

Why does this happen? Because when the gawk script sets the "done" flag, it stops reading its input from the pipe, even though zcat is still stuffing bytes into that pipe. If zcat is still writing to the pipe after gawk closes its end of it (when it exits), zcat will either be killed with a SIGPIPE, or if that signal is ignored (via the trap command or via cron) it will get a write error and report it.

Why doesn't this happen consistently? I think it's a timing issue. If zcat runs through the file faster than gawk does, it will go away quietly. I suspect that on faster processors, though, gawk will have less trouble keeping up with zcat, and if the file is large enough, zcat won't be done writing it to the pipe before gawk exits. So it's a race between zcat and gawk, not the daily and weekly scripts.

Created attachment 114211 [details]
Updated patch to silence broken pipe messages
Here's my replacement for Ivana's latest patch, which redirects stderr to
/dev/null only at the points where it's really necessary.
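The mechanism described in the comment above can be seen without makewhatis at all. The following is only an illustrative sketch: it uses 'yes' as a stand-in for zcat and 'head' as a stand-in for the gawk script that stops reading early, and it assumes GNU coreutils (other implementations of 'yes' may word the error differently or stay silent).

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Default disposition: the writer is killed by SIGPIPE and says nothing.
quiet=$( (yes | head -n 1 >/dev/null) 2>&1 )

# SIGPIPE ignored, as in the "trap '' PIPE" reproduction: the writer's
# write() fails with EPIPE instead, and it reports the error on stderr.
noisy=$( (trap '' PIPE; yes | head -n 1 >/dev/null) 2>&1 )

echo "default: [$quiet]"
echo "ignored: [$noisy]"
```

If the shell running this was itself started with SIGPIPE already ignored, both lines will show an error, which is the situation reported here for jobs started by cron.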
> Ivana and Michal, I don't think redirecting ALL of gawk's stderr output to
> /dev/null is such a good idea.

Well, at least to tide you through for now it is not bad.

> As for the cause of the problem, I think the race condition between the weekly
> and daily scripts was a red herring.

See comment #36 from 2005-04-29. Actually, once I looked closer, the reason why there are no such messages when you run 'makewhatis -u -w', as 'cron.daily' does, is that with the current system this command is nothing more than an elaborate copy of the whatis database to /tmp and back. At least nearly always.

I actually had the same idea as in comment #58 and even tested it with multiple runs from cron. In trial runs I did not get "Broken pipe" messages on a machine where I can reproduce these reliably with the original.

This still does not change my opinion about /usr/bin/makewhatis. I am now aware of other issues in this script and I would be really surprised if what I found were all.

Hello Gilles, I can't test your patch, but you are right that the solution which is in makewhatis is rather heavy-handed. If anybody tests this patch I will change the makewhatis patch and use this solution. Thank you. Ivana

> If anybody test this patch ...

I wrote in comment #59 that I already tested such a solution and it works for me. A confirmation from somebody else would be nice.

Well, for what it's worth, I did test the patch I posted in comment #58, but I guess it would be nice to get confirmation from a third party. Ivana, are you not able to reproduce the error on your system with the trap command I suggested? I.e.:

trap "" PIPE
makewhatis -w

I think as long as you have a reasonably fast processor (at least 1.8 GHz P4) and at least 256 MB of RAM, you're likely to run into this at least occasionally on large man pages.

I can test these patches but the attachments here are getting way too confusing. Can someone post a *patched* makewhatis script so I can just plunk that in?
Created attachment 114294 [details]
The last version of makewhatis.

Hello, this is the latest version of the makewhatis script (with the changes from comment 58). I think these changes should be right, but I can't reproduce the problem and it would be nice if somebody else tested it. Trevor, can you test it? Gilles, I'm sorry but I still can't reproduce it. Thank you for your help and tests. Michal, thank you for your help and tests too. Ivana

The attachment from comment 64 has not produced any errors on my test system (the one that usually shows errors) for the past 3+ nights. It seems to have fixed the problem.

Thank you for your excellent help. This solution is in the latest version, man-1.5p-6. If you find any problem, please report it. Ivana Varekova

I have not seen this error since May 15 on any boxes so it appears to have been fixed.

I got hit by this tonight on FC3 w/ all updates; specifically, man-1.5o1-7. I can reproduce this as per comment 62. If the version in rawhide really does fix this, could you push it out as an update?

Ivana, your patch, although innocuous, is actually missing the point. the bug isn't in makewhatis, but in rpm!

the default behaviour for SIGPIPE is to terminate the process, with no noise. a process has to install a signal handler itself to override it. not many programs need to do this. however, in the environment vixie-cron _sometimes_ sets up for its children, SIGPIPE is set to be ignored (SIG_IGN). "ignore" sounds good, but then the write() to the closed pipe returns -1 with errno set to EPIPE, or "Broken pipe". sound familiar?

the cause for this is, as I mentioned, RPM. the problem occurs after an update of the vixie-cron package by up2date or rpm. in ./rpmdb/rpmdb.c, SIGPIPE is set to be ignored before opening the RPM database. it is not reset until the database has been closed. in the meantime, some packages will spawn scripts to do pre- or post-install, and these will inherit the SIGPIPE setting. in our case, this is crond, which in turn is passing it on to its children.

hope this helps!

Paul, could you please give your judgement on this problem (especially on the previous comment)? Ivana Varekova

Wow, with 'trap "" PIPE' I am able to reproduce this bug too at least... If it is in fact an rpm bug, has anybody filed it there?

by the way, this also appears on FC4 machines. And on FC4 I noticed it sometimes in the cron.daily makewhatis script, not only in the weekly part.

I was wrong in my initial analysis; I think rpm does the right thing and the real culprit is rhnsd. See bug 163483. I also filed a bug against vixie-cron for not restoring the SIGPIPE handler on startup, bug 163484.

This is strange. I thought this bug was solved. I recently ran a full yum update on a box that was fully updated about 2 months ago. I had not gotten this bug on that box in those 2 months. Just tonight I got the bug again. This is the first box (of like 14) I've had this occur on since this bug was "fixed". The only weird things I can think of are a) I didn't update the kernel (I need to run a 2.6.11 one for now), and b) I didn't reboot yet. I'll reboot the box and see if it recurs. Also strange: it only gave me 50 of the "broken pipe" error lines, when normally (ie: before the bug was "fixed") I see hundreds.

Trevor, could you please attach your man version. Thank you.

man-1.5o1-7

Oh, reading back it appears this bug is NOT solved in the current FC3/4 errata but is instead still sitting in rawhide? This bug didn't warrant an errata? Strange how I never see this bug anymore (except this rare occasion) on the 20 machines I administer.

Did it again this week. 44 "broken pipe" lines of output. Same machine. I'm going to upgrade that machine to kernel 1376 to see if that helps any. None of my other boxes are reporting this error anymore, and all are nearly identical.

Trevor, is the problematic machine's man version man-1.5o1-7? This version does not have the patch for this bug.
Yes, man-1.5o1-7. Sorry, I thought an erratum for FC3 had been issued, as I don't see the bug on any other machine anymore.

I have the same problem on my box. What I want to know is: how important is the whatis database? Is it only used by man, apropos and whatis? I'm happy to just use Google to get manual pages. Alternatively, can't I just live with the weekly updates and remove the daily ones? Finally, if I desperately need an updated whatis db, why can't I just update it from the command line?

If you use Google to find the manual pages, you won't necessarily get a manual page which corresponds to the version you are running on your system. While you may be content with having outdated manual page information on your system, this is not really an option for Red Hat. Anyway, I don't understand why it has taken so long to fix the bug; it's still in RHEL 4. All that is needed is to add 'trap PIPE' at the top of the script to reset the SIGPIPE handler. (Never mind that the bug should be fixed elsewhere (see comment #69 and comment #73); adding that line is innocuous enough.)

OK - I'm sure RH are working on a fix as we scribble away. It's surely bad for their corporate image to have such lame issues hanging around. BUT: is there a temporary quick fix that doesn't get too technical? How about removing /etc/cron.daily/00-makewhatis?

This bug report is against FC (which is fixed now); the RHEL4 version of this bug is 170402. In the 170402 bug report there are test versions of the man package (comment 4) and the corresponding patch (comment 7) too, so you can apply just this patch if you want. Could you please test this version (the patch is similar to the one in FC) and attach your comment (to 170402)? Thank you. If you remove the /etc/cron.daily/00-makewhatis.cron file, your whatis database will not be updated properly (daily update is not sufficient). You can update this database from the command line, but it is not so convenient.
Daily updates are not necessary if you don't insist on them. But I think the best fix is to apply the patch from comment 7 or to use the fixed test versions (comment 4).

Why did this start to reappear last week with FC4?

Markus, which version of man is installed?

vixie-cron-4.1-41.FC4 man-1.5p-4 man-pages-1.67-8. Daily yum updates are activated on all machines.

man-1.5p-4 is not a fixed version. The first fixed version is man-1.5p-6. Could you update your man package (the devel version is man-1.6c-1.2 now)? If there is any problem in version man-1.5p-6 or newer, please reopen this bug.

man-1.5p-6 seems to be in the updates repository now; some machines have already updated to it. So let's wait for the weekly run to see if it fixed the bug.
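
For anyone who wants to see the SIGPIPE behaviour discussed in this thread outside of cron, here is a minimal bash sketch. It is not taken from makewhatis or from any of the patches: the function name `writer_stderr` and the variable names are made up for illustration, and a small bash loop stands in for zcat as the writer. It runs the same pipeline three times, with SIGPIPE at its default, ignored via `trap '' PIPE` (the reproduction mentioned above), and reset again with `trap - PIPE`.

```shell
#!/bin/bash
# Illustration only; writer_stderr and the *_out variables are hypothetical names.

# Run a writer whose reader exits early, and print what the writer
# reported on stderr (empty if it was killed silently by SIGPIPE).
writer_stderr() {
    local errfile
    errfile=$(mktemp)
    # The child bash keeps echoing until a write fails;
    # head exits after one line, which closes the read end of the pipe.
    LC_ALL=C bash -c 'while echo line; do :; done' 2>"$errfile" \
        | head -n 1 >/dev/null
    cat "$errfile"
    rm -f "$errfile"
}

# 1. Disposition as inherited by this script (normally the default).
default_out=$(writer_stderr)

# 2. SIGPIPE ignored; child processes inherit the ignored disposition.
trap '' PIPE
ignored_out=$(writer_stderr)

# 3. Disposition reset in the same shell that changed it.
trap - PIPE
restored_out=$(writer_stderr)

printf 'default:  [%s]\n' "$default_out"
printf 'ignored:  [%s]\n' "$ignored_out"
printf 'restored: [%s]\n' "$restored_out"
```

When started from a normal login shell, the "ignored" line should carry a "Broken pipe" write error while the other two stay empty. One caveat: the bash manual states that signals ignored upon entry to the shell cannot be trapped or reset, so this sketch only shows a reset in the shell that set the ignore itself. It does not establish whether a trap line at the top of a cron script is sufficient when crond already hands the script an ignored SIGPIPE.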