Description of problem:

Version-Release number of selected component (if applicable):
vixie-cron-4.1-82.fc7

How reproducible:
Easy if you set up the right conditions

Steps to Reproduce:
1. incapacitate sendmail by moving aside /usr/sbin/sendmail, e.g.:

    # mv /usr/sbin/sendmail /usr/sbin/sendmail.hide

2. run a cron job which is a shell script that produces more output than the default stdio buffer. This demonstrates the problem well (also attached):

    # cat /tmp/cronjob-sigpipe.sh; echo ====
    #!/bin/bash
    mkdir -p /tmp/cronjob-sigpipe
    cd /tmp/cronjob-sigpipe
    num=$RANDOM
    let num=num%20
    PID=$$
    touch $PID-creating-$num-files
    loop=0
    gubbish="`dd if=/etc/termcap bs=4k count=1 2>/dev/null`"
    while [ $loop -lt $num ]; do
        let loop=loop+1
        touch $PID-file-$loop
        echo "$gubbish"
    done
    touch $PID-succeeded
    ====

3. Start this as a cron job. Do this as a regular user, one who doesn't currently have any cron jobs (it will blow them away):

    $ echo '* * * * * /tmp/cronjob-sigpipe.sh' | crontab

Allow it to run for 20 minutes or so.

Actual results:
Files appear in /tmp/cronjob-sigpipe, but not as many as it says it's going to make; the terminating "succeeded" file is sometimes missing.

Nota bene: the files it's creating are empty. The output that causes the SIGPIPE and cron job failure is going to the job's stdout, to be collected for sending as mail to the job owner. Creating files is a way of recording the job's progress through a channel other than the failing mail channel that one normally uses to observe cron job outcomes.

Expected results:
All the files the script is supposed to create should be created.

Additional info:
Of course this is just an example script. It produces a random amount of output, thus demonstrating that scripts which produce more than a certain amount of output suddenly stop running.

This is because `crond` pipes the job's stdout/stderr to a `sendmail` process in order to mail the output to the initiating user. It uses an internal function, cron_popen().
This function calls execvp() and assumes success, when in fact it can easily fail if /usr/sbin/sendmail is missing. In that case, what is essentially a poisoned stdio file pointer is created. The cron job blithely continues for a while, even producing output, until the in-memory stdio buffer becomes full. Then stdio tries to flush to the pipe, gets SIGPIPE, and the process(es) of the cron job are killed.

There are several workarounds, starting with the most obvious:

1. make sure /usr/sbin/sendmail is not incapacitated
2. use `MAILTO=""` in all crontabs (don't forget to cover both user & /etc/cron.d files)
3. arrange for `crond` to be run with a "-m /some/other/program" flag, specifying something that disposes of the attempted mail output one way or another

But these are all workarounds available to the person who has already discovered _why_ his cron jobs sometimes work and sometimes mysteriously die. Before he can use them, he must suffer through the discovery process...

Suggested fix:
cron_popen() must _notice_ if execvp() fails. It must inform the parent process (I'm not sure what's the best way). The jobs must then behave deterministically. Either they should succeed (while e.g. sending the mail output to /dev/null); or they must fail every time.

`crond` could also check, at startup time, whether the binary it intends to use as `sendmail` exists and is executable. Such a startup-time check is only a partial fix (`sendmail` could disappear during `crond`'s uptime), but it affords an opportunity to print a useful error message; it will explain yesterday's peculiar behavior during today's reboot. Even better, check each time it's going to run a job, and log a warning in syslog if necessary. On a system which deliberately has no MTA, the admin can disable the warnings by using "-m /bin/fake-sendmail-dump-to-dev-null".

Original context: VMware ESX Server 2.5.4, with RHEL2.1-based Console OS, with vixie-cron-4.1-11.EL3.
But I am now reproducing exactly the same problem in an FC7 live CD boot (in an ESX VM...).

`sendmail` is "incapacitated" on ESX, in that it isn't installed at all. Much the same thing could happen in any sort of small embedded environment.

(Ref: VMware PR 144651)
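For illustration, the existence-and-executability check suggested above might look roughly like the sketch below. This is not code from vixie-cron: the helper names mailer_is_usable() and check_mailer_at_startup() and the wording of the syslog message are made up for this example. It only shows the stat()/access() test plus a warning, and could equally be called before each job rather than once at startup.

```c
#include <sys/stat.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper (not part of vixie-cron): returns 1 if `path` names a
 * regular file that the calling process is allowed to execute, 0 otherwise.
 * Note that access() checks against the real uid/gid of the caller. */
int mailer_is_usable(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return 0;               /* missing, dangling symlink, etc. */
    if (!S_ISREG(sb.st_mode))
        return 0;               /* directories also pass X_OK, so reject them */
    return access(path, X_OK) == 0;
}

/* Hypothetical hook: log a warning if the mailer cannot be used.
 * The message text is illustrative only. Returns 1 if usable, 0 if not. */
int check_mailer_at_startup(const char *path)
{
    if (mailer_is_usable(path))
        return 1;
    syslog(LOG_WARNING,
           "mailer %s is missing or not executable; job output cannot be mailed",
           path);
    return 0;
}
```

As noted above, this can only be a partial fix, since the binary may disappear after the check has passed.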
Created attachment 158639
Shell script, to be run as a repeating cron job
Thank you for the report. I've added a syslog message (a warning about the possible problem), but I'm still thinking about a better fix. Checking for sendmail or another mail service could solve this issue.
I'd like to see the text of the "possible problem" syslog warning.

For the full fix, remember a system may deliberately omit mailers for enhanced security. My analysis shows that the root cause is crond's cron_popen() not noticing execvp() failure. I recommend fixing by:

1. fix cron_popen() to notice execvp() failure, return failure to its caller.
2. cron_popen()'s caller in cron shouldn't exit on failure, just syslog a message [including errno or other specifics of _why_ it failed], then run the command without logging -- as if `mailto` was empty.

This makes cron jobs on my hypothetical no-mailer system somewhat noisy. I think that's acceptable: a system designer/operator who wants to avoid the noise can rebuild cron without it, force mailto="" for all cron jobs, or supply a dummy /usr/sbin/sendmail. The important thing is that they'll actually _notice_ the issue and be able to deal with it. Which is much much better than having some random subset of cron jobs mysteriously die in mid-operation.
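One common way to make a parent notice execvp() failure, as point 1 asks, is a close-on-exec status pipe: a successful exec closes the write end automatically, so the parent reads EOF, while a failed exec lets the child send back its errno. The sketch below is a standalone illustration under that assumption, not the actual cron_popen() code; spawn_checked() is a hypothetical name, and the real function would additionally have to set up the pipe to the mailer's stdin.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not vixie-cron's cron_popen()): fork and exec argv[0].
 * Returns the child's pid if the exec succeeded, or -1 with errno set
 * (to the child's execvp() error if that is what failed). */
pid_t spawn_checked(char *const argv[])
{
    int status_pipe[2];
    int saved, child_errno = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(status_pipe) == -1)
        return -1;
    /* Close-on-exec: a successful execvp() closes the write end for us. */
    if (fcntl(status_pipe[1], F_SETFD, FD_CLOEXEC) == -1 ||
        (pid = fork()) == -1) {
        saved = errno;
        close(status_pipe[0]);
        close(status_pipe[1]);
        errno = saved;
        return -1;
    }

    if (pid == 0) {
        /* Child: only reaches the write() if execvp() failed. */
        close(status_pipe[0]);
        execvp(argv[0], argv);
        saved = errno;
        n = write(status_pipe[1], &saved, sizeof saved);
        (void)n;
        _exit(127);
    }

    /* Parent: EOF means the exec succeeded; an int means it failed. */
    close(status_pipe[1]);
    do {
        n = read(status_pipe[0], &child_errno, sizeof child_errno);
    } while (n == -1 && errno == EINTR);
    close(status_pipe[0]);

    if (n == (ssize_t)sizeof child_errno) {
        waitpid(pid, NULL, 0);   /* reap the failed child */
        errno = child_errno;
        return -1;
    }
    return pid;
}
```

With something along these lines, the caller could syslog strerror(errno) on a -1 return and then run the job as if `mailto` were empty, which is the behaviour point 2 describes.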
The first problem is solved with the message:

CRON: Exec of (/usr/lib/sendmail) had failed because: (No such file or directory)

The solution to the second problem is in progress.
"has failed" should be "failed". Adding relevant ISC engineers. Ok, not adding them -- apparently can't add arbitrary email addresses. I would like to add Evan Hunt & Paul Vixie.
The fix is complete. I added it in F-8 (updates). I'm not sure when it will be available.
The fix is now also in devel for vixie-cron. If you have any thoughts about it, please let me know.