Bug 247228 - cron jobs fail semi-randomly if sendmail incapacitated
Summary: cron jobs fail semi-randomly if sendmail incapacitated
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: vixie-cron
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Marcela Mašláňová
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-07-06 05:28 UTC by Bela Lubkin
Modified: 2007-11-30 22:12 UTC (History)
CC List: 0 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2007-10-30 08:48:39 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Shell script, to be run as a repeating cron job (315 bytes, text/plain)
2007-07-06 05:28 UTC, Bela Lubkin

Description Bela Lubkin 2007-07-06 05:28:06 UTC
Description of problem:


Version-Release number of selected component (if applicable):

vixie-cron-4.1-82.fc7

How reproducible:

Easy if you set up the right conditions

Steps to Reproduce:
1. incapacitate sendmail by moving aside /usr/sbin/sendmail, e.g.:

    # mv /usr/sbin/sendmail /usr/sbin/sendmail.hide

2. run a cron job which is a shell script that produces more output than
the default stdio buffer can hold.  This script demonstrates the problem
well (also attached):

# cat /tmp/cronjob-sigpipe.sh; echo ====
#!/bin/bash
mkdir -p /tmp/cronjob-sigpipe
cd /tmp/cronjob-sigpipe

num=$RANDOM
let num=num%20

PID=$$

touch $PID-creating-$num-files

loop=0
gubbish="`dd if=/etc/termcap bs=4k count=1 2>/dev/null`"
while [ $loop -lt $num ]; do
  let loop=loop+1
  touch $PID-file-$loop
  echo "$gubbish"
done

touch $PID-succeeded
====

3. Start this as a cron job.  Do this as a regular user who doesn't
currently have any cron jobs (the command below replaces that user's
entire crontab):

  $ echo '* * * * * /tmp/cronjob-sigpipe.sh' | crontab

Allow it to run for 20 minutes or so.

Actual results:

Files appear in /tmp/cronjob-sigpipe, but not as many as it says it's
going to make; the terminating "succeeded" file is sometimes missing.

Nota bene: the files it's creating are empty.  The output that causes
the SIGPIPE and cron job failure is going to the job's stdout, to be
collected for sending as mail to the job owner.  Creating files is a
way of recording the job's progress through a channel other than the
failing mail channel that one normally uses to observe cron job
outcomes.
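
One way to tally the outcome after the job has run for a while is
sketched below.  It is not part of the attached script: summarize_runs
is a hypothetical helper that assumes the file names used above
(PID-creating-N-files, PID-file-n, PID-succeeded) and that no PID was
reused between runs.

```shell
#!/bin/bash
# Hypothetical helper (not part of the attached script): count how many
# runs of cronjob-sigpipe.sh finished and how many were cut short.
# Assumes each run had a distinct PID.
summarize_runs() {
  local dir=$1 marker name pid want have f complete=0 truncated=0
  for marker in "$dir"/*-creating-*-files; do
    if [ ! -e "$marker" ]; then continue; fi   # glob matched nothing
    name=${marker##*/}            # e.g. 1234-creating-7-files
    pid=${name%%-*}               # 1234
    want=${name#*-creating-}      # 7-files
    want=${want%-files}           # 7
    have=0
    for f in "$dir/$pid"-file-*; do
      if [ -e "$f" ]; then have=$((have + 1)); fi
    done
    # A run counts as complete only if every file and the final
    # "succeeded" marker were created.
    if [ "$have" -eq "$want" ] && [ -e "$dir/$pid-succeeded" ]; then
      complete=$((complete + 1))
    else
      truncated=$((truncated + 1))
    fi
  done
  echo "complete=$complete truncated=$truncated"
}

summarize_runs "${1:-/tmp/cronjob-sigpipe}"
```

With sendmail incapacitated, a nonzero "truncated" count is the symptom
described above.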

Expected results:

All the files the script is supposed to create should be created.

Additional info:

Of course this is just an example script.  It produces a random amount
of output, thus demonstrating that scripts which produce more than a
certain amount of output suddenly stop running.

This is because `crond` pipes the job's stdout/stderr to a `sendmail`
process in order to mail the output to the initiating user.  It uses an
internal function, cron_popen().  This function calls execvp() and assumes
success, when in fact it can easily fail if /usr/sbin/sendmail is missing.
In that case, what is essentially a poisoned stdio file pointer is created.
The cron job blithely continues for a while, even producing output, until
the in-memory stdio buffer becomes full.  Then stdio tries to flush to the
pipe, gets SIGPIPE, and the process(es) of the cron job are killed.
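
The last step of that chain can be seen in isolation with a small bash
sketch.  This is a simplified stand-in, not crond's code: `true` plays
the mailer that never started (it reads nothing and exits), the
subshell plays the process writing job output, and writer_status is a
hypothetical name.

```shell
#!/bin/bash
# Simplified stand-in for the failure mode, not crond's actual code.
# Prints the writer's exit status for a given number of ~4 KiB chunks.
writer_status() {
  local chunks=$1
  (
    chunk=$(printf '%4096s' x)    # 4096 characters of padding
    i=0
    while [ "$i" -lt "$chunks" ]; do
      echo "$chunk" || exit 1     # reached only if SIGPIPE is ignored
      i=$((i + 1))
    done
  ) | true
  echo "${PIPESTATUS[0]}"         # status of the writer, not of `true`
}

writer_status 0      # no output: the writer exits normally
writer_status 100    # ~400 KiB: more than the pipe can absorb
```

With the default SIGPIPE disposition the second call normally reports
141 (128 + SIGPIPE), while the first reports 0.  Outputs small enough
to fit in the pipe may go either way depending on timing, which matches
the semi-random failures seen with real jobs.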

There are several workarounds, starting with the most obvious:

1. make sure /usr/sbin/sendmail is not incapacitated
2. use `MAILTO=""' in all crontabs (don't forget to cover both user &
   /etc/cron.d files)
3. arrange for `crond` to be run with a "-m /some/other/program" flag,
   specifying something that disposes of the attempted mail output one
   way or another
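
Workaround 2 could be applied to a user crontab along these lines.
This is a sketch only: disable_cron_mail is a hypothetical filter that
drops every existing MAILTO assignment and puts an empty one first, so
it deliberately discards any per-section MAILTO settings.

```shell
#!/bin/bash
# Hypothetical filter: read crontab text on stdin, write it to stdout
# with mail disabled for every entry.
disable_cron_mail() {
  echo 'MAILTO=""'
  # Remove existing MAILTO lines; `|| true` because grep exits nonzero
  # when nothing is left to print.
  grep -v '^[[:space:]]*MAILTO[[:space:]]*=' || true
}

# Demonstration on sample crontab text (nothing is installed here).
printf '%s\n' 'MAILTO=root' '* * * * * /tmp/cronjob-sigpipe.sh' |
  disable_cron_mail
```

For a real user crontab the filter would sit between `crontab -l` and
`crontab -`; files in /etc/cron.d would need the same change made
separately.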

But these are all workarounds available to the person who has already
discovered _why_ his cron jobs sometimes work and sometimes mysteriously
die.  Before he can use them, he must suffer through the discovery
process...

Suggested fix: cron_popen() must _notice_ if execvp() fails.  It must
inform the parent process (I'm not sure what's the best way).  The jobs
must then behave deterministically.  Either they should succeed (while
e.g. sending the mail output to /dev/null); or they must fail every time.
`crond` could also check, at startup time, whether the binary it intends
to use as `sendmail` exists and is executable.  Such a startup-time
check is only a partial fix (`sendmail` could disappear during `crond`'s
uptime), but it affords an opportunity to print a useful error message;
it will explain yesterday's peculiar behavior during today's reboot.  Even
better, check each time it's going to run a job, log a warning in syslog
if necessary.  On a system which deliberately has no MTA, the admin can
disable the warnings by using "-m /bin/fake-sendmail-dump-to-dev-null"
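
Until crond itself performs such a check, something similar can be
done from outside, for example in whatever script starts crond.  The
sketch below is an assumption-laden illustration: check_mailer and
write_discard_mailer are hypothetical names, and the default path is
the one used earlier in this report.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeed only if the mailer is a
# regular, executable file; otherwise warn on stderr.
check_mailer() {
  local mailer=${1:-/usr/sbin/sendmail}
  if [ -f "$mailer" ] && [ -x "$mailer" ]; then
    return 0
  fi
  echo "warning: cron mailer $mailer is missing or not executable" >&2
  return 1
}

# Hypothetical stand-in mailer for systems that deliberately have no
# MTA: it reads the whole message and throws it away.
write_discard_mailer() {
  local path=$1
  printf '%s\n' '#!/bin/sh' 'exec cat >/dev/null' >"$path" &&
    chmod 755 "$path"
}

check_mailer /usr/sbin/sendmail || echo "mail from cron jobs will not work"
```

A mailer written by write_discard_mailer is the kind of program that
could be passed to crond with the -m flag mentioned above.  The warning
could equally be sent to syslog instead of stderr.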

Original context: VMware ESX Server 2.5.4, with RHEL2.1-based Console OS,
with vixie-cron-4.1-11.EL3.  But I am now reproducing exactly the same
problem in an FC7 live CD boot (in an ESX VM...)  `sendmail` is
"incapacitated" on ESX, in that it isn't installed at all.  Much the same
thing could happen in any sort of small embedded environment.  (Ref:
VMware PR 144651)

Comment 1 Bela Lubkin 2007-07-06 05:28:06 UTC
Created attachment 158639 [details]
Shell script, to be run as a repeating cron job

Comment 2 Marcela Mašláňová 2007-09-17 09:09:34 UTC
Thank you for the report.

I've added a syslog message (a warning about the possible problem), but I'm
still thinking about a better fix. Checking for sendmail or another mail
service could solve this issue.

Comment 3 Bela Lubkin 2007-09-17 18:44:47 UTC
I'd like to see the text of the "possible problem" syslog warning.

For the full fix, remember a system may deliberately omit mailers for enhanced 
security.

My analysis shows that the root cause is crond's cron_popen() not noticing 
execvp() failure.  I recommend fixing by:

1. fix cron_popen() to notice execvp() failure, return failure to its caller.

2. cron_popen()'s caller in cron shouldn't exit on failure, just syslog a 
message [including errno or other specifics of _why_ it failed], then run the 
command without logging -- as if `mailto' was empty.

This makes cron jobs on my hypothetical no-mailer system somewhat noisy.  I 
think that's acceptable: a system designer/operator who wants to avoid the noise 
can rebuild cron without it, force mailto="" for all cron jobs, or supply a 
dummy /usr/sbin/sendmail.

The important thing is that they'll actually _notice_ the issue and be able to 
deal with it.  Which is much much better than having some random subset of cron 
jobs mysteriously die in mid-operation.

Comment 4 Marcela Mašláňová 2007-10-26 14:49:54 UTC
The first problem is solved by logging this message:
CRON: Exec of (/usr/lib/sendmail) had failed because: (No such file or directory)
A solution to the second problem is in progress.

Comment 5 Bela Lubkin 2007-10-26 23:07:24 UTC
"has failed" should be "failed".

Adding relevant ISC engineers.

Ok, not adding them -- apparently can't add arbitrary email addresses.
I would like to add Evan Hunt & Paul Vixie.

Comment 6 Marcela Mašláňová 2007-10-29 15:26:33 UTC
The fix is complete. I added it to F-8 (updates). I'm not sure when it will be
available.


Comment 7 Marcela Mašláňová 2007-10-30 08:48:39 UTC
The fix is now also in the devel branch of vixie-cron. If you have any
thoughts about it, please let me know.

