Bug 178436
| Summary: | network service interruption can cause initgroups() to delay cron job execution by more than one minute | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Jason Vas Dias <jvdias> |
| Component: | vixie-cron | Assignee: | Jason Vas Dias <jvdias> |
| Status: | CLOSED ERRATA | QA Contact: | Brock Organ <borgan> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.0 | CC: | j.f.wheeler |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHSA-2006-0117 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2006-03-15 15:31:34 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 168424 | ||
Description
Jason Vas Dias
2006-01-20 15:51:25 UTC
fixed with vixie-cron-4.1-10.EL3

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0117.html

The fix for this bug does not take account of situations where the crontab entry looks like this:

`*/10 * * * * root /usr/local/sbin/CSFsure >/dev/null`

With such entries the cron job is still occasionally suppressed. Here is the output of the command "grep delay /var/log/cron":

Mar 28 07:51:34 pat crond[2272]: (root) error: Job execution of per-minute job scheduled for 07:50 delayed into subsequent minute 07:51. Skipping job run.
Mar 29 10:33:10 pat crond[22412]: (root) error: Job execution of per-minute job scheduled for 10:30 delayed into subsequent minute 10:33. Skipping job run.
Mar 29 10:33:10 pat crond[22413]: (root) error: Job execution of per-minute job scheduled for 10:30 delayed into subsequent minute 10:33. Skipping job run.

I have looked at the source and think the problem lies in the test in do_command.c, which applies to every cron command with * in the crontab minutes field and does not exclude those that also have a step count.

RE: Comment #8: Sorry that some of your cron jobs were skipped. But the rationale was that jobs scheduled to run at a specific minute should not be run if execution is delayed (e.g. by network authentication) into the next minute. Many programs, e.g. logwatch / logrotate, have problems if runs meant to occur in distinct minutes happen in the same minute. I've just re-tested 'skip minute' jobs (i.e. '*/2 ...', '*/10 ...'), and they all ran fine for me over a 48-hour period; none were skipped. Your skipped jobs were delayed by network authentication for over one minute (and in one case 3 minutes).
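The skip rule described in the reply can be sketched as below. This is a minimal Python illustration, not the vixie-cron code in do_command.c: `should_skip` and `minutes_late` are hypothetical names, and the year 2006 is assumed because the crond log lines omit it. It compares the minute a run was scheduled for with the minute it actually started, using the two timestamps from the log output quoted above.

```python
from datetime import datetime


def _to_minute(t: datetime) -> datetime:
    # Drop seconds and microseconds so only the wall-clock minute is compared.
    return t.replace(second=0, microsecond=0)


def should_skip(scheduled: datetime, started: datetime) -> bool:
    """True if a run scheduled for a specific minute only starts in a later minute."""
    return _to_minute(started) > _to_minute(scheduled)


def minutes_late(scheduled: datetime, started: datetime) -> int:
    """Whole minutes between the scheduled minute and the minute the run started."""
    delta = _to_minute(started) - _to_minute(scheduled)
    return int(delta.total_seconds() // 60)


# Year 2006 is an assumption; the crond log lines carry no year.
runs = [
    (datetime(2006, 3, 28, 7, 50), datetime(2006, 3, 28, 7, 51, 34)),
    (datetime(2006, 3, 29, 10, 30), datetime(2006, 3, 29, 10, 33, 10)),
]
for scheduled, started in runs:
    verdict = "skip" if should_skip(scheduled, started) else "run"
    print(scheduled.strftime("%H:%M"), "->", started.strftime("%H:%M"),
          verdict, f"({minutes_late(scheduled, started)} min late)")
```

Under this rule a run that starts at 07:50:59 for a 07:50 slot still goes ahead; only crossing into the next minute triggers the skip.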
The time interval measurement in the code is quite precise: it measures the time between the start of authentication and the job being run, and for jobs scheduled to run at a specific minute, if the run cannot occur until the next minute, the run is skipped. Nothing else in the code could cause such a delay; only a network authentication method (e.g. NIS, LDAP, Kerberos) experiencing a timeout could do this. If your step had been '*/2', the job that was delayed by 3 minutes could have run in the same minute as a subsequent run of that job, which can cause real problems for some cron jobs. Making the code second-guess whether a 'skip minute' job will conflict would be horrendously complex and wrong: if jobs are to be run in a specific minute, they must be run in that minute or not at all.

I am working on a better solution: enhancing cron to provide a class of 'exclusive' jobs that cannot be run while a previous instance is still running. We could then remove the potential to skip per-minute jobs that are delayed past the minute for which they were scheduled. I think it is better for cron not to run a job scheduled for a specific minute if the job cannot be run in that minute.

To avoid the potential for network timeouts to cause cron jobs to be skipped, start crond when /etc/nsswitch.conf contains only the lines `passwd: files` and `group: files`. Cron will then use only the 'files' lookup method for users and groups, even if /etc/nsswitch.conf is later changed to use network lookup methods.

Further to your last comment, it seems to me that if it is important that only one instance of a cron job runs at any one time, then the cron job itself should have some sort of check to prevent this, e.g. a lock file.

We have worked around the problem introduced by your recent change by using syntax like 0-59/5 in the minutes field of the crontab entry rather than */5, but we could equally have used 0,5,10... etc.
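The workaround relies on '*/5', '0-59/5' and an explicit comma list all selecting the same minutes. The sketch below is a simplified minute-field parser written for illustration (it is not vixie-cron's parser and handles only '*', single numbers, ranges, steps and comma lists). `looks_like_star_minute` is a hypothetical stand-in for the check the reporter describes, assumed here to key on a literal '*' in the field; it shows why rewriting '*/5' as '0-59/5' would sidestep such a check without changing the schedule.

```python
def expand_minutes(field: str) -> set[int]:
    """Expand a crontab minute field into the set of minutes (0-59) it selects.

    Simplified: supports '*', 'N', 'A-B', an optional '/STEP' after '*' or a
    range, and comma-separated lists of those.
    """
    minutes: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            lo, hi = 0, 59
        elif "-" in base:
            lo_text, hi_text = base.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        elif step_text:
            raise ValueError(f"a step needs '*' or a range: {part!r}")
        else:
            lo = hi = int(base)
        if not (0 <= lo <= hi <= 59) or step < 1:
            raise ValueError(f"bad minute spec: {part!r}")
        minutes.update(range(lo, hi + 1, step))
    return minutes


def looks_like_star_minute(field: str) -> bool:
    # Hypothetical stand-in for the check described in this bug: it only
    # looks for a literal '*' at the start of the minute field.
    return field.startswith("*")


for field in ("*/5", "0-59/5", "0,5,10,15,20,25,30,35,40,45,50,55"):
    same = expand_minutes(field) == set(range(0, 60, 5))
    print(f"{field!r}: same schedule={same}, star check={looks_like_star_minute(field)}")
```

All three spellings expand to the same twelve minutes; only the '*/5' form matches the literal-'*' test.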
This shows that a check to prevent more than one instance of a cron job running is (as you say) very difficult to generalise.
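A per-job lock, as suggested in the discussion, can be sketched with an advisory `flock(2)` lock through Python's `fcntl` module (Unix only). This is an illustration under assumptions: `run_exclusively` and the default lock path are hypothetical, and the job is passed in as a Python callable rather than run as a crontab command. If another instance already holds the lock, the new run returns immediately instead of overlapping.

```python
import fcntl
from typing import Callable

# Hypothetical lock location; each job needs its own lock file.
DEFAULT_LOCK_PATH = "/var/lock/example-cron-job.lock"


def run_exclusively(job: Callable[[], None], lock_path: str = DEFAULT_LOCK_PATH) -> bool:
    """Run job() unless another instance already holds the lock.

    Returns True if job() ran, False if it was skipped because the lock was held.
    """
    # Closing the file when the with-block exits releases the flock lock.
    with open(lock_path, "w") as lock_file:
        try:
            # Exclusive and non-blocking: fail at once instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        job()
        return True
```

Because the lock belongs to the open file rather than to the file's existence, it is released automatically when the holding process exits, so a crashed run does not leave a stale lock behind.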