Bug 110894 - run-parts misses a spread-over-time mechanism
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: crontabs
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Marcela Mašláňová
QA Contact: Brock Organ
Keywords: FutureFeature
Depends On:
Blocks:
 
Reported: 2003-11-25 06:46 EST by Peter Bieringer
Modified: 2010-02-22 07:45 EST (History)
CC List: 5 users

Doc Type: Enhancement
Last Closed: 2010-02-22 07:45:57 EST


Attachments
Skip sleep if anacron is running (589 bytes, patch)
2007-04-11 08:29 EDT, Peter Bieringer
script for creating added delay for cronjobs executed from /etc/crontab (707 bytes, application/octet-stream)
2009-10-23 06:40 EDT, Marcela Mašláňová
/etc/sysconfig/crontabs (346 bytes, application/octet-stream)
2009-10-23 06:42 EDT, Marcela Mašláňová

Description Peter Bieringer 2003-11-25 06:46:48 EST
Description of problem:
Because /usr/bin/run-parts lacks a spread-over-time mechanism, many hosts
reporting to a central mail system at once (e.g. via the daily "logwatch"
run) can cause a DDoS, or at least heavy load, on that system.

Currently, all systems by default start their daily/weekly/monthly crontab
entries at the same time (and, because all systems are time-synced via NTP,
really at the same moment).

BTW: a similar issue exists with, e.g., AV software that places its pattern
update script in /etc/cron.hourly or /etc/cron.daily. All Unix boxes around
the world then start the pattern update at the same time, which leads to a
DDoS against the update server.


Version-Release number of selected component (if applicable):
crontabs-1.10-5 and below

How reproducible:
Always

Steps to Reproduce:
1. Install many systems
2. Set up a forwarder for root to an account on a central mail system
3. Run e.g. SpamAssassin and AV software on this mail system
4. Monitor the load on this mail system
5. Wait for 04:02 (logwatch now starts on every system)
6. Watch the load of the mail system for, e.g., one hour
    

Actual Results:  Heavy load because many e-mails arrive at the same
time

Expected Results:  A time-spread mechanism, which delays the start of
run-parts for some minutes

Additional info:

Here is a suggestion:

When "/usr/bin/run-parts" is called by cron, the start of the jobs is
randomly delayed by 0 to 15 minutes.

--- /usr/bin/run-parts.orig     2003-02-07 22:07:32.000000000 +0100
+++ /usr/bin/run-parts  2003-11-25 12:43:18.000000000 +0100
@@ -2,6 +2,15 @@

 # run-parts - concept taken from Debian

+# Check whether the caller is "crond"
+caller="`ps --no-headers -o fname -p ${PPID}`"
+if [ "$caller" = "crond" ]; then
+       # Sleep random seconds (0-899 = 0-15 minutes) to avoid flooding e.g. remote mail systems
+       #  in case of many boxes reporting information (e.g. logwatch) at the same time
+       seconds=$[ $RANDOM % 900 ]
+       sleep $seconds
+fi
+
 # keep going when something fails
 set +e
Comment 1 Jason Vas Dias 2005-06-03 12:23:56 EDT
Sorry for the delay in getting around to this bug.

We are finally planning to do something about this.

Here's some recent discussion on this issue:

On Tue, 2005-05-17 at 22:34, Sean Reifschneider wrote:
> I'm thinking about changing the cron.daily runparts stuff so that instead
> of them just running at static times after 4am, setting up something
> that would sequentially run the correct runparts, one after another, and
> optionally randomly putting a delay of some time period up front.
> 
> In other words, read a value from the sysconfig for a max number of minutes
> to sleep, then "sleep $[RANDOM%SKEWMINUTES]m", then run the appropriate
> runparts.
> 
> The idea being to stagger the crons.  A rack full of systems can pull an
> extra 5 or 7 amps when all the crons are running at the same time.
> 
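A rough bash sketch of the wrapper described above (read a maximum skew in minutes from a sysconfig file, sleep a random part of it, then run the parts). SKEWMINUTES comes from the mail; the file /etc/sysconfig/cron-skew and the function names are assumptions for illustration, not the code that was eventually shipped.

```bash
#!/bin/bash
# Sketch only: sleep a random number of minutes (0 .. SKEWMINUTES-1),
# then run one run-parts directory. SKEWMINUTES may be set in a
# sysconfig-style file; the path below is an assumption.

SKEW_CONFIG="${SKEW_CONFIG:-/etc/sysconfig/cron-skew}"   # hypothetical file

# Print a random delay in minutes in the range 0 .. max-1 (0 if max <= 0).
random_skew_minutes() {
    local max="${1:-0}"
    if (( max <= 0 )); then
        echo 0
    else
        echo $(( RANDOM % max ))
    fi
}

# Read SKEWMINUTES (default 0 = no delay), sleep, then run the parts.
skewed_run_parts() {
    local dir="$1"
    local SKEWMINUTES=0
    # A "SKEWMINUTES=..." line in the sourced file sets the local above.
    [ -r "$SKEW_CONFIG" ] && . "$SKEW_CONFIG"
    sleep "$(random_skew_minutes "$SKEWMINUTES")m"
    run-parts "$dir"
}
```

With SKEWMINUTES=30 in the config file, skewed_run_parts /etc/cron.daily would start the daily parts somewhere between 0 and 29 minutes after the crontab time. Since $RANDOM is drawn on every call, each host (and each run) gets a different delay.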

REPLY:
On Tue, May 17, 2005 at 11:03:33PM -0400, Jason Vas Dias wrote:

This "randomized stagger" feature is at the top of my list
for cron enhancements, which I hope to be able to
get to within the next few weeks for rawhide / FC5.

I'd implement it as a new format tag for the cron job time
specification, so that /etc/crontab would look like:

# run-parts
~01-15 * * * * root run-parts /etc/cron.hourly
~16-31 4 * * * root run-parts /etc/cron.daily
~32-47 4 * * 0 root run-parts /etc/cron.weekly
~48-59 4 1 * * root run-parts /etc/cron.monthly

so that cron would schedule run-parts at the first number
of minutes past each hour (1, 16, 32, 48), but when that time
arrives, cron would choose a random number between 0 and
the second number (15, 31, 47, 59) and, if it is not 0, reschedule
the job for that many minutes later.

Re-scheduled randomly staggered jobs would not be subject to
any further rescheduling until their unstaggered schedule time
arrives. 
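Taken literally, the text above draws the delay from 0 up to the second number; the reply quoted below reads the tag as a window instead (a job tagged ~16-31 runs somewhere between :16 and :31). Under that window reading, the proposed tag could be emulated roughly as follows. The "~FIRST-LAST" syntax was only a proposal, stagger_minute is a hypothetical helper, and this sketch picks the final minute directly rather than rescheduling inside cron.

```bash
#!/bin/bash
# Emulate the proposed "~FIRST-LAST" minute tag by picking a random
# minute inside the window FIRST..LAST (inclusive). Hypothetical helper.
stagger_minute() {
    local tag="$1" first last
    local re='^~([0-9]+)-([0-9]+)$'
    if ! [[ $tag =~ $re ]]; then
        echo "invalid stagger tag: $tag" >&2
        return 1
    fi
    # Force base 10 so values like "08" are not read as octal.
    first=$(( 10#${BASH_REMATCH[1]} ))
    last=$(( 10#${BASH_REMATCH[2]} ))
    if (( last < first || last > 59 )); then
        echo "invalid minute window: $tag" >&2
        return 1
    fi
    echo $(( first + RANDOM % (last - first + 1) ))
}
```

For example, stagger_minute '~16-31' prints a minute between 16 and 31; a wrapper started by cron at :16 would then sleep for the difference before calling run-parts.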

On Tue, 2005-05-17 at 23:50, Sean Reifschneider wrote:
> I thought about that, and figured that would be pretty easy to implement in
> run-parts:
> 
>    01 * * * * root run-parts -s 15 /etc/cron.hourly
>    16 4 * * * root run-parts -s 15 /etc/cron.daily
>    32 4 * * 0 root run-parts -s 15 /etc/cron.weekly
>    48 4 1 * * root run-parts -s 15 /etc/cron.monthly
> 
> I rejected that because right now the crons run at 1, 2, 22, and 42 minutes
> after the hour.  Presumably this is to try to keep them from overlapping.
> The easy way to ensure that there is no overlap would be, in simple
> pseudo-code:
> 
>    crontab:
>       01 * * * * root stagger-parts
> 
>    stagger-parts:
>       sleep $[RANDOM%30]m
>       run-parts /etc/cron.hourly
>       [ "$ISDAILY" ] && run-parts /etc/cron.daily
>       [ "$ISWEEKLY" ] && run-parts /etc/cron.weekly
>       [ "$ISYEARLY" ] && run-parts /etc/cron.yearly
> 
> 
> Of course, I haven't come up with a good way of determining the ISDAILY,
> etc...  There are ways with flag files recording the last day/week/month
> it was run; the flag would be set to true whenever that differs.
> 
> I think the ~ idea is interesting, could be quite useful in some places,
> but here I think it would be good to be able to explicitly ensure that the
> jobs do not overlap.  Saying "between 1 and 15" and "16 and 30" means that
> one job might run at :15 and the other at :16, which may not give enough
> time for the first to finish, possibly leading to unexpected behavior
> sporadically when the jobs overlap.
> 
> Maybe that's not a rational fear, but that's why I was looking at a job to
> schedule them.
> 

Yes, this feature would also have to be implemented along with
a feature that ensures jobs do not overlap (bug #144133).

So we are working on these features for a new "enhanced cron" in
FC5 / rawhide.
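The stagger-parts pseudo-code and flag-file idea quoted above might be fleshed out like this. STAMP_DIR, /var/lib/stagger-parts and the function names are assumptions; the quoted pseudo-code's last line says cron.yearly, while this sketch uses the existing /etc/cron.monthly directory.

```bash
#!/bin/bash
# Sketch of "stagger-parts": one hourly cron entry that sleeps a random
# skew and then runs the hourly, daily, weekly and monthly parts one after
# another, so they never overlap. Stamp files remember the period in which
# each set of parts last ran. Paths and names are hypothetical.

STAMP_DIR="${STAMP_DIR:-/var/lib/stagger-parts}"

# Succeed (job is due) if the stamp file for $1 does not already contain
# $2, and record $2 as the new stamp. Fail if it already ran this period.
due_and_mark() {
    local stamp="$STAMP_DIR/$1" current="$2" last=""
    [ -r "$stamp" ] && last=$(<"$stamp")
    [ "$last" = "$current" ] && return 1
    printf '%s\n' "$current" > "$stamp"
}

stagger_parts() {
    mkdir -p "$STAMP_DIR"
    sleep "$(( RANDOM % 30 ))m"              # 0-29 minute skew
    run-parts /etc/cron.hourly
    due_and_mark daily   "$(date +%Y-%m-%d)" && run-parts /etc/cron.daily
    due_and_mark weekly  "$(date +%G-W%V)"   && run-parts /etc/cron.weekly
    due_and_mark monthly "$(date +%Y-%m)"    && run-parts /etc/cron.monthly
}
```

Driven from a single hourly /etc/crontab entry, the four sets of parts always run one after another, so they cannot overlap. As written, the daily parts run on the first invocation after midnight rather than at 04:xx; a real version would probably also check the hour.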

Comment 2 Peter Bieringer 2006-03-22 07:57:38 EST
I dug into the man page and source RPM of vixie-cron-4.1-54.FC5, but didn't find
anything about the "~" feature. Is it implemented in this version?

BTW: in the meantime I've created an alternative solution, which IMHO is better
because it creates a delay that is static per system but differs between systems
(hopefully the algorithm is patent-free ;-).
The same delay is used for daily, weekly and monthly, so the original time spacing
in /etc/crontab is preserved.

Note also that such a file can easily be distributed in a separate crontab
package, e.g. crontab-delay-$version-$release.rpm


File: /etc/cron.daily/000-delay.cron
#!/bin/bash

# Generate per system static delay for cron.{daily,weekly,monthly}

# (P) & (C) by Peter Bieringer <pb@bieringer.de>

factor=1        # max. ~ 68 minutes
#factor=2       # max. ~ 34 minutes

# Create md5sum of hostname (static over system lifetime)
md5sum="`echo ${HOSTNAME} | /usr/bin/md5sum`"

# Extract the first 3 hexdigits (12 Bit:0-4095)
hexvalue="${md5sum:0:3}"

# Create decimal value
decvalue="`printf "%d" "0x${hexvalue}"`"

# Divide delay by factor
DELAY=$[ $decvalue / $factor ]

sleep $DELAY
exit 0
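To see which delay a particular host would get, the same calculation can be wrapped in a function that takes the host name as an argument. host_delay is a hypothetical helper name; the arithmetic mirrors the script above (first 3 hex digits of the md5sum, divided by the factor).

```bash
#!/bin/bash
# Per-host static delay, same arithmetic as 000-delay.cron above, but with
# the host name and factor passed in so several hosts can be compared.
host_delay() {
    local host="$1" factor="${2:-1}" sum hex
    sum=$(echo "$host" | md5sum)   # echo adds a newline, as in the original
    hex=${sum:0:3}                 # first 3 hex digits: 0..4095
    echo $(( 16#$hex / factor ))   # delay in seconds
}
```

Calling it for each host name in an inventory shows how the machines are spread across the 0-4095 second window (about 68 minutes with factor=1, about 34 with factor=2). The result only changes if the host name changes.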


For weekly and monthly, only a symlink is required:
# find /etc/cron.* -name 000-delay.cron -ls
321272    8 -rwxr-xr-x   1 root     root          504 Mar 22 13:51 /etc/cron.daily/000-delay.cron
321265    0 lrwxrwxrwx   1 root     root           28 Mar 22 13:54 /etc/cron.monthly/000-delay.cron -> ../cron.daily/000-delay.cron
320650    0 lrwxrwxrwx   1 root     root           28 Mar 22 13:54 /etc/cron.weekly/000-delay.cron -> ../cron.daily/000-delay.cron
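The symlinks in the listing could be created as follows. link_delay_script is a hypothetical helper; its optional argument is a prefix directory so the result can be checked in a scratch tree before touching the real /etc.

```bash
#!/bin/bash
# Create the weekly/monthly symlinks to the single daily delay script.
# $1 is an optional prefix directory (empty = the real /etc); it exists
# only so the result can be checked in a scratch tree first.
link_delay_script() {
    local prefix="${1:-}" dir
    for dir in cron.weekly cron.monthly; do
        mkdir -p "$prefix/etc/$dir"
        # Relative target, as in the listing: ../cron.daily/000-delay.cron
        ln -sf ../cron.daily/000-delay.cron "$prefix/etc/$dir/000-delay.cron"
    done
}
```

Run as root without an argument, it creates /etc/cron.weekly/000-delay.cron and /etc/cron.monthly/000-delay.cron pointing at ../cron.daily/000-delay.cron, so only one copy of the script has to be maintained.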
Comment 3 Peter Bieringer 2006-03-23 10:55:28 EST
I found some time and created RPM packages for it:

Available at:

ftp://ftp.aerasec.de/pub/linux/repository/public/redhat/enterprise/3/i386/
ftp://ftp.aerasec.de/pub/linux/repository/public/redhat/enterprise/4/i386/

$ rpm -qilpv crontabs-runpartsdelay-0.0.1-1.RHEL4.AERAsec.3.noarch.rpm
Name        : crontabs-runpartsdelay       Relocations: (not relocatable)
Version     : 0.0.1                             Vendor: AERAsec Network Services and Security GmbH <http://www.aerasec.de/>
Release     : 1.RHEL4.AERAsec.3             Build Date: Thu 23 Mar 2006 04:25:38 PM CET
Install Date: (not installed)               Build Host: rpmbuild-rhel4.muc.aerasec.de
Group       : System Environment/Base       Source RPM: crontabs-runpartsdelay-0.0.1-1.RHEL4.AERAsec.3.src.rpm
Size        : 1731                             License: GPLv2
Signature   : (none)
Packager    : Peter Bieringer <pbieringer@aerasec.de>
Summary     : Extension for a system specific static delay of daily, weekly and monthly run-parts entries
Description :
"crontab-runpartsdelay" add files named "000-delay.cron" to daily,
weekly and monthly run-parts directories of "crontabs".
The delay is system specific but static as long as the system name
is unchanged.
Use this e.g. an all of your servers if your central mail system
will become overloaded because all servers send e.g. logwatch
e-mails at the same time.
drwxr-xr-x    2 root    root                0 Mar 23 16:25 /etc
drwxr-xr-x    2 root    root                0 Mar 23 16:25 /etc/cron.daily
-rwxr-xr-x    1 root    root              577 Mar 23 16:25 /etc/cron.daily/000-delay.cron
drwxr-xr-x    2 root    root                0 Mar 23 16:25 /etc/cron.monthly
-rwxr-xr-x    1 root    root              577 Mar 23 16:25 /etc/cron.monthly/000-delay.cron
drwxr-xr-x    2 root    root                0 Mar 23 16:25 /etc/cron.weekly
-rwxr-xr-x    1 root    root              577 Mar 23 16:25 /etc/cron.weekly/000-delay.cron

Comment 4 Peter Bieringer 2007-03-18 15:07:38 EDT
I'm very happy to see that such a mechanism was included in the
crontabs-1.10-9.fc6 release:

* Thu Oct 12 2006 Marcela Maslanova <mmaslano@redhat.com> 1.10-9
- patch (#110894) for delaying more emails in the moment

But I found that when anacron runs the jobs on a system that is only partially
active (not always powered on), this delay should be skipped.

Any ideas how to detect whether the cron job was started by anacron?
If anacron sets nothing in the environment, a workaround would be to
set a value in /etc/anacrontab (e.g. ANACRON=1) and check for it in
the delay script.
Comment 5 Marcela Mašláňová 2007-03-28 09:24:35 EDT
I'm now checking the log files. I switch my computer off once a week, and on the
following day the daily jobs run at 4:02. So this will need more work.

Maybe we can check the PID number in /var/spool/anacron.
Comment 6 Peter Bieringer 2007-04-11 08:29:08 EDT
Created attachment 152264 [details]
Skip sleep if anacron is running

Perhaps the same mechanism as used in /etc/cron.daily/0anacron can be used:

# Don't run anacron if this script is called by anacron.
if [ ! -e /var/run/anacron.pid ]; then
    anacron -u cron.daily
fi
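Applied to the delay script, the same test could look like this: if /var/run/anacron.pid exists, the job is assumed to have been started by anacron and the sleep is skipped. ANACRON_PIDFILE and delay_unless_anacron are hypothetical names; this sketches the idea and is not the attached patch itself.

```bash
#!/bin/bash
# Skip the per-host delay when anacron is running the job, using the same
# check as /etc/cron.daily/0anacron (presence of the anacron PID file).
ANACRON_PIDFILE="${ANACRON_PIDFILE:-/var/run/anacron.pid}"

delay_unless_anacron() {
    local seconds="$1"
    if [ -e "$ANACRON_PIDFILE" ]; then
        # Started via anacron: the regular time was probably missed,
        # so run the jobs without an extra delay.
        return 0
    fi
    sleep "$seconds"
}
```

In 000-delay.cron this would replace the final sleep "$DELAY" with delay_unless_anacron "$DELAY", so normal cron runs stay spread out while catch-up runs by anacron start immediately.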
Comment 7 Marcela Mašláňová 2009-10-23 06:40:55 EDT
Created attachment 365830 [details]
script for creating added delay for cronjobs executed from /etc/crontab
Comment 8 Marcela Mašláňová 2009-10-23 06:42:12 EDT
Created attachment 365831 [details]
/etc/sysconfig/crontabs
Comment 9 Marcela Mašláňová 2009-10-23 06:51:55 EDT
The script above is used in Fedora releases 6 through 11. Since Fedora 12, this functionality has been included in cronie in a different way, which is documented in the man pages and the Deployment Guide.

The feature implemented in comment #7 has a downside: it isn't possible to run the regular jobs immediately with the command `anacron -n`.
Comment 12 Chris Van Tuin 2009-10-30 18:58:20 EDT
*** Bug 532157 has been marked as a duplicate of this bug. ***
Comment 14 Marcela Mašláňová 2010-02-22 07:45:57 EST
The last planned update of RHEL-4 will be focused on performance and security bugs only. This request is tracked for the RHEL-5 release as bz#532157. Please contact your support representative if you are interested in this change.
