Description of problem:
Because /usr/bin/run-parts lacks a mechanism for spreading job starts over time, it can cause a DDoS, or at least heavy load, on a central mail system when many hosts report to it, e.g. via the daily "logwatch" run. Currently, all systems start their daily/weekly/monthly crontab entries at the same time by default, and because all systems are time-synced via NTP, the starts coincide.

BTW: there is a related issue with e.g. AV software that places its pattern update script in /etc/cron.hourly or /etc/cron.daily. All Unix boxes around the world then start the pattern update at the same time, which leads to a DDoS against the update server.

Version-Release number of selected component (if applicable):
crontabs-1.10-5 and below

How reproducible:
Always

Steps to Reproduce:
1. Install many systems
2. Set up a forwarder for root to an account on a central mail system
3. Run e.g. spamassassin and AV software on this mail system
4. Monitor the load on this mail system
5. Wait for 04:02 (logwatch now starts on every system)
6. Watch the load of the mail system for e.g. one hour

Actual Results:
Heavy load, because many e-mails arrive at the same time

Expected Results:
A time-spread mechanism that delays the start of run-parts by some minutes

Additional info:
Here is a suggestion: when "/usr/bin/run-parts" is called by cron, the start of the jobs is randomly delayed by 0 to 15 minutes.

--- /usr/bin/run-parts.orig 2003-02-07 22:07:32.000000000 +0100
+++ /usr/bin/run-parts 2003-11-25 12:43:18.000000000 +0100
@@ -2,6 +2,15 @@
 # run-parts - concept taken from Debian

+# Check for caller "crond",
+caller="`ps --no-headers -o fname -p ${PPID}`"
+if [ "$caller" = "crond" ]; then
+    # Sleep random seconds (0-899 = 0-15 minutes) to avoid flooding e.g. remote mail systems
+    # in case of many boxes reporting information (e.g. logwatch) at same time
+    seconds=$[ $RANDOM % 900 ]
+    sleep $seconds
+fi
+
 # keep going when something fails
 set +e
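For illustration, the random delay from the patch above could also be written as two small functions using current bash arithmetic. This is only a minimal sketch: the names random_delay_seconds and called_by_crond are assumptions made for this example and are not part of run-parts or of the patch.

```shell
#!/bin/bash
# Sketch only: random start delay as proposed in the patch above.
# random_delay_seconds and called_by_crond are hypothetical names.

# random_delay_seconds [MAX]
# Prints a random number of seconds in the range 0..(MAX-1).
random_delay_seconds() {
    local max_delay="${1:-900}"   # 900 s = 15 minutes, as in the patch
    echo $(( RANDOM % max_delay ))
}

# Succeeds if the parent process of this shell is named "crond".
called_by_crond() {
    local caller
    caller="$(ps --no-headers -o fname -p "${PPID}" 2>/dev/null)"
    [ "$caller" = "crond" ]
}

# Possible use near the top of run-parts (not executed here):
#   if called_by_crond; then
#       sleep "$(random_delay_seconds 900)"
#   fi
```

Keeping the upper bound as a parameter would make it easy to read the value from a configuration file later, instead of hard-coding 15 minutes.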
Sorry for the delay in getting around to this bug. We are planning to do something about this finally. Here's some recent discussion on this issue:

On Tue, 2005-05-17 at 22:34, Sean Reifschneider wrote:
> I'm thinking about changing the cron.daily runparts stuff so that instead
> of them being just running at static times after 4am, setting up something
> that would sequentially run the correct runparts, one after another, and
> optionally randomly putting a delay of some time period up front.
>
> In other words, read a value from the sysconfig for a max number of minutes
> to sleep, then "sleep $[RANDOM%SKEWMINUTES]m", then run the appropriate
> runparts.
>
> The idea being to stagger the crons. A rack full of systems can pull an
> extra 5 or 7 amps when all the crons are running at the same time.
>

REPLY: On Tue, May 17, 2005 at 11:03:33PM -0400, Jason Vas Dias wrote:
This "randomized stagger" feature is top of my list for cron enhancements which I hope to be able to get to within the next few weeks for rawhide / FC5.

I'd implement it as a new cron job time specification format tag, so that /etc/crontab would look like:

# run-parts
~01-15 * * * * root run-parts /etc/cron.hourly
~16-31 4 * * * root run-parts /etc/cron.daily
~32-47 4 * * 0 root run-parts /etc/cron.weekly
~48-59 4 1 * * root run-parts /etc/cron.monthly

so that cron would schedule run-parts at the first number of minutes past each hour (1,16,32,48), but when that time arrives, cron would choose a random number between 0 and the second number (15,31,47,59) and, if not 0, re-schedule the job for that number of minutes hence. Re-scheduled randomly staggered jobs would not be subject to any further rescheduling until their unstaggered schedule time arrives.
On Tue, 2005-05-17 at 23:50, Sean Reifschneider wrote:
> I thought about that, and figured that would be pretty easy to implement in
> run-parts:
>
>    01 * * * * root run-parts -s 15 /etc/cron.hourly
>    16 4 * * * root run-parts -s 15 /etc/cron.daily
>    32 4 * * 0 root run-parts -s 15 /etc/cron.weekly
>    48 4 1 * * root run-parts -s 15 /etc/cron.monthly
>
> I rejected that because right now the crons run at 1, 2, 22, and 42 minutes
> after the hour. Presumably this is to try to keep them from overlapping.
> The easy way to ensure that there is no overlap would be, in simple
> pseudo-code:
>
>    crontab:
>       01 * * * * root stagger-parts
>
>    stagger-parts:
>       sleep $[RANDOM%30]m
>       run-parts /etc/cron.hourly
>       [ "$ISDAILY" ] && run-parts /etc/cron.daily
>       [ "$ISWEEKLY" ] && run-parts /etc/cron.weekly
>       [ "$ISYEARLY" ] && run-parts /etc/cron.yearly
>
>
> Of course, I haven't come up with a good way of determining the ISDAILY,
> etc... There are ways with flag files saying what the last day/week/month
> it was run, and would get set to true in the event that it's different.
>
> I think the ~ idea is interesting, could be quite useful in some places,
> but here I think it would be good to be able to explicitly ensure that the
> jobs do not overlap. Saying "between 1 and 15" and "16 and 30" means that
> one job might run at :15 and the other at :16, which may not give enough
> time for the first to finish, possibly leading to unexpected behavior
> sporadically when the jobs overlap.
>
> Maybe that's not a rational fear, but that's why I was looking at a job to
> schedule them.
>

Yes, this feature would also have to be implemented along with a feature that ensures jobs do not overlap (bug #144133). So we are working on these features for a new "enhanced cron" in FC5 / rawhide.
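The flag-file idea mentioned in the quoted mail could look roughly like the following. This is a sketch under assumptions, not an existing tool: the function name is_due, the stamp directory /var/lib/stagger-parts and the stamp file naming are all invented for this example.

```shell
#!/bin/bash
# Sketch only: decide whether a period's jobs are due by comparing a
# date stamp stored in a flag file. All names and paths are hypothetical.

# is_due NAME DATE_FORMAT [STAMP_DIR]
# Succeeds (and updates the stamp) if the stored stamp for NAME differs
# from the current output of `date +DATE_FORMAT`.
is_due() {
    local name="$1" format="$2" dir="${3:-/var/lib/stagger-parts}"
    local now last=""
    now="$(date "+${format}")"
    if [ -r "${dir}/${name}.stamp" ]; then
        last="$(cat "${dir}/${name}.stamp")"
    fi
    if [ "$now" != "$last" ]; then
        mkdir -p "$dir" && printf '%s\n' "$now" > "${dir}/${name}.stamp"
        return 0
    fi
    return 1
}

# Possible use in a stagger-parts wrapper run once per hour (not executed here):
#   sleep "$(( RANDOM % 30 ))m"
#   run-parts /etc/cron.hourly
#   is_due daily   %Y%m%d && run-parts /etc/cron.daily
#   is_due weekly  %G%V   && run-parts /etc/cron.weekly
#   is_due monthly %Y%m   && run-parts /etc/cron.monthly
```

Because the directories run one after another in a single script, the jobs cannot overlap. Note that in this form the daily jobs would run in the first hourly slot after midnight rather than around 4am, and the stamp is written before the jobs run, so a failed run is not retried; both points would need to be decided in a real implementation.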
I dug into the man page and source RPM of vixie-cron-4.1-54.FC5, but didn't find anything about the "~" feature. Is it implemented in this version?

BTW: in the meantime I've created an alternative solution, which is imho better because it creates a per-system delay that is static but differs between systems. Hopefully, the algorithm is patent-free ;-)

The same delay is used for daily, weekly and monthly, so the original time offsets between the entries in /etc/crontab are preserved.

Note also that such a file can easily be distributed in a separate crontab package, e.g. crontab-delay-$version-$release.rpm

File: /etc/cron.daily/000-delay.cron

#!/bin/bash
# Generate per system static delay for cron.{daily,weekly,monthly}
# (P) & (C) by Peter Bieringer <pb>

factor=1 # max. ~ 68 minutes
#factor=2 # max. ~ 34 minutes

# Create md5sum of hostname (static over system lifetime)
md5sum="`echo ${HOSTNAME} | /usr/bin/md5sum`"

# Extract the first 3 hexdigits (12 Bit:0-4095)
hexvalue="${md5sum:0:3}"

# Create decimal value
decvalue="`printf "%d" "0x${hexvalue}"`"

# Divide delay by factor
DELAY=$[ $decvalue / $factor ]

sleep $DELAY
exit 0

For weekly and monthly, only a softlink is required:

# find /etc/cron.* -name 000-delay.cron -ls
321272 8 -rwxr-xr-x 1 root root 504 Mar 22 13:51 /etc/cron.daily/000-delay.cron
321265 0 lrwxrwxrwx 1 root root 28 Mar 22 13:54 /etc/cron.monthly/000-delay.cron -> ../cron.daily/000-delay.cron
320650 0 lrwxrwxrwx 1 root root 28 Mar 22 13:54 /etc/cron.weekly/000-delay.cron -> ../cron.daily/000-delay.cron
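To make the "static but different per system" property easier to check, the same calculation can be wrapped in a function that takes the host name as an argument. A minimal sketch, assuming md5sum and cut are in the PATH; the function name host_delay_seconds is made up for this example:

```shell
#!/bin/bash
# Sketch only: per-host static delay derived from the MD5 sum of the
# host name. host_delay_seconds is a hypothetical helper name.

# host_delay_seconds HOSTNAME [FACTOR]
# Prints a delay in seconds: the first 3 hex digits of the MD5 sum
# (12 bits, 0..4095) divided by FACTOR.
host_delay_seconds() {
    local host="$1" factor="${2:-1}"
    local hex dec
    # Trailing newline kept so the result matches `echo ${HOSTNAME} | md5sum`
    hex="$(printf '%s\n' "$host" | md5sum | cut -c1-3)"
    dec="$(printf '%d' "0x${hex}")"
    echo $(( dec / factor ))
}

# Possible use in 000-delay.cron (not executed here):
#   sleep "$(host_delay_seconds "${HOSTNAME}" 1)"
```

Calling the function twice with the same name gives the same delay, while different host names are spread over 0 to 4095 seconds (about 68 minutes) with factor 1.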
I got some time and created RPM packages for that.

Available at:
ftp://ftp.aerasec.de/pub/linux/repository/public/redhat/enterprise/3/i386/
ftp://ftp.aerasec.de/pub/linux/repository/public/redhat/enterprise/4/i386/

$ rpm -qilpv crontabs-runpartsdelay-0.0.1-1.RHEL4.AERAsec.3.noarch.rpm
Name        : crontabs-runpartsdelay       Relocations: (not relocatable)
Version     : 0.0.1                             Vendor: AERAsec Network Services and Security GmbH <http://www.aerasec.de/>
Release     : 1.RHEL4.AERAsec.3             Build Date: Thu 23 Mar 2006 04:25:38 PM CET
Install Date: (not installed)               Build Host: rpmbuild-rhel4.muc.aerasec.de
Group       : System Environment/Base       Source RPM: crontabs-runpartsdelay-0.0.1-1.RHEL4.AERAsec.3.src.rpm
Size        : 1731                             License: GPLv2
Signature   : (none)
Packager    : Peter Bieringer <pbieringer>
Summary     : Extension for a system specific static delay of daily, weekly and monthly run-parts entries
Description :
"crontab-runpartsdelay" add files named "000-delay.cron" to daily, weekly and
monthly run-parts directories of "crontabs". The delay is system specific but
static as long as the system name is unchanged. Use this e.g. an all of your
servers if your central mail system will become overloaded because all servers
send e.g. logwatch e-mails at the same time.
drwxr-xr-x 2 root root   0 Mar 23 16:25 /etc
drwxr-xr-x 2 root root   0 Mar 23 16:25 /etc/cron.daily
-rwxr-xr-x 1 root root 577 Mar 23 16:25 /etc/cron.daily/000-delay.cron
drwxr-xr-x 2 root root   0 Mar 23 16:25 /etc/cron.monthly
-rwxr-xr-x 1 root root 577 Mar 23 16:25 /etc/cron.monthly/000-delay.cron
drwxr-xr-x 2 root root   0 Mar 23 16:25 /etc/cron.weekly
-rwxr-xr-x 1 root root 577 Mar 23 16:25 /etc/cron.weekly/000-delay.cron
I'm very happy to see that such a mechanism was included in the crontabs-1.10-9.fc6 release:

* Thu Oct 12 2006 Marcela Maslanova <mmaslano> 1.10-9
- patch (#110894) for delaying more emails in the moment

But I found that when anacron runs the jobs on a system that is only active part of the time, this delay should be skipped.

Any ideas how to detect whether the cron job was started by anacron? If anacron sets nothing in the environment, a workaround would be to set a value in /etc/anacrontab (e.g. ANACRON=1) and check for its existence in the delay script.
I'm checking the log files now. I switch my computer off once a week, and on the following day the daily jobs run at 4:02. So this will need more work. Maybe we can check the PID in /var/spool/anacron.
Created attachment 152264 [details]
Skip sleep if anacron is running

Perhaps the same mechanism as used in /etc/cron.daily/0anacron can be used:

# Don't run anacron if this script is called by anacron.
if [ ! -e /var/run/anacron.pid ]; then
    anacron -u cron.daily
fi
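Applied to the delay script, the pid file check could be sketched as below. This is only an illustration and not the content of the attachment; the function name should_delay and its optional pid file argument are assumptions made for this example.

```shell
#!/bin/bash
# Sketch only: skip the delay while anacron is running.
# should_delay and its optional argument are hypothetical.

# should_delay [PIDFILE]
# Succeeds if no anacron pid file exists, i.e. the job was presumably
# started by crond at its regular time and the delay should apply.
should_delay() {
    local pidfile="${1:-/var/run/anacron.pid}"
    [ ! -e "$pidfile" ]
}

# Possible use at the top of the delay script (not executed here):
#   should_delay || exit 0
#   sleep "$DELAY"
```

The check only tests for the existence of the pid file, so a stale file left behind after a crash would also suppress the delay.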
Created attachment 365830 [details] script for creating added delay for cronjobs executed from /etc/crontab
Created attachment 365831 [details] /etc/sysconfig/crontabs
The script above is used in Fedora releases 6 to 11. Since Fedora 12, this functionality has been included in cronie in a different way, which is documented in the man pages and the Deployment Guide.

The feature implemented in comment #7 has a downside: it isn't possible to execute the regular jobs immediately with the command `anacron -n`.
*** Bug 532157 has been marked as a duplicate of this bug. ***
The last planned update of RHEL-4 will focus on performance and security bugs only. For the RHEL-5 release, this issue is tracked as bz#532157. Please contact your support representative if you are interested in this change.