472275 – Anacron cron jobs are completely broken

Summary: Anacron cron jobs are completely broken

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	anacron
Sub Component:
Version:	8
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Marcela Mašláňová
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-11-19 18:59 UTC by Philip Spencer
Modified:	2009-01-09 07:57 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-01-09 07:57:03 UTC
Type:	---
Embargoed:
Dependent Products:

Description Philip Spencer 2008-11-19 18:59:58 UTC

Description of problem:

anacron-2.3-58.fc8 installs scripts /etc/cron.{daily,weekly,monthly}/0anacron that are so broken it is hard to know where to begin.

PROBLEM #1: They never call anacron -u to update timestamps. Thus, when normal cron.daily runs at 3:02 a.m., /etc/cron.daily/0anacron simply exits without doing anything (because of the

if [ `date +%H` -le 4 ]; then
exit 0;
fi

code).

If the machine is rebooted later that day, anacron runs at startup and sees the timestamps for cron.daily have not been updated and runs it over again. This is wrong.

PROBLEM #2: These scripts invoke "anacron -s" if run at an hour after 4am (such as if run by anacron itself). This causes multiple nested anacron invocations and we see on several machines numerous attempts to run the same cron.daily jobs every day.
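
Both problems come from the same hour test, which a small sketch can illustrate. The function name gate_action is hypothetical, and the hour is passed as an argument instead of being read from `date +%H` so that both situations can be tried. It mirrors only the test quoted above, not the whole packaged script.

```shell
#!/bin/sh
# Hypothetical helper mirroring the hour test quoted above.
# $1 is the hour as printed by `date +%H` (00..23).
gate_action() {
    if [ "$1" -le 4 ]; then
        # Regular early-morning cron run: the script exits here,
        # so no timestamp is ever updated.
        echo "exit without anacron -u"
    else
        # Later than 4 a.m., e.g. when anacron itself runs cron.daily.
        echo "invoke anacron -s"
    fi
}

gate_action 03   # regular cron.daily run in the early morning
gate_action 07   # cron.daily started by anacron later in the day
```

Neither branch ever records that cron.daily ran, which is the heart of PROBLEM #1.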

SOLUTION: In order for anacron to work properly, the scripts /etc/cron.{daily,weekly,monthly}/0anacron must do NOTHING except call "anacron -u <appropriate file>". In other words, back out all the brokenness that has been put into these scripts, and revert to the original debian/anacron.daily etc.

Then add another cron job, NOT INSIDE /etc/cron.{daily,weekly,monthly} (i.e., a separate line in /etc/crontab, or a separate file in /etc/cron.d) to call anacron -s daily at 5am. This will pick up any need there might be to rerun anacron after a power suspension or whatever and will eliminate the need to have calls to anacron (other than anacron -u) inside /etc/cron.{daily,weekly,monthly}.

For the delay and power testing logic, simply create a separate shell script /usr/sbin/runanacron containing the code that was previously inside /etc/cron.daily/0anacron and have that be run as the 5am cron job.

Finally, adjust the init script to not start anacron if the system is booted close to midnight or between midnight and 5am. This will prevent duplicate jobs being run on the same day -- and the cron job running at 5am will pick up any monthly or weekly jobs that still need to be run.

SUMMARY:

/etc/cron.daily/0anacron should contain only the lines
#!/bin/sh
anacron -u cron.daily

/etc/cron.weekly/0anacron should contain only the lines
#!/bin/sh
anacron -u cron.weekly

/etc/cron.monthly/0anacron should contain only the lines
#!/bin/sh
anacron -u cron.monthly

There should be a file /etc/cron.d/anacron with the following contents:
# Run anacron once a day, after regular cron.{daily,weekly,monthly} jobs
# have started, to catch up on any missed jobs.
02 5 * * * root /usr/sbin/runanacron

The file /usr/sbin/runanacron should contain the code currently in /etc/cron.daily/0anacron (except for the hour-testing code which is unnecessary now).

/etc/init.d/anacron should be modified as follows:

start() {
    echo -n $"Starting $prog: "
    # Carry on if on_ac_power doesn't exist or returns 0 (AC power in use);
    # a return value of 1 means the machine is on battery.
    if test -x /usr/bin/on_ac_power
    then
        /usr/bin/on_ac_power > /dev/null
        if test $? -eq 1
        then
            echo "deferred while on battery power."
            RETVAL=0
            exit 0
        fi
    fi

    # If starting within 15 minutes of midnight or before 5 am, defer until
    # the scheduled 5am anacron run -- this allows the regular cron jobs to
    # run first, so that anacron doesn't repeat jobs that the regular cron
    # job will pick up as well.

    if [ `date +%H -d "15 minutes"` -le 4 ] ; then
        echo "deferred until after the regular morning cron job run"
        exit 0
    fi
    daemon +19 anacron -s
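
The deferral rule in that fragment can be tried out without touching the clock. This is a minimal sketch under assumptions: should_defer is a hypothetical name, and the time is passed in as HH and MM and handled with plain arithmetic instead of calling `date -d "15 minutes"`.

```shell
#!/bin/sh
# Hypothetical helper: would the proposed init-script check defer anacron
# if the machine booted at HH:MM? Same rule as
#   [ `date +%H -d "15 minutes"` -le 4 ]
# but computed with integer arithmetic so any time of day can be tested.
should_defer() {
    h=${1#0}; h=${h:-0}    # strip a leading zero so 08/09 are not read as octal
    m=${2#0}; m=${m:-0}
    later=$(( (h * 60 + m + 15) % 1440 ))   # minutes past midnight, 15 min from now
    if [ $(( later / 60 )) -le 4 ]; then
        echo defer
    else
        echo start
    fi
}

should_defer 23 50   # 15 minutes later is 00:05, so anacron is deferred
should_defer 04 50   # 15 minutes later is 05:05, so anacron starts
should_defer 12 00   # a midday boot starts anacron as usual
```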

Comment 1 Marcela Mašláňová 2008-11-21 12:21:59 UTC

PROBLEM 1:
I removed the original debian script on purpose.
#test -x /usr/sbin/anacron || exit 0
#anacron -u cron.daily
This original script checks whether anacron exists and then updates the timestamp. If the computer is switched off before the jobs are run, then they are not run at all for the whole day/week/month.

PROBLEM 2:
You're right. I'll try to rewrite it a little bit, but the jobs are not running twice, because the first invocation of anacron takes a lock. Anyway, I'll fix this in the init script.

Comment 2 Philip Spencer 2008-11-21 14:45:00 UTC

I don't think you understand. It is wrong to remove the original debian script and wrong to call anacron -s from inside cron.{daily,weekly,monthly}.

JOBS NOW ARE RUN TWICE. When a bug reporter tells you jobs are routinely being run twice on lots of machines don't try to say it isn't happening. I have logs to prove it, and the opportunity for duplicate runs is evident from the obviously flawed logic in the current package.

Because there is no call to anacron -u, TIMESTAMPS ARE NEVER UPDATED DURING A NORMAL CRON RUN. Therefore, whenever you turn on the computer, all cron.{daily,weekly,monthly} jobs are run over again even if they already ran that day. THIS IS WRONG!!!

Secondly, if the computer's hostname is such that the delay in /etc/cron.daily/000-delay.cron is more than one hour, then by the time 0anacron is called the time will already be later than 5am, so the test for the hour being earlier than 5am fails and it runs anacron -s, which causes cron.daily to be run AGAIN even though it is already running from regular cron!
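
The arithmetic behind this can be sketched as follows. The helper name after_delay is hypothetical, and the 04:02 start time is an assumption taken from the /var/log/cron excerpt in comment 12; the delay itself is simply passed in as a number of minutes.

```shell
#!/bin/sh
# Hypothetical helper: given the delay (in minutes) that elapses before
# 0anacron is reached, report what the hour test then does.
# Assumes run-parts starts /etc/cron.daily at 04:02.
after_delay() {
    hour=$(( (4 * 60 + 2 + $1) / 60 ))
    if [ "$hour" -le 4 ]; then
        echo "hour $hour: exits"
    else
        echo "hour $hour: runs anacron -s"
    fi
}

after_delay 30   # 0anacron reached at 04:32
after_delay 65   # 0anacron reached at 05:07
```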

I have already provided a complete fix for the problem. I am sad that you refuse to accept the fix by reverting to the original, correct debian scripts and adding a separate cron job outside /etc/cron.{daily,weekly,monthly} to catch the other cases. This refusal to accept the bug report and fix the problem properly means that we (and all others affected by the problem) are forced to fork our own version.

I also fail to understand your comment "If the computer is switched off before jobs are run, then they are not run at all for whole day/week/month." The anacron -u is PART OF THE JOB. So, if the computer is switched off before jobs are run, the anacron -u will not be run either, so the jobs will be run when the computer is next turned on. This is how anacron is supposed to function.

If you mean that the computer is turned off DURING the cron run, well in that case some jobs will have already been run and others will not have and some will have been stopped in midstream. One therefore has to choose: do you want some jobs run twice? Or do you want some jobs missed? It's not clear which is the "correct" choice here.

The debian configuration errs on the side of missing some jobs rather than repeating some jobs.

BUT IF YOU WANT TO CHANGE THAT, THEN ALL YOU DO IS RENAME THE CRON JOB THAT INVOKES ANACRON -U SO THAT IT RUNS LAST INSTEAD OF FIRST. You don't change the contents of the script or mess with invoking new anacron commands inside it or use unreliable time-of-day checks that don't work unless the machine's hostname's md5 sum has a particular value. Just rename it from 0anacron to ~anacron or something like that. Then, the timestamp will get updated after (instead of before) the other jobs in the cron run finish.
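
Why the rename moves the script to the end can be checked with a byte-wise sort. This is only an illustration: it assumes run-parts walks the directory in C-locale order (in other locales `~` may collate differently), and logrotate and tmpwatch merely stand in for whatever jobs happen to be installed.

```shell
#!/bin/sh
# Sort some sample job names the way a byte-wise (C locale) listing would.
# '~' is quoted so the shell does not try tilde expansion on it.
ordered=$(printf '%s\n' logrotate '~anacron' 0anacron tmpwatch | LC_ALL=C sort)
printf '%s\n' "$ordered"
```

In that ordering 0anacron comes first and ~anacron last, because `0` sorts before the letters and `~` after them.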

Note that if you are going to do that then it makes the other part of my fix more complicated because you can't simply run it at (say) 5am every day, because the run-parts may not be finished yet. So you'd probably have to have some logic that checks for running run-parts processes and waits for them to terminate before calling anacron -s. Perhaps the simplest way to do this is:

Have file 0anacron inside /etc/cron.{daily,weekly,monthly} that writes its parent process id (the pid of the run-parts process) inside some lock file, say /var/lock/run-parts-{daily,weekly,monthly}.

Have the file ~anacron remove that lock file before calling anacron -u.

Have the script that runs in the separate cron job wait until the lock files are gone (or the corresponding processes are all dead) before doing anything.

Comment 3 Philip Spencer 2008-11-21 15:45:05 UTC

Here's a complete but not yet tested fix that will adopt the convention of repeating run-parts scripts that were interrupted by a poweroff (the previous fix I posted adopts the other convention of not repeating run-parts scripts that were interrupted by a poweroff):

/etc/cron.daily/0anacron:
#!/bin/bash
echo $PPID > /var/lock/run-parts-daily

/etc/cron.daily/~anacron:
#!/bin/bash
/usr/sbin/anacron -u cron.daily
rm /var/lock/run-parts-daily

[ and similarly for monthly, weekly ]

/etc/cron.d/anacron:
# Run anacron once a day, after regular cron.{daily,weekly,monthly} jobs
# have finished, to catch up on any missed jobs.
02 5 * * * root /usr/sbin/runanacron

/usr/sbin/runanacron:
#!/bin/bash

# Wait for any running cron.{daily,weekly,monthly} jobs to finish
for JOBTYPE in daily weekly monthly ; do
    while [ -e /var/lock/run-parts-$JOBTYPE ] ; do
        PID=`cat /var/lock/run-parts-$JOBTYPE`
        if [ -n "$PID" -a -e "/proc/$PID" ] ; then
            sleep 1;
        else
            rm /var/lock/run-parts-$JOBTYPE
        fi
# If we hit midnight of the following day, there's a stuck job somewhere;
#  exit with an error message
        if [ `date +%H` -le 0 ] ; then
             echo "Unable to run catch-up anacron:"
             echo "previous day's $JOBTYPE run-parts script is still running."
             exit 1
        fi
    done
done

# Run anacron to catch up on any missed jobs, if running on ac power.
# (If not on ac power, perhaps the right thing to do is sleep for up to
# an hour or two until ac power is resumed, rather than exiting right away?
# But I'll leave that for someone else to decide).

if test -x /usr/bin/on_ac_power; then
        /usr/bin/on_ac_power &> /dev/null
        if test $? -eq 1; then
                exit 0
        fi
fi

/usr/sbin/anacron -s
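
The stale-lock test at the centre of that wait loop can be exercised on its own. This is a sketch under assumptions: lock_is_live is a hypothetical name, it uses `kill -0` in place of the /proc/$PID test above (which works here because the caller owns the processes, as root would in the real cron job), and it writes to a scratch file instead of /var/lock.

```shell
#!/bin/sh
# Hypothetical helper: is the process recorded in a lock file still alive?
# Swaps the /proc/$PID existence test for `kill -0`, which sends no signal
# and only reports whether the process can be signalled.
lock_is_live() {
    [ -e "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Demonstration on a scratch file, not on /var/lock.
lock=$(mktemp)
echo $$ > "$lock"                  # this shell is certainly running
lock_is_live "$lock" && echo "live: keep waiting"

sleep 0 & dead=$!                  # a process that finishes immediately
wait "$dead"                       # reap it so the PID is really gone
echo "$dead" > "$lock"
lock_is_live "$lock" || echo "stale: remove the lock and continue"
rm -f "$lock"
```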

Comment 4 Marcela Mašláňová 2008-11-24 12:20:51 UTC

Your fix might work and I really appreciate your input, but I'd like to clean up these scripts as much as possible and not create a new one. All of them are only hacks around behaviour that was originally not anticipated, e.g. suspend, or powering off three times a day. Those scripts were fixed by many people, but they never worked properly.

Comment 5 Philip Spencer 2008-11-24 16:54:57 UTC

I feel rather as if I'd encountered someone trying to carry water in a sieve and showed them a cup, only to be told "your cup might work but I'd rather try to make the sieve work instead of adding something new".

Let me try to explain what HAS to happen in order for anacron to work.

(1) Each of the jobs cron.daily, cron.weekly, cron.monthly, when executed HAS to record the fact that it ran. If no record is kept of when the job runs, how is anacron to know whether or not it has been missed?

Therefore: /etc/cron.daily HAS to contain a script that runs "anacron -u cron.daily". /etc/cron.weekly HAS to contain a script that runs "anacron -u cron.weekly". /etc/cron.monthly HAS to contain a script that runs "anacron -u cron.monthly". That is precisely what the original debian scripts /etc/cron.{daily,weekly,monthly}/0anacron do.

You cannot have a working anacron without those files having at least that content.
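
The role of that record can be shown with a simplified sketch. It assumes the per-job timestamp is a date string in YYYYMMDD form and covers only a job with a one-day period; daily_is_due is a hypothetical name, not something anacron provides.

```shell
#!/bin/sh
# Hypothetical helper: is a one-day job due, given its recorded timestamp?
# Simplified to a plain comparison of YYYYMMDD strings, which is enough
# for a period of one day.
daily_is_due() {
    stamp=$1    # date recorded by the last `anacron -u cron.daily`
    today=$2    # e.g. the output of `date +%Y%m%d`
    [ "$stamp" -lt "$today" ]
}

# Timestamp never updated by the regular cron run: anacron repeats the job.
daily_is_due 20081118 20081119 && echo "stale stamp: anacron runs cron.daily"

# Timestamp updated by `anacron -u cron.daily`: anacron leaves the job alone.
daily_is_due 20081119 20081119 || echo "current stamp: anacron skips cron.daily"
```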

(2) The cases not anticipated by the original anacron design (such as cron jobs being missed during suspend when there's no /etc/init.d/anacron execution upon resume to catch them, or a reboot shortly after midnight when you don't want anacron to duplicate what that morning's regular cron run will already do) can be handled by having a daily call to "anacron -s" after the day's regular cron runs have completed. This is what you are trying to do in your scripts, right?

(3) The only two ways to accomplish goal #2 are:
(a) a separate cron job script (what I have proposed)
or (b) adding it in to /etc/cron.daily/0anacron in some fashion.

(4) In no case is there any need to put this anacron -s call into /etc/cron.weekly/0anacron or /etc/cron.monthly/0anacron.

(5) If you adopt option (3b) -- which is what you seem to be trying to do -- the /etc/cron.daily/0anacron script must still call "anacron -u cron.daily". Then you would have to put in a delay to wait until the weekly and monthly cron jobs have a chance to get started and update their timestamps. Then after that you could call anacron -s to pick up missed jobs.

BUT: You have just inserted an additional delay (perhaps up to an hour or more) which further delays all the other jobs in cron.daily. This is not desirable -- why should all the remaining cron jobs have to be delayed?

ALSO: With this approach, if the machine gets rebooted during this delay, the remaining jobs will never get executed (because the timestamp was updated but the rest of the jobs were never finished).

(6) THEREFORE: To get a properly working anacron, you need to adopt (3a) instead of (3b). That way, there's no need for any additional delay. As soon as /etc/cron.daily/0anacron gets executed, the cron.daily timestamp is updated and the system immediately moves on to begin the next daily job.

(7) Now, you have a completely working anacron (as long as you also modify /etc/init.d/anacron to not run if invoked in the early hours of the morning before the regular cron run.) This is my first fix outlined above.

(8) There is still the possibility that the machine may be rebooted while the cron jobs are executing, and any partially finished job will not be restarted (note that this is unlike the 3b case in which there was a long window of time when the machine could get rebooted after the timestamp update but before the first real job began. Now we're only talking about reboots that occur DURING the execution of the real jobs).

I personally think this is okay; anacron's job is only to execute jobs that never started, not to restart ones that were killed while executing. If, however, that kind of restart is desired, then you simply have to rename the 0anacron jobs to ~anacron and also put in some code so that the daily call to anacron -s checks to see if the regular cron jobs are finished before doing anything.

I think this conclusively demonstrates that you HAVE to abandon your idea of modifying the /etc/cron.{daily,weekly,monthly}/0anacron scripts.

At least I have done the best I can to persuade you of that. Please either implement one of the two fixes I have suggested (depending on whether or not you need interrupted jobs to be restarted) or close the bug report as WONTFIX to indicate your unwillingness to have a working anacron in Fedora.

The scripts you currently have are completely useless: they do NOTHING on most machines. To see this, consider the fact that they are executed only in two cases:

(1) during a regular cron run. This occurs usually during 4-5 am (except on machines where the delay in /etc/cron.daily/000-delay.cron is very long) so in this case your script simply exits doing nothing.
(2) during an anacron run. In this case your script may call "anacron -s" which is useless because anacron is already running!

So in neither case do the scripts accomplish anything at all. Hence my characterization of them in the subject line as "completely broken".

Comment 6 Bug Zapper 2008-11-26 11:19:33 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Phil Lobbes 2008-12-18 15:14:03 UTC

I haven't spent much time troubleshooting, but certainly agree that the latest anacron-2.3-58.fc8.i386.rpm seems to have really caused anacron to start misbehaving.  I'm noticing things like logwatch running twice each morning now.

Here's an example of what is logged in /var/log/cron:
Dec 18 06:17:13 trojan anacron[30047]: Anacron 2.3 started on 2008-12-18
Dec 18 06:17:13 trojan anacron[30047]: Will run job `cron.daily' in 65 min.
Dec 18 06:17:13 trojan anacron[30047]: Jobs will be executed sequentially
Dec 18 07:22:13 trojan anacron[30047]: Job `cron.daily' started
Dec 18 09:37:20 trojan anacron[3719]: Anacron 2.3 started on 2008-12-18
Dec 18 09:37:20 trojan anacron[3719]: Job `cron.daily' locked by another anacron - skipping
Dec 18 09:37:20 trojan anacron[3719]: Normal exit (0 jobs run)
Dec 18 09:40:59 trojan anacron[30047]: Job `cron.daily' terminated
Dec 18 09:40:59 trojan anacron[30047]: Normal exit (1 job run)

It is a bit ironic that it was working fine for me until the latest changes, when the latest changes claim to prevent running double jobs:

* Thu Oct 30 2008 Marcela Mašláňová <mmaslano> 2.3-58
- same script for all cron.something should prevent double jobs
- correct spooldir is logged

Any hope we can revert or change to something that works better?  As it stands right now, things are broken.

Comment 8 Marcela Mašláňová 2008-12-19 08:07:11 UTC

Anacron is started twice, but the jobs aren't run twice. The anacron daemon checks whether another anacron exists, and if so the jobs are skipped. This is the second invocation of the daemon:
Dec 18 09:37:20 trojan anacron[3719]: Anacron 2.3 started on 2008-12-18
Dec 18 09:37:20 trojan anacron[3719]: Job `cron.daily' locked by another anacron - skipping
Dec 18 09:37:20 trojan anacron[3719]: Normal exit (0 jobs run)

As you see no jobs are run.

I agree this is not the best way to log it. I'm working on better cooperation between cronie and anacron, but I don't want to put such a huge change into stable F-8.

Comment 9 Philip Spencer 2008-12-19 14:04:12 UTC

You still don't get it, do you?

THE JOBS ARE RUN TWICE!

He's showing you the logs from the SECOND time the job is run.

The FIRST time the job is run by the regular cron.daily (not by anacron). This reporter has not shown you those logs; they'd be from earlier in the morning.

The SECOND time it is run by the first anacron invocation (process 30047 in his log example). His log shows anacron starting at 6:17 a.m. and launching cron.daily at 7:22 a.m. This is the SECOND TIME that cron.daily would have been run that morning.

Then there's a THIRD attempt made by the second anacron invocation -- this is the anacron that exits without actually running the job.

In summary:

  -- the regular cron.daily runs once
  -- anacron then tries to run it two more times, for a total of three times(!)
  -- the second attempt by anacron exits without doing anything because
     it sees the job is locked, but all that means is that the job runs
     only twice a day instead of three times a day.

I have already explained how it needs to be fixed AND given you a semi-formal proof as to why that is the ONLY way in which it can be fixed, and yet you've ignored it and have not pushed out a working package even though I've explained how to create a working package.

So we have had to fork our own anacron. For anyone interested in a working anacron package for Fedora 8, you can download it from:

http://www.fields.utoronto.ca/~pspencer/anacron-2.3-58.fc8.FIfixed.1.x86_64.rpm
(64-bit) or
http://www.fields.utoronto.ca/~pspencer/anacron-2.3-58.fc8.FIfixed.1.i386.rpm
(32-bit). The source RPM is
http://www.fields.utoronto.ca/~pspencer/anacron-2.3-58.fc8.FIfixed.1.src.rpm

Comment 10 Philip Spencer 2008-12-19 14:11:22 UTC

Something else that may be in the mix here: You mention "interacting with cronie". Cronie does not exist in Fedora 8. We're talking here about anacron and vixie-cron, not cronie.

I have no idea how cronie works but it is possible its cron table mechanisms are so different from vixie-cron that you're trying to take concepts that might make sense with cronie in Fedora 9/10 but are nonsensical when combined with vixie-cron which is what Fedora 8 uses (unless you're an fcron user, but in that case you wouldn't even have anacron installed as fcron replaces anacron).

Comment 11 Phil Lobbes 2008-12-19 15:18:43 UTC

(In reply to comment #8)
It isn't an issue about logging, it is an issue about cron jobs being run multiple times!  What pspencer says (comment #9) is completely true, please reread them with an open mind.  The latest RPMs are broken and causing cron jobs to run multiple times.

If you didn't want to make changes to fedora 8 you probably shouldn't have released 2.3-58 because it is what caused these problems.  While I'm sure it was intended to fix things, it clearly missed the mark.  Please don't leave this in a broken state.

Comment 12 Phil Lobbes 2008-12-19 15:23:51 UTC

For your reference, here's a more complete log for today.  You will notice cron.daily clearly is executed multiple times (and my daily duplicated logwatch emails confirm this too):

Dec 19 00:00:01 trojan CROND[26133]: (root) CMD (/usr/share/clamav/freshclam-sleep)
Dec 19 00:01:01 trojan CROND[26145]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 01:01:01 trojan CROND[27595]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 02:01:01 trojan CROND[29038]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 03:00:01 trojan CROND[30488]: (root) CMD (/usr/share/clamav/freshclam-sleep)
Dec 19 03:01:01 trojan CROND[30506]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 04:01:01 trojan CROND[31921]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 04:02:01 trojan CROND[31942]: (root) CMD (run-parts /etc/cron.daily)
Dec 19 05:01:02 trojan CROND[867]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 06:00:01 trojan CROND[2422]: (root) CMD (/usr/share/clamav/freshclam-sleep)
Dec 19 06:01:01 trojan CROND[2433]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 06:17:13 trojan anacron[2787]: Anacron 2.3 started on 2008-12-19
Dec 19 06:17:14 trojan anacron[2787]: Will run job `cron.daily' in 65 min.
Dec 19 06:17:14 trojan anacron[2787]: Jobs will be executed sequentially
Dec 19 07:01:01 trojan CROND[4996]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 07:22:13 trojan anacron[2787]: Job `cron.daily' started
Dec 19 08:01:01 trojan CROND[6465]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 09:00:01 trojan CROND[7896]: (root) CMD (/usr/share/clamav/freshclam-sleep)
Dec 19 09:01:01 trojan CROND[7907]: (root) CMD (run-parts /etc/cron.hourly)
Dec 19 09:37:20 trojan anacron[8771]: Anacron 2.3 started on 2008-12-19
Dec 19 09:37:20 trojan anacron[8771]: Job `cron.daily' locked by another anacron - skipping
Dec 19 09:37:20 trojan anacron[8771]: Normal exit (0 jobs run)
Dec 19 09:41:11 trojan anacron[2787]: Job `cron.daily' terminated
Dec 19 09:41:11 trojan anacron[2787]: Normal exit (1 job run)
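
The duplicate can be counted mechanically. The sketch below is only an illustration: count_daily_starts is a hypothetical name, and the two patterns are taken from the message formats visible in the log above (one for a run-parts invocation by crond, one for a job start by anacron).

```shell
#!/bin/sh
# Count how many times cron.daily was actually started in a log excerpt:
# once per run-parts invocation by crond, once per job start by anacron.
# Reads the log on standard input and prints the number of matching lines.
count_daily_starts() {
    grep -c -e 'CMD (run-parts /etc/cron.daily)' \
            -e "Job \`cron.daily' started"
}

count_daily_starts <<'EOF'
Dec 19 04:02:01 trojan CROND[31942]: (root) CMD (run-parts /etc/cron.daily)
Dec 19 06:17:14 trojan anacron[2787]: Will run job `cron.daily' in 65 min.
Dec 19 07:22:13 trojan anacron[2787]: Job `cron.daily' started
Dec 19 09:37:20 trojan anacron[8771]: Job `cron.daily' locked by another anacron - skipping
Dec 19 09:41:11 trojan anacron[2787]: Job `cron.daily' terminated
EOF
```

For these five lines it prints 2: the 04:02 run from crond and the 07:22 run from anacron. On a live system the same function could be fed with `count_daily_starts < /var/log/cron`.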

Comment 13 Fedora Update System 2009-01-05 10:44:37 UTC

anacron-2.3-59.fc8 has been submitted as an update for Fedora 8.
http://admin.fedoraproject.org/updates/anacron-2.3-59.fc8

Comment 14 Phil Lobbes 2009-01-06 15:49:22 UTC

I installed anacron-2.3-59.fc8 yesterday and it appears that things ran properly this morning. Thanks!

Comment 15 Fedora Update System 2009-01-07 09:11:00 UTC

anacron-2.3-59.fc8 has been pushed to the Fedora 8 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing-newkey update anacron'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2009-0045

Comment 16 Bug Zapper 2009-01-09 07:57:03 UTC

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
