Bug 515814 - /etc/cron.daily/yum.cron has broken locking that prevent updates to be done
Summary: /etc/cron.daily/yum.cron has broken locking that prevent updates to be done
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: yum-cron
Version: 11
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Habig, Alec
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-08-05 19:56 UTC by Milan Kerslager
Modified: 2009-09-14 06:53 UTC (History)
2 users (show)

Fixed In Version: 0.8.4-2.fc10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-08-31 23:34:11 UTC
Type: ---


Attachments (Terms of Use)

Description Milan Kerslager 2009-08-05 19:56:37 UTC
The script /etc/cron.daily/yum.cron does dir-based locking but the directory is not removed during startup. When stale lock-dir is present (reboot or crash during this script running), no update ever will be run with no error message.

As yum does its own much better locking there is no need to do own locking. Or the script should do file-based locking that does not lead to stale-lock problem.

File based locking (race condition avoided):

LOCKNAME=yum-cron
set -o noclobber
echo > /var/lock/subsys/$LOCKNAME
[ $? -ne 0 ] && failure
set +o noclobber

The best solution is to remove all checks as current yum waits until another application removes the lock (this may be update applet, PackageKit or so). So please, remove the locking code.

Comment 1 Habig, Alec 2009-08-05 20:17:23 UTC
This is a continuation of Bug 469566, which got auto-closed when F9 was EOL'd.  Sorry I lost track of things there.

The reason that yum-cron does locking at all is that yum itself does wait-based locking rather than terminate.  So, if you run a second instance of yum, it sits around and waits for the lock to free rather than just dying.  In a cron-based application, this is a recipe for fork-bombing your system.

file-based locking was disapproved of since it wasn't atomic (part of Bug 327401), so a directory-based one was implemented.

And yes, you're right that this solution is susceptible to stale locks.  Clearing the locks at startup (which is what happens if a lock file is used rather than a directory), isn't a complete solution however, as a non-reboot induced process death isn't fixed.

So, I need to throw a file into that directory which contains the PID of the locking process.  If that PID does not exist, then the current process is allowed to blow away the lockdir and start afresh.

Time to dust off my sh-script foo and go to work.

Comment 2 Milan Kerslager 2009-08-05 20:28:14 UTC
The code for file locking works and avoids race. This code is in most shell-tutorials. No PIDs are needed. Just use it if you really want locking.

And wait - there are a lot of concurrent yum-based processes around which are not suspected for fork-bombing as I wrote before.

Comment 3 Habig, Alec 2009-08-05 20:44:42 UTC
If users are the people doing concurrent yum processes then fine.  If it's dumb old cron, then a new, waiting yum process is created at every execution of the job.  For a daily script, sure, that's a very slowly breeding fork-bomb, but a problem nonetheless.  The first guy to notice this had dozens of yum processes patiently waiting by the time he noticed.

Reading around to see if there's a more elegant lock mechanism that knows about non-existent processes, did you have something specific in mind?  Simply being a file rather than a dir isn't magic, all it does is make it more likely to be cleaned up during a reboot, which is only a solution if the machine has rebooted since the process died and left the stale lock.

However, the window for badness is a little better than I worried at first.  The lockdir is made like:

# Try mkdir for the lockfile, will test for and make it in one atomic action
mkdir $LOCKFILE 2>/dev/null || exit
# and clean up that lockfile and tempfile if the script exits or is killed  
trap "{ rmdir $LOCKFILE 2>/dev/null; rm -f $YUMTMP; exit 255; }" INT TERM EXIT

so if the yum-cron process is killed in an orderly shutdown, a stale lockfile isn't left.  If the machine just plain blows up (or the process is kill -9'd)  then the stale lockfile is left, so there's certainly still a need for a PID
check and cleanup to cover this case.

why mkdir?  Since it tries and fails/creates all in one step.  A stat followed by a file creation leaves a window for another process to grab the lock in between steps, hence the race.

Comment 4 Habig, Alec 2009-08-05 20:53:34 UTC
Googling around to find that elusive more clever way (I'm not being facetious - there is _always_ a better way somewhere, especially to the non-experts such as myself!) I did find a decent explanation of what I was babbling about here:

  http://bash-hackers.org/wiki/doku.php/howto/mutex

Comment 5 Milan Kerslager 2009-08-05 20:59:39 UTC
This simple code avoids race and uses old-fashioned shell feature:

LOCKNAME=yum-cron
set -o noclobber
echo > /var/lock/subsys/$LOCKNAME
[ $? -ne 0 ] && exit 1
set +o noclobber

You don't need more. This is code I wrote down to my notebook at the beginning of '90. Try it.

Comment 6 Habig, Alec 2009-08-11 00:25:34 UTC
You are right, the echo-pipe file creation you do avoids the race condition.  However, it does not address the problem of what the code should do when it encounters a lockfile - one needs to crosscheck to see if the locking process is still alive.  Checking for the PID gets us 99% of the way there.  Checking for time delay since creation of the file would cover us in the unlikely case of duplicate PIDs after reboot, something that's not worth it here since the penalty is a single extra instance of yum hanging there waiting to proceed.

So, I've stuck with the lockdir rather than the lockfile, and added a PIDfile in that lockdir.  A package with this logic in the cron scripts is available here:

  http://koji.fedoraproject.org/koji/buildinfo?buildID=126659

Tested it as follows:

  1) ran a second instance of yum-cron, it decided the lockfile was valid and quit;
  2) kill -9'd yum cron to leave a stale lock, ran a second instance, it cleaned up and proceeded;
  3) created just an empty directory (the bad state you found yourself in with the old package), it cleans this up and proceeds

Anyway - please test out the above RPM and see if it works for you.  If it does, I'll push it to testing then updates for the world.

Comment 7 Habig, Alec 2009-08-21 19:26:43 UTC
No additional input, so am pushing the fix to updates-testing for supported fedora releases to get more exposure.

Comment 8 Fedora Update System 2009-08-21 20:05:43 UTC
yum-cron-0.8.4-2.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/yum-cron-0.8.4-2.fc11

Comment 9 Fedora Update System 2009-08-21 20:06:10 UTC
yum-cron-0.8.4-2.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/yum-cron-0.8.4-2.fc10

Comment 10 Fedora Update System 2009-08-25 04:27:57 UTC
yum-cron-0.8.4-2.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update yum-cron'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-8905

Comment 11 Fedora Update System 2009-08-25 04:42:03 UTC
yum-cron-0.8.4-2.fc10 has been pushed to the Fedora 10 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update yum-cron'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-8970

Comment 12 Fedora Update System 2009-08-31 23:34:06 UTC
yum-cron-0.8.4-2.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2009-08-31 23:43:50 UTC
yum-cron-0.8.4-2.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 14 Milan Kerslager 2009-09-14 06:53:46 UTC
You are not right. You may freely save PID to the lockfile after the check or just when trying to create lock, because locking is working that way and you are permitted to do this. You do not need extra directory with extra file. It is worth nothing.

Your code does not count that other process may take the PID that is stored in the stale lockfile (err: a file under your lockdir). If this PID is a daemon, no update will be ever done until reboot.


Note You need to log in before you can comment on or make changes to this bug.