The code that drives cron jobs in gears uses flock to ensure that a cron job isn't run more than once at a time. As it is currently used, though, the open write-mode lock file descriptor is inherited by child processes. In one case, I've seen a cron job that starts a service (which is ugly, but that's orthogonal). The service then keeps the file locked indefinitely, so the cron job that launched it can never run again as long as the service is running and holding the lock.
Potential fix: use >&- to close the flocked descriptor for the child process. E.g.,
( flock -n 9 || exit 1 ; sleep 100 9>&- ) 9>.flock
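To see the difference the extra 9>&- makes, here is a minimal sketch that can be run outside a gear. It assumes bash and the flock utility from util-linux; the scratch lock files, the probe helper, and the variable names are made up for the demo and are not taken from the cron scripts.

```shell
#!/bin/bash
# Hypothetical scratch lock files for this demo (not the real cron lock paths).
leaky_lock=$(mktemp)
fixed_lock=$(mktemp)

# Hypothetical helper: report whether the lock can be taken within one second.
probe() {
    ( flock -w 1 9 && echo free || echo held ) 9>"$1"
}

# Leaky: the background child inherits fd 9, so the lock outlives the job.
( flock -n 9 || exit 1; sleep 5 >/dev/null 2>&1 & ) 9>"$leaky_lock"
leaky_state=$(probe "$leaky_lock")

# Fixed: 9>&- closes fd 9 for the child only, so the lock ends with the job.
( flock -n 9 || exit 1; sleep 5 9>&- >/dev/null 2>&1 & ) 9>"$fixed_lock"
fixed_state=$(probe "$fixed_lock")

echo "leaky: $leaky_state"
echo "fixed: $fixed_state"

# Remove the scratch files; the background sleeps exit on their own.
rm -f "$leaky_lock" "$fixed_lock"
```

On a Linux system this should print "leaky: held" followed by "fixed: free". The background sleep stands in for the long-running service: in the first case it keeps the lock alive after the locking subshell has exited, and in the second it never sees the descriptor at all.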
This class of issue potentially exists in the following scripts/modules:
Added extra protections to the lock file descriptor in various scripts.
Commit pushed to master at https://github.com/openshift/origin-server
Bug 977493 - Avoid leaking the lock file descriptor to child processes.
Here are the steps for Q/E:
1. Create an app with cron embedded.
rhc app create rm1 php-5.3 cron-1.4
2. Create a cron script that will attempt to write to the lock file descriptor.
cat > .openshift/cron/minutely/broken <<_EOF_
echo "foo" >&9
_EOF_
chmod +x .openshift/cron/minutely/broken
git add .openshift/cron/minutely/broken
git commit -m 'Add broken crontab'
git push
3. Wait a few minutes and inspect the logs from cron in the gear.
4. You should see the following in the logs:
line 1: 9: Bad file descriptor
Checked on devenv_3430, using the steps in comment #5.
Tailing the cron log shows the following lines:
[php1-bmengdev.dev.rhcloud.com log]> tailf cron.minutely.log
Mon Jul 1 02:01:06 EDT 2013: START minutely cron run
/var/lib/openshift/35ee7d48e21311e285a322000aa40b62/app-root/runtime/repo//.openshift/cron/minutely/broken: line 1: 9: Bad file descriptor
Mon Jul 1 02:01:07 EDT 2013: END minutely cron run - status=0
Move bug to verified.