Bug 661966

Summary: Jobs dropped due to falling out of allowed hour range should not be locked
Product: [Fedora] Fedora
Reporter: Marcela Mašláňová <mmaslano>
Component: cronie
Assignee: Marcela Mašláňová <mmaslano>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: anders.blomdell, mmaslano, pertusus, tmraz
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cronie-1.4.5-4.fc14
Doc Type: Bug Fix
Last Closed: 2010-12-23 19:59:14 UTC
Attachments:
Description                                             Flags
problem                                                 none
lock                                                    none
pstree                                                  none
Log showing daily and weekly active at the same time    none

Description Marcela Mašláňová 2010-12-10 07:17:37 UTC
Description of problem:
Occasionally (this has only been observed when a job's run is delayed past the
next execution of cron), the weekly anacron task locks out the daily task.
Attached you will find the output of the following, captured at such an instance:

  pstree -p
  lslk
  /var/log/cron

Version-Release number of selected component (if applicable):
cronie-anacron-1.4.5-2.fc14.i686

Comment 1 Marcela Mašláňová 2010-12-10 07:18:24 UTC
Created attachment 467907 [details]
problem

Comment 2 Marcela Mašláňová 2010-12-10 07:19:48 UTC
Created attachment 467908 [details]
lock

Comment 3 Marcela Mašláňová 2010-12-10 07:20:13 UTC
Created attachment 467909 [details]
pstree

Comment 4 Tomas Mraz 2010-12-10 08:13:02 UTC
Well, this is clearly a bug in the 99-raid-check script, which hangs for some reason, and it should be fixed in the package that owns this script. Please open a bug against that package.

On the other hand, we could add a feature to anacron such as a nowait flag: a job carrying this flag would not be waited for, and it would be marked as finished immediately after its child process forks.
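The proposed nowait flag can be illustrated with a minimal Python sketch. This is not anacron code and nowait is not an existing anacron option: `run_job` and its `nowait` parameter are hypothetical names, and `subprocess.Popen` stands in for anacron forking the job. Without the flag the caller blocks until the child exits; with it, the job counts as finished as soon as the child has been spawned.

```python
import subprocess
import sys
import time


def run_job(argv, nowait=False):
    """Start a job and report how long until it is treated as finished.

    `nowait` is hypothetical: it models the flag proposed in this bug, not
    an existing anacron option. With nowait=False the caller blocks until
    the child exits; with nowait=True the job counts as finished as soon as
    the child process has been spawned.
    """
    start = time.monotonic()
    proc = subprocess.Popen(argv)
    if not nowait:
        proc.wait()  # block until the job's process exits
    return time.monotonic() - start, proc


if __name__ == "__main__":
    # Stand-in for a long-running job such as a multi-day RAID check.
    slow_job = [sys.executable, "-c", "import time; time.sleep(1)"]

    waited, _ = run_job(slow_job)
    detached, proc = run_job(slow_job, nowait=True)
    print(f"waited: {waited:.2f}s, nowait: {detached:.2f}s")
    proc.wait()  # reap the detached child so it does not linger as a zombie
```

The trade-off is that once the job is treated as finished early, nothing stops the next scheduled run from starting while the previous one is still working.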

Comment 5 Anders Blomdell 2010-12-10 08:57:10 UTC
No, it's not a bug in 99-raid-check; it just takes 3 days to complete on heavily loaded 2TB disks. That's fine with me, since there are still 4 days left until the next run.

Comment 6 Tomas Mraz 2010-12-10 09:52:54 UTC
Then it cannot be run from anacron, at least not until the nowait feature is added, and even then it would have to be run via its own entry in /etc/anacrontab rather than from the cron.weekly directory. The other possibility is to handle the spawning of the long-running process directly in the 99-raid-check script.

Comment 7 Anders Blomdell 2010-12-10 12:35:57 UTC
Then why does it work in all the weeks when the start delay does not extend past the next invocation of anacron?

I.e., we have only seen it lock out daily jobs when 'random(RANDOM_DELAY) + cron.weekly.delay > 60' (numerically: random(45) + 25 > 60).

Your comment seems to imply that no anacron task may last longer than the shortest period?
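The condition in the comment above can be checked numerically. The sketch below assumes, as the 60-minute threshold implies, that anacron is re-invoked from cron once an hour, and it further assumes that random(45) picks an integer uniformly from 0 to 44; the function name `late_random_delays` is hypothetical.

```python
# Values taken from the comment; the uniform 0..RANDOM_DELAY-1 range is an assumption.
RANDOM_DELAY = 45     # maximum extra random delay, in minutes
WEEKLY_DELAY = 25     # fixed delay of the cron.weekly job, in minutes
HOURLY_PERIOD = 60    # assumed: anacron is started again from cron once an hour


def late_random_delays(random_delay=RANDOM_DELAY, job_delay=WEEKLY_DELAY,
                       period=HOURLY_PERIOD):
    """Return the random delays r for which job_delay + r exceeds period."""
    return [r for r in range(random_delay) if job_delay + r > period]


late = late_random_delays()
print(f"late when r in {late[0]}..{late[-1]}: "
      f"{len(late)} of {RANDOM_DELAY} values ({len(late) / RANDOM_DELAY:.0%})")
# prints: late when r in 36..44: 9 of 45 values (20%)
```

Under these assumptions the weekly jobs start after the next hourly anacron run roughly one week in five, which fits the observation that the lockout happens only occasionally.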

Comment 8 Tomas Mraz 2010-12-10 13:29:56 UTC
If it takes more than one day to complete then it will block the daily jobs the next day anyway regardless of the random delay.

Comment 9 Anders Blomdell 2010-12-10 14:14:44 UTC
Created attachment 467976 [details]
Log showing daily and weekly active at the same time

Also shows that the weird locking behavior does not always occur.

Comment 10 Tomas Mraz 2010-12-10 15:15:56 UTC
OK, now I see what the problem is: it happens when the weekly jobs are started in an anacron instance that was started at 2am or earlier in the day. In that case the daily job falls outside the allowed hour range, yet its file is still locked; that is the bug. It should not have been locked in that case.
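The ordering problem described above can be modelled with a small, hypothetical Python simulation (not cronie code). `Job`, `run_instance` and the per-job `start_hour` values are illustrative names; `hours_range` mimics anacrontab's START_HOURS_RANGE, with 3-22 used only as an example value and treated here as a half-open interval, which is an assumption rather than a statement about cronie. The buggy variant takes a job's lock before the range check, so a dropped job stays locked for as long as the instance keeps running other jobs; the fixed variant locks only the jobs that will actually run.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    start_hour: int  # hour of day at which the job would start, after its delays


def run_instance(jobs, hours_range=(3, 22), lock_before_range_check=False):
    """Simulate one anacron-like instance; return (jobs run, jobs locked).

    Hypothetical model, not cronie code. hours_range mimics
    START_HOURS_RANGE and is treated here as half-open [start, end).
    lock_before_range_check=True models the reported bug: the lock is
    taken before the range check, so a dropped job stays locked.
    """
    start, end = hours_range
    run, locked = [], set()
    for job in jobs:
        if lock_before_range_check:
            locked.add(job.name)
        if not start <= job.start_hour < end:
            continue  # dropped: outside the allowed hour range
        if not lock_before_range_check:
            locked.add(job.name)  # fixed ordering: lock only jobs that will run
        run.append(job.name)
    return run, sorted(locked)


# Hypothetical instance started around 2am: the daily job would start at
# 2am (out of range) while the later weekly job starts at 3am (in range).
jobs = [Job("cron.daily", start_hour=2), Job("cron.weekly", start_hour=3)]
print(run_instance(jobs, lock_before_range_check=True))
# prints: (['cron.weekly'], ['cron.daily', 'cron.weekly'])
print(run_instance(jobs))
# prints: (['cron.weekly'], ['cron.weekly'])
```

In the buggy run, the daily lock is held by the same instance that is busy with the multi-day weekly job, so later anacron invocations cannot take it and the daily jobs are locked out.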

Comment 11 Marcela Mašláňová 2010-12-13 09:38:40 UTC
(In reply to comment #9)
> Created attachment 467976 [details]
> Log showing daily and weekly active at the same time
> 
> Also shows that the weird locking behavior does not always occur.

Could you test the update and let us know?

Comment 12 Anders Blomdell 2010-12-13 11:14:49 UTC
Where do I find the update?

Was only able to find the stable ones in https://admin.fedoraproject.org/updates

Comment 13 Marcela Mašláňová 2010-12-13 11:39:30 UTC
Rawhide doesn't have updates; packages are just synced to the mirrors. It should be fixed by release 1.4.6-5.

Comment 14 Tomas Mraz 2010-12-13 17:26:20 UTC
I built a F14 package here in koji:
http://koji.fedoraproject.org/koji/buildinfo?buildID=209038
You can download it from there.

Comment 15 Fedora Update System 2010-12-14 08:52:01 UTC
cronie-1.4.5-3.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/cronie-1.4.5-3.fc14

Comment 16 Fedora Update System 2010-12-15 09:01:41 UTC
cronie-1.4.5-3.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update cronie'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/cronie-1.4.5-3.fc14

Comment 17 Fedora Update System 2010-12-16 14:20:36 UTC
cronie-1.4.5-4.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/cronie-1.4.5-4.fc14

Comment 18 Fedora Update System 2010-12-23 19:58:57 UTC
cronie-1.4.5-4.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.