Description of problem:
/distribution/reservesys has a RESERVE_IF_FAIL feature which seems to be badly affected by the 'Updating' status, despite the 'sleep 40' it uses. Users have recently been reporting that the 'Updating' status stays visible for a very long time (e.g. more than 60s), and /distribution/reservesys can be seen triggering a reservation when RESERVE_IF_FAIL is used even though all previous jobs Passed. This really looks like an issue with the 'Updating' status (or any other "unexpected" value). It might be worth considering looping while the status is 'Updating' until something real is read, rather than the static 'sleep 40'.

Actual results:
False alarms are reported (among other things, /distribution/reservesys fails after the period if left untouched) and machines are reserved unwantedly.

Expected results:
The opposite of the actual results (the method used to achieve this does not matter).
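The suggested fix could be sketched as a retry loop instead of the fixed `sleep 40`. This is only an illustration: `get_job_status` is a hypothetical placeholder for however the task would query Beaker, stubbed here (via a counter file, so the state survives the `$(...)` subshells) to return 'Updating' twice before a real value.

```shell
#!/bin/sh
# Hypothetical sketch: poll while the status reads "Updating" rather than
# doing a single static `sleep 40`. get_job_status is a stand-in stub.
COUNT=$(mktemp)
echo 0 > "$COUNT"
get_job_status() {
    n=$(($(cat "$COUNT") + 1))
    echo "$n" > "$COUNT"
    # pretend the scheduler needs two polls before a real status appears
    if [ "$n" -lt 3 ]; then echo "Updating"; else echo "Completed"; fi
}

STATUS=Updating
TRIES=0
while [ "$STATUS" = "Updating" ] && [ "$TRIES" -lt 30 ]; do
    STATUS=$(get_job_status)
    TRIES=$((TRIES + 1))
    # short delay between polls; a real task would likely wait longer
    if [ "$STATUS" = "Updating" ]; then sleep 1; fi
done
echo "status after $TRIES polls: $STATUS"
rm -f "$COUNT"
```

The cap on `TRIES` keeps the loop from hanging forever if the scheduler never settles; tuning that bound (and the per-poll delay) would be up to the task.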
I guess you would have been hitting this problem last week when the data migration was slowing down the scheduler, so that update_dirty_jobs was taking several minutes to run, right? Since Tuesday last week the scheduler was back to normal and processing status updates in ~20 seconds, so you should only hit this extremely rarely now.
Dear Marian, thanks for your report. Based on Dan's reply I'm thinking of closing this bug, since it was due to the load of the data migration. I know false alarms can be very frustrating. Would this be acceptable?
(In reply to Roman Joost from comment #2)
> Dear Marian,
>
> thanks for your report. Based on Dan's reply I'm thinking of closing this
> bug, since it is due to the load of the data migration. I know it can be
> very frustrating of false alarms. Would this be acceptable?

An alternative approach would be to implement some kind of loop which waits until a "known" state is available, to avoid the faulty behaviour under any condition. Do as you wish.
So the problem is that "Updating..." is not a status; it's a hack in the web UI to avoid showing the current status from the database when we know it's wrong because the job is "dirty". ("Dirty" means that a status update is pending in beakerd.) However, the recipe XML (which is what /distribution/reservesys looks at to determine whether the previous task passed) exposes neither the job's "dirty" flag nor the "Updating..." status. Instead the task just appears with the old values status="Running" result="New" until beakerd updates them. We could probably make it loop until the result is something other than New. In theory an alternative harness can produce tasks with a New result, but I think none intentionally do that.
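The loop Dan describes could look roughly like the following. This is a sketch, not the task's actual code: `fetch_recipe_xml` is a hypothetical stand-in for however the task retrieves the recipe XML, stubbed here with a counter file so that it shows the stale status="Running" result="New" pair twice before beakerd's update lands.

```shell
#!/bin/sh
# Sketch: poll the recipe XML until the previous task's result is something
# other than "New", since a dirty job keeps exposing the stale values.
STATE=$(mktemp)
echo 0 > "$STATE"
fetch_recipe_xml() {
    n=$(($(cat "$STATE") + 1))
    echo "$n" > "$STATE"
    if [ "$n" -lt 3 ]; then
        echo '<task name="/distribution/install" status="Running" result="New"/>'
    else
        echo '<task name="/distribution/install" status="Completed" result="Fail"/>'
    fi
}

RESULT=New
TRIES=0
while [ "$RESULT" = "New" ] && [ "$TRIES" -lt 30 ]; do
    # crude attribute extraction with sed, enough for a sketch
    RESULT=$(fetch_recipe_xml | sed -n 's/.*result="\([^"]*\)".*/\1/p')
    TRIES=$((TRIES + 1))
    if [ "$RESULT" = "New" ]; then sleep 1; fi
done
echo "settled result: $RESULT"
rm -f "$STATE"
```

Once the result settles to Pass/Fail/etc., RESERVE_IF_FAIL could act on it without racing beakerd; the retry cap guards against a job that never leaves the dirty state.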
Dear Marian, we had another look at this. Dan pointed me to a discussion about the reservesys element which currently lacks RESERVE_IF_FAIL functionality. We think the better way out of this would be to equip Beaker to handle reservation in case of failure with <reservesys /> instead of adding more functionality around this task. Until we have a backlog item for this, I'll keep this report open.
Dear Marian, we'd like to proceed with implementing the RFE from Bug 1100593 (Conditional reservation support for harness independent reservation) instead of this bug. I've bumped its priority, and I think time spent on that support would benefit everyone more than adding further hacks to /distribution/reservesys. Personally I'd like to close this bug as WONTFIX with a reference to Bug 1100593, but I'm also happy to keep it open and close it once Bug 1100593 is resolved, if you feel it should stay open until then. Let me know what you think. Cheers!