Bug 595714 - job_server: need seamless rollover from the live jobs to the historical jobs
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: beta
Hardware: All Linux
Priority: low
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assigned To: Pete MacKinnon
QA Contact: MRG Quality Engineering
Depends On:
Blocks:
Reported: 2010-05-25 08:45 EDT by Pete MacKinnon
Modified: 2010-10-19 11:00 EDT
Doc Type: Bug Fix
Last Closed: 2010-10-19 11:00:21 EDT
Description Pete MacKinnon 2010-05-25 08:45:44 EDT
Once a submission is created, it should always retain access to its jobs. When live jobs complete or are removed, their place in the submission needs to be filled by their historical equivalents from the history file.
Comment 1 Pete MacKinnon 2010-05-25 08:50:22 EDT
Matt, I would have expected the JobLogReader to send me a DestroyClassAd when the job comes out of the queue. That way I know it will be picked up the next time the history file is processed. This doesn't seem to be happening, although I need to do more testing.

Can I assume that the move out of the job log and into the history file is a consistent transaction? Or is there a gray area? If there is, which entry should win? I'm thinking the job queue entry.
Comment 2 Matthew Farrellee 2010-05-25 10:52:15 EDT
qmgmt.cpp:int DestroyProc(int cluster_id, int proc_id) -

 0) AppendHistory
 ...
 2) JobQueue->DestroyClassAd
 ...

It is definitely gray. There is no transaction between history file and job_queue.log.

I'm thinking the history file should win.
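
One way to read "the history file should win" is as a precedence rule in whatever per-job table the job server keeps. The following is only a minimal sketch under that assumption, not the job_server code: `JobTable`, `JobRecord`, `Source`, and the "cluster.proc" key format are hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: which file a job record was last read from.
enum class Source { Live, History };

// Hypothetical per-job record; a real one would hold the full ClassAd.
struct JobRecord {
    Source source;
    std::string status;
};

// Hypothetical job table keyed by "cluster.proc".
class JobTable {
public:
    // Update read from job_queue.log. Ignored once history owns the job.
    // Returns true if the table changed.
    bool applyLive(const std::string& id, const std::string& status) {
        assert(!id.empty());
        auto it = jobs_.find(id);
        if (it != jobs_.end() && it->second.source == Source::History) {
            return false;  // history wins over a late live update
        }
        jobs_[id] = JobRecord{Source::Live, status};
        return true;
    }

    // Update read from the history file. Always replaces a live record.
    void applyHistory(const std::string& id, const std::string& status) {
        assert(!id.empty());
        jobs_[id] = JobRecord{Source::History, status};
    }

    // Returns nullptr if the job is unknown.
    const JobRecord* find(const std::string& id) const {
        auto it = jobs_.find(id);
        return it == jobs_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, JobRecord> jobs_;
};
```

With this rule the order in which the two files are read stops mattering for a job that appears in both: once `applyHistory` has run for an id, later `applyLive` calls for that id are dropped.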
Comment 3 Pete MacKinnon 2010-05-25 12:57:02 EDT
Moving to MODIFIED as a placeholder for further discussion. The current implementation works as expected. There is a "dead zone" between the time the ClassAd is destroyed and the time the history scan picks up the stored job info, so calling GetJobSummaries in that span can return inaccurate info.
Comment 4 Pete MacKinnon 2010-05-25 15:47:11 EDT
Schizophrenia...I might be able to downcast to see if I can/should delete the job as a LiveJob from the job table
Comment 5 Pete MacKinnon 2010-08-11 11:48:07 EDT
After testing this more extensively in the pool with a very short scan interval of 10 sec, I found that jobs can still fall through the "cracks" of the history file scan interval. The issue arises when the LiveJob has not been destroyed yet but the ad has already shown up in the history file and is ready for processing.
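
The two orderings discussed in this bug (queue destroy seen before the history scan, and history ad seen before the queue destroy) could be handled with a small per-job state machine. This is a hedged sketch only and does not reflect the actual fix; `Submission`, `Entry`, `State`, and the three event handlers are hypothetical names, and the events are assumed to arrive as plain function calls.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical lifecycle of one job inside a submission.
enum class State {
    Live,            // present in the job queue
    PendingHistory,  // destroyed in the queue, not yet seen in the history file
    Historical       // read back from the history file
};

struct Entry {
    State state;
    std::string status;  // last known status, kept so summaries stay populated
};

class Submission {
public:
    // Job queue reports a new or changed job.
    void onLiveUpdate(const std::string& id, const std::string& status) {
        assert(!id.empty());
        auto it = jobs_.find(id);
        if (it != jobs_.end() && it->second.state != State::Live) {
            return;  // job already left the queue; ignore stale live data
        }
        jobs_[id] = Entry{State::Live, status};
    }

    // Job queue reports a destroy. Keep the entry as a placeholder
    // instead of erasing it, so the job does not vanish before the scan.
    void onLiveDestroyed(const std::string& id) {
        auto it = jobs_.find(id);
        if (it != jobs_.end() && it->second.state == State::Live) {
            it->second.state = State::PendingHistory;
        }
        // If the entry is already Historical, the late destroy is a no-op.
    }

    // History scan found the job. Replaces the entry in any state,
    // including a LiveJob that has not been destroyed yet.
    void onHistoryScanned(const std::string& id, const std::string& status) {
        assert(!id.empty());
        jobs_[id] = Entry{State::Historical, status};
    }

    std::size_t jobCount() const { return jobs_.size(); }

    // Returns nullptr if the job is unknown.
    const Entry* find(const std::string& id) const {
        auto it = jobs_.find(id);
        return it == jobs_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Entry> jobs_;
};
```

In this sketch the submission never loses a job between the two events: a destroyed job stays visible as `PendingHistory` until the scan catches up, and a destroy that arrives after the scan leaves the historical entry untouched.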
Comment 6 Pete MacKinnon 2010-08-18 15:35:51 EDT
FH sha d70c4be
