Red Hat Bugzilla – Bug 595714
job_server: need seamless rollover from the live jobs to the historical jobs
Last modified: 2010-10-19 11:00:21 EDT
Once a submission is created it should always retain access to its jobs. When live jobs complete or are removed, their place in the submission needs to be filled by their historical equivalents from the history file.
Matt, I would have expected the JobLogReader to send me a DestroyClassAd when the job comes out of the queue. That way I know it will be picked up the next time the history file is processed. This doesn't seem to be happening, although I need to do more testing.
Can I assume the move out of the job log and into the history file is a consistent transaction, or is there a gray area? If there is, which entry should win? I'm thinking the job queue entry.
qmgmt.cpp:int DestroyProc(int cluster_id, int proc_id) -
It is definitely gray. There is no transaction between history file and job_queue.log.
I'm thinking the history file should win.
Moving to MODIFIED as a placeholder for further discussion. The current implementation works as expected. There is a "dead zone" between the moment the ClassAd is destroyed and the moment the history scan picks up the stored job info. Calling GetJobSummaries in that span can return inaccurate info.
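To make the dead zone concrete, here is a minimal sketch of the event ordering. Everything in it is hypothetical (the Submission struct, on_destroy, on_history_scan, summary_count); it is not the job_server code, and it assumes the inaccuracy shows up as a job missing from the summaries.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical model of a submission: job ids ("cluster.proc") split
// between the live queue and the history file. Not the real job_server types.
struct Submission {
    std::set<std::string> live;
    std::set<std::string> historical;

    // Stand-in for what a summary call would see at this instant.
    std::size_t summary_count() const {
        return live.size() + historical.size();
    }
};

// The ClassAd is destroyed: the job leaves the live set immediately.
void on_destroy(Submission& s, const std::string& id) {
    s.live.erase(id);
}

// The next history scan finds the stored ad and files it as historical.
void on_history_scan(Submission& s, const std::string& id) {
    s.historical.insert(id);
}
```

Between on_destroy and on_history_scan the job belongs to neither set, so any summary taken in that window undercounts the submission.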
Schizophrenia...I might be able to downcast to see if I can/should delete the job as a LiveJob from the job table
More extensive testing in the pool, with a very short scan interval of 10 sec, shows that jobs can still fall through the "cracks" of the history file scan interval. The issue arises when the LiveJob has not been destroyed yet but the ad has already shown up in the history file and is ready for processing.
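One way to handle that overlap, following the "history file should win" suggestion and the downcast idea above, might look like the sketch below. Only the name LiveJob comes from this thread; Job, HistoryJob, JobTable and rollover are assumptions made for illustration, and the status strings are placeholders.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical hierarchy; the real job_server classes may differ.
struct Job {
    virtual ~Job() = default;
    virtual std::string status() const = 0;
};
struct LiveJob : Job {
    std::string status() const override { return "RUNNING"; }
};
struct HistoryJob : Job {
    std::string status() const override { return "COMPLETED"; }
};

// Job table keyed by "cluster.proc" (assumed key format).
using JobTable = std::map<std::string, std::unique_ptr<Job>>;

// Called by the history scan for each ad it processes. The history
// record always wins. Returns true if it displaced a LiveJob that
// had not been destroyed yet.
bool rollover(JobTable& table, const std::string& id,
              std::unique_ptr<HistoryJob> hist) {
    assert(hist != nullptr);  // caller must supply a real record
    auto it = table.find(id);
    if (it == table.end()) {
        // Live entry already gone: just file the historical one.
        table.emplace(id, std::move(hist));
        return false;
    }
    // Downcast to check whether the current entry is still live.
    bool was_live = dynamic_cast<LiveJob*>(it->second.get()) != nullptr;
    it->second = std::move(hist);  // old entry is freed here
    return was_live;
}
```

With this shape a DestroyClassAd that arrives late would need to check that the entry is still a LiveJob before erasing it, otherwise it would delete the historical record that was just installed.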
FH sha d70c4be