Bug 595714 - job_server: need seamless rollover from the live jobs to the historical jobs
Summary: job_server: need seamless rollover from the live jobs to the historical jobs
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: beta
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: 1.3
Assignee: Pete MacKinnon
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
 
Reported: 2010-05-25 12:45 UTC by Pete MacKinnon
Modified: 2010-10-19 15:00 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-19 15:00:21 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments:

Description Pete MacKinnon 2010-05-25 12:45:44 UTC
Once a submission is created it should always retain access to its jobs. When the live jobs complete or are removed, their place in the submission needs to be filled by their historical equivalents from the history file.
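
A minimal sketch of that rollover, purely for illustration (SubmissionSketch, JobInfo, and the method names are hypothetical, not the actual job_server classes): the submission keys jobs by id and lets a history-file record take the place of the live one once it leaves the queue.

#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the job_server's submission object; jobs are
// keyed by "cluster.proc".
struct JobInfo {
    std::string id;      // "cluster.proc"
    std::string status;  // e.g. "RUNNING", "COMPLETED"
    bool fromHistory;    // true once the record comes from the history file
};

class SubmissionSketch {
    std::map<std::string, JobInfo> jobs_;
public:
    // Called while the job is still in the live queue (job_queue.log).
    void updateLive(const JobInfo& job) {
        jobs_[job.id] = job;
    }
    // Called when the history-file scan finds the job's final ad; the
    // historical record takes the live record's place, so the submission
    // never loses access to the job.
    void rolloverToHistory(JobInfo job) {
        job.fromHistory = true;
        jobs_[job.id] = std::move(job);
    }
    std::size_t size() const { return jobs_.size(); }
};

int main() {
    SubmissionSketch sub;
    sub.updateLive({"1.0", "RUNNING", false});
    // Job leaves the queue and later shows up in the history file:
    sub.rolloverToHistory({"1.0", "COMPLETED", false});
    std::cout << "jobs tracked: " << sub.size() << "\n";  // still 1
    return 0;
}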

Comment 1 Pete MacKinnon 2010-05-25 12:50:22 UTC
Matt, I would have expected the JobLogReader to send me a DestroyClassAd when the job comes out of the queue. That way I know it will be picked up the next time the history file is processed. This doesn't seem to be happening, although I need to do more testing.

Can I assume that is a consistent transaction, out of the job log and into the history file? Or is there a gray area? If there is, which entry should win... I'm thinking the job queue entry.

Comment 2 Matthew Farrellee 2010-05-25 14:52:15 UTC
qmgmt.cpp:int DestroyProc(int cluster_id, int proc_id) -

 0) AppendHistory
 ...
 2) JobQueue->DestroyClassAd
 ...

It is definitely gray. There is no transaction between history file and job_queue.log.

I'm thinking the history file should win.
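
A small sketch of that precedence rule, for illustration only (Ad and resolve() are made-up names, not qmgmt code): since AppendHistory runs before JobQueue->DestroyClassAd and nothing ties the two together, both sources can briefly hold an ad for the same job, and the resolver simply prefers the history entry.

#include <iostream>
#include <optional>
#include <string>

// Hypothetical minimal ad: just the attributes needed for the example.
struct Ad {
    std::string jobId;   // "cluster.proc"
    std::string source;  // "job_queue.log" or "history"
};

// Resolve the gray area: if the job appears in both the live queue and the
// history file (possible because DestroyProc appends to history before
// destroying the queue classad, with no transaction spanning the two),
// treat the history entry as authoritative.
std::optional<Ad> resolve(const std::optional<Ad>& live,
                          const std::optional<Ad>& hist) {
    if (hist) return hist;   // history wins
    return live;             // otherwise whatever the queue still has
}

int main() {
    std::optional<Ad> live = Ad{"2.0", "job_queue.log"};
    std::optional<Ad> hist = Ad{"2.0", "history"};
    std::cout << resolve(live, hist)->source << "\n";  // prints "history"
    return 0;
}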

Comment 3 Pete MacKinnon 2010-05-25 16:57:02 UTC
Moving to MODIFIED as a placeholder for further discussion. The current implementation works as expected, but there is a "dead zone" between when the classad has been destroyed and when the history scan picks up the stored job info. Calling GetJobSummaries in that span can return inaccurate info.
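
One possible way to cover that dead zone, sketched with hypothetical names (DeadZoneTracker is not part of the job_server): remember the ids of jobs whose classads were destroyed until the next history scan confirms them, so a GetJobSummaries-style query can still account for them.

#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical tracker for the window between JobQueue->DestroyClassAd and
// the next history-file scan. Jobs in 'pending_' are gone from the live
// queue but not yet visible as history, so a summary query should still
// count them instead of silently dropping them.
class DeadZoneTracker {
    std::set<std::string> pending_;   // job ids awaiting history pickup
public:
    void onDestroyClassAd(const std::string& jobId) { pending_.insert(jobId); }
    void onHistoryScanFound(const std::string& jobId) { pending_.erase(jobId); }

    // Jobs a summary call should report as "completed, final ad pending"
    // rather than omitting.
    std::vector<std::string> pendingJobs() const {
        return {pending_.begin(), pending_.end()};
    }
};

int main() {
    DeadZoneTracker t;
    t.onDestroyClassAd("3.0");                    // ad removed from the queue
    std::cout << t.pendingJobs().size() << "\n";  // 1: still reportable
    t.onHistoryScanFound("3.0");                  // next scan catches up
    std::cout << t.pendingJobs().size() << "\n";  // 0
    return 0;
}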

Comment 4 Pete MacKinnon 2010-05-25 19:47:11 UTC
Schizophrenia... I might be able to downcast to see if I can/should delete the job from the job table as a LiveJob.

Comment 5 Pete MacKinnon 2010-08-11 15:48:07 UTC
Tested this more extensively in the pool with a very short scan interval of 10 sec; jobs can still fall through the cracks of the history-file scan interval. The issue arises when the LiveJob has not been destroyed yet but the ad has already shown up in the history file and is ready for processing.
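
A rough sketch of the downcast idea from comment 4 applied to this crack (LiveJob, HistoricalJob, and onHistoryAd are illustrative stand-ins, not the real classes): when the history scan delivers an ad whose id is still held by a LiveJob, swap the entry for its historical equivalent instead of adding a duplicate.

#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical job hierarchy standing in for the job_server's job table.
struct Job {
    explicit Job(std::string id) : id(std::move(id)) {}
    virtual ~Job() = default;
    std::string id;
};
struct LiveJob : Job { using Job::Job; };
struct HistoricalJob : Job { using Job::Job; };

using JobTable = std::map<std::string, std::unique_ptr<Job>>;

// Called for each ad found by the history-file scan. If the table still
// holds a LiveJob for this id (the job left the queue but its
// DestroyClassAd hasn't been processed yet), replace it with the
// historical equivalent instead of keeping two versions of the same job.
void onHistoryAd(JobTable& table, const std::string& jobId) {
    auto it = table.find(jobId);
    if (it != table.end() && dynamic_cast<LiveJob*>(it->second.get())) {
        it->second = std::make_unique<HistoricalJob>(jobId);  // history wins
    } else if (it == table.end()) {
        table.emplace(jobId, std::make_unique<HistoricalJob>(jobId));
    }
    // If it's already a HistoricalJob, the scan found a duplicate: ignore it.
}

int main() {
    JobTable table;
    table.emplace("4.0", std::make_unique<LiveJob>("4.0"));
    onHistoryAd(table, "4.0");   // live entry rolled over, no duplicate
    std::cout << table.size() << " job(s), historical="
              << (dynamic_cast<HistoricalJob*>(table["4.0"].get()) != nullptr)
              << "\n";
    return 0;
}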

Comment 6 Pete MacKinnon 2010-08-18 19:35:51 UTC
FH sha d70c4be

