Bug 633402

Summary: jobserver: HistoricalJobs submissions destroyed on some restarts
Product: Red Hat Enterprise MRG Reporter: Pete MacKinnon <pmackinn>
Component: condor-qmf Assignee: Pete MacKinnon <pmackinn>
Status: CLOSED CURRENTRELEASE QA Contact: Jan Sarenik <jsarenik>
Severity: high Docs Contact:
Priority: high    
Version: beta CC: jsarenik, matt, mkudlej
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-10-19 15:07:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description Pete MacKinnon 2010-09-13 17:40:17 UTC
Currently observed on mrg31. Running the job server directly (-t -f) creates the HistoricalJobs submissions, but a service restart does not. Logs indicate that the jobs are being created.

Suspect a possible global init problem somewhere in the code or race in Job object construction...?

Comment 1 Pete MacKinnon 2010-09-13 17:41:33 UTC
Will need to debug directly on mrg31; difficult to reproduce in a local FC13 environment.

Comment 2 Pete MacKinnon 2010-09-13 22:02:12 UTC
Believed to be due to a race between the Reset called by the JobLogConsumer, which erases g_jobs and g_submissions, and the loading of those same structures by the history processing.
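
To make the suspected ordering problem concrete, here is a minimal single-threaded sketch of the two possible interleavings. Only the names g_jobs and g_submissions come from this report; the Job struct, the map types, and the helper functions are hypothetical stand-ins, not the actual job server code.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the job server's global collections.
struct Job { std::string submission; };
std::map<std::string, Job> g_jobs;          // job id -> job
std::map<std::string, int> g_submissions;   // submission name -> job count

// Stand-in for history processing: records one historical job.
void load_history_job(const std::string& id, const std::string& submission) {
    g_jobs[id] = Job{submission};
    ++g_submissions[submission];
}

// Stand-in for the Reset triggered by the JobLogConsumer.
void reset() {
    g_jobs.clear();
    g_submissions.clear();
}

// Reset lands after the history load: the loaded submission is erased.
std::size_t submissions_after_late_reset() {
    reset();  // start from a clean state
    load_history_job("1.0", "HistoricalJobs");
    reset();  // late Reset wipes what history processing just loaded
    return g_submissions.size();
}

// Reset lands before the history load: the submission survives.
std::size_t submissions_after_early_reset() {
    reset();  // start from a clean state
    reset();  // early Reset has nothing to destroy
    load_history_job("1.0", "HistoricalJobs");
    return g_submissions.size();
}
```

Which of the two orderings occurs on a given restart would depend on timing, which may be why the problem shows up on some restarts only.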

Comment 3 Pete MacKinnon 2010-09-14 02:52:50 UTC
FH sha a81362f

Code has been reworked to account for a Reset of the JobLogConsumer in the event of a PROBE_ERROR. Note that these resets are infrequent, but QMF interactions are not currently blocked by any locking on these volatile job and submission data structures. Thus, there is a non-zero chance of a SEGV until mutexes are added.
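
As a rough illustration of the kind of locking still outstanding, the sketch below guards a job map with a single mutex so that a reset cannot run while a lookup is in progress. The JobStore class and its methods are hypothetical and not taken from the job server source. std::mutex is used only for brevity; this report does not say which locking primitive would actually be used.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical wrapper: every access to the map goes through one mutex.
class JobStore {
public:
    // Called by the log/history side to record a job.
    void add(const std::string& id, const std::string& submission) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_[id] = submission;
    }

    // Called on Reset: clears the map while holding the lock.
    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.clear();
    }

    // Called by the query side. Copies the value out under the lock so the
    // caller never holds a reference into a map that a reset may clear.
    bool find(const std::string& id, std::string& submission_out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = jobs_.find(id);
        if (it == jobs_.end()) {
            return false;
        }
        submission_out = it->second;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return jobs_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::string> jobs_;  // job id -> submission name
};
```

Returning copies rather than pointers or iterators is the important design choice here: a caller that kept a pointer into the map after releasing the lock would still be exposed to the same crash once a reset ran.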