Bug 708435

Summary: QMF Job Server returning empty/bad strings from live jobs
Product: Red Hat Enterprise MRG
Reporter: Pete MacKinnon <pmackinn>
Component: condor-qmf
Assignee: Pete MacKinnon <pmackinn>
Status: CLOSED CURRENTRELEASE
QA Contact: Jan Sarenik <jsarenik>
Severity: urgent
Docs Contact:
Priority: urgent
Version: Development
CC: iboverma, jneedle, jsarenik, matt
Target Milestone: 2.0
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: condor-7.6.1-0.10
Doc Type: Bug Fix
Last Closed: 2011-06-27 14:28:39 UTC
Attachments:
  Description                                                    Flags
  Fix to dupe strings when collecting job summaries              none
  Additional fix for GetStatus string deletion leading to crash  none

Description Pete MacKinnon 2011-05-27 16:02:06 UTC
After the valgrind memleak cleanup, it appears that we are deleting string memory before it has been assigned to a string field in the QMF object for transmission.

Discovered in condor-qmf-7.6.1-0.6.el5.

Only applies to jobs that are still in the queue, not the recorded history jobs.

Bad (note JobStatus):
{u'ProcId': 99, u'Args': '""', u'CurrentTime': 1306511177, u'QDate': 1306504702, u'Cmd': '""', u'ClusterId': 3, u'JobStatus': 1, u'EnteredCurrentStatus': 1306504703, u'GlobalJobId': '\x0b'}

Good (note JobStatus):
{u'ProcId': 0, u'Args': '$$([15 + random(31)])', u'Submission': 'slanina.brq.redhat.com#3', u'CurrentTime': 1306511177, u'QDate': 1306504702, u'Cmd': '/bin/sleep', u'ClusterId': 3, u'JobStatus': 4, u'Owner': 'test', u'EnteredCurrentStatus': 1306509511, u'GlobalJobId': 'slanina.brq.redhat.com#3.0#1306504702'},
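
The difference between the two records above can be checked mechanically. The following is a minimal sketch, assuming only the field names visible in the sample output; the helper name suspicious_fields is hypothetical and not part of condor-qmf or qpid-tool. It is a heuristic: a job genuinely submitted without arguments could also trip the Args check.

```python
# Heuristic check for the kind of string corruption shown in this report.
# Field names come from the sample output; the function name is hypothetical.
STRING_FIELDS = ("Cmd", "Args", "GlobalJobId")


def suspicious_fields(job):
    """Return the names of string fields that look empty or garbled."""
    bad = []
    for name in STRING_FIELDS:
        value = job.get(name)
        if not isinstance(value, str):
            bad.append(name)  # missing, or not a string at all
        elif value in ("", '""'):
            bad.append(name)  # empty, or just a literal pair of quotes
        elif any(ord(ch) < 0x20 for ch in value):
            bad.append(name)  # control characters such as '\x0b'
    return bad


# The "Bad" and "Good" records from the description, copied as plain dicts.
bad_job = {
    'ProcId': 99, 'Args': '""', 'CurrentTime': 1306511177,
    'QDate': 1306504702, 'Cmd': '""', 'ClusterId': 3, 'JobStatus': 1,
    'EnteredCurrentStatus': 1306504703, 'GlobalJobId': '\x0b',
}
good_job = {
    'ProcId': 0, 'Args': '$$([15 + random(31)])',
    'Submission': 'slanina.brq.redhat.com#3', 'CurrentTime': 1306511177,
    'QDate': 1306504702, 'Cmd': '/bin/sleep', 'ClusterId': 3,
    'JobStatus': 4, 'Owner': 'test', 'EnteredCurrentStatus': 1306509511,
    'GlobalJobId': 'slanina.brq.redhat.com#3.0#1306504702',
}

print(suspicious_fields(bad_job))
print(suspicious_fields(good_job))
```

With the two records above, every checked field of the bad record is flagged and none of the good record's fields are.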

Comment 1 Pete MacKinnon 2011-05-27 22:59:53 UTC
Created attachment 501425
Fix to dupe strings when collecting job summaries

Checked in upstream at UW: 7ac3ef7a9d638.
The attached patch is a format-patch from the FH master diff.

Comment 2 Pete MacKinnon 2011-05-31 19:52:07 UTC
Created attachment 502086
Additional fix for GetStatus string deletion leading to crash

Comment 4 Pete MacKinnon 2011-06-03 13:02:36 UTC
Test procedure:

1) Submit a new job (either via QMF or the command line).
2) Use qpid-tool to get the submission summary while the job is still active (i.e., not COMPLETED or REMOVED): "call XXX GetJobSummaries", where XXX is the submission's object ID.
3) Confirm that the job server doesn't crash.
4) Confirm that the string values in the summary are correct.

Comment 5 Jan Sarenik 2011-06-03 13:13:26 UTC
Verified using Cumin on RHEL5.6 x86_64 with the following packages:
  condor-7.6.1-0.10.el5
  condor-qmf-7.6.1-0.10.el5
  cumin-0.1.4794-1.el5
  qpid-cpp-server-0.10-7.el5
  qpid-qmf-0.10-10.el5

Will check on i386 and RHEL6 soon, but I consider it fixed
and working already. Thanks!

Comment 6 Jan Sarenik 2011-06-03 13:15:35 UTC
qpid: list submission
Object Summary:
    ID   Created   Destroyed  Index
    ====================================================
    475  12:58:51  -          slanina.brq.redhat.com#1
    476  13:01:11  -          slanina.brq.redhat.com#2
qpid: call 475 GetJobSummaries
qpid: OK (0) - {u'Jobs': [{u'ProcId': 0, u'Args': '20m', u'CurrentTime': 1307106917, u'QDate': 1307105925, u'Cmd': '/bin/sleep', u'ClusterId': 1, u'JobStatus': 2, u'EnteredCurrentStatus': 1307105927, u'GlobalJobId': 'slanina.brq.redhat.com#1.0#1307105925'}]}
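
A reply like the one above can also be checked with a short script rather than by eye. This is only a sketch under stated assumptions: the helper names parse_reply and global_id_matches are hypothetical, and the rule that GlobalJobId ends in '#<ClusterId>.<ProcId>#<QDate>' is inferred from the samples in this report, not taken from the condor-qmf sources.

```python
import ast


def parse_reply(line):
    """Extract the dict from a qpid-tool reply line: 'qpid: OK (0) - {...}'."""
    _, _, payload = line.partition(" - ")
    # The payload is a Python literal, so literal_eval parses it safely.
    return ast.literal_eval(payload)


def global_id_matches(job):
    """True if GlobalJobId ends with '#<ClusterId>.<ProcId>#<QDate>'."""
    expected = "#%d.%d#%d" % (job["ClusterId"], job["ProcId"], job["QDate"])
    return job.get("GlobalJobId", "").endswith(expected)


# The reply line from this comment, split only for readability.
reply = (
    "qpid: OK (0) - {u'Jobs': [{u'ProcId': 0, u'Args': '20m', "
    "u'CurrentTime': 1307106917, u'QDate': 1307105925, "
    "u'Cmd': '/bin/sleep', u'ClusterId': 1, u'JobStatus': 2, "
    "u'EnteredCurrentStatus': 1307105927, "
    "u'GlobalJobId': 'slanina.brq.redhat.com#1.0#1307105925'}]}"
)

jobs = parse_reply(reply)["Jobs"]
for job in jobs:
    print(job["ClusterId"], job["ProcId"], global_id_matches(job))
```

A record with a garbled GlobalJobId, such as the '\x0b' value in the original description, fails this check.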

Comment 7 Jan Sarenik 2011-06-03 13:32:38 UTC
Verified on RHEL5.6 i386

Comment 8 Jan Sarenik 2011-06-03 14:16:36 UTC
On RHEL6.1 both x86_64 and i386 I do not see submissions in Cumin,
but that might be an error elsewhere. Otherwise qpid-tool shows
everything as expected.

Comment 9 Jan Sarenik 2011-06-03 14:21:30 UTC
No, the problem above (comment #8) was a configuration error.
Everything works on RHEL6 and both the submissions and the jobs
appear in Cumin as well. Really. Have a nice weekend! :)

Comment 10 Jan Sarenik 2011-06-06 08:15:29 UTC
*** Bug 707911 has been marked as a duplicate of this bug. ***