After the valgrind memleak cleanup, it would seem that we are deleting string memory before it gets properly assigned to a string field in the QMF object for transmission.

Discovered in condor-qmf-7.6.1-0.6.el5. Only applies to jobs that are still in the queue, not to the recorded history jobs.

Bad (note JobStatus):
{u'ProcId': 99, u'Args': '""', u'CurrentTime': 1306511177, u'QDate': 1306504702, u'Cmd': '""', u'ClusterId': 3, u'JobStatus': 1, u'EnteredCurrentStatus': 1306504703, u'GlobalJobId': '\x0b'}

Good (note JobStatus):
{u'ProcId': 0, u'Args': '$$([15 + random(31)])', u'Submission': 'slanina.brq.redhat.com#3', u'CurrentTime': 1306511177, u'QDate': 1306504702, u'Cmd': '/bin/sleep', u'ClusterId': 3, u'JobStatus': 4, u'Owner': 'test', u'EnteredCurrentStatus': 1306509511, u'GlobalJobId': 'slanina.brq.redhat.com#3.0#1306504702'}
Created attachment 501425 [details]
Fix to dupe strings when collecting job summaries

Checked in upstream at UW 7ac3ef7a9d638
format-patch from FH master diff
Created attachment 502086 [details]
Additional fix for GetStatus string deletion leading to crash
Test procedure:
1) submit a new job (either via qmf or cmd line)
2) use qpid-tool to get the submission summary while the job is still active (i.e., not COMPLETED or REMOVED) -> "call XXX GetJobSummaries"
3) confirm that the job server doesn't crash
4) confirm that the string values in the summary are correct
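Step 4 of the procedure can be partly automated. The following is only a sketch, not part of the fix: the helper name summary_problems is hypothetical, and the GlobalJobId layout (<schedd name>#<ClusterId>.<ProcId>#<QDate>) is an assumption inferred from the good records quoted in this report. Treating an empty Cmd as suspicious is likewise a heuristic based on the bad record above.

```python
import re

# Assumed layout, inferred from the good records in this report:
#   <schedd name>#<ClusterId>.<ProcId>#<QDate>
GLOBAL_JOB_ID = re.compile(
    r"(?P<schedd>[^#]+)#(?P<cluster>\d+)\.(?P<proc>\d+)#(?P<qdate>\d+)"
)


def summary_problems(job):
    """Return a list of problems found in one job summary dict (hypothetical helper)."""
    problems = []

    gid = job.get("GlobalJobId", "")
    match = GLOBAL_JOB_ID.fullmatch(gid)
    if match is None:
        problems.append("GlobalJobId is malformed: %r" % (gid,))
    else:
        # The numeric parts of the id should agree with the integer fields.
        for field, group in (("ClusterId", "cluster"),
                             ("ProcId", "proc"),
                             ("QDate", "qdate")):
            if int(match.group(group)) != job.get(field):
                problems.append("GlobalJobId disagrees with %s" % field)

    # Heuristic: the corrupted record in this report carried an empty quoted Cmd.
    if job.get("Cmd") in (None, "", '""'):
        problems.append("Cmd is empty")

    return problems
```

Running it over each entry of the 'Jobs' list returned by GetJobSummaries should give an empty list for every job once the fix is in place. It flags symptoms like those in the bad record; it does not prove the strings are correct in general.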
Verified using Cumin on RHEL5.6 x86_64 with the following packages:
condor-7.6.1-0.10.el5
condor-qmf-7.6.1-0.10.el5
cumin-0.1.4794-1.el5
qpid-cpp-server-0.10-7.el5
qpid-qmf-0.10-10.el5

Will check on i386 and RHEL6 soon, but I consider it fixed and working already. Thanks!
qpid: list submission
Object Summary:
    ID   Created   Destroyed  Index
    ====================================================
    475  12:58:51  -          slanina.brq.redhat.com#1
    476  13:01:11  -          slanina.brq.redhat.com#2
qpid: call 475 GetJobSummaries
qpid: OK (0) - {u'Jobs': [{u'ProcId': 0, u'Args': '20m', u'CurrentTime': 1307106917, u'QDate': 1307105925, u'Cmd': '/bin/sleep', u'ClusterId': 1, u'JobStatus': 2, u'EnteredCurrentStatus': 1307105927, u'GlobalJobId': 'slanina.brq.redhat.com#1.0#1307105925'}]}
Verified on RHEL5.6 i386
On RHEL6.1, on both x86_64 and i386, I do not see submissions in Cumin, but that might be an error elsewhere. Otherwise qpid-tool shows everything as expected.
No, the problem above (comment #8) was a configuration error. Everything works on RHEL6, and both the submissions and the jobs appear in Cumin as well. Really. Have a nice weekend! :)
*** Bug 707911 has been marked as a duplicate of this bug. ***