| Summary: | QMF Job Server returning empty/bad strings from live jobs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Pete MacKinnon <pmackinn> |
| Component: | condor-qmf | Assignee: | Pete MacKinnon <pmackinn> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jan Sarenik <jsarenik> |
| Severity: | urgent | Priority: | urgent |
| Version: | Development | CC: | iboverma, jneedle, jsarenik, matt |
| Target Milestone: | 2.0 | Target Release: | --- |
| Hardware: | Unspecified | OS: | Linux |
| Fixed In Version: | condor-7.6.1-0.10 | Doc Type: | Bug Fix |
| Last Closed: | 2011-06-27 14:28:39 UTC | | |

Attachments:
Created attachment 501425
Fix to dupe strings when collecting job summaries
Checked in upstream at UW 7ac3ef7a9d638
format-patch from FH master diff

Created attachment 502086
Additional fix for GetStatus string deletion leading to crash

Test procedure:
1) submit a new job (either via qmf or cmd line)
2) use qpid-tool to get the submission summary while the job is still active (i.e., not COMPLETED or REMOVED) -> "call XXX GetJobSummaries"
3) confirm that the job server doesn't crash
4) confirm that the string values in the summary are correct

Verified using Cumin on RHEL5.6 x86_64 with the following packages:
condor-7.6.1-0.10.el5
condor-qmf-7.6.1-0.10.el5
cumin-0.1.4794-1.el5
qpid-cpp-server-0.10-7.el5
qpid-qmf-0.10-10.el5

I will check on i386 and RHEL6 soon, but I consider it well fixed and working already. Thanks!

qpid: list submission
Object Summary:
ID Created Destroyed Index
====================================================
475 12:58:51 - slanina.brq.redhat.com#1
476 13:01:11 - slanina.brq.redhat.com#2
qpid: call 475 GetJobSummaries
qpid: OK (0) - {u'Jobs': [{u'ProcId': 0, u'Args': '20m', u'CurrentTime': 1307106917, u'QDate': 1307105925, u'Cmd': '/bin/sleep', u'ClusterId': 1, u'JobStatus': 2, u'EnteredCurrentStatus': 1307105927, u'GlobalJobId': 'slanina.brq.redhat.com#1.0#1307105925'}]}

Verified on RHEL5.6 i386.

On RHEL6.1, both x86_64 and i386, I do not see submissions in Cumin, but that might be an error elsewhere. Otherwise qpid-tool shows everything as expected.

No, the above (comment #8) was a configuration error. Everything works on RHEL6, and both the submissions and the jobs appear in Cumin as well. Really. Have a nice weekend! :)

*** Bug 707911 has been marked as a duplicate of this bug. ***

After the valgrind memleak cleanup, it appears that we are deleting string memory before it is properly assigned to a string field in the QMF object for transmission. Discovered in condor-qmf-7.6.1-0.6.el5. This only applies to jobs that are still in the queue, not to the recorded history jobs.

Bad (note JobStatus):
{u'ProcId': 99, u'Args': '""', u'CurrentTime': 1306511177, u'QDate': 1306504702, u'Cmd': '""', u'ClusterId': 3, u'JobStatus': 1, u'EnteredCurrentStatus': 1306504703, u'GlobalJobId': '\x0b'}

Good (note JobStatus):
{u'ProcId': 0, u'Args': '$$([15 + random(31)])', u'Submission': 'slanina.brq.redhat.com#3', u'CurrentTime': 1306511177, u'QDate': 1306504702, u'Cmd': '/bin/sleep', u'ClusterId': 3, u'JobStatus': 4, u'Owner': 'test', u'EnteredCurrentStatus': 1306509511, u'GlobalJobId': 'slanina.brq.redhat.com#3.0#1306504702'}
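
Step 4 of the test procedure ("confirm that the string values in the summary are correct") could be partly automated. The following is only a rough sketch in Python, not part of the fix or of any condor or qpid tooling. The function name `suspicious_fields` is hypothetical, and the heuristics are assumptions inferred from the Bad and Good records in this report: a literal `""` or a non-printable value in `Cmd` or `Args` is treated as corrupt, and `GlobalJobId` is expected to look like `<schedd>#<cluster>.<proc>#<timestamp>` with cluster and proc agreeing with `ClusterId` and `ProcId`. In HTCondor, JobStatus 1 is Idle and 4 is Completed, which fits the observation that only jobs still in the queue are affected.

```python
import re

# Assumed shape of a healthy GlobalJobId, inferred from the examples in this
# report: "<schedd name>#<cluster>.<proc>#<timestamp>".
GLOBAL_JOB_ID = re.compile(r"^[^#\s]+#(\d+)\.(\d+)#\d+$")


def suspicious_fields(job):
    """Return names of string fields in one job summary that look corrupted.

    Heuristic only: the rules mirror the symptoms shown in this report.
    """
    bad = []
    for key in ("Cmd", "Args"):
        value = job.get(key)
        if value is None:
            continue  # field not present in this summary, nothing to check
        # Symptom from the bad record: a literal pair of double quotes,
        # or bytes that are not printable text.
        if value == '""' or not value.isprintable():
            bad.append(key)

    match = GLOBAL_JOB_ID.match(job.get("GlobalJobId", ""))
    if match is None:
        bad.append("GlobalJobId")
    else:
        ids = (int(match.group(1)), int(match.group(2)))
        if ids != (job.get("ClusterId"), job.get("ProcId")):
            bad.append("GlobalJobId")
    return bad


# Records copied from this report.
bad_job = {u'ProcId': 99, u'Args': '""', u'CurrentTime': 1306511177,
           u'QDate': 1306504702, u'Cmd': '""', u'ClusterId': 3,
           u'JobStatus': 1, u'EnteredCurrentStatus': 1306504703,
           u'GlobalJobId': '\x0b'}
good_job = {u'ProcId': 0, u'Args': '$$([15 + random(31)])',
            u'Submission': 'slanina.brq.redhat.com#3',
            u'CurrentTime': 1306511177, u'QDate': 1306504702,
            u'Cmd': '/bin/sleep', u'ClusterId': 3, u'JobStatus': 4,
            u'Owner': 'test', u'EnteredCurrentStatus': 1306509511,
            u'GlobalJobId': 'slanina.brq.redhat.com#3.0#1306504702'}

print(suspicious_fields(bad_job))   # ['Cmd', 'Args', 'GlobalJobId']
print(suspicious_fields(good_job))  # []
```

A check like this only catches the visible symptoms. It cannot prove that the underlying string lifetime problem is gone, so confirming that the job server does not crash (step 3) remains a separate check.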