Description of problem:
I was submitting VM jobs from cumin while another person was working on the pool at the same time. During our session, according to the log file, we hit several occurrences of "Exception: Object '...' is unknown":
---
3972 2012-01-18 14:18:10,557 INFO Request GET /index.update?session=index.html%3Fframe%3Dmain.grid%3Bmain.m%3Dgrid%3Bmain.grid.id%3D8%3Bmain.grid.view.body.m%3Dpool_submissions;widget=main.tasks;widget=main.grid.view.body.pool_submissions.table
3972 2012-01-18 14:18:10,581 INFO Response 200 OK
3972 2012-01-18 14:18:14,863 ERROR Object 'rhel5i-xen-4' is unknown
Traceback (most recent call last):
  File "/usr/share/cumin/python/cumin/model.py", line 729, in run
    self.store.update(cursor)
  File "/usr/share/cumin/python/cumin/model.py", line 786, in update
    self.machine_name)
  File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 230, in get_job_summaries
    return self._call(submission, "GetJobSummaries", callback, 0, 0)
  File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 236, in _call
    self.session.call_method(cb, obj, meth, args)
  File "/usr/share/cumin/python/cumin/session.py", line 109, in call_method
    raise Exception("Object '%s' is unknown" % object_id)
Exception: Object 'rhel5i-xen-4' is unknown
3972 2012-01-18 14:18:20,407 INFO Request GET /index.update?session=index.html%3Fframe%3Dmain.grid%3Bmain.m%3Dgrid%3Bmain.grid.id%3D8%3Bmain.grid.view.body.m%3Dpool_submissions;widget=main.tasks;widget=main.grid.view.body.pool_submissions.table
3972 2012-01-18 14:18:20,425 INFO Response 200 OK
---
And one "Exception: Agent disconnected":
---
3972 2012-01-18 14:07:57,155 INFO Request GET /index.update?session=index.html%3Fframe%3Dmain.grid.submission%3Bmain.m%3Dgrid%3Bmain.grid.m%3Dsubmission%3Bmain.grid.id%3D4%3Bmain.grid.view.body.m%3Dpool_submissions%3Bmain.grid.submission.m%3Dview%3Bmain.grid.submission.id%3D11;widget=main.tasks;widget=main.grid.submission.view.body.jobs.table
3972 2012-01-18 14:07:57,337 INFO Response 200 OK
3972 2012-01-18 14:08:30,155 ERROR Agent disconnected
Traceback (most recent call last):
  File "/usr/share/cumin/python/cumin/model.py", line 729, in run
    self.store.update(cursor)
  File "/usr/share/cumin/python/cumin/model.py", line 786, in update
    self.machine_name)
  File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 230, in get_job_summaries
    return self._call(submission, "GetJobSummaries", callback, 0, 0)
  File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 236, in _call
    self.session.call_method(cb, obj, meth, args)
  File "/usr/share/cumin/python/cumin/session.py", line 104, in call_method
    qmf_objs = agent.getObjects(_objectId=oid)
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 3277, in getObjects
    raise Exception(context.exception)
Exception: Agent disconnected
3972 2012-01-18 14:08:40,849 INFO Expired 0 client sessions
3972 2012-01-18 14:08:53,705 INFO Request GET /index.update?session=index.html%3Fframe%3Dmain.grid.submission%3Bmain.m%3Dgrid%3Bmain.grid.m%3Dsubmission%3Bmain.grid.id%3D4%3Bmain.grid.view.body.m%3Dpool_submissions%3Bmain.grid.submission.m%3Dview%3Bmain.grid.submission.id%3D11;widget=main.tasks;widget=main.grid.submission.view.body.jobs.table
---

Version-Release number of selected component (if applicable):
cumin-0.1.5184-1.el5.noarch

How reproducible:
10%

Steps to Reproduce:
1. Work with cumin while someone else is working with the condor pool at the same time

Actual results:
Cumin logs several exceptions/tracebacks and errors.

Expected results:
Cumin handles problems on the pool without errors.

Additional info:
Maybe related to Bug 760567.
The Cumin web interface wasn't visibly affected.
Fixed in revision 5218. All log.exception(e) calls have been changed to the form log.debug(msg, exc_info=True). This will remove all exception traces from log files unless the logging level is set to debug. None of these traces was useful to users in general, and in all cases the exception was already being handled (there were no cases of log.exception followed by a raise statement).
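The effect of this change can be sketched with Python's standard logging module. This is only an illustration, not code from the cumin source: the logger names, the make_logger helper, the update_store function and the message text are all hypothetical. Only log.exception and log.debug(msg, exc_info=True) are the calls actually named above.

```python
import io
import logging


def make_logger(level):
    """Return a logger that writes to an in-memory buffer (hypothetical helper)."""
    buf = io.StringIO()
    log = logging.getLogger("cumin.sketch.%s" % logging.getLevelName(level))
    log.handlers = []
    log.propagate = False
    log.setLevel(level)
    handler = logging.StreamHandler(buf)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    log.addHandler(handler)
    return log, buf


def update_store(log):
    """Hypothetical handled failure, logged in the post-fix form."""
    try:
        raise Exception("Object 'rhel5i-xen-4' is unknown")
    except Exception as e:
        # Old form: log.exception(e) emits an ERROR record with the traceback
        # at every normal logging level.
        # New form: a DEBUG record, so the traceback is written only when
        # debug logging is enabled.
        log.debug("Update failed: %s", e, exc_info=True)


# At the INFO level nothing is written for the handled exception.
info_log, info_buf = make_logger(logging.INFO)
update_store(info_log)
info_output = info_buf.getvalue()

# At the DEBUG level the message and the full traceback are written.
debug_log, debug_buf = make_logger(logging.DEBUG)
update_store(debug_log)
debug_output = debug_buf.getvalue()
```

With the level at INFO the buffer stays empty; with the level at DEBUG it contains the message followed by the traceback, so the trace is still available when someone needs it for diagnosis.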
IMHO, it is very difficult to test this beyond a careful code review of the diff against the previous revision. There is no way to inject exceptions at all of the affected places and exercise every execution path without editing the code, white-box style, to raise an exception in each affected block.
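For a single block, the kind of white-box check described above might look roughly like the sketch below. It assumes a modern Python with unittest.mock (not available on the Python 2.4 shown in the tracebacks), and the Store class, the RecordCollector handler and the logger name are hypothetical stand-ins rather than cumin code. Repeating this for every affected block is exactly the effort that makes a code review the more practical option.

```python
import logging
from unittest import mock

log = logging.getLogger("cumin.whitebox.sketch")


class RecordCollector(logging.Handler):
    """Collects emitted records so a test can inspect them."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append(record)


class Store(object):
    """Hypothetical stand-in for one affected block."""

    def __init__(self, session):
        self.session = session

    def update(self):
        try:
            return self.session.get_job_summaries()
        except Exception:
            # Post-fix form: handled exception logged at debug level only.
            log.debug("Update failed", exc_info=True)
            return None


def run_with_injected_failure(level):
    """Force the call to raise and return the log records that got through."""
    collector = RecordCollector()
    log.handlers = [collector]
    log.propagate = False
    log.setLevel(level)
    # Inject the failure: the mocked session raises when it is called.
    session = mock.Mock()
    session.get_job_summaries.side_effect = Exception("Agent disconnected")
    result = Store(session).update()
    return result, collector.records


info_result, info_records = run_with_injected_failure(logging.INFO)
debug_result, debug_records = run_with_injected_failure(logging.DEBUG)
```

At the INFO level no record reaches the handler, while at the DEBUG level exactly one record arrives and carries the exception information.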
As suggested in comment 3, I did a code review of the diff between revisions 5217 and 5218 of cumin. All log.exception calls have been changed to log.debug calls or to something more suitable. --> VERIFIED