Description of problem: I was playing with changing dynamic quota... I probably try to set 100% and nothing changed, so I click again to change and this error occurs. Condor setup: condor_config_val -dump > condor_config.txt (see attachment) Cumin web log: cp /var/log/cumin/web.log cumin_web.txt (see attachment) Error mesage from browser: http://s6x:45672/form.html?modes.m=NegotiatorEditDynamicQuota_66392912;modes.NegotiatorEditDynamicQuota_66392912.return=index.html%3Fframe%3Dmain.grid%3Bmain.m%3Dgrid%3Bmain.grid.id%3D1%3Bmain.grid.view.body.m%3Dquotas%3Bmain.grid.view.body.quotas.expand%3Dgroup_chemistry;modes.NegotiatorEditDynamicQuota_66392912.id=1;modes.NegotiatorEditDynamicQuota_66392912.neg=1;modes.NegotiatorEditDynamicQuota_66392912.group_leader=group_chemistry.lab1 Traceback (most recent call last): File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 1250, in communicate req.respond() File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 663, in respond self._respond() File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 669, in _respond response = self.wsgi_app(env, self.start_response) File "/usr/share/cumin/python/wooly/server.py", line 83, in service_request status, headers, content = self.service_page_request(page, env) File "/usr/share/cumin/python/wooly/server.py", line 130, in service_page_request content = page.service_error(session) File "/usr/share/cumin/python/wooly/__init__.py", line 484, in service_error return self.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/cumin/widgets.py", line 1290, in render_content writer.write(self.modes.render(session)) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/wooly/widgets.py", line 34, in render_content return mode.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 95, in render result = frag.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/cumin/stat.py", line 217, in render_img_href return super(StatFlashChart, self).render_href(session) File "/usr/share/cumin/python/cumin/stat.py", line 176, in render_href params = self.get_href_params(session) File "/usr/share/cumin/python/cumin/grid/negotiator.py", line 390, in get_href_params names = ["name=%s" % x for x in groups if info[x]["GROUP_QUOTA_DYNAMIC"] != ""] KeyError: 'GROUP_QUOTA_DYNAMIC' Version-Release number of selected component (if applicable): MRG 2.1 cumin-0.1.5098-2.el6.noarch condor-7.6.5-0.8.el6.x86_64 How reproducible: I'm not sure. Steps to Reproduce: I'm not sure. Actual results: Cumin generates error. Expected results: Cumin runs without error. Additional info:
Created attachment 541412 [details] condor config
Created attachment 541413 [details] cumin web.log errors
I'm seeing dozens of the following error in the web.log file. I'm not sure if this is indicating a config problem, or perhaps something else is going wrong. 24570 2011-12-06 14:07:44,295 ERROR CallSync: call returned with status of Unknown config: GROUP_QUOTA_DYNAMIC_group_chemistry 24570 2011-12-06 14:07:44,468 ERROR CallSync: call returned with status of Unknown config: GROUP_QUOTA_DYNAMIC_group_physics 24570 2011-12-06 15:05:02,901 ERROR CallSync: call returned with status of Unknown config: GROUP_QUOTA_group_physics.lab1 24570 2011-12-06 15:05:03,002 ERROR CallSync: call returned with status of Unknown config: GROUP_QUOTA_group_physics.lab2 24570 2011-12-06 15:05:03,155 ERROR CallSync: call returned with status of Unknown config: GROUP_QUOTA_group_physics.lab3 24570 2011-12-06 15:05:03,311 ERROR CallSync: call returned with status of Unknown config: GROUP_QUOTA_group_chemistry.lab1 24570 2011-12-06 15:05:03,481 ERROR CallSync: call returned with status of Unknown config: GROUP_QUOTA_group_chemistry.lab2 Later on in the log file, I'm seeing a few instances of this exception, which is also surprising: 24570 2011-12-06 14:52:20,808 ERROR Request timed out after 60 seconds Traceback (most recent call last): File "/usr/share/cumin/python/cumin/model.py", line 729, in run self.store.update(cursor) File "/usr/share/cumin/python/cumin/model.py", line 809, in update completion) File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 214, in get_raw_config return self._call(negotiator, "GetRawConfig", callback, 0, 0, name) File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 236, in _call self.session.call_method(cb, obj, meth, args) File "/usr/share/cumin/python/cumin/session.py", line 104, in call_method qmf_objs = agent.getObjects(_objectId=oid) File "/usr/lib64/python2.6/site-packages/qmf/console.py", line 3277, in getObjects raise Exception(context.exception) I'm not sure what to make of this yet.
Thought this might be related to making quota editing dynamic (739658), but I did my best to overdrive the system and couldn't get a stack trace. I'll try to synthesize some exceptions in some of the places above and see if I can figure out what happened. We may just need an exception handler in negotiator.py.
I was able to hit this again... Use configuration example from GUG: Hierarchical Fair Share (HFS) ----- GROUP_NAMES = group_physics, group_chemistry, group_physics.lab1, group_physics.lab2, group_physics.lab3, group_physics.lab3.team1, group_physics.lab3.team2, group_physics.lab3.team3, group_chemistry.lab1, group_chemistry.lab2 GROUP_QUOTA_DYNAMIC_group_physics = .4 GROUP_QUOTA_DYNAMIC_group_chemistry = .4 GROUP_QUOTA_DYNAMIC_group_chemistry.lab1 = .4 GROUP_QUOTA_DYNAMIC_group_chemistry.lab2 = .6 GROUP_QUOTA_DYNAMIC_group_physics.lab1= .2 GROUP_QUOTA_DYNAMIC_group_physics.lab2= .2 GROUP_QUOTA_DYNAMIC_group_physics.lab3= .6 GROUP_QUOTA_DYNAMIC_group_physics.lab3.team1 = .2 GROUP_QUOTA_DYNAMIC_group_physics.lab3.team2 = .2 GROUP_QUOTA_DYNAMIC_group_physics.lab3.team3 = .4 ----- And then stress cumin whit constant changing quota, while 'Pending' from previous change is still there. This time cumin was OK and error happend while I was changing quota in popup window on the background of original cumin window, when I change quota, window refreshed ok. ----- 12392 2012-01-13 10:39:05,853 INFO Request GET /index.update?session=index.html%3Fframe%3Dmain.gr id%3Bmain.m%3Dgrid%3Bmain.grid.id%3D1%3Bmain.grid.view.body.m%3Dquotas;widget=main.tasks;widget=m ain.grid.view.body.quotas.table 12392 2012-01-13 10:39:05,871 INFO Response 200 OK 12392 2012-01-13 10:39:09,697 ERROR Request timed out after 60 seconds Traceback (most recent call last): File "/usr/share/cumin/python/cumin/model.py", line 669, in execute remote.get_raw_config(negotiator, prepend+group, sync.get_completion()) File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 214, in get_raw_config return self._call(negotiator, "GetRawConfig", callback, 0, 0, name) File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 236, in _call self.session.call_method(cb, obj, meth, args) File "/usr/share/cumin/python/cumin/session.py", line 104, in call_method qmf_objs = agent.getObjects(_objectId=oid) File "/usr/lib64/python2.4/site-packages/qmf/console.py", line 3277, in getObjects raise Exception(context.exception) Exception: Request timed out after 60 seconds 12392 2012-01-13 10:39:09,769 ERROR Request timed out after 60 seconds Traceback (most recent call last): File "/usr/share/cumin/python/cumin/model.py", line 669, in execute remote.get_raw_config(negotiator, prepend+group, sync.get_completion()) File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 214, in get_raw_config return self._call(negotiator, "GetRawConfig", callback, 0, 0, name) File "/usr/share/cumin/python/sage/qmf/qmfoperations.py", line 236, in _call self.session.call_method(cb, obj, meth, args) File "/usr/share/cumin/python/cumin/session.py", line 104, in call_method qmf_objs = agent.getObjects(_objectId=oid) File "/usr/lib64/python2.4/site-packages/qmf/console.py", line 3277, in getObjects raise Exception(context.exception) Exception: Request timed out after 60 seconds 12392 2012-01-13 10:39:09,788 ERROR CallSync: call returned with status of Unknown config: GROUP_ QUOTA_group_physics 12392 2012-01-13 10:39:10,809 INFO Request GET /index.update?session=index.html%3Fframe%3Dmain.gr id%3Bmain.m%3Dgrid%3Bmain.grid.id%3D1%3Bmain.grid.view.body.m%3Dquotas;widget=main.tasks;widget=m ain.grid.view.body.quotas.table 12392 2012-01-13 10:39:10,827 INFO Response 200 OK ----- Version: cumin-0.1.5184-1.el5.noarch Notes: I saved broken machine for further examining. I can probably reproduce this again. Cumin wasn't patched for delay to see pending. This is original build.
When you say that you "stress" cumin while reproducing this, does that mean that you were running additioal load against it (via some automated tool)? Or were you just editing quotas several times in a row rapidly?
Cumin was running on the same machine as condor - VM guest with 512MB ram, 2 cores. I was just editing quotas several times in a row rapidly. Between my browser and guest was slow VPN. No other tasks was performed on guest during this test.
- CallSync ERROR is not regression (reproduced on older version too, moved into Bug 782359 - python exception is probably regression introduced by Bug 739658, but cannot be easily reproduced
Splitting this into multiple issues, changing the title. 1) KeyError, the original report on this BZ. Cumin needs to deal with the case of empty quota data gracefully. 2) The exception from timeout listed above, this is a separate exception 3) A BZ against the condor plugins investigating why we receive "Uknown config" messages. Assigning this to croberts, but tmckay is working on this as well...
(In reply to comment #10) > Splitting this into multiple issues, changing the title. > 3) A BZ against the condor plugins investigating why we receive "Uknown config" > messages. > This is actually https://bugzilla.redhat.com/show_bug.cgi?id=782359 as noted by sgraf above.
Update, looks like we really are dealing with only a single issue. (In reply to comment #10) > Splitting this into multiple issues, changing the title. > > 1) KeyError, the original report on this BZ. Cumin needs to deal with the case > of empty quota data gracefully. This is a real issue, still investigating > 2) The exception from timeout listed above, this is a separate exception > The timeout exception is handled gracefully and does not cause a screen trace. It is only noted in the logs, and the user is notified of the timeout. > 3) A BZ against the condor plugins investigating why we receive "Uknown config" > messages. > See the noted BZ. This is apparently due to a limitation in the data available to cumin at this time and is expected. The errors in the log file are troublesome, but expected. > Assigning this to croberts, but tmckay is working on this as well...
Possible that this is another manifestation of empty group data? Got this trace after repeatedly modifying grid.one quota Traceback (most recent call last): File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 1250, in communicate req.respond() File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 663, in respond self._respond() File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 669, in _respond response = self.wsgi_app(env, self.start_response) File "/usr/share/cumin/python/wooly/server.py", line 83, in service_request status, headers, content = self.service_page_request(page, env) File "/usr/share/cumin/python/wooly/server.py", line 130, in service_page_request content = page.service_error(session) File "/usr/share/cumin/python/wooly/__init__.py", line 484, in service_error return self.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/cumin/widgets.py", line 1290, in render_content writer.write(self.modes.render(session)) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/wooly/widgets.py", line 34, in render_content return mode.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 95, in render result = frag.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 314, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 328, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/cumin/stat.py", line 217, in render_img_href return super(StatFlashChart, self).render_href(session) File "/usr/share/cumin/python/cumin/stat.py", line 176, in render_href params = self.get_href_params(session) File "/usr/share/cumin/python/cumin/grid/negotiator.py", line 390, in get_href_params names = ["name=%s" % x for x in groups if info[x]["GROUP_QUOTA_DYNAMIC"] != ""] KeyError: 'GROUP_QUOTA_DYNAMIC'
This seems to be very intermittent. Tried again, and could not reproduce any abnormal behavior. Previously, I was able to see the error multiple times over a few hours. This may have an environmental component. Recommend patching cumin in general to handle empty group data with a notice to the user and a retry to get the config data.
Changing to public.
Fixed in revision 5379 on trunk.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause: Occasionally the group information could be None when the call to the Negotiator was made, the cumin code in place was not able to handle that. Consequence: Stack trace on screen Fix: The fix was a natural fallout of switching to the non-flash-based charting. The new mechanism has a try/except block that would log an error if this condition ever came up again, but should not show a stack trace on screen. Result: Quota editing should be free of stack traces on screen again.
I have accidentally found a very similar bug: Steps to reproduce: 1. Go to Quotas page 2. Let the page expire 3. After it expired, click on a displayed dynamic quota value to edit it 4. Redirected to login page (url=http://<ip>:45672/login.html?origin=form.html%3Fmodes.m%3DNegotiatorEditDynamicQuota_174437580%3Bmodes.NegotiatorEditDynamicQuota_174437580.return%3Dindex.html%253Fframe%253Dmain.grid%253Bmain.grid.id%253D12%253Bmain.grid.view.body.m%253Dquotas%3Bmodes.NegotiatorEditDynamicQuota_174437580.id%3D12%3Bmodes.NegotiatorEditDynamicQuota_174437580.neg%3D29%3Bmodes.NegotiatorEditDynamicQuota_174437580.group_leader%3Dgroup_chemistry) 5. After logging in we get to page showing the traceback (url=http://<ip>:45672/form.html?modes.m=NegotiatorEditDynamicQuota_174437580;modes.NegotiatorEditDynamicQuota_174437580.return=index.html%3Fframe%3Dmain.grid%3Bmain.grid.id%3D12%3Bmain.grid.view.body.m%3Dquotas;modes.NegotiatorEditDynamicQuota_174437580.id=12;modes.NegotiatorEditDynamicQuota_174437580.neg=29;modes.NegotiatorEditDynamicQuota_174437580.group_leader=group_chemistry) Traceback (most recent call last): File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 1250, in communicate req.respond() File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 663, in respond self._respond() File "/usr/share/cumin/python/wooly/wsgiserver/__init__.py", line 669, in _respond response = self.wsgi_app(env, self.start_response) File "/usr/share/cumin/python/wooly/server.py", line 104, in service_request status, headers, content = self.service_page_request(page, env) File "/usr/share/cumin/python/wooly/server.py", line 170, in service_page_request content = page.service_error(session) File "/usr/share/cumin/python/wooly/__init__.py", line 593, in service_error return self.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 348, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 362, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/cumin/widgets.py", line 1423, in render_content writer.write(self.modes.render(session)) File "/usr/share/cumin/python/wooly/__init__.py", line 348, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 362, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/wooly/widgets.py", line 34, in render_content return mode.render(session) File "/usr/share/cumin/python/wooly/__init__.py", line 348, in render string = self.do_render(session, *args) File "/usr/share/cumin/python/wooly/__init__.py", line 362, in do_render self.__main_tmpl.render(writer, session, *args) File "/usr/share/cumin/python/wooly/template.py", line 90, in render result = frag(self.__object, session, *args) File "/usr/share/cumin/python/cumin/grid/negotiator.py", line 439, in render_subshares_title subshares = self.group_helper.get_children(session, self.group_leader.get(session)) File "/usr/share/cumin/python/cumin/grid/negotiator.py", line 134, in get_children info[group]['children'] = list() KeyError: u'group_chemistry' (6. When I reloaded that page, I got to the "Edit Dynamic Group Quota" page showing an empty chart) I could reproduce this 100% on RHEL 5/6 i386/x86_64, with cumin-0.1.5658-1.el5 I could not decide whether this is the same bug or a different one. The traceback is similar, but not exactly the same, and the steps to reproduce are different as well, but the root cause might be the same thing. So first I just wanted to add this comment, so that we can decide whether to open a new bz or set this one back to assigned.
1. I have opened bug 908266 for the issue mentioned in comment 27. 2. Regarding this bug: I could not reproduce it and I don't know how I could verify that this is no longer an issue. Can you give us any suggestions or hints on testing?
Okay, here is the best I can do. It is a white box test using a common patch for cumin-0.1.5444-3 and the latest version. The patch forces GROUP_QUOTA_DYNAMIC to be deleted for a specific quota subshare in a routine that is common to both versions. This routine is called immediately before line 390 in negotiator.py which appears in the traceback in comment 1 and comment 13. The necessary corresponding quota config file is attached as well to set up the expected quotas in condor. Under cumin-0.1.5444-3, this will cause the KeyError 100% when the subshare or one of its siblings is edited (GRID.ONE or GRID.TWO). The traceback in comment 1 and comment 13 will be seen (or something very, very close as this was produced against an earlier cumin) In the latest version, editing will proceed without an error. In both cases, web.stdout will contain this message to confirm that the quota values were modified programmatically: Delete config GROUP_QUOTA_DYNAMIC for group coupling.normal Steps to reproduce on old cumin: 1) Install cumin-0.1.5444-3 and condor on the same machine 2) cd /usr/share/cumin 3) patch -p1 < BZ760567.patch 4) put the attached quota config file in /etc/condor/config.d/80quota.config to create the quotas. 5) service condor restart 6) service cumin restart 7) Navigate to the quota tab. 8) Notice that any of the top level groups can be edited 9) Expand the Grid group. Click on the link to edit Grid.One or Grid.Two 10) Boom! traceback Test against new Cumin: 1) Install latest cumin for testing 2) cd /usr/share/cumin 3) patch -p1 < BZ760567.patch 4) put the attached quota config file in /etc/condor/config.d/80quota.config to create the quotas. 5) service condor restart 6) service cumin restart 7) Navigate to the quota tab. 8) Notice that any of the top level groups can be edited 9) Expand the Grid group. Click on the link to edit Grid.One or Grid.Two 10) Everything is fine! 11) Examine web.stderr to confirm that the quota group was deleted. Steps to verify on both versions that the patch does not have unintended side effects beyond the test: 1) Set up either/both versions as above 2) Modify the /etc/condor/config.d/80quota.config file to set up dynamic quota groups with subshares but *CHANGE* the GRID group names to something else so the deletion code will not run 3) service condor restart 4) Edit dynamic quotas in cumin. Unless the original BZ is inadvertently stumbled upon in the early version, editing should be successful with no errors. Even if the error does occur on the older version, it should not be 100% reproducible as it is when the quota names match the patch. The error should *never* occur on the newer version. If a traceback is seen, please attach it as it is likely to be a new error.
Created attachment 694684 [details] Patch to introduce an intentional whitebox test error This patch is common to cumin-0.1.5444-3 and the latest versions under test
Created attachment 694685 [details] Quota config file to place in /etc/condor/config.d to set up test
Thank you very much for the whitebox test. Using the patch I was able to reproduce the bug on cumin-0.1.5444-3, the traceback was almost identical to the original one. After changing the group names it was working fine, so the patch did not have unintended side effects. For the new version I tested cumin-0.1.5675-1 on RHEL 5/6 i386/x86_64, everything was working as expected, I did not see any tracebacks, and the patch did not have unintended side effects. --> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0564.html