Description of problem:

--------
Symptom
--------
After creating or deleting a service and propagating the updated configuration, the Cluster Management window is not refreshed automatically with the changed configuration. This symptom occurs after a traceback is caused by trying to enable, disable, or restart a service. The traceback occurs intermittently. See the traceback output in "Additional info".

The display *does* get refreshed after changing the state of a service (for example, enabling, disabling, or restarting a service). However, sometimes the node information in the "Members" display vanishes and cannot be refreshed.

-----------
Workaround
-----------
Restart system-config-cluster or run clustat to view status.

Version-Release number of selected component (if applicable):
system-config-cluster 1.0.12

How reproducible:

Steps to Reproduce:
1. At the Cluster Configuration window, create or delete a service. Save and propagate the configuration file.
2. At the Cluster Management window, observe that the display reflects the changes resulting from creating or deleting a service.
3. At the Cluster Management window, and while observing the command line, change the state of a service (for example, disable and enable a service) several times until a traceback occurs.
4. At the Cluster Configuration window, create or delete a service. Save and propagate the configuration file.
5. At the Cluster Management window, observe that the changes made in the previous step are not refreshed. In addition, sometimes the nodes are not displayed in the "Members" display box.
6. At the Cluster Management window, change the state of a service (for example, disable, enable, or restart a service).
7. Observe that the display gets refreshed after changing the state of a service in step 6. However, if the node display was lost (in step 5), the node display does not get refreshed. It remains blank.
Actual results:

Expected results:

Additional info:

-----------------
Traceback output
-----------------
In this example, the traceback happened after enabling and disabling the service "Mr. Slate" twice. Other times it may take more state changes.

Member tng3-3 trying to enable Mr. Slate...failed
Member tng3-3 disabling Mr. Slate...success
Member tng3-3 trying to enable Mr. Slate...failed
Member tng3-3 disabling Mr. Slate...success
rhpl.executil waitpid: No child processes
Traceback (most recent call last):
  File "/usr/share/system-config-cluster/MgmtTab.py", line 232, in onTimer
    self.prep_tree()
  File "/usr/share/system-config-cluster/MgmtTab.py", line 182, in prep_tree
    nodes = self.command_handler.getNodesInfo(self.model_builder.getLockType())
  File "/usr/share/system-config-cluster/CommandHandler.py", line 253, in getNodesInfo
    out,err,res = rhpl.executil.execWithCaptureErrorStatus("/sbin/cman_tool",args)
  File "/usr/lib/python2.3/site-packages/rhpl/executil.py", line 267, in execWithCaptureErrorStatus
    if os.WIFEXITED(status) and (os.WEXITSTATUS(status) == 0):
UnboundLocalError: local variable 'status' referenced before assignment
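The combination of "waitpid: No child processes" and the UnboundLocalError suggests that `status` is only assigned when `os.waitpid` succeeds, and that the ECHILD path falls through to the `os.WIFEXITED(status)` check. The following is a minimal sketch of one way to guard against that, not the actual rhpl code; `wait_for_child` and `demo` are hypothetical names used only for illustration.

```python
import errno
import os


def wait_for_child(pid):
    """Return the child's exit code, or None if it could not be collected."""
    status = None  # bind the name up front so later checks never hit an unbound local
    try:
        _, status = os.waitpid(pid, 0)
    except OSError as exc:
        if exc.errno != errno.ECHILD:
            raise
        # ECHILD: the child was already reaped elsewhere (for example by a
        # SIGCHLD handler), so there is no status to inspect.
    if status is not None and os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None


def demo():
    """Fork a child that exits with code 3, then wait on it twice."""
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: exit immediately without running cleanup handlers
    first = wait_for_child(pid)   # collects the real exit status
    second = wait_for_child(pid)  # child is gone, so waitpid raises ECHILD
    return first, second
```

Returning None for the already-reaped case is an assumption made for the sketch; a caller could equally treat it as a failure.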
Fixed in Errata Candidate
I still hit this traceback a couple of times today while playing around with starting, stopping, and moving services around. I can't pinpoint exactly what causes it, though, because it usually works.

Spurious OS signal error in waitpid attempt
Traceback (most recent call last):
  File "/usr/share/system-config-cluster/MgmtTab.py", line 232, in onTimer
    self.prep_tree()
  File "/usr/share/system-config-cluster/MgmtTab.py", line 182, in prep_tree
    nodes = self.command_handler.getNodesInfo(self.model_builder.getLockType())
  File "/usr/share/system-config-cluster/CommandHandler.py", line 252, in getNodesInfo
    out,err,res = executil.execWithCaptureErrorStatus("/sbin/cman_tool",args)
  File "/usr/share/system-config-cluster/executil.py", line 20, in execWithCaptureErrorStatus
    return __execWithCaptureErrorStatus(BASH_PATH, [BASH_PATH, '-c', command])
  File "/usr/share/system-config-cluster/executil.py", line 91, in __execWithCaptureErrorStatus
    (pid, status) = os.waitpid(childpid, 0)
  File "/usr/share/system-config-cluster/ForkedCommand.py", line 133, in serviceSignalHandler
    if(reaped == EXT_PID):
UnboundLocalError: local variable 'reaped' referenced before assignment
Fixed in 1.0.16
Not sure if this is the exact same bug, but I was running the same scenario (playing around with services, starting and stopping them) and I hit this similar traceback:

Traceback (most recent call last):
  File "/usr/share/system-config-cluster/MgmtTab.py", line 234, in onTimer
    self.prep_service_tree()
  File "/usr/share/system-config-cluster/MgmtTab.py", line 205, in prep_service_tree
    services = self.command_handler.getServicesInfo()
  File "/usr/share/system-config-cluster/CommandHandler.py", line 327, in getServicesInfo
    out,err,res = executil.execWithCaptureErrorStatus(clustat_path,args)
  File "/usr/share/system-config-cluster/executil.py", line 20, in execWithCaptureErrorStatus
    return __execWithCaptureErrorStatus(BASH_PATH, [BASH_PATH, '-c', command])
  File "/usr/share/system-config-cluster/executil.py", line 72, in __execWithCaptureErrorStatus
    i,o,e = select.select(in_list, [], [])
select.error: (4, 'Interrupted system call')

If this is a different bug, let me know and I'll close this one and open a new one.
This traceback is a different bug, but since it also originates from the code that executes shell commands, it fits under the same umbrella.
Shouldn't Python's select.select() take care of EINTR and re-enter the select() syscall? See the traceback in comment #4.

Adding misa to CC.

I can check for EINTR in s-c-cluster; just wondering if an exception on EINTR is expected behavior.
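For reference, checking for EINTR on the caller's side could look roughly like the sketch below. It is not the fix that went into s-c-cluster. `select_retry` is a hypothetical helper, and its `_select` parameter exists only so the retry logic can be exercised without real signals. Since Python 3.5 (PEP 475) the interpreter restarts an interrupted select.select() itself; on older interpreters, such as the Python 2.3 seen in the tracebacks here, the call raises select.error and the caller has to retry.

```python
import errno
import select


def select_retry(rlist, wlist, xlist, timeout=None, _select=select.select):
    """Call select() and restart it whenever a signal interrupts the call."""
    while True:
        try:
            return _select(rlist, wlist, xlist, timeout)
        except (OSError, select.error) as exc:
            # The first exception argument is the errno, as in
            # select.error: (4, 'Interrupted system call').
            if exc.args[0] != errno.EINTR:
                raise
            # EINTR: a signal (for example SIGCHLD) arrived mid-call; loop
            # and try again. Note that the full timeout is restarted.
```

A caller would then replace `select.select(in_list, [], [])` with `select_retry(in_list, [], [])`.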
FYI: hit this again today while attempting to stop a running service.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-753.html
Why was this bug closed? There was never a final "fixed" message. Plus, I just hit this bug again.
Fixed in U3 errata build...
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0198.html