Bug 159965

Summary: Cluster Management window does not refresh after configuration change and traceback
Product: [Retired] Red Hat Cluster Suite
Reporter: Paul Kennedy <pkennedy>
Component: redhat-config-cluster
Assignee: Jim Parsons <jparsons>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 4
CC: adstrong, cluster-maint, jha, kupcevic, mihai.ibanescu
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2006-0198
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-03-09 19:49:52 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Paul Kennedy 2005-06-09 19:30:58 UTC
Description of problem:

--------
Symptom
--------

After creating or deleting a service and propagating the updated configuration,
the Cluster Management window does not get refreshed automatically with the
changed configuration. This symptom occurs after a traceback is caused by trying
to enable, disable, or restart a service. The traceback occurs intermittently.
See traceback output in "Additional info".

The display *does* get refreshed after changing the state of a service (for
example, enabling, disabling, or restarting a service). However, sometimes the
node information in the "Members" display vanishes and cannot be refreshed.

-----------
Workaround
-----------
Restart system-config-cluster or run clustat to view status.

Version-Release number of selected component (if applicable):
system-config-cluster 1.0.12

How reproducible:


Steps to Reproduce:
1.  At the Cluster Configuration window, create or delete a
    service. Save and propagate the configuration file.
2.  At the Cluster Management window, observe that the display reflects 
    the changes resulting from creating or deleting a service.
3.  At the Cluster Management window, and while observing the command line,
    change the state of a service (for example, disable and enable a service)
    several times until a traceback occurs.
4.  At the Cluster Configuration window, create or delete a service. Save
    and propagate the configuration file.
5.  At the Cluster Management window, observe that the changes made in
    the previous step are not refreshed. In addition, sometimes the nodes
    are not displayed in the "Members" display box. 
6.  At the Cluster Management window, change the state of a service (for
    example, disable, enable, or restart a service).
7.  Observe that the display gets refreshed after changing the state of a
    service in step 6. However, if the node display was lost (in step 5), the
    node display does not get refreshed. It remains blank.

  
Actual results:


Expected results:


Additional info:

-----------------
Traceback output 
-----------------
In this example, the traceback happened after enabling and disabling the service
"Mr. Slate" twice. Other occurrences may require more state changes.

Member tng3-3 trying to enable Mr. Slate...failed
Member tng3-3 disabling Mr. Slate...success
Member tng3-3 trying to enable Mr. Slate...failed
Member tng3-3 disabling Mr. Slate...success
rhpl.executil waitpid: No child processes
Traceback (most recent call last):
  File "/usr/share/system-config-cluster/MgmtTab.py", line 232, in onTimer
    self.prep_tree()
  File "/usr/share/system-config-cluster/MgmtTab.py", line 182, in prep_tree
    nodes = self.command_handler.getNodesInfo(self.model_builder.getLockType())
  File "/usr/share/system-config-cluster/CommandHandler.py", line 253, in getNodesInfo
    out,err,res =  rhpl.executil.execWithCaptureErrorStatus("/sbin/cman_tool",args)
  File "/usr/lib/python2.3/site-packages/rhpl/executil.py", line 267, in execWithCaptureErrorStatus
    if os.WIFEXITED(status) and (os.WEXITSTATUS(status) == 0):
UnboundLocalError: local variable 'status' referenced before assignment
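
The last frame appears to show a common pattern: `status` is bound only when `waitpid` succeeds, so when the call fails with "No child processes" (ECHILD) and the error is merely printed, the later `os.WIFEXITED(status)` check hits an unbound name. The sketch below is not the rhpl code. `wait_for_child` is a hypothetical helper, written in Python 3 syntax, that shows one way to keep the name bound and to treat an already-reaped child as a distinct case.

```python
import os


def wait_for_child(childpid):
    """Return the child's exit code, or None if no exit code is available.

    Hypothetical helper for illustration; not part of rhpl or
    system-config-cluster.
    """
    status = None  # bind up front so later checks never see an unbound name
    try:
        _, status = os.waitpid(childpid, 0)
    except ChildProcessError:
        # ECHILD: the child was already reaped elsewhere, for example by a
        # SIGCHLD handler, so its exit status cannot be recovered here.
        return None
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # The child was terminated by a signal rather than exiting normally.
    return None
```

A caller can then tell "exit status unknown" (`None`) apart from a real exit code instead of raising from inside the timer callback.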

Comment 1 Stanko Kupcevic 2005-08-03 20:20:06 UTC
Fixed in Errata Candidate

Comment 2 Corey Marthaler 2005-09-07 19:03:44 UTC
I have still hit this assert a couple of times today while playing around with
starting, stopping, and moving around services. I can't pinpoint what exactly
causes this, though, because it usually works.

Spurious OS signal error in waitpid attempt
Traceback (most recent call last):
  File "/usr/share/system-config-cluster/MgmtTab.py", line 232, in onTimer
    self.prep_tree()
  File "/usr/share/system-config-cluster/MgmtTab.py", line 182, in prep_tree
    nodes = self.command_handler.getNodesInfo(self.model_builder.getLockType())
  File "/usr/share/system-config-cluster/CommandHandler.py", line 252, in getNodesInfo
    out,err,res =  executil.execWithCaptureErrorStatus("/sbin/cman_tool",args)
  File "/usr/share/system-config-cluster/executil.py", line 20, in execWithCaptureErrorStatus
    return __execWithCaptureErrorStatus(BASH_PATH, [BASH_PATH, '-c', command])
  File "/usr/share/system-config-cluster/executil.py", line 91, in __execWithCaptureErrorStatus
    (pid, status) = os.waitpid(childpid, 0)
  File "/usr/share/system-config-cluster/ForkedCommand.py", line 133, in serviceSignalHandler
    if(reaped == EXT_PID):
UnboundLocalError: local variable 'reaped' referenced before assignment

Comment 3 Stanko Kupcevic 2005-09-07 23:35:43 UTC
Fixed in 1.0.16

Comment 4 Corey Marthaler 2005-09-16 20:56:55 UTC
Not sure if this is the exact same bug, but I was doing the same scenario (playing
around with services; starting and stopping...) and I hit this similar traceback:

Traceback (most recent call last):
  File "/usr/share/system-config-cluster/MgmtTab.py", line 234, in onTimer
    self.prep_service_tree()
  File "/usr/share/system-config-cluster/MgmtTab.py", line 205, in prep_service_tree
    services = self.command_handler.getServicesInfo()
  File "/usr/share/system-config-cluster/CommandHandler.py", line 327, in getServicesInfo
    out,err,res =  executil.execWithCaptureErrorStatus(clustat_path,args)
  File "/usr/share/system-config-cluster/executil.py", line 20, in execWithCaptureErrorStatus
    return __execWithCaptureErrorStatus(BASH_PATH, [BASH_PATH, '-c', command])
  File "/usr/share/system-config-cluster/executil.py", line 72, in __execWithCaptureErrorStatus
    i,o,e = select.select(in_list, [], [])
select.error: (4, 'Interrupted system call')


If this is a different bug, let me know and I'll close this one and open a new one. 

Comment 5 Stanko Kupcevic 2005-09-19 16:37:51 UTC
This traceback is a different bug, but since it also originates from the code
that executes shell commands, it fits under the same umbrella. 

Comment 6 Stanko Kupcevic 2005-09-19 17:07:19 UTC
Shouldn't Python's select.select() take care of EINTR and re-enter the select()
syscall?

See traceback in comment #4

Adding misa to CC

I can check for EINTR in s-c-cluster; I'm just wondering whether an exception on
EINTR is expected behavior.
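
On the question above: in Python 2, `select.select()` does not restart after a signal. It raises `select.error` with errno 4 (EINTR), as the traceback in comment #4 shows. Automatic retry only arrived in Python 3.5 (PEP 475), so a caller on an older interpreter has to loop itself. The following is a minimal sketch of such a check in Python 3 syntax; `select_retry` and its `select_fn` parameter are hypothetical names, not part of system-config-cluster.

```python
import errno
import select


def select_retry(rlist, wlist, xlist, timeout=None, select_fn=select.select):
    """Call select(), retrying when a signal interrupts it (EINTR).

    Hypothetical wrapper for illustration. select_fn exists only so the
    retry loop can be exercised without delivering a real signal.
    """
    while True:
        try:
            return select_fn(rlist, wlist, xlist, timeout)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # a real failure, not just an interrupted call
            # Interrupted by a signal: loop and call select() again.
            # Note: the full timeout is reused on each retry.
```

On Python 2 the exception to catch is `select.error`, which is not an `OSError` there; the errno is the first element of its arguments.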


Comment 7 Corey Marthaler 2005-10-05 16:47:31 UTC
FYI: hit this again today while attempting to stop a running service.

Comment 8 Red Hat Bugzilla 2005-10-07 16:47:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-753.html


Comment 9 Corey Marthaler 2005-10-20 19:08:19 UTC
Why was this bug closed? There was never a final "fixed" message. 
Plus, I just hit this bug again.

Comment 10 Jim Parsons 2005-12-01 21:27:47 UTC
Fixed in U3 errata build...

Comment 13 Red Hat Bugzilla 2006-03-09 19:49:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0198.html