Description of problem:
In a two-node cluster setup, clumembd frequently hangs and never times out when clustering is stopped with "service clumanager stop" or when the server is shut down. As a result, clumanager does not stop on the system on which this happens. The services fail over to the passive system, but on the affected system we have to kill the process manually before the system can shut down or clumanager can stop.

Version-Release number of selected component (if applicable):
clumanager-1.2.9-1

How reproducible:
Very frequently, but not always.

Steps to Reproduce:
1. Set up clustering with two nodes, one active and one passive.
2. Stop clustering on the active system using "service clumanager stop" or through the redhat-config-cluster GUI.
3. Run ps -ef.
4. Three clumembd threads are waiting on something and never stop or time out; consequently the clumanager stop hangs.

Actual results:
root 9876 1 0 10:38 ? 00:00:00 /usr/sbin/clumembd
root 9877 9876 0 10:38 ? 00:00:00 /usr/sbin/clumembd
root 9878 9877 0 10:38 ? 00:00:00 /usr/sbin/clumembd
root 29812 14862 0 11:28 pts/4 00:00:00 /bin/sh /sbin/service clumanager stop
root 29815 29812 0 11:28 pts/4 00:00:00 /bin/sh /etc/init.d/clumanager stop

Expected results:
The clumembd process stops cleanly and clustering shuts down.

Additional info:
strace output for the clumembd processes:
# strace -p 9878
Process 9878 attached - interrupt to quit
# strace -p 9877
Process 9877 attached - interrupt to quit
waitpid(9878,
# strace -p 9876
Process 9876 attached - interrupt to quit
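
The manual workaround described above (stop, notice the hang, kill by hand) could be scripted roughly as follows. This is only a sketch and not part of clumanager: run_with_timeout is a hypothetical helper, the 60-second limit in the usage comment is arbitrary, and the use of killall assumes the psmisc package is installed.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with clumanager): run a command,
# but give up after LIMIT seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_timeout() {
    limit=$1
    shift
    "$@" &                      # start the command in the background
    cmd_pid=$!
    waited=0
    # Poll once per second while the command is still running.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -KILL "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$cmd_pid"             # collect and return the real exit status
}

# Assumed usage (60 seconds is an arbitrary choice):
#   if ! run_with_timeout 60 service clumanager stop; then
#       killall -KILL clumembd    # same manual kill as in the report
#   fi
```

Note that on timeout the helper kills only the process it started (here the "service" wrapper), not the init script that wrapper spawned, so a leftover "/etc/init.d/clumanager stop" may still need to be killed separately.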
This works for me (that is, I can't reproduce it). You'll need to visit: http://www.redhat.com/apps/support/ They'll assist you in gathering all the requisite information. Additionally, you may want to try downloading 1.2.12 from RHN.
I think this is related to: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126316
*** Bug 126316 has been marked as a duplicate of this bug. ***
This problem has not manifested for me. Marking Resolved, will go out with RHEL3-U4, clumanager-1.2.22-2.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-491.html
Fixing product name. Clumanager on RHEL3 was part of RHCS3, not RHEL3.