Description of problem:
If a node is killed while a service is in the "stopping" state, the service remains in that state even after the node comes back online. This happens only when the service belongs to a restricted failover domain that prevents it from being relocated elsewhere in the cluster. Without this restriction, the service is correctly restarted on another node.

Version-Release number of selected component (if applicable):
rgmanager-1.9.68-1.0.1

How reproducible:
100%

Steps to Reproduce:
1. Configure a cluster with a service that takes some time to start/stop (e.g. the attached service script takes 30s to start or stop).
2. Configure the service as a member of a restricted failover domain that allows it to run on only a single node.
3. While the service is running on that node, disable it, and then kill power to the node while the service is still shutting down.

Actual results:
The service stays in the "stopping" state forever, even after the node is powered up again.

Expected results:
The service restarts automatically once the killed node is back online.

Additional info:
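To make the failure mode concrete, here is a deliberately simplified Python model of the scenario above. It is a sketch under stated assumptions, not rgmanager's implementation: the class, method, and state names are hypothetical, and the `fixed` flag only models the expected behaviour described in this report, not the actual upstream fix. The key point it illustrates is that when the owner dies and no other allowed node is online, a service caught in "stopping" must be recorded as stopped, otherwise nothing ever restarts it when the node returns.

```python
from enum import Enum


class State(Enum):
    STARTED = "started"
    STOPPING = "stopping"
    STOPPED = "stopped"


class ToyCluster:
    """Hypothetical toy model of one clustered service; NOT rgmanager code."""

    def __init__(self, nodes, failover_domain, restricted):
        self.online = set(nodes)
        self.domain = set(failover_domain)  # members of the failover domain
        self.restricted = restricted        # restricted: only domain members may run it
        self.state = State.STARTED
        self.owner = None

    def candidates(self):
        # Online nodes that are allowed to run the service.
        allowed = self.domain if self.restricted else self.online
        return sorted(self.online & allowed)

    def start(self):
        nodes = self.candidates()
        if nodes:
            self.owner, self.state = nodes[0], State.STARTED

    def node_failed(self, node, fixed):
        self.online.discard(node)
        if self.owner != node:
            return
        nodes = self.candidates()
        if nodes:
            # Another allowed node is up: restart the service there.
            self.owner, self.state = nodes[0], State.STARTED
        elif fixed or self.state is not State.STOPPING:
            # Expected behaviour: nobody can run it, so record it as stopped.
            self.owner, self.state = None, State.STOPPED
        # Modelled bug (fixed=False): a STOPPING service with no candidate
        # node is left untouched, so it stays "stopping" forever.

    def node_joined(self, node):
        self.online.add(node)
        if self.state is State.STOPPED:
            self.start()


def run(restricted, fixed):
    c = ToyCluster(["node1", "node2"], failover_domain=["node1"], restricted=restricted)
    c.owner, c.state = "node1", State.STOPPING  # service is shutting down (step 3)
    c.node_failed("node1", fixed)               # power is killed mid-stop
    c.node_joined("node1")                      # node comes back online
    return c.state.value, c.owner


print(run(restricted=True, fixed=False))   # ('stopping', 'node1')
print(run(restricted=True, fixed=True))    # ('started', 'node1')
print(run(restricted=False, fixed=False))  # ('started', 'node2')
```

The third call mirrors the observation that an unrestricted service is simply restarted on another node; only the restricted, no-candidate case gets stuck in the modelled bug.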
Created attachment 291124: cluster.conf for a two-node cluster exhibiting this problem
Created attachment 291125: test service script that is slow to start/stop
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
A workaround exists: disable the service.
This occurs only if there is no node capable of running the service, and it is effectively the same bug as bug 435466.
https://bugzilla.redhat.com/show_bug.cgi?id=435466 (RHEL5 bug)
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=63dc599a411ca2738ef7b12b08c0e05b7093fbf1
An advisory has been issued which should address the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1048.html