Description of problem:
Currently, when a user runs 'service vdsmd stop' on a host that happens to be the SPM, the whole Data Center becomes unmanageable, because the SPM role is not released until the service is restarted. While this is by no means a desired scenario, it can happen, and handling it gracefully would much improve the overall resilience of the whole RHEV setup.

The scope of the bug may be extended to handle host shutdown as well - i.e. during installation, extend the period from the shutdown call to sending signals 15 and 9 to all processes, so there is enough time to migrate/pause/gracefully shut down VMs and to move/save/revert async tasks before VDSM exits.

Version-Release number of selected component (if applicable):
vdsm-4.9-112.6.el6_2.x86_64

How reproducible:
always

Steps to Reproduce:
1. On a running SPM host, stop the vdsm service.

Actual results:
* The host keeps the SPM role.
* The host status in RHEV-M goes to Connecting and then Non Responsive.
* The Data Center status in RHEV-M goes to Non Responsive and then Non Operational.

Expected results:
* vdsm releases the SPM role before it exits.
* The host is marked Non Responsive immediately.
* Resources in use by the host's vdsm are marked as unknown (the "question mark" state) to prevent other hosts from damaging them.

Additional info:
The open question is what to do with HA VMs in such a case; they are likely to keep running and working unless the host reboots.
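For illustration only, a minimal operator-side sketch of the expected ordering: check whether the host holds the SPM role, release it, and only then stop the service. This assumes the contemporary vdsClient CLI and its getSpmStatus/spmStop verbs; the pool UUID is a placeholder, so verify both against your installation before relying on it.

#!/usr/bin/env python
# Hedged sketch only: the vdsClient verbs reflect the vdsm CLI of this era as
# I understand it, and the pool UUID is a placeholder.
import subprocess
import sys

POOL_UUID = "00000000-0000-0000-0000-000000000000"  # placeholder storage pool UUID

def run(cmd):
    # Run a command and return (exit code, combined stdout/stderr).
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return proc.returncode, out

def main():
    # 1. Ask vdsm whether this host currently holds the SPM role.
    rc, out = run(["vdsClient", "-s", "0", "getSpmStatus", POOL_UUID])
    if rc != 0:
        sys.exit("could not query SPM status:\n" + out)

    # 2. If it does, release the role so the engine can elect another SPM.
    if "SPM" in out:
        rc, out = run(["vdsClient", "-s", "0", "spmStop", POOL_UUID])
        if rc != 0:
            sys.exit("spmStop failed, refusing to stop vdsmd:\n" + out)

    # 3. Only now stop the service.
    sys.exit(subprocess.call(["service", "vdsmd", "stop"]))

if __name__ == "__main__":
    main()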
Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected because it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.
Please try to reproduce this again with the current release, and provide engine and vdsm logs.
I tried 3.3 at various points in the cycle, and my impression is that workarounds were applied throughout the stack, such as vdsm service (re)start over ssh. None of those, however, resolve the simple case of issuing "shutdown -h now" on the SPM host - the host never lets the engine or its peers know that it is going down, so the SPM status remains held forever.
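To cover the "shutdown -h now" case without a vdsm-side fix, one could hook the same release step into the host's shutdown sequence. A hypothetical SysV-style stop script sketch follows; the script name, the chkconfig priorities (which must order its kill script before vdsmd's on runlevels 0/6) and the vdsClient verbs are all assumptions, not a shipped vdsm feature.

#!/usr/bin/env python
# chkconfig: - 05 05
# description: best-effort release of the SPM role before the host powers off
#
# Hypothetical /etc/init.d/spm-release hook; name, priorities and vdsClient
# verbs are assumptions - verify against your installation.
import subprocess
import sys

POOL_UUID = "00000000-0000-0000-0000-000000000000"  # placeholder storage pool UUID

def stop():
    # If this host is the SPM, ask vdsm to hand the role back so the engine
    # can elect another SPM instead of the Data Center going Non Responsive.
    proc = subprocess.Popen(["vdsClient", "-s", "0", "getSpmStatus", POOL_UUID],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    out = proc.communicate()[0]
    if proc.returncode == 0 and "SPM" in out:
        subprocess.call(["vdsClient", "-s", "0", "spmStop", POOL_UUID])

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "stop":
        stop()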