Bug 799969

Summary: [RFE] Improve flow in engine when vdsm service is restarted on the SPM
Product: Red Hat Enterprise Virtualization Manager
Reporter: David Jaša <djasa>
Component: ovirt-engine
Assignee: Allon Mureinik <amureini>
Status: CLOSED WONTFIX
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.1.0
CC: acathrow, amureini, bazulay, djasa, iheim, jkt, lpeer, nsoffer, Rhev-m-bugs, yeylon
Target Milestone: ---
Keywords: FutureFeature, Improvement, Reopened
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-08-04 05:16:01 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description David Jaša 2012-03-05 09:16:28 EST
Description of problem:
Currently, when a user runs 'service vdsmd stop' on a host that happens to be the SPM, the whole Data Center becomes unmanageable, because the SPM role is not released until the service restarts.

While this is by no means a desired scenario, it can happen, and handling it gracefully would greatly improve the overall resilience of the whole RHEV setup.

The scope of the bug may be extended to also handle host shutdown - i.e., extend the period between the shutdown call and the sending of signals 15 (SIGTERM) and 9 (SIGKILL) to all processes, so there is enough time to migrate/pause/gracefully shut down VMs and to move/save/revert async tasks before VDSM exits.
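On systemd-based hosts, for example, the window between SIGTERM and SIGKILL is governed by the unit's stop timeout; a hypothetical drop-in that widens it for vdsmd might look like this (the path and value are illustrative, not a shipped configuration):

```ini
# /etc/systemd/system/vdsmd.service.d/stop-timeout.conf (hypothetical drop-in)
[Service]
# Allow up to 10 minutes for VM migration/pause and async-task handling
# before systemd escalates from SIGTERM to SIGKILL.
TimeoutStopSec=600
```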

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. on running SPM host, stop vdsm service
Actual results:
* host keeps SPM role
* host status in RHEV-M goes to Connecting and then Non Responsive
* Data Center status in RHEV-M goes to Non Responsive and then Non Operational

Expected results:
* vdsm releases SPM role before it exits
* host is marked Non Responsive immediately
* resources that are in use by the host's vdsm are marked as unknown ("question mark" state) to prevent their damage by other hosts
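The first expected result above amounts to a shutdown hook: on SIGTERM (what 'service vdsmd stop' sends), give up the SPM role before the process exits so the engine can elect a new SPM immediately instead of waiting for a timeout. A minimal sketch of that idea, with hypothetical names standing in for vdsm's actual SPM logic:

```python
import signal
import sys

def release_spm(pool):
    # Hypothetical stand-in for the real SPM-release logic
    # (e.g. giving up the storage pool lease).
    pool["spm"] = None

def make_handler(pool):
    def handler(signum, frame):
        # Release the SPM role first, then exit, so peers and the
        # engine are not left waiting on a dead SPM.
        release_spm(pool)
        sys.exit(0)
    return handler

# Register the handler for SIGTERM, the signal a service stop sends.
pool = {"spm": "host-1"}
signal.signal(signal.SIGTERM, make_handler(pool))
```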

Additional info:
The question is what to do with HA VMs in such a case; they are likely to keep running and working unless the host reboots.
Comment 1 RHEL Product and Program Management 2012-05-04 00:08:52 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 3 RHEL Product and Program Management 2012-07-10 04:55:31 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 4 RHEL Product and Program Management 2012-07-10 21:55:00 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 5 Itamar Heim 2013-02-25 02:25:31 EST
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.
Comment 6 Nir Soffer 2013-09-23 10:16:01 EDT
Please try to reproduce this again with the current release, and provide engine and vdsm logs.
Comment 7 David Jaša 2014-04-14 07:29:06 EDT
I tried 3.3 at various points in the cycle, and my impression is that workarounds were applied throughout the stack, such as vdsm service (re)start over SSH. None of these, however, resolves the simple case of issuing "shutdown -h now" on the SPM host: the host never lets the engine or its peers know that it is going down, so the SPM status remains stuck forever.