Bug 799969 - [RFE] Improve flow in engine when vdsm service is restarted on the SPM
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.6.0
Assigned To: Allon Mureinik
storage
Keywords: FutureFeature, Improvement, Reopened
Depends On:
Blocks:
 
Reported: 2012-03-05 09:16 EST by David Jaša
Modified: 2016-02-10 15:22 EST
CC List: 10 users

Doc Type: Enhancement
Last Closed: 2014-08-04 05:16:01 EDT
oVirt Team: Storage


Attachments: None
Description David Jaša 2012-03-05 09:16:28 EST
Description of problem:
Currently, when a user runs 'service vdsmd stop' on a host that happens to be the SPM, the whole Data Center becomes unmanageable, because the SPM role is not released until the service is restarted.

While this is by no means a desired scenario, it can happen, and handling it gracefully would considerably improve the overall resilience of the whole RHEV setup.

The scope of this bug may be extended to handle host shutdown as well, i.e. during host installation, extend the period between the shutdown call and sending signals 15 (SIGTERM) and 9 (SIGKILL) to all processes, so that there is enough time to migrate, pause, or gracefully shut down VMs and to move, save, or revert async tasks before VDSM exits.

Version-Release number of selected component (if applicable):
vdsm-4.9-112.6.el6_2.x86_64

How reproducible:
always

Steps to Reproduce:
1. On a running SPM host, stop the vdsm service.
  
Actual results:
* host keeps SPM role
* host status in RHEV-M goes to Connecting and then Non Responsive
* Data Center status in RHEV-M goes to Non Responsive and then Non Operational

Expected results:
* vdsm releases the SPM role before it exits
* host is marked Non Responsive immediately
* resources in use by the host's vdsm are marked as being in an unknown ("question mark") state, to prevent other hosts from damaging them
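The expected ordering (give up the SPM role first, then exit) could be sketched as a SIGTERM handler. This is a minimal, hypothetical Python sketch, not vdsm code: `SpmRole`, `make_shutdown_handler`, and the `exit_fn` parameter are illustrative names, and a real implementation would have to release the actual storage lease and let the engine know. It also assumes the service is asked to stop with signal 15 (SIGTERM) before any signal 9 (SIGKILL).

```python
import signal
import sys
import threading


class SpmRole:
    """Hypothetical stand-in for the SPM role held by this host (not a vdsm class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.held = False

    def acquire(self):
        with self._lock:
            self.held = True

    def release(self):
        # Placeholder: a real implementation would release the storage lease
        # here so that another host in the Data Center can become SPM.
        with self._lock:
            self.held = False


def make_shutdown_handler(role, exit_fn=sys.exit):
    """Build a SIGTERM handler that gives up the SPM role before exiting."""

    def handler(signum, frame):
        # Release the role first, so the Data Center is not left without a usable SPM.
        if role.held:
            role.release()
        # Only then let the process exit.
        exit_fn(0)

    return handler


if __name__ == "__main__":
    role = SpmRole()
    role.acquire()
    # Assumption: the service is stopped with SIGTERM (15) before any SIGKILL (9).
    signal.signal(signal.SIGTERM, make_shutdown_handler(role))
    signal.pause()  # wait for a signal (POSIX only)
```

Because SIGKILL cannot be caught or handled, a handler like this only helps if the gap between signal 15 and signal 9 is long enough for the release to finish, which is why extending that period during shutdown matters.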

Additional info:
The open question is what to do with HA VMs in such a case: they are likely to keep running and working unless the host reboots.
Comment 1 RHEL Product and Program Management 2012-05-04 00:08:52 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 3 RHEL Product and Program Management 2012-07-10 04:55:31 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 4 RHEL Product and Program Management 2012-07-10 21:55:00 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 5 Itamar Heim 2013-02-25 02:25:31 EST
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.
Comment 6 Nir Soffer 2013-09-23 10:16:01 EDT
Please try to reproduce this again with the current release, and provide engine and vdsm logs.
Comment 7 David Jaša 2014-04-14 07:29:06 EDT
I tried 3.3 at various points in its cycle, and my impression is that workarounds were applied throughout the stack, such as restarting the vdsm service over SSH. However, none of them resolves the simple case of issuing "shutdown -h now" on the SPM host: the host never lets the engine or its peers know that it is going down, so its SPM status remains forever.
