Bug 799969 - [RFE] Improve flow in engine when vdsm service is restarted on the SPM
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.6.0
Assignee: Allon Mureinik
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2012-03-05 14:16 UTC by David Jaša
Modified: 2016-02-10 20:22 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-04 09:16:01 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:



Description David Jaša 2012-03-05 14:16:28 UTC
Description of problem:
Currently, when a user calls 'service vdsmd stop' on a host that happens to be the SPM, the whole Data Center becomes unmanageable, because the SPM role is not released until the service restarts.

While this is by no means a desired scenario, it can happen, and handling it gracefully would much improve the overall resilience of the whole RHEV setup.

The scope of this bug may be extended to handle host shutdown as well - i.e., during shutdown, extend the period between the shutdown call and the sending of signals 15 (SIGTERM) and 9 (SIGKILL) to all processes, so there is enough time to migrate, pause, or gracefully shut down VMs and to move, save, or revert async tasks before VDSM exits.
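On later, systemd-based hosts, one conceivable way to give vdsm more time at shutdown is to raise its stop timeout via a drop-in unit. This is only an illustrative sketch, not a tested or recommended fix for this bug (the el6 hosts in the original report do not use systemd, and the 3-minute value is an assumption):

```ini
# /etc/systemd/system/vdsmd.service.d/stop-timeout.conf
# Allow vdsm up to 3 minutes to migrate/pause VMs and wind down
# async tasks before systemd escalates from SIGTERM to SIGKILL.
[Service]
TimeoutStopSec=180
```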

Version-Release number of selected component (if applicable):
vdsm-4.9-112.6.el6_2.x86_64

How reproducible:
always

Steps to Reproduce:
1. On the running SPM host, stop the vdsm service.
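The step above can be sketched as follows. This is a sketch to be run against a live RHEV host; <POOL_UUID> is a placeholder for the Data Center's storage-pool UUID, and vdsClient must be run on a host with vdsm installed:

```shell
# On the host currently holding the SPM role:
service vdsmd stop            # stops vdsm without releasing the SPM role

# Before stopping vdsm (or from another host in the pool), check
# which host vdsm believes is the SPM for the storage pool:
vdsClient -s 0 getSpmStatus <POOL_UUID>
```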
  
Actual results:
* host keeps SPM role
* host status in RHEV-M goes to Connecting and then Non Responsive
* Data Center status in RHEV-M goes to Non Responsive and then Non Operational

Expected results:
* vdsm releases the SPM role before it exits
* host is marked Non Responsive immediately
* resources that are in use by the host's vdsm are marked in an unknown ("question mark") state, to prevent other hosts from damaging them

Additional info:
The question is what to do with HA VMs in such a case; they are likely to keep running and working unless the host reboots.

Comment 1 RHEL Program Management 2012-05-04 04:08:52 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 RHEL Program Management 2012-07-10 08:55:31 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 4 RHEL Program Management 2012-07-11 01:55:00 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 5 Itamar Heim 2013-02-25 07:25:31 UTC
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.

Comment 6 Nir Soffer 2013-09-23 14:16:01 UTC
Please try to reproduce this again with the current release, and provide engine and vdsm logs.

Comment 7 David Jaša 2014-04-14 11:29:06 UTC
I tried 3.3 at various points in the cycle, and my impression is that workarounds were applied throughout the stack, such as vdsm service (re)start over ssh. None of those, however, resolve the simple case of issuing "shutdown -h now" on the SPM host - the host never lets the engine or its peers know that it is going down, so the SPM status remains assigned to it forever.

