1649503 – No proper events or error message notified, when the host upgrade fails

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	Frontend.WebAdmin
Sub Component:
Version:	4.2.7.1
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	ovirt-4.2.8
Target Release:	---
Assignee:	bugs@ovirt.org
QA Contact:	Lukas Svaty
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-11-13 18:21 UTC by SATHEESARAN
Modified:	2019-02-26 11:16 UTC
CC List:	8 users
Fixed In Version:
Clone Of:	1649502
Environment:
Last Closed:	2019-01-22 10:23:19 UTC
oVirt Team:	Infra
Embargoed:
Dependent Products:
Flags:	rule-engine: ovirt-4.2+

Description SATHEESARAN 2018-11-13 18:21:40 UTC

+++ This bug was initially created as a clone of Bug #1649502 +++

Description of problem:
-----------------------

When an upgrade of an RHVH host is initiated from the RHV Manager UI, the host is first moved into maintenance, then redhat-virtualization-host-image-update is updated, and finally the host is rebooted.

If moving the host into maintenance fails during this upgrade/update procedure, no events or error messages are shown to the user.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHHI-V 1.5 (RHV 4.2.7 & RHGS 3.4.1)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Stop one of the bricks of the volume on host1
2. Try to upgrade host2 from the RHV Manager UI

Actual results:
---------------
The upgrade initiated from the RHV Manager UI fails silently, without raising any errors or events.

Expected results:
-----------------
The upgrade should fail with a meaningful error or event, so that the user is aware of the reason behind the failure.

Additional info:
----------------
When the host is moved into maintenance directly, with the gluster service being stopped, proper error messages are shown. The same should be implemented for the upgrade/update procedure initiated from the RHV Manager UI.

Comment 1 Sahina Bose 2018-11-19 04:59:01 UTC

The upgrade flow seems to be infra related - moving to infra team

Comment 2 Martin Perina 2018-11-23 16:35:29 UTC

I don't see any flow handling gluster bricks when moving host to maintenance, we are doing only 2 flows:

1. Enabling local maintenance if host can be used to run hosted engine VM
2. Migrate all VMs out of host

As part of BZ1631215, we fixed error propagation for the case where a VM migration fails.

So how exactly does stopping gluster bricks affect moving the host to maintenance? Is only the VM migration unsuccessful, or is there a different error?

Comment 3 Sahina Bose 2018-11-29 07:08:19 UTC

(In reply to Martin Perina from comment #2)
> I don't see any flow handling gluster bricks when moving host to
> maintenance, we are doing only 2 flows:
> 
> 1. Enabling local maintenance if host can be used to run hosted engine VM
> 2. Migrate all VMs out of host

We also have the flow to stop gluster services when a host is moved to maintenance.

> 
> As part of BZ1631215, we fixed error propagation for the case where a VM
> migration fails.
> 
> So how exactly does stopping gluster bricks affect moving the host to
> maintenance? Is only the VM migration unsuccessful, or is there a
> different error?

If one of the bricks on another host (h2) is stopped before moving host (h1) to maintenance, the validation fails because quorum would be lost.
I think this validation failure is not propagated to the UpgradeHost flow.
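
To illustrate the behaviour being asked for, here is a minimal Python sketch of an upgrade flow that surfaces a maintenance validation failure as a user-visible event. It is not the ovirt-engine code: every name in it (ValidationResult, AuditLog, validate_gluster_quorum, move_to_maintenance, upgrade_host) and the simple majority rule used for quorum are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""


@dataclass
class AuditLog:
    """Hypothetical stand-in for the event list a user sees in the UI."""
    events: list = field(default_factory=list)

    def error(self, message: str) -> None:
        self.events.append(("ERROR", message))


def validate_gluster_quorum(bricks_up: int, bricks_total: int) -> ValidationResult:
    # Assumed rule for this sketch: after taking one more brick down,
    # a strict majority of bricks must still be up.
    remaining = bricks_up - 1
    if remaining * 2 <= bricks_total:
        return ValidationResult(False, "gluster quorum would be lost")
    return ValidationResult(True)


def move_to_maintenance(host: str, bricks_up: int, bricks_total: int) -> ValidationResult:
    # Only the gluster validation is modelled; VM migration is left out.
    return validate_gluster_quorum(bricks_up, bricks_total)


def upgrade_host(host: str, bricks_up: int, bricks_total: int, audit_log: AuditLog) -> bool:
    result = move_to_maintenance(host, bricks_up, bricks_total)
    if not result.ok:
        # The point of this bug: report the failure instead of dropping it.
        audit_log.error(f"Upgrade of {host} failed: {result.reason}")
        return False
    # The package update and reboot would follow here.
    return True
```

With a three-brick volume where one brick is already stopped, calling upgrade_host("host2", 2, 3, log) returns False and leaves one ERROR entry in log.events, instead of failing silently.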

Comment 4 Sahina Bose 2018-12-18 07:21:14 UTC

Martin, is there anything further that needs to be done to ensure validation failures are propagated?

Comment 5 Martin Perina 2018-12-18 08:59:05 UTC

Ravi, could you please verify whether this issue is also fixed by BZ1631215?

Comment 6 Ravi Nori 2018-12-18 19:03:49 UTC

After the recent change for BZ1631215, all validation failures should be picked up by UpgradeHostCallback.

So BZ1631215 should fix the issue in this BZ.

Comment 7 Martin Perina 2018-12-19 20:13:10 UTC

Aligning status with BZ1631215

Comment 8 RHV bug bot 2019-01-09 14:23:09 UTC

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: infra

Comment 9 Lukas Svaty 2019-01-14 14:13:01 UTC

Same reproduction as https://bugzilla.redhat.com/show_bug.cgi?id=1649502; moving to verified.

Comment 10 Sandro Bonazzola 2019-01-22 10:23:19 UTC

This bugzilla is included in the oVirt 4.2.8 release, published on January 22nd 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.2.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.