Bug 1574862 - Vague message on failure in upgrade of compatibility level on cluster
Summary: Vague message on failure in upgrade of compatibility level on cluster
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.2.3.2
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.4
Target Release: ---
Assignee: Eli Mesika
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-04 08:06 UTC by Lukas Svaty
Modified: 2018-06-26 08:40 UTC (History)

Fixed In Version: ovirt-engine-4.2.4.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-26 08:40:44 UTC
oVirt Team: Infra
rule-engine: ovirt-4.2+
lsvaty: testing_ack+



Links
System ID Priority Status Summary Last Updated
oVirt gerrit 91415 master MERGED core: improve messages on cluster upgrade 2018-05-22 13:25:50 UTC
oVirt gerrit 91688 ovirt-engine-4.2 MERGED core: improve messages on cluster upgrade 2018-05-28 15:02:08 UTC

Description Lukas Svaty 2018-05-04 08:06:09 UTC
Description of problem:
When upgrading the cluster compatibility level while a host is non-responsive, this message appears:

20:52:34 Status: 400
20:52:34 Reason: Bad Request
20:52:34 Detail: [Cannot change Cluster Compatibility Version to higher version when there are active Hosts with lower version. Please move Hosts with lower version to maintenance first.]

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.3.4-0.1.el7.noarch

Actual results:
Vague message

Expected results:
A more specific message with the name of the host that failed the check, and a specific version the user can check (vdsm>)

Comment 1 Eli Mesika 2018-05-10 10:13:10 UTC
As far as I can see from the code, only if a host is UP do we generate a message saying that we cannot upgrade a host that is in UP status.
If the host is non-responsive, we still check whether the upgrade is valid.

According to the attached logs, host 'e0e68249-c6bc-4a98-9179-54477447a2d1' became non-responsive prior to the upgrade operation (2018-05-06 18:12:20,276+03 in engine.log).

So, please specify whether this bug requests that we not try to upgrade a host which is not responding, or whether it claims that this is OK but the message should be fixed to specify the hosts involved.

Comment 2 Lukas Svaty 2018-05-10 11:17:22 UTC
1. Please specify which host is problematic.
2. Please specify the problem (vdsm version, bad host status, etc.). From the user's perspective, a host does not have versions.
3. If it is not possible to upgrade a non-responsive host, do not allow the upgrade.
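
To illustrate points 1 and 2, here is a minimal Python sketch of a per-host check that returns a message naming the host and the reason. It is not the ovirt-engine code: the `Host` class, its fields, the status strings, and `check_host` are all hypothetical, and treating a non-responsive host as a blocking failure is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Host:
    # Hypothetical model for illustration, not the ovirt-engine data model.
    name: str
    status: str                       # assumed values: "up", "non_responsive", "maintenance"
    supported_levels: Tuple[str, ...]  # compatibility levels the host reports


def check_host(host: Host, target_level: str) -> Optional[str]:
    """Return a host-specific error message, or None if the host passes."""
    if host.status == "maintenance":
        # Hosts in maintenance do not block the change.
        return None
    if host.status == "non_responsive":
        # Assumption: an unreachable host is reported instead of silently checked.
        return (f"Host {host.name} is non-responsive, so its supported "
                f"compatibility levels cannot be verified.")
    if target_level not in host.supported_levels:
        supported = ", ".join(host.supported_levels)
        return (f"Host {host.name} does not support compatibility level "
                f"{target_level} (supported: {supported}). "
                f"Move it to maintenance first.")
    return None
```

The point of the sketch is only that each failure carries the host name and a concrete cause, so the user does not have to guess which host or which version is involved.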

Comment 3 Eli Mesika 2018-05-13 08:22:15 UTC
(In reply to Lukas Svaty from comment #2)

> 3. In case that it is not possible to upgrade non-responsive host do not
> allow upgrade

We cannot know that in advance.
This can be a temporary networking problem, and the upgrade may succeed or fail.
Therefore, we should decide whether, in principle, we want to skip non-responding hosts or not.

If we want to skip non-responding hosts, I think a different BZ should be opened for that, leaving this one to deal with the message improvement in the case of failure.

Martin, do you think we need PM advice here on whether or not to skip upgrading non-responding hosts?

Comment 4 Lukas Svaty 2018-06-06 07:39:37 UTC
3 host cluster with SandyBridge CPUs, moving CPU arch to Broadwell on cluster

Message displayed:
Error while executing action: Cannot change Cluster CPU to higher CPU type when there are active Hosts with lower CPU type.
-Please move Host host_mixed_3 with lower CPU to maintenance first.

Message expected:
Error while executing action: Cannot change Cluster CPU to higher CPU type when there are active Hosts with lower CPU type.
-Please move Hosts host_mixed_1, host_mixed_2, host_mixed_3 with lower CPU to maintenance first.


This way the user won't have to go through the tabs cluster -> edit cluster -> failed msg -> hosts -> maintenance -> cluster -> edit cluster -> failed msg 3 times.

tested in ovirt-engine-4.2.4.1-0.1.el7.noarch
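
A minimal Python sketch of the expected behavior, for illustration only: all blocking hosts are collected first and then reported in a single message. The function name `cpu_change_error` and its input (a list of host names already known to have a lower CPU type) are hypothetical and do not reflect how the engine command is written.

```python
from typing import Iterable, Optional


def cpu_change_error(blocking_hosts: Iterable[str]) -> Optional[str]:
    """Build one error message that lists every blocking host.

    blocking_hosts: names of active hosts with a lower CPU type
    (hypothetical input; how they are found is out of scope here).
    """
    names = sorted(blocking_hosts)
    if not names:
        return None  # nothing blocks the CPU type change
    noun = "Host" if len(names) == 1 else "Hosts"
    return (
        "Cannot change Cluster CPU to higher CPU type when there are "
        "active Hosts with lower CPU type.\n"
        f"-Please move {noun} {', '.join(names)} with lower CPU "
        "to maintenance first."
    )


print(cpu_change_error(["host_mixed_3", "host_mixed_1", "host_mixed_2"]))
```

With one message covering all three hosts, the edit-cluster round trip happens once instead of once per host.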

Comment 5 Eli Mesika 2018-06-06 07:55:36 UTC
(In reply to Lukas Svaty from comment #4)
> 3 host cluster with SandyBridge CPUs, moving CPU arch to Broadwell on cluster
> 
> Message displayed:
> Error while executing action: Cannot change Cluster CPU to higher CPU type
> when there are active Hosts with lower CPU type.
> -Please move Host host_mixed_3 with lower CPU to maintenance first.
> 
> Message expected:
> Error while executing action: Cannot change Cluster CPU to higher CPU type
> when there are active Hosts with lower CPU type.
> -Please move Hosts host_mixed_1, host_mixed_2, host_mixed_3 with lower CPU
> to maintenance first.
> 
> 
> This way user won't have to go through tabs cluster -> edit cluster ->
> failed msg ->hosts -> maintenance -> cluster -> edit cluster -> failed msg 
> 3 times
> 
> tested in ovirt-engine-4.2.4.1-0.1.el7.noarch

The code was written from scratch to handle errors one by one; changing that means rewriting the command.
I recommend opening an RFE for that if you want, and marking this BZ as verified since it solves the original issue.

Martin, what do you think?

Comment 6 Eli Mesika 2018-06-06 12:29:38 UTC
(In reply to Eli Mesika from comment #5)

> The code was written to handle errors one-by-one from scratch, changing that
> is re-writing the command 
> I recommend to open a RFE for that if you want and mark this BZ as verified
> since it solves the original issue 
> 
> Martin, what do you think ?

We will open a separate RFE for that and move this one to ON_QA again.

Comment 7 Eli Mesika 2018-06-10 12:04:42 UTC
Moving this to ON_QA as concluded.

The requirement to report all errors at once was opened as an RFE in https://bugzilla.redhat.com/show_bug.cgi?id=1589512

Comment 8 Sandro Bonazzola 2018-06-26 08:40:44 UTC
This bugzilla is included in oVirt 4.2.4 release, published on June 26th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

