Bug 1373573 - Enhance error reporting when cluster compatibility update fails
Summary: Enhance error reporting when cluster compatibility update fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.0.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.0.6
Target Release: 4.0.6.1
Assignee: Shmuel Melamud
QA Contact: sefi litmanovich
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-06 15:38 UTC by Barak Korren
Modified: 2017-01-18 07:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 07:26:25 UTC
oVirt Team: Virt
rule-engine: ovirt-4.0.z+
mgoldboi: planning_ack+
rule-engine: devel_ack+
mavital: testing_ack+


Attachments
engine log from approx the time of attempted upgrade (664.82 KB, text/plain)
2016-09-06 15:38 UTC, Barak Korren


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 66205 0 None None None 2016-11-07 18:59:38 UTC
oVirt gerrit 66774 0 ovirt-engine-4.0 MERGED core: Propagate UpdateVm failure to UpdateClusterCommand 2016-11-16 08:04:01 UTC
oVirt gerrit 66850 0 ovirt-engine-4.0.6 MERGED core: Propagate UpdateVm failure to UpdateClusterCommand 2016-11-16 11:44:43 UTC

Description Barak Korren 2016-09-06 15:38:22 UTC
Created attachment 1198338
engine log from approx the time of attempted upgrade

Description of problem:
When trying to upgrade the cluster level from 3.6 to 4.0 on a medium-sized, long-lived cluster (~60 running VMs, ~120 defined, ~10 hosts), the upgrade fails with the following error message:

    Error while executing action Edit Cluster properties: Internal Engine Error

Version-Release number of selected component (if applicable):
4.0.3-0.1


Additional info:
Engine log with 3 upgrade attempts attached

Looking at the log the following message seems to be relevant:

2016-09-06 18:12:25,604 WARN  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-91) [2fecb3a] Validation of action 'UpdateVm' failed for user .... Reasons: VAR__ACTION__UPDATE,            VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_CUSTOM_PROPERTIES_INVALID_KEYS,$MissingKeys macspoof
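
A line like the one above can be picked apart mechanically instead of by eye. The following is a minimal sketch in Python, assuming the `Reasons:` list is comma-separated and that `$`-prefixed entries carry a name and a single value separated by a space, as in this one sample. The function name `parse_validation_failure` is hypothetical and not part of ovirt-engine, and the format with several missing keys is not known from this report.

```python
import re


def parse_validation_failure(line):
    """Split a 'Validation of action ... failed' log line into its parts.

    Returns None when the line is not a validation failure.
    """
    match = re.search(
        r"Validation of action '(\w+)' failed.*?Reasons:\s*(.*)$", line
    )
    if not match:
        return None
    action, raw_reasons = match.groups()
    reasons = []
    variables = {}
    for token in raw_reasons.split(","):
        token = token.strip()
        if not token:
            continue
        if token.startswith("$"):
            # Assumed form: "$Name value"
            name, _, value = token[1:].partition(" ")
            variables[name] = value.strip()
        else:
            reasons.append(token)
    return {"action": action, "reasons": reasons, "variables": variables}


sample_line = (
    "2016-09-06 18:12:25,604 WARN  [org.ovirt.engine.core.bll.UpdateVmCommand] "
    "(default task-91) [2fecb3a] Validation of action 'UpdateVm' failed for user .... "
    "Reasons: VAR__ACTION__UPDATE,            VAR__TYPE__VM,"
    "ACTION_TYPE_FAILED_INVALID_CUSTOM_PROPERTIES_INVALID_KEYS,$MissingKeys macspoof"
)
result = parse_validation_failure(sample_line)
print(result["action"], result["variables"])
```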

The "UserDefinedVMProperties" engine config value looks ATM like the following:

# engine-config -g UserDefinedVMProperties
UserDefinedVMProperties: macspoof=(true|false) version: 3.6
UserDefinedVMProperties:  version: 4.0
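
The gap between the two cluster levels can also be spotted from this output directly. The following is a minimal sketch in Python, assuming the `engine-config -g` output format shown above (one `UserDefinedVMProperties: <definitions> version: <x.y>` line per level) and `;`-separated `key=regex` definitions. The function names are hypothetical and not part of oVirt.

```python
import re

# Matches lines such as:
#   UserDefinedVMProperties: macspoof=(true|false) version: 3.6
LINE_RE = re.compile(r"^UserDefinedVMProperties:\s*(.*?)\s*version:\s*(\S+)\s*$")


def parse_defined_keys(output):
    """Map each cluster version to the set of custom property keys defined for it."""
    keys_by_version = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        definitions, version = match.groups()
        keys = set()
        for item in definitions.split(";"):
            if "=" in item:
                keys.add(item.split("=", 1)[0].strip())
        keys_by_version[version] = keys
    return keys_by_version


def keys_lost_on_upgrade(output, old, new):
    """Return the keys defined for cluster level `old` but not for `new`."""
    keys = parse_defined_keys(output)
    return sorted(keys.get(old, set()) - keys.get(new, set()))


sample = """\
UserDefinedVMProperties: macspoof=(true|false) version: 3.6
UserDefinedVMProperties:  version: 4.0
"""
print(keys_lost_on_upgrade(sample, "3.6", "4.0"))
```

A non-empty result lists the property keys that VMs may still carry but that the target level does not define.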

Comment 1 Yaniv Kaul 2016-09-07 06:30:32 UTC
Arik, is this something you've seen?

Comment 2 Arik 2016-09-07 06:43:24 UTC
(In reply to Yaniv Kaul from comment #1)
Yes, this is one of the problems we faced while upgrading rhev.tlv.

Previously, the procedure of cluster upgrade included the following step:
engine-config -s "UserDefinedVMProperties=macspoof=(true|false);another_property=regexp" --cver=3.5
As for the upgrade to 4.0, the recommended way is to set non-filtering on the network instead, but a similar step would work as well (danken is supposed to file a ticket to use the new configuration on rhev.tlv).

I think the problem here is just the uninformative message.

Comment 3 Michal Skrivanek 2016-09-07 06:49:22 UTC
(In reply to Arik from comment #2)
> Previously, the procedure of cluster upgrade included the following step:

where is it documented? Was it part of upgrade docs?

Comment 4 Barak Korren 2016-09-07 07:21:37 UTC
(In reply to Arik from comment #2)
> As for upgrade to 4.0, the recommended way is to set non-filtering on the
> network  instead, but similar step would work as well

The system is already configured with non-filtering network:

# engine-config -g EnableMACAntiSpoofingFilterRules
EnableMACAntiSpoofingFilterRules: false version: general

This doesn't help with the cluster upgrade because we still have some VMs with the custom properties configured...

Doing the following and restarting the engine resolves the issue:

# engine-config -s 'UserDefinedVMProperties=macspoof=(true|false)' --cver=4.0

But yeah, the error message should suggest something that would tell me this is the place to look without digging through the engine logs. Or perhaps engine-setup should warn about this, or even better, configure this automatically.

I guess this would also happen with other custom properties, not just network related ones?
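
To illustrate why this would not be specific to macspoof: if validation is done per key against the definitions for the target cluster level, as the log message suggests, any key absent from those definitions would be rejected. The following is a minimal sketch in Python, assuming VM custom properties are held as a `key=value;key2=value2` string and that each value must fully match the regex from `UserDefinedVMProperties`. Both are assumptions, and the function names are hypothetical rather than the engine's actual code. A regex containing `;` would need smarter splitting than this sketch does.

```python
import re


def parse_definitions(spec):
    """Parse 'key=regex;key2=regex' into a dict of compiled patterns."""
    patterns = {}
    for item in spec.split(";"):
        key, sep, regex = item.partition("=")
        key = key.strip()
        if sep and key:
            patterns[key] = re.compile(regex)
    return patterns


def check_vm_properties(vm_props, spec):
    """Return (missing_keys, invalid_value_keys) for one VM.

    missing_keys: keys the VM uses that the definitions do not contain.
    invalid_value_keys: keys whose value does not match the defined regex.
    """
    patterns = parse_definitions(spec)
    missing, invalid = [], []
    for item in vm_props.split(";"):
        key, _, value = item.partition("=")
        key = key.strip()
        if not key:
            continue
        if key not in patterns:
            missing.append(key)
        elif not patterns[key].fullmatch(value):
            invalid.append(key)
    return missing, invalid


# Empty 4.0 definitions, as in the report: the key is reported as missing.
print(check_vm_properties("macspoof=true", ""))
# After defining the key for 4.0, the same VM passes.
print(check_vm_properties("macspoof=true", "macspoof=(true|false)"))
```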

Comment 5 Tomas Jelinek 2016-09-07 07:52:13 UTC
So changing the scope of this bug to enhancing the error reporting. Targeting 4.0.5 since it can save users a lot of headaches.

Comment 6 sefi litmanovich 2016-11-21 15:20:47 UTC
Was able to reproduce the bug on rhevm-4.0.6-0.1.el7ev.noarch. Please check whether this patch made it into this build; if so, then there are some further issues, but my guess is that it's just not in the build. Changing status back to Modified until this is sorted out.

Comment 7 sefi litmanovich 2016-11-27 15:43:25 UTC
Verified with rhevm-4.0.6.1-0.1.el7ev.noarch according to the description. This time I got an informative error message with details of the specific reason.

