1418641 – Make cluster upgrade logging more robust and specific

Summary: Make cluster upgrade logging more robust and specific

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	BLL.Virt
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	ovirt-4.1.2
Target Release:	---
Assignee:	Shahar Havivi
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1425089
Depends On:	1432127 1479693
Blocks:

Reported:	2017-02-02 12:04 UTC by sefi litmanovich
Modified:	2019-04-28 13:27 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-05-23 08:11:56 UTC
oVirt Team:	Virt
Embargoed:
Dependent Products:
Flags:	rule-engine: ovirt-4.1+ mtessun: planning_ack+ tjelinek: devel_ack+ mavital: testing_ack+

Links
System	ID	Private	Priority	Status	Summary	Last Updated
oVirt gerrit	73840	0	master	MERGED	core: update cluster fails when VM/Template have wrong values	2020-07-13 00:35:49 UTC
oVirt gerrit	74506	0	ovirt-engine-4.1	MERGED	core: update cluster fails when VM/Template have wrong values	2020-07-13 00:35:49 UTC

Description sefi litmanovich 2017-02-02 12:04:20 UTC

Description of problem:
This RFE is a follow up to previous RFE - https://bugzilla.redhat.com/show_bug.cgi?id=1386507.
The goal is to make failure logging more specific (the info-level logging may be worth reviewing as well) so that users can easily find the source of a problem, such as a specific VM or template configuration. Clusters can hold hundreds, and sometimes thousands, of VMs, so a failure message like the following example is not sufficient:

Error while executing action:

Cannot edit Cluster. Invalid time zone for given OS type.
Attribute: vmStatic

In this case, an INFO message in engine.log does provide the ID of the specific VM:

2016-11-22 14:10:27,854 INFO [org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-12) [28807589] Lock freed to object 'EngineLock:{exclusiveLocks='null', sharedLocks='[{faulty_vm's_id}=<VM, ACTION_TYPE_FAILED_CLUSTER_IS_BEING_UPDATED$clusterName another-clust>]'}'

But wouldn't it feel more natural to have a message such as:
Cannot edit Cluster. Failed to update VM "some_name" with cause: Invalid time zone for given OS type.

Version-Release number of selected component (if applicable):
rhevm-4.1.0-3

How reproducible:
always

Steps to Reproduce:
For this specific example:
1. Have a VM in the cluster with an invalid time zone (to force the problem, edit the 'time_zone' value in the DB for a specific VM).
2. Try to upgrade the cluster.

Actual results:
Cluster upgrade fails with the msg:

Error while executing action:

Cannot edit Cluster. Invalid time zone for given OS type.
Attribute: vmStatic

Expected results:
Cluster upgrade fails with a more specific message that names the VM with the faulty value.

Additional info:

Comment 1 Michal Skrivanek 2017-02-21 09:22:15 UTC

*** Bug 1425089 has been marked as a duplicate of this bug. ***

Comment 2 Shahar Havivi 2017-03-08 11:02:06 UTC

Sefi,
How did you get a wrong time zone into the database in the first place?
We need to solve the root of the issue. If it was just by changing the database value, then it is not an issue, since a user can change any value in the database and cause lots of errors...

Comment 3 sefi litmanovich 2017-03-09 09:19:52 UTC

(In reply to Shahar Havivi from comment #2)
> Sefi,
> How did you get a wrong time zone into the database in the first place?
> We need to solve the root of the issue. If it was just by changing the
> database value, then it is not an issue, since a user can change any value
> in the database and cause lots of errors...

The issue is not about the time zone; I induced the "bug" by editing the time zone in the database. The problem I am raising is that the log isn't specific enough. If I have an environment with 200 VMs and, for some reason, one of them has a problem with its time zone configuration (as an example), the following error cause is just not informative enough:

Invalid time zone for given OS type.

What I'd prefer is that we add the name of the VM that causes this problem.

Comment 4 Shahar Havivi 2017-03-09 09:23:31 UTC

(In reply to sefi litmanovich from comment #3)
> (In reply to Shahar Havivi from comment #2)
> > Sefi,
> > How did you get a wrong time zone into the database in the first place?
> > We need to solve the root of the issue. If it was just by changing the
> > database value, then it is not an issue, since a user can change any value
> > in the database and cause lots of errors...
> 
> The issue is not about the time zone; I induced the "bug" by editing the
> time zone in the database. The problem I am raising is that the log isn't
> specific enough. If I have an environment with 200 VMs and, for some reason,
> one of them has a problem with its time zone configuration (as an example),
> the following error cause is just not informative enough:
> 
> Invalid time zone for given OS type.
> 
> What I'd prefer is that we add the name of the VM that causes this problem.

OK, but again, you should not edit the database directly.
We cannot account for database changes made by users, but we do need to know whether the problem can be caused by values that came from the UI or API.

Comment 5 Tomas Jelinek 2017-03-10 13:09:00 UTC

Turning this into a bug, since it is actually a bug that we don't give the user good hints on how to fix the issues.

When fixed, the behavior will be this:

- if the update of a VM/template fails, issue an audit log entry with the VM name and the reason for the failure
- don't stop, just remember all failed attempts
- at the end, if there were any failed attempts, fail the command (e.g. roll back) and return the list of VM name/error pairs to the FE
- show the user an error message like this:

"
Update of cluster compatibility version failed because there are VMs/Templates with incorrect configuration. To fix the issue, please go to each of them, edit and press OK. If the save will not pass, fix the validation messages.

The list of VMs/Templates with incorrect configuration (you will find the same list also in Events):

...the list of VM name -> validation error follows...
"
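The behavior listed above could be sketched roughly as follows. This is a minimal illustration in Python, not the ovirt-engine code (which is Java); `update_cluster`, `ValidationError`, `ClusterUpdateResult` and `fake_update` are hypothetical names, and the rollback is only noted in a comment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


class ValidationError(Exception):
    """Hypothetical error raised when a VM/template fails validation."""


@dataclass
class ClusterUpdateResult:
    succeeded: bool
    # VM/template name -> validation error text
    failures: Dict[str, str] = field(default_factory=dict)
    # One audit log entry per failed VM/template
    audit_log: List[str] = field(default_factory=list)

    def user_message(self) -> str:
        """Build the message returned to the front end."""
        if self.succeeded:
            return "Cluster updated."
        lines = [f"{name} -> {reason}" for name, reason in self.failures.items()]
        return (
            "Update of cluster compatibility version failed because there are "
            "VMs/Templates with incorrect configuration.\n" + "\n".join(lines)
        )


def update_cluster(
    vm_names: Iterable[str], update_vm: Callable[[str], None]
) -> ClusterUpdateResult:
    """Try every VM/template, remember all failures, and fail only at the end."""
    failures: Dict[str, str] = {}
    audit_log: List[str] = []
    for name in vm_names:
        try:
            update_vm(name)
        except ValidationError as err:
            # Do not stop on the first failure: record it and continue.
            failures[name] = str(err)
            audit_log.append(f"Failed to update VM/Template '{name}': {err}")
    # A real command would roll back here when failures is non-empty.
    return ClusterUpdateResult(
        succeeded=not failures, failures=failures, audit_log=audit_log
    )


def fake_update(name: str) -> None:
    """Stand-in for the per-VM update; one VM has a bad time zone."""
    if name == "test_bz_1":
        raise ValidationError("Invalid time zone for given OS type.")


result = update_cluster(["vm_a", "test_bz_1", "vm_b"], fake_update)
print(result.user_message())
```

Collecting every failure before failing the command means the user sees all misconfigured VMs/templates in one pass, instead of fixing one and hitting the next on the following attempt.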

Comment 6 Tomas Jelinek 2017-03-23 11:10:42 UTC

*** Bug 1425089 has been marked as a duplicate of this bug. ***

Comment 7 Israel Pinto 2017-05-08 08:35:37 UTC

Verified with RHVM version: 4.1.2.1-0.1.el7
Steps:
Run case: https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/workitem?id=RHEVM-17309

Results: The message is shown.
Message:
Error while executing action: Update of cluster compatibility version failed because there are VMs/Templates [test_bz_1] with incorrect configuration. To fix the issue, please go to each of them, edit and press OK. If the save does not pass, fix the dialog validation.

PASS