Bug 1386507 - [RFE] better logging of cluster version upgrade failures
Summary: [RFE] better logging of cluster version upgrade failures
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ovirt-4.1.0-alpha
Target Release: ---
Assignee: Shmuel Melamud
QA Contact: sefi litmanovich
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-19 07:00 UTC by Marcus West
Modified: 2021-08-30 12:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-25 00:55:25 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
gklein: testing_plan_complete+


Links
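System                            | ID             | Private | Priority | Status       | Summary                                                  | Last Updated
----------------------------------+----------------+---------+----------+--------------+----------------------------------------------------------+------------------------
Red Hat Issue Tracker             | RHV-43199      | 0       | None     | None         | None                                                     | 2021-08-30 12:06:20 UTC
Red Hat Knowledge Base (Solution) | 2715711        | 0       | None     | None         | None                                                     | 2016-10-19 07:28:02 UTC
Red Hat Product Errata            | RHEA-2017:0997 | 0       | normal   | SHIPPED_LIVE | Red Hat Virtualization Manager (ovirt-engine) 4.1 GA     | 2017-04-18 20:11:26 UTC
oVirt gerrit                      | 66205          | 0       | None     | MERGED       | core: Propagate UpdateVm failure to UpdateClusterCommand | 2021-02-12 07:25:19 UTC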

Description Marcus West 2016-10-19 07:00:35 UTC
## Description of problem:

An invalid time zone setting on a single VM can cause the cluster compatibility upgrade to fail. The logs do not clearly identify the problem VM.

## Version-Release number of selected component (if applicable):

rhevm-4.0.4.4-0.1.el7ev.noarch

## How reproducible:

always

## Steps to Reproduce:
1. Create a 3.6 DC/cluster and some VMs.
2. Change one VM's time zone to '' (table vm_static, column time_zone).

engine=# select vm_name, vm_guid, os, time_zone from vm_static where cluster_id = 'f0f30779-6e8b-46e8-8689-9fd46cea220b' order by vm_name;
  vm_name   |               vm_guid                | os |     time_zone     
------------+--------------------------------------+----+-------------------
 linux-test | 0dbf06c6-2734-4d72-84ee-30d7dc230c56 |  5 | Etc/GMT
 rhel6-test | 5d5fe9fe-3e70-4a22-9cdc-d6c8c8f9694f | 19 | Etc/GMT
 rhel7-test | 12c43e79-648a-4c7f-a0a7-54a7a6be9e7f | 24 | 
 win-test   | b57eb0c7-f933-4a23-aff8-1ed6424ff0ed | 25 | GMT Standard Time

3. Attempt to upgrade the cluster compatibility version.

## Actual results:

From the GUI, the action fails with the error:

"Error while executing action Edit Cluster properties: Internal Engine Error"

## Expected results:

GUI (or logs) should report specifically which VM is in error.

## Additional info:

I don't have a reproducer for creating a VM with an invalid time zone. I'm not sure how the customer managed to do it, but we spent several hours investigating the wrong VMs in an attempt to isolate the problem.

In larger environments (with a mix of Linux and other OSes), it may be difficult to see which VMs are 'invalid'.
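
A minimal sketch of how the affected VMs could be listed directly, rather than scanning the full query output by eye. It assumes the problem value is an empty (or whitespace-only) string in vm_static.time_zone, as in the reproduction steps. The in-memory SQLite table is only a hypothetical stand-in for the engine database so that the example is self-contained, and the helper names are illustrative; the WHERE clause is the part that would carry over to the engine's PostgreSQL database.

```python
import sqlite3

# Rows copied from the query output in the reproduction steps.
CLUSTER_ID = "f0f30779-6e8b-46e8-8689-9fd46cea220b"
SAMPLE_ROWS = [
    ("linux-test", "0dbf06c6-2734-4d72-84ee-30d7dc230c56", 5, "Etc/GMT"),
    ("rhel6-test", "5d5fe9fe-3e70-4a22-9cdc-d6c8c8f9694f", 19, "Etc/GMT"),
    ("rhel7-test", "12c43e79-648a-4c7f-a0a7-54a7a6be9e7f", 24, ""),
    ("win-test", "b57eb0c7-f933-4a23-aff8-1ed6424ff0ed", 25, "GMT Standard Time"),
]


def build_sample_db():
    """Create a stand-in vm_static table holding only the columns used here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vm_static ("
        "vm_name TEXT, vm_guid TEXT, os INTEGER, time_zone TEXT, cluster_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO vm_static VALUES (?, ?, ?, ?, ?)",
        [row + (CLUSTER_ID,) for row in SAMPLE_ROWS],
    )
    return conn


def find_vms_with_blank_time_zone(conn, cluster_id):
    """Return (vm_name, vm_guid) for VMs whose time_zone is an empty string.

    NULL values are deliberately not matched: the report only describes
    the empty-string case as a problem.
    """
    cur = conn.execute(
        "SELECT vm_name, vm_guid FROM vm_static "
        "WHERE cluster_id = ? AND trim(time_zone) = '' "
        "ORDER BY vm_name",
        (cluster_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    for name, guid in find_vms_with_blank_time_zone(build_sample_db(), CLUSTER_ID):
        print(name, guid)
```

This only catches blank values. A non-empty string that is invalid for the VM's OS type would need a comparison against the engine's list of accepted time zones, which is not shown in this report.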

Comment 2 sefi litmanovich 2016-11-22 13:26:32 UTC
So far I have looked at this feature in the nightly build.
There is some more information, but I'm not sure it is enough to satisfy the request to pinpoint the problem and identify the problematic VM.
E.g., I forced an invalid string into the DB for a VM's time_zone and tried to upgrade the cluster. I get:

Error while executing action:

    Cannot edit Cluster. Invalid time zone for given OS type.
    Attribute: vmStatic

While this does add the real reason to the message, if I had 200 VMs in my environment I'd have a hard time figuring out which one caused the problem.
I can figure that out from this line in engine.log, but I don't think it's enough. I'm open to discussion about it.

2016-11-22 14:10:27,854 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] (default task-12) [28807589] Lock freed to object 'EngineLock:{exclusiveLocks='null', sharedLocks='[{faulty_vm's_id}=<VM, ACTION_TYPE_FAILED_CLUSTER_IS_BEING_UPDATED$clusterName another-clust>]'}'

As this RFE doesn't cover many cases, I'm adding one test case to check that an error in cluster upgrade doesn't produce an Internal Error message. Please review this case once I post the Polarion link here, and tell me if you think I need to add more cases.
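
A rough sketch of how the VM ID could be pulled out of the "Lock freed" line quoted in this comment. It assumes that the placeholder {faulty_vm's_id} is a standard UUID in the real log and that each VM lock entry has the form `<uuid>=<VM, ...>`. The function name and regular expression are illustrative and not part of ovirt-engine. The line may list more than one VM ID, so this narrows the search rather than proving which VM is at fault.

```python
import re

# A UUID immediately followed by "=<VM," as in the sharedLocks entry
# of the quoted log line (assumed format, see the note above).
VM_LOCK_RE = re.compile(
    r"([0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})=<VM,"
)


def vm_ids_from_lock_line(line):
    """Return the VM UUIDs named in one UpdateClusterCommand 'Lock freed' line."""
    if "UpdateClusterCommand" not in line or "Lock freed" not in line:
        return []
    return VM_LOCK_RE.findall(line)


def vm_ids_from_log(lines):
    """Collect unique VM UUIDs from an iterable of log lines, in first-seen order."""
    seen = []
    for line in lines:
        for vm_id in vm_ids_from_lock_line(line):
            if vm_id not in seen:
                seen.append(vm_id)
    return seen
```

For example, `vm_ids_from_log(open("engine.log"))` would scan a local copy of the log. The resulting IDs can then be matched against vm_guid in vm_static to get VM names.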

Comment 4 sefi litmanovich 2017-02-02 12:05:03 UTC
Verifying based on my comment #2 and the attached test case.
Opening a new RFE, https://bugzilla.redhat.com/show_bug.cgi?id=1418641, for more specific information in logging, as I understand that my request from comment #2 will not be easily implemented within the scope of this RFE.

