Bug 837024 - ovirt-engine-backend [Task Manager]: wrong error message when host becomes non-operational with no information to user
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64
OS: Linux
medium
low
Target Milestone: ---
: ---
Assignee: Eli Mesika
QA Contact: Pavel Stehlik
URL:
Whiteboard: infra
Depends On: 842635
Blocks:
Reported: 2012-07-02 14:30 UTC by Dafna Ron
Modified: 2016-02-10 19:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-04 19:58:14 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
screen shot (26.74 KB, image/jpeg)
2012-07-02 14:30 UTC, Dafna Ron
logs (1.16 MB, application/octet-stream)
2012-07-24 09:45 UTC, Leonid Natapov

Description Dafna Ron 2012-07-02 14:30:43 UTC
Created attachment 595748 [details]
screen shot

Description of problem:

I blocked the master domain from my SPM host and the host became non-operational.
The error we present in the Task Manager is:

Handling non responsive Host orange-vdsd

and when we expand the task we only see: Validating

1. This host is not non-responsive, it is non-operational. These are completely different states with different causes.
2. We offer no information to the user. What does "Validating" mean?


Version-Release number of selected component (if applicable):

si8

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, block connectivity to the master storage domain from the SPM host.
  
Actual results:

We see this error in the Task Manager:

Handling non responsive Host orange-vdsd

Expected results:

1. Non-responsive means a network issue or a communication problem between RHEV and vdsm. It is not the correct alert for a storage issue, since the engine is still communicating with the host.
2. When we expand the task, the only step we see is: Validating.
Please add more steps, since "Validating" alone is not enough information. The event log shows all the different actions run by the system, with better explanations than the Task Manager.

Additional info: screen shot

Comment 1 Eli Mesika 2012-07-19 13:59:16 UTC
You have attached only a screenshot.
Please attach the engine and vdsm logs.

Comment 2 Leonid Natapov 2012-07-24 09:45:13 UTC
Created attachment 599977 [details]
logs

I reproduced it on si11. Attaching the engine and vdsm logs.

Comment 3 Eli Mesika 2012-07-28 20:37:22 UTC
We asked Omer for advice on how to skip the non-responding state in this case.
Here is his answer:

START
------
I cannot go immediately to non-operational,
because when you block the connection to the master domain from the SPM,
vdsm restarts its own service, which means vdsm is not responding.
Only when it comes back up can rhevm tell that it cannot connect to the storage,
and then the host is moved to non-operational.
What I think should be shown is:
1. Handling non responsive host
2. Handling non operational host

END
----

So, what I understand from the above is that this is not a bug.

Please approve
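
The sequence described in the quoted answer can be sketched as a small status-to-job mapping. This is only a minimal illustration in Python, not the engine's actual code: the `HostStatus` enum, the `JOB_FOR_STATUS` table, and the `jobs_for_transitions` helper are hypothetical names, and the "non operational" job description is just the wording proposed in the answer.

```python
from enum import Enum


class HostStatus(Enum):
    """Hypothetical subset of host statuses relevant to this bug."""
    UP = "Up"
    NON_RESPONSIVE = "NonResponsive"
    NON_OPERATIONAL = "NonOperational"


# Assumed job descriptions: the first matches the message seen in the
# Task Manager, the second is the wording proposed in the quoted answer.
JOB_FOR_STATUS = {
    HostStatus.NON_RESPONSIVE: "Handling non responsive Host {host}",
    HostStatus.NON_OPERATIONAL: "Handling non operational Host {host}",
}


def jobs_for_transitions(host, statuses):
    """Return one job description per status change into a problem state."""
    jobs = []
    previous = None
    for status in statuses:
        # Report a job only when the status changes and has a handler.
        if status != previous and status in JOB_FOR_STATUS:
            jobs.append(JOB_FOR_STATUS[status].format(host=host))
        previous = status
    return jobs


# vdsm restarts first (non-responsive), then storage is found unreachable.
observed = [HostStatus.UP, HostStatus.NON_RESPONSIVE, HostStatus.NON_OPERATIONAL]
for job in jobs_for_transitions("orange-vdsd", observed):
    print(job)
```

With this model, a host that goes through both states would show two separate jobs in the Task Manager instead of only the non-responsive one.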

Comment 4 Dafna Ron 2012-07-29 07:58:49 UTC
no... this is a bug which is blocked by a vdsm bug (sanlock is restarting the host). once the vdsm bug is fixed you will be able to reproduce this issue.

Comment 5 Eli Mesika 2012-07-29 08:29:33 UTC
(In reply to comment #4)
> no... this is a bug which is blocked by a vdsm bug (sanlock is restarting
> the host). once the vdsm bug is fixed you will be able to reproduce this
> issue.

So, please provide the vdsm bug in the "Depends On" field.

Comment 6 Dafna Ron 2012-08-05 08:23:35 UTC
The information was provided.
Removing needinfo.

Comment 7 Barak 2012-09-09 11:10:57 UTC
Dafna,

I fail to understand the dependency (why this BZ is blocked on BZ#842635).

Can you please elaborate ?

Comment 8 Dafna Ron 2012-09-09 11:29:43 UTC
Steps to Reproduce:
1. In a two-host cluster, block connectivity to the master storage domain from the SPM host.

When you do that, the SPM reboots, so the host becomes non-responsive.

We want to check that the steps of the host becoming non-operational are reported in the Task Manager.

Comment 9 Barak 2012-10-15 12:03:22 UTC
After discussing this issue with Dafna,
It looks like this was already handled.

Moving to ON_QA for verification

