Bug 837024 - ovirt-engine-backend [Task Manager]: wrong error message when host becomes non-operational with no information to user
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64 Linux
Priority: medium, Severity: low
Assigned To: Eli Mesika
Pavel Stehlik
infra
Depends On: 842635
Blocks:
Reported: 2012-07-02 10:30 EDT by Dafna Ron
Modified: 2016-02-10 14:15 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 14:58:14 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
screen shot (26.74 KB, image/jpeg)
2012-07-02 10:30 EDT, Dafna Ron
logs (1.16 MB, application/octet-stream)
2012-07-24 05:45 EDT, Leonid Natapov

Description Dafna Ron 2012-07-02 10:30:43 EDT
Created attachment 595748
screen shot

Description of problem:

I blocked the master storage domain from my SPM host, and the host became non-operational.
The message we present in the Task Manager is:

Handling non responsive Host orange-vdsd

When we expand the task, we only see: Validating

1. This host is not non-responsive; it is non-operational. These are completely different states with different causes.
2. We offer no information to the user. What does "Validating" mean?


Version-Release number of selected component (if applicable):

si8

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, block connectivity to the master storage domain from the SPM host.
  
Actual results:

We see this error in the Task Manager:

Handling non responsive Host orange-vdsd

Expected results:

1. Non-responsive indicates a network issue or a communication failure between RHEV and VDSM. It is not the correct alert for a storage issue, since the engine is still communicating with the host.
2. When we expand the task, the only step we see is "Validating".
Please add more steps; "Validating" alone is not enough information. The event log shows all the different actions run by the system, with better explanations than the Task Manager.
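
To illustrate the distinction drawn above, here is a minimal sketch of how the two states differ. It is not the engine's actual code: `HostStatus`, `classify_host`, and both parameters are hypothetical names chosen only for this example.

```python
from enum import Enum


class HostStatus(Enum):
    # Hypothetical status names, used only for this illustration.
    UP = "Up"
    NON_RESPONSIVE = "NonResponsive"
    NON_OPERATIONAL = "NonOperational"


def classify_host(engine_reaches_vdsm: bool, host_reaches_storage: bool) -> HostStatus:
    """Map two connectivity checks to a host status (illustrative only)."""
    if not engine_reaches_vdsm:
        # The engine cannot talk to VDSM at all: a communication problem.
        return HostStatus.NON_RESPONSIVE
    if not host_reaches_storage:
        # VDSM answers, but the host cannot reach its storage domain.
        return HostStatus.NON_OPERATIONAL
    return HostStatus.UP


# The case in this report: the engine still talks to the host,
# but the host has lost its storage connection.
print(classify_host(engine_reaches_vdsm=True, host_reaches_storage=False))
```

Under these assumptions, a blocked storage connection with working engine-to-host communication maps to non-operational, never to non-responsive.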

Additional info: screen shot
Comment 1 Eli Mesika 2012-07-19 09:59:16 EDT
You have attached only a screenshot.
Please attach the engine and VDSM logs.
Comment 2 Leonid Natapov 2012-07-24 05:45:13 EDT
Created attachment 599977
logs

I reproduced it on si11. Attaching the engine and VDSM logs.
Comment 3 Eli Mesika 2012-07-28 16:37:22 EDT
We asked Omer for advice on how to skip the non-responding state in this case.
Here is his answer:

START
------
I cannot go immediately to non-operational,
because when you block the connection to the master from the SPM,
VDSM restarts its own service, which means VDSM is not responding.
Only when it comes back up can RHEVM tell that it cannot connect to the storage,
and then the host is moved to non-operational.
What I think should be shown is:
1. Handling non responsive host
2. Handling non operational host

END
----
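
The sequence Omer describes can be sketched as follows. This is an assumption-laden illustration, not the engine's implementation: the event names and the `task_steps` function are hypothetical, and only the two step labels come from his answer.

```python
def task_steps(events):
    """Turn observed host events into Task Manager step labels (illustrative only)."""
    # Hypothetical event names mapped to the two labels proposed in the answer.
    labels = {
        "vdsm_not_responding": "Handling non responsive host",
        "storage_unreachable": "Handling non operational host",
    }
    steps = []
    for event in events:
        label = labels.get(event)
        # Record each step once, in the order the events occurred.
        if label is not None and label not in steps:
            steps.append(label)
    return steps


# Blocking the master domain from the SPM: VDSM restarts (not responding),
# comes back up, and only then is the storage failure detected.
events = ["vdsm_not_responding", "vdsm_up", "storage_unreachable"]
for number, step in enumerate(task_steps(events), start=1):
    print(f"{number}. {step}")
```

With this ordering, both steps appear in the task, so the user sees the transition from non-responsive to non-operational rather than only the first state.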

So, what I understand from the above is that this is not a bug.

Please approve
Comment 4 Dafna Ron 2012-07-29 03:58:49 EDT
No. This is a bug that is blocked by a VDSM bug (sanlock is restarting the host). Once the VDSM bug is fixed, you will be able to reproduce this issue.
Comment 5 Eli Mesika 2012-07-29 04:29:33 EDT
(In reply to comment #4)
> no... this is a bug which is blocked by a vdsm bug (sanlock is restarting
> the host). once the vdsm bug is fixed you will be able to reproduce this
> issue.

So, please provide the VDSM bug in the "Depends On" field.
Comment 6 Dafna Ron 2012-08-05 04:23:35 EDT
The information was provided.
Removing needinfo.
Comment 7 Barak 2012-09-09 07:10:57 EDT
Dafna,

I fail to understand the dependency (why this BZ is blocked on BZ#842635).

Can you please elaborate?
Comment 8 Dafna Ron 2012-09-09 07:29:43 EDT
Steps to Reproduce:
1. In a two-host cluster, block connectivity to the master storage domain from the SPM host.

When you do that, the SPM reboots, so the host becomes non-responsive.

We want to check that the steps of the host becoming non-operational are reported in the Task Manager.
Comment 9 Barak 2012-10-15 08:03:22 EDT
After discussing this issue with Dafna,
it looks like this was already handled.

Moving to ON_QA for verification
