Bug 1395798 - Refresh capabilities fails with exception
Summary: Refresh capabilities fails with exception
Keywords:
Status: CLOSED DUPLICATE of bug 1315100
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 3.6.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.10
Target Release: ---
Assignee: Oved Ourfali
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-16 17:22 UTC by Mor
Modified: 2016-11-23 07:35 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-23 07:35:57 UTC
oVirt Team: Virt
Embargoed:
mburman: planning_ack?
myakove: devel_ack?
mburman: testing_ack?


Attachments
vdsm and engine logs (2.30 MB, application/octet-stream)
2016-11-16 17:22 UTC, Mor

Description Mor 2016-11-16 17:22:10 UTC
Created attachment 1221296
vdsm and engine logs

Description of problem:
Using refresh capabilities on a host raises a Java exception on the engine.

Version-Release number of selected component (if applicable):
3.6.10-0.1.el6

How reproducible:
100%

Steps to Reproduce:
1. Run refresh capabilities on the host.
2. The host status is set to non-operational.

Actual results:
Host is non-operational.

Expected results:
Host should be operational.

Additional info:
engine.log:
[org.ovirt.engine.core.bll.hostdev.RefreshHostDevicesCommand] (org.ovirt.thread.pool-6-thread-43) [66790887] Exception: java.lang.RuntimeException: Failed managing transaction

Comment 1 Meni Yakove 2016-11-16 17:38:59 UTC
In our case, the ovirtmgmt network is out of sync, so the host is non-operational. After syncing the network and refreshing capabilities, the host is still non-operational, even though the network should now be in sync.

We get the same NPE when we do refresh capabilities on a host with UP state.

Comment 2 Oved Ourfali 2016-11-17 06:25:34 UTC
Not sure I follow the exact steps to reproduce, as according to your comment they also require network manipulations.
Can you specify the steps to reproduce?

Comment 3 Mor 2016-11-17 09:22:35 UTC
Hi Oved,

Unfortunately, we do not have exact steps to reproduce it in a clean environment. The issue is currently reproducible only in this environment.

This host is used for running automated tests for different testing areas (storage, network, infra, ...), and it is very hard to track down what exactly went wrong by looking at the trace logs.

I can provide access to the environment; it might help us locate the problem in the source. Send me an email or IRC message and I will provide the details.

Thanks,
Mor Kalfon

Comment 4 Mor 2016-11-17 09:24:26 UTC
Just to add to the previous message: it is reproducible on all the hosts in this environment.

Comment 5 Oved Ourfali 2016-11-17 09:36:26 UTC
Thanks.
Martin - can you assign someone to investigate?

Comment 6 Martin Perina 2016-11-18 13:42:19 UTC
There's an exception in RefreshHostDevicesCommand. Because this failure causes a transaction timeout, more information can probably be found in server.log.

Tomas, could you please take a look, as this is part of the Virt team's area?

Comment 7 Tomas Jelinek 2016-11-18 13:59:19 UTC
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1315100
The issue was that the engine does not properly handle the situation when VDSM returns the tree of host devices in an inconsistent way.

It most often happens as a consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1306333, but not always.
You could try to work around the issue by restarting libvirtd on the host where the refresh does not work.

*** This bug has been marked as a duplicate of bug 1315100 ***

Comment 8 Gil Klein 2016-11-20 08:45:40 UTC
(In reply to Tomas Jelinek from comment #7)
> This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1315100
> The issue was that the engine does not handle properly the situation when
> the VDSM returns the tree of host devices in an inconstant way. 
> 
> It most often happens as a consequence of
> https://bugzilla.redhat.com/show_bug.cgi?id=1306333 but does not have to.
> You could try to workaround the issue by restarting libvirtd on the host
> where the refresh dont work.
> 
> *** This bug has been marked as a duplicate of bug 1315100 ***
Tomas, this BZ was submitted against 3.6.z, while the duplicate issue is only being fixed in 4.0.z.

I'm reopening it for now so we don't lose track of it. Feel free to close it if you prefer to clone the other BZ to 3.6.z.

Comment 9 Tomas Jelinek 2016-11-21 08:01:21 UTC
Ah, right, I did not explain this in the previous comment.
The thing is that this is not a regression: this bug has always been there, it just does not happen all the time. It happens only when hitting a related libvirt bug: https://bugzilla.redhat.com/show_bug.cgi?id=1306333

It can be worked around by restarting libvirt on the affected host, after which refresh capabilities should pass. Does this unblock the automation?

Comment 10 Meni Yakove 2016-11-21 15:26:12 UTC
"Automation" here means that the bug was found by automation, not that it is an automation blocker.

Comment 11 Tomas Jelinek 2016-11-22 08:18:22 UTC
OK, let me rephrase the question: does restarting libvirt on the affected host solve the issue?

Comment 12 Mor 2016-11-22 16:01:27 UTC
Hi Tomas, 

I tried restarting libvirtd and it solved the problem.

