Created attachment 1221296 [details]
vdsm and engine logs

Description of problem:
Using refresh capabilities on host raises Java exception on the engine.

Version-Release number of selected component (if applicable):
3.6.10-0.1.el6

How reproducible:
100%

Steps to Reproduce:
1. Try to use refresh capabilities on the host.
2. HOST status set to non-operational.

Actual results:
Host is non-operational.

Expected results:
Host should be operational.

Additional info:
engine.log:
[org.ovirt.engine.core.bll.hostdev.RefreshHostDevicesCommand] (org.ovirt.thread.pool-6-thread-43) [66790887] Exception: java.lang.RuntimeException: Failed managing transaction
In our case, the ovirtmgmt network is out of sync, so the host is non-operational. After syncing the network and refreshing capabilities, the host is still non-operational, even though the network should now be in sync. We get the same NPE when we refresh capabilities on a host in UP state.
Not sure I follow the exact steps to reproduce, as according to your comment they also require network manipulations. Can you specify the steps to reproduce?
Hi Oved,

Unfortunately, we do not have the exact steps for reproducing it on a clean environment. This issue is currently reproducible only on this environment. This host is used for running automated tests for different testing areas (storage, network, infra, ...), and it is very hard to track what exactly went wrong by looking at the trace logs. I can provide access to the environment; it might help us to locate the source. Send me a mail or an IRC message and I will provide you with the details.

Thanks,
Mor Kalfon
Just to add to the previous message, it is reproducible on all the hosts on this environment.
Thanks. Martin - can you assign someone to investigate?
There's an exception in RefreshHostDevicesCommand. Because this failure causes a transaction timeout, more information can probably be found in server.log. Tomas, could you please take a look, as this is part of the Virt team's area?
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1315100
The issue was that the engine does not properly handle the situation when VDSM returns the tree of host devices in an inconsistent way.

It most often happens as a consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1306333 but does not have to.
You could try to work around the issue by restarting libvirtd on the host where the refresh does not work.

*** This bug has been marked as a duplicate of bug 1315100 ***
(In reply to Tomas Jelinek from comment #7)
> This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1315100
> The issue was that the engine does not handle properly the situation when
> the VDSM returns the tree of host devices in an inconstant way.
>
> It most often happens as a consequence of
> https://bugzilla.redhat.com/show_bug.cgi?id=1306333 but does not have to.
> You could try to workaround the issue by restarting libvirtd on the host
> where the refresh dont work.
>
> *** This bug has been marked as a duplicate of bug 1315100 ***

Tomas, this BZ was submitted against 3.6.z while the duplicated issue is only fixing 4.0.z.
I'm re-opening for now so we won't lose the tracking. Feel free to close it in case you prefer to clone the other BZ to 3.6.z.
Ah, right, I did not explain this in the previous comment. The thing is that this is not a regression; this bug was always there, it just does not happen all the time. It happens only when hitting a related libvirt bug: https://bugzilla.redhat.com/show_bug.cgi?id=1306333

It can be worked around by restarting libvirt on the affected host, after which the refresh caps should pass. Does this un-block the automation?
"Automation" means that the bug was found in automation, not that it is an automation blocker.
OK, let me rephrase the question: does restarting libvirt on the affected host solve the issue?
Hi Tomas, I tried to restart libvirtd and it solves the problem.
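
For reference, the workaround confirmed above (restart libvirtd on the affected host, then retry refresh capabilities from the engine) could be scripted roughly as follows. This is only a minimal sketch under stated assumptions: root SSH access to the host is assumed, the hostname `host1.example.com` is a placeholder, and the helper names are hypothetical. Whether the host uses systemd (`systemctl restart libvirtd`) or SysV init (`service libvirtd restart`) depends on the host OS, which this thread does not state. It defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def restart_libvirtd_command(host, use_systemd=True):
    """Build the ssh command that restarts libvirtd on `host`.

    Assumption: root SSH access to the host is available.
    """
    if use_systemd:
        remote = "systemctl restart libvirtd"   # systemd-based hosts
    else:
        remote = "service libvirtd restart"     # SysV-init-based hosts
    return ["ssh", "root@" + host, remote]


def restart_libvirtd(host, use_systemd=True, dry_run=True):
    """Restart libvirtd on `host`; with dry_run=True only print the command."""
    cmd = restart_libvirtd_command(host, use_systemd)
    if dry_run:
        print(" ".join(shlex.quote(part) for part in cmd))
        return 0
    # Return the ssh exit status so the caller can decide how to react.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Placeholder hostname; replace with the affected host.
    restart_libvirtd("host1.example.com")
```

After the restart, refresh capabilities still has to be triggered again from the engine; that step is not covered by the sketch.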