Bug 1310563

Summary: Host remains locked in the system after manual fencing
Product: [oVirt] vdsm-jsonrpc-java Reporter: Moti Asayag <masayag>
Component: Core Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE QA Contact: Pavol Brilla <pbrilla>
Severity: high Docs Contact:
Priority: high    
Version: 1.2 CC: bugs, inetkach, mgoldboi, oourfali, pbrilla, pkliczew, pzhukov
Target Milestone: ovirt-3.6.5 Keywords: WorkAround, ZStream
Target Release: 1.1.9 Flags: rule-engine: ovirt-3.6.z+
mgoldboi: planning_ack+
masayag: devel_ack+
pstehlik: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-21 14:40:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Moti Asayag 2016-02-22 09:06:40 UTC
Description of problem:
After a manual reboot, the host was set to maintenance mode.
The engine reports that there are still VMs running on that host.
The 'Confirm host was rebooted' button was clicked, but the action never completes.
A second attempt to click on 'Confirm host was rebooted' produces 'The host is locked. Other action is already in progress'.

The logs show that the thread running 'Confirm host was rebooted' never completed.
The thread dump reveals the thread that holds the lock on the host's monitoring object and never releases it.

Version-Release number of selected component (if applicable):
ovirt-engine-3.6

How reproducible:
Sometimes

Steps to Reproduce:
There are no exact steps to reproduce this issue. The scenario occurred after a data-center power outage.
Two of the hosts were manually rebooted and remained stuck as described above.
The rest of the hosts in the data-center didn't face the same issues.

Actual results:
Host cannot be confirmed as rebooted, and no other action can be invoked on it.

Expected results:
Host state should be recoverable: "Confirm host was rebooted" should clear the running VMs from the host.

Additional info:
The specific thread which holds the monitoring object:

"DefaultQuartzScheduler_Worker-36" prio=10 tid=0x00007f9c054ed800 nid=0xac9 waiting on condition [0x00007f9bea6e4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000078ff5c530> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
        at org.ovirt.vdsm.jsonrpc.client.utils.OneTimeCallback.await(OneTimeCallback.java:27)
        at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:94)
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:114)
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:73)
        at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:68)
        at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getCapabilities(JsonRpcVdsServer.java:268)
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:15)
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110)
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65)
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467)
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:647)
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:119)
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84)
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227)
        - locked <0x000000078ff9cfb8> (a java.lang.Object)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
        - locked <0x000000078e22a8f8> (a org.quartz.simpl.SimpleThreadPool$WorkerThread)
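
Note on the trace above: the thread is in state WAITING (not TIMED_WAITING) inside CountDownLatch.await(), called from OneTimeCallback.await() during ReactorClient.connect(), while still holding the monitor taken in VdsManager.onTimer(). If the latch is never counted down, the monitor is never released. The following is a minimal, self-contained sketch of that pattern with a bounded wait. It is not the actual vdsm-jsonrpc-java fix, which this report does not describe. The class, method, and field names (BoundedConnectWait, OneShotSignal, refreshWithTimeout, MONITOR_LOCK) are hypothetical and used only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedConnectWait {

    // Hypothetical stand-in for a one-shot "connected" callback.
    static final class OneShotSignal {
        private final CountDownLatch latch = new CountDownLatch(1);

        // Called when the connection attempt finishes.
        void done() {
            latch.countDown();
        }

        // Timed wait: returns true if signalled, false if the timeout elapsed.
        boolean await(long timeout, TimeUnit unit) throws InterruptedException {
            return latch.await(timeout, unit);
        }
    }

    // Hypothetical stand-in for the per-host monitoring lock.
    private static final Object MONITOR_LOCK = new Object();

    // Waits for the signal while holding the lock, but gives up after
    // timeoutMs so the lock is always released.
    static boolean refreshWithTimeout(OneShotSignal connected, long timeoutMs)
            throws InterruptedException {
        synchronized (MONITOR_LOCK) {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OneShotSignal signal = new OneShotSignal();

        // Nobody calls done(): the wait times out instead of parking forever.
        System.out.println("connected=" + refreshWithTimeout(signal, 200));

        // Once the signal has fired, the wait returns immediately.
        signal.done();
        System.out.println("connected=" + refreshWithTimeout(signal, 200));
    }
}
```

With an untimed await() in place of the timed one, the first call in main would never return and any other thread needing MONITOR_LOCK would block, which matches the "host is locked" symptom reported above.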

Comment 1 Oved Ourfali 2016-02-22 14:07:40 UTC
Restarting the engine works around the issue.
Therefore, not a blocker for 3.6.3. Moving to 3.6.5.

Comment 2 Piotr Kliczewski 2016-04-06 09:14:35 UTC
*** Bug 1273754 has been marked as a duplicate of this bug. ***

Comment 6 Pavol Brilla 2016-04-18 23:06:48 UTC
Verified on rhevm-3.6.5.3-0.1.el6.noarch & vdsm-jsonrpc-java-1.1.9-1.el6ev.noarch