Bug 1310563 - Host remains locked in the system after manual fencing
Summary: Host remains locked in the system after manual fencing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm-jsonrpc-java
Classification: oVirt
Component: Core
Version: 1.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.5
Target Release: 1.1.9
Assignee: Piotr Kliczewski
QA Contact: Pavol Brilla
URL:
Whiteboard:
Duplicates: 1273754
Depends On:
Blocks:
Reported: 2016-02-22 09:06 UTC by Moti Asayag
Modified: 2019-10-10 11:19 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-21 14:40:57 UTC
oVirt Team: Infra
rule-engine: ovirt-3.6.z+
mgoldboi: planning_ack+
masayag: devel_ack+
pstehlik: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 53798 0 master MERGED connect: timeout on sending connect frame 2016-02-22 15:52:17 UTC
oVirt gerrit 53844 0 ovirt-3.6 MERGED connect: timeout on sending connect frame 2016-02-23 08:52:32 UTC
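
Both merged changes are titled "connect: timeout on sending connect frame", meaning the client should no longer wait indefinitely for the connection handshake. The following is a minimal sketch of that idea, assuming a `CountDownLatch` that is counted down when the connect frame is acknowledged. `awaitConnected`, `monitoringCycle` and the use of `TimeoutException` are hypothetical names chosen for illustration; they are not the vdsm-jsonrpc-java API and not the actual patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedConnect {

    /** Hypothetical helper: waits for the connect acknowledgement, but only up to the given timeout. */
    static void awaitConnected(CountDownLatch connected, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        // CountDownLatch.await(long, TimeUnit) returns false if the count did not reach zero in time
        if (!connected.await(timeout, unit)) {
            throw new TimeoutException("no reply to connect frame within " + timeout + " " + unit);
        }
    }

    /** Hypothetical monitoring cycle that holds the host lock for a bounded time only. */
    static boolean monitoringCycle(Object hostLock, CountDownLatch connected, long timeoutMs)
            throws InterruptedException {
        synchronized (hostLock) {
            try {
                awaitConnected(connected, timeoutMs, TimeUnit.MILLISECONDS);
                return true;  // connected: the real cycle would go on to call getCapabilities etc.
            } catch (TimeoutException e) {
                return false; // give up this cycle; leaving the block releases hostLock
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Object hostLock = new Object();
        CountDownLatch neverAcked = new CountDownLatch(1); // the host never answers
        System.out.println(monitoringCycle(hostLock, neverAcked, 200)); // prints false after about 200 ms

        CountDownLatch acked = new CountDownLatch(1);
        acked.countDown();                                 // the host already answered
        System.out.println(monitoringCycle(hostLock, acked, 200));      // prints true immediately
    }
}
```

The design point is that the host lock is held for at most the timeout per cycle: a failed cycle returns, the lock is released, and a later timer tick (or a manual action) can proceed instead of finding the host locked forever.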

Description Moti Asayag 2016-02-22 09:06:40 UTC
Description of problem:
After the host was manually rebooted, it was put into maintenance mode.
The engine still reports VMs running on that host.
The 'Confirm host was rebooted' button was clicked, but the action never completes.
A second attempt to click 'Confirm host was rebooted' fails with 'The host is locked. Other action is already in progress'.

The logs show that the 'Confirm host was rebooted' thread never completed.
A thread dump reveals the thread that holds the lock on the host's monitoring object and never releases it.
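
A dump like the one under "Additional info" can be taken with `jstack <pid>`; the same information is also available in-process through `java.lang.management.ThreadMXBean`. As a hedged illustration (a hypothetical diagnostic, not part of the engine), the sketch below lists threads that are parked (WAITING or TIMED_WAITING) while still holding an object monitor, which is the pattern described here. The class, method and thread names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LockHolderFinder {

    /** Names of threads that are WAITING or TIMED_WAITING while still holding at least one object monitor. */
    static List<String> waitingLockHolders() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();
        // lockedMonitors=true collects the "- locked <...>" entries; ownable synchronizers are not needed here
        for (ThreadInfo info : mx.dumpAllThreads(true, false)) {
            if (info == null) {
                continue;
            }
            Thread.State state = info.getThreadState();
            boolean parked = state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING;
            if (parked && info.getLockedMonitors().length > 0) {
                result.add(info.getThreadName());
            }
        }
        return result;
    }

    /** Hypothetical helper: starts a thread that enters synchronized(lock) and then parks on the latch. */
    static Thread startParkedHolder(Object lock, CountDownLatch latch, String name) throws InterruptedException {
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    latch.await(); // parks while still holding lock, like the monitoring thread in the dump
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, name);
        t.start();
        while (t.getState() != Thread.State.WAITING) {
            Thread.sleep(10); // wait until the thread is actually parked
        }
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Object hostLock = new Object();               // hypothetical per-host lock
        CountDownLatch reply = new CountDownLatch(1); // hypothetical reply that never arrives
        Thread worker = startParkedHolder(hostLock, reply, "monitoring-worker");
        System.out.println("parked lock holders: " + waitingLockHolders());
        reply.countDown(); // release the worker so the JVM can exit
        worker.join();
    }
}
```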

Version-Release number of selected component (if applicable):
ovirt-engine-3.6

How reproducible:
Sometimes

Steps to Reproduce:
There are no exact steps to reproduce this issue; the scenario occurred after a data-center power outage.
Two of the hosts were manually rebooted and remained stuck as described above.
The rest of the hosts in the data center did not hit the same issue.

Actual results:
The host cannot be confirmed as rebooted, and no other action can be invoked on it.

Expected results:
The host state should be recoverable: "Confirm host was rebooted" should clear the running VMs from the host.

Additional info:
The thread that holds the lock on the host's monitoring object:

"DefaultQuartzScheduler_Worker-36" prio=10 tid=0x00007f9c054ed800 nid=0xac9 waiting on condition [0x00007f9bea6e4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000078ff5c530> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
        at org.ovirt.vdsm.jsonrpc.client.utils.OneTimeCallback.await(OneTimeCallback.java:27)
        at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:94)
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:114)
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:73)
        at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:68)
        at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getCapabilities(JsonRpcVdsServer.java:268)
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:15)
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110)
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65)
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467)
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:647)
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:119)
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84)
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227)
        - locked <0x000000078ff9cfb8> (a java.lang.Object)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
        - locked <0x000000078e22a8f8> (a org.quartz.simpl.SimpleThreadPool$WorkerThread)
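
Reading the trace bottom-up: `VdsManager.onTimer` takes a lock on a plain `java.lang.Object` (the `- locked <0x000000078ff9cfb8>` line) and, still inside it, `ReactorClient.connect` parks in `CountDownLatch.await()` via `OneTimeCallback.await`, a wait with no timeout. If the reply never arrives, the lock is never released. The following is a simplified model of that situation, not engine code: `hostLock`, `connectReply`, `monitoringCycle` and `otherHostAction` are hypothetical stand-ins, and it assumes that the second action needs the same lock.

```java
import java.util.concurrent.CountDownLatch;

public class MonitoringHangModel {

    /** Hypothetical monitoring cycle: takes the host lock, then waits for a connect reply with no timeout. */
    static void monitoringCycle(Object hostLock, CountDownLatch connectReply) throws InterruptedException {
        synchronized (hostLock) {
            connectReply.await(); // unbounded wait: parks forever if the reply never arrives
        }
    }

    /** Hypothetical stand-in for any other host action that needs the same lock. */
    static void otherHostAction(Object hostLock) {
        synchronized (hostLock) {
            // host / VM state would be updated here
        }
    }

    static void awaitState(Thread t, Thread.State s) throws InterruptedException {
        while (t.getState() != s) {
            Thread.sleep(10);
        }
    }

    /** Runs both threads, records their states while stuck, then releases them. */
    static Thread.State[] demonstrate() throws InterruptedException {
        Object hostLock = new Object();
        CountDownLatch connectReply = new CountDownLatch(1);
        Thread monitor = new Thread(() -> {
            try {
                monitoringCycle(hostLock, connectReply);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "monitor");
        Thread action = new Thread(() -> otherHostAction(hostLock), "confirm-reboot");

        monitor.start();
        awaitState(monitor, Thread.State.WAITING); // parked inside synchronized (hostLock)
        action.start();
        awaitState(action, Thread.State.BLOCKED);  // cannot enter synchronized (hostLock)
        Thread.State[] observed = { monitor.getState(), action.getState() };

        connectReply.countDown(); // the model releases both threads; nothing did this in the incident
        monitor.join();
        action.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] s = demonstrate();
        System.out.println("monitor=" + s[0] + " confirm-reboot=" + s[1]); // prints "monitor=WAITING confirm-reboot=BLOCKED"
    }
}
```

In this model only an external event on the latch unblocks both threads, which matches the observation that restarting the engine was the only way out.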

Comment 1 Oved Ourfali 2016-02-22 14:07:40 UTC
Restarting the engine works around the issue.
Therefore, this is not a blocker for 3.6.3. Moving to 3.6.5.

Comment 2 Piotr Kliczewski 2016-04-06 09:14:35 UTC
*** Bug 1273754 has been marked as a duplicate of this bug. ***

Comment 6 Pavol Brilla 2016-04-18 23:06:48 UTC
Verified on rhevm-3.6.5.3-0.1.el6.noarch & vdsm-jsonrpc-java-1.1.9-1.el6ev.noarch

