Bug 1310563 - Host remains locked in the system after manual fencing
Status: CLOSED CURRENTRELEASE
Product: vdsm-jsonrpc-java
Classification: oVirt
Component: Core
Version: 1.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.5
Target Release: 1.1.9
Assigned To: Piotr Kliczewski
QA Contact: Pavol Brilla
Keywords: WorkAround, ZStream
Duplicates: 1273754
Depends On:
Blocks:
Reported: 2016-02-22 04:06 EST by Moti Asayag
Modified: 2016-04-21 10:40 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-21 10:40:57 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
mgoldboi: planning_ack+
masayag: devel_ack+
pstehlik: testing_ack+


External Trackers
Tracker       ID     Priority   Status  Summary                                    Last Updated
oVirt gerrit  53798  master     MERGED  connect: timeout on sending connect frame  2016-02-22 10:52 EST
oVirt gerrit  53844  ovirt-3.6  MERGED  connect: timeout on sending connect frame  2016-02-23 03:52 EST

Description Moti Asayag 2016-02-22 04:06:40 EST
Description of problem:
After a manual reboot, the host was set to maintenance mode.
The engine still reports VMs running on that host.
The 'Confirm host was rebooted' button was clicked, but the action never completes.
A second click on 'Confirm host was rebooted' produces 'The host is locked. Other action is already in progress'.

The logs show that the 'Confirm host was rebooted' thread never completed.
The thread dump reveals a thread that holds the lock on the host's monitoring object and never releases it.

Version-Release number of selected component (if applicable):
ovirt-engine-3.6

How reproducible:
Sometimes

Steps to Reproduce:
There are no exact steps to reproduce this issue. The scenario occurred after a data-center power outage.
Two of the hosts were manually rebooted and remained stuck as described above.
The rest of the hosts in the data center did not show the same issue.

Actual results:
The host cannot be confirmed as rebooted, and no other action can be invoked on it.

Expected results:
Host state should be recoverable: "Confirm host was rebooted" should clear the running VMs from the host.

Additional info:
The thread that holds the lock on the monitoring object:

"DefaultQuartzScheduler_Worker-36" prio=10 tid=0x00007f9c054ed800 nid=0xac9 waiting on condition [0x00007f9bea6e4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000078ff5c530> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
        at org.ovirt.vdsm.jsonrpc.client.utils.OneTimeCallback.await(OneTimeCallback.java:27)
        at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:94)
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:114)
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:73)
        at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:68)
        at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getCapabilities(JsonRpcVdsServer.java:268)
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:15)
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110)
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65)
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467)
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:647)
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:119)
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84)
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227)
        - locked <0x000000078ff9cfb8> (a java.lang.Object)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
        - locked <0x000000078e22a8f8> (a org.quartz.simpl.SimpleThreadPool$WorkerThread)
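
The trace above shows the worker parked in CountDownLatch.await() with no timeout, reached from ReactorClient.connect, while still holding the monitor taken in VdsManager.onTimer. The linked gerrit changes are titled "connect: timeout on sending connect frame". The following is only a minimal sketch of that general idea, not the actual patch: the class ConnectTimeoutSketch, its fields, and its methods are hypothetical stand-ins, and the only real API used is java.util.concurrent.CountDownLatch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectTimeoutSketch {
    // Hypothetical stand-in for the per-host monitoring lock held in VdsManager.onTimer.
    private final Object monitoringLock = new Object();
    // Hypothetical stand-in for the latch awaited in OneTimeCallback.
    private final CountDownLatch connected = new CountDownLatch(1);

    // Called when the peer answers the connect frame.
    public void markConnected() {
        connected.countDown();
    }

    // Waits for the connection while holding the monitoring lock, but only up
    // to the given timeout. Returns true if connected, false on timeout or
    // interruption. Either way the lock is released when the method returns.
    public boolean refresh(long timeout, TimeUnit unit) {
        synchronized (monitoringLock) {
            try {
                // A plain await() here would park forever if no reply arrives.
                return connected.await(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ConnectTimeoutSketch host = new ConnectTimeoutSketch();
        // No reply yet: the bounded wait gives up and frees the lock.
        System.out.println("connected=" + host.refresh(100, TimeUnit.MILLISECONDS));
        host.markConnected();
        // Reply received: the wait returns immediately.
        System.out.println("connected=" + host.refresh(100, TimeUnit.MILLISECONDS));
    }
}
```

With a bounded wait, a connect attempt that never gets a reply fails after the timeout instead of holding the monitoring lock indefinitely, so later actions on the host are not blocked by it.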
Comment 1 Oved Ourfali 2016-02-22 09:07:40 EST
Restarting the engine works around the issue.
Therefore, this is not a blocker for 3.6.3. Moving to 3.6.5.
Comment 2 Piotr Kliczewski 2016-04-06 05:14:35 EDT
*** Bug 1273754 has been marked as a duplicate of this bug. ***
Comment 6 Pavol Brilla 2016-04-18 19:06:48 EDT
Verified on rhevm-3.6.5.3-0.1.el6.noarch & vdsm-jsonrpc-java-1.1.9-1.el6ev.noarch
