Bug 1124141

Summary: StackOverflowError during fencing operation
Product: [Retired] oVirt
Reporter: Martin Perina <mperina>
Component: ovirt-engine-core
Assignee: Martin Perina <mperina>
Status: CLOSED CURRENTRELEASE
QA Contact: sefi litmanovich <slitmano>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.5
CC: ecohen, gklein, iheim, oourfali, rbalakri, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version: ovirt-3.5.0_rc1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:39:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1073943

Description Martin Perina 2014-07-29 04:42:35 UTC
Description of problem:

The host cannot be fenced: a StackOverflowError is logged in server.log during the host restart:

2014-07-28 16:20:13,932 ERROR [org.jboss.ejb3.invocation] (org.ovirt.thread.pool-8-thread-41) JBAS014134: EJB Invocation failed on component Backend for method public abstract org.ovirt.engine.core.common.action.VdcReturnValueBase org.ovirt.engine.core.bll.interfaces.BackendInternal.runInternalAction(org.ovirt.engine.core.common.action.VdcActionType,org.ovirt.engine.core.common.action.VdcActionParametersBase,org.ovirt.engine.core.bll.context.CommandContext): javax.ejb.EJBException: java.lang.RuntimeException: java.lang.StackOverflowError
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:217) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:363) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:194) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.1.Final.jar:1.1.1.Final]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee-7.1.1.Final.jar:7.1.1.Final]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view7.runInternalAction(Unknown Source) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsEventListener$3.run(VdsEventListener.java:232) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:90) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_45]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_45]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_45]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]
Caused by: java.lang.RuntimeException: java.lang.StackOverflowError
        ... 26 more
Caused by: java.lang.StackOverflowError
        at org.ovirt.engine.core.bll.RestartVdsCommand.getContext(RestartVdsCommand.java:140) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.cloneContext(CommandBase.java:2227) [bll.jar:]
        at org.ovirt.engine.core.bll.RestartVdsCommand.getContext(RestartVdsCommand.java:141) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.cloneContext(CommandBase.java:2227) [bll.jar:]
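
The last four frames alternate between RestartVdsCommand.getContext and CommandBase.cloneContext, which suggests the two methods call each other without a terminating condition. The following is a minimal sketch of that pattern, not the actual oVirt code: the classes Context, BaseCommand, RecursiveRestartCommand and SafeRestartCommand are hypothetical stand-ins, and the cached-context variant is only one possible way to break such a cycle, not necessarily the fix that was merged.

```java
public class ContextRecursionSketch {

    /** Hypothetical stand-in for a command context; not the engine's real class. */
    static class Context {
        final String label;

        Context(String label) {
            this.label = label;
        }
    }

    /** Hypothetical base command: cloneContext() copies whatever getContext() returns. */
    static class BaseCommand {
        private final Context context = new Context("base");

        Context getContext() {
            return context;
        }

        Context cloneContext() {
            // Virtual dispatch: a subclass override of getContext() is invoked here.
            return new Context(getContext().label + "-clone");
        }
    }

    /** Override that re-enters cloneContext() from getContext(), forming the cycle. */
    static class RecursiveRestartCommand extends BaseCommand {
        @Override
        Context getContext() {
            // cloneContext() calls getContext() again, so this never returns.
            return cloneContext();
        }
    }

    /** One possible way to break the cycle: build the copy once from the base context. */
    static class SafeRestartCommand extends BaseCommand {
        private Context cached;

        @Override
        Context getContext() {
            if (cached == null) {
                // super.getContext() bypasses the override, so there is no re-entry.
                cached = new Context(super.getContext().label + "-clone");
            }
            return cached;
        }
    }

    /** Returns true if calling getContext() on the command overflows the stack. */
    static boolean overflows(BaseCommand command) {
        try {
            command.getContext();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("recursive overflows: " + overflows(new RecursiveRestartCommand()));
        System.out.println("safe overflows: " + overflows(new SafeRestartCommand()));
    }
}
```

In the sketch the recursion is only visible at run time, because the base class method looks harmless in isolation; the cycle appears once a subclass overrides getContext() in terms of cloneContext().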


Version-Release number of selected component (if applicable):

oVirt master commit hash 323e1611720cb7adf797298e4abc33b9869d401c

How reproducible:

100%

Steps to Reproduce:
1. Create a cluster with 2 hosts
2. Block communication with engine on one host using iptables
3. Wait for the engine to mark the blocked host as Non Responsive and try to fence it

Actual results:

The blocked host is not fenced and stays in the Non Responsive state forever

Expected results:

Host should be fenced successfully and become Up again

Additional info:

Comment 1 sefi litmanovich 2014-09-04 08:49:06 UTC
Verified with ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch,
according to steps in description.

After blocking the connection from the host to the engine, the host became Non Responsive, the fence action was invoked, and the host was successfully restarted.

The error from the description did not appear in the server log.

Comment 2 Sandro Bonazzola 2014-10-17 12:39:20 UTC
oVirt 3.5 has been released and should include the fix for this issue.