Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1132452

Summary: [scale] Host became non responsive on large environment (200 DCs) - vdsm error "Secured object is not in safe state"
Product: Red Hat Enterprise Virtualization Manager Reporter: Yuri Obshansky <yobshans>
Component: vdsmAssignee: Liran Zelkha <lzelkha>
Status: CLOSED NOTABUG QA Contact: Pavel Stehlik <pstehlik>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.1-1CC: amureini, bazulay, ecohen, emesika, gklein, iheim, lpeer, oourfali, yeylon, yobshans
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-16 05:46:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm.log
none
engine.log none

Description Yuri Obshansky 2014-08-21 11:41:56 UTC
Description of problem:
Fake Hosts periodically became non responsive on large environment which has 200 DCs + Clusters + Fake Hosts + Storage Domains 
Initial setup has 1 DC, 1 Cluster, 2 Hosts, 1 SD and 200 VMs
As result Data Center is not functional for some period.

vdsm.log

Thread-13::ERROR::2014-08-21 11:11:05,490::task::866::TaskManager.Task::(_setError) Task=`fe514fc4-6f4e-4750-9751-ec14046f2a12`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2126, in getAllTasksStatuses
    allTasksStatus = sp.getAllTasksStatuses()
  File "/usr/share/vdsm/storage/securable.py", line 73, in wrapper
    raise SecureError("Secured object is not in safe state")
SecureError: Secured object is not in safe state
..............
Thread-13::ERROR::2014-08-21 11:11:05,527::dispatcher::68::Storage.Dispatcher.Protect::(run) Secured object is not in safe state
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 60, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1176, in prepare
    raise self.error
SecureError: Secured object is not in safe state

engine.log

2014-08-21 11:10:51,761 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-63) [3ae1d9eb] Command GetCapabilitiesVDSCommand(HostName = TEST_
HOST_142, HostId = 678fa07d-1092-4fff-b79d-971a78023f6f, vds=Host[TEST_HOST_142,678fa07d-1092-4fff-b79d-971a78023f6f]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: 
Connection refused
2014-08-21 11:10:51,765 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-63) [3ae1d9eb] Correlation ID: null, Call Stack: org.ovirt.engin
e.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:102)
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56)
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31)
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:511)
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:488)
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:344)
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:231)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
        at org.ovirt.engine.core.utils.ssl.AuthSSLProtocolSocketFactory.createSocket(AuthSSLProtocolSocketFactory.java:288)
        at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.apache.xmlrpc.client.XmlRpcCommonsTransport.writeRequest(XmlRpcCommonsTransport.java:227)
        at org.apache.xmlrpc.client.XmlRpcStreamTransport.sendRequest(XmlRpcStreamTransport.java:151)
        at org.apache.xmlrpc.client.XmlRpcHttpTransport.sendRequest(XmlRpcHttpTransport.java:143)
        at org.apache.xmlrpc.client.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:56)
        at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:167)
        at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:137)
        at org.apache.xmlrpc.client.XmlRpcClient.execute(XmlRpcClient.java:126)
        at org.apache.xmlrpc.client.util.ClientFactory$1.invoke(ClientFactory.java:140)
        at com.sun.proxy.$Proxy53.getVdsCapabilities(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcUtils$AsyncProxy$InternalCallable.call(XmlRpcUtils.java:225)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
, Custom Event ID: -1, Message: Host TEST_HOST_142 is non responsive.
...........
2014-08-21 11:10:52,964 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-60) [5263f19] Command SpmStatusVDSCommand(HostName = TEST_HOST_142, Hos
tId = 678fa07d-1092-4fff-b79d-971a78023f6f, storagePoolId = 80a3d808-3945-4db3-91ed-9a8fd0e1e9f8) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2014-08-21 11:10:52,966 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-60) [5263f19] IrsBroker::Failed::GetStoragePoolInfoVDS due to: ConnectExce
ption: Connection refused
2014-08-21 11:10:54,534 ERROR [org.ovirt.engine.core.bll.SshSoftFencingCommand] (org.ovirt.thread.pool-4-thread-151) [68824fdb] SSH Soft Fencing command failed on host 10.35.18.242: Command return
ed failure code 1 during SSH session 'root.18.242'
Stdout: 
Stderr: Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 391, in service_restart
    return _runAlts(_srvRestartAlts, srvName)
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/service.py", line 351, in _runAlts
    "%s failed" % alt.func_name, out, err)
vdsm.tool.service.ServiceOperationError: ServiceOperationError: _serviceRestart failed
cannot stop vdsm, operation is locked[FAILED]
cannot start vdsm, operation is locked[FAILED]
Stacktrace: java.io.IOException: Command returned failure code 1 during SSH session 'root.18.242': java.io.IOException: Command returned failure code 1 during SSH session 'root.18.242'
        at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:527) [utils.jar:]
        at org.ovirt.engine.core.bll.SshSoftFencingCommand.executeSshSoftFencingCommand(SshSoftFencingCommand.java:81) [bll.jar:]
        at org.ovirt.engine.core.bll.SshSoftFencingCommand.executeCommand(SshSoftFencingCommand.java:44) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1133) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1218) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1894) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1238) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:351) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:415) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:394) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:636) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source) [:1.7.0_51]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_51]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_51]
        at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:58) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:58) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:52) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:259) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:398) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:242) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.3.2.Final-redhat-1]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view8.runInternalAction(Unknown Source) [bll.jar:]
        at org.ovirt.engine.core.bll.VdsEventListener$2.run(VdsEventListener.java:154) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:97) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_51]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]
..............................................
2014-08-21 11:10:54,625 ERROR [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-4-thread-151) [6092d104] Failed to run Fence script on vds:TEST_HOST_142, VMs moved to UnKnown instead.
2014-08-21 11:10:54,855 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-82) [71303c04] Command GetCapabilitiesVDSCommand(HostName = TEST_HOST_142, HostId = 678fa07d-1092-4fff-b79d-971a78023f6f, vds=Host[TEST_HOST_142,678fa07d-1092-4fff-b79d-971a78023f6f]) execution failed. Exception: VDSNetworkException: java.net.ConnectException: :


Version-Release number of selected component (if applicable):
RHEVM Version: 3.4.1-0.25.el6ev
OS Version: RHEL - 6Server - 6.5.0.1.el6
Kernel Version: 2.6.32 - 431.el6.x86_64
KVM Version: 0.12.1.2 - 2.415.el6_5.9
LIBVIRT Version: libvirt-0.10.2-29.el6_5.7
VDSM Version: vdsm-4.14.7-2.el6ev

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Yuri Obshansky 2014-08-21 11:44:14 UTC
Created attachment 929174 [details]
vdsm.log

Comment 2 Yuri Obshansky 2014-08-21 11:45:48 UTC
Created attachment 929175 [details]
engine.log

Comment 3 Allon Mureinik 2014-08-21 12:32:15 UTC
Seems to be related to task polling - flagging as infra to start the research.

Comment 6 Yuri Obshansky 2014-10-06 11:34:50 UTC
Closed. No comments