Bug 1176407
Summary: | JSON-RPC | engine gets stuck randomly | ERROR (SSL Stomp Reactor) Unable to process messages | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Meni Yakove <myakove> | ||||||
Component: | ovirt-engine | Assignee: | Piotr Kliczewski <pkliczew> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Meni Yakove <myakove> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 3.5.0 | CC: | aberezin, amureini, bazulay, danken, ecohen, gklein, iheim, lpeer, lsurette, lvernia, myakove, nsednev, oourfali, pstehlik, ratamir, rbalakri, Rhev-m-bugs, yeylon | ||||||
Target Milestone: | --- | Keywords: | AutomationBlocker | ||||||
Target Release: | 3.5.0 | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | infra | ||||||||
Fixed In Version: | vt13.5 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | |||||||||
: | 1176528 (view as bug list) | Environment: | |||||||
Last Closed: | 2015-02-17 17:13:39 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1164308, 1164311, 1176417 | ||||||||
Attachments: |
|
I checked vdsm logs and this is the issue found on vdsm: Thread-2882::ERROR::2014-12-19 23:08:33,431::API::1499::vds::(_rollback) Cannot delete network u'sw162': It doesn't exist in the system Traceback (most recent call last): File "/usr/share/vdsm/API.py", line 1497, in _rollback yield rollbackCtx File "/usr/share/vdsm/API.py", line 1385, in setupNetworks supervdsm.getProxy().setupNetworks(networks, bondings, options) File "/usr/share/vdsm/supervdsm.py", line 50, in __call__ return callMethod() File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda> **kwargs) File "<string>", line 2, in setupNetworks File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod raise convert_to_error(kind, result) ConfigNetworkError: (27, "Cannot delete network u'sw162': It doesn't exist in the system") which causes error on the engine side. I found additional issue in the engine logs which seems to be unrelated to networking issue: 2014-12-19 21:55:19,167 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler_Worker-30) [371dfc3f] transaction rolled back 2014-12-19 21:55:19,168 ERROR [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (DefaultQuartzScheduler_Worker-30) [371dfc3f] Command org.ovirt.engine.core.bll.AddVmFromScratchCommand throw exception: java.lang.NullPointerException at org.ovirt.engine.core.bll.AddVmCommand.addDiskPermissions(AddVmCommand.java:1247) [bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:825) [bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:820) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:] at org.ovirt.engine.core.bll.AddVmCommand.executeVmCommand(AddVmCommand.java:820) [bll.jar:] at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1191) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1330) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1955) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1354) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:617) [bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71] Dan - can you look at this as well? I pushed a patch to remove side effect (NPE) of setupNetwork failure. If test case covers the failure then we have a fix for this bug. We need to understand nature of setupNetworks failure. (In reply to Piotr Kliczewski from comment #1) > I found additional issue in the engine logs which seems to be unrelated to > networking issue: Agreed. I cloned it to bug 1176528 - let's track it there. Might be connected to 1162774 ? (In reply to Nikolai Sednev from comment #5) > Might be connected to 1162774 ? Doesn't seem related. (In reply to Oved Ourfali from comment #6) > (In reply to Nikolai Sednev from comment #5) > > Might be connected to 1162774 ? > > Doesn't seem related. Agreed. I was also unable to add RHEL7.0 two hosts to RHEVM vt13.4 with JSONRPC enabled on both hosts by default, until I've removed this functionality from them. Used components: vdsm-4.16.8.1-4.el7ev.x86_64 qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64 sanlock-3.2.2-2.el7.x86_64 libvirt-client-1.2.8-10.el7.x86_64 rhevm-3.5.0-0.26.el6ev.noarch 2014-12-22 11:01:11,587 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) Connecting to brown-vdsc.qa.lab.tlv.redhat.com/10. 35.106.12 2014-12-22 11:01:11,590 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing 2014-12-22 11:01:12,672 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-6) [58aefbe4] Command Ge tCapabilitiesVDSCommand(HostName = purple-vds1.qa.lab.tlv.redhat.com, HostId = 90d25f9c-0e64-4c8e-9165-78e80b18219c, vds=Host[purple-vds1.qa.lab.tlv.redhat .com,90d25f9c-0e64-4c8e-9165-78e80b18219c]) execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out 2014-12-22 11:01:12,673 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-6) [58aefbe4] Host purple-vds1.qa.lab.tlv.redhat. com is not responding. It will stay in Connecting state for a grace period of 120 seconds and after that an attempt to fence the host will be issued. 2014-12-22 11:01:12,686 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-6) [58aefbe4] Failure to refresh Vds ru ntime info: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.SocketTimeoutException: connect timed out at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:126) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56) [vdsbroker.jar:] at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31) [dal.jar:] at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:571) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:648) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:494) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:236) [vdsbroker.jar:] at sun.reflect.GeneratedMethodAccessor83.invoke(Unknown Source) [:1.7.0_71] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71] at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71] at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:] at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:] at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:] Caused by: java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) [rt.jar:1.7.0_71] at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) [rt.jar:1.7.0_71] at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) [rt.jar:1.7.0_71] at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) [rt.jar:1.7.0_71] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) [rt.jar:1.7.0_71] at java.net.Socket.connect(Socket.java:579) [rt.jar:1.7.0_71] at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618) [jsse.jar:1.7.0_71] at org.ovirt.engine.core.utils.ssl.AuthSSLProtocolSocketFactory.createSocket(AuthSSLProtocolSocketFactory.java:293) [utils.jar:] at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) [commons-httpclient.jar:] at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) [commo ns-httpclient.jar:] at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) [commons-httpclient.jar:] : vt13.5 Still occurs - vt13.5 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages 11:59:26 2014-12-30 11:59:05,216 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages 11:59:26 2014-12-30 11:59:05,220 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing 11:59:26 2014-12-30 11:59:07,745 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (DefaultQuartzScheduler_Worker-56) Exception during connection 11:59:26 2014-12-30 11:59:07,752 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing 11:59:26 2014-12-30 11:59:07,758 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-56) [7e2170c7] Command SpmStatusVDSCommand(HostName = host_mixed_2, HostId = 825c1a69-19d0-4d64-8824-af76990d8c9e, storagePoolId = d83e1463-29dc-48a0-86e2-3fac3a41b15c) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused logs attached Created attachment 974407 [details]
new logs
Can you specify the flow in which it happened? Was the host responsive afterwards? Does it reproduce? I checked the logs and the issue results is the same but cause is different: Thread-64528::DEBUG::2014-12-30 11:59:04,007::API::545::vds::(migrationCreate) Migration create Thread-64528::INFO::2014-12-30 11:59:04,007::clientIF::394::vds::(createVm) vmContainerLock acquired by vm 777439ba-28fe-4bd4-bad2-d602ea13b24d Thread-64528::WARNING::2014-12-30 11:59:04,008::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7 Thread-64528::CRITICAL::2014-12-30 11:59:04,009::libvirtconnection::137::root::(wrapper) taking calling process down. MainThread::DEBUG::2014-12-30 11:59:04,010::vdsm::58::vds::(sigtermHandler) Received signal 15 Thread-64528::DEBUG::2014-12-30 11:59:04,010::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error client socket is closed MainThread::DEBUG::2014-12-30 11:59:04,010::protocoldetector::144::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor Thread-64528::INFO::2014-12-30 11:59:04,010::vm::6098::root::(_getNetworkIp) network rhevm: using 0 MainThread::INFO::2014-12-30 11:59:04,011::__init__::565::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC Server Detector thread::DEBUG::2014-12-30 11:59:04,012::protocoldetector::115::vds.MultiProtocolAcceptor::(_cleanup) Cleaning Acceptor MainThread::INFO::2014-12-30 11:59:04,014::vmchannels::188::vds::(stop) VM channels listener was stopped. MainThread::INFO::2014-12-30 11:59:04,014::momIF::91::MOM::(stop) Shutting down MOM It seems that during vm migration vdsm stopped/crashed. I suggest to open another bug. Based on comment #13, moving back to on_qa. Raz, if not reproduced with the scenario in the description, please move to verified. Also, please open a new bug on the new scenario, and attach the relevant logs there. Based on comment #14 - moving to verify and I will open new bug (if needed) rhev 3.5.0 was released. closing. |
Created attachment 971693 [details] vnic_profile_job_logs Description of problem: Our automation jobs fail randomly with: Engine: 2014-12-21 10:59:14,282 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages 2014-12-21 10:59:14,284 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing 2014-12-21 10:59:14,284 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages 2014-12-21 10:59:14,285 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing Server: 2014-12-21 10:43:51,129 ERROR [stderr] (Response tracker) Exception in thread "Response tracker" java.lang.NullPointerException 2014-12-21 10:43:51,130 ERROR [stderr] (Response tracker) at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.removeRequestFromTracking(ResponseTracker.java:58) 2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker) at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.run(ResponseTracker.java:93) 2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker) at java.lang.Thread.run(Thread.java:745) switching to xml-rpc solve the issue Version-Release number of selected component (if applicable): rhevm-3.5.0-0.26.el6ev.noarch How reproducible: Steps to Reproduce: 1. Automation jobs: Port mirroring vNIC Profiles Network migration