Bug 1324155 - (Stomp Reactor) Unable to process messages
Summary: (Stomp Reactor) Unable to process messages
Keywords:
Status: CLOSED DUPLICATE of bug 1323465
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.0.0
Hardware: Unspecified
OS: Unspecified
unspecified
high vote
Target Milestone: ovirt-4.0.0-beta
: ---
Assignee: Piotr Kliczewski
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks: 1330383
TreeView+ depends on / blocked
 
Reported: 2016-04-05 16:56 UTC by jniederm
Modified: 2016-05-20 11:53 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1330383 (view as bug list)
Environment:
Last Closed: 2016-05-20 11:53:46 UTC
oVirt Team: Infra
oourfali: ovirt-4.0.0?
rule-engine: planning_ack+
mperina: devel_ack+
rule-engine: testing_ack?


Attachments (Terms of Use)
logs (3.11 MB, application/zip)
2016-04-05 17:04 UTC, jniederm
no flags Details
engine.log in debug mode (10.64 MB, application/zip)
2016-04-07 15:42 UTC, jniederm
no flags Details
vdsm.log host x.x.122.154 (1.28 MB, application/zip)
2016-04-08 09:48 UTC, jniederm
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 55872 0 master ABANDONED jsonrpc: make sure not to block when processing i/o 2016-05-04 07:39:45 UTC

Description jniederm 2016-04-05 16:56:08 UTC
Description of problem:
Sometimes engine gets so some wrong stats such that it prints to log following message every 5 seconds. The host (IP 192.168.122.201) becomes "Not responsive" in webadmin. However the host can be still connected to using ssh, `systemct status vdsmd` reports "active (running)", remote vdsClient calls works.

2016-04-05 17:50:32,124 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (Stomp Reactor) [] Unable to process messages
2016-04-05 17:50:34,123 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-80) [] Command 'SpmStatusVDSCommand(HostName = fedora_host1, SpmStatusVDSCommandParameters:{runAsync='true', hostId='695c477c-81bb-4797-9c75-9dd7c7cca025', storagePoolId='00000001-0001-0001-0001-000000000028'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
2016-04-05 17:50:35,126 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (Stomp Reactor) [] Connecting to /192.168.122.201
2016-04-05 17:50:35,129 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (Stomp Reactor) [] Unable to process messages
2016-04-05 17:50:37,128 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-25) [] Command 'GetCapabilitiesVDSCommand(HostName = fedora_host1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='695c477c-81bb-4797-9c75-9dd7c7cca025', vds='Host[fedora_host1,695c477c-81bb-4797-9c75-9dd7c7cca025]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
2016-04-05 17:50:37,128 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler_Worker-25) [] Failure to refresh Vds runtime info: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
2016-04-05 17:50:37,128 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler_Worker-25) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:449) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:649) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:121) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:234) [vdsbroker.jar:]
	at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source) [:1.8.0_77]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_77]
	at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_77]
	at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
	at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
	at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompClient$1.execute(StompClient.java:56) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.StompClient.postConnect(StompClient.java:90) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:142) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:114) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:73) [vdsm-jsonrpc-java-client.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:68) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getCapabilities(JsonRpcVdsServer.java:255) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:15) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
	... 14 more



Version-Release number of selected component (if applicable):
engine 4.0 master, commit 4ffd5a4
vdsm
Version     : 4.17.999
Release     : 837.gite31d74d.fc23

How reproducible:
?

Steps to Reproduce:
1. I don't know how to trigger it. It happens pretty randomly.


Additional info:
It can be work-arounded by restarting the engine. I've never noticed the engine to recover from this state.

Comment 1 jniederm 2016-04-05 17:04:40 UTC
Created attachment 1143898 [details]
logs

Comment 2 Piotr Kliczewski 2016-04-06 06:45:52 UTC
please provide engine log with debug level.

Comment 3 jniederm 2016-04-06 13:04:27 UTC
I updated my log level, I will send the log when it happen again.

Comment 4 jniederm 2016-04-07 15:42:50 UTC
Created attachment 1144800 [details]
engine.log in debug mode

Comment 5 Piotr Kliczewski 2016-04-08 07:38:00 UTC
It looks like you lost connectivity with a host x.x.122.154. Please provide vdsm log from this host around 2016-04-07 17:23:57,266.

Comment 6 jniederm 2016-04-08 09:48:26 UTC
Created attachment 1145077 [details]
vdsm.log host x.x.122.154

Comment 7 Sandro Bonazzola 2016-05-02 09:53:02 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 8 Piotr Kliczewski 2016-05-20 09:46:16 UTC
Please retest with latest master. I think that the issue was fixed as part of setup networks fixes.

Comment 9 jniederm 2016-05-20 11:33:52 UTC
I've deployed latest master (commit 96db8e5) and it's working. However since there are no steps to reproduce it doesn't bring much informational value.

Comment 10 Martin Perina 2016-05-20 11:53:46 UTC
Unable to reproduce the issue on current master and according to previous comments it was most probably fixed by patches for setupNetworks/jsonrpc issues, closing this as duplicate of BZ1323465

*** This bug has been marked as a duplicate of bug 1323465 ***


Note You need to log in before you can comment on or make changes to this bug.