Bug 1412583 - Multiple disconnect messages in VDSM log
Summary: Multiple disconnect messages in VDSM log
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Bindings-API
Version: ---
Hardware: x86_64
OS: Linux
unspecified
high with 1 vote vote
Target Milestone: ovirt-4.1.1
: 4.19.7
Assignee: Piotr Kliczewski
QA Contact: Petr Kubica
URL:
Whiteboard:
: 1417835 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-12 11:05 UTC by Yaniv Kaul
Modified: 2017-04-21 09:32 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-21 09:32:17 UTC
oVirt Team: Infra
rule-engine: ovirt-4.1+


Attachments (Terms of Use)
logs (1.24 MB, application/x-gzip)
2017-01-12 11:05 UTC, Yaniv Kaul
no flags Details


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 71243 'None' MERGED m2c: reduce log level for ssl errors 2020-06-04 11:03:15 UTC
oVirt gerrit 72506 'None' MERGED m2c: reduce log level for ssl errors 2020-06-04 11:03:15 UTC

Description Yaniv Kaul 2017-01-12 11:05:59 UTC
Created attachment 1239851 [details]
logs

Description of problem:
A simple ovirt-system-test run, without running VM (and therefore no snapshots tests, no migration tests, no suspend-resume) - yet we see several disconnections of MOM from VDSM.
However, this is only seen in VDSM.log - nothing in mom. It'd be helpful to know:
1. Why it happened.
2. Why there's nothing logged on mom side.

[mini@ykaul-mini vdsm]$ grep SSL vdsm.log  |grep "::1"
2017-01-12 05:52:08,288 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42336, 0, 0) at 0x18cec68>: unexpected eof (betterAsyncore:119)
2017-01-12 05:52:13,223 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42338, 0, 0) at 0x18b67e8>: unexpected eof (betterAsyncore:119)
2017-01-12 05:52:13,541 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42340, 0, 0) at 0x2815cb0>: unexpected eof (betterAsyncore:119)
2017-01-12 05:52:13,881 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42342, 0, 0) at 0x18ceb48>: unexpected eof (betterAsyncore:119)
2017-01-12 05:52:14,202 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42344, 0, 0) at 0x269cf80>: unexpected eof (betterAsyncore:119)


Version-Release number of selected component (if applicable):
[root@lago-basic-suite-master-host0 ~]# rpm -qa |grep vdsm
vdsm-api-4.20.0-176.gitd845c5f.el7.centos.noarch
vdsm-jsonrpc-4.20.0-176.gitd845c5f.el7.centos.noarch
vdsm-4.20.0-176.gitd845c5f.el7.centos.x86_64
vdsm-hook-vmfex-dev-4.20.0-176.gitd845c5f.el7.centos.noarch
vdsm-client-4.20.0-176.gitd845c5f.el7.centos.noarch
vdsm-python-4.20.0-176.gitd845c5f.el7.centos.noarch
vdsm-yajsonrpc-4.20.0-176.gitd845c5f.el7.centos.noarch
vdsm-cli-4.20.0-176.gitd845c5f.el7.centos.noarch
vdsm-xmlrpc-4.20.0-176.gitd845c5f.el7.centos.noarch
[root@lago-basic-suite-master-host0 ~]# rpm -qa |grep mom
mom-0.5.8-0.0.master.el7.centos.noarch


How reproducible:
Always.

Steps to Reproduce:
1. Run ovirt-system-tests

Comment 1 Martin Sivák 2017-01-16 09:21:09 UTC
How do you know these come from MOM? The default polling frequency is once per 15 seconds. We are not trying every second (or multiple times per second).

There is also hosted engine that is talking to vdsm, but that too only does that once per 10 seconds or so.

Moreover, we use the json client library to do our calls, so I wonder if some kind of disconnect/reconnect is happening there.

Comment 2 Yaniv Kaul 2017-01-16 09:55:52 UTC
(In reply to Martin Sivák from comment #1)
> How do you know these come from MOM? The default polling frequency is once
> per 15 seconds. We are not trying every second (or multiple times per
> second).

I don't.

> 
> There is also hosted engine that is talking to vdsm, but that too only does
> that once per 10 seconds or so.

This specific ovirt-system-test does not run hosted-engine.

> 
> Moreover, we use the json client library to do our calls, so I wonder if
> some kind of disconnect/reconnect is happening there.

Perhaps - do you think it hides those from upper level users such as mom?

Comment 3 Piotr Kliczewski 2017-01-17 13:43:45 UTC
I checked the logs and I am not able to correlate those exceptions between vdsm and mom log.

2017-01-12 05:52:08,288 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42336, 0, 0) at 0x18cec68>: unexpected eof (betterAsyncore:119)

==============================================

2017-01-12 05:52:09,368 - mom.RPCServer - INFO - ping()
2017-01-12 05:52:09,369 - mom.RPCServer - INFO - getStatistics()

==============================================

2017-01-12 05:52:13,223 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42338, 0, 0) at 0x18b67e8>: unexpected eof (betterAsyncore:119)

2017-01-12 05:52:13,541 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42340, 0, 0) at 0x2815cb0>: unexpected eof (betterAsyncore:119)

2017-01-12 05:52:13,881 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 42342, 0, 0) at 0x18ceb48>: unexpected eof (betterAsyncore:119)

==============================================

2017-01-12 05:52:19,527 - mom.RPCServer - INFO - ping()
2017-01-12 05:52:19,528 - mom.RPCServer - INFO - getStatistics()

IS there anything else which could be calling vdsm?

Comment 4 Martin Sivák 2017-01-23 11:58:28 UTC
Is lago or the system test infrastructure itself monitoring the host?

Comment 5 Yaniv Kaul 2017-01-23 12:03:59 UTC
(In reply to Martin Sivák from comment #4)
> Is lago or the system test infrastructure itself monitoring the host?

Nope.

Comment 6 Martin Sivák 2017-01-25 11:41:24 UTC
We are pretty certain this is not MOM related so we decided to move it to Infra for now so they can either do something about the error message (it is not an error, nothing breaks and so the log severity can be reduced) or properly investigate who connects to vdsm.

Comment 7 Piotr Kliczewski 2017-01-25 19:53:44 UTC
If it is not mom than what is connecting to vdsm? Is there anything else than mom which could connect to vdsm from localhost?

Comment 8 Yaniv Kaul 2017-01-25 20:27:38 UTC
(In reply to Piotr Kliczewski from comment #7)
> If it is not mom than what is connecting to vdsm? Is there anything else
> than mom which could connect to vdsm from localhost?

Not that I can think of.

Comment 9 Piotr Kliczewski 2017-01-26 08:42:50 UTC
Based on above conversation I am not able to understand what is causing above errors upon connection closure. Is it ok to close it with insufficient data and reopen once we have more info?

Comment 10 Yaniv Kaul 2017-01-26 08:52:22 UTC
Since I reproduce it EVERY TIME I run ovirt-system-tests, I'm not sure what else I can provide. Any ideas?

Comment 11 Piotr Kliczewski 2017-01-26 09:16:03 UTC
Thinking about it more mom uses GC for closing the connections so it could lead to lack of correlation between interaction with vdsm and connection closure. This could cause delays in seeing m2crypto complaining about the connections being closed. I investigated those issues some time ago by analysing ssl debug logs and I haven't seen any issues that were caused by it. On the other hand when we use ssl module instead the errors are not there. I want to drop m2crypto support but it requires performance testing (due to known issue we had) and it hasn't happened yet.

I understand that err level logs are annoying so we can reduce the log level in short term and move away from m2crypto in the long term.

Comment 12 Martin Perina 2017-01-31 14:19:45 UTC
*** Bug 1417835 has been marked as a duplicate of this bug. ***

Comment 13 Michael Burman 2017-02-02 14:45:32 UTC
This bug happens and triggered every time i move a data storage domain to maintenance - 

2017-02-02 16:40:24,589+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (default task-15) [a2749088-f32c-4cf4-9513-3877a8b6bd3d] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandPar
ameters:{runAsync='true', storagePoolId='74dd8b69-7211-4914-9a5f-3d64162eacfd', ignoreFailoverLimit='false', storageDomainId='40b3e690-4b26-481e-be7d-4dfb303a8660', imageGroupId='f04116ae-f373-4780-9377-8e30e234f1
27', imageId='4dba5eb0-ba31-48f0-a81a-5b65368b69b1'}), log id: 283048cd
2017-02-02 16:40:26,647+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Internal server error 876 > 875


Feb  2 16:43:45 orchid-vds1 journal: vdsm vds.dispatcher ERROR SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:10.35.160.148', 58438, 0, 0) at 0x4571518>: unexpected eof

This is the SPM host^^ 

And i think all of this causing to this BZ 1418673

Comment 14 Piotr Kliczewski 2017-02-02 15:04:30 UTC
Michael,

It do not looks to be related. Setting storage to maintenance causes the engine to close IRS connection to vdsm which we see as the message you pasted.

Comment 15 Коренберг Марк 2017-03-15 12:34:38 UTC
It's triggerd for me after fresh install of ovirt hosted engine and reboot. What can I do to debug or workaround that? hosted engine did not started...

Comment 16 Piotr Kliczewski 2017-03-15 12:57:45 UTC
Can you provide more info about what happened? The issue in this bug it do not cause any issues by itself. Please provide the logs.

Comment 17 Коренберг Марк 2017-03-15 13:13:13 UTC
Which log files should I attach ?

I have Centos 7, so logs are in journald.

Comment 18 Коренберг Марк 2017-03-15 13:35:42 UTC
```
2017-03-15 16:14:12,278 INFO  (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:54416 (protocoldetector:72)
2017-03-15 16:14:12,283 INFO  (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:54416 (protocoldetector:127)
2017-03-15 16:14:12,284 INFO  (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:102)
2017-03-15 16:14:12,284 INFO  (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-03-15 16:14:12,422 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getHardwareInfo succeeded in 0.00 seconds (__init__:515)
2017-03-15 16:14:12,425 INFO  (jsonrpc/0) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-03-15 16:14:12,425 INFO  (jsonrpc/0) [dispatcher] Run and protect: repoStats, Return response: {u'1cf0af9c-2d13-4375-91e5-b8dc76348298': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000401077', 'lastCheck': '0.2', 'valid': True}} (logUtils:52)
2017-03-15 16:14:12,441 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.02 seconds (__init__:515)
2017-03-15 16:14:12,456 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 54416, 0, 0) at 0x437e290>: unexpected eof (betterAsyncore:113)
2017-03-15 16:14:13,851 INFO  (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:54418 (protocoldetector:72)
2017-03-15 16:14:13,857 INFO  (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:54418 (protocoldetector:127)
2017-03-15 16:14:13,857 INFO  (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:102)
2017-03-15 16:14:13,859 INFO  (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-03-15 16:14:13,998 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.getHardwareInfo succeeded in 0.00 seconds (__init__:515)
2017-03-15 16:14:14,000 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 54418, 0, 0) at 0x3686128>: unexpected eof (betterAsyncore:113)
2017-03-15 16:14:14,534 INFO  (jsonrpc/2) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-03-15 16:14:14,534 INFO  (jsonrpc/2) [dispatcher] Run and protect: repoStats, Return response: {u'1cf0af9c-2d13-4375-91e5-b8dc76348298': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000401077', 'lastCheck': '2.3', 'valid': True}} (logUtils:52)
2017-03-15 16:14:14,550 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.02 seconds (__init__:515)
```

Comment 19 Коренберг Марк 2017-03-15 13:36:51 UTC
i.e.

```
2017-03-15 16:14:12,456 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::1', 54416, 0, 0) at 0x437e290>: unexpected eof (betterAsyncore:113)
```

repeated many times

Comment 20 Piotr Kliczewski 2017-03-15 14:56:51 UTC
OK. so what you are referring to is BZ #1417708 and it is being worked on. Once this one and BZ #1417708 is solved you want see the issue in the logs anymore.
Please note it is harmless to your deployment.

Comment 21 Petr Kubica 2017-04-12 08:53:19 UTC
Verified in vdsm-yajsonrpc-4.19.10-1.el7ev.noarch

this is changing the mentioned message to be only on debug level, thus not visible in log without debug enabled


Note You need to log in before you can comment on or make changes to this bug.