Bug 2025381

Summary: VDSM logs are spammed due to multiple connection attempts and drops from ha-agent client
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-hosted-engine-ha
Assignee: Yedidyah Bar David <didi>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4.9
CC: didi, lsurette, paulds, sbonazzo
Target Milestone: ovirt-4.4.10
Keywords: Triaged, ZStream
Target Release: 4.4.10
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: 2.4.10-1
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-08 10:08:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Germano Veit Michel 2021-11-22 02:43:28 UTC
Description of problem:

Upon restarting VDSM, the ha-agent loops on connecting to and disconnecting from VDSM with some sort of immediate timeout.
Everything otherwise works fine (the ha-agent has another working, stable connection to VDSM that is used for the actual calls), but this extra client thread keeps looping over and over after the restart.

The result is that the VDSM log is spammed with WARN messages approximately every 10s (the ha-agent monitoring loop cycle period):

2021-11-22 12:39:14,609+1000 INFO  (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:37988 (protocoldetector:61)
2021-11-22 12:39:14,612+1000 WARN  (Reactor thread) [vds.dispatcher] unhandled write event (betterAsyncore:184)
2021-11-22 12:39:14,612+1000 INFO  (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:37988 (protocoldetector:125)
2021-11-22 12:39:14,613+1000 INFO  (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompserver:95)
2021-11-22 12:39:14,613+1000 INFO  (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompserver:124)
2021-11-22 12:39:16,615+1000 WARN  (JsonRpc (StompReactor)) [vds.dispatcher] unhandled close event (betterAsyncore:184)

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.4.9-1.el8ev.noarch
vdsm-4.40.90.4-1.el8ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a working environment with Hosted-Engine, not in maintenance
2. Restart just the vdsmd service (see the example commands below)
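
For reference, a minimal way to reproduce and observe the spam (assuming the default VDSM log location, /var/log/vdsm/vdsm.log):

# On a Hosted-Engine host that is not in maintenance, restart only VDSM:
systemctl restart vdsmd
# Then watch vdsm.log for the repeating WARN lines shown above:
tail -f /var/log/vdsm/vdsm.log | grep -E 'unhandled (write|close) event'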

Actual results:
* VDSM logs spammed
* ha-agent opening and closing connection to VDSM indefinitely

Expected results:
* No connection retry loop

Additional info:
* Enable debug logging on the ha-agent (set level=DEBUG for logger_root in /etc/ovirt-hosted-engine-ha/agent-log.conf, as shown in comment 9) and you will see this at each loop:
MainThread::DEBUG::2021-11-22 12:36:54,825::hosted_engine::542::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Sleeping 10 seconds
Client localhost:54321::DEBUG::2021-11-22 12:37:02,455::stomp::398::stomp.AsyncDispatcher::(handle_connect) managed to connect successfully.
Client localhost:54321::DEBUG::2021-11-22 12:37:02,456::stomp::431::stomp.AsyncDispatcher::(handle_error) Communication error occurred.
Client localhost:54321::DEBUG::2021-11-22 12:37:02,456::stomp::431::stomp.AsyncDispatcher::(handle_error) Communication error occurred.
Client localhost:54321::DEBUG::2021-11-22 12:37:02,456::stompclient::128::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect
Client localhost:54321::DEBUG::2021-11-22 12:37:02,456::stomp::398::stomp.AsyncDispatcher::(handle_connect) managed to connect successfully.
Client localhost:54321::DEBUG::2021-11-22 12:37:02,457::betterAsyncore::154::vds.dispatcher::(send) SSL error sending to <yajsonrpc.betterAsyncore.Dispatcher connected ::1:54321 at 0x7f043d54c160>: The operation did not complete (read) (_ssl.c:2207) 
Client localhost:54321::DEBUG::2021-11-22 12:37:02,457::betterAsyncore::154::vds.dispatcher::(send) SSL error sending to <yajsonrpc.betterAsyncore.Dispatcher connected ::1:54321 at 0x7f043d54c160>: The operation did not complete (read) (_ssl.c:2207) 
[... the same "SSL error sending" DEBUG line is repeated 32 more times in the same burst ...]
Client localhost:54321::DEBUG::2021-11-22 12:37:04,462::stompclient::128::yajsonrpc.protocols.stomp.AsyncClient::(handle_timeout) Timeout occurred, trying to reconnect

Comment 1 Germano Veit Michel 2021-11-22 02:45:40 UTC
I think something is wrong with how the socket to the restarted VDSM is closed; in netstat there are old connections stuck in the FIN_WAIT2 state on the ha-agent side.
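
A quick way to check for those lingering half-closed connections (assuming the agent is talking to VDSM on the local JSON-RPC port 54321, as in the debug log above):

# Show TCP sockets involving port 54321 that are stuck in FIN_WAIT2:
ss -tn state fin-wait-2 '( sport = :54321 or dport = :54321 )'
# or, with netstat:
netstat -tan | grep ':54321' | grep FIN_WAIT2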

Comment 3 Yedidyah Bar David 2021-12-16 12:59:19 UTC
Moving back to POST, need to cherry-pick the patch for 4.4.

Comment 6 Nikolai Sednev 2022-01-20 08:03:59 UTC
Please fill in the "Fixed In Version" field with the relevant fixed-in version.

Comment 7 Nikolai Sednev 2022-01-23 09:37:07 UTC
Our current build contains ovirt-hosted-engine-ha-2.4.9-1.el8ev.noarch; will wait a bit until 2.4.10-1 arrives.

Comment 9 Nikolai Sednev 2022-01-25 16:15:25 UTC
ovirt-hosted-engine-ha-2.4.10-1.el8ev.noarch
ovirt-hosted-engine-setup-2.5.4-2.el8ev.noarch
Red Hat Enterprise Linux release 8.5 (Ootpa)
Linux 4.18.0-348.13.1.el8_5.x86_64 #1 SMP Tue Jan 18 16:53:27 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

# systemctl restart ovirt-hosted-engine-ha
# cat /etc/ovirt-hosted-engine-ha/agent-log.conf
[logger_root]
level=DEBUG
handlers=syslog,logfile
propagate=0

# systemctl restart vdsmd

# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2022-01-25 18:04:12 IST; 2min 2s ago
  Process: 264350 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 264464 (vdsmd)
    Tasks: 115 (limit: 175716)
   Memory: 102.7M
   CGroup: /system.slice/vdsmd.service
           ├─264464 /usr/bin/python3 /usr/share/vdsm/vdsmd
           ├─264692 /usr/libexec/ioprocess --read-pipe-fd 69 --write-pipe-fd 68 --max-threads 10 --max-queued-request>
           ├─265066 /usr/libexec/ioprocess --read-pipe-fd 104 --write-pipe-fd 103 --max-threads 10 --max-queued-reque>
           ├─265139 /usr/libexec/ioprocess --read-pipe-fd 132 --write-pipe-fd 131 --max-threads 10 --max-queued-reque>
           ├─265155 /usr/libexec/ioprocess --read-pipe-fd 139 --write-pipe-fd 138 --max-threads 10 --max-queued-reque>
           ├─265163 /usr/libexec/ioprocess --read-pipe-fd 145 --write-pipe-fd 144 --max-threads 10 --max-queued-reque>
           ├─265170 /usr/libexec/ioprocess --read-pipe-fd 151 --write-pipe-fd 150 --max-threads 10 --max-queued-reque>
           ├─265177 /usr/libexec/ioprocess --read-pipe-fd 157 --write-pipe-fd 156 --max-threads 10 --max-queued-reque>
           └─265184 /usr/libexec/ioprocess --read-pipe-fd 163 --write-pipe-fd 162 --max-threads 10 --max-queued-reque>

Jan 25 18:04:35 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN unhandled close event
Jan 25 18:04:39 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: ERROR ssl handshake: socket error, address: ::1
Jan 25 18:04:41 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>
Jan 25 18:04:41 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>
Jan 25 18:04:42 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>
Jan 25 18:04:42 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>
Jan 25 18:05:48 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>
Jan 25 18:05:48 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>
Jan 25 18:05:48 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>
Jan 25 18:05:49 caracal01.lab.eng.tlv2.redhat.com vdsm[264464]: WARN Nic ovirtmgmt is not configured for IPv6 monitor>


In the vdsm log I no longer see the initial loop.
The vdsm log is attached.
Moving to verified.

Comment 15 errata-xmlrpc 2022-02-08 10:08:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.4.10]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0462