1383118 – excessive SSL errors in the logs (vdsm.log) after hosted-engine update

Keywords:
Status:	CLOSED DUPLICATE of bug 1349829
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	4.0.3
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Dan Kenigsberg
QA Contact:	Aharon Canan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-10-10 01:11 UTC by Marcus West
Modified:	2016-11-04 18:56 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-03 11:00:05 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Description Marcus West 2016-10-10 01:11:38 UTC

## Description of problem:

After a recent upgrade to 4.0, there are excessive SSL errors in the logs (vdsm.log):

JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:03,341::betterAsyncore::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:08,792::betterAsyncore::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:13,392::betterAsyncore::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:28,223::betterAsyncore::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:34,641::betterAsyncore::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:43,040::betterAsyncore::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:48,490::betterAsyncore::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof

## Version-Release number of selected component (if applicable):

RHV 4.0
vdsm-4.18.13-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.0.2.2-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch

## How reproducible:

Always

## Additional info:

The errors seem to be related to ovirt-ha-agent: if I stop that service, the errors stop.
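
To put a number on "excessive", and to compare the rate with ovirt-ha-agent running and stopped, the errors can be counted per minute. The following is a minimal sketch, not part of the original report. It assumes every vdsm.log line uses the `::`-separated layout seen in the excerpt above; the function names and the `SAMPLE` list are illustrative only.

```python
from collections import Counter
from datetime import datetime

# Substring taken from the log excerpt in this report.
SSL_ERROR = "SSL error during reading data"

# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:03,341::betterAsyncore"
    "::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof",
    "JsonRpc (StompReactor)::ERROR::2016-10-10 10:07:08,792::betterAsyncore"
    "::113::vds.dispatcher::(recv) SSL error during reading data: unexpected eof",
]


def parse_vdsm_line(line):
    """Return (timestamp, level, message) for one log line, or None.

    Assumed layout (inferred from the excerpt, not from vdsm documentation):
    thread::level::timestamp::module::lineno::logger::message
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None  # continuation line (e.g. traceback) or other layout
    _thread, level, stamp, _module, _lineno, _logger, message = parts
    try:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return when, level, message


def ssl_errors_per_minute(lines):
    """Count ERROR lines containing the SSL read error, keyed by minute."""
    counts = Counter()
    for line in lines:
        parsed = parse_vdsm_line(line)
        if parsed is None:
            continue
        when, level, message = parsed
        if level == "ERROR" and SSL_ERROR in message:
            counts[when.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


if __name__ == "__main__":
    for minute, count in sorted(ssl_errors_per_minute(SAMPLE).items()):
        print(minute, count)
```

To run it against a real log, pass an open file object instead of `SAMPLE`. Running it once with ovirt-ha-agent active and once after stopping the service gives a direct comparison of the two rates.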

Comment 2 Piotr Kliczewski 2016-10-11 09:50:25 UTC

Someone reported a similar issue on the users list, and after looking at the engine log we saw it was related to BZ #1371515. Please provide the engine log, and if this is the same issue, please update your engine to the latest 4.0.

Comment 3 Marcus West 2016-10-11 23:29:34 UTC

Hi Piotr, I don't see any of those GetUserProfileQuery errors in my engine.log; I'll attach mine for confirmation. It's a recent upgrade from a 3.6 environment, currently on ovirt-engine-4.0.4.4-0.1.el7ev.noarch.

Comment 6 Piotr Kliczewski 2016-10-12 07:50:58 UTC

Looking at the engine logs, I can see that the host upgrade manager was started:

2016-10-06 15:00:53,591 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [29ff55ba] Correlation ID: 29ff55ba, Call Stack: null, Custom Event ID: -1, Message: Host intelh6.rhev.gsslab.bne.redhat.com upgrade was started (User: admin@internal-authz).
2016-10-06 15:00:54,260 INFO  [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (DefaultQuartzScheduler10) [29ff55ba] Host 'intelh6.rhev.gsslab.bne.redhat.com' is on maintenance mode. Proceeding with Upgrade process.

and just after it finished we lost connectivity with the host:

2016-10-06 15:16:05,981 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler7) [1eac2823] Command 'GetAllVmStatsVDSCommand(HostName = intelh6.rhev.gsslab.bne.redhat.com, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='a9e052f6-d953-41cb-b46a-c6d284855d63', vds='Host[intelh6.rhev.gsslab.bne.redhat.com,a9e052f6-d953-41cb-b46a-c6d284855d63]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed

Moti, please take a look at the logs.
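
As a rough illustration of "just after it finished", the gap between the upgrade start and the first connection failure can be computed from the timestamps in the engine.log lines quoted above. This is a sketch only. It assumes each engine.log line begins with a 23-character `YYYY-MM-DD HH:MM:SS,mmm` timestamp, as in the excerpts; `engine_timestamp` is a hypothetical helper name, and the log lines are abbreviated with `[...]` for readability.

```python
from datetime import datetime


def engine_timestamp(line):
    """Parse the leading timestamp of an engine.log line.

    Assumes the line starts with 'YYYY-MM-DD HH:MM:SS,mmm' (23 characters),
    which matches the excerpts quoted in this comment.
    """
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


# Abbreviated copies of the first and last engine.log lines quoted above.
upgrade_started = engine_timestamp(
    "2016-10-06 15:00:53,591 INFO  [...] upgrade was started"
)
connection_failed = engine_timestamp(
    "2016-10-06 15:16:05,981 ERROR [...] Connection failed"
)

gap = connection_failed - upgrade_started

if __name__ == "__main__":
    print("time from upgrade start to connection failure:", gap)
```

For these two lines the gap is a little over 15 minutes, which covers the whole upgrade run plus whatever happened immediately after it.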

Comment 7 Martin Perina 2016-10-12 08:10:45 UTC

Moti is on PTO. Ravi, could you please take a look?

Comment 8 Piotr Kliczewski 2016-10-18 09:19:56 UTC

Simone, can you please take a look?

Comment 9 Ravi Nori 2016-10-24 14:13:42 UTC

From the logs, the node stopped responding on reboot after updating the packages redhat-virtualization-host-image-update and redhat-virtualization-host-image-update-placeholder. There are no errors during the update process; it could be that the host network failed to initialize after the reboot.

Comment 10 Martin Perina 2016-10-24 15:24:27 UTC

Fabian, could you please take a look?

Comment 11 Fabian Deutsch 2016-10-28 09:29:47 UTC

A few questions:

1. Not explicitly stated, but: How did you "create" the RHVH 4 host? Was it a fresh install?
2. Did the upgrade complete successfully?
3. Did the host reboot?
4. Did the networking on the host come up correctly?
5. Were you able to manually activate the host?
6. Was everything working after you performed the activation?

Comment 12 Marcus West 2016-10-30 23:33:18 UTC

Hi Fabian,

> 1. Not explicitly stated, but: How did you "create" the RHVH 4 host? Was it a fresh install?
Yes, the hypervisors were wiped and freshly installed. They were previously installed with RHEV-H (RHEL 7, for RHEV 3.6).

> 2. Did the upgrade complete successfully?
'hosted-engine --deploy' created the VM successfully, but stopped at the point where it loads the restore database. I did this step manually, and then the upgrade was effectively completed. It's not obvious anywhere in the logs why the DB was copied over but failed to load.

> 3. Did the host reboot?
I have rebooted the hosts but the excessive logging is still there.

> 4. Did the networking on the host come up correctly?
Networking seems to be fine.

> 5. Were you able to manually activate the host?
Hosts appear to activate fine.

> 6. Was everything working after you performed the activation?
Apart from having to finish the upgrade manually, the RHEV environment seems to work fine.

Comment 13 Fabian Deutsch 2016-10-31 10:03:10 UTC

Oh, so it's about hosted engine, and it seems to have happened during an HE (hosted-engine) flow.

Comment 15 Piotr Kliczewski 2016-11-02 08:43:59 UTC

I think that this BZ is related to BZ #1349829.

Comment 16 Martin Perina 2016-11-03 11:00:05 UTC

Closing as duplicate

*** This bug has been marked as a duplicate of bug 1349829 ***