Bug 1211236

Summary: [upgrade] failed to start hosts once they upgraded - SSLEngine error
Product: Red Hat Enterprise Virtualization Manager
Reporter: Eldad Marciano <emarcian>
Component: vdsm
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED NOTABUG
QA Contact: Eldad Marciano <emarcian>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5.1
CC: bazulay, ecohen, emarcian, gklein, kripper, lpeer, lsurette, nsoffer, oourfali, pkliczew, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-05-06 07:43:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Eldad Marciano 2015-04-13 11:50:56 UTC
Description of problem:
Once the hosts were upgraded from vt14 to vt14.1, they failed to start and became "Non-responsive".
According to the logs, the problem seems to be related to a certificate:

2015-04-12 08:29:49,620 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Message received: {"jsonrpc":"2.0","error":{"message":"General SSLEngine problem","code":"host05-rack04.scale.openstack.engineering.redhat.com:738906710"},"id":null}
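
For triage, the error can be pulled out of an engine log line like the one above with a few lines of Python. This is only a sketch: it assumes the JSON-RPC payload follows the literal text "Message received: " and runs to the end of the line, as in the sample. The helper names `extract_error` and `is_ssl_error` are hypothetical and not part of the engine or vdsm.

```python
import json

# Log line copied verbatim from this report.
LOG_LINE = (
    '2015-04-12 08:29:49,620 DEBUG '
    '[org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) '
    'Message received: {"jsonrpc":"2.0","error":{"message":"General SSLEngine problem",'
    '"code":"host05-rack04.scale.openstack.engineering.redhat.com:738906710"},"id":null}'
)

# Assumption: the JSON payload starts right after this marker.
MARKER = "Message received: "


def extract_error(line):
    """Return the JSON-RPC "error" object from a log line, or None."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return None
    try:
        message = json.loads(payload)
    except ValueError:
        # Payload was truncated or is not valid JSON.
        return None
    if not isinstance(message, dict):
        return None
    return message.get("error")


def is_ssl_error(error):
    """True if the error object mentions the SSLEngine failure seen here."""
    return bool(error) and "SSLEngine" in str(error.get("message", ""))


if __name__ == "__main__":
    err = extract_error(LOG_LINE)
    print(err["message"], "|", err["code"], "|", is_ssl_error(err))
```

Note that the "code" field in this message carries the host address rather than a numeric JSON-RPC error code, so it is useful for identifying which host failed the handshake.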


I am not sure what went wrong; this is a standard use case and it should work fine.

When I tried to reinstall the hosts, they ran perfectly.

Version-Release number of selected component (if applicable):
vt14.1

How reproducible:
100%

Steps to Reproduce:
1. Run the engine on top of vt14.1
2. Run vdsm from vt14 and upgrade it to vt14.1 (add the vt14.1 repo, then run yum update)
3. Start the hosts via the engine webadmin

Actual results:
The hosts fail to start and become non-responsive.

Expected results:
hosts start as expected with no errors

Additional info:
Reinstalling the hosts resolves the problem.

Comment 2 Oved Ourfali 2015-04-14 07:47:56 UTC
Eldad - Can you check vt14.3?
I suspect this may be related to:
Bug 1208752 - Vdsm upgrade 3.4 >> 3.5.1 doesn't restart vdsmd service

but I am not sure.

Comment 3 Oved Ourfali 2015-04-14 07:50:33 UTC
In addition, can you attach all relevant logs?

Comment 4 Yaniv Bronhaim 2015-04-14 07:57:34 UTC
Please check with the latest vdsm for 3.5 (vdsm-4.16.13), as Oved already asked. vdsm.log, /var/log/messages, and /var/log/yum.log should be enough to figure out the errors.

Comment 5 Yaniv Bronhaim 2015-04-26 05:53:18 UTC
If the problem still appears, please reopen with the requested info.

Comment 6 Eldad Marciano 2015-04-29 08:25:26 UTC
I have installed vt14.3 and the problem still reproduces.
Logs will be attached.

Comment 7 Eldad Marciano 2015-04-29 08:26:13 UTC
Created attachment 1020024
logs

Comment 8 Piotr Kliczewski 2015-05-03 20:38:26 UTC
Please attach the engine logs as well.

Comment 10 Yaniv Bronhaim 2015-05-06 07:43:36 UTC
We see a certificate exception in vdsm.log due to the installation flow. After a reinstall, the certificate is installed correctly by host-deploy on the host side.

The steps that led to this error: Eldad added this host to the engine, then manually removed the vdsm RPMs on the host and installed new ones (he then did the upgrade, but that is not related). This flow should not work without re-adding the host or reinstalling it from the engine (using host-deploy). Manual RPM installation requires the user to copy the engine's certificate as well. The fact that a host is already part of the engine setup does not mean it will keep working as expected if the user changes configuration on the host manually.

Comment 11 Christopher Pereira 2015-05-13 05:23:41 UTC
Confirmed. Removing the VDSM RPMs causes the certificates to be lost.
The solution is to put the host into maintenance mode and reinstall it.
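
Before reinstalling, a quick way to confirm that a host is in the state described in Comments 10 and 11 is to check whether the VDSM PKI files are still present. The sketch below makes assumptions not stated in this bug: it treats `/etc/pki/vdsm` as the PKI directory and `certs/cacert.pem`, `certs/vdsmcert.pem`, and `keys/vdsmkey.pem` as the expected files, so verify those paths against your own deployment. The function name `missing_pki_files` is hypothetical.

```python
import os

# Assumed file layout under the VDSM PKI directory; adjust for your deployment.
EXPECTED_FILES = (
    "certs/cacert.pem",    # CA certificate issued by the engine
    "certs/vdsmcert.pem",  # host certificate
    "keys/vdsmkey.pem",    # host private key
)


def missing_pki_files(pki_dir="/etc/pki/vdsm"):
    """Return the expected PKI files (relative paths) that are not present."""
    return [
        rel for rel in EXPECTED_FILES
        if not os.path.isfile(os.path.join(pki_dir, rel))
    ]


if __name__ == "__main__":
    missing = missing_pki_files()
    if missing:
        print("Missing PKI files: " + ", ".join(missing))
        print("Put the host into maintenance mode and reinstall it from the engine.")
    else:
        print("All expected PKI files are present.")
```

A file being present does not prove that the certificate is still valid or signed by the current engine CA; this only detects the "certificates lost after removing the RPMs" case.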