Description of problem:
After the hosts were upgraded from vt14 to vt14.1, they failed to start and became "Non-responsive". According to the logs, the problem seems to be related to a certificate:

2015-04-12 08:29:49,620 DEBUG [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Message received: {"jsonrpc":"2.0","error":{"message":"General SSLEngine problem","code":"host05-rack04.scale.openstack.engineering.redhat.com:738906710"},"id":null}

I am not sure what went wrong; this is a standard use case and should work. When I reinstalled the hosts, they ran fine.

Version-Release number of selected component (if applicable):
vt14.1

How reproducible:
100%

Steps to Reproduce:
1. Run the engine on top of vt14.1.
2. Run vdsm from vt14 and upgrade it to vt14.1 (add the vt14.1 repo, then yum update).
3. Start the hosts via the engine webadmin.

Actual results:
The hosts fail to start and become non-responsive.

Expected results:
The hosts start as expected with no errors.

Additional info:
Reinstalling the hosts resolves the problem.
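For triage, the JSON-RPC payload in the log line from the description can be pulled out and checked for an SSL-related error. The following is a minimal Python sketch, not part of the engine or vdsm: the helper names `extract_jsonrpc_error` and `looks_like_ssl_failure` are hypothetical, and it assumes the payload follows the literal text "Message received: " as in the line quoted in the description.

```python
import json

# Assumption: the JSON payload follows this literal marker, as in the
# log line quoted in the description.
MARKER = "Message received: "


def extract_jsonrpc_error(log_line):
    """Return the JSON-RPC "error" object from a log line, or None."""
    _, found, payload = log_line.partition(MARKER)
    if not found:
        return None
    try:
        message = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(message, dict):
        return None
    return message.get("error")


def looks_like_ssl_failure(error):
    """Heuristic: True if the error message mentions SSLEngine."""
    if not isinstance(error, dict):
        return False
    return "SSLEngine" in str(error.get("message", ""))
```

Applied to the line from the description, `extract_jsonrpc_error` returns the error object whose message is "General SSLEngine problem", and `looks_like_ssl_failure` flags it.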
Eldad, can you check vt14.3? I suspect this may be related to Bug 1208752 - Vdsm upgrade 3.4 >> 3.5.1 doesn't restart vdsmd service, but I'm not sure.
In addition, can you attach all relevant logs?
Please check with the latest vdsm for 3.5 (vdsm-4.16.13), as Oved already asked. vdsm.log, /var/log/messages, and /var/log/yum.log should be enough to figure out the errors.
If it still appears, please reopen with the requested info.
I have installed vt14.3 and the problem still reproduces. Logs will be attached.
Created attachment 1020024: logs
Please attach engine logs as well
We see a certificate exception in vdsm.log that is caused by the installation flow: after a reinstall, the certificate is currently installed by host-deploy on the host side. The steps that led to this error were: Eldad added this host to the engine, then manually removed the vdsm RPMs on the host and installed new ones (he then did the upgrade, but that is not related). This flow is not expected to work without adding the host or reinstalling it from the engine (using host-deploy). Manual RPM installation requires the user to copy the engine's certificate as well. The fact that a host is already part of the engine setup does not mean it will work as expected if the user manually changes the configuration on the host.
Confirmed. Removing the VDSM RPMs causes the certificates to be lost. The solution is to put the host into maintenance mode and reinstall it.
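Before reinstalling, it may help to confirm that the certificate files really are missing on the host. The sketch below is only illustrative: the paths in `ASSUMED_VDSM_PKI_FILES` are an assumption about the default vdsm PKI layout and are not taken from this report, and `missing_pki_files` is a hypothetical helper name. Adjust the list to match the actual host.

```python
from pathlib import Path

# Assumption: default vdsm PKI locations; not stated anywhere in this bug.
ASSUMED_VDSM_PKI_FILES = (
    "/etc/pki/vdsm/certs/cacert.pem",
    "/etc/pki/vdsm/certs/vdsmcert.pem",
    "/etc/pki/vdsm/keys/vdsmkey.pem",
)


def missing_pki_files(paths=ASSUMED_VDSM_PKI_FILES, root="/"):
    """Return the entries of `paths` that are not regular files under `root`."""
    root_path = Path(root)
    missing = []
    for path in paths:
        # Re-anchor the absolute path under `root` so the check can also be
        # run against a copied or mounted filesystem tree.
        candidate = root_path / path.lstrip("/")
        if not candidate.is_file():
            missing.append(path)
    return missing
```

Calling `missing_pki_files()` on the host returns the assumed paths that are absent; an empty list means all of them exist, although it says nothing about whether they match the engine's CA.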