+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1591751 +++
======================================================================

Description of problem:
The engine won't start if the entire /var/cache/ovirt-engine directory is missing or deleted. Upgrading a host or adding a host also fails, with the misleading message "Failed to enroll certificate for host" instead of "cannot create tarball".

Version-Release number of selected component (if applicable):
rhevm-4.1.10.3-0.1.el7.noarch
ovirt-engine-4.1.10.3-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Move or remove the /var/cache/ovirt-engine directory on the engine system
2. Start the ovirt-engine service

Another repro:
1. Start the ovirt-engine service
2. Move or remove the /var/cache/ovirt-engine directory on the engine system
3. Try to upgrade a host or add a new host to the cluster

Actual results:
The engine cannot start, or cannot add/upgrade a host in the cluster.

Expected results:
The engine should be able to re-create the /var/cache/ovirt-engine directory while starting and while adding/upgrading hosts in clusters, and print a clear message to the user.

(Originally by Javier Coscia)
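For illustration only, the expected behaviour requested above could look roughly like the following sketch. It is not the engine's code (ovirt-engine itself is Java); `ensure_cache_dir` is a hypothetical helper name, and the mode `0o750` is an assumption, not a value taken from this report.

```python
import os
import tempfile


def ensure_cache_dir(path):
    """Create `path` if it is missing and return it.

    Hypothetical helper, not part of ovirt-engine. If the directory
    cannot be created, raise an error that names the directory and
    the underlying OS error instead of failing later with an
    unrelated message.
    """
    try:
        # Assumed mode; the real directory's permissions are not stated here.
        os.makedirs(path, mode=0o750, exist_ok=True)
    except OSError as exc:
        raise RuntimeError(
            "Cannot create cache directory '%s' (error: '%s')"
            % (path, exc.strerror)
        ) from exc
    return path


if __name__ == "__main__":
    # Demonstrate on a throwaway location, not the real /var/cache path.
    base = tempfile.mkdtemp()
    target = os.path.join(base, "ovirt-engine")
    ensure_cache_dir(target)
    print(os.path.isdir(target))
```

Calling such a helper both at service start and before host deploy would cover the two reproducers in this report.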
In case of a missing /var/cache/ovirt-engine folder, the host-deploy will fail with the following error in engine.log:

2018-06-26 15:39:15,287+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] EVENT_ID: VDS_INSTALL_FAILED(505), Host zeus05.eng.lab.tlv.redhat.com installation failed. Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'No such file or directory').

or

2018-06-26 15:39:15,271+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] Host installation failed for host 'cde4ebca-bf30-4048-a498-a0ef5fbfcfd5', 'zeus05.eng.lab.tlv.redhat.com': Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'Permission denied')

(Originally by Moti Asayag)
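As a rough sketch of how a message of this shape can be produced, the snippet below probes a directory by creating a temporary file in it and reports the OS error text on failure. This is an assumption about the approach, not the actual fix; `check_writable` is a hypothetical name, and only the message wording is copied from the log above.

```python
import os
import tempfile


def check_writable(directory):
    """Try to create a temporary file in `directory`.

    Hypothetical helper. Returns None on success, otherwise an error
    string modelled on the engine.log message quoted above.
    """
    try:
        fd, name = tempfile.mkstemp(dir=directory)
    except OSError as exc:
        return (
            "Cannot create file under directory '%s', make sure directory "
            "exists and has suitable permissions (error: '%s')"
            % (directory, exc.strerror)
        )
    # Clean up the probe file so the check leaves nothing behind.
    os.close(fd)
    os.remove(name)
    return None


if __name__ == "__main__":
    print(check_writable(tempfile.mkdtemp()))
    print(check_writable("/nonexistent/ovirt-engine-example"))
```

Because the OS error text is embedded in the message, a missing directory and a permission problem are reported differently, matching the two variants shown in the log.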
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops

(Originally by rhv-bugzilla-bot)
The logs still contain the same misleading information when the /var/cache/ovirt-engine directory is missing while the engine starts:

2018-07-18 15:18:39,787+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:18:39,908+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:18:40,001+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM slot6c command GetCapabilitiesAsyncVDS failed: General SSLEngine problem
2018-07-18 15:18:40,014+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:02,921+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:19:02,935+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:19:02,945+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:22,424+02 WARN [org.ovirt.engine.core.utils.ThreadUtils] (EE-ManagedThreadFactory-engine-Thread-1) [] Interrupted: sleep interrupted

Only the errors when adding a host were fixed, which is half of the reported problem.
@Pavol, starting the engine should have failed if the folder is missing. Are you able to start the ovirt-engine service while the directory /var/cache/ovirt-engine doesn't exist? (There could be a difference between the development environment and the production environment.)

However, that folder is accessed when a host is being installed or updated. Could you describe what you did to get those failures? When was the folder removed?

Thanks,
Moti
# mv /var/cache/ovirt-engine /tmp/
# systemctl restart ovirt-engine

and it failed, so the directory had been removed. When I recreated /var/cache/ovirt-engine with ovirt:ovirt ownership, the engine started successfully, but from engine.log (output in my previous comment) I would never have guessed that "Unable to process messages General SSLEngine problem" means I am missing the cache directory.
Comment 8 contained the wrong log information; the fix is working as intended.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2623