Bug 1597574

Summary: [downstream clone - 4.2.6] Recreate engine_cache dir during start and host deployment flows
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-engine
Assignee: Moti Asayag <masayag>
Status: CLOSED ERRATA
QA Contact: Pavol Brilla <pbrilla>
Severity: medium
Docs Contact:
Priority: high
Version: 4.1.10
CC: dfediuck, lsurette, masayag, mgoldboi, mkalinin, mperina, omachace, pbrilla, Rhev-m-bugs, srevivo
Target Milestone: ovirt-4.2.6
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1591751
Environment:
Last Closed: 2018-09-04 13:41:42 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1591751
Bug Blocks:

Description RHV bug bot 2018-07-03 08:59:15 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1591751 +++
======================================================================

Description of problem:

Engine won't start if entire /var/cache/ovirt-engine dir is missing/deleted. 

It also fails to upgrade or add a host, reporting the misleading message "Failed to enroll certificate for host" instead of "cannot create tarball".


Version-Release number of selected component (if applicable):

rhevm-4.1.10.3-0.1.el7.noarch
ovirt-engine-4.1.10.3-0.1.el7.noarch


How reproducible:

100%

Steps to Reproduce:
1. Move or remove /var/cache/ovirt-engine dir on engine system
2. Start ovirt-engine service

Another repro:

1. Start ovirt-engine service
2. Move or remove /var/cache/ovirt-engine dir on engine system
3. Try to upgrade or add a new host into the cluster

Actual results:

Engine cannot start or cannot add/upgrade host in cluster



Expected results:

Engine should be able to re-create the /var/cache/ovirt-engine dir while starting and while adding/upgrading hosts in a cluster, and print a clear message to the user.

(Originally by Javier Coscia)
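
For illustration only, the expected behaviour could be sketched roughly as below. This is not the actual ovirt-engine fix: the function name ensure_cache_dir and the 0o750 mode are hypothetical, and ownership handling (the directory is expected to belong to ovirt:ovirt, per comment 10) is left out.

```python
import os
import tempfile


def ensure_cache_dir(path, mode=0o750):
    """Create the cache directory if it is missing and return a status message.

    Hypothetical helper: the name and the default mode are assumptions,
    not taken from the ovirt-engine sources.
    """
    if os.path.isdir(path):
        return "cache directory '%s' already exists" % path
    try:
        # exist_ok guards against a race with another process creating it.
        os.makedirs(path, mode=mode, exist_ok=True)
    except OSError as e:
        # Surface a clear message instead of an unrelated downstream error.
        raise RuntimeError(
            "Cannot create cache directory '%s': %s" % (path, e.strerror)
        ) from e
    return "re-created missing cache directory '%s'" % path


if __name__ == "__main__":
    # Demo against a throwaway location rather than /var/cache/ovirt-engine.
    demo = os.path.join(tempfile.mkdtemp(), "ovirt-engine")
    print(ensure_cache_dir(demo))
    print(ensure_cache_dir(demo))
```

Calling such a helper both at service start and right before the host-deploy tarball is built would cover the two reproducers above.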

Comment 5 RHV bug bot 2018-07-03 08:59:49 UTC
In case of a missing /var/cache/ovirt-engine folder, the host-deploy will fail with the following error in engine.log:


2018-06-26 15:39:15,287+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] EVENT_ID: VDS_INSTALL_FAILED(505), Host zeus05.eng.lab.tlv.redhat.com installation failed. Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'No such file or directory').

or

2018-06-26 15:39:15,271+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] Host installation failed for host 'cde4ebca-bf30-4048-a498-a0ef5fbfcfd5', 'zeus05.eng.lab.tlv.redhat.com': Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'Permission denied')

(Originally by Moti Asayag)
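
A rough sketch of how a message of this shape can be produced: wrap the file creation and report the directory together with the OS error text. The engine itself is Java; this Python version only illustrates the pattern, and create_cache_file is a hypothetical name.

```python
import os
import tempfile


def create_cache_file(cache_dir, suffix=".tar"):
    """Create a temporary file under cache_dir and return its path.

    Hypothetical helper: on failure it raises an error that names the
    directory and the underlying OS error, mirroring the log text above.
    """
    try:
        fd, path = tempfile.mkstemp(suffix=suffix, dir=cache_dir)
    except OSError as e:
        raise RuntimeError(
            "Cannot create file under directory '%s', make sure directory "
            "exists and has suitable permissions (error: '%s')"
            % (cache_dir, e.strerror)
        ) from e
    os.close(fd)
    return path


if __name__ == "__main__":
    # Demo with a directory that does not exist.
    missing = os.path.join(tempfile.mkdtemp(), "missing")
    try:
        create_cache_file(missing)
    except RuntimeError as exc:
        print(exc)
```

The point of the pattern is that the caller sees the directory path and the OS-level reason, rather than a later, unrelated failure such as a certificate enrollment error.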

Comment 6 RHV bug bot 2018-07-03 08:59:55 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops

(Originally by rhv-bugzilla-bot)

Comment 8 Pavol Brilla 2018-07-18 13:25:41 UTC
The logs still show the same misleading information when the /var/cache/ovirt-engine directory is missing during engine startup:

2018-07-18 15:18:39,787+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:18:39,908+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:18:40,001+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM slot6c command GetCapabilitiesAsyncVDS failed: General SSLEngine problem
2018-07-18 15:18:40,014+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:02,921+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:19:02,935+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:19:02,945+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:22,424+02 WARN  [org.ovirt.engine.core.utils.ThreadUtils] (EE-ManagedThreadFactory-engine-Thread-1) [] Interrupted: sleep interrupted

Only the errors shown when adding a host were fixed, which is half of the reported problem.

Comment 9 Moti Asayag 2018-07-18 13:57:12 UTC
@Pavol, starting the engine should have failed if the folder is missing.
Are you able to start the ovirt-engine service while the directory /var/cache/ovirt-engine doesn't exist?

(There could be a difference between the development environment and the production environment.)
However, that folder is accessed when a host is being installed/updated.

Could you describe what you did to get those failures? When was the folder removed?

Thanks,
Moti

Comment 10 Pavol Brilla 2018-07-19 13:06:29 UTC
# mv /var/cache/ovirt-engine /tmp/
# systemctl restart ovirt-engine

and it failed, so the directory was removed before the restart.

When I recreated /var/cache/ovirt-engine with ovirt:ovirt ownership, the engine started successfully, but from engine.log (output in my previous comment) I would never have guessed that "Unable to process messages General SSLEngine problem" means I am missing the cache directory.

Comment 12 Pavol Brilla 2018-08-09 09:32:18 UTC
Comment 8 had the wrong log info; the fix is working as intended.

Comment 14 errata-xmlrpc 2018-09-04 13:41:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2623