Bug 1597574 - [downstream clone - 4.2.6] Recreate engine_cache dir during start and host deployment flows
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.6
Target Release: ---
Assignee: Moti Asayag
QA Contact: Pavol Brilla
URL:
Whiteboard:
Depends On: 1591751
Blocks:
Reported: 2018-07-03 08:59 UTC by RHV bug bot
Modified: 2021-12-10 16:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1591751
Environment:
Last Closed: 2018-09-04 13:41:42 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-44223 0 None None None 2021-12-10 16:37:06 UTC
Red Hat Knowledge Base (Solution) 3487071 0 None None None 2018-07-03 09:00:14 UTC
Red Hat Product Errata RHBA-2018:2623 0 None None None 2018-09-04 13:42:30 UTC
oVirt gerrit 92436 0 master ABANDONED engine: Recreate engine_cache dir during start and host deployment flows 2020-02-06 10:16:51 UTC
oVirt gerrit 92542 0 master MERGED engine: Fail with a clear message if cache dir is missing 2020-02-06 10:16:51 UTC
oVirt gerrit 92558 0 ovirt-engine-4.2 MERGED engine: Fail with a clear message if cache dir is missing 2020-02-06 10:16:51 UTC

Description RHV bug bot 2018-07-03 08:59:15 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1591751 +++
======================================================================

Description of problem:

Engine won't start if entire /var/cache/ovirt-engine dir is missing/deleted. 

It would also fail to upgrade or add a host, with the misleading message "Failed to enroll certificate for host" instead of "cannot create tarball".


Version-Release number of selected component (if applicable):

rhevm-4.1.10.3-0.1.el7.noarch
ovirt-engine-4.1.10.3-0.1.el7.noarch


How reproducible:

100%

Steps to Reproduce:
1. Move or remove /var/cache/ovirt-engine dir on engine system
2. Start ovirt-engine service

Another repro:

1. Start ovirt-engine service
2. Move or remove /var/cache/ovirt-engine dir on engine system
3. Try to upgrade or add a new host into the cluster

Actual results:

Engine cannot start or cannot add/upgrade host in cluster



Expected results:

Engine should be able to re-create this /var/cache/ovirt-engine dir while starting and while adding/upgrading host in clusters and print a clear message to the user

(Originally by Javier Coscia)

Comment 5 RHV bug bot 2018-07-03 08:59:49 UTC
In case of a missing /var/cache/ovirt-engine folder, the host-deploy will fail with the following error in engine.log:


2018-06-26 15:39:15,287+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] EVENT_ID: VDS_INSTALL_FAILED(505), Host zeus05.eng.lab.tlv.redhat.com installation failed. Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'No such file or directory').

or

2018-06-26 15:39:15,271+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] Host installation failed for host 'cde4ebca-bf30-4048-a498-a0ef5fbfcfd5', 'zeus05.eng.lab.tlv.redhat.com': Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'Permission denied')

(Originally by Moti Asayag)
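
For administrators who want to catch this condition before starting the service or deploying a host, the check described above can be approximated from the shell. This is a minimal sketch, not the engine's own code (the engine is written in Java); the function name check_cache_dir is hypothetical, and the message text is only modeled on the engine.log lines quoted in this comment.

```shell
#!/bin/sh
# Hypothetical pre-flight check; NOT part of ovirt-engine.
# The message wording is copied from the engine.log excerpt above.
check_cache_dir() {
    dir="${1:-/var/cache/ovirt-engine}"
    if [ ! -d "$dir" ]; then
        # Path is missing (or is not a directory).
        reason="No such file or directory"
    elif [ ! -w "$dir" ]; then
        # Directory exists but the current user cannot write to it.
        reason="Permission denied"
    else
        return 0
    fi
    echo "Cannot create file under directory '$dir', make sure directory exists and has suitable permissions (error: '$reason')" >&2
    return 1
}
```

Because the write test applies to the user running the script, it is only meaningful when run as the account the engine uses; running it as root will report any existing directory as writable.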

Comment 6 RHV bug bot 2018-07-03 08:59:55 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops

(Originally by rhv-bugzilla-bot)

Comment 8 Pavol Brilla 2018-07-18 13:25:41 UTC
Still the same misleading information in the logs when the /var/cache/ovirt-engine directory is missing during engine startup:

2018-07-18 15:18:39,787+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:18:39,908+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:18:40,001+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM slot6c command GetCapabilitiesAsyncVDS failed: General SSLEngine problem
2018-07-18 15:18:40,014+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:02,921+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:19:02,935+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:19:02,945+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:22,424+02 WARN  [org.ovirt.engine.core.utils.ThreadUtils] (EE-ManagedThreadFactory-engine-Thread-1) [] Interrupted: sleep interrupted

Only the errors when adding a host were fixed, which is half of the reported problem.

Comment 9 Moti Asayag 2018-07-18 13:57:12 UTC
@Pavol, starting the engine should have failed if the folder is missing.
Are you able to start the ovirt-engine service while the directory /var/cache/ovirt-engine doesn't exist?

(There could be a difference between the development environment and the production environment.)
However, that folder is accessed when a host is being installed or updated.

Could you describe what you did to get those failures? When was the folder removed?

Thanks,
Moti

Comment 10 Pavol Brilla 2018-07-19 13:06:29 UTC
# mv /var/cache/ovirt-engine /tmp/
# systemctl restart ovirt-engine

and it failed. So the directory was removed before the engine was started.

When I recreated /var/cache/ovirt-engine with ovirt:ovirt ownership, the engine started successfully, but from engine.log (output in my previous comment) I would never have guessed that "Unable to process messages General SSLEngine problem" means I am missing the cache directory.
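
The manual recovery described above (recreate the directory with ovirt:ovirt ownership, then start the engine) could be scripted roughly as follows. This is a sketch under assumptions: the function name recreate_cache_dir is hypothetical, the directory mode is left to the current umask because this report does not state one, and any other attributes the packaged directory may carry are not restored.

```shell
#!/bin/sh
# Hypothetical helper; NOT shipped with ovirt-engine.
# Recreates the cache directory and sets its ownership.
# Defaults follow this comment: /var/cache/ovirt-engine, ovirt:ovirt.
recreate_cache_dir() {
    dir="${1:-/var/cache/ovirt-engine}"
    owner="${2:-ovirt}"
    group="${3:-ovirt}"
    # mkdir -p is a no-op if the directory already exists.
    mkdir -p "$dir" || return 1
    # Mode is whatever the umask yields; no mode is documented here.
    chown "$owner:$group" "$dir" || return 1
}
```

Run as root on the engine machine, a call such as "recreate_cache_dir && systemctl restart ovirt-engine" would mirror the steps in this comment.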

Comment 12 Pavol Brilla 2018-08-09 09:32:18 UTC
Comment 8 had the wrong log info; the fix is working as intended.

Comment 14 errata-xmlrpc 2018-09-04 13:41:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2623

