Bug 1597574 - [downstream clone - 4.2.6] Recreate engine_cache dir during start and host deployment flows
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.6
Target Release: ---
Assigned To: Moti Asayag
QA Contact: Pavol Brilla
Keywords: ZStream
Depends On: 1591751
Blocks:
Reported: 2018-07-03 04:59 EDT by RHV Bugzilla Automation and Verification Bot
Modified: 2018-09-04 09:42 EDT (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1591751
Environment:
Last Closed: 2018-09-04 09:41:42 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3487071 | None | None | None | 2018-07-03 05:00 EDT
oVirt gerrit | 92436 | master | ABANDONED | engine: Recreate engine_cache dir during start and host deployment flows | 2018-07-05 14:28 EDT
oVirt gerrit | 92542 | master | MERGED | engine: Fail with a clear message if cache dir is missing | 2018-07-03 05:00 EDT
oVirt gerrit | 92558 | ovirt-engine-4.2 | MERGED | engine: Fail with a clear message if cache dir is missing | 2018-07-03 05:00 EDT
Red Hat Product Errata | RHBA-2018:2623 | None | None | None | 2018-09-04 09:42 EDT

Description RHV Bugzilla Automation and Verification Bot 2018-07-03 04:59:15 EDT
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1591751 +++
======================================================================

Description of problem:

Engine won't start if the entire /var/cache/ovirt-engine dir is missing or deleted.

It would also fail to upgrade a host or add one, with the misleading message "Failed to enroll certificate for host" instead of "cannot create tarball".


Version-Release number of selected component (if applicable):

rhevm-4.1.10.3-0.1.el7.noarch
ovirt-engine-4.1.10.3-0.1.el7.noarch


How reproducible:

100%

Steps to Reproduce:
1. Move or remove /var/cache/ovirt-engine dir on engine system
2. Start ovirt-engine service

Another repro:

1. Start ovirt-engine service
2. Move or remove /var/cache/ovirt-engine dir on engine system
3. Try to upgrade or add a new host into the cluster

Actual results:

Engine cannot start or cannot add/upgrade host in cluster



Expected results:

Engine should be able to re-create this /var/cache/ovirt-engine dir while starting and while adding/upgrading host in clusters and print a clear message to the user

(Originally by Javier Coscia)
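
The expected behaviour described above (re-create the directory when it is missing) could be sketched roughly as follows. This is a minimal illustration in Python, not the engine's actual code, which is Java. The helper name `ensure_cache_dir` and the default mode `0o750` are assumptions made for the example, and the demo runs against a throwaway directory rather than /var/cache/ovirt-engine.

```python
import os
import tempfile


def ensure_cache_dir(path, mode=0o750):
    """Create the cache directory if it is missing.

    Returns True if the directory had to be created, False if it was
    already present. Hypothetical helper, not part of ovirt-engine.
    """
    if os.path.isdir(path):
        return False
    # exist_ok guards against another process creating it in between.
    os.makedirs(path, mode=mode, exist_ok=True)
    return True


if __name__ == "__main__":
    # Use a temporary location so the demo never touches a real cache dir.
    with tempfile.TemporaryDirectory() as base:
        demo = os.path.join(base, "ovirt-engine")
        print(ensure_cache_dir(demo))  # directory was missing
        print(ensure_cache_dir(demo))  # directory now exists
```

Note that ownership is not handled here; a real service would also need the directory to belong to the account the engine runs as.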
Comment 5 RHV Bugzilla Automation and Verification Bot 2018-07-03 04:59:49 EDT
In case of a missing /var/cache/ovirt-engine folder, the host-deploy will fail with the following error in engine.log:


2018-06-26 15:39:15,287+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] EVENT_ID: VDS_INSTALL_FAILED(505), Host zeus05.eng.lab.tlv.redhat.com installation failed. Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'No such file or directory').

or

2018-06-26 15:39:15,271+03 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-3) [664fcf70-7c86-410e-82be-7f58f29439c6] Host installation failed for host 'cde4ebca-bf30-4048-a498-a0ef5fbfcfd5', 'zeus05.eng.lab.tlv.redhat.com': Cannot create file under directory '/var/cache/ovirt-engine', make sure directory exists and has suitable permissions (error: 'Permission denied')

(Originally by Moti Asayag)
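
Both messages above follow one template: the directory path plus the underlying OS error text. A fail-fast check that produces such a message might look like the sketch below. It is an illustration in Python only, not the merged Java change; `check_cache_dir` and `CacheDirError` are hypothetical names, and probing the directory with a temporary file is an assumption about how such a check could be done.

```python
import os
import tempfile


class CacheDirError(Exception):
    """Raised when a file cannot be created under the cache directory."""


def check_cache_dir(path):
    """Probe `path` by creating and removing a temporary file.

    Hypothetical helper: raises CacheDirError with a message that names
    the directory and the OS-level reason for the failure.
    """
    try:
        # TemporaryFile creates a file under `dir` and deletes it on close.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError as exc:
        raise CacheDirError(
            "Cannot create file under directory '%s', make sure directory "
            "exists and has suitable permissions (error: '%s')"
            % (path, exc.strerror)
        ) from exc
```

Running the check before the first write means the user sees the path and the OS reason directly, instead of a downstream symptom such as a certificate enrollment failure.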
Comment 6 RHV Bugzilla Automation and Verification Bot 2018-07-03 04:59:55 EDT
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops@redhat.com

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops@redhat.com

(Originally by rhv-bugzilla-bot)
Comment 8 Pavol Brilla 2018-07-18 09:25:41 EDT
Still the same misleading information in the logs when the /var/cache/ovirt-engine directory is missing during engine start:

2018-07-18 15:18:39,787+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:18:39,908+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:18:40,001+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM slot6c command GetCapabilitiesAsyncVDS failed: General SSLEngine problem
2018-07-18 15:18:40,014+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:02,921+02 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.37.136.200
2018-07-18 15:19:02,935+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages General SSLEngine problem
2018-07-18 15:19:02,945+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: General SSLEngine problem
2018-07-18 15:19:22,424+02 WARN  [org.ovirt.engine.core.utils.ThreadUtils] (EE-ManagedThreadFactory-engine-Thread-1) [] Interrupted: sleep interrupted

Only the errors when adding a host were fixed, which is half of the reported problem.
Comment 9 Moti Asayag 2018-07-18 09:57:12 EDT
@Pavol, starting the engine should have failed if the folder is missing.
Are you able to start the ovirt-engine service while directory /var/cache/ovirt-engine doesn't exist?

(There could be a difference between the development environment and the production environment.)
However, that folder is accessed when a host is being installed or updated.

Could you describe what you did to get those failures? When was the folder removed?

Thanks,
Moti
Comment 10 Pavol Brilla 2018-07-19 09:06:29 EDT
# mv /var/cache/ovirt-engine /tmp/
# systemctl restart ovirt-engine

and it failed, so the directory was removed.

When I recreated /var/cache/ovirt-engine with ovirt:ovirt ownership, the engine started successfully, but from engine.log (output in my previous comment) I would never have guessed that "Unable to process messages General SSLEngine problem" means the cache directory is missing.
Comment 12 Pavol Brilla 2018-08-09 05:32:18 EDT
Comment 8 had the wrong log info; the fix is working as intended.
Comment 14 errata-xmlrpc 2018-09-04 09:41:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2623
