Bug 1585028 - [downstream clone - 4.2.4] RHV-H 4.2.3: hosted-engine agent fails to start after upgrade due to Permission denied: '/var/log/ovirt-hosted-engine-ha/broker.log' '/var/log/ovirt-hosted-engine-ha/agent.log'
Summary: [downstream clone - 4.2.4] RHV-H 4.2.3: hosted-engine agent fails to start af...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 4.2.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.2.4
Assignee: Ido Rosenzwig
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Duplicates: 1591150
Depends On: 1579103
Blocks: ovirt-hosted-engine-ha-2.2.14
 
Reported: 2018-06-01 07:47 UTC by RHV bug bot
Modified: 2021-06-10 16:17 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1579103
Environment:
Last Closed: 2018-06-27 10:01:52 UTC
oVirt Team: Integration
Target Upstream Version:


Attachments (Terms of Use)
sosreport from alma03 (10.29 MB, application/x-xz)
2018-06-18 15:26 UTC, Nikolai Sednev
engine logs (10.14 MB, application/x-xz)
2018-06-18 15:32 UTC, Nikolai Sednev


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2066 0 None None None 2018-06-27 10:01:58 UTC
oVirt gerrit 91592 0 master MERGED packaging: set ghost directive for log files 2021-01-28 00:19:39 UTC
oVirt gerrit 91706 0 v2.2.z MERGED packaging: set ghost directive for log files 2021-01-28 00:19:39 UTC
oVirt gerrit 92246 0 None MERGED packaging: explicitly set permissions on log files 2021-01-28 00:19:39 UTC
oVirt gerrit 92250 0 None MERGED packaging: explicitly set permissions on log files 2021-01-28 00:19:39 UTC
oVirt gerrit 92367 0 master ABANDONED src: Fix permissions for agent and broker log files 2021-01-28 00:19:39 UTC
oVirt gerrit 92368 0 v2.2.z ABANDONED src: Fix permissions for agent and broker log files 2021-01-28 00:19:40 UTC
oVirt gerrit 92375 0 master MERGED osupdater: copy new files in /var also 2021-01-28 00:19:40 UTC
oVirt gerrit 92376 0 ovirt-4.2 MERGED osupdater: copy new files in /var also 2021-01-28 00:19:40 UTC
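The merged patches above fix this in packaging ("set ghost directive for log files", "explicitly set permissions on log files"). In RPM spec terms that pattern looks roughly like the fragment below; this is a hedged sketch of the technique, not the actual ovirt-hosted-engine-ha spec file:

```
%files
# Log directory owned by the package, writable by the HA daemons
%dir %attr(0755, vdsm, kvm) %{_localstatedir}/log/ovirt-hosted-engine-ha
# %ghost: the files are not shipped in the RPM payload, but rpm records
# their intended owner/group/mode, so the packaging no longer fights
# with files created at runtime
%ghost %attr(0644, vdsm, kvm) %{_localstatedir}/log/ovirt-hosted-engine-ha/agent.log
%ghost %attr(0644, vdsm, kvm) %{_localstatedir}/log/ovirt-hosted-engine-ha/broker.log
```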

Description RHV bug bot 2018-06-01 07:47:36 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1579103 +++
======================================================================

Description of problem:
After upgrading 3 out of 3 hypervisors, all of them failed to start ovirt-ha-{agent,broker}.
The components are restarted by systemd and keep triggering abrtd, which analyzes the resulting Python dumps.

The root cause is that, for some reason, after the upgrade the log files:

'/var/log/ovirt-hosted-engine-ha/broker.log' 
'/var/log/ovirt-hosted-engine-ha/agent.log'

change their ownership from vdsm:kvm to root:root, making it impossible for the hosted-engine daemons to open them and filling the system log with alarming messages like:

mag 17 02:32:50 m1.goodfellow.local systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
mag 17 02:32:50 m1.goodfellow.local systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
mag 17 02:32:51 m1.goodfellow.local python[1624]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
mag 17 02:32:51 m1.goodfellow.local abrt-server[1629]: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
mag 17 02:32:51 m1.goodfellow.local systemd[1]: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
mag 17 02:32:51 m1.goodfellow.local systemd[1]: Unit ovirt-ha-broker.service entered failed state.
mag 17 02:32:51 m1.goodfellow.local systemd[1]: ovirt-ha-broker.service failed.
[...]
mag 17 02:32:53 m1.goodfellow.local systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
mag 17 02:32:53 m1.goodfellow.local systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
mag 17 02:32:54 m1.goodfellow.local python[1828]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
mag 17 02:32:54 m1.goodfellow.local abrt-server[1835]: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
mag 17 02:32:54 m1.goodfellow.local systemd[1]: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
mag 17 02:32:54 m1.goodfellow.local systemd[1]: Unit ovirt-ha-broker.service entered failed state.
mag 17 02:32:54 m1.goodfellow.local systemd[1]: ovirt-ha-broker.service failed.
mag 17 02:32:54 m1.goodfellow.local systemd[1]: ovirt-ha-broker.service holdoff time over, scheduling restart.
mag 17 02:32:54 m1.goodfellow.local systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
mag 17 02:32:54 m1.goodfellow.local systemd[1]: start request repeated too quickly for ovirt-ha-broker.service
mag 17 02:32:54 m1.goodfellow.local systemd[1]: Failed to start oVirt Hosted Engine High Availability Communications Broker.
mag 17 02:32:54 m1.goodfellow.local systemd[1]: Unit ovirt-ha-broker.service entered failed state.
mag 17 02:32:54 m1.goodfellow.local systemd[1]: ovirt-ha-broker.service failed.
[...]
mag 17 02:33:36 m1.goodfellow.local vdsm[6511]: WARN Not ready yet, ignoring event '|net|host_conn|no_id' args={}
mag 17 02:33:36 m1.goodfellow.local vdsm[6511]: WARN Not ready yet, ignoring event '|net|host_conn|no_id' args={}
mag 17 02:33:36 m1.goodfellow.local vdsm[6511]: WARN File: /var/run/vdsm/trackedInterfaces/ovirtmgmt already removed
mag 17 02:33:36 m1.goodfellow.local vdsm[6511]: WARN Not ready yet, ignoring event '|net|host_conn|no_id' args={}
mag 17 02:33:36 m1.goodfellow.local vdsm[6511]: WARN File: /var/run/vdsm/trackedInterfaces/eno2 already removed
mag 17 02:33:37 m1.goodfellow.local vdsm[6511]: WARN MOM not available.
mag 17 02:33:37 m1.goodfellow.local vdsm[6511]: WARN MOM not available, KSM stats will be missing.
mag 17 02:33:37 m1.goodfellow.local vdsm[6511]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?
[...]
mag 17 02:33:49 m1.goodfellow.local systemd[1]: ovirt-ha-agent.service holdoff time over, scheduling restart.
mag 17 02:33:49 m1.goodfellow.local systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
mag 17 02:33:49 m1.goodfellow.local systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
mag 17 02:33:49 m1.goodfellow.local systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
mag 17 02:33:49 m1.goodfellow.local systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
mag 17 02:33:49 m1.goodfellow.local systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
mag 17 02:33:50 m1.goodfellow.local python[9792]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent'
mag 17 02:33:50 m1.goodfellow.local python[9791]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'
mag 17 02:33:50 m1.goodfellow.local abrt-server[9846]: Not saving repeating crash in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent'
mag 17 02:33:50 m1.goodfellow.local systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=1/FAILURE
mag 17 02:33:50 m1.goodfellow.local systemd[1]: Unit ovirt-ha-agent.service entered failed state.
mag 17 02:33:50 m1.goodfellow.local systemd[1]: ovirt-ha-agent.service failed.
mag 17 02:33:51 m1.goodfellow.local kernel: warning: `turbostat' uses 32-bit capabilities (legacy support in use)
mag 17 02:33:52 m1.goodfellow.local vdsm[6511]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished?


For an average end user, reading "Is the Hosted Engine setup finished?" is very alarming, because it conveys a sense of insecurity about the hosted-engine HA cluster, which is the core of the infrastructure and the way the infrastructure is managed.

Also, the message by itself is not very clear: you have to go into the abrtd crash folder and read the backtrace to get meaningful hints about what is going on.
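The ownership flip described above can be checked directly with a stat call. A minimal sketch, run against a throwaway temp file rather than the real logs; on an affected host you would point it at the two /var/log/ovirt-hosted-engine-ha paths listed above:

```python
import grp
import os
import pwd
import tempfile

# Hypothetical stand-in path; replace with the real log files on a host.
path = tempfile.mkstemp()[1]

st = os.stat(path)
owner = pwd.getpwuid(st.st_uid).pw_name
group = grp.getgrgid(st.st_gid).gr_name
# Expected on a healthy host: vdsm kvm. Seeing root root indicates this bug.
print(owner, group)
os.unlink(path)
```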

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.2.11-1.el7ev.noarch
redhat-release-virtualization-host-4.2-3.0.el7.x86_64

How reproducible:
Always (experienced in 3 out of 3 nodes, starting from different releases of RHV-H 4.1)

Steps to Reproduce:
1. upgrade RHV-H to 4.2.3 from RHV-H 4.1
2. reboot the node

Actual results:
ovirt-ha-agent , ovirt-ha-broker services do not start, reducing the number of members of the hosted-engine cluster.

Expected results:
Painless upgrade, with no change to the permissions of the log files.


Additional info:

These are the backtraces abrtd collected for both daemons:


[root@m1 abrt]# cat Python-2018-05-17-02\:34\:17-14299/backtrace 
__init__.py:925:_open:IOError: [Errno 13] Permission denied: '/var/log/ovirt-hosted-engine-ha/broker.log'

Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker", line 25, in <module>
    broker.Broker().run()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 44, in run
    self._initialize_logging()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 70, in _initialize_logging
    disable_existing_loggers=False)
  File "/usr/lib64/python2.7/logging/config.py", line 78, in fileConfig
    handlers = _install_handlers(cp, formatters)
  File "/usr/lib64/python2.7/logging/config.py", line 156, in _install_handlers
    h = klass(*args)
  File "/usr/lib64/python2.7/logging/handlers.py", line 169, in __init__
    BaseRotatingHandler.__init__(self, filename, 'a', encoding, delay)
  File "/usr/lib64/python2.7/logging/handlers.py", line 64, in __init__
    logging.FileHandler.__init__(self, filename, mode, encoding, delay)
  File "/usr/lib64/python2.7/logging/__init__.py", line 902, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: '/var/log/ovirt-hosted-engine-ha/broker.log'

Local variables in innermost frame:
self: <logging.handlers.TimedRotatingFileHandler object at 0x7fcd4d7d8e10>
[root@m1 abrt]# cat Python-2018-05-17-02\:34\:17-14300/backtrace 
__init__.py:925:_open:IOError: [Errno 13] Permission denied: '/var/log/ovirt-hosted-engine-ha/agent.log'

Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent", line 25, in <module>
    agent.Agent().run()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 66, in run
    self._initialize_logging()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 95, in _initialize_logging
    disable_existing_loggers=False)
  File "/usr/lib64/python2.7/logging/config.py", line 78, in fileConfig
    handlers = _install_handlers(cp, formatters)
  File "/usr/lib64/python2.7/logging/config.py", line 156, in _install_handlers
    h = klass(*args)
  File "/usr/lib64/python2.7/logging/handlers.py", line 169, in __init__
    BaseRotatingHandler.__init__(self, filename, 'a', encoding, delay)
  File "/usr/lib64/python2.7/logging/handlers.py", line 64, in __init__
    logging.FileHandler.__init__(self, filename, mode, encoding, delay)
  File "/usr/lib64/python2.7/logging/__init__.py", line 902, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: '/var/log/ovirt-hosted-engine-ha/agent.log'

Local variables in innermost frame:
self: <logging.handlers.TimedRotatingFileHandler object at 0x7fd9477a0890>
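Both backtraces show that the exception is raised while the logging config instantiates the TimedRotatingFileHandler, i.e. the file is opened at service start, before any log record is written. A minimal sketch of that failure mode against a temp file; the `delay=True` flag is shown only to illustrate where the open happens, it is not the fix that was merged:

```python
import logging.handlers
import os
import tempfile

# Recreate the broken state on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "agent.log")
open(path, "w").close()
os.chmod(path, 0o000)  # mode "----------." as left behind by the upgrade

# The stock config opens the file during handler construction, which is
# what raised IOError: [Errno 13] in the backtraces above (when the
# daemon is not running as root). With delay=True the open is deferred
# until the first record is emitted, so construction itself succeeds:
handler = logging.handlers.TimedRotatingFileHandler(path, delay=True)
print(handler.stream is None)  # True: nothing has been opened yet
```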

(Originally by Andrea Perotti)

Comment 3 RHV bug bot 2018-06-01 07:47:46 UTC
Hi, can you please attach /var/log/imgbased.log ?

(Originally by Yuval Turgeman)

Comment 4 RHV bug bot 2018-06-01 07:47:50 UTC
Created attachment 1440337 [details]
Archive containing imgbased.log for all 3 hypervisors

(Originally by Andrea Perotti)

Comment 7 Nikolai Sednev 2018-06-14 07:03:11 UTC
Deployed SHE over NFS on a pair of hosts with these components:
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-engine-setup-4.1.11.2-0.1.el7.noarch
rhvm-appliance-20180419.0-1.x86_64.rhevm.ova
Red Hat Enterprise Linux release 7.5-0.14
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Upgraded the engine to latest components:
ovirt-engine-setup-4.2.4.2-0.1.el7_3.noarch
Linux 3.10.0-862.6.1.el7.x86_64 #1 SMP Mon Jun 4 15:33:25 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Upgraded the hosts one by one from the engine UI using "Installation->Check for Upgrade" and then "Installation->Upgrade"; the engine restarted the hosts after the upgrade.
[root@alma04 ~]# yum list all
Available Packages
redhat-virtualization-host-image-update.noarch 4.2-20180605.0.el7_5  rhvh-qe 
     
Components after the upgrade:
redhat-virtualization-host-image-update-4.2-20180605.0.el7_5.noarch.rpm
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
vdsm-4.20.29-1.el7ev.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.3.x86_64
libvirt-3.9.0-14.el7_5.5.x86_64
sanlock-3.6.0-1.el7.x86_64


The result was as follows:
The upgraded RHVH was not counted as an HA-capable host in the UI ("Hosted Engine HA: Not Active").
ha-agent was running, but ha-broker was in failed status.
alma04 ~]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-06-14 09:55:51 IDT; 2s ago
 Main PID: 34023 (ovirt-ha-agent)
    Tasks: 1
   Memory: 20.2M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─34023 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Jun 14 09:55:51 alma04.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitori...ent.
Jun 14 09:55:51 alma04.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Monitor...t...
Jun 14 09:55:52 alma04.qa.lab.tlv.redhat.com python[34023]: detected unhandled Python exception in '/usr/share/o...nt'
Hint: Some lines were ellipsized, use -l to show in full.
[root@alma04 ~]# systemctl status ovirt-ha-broker
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2018-06-14 09:56:16 IDT; 914ms ago
  Process: 35178 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker (code=exited, status=1/FAILURE)
 Main PID: 35178 (code=exited, status=1/FAILURE)

Jun 14 09:56:16 alma04.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-broker.service entered failed state.
Jun 14 09:56:16 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-broker.service failed.
Jun 14 09:56:16 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-broker.service holdoff time over, scheduling ...art.
Jun 14 09:56:16 alma04.qa.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for ovirt-ha-broker...vice
Jun 14 09:56:16 alma04.qa.lab.tlv.redhat.com systemd[1]: Failed to start oVirt Hosted Engine High Availability ...ker.
Jun 14 09:56:16 alma04.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-broker.service entered failed state.
Jun 14 09:56:16 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-broker.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@alma04 ~]# systemctl status ovirt-ha-agent -l\
> ^C
[root@alma04 ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Thu 2018-06-14 09:59:22 IDT; 6s ago
  Process: 39568 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=1/FAILURE)
 Main PID: 39568 (code=exited, status=1/FAILURE)
    Tasks: 0
   Memory: 0B
   CGroup: /system.slice/ovirt-ha-agent.service

Jun 14 09:59:22 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service failed.
[root@alma04 ~]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2018-06-14 09:59:40 IDT; 902ms ago
  Process: 40811 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker (code=exited, status=1/FAILURE)
 Main PID: 40811 (code=exited, status=1/FAILURE)

Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-broker.service: main process exited, code=exited, status=1/FAILURE
Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-broker.service entered failed state.
Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-broker.service failed.
Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-broker.service holdoff time over, scheduling restart.
Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for ovirt-ha-broker.service
Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: Failed to start oVirt Hosted Engine High Availability Communications Broker.
Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-broker.service entered failed state.
Jun 14 09:59:40 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-broker.service failed.

The initial issue of this bug was resolved as follows:
alma04 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/broker.log
19M ----------. 1 vdsm kvm 19M Jun 14 09:42 /var/log/ovirt-hosted-engine-ha/broker.log
[root@alma04 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/agent.log
3.1M ----------. 1 vdsm kvm 3.1M Jun 14 09:39 /var/log/ovirt-hosted-engine-ha/agent.log

Moving this bug to verified; I will open a separate bug for the newly found issues.

Comment 8 Nikolai Sednev 2018-06-14 08:05:05 UTC
Further to the previous comment, please see https://bugzilla.redhat.com/show_bug.cgi?id=1591150 for more details.

Comment 9 Simone Tiraboschi 2018-06-14 08:07:51 UTC
(In reply to Nikolai Sednev from comment #7)
> The initial issue of this bug was resolved as follows:
> alma04 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/broker.log
> 19M ----------. 1 vdsm kvm 19M Jun 14 09:42
> /var/log/ovirt-hosted-engine-ha/broker.log
> [root@alma04 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/agent.log
> 3.1M ----------. 1 vdsm kvm 3.1M Jun 14 09:39
> /var/log/ovirt-hosted-engine-ha/agent.log

According to this, in my opinion the issue is still present: 000 is not a usable permission set for appending to that log file.

Comment 10 Simone Tiraboschi 2018-06-14 08:37:03 UTC
chmod 644 /var/log/ovirt-hosted-engine-ha/*.log

seems a valid workaround.
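In Python terms the workaround amounts to restoring sane mode bits on the files. A sketch against a temp file; the real targets are the two log paths, and the 000/644 values are taken from the `ll` output quoted above:

```python
import os
import stat
import tempfile

# Stand-in for /var/log/ovirt-hosted-engine-ha/{agent,broker}.log
path = os.path.join(tempfile.mkdtemp(), "broker.log")
open(path, "w").close()

os.chmod(path, 0o000)  # "----------." as observed after the upgrade
os.chmod(path, 0o644)  # equivalent of: chmod 644 /var/log/ovirt-hosted-engine-ha/*.log
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o644
```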

Comment 11 Nikolai Sednev 2018-06-14 08:37:20 UTC
The owner/group was fixed from root:root back to the normal vdsm:kvm, but the mode bits are wrong ("----------.", i.e. 000).

Comment 12 Simone Tiraboschi 2018-06-14 08:38:02 UTC
*** Bug 1591150 has been marked as a duplicate of this bug. ***

Comment 14 Nikolai Sednev 2018-06-18 15:19:45 UTC
On 4.1RHVH I saw:
[root@alma03 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/agent.log 
12K -rw-r--r--. 1 root root 12K Jun 18 17:01 /var/log/ovirt-hosted-engine-ha/agent.log
[root@alma03 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/broker.log
32K -rw-r--r--. 1 root root 32K Jun 18 17:01 /var/log/ovirt-hosted-engine-ha/broker.log
Components were as follows:
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch

Available Packages
redhat-virtualization-host-image-update.noarch                     4.2-20180615.0.el7_5                      rhvh-qe  
[root@alma03 ~]# 

I then tried to upgrade the 4.1 RHVH host from the UI to 4.2 RHVH and rebooted it.
I saw this in the UI:

Failed to upgrade Host alma03 (User: admin@internal-authz).
6/18/18 5:35:06 PM

Failed to upgrade Host alma03 (User: admin@internal-authz).
6/18/18 5:35:04 PM

Host alma03 was switched to Maintenance Mode.
6/18/18 5:04:58 PM

Host alma03 upgrade was started (User: admin@internal-authz).
6/18/18 5:04:58 PM





ha-agent fails to start on the host, and the permissions are still incorrect (-rw-r-----. 1 vdsm kvm instead of -rw-r--r--.).

@alma03 ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2018-06-18 18:12:32 IDT; 9s ago
  Process: 30019 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
 Main PID: 30019 (code=exited, status=157)

Jun 18 18:12:32 alma03.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Jun 18 18:12:32 alma03.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service failed.
[root@alma03 ~]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-06-18 18:12:26 IDT; 22s ago
 Main PID: 29910 (ovirt-ha-broker)
    Tasks: 1
   Memory: 18.2M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─29910 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

Jun 18 18:12:26 alma03.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jun 18 18:12:26 alma03.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
[root@alma03 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/agent.log 
260K -rw-r-----. 1 vdsm kvm 253K Jun 18 18:13 /var/log/ovirt-hosted-engine-ha/agent.log
[root@alma03 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/broker.log 
524K -rw-r-----. 1 vdsm kvm 517K Jun 18 18:12 /var/log/ovirt-hosted-engine-ha/broker.log
[root@alma03 ~]#


Components on host after upgrade:
ovirt-hosted-engine-ha-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
Linux 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 7.5

Comment 15 Nikolai Sednev 2018-06-18 15:26:49 UTC
Created attachment 1452671 [details]
sosreport from alma03

Comment 16 Nikolai Sednev 2018-06-18 15:32:05 UTC
Created attachment 1452675 [details]
engine logs

Comment 17 Nikolai Sednev 2018-06-18 15:32:26 UTC
Engine version:
ovirt-engine-setup-4.2.4.4-0.1.el7_3.noarch

Comment 18 Sandro Bonazzola 2018-06-19 10:04:41 UTC
Moving to Ido since Simone is not available this week

Comment 19 Raz Tamir 2018-06-19 13:17:52 UTC
Hi Ido,

How did this move to ON_QA from POST?
QE still doesn't have the new build, so we can't test it.
Moving back to POST for proper process.

Comment 20 Sandro Bonazzola 2018-06-20 07:49:36 UTC
Note to QE:
For testing the fix, imgbased-1.0.20 is needed. A new RHV-H build including this package will be done today.

Comment 22 Nikolai Sednev 2018-06-25 12:10:11 UTC
Tested on these components on engine:
ovirt-engine-setup-4.2.4.5-0.1.el7_3.noarch
Linux 3.10.0-862.6.3.el7.x86_64 #1 SMP Fri Jun 15 17:57:37 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

1. Deployed SHE on 4.1 RHVH.
Components on host:
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 7.5-0.14
2. Added an additional 4.1 RHVH as an HA host.
3. Checked for upgrades on one of the hosts via the UI.
4. Upgraded the 4.1 RHVH from the UI to 4.2 RHVH (redhat-virtualization-host-4.2-20180622.0) and the host did not recover:


Failed to upgrade Host alma03.qa.lab.tlv.redhat.com (User: admin@internal-authz).
6/25/18 3:03:31 PM

Failed to upgrade Host alma03.qa.lab.tlv.redhat.com (User: admin@internal-authz).
6/25/18 3:03:28 PM

Host alma03.qa.lab.tlv.redhat.com was switched to Maintenance Mode.
6/25/18 2:33:24 PM

Host alma03.qa.lab.tlv.redhat.com upgrade was started (User: admin@internal-authz).
6/25/18 2:33:24 PM

Check for available updates on host alma03.qa.lab.tlv.redhat.com was completed successfully with message 'found updates for packages redhat-virtualization-host-image-update'.
6/25/18 2:30:54 PM

Host alma03.qa.lab.tlv.redhat.com has available updates: redhat-virtualization-host-image-update.
6/25/18 2:30:54 PM

Started to check for available updates on host alma03.qa.lab.tlv.redhat.com.
6/25/18 2:30:36 PM

Status of host alma03.qa.lab.tlv.redhat.com was set to Up.
6/25/18 2:26:50 PM

The host appeared as "Install Failed" in the UI and like this in the CLI:
--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : alma03
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : 8546a3cf
local_conf_timestamp               : 5994
Host timestamp                     : 5979
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=5979 (Mon Jun 25 14:59:27 2018)
        host-id=2
        score=0
        vm_conf_refresh_time=5994 (Mon Jun 25 14:59:42 2018)
        conf_on_shared_storage=True
        maintenance=True
        state=LocalMaintenance
        stopped=False

5. I ran "hosted-engine --set-maintenance --mode=none" on the host to release it from local maintenance.
6. The host remained in local maintenance, despite step 5.

Comment 23 Nikolai Sednev 2018-06-25 12:17:33 UTC
alma03 ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-06-25 14:25:47 IDT; 51min ago
 Main PID: 19189 (ovirt-ha-agent)
    Tasks: 2
   Memory: 36.6M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─19189 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jun 25 14:25:47 alma03.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jun 25 14:25:47 alma03.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
[root@alma03 ~]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-06-25 14:25:47 IDT; 51min ago
 Main PID: 19188 (ovirt-ha-broker)
    Tasks: 7
   Memory: 44.5M
   CGroup: /system.slice/ovirt-ha-broker.service
           └─19188 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jun 25 14:25:47 alma03.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jun 25 14:25:47 alma03.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jun 25 14:25:49 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[19188]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=4fb972d3-5659-4a0f-a564-65e83fc81f72'
                                                                     Traceback (most recent call last):
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
                                                                         data)
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
                                                                         .set_storage_domain(client, sd_type, **options)
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
                                                                         self._backends[client].connect()
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect
                                                                         self._dom_type)
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path
                                                                         " in {1}".format(sd_uuid, parent))
                                                                     BackendFailureException: path to storage domain 4fb972d3-5659-4a0f-a564-65e83fc81f72 not found in /rhev/data-center/mnt
Jun 25 15:00:08 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[19188]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.submonitor_base.SubmonitorBase ERROR Error executing submonitor mgmt-bridge, args {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
                                                                     Traceback (most recent call last):
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitor_base.py", line 115, in _worker
                                                                         self.action(self._options)
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors/mgmt_bridge.py", line 42, in action
                                                                         logger=self._log
                                                                       File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 187, in connect_vdsm_json_rpc
                                                                         requestQueue=requestQueue,
                                                                       File "/usr/lib/python2.7/site-packages/vdsm/jsonrpcvdscli.py", line 248, in connect
                                                                         responseQueue)
                                                                       File "/usr/lib/python2.7/site-packages/vdsm/jsonrpcvdscli.py", line 234, in _create
                                                                         lazy_start=False)
                                                                       File "/usr/lib/python2.7/site-packages/yajsonrpc/stompreactor.py", line 629, in StandAloneRpcClient
                                                                         client = StompClient(utils.create_connected_socket(host, port, sslctx),
                                                                       File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 947, in create_connected_socket
                                                                         sock = sslctx.wrapSocket(sock)
                                                                       File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 101, in wrapSocket
                                                                         ca_certs=self.ca_certs)
                                                                       File "/usr/lib64/python2.7/ssl.py", line 936, in wrap_socket
                                                                         ciphers=ciphers)
                                                                       File "/usr/lib64/python2.7/ssl.py", line 547, in __init__
                                                                         self._context.load_verify_locations(ca_certs)
                                                                     IOError: [Errno 2] No such file or directory
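The IOError at the bottom of the traceback comes from Python's ssl module trying to load a CA bundle that is not on disk (the host's VDSM PKI files were not available yet). The failure mode can be reproduced in isolation; this is a minimal sketch using a deliberately nonexistent path, not the real VDSM certificate location:

```python
import os
import ssl
import tempfile

# Point at a CA bundle path that does not exist, mirroring the traceback above.
missing_ca = os.path.join(tempfile.mkdtemp(), "ca.pem")

try:
    # load_verify_locations() is called under the hood and raises
    # "[Errno 2] No such file or directory" when the CA file is absent.
    ssl.create_default_context(cafile=missing_ca)
except OSError as exc:  # IOError is an alias of OSError on Python 3
    print("load_verify_locations failed, errno:", exc.errno)
```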

Comment 24 Yuval Turgeman 2018-06-25 12:18:16 UTC
Nikolai, did the host reboot after the upgrade ?

Comment 25 Nikolai Sednev 2018-06-25 13:21:12 UTC
(In reply to Yuval Turgeman from comment #24)
> Nikolai, did the host reboot after the upgrade ?

During the upgrade, the host could not be restarted from the UI, as I had not configured remote power management.

I restarted it manually; after that, the host seemed to be upgrading something and then rebooted on its own again. It then started with a black screen and did not let me log in for 15-20 minutes, after which I received the login prompt.
Now host has these components:
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.14-1.el7ev.noarch
Linux 3.10.0-862.6.3.el7.x86_64 #1 SMP Fri Jun 15 17:57:37 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 7.5

[root@alma03 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/agent.log
368K -rw-r-----. 1 vdsm kvm 363K Jun 25 16:18 /var/log/ovirt-hosted-engine-ha/agent.log
[root@alma03 ~]# ll -lsha /var/log/ovirt-hosted-engine-ha/broker.log
724K -rw-r-----. 1 vdsm kvm 718K Jun 25 16:18 /var/log/ovirt-hosted-engine-ha/broker.log
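The fix for this bug keys on exactly these ownership and mode values (vdsm:kvm, 0640). For reference, the same check can be done programmatically; a minimal sketch that inspects a temporary file rather than the real /var/log paths, so it runs without the HA services installed:

```python
import grp
import os
import pwd
import stat
import tempfile

def describe_perms(path):
    """Return (owner, group, octal mode) for a file, as `ls -l` would show them."""
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    mode = oct(stat.S_IMODE(st.st_mode))
    return owner, group, mode

# Demonstrate on a temp file set to the 0640 mode seen in the listing above.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)
print(describe_perms(path))
os.unlink(path)
```

On an affected host one would point `describe_perms` at /var/log/ovirt-hosted-engine-ha/agent.log and broker.log and expect ('vdsm', 'kvm', '0o640').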

I tried to activate the host from UI and host successfully activated.
Check for available updates on host alma03.qa.lab.tlv.redhat.com was completed successfully with message 'no updates found.'.
6/25/18 4:20:48 PM

Started to check for available updates on host alma03.qa.lab.tlv.redhat.com.
6/25/18 4:20:29 PM

Successfully refreshed the capabilities of host alma03.qa.lab.tlv.redhat.com.
6/25/18 4:20:25 PM

Status of host alma03.qa.lab.tlv.redhat.com was set to Up.
6/25/18 4:19:19 PM

No faulty multipath paths on host alma03.qa.lab.tlv.redhat.com
6/25/18 4:19:19 PM

Activation of host alma03.qa.lab.tlv.redhat.com initiated by admin@internal-authz.
6/25/18 4:19:15 PM

Successfully refreshed the capabilities of host alma03.qa.lab.tlv.redhat.com.
6/25/18 4:13:56 PM

Comment 26 Nikolai Sednev 2018-06-25 13:24:41 UTC
Since both the agent and the broker are running correctly after the upgrade, moving this bug to verified.
Please also see https://bugzilla.redhat.com/show_bug.cgi?id=1534197#c57 .

Comment 27 Nikolai Sednev 2018-06-25 14:06:46 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1594836, I've opened it to document the restart issue there.

Comment 29 errata-xmlrpc 2018-06-27 10:01:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2066

Comment 30 Franta Kust 2019-05-16 13:06:35 UTC
BZ<2>Jira Resync

