Bug 1525601

Summary: Broker service fails to start after the straight upgrade of HE packages from 3.6 to 4.2
Product: [oVirt] ovirt-hosted-engine-ha
Component: Broker
Version: 2.2.1
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Artyom <alukiano>
Assignee: Simone Tiraboschi <stirabos>
QA Contact: Artyom <alukiano>
CC: alukiano, bugs, stirabos
Target Milestone: ovirt-4.2.0
Keywords: Triaged
Flags: rule-engine: ovirt-4.2+
Hardware: x86_64
OS: Linux
oVirt Team: SLA
Type: Bug
Last Closed: 2017-12-20 11:19:59 UTC
Bug Blocks: 1458711
Attachments: versions

Description Artyom 2017-12-13 16:26:56 UTC
Created attachment 1367493 (versions)

Description of problem:
After upgrading the hosted-engine packages from 3.6 to 4.2, the broker service fails to start. journalctl shows a single error line:
Dec 13 18:07:19 alma06.qa.lab.tlv.redhat.com python[6336]: detected unhandled Python exception in '/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker'

Running the service script manually produces:
INFO:ovirt_hosted_engine_ha.broker.status_broker.StatusBroker:Starting status updating thread
INFO:ovirt_hosted_engine_ha.broker.status_broker.StatusBroker:Status broker initialized.
INFO:ovirt_hosted_engine_ha.broker.listener.Listener:Initializing RPCServer
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker", line 25, in <module>
    broker.Broker().run()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 57, in run
    self._listener = self._get_listener()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 127, in _get_listener
    self._status_broker_instance)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 51, in __init__
    self._server = unixrpc.UnixXmlRpcServer(constants.BROKER_SOCKET_FILE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 30, in __init__
    request_handler)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 2] No such file or directory

At first glance, it looks like systemd does not create the /var/run/ovirt-hosted-engine-ha directory, although we specified "RuntimeDirectory=ovirt-hosted-engine-ha" in the service file.
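The traceback ends with socket.bind() failing with Errno 2 (ENOENT), which is what a Unix-domain socket bind returns when the parent directory of the socket path does not exist. The Python 3 sketch below reproduces that failure mode in a temporary directory, under stated assumptions: the directory and socket names are hypothetical stand-ins for /var/run/ovirt-hosted-engine-ha and the real broker socket (whose path comes from constants.BROKER_SOCKET_FILE and is not shown in this report), and os.makedirs() stands in for what RuntimeDirectory= is expected to do when the service starts.

```python
import errno
import os
import shutil
import socket
import tempfile


def try_bind(path):
    """Bind a Unix-domain socket at `path`; return 0 on success, else the errno."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


base = tempfile.mkdtemp()
# Hypothetical stand-ins for /var/run/ovirt-hosted-engine-ha and the broker socket.
runtime_dir = os.path.join(base, "ovirt-hosted-engine-ha")
sock_path = os.path.join(runtime_dir, "broker.socket")

before = try_bind(sock_path)  # parent directory missing -> errno.ENOENT
os.makedirs(runtime_dir)      # roughly what RuntimeDirectory= does at service start
after = try_bind(sock_path)   # directory exists now -> 0

print(before == errno.ENOENT, after)
shutil.rmtree(base)           # remove the temporary directory and socket file
```

This only shows that a missing runtime directory is sufficient to produce the error in the traceback; it does not show that the directory was actually missing on the affected host.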


Version-Release number of selected component (if applicable):
All versions are listed in the attachment.

How reproducible:
Always

Steps to Reproduce:
1. Deploy a 3.6 HE environment with at least one host
2. Update the host packages to 4.2

Actual results:
After the update, the broker service fails to start.

Expected results:
All hosted-engine services must work

Additional info:

Comment 1 Simone Tiraboschi 2017-12-13 16:40:00 UTC
We introduced RuntimeDirectory in https://gerrit.ovirt.org/#/c/73778/ and never backported it to 4.1.

Comment 2 Martin Sivák 2017-12-13 16:54:10 UTC
Can you try running systemctl daemon-reload first, just to be sure systemd knows about the service file change?

Comment 3 Simone Tiraboschi 2017-12-13 23:57:35 UTC
Also reproducible when upgrading from 4.1.z to 4.2-pre.

Comment 4 Simone Tiraboschi 2017-12-14 00:18:41 UTC
The first issue seems to be here:

[root@c74he20171214h1 ~]# cat /var/tmp/abrt/Python-2017-12-14-00\:48\:21-30697/cmdline
/usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
[root@c74he20171214h1 ~]# cat /var/tmp/abrt/Python-2017-12-14-00\:48\:21-30697/backtrace
__init__.py:925:_open:IOError: [Errno 13] Permission denied: '/var/log/ovirt-hosted-engine-ha/broker.log'

Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker", line 25, in <module>
    broker.Broker().run()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 44, in run
    self._initialize_logging()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 70, in _initialize_logging
    disable_existing_loggers=False)
  File "/usr/lib64/python2.7/logging/config.py", line 78, in fileConfig
    handlers = _install_handlers(cp, formatters)
  File "/usr/lib64/python2.7/logging/config.py", line 156, in _install_handlers
    h = klass(*args)
  File "/usr/lib64/python2.7/logging/handlers.py", line 169, in __init__
    BaseRotatingHandler.__init__(self, filename, 'a', encoding, delay)
  File "/usr/lib64/python2.7/logging/handlers.py", line 64, in __init__
    logging.FileHandler.__init__(self, filename, mode, encoding, delay)
  File "/usr/lib64/python2.7/logging/__init__.py", line 902, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: '/var/log/ovirt-hosted-engine-ha/broker.log'

Local variables in innermost frame:
self: <logging.handlers.TimedRotatingFileHandler object at 0x17f4d50>
[root@c74he20171214h1 ~]# ls -l /var/log/ovirt-hosted-engine-ha/broker.log
-rw-r--r--. 1 root root 65971 14 dic 00.48 /var/log/ovirt-hosted-engine-ha/broker.log
[root@c74he20171214h1 ~]# ls -l /var/log/ovirt-hosted-engine-ha/agent.log 
-rw-r--r--. 1 root root 35368 14 dic 00.46 /var/log/ovirt-hosted-engine-ha/agent.log
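The ls -l output shows both log files owned by root:root, while the workaround that follows hands them to vdsm:kvm, which suggests the upgraded broker and agent no longer run as root and can no longer open their own logs. The Python 3 sketch below lists files whose ownership differs from an expected uid:gid. The function names are hypothetical, and the expected owner is passed in as numbers so the check itself makes no assumption about which accounts exist on the host.

```python
import grp
import os
import pwd


def _name(lookup, ident):
    """Resolve a uid/gid to a name, falling back to the number if unknown."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def wrong_owner(paths, uid, gid):
    """Return (path, owner, group) for every existing path whose owner or
    group differs from uid:gid. Paths that do not exist are skipped."""
    mismatches = []
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue
        if (st.st_uid, st.st_gid) != (uid, gid):
            mismatches.append(
                (path, _name(pwd.getpwuid, st.st_uid), _name(grp.getgrgid, st.st_gid)))
    return mismatches
```

On a host, the expected ids would come from pwd.getpwnam("vdsm").pw_uid and grp.getgrnam("kvm").gr_gid; with the two files shown above, both would be expected to be reported as root:root.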


Workaround:

 chown vdsm:kvm /var/log/ovirt-hosted-engine-ha/broker.log /var/log/ovirt-hosted-engine-ha/agent.log
 systemctl restart ovirt-ha-broker
 systemctl restart ovirt-ha-agent

Comment 5 Artyom 2017-12-14 07:18:28 UTC
I tried Simone's workaround in my environment and it worked like magic :) so in the end, the problem wasn't in systemd.

Comment 6 Martin Sivák 2017-12-14 10:57:41 UTC
Hmm, so should we just add the chown to the post-install script?
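A post-install step has to be safe to run on fresh installs (no log files yet) and on repeated upgrades. The Python 3 sketch below shows that idempotent logic; it is illustrative only, the function name is hypothetical, and the fix actually shipped in ovirt-hosted-engine-ha-2.2.2 is not shown in this report and may be implemented differently (for example in an RPM scriptlet).

```python
import os


def ensure_owner(paths, uid, gid):
    """Chown each existing path to uid:gid if it is not already, and return
    the paths that were changed. Missing paths are skipped and already
    correct ones are left alone, so running it again is a no-op."""
    changed = []
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue  # e.g. a fresh install where the log does not exist yet
        if (st.st_uid, st.st_gid) != (uid, gid):
            os.chown(path, uid, gid)  # requires root to hand files to another user
            changed.append(path)
    return changed
```

On a host this would be called with the broker.log and agent.log paths and the vdsm/kvm ids (pwd.getpwnam("vdsm").pw_uid, grp.getgrnam("kvm").gr_gid), followed by restarting ovirt-ha-broker and ovirt-ha-agent as in the workaround above.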

Comment 7 Artyom 2017-12-19 12:59:23 UTC
Verified on ovirt-hosted-engine-ha-2.2.2-1.el7ev.noarch

After the update, the logs have the correct owner.

Comment 8 Sandro Bonazzola 2017-12-20 11:19:59 UTC
This bug is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in the oVirt 4.2.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.