Bug 1124826 - hosted engine upgrade has issues restarting the services with systemd
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-hosted-engine-ha
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.1
Assignee: Sandro Bonazzola
QA Contact: meital avital
URL: http://www.ovirt.org/QA:TestCase_Host...
Whiteboard: sla
Depends On:
Blocks:
Reported: 2014-07-30 12:14 UTC by Antoni Segura Puimedon
Modified: 2016-02-10 19:42 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-28 10:51:19 UTC
oVirt Team: SLA


Description Antoni Segura Puimedon 2014-07-30 12:14:29 UTC
Description of problem:
To upgrade a hosted engine environment I followed:
    http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
The 5th step says:
     Restart ha-agent and broker services(# service ovirt-ha-broker restart && service ovirt-ha-agent restart)

which translates to:
    systemctl restart ovirt-ha-broker 
    systemctl restart ovirt-ha-agent

However, after doing that, the daemons are not running and systemctl
shows:
    [root@dev-20 ~]# systemctl status ovirt-ha-broker
    ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
       Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
       Active: inactive (dead) since Wed 2014-07-30 13:55:19 CEST; 2min 44s ago
      Process: 26794 ExecStop=/lib/systemd/systemd-ovirt-ha-broker stop (code=exited, status=0/SUCCESS)
      Process: 26786 ExecStart=/lib/systemd/systemd-ovirt-ha-broker start (code=exited, status=0/SUCCESS)
     Main PID: 861 (code=killed, signal=KILL)

    Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Co......
    Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd-ovirt-ha-broker[26794]: Stopping ovirt-ha-broker: [FAILED]
    Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Com...er.
    Hint: Some lines were ellipsized, use -l to show in full.
    [root@dev-20 ~]# ps aux | grep ovirt
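
The inconsistent state above (unit inactive, ExecStart reported as successful, main PID killed) can be spotted programmatically. The following is a minimal sketch and not part of ovirt-hosted-engine-ha: `parse_systemctl_status` and `was_killed_and_left_dead` are hypothetical helpers that only parse text shaped like the output quoted here, and the field layout of `systemctl status` may differ between systemd versions.

```python
import re


def parse_systemctl_status(text):
    """Pull the Active state and the Main PID exit info out of
    `systemctl status` text (hypothetical helper; the format is assumed
    from the output quoted in this report)."""
    result = {"active_state": None, "sub_state": None,
              "main_pid": None, "main_code": None, "main_signal": None}

    active = re.search(r"^\s*Active:\s+(\S+)\s+\(([^)]+)\)", text, re.MULTILINE)
    if active:
        result["active_state"] = active.group(1)
        result["sub_state"] = active.group(2)

    main = re.search(
        r"^\s*Main PID:\s+(\d+)\s+\(code=(\w+)(?:,\s*signal=(\w+))?",
        text, re.MULTILINE)
    if main:
        result["main_pid"] = int(main.group(1))
        result["main_code"] = main.group(2)
        result["main_signal"] = main.group(3)
    return result


def was_killed_and_left_dead(status):
    """True when the unit is inactive and its main process was killed."""
    return (status["active_state"] == "inactive"
            and status["main_code"] == "killed")


# Lines copied from the status output in this report.
SAMPLE = """\
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
   Active: inactive (dead) since Wed 2014-07-30 13:55:19 CEST; 2min 44s ago
 Main PID: 861 (code=killed, signal=KILL)
"""

if __name__ == "__main__":
    status = parse_systemctl_status(SAMPLE)
    print(status)
    print(was_killed_and_left_dead(status))
```

In practice the text would come from running `systemctl status ovirt-ha-broker` on the host; the sample string only stands in for that.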

The only way to get the daemons up and running again was to bypass systemd and
do:
    /lib/systemd/systemd-ovirt-ha-broker restart
    /lib/systemd/systemd-ovirt-ha-agent restart


After doing that, hosted-engine --vm-status shows that the data is stale, though
(possibly unrelated):

    --== Host 2 status ==--

    Status up-to-date                  : False
    Hostname                           : 10.34.63.180
    Host ID                            : 2
    Engine status                      : unknown stale-data
    Score                              : 2000
    Local maintenance                  : False
    Host timestamp                     : 1406640293
    Extra metadata (valid at timestamp):
         metadata_parse_version=1
         metadata_feature_version=1
         timestamp=1406640293 (Tue Jul 29 15:24:53 2014)
         host-id=2
         score=2000
         maintenance=False
         bridge=True
         cpu-load=0.0856
         engine-health={"health": "good", "vm": "up", "detail": "up"}
         gateway=True
         mem-free=3192
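
To put a number on how stale the metadata is, the host timestamp can be compared with a known point in time. A minimal sketch: `metadata_age` is a hypothetical helper, and the reference time is an assumption taken from the `systemctl status` output earlier in this report (13:55:19 CEST on 2014-07-30), since the exact time `--vm-status` was run is not recorded.

```python
from datetime import datetime, timedelta, timezone

# Local zone shown in the systemctl output (CEST is UTC+2).
CEST = timezone(timedelta(hours=2))


def metadata_age(host_timestamp, reference):
    """Return how old a host timestamp (Unix seconds) is relative to a
    timezone-aware reference datetime."""
    written = datetime.fromtimestamp(host_timestamp, tz=timezone.utc)
    return reference - written


if __name__ == "__main__":
    # Assumed reference point: the restart time from the status output.
    restart_time = datetime(2014, 7, 30, 13, 55, 19, tzinfo=CEST)
    age = metadata_age(1406640293, restart_time)
    print(age)  # 22:30:26
```

With these inputs the metadata was written roughly 22.5 hours before the restart, which is consistent with the "Tue Jul 29 15:24:53 2014" date printed next to the timestamp if that date is in local time (CEST).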

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140725101556.fc20.noarch

How reproducible:
I only have one setup, so I did it once


Steps to Reproduce:
1. Follow the guide above
2. Try to do step 7: hosted-engine --set-maintenance --mode=none

Actual results:
    [root@dev-20 ~]# hosted-engine --set-maintenance --mode=none
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 72, in <module>
        if not maintenance.set_mode(sys.argv[1]):
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 60, in set_mode
        value=m_global,
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 248, in set_maintenance_mode
        str(value))
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 196, in set_global_md_flag
        with broker.connection():
      File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
        return self.gen.next()
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
        self.connect()
      File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
        raise BrokerConnectionError(error_msg)
    ovirt_hosted_engine_ha.lib.exceptions.BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (5)
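
The final line of the traceback suggests that the client gives up after a fixed number of failed connection attempts. That pattern can be sketched as follows. This is an illustration only and not the actual `brokerlink.py` code: the `BrokerConnectionError` class here is a local stand-in, and `connect_with_limit`, `try_connect` and `max_errors` are hypothetical names.

```python
class BrokerConnectionError(Exception):
    """Local stand-in for the exception named in the traceback."""


def connect_with_limit(try_connect, max_errors=5):
    """Call try_connect() until it succeeds or max_errors failures occur."""
    errors = 0
    while errors < max_errors:
        try:
            return try_connect()
        except OSError:
            # For example, the broker is not listening because the daemon is down.
            errors += 1
    raise BrokerConnectionError(
        "Failed to connect to broker, the number of errors has "
        "exceeded the limit ({0})".format(max_errors))


if __name__ == "__main__":
    def broker_down():
        raise OSError("connection refused")

    try:
        connect_with_limit(broker_down)
    except BrokerConnectionError as exc:
        print(exc)
```

Under this reading, the traceback is a symptom of the broker daemon not running rather than a separate fault in `hosted-engine --set-maintenance`.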

Expected results:
It silently succeeds


Additional info:

Above I described how to work around the error by bypassing systemd, but the
results are not optimal with the workaround.

Comment 1 Sandro Bonazzola 2014-09-29 15:16:42 UTC
(In reply to Antoni Segura Puimedon from comment #0)
> Description of problem:
> To upgrade a hosted engine environment I followed:
>     http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
> The 5th step says:
>      Restart ha-agent and broker services(# service ovirt-ha-broker restart
> && service ovirt-ha-agent restart)
> 
> which translates to:
>     systemctl restart ovirt-ha-broker 
>     systemctl restart ovirt-ha-agent

Please note it translates to

  systemctl restart ovirt-ha-broker.service
  systemctl restart ovirt-ha-agent.service



> 
> However, after doing that, the daemons are not running and systemctl
> shows:
>     [root@dev-20 ~]# systemctl status ovirt-ha-broker
>     ovirt-ha-broker.service - oVirt Hosted Engine High Availability
> Communications Broker
>        Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
> enabled)
>        Active: inactive (dead) since Wed 2014-07-30 13:55:19 CEST; 2min 44s
> ago
>       Process: 26794 ExecStop=/lib/systemd/systemd-ovirt-ha-broker stop
> (code=exited, status=0/SUCCESS)
>       Process: 26786 ExecStart=/lib/systemd/systemd-ovirt-ha-broker start
> (code=exited, status=0/SUCCESS)
>      Main PID: 861 (code=killed, signal=KILL)
> 
>     Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Starting
> oVirt Hosted Engine High Availability Co......
>     Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com
> systemd-ovirt-ha-broker[26794]: Stopping ovirt-ha-broker: [FAILED]
>     Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Started
> oVirt Hosted Engine High Availability Com...er.
>     Hint: Some lines were ellipsized, use -l to show in full.
>     [root@dev-20 ~]# ps aux | grep ovirt
> 
> The only way to get the daemons up and running again was to bypass systemd
> and
> do:
>     /lib/systemd/systemd-ovirt-ha-broker restart
>     /lib/systemd/systemd-ovirt-ha-agent restart
> 

Can you check whether, for some reason, the services were not reloaded by the rpm update?
Can you still reproduce the issue after executing:

 systemctl reload ovirt-ha-broker.service
 systemctl reload ovirt-ha-agent.service

before trying to restart the services?

Comment 2 Sandro Bonazzola 2014-09-29 15:17:17 UTC
About the stale file, I've no clue right now.

Comment 3 Sven Kieske 2014-09-29 15:58:09 UTC
(In reply to Sandro Bonazzola from comment #1)

> 
> Can you check if for some reason the services have not been reloaded by the
> rpm update?


AFAIK rpm updates should never reload services on their own?

Comment 4 Sandro Bonazzola 2014-11-27 12:33:08 UTC
The spec file contains:

%postun
%if 0%{?with_systemd}
%systemd_postun_with_restart ovirt-ha-agent.service
%systemd_postun_with_restart ovirt-ha-broker.service
%else
...


so on upgrade a daemon-reload is performed and try-restart is executed for the two services.
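
For reference, the effect of each `%systemd_postun_with_restart` line in the spec file can be sketched as the commands it results in. This is a sketch based on the systemd RPM macros as generally documented, not on the macro file shipped with this host, so the exact expansion is an assumption; `postun_with_restart_commands` is a hypothetical helper.

```python
def postun_with_restart_commands(remaining_instances, unit):
    """Return the systemctl invocations assumed to result from
    %systemd_postun_with_restart for one unit.

    remaining_instances is the scriptlet's $1 argument: 0 on uninstall,
    1 or more on upgrade.
    """
    commands = [["systemctl", "daemon-reload"]]
    if remaining_instances >= 1:
        # Upgrade: try-restart only restarts a unit that is already running.
        commands.append(["systemctl", "try-restart", unit])
    return commands


if __name__ == "__main__":
    for unit in ("ovirt-ha-agent.service", "ovirt-ha-broker.service"):
        for argv in postun_with_restart_commands(1, unit):
            print(" ".join(argv))
```

One consequence, if this expansion holds: a service that was not running when the package was upgraded stays stopped afterwards, because `try-restart` does not start inactive units.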

Comment 5 Sandro Bonazzola 2014-11-28 10:51:19 UTC
I can't reproduce the issue when upgrading an F20 host with an F19 engine VM.

I upgraded from 3.4.4 to the 3.5.1 snapshot from
http://jenkins.ovirt.org/view/Publishers/job/publish_ovirt_rpms_nightly_3.5/197/

