Description of problem:
To upgrade a hosted engine environment I followed:
http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
The 5th step says:
Restart ha-agent and broker services(# service ovirt-ha-broker restart && service ovirt-ha-agent restart)

which translates to:
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent

However, after doing that, the daemons are not running and systemctl shows:
[root@dev-20 ~]# systemctl status ovirt-ha-broker
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
   Active: inactive (dead) since Wed 2014-07-30 13:55:19 CEST; 2min 44s ago
  Process: 26794 ExecStop=/lib/systemd/systemd-ovirt-ha-broker stop (code=exited, status=0/SUCCESS)
  Process: 26786 ExecStart=/lib/systemd/systemd-ovirt-ha-broker start (code=exited, status=0/SUCCESS)
 Main PID: 861 (code=killed, signal=KILL)

Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Co......
Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd-ovirt-ha-broker[26794]: Stopping ovirt-ha-broker: [FAILED]
Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Com...er.
Hint: Some lines were ellipsized, use -l to show in full.
[root@dev-20 ~]# ps aux | grep ovirt

The only way to get the daemons up and running again was to bypass systemd and do:
/lib/systemd/systemd-ovirt-ha-broker restart
/lib/systemd/systemd-ovirt-ha-agent restart

After doing that, hosted-engine --vm-status shows that the data is stale, though this is possibly unrelated:

--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : 10.34.63.180
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 2000
Local maintenance                  : False
Host timestamp                     : 1406640293
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1406640293 (Tue Jul 29 15:24:53 2014)
	host-id=2
	score=2000
	maintenance=False
	bridge=True
	cpu-load=0.0856
	engine-health={"health": "good", "vm": "up", "detail": "up"}
	gateway=True
	mem-free=3192

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140725101556.fc20.noarch

How reproducible:
I only have one setup, so I did it once.

Steps to Reproduce:
1. Follow the guide above.
2. Try to do step 7: hosted-engine --set-maintenance --mode=none

Actual results:
[root@dev-20 ~]# hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 72, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 60, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 248, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 196, in set_global_md_flag
    with broker.connection():
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
ovirt_hosted_engine_ha.lib.exceptions.BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (5)

Expected results:
It silently succeeds.

Additional info:
Above I described how to work around the error by bypassing systemd, but the results with the workaround are not optimal.
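The traceback ends in a bounded retry around the broker connection. Below is a hypothetical Python sketch of that pattern, not the actual `brokerlink.connect` code: `BrokerConnectionError` is a local stand-in for the real exception, the UNIX-socket connector is an assumption about how a client reaches the broker, and whether the real check fires on the 5th or the 6th error cannot be told from the traceback.

```python
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BrokerConnectionError(Exception):
    """Local stand-in for ovirt_hosted_engine_ha.lib.exceptions.BrokerConnectionError."""


def unix_connector(path: str) -> Callable[[], socket.socket]:
    """Return a callable opening a stream connection to a UNIX socket (assumed transport)."""
    def connect() -> socket.socket:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
        except OSError:
            sock.close()  # do not leak the descriptor on failure
            raise
        return sock
    return connect


def connect_with_limit(connect: Callable[[], T], max_errors: int = 5, delay: float = 0.0) -> T:
    """Call `connect` until it succeeds, giving up after `max_errors` failed attempts."""
    errors = 0
    while True:
        try:
            return connect()
        except OSError as exc:  # refused, missing socket file, etc.
            errors += 1
            if errors >= max_errors:
                raise BrokerConnectionError(
                    "Failed to connect to broker, the number of errors has "
                    f"exceeded the limit ({max_errors})"
                ) from exc
            time.sleep(delay)  # back off before the next attempt
```

With no broker listening, `connect_with_limit(unix_connector(path))` fails on every attempt and raises after the fifth error, which matches the shape of the traceback above; `path` stands for whatever socket the broker really uses and is not specified here.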
(In reply to Antoni Segura Puimedon from comment #0)
> Description of problem:
> To upgrade a hosted engine environment I followed:
> http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
> The 5th step says:
> Restart ha-agent and broker services(# service ovirt-ha-broker restart
> && service ovirt-ha-agent restart)
> 
> which translates to:
> systemctl restart ovirt-ha-broker
> systemctl restart ovirt-ha-agent

Please note it translates to:
systemctl restart ovirt-ha-broker.service
systemctl restart ovirt-ha-agent.service

> 
> However, after doing that, the daemons are not running and systemctl
> shows:
> [root@dev-20 ~]# systemctl status ovirt-ha-broker
> ovirt-ha-broker.service - oVirt Hosted Engine High Availability
> Communications Broker
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
> enabled)
> Active: inactive (dead) since Wed 2014-07-30 13:55:19 CEST; 2min 44s
> ago
> Process: 26794 ExecStop=/lib/systemd/systemd-ovirt-ha-broker stop
> (code=exited, status=0/SUCCESS)
> Process: 26786 ExecStart=/lib/systemd/systemd-ovirt-ha-broker start
> (code=exited, status=0/SUCCESS)
> Main PID: 861 (code=killed, signal=KILL)
> 
> Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Starting
> oVirt Hosted Engine High Availability Co......
> Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com
> systemd-ovirt-ha-broker[26794]: Stopping ovirt-ha-broker: [FAILED]
> Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Started
> oVirt Hosted Engine High Availability Com...er.
> Hint: Some lines were ellipsized, use -l to show in full.
> [root@dev-20 ~]# ps aux | grep ovirt
> 
> The only way to get the daemons up and running again was to bypass systemd
> and do:
> /lib/systemd/systemd-ovirt-ha-broker restart
> /lib/systemd/systemd-ovirt-ha-agent restart
> 

Can you check if for some reason the services have not been reloaded by the rpm update?
Can you reproduce it after executing:
systemctl reload ovirt-ha-broker.service
systemctl reload ovirt-ha-agent.service
before trying to restart the services?
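Whichever sequence is used, the outcome can be confirmed by re-checking the units after the restart, since the status output in comment #0 shows ExecStart exiting 0/SUCCESS while the unit ends up inactive (dead). A minimal Python 3.7+ sketch, not part of oVirt; `run` and `restart_and_verify` are hypothetical names, and the command runner is injectable so the logic can be exercised without systemd.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A runner takes argv and returns (exit code, stripped stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command and capture its stdout (needs Python 3.7+)."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def restart_and_verify(units: Sequence[str], runner: Runner = run) -> List[Tuple[str, str]]:
    """Restart the units in order, then return (unit, state) for each unit not active.

    The exit status of `systemctl restart` is not trusted on its own: a start
    script can exit 0 while the daemon is already gone, so every unit is
    re-checked with `systemctl is-active`, which exits non-zero unless active.
    """
    for unit in units:
        runner(["systemctl", "restart", unit])
    failed = []
    for unit in units:
        code, state = runner(["systemctl", "is-active", unit])
        if code != 0 or state != "active":
            failed.append((unit, state))
    return failed
```

For example, `restart_and_verify(["ovirt-ha-broker.service", "ovirt-ha-agent.service"])` restarts the broker before the agent, as the guide does, and returns an empty list only if both units are active afterwards. A daemon that exits a few seconds after starting could still slip past an immediate check, so a short wait before the second loop may be worth adding.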
About the stale file, I've no clue right now.
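The staleness in comment #0 can be quantified by comparing the host's metadata `timestamp` with a reference time. In that output the value is a Unix epoch: 1406640293 is 13:24:53 UTC, i.e. Tue Jul 29 15:24:53 2014 CEST, as printed. Taking the systemctl status time (Wed 2014-07-30 13:55:19 CEST) as a rough reference, the metadata was at least about 22.5 hours old. The helpers below are a hypothetical sketch, not ovirt-hosted-engine-ha code, and `max_age` is an arbitrary parameter, not the threshold the HA agent actually uses.

```python
from datetime import datetime, timezone
from typing import Dict


def parse_metadata(text: str) -> Dict[str, str]:
    """Parse the key=value lines of an 'Extra metadata' block."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")  # split on the first '=' only
        if sep:  # skip lines without '='
            fields[key] = value
    return fields


def metadata_age(fields: Dict[str, str], now: float) -> float:
    """Seconds between the metadata timestamp and `now` (both Unix epoch)."""
    # Drop the human-readable suffix, e.g. "1406640293 (Tue Jul 29 15:24:53 2014)".
    stamp = int(fields["timestamp"].split()[0])
    return now - stamp


def is_stale(fields: Dict[str, str], now: float, max_age: float) -> bool:
    """True if the metadata is older than the (hypothetical) max_age threshold."""
    return metadata_age(fields, now) > max_age


# Metadata copied from the --vm-status output in comment #0.
SAMPLE = """\
metadata_parse_version=1
metadata_feature_version=1
timestamp=1406640293 (Tue Jul 29 15:24:53 2014)
host-id=2
score=2000
maintenance=False
bridge=True
cpu-load=0.0856
engine-health={"health": "good", "vm": "up", "detail": "up"}
gateway=True
mem-free=3192
"""

# Reference: the systemctl status time, Wed 2014-07-30 13:55:19 CEST (UTC+2).
REF = datetime(2014, 7, 30, 11, 55, 19, tzinfo=timezone.utc).timestamp()
```

Here `metadata_age(parse_metadata(SAMPLE), REF)` is 81026 seconds.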
(In reply to Sandro Bonazzola from comment #1)
> 
> Can you check if for some reason the services have not been reloaded by the
> rpm update?

AFAIK rpm updates should never reload services on their own?
The spec file contains:

%postun
%if 0%{?with_systemd}
%systemd_postun_with_restart ovirt-ha-agent.service
%systemd_postun_with_restart ovirt-ha-broker.service
%else
...

so on upgrade it runs a systemd daemon-reload and executes try-restart for the 2 services.
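A rough Python model of what those two macro calls do in %postun, assuming the Fedora 20-era expansion of `%systemd_postun_with_restart`: an unconditional `systemctl daemon-reload`, then `systemctl try-restart <unit>` only when the scriptlet argument `$1` is at least 1 (an upgrade rather than an erase). Later systemd releases changed these macros, so this is a sketch, not the literal expansion. Note that `try-restart` only restarts units that are currently running; a unit that was already dead stays dead.

```python
from typing import List, Sequence

# Units in the order the spec file's %postun lists them.
SPEC_UNITS = ("ovirt-ha-agent.service", "ovirt-ha-broker.service")


def postun_with_restart(install_count: int, unit: str) -> List[List[str]]:
    """Approximate commands one %systemd_postun_with_restart call runs.

    install_count mirrors the scriptlet argument $1: the number of instances
    of the package left installed (>= 1 on upgrade, 0 on erase).
    """
    commands = [["systemctl", "daemon-reload"]]
    if install_count >= 1:
        # Upgrade: restart the unit, but only if it is currently running.
        commands.append(["systemctl", "try-restart", unit])
    return commands


def postun_commands(install_count: int, units: Sequence[str] = SPEC_UNITS) -> List[List[str]]:
    """Commands for the whole %postun block: one macro call per unit."""
    return [cmd for unit in units for cmd in postun_with_restart(install_count, unit)]
```

For example, `postun_commands(1)` (an upgrade) yields daemon-reload, try-restart of ovirt-ha-agent.service, daemon-reload, try-restart of ovirt-ha-broker.service, in the spec's order; `postun_commands(0)` (an erase) yields only the two daemon-reloads.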
I can't reproduce the issue when upgrading an F20 host with an F19 engine VM from 3.4.4 to the 3.5.1 snapshot from http://jenkins.ovirt.org/view/Publishers/job/publish_ovirt_rpms_nightly_3.5/197/