Bug 1124826
| Summary: | hosted engine upgrade has issues restarting the services with systemd | ||
|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Antoni Segura Puimedon <asegurap> |
| Component: | ovirt-hosted-engine-ha | Assignee: | Sandro Bonazzola <sbonazzo> |
| Status: | CLOSED WORKSFORME | QA Contact: | meital avital <mavital> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.5 | CC: | bazulay, bugs, ecohen, gklein, iheim, rbalakri, sbonazzo, s.kieske, yeylon |
| Target Milestone: | --- | Keywords: | TestCaseNeeded, TestCaseProvided |
| Target Release: | 3.5.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| URL: | http://www.ovirt.org/QA:TestCase_Hosted_Engine_Upgrade | ||
| Whiteboard: | sla | ||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-11-28 10:51:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
(In reply to Antoni Segura Puimedon from comment #0)
> Description of problem:
> To upgrade a hosted engine environment I followed:
> http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
> The 5th step says:
> Restart ha-agent and broker services(# service ovirt-ha-broker restart
> && service ovirt-ha-agent restart)
>
> which translates to:
> systemctl restart ovirt-ha-broker
> systemctl restart ovirt-ha-agent

Please note it translates to
systemctl restart ovirt-ha-broker.service
systemctl restart ovirt-ha-agent.service

>
> However, after doing that, the daemons are not running and systemctl
> shows:
> [root@dev-20 ~]# systemctl status ovirt-ha-broker
> ovirt-ha-broker.service - oVirt Hosted Engine High Availability
> Communications Broker
> Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
> enabled)
> Active: inactive (dead) since Wed 2014-07-30 13:55:19 CEST; 2min 44s
> ago
> Process: 26794 ExecStop=/lib/systemd/systemd-ovirt-ha-broker stop
> (code=exited, status=0/SUCCESS)
> Process: 26786 ExecStart=/lib/systemd/systemd-ovirt-ha-broker start
> (code=exited, status=0/SUCCESS)
> Main PID: 861 (code=killed, signal=KILL)
>
> Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Starting
> oVirt Hosted Engine High Availability Co......
> Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com
> systemd-ovirt-ha-broker[26794]: Stopping ovirt-ha-broker: [FAILED]
> Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Started
> oVirt Hosted Engine High Availability Com...er.
> Hint: Some lines were ellipsized, use -l to show in full.
> [root@dev-20 ~]# ps aux | grep ovirt
>
> The only way to get the daemons up and running again was to bypass systemd
> and
> do:
> /lib/systemd/systemd-ovirt-ha-broker restart
> /lib/systemd/systemd-ovirt-ha-agent restart
>

Can you check if for some reason the services have not been reloaded by the rpm update?
Can you reproduce the issue when executing:
systemctl reload ovirt-ha-broker.service
systemctl reload ovirt-ha-agent.service
before trying to restart the services?

About the stale file, I've no clue right now.

(In reply to Sandro Bonazzola from comment #1)
>
> Can you check if for some reason the services have not been reloaded by the
> rpm update?

AFAIK rpm updates should never reload services on their own?

The spec file contains:
%postun
%if 0%{?with_systemd}
%systemd_postun_with_restart ovirt-ha-agent.service
%systemd_postun_with_restart ovirt-ha-broker.service
%else
...
So on upgrade the scriptlet reloads the systemd daemon and executes try-restart for the two services.
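For reference, a rough sketch of what that scriptlet does for one unit, written as a plain shell function. It assumes the systemd RPM macros shipped with Fedora at the time, and the function name `postun_with_restart` is made up for illustration. The real expansion on a given host can be inspected with `rpm --eval '%systemd_postun_with_restart ovirt-ha-agent.service'`.

```shell
#!/bin/sh
# Hypothetical stand-in for the %systemd_postun_with_restart expansion
# (an approximation, not copied from the macro file).
# $1 = number of package instances left after the transaction
#      (>= 1 means upgrade, 0 means uninstall)
# $2 = unit name
postun_with_restart() {
    # Make systemd re-read unit files the new package may have changed.
    systemctl daemon-reload >/dev/null 2>&1 || :
    if [ "$1" -ge 1 ]; then
        # Upgrade: restart the unit, but only if it is currently running.
        systemctl try-restart "$2" >/dev/null 2>&1 || :
    fi
}
```

Note that `try-restart` leaves a unit alone if it was not running before the upgrade, and that errors are swallowed by `|| :`, so a failed restart would not show up in the rpm transaction output.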
I can't reproduce the issue when upgrading an F20 host with an F19 engine VM. The upgrade was from 3.4.4 to the 3.5.1 snapshot from http://jenkins.ovirt.org/view/Publishers/job/publish_ovirt_rpms_nightly_3.5/197/
Description of problem:
To upgrade a hosted engine environment I followed:
http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine
The 5th step says:
Restart ha-agent and broker services(# service ovirt-ha-broker restart && service ovirt-ha-agent restart)

which translates to:
systemctl restart ovirt-ha-broker
systemctl restart ovirt-ha-agent

However, after doing that, the daemons are not running and systemctl shows:
[root@dev-20 ~]# systemctl status ovirt-ha-broker
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
Active: inactive (dead) since Wed 2014-07-30 13:55:19 CEST; 2min 44s ago
Process: 26794 ExecStop=/lib/systemd/systemd-ovirt-ha-broker stop (code=exited, status=0/SUCCESS)
Process: 26786 ExecStart=/lib/systemd/systemd-ovirt-ha-broker start (code=exited, status=0/SUCCESS)
Main PID: 861 (code=killed, signal=KILL)

Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Co......
Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd-ovirt-ha-broker[26794]: Stopping ovirt-ha-broker: [FAILED]
Jul 30 13:55:19 dev-20.rhev.lab.eng.brq.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Com...er.
Hint: Some lines were ellipsized, use -l to show in full.
[root@dev-20 ~]# ps aux | grep ovirt

The only way to get the daemons up and running again was to bypass systemd and do:
/lib/systemd/systemd-ovirt-ha-broker restart
/lib/systemd/systemd-ovirt-ha-agent restart

After doing that, the --vm-status shows that the data is stale though (possibly unrelated)

--== Host 2 status ==--

Status up-to-date : False
Hostname : 10.34.63.180
Host ID : 2
Engine status : unknown stale-data
Score : 2000
Local maintenance : False
Host timestamp : 1406640293
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1406640293 (Tue Jul 29 15:24:53 2014)
host-id=2
score=2000
maintenance=False
bridge=True
cpu-load=0.0856
engine-health={"health": "good", "vm": "up", "detail": "up"}
gateway=True
mem-free=3192

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140725101556.fc20.noarch

How reproducible:
I only have one setup, so I did it once

Steps to Reproduce:
1. Follow the guide above
2. Try to do step 7
hosted-engine --set-maintenance --mode=none

Actual results:
[root@dev-20 ~]# hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 72, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 60, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 248, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 196, in set_global_md_flag
    with broker.connection():
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
ovirt_hosted_engine_ha.lib.exceptions.BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (5)

Expected results:
It silently succeeds

Additional info:
Above I put how to workaround the error by bypassing systemd, but the results are not optimal with the workaround
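
Since the restart in this report appeared to go through while the broker ended up dead, the restart step from the guide can be paired with an explicit check. A minimal sketch, assuming only the two unit names from this report; the function name `restart_and_check` is hypothetical and not part of any oVirt tooling.

```shell
#!/bin/sh
# Restart each unit in the order given, then confirm that systemd
# really reports it as active. Returns non-zero if any unit fails.
restart_and_check() {
    rc=0
    for unit in "$@"; do
        systemctl restart "$unit" || rc=1
        # is-active exits 0 only when the unit is in the active state.
        if ! systemctl is-active --quiet "$unit"; then
            echo "$unit is not active after restart" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (as root), broker first as in the upgrade guide:
#   restart_and_check ovirt-ha-broker.service ovirt-ha-agent.service
```

A daemon that dies a moment after starting could still pass this check, so `systemctl status -l` on the unit remains useful when something looks wrong.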