+++ This bug was initially created as a clone of Bug #1416893 +++

Description of problem:
Unable to undeploy hosted-engine host via UI.

Version-Release number of selected component (if applicable):

Engine:
rhev-guest-tools-iso-4.1-3.el7ev.noarch
rhevm-doc-4.1.0-1.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.0.1-0.1.el7.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-branding-rhev-4.1.0-0.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Hosts:
rhvm-appliance-4.1.20170119.1-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-2.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
libvirt-client-2.0.0-10.el7_3.4.x86_64
mom-0.5.8-1.el7ev.noarch
vdsm-4.19.2-2.el7ev.x86_64
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy HE environment over NFS and add one NFS data storage domain.
2. After auto-import has finished importing the HE-VM and its hosted_storage, add an additional hosted-engine host to the environment, so that you have at least two HA hosts in your hosted-engine host cluster.
3. Set the host that is not running the HE-VM into maintenance.
4. Edit the host that was put into maintenance in step 3 and, in the "Hosted Engine" sub-tab, select "Choose hosted engine deployment action" -> "Undeploy" and press "OK".
5. Activate the host that was in maintenance.

Actual results:
The host is not undeployed; its HA agent and broker are both still active.

Expected results:
Once the undeployed host has been activated, it should return to active with ovirt-ha-agent and ovirt-ha-broker stopped.

Additional info:
Sosreports from the engine and hosts are attached.

--- Additional comment from Nikolai Sednev on 2017-01-26 12:44 EST ---

--- Additional comment from Nikolai Sednev on 2017-01-26 12:49:23 EST ---

Sosreport from alma03: https://drive.google.com/open?id=0B85BEaDBcF88dUJfR3NBaUJJXzQ
Sosreport from alma04: https://drive.google.com/open?id=0B85BEaDBcF88TE9jT21lSF85OE0

--- Additional comment from Nikolai Sednev on 2017-01-26 12:52:33 EST ---

Host alma04 was undeployed, but is still active in the HA cluster:

alma04 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 8310baad
local_conf_timestamp               : 102588
Host timestamp                     : 102575
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=102575 (Thu Jan 26 19:50:26 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=102588 (Thu Jan 26 19:50:39 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : eb49a8fc
local_conf_timestamp               : 85787
Host timestamp                     : 85775
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=85775 (Thu Jan 26 19:50:19 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=85787 (Thu Jan 26 19:50:32 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

alma04 ~]# systemctl status ovirt-ha-agent -l && systemctl status ovirt-ha-broker -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-01-26 14:29:05 IST; 5h 23min ago
 Main PID: 19608 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─19608 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Jan 26 14:29:05 alma04.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan 26 14:29:05 alma04.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Jan 26 14:31:31 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[19608]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-01-26 14:29:05 IST; 5h 23min ago
 Main PID: 19607 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─19607 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jan 26 14:29:05 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:31:31 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:31:35 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:31:40 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:31:44 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:31:50 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:31:50 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:32:00 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:32:05 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 26 14:34:50 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[19607]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt

--- Additional comment from Simone Tiraboschi on 2017-01-27 06:02:23 EST ---

From the engine VM logs we see that host-deploy has been run two times for alma04; in both cases we have:

2017-01-25 20:43:03 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE HOSTED_ENGINE/action
2017-01-25 20:43:03 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: HOSTED_ENGINE/action
2017-01-25 20:43:03 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE HOSTED_ENGINE/action=str:deploy

2017-01-26 14:26:59 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:VALUE HOSTED_ENGINE/action
2017-01-26 14:26:59 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: HOSTED_ENGINE/action
2017-01-26 14:26:59 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE VALUE HOSTED_ENGINE/action=str:deploy

So from host-deploy's point of view, the hosted-engine configuration has been deployed twice on alma04.

--- Additional comment from Sandro Bonazzola on 2017-01-27 06:46:53 EST ---

Martin, can you please investigate on the engine side?

--- Additional comment from Andrej Krejcir on 2017-01-30 10:00:17 EST ---

Currently, editing a host through the Edit dialog will not deploy/undeploy the hosted engine. These options work only when adding a new host.

An existing host can be undeployed through the Installation -> Reinstall dialog.

--- Additional comment from Nikolai Sednev on 2017-01-31 05:59:34 EST ---

1) If these options are not working using edit button, then should be hidden for an existing host.
2) Is this documented somewhere in downstream documentation?
3) Proposed new way of undeployment e.g.
maintenance the host, reinstall, hosted engine, undeploy, did work, although metadata was not cleaned properly from shared storage, and I could still see the already undeployed alma04, in CLI, from alma03:

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f7259df9
local_conf_timestamp               : 509796
Host timestamp                     : 509783
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=509783 (Tue Jan 31 12:57:16 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=509796 (Tue Jan 31 12:57:30 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : alma04
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 92331640
local_conf_timestamp               : 492857
Host timestamp                     : 492881
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=492881 (Tue Jan 31 12:55:28 2017)
    host-id=2
    score=0
    vm_conf_refresh_time=492857 (Tue Jan 31 12:55:04 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

--- Additional comment from Nikolai Sednev on 2017-01-31 06:02:14 EST ---

I also see that the ovirt-ha-broker service was not turned off on the undeployed host:

● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2017-01-29 13:29:00 IST; 1 day 23h ago
 Main PID: 41428 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─41428 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jan 29 13:29:00 alma04.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Jan 29 13:29:00 alma04.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Jan 29 13:32:53 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[41428]: ovirt-ha-broker cpu_load_no_engine.EngineHealth ERROR Failed to read vm stats: [Errno 2] No such file or directory: '/proc/0/stat'
Jan 31 12:53:27 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[41428]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 31 12:53:34 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[41428]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt
Jan 31 12:53:35 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[41428]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to read metadata from /var/run/vdsm/storage/ba4febaa-775b-443f-b241-eaf51aa5f724/eaa24e46-c634-4e84-b38c-2617123baec1/e0134df9-7af3-43b3-9a7e-f2c67c0022f1
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
OSError: [Errno 2] No such file or directory: '/var/run/vdsm/storage/ba4febaa-775b-443f-b241-eaf51aa5f724/eaa24e46-c634-4e84-b38c-2617123baec1/e0134df9-7af3-43b3-9a7e-f2c67c0022f1'
Jan 31 12:53:36 alma04.qa.lab.tlv.redhat.com ovirt-ha-broker[41428]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=ba4febaa-775b-443f-b241-eaf51aa5f724'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 461, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain ba4febaa-775b-443f-b241-eaf51aa5f724 not found in /rhev/data-center/mnt

ovirt-ha-agent was turned off:

# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Jan 26 20:45:18 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service stop-sigterm timed out. Killing.
Jan 26 20:45:18 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service: main process exited, code=killed, status=9/KILL
Jan 26 20:45:18 alma04.qa.lab.tlv.redhat.com systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Jan 26 20:45:18 alma04.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Jan 26 20:45:18 alma04.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service failed.
Jan 29 13:29:00 alma04.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Jan 29 13:29:00 alma04.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
Jan 31 12:53:35 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[41429]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Request failed: failed to read metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/ba4febaa-775b-443f-b241-eaf51aa5f724/eaa24e46-c634-4e84-b38c-2617123baec1/e0134df9-7af3-43b3-9a7e-f2c67c0022f1'' - trying to restart agent
Jan 31 12:55:28 alma04.qa.lab.tlv.redhat.com systemd[1]: Stopping oVirt Hosted Engine High Availability Monitoring Agent...
Jan 31 12:55:32 alma04.qa.lab.tlv.redhat.com systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.

--- Additional comment from Andrej Krejcir on 2017-01-31 08:57:40 EST ---

(In reply to Nikolai Sednev from comment #7)
> 1) If these options are not working using edit button, then should be hidden
> for an existing host.

Ok, should be a simple fix.

> 2) Is this documented somewhere in downstream documentation?

I haven't found if it is in the documentation somewhere. It was introduced in this RFE: Bug 1167262 .

> 3)Proposed new way of undeployment e.g. maintenance the host, reinstall,
> hosted engine, undeploy, did work, although metadata was not cleaned
> properly from shared storage, and I still could see already undeployed
> alma04, in CLI, from alma03:

This is a bug and we can fix it.
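
The leftover metadata entry described in point 3 can be spotted by script. The following is a minimal sketch, not something taken from the oVirt tooling: it assumes the "Field : value" layout of the `hosted-engine --vm-status` output pasted in this report, and the names `parse_vm_status`, `stale_hosts` and `SAMPLE` are hypothetical helpers introduced only for illustration.

```python
import re

# Header line that opens each per-host block in the pasted status output.
HOST_HEADER = re.compile(r"^--== Host (\d+) status ==--$")

# Shortened sample, copied from the status output quoted in this report.
SAMPLE = """
--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma03
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : alma04
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
"""


def parse_vm_status(text):
    """Return {host_id: {field: value}} for every host block in the text."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            current = hosts.setdefault(int(header.group(1)), {})
            continue
        # Only "Field : value" lines are kept; the "key=value" extra
        # metadata lines and blank lines are skipped.
        if current is None or " : " not in line:
            continue
        key, value = line.split(" : ", 1)
        current[key.strip()] = value.strip()
    return hosts


def stale_hosts(hosts):
    """Host IDs whose entry is out of date or whose agent reports stopped."""
    return sorted(
        host_id
        for host_id, fields in hosts.items()
        if fields.get("Status up-to-date") == "False"
        or fields.get("stopped") == "True"
    )


if __name__ == "__main__":
    print(stale_hosts(parse_vm_status(SAMPLE)))  # prints [2]
```

Run against the output captured on alma03, this flags host ID 2 (alma04), the host that was undeployed but is still listed on shared storage. In practice the text would come from running the command on an HA host rather than from a pasted sample.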
> --== Host 1 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : alma03
> Host ID                            : 1
> Engine status                      : {"health": "good", "vm": "up",
> "detail": "up"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : f7259df9
> local_conf_timestamp               : 509796
> Host timestamp                     : 509783
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=509783 (Tue Jan 31 12:57:16 2017)
>     host-id=1
>     score=3400
>     vm_conf_refresh_time=509796 (Tue Jan 31 12:57:30 2017)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=EngineUp
>     stopped=False
> 
> 
> --== Host 2 status ==--
> 
> conf_on_shared_storage             : True
> Status up-to-date                  : False
> Hostname                           : alma04
> Host ID                            : 2
> Engine status                      : unknown stale-data
> Score                              : 0
> stopped                            : True
> Local maintenance                  : False
> crc32                              : 92331640
> local_conf_timestamp               : 492857
> Host timestamp                     : 492881
> Extra metadata (valid at timestamp):
>     metadata_parse_version=1
>     metadata_feature_version=1
>     timestamp=492881 (Tue Jan 31 12:55:28 2017)
>     host-id=2
>     score=0
>     vm_conf_refresh_time=492857 (Tue Jan 31 12:55:04 2017)
>     conf_on_shared_storage=True
>     maintenance=False
>     state=AgentStopped
>     stopped=True

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-02-08 09:00:40 EST ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

--- Additional comment from Nikolai Sednev on 2017-04-15 19:38:10 EDT ---

1) Deployed a clean HE on puma18 over NFS and added two NFS data storage domains so that the hosted-storage was auto-imported.
2) Added an additional HA host named puma19.
3) Put puma19 into maintenance and then undeployed it using the "reinstall" option.
4) puma19 was activated again after being undeployed as an HA host.
5) On puma19 I see:

puma19 ~]# hosted-engine --vm-status
You must run deploy first

puma19 ~]# systemctl status ovirt-ha-agent.service -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2017-04-16 02:19:57 IDT; 8min ago
 Main PID: 13019 (code=exited, status=255)

Apr 16 02:19:45 puma19 ovirt-ha-agent[13019]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
Apr 16 02:19:50 puma19 ovirt-ha-agent[13019]: ovirt-ha-agent ovirt_hosted_engine_ha.lib.upgrade.StorageServer.config ERROR Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not available [[Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf']
Apr 16 02:19:52 puma19 ovirt-ha-agent[13019]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 414, in start_monitoring
    upg = upgrade.Upgrade()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py", line 54, in __init__
    self._type = self._config.get(config.ENGINE, config.DOMAIN_TYPE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/env/config.py", line 222, in get
    key
KeyError: 'Configuration value not found: file=/etc/ovirt-hosted-engine/hosted-engine.conf, key=domainType'
Apr 16 02:19:52 puma19 ovirt-ha-agent[13019]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
Apr 16 02:19:57 puma19 ovirt-ha-agent[13019]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Configuration file '/etc/ovirt-hosted-engine/hosted-engine.conf' not available [[Errno 2] No such file or directory: '/etc/ovirt-hosted-engine/hosted-engine.conf']
Apr 16 02:19:57 puma19 ovirt-ha-agent[13019]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Hosted Engine is not configured. Shutting down.
Apr 16 02:19:57 puma19 systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=255/n/a
Apr 16 02:19:57 puma19 systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Apr 16 02:19:57 puma19 systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Apr 16 02:19:57 puma19 systemd[1]: ovirt-ha-agent.service failed.

puma19 ~]# systemctl status ovirt-ha-broker.service -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2017-04-16 02:16:05 IDT; 12min ago
 Main PID: 13018 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─13018 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Apr 16 02:16:05 puma19 systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Apr 16 02:16:05 puma19 systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Apr 16 02:16:06 puma19 ovirt-ha-broker[13018]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=9afe5e06-f1cd-4078-bf7a-bb2f242c630b' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle data) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch .set_storage_domain(client, sd_type, **options) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain self._backends[client].connect() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect self._dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path " in {1}".format(sd_uuid, parent)) BackendFailureException: path to storage domain 9afe5e06-f1cd-4078-bf7a-bb2f242c630b not found in /rhev/data-center/mnt Apr 16 02:19:10 puma19 ovirt-ha-broker[13018]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=9afe5e06-f1cd-4078-bf7a-bb2f242c630b' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle data) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch .set_storage_domain(client, sd_type, **options) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain self._backends[client].connect() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect self._dom_type) File 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path " in {1}".format(sd_uuid, parent)) BackendFailureException: path to storage domain 9afe5e06-f1cd-4078-bf7a-bb2f242c630b not found in /rhev/data-center/mnt Apr 16 02:19:13 puma19 ovirt-ha-broker[13018]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to read metadata from /var/run/vdsm/storage/9afe5e06-f1cd-4078-bf7a-bb2f242c630b/f4e8bf74-c112-4ab8-86aa-975227dcb5ee/b6927029-34cf-4285-8d22-d8ce285385f1 Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) OSError: [Errno 2] No such file or directory: '/var/run/vdsm/storage/9afe5e06-f1cd-4078-bf7a-bb2f242c630b/f4e8bf74-c112-4ab8-86aa-975227dcb5ee/b6927029-34cf-4285-8d22-d8ce285385f1' Apr 16 02:19:21 puma19 ovirt-ha-broker[13018]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=9afe5e06-f1cd-4078-bf7a-bb2f242c630b' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle data) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch .set_storage_domain(client, sd_type, **options) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain self._backends[client].connect() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 462, in connect self._dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 107, in get_domain_path " in {1}".format(sd_uuid, parent)) BackendFailureException: path to storage domain 
9afe5e06-f1cd-4078-bf7a-bb2f242c630b not found in /rhev/data-center/mnt On puma18 I see: puma18 ~]# hosted-engine --vm-status --== Host 1 status ==-- conf_on_shared_storage : True Status up-to-date : True Hostname : puma18 Host ID : 1 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 3400 stopped : False Local maintenance : False crc32 : 17fd3fb5 local_conf_timestamp : 3841 Host timestamp : 3826 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=3826 (Sun Apr 16 02:28:49 2017) host-id=1 score=3400 vm_conf_refresh_time=3841 (Sun Apr 16 02:29:03 2017) conf_on_shared_storage=True maintenance=False state=EngineUp stopped=False --== Host 2 status ==-- conf_on_shared_storage : True Status up-to-date : False Hostname : puma19 Host ID : 2 Engine status : unknown stale-data Score : 3400 stopped : False Local maintenance : False crc32 : f6ec2c2e local_conf_timestamp : 3131 Host timestamp : 3116 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=3116 (Sun Apr 16 02:18:43 2017) host-id=2 score=3400 vm_conf_refresh_time=3131 (Sun Apr 16 02:18:58 2017) conf_on_shared_storage=True maintenance=False state=EngineDown stopped=False Host puma19 was successfully undeployed as ha-host, although its ovirt-ha-broker was not turned off properly and had continued running, ovirt-ha-agent should have been properly disabled, but appears as failed to start. On puma18 I still see that puma19 being reported in metadata as ha-host, although it was undeployed. Moving this bug to verified as now undeployment is working with exceptions which I've reported as appears above and I'll open a separate bug on this. --- Additional comment from Nikolai Sednev on 2017-04-15 19:40 EDT ---
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Created attachment 1271847 [details] screencast-2017-04-16_02.18.39.mkv
Created attachment 1271848 [details] sosreport-nsednev-he-4
Created attachment 1271849 [details] sosreport-puma18
Created attachment 1271850 [details] sosreport-puma19
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
The regression is that ha-broker and ha-agent are handled improperly compared to the previous 4.0 release. The host can still be undeployed; after that the engine no longer treats it as an ha-host and handles it as a regular host. Please consider removing the blocker, as it does not really block a customer from making an ha-host a regular host, although with some exceptions.
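For anyone re-checking this, the stale puma19 entry in the `hosted-engine --vm-status` output can be picked out mechanically. The following is only a rough sketch, assuming the output keeps the `--== Host N status ==--` headers and `key : value` lines seen in this report; `parse_vm_status`, `stale_hosts` and `SAMPLE` are hypothetical names for illustration and are not part of ovirt-hosted-engine-ha.

```python
import re

# Abbreviated sample in the same shape as the puma18 output in this report.
SAMPLE = """\
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : puma18
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : False
Hostname : puma19
Host ID : 2
Engine status : unknown stale-data
Score : 3400
"""


def parse_vm_status(text):
    """Split vm-status text into one dict of fields per host section."""
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"--== Host (\d+) status ==--", line)
        if header:
            # A new host section starts; later fields belong to it.
            current = {"section": header.group(1)}
            hosts.append(current)
            continue
        if current is None or " : " not in line:
            # Skip anything outside a section and the key=value metadata lines.
            continue
        key, value = line.split(" : ", 1)
        current[key.strip()] = value.strip()
    return hosts


def stale_hosts(text):
    """Return hostnames whose metadata is reported as not up to date."""
    return [
        host.get("Hostname")
        for host in parse_vm_status(text)
        if host.get("Status up-to-date") == "False"
    ]


if __name__ == "__main__":
    print(stale_hosts(SAMPLE))
```

On a live HA host the text could come from running `hosted-engine --vm-status` and passing its output to `stale_hosts`; a stale entry for a host that was undeployed is the symptom described above.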
Can you please write a clear new description and steps to reproduce? It seems to be different from the issue you cloned this from.
Please open a fresh bug with the issue you want fixed. This bug is not clear enough to address.