Description of problem:
Metadata for an undeployed ha-host is not removed from the shared storage when the host is undeployed from the engine's WebUI. The undeployed host is still shown in the CLI of the active ha-hosts. Although it is shown as inactive, it should be cleared, since it was intentionally undeployed using the engine's WebUI. Once the host is undeployed, its entry contributes nothing if it still appears in the CLI; it is confusing and irrelevant data.

Version-Release number of selected component (if applicable):
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.15-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.6-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-hosted-engine-setup-2.1.0.6-1.el7ev.noarch
ovirt-host-deploy-1.6.5-1.el7ev.noarch
Linux version 3.10.0-514.21.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Apr 22 02:41:35 EDT 2017
Linux 3.10.0-514.21.1.el7.x86_64 #1 SMP Sat Apr 22 02:41:35 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on a pair of hosts over NFS and add one or more NFS data storage domains so that the hosted-storage domain gets auto-imported.
2. Undeploy one of the ha-hosts.
3. On the remaining ha-host, run "hosted-engine --vm-status" from the CLI.
4. You should see both hosts, the currently active ha-host and the undeployed host, just like here:

puma18 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma18
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : d6a0a955
local_conf_timestamp               : 347701
Host timestamp                     : 347686
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=347686 (Mon May 22 16:37:27 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=347701 (Mon May 22 16:37:43 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : puma19
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 19b15ef8
local_conf_timestamp               : 342836
Host timestamp                     : 342822
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=342822 (Mon May 22 15:17:53 2017)
        host-id=2
        score=0
        vm_conf_refresh_time=342836 (Mon May 22 15:18:06 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True

Actual results:
puma19 was once an ha-host, but after being undeployed from the engine it is still shown in the CLI of the active ha-host puma18.

Expected results:
puma19 (the undeployed ha-host) should be cleared from the metadata once it is undeployed from the engine's WebUI.

Additional info:
I've read the "Removing a Host from a Self-Hosted Engine Environment" section at https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/removing_a_host_from_a_self-hosted_engine_environment and I think there is no need to hold irrelevant metadata once an ha-host has been undeployed for such a long period. This bug is derived from https://bugzilla.redhat.com/show_bug.cgi?id=1442580.
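To illustrate how the stale Host 2 entry above can be recognized programmatically, here is a minimal sketch (not part of any oVirt/RHV tool; the function names and the staleness heuristic are my own) that parses the "Extra metadata" key=value block printed by "hosted-engine --vm-status" and flags entries that look like the undeployed puma19 (score=0 with state=AgentStopped):

```python
# Hypothetical sketch: parse the "Extra metadata" block from
# `hosted-engine --vm-status` output and flag stale/undeployed hosts.
# The heuristic (score=0 and state=AgentStopped) is an assumption
# based on the puma19 output above, not an official definition.

def parse_extra_metadata(block: str) -> dict:
    """Turn 'key=value' lines into a dict, dropping trailing '(...)' annotations."""
    meta = {}
    for line in block.strip().splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        # e.g. 'timestamp=342822 (Mon May 22 15:17:53 2017)' -> '342822'
        meta[key] = value.split(" (", 1)[0]
    return meta

def looks_undeployed(meta: dict) -> bool:
    """Heuristic used in this sketch: stopped agent with a zero score."""
    return meta.get("state") == "AgentStopped" and meta.get("score") == "0"

# Host 2 metadata, copied from the report above.
host2 = """
metadata_parse_version=1
metadata_feature_version=1
timestamp=342822 (Mon May 22 15:17:53 2017)
host-id=2
score=0
vm_conf_refresh_time=342836 (Mon May 22 15:18:06 2017)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True
"""

meta = parse_extra_metadata(host2)
print(meta["host-id"], looks_undeployed(meta))  # → 2 True
```

Such a filter would only hide the stale entry from a consumer of the output; it does not clear the metadata block on the shared storage, which is what this report asks for.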
I would like to add that if you have 2 ha-hosts and one of them was undeployed, you cannot use the workaround from https://access.redhat.com/solutions/2212601, because you should not stop ovirt-ha-agent on the active ha-host. Running "hosted-engine --clean-metadata --host-id=$old_ID --force-clean" without first stopping ovirt-ha-agent will also cause the service to stop, which is highly undesirable on a single, active ha-host that is running the HE-VM alongside other guest VMs.

puma18 ~]# hosted-engine --clean-metadata --host-id=2 --force-clean
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.1.0.6 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: puma18.scl.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Refreshing vm.conf
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:08f6844b-f1e5-4acb-a4ad-5129606785b5, volUUID:8e9d95ee-636f-49bf-8633-7f9f8f19a466
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:80a61d2b-74c2-4a79-8009-85f7e8517825, volUUID:a43b161c-a18f-4ced-97a7-7b521caecd90
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__4/f7d64e4c-a34d-484f-8dd8-412ea87b2e67/images/80a61d2b-74c2-4a79-8009-85f7e8517825/a43b161c-a18f-4ced-97a7-7b521caecd90
ERROR:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to extract HEVM OVF
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor ping, options {'addr': '10.35.160.254'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819780560
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819586768
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819586576
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'c8e1075f-4c8f-431b-93d7-e7ada07b4cce', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819586640
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'c8e1075f-4c8f-431b-93d7-e7ada07b4cce', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765889785680
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765889785040
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Broker initialized, all submonitors started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Ensuring lease for lockspace hosted-engine, host id 2 is acquired (file: /var/run/vdsm/storage/f7d64e4c-a34d-484f-8dd8-412ea87b2e67/f144f971-d0c5-4b3a-bcdf-4f6d8aac6b2e/ce600432-88ea-45db-b9c0-e26cf50cefc7)
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:cannot get lock on host id 2: host already holds lock on a different host id
WARNING:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Force requested, overriding sanlock failure.
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Cleaning the metadata block!
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

puma18 ~]# systemctl start ovirt-ha-agent
puma18 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma18
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : fc5e0c2d
local_conf_timestamp               : 348819
Host timestamp                     : 348804
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=348804 (Mon May 22 16:56:06 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=348819 (Mon May 22 16:56:21 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False
If we get requests to update this ad hoc, we will consider it. For now, I don't see a reason to invest in this.