Bug 1454342 - Remove metadata from shared storage for undeployed ha-host.
Summary: Remove metadata from shared storage for undeployed ha-host.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Host-Deploy
Version: 4.1.2.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Sandro Bonazzola
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-05-22 13:52 UTC by Nikolai Sednev
Modified: 2018-01-17 01:01 UTC
CC: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-06-12 09:22:29 UTC
oVirt Team: Integration
Embargoed:
nsednev: planning_ack?
nsednev: devel_ack?
nsednev: testing_ack?




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1535268 0 medium CLOSED HE Undeploy leaves host metadata on shared storage. 2021-05-01 16:35:15 UTC

Internal Links: 1535268

Description Nikolai Sednev 2017-05-22 13:52:17 UTC
Description of problem:
When an ha-host is undeployed from the web UI of the engine, its metadata should be removed from the shared storage.

The undeployed host is still shown in the CLI of the active ha-hosts. Although it is listed as inactive, it should be cleared, since it was intentionally undeployed via the web UI of the engine. Once undeployed, it contributes nothing by still appearing in the CLI; it is confusing and irrelevant data.

Version-Release number of selected component (if applicable):
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.15-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.6-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-hosted-engine-setup-2.1.0.6-1.el7ev.noarch
ovirt-host-deploy-1.6.5-1.el7ev.noarch
Linux version 3.10.0-514.21.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Apr 22 02:41:35 EDT 2017
Linux 3.10.0-514.21.1.el7.x86_64 #1 SMP Sat Apr 22 02:41:35 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on a pair of hosts over NFS and add one or more NFS data storage domains so that the hosted-storage domain gets auto-imported.
2. Undeploy one of the ha-hosts.
3. On the remaining ha-host, run "hosted-engine --vm-status" from the CLI.
4. Both hosts are listed: the currently active ha-host and the undeployed one, just like here:
puma18 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma18
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : d6a0a955
local_conf_timestamp               : 347701
Host timestamp                     : 347686
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=347686 (Mon May 22 16:37:27 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=347701 (Mon May 22 16:37:43 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : puma19
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 19b15ef8
local_conf_timestamp               : 342836
Host timestamp                     : 342822
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=342822 (Mon May 22 15:17:53 2017)
        host-id=2
        score=0
        vm_conf_refresh_time=342836 (Mon May 22 15:18:06 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=AgentStopped
        stopped=True


Actual results:
puma19 was once an ha-host, but after being undeployed from the engine it is still shown in the CLI of the active ha-host puma18.

Expected results:
puma19 (the undeployed ha-host) should be cleared from the metadata once it has been undeployed from the engine's web UI.
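In other words, once puma19 has been undeployed, "hosted-engine --vm-status" on puma18 should report only Host 1, along these lines (a sketch of the expected output, not captured from a live system):

--== Host 1 status ==--

Hostname                           : puma18
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400

with no "--== Host 2 status ==--" section for puma19 at all.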

Additional info:
I've read the https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/removing_a_host_from_a_self-hosted_engine_environment section, and I think there is no need to keep irrelevant metadata around for so long once an ha-host has been undeployed.
This bug is derived from https://bugzilla.redhat.com/show_bug.cgi?id=1442580.

Comment 1 Nikolai Sednev 2017-05-22 14:03:30 UTC
I would like to add that if you have two ha-hosts and one of them has been undeployed, you cannot use the https://access.redhat.com/solutions/2212601 workaround, because you should not stop ovirt-ha-agent on the active ha-host.
Running "hosted-engine --clean-metadata --host-id=$old_ID --force-clean" without stopping ovirt-ha-agent first will also cause the service to be stopped, which is highly undesirable on a single active ha-host that is running the HE VM alongside other guest VMs.
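For reference, my reading of the workaround in that KB article boils down to roughly the following on the remaining active ha-host (a sketch, not the verbatim documented procedure; host id 2 is puma19's id, taken from the --vm-status output above):

# stop the HA agent first (the KB's precondition, and exactly what cannot be afforded here)
systemctl stop ovirt-ha-agent
# wipe the stale metadata block of the undeployed host (puma19 = host id 2)
hosted-engine --clean-metadata --host-id=2 --force-clean
# bring the HA agent back up
systemctl start ovirt-ha-agent

Below is what actually happened when I ran the clean-metadata command with --force-clean while ovirt-ha-agent was still running: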

puma18 ~]#  hosted-engine --clean-metadata --host-id=2 --force-clean
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.1.0.6 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: puma18.scl.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Refreshing vm.conf
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:08f6844b-f1e5-4acb-a4ad-5129606785b5, volUUID:8e9d95ee-636f-49bf-8633-7f9f8f19a466
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:80a61d2b-74c2-4a79-8009-85f7e8517825, volUUID:a43b161c-a18f-4ced-97a7-7b521caecd90
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_nsednev__he__4/f7d64e4c-a34d-484f-8dd8-412ea87b2e67/images/80a61d2b-74c2-4a79-8009-85f7e8517825/a43b161c-a18f-4ced-97a7-7b521caecd90 
ERROR:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to extract HEVM OVF
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor ping, options {'addr': '10.35.160.254'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819780560
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819586768
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819586576
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 'c8e1075f-4c8f-431b-93d7-e7ada07b4cce', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765819586640
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 'c8e1075f-4c8f-431b-93d7-e7ada07b4cce', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765889785680
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 139765889785040
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Broker initialized, all submonitors started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Ensuring lease for lockspace hosted-engine, host id 2 is acquired (file: /var/run/vdsm/storage/f7d64e4c-a34d-484f-8dd8-412ea87b2e67/f144f971-d0c5-4b3a-bcdf-4f6d8aac6b2e/ce600432-88ea-45db-b9c0-e26cf50cefc7)
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:cannot get lock on host id 2: host already holds lock on a different host id
WARNING:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Force requested, overriding sanlock failure.
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Cleaning the metadata block!
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down


puma18 ~]# systemctl start ovirt-ha-agent

puma18 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : puma18
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : fc5e0c2d
local_conf_timestamp               : 348819
Host timestamp                     : 348804
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=348804 (Mon May 22 16:56:06 2017)
        host-id=1
        score=3400
        vm_conf_refresh_time=348819 (Mon May 22 16:56:21 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineUp
        stopped=False

Comment 2 Yaniv Lavi 2017-06-12 09:22:29 UTC
If we get requests to update this ad hoc, we will consider it.
For now I don't see a reason to invest in this.

