Description of problem:
Metadata is not cleared when a hosted-engine host is removed from the setup via the web UI. Although I removed host alma03.qa.lab.tlv.redhat.com from the engine, the metadata was not cleaned and the host still appears there, as shown below:

[root@alma04 ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : e19e4915
Host timestamp                     : 7474
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7474 (Thu Jun 23 14:32:19 2016)
        host-id=1
        score=3400
        maintenance=False
        state=EngineDown
        stopped=False

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : cc11079a
Host timestamp                     : 9285
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=9285 (Thu Jun 23 16:36:04 2016)
        host-id=2
        score=3400
        maintenance=False
        state=EngineUp
        stopped=False

Version-Release number of selected component (if applicable):

Host:
ovirt-setup-lib-1.0.2-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64
mom-0.5.4-1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
vdsm-4.18.4-2.el7ev.x86_64
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-1.0.3-1.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-doc-4.0.0-2.el7ev.noarch
rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch
rhevm-4.0.0.6-0.1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
rhevm-guest-agent-common-1.0.12-2.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-branding-rhev-4.0.0-2.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch
rhev-guest-tools-iso-4.0-2.el7ev.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy hosted engine on a pair of hosts.
2. Remove one of the hosts from the setup via the web UI.
3. On the remaining host, run "hosted-engine --vm-status" and check whether the removed host is gone.

Actual results:
The removed host is not cleared from the metadata.

Expected results:
The host should be removed from both the setup and the metadata.

Additional info:
Logs from both hosts and the engine are attached.
Created attachment 1171544 [details] sosreport from engine
Created attachment 1171545 [details] latest sosreport from host alma04
sosreport from second host (alma03) available from here: https://drive.google.com/open?id=0B85BEaDBcF88Q3M3REpxZTdMTm8
What do you mean by setup? The webadmin UI? If so then this is not really a bug. We do not propagate removals to hosts at all (VDSM is still running there as well).
(In reply to Martin Sivák from comment #4)
> What do you mean by setup? The webadmin UI? If so then this is not really a
> bug. We do not propagate removals to hosts at all (VDSM is still running
> there as well).

By "setup" I mean the environment I established, with two hosts and the HE VM running on them. By UI I mean the user interface, yes, the webadmin. I don't care about VDSM or other components; I do care about the metadata on the shared storage (the OVF) not being cleaned. If a host is removed, it should be cleared; this is what was agreed at the time.
> If a host is removed, it should be cleared; this is what was agreed at the
> time.

No, it wasn't, because there is no mechanism to do that. The engine does not tell the host that it was removed from the database. All packages are still installed, all mountpoints are still mounted and all services are still running. The only change happened in the engine database.

The only way to clean a host out of the metadata is to issue the clean-metadata command manually using the hosted-engine tool.

> I don't care about VDSM or other components; I do care about the metadata
> on the shared storage (the OVF) not being cleaned.

And the OVF (the volume holding the shared configuration) has nothing to do with this (the metadata and sanlock volumes). You are mixing unrelated things together.
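For reference, the manual cleanup would look roughly like this, run from one of the remaining hosted-engine hosts (check "hosted-engine --help" on the installed version for the exact option names):

    # clear the stale entry left by the removed host (host-id 1 in this report)
    hosted-engine --clean-metadata --host-id=1 --force-clean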
(In reply to Martin Sivák from comment #6)
> > If a host is removed, it should be cleared; this is what was agreed at
> > the time.
>
> No, it wasn't, because there is no mechanism to do that. The engine does
> not tell the host that it was removed from the database. All packages are
> still installed, all mountpoints are still mounted and all services are
> still running. The only change happened in the engine database.
>
> The only way to clean a host out of the metadata is to issue the
> clean-metadata command manually using the hosted-engine tool.
>
> > I don't care about VDSM or other components; I do care about the metadata
> > on the shared storage (the OVF) not being cleaned.
>
> And the OVF (the volume holding the shared configuration) has nothing to do
> with this (the metadata and sanlock volumes). You are mixing unrelated
> things together.

I've already had this discussion here:
https://bugzilla.redhat.com/show_bug.cgi?id=1200469#c9

What I mean is: if I remove the hosted-engine host "alma03" from the web UI, then I want it to disappear from here:

[root@alma04 ~]# hosted-engine --vm-status

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : cc11079a
Host timestamp                     : 9285
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=9285 (Thu Jun 23 16:36:04 2016)
        host-id=2
        score=3400
        maintenance=False
        state=EngineUp
        stopped=False
This requirement was also discussed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1200469#c7

This requirement was accepted here:
https://bugzilla.redhat.com/show_bug.cgi?id=1200469#c8
This is by design. We can consider removing it completely, but we will first need to provide a tool for cleaning up hosted engine.
Supposing we call host-deploy with the undeploy command, we can clear the metadata for the host. The issue is when you remove the host without running the undeploy command first. This requires the engine to run the undeploy before dropping the host from the engine.
(In reply to Sandro Bonazzola from comment #10)
> Supposing we call host-deploy with the undeploy command, we can clear the
> metadata for the host. The issue is when you remove the host without
> running the undeploy command first. This requires the engine to run the
> undeploy before dropping the host from the engine.

What do we do on host remove? It should be the same flow.
(In reply to Yaniv Dary from comment #11)
> (In reply to Sandro Bonazzola from comment #10)
> > Supposing we call host-deploy with the undeploy command, we can clear the
> > metadata for the host. The issue is when you remove the host without
> > running the undeploy command first. This requires the engine to run the
> > undeploy before dropping the host from the engine.
>
> What do we do on host remove? It should be the same flow.

In general, on host remove we do nothing except removing the host data from the engine db. For hosted-engine hosts this won't take the host out of the hosted-engine cluster. So we'll need an RFE on the engine to trigger ovirt-host-deploy before removing a host from the db if the host is a hosted-engine host. We'll also need a force checkbox there, because otherwise we won't be able to remove the host if the host is just dead.
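As an illustration only, such a flow could build on the existing host install (reinstall) action in the engine REST API, which in recent versions exposes a hosted-engine undeploy flag. Parameter availability depends on the API version, and a reinstall normally also needs SSH/root credentials, so treat this as a sketch rather than a working recipe:

    curl -k -u admin@internal:<password> -H "Content-Type: application/xml" \
         -d "<action><undeploy_hosted_engine>true</undeploy_hosted_engine></action>" \
         https://<engine-fqdn>/ovirt-engine/api/hosts/<host-id>/install

followed by the normal host removal once the reinstall has finished.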
Who is the team to handle this? Can you move this ticket to be handled by that team?
I guess it's infra, but it has to be done on the engine side, so a different ticket is needed.
Can we consider this for 4.1? Can we please retarget it?
We can retarget it if you give it the right PM-x priority and limit the scope to the UI warning only.
(In reply to Martin Sivák from comment #16)
> We can retarget it if you give it the right PM-x priority and limit the
> scope to the UI warning only.

Ack
Further to the PM's decision, the RFE was limited to a textual warning only:

"Remove Host(s)

Are you sure you want to remove the following items?
- alma04.qa.lab.tlv.redhat.com

* The hosts marked with * still have hosted engine deployed on them. Hosted engine should be undeployed before they are removed."

1) If the customer proceeds and clicks the "OK" button, the host is removed.
2) If the hosted-engine host is first undeployed using the UI and then removed, no warning appears and the host is removed.
3) If the host is undeployed and then activated in the UI, the customer will see the following in the CLI:

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : d3561bee
local_conf_timestamp               : 66949
Host timestamp                     : 66936
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=66936 (Thu Jan 26 14:36:21 2017)
        host-id=2
        score=3400
        vm_conf_refresh_time=66949 (Thu Jan 26 14:36:34 2017)
        conf_on_shared_storage=True
        maintenance=False
        state=EngineDown
        stopped=False

Step 3 looks like "Undeploy" is not working properly; a separate bug will be opened on this.

CLI metadata is still not cleared for removed hosts, but this is per the PM's decision.

Moving to verified.

Components on hosts:
rhvm-appliance-4.1.20170119.1-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-2.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
libvirt-client-2.0.0-10.el7_3.4.x86_64
mom-0.5.8-1.el7ev.noarch
vdsm-4.19.2-2.el7ev.x86_64
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
rhev-guest-tools-iso-4.1-3.el7ev.noarch
rhevm-doc-4.1.0-1.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.0.1-0.1.el7.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-branding-rhev-4.1.0-0.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)