Description of problem: Cleaning the metadata with "hosted-engine --clean-metadata" fails:

INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Started VDSM domain monitor for 29d459ea-989d-4127-b996-248928adf543
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
[... the line above repeats several more times ...]
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
[... the line above repeats many more times until the timeout ...]
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to start monitoring domain (sd_uuid=29d459ea-989d-4127-b996-248928adf543, host_id=1): timeout during domain acquisition
None
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Failed to start monitoring domain (sd_uuid=29d459ea-989d-4127-b996-248928adf543, host_id=1): timeout during domain acquisition' - trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

Version-Release number of selected component (if applicable):
mom-0.5.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
vdsm-4.18.4-2.el7ev.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy a pair of hosted-engine hosts over NFS.
2. In the web UI, put one of the hosts into maintenance and then remove it.
3. On the host that was removed in the web UI, run "hosted-engine --clean-metadata" from the CLI.

Actual results:
The command fails with the errors above.

Expected results:
The command succeeds.

Additional info:
A sosreport from the host is attached.
Created attachment 1173050 sosreport-nsednev-he-1.qa.lab.tlv.redhat.com-20160627133738.tar.xz
Created attachment 1173051 sosreport from host alma03
Please try:

systemctl stop ovirt-ha-agent.service
hosted-engine --clean-metadata
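The two-step order suggested above can be scripted so that the clean step is never attempted if stopping the agent fails. This is a minimal Python sketch; the helper names (`stop_agent_then_clean`, `run_checked`) are hypothetical, and it does not check that the broker is running, which later comments in this bug show is also required.

```python
import subprocess
from typing import Callable, List, Sequence

# The two commands from the suggestion above, in the order they must run.
STOP_AGENT = ["systemctl", "stop", "ovirt-ha-agent.service"]
CLEAN_METADATA = ["hosted-engine", "--clean-metadata"]


def run_checked(cmd: Sequence[str]) -> None:
    """Default runner: execute cmd, raising CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def stop_agent_then_clean(
    run: Callable[[Sequence[str]], None] = run_checked,
) -> List[List[str]]:
    """Stop the local HA agent, then clean the metadata.

    If the stop step raises, the clean step is never attempted.
    Returns the commands that completed, in order.
    """
    completed: List[List[str]] = []
    for cmd in (STOP_AGENT, CLEAN_METADATA):
        run(cmd)  # raises on failure, which aborts the sequence
        completed.append(list(cmd))
    return completed
```

On the host itself, `stop_agent_then_clean()` runs the real commands (root required); passing a different `run` callable, for example one that only prints each command, turns it into a dry run.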
(In reply to Yedidyah Bar David from comment #3)
> Please try:
>
> systemctl stop ovirt-ha-agent.service
>
> hosted-engine --clean-metadata

Is this requirement documented? I mean shutting down ovirt-ha-agent before running the "hosted-engine --clean-metadata" command. If it is required, I would expect the command itself to do this automatically.

BTW, to shut down the agent I have to shut down the broker first, because the broker checks whether the agent is running and, if it is not, starts it.

[nsednev@nsednev ~]$ ssh root.lab.tlv.redhat.com
root.lab.tlv.redhat.com's password:
Last login: Mon Jun 27 20:03:21 2016 from dhcp-4-96.tlv.redhat.com
[root@alma03 ~]# systemctl stop ovirt-ha-broker && systemctl stop ovirt-ha-agent
[root@alma03 ~]# hosted-engine --clean-metadata
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Started VDSM domain monitor for 29d459ea-989d-4127-b996-248928adf543
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
[... the line above repeats several more times ...]
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
[... the two lines above repeat until the limit is reached ...]
ERROR:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker, the number of errors has exceeded the limit (10)
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to connect to ha-broker: Failed to connect to broker, the number of errors has exceeded the limit (10)
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Can't initialize brokerlink 'Failed to connect to broker, the number of errors has exceeded the limit (10)' - reinitializing
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

I also tried with the broker running and the agent stopped:

[root@alma03 ~]# systemctl status ovirt-ha-broker -l && systemctl status ovirt-ha-agent -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2016-06-28 11:15:19 IDT; 29s ago
Main PID: 95503 (ovirt-ha-broker)
CGroup: /system.slice/ovirt-ha-broker.service
└─95503 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load-no-engine
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-free
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mgmt-bridge
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Finished loading submonitors
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.listener.Listener:Initializing SocketServer
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.listener.Listener:SocketServer ready

● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Tue 2016-06-28 11:12:37 IDT; 3min 11s ago
Main PID: 50147 (code=killed, signal=KILL)

Jun 28 11:12:27 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Can't initialize brokerlink 'Failed to connect to broker, the number of errors has exceeded the limit (10)' - reinitializing
Jun 28 11:12:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '2'
Jun 28 11:12:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
Jun 28 11:12:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service stop-sigterm timed out. Killing.
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service: main process exited, code=killed, status=9/KILL
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service failed.
Jun 28 11:15:19 alma03.qa.lab.tlv.redhat.com systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
[root@alma03 ~]# hosted-engine --clean-metadata
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor ping, options {'addr': '10.35.117.254'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29998480
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29662224
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29936912
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217663319760
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462031248
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462077200
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Broker initialized, all submonitors started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/29d459ea-989d-4127-b996-248928adf543/2bc81a5d-908c-4a9b-83ec-e64f564d8255/6888c1dd-9d57-4fdd-ba3c-8375e9b45073)
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Cannot clean unclean metadata block. Consider --force-clean.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

Eventually I ran it with --force-clean:

[root@alma03 ~]# hosted-engine --clean-metadata --force-clean
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor ping, options {'addr': '10.35.117.254'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29936784
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29998800
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 30016784
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462030544
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462076496
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462075728
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Broker initialized, all submonitors started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/29d459ea-989d-4127-b996-248928adf543/2bc81a5d-908c-4a9b-83ec-e64f564d8255/6888c1dd-9d57-4fdd-ba3c-8375e9b45073)
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Acquired lock on host id 1
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Cleaning the metadata block!
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
[root@alma03 ~]#

To summarize: the command worked only after I stopped the agent but left the broker running, and nothing warns about this. Even then I still had to run "hosted-engine --clean-metadata --force-clean"; without "--force-clean" the command failed.

Here is the output from the second host; the metadata is now cleared:

[root@alma04 ~]# hosted-engine --vm-status

--== Host 2 status ==--

Status up-to-date : True
Hostname : alma04.qa.lab.tlv.redhat.com
Host ID : 2
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 6813f0fb
Host timestamp : 59836
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=59836 (Tue Jun 28 11:18:27 2016)
host-id=2
score=3400
maintenance=False
state=EngineUp
stopped=False
[root@alma04 ~]#
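The service state that finally worked here (agent stopped, broker running) can be checked before running the command. This is a minimal Python sketch, assuming it runs as root on the host that performs the clean; `clean_metadata_preconditions` and `systemd_is_active` are hypothetical helpers built on `systemctl is-active --quiet`, and the sketch cannot detect the unclean metadata block that still required --force-clean.

```python
import subprocess
from typing import Callable, List

AGENT = "ovirt-ha-agent"
BROKER = "ovirt-ha-broker"


def systemd_is_active(unit: str) -> bool:
    """True when `systemctl is-active --quiet <unit>` exits with status 0."""
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def clean_metadata_preconditions(
    is_active: Callable[[str], bool] = systemd_is_active,
) -> List[str]:
    """Return the problems found; an empty list means the services look ready.

    Checks only the local service state described in this bug: the agent
    must be stopped and the broker must be running.
    """
    problems: List[str] = []
    if is_active(AGENT):
        problems.append(f"{AGENT} is running; stop it before --clean-metadata")
    if not is_active(BROKER):
        problems.append(f"{BROKER} is not running; the agent cannot reach the broker")
    return problems
```

Taking `is_active` as a parameter keeps the check testable without systemd; on a real host the default queries systemd directly.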
I don't know the details of this functionality well, so I'll let Martin handle this. I'll just note that in the past (3.6, I think), this worked for me:

1. Remove host1 from the engine.
2. Shut down host1 (presumably for reprovisioning).
3. On host2:
   systemctl stop ovirt-ha-agent.service
   hosted-engine --clean-metadata --host-id=1

I didn't have to use force, but I did need the broker to be up. IIRC, at some point Martin said that shutting down the agent should not be needed, but IIRC this never worked for me without shutting it down first.
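The "Failed to connect to broker: [Errno 2] No such file or directory" lines in the reporter's log are consistent with the agent reaching the broker over a Unix domain socket whose file does not exist while the broker is stopped (the broker's own log mentions "Initializing SocketServer"). That mechanism and any socket path are assumptions; neither is stated in this bug. The bounded retry visible in that log (5-second delay, limit of 10) can be sketched in Python as follows; `connect_with_retries` is a hypothetical illustration, not the agent's code.

```python
import errno
import socket
import time


def connect_with_retries(path: str, attempts: int = 10, delay: float = 5.0) -> socket.socket:
    """Connect to a Unix domain socket, giving up after a fixed number of attempts.

    The defaults mirror the agent log ("Retrying broker connection in '5'
    seconds", "exceeded the limit (10)"); this is an illustration only.
    """
    last_error = None
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(delay)
    # ENOENT ([Errno 2]) on connect means the socket file itself is missing,
    # i.e. no process has created it at that path.
    hint = ""
    if last_error is not None and last_error.errno == errno.ENOENT:
        hint = " (socket file missing; is the broker running?)"
    raise ConnectionError(f"gave up after {attempts} attempts: {last_error}{hint}") from last_error
```

A missing socket file fails fast with ENOENT on every attempt, so the total wait is roughly `(attempts - 1) * delay`, which matches the shape of the retry loop in the log.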
The broker has to be up for --clean-metadata to work properly. The agent on the host you want to clean has to have been stopped cleanly (service stop); if it was not, --force-cleanup must be used.

The agent on the host from which you perform the clean has to be down. That can be the same host you are removing or another host, but in the latter case the --host-id= argument is required.
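The argument rules above can be expressed as a small command-line builder. This is a minimal Python sketch; `build_clean_metadata_args` is a hypothetical helper, it uses only the --force-cleanup and --host-id=<id> options listed in the command's help, and it leaves the two service preconditions (broker up, local agent down) to the operator.

```python
from typing import List, Optional


def build_clean_metadata_args(
    local_host_id: int,
    target_host_id: Optional[int] = None,
    target_stopped_cleanly: bool = True,
) -> List[str]:
    """Build a hosted-engine --clean-metadata command line.

    Encodes two rules from the comment above:
    * cleaning a host other than the local one needs --host-id=<id>;
    * if the target's agent was not stopped cleanly, --force-cleanup is needed.
    It does NOT check the other preconditions (broker up, local agent down).
    """
    args = ["hosted-engine", "--clean-metadata"]
    if not target_stopped_cleanly:
        args.append("--force-cleanup")
    if target_host_id is not None and target_host_id != local_host_id:
        args.append(f"--host-id={target_host_id}")
    return args
```

For the 3.6 scenario described earlier (host1 removed, command run on host2), `build_clean_metadata_args(local_host_id=2, target_host_id=1)` returns `["hosted-engine", "--clean-metadata", "--host-id=1"]`.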
(In reply to Martin Sivák from comment #7)
> The broker has to be up for --clean-metadata to work properly. The agent
> from the host you want to clean has to be stopped cleanly (service stop),
> --force must be used if it was not.
>
> The agent on the host from where you perform the clean has to be down. It
> can be the same host you are removing or another host, but then the
> --host-id= argument is required.

Can you provide a documentation link for this sequence? Is it documented? I don't see this requirement in the HE CLI help (hosted-engine --help).
> Can you provide the documentation link for this sequence? Is this
> documented? I don't see this requirement in CLI help for HE, aka
> hosted-engine --help.

And did you try hosted-engine --clean-lockspace --help, as the main help and man page tell you to?

"For additional information about a specific command try: hosted-engine <command> --help"

This is then printed out:

Usage: $0 --clean_metadata [--force-cleanup] [--host-id=<id>]
Remove host's metadata from the global status database.
Available only in properly deployed cluster with properly stopped agent.

--force-cleanup This option overrides the safety checks. Use at your own risk DANGEROUS.

--host-id=<id> Specify an explicit host id to clean
(In reply to Martin Sivák from comment #9)
> > Can you provide the documentation link for this sequence? Is this
> > documented? I don't see this requirement in CLI help for HE, aka
> > hosted-engine --help.
>
> And did you try hosted-engine --clean-lockspace --help as the main help and
> man page is telling you?
>
> "For additional information about a specific command try: hosted-engine
> <command> --help"
>
> This is then printed out:
>
> Usage: $0 --clean_metadata [--force-cleanup] [--host-id=<id>]
> Remove host's metadata from the global status database.
> Available only in properly deployed cluster with properly stopped
> agent.
>
> --force-cleanup This option overrides the safety checks. Use at your own
> risk DANGEROUS.
>
> --host-id=<id> Specify an explicit host id to clean

Correct, but it does not mention anything about shutting down processes, i.e. that for this command to work properly you must stop the ovirt-ha-agent service but make sure that ovirt-ha-broker is still running. I meant that kind of help message, not a generic message about command usage and syntax.

[root@alma04 ~]# hosted-engine --clean-metadata --help
Usage: /usr/sbin/hosted-engine --clean_metadata [--force-cleanup] [--host-id=<id>]
Remove host's metadata from the global status database.
Available only in properly deployed cluster with properly stopped agent.

--force-cleanup This option overrides the safety checks. Use at your own risk DANGEROUS.

--host-id=<id> Specify an explicit host id to clean
[root@alma04 ~]# hosted-engine --clean-metadata --force-cleanup --help
Usage: ovirt-ha-agent [options]

Options:
--version show program's version number and exit
-h, --help show this help message and exit
--no-daemon don't start as a daemon
--pdb start pdb in case of crash
--cleanup purge the metadata block
--force-cleanup purge the metadata blockeven when not clean
--host-id=HOST_ID override the host id
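Both help outputs above spell the option --force-cleanup, yet the reporter's earlier run with --force-clean (the spelling the agent's own error message suggests) worked. One plausible explanation, assuming that hosted-engine passes these options through to ovirt-ha-agent (the second help output suggests it does) and that the agent parses them with Python's optparse (the "Usage: ... [options]" / "Options:" layout matches optparse's), is optparse's prefix matching of long options. A minimal sketch; the option table is a hypothetical re-creation, not the agent's source.

```python
import optparse

# Hypothetical re-creation of the option table printed above by
# "hosted-engine --clean-metadata --force-cleanup --help"; the real
# ovirt-ha-agent parser may be defined differently.
parser = optparse.OptionParser(usage="%prog [options]", prog="ovirt-ha-agent")
parser.add_option("--no-daemon", action="store_true", default=False,
                  help="don't start as a daemon")
parser.add_option("--pdb", action="store_true", default=False,
                  help="start pdb in case of crash")
parser.add_option("--cleanup", action="store_true", default=False,
                  help="purge the metadata block")
parser.add_option("--force-cleanup", action="store_true", default=False,
                  help="purge the metadata block even when not clean")
parser.add_option("--host-id", type="int", help="override the host id")

# optparse accepts any unambiguous prefix of a long option, so
# "--force-clean" resolves to "--force-cleanup" here.
opts, _ = parser.parse_args(["--cleanup", "--force-clean", "--host-id=1"])
print(opts.cleanup, opts.force_cleanup, opts.host_id)
```

Run as-is, this prints `True True 1`. A prefix that matched two options (for example `--c` if another option started with `--c`) would instead make optparse exit with an "ambiguous option" error.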
I also asked for this to be documented in the official Red Hat documentation, not only in the hosted-engine CLI help.
> Correct, but it does not mentions anything regarding shutting down processes
> i.e. for this command to work properly you must turn off ovirt-ha-agent service

But the help does indeed say this: Available only in properly deployed cluster with __properly_stopped_agent__.

> And also I've asked about this to be documented in Red Hat official
> documentation and not only in hosted-engine help in CLI.

For that you need to file a proper documentation bug, and yes, we should have that.
This bug had the requires_doc_text flag set, yet no documentation text was provided. Please add the documentation text and only then set this flag.
We won't be pursuing this issue any more. If someone needs it, patches are welcome.