Bug 1350539 - When the HA service is down and the user runs "hosted-engine --clean-metadata", an error should appear, since the service is required for the cleanup.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 2.0.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Martin Sivák
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-06-27 17:35 UTC by Nikolai Sednev
Modified: 2017-07-26 08:45 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-07-26 08:45:27 UTC
oVirt Team: SLA
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
pstehlik: testing_ack+


Attachments
sosreport-nsednev-he-1.qa.lab.tlv.redhat.com-20160627133738.tar.xz (8.36 MB, application/x-xz)
2016-06-27 17:40 UTC, Nikolai Sednev
sosreport from host alma03 (7.00 MB, application/x-xz)
2016-06-27 17:42 UTC, Nikolai Sednev

Description Nikolai Sednev 2016-06-27 17:35:07 UTC
Description of problem:
Failed to clean metadata "hosted-engine --clean-metadata".

INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200 
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Started VDSM domain monitor for 29d459ea-989d-4127-b996-248928adf543
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to start monitoring domain (sd_uuid=29d459ea-989d-4127-b996-248928adf543, host_id=1): timeout during domain acquisition
None
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Failed to start monitoring domain (sd_uuid=29d459ea-989d-4127-b996-248928adf543, host_id=1): timeout during domain acquisition' - trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down


Version-Release number of selected component (if applicable):
mom-0.5.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.3-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.0.2-1.el7ev.noarch
ovirt-vmconsole-1.0.3-1.el7ev.noarch
rhev-release-4.0.0-19-001.noarch
vdsm-4.18.4-2.el7ev.x86_64
ovirt-host-deploy-1.5.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)


How reproducible:
100%

Steps to Reproduce:
1. Deploy a pair of hosted-engine hosts over NFS.
2. In the web UI, put one of the hosts into maintenance and then remove it.
3. On the host that was removed in the web UI, run "hosted-engine --clean-metadata" from the CLI.

Actual results:
Command failed with errors.

Expected results:
Command should succeed.

Additional info:
Sosreport from host is attached.
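
The agent output above is dominated by repeated status polls, which makes the actual failure easy to miss. A minimal sketch for condensing such output, assuming only the "LEVEL:logger:message" layout visible in the paste; the helper names condense and errors are hypothetical and are not part of ovirt-hosted-engine-ha:

```python
import re
from itertools import groupby

# Matches the "LEVEL:logger.name:message" layout seen in the agent output.
LINE_RE = re.compile(r"^(INFO|WARNING|ERROR):([\w.]+):(.*)$")


def condense(log_text):
    """Collapse runs of identical consecutive lines into (line, count) pairs."""
    lines = [ln.rstrip() for ln in log_text.splitlines() if ln.strip()]
    return [(line, len(list(run))) for line, run in groupby(lines)]


def errors(log_text):
    """Return (logger, message) for every ERROR line."""
    found = []
    for ln in log_text.splitlines():
        m = LINE_RE.match(ln.strip())
        if m and m.group(1) == "ERROR":
            found.append((m.group(2), m.group(3)))
    return found


# A few lines copied from the output in the description.
sample = """\
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
"""

for line, count in condense(sample):
    print(f"{count:>3}x {line}")
```

Running condense over the full paste would reduce the long run of "VDSM domain monitor status: NONE" lines to a single entry with a count, and errors would list only the ERROR lines.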

Comment 1 Nikolai Sednev 2016-06-27 17:40:47 UTC
Created attachment 1173050 [details]
sosreport-nsednev-he-1.qa.lab.tlv.redhat.com-20160627133738.tar.xz

Comment 2 Nikolai Sednev 2016-06-27 17:42:37 UTC
Created attachment 1173051 [details]
sosreport from host alma03

Comment 3 Yedidyah Bar David 2016-06-28 06:51:14 UTC
Please try:

systemctl stop ovirt-ha-agent.service

hosted-engine --clean-metadata

Comment 4 Nikolai Sednev 2016-06-28 08:22:10 UTC
(In reply to Yedidyah Bar David from comment #3)
> Please try:
> 
> systemctl stop ovirt-ha-agent.service
> 
> hosted-engine --clean-metadata

Has this requirement been documented? I mean shutting down ovirt-ha-agent before running the "hosted-engine --clean-metadata" command. If it's required, I'd expect the command itself to do this automatically.
By the way, in order to shut down the agent I have to shut down the broker first, because the broker checks whether the agent is running and starts it if it is not.

[nsednev@nsednev ~]$ ssh root.lab.tlv.redhat.com 
root.lab.tlv.redhat.com's password: 
Last login: Mon Jun 27 20:03:21 2016 from dhcp-4-96.tlv.redhat.com
[root@alma03 ~]# systemctl stop ovirt-ha-broker  &&  systemctl stop ovirt-ha-agent
[root@alma03 ~]# hosted-engine --clean-metadata
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200 
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: NONE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Started VDSM domain monitor for 29d459ea-989d-4127-b996-248928adf543
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker: [Errno 2] No such file or directory
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Retrying broker connection in '5' seconds
ERROR:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Failed to connect to broker, the number of errors has exceeded the limit (10)
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to connect to ha-broker: Failed to connect to broker, the number of errors has exceeded the limit (10)
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Can't initialize brokerlink 'Failed to connect to broker, the number of errors has exceeded the limit (10)' - reinitializing
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down



I also tried with the broker running and the agent stopped:
[root@alma03 ~]# systemctl status ovirt-ha-broker -l  &&  systemctl status ovirt-ha-agent -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-06-28 11:15:19 IDT; 29s ago
 Main PID: 95503 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─95503 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load-no-engine
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-free
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor cpu-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mgmt-bridge
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Loaded submonitor mem-load
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.monitor.Monitor:Finished loading submonitors
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.listener.Listener:Initializing SocketServer
Jun 28 11:15:20 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[95503]: INFO:ovirt_hosted_engine_ha.broker.listener.Listener:SocketServer ready
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Tue 2016-06-28 11:12:37 IDT; 3min 11s ago
 Main PID: 50147 (code=killed, signal=KILL)

Jun 28 11:12:27 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Can't initialize brokerlink 'Failed to connect to broker, the number of errors has exceeded the limit (10)' - reinitializing
Jun 28 11:12:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '2'
Jun 28 11:12:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
Jun 28 11:12:32 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[50147]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service stop-sigterm timed out. Killing.
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service: main process exited, code=killed, status=9/KILL
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: Unit ovirt-ha-agent.service entered failed state.
Jun 28 11:12:37 alma03.qa.lab.tlv.redhat.com systemd[1]: ovirt-ha-agent.service failed.
Jun 28 11:15:19 alma03.qa.lab.tlv.redhat.com systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
[root@alma03 ~]# hosted-engine --clean-metadata
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200 
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor ping, options {'addr': '10.35.117.254'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29998480
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29662224
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29936912
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217663319760
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462031248
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462077200
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Broker initialized, all submonitors started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/29d459ea-989d-4127-b996-248928adf543/2bc81a5d-908c-4a9b-83ec-e64f564d8255/6888c1dd-9d57-4fdd-ba3c-8375e9b45073)
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Cannot clean unclean metadata block. Consider --force-clean.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

Eventually I ran it with --force-clean:

[root@alma03 ~]# hosted-engine --clean-metadata --force-clean
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha agent 2.0.0 started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: alma03.qa.lab.tlv.redhat.com
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:2358368b-5a76-4302-8749-dec82cc198f6, volUUID:ed9e8a17-e932-4973-aeee-b71694692d79
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:64424b6c-072c-4c25-9453-6312d3e1dfe9, volUUID:a2aca504-bc50-4c58-a7ef-09f0c9286200
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__HE__1/29d459ea-989d-4127-b996-248928adf543/images/64424b6c-072c-4c25-9453-6312d3e1dfe9/a2aca504-bc50-4c58-a7ef-09f0c9286200 
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing ha-broker connection
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor ping, options {'addr': '10.35.117.254'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29936784
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 29998800
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 30016784
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462030544
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '280c4195-08ee-4385-a031-6288702a6aad', 'address': '0'}
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462076496
INFO:ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink:Success, id 140217462075728
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Broker initialized, all submonitors started
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/29d459ea-989d-4127-b996-248928adf543/2bc81a5d-908c-4a9b-83ec-e64f564d8255/6888c1dd-9d57-4fdd-ba3c-8375e9b45073)
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Acquired lock on host id 1
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Cleaning the metadata block!
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
[root@alma03 ~]# 

To summarize, the command worked only after stopping the agent while leaving the broker running, and there is no warning anywhere about this. Even then I still had to run "hosted-engine --clean-metadata --force-clean"; without "--force-clean" the command failed. Here is the output from the second host, which shows that the metadata is now cleared.

[root@alma04 ~]# hosted-engine --vm-status


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6813f0fb
Host timestamp                     : 59836
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=59836 (Tue Jun 28 11:18:27 2016)
        host-id=2
        score=3400
        maintenance=False
        state=EngineUp
        stopped=False
[root@alma04 ~]#
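
To confirm programmatically that a host's metadata is gone, the status output can be parsed and checked for the host id. A minimal sketch, assuming the "--== Host N status ==--" headers and "key : value" lines shown above; the function name parse_vm_status is hypothetical, and the layout may differ between versions:

```python
import json


def parse_vm_status(text):
    """Parse 'hosted-engine --vm-status' style output into {host_id: fields}."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("--== Host") and line.endswith("==--"):
            # Section header, e.g. "--== Host 2 status ==--".
            current = {}
            hosts[int(line.split()[2])] = current
        elif current is not None and " : " in line:
            # Split on the first " : " only, so JSON values stay intact.
            key, _, value = line.partition(" : ")
            current[key.strip()] = value.strip()
    return hosts


# Lines copied from the alma04 output above.
sample = """\
--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
"""

status = parse_vm_status(sample)
print("hosts with metadata:", sorted(status))
print("engine VM state:", json.loads(status[2]["Engine status"])["vm"])
```

Host id 1 no longer appearing among the parsed sections is what indicates that the cleanup took effect.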

Comment 5 Yedidyah Bar David 2016-06-28 08:40:00 UTC
I don't know the details of this functionality well, so I'll let Martin handle this.

I'll just note that in the past (3.6 I think), this worked for me:

1. Remove host1 from engine
2. Shutdown host1 (presumably for reprovisioning)
3. On host2:
systemctl stop ovirt-ha-agent.service
hosted-engine --clean-metadata --host-id=1

I didn't have to use force, but I did have to have the broker up.

IIRC at some point Martin said that shutting down the agent should not be needed, but iirc this never worked for me without shutting it down first.

Comment 7 Martin Sivák 2016-06-28 13:27:44 UTC
The broker has to be up for --clean-metadata to work properly. The agent from the host you want to clean has to be stopped cleanly (service stop), --force must be used if it was not.

The agent on the host from where you perform the clean has to be down. It can be the same host you are removing or another host, but then the --host-id= argument is required.
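
The rules in this comment can be written down as a small pre-flight check. This is only a sketch under stated assumptions: the function name clean_metadata_command and its parameters are hypothetical, it builds the command line without running anything, and it uses the "--force-clean" spelling that was actually run in comment 4 (the help text quoted later in this bug lists "--force-cleanup").

```python
def clean_metadata_command(broker_running, local_agent_running,
                           target_stopped_cleanly, target_host_id=None):
    """Return (argv, problems) for a 'hosted-engine --clean-metadata' run.

    problems lists unmet preconditions on the host where the command runs;
    argv is the command line to use once they are resolved.
    """
    problems = []
    if not broker_running:
        # The broker has to be up for the cleanup to work.
        problems.append("ovirt-ha-broker must be running")
    if local_agent_running:
        # The agent on the host performing the cleanup has to be down.
        problems.append("stop ovirt-ha-agent on this host first")

    argv = ["hosted-engine", "--clean-metadata"]
    if target_host_id is not None:
        # Needed when cleaning another host's metadata from this host.
        argv.append(f"--host-id={target_host_id}")
    if not target_stopped_cleanly:
        # The target's agent was not stopped cleanly, so force is required.
        argv.append("--force-clean")
    return argv, problems


argv, problems = clean_metadata_command(
    broker_running=True, local_agent_running=False,
    target_stopped_cleanly=False, target_host_id=1)
print(" ".join(argv))
print(problems)
```

Keeping the check separate from execution means the preconditions can be reported to the user before any metadata is touched, which is the kind of warning this bug asks for.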

Comment 8 Nikolai Sednev 2016-06-28 13:39:02 UTC
(In reply to Martin Sivák from comment #7)
> The broker has to be up for --clean-metadata to work properly. The agent
> from the host you want to clean has to be stopped cleanly (service stop),
> --force must be used if it was not.
> 
> The agent on the host from where you perform the clean has to be down. It
> can be the same host you are removing or another host, but then the
> --host-id= argument is required.

Can you provide the documentation link for this sequence? Is this documented? I don't see this requirement in CLI help for HE, aka hosted-engine --help.

Comment 9 Martin Sivák 2016-06-30 09:10:19 UTC
> Can you provide the documentation link for this sequence? Is this
> documented? I don't see this requirement in CLI help for HE, aka
> hosted-engine --help.

And did you try hosted-engine --clean-lockspace --help as the main help and man page is telling you?

"For additional information about a specific command try: hosted-engine <command> --help"

This is then printed out:

Usage: $0 --clean_metadata [--force-cleanup] [--host-id=<id>]
    Remove host's metadata from the global status database.
    Available only in properly deployed cluster with properly stopped
    agent.

    --force-cleanup  This option overrides the safety checks. Use at your own
                     risk DANGEROUS.

    --host-id=<id>  Specify an explicit host id to clean

Comment 10 Nikolai Sednev 2016-06-30 10:39:44 UTC
(In reply to Martin Sivák from comment #9)
> > Can you provide the documentation link for this sequence? Is this
> > documented? I don't see this requirement in CLI help for HE, aka
> > hosted-engine --help.
> 
> And did you try hosted-engine --clean-lockspace --help as the main help and
> man page is telling you?
> 
> "For additional information about a specific command try: hosted-engine
> <command> --help"
> 
> This is then printed out:
> 
> Usage: $0 --clean_metadata [--force-cleanup] [--host-id=<id>]
>     Remove host's metadata from the global status database.
>     Available only in properly deployed cluster with properly stopped
>     agent.
> 
>     --force-cleanup  This option overrides the safety checks. Use at your own
>                      risk DANGEROUS.
> 
>     --host-id=<id>  Specify an explicit host id to clean

Correct, but it does not mention anything regarding shutting down processes, i.e. for this command to work properly you must turn off the ovirt-ha-agent service, but make sure that ovirt-ha-broker is still running. I was referring to this kind of help message, not a generic help message about the usage of the commands and their syntax.

[root@alma04 ~]#  hosted-engine --clean-metadata --help
Usage: /usr/sbin/hosted-engine --clean_metadata [--force-cleanup] [--host-id=<id>]
    Remove host's metadata from the global status database.
    Available only in properly deployed cluster with properly stopped
    agent.

    --force-cleanup  This option overrides the safety checks. Use at your own
                     risk DANGEROUS.

    --host-id=<id>  Specify an explicit host id to clean
[root@alma04 ~]#  hosted-engine --clean-metadata --force-cleanup --help
Usage: ovirt-ha-agent [options]

Options:
  --version          show program's version number and exit
  -h, --help         show this help message and exit
  --no-daemon        don't start as a daemon
  --pdb              start pdb in case of crash
  --cleanup          purge the metadata block
  --force-cleanup    purge the metadata blockeven when not clean
  --host-id=HOST_ID  override the host id

Comment 11 Nikolai Sednev 2016-06-30 10:41:05 UTC
And also I've asked about this to be documented in Red Hat official documentation and not only in hosted-engine help in CLI.

Comment 12 Martin Sivák 2016-06-30 11:20:30 UTC
> Correct, but it does not mentions anything regarding shutting down processes 
> i.e. for this command to work properly you must turn off ovirt-ha-agent service

But the help does indeed say this:

Available only in properly deployed cluster with __properly_stopped_agent__.


> And also I've asked about this to be documented in Red Hat official
> documentation and not only in hosted-engine help in CLI.

For that you need to file a proper documentation bug and yes, we should have that.

Comment 13 Yaniv Lavi 2016-12-14 16:22:17 UTC
This bug had requires_doc_text flag, yet no documentation text was provided. Please add the documentation text and only then set this flag.

Comment 14 Doron Fediuck 2017-07-26 08:45:27 UTC
We won't be pursuing this issue any more. If someone needs it, patches are welcome.

