Description of problem:
ovirt-ansible-hosted-engine-setup does not clear a leftover local maintenance mode from a previous deployment on the same host. Local maintenance mode is just a flag in /var/lib/ovirt-hosted-engine-ha/ha.conf; nothing clears it when a second deployment is attempted on a host that was put into local maintenance mode in the past.

Version-Release number of selected component (if applicable):
1.0.15

How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on a single host
2. Set the host in local maintenance mode (or in maintenance mode from the engine)
3. Clean the host with ovirt-hosted-engine-cleanup
4. Clean the shared storage
5. Try a new deployment
(a command sketch for these steps is given under Additional info below)

Actual results:
The deployment fails with:

2019-04-04 15:18:57,572+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Start ovirt-ha-agent service on the host]
2019-04-04 15:18:58,675+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2019-04-04 15:18:59,377+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Check engine VM health]
2019-04-04 15:29:44,689+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': [u'hosted-engine', u'--vm-status', u'--json'], u'end': u'2019-04-04 15:29:44.534880', u'_ansible_no_log': False, u'stdout': u'{"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr 4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr 4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n", "hostname": "rhsqa-grafton1.lab.eng.blr.redhat.com", "host-id": 1, "engine-status": {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}, "score": 0, "stopped": false, "maintenance": true, "crc32": "6c111ec8", "local_conf_timestamp": 176468, "host-ts": 176468}, "global_maintenance": false}', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'hosted-engine --vm-status --json', u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2019-04-04 15:29:44.331736', u'attempts': 120, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.203144', u'stdout_lines': [u'{"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr 4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr 4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n", "hostname": "rhsqa-grafton1.lab.eng.blr.redhat.com", "host-id": 1, "engine-status": {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}, "score": 0, "stopped": false, "maintenance": true, "crc32": "6c111ec8", "local_conf_timestamp": 176468, "host-ts": 176468}, "global_maintenance": false}']}
2019-04-04 15:29:44,791+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.203144", "end": "2019-04-04 15:29:44.534880", "rc": 0, "start": "2019-04-04 15:29:44.331736", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr 4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr 4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton1.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"vm not running on this host\", \"health\": \"bad\", \"vm\": \"down\", \"detail\": \"unknown\"}, \"score\": 0, \"stopped\": false, \"maintenance\": true, \"crc32\": \"6c111ec8\", \"local_conf_timestamp\": 176468, \"host-ts\": 176468}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr 4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr 4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton1.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"vm not running on this host\", \"health\": \"bad\", \"vm\": \"down\", \"detail\": \"unknown\"}, \"score\": 0, \"stopped\": false, \"maintenance\": true, \"crc32\": \"6c111ec8\", \"local_conf_timestamp\": 176468, \"host-ts\": 176468}, \"global_maintenance\": false}"]}
2019-04-04 15:29:45,393+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Check VM status at virt level]
2019-04-04 15:29:46,195+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u'virsh -r list | grep HostedEngine | grep running', u'end': u'2019-04-04 15:29:46.063014', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'start': u'2019-04-04 15:29:45.992117', u'delta': u'0:00:00.070897', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u'virsh -r list | grep HostedEngine | grep running', u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2019-04-04 15:29:46,295+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 ignored: [localhost]: FAILED! => {"changed": true, "cmd": "virsh -r list | grep HostedEngine | grep running", "delta": "0:00:00.070897", "end": "2019-04-04 15:29:46.063014", "msg": "non-zero return code", "rc": 1, "start": "2019-04-04 15:29:45.992117", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2019-04-04 15:29:46,896+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 TASK [ovirt.hosted_engine_setup : debug]
2019-04-04 15:29:47,498+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 vm_status_virsh: {'stderr_lines': [], u'changed': True, u'end': u'2019-04-04 15:29:46.063014', u'stdout': u'', u'cmd': u'virsh -r list | grep HostedEngine | grep running', u'failed': True, u'delta': u'0:00:00.070897', u'stderr': u'', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], u'start': u'2019-04-04 15:29:45.992117'}
2019-04-04 15:29:48,099+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Fail if engine VM is not running]
2019-04-04 15:29:48,700+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'msg': u'Engine VM is not running, please check vdsm logs', u'changed': False, u'_ansible_no_log': False}
2019-04-04 15:29:48,800+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM is not running, please check vdsm logs"}
2019-04-04 15:29:49,301+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 PLAY RECAP [localhost] : ok: 116 changed: 44 unreachable: 0 skipped: 12 failed: 2

In /var/log/ovirt-hosted-engine-ha/agent.log:

MainThread::INFO::2019-04-04 15:18:58,633::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.1 started
MainThread::INFO::2019-04-04 15:18:58,663::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: rhsqa-grafton1.lab.eng.blr.redhat.com
MainThread::INFO::2019-04-04 15:18:58,767::hosted_engine::524::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2019-04-04 15:18:58,768::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '10.70.37.254'}
MainThread::INFO::2019-04-04 15:18:58,770::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id ping
MainThread::INFO::2019-04-04 15:18:58,770::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,774::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id mgmt-bridge
MainThread::INFO::2019-04-04 15:18:58,774::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,776::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id mem-free
MainThread::INFO::2019-04-04 15:18:58,776::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '45318db2-6385-4999-a386-a7589d0685d9', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,777::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id cpu-load-no-engine
MainThread::INFO::2019-04-04 15:18:58,777::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '45318db2-6385-4999-a386-a7589d0685d9', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,779::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id engine-health
MainThread::INFO::2019-04-04 15:18:58,779::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor storage-domain, options {'sd_uuid': '4d7cd59c-2621-4460-9087-5375c0d64075'}
MainThread::INFO::2019-04-04 15:18:58,781::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id storage-domain
MainThread::INFO::2019-04-04 15:18:58,781::hosted_engine::546::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2019-04-04 15:19:20,046::upgrade::979::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Host configuration is already up-to-date
MainThread::INFO::2019-04-04 15:19:20,205::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:1a21814e-8e51-4798-9ef2-8e018f397960, volUUID:3f231cdf-39fe-4ae4-b7c7-00870aaf02d8
MainThread::INFO::2019-04-04 15:19:20,450::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:d786c8f3-eaf5-498d-b596-2fe5b4556425, volUUID:22b5116c-85c3-4dc8-a9dd-6fcdd42df080
MainThread::INFO::2019-04-04 15:19:20,712::ovf_store::151::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4d7cd59c-2621-4460-9087-5375c0d64075/d786c8f3-eaf5-498d-b596-2fe5b4556425/22b5116c-85c3-4dc8-a9dd-6fcdd42df080
MainThread::INFO::2019-04-04 15:19:20,729::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {}
MainThread::INFO::2019-04-04 15:19:20,729::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 249172.0, 'maintenance': True, 'cpu-load': 0.0103, 'gateway': 1.0, 'storage-domain': True}
MainThread::INFO::2019-04-04 15:19:20,897::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? sent
MainThread::INFO::2019-04-04 15:19:21,038::hosted_engine::615::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2019-04-04 15:19:21,038::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state ReinitializeFSM (score: 0)
MainThread::INFO::2019-04-04 15:19:31,048::state_decorators::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected
MainThread::INFO::2019-04-04 15:19:31,079::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-LocalMaintenance) sent? sent
MainThread::INFO::2019-04-04 15:19:31,208::hosted_engine::615::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2019-04-04 15:19:31,209::states::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to local maintenance mode
MainThread::INFO::2019-04-04 15:19:31,209::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenance (score: 0)

Expected results:
ovirt-ha-agent correctly starts the engine VM

Additional info:
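The steps above map roughly onto this command sequence (a sketch only; step 4 is backend-specific and is left as a comment):

  hosted-engine --deploy                         # 1. deploy HE on a single host
  hosted-engine --set-maintenance --mode=local   # 2. put the host into local maintenance
  ovirt-hosted-engine-cleanup                    # 3. clean the host
  # 4. clean the shared storage (backend-specific, e.g. empty the NFS export)
  hosted-engine --deploy                         # 5. retry the deployment

As a manual cross-check / workaround on such a dirty host (a sketch, not the official fix; it assumes the leftover ha.conf flag is the only blocker, and the local_maintenance key name is an assumption):

  # inspect the flag left over from the previous deployment
  grep -i maintenance /var/lib/ovirt-hosted-engine-ha/ha.conf

  # with ovirt-ha-agent and ovirt-ha-broker running, clear it the supported way
  hosted-engine --set-maintenance --mode=none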
Setting severity to low since this happens only when redeploying on a dirty host.
Works for me on these components:
ovirt-hosted-engine-setup-2.3.7-1.el7ev.noarch
ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.6 (Maipo)
Linux 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Moving to verified.
This bugzilla is included in the oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.