Bug 1696229 - ovirt-ansible-hosted-engine-setup is not clearing a leftover local maintenance mode from a previous deployment on the same host
Summary: ovirt-ansible-hosted-engine-setup is not clearing a leftover local maintenanc...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-ansible-collection
Classification: oVirt
Component: hosted-engine-setup
Version: 1.1.10
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: ovirt-4.3.3
: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
Tahlia Richardson
URL:
Whiteboard:
Depends On:
Blocks: 1696228
TreeView+ depends on / blocked
 
Reported: 2019-04-04 11:35 UTC by Simone Tiraboschi
Modified: 2019-04-16 13:58 UTC (History)
1 user (show)

Fixed In Version: ovirt-ansible-hosted-engine-setup-1.0.16
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:23 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: ovirt-4.3?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
sbonazzo: testing_ack?


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-ansible-hosted-engine-setup pull 168 0 'None' closed Ensure we are out of maintenance mode 2020-10-31 13:00:13 UTC

Description Simone Tiraboschi 2019-04-04 11:35:35 UTC
Description of problem:
ovirt-ansible-hosted-engine-setup is not clearing a leftover local maintenance mode from a previous deployment on the same host.

local maintenance mode is just a flag on /var/lib/ovirt-hosted-engine-ha/ha.conf; nothing is probably clearing it trying a second deployment on an host that has been set in local maintenance mode in the past.

Version-Release number of selected component (if applicable):
1.0.15

How reproducible:
100%

Steps to Reproduce:
1. deploy HE on a single host
2. set the host in local maintenance mode (or in maintenance mode from the engine)
3. clean the host with ovirt-hosted-engine-cleanup
4. clean the shared storage
5. try a new deployment

Actual results:
the deployment fails with:
2019-04-04 15:18:57,572+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Start ovirt-ha-agent service on the host]
2019-04-04 15:18:58,675+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2019-04-04 15:18:59,377+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Check engine VM health]
2019-04-04 15:29:44,689+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': [u'hosted-engine', u'--vm-status', u'--json'], u'end': u'2019-04-04 15:29:44.534880', u'_ansible_no_log': False, u'stdout': u'{"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr  4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr  4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n", "hostname": "rhsqa-grafton1.lab.eng.blr.redhat.com", "host-id": 1, "engine-status": {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}, "score": 0, "stopped": false, "maintenance": true, "crc32": "6c111ec8", "local_conf_timestamp": 176468, "host-ts": 176468}, "global_maintenance": false}', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'hosted-engine --vm-status --json', u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2019-04-04 15:29:44.331736', u'attempts': 120, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.203144', u'stdout_lines': [u'{"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr  4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr  4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n", "hostname": "rhsqa-grafton1.lab.eng.blr.redhat.com", "host-id": 1, "engine-status": {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}, "score": 0, "stopped": false, "maintenance": true, "crc32": "6c111ec8", "local_conf_timestamp": 176468, "host-ts": 176468}, "global_maintenance": false}']}
2019-04-04 15:29:44,791+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.203144", "end": "2019-04-04 15:29:44.534880", "rc": 0, "start": "2019-04-04 15:29:44.331736", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr  4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr  4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton1.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"vm not running on this host\", \"health\": \"bad\", \"vm\": \"down\", \"detail\": \"unknown\"}, \"score\": 0, \"stopped\": false, \"maintenance\": true, \"crc32\": \"6c111ec8\", \"local_conf_timestamp\": 176468, \"host-ts\": 176468}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=176468 (Thu Apr  4 15:29:42 2019)\\nhost-id=1\\nscore=0\\nvm_conf_refresh_time=176468 (Thu Apr  4 15:29:43 2019)\\nconf_on_shared_storage=True\\nmaintenance=True\\nstate=LocalMaintenance\\nstopped=False\\n\", \"hostname\": \"rhsqa-grafton1.lab.eng.blr.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"vm not running on this host\", \"health\": \"bad\", \"vm\": \"down\", \"detail\": \"unknown\"}, \"score\": 0, \"stopped\": false, \"maintenance\": true, \"crc32\": \"6c111ec8\", \"local_conf_timestamp\": 176468, \"host-ts\": 176468}, \"global_maintenance\": false}"]}
2019-04-04 15:29:45,393+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Check VM status at virt level]
2019-04-04 15:29:46,195+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': u'virsh -r list | grep HostedEngine | grep running', u'end': u'2019-04-04 15:29:46.063014', u'_ansible_no_log': False, u'stdout': u'', u'changed': True, u'start': u'2019-04-04 15:29:45.992117', u'delta': u'0:00:00.070897', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u'virsh -r list | grep HostedEngine | grep running', u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}
2019-04-04 15:29:46,295+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 ignored: [localhost]: FAILED! => {"changed": true, "cmd": "virsh -r list | grep HostedEngine | grep running", "delta": "0:00:00.070897", "end": "2019-04-04 15:29:46.063014", "msg": "non-zero return code", "rc": 1, "start": "2019-04-04 15:29:45.992117", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2019-04-04 15:29:46,896+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 TASK [ovirt.hosted_engine_setup : debug]
2019-04-04 15:29:47,498+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 vm_status_virsh: {'stderr_lines': [], u'changed': True, u'end': u'2019-04-04 15:29:46.063014', u'stdout': u'', u'cmd': u'virsh -r list | grep HostedEngine | grep running', u'failed': True, u'delta': u'0:00:00.070897', u'stderr': u'', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], u'start': u'2019-04-04 15:29:45.992117'}
2019-04-04 15:29:48,099+0530 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Fail if engine VM is not running]
2019-04-04 15:29:48,700+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'msg': u'Engine VM is not running, please check vdsm logs', u'changed': False, u'_ansible_no_log': False}
2019-04-04 15:29:48,800+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM is not running, please check vdsm logs"}
2019-04-04 15:29:49,301+0530 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 PLAY RECAP [localhost] : ok: 116 changed: 44 unreachable: 0 skipped: 12 failed: 2


in /var/log/ovirt-hosted-engine-ha/agent.log
MainThread::INFO::2019-04-04 15:18:58,633::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.1 started
MainThread::INFO::2019-04-04 15:18:58,663::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: rhsqa-grafton1.lab.eng.blr.redhat.com
MainThread::INFO::2019-04-04 15:18:58,767::hosted_engine::524::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2019-04-04 15:18:58,768::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '10.70.37.254'}
MainThread::INFO::2019-04-04 15:18:58,770::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id ping
MainThread::INFO::2019-04-04 15:18:58,770::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 'ovirtmgmt', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,774::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id mgmt-bridge
MainThread::INFO::2019-04-04 15:18:58,774::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,776::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id mem-free
MainThread::INFO::2019-04-04 15:18:58,776::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': '45318db2-6385-4999-a386-a7589d0685d9', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,777::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id cpu-load-no-engine
MainThread::INFO::2019-04-04 15:18:58,777::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': '45318db2-6385-4999-a386-a7589d0685d9', 'address': '0'}
MainThread::INFO::2019-04-04 15:18:58,779::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id engine-health
MainThread::INFO::2019-04-04 15:18:58,779::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor storage-domain, options {'sd_uuid': '4d7cd59c-2621-4460-9087-5375c0d64075'}
MainThread::INFO::2019-04-04 15:18:58,781::brokerlink::89::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Success, id storage-domain
MainThread::INFO::2019-04-04 15:18:58,781::hosted_engine::546::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Broker initialized, all submonitors started
MainThread::INFO::2019-04-04 15:19:20,046::upgrade::979::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Host configuration is already up-to-date
MainThread::INFO::2019-04-04 15:19:20,205::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:1a21814e-8e51-4798-9ef2-8e018f397960, volUUID:3f231cdf-39fe-4ae4-b7c7-00870aaf02d8
MainThread::INFO::2019-04-04 15:19:20,450::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:d786c8f3-eaf5-498d-b596-2fe5b4556425, volUUID:22b5116c-85c3-4dc8-a9dd-6fcdd42df080
MainThread::INFO::2019-04-04 15:19:20,712::ovf_store::151::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /var/run/vdsm/storage/4d7cd59c-2621-4460-9087-5375c0d64075/d786c8f3-eaf5-498d-b596-2fe5b4556425/22b5116c-85c3-4dc8-a9dd-6fcdd42df080 
MainThread::INFO::2019-04-04 15:19:20,729::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {}
MainThread::INFO::2019-04-04 15:19:20,729::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'bridge': True, 'mem-free': 249172.0, 'maintenance': True, 'cpu-load': 0.0103, 'gateway': 1.0, 'storage-domain': True}
MainThread::INFO::2019-04-04 15:19:20,897::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (StartState-ReinitializeFSM) sent? sent
MainThread::INFO::2019-04-04 15:19:21,038::hosted_engine::615::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2019-04-04 15:19:21,038::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state ReinitializeFSM (score: 0)
MainThread::INFO::2019-04-04 15:19:31,048::state_decorators::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected
MainThread::INFO::2019-04-04 15:19:31,079::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-LocalMaintenance) sent? sent
MainThread::INFO::2019-04-04 15:19:31,208::hosted_engine::615::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2019-04-04 15:19:31,209::states::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to local maintenance mode
MainThread::INFO::2019-04-04 15:19:31,209::hosted_engine::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenance (score: 0)


Expected results:
ovirt-ha-agent correctly starts the engine VM

Additional info:

Comment 1 Simone Tiraboschi 2019-04-04 11:36:09 UTC
Setting severity to low since it happens only on redeployment on dirty hosts.

Comment 2 Nikolai Sednev 2019-04-14 08:34:32 UTC
Works for me on these components:
ovirt-hosted-engine-setup-2.3.7-1.el7ev.noarch
ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.6 (Maipo)
Linux 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Moving to verified.

Comment 3 Sandro Bonazzola 2019-04-16 13:58:23 UTC
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.