Bug 1629454
| Summary: | Host stuck in preparing to maintenance after agent restart | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-hosted-engine-ha | Reporter: | Liran Rotenberg <lrotenbe> |
| Component: | Agent | Assignee: | Simone Tiraboschi <stirabos> |
| Status: | CLOSED DUPLICATE | QA Contact: | Liran Rotenberg <lrotenbe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.2.14 | CC: | bugs, lrotenbe, michal.skrivanek |
| Target Milestone: | ovirt-4.3.2 | Keywords: | Automation |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.3+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-02-20 08:49:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
Liran Rotenberg
2018-09-16 13:28:06 UTC
At 2018-09-06 01:33:11,457, host 2 triggered an EngineUp-LocalMaintenanceMigrateVm transition, but the VM did not migrate.
MainThread::INFO::2018-09-06 01:33:00,909::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)
MainThread::INFO::2018-09-06 01:33:10,935::state_decorators::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected
MainThread::INFO::2018-09-06 01:33:11,457::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUp-LocalMaintenanceMigrateVm) sent? sent
MainThread::WARNING::2018-09-06 01:33:11,879::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor_if_possible) The VM is running locally or we have no data, keeping the domain monitor.
MainThread::INFO::2018-09-06 01:33:11,880::states::243::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to local maintenance mode
MainThread::INFO::2018-09-06 01:33:11,880::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenanceMigrateVm (score: 0)
MainThread::INFO::2018-09-06 01:33:11,981::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (LocalMaintenanceMigrateVm-ReinitializeFSM) sent? sent
MainThread::WARNING::2018-09-06 01:33:11,982::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor_if_possible) The VM is running locally or we have no data, keeping the domain monitor.
MainThread::INFO::2018-09-06 01:33:11,982::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state ReinitializeFSM (score: 0)
MainThread::INFO::2018-09-06 01:33:22,007::state_decorators::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected
MainThread::INFO::2018-09-06 01:33:22,191::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-LocalMaintenance) sent? sent
MainThread::WARNING::2018-09-06 01:33:22,596::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor_if_possible) The VM is running locally or we have no data, keeping the domain monitor.
MainThread::INFO::2018-09-06 01:33:22,597::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenance (score: 0)
MainThread::INFO::2018-09-06 01:33:31,619::state_decorators::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected
MainThread::WARNING::2018-09-06 01:33:32,044::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor_if_possible) The VM is running locally or we have no data, keeping the domain monitor.
MainThread::INFO::2018-09-06 01:33:32,045::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenance (score: 0)
MainThread::INFO::2018-09-06 01:33:42,072::state_decorators::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected
MainThread::WARNING::2018-09-06 01:33:42,505::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor_if_possible) The VM is running locally or we have no data, keeping the domain monitor.
MainThread::INFO::2018-09-06 01:33:42,505::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenance (score: 0)
MainThread::INFO::2018-09-06 01:33:51,527::state_decorators::128::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected
MainThread::WARNING::2018-09-06 01:33:51,953::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor_if_possible) The VM is running locally or we have no data, keeping the domain monitor.
MainThread::INFO::2018-09-06 01:33:51,954::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenance (score: 0)
MainThread::INFO::2018-09-06 01:34:01,980::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2018-09-06 01:34:01,981::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host lynx01.lab.eng.tlv2.redhat.com (id 1): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=136 (Thu Sep 6 01:33:45 2018)\nhost-id=1\nscore=0\nvm_conf_refresh_time=138 (Thu Sep 6 01:33:46 2018)\nconf_on_shared_storage=True\nmaintenance=False\nstate=ReinitializeFSM\nstopped=False\n', 'hostname': 'lynx01.lab.eng.tlv2.redhat.com', 'alive': True, 'host-id': 1, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 0, 'stopped': False, 'maintenance': False, 'crc32': '540785a1', 'local_conf_timestamp': 138, 'host-ts': 136}
MainThread::INFO::2018-09-06 01:34:01,981::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host lynx03.lab.eng.tlv2.redhat.com (id 3): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=60183 (Wed Sep 5 17:00:37 2018)\nhost-id=3\nscore=0\nvm_conf_refresh_time=60181 (Wed Sep 5 17:00:35 2018)\nconf_on_shared_storage=True\nmaintenance=False\nstate=AgentStopped\nstopped=True\n', 'hostname': 'lynx03.lab.eng.tlv2.redhat.com', 'alive': False, 'host-id': 3, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 0, 'stopped': True, 'maintenance': False, 'crc32': 'c7049afc', 'local_conf_timestamp': 60181, 'host-ts': 60183}
MainThread::INFO::2018-09-06 01:34:01,981::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': {'health': 'good', 'vm': 'up', 'detail': 'Up'}, 'bridge': True, 'mem-free': 17186.0, 'maintenance': True, 'cpu-load': 0.0489, 'gateway': 1.0, 'storage-domain': True}
The issue is that we had:

MainThread::INFO::2018-09-06 01:33:11,880::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenanceMigrateVm (score: 0)
MainThread::INFO::2018-09-06 01:33:11,981::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (LocalMaintenanceMigrateVm-ReinitializeFSM) sent? sent

instead of the expected:

MainThread::INFO::2018-09-05 16:12:25,904::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state LocalMaintenanceMigrateVm (score: 0)
MainThread::INFO::2018-09-05 16:12:26,015::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (LocalMaintenanceMigrateVm-EngineMigratingAway) sent? sent

Seems related to the HE agent only; reassigning.

Are you able to reproduce this one with the latest build?

Postponing to 4.3.0, as it was not identified as a blocker for 4.2.8.

I tried to reproduce it but encountered a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1665934
IMO, this bug exists, with a low or even very low reproduction rate.

Moving to 4.3.2, as it was not identified as a blocker for 4.3.1.

Marking as duplicate of 1665934; probably the same root cause.

*** This bug has been marked as a duplicate of bug 1665934 ***
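
For anyone trying to spot this condition in their own agent log, here is a minimal sketch that extracts the `state_transition` notifications and flags a LocalMaintenanceMigrateVm transition that does not go to EngineMigratingAway. It is not part of ovirt-hosted-engine-ha. The function names, the `EXPECTED_NEXT` table and the regular expression are assumptions based only on the log lines quoted in this report, and the table covers just the one expected transition described here, not the agent's full state machine.

```python
import re

# Matches the BrokerLink notification lines quoted in this report, e.g.
# "...::2018-09-06 01:33:11,457::brokerlink::68::... state_transition (A-B) sent? sent"
# (assumption: the format is inferred from the log excerpt only).
TRANSITION_RE = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*state_transition \((?P<src>\w+)-(?P<dst>\w+)\)"
)

# Hypothetical lookup: only the expected successor named in this report.
EXPECTED_NEXT = {"LocalMaintenanceMigrateVm": "EngineMigratingAway"}


def find_transitions(lines):
    """Yield (timestamp, source_state, target_state) for each notification line."""
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            yield match.group("ts"), match.group("src"), match.group("dst")


def unexpected_transitions(lines):
    """Return the transitions whose target differs from the expected successor."""
    return [
        (ts, src, dst)
        for ts, src, dst in find_transitions(lines)
        if src in EXPECTED_NEXT and dst != EXPECTED_NEXT[src]
    ]


# Notification lines copied from the description above.
SAMPLE = [
    "MainThread::INFO::2018-09-06 01:33:11,457::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUp-LocalMaintenanceMigrateVm) sent? sent",
    "MainThread::INFO::2018-09-06 01:33:11,981::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (LocalMaintenanceMigrateVm-ReinitializeFSM) sent? sent",
    "MainThread::INFO::2018-09-06 01:33:22,191::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (ReinitializeFSM-LocalMaintenance) sent? sent",
]

if __name__ == "__main__":
    for ts, src, dst in unexpected_transitions(SAMPLE):
        print(f"{ts}: {src} -> {dst} (expected {EXPECTED_NEXT[src]})")
```

On the three sample lines this reports only the 01:33:11,981 transition to ReinitializeFSM. To scan a real log, pass an open file object to `unexpected_transitions` instead of `SAMPLE`.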