Bug 1479823
Summary: | SHE HA agent could log better when HE VM dies | ||
---|---|---|---|
Product: | [oVirt] ovirt-hosted-engine-ha | Reporter: | Jiri Belka <jbelka> |
Component: | Agent | Assignee: | bugs <bugs> |
Status: | CLOSED NOTABUG | QA Contact: | meital avital <mavital> |
Severity: | low | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.1.0.6 | CC: | bugs |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-09 13:56:32 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jiri Belka
2017-08-09 13:21:23 UTC
See that for starting the HE VM on another node, the log message is quite OK:

MainThread::INFO::2017-08-08 09:04:54,337::states::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3000), attempting to start engine VM

TLDR: Hosted engine deals with cluster-wide status, so it takes its time before deciding what happened. Realtime status of the VM is monitored by qemu, libvirt and vdsm.

> 09:02:12,969::states::402::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm may be running on another host
>
> ^^ is this to be first human readable event informing HE VM is down?

No, the qemu log already told you (as did libvirt and vdsm). And so did journalctl, I hope (as it aggregates the logs). Hosted engine deals with cluster-wide status. It also only checks a couple of times per minute and logs the state machine decisions. So here it told you it needs to check whether the VM is running elsewhere before doing anything else.

> 09:04:24,977::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
> MainThread::INFO::2017-08-08 09:04:28,923::states::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down, local host does not have best score

This is the final decision. The transition took 30 seconds due to storage checks, but we just fixed that. It is configured as NOWAIT in the code. We can't tell if the VM crashed or if the user shut it down using hosted-engine --vm-stop, vds-cli or virsh. We actually do not care here... the job is to make sure the engine is up somewhere.

> I'd love to see some better wording for first discover that HE VM died and

Aren't the qemu/libvirt/vdsm logs enough? We might add a sentence somewhere, but the journal already has the same info at least three times.

> maybe better log level that INFO?

INFO is the appropriate level from the agent's perspective. WARN / ERROR will be present in the relevant logs for the VM (vdsm, libvirt). We could maybe add a WARN level, but we would be outputting a warning during almost every migration or manual shutdown, because it will look the same as a crash (until we get access to lifecycle events). So you would be getting:

- WARN engine died!
- oh wait, no, ignore what I just said. It just properly migrated faster than we expected

We do not need so much redundancy in logging; what we need is common structured logging using syslog or journal to be able to correlate information.
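
As a rough illustration of the correlation point, here is a minimal sketch that turns agent log lines into structured records so they can be lined up by timestamp against other logs. It assumes the agent log format is exactly what the lines quoted in this report show (thread, level, timestamp, module, line number, logger, function, message, separated by `::`). The names `LINE_RE`, `parse_agent_line` and `seconds_between` are hypothetical helpers and not part of ovirt-hosted-engine-ha.

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the lines quoted in this report:
# Thread::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::lineno::logger::(func) message
LINE_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::"
    r"(?P<logger>[\w.]+)::\((?P<func>\w+)\) (?P<message>.*)$"
)


def parse_agent_line(line):
    """Return a dict of fields for one agent log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # %f accepts the three millisecond digits after the comma.
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
    record["lineno"] = int(record["lineno"])
    return record


def seconds_between(first, second):
    """Seconds elapsed between two parsed records."""
    return (second["ts"] - first["ts"]).total_seconds()


# The two complete log lines quoted in this report.
NOT_BEST_SCORE = (
    "MainThread::INFO::2017-08-08 09:04:28,923::states::663::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) "
    "Engine down, local host does not have best score"
)
START_ATTEMPT = (
    "MainThread::INFO::2017-08-08 09:04:54,337::states::493::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) "
    "Engine down and local host has best score (3000), attempting to start engine VM"
)

if __name__ == "__main__":
    first = parse_agent_line(NOT_BEST_SCORE)
    second = parse_agent_line(START_ATTEMPT)
    print(first["level"], first["func"], first["message"])
    print("seconds between the two decisions:", seconds_between(first, second))
```

Records in this shape could then be merged with vdsm or libvirt entries on the `ts` field. In a journal-based setup the same fields would presumably be attached to each entry instead of being parsed back out of text.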