Bug 1467063
| Field | Value |
|---|---|
| Summary | Destination host's score being penalized for 50 points due to 1 engine vm retry attempts during normal HE-VM's migration to SPM host. |
| Product | [oVirt] ovirt-engine |
| Component | BLL.HostedEngine |
| Status | CLOSED NOTABUG |
| Severity | medium |
| Priority | medium |
| Version | 4.1.3.5 |
| Target Milestone | ovirt-4.1.5 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Nikolai Sednev <nsednev> |
| Assignee | Yanir Quinn <yquinn> |
| QA Contact | Nikolai Sednev <nsednev> |
| CC | bugs, dfediuck, mgoldboi, msivak |
| Flags | rule-engine: ovirt-4.1?, dfediuck: ovirt-4.2?, mgoldboi: planning_ack+, dfediuck: devel_ack+, nsednev: testing_ack? |
| Doc Type | If docs needed, set a value |
| Last Closed | 2017-08-06 10:25:21 UTC |
| Type | Bug |
| oVirt Team | SLA |
| Bug Blocks | 1478848 |

Description
Nikolai Sednev
2017-07-02 11:15:40 UTC
Forgot to mention that the score is raised back to normal on the destination SPM host after some time.
Created attachment 1293603 [details]
sosreport from the engine
Created attachment 1293604 [details]
sosreport from host1 (the SPM host)
Created attachment 1293605 [details]
sosreport from host2
Screencast is available from here: https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88SVBPdk16TWRTanc/view?usp=sharing

Seems like a temporary sync delay/issue (maybe due to more heavy duty tasks of the SPM host).
So the explanation can be that the score is decreased by 50 after failing to start the HE VM; after getting the lock, the VM starts and the migration is eventually successful.

See in the SPM host log:

MainThread::INFO::2017-07-02 13:51:23,880::hosted_engine::1119::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
MainThread::INFO::2017-07-02 13:51:29,005::hosted_engine::1125::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout:
MainThread::INFO::2017-07-02 13:51:29,006::hosted_engine::1126::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stderr: Virtual machine does not exist
Virtual machine already exists

MainThread::INFO::2017-07-02 13:51:29,006::hosted_engine::1148::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Failed to start engine VM: 'Virtual machine does not exist
Virtual machine already exists
'. Please check the vdsm logs. The possible reason: the engine has been already started on a different host so this one has failed to acquire the lock and it will sync in a while. For more information please visit: http://www.ovirt.org/Hosted_Engine_Howto#EngineUnexpectedlyDown
MainThread::INFO::2017-07-02 13:51:29,010::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1498992689.01 type=state_transition detail=EngineStart-EngineDown hostname='puma18.scl.lab.tlv.redhat.com'

(In reply to Yanir Quinn from comment #6)
> Seems like a temporary sync delay/issue (maybe due to more heavy duty tasks
> of the SPM host)
> So the explanation can be that the score is decreased by 50 after failing to
> start the HE VM and after getting the lock it starts and migration is
> eventually successful.

Which heavy tasks? Pair of hosts with a single SHE-VM... The engine was not doing any load/performance or stress tests, only a very basic migration from SPM to non-SPM and then back.

This is not a new feature. Not properly documented maybe, but it has been part of the code since the beginning. See this file from version 1.0.0 (Mar 2014): https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/agent/hosted_engine.py;h=8ab210808780c7289a33135083a1ea2cb609039f;hb=85fde3305ea11ebd367f63dfed7911ffcd265d74#l594

Is this on track for 4.1.5?

The 50-point penalty is by design (and will be better documented). For now I'm closing this issue. If there's a specific issue around the SPM, please open a separate bz with the information on starting a VM there, which may be a real issue.
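
For anyone triaging a similar report, here is a rough sketch of how the failed start attempts could be counted from an agent log and turned into the penalty one would expect to see. It is not code from ovirt-hosted-engine-ha. The function names, the regular expression, and the assumption that the penalty scales linearly with the retry count are illustrative only; the log layout is inferred solely from the lines pasted in this bug, and the 50-point value is the one discussed here.

```python
import re

# Matches the "Failed to start engine VM" agent log line quoted in this bug.
# The field layout is inferred from those sample lines only (assumption).
FAILED_START = re.compile(
    r"^MainThread::INFO::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::"
    r"hosted_engine::\d+::\S+::\(_start_engine_vm\) Failed to start engine VM"
)

# 50 points per retry, as discussed in this bug. Treating the penalty as
# linear in the number of retries is an assumption of this sketch.
RETRY_PENALTY = 50


def failed_start_timestamps(log_text):
    """Return the timestamp of every failed engine VM start in log_text."""
    stamps = []
    for line in log_text.splitlines():
        match = FAILED_START.match(line)
        if match:
            stamps.append(match.group("ts"))
    return stamps


def expected_penalty(retries, penalty_per_retry=RETRY_PENALTY):
    """Score penalty expected after the given number of start retries."""
    return retries * penalty_per_retry


# Two lines copied from the SPM host log above: one start attempt, one failure.
SAMPLE_LOG = (
    "MainThread::INFO::2017-07-02 13:51:23,880::hosted_engine::1119::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`\n"
    "MainThread::INFO::2017-07-02 13:51:29,006::hosted_engine::1148::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_start_engine_vm) Failed to start engine VM: "
    "'Virtual machine does not exist\n"
)

if __name__ == "__main__":
    stamps = failed_start_timestamps(SAMPLE_LOG)
    print(len(stamps), "failed start(s), expected penalty:",
          expected_penalty(len(stamps)))
```

A single failed start in the log therefore lines up with the 50-point drop reported in the summary; a larger count would point at repeated lock contention rather than a one-off sync delay.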