
Bug 1207634

Summary: HE VM not powered up on second host | ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown
Product: Red Hat Enterprise Virtualization Manager Reporter: Nikolai Sednev <nsednev>
Component: ovirt-hosted-engine-ha Assignee: Roman Mohr <rmohr>
Status: CLOSED DUPLICATE QA Contact: Nikolai Sednev <nsednev>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.5.1 CC: ecohen, gklein, istein, lsurette, rmohr, ycui
Target Milestone: ---   
Target Release: 3.6.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: sla
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-09-02 08:16:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
agent.log      none
broker.log     none
alma03 logs    none
alma03 logs    none

Description Nikolai Sednev 2015-03-31 11:19:21 UTC
Description of problem:
The HE VM was not powered up on host alma03 after the ovirt-ha-broker service was stopped on alma04.

ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown.

[root@alma03 ~]# hosted-engine --vm-status                                         


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1                           
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400                                                                                         
Local maintenance                  : False                                                                                        
Host timestamp                     : 66481                                                                                        
Extra metadata (valid at timestamp):                                                                                              
        metadata_parse_version=1                                                                                                  
        metadata_feature_version=1                                                                                                
        timestamp=66481 (Tue Mar 31 10:45:29 2015)                                                                                
        host-id=1                                                                                                                 
        score=2400                                                                                                                
        maintenance=False                                                                                                         
        state=EngineDown                                                                                                          


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2                           
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 2400                                          
Local maintenance                  : False                                         
Host timestamp                     : 66442                                         
Extra metadata (valid at timestamp):                                               
        metadata_parse_version=1                                                   
        metadata_feature_version=1                                                 
        timestamp=66442 (Tue Mar 31 10:44:59 2015)                                 
        host-id=2                                                                  
        score=2400                                                                 
        maintenance=False                                                          
        state=EngineUp                                                             
[root@alma03 ~]# hosted-engine --vm-status                                         


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1                           
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0                                                                                            
Local maintenance                  : False                                                                                        
Host timestamp                     : 66600                                                                                        
Extra metadata (valid at timestamp):                                                                                              
        metadata_parse_version=1                                                                                                  
        metadata_feature_version=1                                                                                                
        timestamp=66600 (Tue Mar 31 10:47:28 2015)                                                                                
        host-id=1                                                                                                                 
        score=0                                                                                                                   
        maintenance=False                                                                                                         
        state=EngineUnexpectedlyDown                                                                                              
        timeout=Thu Jan  1 18:39:07 1970                                                                                          


--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2                           
Engine status                      : unknown stale-data          
Score                              : 2400                        
Local maintenance                  : False                       
Host timestamp                     : 66442                       
Extra metadata (valid at timestamp):                             
        metadata_parse_version=1                                 
        metadata_feature_version=1                               
        timestamp=66442 (Tue Mar 31 10:44:59 2015)               
        host-id=2                                                
        score=2400                                               
        maintenance=False                                        
        state=EngineUp                                  
Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy HE on two RHEVH 6.6 hosts (20150304.0.el6ev).
2. Stop the ovirt-ha-broker service on the host that is currently running the HE VM.
3. Wait for the HE VM to start on the other host and observe that the other host's score changes to 0 for some unknown reason.

Actual results:
The HE VM is not started on the second host after the ovirt-ha-broker service is stopped on the host that is running the HE VM.

Expected results:
HE VM should be started on second host and score should not be zero.

Additional info:
logs attached.
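
For anyone comparing the two `hosted-engine --vm-status` snapshots in the description, a small parser can help. This is only a minimal sketch: the function names `parse_vm_status` and `suspect_hosts` are hypothetical and not part of ovirt-hosted-engine-ha, and the sample text is a shortened copy of the second snapshot. It assumes the output keeps the `key : value` and `key=value` layout shown in this report.

```python
import re

# Shortened copy of the second snapshot from the description.
SAMPLE = """\
--== Host 1 status ==--

Status up-to-date                  : True
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
        score=0
        state=EngineUnexpectedlyDown

--== Host 2 status ==--

Status up-to-date                  : False
Engine status                      : unknown stale-data
Score                              : 2400
        state=EngineUp
"""


def parse_vm_status(text):
    """Return {host_id: {field: value}} from --vm-status style text."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"--== Host (\d+) status ==--", line)
        if header:
            current = hosts.setdefault(int(header.group(1)), {})
            continue
        if current is None or not line:
            continue  # skip shell prompts and blank lines
        if " : " in line:
            # Top-level fields such as "Score : 0"
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
        elif "=" in line:
            # Extra metadata such as "state=EngineUp"
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return hosts


def suspect_hosts(hosts):
    """Host IDs that report score 0 or whose data is not up to date."""
    return sorted(
        host_id
        for host_id, fields in hosts.items()
        if fields.get("Score") == "0"
        or fields.get("Status up-to-date") == "False"
    )


print(suspect_hosts(parse_vm_status(SAMPLE)))  # prints [1, 2]
```

On the second snapshot this flags both hosts: host 1 because its score dropped to 0, and host 2 because its status is stale after the broker was stopped.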

Comment 1 Nikolai Sednev 2015-03-31 11:22:32 UTC
Created attachment 1008955 [details]
agent.log

Comment 2 Nikolai Sednev 2015-03-31 11:24:22 UTC
Created attachment 1008956 [details]
broker.log

Comment 3 Nikolai Sednev 2015-03-31 11:28:22 UTC
Created attachment 1008957 [details]
alma03 logs

Comment 4 Nikolai Sednev 2015-03-31 11:29:00 UTC
Created attachment 1008958 [details]
alma03 logs

Comment 5 Nikolai Sednev 2015-03-31 13:52:48 UTC
Components that were used on Red Hat Enterprise Virtualization Hypervisor 6.6 (20150304.0.el6ev):
sanlock-2.8-1.el6.x86_64
mom-0.4.1-4.el6ev.noarch
ovirt-node-selinux-3.2.1-9.el6.noarch
ovirt-host-deploy-offline-1.3.0-3.el6ev.x86_64
ovirt-node-plugin-vdsm-0.2.0-19.el6ev.noarch
ovirt-host-deploy-1.3.0-2.el6ev.noarch
libvirt-client-0.10.2-46.el6_6.3.x86_64
ovirt-node-plugin-rhn-3.2.1-9.el6.noarch
ovirt-node-3.2.1-9.el6.noarch
vdsm-4.16.8.1-7.el6ev.x86_64
ovirt-hosted-engine-ha-1.2.5-1.el6ev.noarch
ovirt-node-plugin-hosted-engine-0.2.0-9.0.el6ev.x86_64
ovirt-node-plugin-cim-3.2.1-9.el6.noarch
ovirt-node-branding-rhev-3.2.1-9.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.446.el6.x86_64
ovirt-hosted-engine-setup-1.2.2-1.el6ev.noarch
ovirt-node-plugin-snmp-3.2.1-9.el6.noarch

On engine Red Hat Enterprise Linux Server release 6.6 (Santiago):
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
rhevm-3.5.1-0.2.el6ev.noarch

Comment 6 Martin Sivák 2015-04-07 07:45:18 UTC
This is not urgent at all, because I have not seen it in production ever.

The second host tries to start the engine when you stop the broker on the first host (because it is not getting any updates and thinks that the host is dead). But the engine is still running, so sanlock prevents the VM from starting on the second host. That puts the second host into the EngineUnexpectedlyDown state for ten minutes. The score is reduced to 0 while the host is in that state.
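
The score behaviour described in this comment can be illustrated with a toy model. This is not the agent's actual code: `HostModel`, `BASE_SCORE`, and `PENALTY_SECONDS` are hypothetical names, 2400 is simply the score both hosts report in this bug, and the return to EngineDown once the ten minutes expire is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

BASE_SCORE = 2400          # score both hosts report in this bug
PENALTY_SECONDS = 10 * 60  # "ten minutes", per the comment above


@dataclass
class HostModel:
    """Toy model of one HA host; not the real agent state machine."""
    state: str = "EngineDown"
    penalty_until: Optional[float] = None

    def vm_start_failed(self, now: float) -> None:
        # The VM start was refused (here: by sanlock) or the VM crashed.
        # The model cannot tell which, mirroring the known issue.
        self.state = "EngineUnexpectedlyDown"
        self.penalty_until = now + PENALTY_SECONDS

    def score(self, now: float) -> int:
        if self.state == "EngineUnexpectedlyDown":
            if now < self.penalty_until:
                return 0  # score is held at 0 during the penalty window
            # Assumption: the host returns to EngineDown after the window.
            self.state = "EngineDown"
            self.penalty_until = None
        return BASE_SCORE


host = HostModel()
host.vm_start_failed(now=0)
print(host.score(now=60), host.score(now=600))  # prints 0 2400
```

In this model a failed start at time 0 keeps the score at 0 until 600 seconds have passed, which matches the drop from 2400 to 0 seen on alma03 in the description.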

There is one known issue here and that is we do not know the reason for the VM crash. We can't distinguish sanlock protection from a real crash here.

Comment 8 Martin Sivák 2015-09-02 08:16:17 UTC

*** This bug has been marked as a duplicate of bug 1150087 ***