Bug 1151344

Summary: Can not launch rhevm as a monitored service as it says after shutdown engine vm
Product: Red Hat Enterprise Virtualization Manager
Reporter: wanghui <huiwa>
Component: ovirt-hosted-engine-ha
Assignee: Doron Fediuck <dfediuck>
Status: CLOSED ERRATA
QA Contact: meital avital <mavital>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.5.0
CC: alukiano, cshao, dfediuck, ecohen, fdeutsch, gklein, gouyang, hadong, huiwa, iheim, leiwang, lsurette, mavital, sbonazzo, scohen, yaniwang, ycui
Target Milestone: ---
Flags: mavital: needinfo+
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: sla
Fixed In Version: ovirt-hosted-engine-ha-1.2.4-3.el7ev
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-11 21:04:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1094719, 1164308, 1164311    
Attachments:
Description: log files (Flags: none)
Description: log files for comment#2 (Flags: none)

Description wanghui 2014-10-10 08:18:08 UTC
Created attachment 945499
log files

Description of problem:
After I shut down the engine VM as the setup prompt instructs, the system does not automatically launch rhevm as a monitored service.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20141006.0.el7ev
ovirt-node-3.1.0-0.20.20141006gitc421e04.el7.noarch.rpm
ovirt-node-plugin-hosted-engine-0.2.0-2.0.el7.x86_64
ovirt-hosted-engine-setup-1.2.1-1.el7.noarch
ovirt-host-deploy-1.3.0-1.el7.noarch
ovirt-hosted-engine-ha-1.2.2-2.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor7-7.0-20141006.0.el7ev.
2. Configure hosted engine using the ISO install engine VM option.
3. Shut down the engine VM when the process says "Please shutdown the VM allowing the system to launch it as a monitored service".
4. Check whether engine vm is up or not.

Actual results:
1. After step 4, the engine VM is not up.
   #ps -aux | grep qemu-kvm
    root     13105  0.0  0.0  10632   936 pts/0    S+   07:26   0:00 grep --color=auto qemu-kvm

Expected results:
1. The engine VM should start as a monitored service, as the prompt says.
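
The check in the actual results can be sketched in Python. The only line returned by `ps -aux | grep qemu-kvm` is the grep process itself, which means no qemu-kvm process is running. The helper below is a minimal sketch, not part of any oVirt tool; the function name `qemu_kvm_processes` is hypothetical, and it assumes the usual 11-column `ps aux` layout with the command in the last column.

```python
def qemu_kvm_processes(ps_output):
    """Return the ps lines that belong to a real qemu-kvm process.

    Assumes 'ps aux' style output: ten whitespace-separated columns
    followed by the full command line as the eleventh.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or truncated line
        command = fields[10]
        if command.startswith("grep "):
            continue  # the grep used for the search, not a VM
        if "qemu-kvm" in command:
            matches.append(line)
    return matches


# The single line reported in this bug: only the grep itself matches.
reported = ("root     13105  0.0  0.0  10632   936 pts/0    S+   07:26   0:00 "
            "grep --color=auto qemu-kvm")
print(qemu_kvm_processes(reported))
```

An empty result here corresponds to "engine vm is not up".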

Additional info:

Comment 2 wanghui 2014-12-05 03:07:38 UTC
Test version:
rhev-hypervisor7-7.0-20141202.0.iso
ovirt-node-3.1.0-0.28.20141126git25ce016.el7.noarch
ovirt-node-plugin-hosted-engine-0.2.0-5.0.el7ev.x86_64
ovirt-hosted-engine-setup-1.2.1-6.el7ev.noarch
ovirt-hosted-engine-ha-1.2.4-2.el7ev.noarch

Test steps:
1. Install rhev-hypervisor7-7.0-20141202.0.iso
2. Configure hosted engine using ISO install engine vm option.
3. Shut down the engine VM when the process says "Please shutdown the VM allowing the system to launch it as a monitored service".
4. Check whether engine vm is up or not.

Test result:
1. After step 4, the UI shows "Engine Status: Engine is down. Please check 'hosted-engine --vm-status'".
2. #ps -aux | grep qemu-kvm
    root     27529  0.0  0.0  10632   940 pts/2    S+   03:05   0:00 grep --color=auto qemu-kvm

So this issue is not fixed in rhev-hypervisor7-7.0-20141202.0.iso. Changing the status to ASSIGNED.

Comment 3 wanghui 2014-12-05 03:10:28 UTC
Created attachment 964963
log files for comment#2

Attaching the log files for comment #2.

Comment 4 wanghui 2014-12-05 03:13:09 UTC
(In reply to wanghui from comment #2)
> Test version:
> rhev-hypervisor7-7.0-20141202.0.iso
> ovirt-node-3.1.0-0.28.20141126git25ce016.el7.noarch
> ovirt-node-plugin-hosted-engine-0.2.0-5.0.el7ev.x86_64
> ovirt-hosted-engine-setup-1.2.1-6.el7ev.noarch
> ovirt-hosted-engine-ha-1.2.4-2.el7ev.noarch
> 
> Test steps:
> 1. Install rhev-hypervisor7-7.0-20141202.0.iso
> 2. Configure hosted engine using ISO install engine vm option.
> 3. Shutdown engine vm When process says "Please shutdown the VM allowing the
> system to launch it as a monitored service".
> 4. Check whether engine vm is up or not.
> 
> Test result:
> 1. After step4, the UI shows "Engine Status: Engine is down. Please check
> 'hosted-engine --vm-status'".

#hosted-engine --vm-status
No output for this command.

> 2. #ps -aux | grep qemu-kvm
>     root     27529  0.0  0.0  10632   940 pts/2    S+   03:05   0:00 grep
> --color=auto qemu-kvm
> 
> So this issue is not fixed in rhev-hypervisor7-7.0-20141202.0.iso. Change
> the status to ASSIGNED.

Comment 5 Ying Cui 2014-12-05 06:29:01 UTC
Consider this bug a test blocker: after the engine VM shuts down, the end user can do nothing with the VM on which the engine is installed.

Comment 6 Guohua Ouyang 2014-12-05 08:14:38 UTC
(In reply to wanghui from comment #4)
> (In reply to wanghui from comment #2)
> > Test version:
> > rhev-hypervisor7-7.0-20141202.0.iso
> > ovirt-node-3.1.0-0.28.20141126git25ce016.el7.noarch
> > ovirt-node-plugin-hosted-engine-0.2.0-5.0.el7ev.x86_64
> > ovirt-hosted-engine-setup-1.2.1-6.el7ev.noarch
> > ovirt-hosted-engine-ha-1.2.4-2.el7ev.noarch
> > 
> > Test steps:
> > 1. Install rhev-hypervisor7-7.0-20141202.0.iso
> > 2. Configure hosted engine using ISO install engine vm option.
> > 3. Shutdown engine vm When process says "Please shutdown the VM allowing the
> > system to launch it as a monitored service".
> > 4. Check whether engine vm is up or not.
> > 
> > Test result:
> > 1. After step4, the UI shows "Engine Status: Engine is down. Please check
> > 'hosted-engine --vm-status'".
> 
> #hosted-engine --vm-status
> No output for this command.
> 
> > 2. #ps -aux | grep qemu-kvm
> >     root     27529  0.0  0.0  10632   940 pts/2    S+   03:05   0:00 grep
> > --color=auto qemu-kvm
> > 
> > So this issue is not fixed in rhev-hypervisor7-7.0-20141202.0.iso. Change
> > the status to ASSIGNED.

However, when `hosted-engine --vm-start` is run manually, the VM starts and I can log in to the rhevm admin portal; the host and VM are green. The hosted-engine configuration page is not updated, though, and still shows "Engine is down".

So I think the root cause may be that after the VM shutdown at step 4, the backend does not run `hosted-engine --vm-start` to start the engine.

Comment 7 Ying Cui 2014-12-05 08:26:33 UTC
According to comment 6, removing the test blocker.

Comment 8 Fabian Deutsch 2014-12-05 09:10:09 UTC
Ouyang, could you please provide /var/log as a tarball?

Comment 9 Sandro Bonazzola 2014-12-05 09:14:34 UTC
Please note that it takes a while between the VM shutdown and the HA starting it again.
I don't remember exact timing, Jiri?

Comment 10 Ying Cui 2014-12-05 09:35:15 UTC
(In reply to Fabian Deutsch from comment #8)
> Ouyang, could you please provide /var/log as a tarball?

Fabian, could you check attachment 964963, attached today with the latest 7.0 build? Is that what you want for /var/log? Thanks.

Comment 11 Jiri Moskovcak 2014-12-05 09:39:21 UTC
It should take just a few seconds until some agent tries to start the VM. The services are not running. This exception from broker prevents it from starting:

Thread-1::ERROR::2014-12-05 02:51:52,213::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'monitor ping addr=10.66.11.254'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 248, in _dispatch
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/monitor.py", line 60, in start_submonitor
Exception: ping not a registered submonitor type

@Fabian, there seems to be a missing file: /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors/ping.py

@Ying, can you please check if the file /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors/ping.py exists on your system?
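
To illustrate how a request such as 'monitor ping addr=10.66.11.254' can end in the exception above, here is a minimal sketch of a type registry. It is not the actual code from monitor.py or listener.py, which this report does not show; the class `SubmonitorRegistry` and the function `parse_monitor_request` are hypothetical names. The point is only that a type which was never registered (for example because its module was not discovered) fails at lookup time.

```python
class SubmonitorRegistry(object):
    """Hypothetical registry mapping submonitor type names to factories."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def start_submonitor(self, name, options):
        # A type that was never registered cannot be started.
        if name not in self._factories:
            raise Exception("%s not a registered submonitor type" % name)
        return self._factories[name](options)


def parse_monitor_request(data):
    """Split a request like 'monitor ping addr=10.66.11.254' into
    a type name and a dict of key=value options."""
    parts = data.split()
    if len(parts) < 2 or parts[0] != "monitor":
        raise ValueError("not a monitor request: %r" % data)
    options = dict(p.split("=", 1) for p in parts[2:])
    return parts[1], options


registry = SubmonitorRegistry()  # nothing registered, as if discovery found no modules
name, options = parse_monitor_request("monitor ping addr=10.66.11.254")
try:
    registry.start_submonitor(name, options)
except Exception as exc:
    print(exc)
```

Under these assumptions, the empty registry reproduces the message from the traceback, which is consistent with the submonitor module not having been loaded.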

Comment 12 Ying Cui 2014-12-05 10:53:50 UTC
> @Ying, can you please check if the file
> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors/
> ping.py exists on your system?

ping.pyc exists in the RHEV-H 7.0 build (rhev-hypervisor7-7.0-20141202.0.iso).

Comment 13 Fabian Deutsch 2014-12-05 11:56:03 UTC
(In reply to Jiri Moskovcak from comment #11)
> It should take just a few seconds until some agent tries to start the VM.
> The services are not running. This exception from broker prevents it from
> starting:
> 
> Thread-1::ERROR::2014-12-05
> 02:51:52,213::listener::192::ovirt_hosted_engine_ha.broker.listener.
> ConnectionHandler::(handle) Error handling request, data: 'monitor ping
> addr=10.66.11.254'
> Traceback (most recent call last):
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
> line 166, in handle
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
> line 248, in _dispatch
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/monitor.py",
> line 60, in start_submonitor
> Exception: ping not a registered submonitor type
> 
> @Fabian, there seems to be missing file:
> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors/
> ping.py

As Ying noted, the file exists.

Could it be that you import those submonitors with some mechanism which requires the .py file to exist?

Comment 14 Jiri Moskovcak 2014-12-05 12:46:22 UTC
(In reply to Fabian Deutsch from comment #13)
> 
> As Ying noted, the file exists.
> 
> Could it be that you import those submonitors with some mechanism which
> requires the .py file to exist?

- yes, that's exactly the problem
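
The failure mode confirmed here can be sketched as follows. This is an illustration of the general problem, not the actual discovery code or the actual fix in ovirt-hosted-engine-ha, neither of which is shown in this report. The function names and the directory listing are hypothetical. A discovery step that only looks for '.py' source files finds nothing in a directory that ships compiled '.pyc' files only.

```python
import os


def discover_py_only(filenames):
    """Module names found when only '.py' source files count."""
    return sorted(
        os.path.splitext(f)[0]
        for f in filenames
        if f.endswith(".py") and not f.startswith("__")
    )


def discover_py_or_compiled(filenames):
    """Module names found when compiled files are accepted as well."""
    names = set()
    for f in filenames:
        stem, ext = os.path.splitext(f)
        if ext in (".py", ".pyc", ".pyo") and not stem.startswith("__"):
            names.add(stem)
    return sorted(names)


# Hypothetical listing of a submonitors directory with bytecode only.
listing = ["__init__.pyc", "ping.pyc"]
print(discover_py_only(listing))
print(discover_py_or_compiled(listing))
```

With this listing, the first function returns no modules while the second finds ping, which matches the observation that ping.pyc exists but the ping submonitor type is not registered.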

Comment 16 Fabian Deutsch 2014-12-08 13:21:18 UTC
Moving the bug to the infra team

Comment 18 Sandro Bonazzola 2014-12-09 12:51:22 UTC
Dropping Test Only flag since there have been code changes.

Comment 20 wanghui 2014-12-18 09:05:57 UTC
Test version:
rhevh version:
rhev-hypervisor6-6.6-20141212.0.iso
ovirt-node-3.1.0-0.34.20141210git0c9c493.el6.noarch
vdsm-4.16.8.1-3.el6ev.x86_64
ovirt-node-plugin-vdsm-0.2.0-14.el6ev.noarch
ovirt-node-plugin-hosted-engine-0.2.0-6.0.el7ev.x86_64
ovirt-hosted-engine-setup-1.2.1-8.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-3.el6ev.noarch

rhevm version:
rhevm-3.5.0-0.25.el6ev.noarch

Test step:
1. Install rhev-hypervisor6-6.6-20141212.0.iso
2. Configure hosted engine using ISO install engine vm option.
3. Shut down the engine VM when the process says "Please shutdown the VM allowing the system to launch it as a monitored service".
4. Check whether engine vm is up or not.

Test result:
1. After step 4, the engine VM now runs in the backend as a service, and the engine can be accessed through the web. RHEV-H still shows as up in the engine.

So this issue is fixed in ovirt-hosted-engine-ha-1.2.4-3.el6ev.noarch now.

Comment 21 Artyom 2014-12-24 12:21:37 UTC
I see that the bug was opened for rhevh7, but you checked the fix on rhevh6.6. Why?

Comment 22 Artyom 2014-12-24 12:24:35 UTC
OK, I see that the patch is for the hosted-engine package and not for rhevh, so the rhevh version is not really important.

Comment 23 wanghui 2014-12-25 06:37:17 UTC
Hi Artyom,

According to my verification, this issue is fixed in ovirt-hosted-engine-ha-1.2.4-3.el6ev.noarch. So if you accept that, you can change the bug's status now.

Thanks!
Hui Wang

Comment 24 meital avital 2014-12-25 07:20:16 UTC
Moving to verified according to comment 20

Comment 27 errata-xmlrpc 2015-02-11 21:04:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0160.html