Bug 1366144 - Hosted Engine is down after doing some operations on Cockpit Virtual Machine page
Summary: Hosted Engine is down after doing some operations on Cockpit Virtual Machine ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-node
Classification: oVirt
Component: UI
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ovirt-4.0.4
Target Release: 4.0
Assignee: Marek Libra
QA Contact: cshao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-11 07:57 UTC by cshao
Modified: 2016-09-29 11:15 UTC
CC List: 16 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-09-29 11:15:53 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.0.z+
mgoldboi: planning_ack+
rule-engine: devel_ack+
cshao: testing_ack+


Attachments
he-down1 (15.48 KB, image/png), 2016-08-11 07:57 UTC, cshao
he-down2 (12.47 KB, image/png), 2016-08-11 07:57 UTC, cshao
all_log_info (10.72 MB, application/x-gzip), 2016-08-11 07:58 UTC, cshao
maintenance1 (32.71 KB, image/png), 2016-08-16 07:19 UTC, cshao
maintenance2 (26.26 KB, image/png), 2016-08-16 07:20 UTC, cshao


Links
Red Hat Product Errata RHBA-2016:1971 (normal, SHIPPED_LIVE): cockpit-ovirt for RHV 4.0.4; last updated 2016-09-29 16:41:15 UTC
oVirt gerrit 62757 (master, MERGED): vdsm: Rephrase Shut Down All VMs warning; last updated 2016-08-24 10:16:09 UTC
oVirt gerrit 62759 (ovirt-4.0, MERGED): vdsm: Rephrase Shut Down All VMs warning; last updated 2016-08-24 10:26:51 UTC

Description cshao 2016-08-11 07:57:14 UTC
Created attachment 1189938
he-down1

Description of problem:
The Hosted Engine goes down after performing some operations on the Virtual Machine page.

The operations are as follows:
1. Switch to the Virtual Machine page.
2. Click the "Login to engine" button.
3. Click the "Host to maintenance" button.
4. Repeat steps 2 and 3.

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160810.1
imgbased-0.8.3-0.1.el7ev.noarch
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
ovirt-hosted-engine-ha-2.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch
rhevm-appliance-20160731.0-1.el7ev.ova

How reproducible:
80%

Steps to Reproduce:
1. Install redhat-virtualization-host-4.0-20160810.1 with the kickstart below.
2. Deploy HE with the correct steps.
3. Reboot RHVH and log in to Cockpit.
4. Switch to the Virtual Machine page.
5. Click the "Login to engine" button.
6. Click the "Host to maintenance" button.
7. Repeat steps 5 and 6.

Actual results:
The Hosted Engine goes down after performing these operations on the VM page.

Expected results:
The Hosted Engine stays up.


Additional info:
KS:
liveimg --url=http://xx.xx.xx.xx:8090/rhevh/rhevh7-ng-36/redhat-virtualization-host-4.0-20160810.1/redhat-virtualization-host-4.0-20160810.1.x86_64.liveimg.squashfs

%post
imgbase layout --init
%end

Comment 1 cshao 2016-08-11 07:57:39 UTC
Created attachment 1189941
he-down2

Comment 2 cshao 2016-08-11 07:58:59 UTC
Created attachment 1189943
all_log_info

Comment 3 cshao 2016-08-11 08:06:05 UTC
# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : dell740.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : e39a4fad
Host timestamp                     : 3922
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=3922 (Thu Aug 11 16:04:20 2016)
	host-id=1
	score=0
	maintenance=True
	state=LocalMaintenance
	stopped=False
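
To check the state above programmatically, a minimal sketch follows. It assumes the human-readable `hosted-engine --vm-status` layout shown in this one sample (a `Key : value` line per field, with the `Engine status` value encoded as JSON); the helper names `parse_host_fields` and `engine_state` are hypothetical and are not part of ovirt-hosted-engine-ha.

```python
import json

# Assumption: the layout is inferred from the single sample above only.
SAMPLE = """\
--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : dell740.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
"""


def parse_host_fields(text: str) -> dict:
    """Hypothetical helper: collect 'Key : value' pairs for one host.

    Indented lines (the 'Extra metadata' block) and lines without the
    ' : ' separator (headers, blank lines) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        if line.startswith((" ", "\t")) or " : " not in line:
            continue
        # Split on the first ' : ' only, so the JSON value stays intact.
        key, _, value = line.partition(" : ")
        fields[key.strip()] = value.strip()
    return fields


def engine_state(fields: dict) -> tuple:
    """Hypothetical helper: return (engine VM state, local maintenance flag)."""
    engine = json.loads(fields["Engine status"])
    local_maintenance = fields.get("Local maintenance") == "True"
    return engine["vm"], local_maintenance


if __name__ == "__main__":
    print(engine_state(parse_host_fields(SAMPLE)))
```

For this sample the result shows the engine VM as down while the host is in local maintenance, matching the reported problem.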

Comment 4 Fabian Deutsch 2016-08-11 10:29:58 UTC
To me the question is whether this problem also appears if you put the host into maintenance from the Engine.

This would indicate that it is a hosted-engine problem.

Comment 5 Michal Skrivanek 2016-08-11 12:46:28 UTC
Indeed, please try to reproduce by moving the host to maintenance from the engine. If it doesn't reproduce, then this is not urgent, as the cockpit feature is in TechPreview.

Comment 6 Michal Skrivanek 2016-08-11 13:07:07 UTC
Sorry, I didn't mean to change some fields.

Comment 7 cshao 2016-08-12 07:01:03 UTC
(In reply to Fabian Deutsch from comment #4)
> To me the question is if this problem also appears if you put the host into
> maintenance from Engine?
> 
> This would indicate that it is a hosted-engine problem.

(In reply to Michal Skrivanek from comment #5)
> indeed please reproduce by moving host to maintenance from engine. If it
> doesn't then this is not urgent as the cockpit feature is in TechPreview

Putting a host into maintenance from the engine side requires 2 hosts in the same cluster, so I did some testing under those conditions.

Test steps:
1. Prepare 2 machines with the same CPU mode.
2. Install redhat-virtualization-host-4.0-20160811.0 on the first host.
3. Deploy HE with the correct steps (using NFS storage 1).
4. Install RHVH on the second host.
5. Deploy an additional HE host with the same NFS storage 1.
6. Log in to the engine after both hosts change to Up status.
7. Put host 1 into maintenance from the engine.

Test result:
1. After step 7, host 1 is put into maintenance from the engine successfully (HE status in Cockpit still works well).
2. The HE VM migrates to the other host automatically.

Comment 8 Michal Skrivanek 2016-08-15 10:59:13 UTC
IIUC you can't reproduce it then?

Comment 9 cshao 2016-08-15 11:36:26 UTC
(In reply to Michal Skrivanek from comment #8)
> IIUC you can't reproduce it then?

I can still reproduce this issue on the latest RHVH with the original steps:
redhat-virtualization-host-4.0-20160812.0
ovirt-hosted-engine-ha-2.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch

Comment 10 cshao 2016-08-15 11:56:17 UTC
The HE comes back up after restarting the HE VM:
hosted-engine --vm-shutdown
hosted-engine --vm-start

Comment 11 Michal Skrivanek 2016-08-15 12:13:39 UTC
Since it works correctly from the engine and the Cockpit-based feature is Tech Preview, I'm moving it out of 4.0.3 and decreasing its importance.

Comment 12 Marek Libra 2016-08-16 06:49:07 UTC
How Host to Maintenance works:
If the engine login is available then call REST API to switch the host to maintenance.
If the call fails or engine login is not available, shut down all VMs _after_ user confirmation.
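
The decision flow described above can be sketched as follows. This is an illustrative model only: the real cockpit-ovirt code is not reproduced here, and the function `host_to_maintenance` and its callback parameters are hypothetical stand-ins for the engine login check, the REST call, the confirmation dialog, and the local VM shutdown.

```python
from typing import Callable


def host_to_maintenance(
    engine_login_available: bool,
    call_rest_maintenance: Callable[[], bool],
    confirm: Callable[[str], bool],
    shutdown_all_vms: Callable[[], None],
) -> str:
    """Hypothetical model of the Host to Maintenance flow; returns the path taken."""
    # Preferred path: ask the engine via REST API (only if login is available).
    if engine_login_available and call_rest_maintenance():
        return "engine"
    # Fallback: no engine login, or the REST call failed.
    if confirm("Please confirm all VMs on this host will be shut down."):
        # Shuts down every VM on this host, including the HE VM if it runs here.
        shutdown_all_vms()
        return "local-shutdown"
    return "cancelled"
```

In the single-host scenario reported here, the engine path is not taken, so confirming the dialog ends in the local shutdown branch, which also stops the Hosted Engine VM.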

The issue might be caused by confirming 'Shut down all VMs'.

Shaochen, can you please attach screenshot(s) of the dialog(s) shown after clicking Host to Maintenance?

Comment 13 cshao 2016-08-16 07:19:40 UTC
Created attachment 1191115
maintenance1

Comment 14 cshao 2016-08-16 07:20:42 UTC
Created attachment 1191116
maintenance2

Comment 15 Marek Libra 2016-08-16 08:49:04 UTC
Based on the attached 'maintenance2' screenshot, the issue is caused by invoking 'Shut down all VMs' on the host, because the 'host to maintenance' call via the REST API was not possible or failed.

To fix the issue, I'll change the text in the dialog to better inform the user about the consequences of shutting down critical VMs like the HE.

Comment 16 Tomas Jelinek 2016-08-17 11:09:32 UTC
It is only a label change, changing priority.

Comment 18 cshao 2016-09-27 15:57:49 UTC
Test version:
redhat-virtualization-host-4.0-20160919.0
imgbased-0.8.5-0.1.el7ev.noarch 
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.4.0.el7ev.noarch
ovirt-hosted-engine-setup-2.0.2.2-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
rhevm-appliance-20160922.0-1.el7ev.ova

Test steps:

1. Install redhat-virtualization-host-4.0-20160919.0.
2. Deploy HE with the correct steps.
3. Reboot RHVH and log in to Cockpit.
4. Switch to the Virtual Machine page.
5. Click the "Login to engine" button.
6. Click the "Host to maintenance" button.
7. Repeat steps 5 and 6.

Test result:
1. After step 7, the Hosted Engine is down.

However, according to #c11 and #c15, the only thing I can check to verify this bug is whether the warning text below pops up when putting the host into maintenance mode.
And the answer is yes.

========================================================
Login to Engine not available.
Please confirm all VMs on this host will be shut down.

Please consider the type of running VMs. Shutting down critical VMs such as Hosted Engine can cause serious issues.
========================================================

Hi Mlibra and Fabian, 

Since the warning text above pops up, can I verify this bug directly?

If yes, should I report a new bug to track the "HE down" issue, even though the cockpit feature is in TechPreview?

Thanks.

Comment 19 Michal Skrivanek 2016-09-28 09:12:22 UTC
Critical VMs are supposed to be flagged as HA and will be restarted; the same applies to the HE, where the he-agent takes care of that.
I don't think any other bug is needed. If you believe the text should be changed, please suggest or contribute that upstream at https://github.com/mareklibra/cockpit-ovirt/issues

Comment 20 cshao 2016-09-28 09:39:24 UTC
Hi Mlibra,


Scenario 1:
If there is only one host, we can see the warning text pop up.
But the Hosted Engine goes down after doing some operations (see the original bug) on the VM page.

Scenario 2:
If there are 2 hosts (see #c7), putting the host into maintenance succeeds.

Is this by design? If yes, I will verify this bug.

Thanks.

Comment 21 Michal Skrivanek 2016-09-28 10:05:44 UTC
Yes, that's ok. Thank you

Comment 22 cshao 2016-09-28 10:24:36 UTC
Verifying this bug according to #c18 ~ #c21.

