Bug 1366144
| Summary: | Hosted Engine is down after doing some operations on Cockpit Virtual Machine page | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-node | Reporter: | cshao <cshao> |
| Component: | UI | Assignee: | Marek Libra <mlibra> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | cshao <cshao> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.0 | CC: | bugs, cshao, dguo, fdeutsch, gklein, huzhao, leiwang, mgoldboi, michal.skrivanek, mlibra, rbarry, tjelinek, trichard, weiwang, yaniwang, ycui |
| Target Milestone: | ovirt-4.0.4 | Flags: | rule-engine: ovirt-4.0.z+, mgoldboi: planning_ack+, rule-engine: devel_ack+, cshao: testing_ack+ |
| Target Release: | 4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | With this release, a warning before the 'Shut Down All VMs' action stresses the importance of checking which virtual machines will be affected, namely the self-hosted engine. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-29 11:15:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
cshao
2016-08-11 07:57:14 UTC
Created attachment 1189941 [details]
he-down2
Created attachment 1189943 [details]
all_log_info
# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
import vdsm.vdscli
--== Host 1 status ==--
Status up-to-date : True
Hostname : dell740.redhat.com
Host ID : 1
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score : 0
stopped : False
Local maintenance : True
crc32 : e39a4fad
Host timestamp : 3922
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3922 (Thu Aug 11 16:04:20 2016)
host-id=1
score=0
maintenance=True
state=LocalMaintenance
stopped=False
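
The engine state in the output above can also be checked programmatically. The following is only a minimal sketch, assuming the plain-text layout shown in this report; `parse_engine_status` and `engine_is_down` are hypothetical helper names used for illustration and are not part of the hosted-engine tooling.

```python
import json
import re

# Sample copied from the `hosted-engine --vm-status` output in this report.
SAMPLE = """\
--== Host 1 status ==--
Status up-to-date : True
Hostname : dell740.redhat.com
Host ID : 1
Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score : 0
stopped : False
Local maintenance : True
"""


def parse_engine_status(text):
    """Return {host_number: engine_status_dict} parsed from the text output.

    Hypothetical helper: it assumes the "--== Host N status ==--" headers
    and the "Engine status : {json}" lines look exactly as in this report.
    """
    result = {}
    host = None
    for line in text.splitlines():
        header = re.match(r"--== Host (\d+) status ==--", line.strip())
        if header:
            host = int(header.group(1))
            continue
        # Split on the first " : " only; the JSON value uses ": " instead.
        key, sep, value = line.partition(" : ")
        if sep and key.strip() == "Engine status" and host is not None:
            result[host] = json.loads(value)
    return result


def engine_is_down(status):
    """True when the engine VM is reported as down for this host."""
    return status.get("vm") == "down"


statuses = parse_engine_status(SAMPLE)
for host_id, status in statuses.items():
    print(host_id, "down" if engine_is_down(status) else "not down", status["reason"])
```

For the sample above, this reports host 1 with the engine VM down, which matches the state described in this bug.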
To me the question is whether this problem also appears if you put the host into maintenance from Engine?

This would indicate that it is a hosted-engine problem.

Indeed, please reproduce by moving the host to maintenance from the engine. If it doesn't reproduce, then this is not urgent, as the cockpit feature is in TechPreview.

Sorry, I didn't want to switch some fields.

(In reply to Fabian Deutsch from comment #4)
> To me the question is whether this problem also appears if you put the host into
> maintenance from Engine?
>
> This would indicate that it is a hosted-engine problem.

(In reply to Michal Skrivanek from comment #5)
> Indeed, please reproduce by moving the host to maintenance from the engine. If it
> doesn't reproduce, then this is not urgent, as the cockpit feature is in TechPreview.

Putting a host into maintenance from the engine side needs 2 hosts in the same cluster, so I did some testing under those conditions.

Test steps:
1. Prepare 2 machines with the same cpu mode.
2. Install redhat-virtualization-host-4.0-20160811.0 on the first host.
3. Deploy HE with correct steps (use nfs storage 1).
4. Install RHVH on the second host.
5. Deploy additional HE with the same nfs storage 1.
6. Log in to the engine after both hosts have changed to up status.
7. Put host 1 into maintenance from the engine.

Test result:
1. After step 7, putting host 1 into maintenance from the engine succeeds. (HE status in cockpit still works well.)
2. The VM (HE) migrates to the other host automatically.

IIUC you can't reproduce it then?

(In reply to Michal Skrivanek from comment #8)
> IIUC you can't reproduce it then?

I can still reproduce this issue on the latest RHVH with the original steps.

redhat-virtualization-host-4.0-20160812.0
ovirt-hosted-engine-ha-2.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch

HE will be back after a reboot.

hosted-engine --vm-shutdown
hosted-engine --vm-start

Since it's working correctly from the engine and the cockpit-based feature is tech preview, I'm moving it out of 4.0.3 and decreasing its importance.

How the Host to Maintenance works:
If the engine login is available, then call the REST API to switch the host to maintenance.
If the call fails or the engine login is not available, shut down all VMs _after_ user confirmation.

The issue might be caused by confirmation of 'Shut down all VMs'.

Shaochen, can you please attach screenshot(s) of the dialog(s) shown from clicking on Host to Maintenance onward?

Created attachment 1191115 [details]
maintenance1
Created attachment 1191116 [details]
maintenance2
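
The "Host to Maintenance" decision flow described above can be sketched as follows. This is an illustrative Python sketch only, not the cockpit-ovirt implementation: the function name and all four parameters are hypothetical stand-ins for the engine login state, the REST call, the confirmation dialog, and the local shutdown action.

```python
from typing import Callable, Optional


def host_to_maintenance(
    engine_token: Optional[str],
    call_engine_api: Callable[[str], bool],
    confirm_shutdown: Callable[[], bool],
    shutdown_all_vms: Callable[[], None],
) -> str:
    """Return which path was taken: 'engine', 'shutdown', or 'cancelled'.

    All names here are hypothetical; they only model the flow described
    in this bug, not any real cockpit-ovirt or engine API.
    """
    # Preferred path: the engine login is available, so ask the engine
    # (via its REST API) to switch the host to maintenance.
    if engine_token is not None:
        try:
            if call_engine_api(engine_token):
                return "engine"
        except Exception:
            # A failed call is handled like an unavailable login.
            pass

    # Fallback path: shut down all VMs, but only after the user confirms.
    # This is the step that takes down the Hosted Engine VM on a
    # single-host setup.
    if confirm_shutdown():
        shutdown_all_vms()
        return "shutdown"
    return "cancelled"


if __name__ == "__main__":
    stopped = []
    outcome = host_to_maintenance(
        engine_token=None,                      # engine login not available
        call_engine_api=lambda token: True,
        confirm_shutdown=lambda: True,          # user clicks "confirm"
        shutdown_all_vms=lambda: stopped.append("all VMs"),
    )
    print(outcome, stopped)
```

The sketch makes the risk visible: whenever the engine path is unavailable, a single confirmation is the only thing standing between the user and shutting down every VM on the host, including the Hosted Engine.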
Based on the attached 'maintenance2' screenshot, the issue is in the invocation of 'Shut down all VMs' on the host, since the 'host to maintenance' call via REST API is not possible or failed.

To fix the issue, I'll change the text in the dialog to better inform the user about the consequences of shutting down critical VMs like the HE.

It is only a label change, changing priority.

Test version:
redhat-virtualization-host-4.0-20160919.0
imgbased-0.8.5-0.1.el7ev.noarch
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.4.0.el7ev.noarch
ovirt-hosted-engine-setup-2.0.2.2-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
rhevm-appliance-20160922.0-1.el7ev.ova

Test steps:
1. Install redhat-virtualization-host-4.0-20160919.0.
2. Deploy HE with correct steps.
3. Reboot RHVH and log in to cockpit.
4. Switch to the Virtual Machine page.
5. Click the "Login to engine" button.
6. Click the "Host to maintenance" button.
7. Repeat step 6 and 7.

Test result:
1. After step 7, hosted engine is down.

But according to #c11 & c15, to verify this bug, the only thing I can do is check whether the warning text below pops up while putting the host into maintenance mode. And the answer is yes.

========================================================
Login to Engine not available. Please confirm all VMs on this host will be shut down. Please consider the type of running VMs. Shutting down critical VMs such as Hosted Engine can cause serious issues.
========================================================

Hi Mlibra and Fabian,

Because the above warning text pops up, can I verify this bug directly? If yes, should I report a new bug to track the "HE down" issue, considering that the cockpit feature is in TechPreview?

Thanks.

Critical VMs are supposed to be flagged as HA and will be restarted; the same goes for HE, where the he-agent takes care of that.

I don't think any other bug is needed. If you believe the text should be changed, please suggest or contribute that upstream at https://github.com/mareklibra/cockpit-ovirt/issues

Hi Mlibra,

Scenario 1: If there is only one host, then we can see the warning text pop up. But Hosted Engine will go down after doing some operations (see original bug) on the VM page.
Scenario 2: If there are 2 hosts (see #c7), then putting the host into maintenance succeeds.

Is this by design? If yes, I will verify this bug.

Thanks.

Yes, that's ok. Thank you

Verifying this bug according to #c18 ~ c21.