Created attachment 1189938 [details]
he-down1

Description of problem:
Hosted Engine goes down after performing some operations on the Virtual Machine page. The operations are as follows:
1. Switch to the Virtual Machine page.
2. Click the "Login to engine" button.
3. Click the "Host to maintenance" button.
4. Repeat steps 2 and 3.

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160810.1
imgbased-0.8.3-0.1.el7ev.noarch
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
ovirt-hosted-engine-ha-2.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch
rhevm-appliance-20160731.0-1.el7ev.ova

How reproducible:
80%

Steps to Reproduce:
1. Install redhat-virtualization-host-4.0-20160810.1 with the ks below.
2. Deploy HE with the correct steps.
3. Reboot RHVH and log in to cockpit.
4. Switch to the Virtual Machine page.
5. Click the "Login to engine" button.
6. Click the "Host to maintenance" button.
7. Repeat steps 5 and 6.

Actual results:
Hosted Engine is down after performing some operations on the Virtual Machine page.

Expected results:
Hosted Engine remains in up status.

Additional info:
KS:
liveimg --url=http://xx.xx.xx.xx:8090/rhevh/rhevh7-ng-36/redhat-virtualization-host-4.0-20160810.1/redhat-virtualization-host-4.0-20160810.1.x86_64.liveimg.squashfs

%post
imgbase layout --init
%end
Created attachment 1189941 [details] he-down2
Created attachment 1189943 [details] all_log_info
# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : dell740.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : e39a4fad
Host timestamp                     : 3922
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3922 (Thu Aug 11 16:04:20 2016)
    host-id=1
    score=0
    maintenance=True
    state=LocalMaintenance
    stopped=False
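For anyone scripting a check of this state, the "Engine status" field in the output above is a JSON object and can be pulled out of the text directly. The following is only a minimal sketch: the function name `parse_engine_status` is hypothetical, and it assumes the field is printed on one line in the form shown in this comment.

```python
import json
import re


def parse_engine_status(vm_status_output: str) -> dict:
    """Extract the JSON object that follows 'Engine status :' (hypothetical helper).

    Assumes the field looks like the output pasted in this bug and that the
    JSON object contains no nested braces.
    """
    match = re.search(r"Engine status\s*:\s*(\{.*?\})", vm_status_output)
    if match is None:
        raise ValueError("no 'Engine status' field found")
    return json.loads(match.group(1))


if __name__ == "__main__":
    sample = (
        "Host ID : 1\n"
        'Engine status : {"reason": "bad vm status", "health": "bad", '
        '"vm": "down", "detail": "down"}\n'
        "Score : 0\n"
    )
    status = parse_engine_status(sample)
    print(status["vm"], status["health"])
```

With the status pasted above, the parsed dictionary reports the VM as "down" with health "bad", which matches the failure described in this bug.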
To me the question is whether this problem also appears if you put the host into maintenance from Engine.

That would indicate that it is a hosted-engine problem.
indeed please reproduce by moving host to maintenance from engine. If it doesn't then this is not urgent as the cockpit feature is in TechPreview
sorry, I didn't want to switch some fields
(In reply to Fabian Deutsch from comment #4)
> To me the question is if this problem also appears if you put the host into
> maintenance from Engine?
>
> This would indicate that it is a hosted-engine problem.

(In reply to Michal Skrivanek from comment #5)
> indeed please reproduce by moving host to maintenance from engine. If it
> doesn't then this is not urgent as the cockpit feature is in TechPreview

Putting a host into maintenance from the engine side requires 2 hosts in the same cluster, so I did some testing under those conditions.

Test steps:
1. Prepare 2 machines with the same CPU model.
2. Install redhat-virtualization-host-4.0-20160811.0 on the first host.
3. Deploy HE with the correct steps (using NFS storage 1).
4. Install RHVH on the second host.
5. Deploy an additional HE host with the same NFS storage 1.
6. Log in to the engine after both hosts have changed to up status.
7. Put host 1 into maintenance from the engine.

Test result:
1. After step 7, host 1 is put into maintenance from the engine successfully. (HE status in cockpit still works well.)
2. The HE VM migrates to the other host automatically.
IIUC you can't reproduce it then?
(In reply to Michal Skrivanek from comment #8)
> IIUC you can't reproduce it then?

I can still reproduce this issue on the latest RHVH with the original steps.

redhat-virtualization-host-4.0-20160812.0
ovirt-hosted-engine-ha-2.0.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.4-1.el7ev.noarch
HE will come back after a reboot.

hosted-engine --vm-shutdown
hosted-engine --vm-start
Since it's working correctly from the engine and the cockpit-based feature is tech preview, I'm moving it out of 4.0.3 and decreasing its importance.
How Host to Maintenance works:
If the engine login is available, call the REST API to switch the host to maintenance.
If the call fails or the engine login is not available, shut down all VMs _after_ user confirmation.

The issue might be caused by confirming 'Shut down all VMs'.

Shaochen, can you please attach screenshot(s) of the dialog(s) shown from the click on Host to Maintenance onward?
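The Host to Maintenance flow described in this comment could be sketched roughly as follows. This is an illustration only, not the cockpit-ovirt code: the function and parameter names are hypothetical, and the engine REST call, the confirmation dialog, and the VM shutdown are passed in as plain callables so the decision logic stands on its own.

```python
from typing import Callable


def host_to_maintenance(
    engine_login_available: bool,
    call_engine_api: Callable[[], bool],    # hypothetical: True if the REST call succeeded
    confirm_shutdown: Callable[[], bool],   # hypothetical: True if the user confirmed the dialog
    shutdown_all_vms: Callable[[], None],   # hypothetical: shuts down every VM on the host
) -> str:
    """Return the path taken: 'engine', 'shutdown', or 'cancelled'."""
    if engine_login_available:
        try:
            if call_engine_api():
                return "engine"
        except Exception:
            # Any error is treated the same as a failed call.
            pass

    # Fallback: engine login unavailable or the REST call failed.
    # VMs are shut down only after explicit user confirmation.
    if not confirm_shutdown():
        return "cancelled"
    shutdown_all_vms()
    return "shutdown"


if __name__ == "__main__":
    stopped = []
    path = host_to_maintenance(
        engine_login_available=False,
        call_engine_api=lambda: True,
        confirm_shutdown=lambda: True,
        shutdown_all_vms=lambda: stopped.append("all VMs"),
    )
    print(path, stopped)
```

On a single-host setup the fallback branch is the one that matters: confirming the dialog shuts down every VM on the host, including the Hosted Engine VM.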
Created attachment 1191115 [details] maintenance1
Created attachment 1191116 [details] maintenance2
Based on the attached 'maintenance2' screenshot, the issue is the invocation of 'Shut down all VMs' on the host, which happens because the 'host to maintenance' call via the REST API was not possible or failed.

To fix the issue, I'll change the text in the dialog to better inform the user about the consequences of shutting down critical VMs like the HE.
It is only a label change; changing priority.
Test version:
redhat-virtualization-host-4.0-20160919.0
imgbased-0.8.5-0.1.el7ev.noarch
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.4.0.el7ev.noarch
ovirt-hosted-engine-setup-2.0.2.2-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
rhevm-appliance-20160922.0-1.el7ev.ova

Test steps:
1. Install redhat-virtualization-host-4.0-20160919.0.
2. Deploy HE with the correct steps.
3. Reboot RHVH and log in to cockpit.
4. Switch to the Virtual Machine page.
5. Click the "Login to engine" button.
6. Click the "Host to maintenance" button.
7. Repeat steps 5 and 6.

Test result:
1. After step 7, hosted engine is down.

But according to #c11 & c15, the only thing I can do to verify this bug is check whether the warning text below pops up when putting the host into maintenance mode. And the answer is yes.

========================================================
Login to Engine not available.
Please confirm all VMs on this host will be shut down.
Please consider the type of running VMs. Shutting down critical VMs such as Hosted Engine can cause serious issues.
========================================================

Hi Mlibra and Fabian,

Since the above warning text pops up, can I verify this bug directly?
If yes, should I report a new bug to track the "HE down" issue, even though the cockpit feature is in TechPreview?

Thanks.
Critical VMs are supposed to be flagged as HA and will be restarted. The same goes for HE, where the he-agent takes care of that.

I don't think any other bug is needed. If you believe the text should be changed, please suggest or contribute that upstream at https://github.com/mareklibra/cockpit-ovirt/issues
Hi Mlibra,

Scenario 1:
If there is only one host, we can see the warning text pop up, but Hosted Engine goes down after doing some operations (see the original bug) on the VM page.

Scenario 2:
If there are 2 hosts (see #c7), putting the host into maintenance succeeds.

Is this by design? If yes, I will verify this bug.

Thanks.
Yes, that's ok. Thank you
Verifying this bug according to #c18 ~ c21.