Created attachment 1420918 [details]
sosreport from alma04

Description of problem:
An ha-host fails to retrieve the vm.conf file from shared storage and stops being an ha-host if vm.conf is manually removed from the host while it is not in global maintenance.

[root@alma04 ~]# rm -f /var/run/ovirt-hosted-engine-ha/vm.conf
[root@alma04 ~]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch
rhvm-appliance-4.1.20180125.0-1.el7.noarch
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
100%

Steps to Reproduce:
1. Deploy SHE on 2 ha-hosts over NFS.
2. SHE-VM is running on the first host.
3. Run "rm -f /var/run/ovirt-hosted-engine-ha/vm.conf" on the second host.
4. Check that /var/run/ovirt-hosted-engine-ha/vm.conf is not copied back from shared storage and does not exist in the directory.
5. Run "hosted-engine --vm-status":
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
6. Check that the NFS storage is accessible, contrary to what was printed in step 5.

Actual results:
ha-agent fails to retrieve vm.conf and has to be manually restarted to work around the issue.

Expected results:
vm.conf should be copied from shared storage without issues and ha-agent should not fail the ha-host.

Additional info:
sosreport from host is attached.
Created attachment 1420919 [details] engine logs
In the web UI, the engine recognizes the second host as an ha-host with an active score of 3400; in the CLI it appears as:
[root@alma04 ~]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
We do not refresh vm.conf unless it is necessary. Nobody is supposed to touch runtime files manually, so this is not really a bug. The file will eventually appear again once it is needed. As long as the hosted engine is running and the engine sees the right score (and comment 2 says it does), there is nothing wrong, except perhaps a missing note about how to trigger the refresh (try starting the VM, restarting the agent, and probably some other events). We could provide a command to download the file on the user's request.
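For anyone hitting this, the workaround mentioned above (restarting the agent) could be wrapped in a small check like the one below. This is only a sketch: the path is the one from the report, the function names `vm_conf_present` and `vm_conf_advice` are made up for illustration, and the script only prints advice rather than restarting anything itself.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of hosted-engine.
# Default path is the runtime file named in this report.
VM_CONF="${VM_CONF:-/var/run/ovirt-hosted-engine-ha/vm.conf}"

# Return 0 if the runtime copy of vm.conf exists and is non-empty.
vm_conf_present() {
    [ -s "${1:-$VM_CONF}" ]
}

# Print a hint for the operator; does not change any service state.
vm_conf_advice() {
    if vm_conf_present "$1"; then
        echo "vm.conf present"
    else
        echo "vm.conf missing: run 'systemctl restart ovirt-ha-agent' to trigger a refresh"
    fi
}
```

Calling `vm_conf_advice` with no argument checks the default path; passing a path checks that file instead.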
Simone, do you think we should change the detection logic here to show the status even when vm.conf is missing?
(In reply to Martin Sivák from comment #4)
> Simone do you think we should change the detection logic here to show the
> status even when vm.conf is missing?

Maybe just that fix could be worthwhile, although manually deleting a file is still not a recommended action.
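To make the proposed change concrete, here is a rough sketch of the difference between a strict and a lenient pre-check. It is not the actual hosted-engine code: both function names are hypothetical, and it assumes the status command currently stops as soon as vm.conf is unreadable.

```shell
#!/bin/sh
# Hypothetical pre-checks for a status command; not the real hosted-engine logic.
# Assumption: the current behaviour is to stop when vm.conf is unreadable.

# Strict variant: refuse to continue when the file is missing.
precheck_strict() {
    if [ ! -r "$1" ]; then
        echo "The hosted engine configuration has not been retrieved from shared storage." >&2
        return 1
    fi
}

# Lenient variant: warn on stderr but let the status query proceed.
precheck_lenient() {
    if [ ! -r "$1" ]; then
        echo "warning: $1 is missing; showing status anyway" >&2
    fi
    return 0
}
```

With the lenient variant, a missing file would produce a warning while the score and state are still shown, which matches what the engine already reports in comment 2.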
Not going to fix this