Created attachment 1196623 [details]
HE_not_running.png

Description of problem:
Hosted Engine always shows a "Not running" status in cockpit after deployment. The HE VM can be brought up by running "hosted-engine --vm-start", but the engine's hostname is lost and the HE status is still shown as "Not running" in cockpit.

[root@dell-per210-01 /]# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli

Version-Release number of selected component (if applicable):
rhvh-4.0-0.20160829.0+1
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
imgbased-0.8.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
rhevm-appliance-20160831.0-1.el7ev.ova

How reproducible:
100%

Regression of bug 1364034.

Steps to Reproduce:
1. Install RHVH 4.0 via PXE.
2. Log in to RHVH via the cockpit UI.
3. Deploy Hosted Engine via cockpit, following the correct steps.
4. After the VM shuts down, wait a few minutes and check the HE status.

Actual results:
Hosted Engine always shows a "Not running" status after deployment.

Expected results:
Hosted Engine comes up and works well after deployment.

Additional info:
This is a regression; a previous occurrence was closed by bug 1364036.
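For triage on hosts showing this symptom, a minimal check sketch (not part of the original report; it assumes the standard ovirt-hosted-engine-ha service names and log path):

  # are both HA services running?
  systemctl status ovirt-ha-agent ovirt-ha-broker
  # recent agent activity and errors
  journalctl -u ovirt-ha-agent --since today
  # tail of the agent's own log file
  tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log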
Bug tickets must have version flags set prior to targeting them to a release. Please ask the maintainer to set the correct version flags, and only then set the target milestone.
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Created attachment 1196627 [details] RHVH_tmp_log
Created attachment 1196629 [details] HE-VM_tmp_log
Created attachment 1196636 [details] RHVH_var_log
Created attachment 1196642 [details] HE-VM_var_log
Adding the keywords "Regression" and "TestBlocker": this issue does not occur on redhat-virtualization-host-4.0-20160817.0, and it will block our HE testing.
After rebooting RHVH 4.0, the HE VM is down permanently.

Information:

1. After deploying HE, while it is up:

[root@dhcp-8-194 ~]# hosted-engine --vm-start
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli

2. After rebooting RHVH 4.0:

1) [root@dhcp-8-194 ~]# hosted-engine --vm-start
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants

56dd2f56-9eb3-4800-84d2-fd7720fbaa86
    Status = WaitForLaunch
    nicModel = rtl8139,pv
    statusTime = 4294785510
    emulatedMachine = rhel6.5.0
    pid = 0
    vmName = HostedEngine
    devices = [{'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': 'a24c5fa3-4e58-470f-a606-764f274531fc', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'poolID': '00000000-0000-0000-0000-000000000000', 'volumeID': '0481b736-fe9a-448a-b91a-c81d2b255da3', 'imageID': '37210cf8-a06a-47eb-9327-b4ba52648d3b', 'specParams': {}, 'readonly': 'false', 'domainID': '7d5ef81d-3cbc-42ed-9ef2-26d505a3b840', 'optional': 'false', 'deviceId': '37210cf8-a06a-47eb-9327-b4ba52648d3b', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:39:7e:94', 'linkActive': 'true', 'network': 'ovirtmgmt', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': '46a6dc68-7a25-4e60-b922-37f337a82ba2', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': '6f4ea432-8c13-44ea-b2fb-7e1bb5c60cc7', 'alias': 'console0'}, {'device': 'vga', 'alias': 'video0', 'type': 'video'}]
    guestDiskMapping = {}
    vmType = kvm
    clientIp =
    displaySecurePort = -1
    memSize = 4096
    displayPort = -1
    cpuType = Haswell-noTSX
    spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
    smp = 2
    displayIp = 0
    display = vnc

2) [root@dhcp-8-194 ~]# hosted-engine --add-console-password
Enter password:
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants
Unexpected exception

3) [root@dhcp-8-194 ~]# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli
The cause is here:

MainThread::ERROR::2016-09-01 15:02:06,645::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: ''Configuration value not found: file=/var/lib/ovirt-hosted-engine-ha/ha.conf, key=local_maintenance'' - trying to restart agent
MainThread::WARNING::2016-09-01 15:02:11,651::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '8'

Now we have to understand why it got lost on the node.

Yihui, can you please check the permission and the content of:
/var/lib/ovirt-hosted-engine-ha/ha.conf
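For anyone chasing the same signature: the lines above come from the HA agent log at its standard location, so a quick way to look for it on another host (a suggested check, not from the original comment) is:

  grep -n 'Configuration value not found' /var/log/ovirt-hosted-engine-ha/agent.log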
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
(In reply to Simone Tiraboschi from comment #11)
> The cause is here:
>
> MainThread::ERROR::2016-09-01 15:02:06,645::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: ''Configuration value not found: file=/var/lib/ovirt-hosted-engine-ha/ha.conf, key=local_maintenance'' - trying to restart agent
> MainThread::WARNING::2016-09-01 15:02:11,651::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '8'
>
> Now we have to understand why it got lost on the node.
>
> Yihui, can you please check the permission and the content of:
> /var/lib/ovirt-hosted-engine-ha/ha.conf

# cat ha.conf
local_maintenance=False

# ls -l
total 8
-rw-r--r--. 1 root kvm 187 Aug 24 21:36 broker.conf
-rw-r--r--. 1 root kvm  24 Aug 24 21:36 ha.conf

# ls -ld ovirt-hosted-engine-ha
drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha
(In reply to Ryan Barry from comment #13)
> # ls -ld ovirt-hosted-engine-ha
> drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha

The issue is here ^: it should be vdsm:kvm, since ovirt-ha-agent runs as the vdsm user.
(In reply to Simone Tiraboschi from comment #14)
> (In reply to Ryan Barry from comment #13)
> > # ls -ld ovirt-hosted-engine-ha
> > drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha
>
> The issue is here ^: it should be vdsm:kvm, since ovirt-ha-agent runs as
> the vdsm user.

Hi Simone,

Should I change the permission manually, like "chown -R 36:36 ovirt-hosted-engine-ha"? Previous versions didn't require changing the permission manually.

Is this the cause of both the "hosted engine is not running" display error and the HE VM staying down permanently after rebooting RHVH 4.0?

Thank you,
Yihui
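For the record, a manual workaround along those lines would look like this (a sketch only, not an official fix; it assumes the vdsm user exists on the host — uid/gid 36:36 maps to vdsm:kvm — and that the HA services are installed):

  # give the directory back to the user the HA agent runs as
  chown -R vdsm:kvm /var/lib/ovirt-hosted-engine-ha
  # restart the HA services so they can read their configuration again
  systemctl restart ovirt-ha-agent ovirt-ha-broker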
That's the root cause, yes.

However, a deeper cause is that this happens on RHVH due to a package ordering problem (possibly a dependency loop), which results in ovirt-hosted-engine-ha being installed before vdsm (plus a couple of other simple dependency failures).

I'm investigating to try to find the loop.
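One way to confirm that ordering on an affected host, using plain rpm queries (generic commands, nothing specific to this bug):

  # transaction history, newest first
  rpm -qa --last | grep -E 'ovirt-hosted-engine-ha|vdsm'
  # or compare the recorded install times directly
  rpm -q --qf '%{NAME}: %{INSTALLTIME:date}\n' ovirt-hosted-engine-ha vdsm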
(In reply to Ryan Barry from comment #16)
> That's the root cause, yes.
>
> However, a deeper cause is that this happens on RHVH due to a package
> ordering problem (possibly a dependency loop), which results in
> ovirt-hosted-engine-ha being installed before vdsm (plus a couple of other
> simple dependency failures).
>
> I'm investigating to try to find the loop.

Simone, didn't we fix something like this in the past on some package?
(In reply to Sandro Bonazzola from comment #17)
> Simone, didn't we fix something like this in the past on some package?

Yes, this one: https://gerrit.ovirt.org/#/c/62109/

But it seems that the downstream builds are still affected; I checked the downstream spec file and we ported that patch there as well, but it seems that for some reason, when building the image, ovirt-hosted-engine-ha still gets installed before vdsm.
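A generic way to check whether the installed package declares a dependency on vdsm at all (this only shows the declared requirements, not the specific ordering constraint that patch adds):

  rpm -q --requires ovirt-hosted-engine-ha | grep -i vdsm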
The problem in this case is a little bit different. As Ryan found out, we have circular dependencies during installation. These circular dependencies are broken up by rpm, but that can leave the installation order messed up, which is likely what causes this bug.
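To hunt for such a cycle by hand, one option (assuming yum-utils is available; this lists only direct, resolved requirements, so a transitive loop still needs following the chain) is:

  for p in ovirt-hosted-engine-ha vdsm; do
      echo "== $p =="
      repoquery --requires --resolve "$p"
  done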
Hi all,

I have verified the bug.

Version-Release number of selected component (if applicable):
rhvh-4.0-0.20160906.0+1
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
imgbased-0.8.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
rhevm-appliance-20160831.0-1.el7ev.ova

Steps to Reproduce:
1. Install RHVH 4.0 via PXE.
2. Log in to RHVH via the cockpit UI.
3. Deploy Hosted Engine via cockpit, following the correct steps.
4. After the VM shuts down, wait a few minutes and check the HE status.

Results:
Hosted Engine comes up and works well after deployment.

Thanks,
Yihui
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1859.html