Bug 1372240 - require loops cause incorrect installation order, which causes incorrect file permissions, prevents HE setup
Status: CLOSED ERRATA
Product: ovirt-node
Classification: oVirt
Component: Build
Version: 4.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.0.3
Target Release: 4.0
Assigned To: Fabian Deutsch
QA Contact: Yihui Zhao
Keywords: Regression, TestBlocker
Depends On:
Blocks:
Reported: 2016-09-01 04:45 EDT by Yihui Zhao
Modified: 2016-09-13 08:48 EDT (History)

See Also:
Fixed In Version: redhat-release-virtualization-host-4.0-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-13 08:48:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.0.z+
rule-engine: blocker+
mgoldboi: planning_ack+
fdeutsch: devel_ack+
cshao: testing_ack+


Attachments
HE_not_running.png (34.72 KB, image/png), 2016-09-01 04:45 EDT, Yihui Zhao
RHVH_tmp_log (1.09 KB, application/x-gzip), 2016-09-01 04:55 EDT, Yihui Zhao
HE-VM_tmp_log (13.03 KB, application/x-gzip), 2016-09-01 04:56 EDT, Yihui Zhao
RHVH_var_log (2.24 MB, application/x-gzip), 2016-09-01 04:57 EDT, Yihui Zhao
HE-VM_var_log (703.20 KB, application/x-gzip), 2016-09-01 04:57 EDT, Yihui Zhao


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:1859
Priority: normal
Status: SHIPPED_LIVE
Summary: redhat-release-virtualization-host bug fix and enhancement update for RHV 4.0
Last Updated: 2016-09-13 12:47:00 EDT

Description Yihui Zhao 2016-09-01 04:45:01 EDT
Created attachment 1196623 [details]
HE_not_running.png

Description of problem:
Hosted Engine always shows a "Not running" status in cockpit after it is deployed.
The HE-VM comes up after running hosted-engine --vm-start, but the engine's hostname is lost, and the HE status still shows as "Not running" in cockpit.


[root@dell-per210-01 /]# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli


Version-Release number of selected component (if applicable):
rhvh-4.0-0.20160829.0+1
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
imgbased-0.8.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
rhevm-appliance-20160831.0-1.el7ev.ova


How reproducible:
100%
Regression bug:1364034


Steps to Reproduce:
1. Install RHVH 4.0 via PXE.
2. Log in to RHVH via the cockpit UI.
3. Deploy Hosted Engine via cockpit, following the correct steps.
4. After the VM shuts down, wait a few minutes, then check the HE status.

Actual results:
Hosted Engine always shows a "Not running" status after it is deployed.

Expected results:
Hosted Engine comes up and works correctly after it is deployed.

Additional info:
It's a regression bug:closed by bug:1364036
Comment 1 Red Hat Bugzilla Rules Engine 2016-09-01 04:46:27 EDT
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
Comment 2 Red Hat Bugzilla Rules Engine 2016-09-01 04:46:27 EDT
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 3 Red Hat Bugzilla Rules Engine 2016-09-01 04:47:29 EDT
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
Comment 4 Red Hat Bugzilla Rules Engine 2016-09-01 04:47:29 EDT
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 5 Yihui Zhao 2016-09-01 04:55 EDT
Created attachment 1196627 [details]
RHVH_tmp_log
Comment 6 Yihui Zhao 2016-09-01 04:56 EDT
Created attachment 1196629 [details]
HE-VM_tmp_log
Comment 7 Yihui Zhao 2016-09-01 04:57 EDT
Created attachment 1196636 [details]
RHVH_var_log
Comment 8 Yihui Zhao 2016-09-01 04:57 EDT
Created attachment 1196642 [details]
HE-VM_var_log
Comment 9 cshao 2016-09-01 05:12:38 EDT
Added the keywords "Regression" and "TestBlocker": this issue does not occur on redhat-virtualization-host-4.0-20160817.0, and it blocks our HE testing.
Comment 10 Yihui Zhao 2016-09-01 07:16:22 EDT
If RHVH 4.0 is rebooted, the HE-VM stays down permanently.

Information:
1. After deploying HE, while it is up:
[root@dhcp-8-194 ~]# hosted-engine --vm-start
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli

2. After rebooting RHVH 4.0:
1)
[root@dhcp-8-194 ~]# hosted-engine --vm-start
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants

56dd2f56-9eb3-4800-84d2-fd7720fbaa86
	Status = WaitForLaunch
	nicModel = rtl8139,pv
	statusTime = 4294785510
	emulatedMachine = rhel6.5.0
	pid = 0
	vmName = HostedEngine
	devices = [{'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': 'a24c5fa3-4e58-470f-a606-764f274531fc', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'poolID': '00000000-0000-0000-0000-000000000000', 'volumeID': '0481b736-fe9a-448a-b91a-c81d2b255da3', 'imageID': '37210cf8-a06a-47eb-9327-b4ba52648d3b', 'specParams': {}, 'readonly': 'false', 'domainID': '7d5ef81d-3cbc-42ed-9ef2-26d505a3b840', 'optional': 'false', 'deviceId': '37210cf8-a06a-47eb-9327-b4ba52648d3b', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:39:7e:94', 'linkActive': 'true', 'network': 'ovirtmgmt', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': '46a6dc68-7a25-4e60-b922-37f337a82ba2', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': '6f4ea432-8c13-44ea-b2fb-7e1bb5c60cc7', 'alias': 'console0'}, {'device': 'vga', 'alias': 'video0', 'type': 'video'}]
	guestDiskMapping = {}
	vmType = kvm
	clientIp = 
	displaySecurePort = -1
	memSize = 4096
	displayPort = -1
	cpuType = Haswell-noTSX
	spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
	smp = 2
	displayIp = 0
	display = vnc

2)
[root@dhcp-8-194 ~]# hosted-engine --add-console-password
Enter password: 
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants
Unexpected exception

3)
[root@dhcp-8-194 ~]# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli
Comment 11 Simone Tiraboschi 2016-09-01 08:14:59 EDT
The cause is here:

MainThread::ERROR::2016-09-01 15:02:06,645::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: ''Configuration value not found: file=/var/lib/ovirt-hosted-engine-ha/ha.conf, key=local_maintenance'' - trying to restart agent
MainThread::WARNING::2016-09-01 15:02:11,651::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '8'

Now we have to understand why it got lost on node.

Yihui, can you please check the permissions and the content of:
/var/lib/ovirt-hosted-engine-ha/ha.conf
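
For anyone triaging similar agent logs, the error quoted above can be picked apart mechanically. The following is only a minimal sketch: the regular expression is inferred from the single log line in this comment, not from the agent's source code, and the helper name missing_config_keys is hypothetical.

```python
import re

# Pattern for the "Configuration value not found" message quoted above.
# Assumption: the layout is inferred from that one log line only.
MISSING_KEY = re.compile(
    r"Configuration value not found: file=(?P<file>[^,]+), key=(?P<key>\w+)"
)


def missing_config_keys(log_lines):
    """Return (file, key) pairs for every missing-configuration error found."""
    found = []
    for line in log_lines:
        match = MISSING_KEY.search(line)
        if match:
            found.append((match.group("file"), match.group("key")))
    return found


# The two agent.log lines from this comment, joined back into single strings.
sample = [
    "MainThread::ERROR::2016-09-01 15:02:06,645::agent::205::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: "
    "''Configuration value not found: "
    "file=/var/lib/ovirt-hosted-engine-ha/ha.conf, key=local_maintenance'' "
    "- trying to restart agent",
    "MainThread::WARNING::2016-09-01 15:02:11,651::agent::208::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) "
    "Restarting agent, attempt '8'",
]

print(missing_config_keys(sample))
```

Note that the message only says the value could not be obtained; it does not by itself distinguish a missing key from a file the agent cannot read.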
Comment 12 Red Hat Bugzilla Rules Engine 2016-09-01 08:16:42 EDT
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Comment 13 Ryan Barry 2016-09-01 08:28:25 EDT
(In reply to Simone Tiraboschi from comment #11)
> The cause is here:
> 
> MainThread::ERROR::2016-09-01
> 15:02:06,645::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::
> (_run_agent) Error: ''Configuration value not found:
> file=/var/lib/ovirt-hosted-engine-ha/ha.conf, key=local_maintenance'' -
> trying to restart agent
> MainThread::WARNING::2016-09-01
> 15:02:11,651::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::
> (_run_agent) Restarting agent, attempt '8'
> 
> Now we have to understand why it got lost on node.
> 
> Yihui, can you please check the permissions and the content of:
> /var/lib/ovirt-hosted-engine-ha/ha.conf

# cat ha.conf 
local_maintenance=False

# ls -l
total 8
-rw-r--r--. 1 root kvm 187 Aug 24 21:36 broker.conf
-rw-r--r--. 1 root kvm  24 Aug 24 21:36 ha.conf

# ls -ld ovirt-hosted-engine-ha
drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha
Comment 14 Simone Tiraboschi 2016-09-01 08:35:37 EDT
(In reply to Ryan Barry from comment #13)
> # ls -ld ovirt-hosted-engine-ha
> drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha

The issue is here ^. The owner should be vdsm:kvm, since ovirt-ha-agent runs as the vdsm user.
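
To see why a drwx------ directory owned by root:kvm locks out the agent, the classic POSIX permission check can be sketched as below. This is a simplified illustration only: it ignores ACLs and SELinux, the function name can_enter_directory is hypothetical, and it assumes vdsm is UID 36 and kvm is GID 36 (the conventional oVirt values).

```python
import stat


def can_enter_directory(mode, owner_uid, owner_gid, uid, gids):
    """Simplified POSIX check: can this user list and traverse the directory?

    Exactly one permission class applies (owner, then group, then other),
    and both the read and execute bits of that class are required.
    """
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed


ROOT_UID = 0
VDSM_UID = 36  # assumption: conventional oVirt UID for vdsm
KVM_GID = 36   # assumption: conventional oVirt GID for kvm

# Directory as listed above: drwx------ root kvm
print(can_enter_directory(0o700, ROOT_UID, KVM_GID, VDSM_UID, {KVM_GID}))  # False

# Same mode, but owned by vdsm:kvm as expected
print(can_enter_directory(0o700, VDSM_UID, KVM_GID, VDSM_UID, {KVM_GID}))  # True
```

With mode 0700 the group bits are all clear, so membership in kvm does not help; only the owning user gets in.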
Comment 15 Yihui Zhao 2016-09-01 21:13:37 EDT
(In reply to Simone Tiraboschi from comment #14)
> (In reply to Ryan Barry from comment #13)
> > # ls -ld ovirt-hosted-engine-ha
> > drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha
> 
> The issue is here ^, it should be vdsm:kvm since ovirt-ha-agent is running
> as vdsm user

Hi Simone,
  Should I change the ownership manually, for example with "chown -R 36:36 ovirt-hosted-engine-ha"? Previous versions did not require changing the permissions manually.
Is this the cause of both the "hosted engine is not running" display error and the HE-VM staying down permanently after rebooting RHVH 4.0?

Thank you 
Yihui
Comment 16 Ryan Barry 2016-09-01 22:30:43 EDT
That's the root cause, yes.

However, a deeper cause is that this happens in RHVH due to a package ordering problem (possibly a dependency loop), which results in ovirt-hosted-engine-ha being installed before vdsm (and a couple of other simple dep failures).

I'm investigating to try to find the loop.
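
The ordering condition described here can be expressed as a small check. This is a sketch under assumptions: the two install orders below are hypothetical examples rather than data from the affected image, and installed_before is an illustrative helper name. The likely mechanism, consistent with the root:kvm ownership shown in comment 13, is that the vdsm user does not exist yet when ovirt-hosted-engine-ha is unpacked.

```python
def installed_before(install_order, first, second):
    """Return True if `first` was installed earlier than `second`.

    `install_order` is a list of package names in transaction order.
    Raises ValueError if either package is missing from the list.
    """
    return install_order.index(first) < install_order.index(second)


# Hypothetical transaction orders, for illustration only.
broken_order = ["ovirt-hosted-engine-ha", "vdsm", "ovirt-hosted-engine-setup"]
fixed_order = ["vdsm", "ovirt-hosted-engine-ha", "ovirt-hosted-engine-setup"]

for name, order in (("broken", broken_order), ("fixed", fixed_order)):
    ok = installed_before(order, "vdsm", "ovirt-hosted-engine-ha")
    print(name, "vdsm first:", ok)
```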
Comment 17 Sandro Bonazzola 2016-09-02 08:10:51 EDT
(In reply to Ryan Barry from comment #16)
> That's the root cause, yes.
> 
> However, a deeper cause is that this happens in RHVH due to some package
> ordering problem (possible dependency loop), which results in
> ovirt-hosted-engine-ha being installed before vdsm (and a couple of other
> simple dep failures).
> 
> I'm investigating to try to find the loop.


Simone, didn't we fix something like this in the past on some package?
Comment 18 Simone Tiraboschi 2016-09-02 08:15:49 EDT
(In reply to Sandro Bonazzola from comment #17)
> Simone, didn't we fix something like this in the past on some package?

Yes, this one:
https://gerrit.ovirt.org/#/c/62109/

But it seems that the downstream builds are still affected.
I checked the downstream spec file and we ported that patch there as well, but it seems that for some reason, when building the image, ovirt-hosted-engine-ha still got installed before vdsm.
Comment 19 Fabian Deutsch 2016-09-02 08:52:51 EDT
The problem in this case is a little bit different.

As Ryan found out, we have circular dependencies during installation. rpm breaks these loops up, but doing so can scramble the installation order, which is likely what causes this bug.
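
A require loop of this kind can be found with a depth-first search over the dependency graph. The sketch below is a generic cycle finder, not the method used to diagnose this bug, and the package names and edges in the example graphs are hypothetical placeholders rather than the real Requires of any oVirt package.

```python
def find_cycle(requires):
    """Return one dependency loop as a list of package names, or None.

    `requires` maps a package name to the list of packages it requires.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {pkg: WHITE for pkg in requires}
    path = []

    def visit(pkg):
        state[pkg] = GREY
        path.append(pkg)
        for dep in requires.get(pkg, ()):
            dep_state = state.get(dep, WHITE)
            if dep_state == GREY:
                # dep is already on the current path: that is a loop
                return path[path.index(dep):] + [dep]
            if dep_state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[pkg] = BLACK
        return None

    for pkg in list(requires):
        if state[pkg] == WHITE:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None


# Hypothetical graphs, for illustration only.
looping = {
    "pkg-a": ["pkg-b"],
    "pkg-b": ["pkg-c"],
    "pkg-c": ["pkg-a"],
    "pkg-d": ["pkg-a"],
}
acyclic = {"pkg-a": ["pkg-b"], "pkg-b": []}

print(find_cycle(looping))  # ['pkg-a', 'pkg-b', 'pkg-c', 'pkg-a']
print(find_cycle(acyclic))  # None
```

Once a loop exists, no install order can satisfy every edge in it, so the installer has to drop at least one ordering constraint; which one it drops decides whether a package lands before something it depends on.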
Comment 22 Yihui Zhao 2016-09-08 02:32:10 EDT
Hi all,
I have verified the bug.
 
Version-Release number of selected component (if applicable):
rhvh-4.0-0.20160906.0+1
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
imgbased-0.8.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
rhevm-appliance-20160831.0-1.el7ev.ova


Steps to Reproduce:
1. Install RHVH 4.0 via PXE.
2. Log in to RHVH via the cockpit UI.
3. Deploy Hosted Engine via cockpit, following the correct steps.
4. After the VM shuts down, wait a few minutes, then check the HE status.

Results:
Hosted Engine comes up and works correctly after it is deployed.

Thanks,
Yihui
Comment 24 errata-xmlrpc 2016-09-13 08:48:11 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1859.html
