Bug 1372240 - require loops cause incorrect installation order, which causes incorrect file permissions, prevents HE setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: ovirt-node
Classification: oVirt
Component: Build
Version: 4.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.0.3
Target Release: 4.0
Assignee: Fabian Deutsch
QA Contact: Yihui Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-01 08:45 UTC by Yihui Zhao
Modified: 2016-09-13 12:48 UTC

Fixed In Version: redhat-release-virtualization-host-4.0-3.el7
Clone Of:
Environment:
Last Closed: 2016-09-13 12:48:11 UTC
oVirt Team: Node
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: blocker+
mgoldboi: planning_ack+
fdeutsch: devel_ack+
cshao: testing_ack+


Attachments
HE_not_running.png (34.72 KB, image/png), 2016-09-01 08:45 UTC, Yihui Zhao
RHVH_tmp_log (1.09 KB, application/x-gzip), 2016-09-01 08:55 UTC, Yihui Zhao
HE-VM_tmp_log (13.03 KB, application/x-gzip), 2016-09-01 08:56 UTC, Yihui Zhao
RHVH_var_log (2.24 MB, application/x-gzip), 2016-09-01 08:57 UTC, Yihui Zhao
HE-VM_var_log (703.20 KB, application/x-gzip), 2016-09-01 08:57 UTC, Yihui Zhao


Links
System: Red Hat Product Errata
ID: RHBA-2016:1859
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: redhat-release-virtualization-host bug fix and enhancement update for RHV 4.0
Last Updated: 2016-09-13 16:47:00 UTC

Description Yihui Zhao 2016-09-01 08:45:01 UTC
Created attachment 1196623 [details]
HE_not_running.png

Description of problem:
Hosted Engine always shows a "Not running" status in Cockpit after it is deployed.
The HE-VM comes up after running hosted-engine --vm-start, but the engine's hostname is lost, and the HE status is still shown as "Not running" in Cockpit.


[root@dell-per210-01 /]# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli


Version-Release number of selected component (if applicable):
rhvh-4.0-0.20160829.0+1
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
imgbased-0.8.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
rhevm-appliance-20160831.0-1.el7ev.ova


How reproducible:
100%
Regression bug:1364034


Steps to Reproduce:
1. Install RHVH 4.0 via PXE.
2. Log in to RHVH via the Cockpit UI.
3. Deploy Hosted Engine via Cockpit, following the correct steps.
4. After the VM shuts down, wait a few minutes, then check the HE status.

Actual results:
Hosted Engine always shows a "Not running" status after it is deployed.

Expected results:
Hosted Engine comes up and works well after it is deployed.

Additional info:
It's a regression bug:closed by bug:1364036

Comment 1 Red Hat Bugzilla Rules Engine 2016-09-01 08:46:27 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 2 Red Hat Bugzilla Rules Engine 2016-09-01 08:46:27 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 3 Red Hat Bugzilla Rules Engine 2016-09-01 08:47:29 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 4 Red Hat Bugzilla Rules Engine 2016-09-01 08:47:29 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 5 Yihui Zhao 2016-09-01 08:55:13 UTC
Created attachment 1196627 [details]
RHVH_tmp_log

Comment 6 Yihui Zhao 2016-09-01 08:56:21 UTC
Created attachment 1196629 [details]
HE-VM_tmp_log

Comment 7 Yihui Zhao 2016-09-01 08:57:14 UTC
Created attachment 1196636 [details]
RHVH_var_log

Comment 8 Yihui Zhao 2016-09-01 08:57:59 UTC
Created attachment 1196642 [details]
HE-VM_var_log

Comment 9 cshao 2016-09-01 09:12:38 UTC
Adding the keywords "Regression" and "TestBlocker": there is no such issue on redhat-virtualization-host-4.0-20160817.0, and this bug blocks our HE testing.

Comment 10 Yihui Zhao 2016-09-01 11:16:22 UTC
If RHVH 4.0 is rebooted, the HE-VM stays down permanently.

Information:
1. After deploying HE, with the VM up:
[root@dhcp-8-194 ~]# hosted-engine --vm-start
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli

2. After rebooting RHVH 4.0:
1)
[root@dhcp-8-194 ~]# hosted-engine --vm-start
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants

56dd2f56-9eb3-4800-84d2-fd7720fbaa86
	Status = WaitForLaunch
	nicModel = rtl8139,pv
	statusTime = 4294785510
	emulatedMachine = rhel6.5.0
	pid = 0
	vmName = HostedEngine
	devices = [{'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': 'a24c5fa3-4e58-470f-a606-764f274531fc', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'poolID': '00000000-0000-0000-0000-000000000000', 'volumeID': '0481b736-fe9a-448a-b91a-c81d2b255da3', 'imageID': '37210cf8-a06a-47eb-9327-b4ba52648d3b', 'specParams': {}, 'readonly': 'false', 'domainID': '7d5ef81d-3cbc-42ed-9ef2-26d505a3b840', 'optional': 'false', 'deviceId': '37210cf8-a06a-47eb-9327-b4ba52648d3b', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:39:7e:94', 'linkActive': 'true', 'network': 'ovirtmgmt', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': '46a6dc68-7a25-4e60-b922-37f337a82ba2', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': '6f4ea432-8c13-44ea-b2fb-7e1bb5c60cc7', 'alias': 'console0'}, {'device': 'vga', 'alias': 'video0', 'type': 'video'}]
	guestDiskMapping = {}
	vmType = kvm
	clientIp = 
	displaySecurePort = -1
	memSize = 4096
	displayPort = -1
	cpuType = Haswell-noTSX
	spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
	smp = 2
	displayIp = 0
	display = vnc

2)
[root@dhcp-8-194 ~]# hosted-engine --add-console-password
Enter password: 
/usr/share/vdsm/vdsClient.py:33: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import utils, vdscli, constants
Unexpected exception

3)
[root@dhcp-8-194 ~]# hosted-engine --vm-status
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli

Comment 11 Simone Tiraboschi 2016-09-01 12:14:59 UTC
The cause is here:

MainThread::ERROR::2016-09-01 15:02:06,645::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: ''Configuration value not found: file=/var/lib/ovirt-hosted-engine-ha/ha.conf, key=local_maintenance'' - trying to restart agent
MainThread::WARNING::2016-09-01 15:02:11,651::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '8'

Now we have to understand why it got lost on node.

Yihui, can you please check the permissions and the content of:
/var/lib/ovirt-hosted-engine-ha/ha.conf

Comment 12 Red Hat Bugzilla Rules Engine 2016-09-01 12:16:42 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 13 Ryan Barry 2016-09-01 12:28:25 UTC
(In reply to Simone Tiraboschi from comment #11)
> The cause is here:
> 
> MainThread::ERROR::2016-09-01
> 15:02:06,645::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::
> (_run_agent) Error: ''Configuration value not found:
> file=/var/lib/ovirt-hosted-engine-ha/ha.conf, key=local_maintenance'' -
> trying to restart agent
> MainThread::WARNING::2016-09-01
> 15:02:11,651::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::
> (_run_agent) Restarting agent, attempt '8'
> 
> Now we have to understand why it got lost on node.
> 
> Yihui, can you please check the permissions and the content of:
> /var/lib/ovirt-hosted-engine-ha/ha.conf

# cat ha.conf 
local_maintenance=False

# ls -l
total 8
-rw-r--r--. 1 root kvm 187 Aug 24 21:36 broker.conf
-rw-r--r--. 1 root kvm  24 Aug 24 21:36 ha.conf

# ls -ld ovirt-hosted-engine-ha
drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha

Comment 14 Simone Tiraboschi 2016-09-01 12:35:37 UTC
(In reply to Ryan Barry from comment #13)
> # ls -ld ovirt-hosted-engine-ha
> drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha

The issue is here ^: the directory should be owned by vdsm:kvm, since ovirt-ha-agent runs as the vdsm user.
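
For anyone who wants to check this on a host, here is a minimal Python sketch of the classic owner/group/other test for entering a directory. It is only an illustration: can_traverse and check_dir are hypothetical helper names, uid 36 and gid 36 are assumed to be vdsm and kvm, and ACLs and SELinux are ignored.

```python
import os
import stat


def can_traverse(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a user may enter (search) a directory.

    Only the owner/group/other permission bits are considered;
    ACLs and SELinux are ignored in this sketch.
    """
    if uid == 0:
        # root bypasses the permission bits on directories
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


def check_dir(path, uid, gids):
    """Apply can_traverse() to a real directory on disk."""
    st = os.stat(path)
    return can_traverse(st.st_mode, st.st_uid, st.st_gid, uid, gids)


# Assumed IDs: uid 36 = vdsm, gid 36 = kvm.
VDSM_UID, KVM_GID = 36, 36

# drwx------ root:kvm, as in the listing above: vdsm is only a group member
# and the group has no permissions, so the agent cannot enter the directory.
print(can_traverse(0o700, 0, KVM_GID, VDSM_UID, {KVM_GID}))         # False

# drwx------ vdsm:kvm, the expected ownership: vdsm is the owner.
print(can_traverse(0o700, VDSM_UID, KVM_GID, VDSM_UID, {KVM_GID}))  # True
```

On a real host, check_dir("/var/lib/ovirt-hosted-engine-ha", 36, {36}) would apply the same test to the actual directory.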

Comment 15 Yihui Zhao 2016-09-02 01:13:37 UTC
(In reply to Simone Tiraboschi from comment #14)
> (In reply to Ryan Barry from comment #13)
> > # ls -ld ovirt-hosted-engine-ha
> > drwx------. 2 root kvm 38 Aug 30 00:47 ovirt-hosted-engine-ha
> 
> The issue is here ^, it should be vdsm:kvm since ovirt-ha-agent is running
> as vdsm user

Hi Simone,
  Should I change the ownership manually, e.g. with "chown -R 36:36 ovirt-hosted-engine-ha"? Previous versions did not require changing it manually.
Is this the cause of both the "hosted engine is not running" display error and the HE-VM staying down permanently after rebooting RHVH 4.0?

Thank you 
Yihui

Comment 16 Ryan Barry 2016-09-02 02:30:43 UTC
That's the root cause, yes.

However, a deeper cause is that this happens in RHVH due to some package ordering problem (possibly a dependency loop), which results in ovirt-hosted-engine-ha being installed before vdsm (and a couple of other simple dep failures).

I'm investigating to try to find the loop.

Comment 17 Sandro Bonazzola 2016-09-02 12:10:51 UTC
(In reply to Ryan Barry from comment #16)
> That's the root cause, yes.
> 
> However, a deeper cause is that this happens in RHVH due to some package
> ordering problem (possible dependency loop), which results in
> ovirt-hosted-engine-ha being installed before vdsm (and a couple of other
> simple dep failures).
> 
> I'm investigating to try to find the loop.


Simone, didn't we fix something like this in the past on some package?

Comment 18 Simone Tiraboschi 2016-09-02 12:15:49 UTC
(In reply to Sandro Bonazzola from comment #17)
> Simone, didn't we fix something like this in the past on some package?

Yes, this one:
https://gerrit.ovirt.org/#/c/62109/

But it seems that the downstream builds are still affected.
I checked the downstream spec file and we ported that patch there as well, but it seems that, for some reason, ovirt-hosted-engine-ha still got installed before vdsm when building the image.

Comment 19 Fabian Deutsch 2016-09-02 12:52:51 UTC
The problem in this case is a little bit different.

As Ryan found out, we have circular dependencies during installation. rpm breaks these circular dependencies up, but doing so can scramble the installation order, which is likely what causes this bug.
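
To look for such a loop, the Requires relationships can be treated as a directed graph and searched depth-first. Below is a minimal Python sketch, assuming the dependency data has already been collected into a dict. find_cycle is a hypothetical helper and the package names are placeholders, not the actual packages involved in this bug. It does not reproduce how rpm itself orders a transaction.

```python
def find_cycle(requires):
    """Return one dependency cycle as a list of names, or None.

    `requires` maps a package name to the list of packages it requires.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, done
    state = {}
    path = []

    def visit(pkg):
        state[pkg] = GREY
        path.append(pkg)
        for dep in requires.get(pkg, ()):
            dep_state = state.get(dep, WHITE)
            if dep_state == GREY:
                # dep is already on the current path: we closed a loop
                return path[path.index(dep):] + [dep]
            if dep_state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[pkg] = BLACK
        return None

    for pkg in list(requires):
        if state.get(pkg, WHITE) == WHITE:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None


# Placeholder graph: pkg-a -> pkg-b -> pkg-c -> pkg-a forms a loop.
example = {
    "pkg-a": ["pkg-b"],
    "pkg-b": ["pkg-c"],
    "pkg-c": ["pkg-a"],
    "pkg-d": ["pkg-a"],
}
print(find_cycle(example))  # ['pkg-a', 'pkg-b', 'pkg-c', 'pkg-a']
```

When a cycle like this exists, no installation order can satisfy every Requires, so whichever edge gets dropped decides which package is installed before its dependency.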

Comment 22 Yihui Zhao 2016-09-08 06:32:10 UTC
Hi all,
 I have verified the bug.
 
Version-Release number of selected component (if applicable):
rhvh-4.0-0.20160906.0+1
cockpit-ws-0.114-2.el7.x86_64
cockpit-ovirt-dashboard-0.10.6-1.3.6.el7ev.noarch
imgbased-0.8.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1.5-1.el7ev.noarch
rhevm-appliance-20160831.0-1.el7ev.ova


Steps to Reproduce:
1. Install RHVH 4.0 via PXE.
2. Log in to RHVH via the Cockpit UI.
3. Deploy Hosted Engine via Cockpit, following the correct steps.
4. After the VM shuts down, wait a few minutes, then check the HE status.

results:
Hosted Engine comes up and works well after it is deployed.

Thanks,
Yihui

Comment 24 errata-xmlrpc 2016-09-13 12:48:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1859.html

