Bug 1128065 - [node][el7] Hosted-engine --deploy failed with "Failed to execute stage 'Environment setup': Command '/bin/systemctl' failed to execute"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Fabian Deutsch
QA Contact: Virtualization Bugs
URL:
Whiteboard: node
Depends On:
Blocks: rhevh-7.0 rhev35betablocker rhev35rcblocker rhev35gablocker
Reported: 2014-08-08 08:32 UTC by wanghui
Modified: 2016-02-10 20:11 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-11 20:40:35 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments
Provide more log files according to comment#3 (8.64 MB, application/gzip)
2014-08-11 05:23 UTC, wanghui
the output of #systemd-analyze plot > boot.svg according to comment#9 (97.20 KB, image/svg+xml)
2014-08-18 09:16 UTC, wanghui
provide audit.log as comment#34 asked (983.94 KB, text/plain)
2014-09-29 09:01 UTC, wanghui


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2015:0161 | 0 | normal | SHIPPED_LIVE | ovirt-hosted-engine-setup bug fix and enhancement update | 2015-12-07 21:35:11 UTC
oVirt gerrit | 33957 | 0 | master | MERGED | installer: Fix relabeling of /var/log | Never
oVirt gerrit | 34007 | 0 | master | MERGED | selinux: Drop auditd_log_t related rules | Never

Description wanghui 2014-08-08 08:32:23 UTC
Description of problem:
hosted-engine --deploy fails with the error "Failed to execute stage 'Environment setup': Command '/bin/systemctl' failed to execute".

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20140807.0.iso
ovirt-node-3.1.0-0.6.20140731git2c8e71f.el7.noarch
ovirt-node-plugin-hosted-engine-0.1.0-0.0.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor7-7.0-20140807.0.iso
2. Run #hosted-engine --deploy in a shell
3. Run #ovirt-hosted-engine-setup in a shell

Actual results:
1. After step 2, hosted-engine --deploy fails with the error "Failed to execute stage 'Environment setup': Command '/bin/systemctl' failed to execute".
2. After step 3, the same error is reported.

Expected results:
1. hosted-engine --deploy completes successfully.

Additional info:
Adding the keyword "test_blocker" because this bug blocks our testing of the hosted-engine feature.

Comment 1 Ying Cui 2014-08-08 09:35:07 UTC
Hey Hui,
   Could you provide the ovirt-hosted-engine-setup version here?

Thanks
Ying

Comment 2 wanghui 2014-08-08 10:27:46 UTC
Provide more version info:
ovirt-hosted-engine-ha-1.2.1-0.2.ovirtbeta2.el7.noarch
ovirt-hosted-engine-setup-1.2.0-0.1.ovirtbeta2.el7.noarch

Comment 3 Sandro Bonazzola 2014-08-08 13:16:06 UTC
Fabian, it looks like hosted engine and node don't work too well together...

wanghui, we need the hosted-engine, vdsm, supervdsm, sanlock and libvirt logs covering the time interval of the error you got.
Can you attach them?

Comment 4 Sandro Bonazzola 2014-08-08 13:16:57 UTC
Can you reproduce this on a clean CentOS / RHEL 7 environment (not node)?

Comment 5 Fabian Deutsch 2014-08-08 14:19:57 UTC
(In reply to Sandro Bonazzola from comment #3)
> Fabian, it looks like hosted engine and node don't work too well together...

Any idea what might be going wrong?

Comment 6 Sandro Bonazzola 2014-08-08 14:26:56 UTC
No, I'm waiting for the logs to try to figure it out.

Comment 7 wanghui 2014-08-11 05:23:39 UTC
Created attachment 925612
Provide more log files according to comment#3

Comment 8 Sandro Bonazzola 2014-08-11 11:26:18 UTC
Weird, you have vdsm sources in the vdsm log directory...
BTW, the setup log shows:

2014-08-11 05:07:03 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
  File "/usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/system/vdsmenv.py", line 155, in _late_setup
  File "/usr/share/otopi/plugins/otopi/services/systemd.py", line 138, in state
  File "/usr/share/otopi/plugins/otopi/services/systemd.py", line 77, in _executeServiceCommand
  File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 871, in execute
RuntimeError: Command '/bin/systemctl' failed to execute

vdsm.log is empty.

supervdsm.log only shows:

MainThread::DEBUG::2014-08-11 05:07:03,872::netconfpersistence::134::root::(_getConfigs) Non-existing config set.

libvirt logs look ok.

Fabian, is it possible that on rhev-hypervisor7, vdsm.log is not writable?
This may cause vdsmd to fail to start.
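
The traceback above only reports that /bin/systemctl failed; the exit code and stderr of the command are not visible in it. As a minimal sketch (Python 3, not otopi's implementation; `run_checked` is a hypothetical helper name, and the systemctl invocation mentioned in the comment is an assumption), a wrapper along these lines would surface both when debugging by hand:

```python
import subprocess
import sys


def run_checked(args):
    """Run a command; raise RuntimeError with exit code and stderr on failure.

    Hypothetical helper for illustration only; it is not part of otopi.
    """
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            "Command %r failed with exit code %d: %s"
            % (args[0], proc.returncode, proc.stderr.strip())
        )
    return proc.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. On the affected host this
    # could be ["/bin/systemctl", "status", "vdsmd.service"] (assumed invocation).
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    try:
        run_checked(failing)
    except RuntimeError as exc:
        print(exc)
```

With the real command substituted, the raised message would include whatever systemctl wrote to stderr, which may be quicker to read than correlating several log files.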

Comment 9 Fabian Deutsch 2014-08-12 08:14:03 UTC
(In reply to Sandro Bonazzola from comment #8)
…
> Fabian, is it possible that on rhev-hypervisor7, vdsm.log is not writable?
> This may cause vdsmd to fail to start.

After booting a RHEVH7 image I see that /var/log/ is mounted and writable.

But this could be a race. Systemd is booting the services in parallel if no dependency is set. That can lead to a situation where vdsm is started before /var/log is writable, and thus can't write the logfiles and subsequently fails to start.

Could you provide the output of 
$ systemd-analyze plot > boot.svg

Any service which is persisting or relying on Node specific features should have a dependency (Requires= and After=) on ovirt-early.service.

Would it help if Node introduced a node-ready.target for easier consumption?
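
To illustrate the Requires=/After= suggestion, here is a minimal sketch in Python 3 that renders the text of a systemd drop-in. The helper name `render_dropin` and the target path in the comment are hypothetical; only ovirt-early.service and the two directives come from this comment.

```python
def render_dropin(dependency="ovirt-early.service"):
    """Return drop-in text that makes a unit require, and start after, `dependency`."""
    return "[Unit]\nRequires={dep}\nAfter={dep}\n".format(dep=dependency)


if __name__ == "__main__":
    # Hypothetical destination: /etc/systemd/system/vdsmd.service.d/10-node.conf
    print(render_dropin())
```

Requires= alone does not order the units, which is why After= is listed alongside it.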

Comment 10 Sandro Bonazzola 2014-08-12 08:18:43 UTC
Moving needinfo to reporter.

Comment 11 Sandro Bonazzola 2014-08-12 08:19:45 UTC
Fabian, hosted-engine --deploy is called once the system is up.
So it shouldn't be a race condition on systemd.

Comment 12 Fabian Deutsch 2014-08-12 08:25:30 UTC
(In reply to Sandro Bonazzola from comment #11)
> Fabian, hosted-engine --deploy is called once the system is up.
> So it shouldn't be a race condition on systemd.

You are right.

I also wonder about this snippet now:

> MainThread::DEBUG::2014-08-11 05:07:03,872::netconfpersistence::134::root::(_getConfigs) Non-existing config set.

Toni, can you maybe say something about this log snippet?

Comment 14 Ying Cui 2014-08-18 08:45:52 UTC
Hey Leonid,
   As you are the QA owner of this component, could you please try to reproduce this issue on a RHEL 7 environment and provide more info and a status update for this bug?

Thanks.

Comment 15 wanghui 2014-08-18 09:16:56 UTC
Created attachment 927752
the output of #systemd-analyze plot > boot.svg according to comment#9

Comment 16 wanghui 2014-08-18 09:18:38 UTC
Setting the needinfo flags back; I removed them by mistake.

Comment 17 Antoni Segura Puimedon 2014-08-18 15:33:55 UTC
The log message might be because of a bug that was clearing the network upgrade file. That should not happen with the latest 3.5. If the message still shows up, please let me know.

Comment 18 Sandro Bonazzola 2014-08-19 06:25:55 UTC
wanghui, we're going to rebuild the downstream packages for QE next week; please try to reproduce with the new packages once they are available.

Comment 19 Ying Cui 2014-08-27 06:59:04 UTC
(In reply to Sandro Bonazzola from comment #18)
> wanghui, we're going to rebuild the downstream packages for QE next week; please
> try to reproduce with the new packages once they are available.

Checking brew, ovirt-hosted-engine-setup-1.2.0-0.2.master.el7 is the latest build in brewweb, so we need a rebuilt downstream rhevh 7.0 for rhev 3.5 to try to reproduce this bug again.

https://brewweb.devel.redhat.com/buildinfo?buildID=379706

Comment 20 Ying Cui 2014-08-28 11:56:57 UTC
As per comments 18 and 19, we got the new rhevh build today, rhev-hypervisor7-7.0-20140827.0.iso, but we hit a new test blocker, bug 1134873, and need to investigate further.

Comment 21 wanghui 2014-09-01 02:04:01 UTC
(In reply to Sandro Bonazzola from comment #18)
> wanghui, we're going to rebuild the downstream packages for QE next week; please
> try to reproduce with the new packages once they are available.

Hi Sandro,

I have tested rhev-hypervisor7-7.0-20140827.0.iso, and the issue in this bug is fixed. However, I hit another bug, 1134873, as mentioned in comment#20. Please help to check it.

Thanks
Hui Wang

Comment 22 Sandro Bonazzola 2014-09-01 07:05:42 UTC
Moving to ON_QA as per comment #21. Will follow up bug #1134873 in its own bug report.

Comment 23 Jiri Belka 2014-09-05 08:14:52 UTC
vt2.2 is the first downstream build but doesn't contain rhevh7. Please move to ON_QA when the official rhevh build is done, thanks.

Comment 24 Sandro Bonazzola 2014-09-05 08:17:40 UTC
Fabian, wasn't rhevh7 built for vt2.2?

Comment 25 Ying Cui 2014-09-05 08:25:50 UTC
official build in brew:
https://brewweb.devel.redhat.com/buildinfo?buildID=381633

Installed rhevh with enforcing=0 on the kernel cmdline to work around some SELinux issues.

The issue from the description is gone.
Test version:
rhev-hypervisor7-7.0-20140904.0.iso
ovirt-node-3.1.0-0.10.20140904gitb828c37.el7.noarch
ovirt-hosted-engine-setup-1.2.0-0.2.master.el7.noarch
ovirt-hosted-engine-ha-1.2.1-0.3.master.el7.noarch
ovirt-node-plugin-hosted-engine-0.1.0-0.0.x86_64
ovirt-host-deploy-1.3.0-0.0.1.master.el7.noarch

Comment 28 Jiri Belka 2014-09-10 10:48:51 UTC
vt2.2 still doesn't have hypervisor.

Comment 29 Sandro Bonazzola 2014-09-16 13:42:57 UTC
Fabian, please move back to QA once a new rhev-h iso is available.

Comment 31 wanghui 2014-09-28 04:52:03 UTC
Tested version:
rhev-hypervisor7-7.0-20140926.0.iso
ovirt-node-3.1.0-0.17.20140925git29c3403.el7.noarch

ovirt-host-deploy-1.3.0-0.0.4.master.el7.noarch
ovirt-host-deploy-offline-1.3.0-0.0.2.master.el7.x86_64
ovirt-hosted-engine-setup-1.2.0-1.el7.noarch
ovirt-hosted-engine-ha-1.2.1-1.el7.noarch

Test steps:
1. Clean install rhev-hypervisor7-7.0-20140926.0.iso
2. Configure network with ipv4 dhcp mode
3. Run #hosted-engine --deploy in shell

Test result:
1. After step 3, it reports "Failed to execute stage 'Environment setup': Command '/bin/systemctl' failed to execute".

So this issue is not fixed in rhev-hypervisor7-7.0-20140926.0.iso. Changing the status from ON_QA to ASSIGNED.


Thanks,
Hui Wang

Comment 32 Fabian Deutsch 2014-09-29 07:08:58 UTC
Hui Wang, could you please try to reproduce this with enforcing=0?

Comment 33 wanghui 2014-09-29 07:34:48 UTC
(In reply to Fabian Deutsch from comment #32)
> Hui Wang, could you please try to reproduce this with enforcing=0?

Hi Fabian,

It works with enforcing=0; the issue does not occur in that case.

Thanks
Hui Wang

Comment 34 Fabian Deutsch 2014-09-29 08:34:10 UTC
Hui Wang, could you please attach the log /var/log/audit.log?

Comment 35 wanghui 2014-09-29 09:01:00 UTC
Created attachment 942237
provide audit.log as comment#34 asked

This audit.log was captured with enforcing=0.

Comment 36 Simone Tiraboschi 2014-10-09 09:08:51 UTC
From the provided audit.log, it seems that SELinux denies sanlock permission to open its log file; this could be enough to prevent it from starting, which would then cause this bug.

Comment 37 Fabian Deutsch 2014-10-09 09:47:37 UTC
Moving this to ovirt-node. It seems that all files in /var/log are labeled as auditd_log_t, which causes this and other denials and prevents hosted engine from functioning correctly.
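
To check how widespread the auditd_log_t denials are, the AVC records in an audit log can be summarised per process and target type. The following is a minimal Python 3 sketch; `summarize_denials` is a hypothetical helper, and the sample record is illustrative only (it is not copied from the attached audit.log).

```python
import re
from collections import Counter

# Extracts the permissions, process name and target context of an AVC denial.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?'
    r'comm="(?P<comm>[^"]+)".*?'
    r'tcontext=(?P<tcontext>\S+)'
)


def summarize_denials(lines):
    """Count AVC denials per (comm, target type, permissions)."""
    counts = Counter()
    for line in lines:
        match = AVC_RE.search(line)
        if match is None:
            continue  # not an AVC denial record
        # tcontext is user:role:type:level; the type is the third field.
        target_type = match.group("tcontext").split(":")[2]
        counts[(match.group("comm"), target_type, match.group("perms"))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative records, NOT taken from the attachment in this bug.
    sample = [
        'type=AVC msg=audit(1411981234.123:456): avc:  denied  { open } for  '
        'pid=1234 comm="sanlock" path="/var/log/sanlock.log" dev="dm-5" ino=12 '
        'scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 '
        'tcontext=system_u:object_r:auditd_log_t:s0 tclass=file',
        'type=SYSCALL msg=audit(1411981234.123:456): arch=c000003e syscall=2',
    ]
    for key, count in summarize_denials(sample).items():
        print(key, count)
```

Run over a real audit.log, a summary dominated by auditd_log_t as the target type for services other than auditd would be consistent with the mislabeling described here.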

Comment 41 wanghui 2014-12-17 07:11:40 UTC
Tested version:
rhev-hypervisor7-7.0-20141212.0.iso
ovirt-node-3.1.0-0.34.20141210git0c9c493.el7.noarch

Test steps:
1. Clean install rhev-hypervisor7-7.0-20141212.0.iso
2. Configure network with ipv4 dhcp mode
3. Run #hosted-engine --deploy in shell

Test result:
1. After step 3, the hosted engine deploys without error.

So this issue is fixed in rhev-hypervisor7-7.0-20141212.0.iso. Changing the status from ON_QA to VERIFIED.

Comment 43 errata-xmlrpc 2015-02-11 20:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0161.html

