Bug 1459481 - Can't shutdown/reboot host with hosted engine
Status: CLOSED NOTABUG
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.1.0.6
Hardware: x86_64 Linux
Priority: medium Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Simone Tiraboschi
QA Contact: meital avital
Depends On:
Blocks:
Reported: 2017-06-07 05:22 EDT by shyningcrow
Modified: 2017-06-12 05:33 EDT
CC List: 2 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-12 05:33:24 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ylavi: ovirt‑4.2+


Attachments
Journald log file (187.35 KB, text/x-vhdl)
2017-06-07 05:22 EDT, shyningcrow

Description shyningcrow 2017-06-07 05:22:41 EDT
Created attachment 1285723
Journald log file

Description of problem:
In a test environment, I have a single node (CentOS 7.3) on which the hosted engine has been deployed. The machine could be rebooted normally before oVirt was installed. Since the oVirt HE installation, it fails to shut down or reboot, refuses all connections (SSH reports connection refused) and has to be reset manually with the physical button.

Version-Release number of selected component (if applicable):
ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
ovirt-imageio-common-1.0.0-1.el7.noarch
ovirt-hosted-engine-ha-2.1.0.6-1.el7.centos.noarch
ovirt-hosted-engine-setup-2.1.0.6-1.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7.centos.noarch
ovirt-host-deploy-1.6.5-1.el7.centos.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.18.el7.centos.noarch
ovirt-vmconsole-host-1.0.4-1.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
ovirt-engine-appliance-4.1-20170523.1.el7.centos.noarch
ovirt-imageio-daemon-1.0.0-1.el7.noarch
vdsm-client-4.19.15-1.el7.centos.noarch
vdsm-xmlrpc-4.19.15-1.el7.centos.noarch
vdsm-jsonrpc-4.19.15-1.el7.centos.noarch
vdsm-hook-vmfex-dev-4.19.15-1.el7.centos.noarch
vdsm-api-4.19.15-1.el7.centos.noarch
vdsm-yajsonrpc-4.19.15-1.el7.centos.noarch
vdsm-4.19.15-1.el7.centos.x86_64
vdsm-python-4.19.15-1.el7.centos.noarch
vdsm-cli-4.19.15-1.el7.centos.noarch
libvirt-daemon-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7_3.9.x86_64
libvirt-client-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-config-network-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-2.0.0-10.el7_3.9.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-kvm-2.0.0-10.el7_3.9.x86_64
libvirt-lock-sanlock-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7_3.9.x86_64
libnfsidmap-0.25-15.el7.x86_64
nfs-utils-1.3.0-0.33.el7_3.x86_64


How reproducible:
Install CentOS, add the oVirt repositories, and run hosted-engine --deploy (or use the Cockpit plugin). Then try to restart the host.

Steps to Reproduce:
1. Install CentOS 7.
2. Install the oVirt repositories.
3. Set up NFSv3 shares on the host.
4. Disable or configure SELinux.
5. Install oVirt HE (hosted-engine --deploy; alternatively, use the Cockpit plugin).
6. Shut down or reboot by any means (systemctl poweroff; shutdown; Cockpit; HE).

Actual results:
The system hangs: it neither shuts down nor reboots, and it refuses all connections (SSH reports connection refused) for a long time (3+ hours), possibly indefinitely.

Expected results:
The system should shut down or reboot normally even if it cannot migrate the HE (although I suspect that is not the cause), or at least try to kill the blocking processes so that the shutdown can proceed.

Additional info:
- SELinux was disabled during the installation and is still disabled, for testing purposes.
- The problem occurs both before the system is fully configured (HE requires additional storage to be set up after installation) and afterwards.
- My current configuration uses NFSv3 shares exported from the same host that I'm trying to shut down.
- Vdsm reports an error (BackendFailureException).
- Sanlock probably prevents the system from shutting down because unmounting times out (my best guess at the moment).
- Stopping Sanlock, Vdsm or both does not change the behaviour.
- Stopping the HE virtual machine before shutting down has no effect.
- Setting global maintenance mode has no effect; the host still fails to shut down.
- The Cockpit oVirt plugin doesn't seem to be able to connect to Vdsm, although manually running the script that queries the Vdsm API works correctly.
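
To check the unmount suspicion listed above, one option is to see which NFS shares are still mounted while the shutdown hangs. The following is only a minimal sketch, not part of oVirt: the function name nfs_mounts is hypothetical, and it assumes the standard /proc/mounts field layout (source, mountpoint, filesystem type, ...).

```python
from typing import List, Tuple


def nfs_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (source, mountpoint) pairs for NFS entries.

    `mounts_text` is expected to be in /proc/mounts format, where the
    whitespace-separated fields are: source, mountpoint, fstype, options,
    dump, pass. This helper is a hypothetical illustration.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # The kernel reports NFSv3 mounts as "nfs" and NFSv4 mounts as "nfs4".
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            found.append((fields[0], fields[1]))
    return found
```

On the affected host it could be fed the contents of /proc/mounts, for example nfs_mounts(open("/proc/mounts").read()); any share still listed at shutdown time is a candidate for the mount that times out.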

I'm attaching a persistent journald log file in which the problem can be observed; see the entries near the end of the log. In this log, the machine was eventually powered off manually.

Probably related: http://lists.ovirt.org/pipermail/users/2016-March/071649.html
Comment 1 shyningcrow 2017-06-09 10:46:48 EDT
A few updates:
 - I was able to access vdsmd through the Cockpit oVirt plugin as root. A regular user logged in to Cockpit probably lacks the required permissions, so I don't think this is related to the bug.
 - Stopping ovirt-ha-agent, ovirt-ha-broker, vdsmd and sanlock (without global maintenance) causes a system reboot (wdmd probably thinks the node has failed).
 - Stopping sanlock doesn't succeed: systemd fails to stop the process and the unit enters the failed state. The process stays alive and prevents umount of the /rhev/... mountpoint.
 - Stopping wdmd as well (alongside the agents, vdsmd and sanlock) still causes a reboot (it shouldn't behave like this).
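
The unit states mentioned above can be collected in one pass. This is a rough sketch under assumptions: the systemd unit names are assumed to match the service names used in this comment, and unit_states, not_stopped and the probe parameter are hypothetical helpers. It relies only on `systemctl is-active`, which prints a state such as active, inactive or failed.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

# Assumption: the unit names are the same as the service names in the comment.
UNITS = ("ovirt-ha-agent", "ovirt-ha-broker", "vdsmd", "sanlock", "wdmd")


def systemctl_is_active(unit: str) -> str:
    """Ask systemd for the unit state ("active", "inactive", "failed", ...)."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip() or "unknown"


def unit_states(
    units: Iterable[str] = UNITS,
    probe: Callable[[str], str] = systemctl_is_active,
) -> Dict[str, str]:
    """Map each unit name to the state reported by `probe`."""
    return {unit: probe(unit) for unit in units}


def not_stopped(states: Dict[str, str]) -> List[str]:
    """Names of units that did not reach the "inactive" state, sorted."""
    return sorted(unit for unit, state in states.items() if state != "inactive")
```

Calling not_stopped(unit_states()) after the stop attempt would list sanlock if its unit ended up in the failed state. A failed unit says nothing about whether the process itself is still alive, so the process list has to be checked separately.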
Comment 2 Sandro Bonazzola 2017-06-12 05:33:24 EDT
Looks like you're running a hosted engine on a single host with local storage (NFS export?).

To shut it down, you need to put the host in global maintenance, then shut down the hosted engine VM, then disconnect the storage from vdsm, and only then shut down the host.
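
The sequence above could be scripted roughly as follows. This is only a sketch under stated assumptions, not a verified procedure: shutdown_plan and run_plan are hypothetical helpers, and the --disconnect-storage option in particular is an assumption about the installed hosted-engine version (check hosted-engine --help before relying on it). The plan is printed by default rather than executed.

```python
import subprocess
from typing import List, Sequence


def shutdown_plan(
    final_command: Sequence[str] = ("systemctl", "poweroff"),
) -> List[List[str]]:
    """Commands for the sequence in the comment, in order."""
    return [
        # 1. Put the host in global maintenance.
        ["hosted-engine", "--set-maintenance", "--mode=global"],
        # 2. Shut down the hosted engine VM.
        ["hosted-engine", "--vm-shutdown"],
        # 3. Disconnect the storage from vdsm.
        #    ASSUMPTION: this option may not exist in every version.
        ["hosted-engine", "--disconnect-storage"],
        # 4. Finally shut down (or reboot) the host itself.
        list(final_command),
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in plan:
        print(" ".join(command))
        if not dry_run:
            # check=True aborts the sequence if a step fails.
            subprocess.run(command, check=True)
```

A real run would also have to wait until the engine VM is actually down (for example by polling hosted-engine --vm-status) before disconnecting the storage; the sketch omits that wait.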

This is not a supported deployment anyway; at least 2 hosts should be used for hosted engine.

Closing as not a bug.
