Bug 1376092

Summary: Cockpit reports vdsmd service is in failed state at initial install (as it was not yet configured)
Product: Red Hat Enterprise Virtualization Manager
Reporter: Roger Heslop <rheslop>
Component: vdsm
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Pavol Brilla <pbrilla>
Severity: low
Priority: medium
Docs Contact:
Version: 4.0.0
CC: bazulay, cshao, dfediuck, dguo, edwardh, fdeutsch, huzhao, jiawu, lsurette, mgoldboi, mperina, pstehlik, rbarry, rheslop, srevivo, weiwang, yaniwang, ybronhei, ycui, ykaul, yzhao
Target Milestone: ovirt-4.2.0
Flags: rbarry: needinfo-
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-21 15:50:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments: Screen shot of health status as seen through cockpit

Description Roger Heslop 2016-09-14 16:25:22 UTC
Description of problem:

The vdsmd service is in a failed state after the initial install and will not start.



Version-Release number of selected component (if applicable): 4.0 / vdsm 4.18.11-1


How reproducible:

Steps to Reproduce:

1. Install RHV-H from the downloadable install media "RHVH-4.0-20160907.4-RHVH-x86_64-dvd1.iso"

2. Ensure the hostname is resolvable
[root@rhev-0 ~]# ping rhev-0
PING rhev-0.vmnet.local (192.168.99.32) 56(84) bytes of data.
64 bytes from rhev-0.vmnet.local (192.168.99.32): icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from rhev-0.vmnet.local (192.168.99.32): icmp_seq=2 ttl=64 time=0.013 ms
^C
3. Run 'systemctl status vdsmd'
[root@rhev-0 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Wed 2016-09-14 11:18:32 CDT; 5min ago
  Process: 10003 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=1/FAILURE)

Sep 14 11:18:31 rhev-0.vmnet.local systemd[1]: vdsmd.service: control process exited, code=exited status=1
Sep 14 11:18:31 rhev-0.vmnet.local systemd[1]: Failed to start Virtual Desktop Server Manager.
Sep 14 11:18:31 rhev-0.vmnet.local systemd[1]: Unit vdsmd.service entered failed state.
Sep 14 11:18:31 rhev-0.vmnet.local systemd[1]: vdsmd.service failed.
Sep 14 11:18:32 rhev-0.vmnet.local systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Sep 14 11:18:32 rhev-0.vmnet.local systemd[1]: start request repeated too quickly for vdsmd.service
Sep 14 11:18:32 rhev-0.vmnet.local systemd[1]: Failed to start Virtual Desktop Server Manager.
Sep 14 11:18:32 rhev-0.vmnet.local systemd[1]: Unit vdsmd.service entered failed state.
Sep 14 11:18:32 rhev-0.vmnet.local systemd[1]: vdsmd.service failed.

Actual results:
vdsmd is in failed state

Expected results:
vdsmd should be running at initial install.

Additional info:

From /var/log/messages:

Sep 13 14:55:59 rhev-0 sasldblistusers2: _sasldb_getkeyhandle has failed
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: Error:
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: One of the modules is not configured to work with VDSM.
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: To configure the module use the following:
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: 'vdsm-tool configure [--module module-name]'.
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: If all modules are not configured try to use:
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: 'vdsm-tool configure --force'
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: (The force flag will stop the module's service and start it
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: afterwards automatically to load the new configuration.)
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: multipath requires configuration
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: libvirt is not configured for vdsm yet
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: Modules certificates, sebool, multipath, passwd, sanlock, libvirt are not configured
Sep 13 14:55:59 rhev-0 vdsmd_init_common.sh: vdsm: stopped during execute check_is_configured task (task returned with error code 1).
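For reference, a single module can also be configured on its own, using the form quoted in the message above, e.g.:

vdsm-tool configure --module libvirt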

If I run "vdsm-tool configure --force"

[root@rhev-0 ~]# vdsm-tool configure --force
/usr/lib/python2.7/site-packages/vdsm/tool/dump_volume_chains.py:28: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import vdscli

Checking configuration status...

multipath requires configuration
libvirt is not configured for vdsm yet
FAILED: conflicting vdsm and libvirt-qemu tls configuration.
vdsm.conf with ssl=True requires the following changes:
libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
qemu.conf: spice_tls=1.
...

To get vdsmd working, run:

# Enable TLS and disable plain TCP in libvirtd, matching the ssl=True requirements listed above
sed -i 's/#listen_tls = 0/listen_tls = 1/' /etc/libvirt/libvirtd.conf
sed -i 's/#listen_tcp = 1/listen_tcp = 0/' /etc/libvirt/libvirtd.conf
# Uncomment auth_tcp and spice_tls so the commented-out defaults ("sasl" and 1) take effect
sed -i '/auth_tcp/s/^#//' /etc/libvirt/libvirtd.conf
sed -i '/spice_tls /s/^#//' /etc/libvirt/qemu.conf
# Re-run the configurator and start the service
vdsm-tool configure --force
systemctl start vdsmd
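After those changes, vdsmd should come up; a quick check to confirm (output will vary slightly):

systemctl status vdsmd    # expect "Active: active (running)"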

Comment 1 Roger Heslop 2016-09-14 19:53:32 UTC
Created attachment 1200962 [details]
Screen shot of health status as seen through cockpit

Comment 2 Roger Heslop 2016-09-14 19:53:53 UTC
It does appear as though vdsmd gets configured and subsequently started during hosted-engine setup.  (I've moved on to troubleshooting another unrelated problem).

The status of the vdsmd service came to light when opening cockpit to run through the HE setup process; the health status of the node within cockpit reads 'bad'.  (See attachment)

This apparently will be the default state of a node prior to HE setup, and I imagine it will cause confusion for those using the RHV-H hypervisor and Cockpit for HE installation.

Comment 3 Fabian Deutsch 2016-09-14 22:21:06 UTC
The problem is really that vdsmd is enabled by default on RHVH, and because vdsm is unconfigured by default, it fails to come up.
vdsm enables itself by default.

And because these are vdsm's defaults, I wonder if it should be solved there.

Edward, what do you think?
Should vdsm not be enabled by default? Or could we add a conditional to the unit file, to only start vdsmd if, e.g., the client certs are installed?
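For illustration, a drop-in along those lines might look like this (a minimal sketch only; the certificate path is an assumption, not verified against RHVH):

mkdir -p /etc/systemd/system/vdsmd.service.d
cat > /etc/systemd/system/vdsmd.service.d/10-wait-for-certs.conf <<'EOF'
[Unit]
# Skip startup quietly (no failed state) until the vdsm host certificate exists
# (the path below is illustrative only)
ConditionPathExists=/etc/pki/vdsm/certs/vdsmcert.pem
EOF
systemctl daemon-reload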

This problem should actually be reproducible on RHEL:
1. install vdsmd
2. reboot
3. check vdsmd status

vdsmd status should be failed

Comment 4 Edward Haas 2016-09-20 09:21:40 UTC
Yaniv, do you recall why do we enable vdsm on installation (via presets) and not wait for vdsm-tool configure?

https://gerrit.ovirt.org/43032

Comment 5 Yaniv Bronhaim 2016-09-20 10:05:07 UTC
I don't see any problem with that. After a reboot, vdsmd won't be up even if it is enabled, as long as the configure call hasn't happened. When installing the package, I do want the rpm installation to enable vdsmd in systemd automatically.

host-deploy, RHV-H setup, or an administrator who installs the vdsm rpms directly should take care of the configure call after the installation.

Comment 6 Yaniv Bronhaim 2017-08-02 09:30:52 UTC
Fabian, I don't see how this is an infra bug. Please explain. Is the bug about requesting us to remove the "enable by default"?
Node should manage this kind of installation flow properly; it runs vdsm-tool configure during boot, doesn't it?

Comment 7 Oved Ourfali 2017-09-27 10:31:30 UTC
Moving to Node for further investigation.

Comment 8 Ryan Barry 2017-10-02 12:38:29 UTC
(In reply to Yaniv Bronhaim from comment #6)
> Fabian, I don't see how it is an infra bug. Please explain .. the bug is
> about requesting us to remove the "enable by default"? 
> Node should manage this kind of installation flow properly .. it runs
> vdsm-tool configure during boot, doesn't it?

Node does manage this -- we do run "vdsm-tool configure --force" as part of rebooting. We also filter vdsm status in 'nodectl check' by checking whether it ever actually started (rather than Starting ...), and fake the status being ok if it has never come up. So comment #2 is not valid -- we report a 'good' status.

Obviously this is just basic sanity checking, and it wouldn't do anything to resolve this bug for an administrator looking at journalctl.
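As a rough illustration of that filtering (not the actual nodectl code, just a sketch of the idea):

# If vdsmd has never reached 'active' since boot, report ok; otherwise report the real state.
ts=$(systemctl show -p ActiveEnterTimestamp vdsmd)
if [ "$ts" = "ActiveEnterTimestamp=" ]; then
    echo "vdsmd: ok (never started yet, waiting to be configured)"
else
    systemctl is-active vdsmd
fi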

Comment 9 Yaniv Bronhaim 2017-10-15 08:05:18 UTC
Roger, can you retrieve the journal log from the time vdsm failed to start? In the ticket you have logs from when vdsm is already running, and I can't see whether the configure was indeed called.

Comment 10 Martin Perina 2017-11-21 15:50:00 UTC
Closing as insufficient data. Feel free to reopen if it reproduces again and add the requested information.