Created attachment 1314028 [details]
logs and journalctl output

Description of problem:
When rebooting the host while the vdsmd service is running, the service enters a failed state when the host comes up again:

Aug 16 12:13:01 storage-ge2-vdsm1.scl.lab.tlv.redhat.com kernel: BRK [0x01feb000, 0x01febfff] PGTABLE
...skipping...
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: One of the modules is not configured to work with VDSM.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: To configure the module use the following:
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: 'vdsm-tool configure [--module module-name]'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: If all modules are not configured try to use:
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: 'vdsm-tool configure --force'
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: (The force flag will stop the module's service and start it
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: afterwards automatically to load the new configuration.)
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: abrt is already configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: lvm is configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: libvirt is already configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: Current revision of multipath.conf detected, preserving
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: Modules bond_defaults are not configured
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service: control process exited, code=exited status=1
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: vdsm: stopped during execute check_is_configured task (task returned with error code 1).
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Invalid request descriptor
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for vdsmd.service
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.

Version-Release number of selected component (if applicable):
kernel 3.10.0-693.1.1.el7.x86_64
Red Hat Enterprise Linux Server 7.4 (Maipo)
vdsm-yajsonrpc-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-hook-ethtool-options-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-http-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-api-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-jsonrpc-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-4.20.2-64.git072feb0.el7.centos.x86_64
vdsm-client-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-python-4.20.2-64.git072feb0.el7.centos.noarch
mom-0.5.10-1.el7.noarch
libvirt-3.2.0-14.el7_4.2.x86_64
qemu-kvm-ev-2.6.0-28.el7.10.1.x86_64
selinux-policy-3.13.1-166.el7.noarch
selinux-policy-targeted-3.13.1-166.el7.noarch
sanlock-3.5.0-1.el7.x86_64
lvm2-2.02.171-8.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Reboot the host while the vdsmd service is running and the host is connected to an active storage pool (in my environment I had a few storage domains of iSCSI, NFS and Gluster types and no running VMs). See the reproduction sketch below.

Actual results:
The vdsmd service enters a failed state.

Expected results:
The vdsmd service should start successfully.

Additional info:
Logs and journalctl output are attached.
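For reference, a minimal reproduction and inspection sequence looks roughly like the following (standard systemd/journalctl commands; the exact output will differ per host):

  # confirm vdsmd is running, then reboot the host
  systemctl status vdsmd
  reboot

  # after the host comes back up, check the unit state and its journal
  systemctl status vdsmd
  journalctl -b -u vdsmd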
Workaround: run 'vdsm-tool configure --force' and then start the vdsmd service once the host comes up.
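As a rough sketch, the workaround amounts to the following two commands once the host is up again (assuming the service is managed by systemd, as shown in the journal above):

  vdsm-tool configure --force
  systemctl start vdsmd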
Seems to me like some storage issue; Nir, could you please take a look?
Martin, why do you think there is a storage issue? The issue seems to be:

Aug 16 12:13:33 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[1598]: Modules bond_defaults are not configured

We do not maintain this module.
Ahh, sorry, I'm blind :-( I saw the line about multipath.conf and somehow overlooked that there is also a line with the bond_defaults error before the line with the vdsm startup failure.
I do not understand how this is possible. The suggested operation is already applied on upgrade (see the spec file):

%posttrans
...
if ! %{_bindir}/vdsm-tool is-configured >/dev/null 2>&1; then
    %{_bindir}/vdsm-tool configure --force >/dev/null 2>&1
fi
...

If this was not an upgrade but a clean installation, then the command should have been executed as part of the installation. And I do not know where this name came from: bond_defaults. Maybe some old vdsm-tool module?
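On the affected host, the configuration state can be inspected manually with something like the following (the module name is taken from the journal output above; output format may differ between vdsm versions):

  vdsm-tool is-configured; echo "exit code: $?"
  vdsm-tool configure --module bond_defaults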
We would like to connect remotely to this host while the problem exists. Could you please arrange this?
(In reply to Edward Haas from comment #6)
> We would like to connect remotely to this host while the problem exists.
> Could you please arrange this?

Provided offline (in mail).
Based on the logs, the time changed in the middle of the boot, and therefore the bond options dump validity check failed (it checks that the dump is not older than the time elapsed since the system booted).
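To illustrate the failure mode only (this is a sketch of the kind of staleness check described above, not the actual vdsm code, and the dump path is an assumption):

  # Approximate the boot time from the wall clock and uptime, then compare it
  # with the dump file's mtime.
  DUMP=/var/run/vdsm/bonding-defaults.json   # assumed path, for illustration only
  boot_time=$(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))
  dump_time=$(stat -c %Y "$DUMP")
  if [ "$dump_time" -lt "$boot_time" ]; then
      echo "dump appears older than the current boot -> treated as not configured"
  fi
  # If the wall clock jumps forward between writing the dump and running the
  # check (e.g. a timezone/NTP correction during boot), boot_time moves forward
  # too, and a dump written earlier in the same boot looks stale.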
I have never seen this before and can't reproduce the report on my setup with my hosts. Everything works as expected and the hosts come up after reboot.

vdsm-4.20.2-77.gite43f776.el7.centos
Elad, it seems that what triggers this bug is a UTC+3 timezone set on your host. Did you configure this explicitly? If so, how?
(In reply to Dan Kenigsberg from comment #10)
> Elad, it seems that what triggers this bug is a UTC+3 timezone set on
> your host. Did you configure this explicitly?

No, this wasn't explicitly configured by me. In fact, this bug occurs on many of our hosts used for automation. They are all provisioned with the same image, which could be the place to look.
(In reply to Elad from comment #11)
> They are all provisioned with the same image, which could be the place to look.

Can you? Or at least relay this to someone who can?
(In reply to Dan Kenigsberg from comment #12)
> (In reply to Elad from comment #11)
> > They are all provisioned with the same image, which could be the place to look.
>
> Can you? Or at least relay this to someone who can?

The only change the kickstart makes is setting the timezone to the local time.
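For reference, the effect of such a provisioning-time change corresponds roughly to the following on the installed host (the exact zone is an assumption here; the symptoms point at a UTC+3 local time):

  # hypothetical illustration of the timezone change described above
  timedatectl set-timezone Asia/Jerusalem
  timedatectl status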
The vdsmd service starts normally after a host reboot. Tested on the same host the bug was seen on.

Used:
vdsm-4.20.2-120.git72011e6.el7.centos.x86_64
Elad, this bug is targeted at 4.1.6, so 4.19.29 should have been used for testing. 4.20 is oVirt 4.2 material.
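The vdsm build actually installed on the test host can be double-checked with a standard query, e.g.:

  rpm -q vdsm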
The vdsmd service starts normally after a host reboot.

Used:
vdsm-4.19.29-1.el7ev.x86_64