Bug 1482014 - vdsmd fails to start if system time moves backwards during boot
Summary: vdsmd fails to start if system time moves backwards during boot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.0
Hardware: x86_64
OS: Unspecified
medium
urgent
Target Milestone: ovirt-4.1.6
: 4.19.29
Assignee: Edward Haas
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-08-16 09:29 UTC by Elad
Modified: 2021-09-09 12:33 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-19 10:03:00 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments (Terms of Use)
logs and journalctl output (724.59 KB, application/x-gzip)
2017-08-16 09:29 UTC, Elad
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43452 0 None None None 2021-09-09 12:33:34 UTC
oVirt gerrit 80750 0 'None' MERGED vdsm tool configurators: bond_defaults - do not depend on time 2020-11-05 15:18:42 UTC
oVirt gerrit 80804 0 'None' MERGED net: place bonding defaults under /var/run 2020-11-05 15:18:26 UTC
oVirt gerrit 80805 0 'None' MERGED vdsm tool configurators: bond_defaults - do not depend on time 2020-11-05 15:18:25 UTC

Description Elad 2017-08-16 09:29:58 UTC
Created attachment 1314028 [details]
logs and journalctl output

Description of problem:
When rebooting the host while vdsmd service is running, the service enters failed state when the host comes up again:


Aug 16 12:13:01 storage-ge2-vdsm1.scl.lab.tlv.redhat.com kernel: BRK [0x01feb000, 0x01febfff] PGTABLE
...skipping...
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: One of the modules is not configured to work with VDSM.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: To configure the module use the following:
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: 'vdsm-tool configure [--module module-name]'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: If all modules are not configured try to use:
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: 'vdsm-tool configure --force'
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: (The force flag will stop the module's service and start it
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: afterwards automatically to load the new configuration.)
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: abrt is already configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: lvm is configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: libvirt is already configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: Current revision of multipath.conf detected, preserving
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: Modules bond_defaults are not configured
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service: control process exited, code=exited status=1
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: vdsm: stopped during execute check_is_configured task (task returned with error code 1).
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Invalid request descriptor
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for vdsmd.service
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.



Version-Release number of selected component (if applicable):
kernel 3.10.0-693.1.1.el7.x86_64 
Red Hat Enterprise Linux Server 7.4 (Maipo)
vdsm-yajsonrpc-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-hook-ethtool-options-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-http-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-api-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-jsonrpc-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-4.20.2-64.git072feb0.el7.centos.x86_64
vdsm-client-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-python-4.20.2-64.git072feb0.el7.centos.noarch
mom-0.5.10-1.el7.noarch
libvirt-3.2.0-14.el7_4.2.x86_64
qemu-kvm-ev-2.6.0-28.el7.10.1.x86_64
selinux-policy-3.13.1-166.el7.noarch
selinux-policy-targeted-3.13.1-166.el7.noarch
sanlock-3.5.0-1.el7.x86_64
lvm2-2.02.171-8.el7.x86_64


How reproducible:
Always


Steps to Reproduce:
1. Reboot the host while vdsmd service is running and the host is connected to an active storage pool (in my env I had few storage domains from iSCSI, NFS and Gluster storage types and no running VMs)

Actual results:
vdsmd service enters failed state

Expected results:
vdsmd service should be started successfully

Additional info:
logs and journalctl output

Comment 1 Elad 2017-08-16 09:33:31 UTC
Workaround: 
'vdsm-tool configure --force' and start vdsmd service when the host comes up.

Comment 2 Martin Perina 2017-08-16 15:32:50 UTC
Seems to me like some storage issue, Nir could you please take a look?

Comment 3 Nir Soffer 2017-08-16 16:00:58 UTC
Martin, why do you think there is a storage issue?

The issue seems to be:

Aug 16 12:13:33 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[1598]: Modules bond_defaults are not configured

We do not maintain this module.

Comment 4 Martin Perina 2017-08-16 18:50:04 UTC
Ahh, sorry, I'm blind  :-( I saw the line with multipath.conf and somehow overlooked that there's also line with bond_defaults error before line with vdsm startup failure

Comment 5 Edward Haas 2017-08-17 07:53:42 UTC
I do not understand how this is possible. The operation suggested is already applied on upgrade (see spec file):

%posttrans
...
    if ! %{_bindir}/vdsm-tool is-configured >/dev/null 2>&1; then
        %{_bindir}/vdsm-tool configure --force >/dev/null 2>&1
    fi
...

If this was not an upgrade but a clean installation, then the command should have been executed as part of the installation.

And I do not know where has this name came from: bond_defaults. Maybe some old vdsm-tool module?

Comment 6 Edward Haas 2017-08-17 12:43:40 UTC
We will like to connect remotely to this host while the problem exists.
Could you please arrange this?

Comment 7 Elad 2017-08-17 12:56:26 UTC
(In reply to Edward Haas from comment #6)
> We will like to connect remotely to this host while the problem exists.
> Could you please arrange this?

Provided offline (in mail)

Comment 8 Edward Haas 2017-08-17 13:12:02 UTC
Based on the logs, the time has changed in the middle of the boot and therefore the bond options dump validity has failed (it checked that the dump is not older then the time elapsed from the system boot).

Comment 9 Michael Burman 2017-08-17 14:26:52 UTC
Never saw it before and can't reproduce the report on my setup with my hosts. 
Everything working as expected and hosts coming up after reboot. 
vdsm-4.20.2-77.gite43f776.el7.centos

Comment 10 Dan Kenigsberg 2017-08-17 19:52:51 UTC
Elad, it seems that what triggers this bug is the a UTC+3 timezone set on your host. Did you configure this explicitly? if so, how?

Comment 11 Elad 2017-08-20 08:17:27 UTC
(In reply to Dan Kenigsberg from comment #10)
> Elad, it seems that what triggers this bug is the a UTC+3 timezone set on
> your host. Did you configure this explicitly? 
No, this wasn't explicitly configured by me. In fact, this bug occurs on many of our hosts used for automation. They are all provisioned with the same image, which can be the place to look.

Comment 12 Dan Kenigsberg 2017-08-20 08:38:12 UTC
(In reply to Elad from comment #11)
> They are all provisioned with the
> same image, which can be the place to look.

Can you? Or at lease relay to someone who can?

Comment 13 Elad 2017-08-20 12:04:05 UTC
(In reply to Dan Kenigsberg from comment #12)
> (In reply to Elad from comment #11)
> > They are all provisioned with the
> > same image, which can be the place to look.
> 
> Can you? Or at lease relay to someone who can?
The only change that is being done is by the kickstart which changes the TZ to the local time.

Comment 15 Elad 2017-08-27 08:52:19 UTC
vdsmd service starts normally after host reboot.
Tested using the same host that the bug was seen on.

Used:
vdsm-4.20.2-120.git72011e6.el7.centos.x86_64

Comment 16 Sandro Bonazzola 2017-09-01 09:25:10 UTC
Elad, this bug is targeted to 4.1.6 so 4.19.29 should have been used for testing.
4.20 is oVirt 4.2 material.

Comment 17 Elad 2017-09-03 07:18:14 UTC
vdsmd service starts normally after host reboot.


Used:
vdsm-4.19.29-1.el7ev.x86_64


Note You need to log in before you can comment on or make changes to this bug.