Created attachment 1314028 [details]
logs and journalctl output

Description of problem:
When rebooting the host while the vdsmd service is running, the service enters a failed state when the host comes up again:

Aug 16 12:13:01 storage-ge2-vdsm1.scl.lab.tlv.redhat.com kernel: BRK [0x01feb000, 0x01febfff] PGTABLE
...skipping...
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: One of the modules is not configured to work with VDSM.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: To configure the module use the following:
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: 'vdsm-tool configure [--module module-name]'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: If all modules are not configured try to use:
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: 'vdsm-tool configure --force'
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: (The force flag will stop the module's service and start it
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: afterwards automatically to load the new configuration.)
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: abrt is already configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: lvm is configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: libvirt is already configured for vdsm
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: Current revision of multipath.conf detected, preserving
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: Modules bond_defaults are not configured
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service: control process exited, code=exited status=1
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[2241]: vdsm: stopped during execute check_is_configured task (task returned with error code 1).
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Invalid request descriptor
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for vdsmd.service
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Aug 16 12:14:12 storage-ge2-vdsm1.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.

Version-Release number of selected component (if applicable):
kernel 3.10.0-693.1.1.el7.x86_64
Red Hat Enterprise Linux Server 7.4 (Maipo)
vdsm-yajsonrpc-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-hook-ethtool-options-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-http-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-api-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-jsonrpc-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-4.20.2-64.git072feb0.el7.centos.x86_64
vdsm-client-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.20.2-64.git072feb0.el7.centos.noarch
vdsm-python-4.20.2-64.git072feb0.el7.centos.noarch
mom-0.5.10-1.el7.noarch
libvirt-3.2.0-14.el7_4.2.x86_64
qemu-kvm-ev-2.6.0-28.el7.10.1.x86_64
selinux-policy-3.13.1-166.el7.noarch
selinux-policy-targeted-3.13.1-166.el7.noarch
sanlock-3.5.0-1.el7.x86_64
lvm2-2.02.171-8.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Reboot the host while the vdsmd service is running and the host is connected to an active storage pool (in my environment I had a few storage domains of iSCSI, NFS and Gluster types and no running VMs). See the reproduction sketch below.

Actual results:
The vdsmd service enters a failed state.

Expected results:
The vdsmd service should start successfully.

Additional info:
Logs and journalctl output are attached.
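For reference, a minimal reproduction and inspection sequence looks roughly like the following (standard systemd/journalctl commands; the exact output will differ per host):

  # confirm vdsmd is running, then reboot the host
  systemctl status vdsmd
  reboot

  # after the host comes back up, check the unit state and its journal
  systemctl status vdsmd
  journalctl -b -u vdsmd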
Workaround: run 'vdsm-tool configure --force' and then start the vdsmd service once the host comes up.
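As a rough sketch, the workaround amounts to the following two commands once the host is up again (assuming the service is managed by systemd, as shown in the journal above):

  vdsm-tool configure --force
  systemctl start vdsmd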
Seems to me like some storage issue; Nir, could you please take a look?
Martin, why do you think there is a storage issue? The issue seems to be:

Aug 16 12:13:33 storage-ge2-vdsm1.scl.lab.tlv.redhat.com vdsmd_init_common.sh[1598]: Modules bond_defaults are not configured

We do not maintain this module.
Ahh, sorry, I'm blind :-( I saw the line about multipath.conf and somehow overlooked that there is also a line with the bond_defaults error before the line with the vdsm startup failure.
I do not understand how this is possible. The suggested operation is already applied on upgrade (see the spec file):

%posttrans
...
if ! %{_bindir}/vdsm-tool is-configured >/dev/null 2>&1; then
    %{_bindir}/vdsm-tool configure --force >/dev/null 2>&1
fi
...

If this was not an upgrade but a clean installation, then the command should have been executed as part of the installation. And I do not know where this name came from: bond_defaults. Maybe some old vdsm-tool module?
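On the affected host, the configuration state can be inspected manually with something like the following (the module name is taken from the journal output above; output format may differ between vdsm versions):

  vdsm-tool is-configured; echo "exit code: $?"
  vdsm-tool configure --module bond_defaults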
We would like to connect remotely to this host while the problem exists. Could you please arrange this?
(In reply to Edward Haas from comment #6)
> We would like to connect remotely to this host while the problem exists.
> Could you please arrange this?

Provided offline (in mail).
Based on the logs, the time changed in the middle of the boot, and therefore the bond options dump validity check failed (it checks that the dump is not older than the time elapsed since the system booted).
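To illustrate the failure mode only (this is a sketch of the kind of staleness check described above, not the actual vdsm code, and the dump path is an assumption):

  # Approximate the boot time from the wall clock and uptime, then compare it
  # with the dump file's mtime.
  DUMP=/var/run/vdsm/bonding-defaults.json   # assumed path, for illustration only
  boot_time=$(( $(date +%s) - $(cut -d. -f1 /proc/uptime) ))
  dump_time=$(stat -c %Y "$DUMP")
  if [ "$dump_time" -lt "$boot_time" ]; then
      echo "dump appears older than the current boot -> treated as not configured"
  fi
  # If the wall clock jumps forward between writing the dump and running the
  # check (e.g. a timezone/NTP correction during boot), boot_time moves forward
  # too, and a dump written earlier in the same boot looks stale.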
I have never seen this before and can't reproduce the report on my setup with my hosts. Everything works as expected and the hosts come up after reboot.

vdsm-4.20.2-77.gite43f776.el7.centos
Elad, it seems that what triggers this bug is a UTC+3 timezone set on your host. Did you configure this explicitly? If so, how?
(In reply to Dan Kenigsberg from comment #10)
> Elad, it seems that what triggers this bug is a UTC+3 timezone set on
> your host. Did you configure this explicitly?

No, this wasn't explicitly configured by me. In fact, this bug occurs on many of our hosts used for automation. They are all provisioned with the same image, which could be the place to look.
(In reply to Elad from comment #11)
> They are all provisioned with the same image, which could be the place to look.

Can you? Or at least relay this to someone who can?
(In reply to Dan Kenigsberg from comment #12)
> (In reply to Elad from comment #11)
> > They are all provisioned with the same image, which could be the place to look.
>
> Can you? Or at least relay this to someone who can?

The only change the kickstart makes is setting the timezone to the local time.
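For reference, the effect of such a provisioning-time change corresponds roughly to the following on the installed host (the exact zone is an assumption here; the symptoms point at a UTC+3 local time):

  # hypothetical illustration of the timezone change described above
  timedatectl set-timezone Asia/Jerusalem
  timedatectl status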
The vdsmd service starts normally after a host reboot. Tested on the same host the bug was seen on.

Used:
vdsm-4.20.2-120.git72011e6.el7.centos.x86_64
Elad, this bug is targeted at 4.1.6, so 4.19.29 should have been used for testing. 4.20 is oVirt 4.2 material.
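The vdsm build actually installed on the test host can be double-checked with a standard query, e.g.:

  rpm -q vdsm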
The vdsmd service starts normally after a host reboot.

Used:
vdsm-4.19.29-1.el7ev.x86_64