Bug 2013383 - After upgrading RHV-H to 4.4.8, the vdsmd service fails to start (even after putting the host in maintenance before upgrading)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.4.9-1
Target Release: 4.4.9
Assignee: Lev Veyde
QA Contact: peyu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-12 17:47 UTC by Abhishekh Patil
Modified: 2024-12-20 21:22 UTC
16 users

Fixed In Version: vdsm-4.40.90.3
Doc Type: Bug Fix
Doc Text:
Previously, upgrading from an old sanlock version would silently fail and leave the system partially configured. This happened because the older sanlock version would not report some of the runtime configuration values. The current release fixes this issue.
Clone Of:
Environment:
Last Closed: 2021-11-16 13:54:05 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43798 0 None None None 2021-10-12 17:49:43 UTC
Red Hat Knowledge Base (Solution) 6430161 0 None None None 2021-10-18 12:12:51 UTC
Red Hat Product Errata RHBA-2021:4698 0 None None None 2021-11-16 13:54:06 UTC
oVirt gerrit 117170 0 master MERGED tool: sanlock: Fixed upgrading old sanlock 2021-10-17 13:35:16 UTC
oVirt gerrit 117176 0 ovirt-4.4.8 ABANDONED tool: sanlock: Fixed upgrading old sanlock 2021-10-20 13:01:03 UTC
oVirt gerrit 117177 0 ovirt-4.4.z MERGED tool: sanlock: Fixed upgrading old sanlock 2021-10-18 12:57:50 UTC

Description Abhishekh Patil 2021-10-12 17:47:01 UTC
Description of problem:

The following error is observed after upgrading the host to rhvh-4.4.8.1-0.20210903.0, where the vdsmd service fails to start:
~~~~
Oct 11 20:28:32 server.example.com vdsmd_init_common.sh[4690]: vdsm: Running run_init_hooks
Oct 11 20:28:32 server.example.com vdsmd_init_common.sh[4690]: vdsm: Running check_is_configured
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: Error:
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: One of the modules is not configured to work with VDSM.
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: To configure the module use the following:
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: 'vdsm-tool configure [--module module-name]'.
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: If all modules are not configured try to use:
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: 'vdsm-tool configure --force'
~~~~


Version-Release number of selected component (if applicable):
redhat-virtualization-host-image-update-4.4.8-20210903.0.el8_4.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Put the host into maintenance mode.
2. Upgrade the host to 4.4.8


Actual results:

The host goes into a non-responsive state after the upgrade because the vdsmd service fails to start.

Expected results:

The vdsmd service should start successfully after the reboot that follows the host upgrade.

Additional info:

Running 'vdsm-tool configure --force' fixes the issue.

Comment 10 Nir Soffer 2021-10-15 18:22:48 UTC
The required sanlock version does report the number of worker threads, so this failure should be impossible.

Which sanlock version was running when this error happened?

Vdsm tool is designed to run after vdsm and all its requirements are installed, and after daemons like sanlock are restarted so that they run the required version.

Comment 11 Nir Soffer 2021-10-15 21:53:49 UTC
Looking at the code, it seems that this issue will also break

    vdsm-tool configure

when it is run after upgrading an older sanlock (< 3.8.3). Sanlock is not
restarted after the upgrade, so when "vdsm-tool configure" checks if
sanlock is configured, it must support a missing "max_worker_threads".

This was broken by commit 8fb0596b3ae6cf6befae4d9fb4d97cbaeea9c543

    tool: sanlock: Validate max_worker_threads

Released in v4.40.70.1.
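
To illustrate the failure mode, here is a minimal sketch (hypothetical and simplified, not the actual vdsm code; the helper names and the wanted value are assumptions) of the kind of check that produces the KeyError reproduced in comment 13: wanted options are compared against options parsed from "sanlock client status -D", and a plain dict lookup fails when the running daemon predates the option.

~~~~
# Hypothetical sketch of the failing pattern (not the real code in
# vdsm/tool/configurators/sanlock.py). It assumes the "-D" output
# contains "key=value" lines.
import subprocess

# Illustrative only; the real wanted value comes from vdsm configuration.
WANTED_OPTIONS = {"max_worker_threads": "50"}

def daemon_options():
    # Parse "key=value" pairs from the sanlock daemon status output.
    out = subprocess.check_output(
        ["sanlock", "client", "status", "-D"]).decode()
    options = {}
    for line in out.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            options[key.strip()] = value.strip()
    return options

def restart_needed():
    options = daemon_options()
    for key, value in WANTED_OPTIONS.items():
        # sanlock < 3.8.3 never reports "max_worker_threads", so this
        # lookup raises KeyError instead of answering the question.
        if options[key] != value:
            return True
    return False
~~~~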

Comment 13 Nir Soffer 2021-10-15 23:17:01 UTC
Reproduced on RHEL host:

1. Build sanlock 3.8.2 from source

This version does not report max_worker_threads.

2. Force install it:

# rpm -i --force ./python3-sanlock-3.8.2-sanlock.3.8.2.el8.x86_64.rpm ./sanlock-3.8.2-sanlock.3.8.2.el8.x86_64.rpm ./sanlock-lib-3.8.2-sanlock.3.8.2.el8.x86_64.rpm

3. Restart sanlock service

# systemctl restart sanlock

4. Verify that sanlock daemon does not report max_worker_threads

# sanlock client status -D | grep max_worker_threads

5. Check if vdsm is configured

# vdsm-tool is-configured
lvm is configured for vdsm
Managed volume database is already configured
Current revision of multipath.conf detected, preserving
abrt is already configured for vdsm
Traceback (most recent call last):
  File "/bin/vdsm-tool", line 209, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/__init__.py", line 40, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 164, in isconfigured
    m = [c.name for c in pargs.modules if _isconfigured(c) == configurators.NO]
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 164, in <listcomp>
    m = [c.name for c in pargs.modules if _isconfigured(c) == configurators.NO]
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 102, in _isconfigured
    return getattr(module, 'isconfigured', lambda: configurators.NO)()
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 55, in isconfigured
    if _restart_needed():
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 173, in _restart_needed
    if options[key] != value:
KeyError: 'max_worker_threads'

6. Configure vdsm

# vdsm-tool configure --force

Checking configuration status...

Managed volume database is already configured
Current revision of multipath.conf detected, preserving
abrt is already configured for vdsm
lvm is configured for vdsm
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Traceback (most recent call last):
  File "/bin/vdsm-tool", line 209, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/__init__.py", line 40, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 125, in configure
    configurer_to_trigger = [c for c in pargs.modules
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 126, in <listcomp>
    if _should_configure(c, pargs.force)]
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 317, in _should_configure
    configured = _isconfigured(c)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 102, in _isconfigured
    return getattr(module, 'isconfigured', lambda: configurators.NO)()
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 55, in isconfigured
    if _restart_needed():
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 173, in _restart_needed
    if options[key] != value:
KeyError: 'max_worker_threads'
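
One possible way to make the check tolerant of the missing key, sketched here as an assumption (the actual change merged in gerrit 117170 may differ), is to treat an option the running daemon does not report as "restart needed" instead of raising KeyError:

~~~~
def restart_needed(options, wanted):
    # Sketch only: options comes from the "sanlock client status -D"
    # output, wanted from vdsm configuration. A key the running daemon
    # does not report (older sanlock) is treated as requiring a restart
    # rather than crashing with KeyError.
    for key, value in wanted.items():
        if options.get(key) != value:
            return True
    return False
~~~~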


I think what happens in the RHVH upgrade is:

1. Old sanlock (< 3.8.3)
2. Upgrading vdsm installs new sanlock
3. When "vdsm-tool configure" was run during the upgrade, it failed
4. The system was not configured, so starting vdsm after the reboot failed.
5. Running vdsm-tool configure --force after the reboot configured the
   system and fixed the issue.

So we have 2 issues:
1. vdsm-tool fails when upgrading old sanlock (fixed by Lev's patch)
2. The upgrade ignored the "vdsm-tool configure" failure - this should be
   handled in a new bug for the component responsible for upgrading the
   host (see the sketch below).
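
For the second issue, a minimal, hypothetical illustration (not taken from any RHV component) of an upgrade step that propagates a "vdsm-tool configure" failure instead of ignoring it:

~~~~
import subprocess
import sys

def configure_host():
    # check=True raises CalledProcessError on a non-zero exit, so the
    # upgrade flow stops instead of rebooting a half-configured host.
    try:
        subprocess.run(["vdsm-tool", "configure", "--force"], check=True)
    except subprocess.CalledProcessError as e:
        sys.exit("vdsm-tool configure failed with rc=%d" % e.returncode)
~~~~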

Comment 15 Nir Soffer 2021-10-18 11:51:28 UTC
This is a very low-risk fix; suggesting it for 4.4.9.

Comment 20 peyu 2021-11-05 06:55:01 UTC
This bug has been verified on "redhat-virtualization-host-4.4.9-20213704".

Test Versions:
RHVM: 4.4.9.4-0.1.el8ev
RHVH: Upgrade RHVH from rhvh-4.4.4.1-0.20210201.0+1 to rhvh-4.4.9.2-0.20211104.0+1

Test Steps:
1. Install RHVH-4.4-20210202.0-RHVH-x86_64-dvd1.iso
2. Set up local repo and point to "redhat-virtualization-host-image-update-4.4.9-20213704.x86_64.rpm"
3. Add RHVH to RHVM (The versions of "Data Centers" and "Clusters" are both 4.4)
4. Add a NFS Storage Domain, create a VM and then start the VM
5. Stop the VM
6. Upgrade the host via RHVM
7. After upgrade, check the host status via RHVM

Test result:
1. After the upgrade, the host reboots successfully and enters the new layer. In RHVM, the host status is "Up".
2. The VM can be started after the RHVH upgrade.
3. There is no error message related to VDSM configuration in /var/log/messages.

Will move bug Status to "VERIFIED".

Comment 24 errata-xmlrpc 2021-11-16 13:54:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (0-day RHV RHEL Host (ovirt-host) [ovirt-4.4.9]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4698

