Description of problem:

The following error is observed after upgrading the host to rhvh-4.4.8.1-0.20210903.0; the vdsmd service fails to start:

~~~~
Oct 11 20:28:32 server.example.com vdsmd_init_common.sh[4690]: vdsm: Running run_init_hooks
Oct 11 20:28:32 server.example.com vdsmd_init_common.sh[4690]: vdsm: Running check_is_configured
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: Error:
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: One of the modules is not configured to work with VDSM.
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: To configure the module use the following:
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: 'vdsm-tool configure [--module module-name]'.
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: If all modules are not configured try to use:
Oct 11 20:28:33 server.example.com vdsmd_init_common.sh[4690]: 'vdsm-tool configure --force'
~~~~

Version-Release number of selected component (if applicable):
redhat-virtualization-host-image-update-4.4.8-20210903.0.el8_4.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Put the host into maintenance mode.
2. Upgrade the host to 4.4.8.

Actual results:
The host goes into a non-responsive state after the upgrade because the vdsmd service fails to start.

Expected results:
The vdsmd service starts successfully after the post-upgrade reboot.

Additional info:
Running 'vdsm-tool configure --force' fixes the issue.
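For context, the failing "check_is_configured" init step amounts to asking vdsm-tool whether all modules are configured and aborting vdsmd startup if any are not. A minimal Python sketch of that flow, assuming only that "vdsm-tool is-configured" exits non-zero when a module is unconfigured; the helper name and flow are illustrative, not the actual vdsmd_init_common.sh code:

~~~~
import subprocess
import sys

def check_is_configured():
    # Illustrative stand-in for the "check_is_configured" init task:
    # assumption - "vdsm-tool is-configured" exits non-zero when any
    # module is not configured, which aborts vdsmd startup.
    result = subprocess.run(["vdsm-tool", "is-configured"])
    if result.returncode != 0:
        sys.exit("One of the modules is not configured to work with VDSM.")

if __name__ == "__main__":
    check_is_configured()
~~~~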
The required sanlock version does report the number of worker threads, so this failure should be impossible. Which sanlock version was running when this error happened?

vdsm-tool is designed to run after vdsm and all of its requirements have been installed, and after daemons like sanlock have been restarted so they run the required version.
Looking at the code, it seems this issue also makes "vdsm-tool configure" fail when it is run after upgrading from an older sanlock (< 3.8.3). Sanlock is not restarted during the upgrade, so when "vdsm-tool configure" checks whether sanlock is configured, it must tolerate a missing "max_worker_threads" option. This was broken by commit 8fb0596b3ae6cf6befae4d9fb4d97cbaeea9c543 ("tool: sanlock: Validate max_worker_threads"), released in v4.40.70.1.
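A hedged Python sketch of the failing comparison in vdsm/tool/configurators/sanlock.py::_restart_needed, reconstructed from the traceback shown in the reproduction below; the function signature, variable names, and the example values are assumptions, not the actual vdsm code:

~~~~
def _restart_needed(daemon_options, required_options):
    # daemon_options: settings reported by the running sanlock daemon
    # (roughly what "sanlock client status -D" prints).
    # required_options: settings vdsm expects, e.g.
    # {"max_worker_threads": ...} with a placeholder value.
    for key, value in required_options.items():
        # On sanlock < 3.8.3 the daemon does not report
        # "max_worker_threads" at all, so this lookup raises KeyError
        # instead of reporting that a restart is needed.
        if daemon_options[key] != value:
            return True
    return False
~~~~

With the old daemon still running, the key is simply absent, so the check crashes before it can decide that a restart is needed.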
Reproduced on a RHEL host:

1. Build sanlock 3.8.2 from source. This version does not report max_worker_threads.

2. Force install it:

# rpm -i --force ./python3-sanlock-3.8.2-sanlock.3.8.2.el8.x86_64.rpm ./sanlock-3.8.2-sanlock.3.8.2.el8.x86_64.rpm ./sanlock-lib-3.8.2-sanlock.3.8.2.el8.x86_64.rpm

3. Restart the sanlock service:

# systemctl restart sanlock

4. Verify that the sanlock daemon does not report max_worker_threads:

# sanlock client status -D | grep max_worker_threads

5. Check if vdsm is configured:

# vdsm-tool is-configured
lvm is configured for vdsm
Managed volume database is already configured
Current revision of multipath.conf detected, preserving
abrt is already configured for vdsm
Traceback (most recent call last):
  File "/bin/vdsm-tool", line 209, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/__init__.py", line 40, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 164, in isconfigured
    m = [c.name for c in pargs.modules if _isconfigured(c) == configurators.NO]
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 164, in <listcomp>
    m = [c.name for c in pargs.modules if _isconfigured(c) == configurators.NO]
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 102, in _isconfigured
    return getattr(module, 'isconfigured', lambda: configurators.NO)()
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 55, in isconfigured
    if _restart_needed():
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 173, in _restart_needed
    if options[key] != value:
KeyError: 'max_worker_threads'

6. Configure vdsm:

# vdsm-tool configure --force
Checking configuration status...
Managed volume database is already configured
Current revision of multipath.conf detected, preserving
abrt is already configured for vdsm
lvm is configured for vdsm
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Traceback (most recent call last):
  File "/bin/vdsm-tool", line 209, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/__init__.py", line 40, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 125, in configure
    configurer_to_trigger = [c for c in pargs.modules
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 126, in <listcomp>
    if _should_configure(c, pargs.force)]
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 317, in _should_configure
    configured = _isconfigured(c)
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 102, in _isconfigured
    return getattr(module, 'isconfigured', lambda: configurators.NO)()
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 55, in isconfigured
    if _restart_needed():
  File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 173, in _restart_needed
    if options[key] != value:
KeyError: 'max_worker_threads'

I think what happens in the RHVH upgrade is:

1. The host runs an old sanlock (< 3.8.3).
2. Upgrading vdsm installs the new sanlock.
3. When "vdsm-tool configure" was run during the upgrade, it failed.
4. The system was not configured, so starting vdsm after the reboot failed.
5. Running "vdsm-tool configure --force" after the reboot configured the system and fixed the issue.

So we have 2 issues:

1. vdsm-tool fails when upgrading from an old sanlock (fixed by Lev's patch; see the sketch after this list).
2. The upgrade ignored the "vdsm-tool configure" failure. This should be handled in a new bug for the component responsible for upgrading the host.
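For issue 1, a minimal sketch of the direction a fix could take, reusing the hypothetical signature from the sketch above; treating a missing option as "restart needed" avoids the KeyError. This is an illustration under those assumptions, not a copy of the actual patch:

~~~~
_MISSING = object()

def _restart_needed(daemon_options, required_options):
    for key, value in required_options.items():
        # An option the running daemon does not report (e.g.
        # "max_worker_threads" on sanlock < 3.8.3) cannot match the
        # required value, so report that a restart is needed instead
        # of raising KeyError.
        if daemon_options.get(key, _MISSING) != value:
            return True
    return False

if __name__ == "__main__":
    # Example: old sanlock output missing "max_worker_threads".
    # Both dictionaries here are placeholders, not real vdsm defaults.
    reported = {"use_watchdog": 1}
    required = {"max_worker_threads": 50}
    print(_restart_needed(reported, required))  # True
~~~~

With a tolerant check like this, "vdsm-tool is-configured" would report sanlock as not configured (and "vdsm-tool configure" would schedule a restart) instead of crashing with a traceback.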
This is a very low-risk fix; suggesting it for 4.4.9.
This bug has been verified on "redhat-virtualization-host-4.4.9-20213704".

Test Versions:
RHVM: 4.4.9.4-0.1.el8ev
RHVH: Upgrade RHVH from rhvh-4.4.4.1-0.20210201.0+1 to rhvh-4.4.9.2-0.20211104.0+1

Test Steps:
1. Install RHVH-4.4-20210202.0-RHVH-x86_64-dvd1.iso.
2. Set up a local repo pointing to "redhat-virtualization-host-image-update-4.4.9-20213704.x86_64.rpm".
3. Add the RHVH host to RHVM (the versions of "Data Centers" and "Clusters" are both 4.4).
4. Add an NFS storage domain, create a VM, and then start the VM.
5. Stop the VM.
6. Upgrade the host via RHVM.
7. After the upgrade, check the host status via RHVM.

Test results:
1. After the upgrade, the host reboots successfully and enters the new layer. In RHVM, the host status is "Up".
2. The VM can be started after the RHVH upgrade.
3. There are no error messages related to VDSM configuration in /var/log/messages.

Moving the bug status to "VERIFIED".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (0-day RHV RHEL Host (ovirt-host) [ovirt-4.4.9]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4698