Bug 2013383
| Summary: | After upgrading RHV-H to 4.4.8, the vdsmd service fails to start (even after putting the host in maintenance before upgrading) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Abhishekh Patil <abpatil> |
| Component: | vdsm | Assignee: | Lev Veyde <lveyde> |
| Status: | CLOSED ERRATA | QA Contact: | peyu |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.4.8 | CC: | aperotti, arachman, cshao, ddacosta, lsurette, lsvaty, lveyde, mavital, peyu, sanja, sbonazzo, shlei, srevivo, weiwang, yaniwang, ycui |
| Target Milestone: | ovirt-4.4.9-1 | Keywords: | Regression |
| Target Release: | 4.4.9 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | vdsm-4.40.90.3 | Doc Type: | Bug Fix |
| Doc Text: | Previously, upgrading from an old sanlock version would silently fail and leave the system partially configured. This happened because the older sanlock version would not report some of the runtime configuration values. The current release fixes this issue. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-11-16 13:54:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description Abhishekh Patil 2021-10-12 17:47:01 UTC
The required sanlock version does report the number of worker threads, so this failure should be impossible. Which sanlock version was running when this error happened? vdsm-tool is designed to run after vdsm and all of its requirements have been installed, and after daemons like sanlock have been restarted so that they run the required version. Looking at the code, it seems that this issue will also make
vdsm-tool configure
fail when it is run after upgrading an older sanlock (< 3.8.3). Sanlock is not
restarted after the upgrade, so when "vdsm-tool configure" checks whether
sanlock is configured, it must cope with a missing "max_worker_threads".
This was broken by commit 8fb0596b3ae6cf6befae4d9fb4d97cbaeea9c543
tool: sanlock: Validate max_worker_threads
Released in v4.40.70.1.
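To make the failure concrete, here is a minimal, hypothetical sketch of the comparison that the tracebacks below point at (_restart_needed in vdsm/tool/configurators/sanlock.py). The names and the configured value are illustrative, not vdsm's actual code; the point is that indexing an option the old daemon never reported raises KeyError instead of signalling that a restart is needed.

```python
# Hypothetical reduction of the _restart_needed() check; vdsm's real code
# differs, but the failure shape matches the tracebacks below.
def restart_needed(configured, reported):
    """Compare the options vdsm configured against the options the running
    sanlock daemon reports."""
    for key, value in configured.items():
        # sanlock < 3.8.3 does not report max_worker_threads at all, so this
        # lookup raises KeyError instead of indicating a needed restart.
        if reported[key] != value:
            return True
    return False

configured = {"max_worker_threads": 50}   # illustrative value only
reported = {"use_watchdog": 1}            # old daemon: key is missing

try:
    restart_needed(configured, reported)
except KeyError as e:
    print("KeyError:", e)   # KeyError: 'max_worker_threads'
```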
Reproduced on RHEL host:
1. Build sanlock 3.8.2 from source
This version does not report max_worker_threads.
2. Force install it:
# rpm -i --force ./python3-sanlock-3.8.2-sanlock.3.8.2.el8.x86_64.rpm ./sanlock-3.8.2-sanlock.3.8.2.el8.x86_64.rpm ./sanlock-lib-3.8.2-sanlock.3.8.2.el8.x86_64.rpm
3. Restart sanlock service
# systemctl restart sanlock
4. Verify that the sanlock daemon does not report max_worker_threads (expect no output here; see the sketch after these steps)
# sanlock client status -D | grep max_worker_threads
5. Check if vdsm is configured
# vdsm-tool is-configured
lvm is configured for vdsm
Managed volume database is already configured
Current revision of multipath.conf detected, preserving
abrt is already configured for vdsm
Traceback (most recent call last):
File "/bin/vdsm-tool", line 209, in main
return tool_command[cmd]["command"](*args)
File "/usr/lib/python3.6/site-packages/vdsm/tool/__init__.py", line 40, in wrapper
func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 164, in isconfigured
m = [c.name for c in pargs.modules if _isconfigured(c) == configurators.NO]
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 164, in <listcomp>
m = [c.name for c in pargs.modules if _isconfigured(c) == configurators.NO]
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 102, in _isconfigured
return getattr(module, 'isconfigured', lambda: configurators.NO)()
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 55, in isconfigured
if _restart_needed():
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 173, in _restart_needed
if options[key] != value:
KeyError: 'max_worker_threads'
6. Configure vdsm
# vdsm-tool configure --force
Checking configuration status...
Managed volume database is already configured
Current revision of multipath.conf detected, preserving
abrt is already configured for vdsm
lvm is configured for vdsm
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Traceback (most recent call last):
File "/bin/vdsm-tool", line 209, in main
return tool_command[cmd]["command"](*args)
File "/usr/lib/python3.6/site-packages/vdsm/tool/__init__.py", line 40, in wrapper
func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 125, in configure
configurer_to_trigger = [c for c in pargs.modules
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 126, in <listcomp>
if _should_configure(c, pargs.force)]
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 317, in _should_configure
configured = _isconfigured(c)
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurator.py", line 102, in _isconfigured
return getattr(module, 'isconfigured', lambda: configurators.NO)()
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 55, in isconfigured
if _restart_needed():
File "/usr/lib/python3.6/site-packages/vdsm/tool/configurators/sanlock.py", line 173, in _restart_needed
if options[key] != value:
KeyError: 'max_worker_threads'
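For step 4 above, a small standalone helper can make the check scriptable. This is not vdsm code; it assumes, based on the grep in step 4, that "sanlock client status -D" prints the daemon options as key=value lines.

```python
# Standalone helper (not part of vdsm): collect the options reported by the
# running sanlock daemon, assuming "sanlock client status -D" prints them as
# key=value lines.
import subprocess

def reported_options():
    out = subprocess.check_output(
        ["sanlock", "client", "status", "-D"], universal_newlines=True)
    options = {}
    for line in out.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options

if __name__ == "__main__":
    opts = reported_options()
    # With sanlock 3.8.2 this is expected to print False.
    print("max_worker_threads reported:", "max_worker_threads" in opts)
```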
I think what happens in the RHVH upgrade is:
1. Old sanlock (< 3.8.3) is installed.
2. Upgrading vdsm installs the new sanlock, but the running daemon is not restarted.
3. When "vdsm-tool configure" was run during the upgrade, it failed
4. The system was not configured, so starting vdsm after the reboot failed.
5. Running vdsm-tool configure --force after the reboot configured the
system and fixed the issue.
So we have 2 issues:
1. vdsm-tool fails when upgrading an old sanlock (fixed by Lev's patch; see the sketch below)
2. The upgrade ignored the "vdsm-tool configure" failure - this should be handled
in a new bug for the component responsible for upgrading the host.
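The actual fix shipped in vdsm-4.40.90.3 may be implemented differently; as a rough sketch of the defensive direction for issue 1, the compare loop from the earlier sketch could treat an option the daemon does not report as requiring a restart instead of raising.

```python
# Hypothetical defensive variant of the earlier compare loop; the real patch
# in vdsm-4.40.90.3 may differ.
_MISSING = object()

def restart_needed(configured, reported):
    for key, value in configured.items():
        # An old daemon that does not report the option cannot be running
        # with the configured value, so a restart is needed.
        if reported.get(key, _MISSING) != value:
            return True
    return False

# sanlock 3.8.2 does not report max_worker_threads:
print(restart_needed({"max_worker_threads": 50}, {"use_watchdog": 1}))  # True
```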
This is a very low risk fix, suggesting it for 4.4.9.

This bug has been verified on "redhat-virtualization-host-4.4.9-20213704".

Test Versions:
RHVM: 4.4.9.4-0.1.el8ev
RHVH: Upgrade RHVH from rhvh-4.4.4.1-0.20210201.0+1 to rhvh-4.4.9.2-0.20211104.0+1

Test Steps:
1. Install RHVH-4.4-20210202.0-RHVH-x86_64-dvd1.iso
2. Set up a local repo and point it to "redhat-virtualization-host-image-update-4.4.9-20213704.x86_64.rpm"
3. Add RHVH to RHVM (the versions of "Data Centers" and "Clusters" are both 4.4)
4. Add an NFS Storage Domain, create a VM and then start the VM
5. Stop the VM
6. Upgrade the host via RHVM
7. After the upgrade, check the host status via RHVM

Test result:
1. After the upgrade, the host reboots successfully and enters the new layer. In RHVM, the host status is "Up".
2. The VM can be started after the RHVH upgrade.
3. There is no error message related to VDSM configuration in /var/log/messages.

Will move bug status to "VERIFIED".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (0-day RHV RHEL Host (ovirt-host) [ovirt-4.4.9]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4698