Description of problem:

Upgrading RHV-H from 4.4.10 to 4.5.3-202302150956_8.6 breaks with an obscure VDSM error, which breaks subsequent re-installations.

Version-Release number of selected component (if applicable): 4.5.3

How reproducible: At will

Steps to Reproduce:
1. Start with a working 4.4-SP1 RHVM and RHV-H 4.4 hosts.
2. From the RHVM GUI, update an RHV-H host.
3. RHVM reports an upgrade failure.
4. Reboot the host.
5. RHVM incorrectly reports the host is up to date.
6. From the host, try "dnf reinstall redhat-virtualization-host-image-update".
7. The post-installation script fails, but dnf still reports a successful upgrade.

Actual results:

After several minutes, the RHVM GUI reports an upgrade failure and the host goes non-responsive. On the host, "nodectl check" shows a VDSM problem; further digging shows VDSM failing on its sebool module, and the vdsmd service no longer starts. Fix the VDSM problem (see below) and reboot the host. RHVM then incorrectly shows the host is up to date.

The failed upgrade leaves a legacy of a bogus LV and two loop mounts on the host. The leftover LV breaks subsequent "dnf reinstall" attempts from an ssh session on the host, but dnf reports a successful upgrade anyway.

Expected results:

A host upgrade from the GUI should not break VDSM. Upgrade failures should not leave behind leftovers that break subsequent re-installations. And if an upgrade fails, nobody should report success.

Additional info:

On the RHV-H host, work around the VDSM problem like this:

[root@rhva2020 tmp]# semodule -i /usr/share/selinux/packages/ovirt-vmconsole/ovirt_vmconsole.pp
[root@rhva2020 tmp]# vdsm-tool configure --module sebool
[root@rhva2020 tmp]# systemctl start vdsmd.service

Clean up the leftover LV like this:

[root@rhva2020 tmp]# lvremove rhvh/rhvh-4.5.3.4-0.20230215.0+1
Do you really want to remove active logical volume rhvh/rhvh-4.5.3.4-0.20230215.0+1? [y/n]: y
  Logical volume "rhvh-4.5.3.4-0.20230215.0+1" successfully removed.
[root@rhva2020 tmp]#

And get rid of the loop mounts like this:

[root@rhvb2020 tmp]# df -h | grep loop
/dev/loop1      3.9G  3.3G  455M  88% /tmp/tmp.nxHntHklwn
[root@rhva2020 tmp]# umount /dev/loop1
[root@rhva2020 tmp]#

Do this twice, because the failed upgrade leaves two loop mounts. Then remove the tmp directories the loops were mounted on.

********************************************************

Here is output from a failed upgrade, with the recovery from above to work around the problem.

[root@rhva2020 log]# date
Sat Mar 25 17:46:43 UTC 2023
[root@rhva2020 log]# # a minute after starting the upgrade
[root@rhva2020 log]# nodectl check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK
[root@rhva2020 log]#
[root@rhva2020 log]# date
Sat Mar 25 17:52:09 UTC 2023
[root@rhva2020 log]# nodectl check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ...
OK
[root@rhva2020 log]# lvs
  LV                           VG   Attr       LSize   Pool   Origin                     Data%  Meta%  Move Log Cpy%Sync Convert
  home                         rhvh Vwi-aotz--   1.00g pool00                             1.04
  pool00                       rhvh twi-aotz-- 162.05g                                   10.30  2.52
  rhvh-4.4.10.1-0.20220208.0   rhvh Vri---tz-k 125.05g pool00
  rhvh-4.4.10.1-0.20220208.0+1 rhvh Vwi-aotz-- 125.05g pool00 rhvh-4.4.10.1-0.20220208.0  3.46
  rhvh-4.4.3.2-0.20201210.0    rhvh Vri---tz-k 125.05g pool00
  rhvh-4.4.3.2-0.20201210.0+1  rhvh Vwi-a-tz-- 125.05g pool00 rhvh-4.4.3.2-0.20201210.0   3.01
  rhvh-4.5.3.4-0.20230215.0    rhvh Vri-a-tz-k 125.05g pool00                             2.62
  rhvh-4.5.3.4-0.20230215.0+1  rhvh Vwi-aotz-- 125.05g pool00 rhvh-4.5.3.4-0.20230215.0   2.62
  root                         rhvh Vri---tz-k 125.05g pool00
  swap                         rhvh -wi-ao---- <23.29g
  tmp                          rhvh Vwi-aotz--   1.00g pool00                             3.49
  var                          rhvh Vwi-aotz--  15.00g pool00                            13.48
  var_crash                    rhvh Vwi-aotz--  10.00g pool00                             0.11
  var_log                      rhvh Vwi-aotz--   8.00g pool00                             4.22
  var_log_audit                rhvh Vwi-aotz--   2.00g pool00                             2.64
[root@rhva2020 log]# df -h
Filesystem                                       Size  Used Avail Use% Mounted on
devtmpfs                                          32G     0   32G   0% /dev
tmpfs                                             32G   16K   32G   1% /dev/shm
tmpfs                                             32G  914M   31G   3% /run
tmpfs                                             32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/rhvh-rhvh--4.4.10.1--0.20220208.0+1  125G  5.1G  120G   5% /
/dev/mapper/rhvh-tmp                            1014M   40M  975M   4% /tmp
/dev/mapper/rhvh-home                           1014M   40M  975M   4% /home
/dev/mapper/rhvh-var                              15G  2.1G   13G  14% /var
/dev/sda1                                        976M  437M  473M  49% /boot
/dev/mapper/rhvh-var_log                         8.0G  377M  7.7G   5% /var/log
/dev/mapper/rhvh-var_crash                        10G  105M  9.9G   2% /var/crash
/dev/mapper/rhvh-var_log_audit                   2.0G   88M  2.0G   5% /var/log/audit
tmpfs                                            6.3G     0  6.3G   0% /run/user/0
/dev/loop1                                       3.9G  3.3G  455M  88% /tmp/mnt.hFITg
/dev/loop0                                       1.1G  1.1G     0 100% /tmp/mnt.JlHR2
/dev/mapper/rhvh-rhvh--4.5.3.4--0.20230215.0+1   125G  4.2G  121G   4% /tmp/mnt.MJSQC

****************

After the RHVM GUI reported a failed upgrade:
- See the leftover LV and loop mounts.
- VDSM is broken.
[root@rhva2020 log]#
[root@rhva2020 log]# lvs
  LV                           VG   Attr       LSize   Pool   Origin                     Data%  Meta%  Move Log Cpy%Sync Convert
  home                         rhvh Vwi-aotz--   1.00g pool00                             1.04
  pool00                       rhvh twi-aotz-- 162.05g                                   10.26  2.52
  rhvh-4.4.10.1-0.20220208.0   rhvh Vri---tz-k 125.05g pool00
  rhvh-4.4.10.1-0.20220208.0+1 rhvh Vwi-aotz-- 125.05g pool00 rhvh-4.4.10.1-0.20220208.0  3.46
  rhvh-4.4.3.2-0.20201210.0    rhvh Vri---tz-k 125.05g pool00
  rhvh-4.4.3.2-0.20201210.0+1  rhvh Vwi-a-tz-- 125.05g pool00 rhvh-4.4.3.2-0.20201210.0   3.01
  rhvh-4.5.3.4-0.20230215.0+1  rhvh Vwi-a-tz-- 125.05g pool00                             2.62
  root                         rhvh Vri---tz-k 125.05g pool00
  swap                         rhvh -wi-ao---- <23.29g
  tmp                          rhvh Vwi-aotz--   1.00g pool00                             3.50
  var                          rhvh Vwi-aotz--  15.00g pool00                            13.55
  var_crash                    rhvh Vwi-aotz--  10.00g pool00                             0.11
  var_log                      rhvh Vwi-aotz--   8.00g pool00                             4.22
  var_log_audit                rhvh Vwi-aotz--   2.00g pool00                             2.65
[root@rhva2020 log]# nodectl check
Status: WARN
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ...
BAD
[root@rhva2020 log]#
[root@rhva2020 log]# df -h
Filesystem                                       Size  Used Avail Use% Mounted on
devtmpfs                                          32G     0   32G   0% /dev
tmpfs                                             32G   16K   32G   1% /dev/shm
tmpfs                                             32G  914M   31G   3% /run
tmpfs                                             32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/rhvh-rhvh--4.4.10.1--0.20220208.0+1  125G  5.1G  120G   5% /
/dev/mapper/rhvh-tmp                            1014M   40M  975M   4% /tmp
/dev/mapper/rhvh-home                           1014M   40M  975M   4% /home
/dev/mapper/rhvh-var                              15G  2.1G   13G  14% /var
/dev/sda1                                        976M  437M  473M  49% /boot
/dev/mapper/rhvh-var_log                         8.0G  375M  7.7G   5% /var/log
/dev/mapper/rhvh-var_crash                        10G  105M  9.9G   2% /var/crash
/dev/mapper/rhvh-var_log_audit                   2.0G   87M  2.0G   5% /var/log/audit
tmpfs                                            6.3G     0  6.3G   0% /run/user/0
/dev/loop1                                       3.9G  3.3G  455M  88% /tmp/tmp.f5Bwzc2F6q
[root@rhva2020 log]# cd /tmp
[root@rhva2020 tmp]# umount /dev/loop1
[root@rhva2020 tmp]# df -h
Filesystem                                       Size  Used Avail Use% Mounted on
devtmpfs                                          32G     0   32G   0% /dev
tmpfs                                             32G   16K   32G   1% /dev/shm
tmpfs                                             32G  914M   31G   3% /run
tmpfs                                             32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/rhvh-rhvh--4.4.10.1--0.20220208.0+1  125G  5.1G  120G   5% /
/dev/mapper/rhvh-tmp                            1014M   40M  975M   4% /tmp
/dev/mapper/rhvh-home                           1014M   40M  975M   4% /home
/dev/mapper/rhvh-var                              15G  2.1G   13G  14% /var
/dev/sda1                                        976M  437M  473M  49% /boot
/dev/mapper/rhvh-var_log                         8.0G  375M  7.7G   5% /var/log
/dev/mapper/rhvh-var_crash                        10G  105M  9.9G   2% /var/crash
/dev/mapper/rhvh-var_log_audit                   2.0G   87M  2.0G   5% /var/log/audit
tmpfs                                            6.3G     0  6.3G   0% /run/user/0
/dev/loop0                                       1.1G  1.1G     0 100% /tmp/tmp.f5Bwzc2F6q
[root@rhva2020 tmp]# umount /dev/loop0
[root@rhva2020 tmp]# lvremove rhvh/rhvh-4.5.3.4-0.20230215.0+1
Do you really want to remove active logical volume rhvh/rhvh-4.5.3.4-0.20230215.0+1? [y/n]: y
  Logical volume "rhvh-4.5.3.4-0.20230215.0+1" successfully removed.
[root@rhva2020 tmp]#
[root@rhva2020 tmp]# semodule -i /usr/share/selinux/packages/ovirt-vmconsole/ovirt_vmconsole.pp
[root@rhva2020 tmp]# vdsm-tool configure --module sebool

Checking configuration status...

Running configure...

Done configuring modules to VDSM.
[root@rhva2020 tmp]#
[root@rhva2020 tmp]# systemctl start vdsmd.service
[root@rhva2020 tmp]# nodectl check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK
[root@rhva2020 tmp]# yum reinstall redhat-virtualization-host-image-update
Updating Subscription Management repositories.
Last metadata expiration check: 0:19:39 ago on Sat 25 Mar 2023 05:47:42 PM UTC.
Dependencies resolved.
====================================================================================================================================
 Package                                  Architecture  Version                 Repository                     Size
====================================================================================================================================
Reinstalling:
 redhat-virtualization-host-image-update  x86_64        4.5.3-202302150956_8.6  rhvh-4-for-rhel-8-x86_64-rpms  1.0 G

Transaction Summary
====================================================================================================================================

Total size: 1.0 G
Installed size: 1.0 G
Is this ok [y/N]: y
Downloading Packages:
[SKIPPED] redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64.rpm: Already downloaded
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                        1/1
  Running scriptlet: redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64  1/2
  Reinstalling     : redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64  1/2
  Running scriptlet: redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64  1/2
  Cleanup          : redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64  2/2
  Verifying        : redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64  1/2
  Verifying        : redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64  2/2
Installed products updated.
Unpersisting: redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64.rpm

Reinstalled:
  redhat-virtualization-host-image-update-4.5.3-202302150956_8.6.x86_64

Complete!
[root@rhva2020 tmp]# reboot

login as: root
root.10.21's password:
Web console: https://rhva2020.mgmt.local:9090/ or https://10.10.10.21:9090/
Last failed login: Sat Mar 25 18:21:22 UTC 2023 from 10.10.10.115 on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Sat Mar 25 17:55:36 2023 from 10.10.10.20

  node status: OK
  See `nodectl check` for more information

  Admin Console: https://10.10.11.21:9090/ or https://10.10.10.21:9090/

[root@rhva2020 ~]# more /etc/redhat-release
Red Hat Enterprise Linux release 8.6
[root@rhva2020 ~]#
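The manual loop-mount cleanup above (run `df -h | grep loop`, then `umount` each device, then remove the mount directories) can be scripted instead of eyeballed. The sketch below is my own helper, not part of any RHV or imgbased tooling; it reads df-style output and prints the cleanup commands for loop devices mounted under /tmp, for review before running them:

```shell
# Sketch: derive cleanup commands for leftover loop mounts under /tmp.
# list_loop_cleanup is an illustrative name, not an RHV tool. On an
# affected host you would pipe real output into it: df | list_loop_cleanup
list_loop_cleanup() {
    # Expected input: df-style lines "<filesystem> ... <mountpoint>".
    # Emits an umount for each /dev/loop* device and an rmdir for its
    # temporary mount directory.
    awk '$1 ~ /^\/dev\/loop/ && $NF ~ /^\/tmp\// {
        print "umount " $1
        print "rmdir " $NF
    }'
}
```

Feeding in the two loop lines from the transcript above yields `umount /dev/loop1`, `rmdir /tmp/mnt.hFITg`, and the matching pair for loop0. Printing commands rather than executing them is deliberate: on a half-upgraded host you want to inspect before unmounting anything.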
A question came up about my SELinux settings. My RHV-H hosts have been set to enforcing for years.

[root@rhvb2020 ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

[root@rhvb2020 ~]#
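For anyone checking the same thing on their own hosts: `getenforce` reports the runtime mode, which can differ from the configured boot-time mode in the file above. A small sketch (the helper name is mine, not a standard tool) that extracts the configured mode from the config file:

```shell
# Sketch: report the configured (boot-time) SELinux mode by parsing
# /etc/selinux/config, rather than the runtime mode from getenforce.
# selinux_configured_mode is an illustrative helper name; it takes an
# optional path argument so it can be pointed at a copy of the file.
selinux_configured_mode() {
    # Print the value of the SELINUX= line (not SELINUXTYPE=); if the
    # file somehow has several, the last one wins, matching load order.
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}" | tail -n 1
}
```

On the host above, `selinux_configured_mode` prints `enforcing`, matching the `cat` output.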
2023-03-25 17:53:13,520 [DEBUG] (MainThread) Calling: (['restorecon', '-Rv', '/var/tmp/'],) {'close_fds': True, 'stderr': -2}
2023-03-25 17:53:13,560 [DEBUG] (MainThread) Exception! b'restorecon: Could not set context for /var/tmp/insights-client: Invalid argument\nrestorecon: Could not set context for /var/tmp/insights-client/insights-archive-406barzz: Invalid argument\nrestorecon: Could not set context for /var/tmp/insights-client/insights-archive-406barzz/insights-rhva2020.mgmt.local-20230325010202.tar.gz: Invalid argument\nrestorecon: Could not set context for /var/tmp/insights-client/insights-client-egg-release: Invalid argument\n'

I wonder if those files are just temporary files that aren't supposed to be there. Probably.

It could be that they're present on all Insights-connected systems; I don't think we test that at all. This sounds like an Insights problem, really (or SELinux policies that lack the correct rule for these Insights files).

I don't think we can do much about that for 4.4.10. We should check a connected system upgrade in a more recent version, perhaps.
> I don't think we can do much about that for 4.4.10. We should check a
> connected system upgrade in more recent version perhaps.

But this isn't a 4.4.10 issue - that RPM with the scripts that run the upgrade is part of 4.5.1. The update sees something it doesn't expect and just fails. An updated 4.5.1 z-stream could fix that, right?
Aw nuts, you can't edit comments. I should have said 4.5.3, not 4.5.1.
(In reply to Greg Scott from comment #5)
> > I don't think we can do much about that for 4.4.10. We should check a
> > connected system upgrade in more recent version perhaps.
>
> But this isn't a 4.4.10 issue - that RPM with the scripts that run the
> upgrade is part of 4.5.1. The update sees something it doesn't expect and
> just fails. An updated 4.5.1 z-stream could fix that, right?

The script that runs is from the new layer, yes, but it does a blanket restorecon over the whole /var/tmp. The problem is in the old layer's Insights files. Likely they're missing the right rule, and that makes restorecon explode. I don't know if ignoring the error code is the best way forward; it may just hide issues. I mean... ok, "rm -rf /var/tmp/insights-client" would probably be ok...

Can you give it a check/try, if you have other hosts to upgrade?
> Can you give it a check/try, if you have other hosts to upgrade?

I already did both of my hosts. I have a junk one I haven't powered on in several months. I'll check to see what's on it.

Ya know - if that's the root problem - restorecon explodes because of bogus files in /var/tmp - then it seems okay to report the error with a reasonable error message and then fail, without telling the world that it applied the update. Or, if dnf returns success anyway even when a script inside fails, then say something in the error message about doing a "dnf reinstall {packagename}" from the host. And get rid of the leftovers from the failure, so the reinstall doesn't also blow up.
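The error-handling pattern being asked for here can be sketched in a few lines of shell. This is illustrative only - `run_step` is my own name, not anything in imgbased or the RPM scriptlets - but it shows the behavior the comment describes: a failing step produces a clear message and a non-zero exit, instead of the caller reporting success:

```shell
# Sketch of "fail loudly, don't claim success": run one upgrade step;
# if it fails, report which step failed and propagate its exit code.
# run_step and the messages are illustrative, not RHV code.
run_step() {
    desc="$1"
    shift
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "FAILED: $desc (exit $rc) - upgrade NOT applied" >&2
        return "$rc"
    fi
}
```

With this pattern, something like `run_step "relabel /var/tmp" restorecon -Rv /var/tmp/ || exit 1` would have surfaced the restorecon failure at the point it happened, rather than letting the transaction finish and report `Complete!`.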
QE tried to reproduce this bug, but it was not reproduced.

Test version:
RHVM: 4.5.2.4-0.1.el8ev
RHVH: Upgrade RHVH from rhvh-4.4.10.1-0.20220208.0+1 to rhvh-4.5.3.4-0.20230215.0+1

Test steps:
1. Install RHVH-4.4-20220208.0-RHVH-x86_64-dvd1.iso
2. Login to the host, set up a local repo and point to "redhat-virtualization-host-4.5.3-202302150956_8.6"
3. Add host to RHVM
4. Upgrade host via RHVM GUI
5. Focus on the host status after upgrade

Test results:
The RHVH upgrade is successful, and the status of the host in RHVM is "Up".

Additional info:
~~~~~~
# imgbase w
You are on rhvh-4.5.3.4-0.20230215.0+1

# imgbase layout
rhvh-4.4.10.1-0.20220208.0
 +- rhvh-4.4.10.1-0.20220208.0+1
rhvh-4.5.3.4-0.20230215.0
 +- rhvh-4.5.3.4-0.20230215.0+1

vdsmd is active after upgrade.

# systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2023-03-31 04:56:26 UTC; 1h 19min ago
  Process: 5565 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 8201 (vdsmd)
    Tasks: 42 (limit: 820699)
   Memory: 159.5M
   CGroup: /system.slice/vdsmd.service
           └─8201 /usr/bin/python3 /usr/libexec/vdsm/vdsmd

Mar 31 04:56:24 dell-per7425-03.lab.eng.pek2.redhat.com vdsmd_init_common.sh[5565]: vdsm: Running prepare_transient_repository
Mar 31 04:56:25 dell-per7425-03.lab.eng.pek2.redhat.com vdsmd_init_common.sh[5565]: vdsm: Running syslog_available
Mar 31 04:56:25 dell-per7425-03.lab.eng.pek2.redhat.com vdsmd_init_common.sh[5565]: vdsm: Running nwfilter
Mar 31 04:56:25 dell-per7425-03.lab.eng.pek2.redhat.com vdsmd_init_common.sh[5565]: vdsm: Running dummybr
Mar 31 04:56:26 dell-per7425-03.lab.eng.pek2.redhat.com vdsmd_init_common.sh[5565]: vdsm: Running tune_system
Mar 31 04:56:26 dell-per7425-03.lab.eng.pek2.redhat.com vdsmd_init_common.sh[5565]: vdsm: Running test_space
Mar 31 04:56:26
dell-per7425-03.lab.eng.pek2.redhat.com vdsmd_init_common.sh[5565]: vdsm: Running test_lo
Mar 31 04:56:26 dell-per7425-03.lab.eng.pek2.redhat.com systemd[1]: Started Virtual Desktop Server Manager.
Mar 31 04:56:29 dell-per7425-03.lab.eng.pek2.redhat.com vdsm[8201]: WARN MOM not available. Error: [Errno 2] No such file or directo>
Mar 31 04:56:29 dell-per7425-03.lab.eng.pek2.redhat.com vdsm[8201]: WARN MOM not available, KSM stats will be missing. Error:
~~~~~~
Which leads to the $million question: what's different about my hosts vs. the QE ones?

I had forgotten this earlier - my 398-day certificates expired again on March 24. I renewed my RHVM certificates with engine-setup, and upgraded my RHVM to the latest at the same time. Here is my RHVM version - a little bit newer than the QE one.

[root@rhvm2020 ~]#
[root@rhvm2020 ~]# rpm -qa | grep rhvm-4
rhvm-4.5.3.7-1.el8ev.noarch
[root@rhvm2020 ~]#

After RHVM came back alive, I used the RHVM GUI to renew my 4.4 host certificates. They all came back to green. And then I upgraded them.

I don't remember enrolling either of my RHV-H hosts with Insights, but I forget lots of things. This is after the upgrade, but the upgrade process does an rsync from the old to the new layer, so this insights-client directory might be relevant. Per @Michal's analysis above, maybe that's the difference.

[root@rhva2020 tmp]# pwd
/var/tmp
[root@rhva2020 tmp]# ls
abrt
insights-client
systemd-private-88212f5c517643ccae0c7efb9521cc08-chronyd.service-Vc8Hnm
systemd-private-88212f5c517643ccae0c7efb9521cc08-systemd-resolved.service-Sz75pt
[root@rhva2020 tmp]#
[root@rhva2020 tmp]# ls -al -R insights-client/
insights-client/:
total 4
drwx------. 3 root root  74 Mar 25 01:03 .
drwxrwxrwt. 6 root root 208 Mar 25 18:15 ..
drwx------. 2 root root  64 Mar 25 01:03 insights-archive-406barzz
-rw-r--r--. 1 root root   8 Mar 25 18:10 insights-client-egg-release

insights-client/insights-archive-406barzz:
total 348
drwx------. 2 root root     64 Mar 25 01:03 .
drwx------. 3 root root     74 Mar 25 01:03 ..
-rw-r--r--. 1 root root 355446 Mar 25 01:03 insights-rhva2020.mgmt.local-20230325010202.tar.gz
[root@rhva2020 tmp]#
I think I have a revised set of steps to reproduce the problem. From the host, do:

1. cd /var/tmp
2. ls -al (nothing exciting)
3. insights-client (abort it or let it finish, doesn't matter)
4. ls -al again. Note a new directory named insights-client, dated right now.
5. From the RHVM GUI, try the upgrade again. This time, VDSM should break and the upgrade should fail.

Subsequent upgrade attempts from the RHVM GUI will claim to complete, but they won't perform any upgrade.

I'll attach a copy of imgbased.log from this host, named twelvetesthost. The action should all be from April 1, 2023.
Looks like this is where the first attempt goes off the rails.

2023-04-01 21:14:43,722 [DEBUG] (MainThread) Calling: (['mount', '/dev/rhvh/var_tmp', '/tmp/mnt.lZ4hQ'],) {'close_fds': True, 'stderr': -2}
2023-04-01 21:14:44,477 [DEBUG] (MainThread) Running: ['restorecon', '-Rv', '/var/tmp/']
2023-04-01 21:14:44,477 [DEBUG] (MainThread) Calling: (['restorecon', '-Rv', '/var/tmp/'],) {'close_fds': True, 'stderr': -2}
2023-04-01 21:14:44,516 [DEBUG] (MainThread) Exception! b'restorecon: Could not set context for /var/tmp/insights-client: Invalid argument\nrestorecon: Could not set context for /var/tmp/insights-client/insights-client-egg-release: Invalid argument\n'
2023-04-01 21:14:44,517 [DEBUG] (MainThread) Calling: (['umount', '-l', '/tmp/mnt.lZ4hQ'],) {'close_fds': True, 'stderr': -2}
2023-04-01 21:14:44,553 [DEBUG] (MainThread) Calling: (['rmdir', '/tmp/mnt.lZ4hQ'],) {'close_fds': True, 'stderr': -2}
2023-04-01 21:14:44,556 [DEBUG] (MainThread) Calling: (['umount', '-l', '/etc'],) {'close_fds': True, 'stderr': -2}
2023-04-01 21:14:44,567 [DEBUG] (MainThread) Calling: (['umount', '-l', '/tmp/mnt.b4RZG'],) {'close_fds': True, 'stderr': -2}
2023-04-01 21:14:44,578 [DEBUG] (MainThread) Calling: (['rmdir', '/tmp/mnt.b4RZG'],) {'close_fds': True, 'stderr': -2}
2023-04-01 21:14:44,581 [ERROR] (MainThread) Failed to migrate etc

After fixing VDSM, the second attempt starts at 21:51. This one runs to completion without error and the node reboots. But when it comes back up, it's still on RHEL 8.5, so the upgrade didn't work. And now when the GUI checks for upgrades, it says the host is up to date.

So the next attempt is "yum reinstall redhat-virtualization-host-image-update". This runs smoothly, and after a reboot the node is now on 8.6.
No capacity to handle. Suspecting insights-client SELinux rules may be broken; hard to say. It looks rare enough to ignore, with two potential workarounds, either of which would fix the problem:
- running "restorecon -Rv /var/tmp" manually prior to the upgrade
- cleaning up /var/tmp/insights-client
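The two workarounds can be combined into a small pre-upgrade check. This is a sketch under stated assumptions, not a supported tool: the path comes from the log excerpts in this bug, and `pre_upgrade_cleanup`, `run`, and `DRY_RUN` are illustrative names of my own. It defaults to printing the commands; set DRY_RUN=0 to actually run them on a host.

```shell
# Sketch: apply both suggested workarounds before an RHV-H upgrade.
# DRY_RUN=1 (the default) prints the commands instead of running them.
# The path argument exists only so the sketch can be exercised against
# a scratch directory; on a real host, call it with no arguments.
pre_upgrade_cleanup() {
    dir="${1:-/var/tmp/insights-client}"
    run() {
        if [ "${DRY_RUN:-1}" = "1" ]; then echo "would run: $*"; else "$@"; fi
    }
    # Workaround 2: remove the stale insights-client leftovers that
    # make the upgrade's blanket restorecon fail.
    if [ -d "$dir" ]; then
        run rm -rf "$dir"
    fi
    # Workaround 1: relabel /var/tmp before the upgrade tries to.
    run restorecon -Rv /var/tmp
}
```

Running it in dry-run mode first shows exactly what would be removed, which matters here since the insights archive under /var/tmp may be something Insights still wants.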