Bug 2089871
| Summary: | kexec-tools package update failed | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Nicolas Hicher <nhicher> |
| Component: | kexec-tools | Assignee: | Coiby <coxu> |
| Status: | CLOSED ERRATA | QA Contact: | Jie Li <jieli> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | CentOS Stream | CC: | bstinson, coxu, jwboyer, lersek, ltao, rjones, ruyang, virt-maint, xiawu, yiyan |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kexec-tools-2.0.25-1.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-05-09 08:14:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Nicolas Hicher
2022-05-24 15:41:56 UTC
Thanks for reporting this issue! The bug happens because kdumpctl can't create /var/lock/kdump:

    if ! exec 9> /var/lock/kdump; then
        derror "Create file lock failed"
        exit 1
    fi

`virt-customize -x -v -a CentOS-Stream-GenericCloud-9-20220509.0.x86_64.qcow2 --run-command "ls -l /var/lock; ls -l /run/lock"` shows the following logs,

    ...
    + mkdir -p /dev/pts /dev/shm
    + mount -t devpts /dev/pts /dev/pts
    + mount -t tmpfs -o mode=1777 shmfs /dev/shm
    + mkdir -p /sysroot
    + mkdir -p /run
    + mount -t tmpfs -o nosuid,size=20%,mode=0755 tmpfs /run
    + mkdir -p /run/lock
    + ln -s ../run/lock /var/lock
    ...
    + systemd-tmpfiles --prefix=/dev --prefix=/run --prefix=/var/run --create --boot
    ...
    supermin: chroot
    ...
    chroot: /sysroot: running 'is_file: /grub/menu.lst'
    ...
    [ 6.6] Running: ls -l /var/lock; ls -l /run/lock
    commandrvf: mount --bind /dev /sysroot/dev
    commandrvf: mount --bind /dev/pts /sysroot/dev/pts
    commandrvf: mount --bind /proc /sysroot/proc
    commandrvf: mount --bind /sys/fs/selinux /sysroot/selinux
    ...
    lrwxrwxrwx. 1 root root 11 Oct 26 2021 /var/lock -> ../run/lock
    ls: cannot access '/run/lock': No such file or directory

I interpret the above logs as

1. virt-customize creates /run/lock and creates a soft link to point /var/lock to ../run/lock for systemd-tmpfiles
2. virt-customize doesn't "mount --bind /run /sysroot/run" thus /var/lock doesn't exist after chrooting into /sysroot

So it seems there is a problem with virt-customize. Let me re-assign this bug to guestfs-tools.

Btw, the same problem happens on Fedora.

Likely similar in nature to bug 1643888.

(In reply to Coiby from comment #1)
> 2. virt-customize doesn't "mount --bind /run /sysroot/run" thus
> /var/lock doesn't exist after chrooting into /sysroot

I disagree. The bind mounts (coming from "daemon/sh.c", function bind_mount()) are only done for pseudo-filesystems under /dev, /proc and /sys where the chroot environment in the guest needs access to the same guest kernel resources as the outer appliance environment. There is no intent to share *either* /var *or* /run between the chroot and the outer environment, as far as I know.

(1) The following commands in the "appliance/init" file:

> mkdir -p /run/lock
> ln -s ../run/lock /var/lock
> [...]
> systemd-tmpfiles --prefix=/dev --prefix=/run --prefix=/var/run --create --boot

only affect the appliance root, not the filesystem (if any) that's going to be mounted later under /sysroot, and chrooted into. This is why, in

> $ virt-rescue -a foobar.img

the following directories exist immediately:

> ><rescue> ls -ld /var /var/lock /run /run/lock
> drwxr-xr-x 17 root root 380 May 27 11:05 /run
> drwxr-xr-x 4 root root 100 May 27 11:05 /run/lock
> drwxr-xr-x 18 1000 1000 4096 May 25 11:44 /var
> lrwxrwxrwx 1 root root 11 May 25 11:44 /var/lock -> ../run/lock

(2) Sharing between the appliance root's "/var" and/or "/run", and the sysroot's "/var" and/or "/run", is undesirable, and has never been the intent.

(3) The actual problem is the following. On both Fedora 35 and RHEL9, the "filesystem" package *owns* (and creates) the following directories:

    /run
    /var
    /var/lock

but does not own or create "/run/lock". This is why, in

> $ virt-rescue -a foobar.img -i

after

> ><rescue> chroot /sysroot/

we get

> ><rescue> ls -ld /var /var/lock /run /run/lock
> ls: cannot access '/run/lock': No such file or directory
> drwxr-xr-x. 2 root root 6 Apr 8 11:02 /run
> drwxr-xr-x. 18 root root 4096 Apr 8 11:03 /var
> lrwxrwxrwx. 1 root root 11 Apr 8 11:02 /var/lock -> ../run/lock

Instead, "/run/lock" comes from "/usr/lib/tmpfiles.d/legacy.conf", which configures the systemd-tmpfiles service:

> d /run/lock 0755 root root -
> L /var/lock - - - - ../run/lock

The second quoted entry is actually superfluous, as the "/var/lock" symlink is already owned/created by the "filesystem" package. However, the first quoted entry ("/run/lock") is *only* created by systemd-tmpfiles. Therefore the issue is that systemd-tmpfiles is not executed separately inside the chroot jail.
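The failure mode described in (3) can be reproduced without a guest image. The following is a minimal sketch, assuming a scratch directory stands in for the guest root; the `try_lock` helper is a hypothetical name and only mirrors the `exec 9> /var/lock/kdump` line quoted from kdumpctl.

```shell
#!/bin/sh
# Build a throwaway tree that mimics the guest root described above:
# /run and /var exist, /var/lock is a symlink to ../run/lock, and
# /run/lock itself is missing.
root=$(mktemp -d)
mkdir -p "$root/run" "$root/var"
ln -s ../run/lock "$root/var/lock"

# Hypothetical helper: try to open the lock file for writing on fd 9,
# the same way kdumpctl does, but inside a subshell so that a failed
# redirection cannot terminate this script.
try_lock() {
    ( exec 9> "$1/var/lock/kdump" ) 2>/dev/null
}

# The symlink dangles, so opening the lock file fails.
if try_lock "$root"; then before=ok; else before=failed; fi

# Creating the symlink target is what the "d /run/lock" rule quoted
# from legacy.conf asks systemd-tmpfiles to do.
mkdir -p "$root/run/lock"
if try_lock "$root"; then after=ok; else after=failed; fi

echo "before: $before, after: $after"
rm -rf "$root"
```

The first attempt fails because the path resolves through the dangling symlink; once the target directory exists, the same open succeeds.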
Note:

> $ virt-rescue -a foobar.img -i
>
> ><rescue> chroot /sysroot/
>
> ><rescue> ls -ld /var /var/lock /run /run/lock
> ls: cannot access '/run/lock': No such file or directory
> drwxr-xr-x. 2 root root 6 Apr 8 11:02 /run
> drwxr-xr-x. 18 root root 4096 Apr 8 11:03 /var
> lrwxrwxrwx. 1 root root 11 Apr 8 11:02 /var/lock -> ../run/lock
>
> ><rescue> systemd-tmpfiles --create --prefix=/run
>
> ><rescue> ls -ld /var /var/lock /run /run/lock
> drwxr-xr-x. 14 root root 190 May 27 07:29 /run
> drwxr-xr-x 3 root root 20 May 27 07:29 /run/lock
> drwxr-xr-x. 18 root root 4096 Apr 8 11:03 /var
> lrwxrwxrwx. 1 root root 11 Apr 8 11:02 /var/lock -> ../run/lock

Basically we need to take the `InstallPackages, `UninstallPackages, and `Update branches in "customize/customize_run.ml", and run

> systemd-tmpfiles --create

before executing "guest_install_command" / "guest_uninstall_command" / "guest_update_command", and run

> systemd-tmpfiles --remove

right after.

(
The "--boot" option should *not* be added. The systemd-tmpfiles(8) manual writes,

> --boot
> Also execute lines with an exclamation mark.
>
> [...]
>
> [...] during boot the following command line is
> executed to ensure that all temporary and volatile directories
> are removed and created according to the configuration file:
>
> systemd-tmpfiles --remove --create

Note the absence of "--boot". Furthermore, tmpfiles.d(5) writes,

> If the exclamation mark ("!") is used, this line is only safe
> to execute during boot, and can break a running system. Lines
> without the exclamation mark are presumed to be safe to execute
> at any time, e.g. on package upgrades. systemd-tmpfiles will
> take lines with an exclamation mark only into consideration, if
> the --boot option is given.
)

Note that the `FirstbootPackages pattern's handling needs no update: the "guest_install_command" queued there will be executed as a part of an actual boot into the guest root, and then "systemd-tmpfiles --create --boot" will have been done anyway.
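The ordering proposed above can be sketched in shell. This only illustrates the proposal; the real change would belong in the OCaml code of "customize/customize_run.ml", which is not shown here. `run_with_tmpfiles` and the `TMPFILES` override are hypothetical names introduced for the sketch, and the override exists only so the wrapper can be dry-run with a stub.

```shell
#!/bin/sh
# TMPFILES is a hypothetical override; by default the real tool is used.
TMPFILES=${TMPFILES:-systemd-tmpfiles}

# Hypothetical wrapper: create tmpfiles.d entries, run the given package
# command, then run the removal pass, returning the command's status.
run_with_tmpfiles() {
    # Deliberately no --boot: lines marked "!" are only safe during boot.
    "$TMPFILES" --create || return 1

    # The wrapped command stands in for guest_install_command,
    # guest_uninstall_command or guest_update_command.
    "$@"
    status=$?

    # Removal pass "right after", as proposed in the comment.
    "$TMPFILES" --remove
    return "$status"
}
```

Inside the guest chroot this would be invoked as, for example, `run_with_tmpfiles dnf -y update kexec-tools`; the dnf command line is an assumption chosen for illustration, not something taken from virt-customize.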
... I think my analysis in <https://bugzilla.redhat.com/show_bug.cgi?id=1643888> was incorrect. I'm reopening that BZ, and marking this one TestOnly, and dependent on that one.

Well, actually, no. This is a packaging bug in kexec-tools-2.0.24-1.el9.

I have re-read the references in <https://bugzilla.redhat.com/show_bug.cgi?id=1643888#c16> yet another time. Refer to this comment:

- https://bugzilla.redhat.com/show_bug.cgi?id=1373833#c5

Also refer to the following systemd commits:

- 042e33ae3a7f ("rpm: add RPM macro for creating tmpfiles entries after package installation", 2013-07-16)

  https://github.com/systemd/systemd/commit/042e33ae3a7f

- 0f78fee8d039 ("rpm macros: add %tmpfiles_create_package", 2018-02-05)

  https://github.com/systemd/systemd/commit/0f78fee8d039

The idea is that kexec-tools needs to:

(1) move its lock file from /var/lock to /run/lock,

(2) call the %tmpfiles_create macro in %post,

(3) and it needs to %ghost /run/lock/kdump.

Aargh I'm getting really confused. The opencryptoki stuff in bug 1373833 is about a lock *directory* under /var/lock; in other words, opencryptoki is indeed supposed to own /run/lock/opencryptoki, because it creates new files under that directory. For that reason, opencryptoki had to introduce a new tmpfiles.d config snippet (rule) as well, and that snippet was what needed %tmpfiles_create during %post.

But that does not entirely match kdumpctl's case. kdumpctl only creates a new *file*, not a directory, and one could argue that this file exists under a system-wide directory (/var/lock, coming even from the "filesystem" package), so it should just work, without kexec-tools having to do anything particular in its SPEC file.

Countering *that* however is the fact that tmpfiles.d(5) clearly documents temporary *files*, not just directories.

So yes -- for now, I'm going to stick with my interpretation that this is a bug in the kexec-tools packaging.
Kexec-tools should register its *file* "/var/lock/kdump" as a temporary *file*, with the systemd-tmpfiles(8) service. In other words, the same situation applies as with opencryptoki in <https://bugzilla.redhat.com/show_bug.cgi?id=1373833#c5> -- regular file vs. directory should make no difference here.

Thus, let me append an item to the list in comment 6 above:

(4) register "kdump-tmpfiles.conf" with systemd-tmpfiles (under "/usr/lib/tmpfiles.d"), listing /var/lock/kdump as a temporary file.

Prior art for such *regular files*, on RHEL-9:

    # grep '^f ' /usr/lib/tmpfiles.d/*.conf
    /usr/lib/tmpfiles.d/iscsi.conf:f /run/lock/iscsi/lock 0600 root root -
    /usr/lib/tmpfiles.d/pam.conf:f /var/log/tallylog 0600 root root -
    /usr/lib/tmpfiles.d/setup.conf:f /run/motd 0644 root root -
    /usr/lib/tmpfiles.d/var.conf:f /var/log/wtmp 0664 root utmp -
    /usr/lib/tmpfiles.d/var.conf:f /var/log/btmp 0660 root utmp -
    /usr/lib/tmpfiles.d/var.conf:f /var/log/lastlog 0664 root utmp -

(In reply to Laszlo Ersek from comment #4)
> (In reply to Coiby from comment #1)
>
> > 2. virt-customize doesn't "mount --bind /run /sysroot/run" thus
> > /var/lock doesn't exist after chrooting into /sysroot
>
> I disagree. The bind mounts (coming from "daemon/sh.c", function
> bind_mount()) are only done for pseudo-filesystems under /dev, /proc and
> /sys where the chroot environment in the guest needs access to the same
> guest kernel resources as the outer appliance environment. There is no
> intent to share *either* /var *or* /run between the chroot and the outer
> environment, as far as I know.

Thanks for correcting my mistake! I manually mounted CentOS-Stream-GenericCloud-9-20220509.0.x86_64.qcow2 and found that /var/lock is a soft link to /run/lock, which doesn't exist. So indeed there is no bind mount.

(In reply to Laszlo Ersek from comment #6)
> Well, actually, no. This is a packaging bug in kexec-tools-2.0.24-1.el9.
>
> I have re-read the references in
> <https://bugzilla.redhat.com/show_bug.cgi?id=1643888#c16> yet another time.
> Refer to this comment:
>
> - https://bugzilla.redhat.com/show_bug.cgi?id=1373833#c5
>
> Also refer to the following systemd commits:
>
> - 042e33ae3a7f ("rpm: add RPM macro for creating tmpfiles entries after
> package installation", 2013-07-16)
>
> https://github.com/systemd/systemd/commit/042e33ae3a7f
>
> - 0f78fee8d039 ("rpm macros: add %tmpfiles_create_package", 2018-02-05)
>
> https://github.com/systemd/systemd/commit/0f78fee8d039
>
> The idea is that kexec-tools needs to:
>
> (1) move its lock file from /var/lock to /run/lock,
>
> (2) call the %tmpfiles_create macro in %post,
>
> (3) and it needs to %ghost /run/lock/kdump.

> So yes -- for now, I'm going to stick with my interpretation that this is a bug in the kexec-tools packaging. Kexec-tools should register its *file* "/var/lock/kdump" as a temporary *file*, with the systemd-tmpfiles(8) service. In other words, the same situation applies as with opencryptoki in <https://bugzilla.redhat.com/show_bug.cgi?id=1373833#c5> -- regular file vs. directory should make no difference here.
>
> Thus, let me append an item to the list in comment 6 above:
>
> (4) register "kdump-tmpfiles.conf" with systemd-tmpfiles (under "/usr/lib/tmpfiles.d"), listing /var/lock/kdump as a temporary file.

Thanks for analyzing this bug in such detail, catching a packaging bug in kexec-tools, and offering a solution! I'll fix this bug with a slight change. Since kdumpctl will also be called in the %pre scriptlet, I need to call the %tmpfiles_create_package macro instead.

One thing I want to confirm is I shouldn't expect virt-customize to create /run/lock maybe by systemd-tmpfiles and it's the package's responsibility, right?

(In reply to Coiby from comment #9)
> One thing I want to confirm is I shouldn't expect virt-customize to create
> /run/lock maybe by systemd-tmpfiles and it's the package's responsibility,
> right?
Well that's my current understanding. I've really not dealt much with systemd-tmpfiles before, but as far as I understand the documentation and the BZs I linked previously, it seems that every package is responsible for "announcing" (registering) its "persistent" temp files (including lock files). My expectation is that once kexec-tools satisfies that general-looking requirement, we won't have to touch virt-customize.

If, on the other hand, it turns out that even after kexec-tools is updated, things still don't work, I guess I'll just have to accept that I don't understand how systemd-tmpfiles is supposed to work (the full picture), and we'll have to add a kludge to virt-customize to pre-create these temp directories. I've not managed to find a comprehensive description about these responsibilities anywhere.

In brief, I'd like kexec-tools to be fixed first, and if the pkg upgrade under virt-customize still does not work, then we'd look into adding the kludge to virt-customize. Does that sound acceptable to you? Thanks!

(In reply to Laszlo Ersek from comment #10)
> (In reply to Coiby from comment #9)
>
> > One thing I want to confirm is I shouldn't expect virt-customize to create
> > /run/lock maybe by systemd-tmpfiles and it's the package's responsibility,
> > right?
>
> Well that's my current understanding. I've really not dealt much with
> systemd-tmpfiles before, but as far as I understand the documentation and
> the BZs I linked previously, it seems that every package is responsible for
> "announcing" (registering) its "persistent" temp files (including lock
> files). My expectation is that once kexec-tools satisfies that
> general-looking requirement, we won't have to touch virt-customize.
>
> If, on the other hand, it turns out that even after kexec-tools is updated,
> things still don't work, I guess I'll just have to accept that I don't
> understand how systemd-tmpfiles is supposed to work (the full picture), and
> we'll have to add a kludge to virt-customize to pre-create these temp
> directories. I've not managed to find a comprehensive description about
> these responsibilities anywhere.
>
> In brief, I'd like kexec-tools to be fixed first, and if the pkg upgrade
> under virt-customize still does not work, then we'd look into adding the
> kludge to virt-customize. Does that sound acceptable to you? Thanks!

Thanks for the explanation! I can confirm [1] alone could fix this bug. One small issue is that /run/lock/kdump still exists after virt-customize finishes running. Could this cause a real problem? I expected systemd-tmpfiles to create a ramfs which would disappear after the system quits, but that's not the case.

[1] https://gitlab.com/coxu/fedora-kexec-tools/-/merge_requests/10

Hi Coiby, "/usr/lib/tmpfiles.d/x11.conf" contains an example like this:

    # Unlink the X11 lock files
    r! /tmp/.X[0-9]*-lock

It is explained by the manual tmpfiles.d(5). It means that during boot, and only during boot, these lock files are removed.

Would this scheme work for kexec-tools? The lock file could remain in place after the upgrade finishes, but at next boot, it should be cleaned up. Can you try adding

    r! /run/lock/kdump

to "kdump-tempfiles.conf"?

BTW, is the left-over lock file specific to virt-customize, or do you see the same when you update kexec-tools manually on a physical installation (or even in a stand-alone, running guest)?

(In reply to Laszlo Ersek from comment #12)
> Hi Coiby, "/usr/lib/tmpfiles.d/x11.conf" contains an example like this:
>
> # Unlink the X11 lock files
> r! /tmp/.X[0-9]*-lock
>
> It is explained by the manual tmpfiles.d(5). It means that during boot, and
> only during boot, these lock files are removed.
>
> Would this scheme work for kexec-tools? The lock file could remain in place
> after the upgrade finishes, but at next boot, it should be cleaned up. Can
> you try adding
>
> r! /run/lock/kdump
>
> to "kdump-tempfiles.conf"?

Thanks for the suggestion! Unfortunately, it doesn't work. Judging by the following experiment, there seems to be a sort of overlay fs for /run, and deleting a file in the upper layer never deletes the file in the lower layer:

1. mount the guest image with tools like guestmount
2. mkdir /run/lock/ && touch /run/lock/kdump
3. umount the guest image and start the guest VM
4. delete /run/lock/kdump in the running VM
5. shutdown the VM and mount the guest image
6. /run/lock/kdump is still there

>
> BTW, is the left-over lock file specific to virt-customize, or do you see
> the same when you update kexec-tools manually on a physical installation (or
> even in a stand-alone, running guest)?

This left-over lock is specific to virt-customize. Updating kexec-tools in a running guest doesn't have this issue. For kexec-tools, we are completely fine with this left-over lock file. If it doesn't create a problem for virt-customize, maybe we can tolerate it.

Btw, you mentioned --firstboot-command last week; is it better for updating packages than --run-command? One problem with --firstboot-command is that I notice /etc/rc.d/rc*.d/S99virt-sysprep-firstboot somehow isn't executed, so it doesn't take effect.

(In reply to Coiby from comment #13)
> (In reply to Laszlo Ersek from comment #12)
> > Hi Coiby, "/usr/lib/tmpfiles.d/x11.conf" contains an example like this:
> >
> > # Unlink the X11 lock files
> > r! /tmp/.X[0-9]*-lock
> >
> > It is explained by the manual tmpfiles.d(5). It means that during boot, and
> > only during boot, these lock files are removed.
> >
> > Would this scheme work for kexec-tools? The lock file could remain in place
> > after the upgrade finishes, but at next boot, it should be cleaned up. Can
> > you try adding
> >
> > r! /run/lock/kdump
> >
> > to "kdump-tempfiles.conf"?
>
> Thanks for the suggestion! Unfortunately, it doesn't work. Judging by the
> following experiment, there seems to be a sort of overlay fs for /run, and
> deleting a file in the upper layer never deletes the file in the lower layer:
> 1. mount the guest image with tools like guestmount
> 2. mkdir /run/lock/ && touch /run/lock/kdump
> 3. umount the guest image and start the guest VM
> 4. delete /run/lock/kdump in the running VM
> 5. shutdown the VM and mount the guest image
> 6. /run/lock/kdump is still there

This is weird. However, it does not mean that my suggestion does not work. I didn't try to suggest an approach for removing the file at shutdown. My suggestion was a tmpfiles.d(5) addition for removing the stale lockfile *at next boot*.

> >
> > BTW, is the left-over lock file specific to virt-customize, or do you see
> > the same when you update kexec-tools manually on a physical installation (or
> > even in a stand-alone, running guest)?
>
> This left-over lock is specific to virt-customize. Updating kexec-tools in a
> running guest doesn't have this issue. For kexec-tools, we are completely
> fine with this left-over lock file. If it doesn't create a problem for
> virt-customize, maybe we can tolerate it.

Yeah, let's just ignore it then.

> Btw, you mentioned
> --firstboot-command last week; is it better for updating packages than
> --run-command?

There are advantages and disadvantages. One advantage is that the package upgrade runs in the actual guest environment. One disadvantage is that the firstboot service runs in the virtd_exec_t SELinux context, and that context doesn't currently have permission to transition to rpm_script_t, and therefore package install scripts cannot be executed (without disabling SELinux).

> One problem with --firstboot-command is that I notice
> /etc/rc.d/rc*.d/S99virt-sysprep-firstboot somehow isn't executed, so it
> doesn't take effect.
The firstboot facility attempts to cover both the SysV init flavor and the systemd flavor, so it installs the service for both flavors. The one you are mentioning is irrelevant in Fedora / RHEL guests; for those, the systemd service matters.

But really I wouldn't investigate this any longer if you can live with the left-over lock file after shutdown. Again my latest proposal has been to extend tmpfiles.d(5) with a rule that removes the lock file at next boot. Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (kexec-tools bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2463
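The tmpfiles.d registration proposed in item (4) of the discussion could look roughly like the following. This is a sketch only: the rule is an assumption modelled on the iscsi.conf prior-art entry quoted in the thread, with the path moved to /run/lock as in item (1), and it is not taken from the kexec-tools change that was actually merged. The file is written to a scratch directory so that nothing on the running system is touched.

```shell
#!/bin/sh
# Write a hypothetical kdump-tmpfiles.conf into a scratch directory.
conf_dir=$(mktemp -d)
conf="$conf_dir/kdump-tmpfiles.conf"

cat > "$conf" <<'EOF'
# Assumed rule: type "f" creates the file if it does not exist yet.
# Mode and ownership are copied from the iscsi.conf example and are
# not confirmed for kexec-tools.
f /run/lock/kdump 0600 root root -
EOF

# Show the resulting fragment.
cat "$conf"
```

In a package, such a file would be installed under "/usr/lib/tmpfiles.d" and applied from a scriptlet through one of the RPM macros named in the thread. Applying it by hand with `systemd-tmpfiles --create "$conf"` needs root and would create the real /run/lock/kdump, so the sketch stops at writing the file.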