We see this on RHEL 9 all the time, so cloning the bug to track it there as well. It breaks quite a number of scenarios.
See bug 1934567 for some initial discussion; I trimmed the comments here to make this easier to read.
+++ This bug was initially created as a clone of Bug #1934567 +++
Description of problem:
Cockpit [1] tests related to encrypted volume resizing are failing on the Fedora 34 image that was newly introduced in CI.
Version-Release number of selected component (if applicable):
cryptsetup-2.3.4-2.fc34.x86_64
systemd-248~rc2-1.fc34.x86_64
$ uname -r
5.10.16-200.fc33.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Create an LV formatted with LUKS or use an existing one
2. Run "cryptsetup resize name-of-luks-volume --size target-size"
Our test's disk setup looks like this:
# lsblk
NAME                                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                             8:0    0  500M  0 disk
└─TEST-vol                                    253:0    0  300M  0 lvm
  └─luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 253:1    0  198M  0 crypt /run/foo
sr0                                            11:0    1  366K  0 rom
vda                                           252:0    0   13G  0 disk
└─vda1                                        252:1    0   13G  0 part  /
And I run:
cryptsetup resize /dev/mapper/luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 --size 610304
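For reference, cryptsetup's --size is given in 512-byte sectors, so the target above can be converted to MiB with a small helper. This is only a sketch; the function name sectors_to_mib is made up for illustration and is not part of cryptsetup.

```shell
# sectors_to_mib SECTORS
# Converts a count of 512-byte sectors to whole MiB (integer division;
# 1 MiB = 2048 sectors). Hypothetical helper, not part of cryptsetup.
sectors_to_mib() {
    echo $(( $1 / 2048 ))
}

sectors_to_mib 610304   # prints 298
```

So the resize in this report grows the container to 298 MiB.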
Actual results:
/run/foo gets automatically unmounted when the cryptsetup resize command finishes. The system journal shows that systemd itself unmounts /run/foo.
I enabled systemd debug logging to get more information.
# systemctl status run-foo.mount
○ run-foo.mount - /run/foo
     Loaded: loaded (/etc/fstab; generated)
     Active: inactive (dead) since Wed 2021-03-03 13:22:33 UTC; 20min ago
      Where: /run/foo
       What: /dev/disk/by-uuid/43ca09ce-f60b-4e8a-8851-cfc9d74f73da
       Docs: man:fstab(5)
             man:systemd-fstab-generator(8)
        CPU: 6ms
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Deactivated successfully.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Changed unmounting -> dead
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Job 1764 run-foo.mount/stop finished, result=done
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: Unmounted /run/foo.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Consumed 6ms CPU time.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency BindsTo=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Expected results:
/run/foo will not get unmounted.
Additional info:
The attachments contain the whole system journal output with debug logs enabled.
The interesting part is what happens around the line: 'run-foo.mount: About to execute /usr/bin/umount /run/foo -c'
Expanding the reproducer to translate the complex step 1 into CLI:
modprobe scsi_debug dev_size_mb=300
DEV=/dev/sda # this is usually right for a VM with virtio disks, where the root device is /dev/vda
vgcreate testvg $DEV
lvcreate testvg -n vol -L 200m
cryptsetup luksFormat --pbkdf-memory 32768 /dev/mapper/testvg-vol
cryptsetup luksOpen /dev/mapper/testvg-vol c1
mkfs -t ext4 /dev/mapper/c1
mount /dev/mapper/c1 /mnt/
(In reply to David Tardon from comment #3)
> Seems to work with systemd 249 -> closing.
Our tests are still occasionally failing with systemd-249-9.el9.x86_64. I am afraid the reproducer we gave you was not correct, sorry for that. The important bit is that systemd needs to take an interest in the mount point, and it does that when the mount point is listed in /etc/fstab. I attach a script for reproducing the bug. Just run it a couple of times and you should see "umount: /mnt: not mounted."
When I ran the script in our rhel-9-0 CI image, the bug happened 27 out of 100 times.
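A failure rate like the one above can be measured with a small loop. This is a minimal sketch, not the attached script: count_failures is a hypothetical helper, and ./reproducer.sh in the usage comment is a placeholder name for a script that is assumed to exit non-zero when the mount point was lost.

```shell
# count_failures RUNS CMD [ARGS...]
# Runs CMD the given number of times, discarding its output, and prints
# "<failed>/<runs>". A run counts as failed when CMD exits non-zero.
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$runs"
}

# Example (placeholder script name, needs root on a real system):
#   count_failures 100 ./reproducer.sh
```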
The chain of events that causes the unmount is this:
- "cryptsetup resize" temporarily suspends the device
- something triggers a uevent about the device
- udev runs while the device is still suspended:
  - DM_SUSPENDED=1
    => ID_FS_TYPE is removed
    => SYSTEMD_READY=0
- systemd unmounts /mnt because the device is treated as if it had disappeared
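The udev side of this chain can be inspected from the udev database, which holds the properties from the last processed event. The following is a sketch under assumptions: the helper names are invented for illustration, and /dev/mapper/luks0 in the usage comment is just an example device.

```shell
# relevant_props: filters udev properties (KEY=VALUE lines on stdin) down
# to the three keys involved in this bug.
relevant_props() {
    grep -E '^(DM_SUSPENDED|ID_FS_TYPE|SYSTEMD_READY)=' || true
}

# device_looks_gone: succeeds if the properties on stdin contain
# SYSTEMD_READY=0, the state in which systemd treats the device as absent.
device_looks_gone() {
    grep -qx 'SYSTEMD_READY=0'
}

# Typical use on a live system (example device name):
#   udevadm info --query=property --name=/dev/mapper/luks0 | relevant_props
#   udevadm info --query=property --name=/dev/mapper/luks0 | device_looks_gone \
#       && echo "systemd will consider the device gone"
```

If udev last ran while the device was suspended, the first command should show DM_SUSPENDED=1 and SYSTEMD_READY=0 with no ID_FS_TYPE line.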
This is timing sensitive: udev needs to run while the device is suspended. I guess everything else has behaved this way for many years already; the unmount would have shown up long ago if udev had ever been triggered while a LUKS container was suspended.
Thus, if you need a reliable reproducer, replace "cryptsetup resize" in the BUG script with
dmsetup suspend /dev/mapper/luks0
udevadm trigger
udevadm settle
dmsetup resume /dev/mapper/luks0
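The four commands above can be wrapped into a function that also checks the mount point afterwards. This is a sketch only: run and reproduce_once are hypothetical helpers, the DRY_RUN switch exists just so the sequence can be printed without root, and the one-second wait is an assumption because systemd reacts to the udev change asynchronously.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reproduce_once DEVICE MOUNTPOINT
# Forces udev to process the device while it is suspended, then reports
# whether the mount point survived. Needs root unless DRY_RUN=1.
reproduce_once() {
    dev=$1
    mnt=$2
    run dmsetup suspend "$dev"
    run udevadm trigger
    run udevadm settle
    run dmsetup resume "$dev"
    run sleep 1   # assumption: give systemd a moment to act on the event
    # findmnt exits non-zero when nothing is mounted at the given path.
    if run findmnt --mountpoint "$mnt" >/dev/null; then
        echo "still mounted: $mnt"
    else
        echo "BUG: $mnt was unmounted"
        return 1
    fi
}

# Example (as root): reproduce_once /dev/mapper/luks0 /mnt
```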
Just for completeness: the reincarnation of this bug is handled via bug #2158628.
The optimal fix is to drop the duplicate CHANGE event that the kernel posts on 'suspend'.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:2531