Description of problem:
Cockpit [1] tests related to encrypted volume resizing are failing on the Fedora 34 image newly introduced in CI.

Version-Release number of selected component (if applicable):
cryptsetup-2.3.4-2.fc34.x86_64
systemd-248~rc2-1.fc34.x86_64

$ uname -r
5.10.16-200.fc33.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create an LV formatted with LUKS, or use an existing one.
2. Run "cryptsetup resize name-of-luks-volume --size target-size".

Our test's disk setup looks like this:

# lsblk
NAME                                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                             8:0    0  500M  0 disk
└─TEST-vol                                    253:0    0  300M  0 lvm
  └─luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 253:1    0  198M  0 crypt /run/foo
sr0                                            11:0    1  366K  0 rom
vda                                           252:0    0   13G  0 disk
└─vda1                                        252:1    0   13G  0 part  /

And I run:

cryptsetup resize /dev/mapper/luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 --size 610304

Actual results:
/run/foo gets unmounted automatically when the cryptsetup resize command finishes. The system journal shows that systemd itself unmounts /run/foo. I enabled systemd debug logging to get more information.

# systemctl status run-foo.mount
○ run-foo.mount - /run/foo
     Loaded: loaded (/etc/fstab; generated)
     Active: inactive (dead) since Wed 2021-03-03 13:22:33 UTC; 20min ago
      Where: /run/foo
       What: /dev/disk/by-uuid/43ca09ce-f60b-4e8a-8851-cfc9d74f73da
       Docs: man:fstab(5)
             man:systemd-fstab-generator(8)
        CPU: 6ms

Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Deactivated successfully.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Changed unmounting -> dead
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Job 1764 run-foo.mount/stop finished, result=done
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: Unmounted /run/foo.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Consumed 6ms CPU time.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency BindsTo=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device

Expected results:
/run/foo does not get unmounted.

Additional info:
The attached system journal contains the full output with debug logging enabled. The interesting part is around this line:
'run-foo.mount: About to execute /usr/bin/umount /run/foo -c'
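For anyone trying to reproduce this outside Cockpit's test suite, here is a rough sketch of the steps above as a script. It is hypothetical and not the attached reproducer: the disk (/dev/sda), the mapping name test-crypt, the key file and the fstab line are assumptions modeled on the lsblk output, and LUKS1 is used only to match the CRYPT-LUKS1 DM_UUID seen in the udev events. By default it just prints the commands; set DRY_RUN=0 and run it as root on a throwaway VM to execute them. Depending on the cryptsetup version and LUKS type, resize may ask for the passphrase, and the bug may not trigger on every run.

```shell
#!/usr/bin/env bash
# Hypothetical reproducer sketch (NOT the attached script).
# Assumptions: throwaway disk $DISK, mapping name $CRYPT_NAME, key file $KEY,
# and an fstab-backed mount like the reporter's run-foo.mount.
# Prints the commands unless DRY_RUN=0 (then it needs root).
set -euo pipefail

DISK=${DISK:-/dev/sda}
CRYPT_NAME=${CRYPT_NAME:-test-crypt}
KEY=${KEY:-/tmp/test-crypt.key}
MNT=/run/foo

run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  run vgcreate TEST "$DISK"
  run lvcreate -L 300M -n vol TEST
  run sh -c "printf secret > '$KEY'"
  # LUKS1, to match the CRYPT-LUKS1 DM_UUID in the udev events.
  run cryptsetup luksFormat --batch-mode --type luks1 --key-file "$KEY" /dev/TEST/vol
  run cryptsetup open --key-file "$KEY" /dev/TEST/vol "$CRYPT_NAME"
  # Start at 198 MiB (405504 sectors) like the lsblk output, then make the fs.
  run cryptsetup resize "$CRYPT_NAME" --size 405504
  run mkfs.ext4 -q "/dev/mapper/$CRYPT_NAME"
  run mkdir -p "$MNT"
  run sh -c "echo '/dev/mapper/$CRYPT_NAME $MNT ext4 defaults 0 0' >> /etc/fstab"
  run systemctl daemon-reload
  run systemctl start run-foo.mount
  # Grow the mapping while mounted; --size counts 512-byte sectors.
  run cryptsetup resize "$CRYPT_NAME" --size 610304
  run findmnt "$MNT" || echo "$MNT is no longer mounted"
}

reproduce
```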
Created attachment 1760395: Journal with debug enabled
cryptsetup does not touch a mounted filesystem. Let's see if the systemd team has an idea... (a dm device resize definitely generates a change event)
One more hint from the journal:

kernel: dm-1: detected capacity change from 610304 to 405504

This is the event that caused /run/foo to be unmounted. It is presumably wrong: we actually tried to grow the LUKS container, not shrink it, so the numbers should be the other way around.
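Converting the two numbers shows which size is which. Assuming the kernel message counts 512-byte sectors, the same unit as cryptsetup's --size, 610304 sectors is the 298 MiB resize target and 405504 sectors is the 198 MiB the crypt device had in lsblk. A quick check (the function name is made up):

```shell
#!/usr/bin/env bash
# Convert 512-byte sector counts (cryptsetup --size, and assumed for the
# kernel's "detected capacity change" message) to MiB.
set -euo pipefail

sectors_to_mib() {
  local sectors=$1
  echo $(( sectors * 512 / 1024 / 1024 ))
}

for s in 405504 610304; do
  printf '%s sectors = %s MiB\n' "$s" "$(sectors_to_mib "$s")"
done
# 405504 sectors = 198 MiB
# 610304 sectors = 298 MiB
```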
I did some more debugging. What causes systemd to do the unmount is that the SYSTEMD_READY udev property of dm-1 is temporarily set to 0. As far as systemd is concerned, dm-1 has completely disappeared at that point, and it cleans up accordingly.

KERNEL[57.422018] change /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
RESIZE=1
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2295
MAJOR=253
MINOR=1

UDEV [57.424746] change /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
RESIZE=1
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2295
USEC_INITIALIZED=29949362
DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
DM_UDEV_PRIMARY_SOURCE_FLAG=1
DM_UDEV_RULES_VSN=2
DM_NAME=luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_UUID=CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_SUSPENDED=1
DM_UDEV_DISABLE_OTHER_RULES_FLAG=1
SYSTEMD_READY=0
MAJOR=253
MINOR=1
DEVLINKS=/dev/mapper/luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-uuid-CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-name-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
TAGS=:systemd:
CURRENT_TAGS=:systemd:

KERNEL[57.427809] change /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
DM_COOKIE=6342948
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2296
MAJOR=253
MINOR=1

UDEV [57.505807] change /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
DM_COOKIE=6342948
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2296
USEC_INITIALIZED=29949362
DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
DM_UDEV_PRIMARY_SOURCE_FLAG=1
DM_ACTIVATION=1
DM_NAME=luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_UUID=CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_SUSPENDED=0
DM_UDEV_RULES_VSN=2
ID_FS_LABEL=FSYS
ID_FS_LABEL_ENC=FSYS
ID_FS_UUID=4962827d-782a-4b8a-88c3-80bb5ace0e51
ID_FS_UUID_ENC=4962827d-782a-4b8a-88c3-80bb5ace0e51
ID_FS_VERSION=1.0
ID_FS_TYPE=ext4
ID_FS_USAGE=filesystem
.ID_FS_TYPE_NEW=ext4
MAJOR=253
MINOR=1
DEVLINKS=/dev/disk/by-uuid/4962827d-782a-4b8a-88c3-80bb5ace0e51 /dev/mapper/luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-name-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-uuid-CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-label/FSYS
TAGS=:systemd:
CURRENT_TAGS=:systemd:
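To check whether a given system emits the suspended event at all, the properties that matter can be pulled out of a live `udevadm monitor --udev --property --subsystem-match=block` stream. This is a minimal sketch, assuming the monitor prints each event as a header line, then one KEY=VALUE per line, then a blank line, as in the dump above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Summarize udev (not kernel) events as
#   "SEQNUM DEVNAME DM_SUSPENDED SYSTEMD_READY"
# read from the output of:
#   udevadm monitor --udev --property --subsystem-match=block
# "-" means the property was not present on that event.
set -euo pipefail

summarize_udev_events() {
  awk '
    function flush() {
      if (in_udev) printf "%s %s %s %s\n", seq, dev, susp, ready
      in_udev = 0
    }
    /^UDEV *\[/   { flush(); in_udev = 1; seq = dev = susp = ready = "-"; next }
    /^KERNEL *\[/ { flush(); next }
    /^$/          { flush(); next }
    in_udev && /^SEQNUM=/        { seq = substr($0, 8) }
    in_udev && /^DEVNAME=/       { dev = substr($0, 9) }
    in_udev && /^DM_SUSPENDED=/  { susp = substr($0, 14) }
    in_udev && /^SYSTEMD_READY=/ { ready = substr($0, 15) }
    END { flush() }
  '
}
```

Usage: `udevadm monitor --udev --property --subsystem-match=block | summarize_udev_events`. A line such as "2295 /dev/dm-1 1 0" is the suspended event that carries SYSTEMD_READY=0. systemd treats a missing SYSTEMD_READY ("-") as ready.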
Looking further, I think the initial trigger here is DM_SUSPENDED=1. Something temporarily suspends dm-1 and sends an event about it. On Fedora 33 there is no such event, and maybe dm-1 doesn't even get suspended temporarily.

DM_SUSPENDED=1 leads to DM_UDEV_DISABLE_OTHER_RULES_FLAG=1, which (I think) skips running blkid. That leaves ID_FS_TYPE and ID_FS_USAGE empty, which in turn results in SYSTEMD_READY=0 for a DM_UUID that starts with CRYPT-.

In any case, suspending a device-mapper device does not warrant unmounting a filesystem on it, imo. This rule in 99-systemd.rules is probably triggered unintentionally here:

SUBSYSTEM=="block", ENV{DM_UUID}=="CRYPT-*", ENV{ID_PART_TABLE_TYPE}=="", ENV{ID_FS_USAGE}=="", \
  ENV{SYSTEMD_READY}="0"
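The chain can be written down as a toy model of the quoted rule. This is plain shell, not udev, and the function name is hypothetical. It shows why the same CRYPT-* device gets SYSTEMD_READY=0 on the suspended event (where, per the hypothesis above, blkid did not run) but not on the resumed one:

```shell
#!/usr/bin/env bash
# Toy model (not udev itself) of the quoted 99-systemd.rules line:
# a CRYPT-* DM device with no partition table and no detected filesystem
# gets SYSTEMD_READY=0; otherwise the rule does not touch it ("unset").
set -euo pipefail

crypt_rule_systemd_ready() {
  local dm_uuid=$1 part_table_type=$2 fs_usage=$3
  if [[ $dm_uuid == CRYPT-* && -z $part_table_type && -z $fs_usage ]]; then
    echo 0
  else
    echo unset
  fi
}

uuid=CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
# Suspended event: blkid skipped (hypothesis), so ID_FS_USAGE is empty.
crypt_rule_systemd_ready "$uuid" "" ""            # prints 0
# Resumed event: blkid ran and set ID_FS_USAGE=filesystem.
crypt_rule_systemd_ready "$uuid" "" filesystem    # prints unset
```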
(In reply to Marius Vollmer from comment #5)
> Looking further, I think the initial trigger here is DM_SUSPENDED=1.
> Something temporarily suspends dm-1 and sends an event about it. On Fedora
> 33, there is no event, and maybe dm-1 doesn't even get temporarily suspended.

Just a remark from the DM perspective: a dm-crypt device gets suspended during resize (and always has). It's the usual cycle for changing the active table of a DM device:

1) the new table (with the new size) gets loaded into the inactive slot,
2) the device gets suspended,
3) the device gets resumed with the new table in the active (effective) slot.

If SYSTEMD_READY=0 is set in reaction to a device suspend event... it's broken.
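The three steps map onto plain dmsetup commands. The sketch below shows the cycle for a device with a one-line table ("start length target args...", length in 512-byte sectors). It only illustrates the sequence: cryptsetup performs it through libdevmapper, and for dm-crypt the table also contains the volume key. The function names are hypothetical, and the commands are only printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Illustration of the load / suspend / resume table swap for a device with a
# single-line table. Prints the dmsetup commands unless DRY_RUN=0 (needs root).
set -euo pipefail

run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then echo "+ $*"; else "$@"; fi
}

# Rewrite the length field (2nd field) of a one-line table read from stdin.
with_new_length() {
  awk -v len="$1" '{ $2 = len; print }'
}

# $1 = DM device name, $2 = complete new table line.
swap_table() {
  run dmsetup load "$1" --table "$2"  # 1) load into the inactive slot
  run dmsetup suspend "$1"            # 2) suspend the device
  run dmsetup resume "$1"             # 3) resume; the new table becomes active
}
```

On a scratch dm-linear device (test-lin is a made-up name) this would be used as `swap_table test-lin "$(dmsetup table test-lin | with_new_length 610304)"`. Step 2 is the suspend whose uevent is under discussion here.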
(In reply to Ondrej Kozina from comment #6)
> If SYSTEMD_READY=0 is set in reaction to device suspend event... it's broken.

Yeah, I think this was always broken, but we got lucky on Fedora 33 and earlier in that no uevent was generated. Just running `dmsetup suspend ...` does not generate an event on any platform that I have tested, but arguably it should.
(In reply to Ondrej Kozina from comment #6)
> If SYSTEMD_READY=0 is set in reaction to device suspend event... it's broken.

I'd say removing ID_FS_TYPE in reaction to a device suspend is broken as well, and is the thing that needs fixing.
(In reply to Marius Vollmer from comment #7)
> Just running `dmsetup suspend ...`
> does not generate an event on any platform that I have tested, but arguably
> it should.

I take that back... Is there any point in running udev rules on a suspended device? Without being an expert here, what seems to make sense to me is to skip all rule processing for suspended devices (and leave their udev properties unchanged), and to run the rules when the device is resumed. This should remove a lot of complexity, no?

Anyway, you guys figure it out. :-)
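The suggestion can be expressed as a toy state machine (hypothetical, not udev code): while DM_SUSPENDED=1, keep the previous SYSTEMD_READY value, and only evaluate the rules again once the device is resumed. It assumes a CRYPT-* device without a partition table, so only ID_FS_USAGE decides the outcome:

```shell
#!/usr/bin/env bash
# Toy model of "skip rules while suspended, re-run on resume".
# Assumes a CRYPT-* device with no partition table.
set -euo pipefail

# $1 = previous SYSTEMD_READY ("unset" or 0), $2 = DM_SUSPENDED, $3 = ID_FS_USAGE
next_systemd_ready() {
  local prev=$1 suspended=$2 fs_usage=$3
  if [[ $suspended == 1 ]]; then
    echo "$prev"    # suspended: skip rule processing, keep the old value
  elif [[ -z $fs_usage ]]; then
    echo 0          # the CRYPT-* rule fires: no filesystem detected
  else
    echo unset      # the rule does not touch SYSTEMD_READY
  fi
}

state=unset
for event in "1:" "0:filesystem"; do    # suspend event, then resume event
  state=$(next_systemd_ready "$state" "${event%%:*}" "${event#*:}")
  echo "DM_SUSPENDED=${event%%:*} -> SYSTEMD_READY=$state"
done
# DM_SUSPENDED=1 -> SYSTEMD_READY=unset
# DM_SUSPENDED=0 -> SYSTEMD_READY=unset
```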
This turns out to be hard to reproduce. I made a reproducer script for bug 1985288, so let me attach it here as well. The bug is triggered only occasionally, but the script has a commented-out section that triggers it reliably (using dmsetup suspend instead of cryptsetup).
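The attached script is not reproduced here. As a rough, hypothetical stand-in for its dmsetup-based section, a loop like the following suspends and resumes an already-mounted mapping until the mount disappears. CRYPT_NAME, MNT and TRIES are assumptions, and whether a plain suspend/resume cycle triggers the bug on a given system is exactly what is uncertain in the discussion above. It prints the commands unless DRY_RUN=0 (then it needs root).

```shell
#!/usr/bin/env bash
# Hypothetical loop (NOT the attached reproducer): suspend/resume a mounted
# dm-crypt mapping and stop as soon as the mount point disappears.
set -euo pipefail

CRYPT_NAME=${CRYPT_NAME:-test-crypt}
MNT=${MNT:-/run/foo}
TRIES=${TRIES:-20}

run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then echo "+ $*"; else "$@"; fi
}

suspend_resume_loop() {
  local i
  for (( i = 1; i <= TRIES; i++ )); do
    run dmsetup suspend "$CRYPT_NAME"
    run dmsetup resume "$CRYPT_NAME"
    # Let udev finish processing the events before checking the mount.
    run udevadm settle
    if ! run findmnt "$MNT" >/dev/null; then
      echo "$MNT unmounted after $i suspend/resume cycles"
      return 0
    fi
  done
  echo "$MNT still mounted after $TRIES cycles"
}

suspend_resume_loop
```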
Created attachment 1859039: Reproducer
This still affects every Fedora version since F34. However, as F36 is around the corner and F34 will therefore reach EOL soon, moving this to F35.