Bug 1934567 - cryptsetup resize on a LUKS container on top of an LVM logical volume unmounts the filesystem contained on the LUKS
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 35
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: CockpitTest
Depends On:
Blocks: 1985288
Reported: 2021-03-03 13:53 UTC by Katerina Koukiou
Modified: 2022-08-24 09:25 UTC

Fixed In Version:
Clone Of:
: 1985288 (view as bug list)
Environment:
Last Closed: 2022-08-23 17:27:00 UTC
Type: Bug
Embargoed:


Attachments
Journal with debug enabled (5.56 MB, text/plain)
2021-03-03 13:54 UTC, Katerina Koukiou
Reproducer (1.35 KB, text/plain)
2022-02-04 12:20 UTC, Marius Vollmer


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | systemd systemd pull 24177 | 0 | None | Merged | rules: do not "unready" suspended encrypted devices w/o superblock info | 2022-08-23 17:27:00 UTC

Description Katerina Koukiou 2021-03-03 13:53:45 UTC
Description of problem:

Cockpit [1] tests related to encrypted volume resizing are failing on the Fedora 34 image that was newly introduced in CI.

Version-Release number of selected component (if applicable):

cryptsetup-2.3.4-2.fc34.x86_64
systemd-248~rc2-1.fc34.x86_64

$ uname -r
5.10.16-200.fc33.x86_64

How reproducible:
Always


Steps to Reproduce:

1. Create an LV formatted with LUKS, or use an existing one
2. Run "cryptsetup resize name-of-luks-volume --size target-size"

Our test's disk setup looks like this:

# lsblk
NAME                                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                             8:0    0  500M  0 disk  
└─TEST-vol                                    253:0    0  300M  0 lvm   
  └─luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 253:1    0  198M  0 crypt /run/foo
sr0                                            11:0    1  366K  0 rom   
vda                                           252:0    0   13G  0 disk  
└─vda1                                        252:1    0   13G  0 part  /

And I run:
cryptsetup resize /dev/mapper/luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 --size 610304

Actual results:

/run/foo gets unmounted automatically when the cryptsetup resize command finishes. The system journal shows that systemd itself unmounts /run/foo.

I enabled systemd debug logging to get more information.

# systemctl status run-foo.mount
○ run-foo.mount - /run/foo
     Loaded: loaded (/etc/fstab; generated)
     Active: inactive (dead) since Wed 2021-03-03 13:22:33 UTC; 20min ago
      Where: /run/foo
       What: /dev/disk/by-uuid/43ca09ce-f60b-4e8a-8851-cfc9d74f73da
       Docs: man:fstab(5)
             man:systemd-fstab-generator(8)
        CPU: 6ms

Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Deactivated successfully.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Changed unmounting -> dead
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Job 1764 run-foo.mount/stop finished, result=done
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: Unmounted /run/foo.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Consumed 6ms CPU time.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency BindsTo=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device


Expected results:

/run/foo will not get unmounted.

Additional info:

The full system journal output with debug logging enabled is attached.

The interesting part is around this line: 'run-foo.mount: About to execute /usr/bin/umount /run/foo -c'

Comment 1 Katerina Koukiou 2021-03-03 13:54:43 UTC
Created attachment 1760395 [details]
Journal with debug enabled

Comment 2 Ondrej Kozina 2021-03-03 14:05:26 UTC
cryptsetup does not touch mounted fs. Let's see if systemd team has an idea... (dm device resize definitely generates change event)

Comment 3 Katerina Koukiou 2021-03-03 14:48:09 UTC
One more hint to the journal:

kernel: dm-1: detected capacity change from 610304 to 405504


This is the event that caused /run/foo to be unmounted.
This is presumably wrong: we actually tried to grow the LUKS container, not shrink it. The numbers should be the other way around.
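
For reference, the two figures in that kernel message can be converted to sizes. This is only a small sketch, assuming both numbers count 512-byte sectors (the unit that `cryptsetup resize --size` takes); the helper name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: both figures are counts of 512-byte sectors


def sectors_to_mib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to MiB."""
    return sectors * SECTOR_BYTES / (1024 * 1024)


before = sectors_to_mib(610304)  # reported as the old capacity
after = sectors_to_mib(405504)   # reported as the new capacity
print(f"{before:.0f} MiB -> {after:.0f} MiB")
```

Under that assumption, 405504 sectors corresponds to the 198M shown by lsblk, and 610304 is the value passed to --size.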

Comment 4 Marius Vollmer 2021-03-04 10:34:27 UTC
I did some more debugging, and the thing that causes systemd to do the unmount is that the SYSTEMD_READY udev property for dm-1 is temporarily set to 0.  As far as systemd is concerned, dm-1 has completely disappeared at that point and it cleans up accordingly.

KERNEL[57.422018] change   /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
RESIZE=1
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2295
MAJOR=253
MINOR=1

UDEV  [57.424746] change   /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
RESIZE=1
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2295
USEC_INITIALIZED=29949362
DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
DM_UDEV_PRIMARY_SOURCE_FLAG=1
DM_UDEV_RULES_VSN=2
DM_NAME=luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_UUID=CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_SUSPENDED=1
DM_UDEV_DISABLE_OTHER_RULES_FLAG=1
SYSTEMD_READY=0
MAJOR=253
MINOR=1
DEVLINKS=/dev/mapper/luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-uuid-CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-name-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
TAGS=:systemd:
CURRENT_TAGS=:systemd:

KERNEL[57.427809] change   /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
DM_COOKIE=6342948
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2296
MAJOR=253
MINOR=1

UDEV  [57.505807] change   /devices/virtual/block/dm-1 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-1
SUBSYSTEM=block
DM_COOKIE=6342948
DEVNAME=/dev/dm-1
DEVTYPE=disk
SEQNUM=2296
USEC_INITIALIZED=29949362
DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
DM_UDEV_PRIMARY_SOURCE_FLAG=1
DM_ACTIVATION=1
DM_NAME=luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_UUID=CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9
DM_SUSPENDED=0
DM_UDEV_RULES_VSN=2
ID_FS_LABEL=FSYS
ID_FS_LABEL_ENC=FSYS
ID_FS_UUID=4962827d-782a-4b8a-88c3-80bb5ace0e51
ID_FS_UUID_ENC=4962827d-782a-4b8a-88c3-80bb5ace0e51
ID_FS_VERSION=1.0
ID_FS_TYPE=ext4
ID_FS_USAGE=filesystem
.ID_FS_TYPE_NEW=ext4
MAJOR=253
MINOR=1
DEVLINKS=/dev/disk/by-uuid/4962827d-782a-4b8a-88c3-80bb5ace0e51 /dev/mapper/luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-name-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-id/dm-uuid-CRYPT-LUKS1-c9037f7968b54814b7b98cb7f6f916d9-luks-c9037f79-68b5-4814-b7b9-8cb7f6f916d9 /dev/disk/by-label/FSYS
TAGS=:systemd:
CURRENT_TAGS=:systemd:
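
To spot this transition in a longer capture, something like the following could be used. It is a rough sketch that assumes the text has the format shown above (as printed by `udevadm monitor --kernel --udev --property`); the function names and the shortened SAMPLE text are invented for illustration.

```python
import re

# Header lines look like: "UDEV  [57.424746] change   /devices/virtual/block/dm-1 (block)"
HEADER = re.compile(r"^(KERNEL|UDEV)\s*\[(\d+\.\d+)\]\s+(\S+)\s+(\S+)\s+\((\S+)\)$")


def parse_events(text):
    """Split monitor output into events with source, time, action, devpath and props."""
    events = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            source, stamp, action, devpath, _subsystem = match.groups()
            current = {"source": source, "time": float(stamp), "action": action,
                       "devpath": devpath, "props": {}}
            events.append(current)
        elif current is not None and "=" in line:
            # Property lines are KEY=VALUE; split on the first "=" only.
            key, _, value = line.partition("=")
            current["props"][key] = value
    return events


def unready_events(events):
    """Return processed (UDEV) events that mark the device as not ready."""
    return [e for e in events
            if e["source"] == "UDEV" and e["props"].get("SYSTEMD_READY") == "0"]


# Shortened excerpt of the two UDEV events, for demonstration only.
SAMPLE = """\
UDEV  [57.424746] change   /devices/virtual/block/dm-1 (block)
ACTION=change
DM_SUSPENDED=1
SYSTEMD_READY=0

UDEV  [57.505807] change   /devices/virtual/block/dm-1 (block)
ACTION=change
DM_SUSPENDED=0
ID_FS_USAGE=filesystem
"""

for event in unready_events(parse_events(SAMPLE)):
    print(event["time"], event["devpath"],
          "DM_SUSPENDED=" + event["props"].get("DM_SUSPENDED", ""))
```

Only the first of the two UDEV events is flagged, which matches the dump: the device is "unready" exactly while DM_SUSPENDED=1.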

Comment 5 Marius Vollmer 2021-03-04 11:49:35 UTC
Looking further, I think the initial trigger here is DM_SUSPENDED=1.  Something temporarily suspends dm-1 and sends an event about it.  On Fedora 33, there is no event, and maybe dm-1 doesn't even get temporarily suspended.

From DM_SUSPENDED=1 we get DM_UDEV_DISABLE_OTHER_RULES_FLAG=1 which (I think) skips running blkid, which leaves ID_FS_TYPE empty, which results in SYSTEMD_READY=0 for a DM_UUID that starts with CRYPT-.

In any case, suspending a device mapper device does not warrant unmounting a filesystem on it, imo.  This rule in 99-systemd.rules is probably triggered unintentionally here:

    SUBSYSTEM=="block", ENV{DM_UUID}=="CRYPT-*", ENV{ID_PART_TABLE_TYPE}=="", ENV{ID_FS_USAGE}=="", \
    ENV{SYSTEMD_READY}="0"
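
The suspected chain can be written down as a toy model. This is not how udev evaluates rules internally; it only mirrors the steps guessed at above (suspended, so blkid is skipped, so ID_FS_USAGE is empty, so the CRYPT-* rule fires). The function name, the `blkid_result` parameter and the example DM_UUID are made up.

```python
def evaluate_crypt_ready(props, blkid_result):
    """Toy model of the property chain; NOT the real udev rule engine.

    props: properties known before probing (e.g. DM_UUID, DM_SUSPENDED).
    blkid_result: properties blkid would add if it were allowed to run.
    """
    result = dict(props)
    if result.get("DM_SUSPENDED") == "1":
        # Suspended device: other rules are disabled, so blkid is skipped.
        result["DM_UDEV_DISABLE_OTHER_RULES_FLAG"] = "1"
    else:
        result.update(blkid_result)

    # Model of the 99-systemd.rules line quoted above.
    if (result.get("DM_UUID", "").startswith("CRYPT-")
            and result.get("ID_PART_TABLE_TYPE", "") == ""
            and result.get("ID_FS_USAGE", "") == ""):
        result["SYSTEMD_READY"] = "0"
    return result


fs = {"ID_FS_TYPE": "ext4", "ID_FS_USAGE": "filesystem"}
base = {"DM_UUID": "CRYPT-LUKS1-example"}  # hypothetical UUID
suspended = evaluate_crypt_ready({**base, "DM_SUSPENDED": "1"}, fs)
resumed = evaluate_crypt_ready({**base, "DM_SUSPENDED": "0"}, fs)
print(suspended.get("SYSTEMD_READY"), resumed.get("SYSTEMD_READY"))
```

In this model the same device with the same filesystem is "unready" only while suspended, which is the behaviour seen in the udev dump from comment 4.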

Comment 6 Ondrej Kozina 2021-03-04 14:29:49 UTC
(In reply to Marius Vollmer from comment #5)
> Looking further, I think the initial trigger here is DM_SUSPENDED=1. 
> Something temporarily suspends dm-1 and sends an event about it.  On Fedora
> 33, there is no event, and maybe dm-1 doesn't even get temporarily suspended.

Just a remark from DM perspective:

A dm-crypt device gets suspended during resize (and always has been). It's the usual cycle for changing the table of an active DM device.

1) new table gets loaded into inactive slot (with new size)
2) device gets suspended
3) device gets resumed with new table in active (effective) slot.

If SYSTEMD_READY=0 is set in reaction to device suspend event... it's broken.
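
The three steps above can be sketched as a tiny state machine. This is a toy model only, roughly corresponding to `dmsetup load`, `dmsetup suspend` and `dmsetup resume`; the class name and the table representation are invented, and the sector counts are just the ones from this report.

```python
class DmDevice:
    """Toy model of a device-mapper device with an active and an inactive table slot."""

    def __init__(self, table):
        self.active = table
        self.inactive = None
        self.suspended = False

    def load(self, table):
        # Step 1: the new table goes into the inactive slot; the active one is untouched.
        self.inactive = table

    def suspend(self):
        # Step 2: the device is suspended; it still exists and keeps its active table.
        self.suspended = True

    def resume(self):
        # Step 3: a pending table, if any, becomes the active one on resume.
        if self.inactive is not None:
            self.active, self.inactive = self.inactive, None
        self.suspended = False


dev = DmDevice({"sectors": 405504})
dev.load({"sectors": 610304})
dev.suspend()
mid_resize = (dev.suspended, dev.active["sectors"])
dev.resume()
print(mid_resize, dev.suspended, dev.active["sectors"])
```

The point of the model is that the suspended state is a short, expected phase of every table change, and the device never goes away during it.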

Comment 7 Marius Vollmer 2021-03-05 07:13:21 UTC
(In reply to Ondrej Kozina from comment #6)

> If SYSTEMD_READY=0 is set in reaction to device suspend event... it's broken.

Yeah, I think this was always broken, but we got lucky on Fedora 33 and earlier that no uevent was generated.  Just running `dmsetup suspend ...` does not generate an event on any platform that I have tested, but arguably it should.

Comment 8 Marius Vollmer 2021-03-05 08:03:31 UTC
(In reply to Ondrej Kozina from comment #6)

> If SYSTEMD_READY=0 is set in reaction to device suspend event... it's broken.

I'd say removing ID_FS_TYPE in reaction to device suspend is broken as well, and is the thing that needs fixing.

Comment 9 Marius Vollmer 2021-03-05 08:07:49 UTC
(In reply to Marius Vollmer from comment #7)
> Just running `dmsetup suspend ...`
> does not generate an event on any platform that I have tested, but arguably
> it should.

I take that back... Is there any point in running udev rules on a suspended device?  Without being an expert here, what seems to make sense to me is to skip all rule processing for suspended devices (and leave their udev properties unchanged), and run the rules when the device is resumed.  This should remove a lot of complexity, no?

Anyway, you guys figure it out. :-)
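
A sketch of what that idea would mean, purely as an illustration and not a description of what udev or the eventual fix does: while the device is suspended, keep the properties from the last processed event and only run the rules again on resume. `process_uevent` and `fake_rules` are hypothetical names.

```python
def process_uevent(previous, incoming, run_rules):
    """Skip rule processing for suspended devices and keep the old properties.

    previous: properties stored from the last processed event.
    incoming: properties supplied with the new event.
    run_rules: callable that derives the full property set (hypothetical).
    """
    if incoming.get("DM_SUSPENDED") == "1":
        # Suspended: leave the stored properties unchanged.
        return dict(previous)
    return run_rules(incoming)


def fake_rules(props):
    # Hypothetical stand-in for real rule processing: pretend blkid found ext4.
    return {**props, "ID_FS_TYPE": "ext4", "ID_FS_USAGE": "filesystem"}


ident = {"DM_UUID": "CRYPT-LUKS1-example"}  # hypothetical UUID
stored = fake_rules({**ident, "DM_SUSPENDED": "0"})
during = process_uevent(stored, {**ident, "DM_SUSPENDED": "1"}, fake_rules)
after = process_uevent(during, {**ident, "DM_SUSPENDED": "0"}, fake_rules)
print(during == stored, after["ID_FS_USAGE"])
```

With this behaviour ID_FS_USAGE never goes empty during the suspend, so the CRYPT-* rule from comment 5 would have nothing to match on.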

Comment 10 Marius Vollmer 2022-02-04 12:19:49 UTC
This turns out to be hard to reproduce.  I made a reproducer script for bug 1985288, so let me attach it here as well.  The bug is triggered only occasionally, but the script has a commented-out section that triggers it reliably (using dmsetup suspend instead of cryptsetup).

Comment 11 Marius Vollmer 2022-02-04 12:20:31 UTC
Created attachment 1859039 [details]
Reproducer

Comment 12 Martin Pitt 2022-03-25 10:03:10 UTC
This still affects every Fedora version since F34. However, as F36 is around the corner and F34 will thus reach EOL at some point, I am moving this to F35.

