Bug 1985288 - cryptsetup resize on a LUKS container on top of an LVM logical volume unmounts the filesystem contained on the LUKS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: systemd
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: Michal Sekletar
QA Contact: Frantisek Sumsal
URL:
Whiteboard: CockpitTest
Depends On: 1934567 2138081
Blocks:
 
Reported: 2021-07-23 10:12 UTC by Martin Pitt
Modified: 2023-05-09 10:32 UTC

Fixed In Version: systemd-252-7.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1934567
Environment:
Last Closed: 2023-05-09 08:21:58 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Reproducer (1.35 KB, text/plain)
2022-02-04 12:12 UTC, Marius Vollmer


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | redhat-plumbers systemd-rhel9 pull 145 | 0 | None | open | (#1985288) test: add coverage for #24177 | 2023-02-23 10:05:05 UTC
Github | systemd systemd pull 24177 | 0 | None | Merged | rules: do not "unready" suspended encrypted devices w/o superblock info | 2022-08-23 17:26:11 UTC
Github | systemd systemd pull 26547 | 0 | None | Merged | test: add coverage for #24177 | 2023-02-23 10:05:05 UTC
Red Hat Product Errata | RHBA-2023:2531 | 0 | None | None | None | 2023-05-09 08:22:25 UTC

Description Martin Pitt 2021-07-23 10:12:48 UTC
We see this on RHEL 9 all the time, so cloning the bug to track it there as well. It breaks quite a number of scenarios.

See bug 1934567 for some initial discussion; I trimmed the comments here to make this easier to read.

+++ This bug was initially created as a clone of Bug #1934567 +++

Description of problem:

Cockpit [1] tests related to encrypted volume resizing are failing on the Fedora 34 image newly introduced in CI.

Version-Release number of selected component (if applicable):

cryptsetup-2.3.4-2.fc34.x86_64
systemd-248~rc2-1.fc34.x86_64

$ uname -r
5.10.16-200.fc33.x86_64

How reproducible:
Always


Steps to Reproduce:

1. Create an LV formatted with LUKS, or use an existing one
2. Run "cryptsetup resize name-of-luks-volume --size target-size"

Our test's disk setup looks like this:

# lsblk
NAME                                          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                                             8:0    0  500M  0 disk  
└─TEST-vol                                    253:0    0  300M  0 lvm   
  └─luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 253:1    0  198M  0 crypt /run/foo
sr0                                            11:0    1  366K  0 rom   
vda                                           252:0    0   13G  0 disk  
└─vda1                                        252:1    0   13G  0 part  /

And I run:
cryptsetup resize /dev/mapper/luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004 --size 610304
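For readers checking the numbers: cryptsetup's --size option counts 512-byte sectors, so the value 610304 above corresponds to 298 MiB, just under the 300M logical volume shown in the lsblk output. A minimal sketch of the conversion follows; the helper names mib_to_sectors and sectors_to_mib are made up for illustration and are not part of cryptsetup.

```shell
# Hypothetical helpers: convert between MiB and the 512-byte sectors
# that "cryptsetup resize --size" expects.
mib_to_sectors() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

sectors_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

mib_to_sectors 298      # prints 610304
sectors_to_mib 610304   # prints 298
```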

Actual results:

/run/foo gets automatically unmounted when cryptsetup resize command finishes. Looking at the system journal it's apparent that systemd itself unmounts the /run/foo target. 

I have enabled systemd debug logs to get more information here.

# systemctl status run-foo.mount
○ run-foo.mount - /run/foo
     Loaded: loaded (/etc/fstab; generated)
     Active: inactive (dead) since Wed 2021-03-03 13:22:33 UTC; 20min ago
      Where: /run/foo
       What: /dev/disk/by-uuid/43ca09ce-f60b-4e8a-8851-cfc9d74f73da
       Docs: man:fstab(5)
             man:systemd-fstab-generator(8)
        CPU: 6ms

Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Deactivated successfully.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Changed unmounting -> dead
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Job 1764 run-foo.mount/stop finished, result=done
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: Unmounted /run/foo.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: Consumed 6ms CPU time.
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency BindsTo=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency After=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=blockdev@dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.target
Mar 03 13:22:33 m1.cockpit.lan systemd[1]: run-foo.mount: lost dependency References=dev-mapper-luks\x2dc00dc49b\x2d0b69\x2d41f2\x2d8eb4\x2d7e6c1d4c9004.device
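The unit names in the log above are the device path in systemd's escaped form: the leading "/" is dropped, each remaining "/" becomes "-", and each literal "-" becomes "\x2d". The real tool for this is systemd-escape (for example "systemd-escape -p --suffix=device PATH"). The function below is only a simplified sketch of that mapping for paths like the one in this bug; it is a hypothetical helper and does not handle the other characters that systemd-escape escapes.

```shell
# Simplified approximation of systemd path escaping (hypothetical helper).
# Only handles "/" and "-"; use systemd-escape for anything real.
escape_path() {
    p=${1#/}                          # drop the leading slash
    printf '%s\n' "$p" |
        sed -e 's/-/\\x2d/g' -e 's,/,-,g'   # "-" -> "\x2d", then "/" -> "-"
}

# Print the .device unit name for the LUKS mapping from this report.
printf '%s.device\n' \
    "$(escape_path /dev/mapper/luks-c00dc49b-0b69-41f2-8eb4-7e6c1d4c9004)"
```

The printed name should match the BindsTo= dependency that run-foo.mount loses in the journal excerpt.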


Expected results:

/run/foo will not get unmounted.

Additional info:

The attachments contain the whole system journal output with debug logs enabled.

The interesting part is what happens around the line: 'run-foo.mount: About to execute /usr/bin/umount /run/foo -c'

Comment 1 Martin Pitt 2021-07-23 10:28:38 UTC
Expanding the reproducer to translate the complex step 1 into CLI:

modprobe scsi_debug dev_size_mb=300
DEV=/dev/sda  # this is usually right for a VM with virtio disks, where the root device is /dev/vda
vgcreate testvg $DEV
lvcreate testvg -n vol -L 200m
cryptsetup luksFormat --pbkdf-memory 32768 /dev/mapper/testvg-vol
cryptsetup luksOpen /dev/mapper/testvg-vol c1
mkfs -t ext4 /dev/mapper/c1
mount /dev/mapper/c1 /mnt/

Comment 3 David Tardon 2022-02-03 13:44:03 UTC
Seems to work with systemd 249 -> closing.

Comment 4 Marius Vollmer 2022-02-04 12:11:03 UTC
(In reply to David Tardon from comment #3)
> Seems to work with systemd 249 -> closing.

Our tests are still occasionally failing with systemd-249-9.el9.x86_64.  I am afraid the reproducer we gave you was not correct, sorry about that.  The important bit is that systemd needs to take an interest in the mount point, which it does when the mount point is listed in /etc/fstab.  I attach a script for reproducing the bug. Just run it a couple of times and you should see "umount: /mnt: not mounted."

When I ran the script in our rhel-9-0 CI image, the bug happened 27 out of 100 times.
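A failure rate like the one above can be measured with a small loop. This is only a sketch: count_failures is a hypothetical helper, and it assumes the reproducer script exits with a non-zero status when the bug triggers, which the attached script may or may not do.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures N command [args...]
count_failures() (
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
)

# Harmless self-check with commands that always fail or always succeed.
count_failures 5 false   # prints 5
count_failures 3 true    # prints 0
```

With a reproducer saved as ./reproducer.sh (a placeholder name), "count_failures 100 ./reproducer.sh" would print the number of failing runs out of 100.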

The chain of events that causes the unmount is this:

 - "cryptsetup resize" temporarily suspends the device
 - something triggers a uevent about the device
 - udev runs while the device is still suspended:
   - DM_SUSPENDED=1
   => ID_FS_TYPE is removed
   => SYSTEMD_READY=0
 - systemd unmounts /mnt because the device is treated as if it had disappeared

This is timing-sensitive: udev needs to run while the device is suspended. Everything else has probably worked this way for many years already, I guess; the unmount would have happened back then too if udev had ever been triggered while a LUKS container was suspended.

Thus, if you need a reliable reproducer, replace "cryptsetup resize" in the BUG script with

    dmsetup suspend /dev/mapper/luks0
    udevadm trigger
    udevadm settle
    dmsetup resume /dev/mapper/luks0
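To see whether the chain of events described above has hit a device, the udev properties can be inspected while it is suspended. The sketch below classifies a property list read from stdin; classify_udev_props is a hypothetical helper, and the only properties it looks at are the ones named in this comment (DM_SUSPENDED and SYSTEMD_READY).

```shell
# Hypothetical helper: classify a device from udev properties
# (KEY=VALUE lines on stdin), based on the properties named in this bug.
classify_udev_props() (
    props=$(cat)
    has() { printf '%s\n' "$props" | grep -qxF -- "$1"; }
    if has 'SYSTEMD_READY=0'; then
        if has 'DM_SUSPENDED=1'; then
            echo "unready-while-suspended"   # the situation described in this bug
        else
            echo "unready"
        fi
    else
        echo "ready"
    fi
)

# Intended use on a live system (needs the device to exist):
#   udevadm info --query=property --name=/dev/mapper/luks0 | classify_udev_props
```

Running the commented udevadm line between the "udevadm settle" and "dmsetup resume" steps should, on an affected system, report "unready-while-suspended".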

Comment 6 Marius Vollmer 2022-02-04 12:12:20 UTC
Created attachment 1859037
Reproducer

Comment 8 Martin Pitt 2022-07-10 10:21:44 UTC
We still see this on all Fedora versions, RHEL 9, Debian and Ubuntu releases. This regression isn't fixed anywhere.

Comment 9 Michal Sekletar 2022-08-02 12:03:32 UTC
I've posted the PR with fix upstream.

https://github.com/systemd/systemd/pull/24177

Comment 10 Marius Vollmer 2022-08-08 07:22:07 UTC
(In reply to Michal Sekletar from comment #9)
> I've posted the PR with fix upstream.
> 
> https://github.com/systemd/systemd/pull/24177

Awesome, thanks a lot!

Comment 11 Jan Macku 2023-01-24 12:31:50 UTC
This issue should be fixed via rebase to systemd v252.
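A rough way to check whether a machine already runs systemd v252 or later is to parse the first line of "systemctl --version". The sketch below does only that; systemd_major is a hypothetical helper, the sample version string is illustrative, and the major version is just a heuristic, since the authoritative marker for RHEL 9 is the package build named in "Fixed In Version" (systemd-252-7.el9).

```shell
# Hypothetical helper: extract the major version from "systemctl --version"
# output supplied on stdin (first line looks like "systemd 252 (...)").
systemd_major() {
    awk 'NR == 1 { print $2 }'
}

# Illustrative input; on a real system use: systemctl --version | systemd_major
major=$(printf 'systemd 252 (252-7.el9)\n' | systemd_major)
if [ "$major" -ge 252 ]; then
    echo "systemd $major: v252 or later"
else
    echo "systemd $major: older than v252"
fi
```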

Comment 14 Zdenek Kabelac 2023-02-14 13:26:13 UTC
Just for completeness: the reincarnation of this bug is handled via bug #2158628.

The optimal fix is to drop the duplicate CHANGE event that the kernel posts in 'suspend'.

Comment 16 Plumber Bot 2023-02-27 09:26:50 UTC
fix merged to github main branch -> https://github.com/redhat-plumbers/systemd-rhel9/pull/145

Comment 19 errata-xmlrpc 2023-05-09 08:21:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2531

