Bug 2158628
| Field | Value |
|---|---|
| Summary | lvextend of a volume backed by a thin pool sometimes triggers unmount |
| Product | Red Hat Enterprise Linux 9 |
| Reporter | Steve Baker <sbaker> |
| Component | lvm2 |
| lvm2 sub component | Udev |
| Assignee | Peter Rajnoha <prajnoha> |
| QA Contact | cluster-qe <cluster-qe> |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | urgent |
| CC | afazekas, agk, alfrgarc, apevec, bstinson, cmarthal, dhughes, dlehman, dtardon, heinzm, hjensas, jbrassow, jgrosso, jwboyer, kthakre, mcsontos, mgarciac, mpatocka, msekleta, msnitzer, mvollmer, prajnoha, pvlasin, rdiazcam, spower, stchen, systemd-maint-list, yuwatana, zkabelac |
| Version | CentOS Stream |
| Keywords | Triaged |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Unspecified |
| Fixed In Version | lvm2-2.03.17-6.el9 |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Last Closed | 2023-05-09 08:23:51 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
Description
Steve Baker
2023-01-05 21:41:33 UTC
Setting to high Severity since this is blocking upstream OpenStack CI and the only workaround is to pin to systemd-250.

Created attachment 1936098
journal of successful growvols run

Created attachment 1936099
journal of failed growvols run, volume gets unmounted

Please enable systemd debug logging, reproduce and attach the log. Thanks!

Created attachment 1938491
systemd debug of successful boot

Created attachment 1938492
systemd debug of /home (dm-9) being unmounted after lvextend

These are the commands executed by `/usr/local/sbin/growvols -yv /=8GB /tmp=1GB /var/log=10GB /var/log/audit=2GB /home=1GB /var=100%` when the disk size is extended to 50G:

sgdisk --new=5:11487232:104857566 --change-name=5:growvols /dev/vda
partprobe
pvcreate /dev/vda5
vgextend vg /dev/vda5
lvextend --poolmetadatasize +1073741824B /dev/mapper/vg-lv_thinpool /dev/vda5
lvextend -L+46728740864B /dev/mapper/vg-lv_thinpool /dev/vda5
lvextend --size +7998537728B /dev/mapper/vg-lv_root
lvextend --size +998244352B /dev/mapper/vg-lv_tmp
lvextend --size +9999220736B /dev/mapper/vg-lv_log
lvextend --size +1996488704B /dev/mapper/vg-lv_audit
#sleep 60
lvextend --size +998244352B /dev/mapper/vg-lv_home
lvextend --size +24738004992B /dev/mapper/vg-lv_var
xfs_growfs /dev/mapper/vg-lv_root
xfs_growfs /dev/mapper/vg-lv_tmp
xfs_growfs /dev/mapper/vg-lv_log
xfs_growfs /dev/mapper/vg-lv_audit
xfs_growfs /dev/mapper/vg-lv_home
xfs_growfs /dev/mapper/vg-lv_var

The "sleep 60" seems to be able to work around the issue.

If I reboot before

lvextend --size +7998537728B /dev/mapper/vg-lv_root
lvextend --size +998244352B /dev/mapper/vg-lv_tmp
lvextend --size +9999220736B /dev/mapper/vg-lv_log
lvextend --size +1996488704B /dev/mapper/vg-lv_audit
lvextend --size +998244352B /dev/mapper/vg-lv_home
lvextend --size +24738004992B /dev/mapper/vg-lv_var

the issue can still happen, so multiple lvextend operations "in flight" are likely required to trigger it. Repeating the above 6 lvextend commands, even with minimal sizes (+512B becomes 4M), can trigger the issue within 1 to 11 loops.

If I downgrade systemd to 250-12.el9_1.2 before the reboot, the issue disappears. v251 does not seem to be affected, so bisecting the change might be possible.
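
The reproduction loop described above (repeat the six lvextend calls with a minimal size, then see whether a volume got unmounted) could be scripted roughly as follows. This is only a sketch: the helper names `lvextend_commands`, `missing_mounts` and `run_loop` are hypothetical, the volume group and LV names are the ones quoted in this report, and the driver assumes root privileges on a machine with that layout. The unmount is performed asynchronously by systemd, so a single check right after `udevadm settle` may miss a late unmount.

```python
import subprocess

# LV names as quoted in this report; adjust for other layouts.
LVS = ("lv_root", "lv_tmp", "lv_log", "lv_audit", "lv_home", "lv_var")


def lvextend_commands(vg="vg", lvs=LVS, step="+512B"):
    """Build one lvextend argv per LV (nothing is executed here)."""
    return [["lvextend", "--size", step, f"/dev/mapper/{vg}-{lv}"] for lv in lvs]


def missing_mounts(mounts_text, expected):
    """Return the expected mount points that are absent from
    /proc/mounts-style text (second whitespace-separated field)."""
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])
    return [m for m in expected if m not in mounted]


def run_loop(expected, max_loops=11):
    """Hypothetical driver: needs root and the LVs from this report.
    Returns (loop number, lost mount points) or None if nothing was lost."""
    for i in range(1, max_loops + 1):
        for argv in lvextend_commands():
            subprocess.run(argv, check=True)
        # Wait for the udev event queue to drain before checking mounts.
        subprocess.run(["udevadm", "settle"], check=True)
        with open("/proc/mounts") as f:
            gone = missing_mounts(f.read(), expected)
        if gone:
            return i, gone
    return None
```

On an affected system, something like `run_loop(["/", "/tmp", "/var/log", "/var/log/audit", "/home", "/var"])` would be expected to report a lost mount point within a few iterations, though the race is timing dependent and no particular loop count is guaranteed.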

bisect ended up:
"""
4228306b9d50df9a804859d00e84588a9fc4c4b9 is the first bad commit
commit 4228306b9d50df9a804859d00e84588a9fc4c4b9
Author: Yu Watanabe <watanabe.yu+github>
Date: Thu Sep 1 01:17:27 2022 +0900

core/device: always update existing devlink or alias units on uevent

Previously, existing device units for devlinks or aliases were not removed unless the main device unit is removed. This makes all existing device units for devlinks and aliases are checked if they are still required, and remove if not necessary anymore.

Fixes #24518.

src/core/device.c | 315 +++++++++++++++++++++++++-----------------------------
1 file changed, 146 insertions(+), 169 deletions(-)
"""
Not verified yet, I hope the random noise did not lead to very bad results.

Looks like increasing the vCPU count in the VM makes the issue less reproducible.

Yu, do you think LVM volumes backed by a thin provisioned pool change some timing assumptions in commit 4228306b9d50df9a804859d00e84588a9fc4c4b9, causing this issue?

Thank you for bisecting the commits. The issue is caused by 13-dm-disk.rules not enabling the device node symlinks (SYMLINK+=) based on the filesystem label (and also on the UUID). Upstream lvm2 has a fix for a similar issue, and the fix is included in v2.03.15.
https://github.com/lvmteam/lvm2/commit/e10f67e91728f1e576803df884049ecbd92874d0
Note that no .rules file provided by the LVM2 package contains IMPORT{db}="ID_FS_LABEL_ENC"; the import is done by 11-dm-parts.rules, which is provided by kpartx.rpm (at least on Fedora 37, I am not familiar with RHEL, sorry). In summary, please try LVM2-2.03.15 or newer with kpartx.rpm.

CS9 has had lvm2-2.03.16-1.el9 since June 2022 and kpartx-0.8.7 since forever, so it must be something else?

Ah, 11-dm-parts.rules from kpartx.rpm does not work for LVM devices, which satisfy DM_UUID=="LVM-*". So, kpartx.rpm is not relevant here, sorry.
And https://github.com/lvmteam/lvm2/commit/e10f67e91728f1e576803df884049ecbd92874d0 is not enough to fix the issue. Could you test the following?
=============
diff --git a/udev/13-dm-disk.rules.in b/udev/13-dm-disk.rules.in
index 5cc08121e..dca00bc01 100644
--- a/udev/13-dm-disk.rules.in
+++ b/udev/13-dm-disk.rules.in
@@ -17,12 +17,22 @@ ENV{DM_UDEV_DISABLE_DISK_RULES_FLAG}=="1", GOTO="dm_end"
 SYMLINK+="disk/by-id/dm-name-$env{DM_NAME}"
 ENV{DM_UUID}=="?*", SYMLINK+="disk/by-id/dm-uuid-$env{DM_UUID}"
 
-ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link"
-ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_link"
+ENV{DM_SUSPENDED}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_import"
+ENV{DM_NOSCAN}=="1", ENV{DM_UDEV_PRIMARY_SOURCE_FLAG}=="1", GOTO="dm_import"
 ENV{DM_SUSPENDED}=="1", GOTO="dm_end"
 ENV{DM_NOSCAN}=="1", GOTO="dm_watch"
 
 (BLKID_RULE)
+GOTO="dm_link"
+
+LABEL="dm_import"
+IMPORT{db}="ID_FS_USAGE"
+IMPORT{db}="ID_FS_UUID_ENC"
+IMPORT{db}="ID_FS_LABEL_ENC"
+IMPORT{db}="ID_PART_ENTRY_NAME"
+IMPORT{db}="ID_PART_ENTRY_UUID"
+IMPORT{db}="ID_PART_ENTRY_SCHEME"
+IMPORT{db}="ID_PART_GPT_AUTO_ROOT"
 
 LABEL="dm_link"
 ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", OPTIONS="link_priority=-100"

The above patch is submitted as https://github.com/lvmteam/lvm2/pull/105

(In reply to Yu Watanabe from comment #20)
> The above patch is submitted as https://github.com/lvmteam/lvm2/pull/105

I've tested this change on my reproducer image for 17 runs with zero failures. We could install a modified /usr/lib/udev/rules.d/13-dm-disk.rules on the image until this fix is packaged in a device-mapper-9 rpm. This will give some more test coverage and unblock our release pipeline.

The issue here is that there are two uevents generated with a race.
From my test, I can see (dm-6 is the top-level thin LV):

KERNEL[1632.888585] change /devices/virtual/block/dm-6 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-6
SUBSYSTEM=block
RESIZE=1
DEVNAME=/dev/dm-6
DEVTYPE=disk
DISKSEQ=26
SEQNUM=2942
MAJOR=253
MINOR=6

KERNEL[1632.889364] change /devices/virtual/block/dm-6 (block)
ACTION=change
DEVPATH=/devices/virtual/block/dm-6
SUBSYSTEM=block
DM_COOKIE=6333296
DEVNAME=/dev/dm-6
DEVTYPE=disk
DISKSEQ=26
SEQNUM=2943
MAJOR=253
MINOR=6

The first uevent is generated to notify about the change in size (it contains RESIZE="1"). This is relatively new in the kernel:
https://github.com/torvalds/linux/commit/e598a72faeb543599bdf0d930df3a71906404e6f

The second uevent notifies about the DM device being resumed, that is, flipping the DM device state from "suspended" to "active".

While the first uevent is being processed by udev, the DM device may not have been resumed yet, so we may see the device as suspended. This causes the blkid scan to be skipped, so the appropriate ID_FS_* variables are not set. As described earlier in this thread (or in the GitHub PR from comment #20), this causes systemd to unmount the mount point for which we lose the appropriate identification.

To fix this, we either need to reorder the uevents so that the "resize" one goes after the "DM resume" one, which would probably be the more correct solution. Or, alternatively, we make the udev rules account for this situation (which is basically what is proposed in the PR from comment #20).

Since it's easier to change userspace and we need to fix this quickly, we will change the udev rules.

Applied patch from comment #20:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=94f77a4d8d9737fca05fb4e451678ec440c68670

Created attachment 1942712
A patch for the upstream kernel
Hi
Here I'm submitting the upstream kernel patch for this bug. Please test if it helps.
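
The race between the "resize" and "resume" uevents discussed above, and the effect of the 13-dm-disk.rules change from comment #20, can be illustrated with a small toy model. This is not how udev is implemented; it is a sketch under stated assumptions. The function name `process_uevent` is hypothetical, the `env` argument stands for the event environment as seen by 13-dm-disk.rules (that is, after earlier rules have already set DM_SUSPENDED and DM_UDEV_PRIMARY_SOURCE_FLAG), and `db` stands for the properties stored by the previous event. The only udev behaviour relied on is that properties not set or imported while processing an event are not carried over.

```python
# Keys imported at LABEL="dm_import" in the patch from comment #20.
IMPORTED_KEYS = (
    "ID_FS_USAGE",
    "ID_FS_UUID_ENC",
    "ID_FS_LABEL_ENC",
    "ID_PART_ENTRY_NAME",
    "ID_PART_ENTRY_UUID",
    "ID_PART_ENTRY_SCHEME",
    "ID_PART_GPT_AUTO_ROOT",
)


def process_uevent(env, db, blkid_props, patched):
    """Toy model: return the properties kept for the device after one uevent.

    env         -- event environment as seen by 13-dm-disk.rules (assumption)
    db          -- properties stored by the previous event
    blkid_props -- what a blkid scan would report if it were run
    patched     -- True to model the rules with the dm_import label
    """
    props = dict(env)
    skip_scan = env.get("DM_SUSPENDED") == "1" or env.get("DM_NOSCAN") == "1"
    primary = env.get("DM_UDEV_PRIMARY_SOURCE_FLAG") == "1"

    if skip_scan and primary:
        if patched:
            # New behaviour: restore the identification from the udev db.
            props.update({k: db[k] for k in IMPORTED_KEYS if k in db})
        # Old behaviour: jump straight to dm_link, nothing restored.
        return props
    if skip_scan:
        # dm_end / dm_watch: no scan and no import.
        return props

    # Device is active: the blkid scan runs and sets ID_FS_* afresh.
    props.update(blkid_props)
    return props
```

With a suspended device and a stored ID_FS_UUID_ENC, the unpatched model drops the identification while the patched model keeps the imported subset (ID_FS_USAGE and ID_FS_UUID_ENC, but not, for example, ID_FS_TYPE), which matches the `udevadm info` output shown in the verification comments of this report.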

For the purpose of solving the issue described in this BZ report, the fix from comment #19 (comment #20) is enough. That is a patch for 13-dm-disk.rules, which belongs to the lvm2 package. This fix keeps the ID_FS_* variables in udev from being temporarily lost while a DM/LVM device is suspended and we happen to receive a uevent during this suspended period. Such a situation happens during a DM/LVM device resize, where we suspend the device first, then we resize it (so there is the CHANGE uevent with RESIZE="1") and then we resume the device (so there is another CHANGE event notifying about the resume itself). Keeping the ID_FS_* variables is important to not lose the /dev/disk/* content and also to keep systemd from triggering the unmount operation.

The kernel patch from comment #26 is the more correct solution, because it ensures that NO udev rule is skipped during a resize operation (not just 13-dm-disk and the associated systemd hooks): udev won't have a chance to see the device as suspended at all.

Marking Verified:Tested in the latest rpms.

kernel-5.14.0-252.el9 BUILT: Wed Feb 1 03:30:10 PM CET 2023
lvm2-2.03.17-6.el9 BUILT: Thu Feb 9 09:52:52 PM CET 2023
lvm2-libs-2.03.17-6.el9 BUILT: Thu Feb 9 09:52:52 PM CET 2023

[root@virt-506 ~]# vgcreate test /dev/sd[ab]
Physical volume "/dev/sda" successfully created.
Physical volume "/dev/sdb" successfully created.
Volume group "test" successfully created
[root@virt-506 ~]# lvcreate -L200M test
Logical volume "lvol0" created.
[root@virt-506 ~]# mkfs.xfs /dev/test/lvol0
meta-data=/dev/test/lvol0 isize=512 agcount=4, agsize=12800 blks
[...]
[root@virt-506 ~]# mount /dev/test/lvol0 /mnt
[root@virt-506 ~]# ls -l /dev/mapper/
lrwxrwxrwx. 1 root root 7 Feb 13 17:02 test-lvol0 -> ../dm-2
[root@virt-506 ~]# udevadm info --name /dev/mapper/test-lvol0 | grep ID_FS_
E: ID_FS_UUID=44b900ba-3d39-4870-a8ca-6a0acba5b882
E: ID_FS_UUID_ENC=44b900ba-3d39-4870-a8ca-6a0acba5b882
E: ID_FS_SIZE=204111872
E: ID_FS_LASTBLOCK=51200
E: ID_FS_BLOCKSIZE=4096
E: ID_FS_TYPE=xfs
E: ID_FS_USAGE=filesystem
[root@virt-506 ~]# dmsetup suspend /dev/dm-2
[root@virt-506 ~]# echo change > /sys/block/dm-2/uevent
[root@virt-506 ~]# udevadm info --name /dev/mapper/test-lvol0 | grep ID_FS_
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID_ENC=44b900ba-3d39-4870-a8ca-6a0acba5b882
[root@virt-506 ~]# mount
/dev/mapper/test-lvol0 on /mnt type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
[root@virt-506 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/test-lvol0 199328 1660 197668 1% /mnt

This was probably also fixed with https://github.com/systemd/systemd/pull/24177.

(In reply to Marius Vollmer from comment #33)
> This was probably also fixed with
> https://github.com/systemd/systemd/pull/24177.

Ah, no, the systemd change was about SYSTEMD_READY; this one is about more attributes with the same problem.

Marking VERIFIED in the latest build as well.

kernel-5.14.0-252.el9 BUILT: Wed Feb 1 03:30:10 PM CET 2023
lvm2-2.03.17-7.el9 BUILT: Thu Feb 16 03:24:54 PM CET 2023
lvm2-libs-2.03.17-7.el9 BUILT: Thu Feb 16 03:24:54 PM CET 2023

[root@virt-497 ~]# udevadm info --name /dev/mapper/test-lvol0 | grep ID_FS_
E: ID_FS_UUID=940b98d7-392d-4e9b-99e5-6afd78be88d8
E: ID_FS_UUID_ENC=940b98d7-392d-4e9b-99e5-6afd78be88d8
E: ID_FS_SIZE=204111872
E: ID_FS_LASTBLOCK=51200
E: ID_FS_BLOCKSIZE=4096
E: ID_FS_TYPE=xfs
E: ID_FS_USAGE=filesystem
[root@virt-497 ~]# dmsetup suspend /dev/dm-2
[root@virt-497 ~]# echo change > /sys/block/dm-2/uevent
[root@virt-497 ~]# udevadm info --name /dev/mapper/test-lvol0 | grep ID_FS_
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID_ENC=940b98d7-392d-4e9b-99e5-6afd78be88d8

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2544
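
The manual check used in the verification comments above (suspend the device, send a synthetic change uevent, then look at the ID_FS_* properties) could be automated along these lines. This is a sketch, not a QA tool: the helper names `parse_udev_properties`, `lost_identification` and `check_device` are hypothetical, and the set of required keys is an assumption based on the two properties that survive in the transcripts of this report.

```python
import subprocess


def parse_udev_properties(text):
    """Parse 'E: KEY=VALUE' lines from `udevadm info` output into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("E: ") and "=" in line:
            key, value = line[3:].split("=", 1)
            props[key] = value
    return props


def lost_identification(props, required=("ID_FS_USAGE", "ID_FS_UUID_ENC")):
    """Return the required keys that are missing from the property dict."""
    return [key for key in required if key not in props]


def check_device(devname):
    """Query udev for a device node and report missing identification keys.
    Needs a system with udevadm; devname is e.g. /dev/mapper/test-lvol0."""
    out = subprocess.run(
        ["udevadm", "info", "--name", devname],
        capture_output=True, text=True, check=True,
    ).stdout
    return lost_identification(parse_udev_properties(out))
```

Run after `dmsetup suspend` and the synthetic `change` uevent, `check_device("/dev/mapper/test-lvol0")` returning an empty list would correspond to the behaviour shown with the fixed lvm2 packages; a non-empty list would indicate that the identification was lost.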