Bug 2193222 - uncaching or splitcaching write cache volumes with raid+integrity causes 'multipathd[]: libdevmapper: ioctl/libdm-iface.c' failure messages
Summary: uncaching or splitcaching write cache volumes with raid+integrity causes 'multipathd[]: libdevmapper: ioctl/libdm-iface.c' failure messages
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: lvm2
Version: 9.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
: ---
Assignee: LVM Team
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-04 18:22 UTC by Corey Marthaler
Modified: 2023-08-10 15:41 UTC
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-156449 0 None None None 2023-05-04 18:23:15 UTC

Description Corey Marthaler 2023-05-04 18:22:13 UTC
Description of problem:
[root@grant-01 archive]# pvchange --addtag slow /dev/sdg1 /dev/sdb1
  Physical volume "/dev/sdb1" changed
  Physical volume "/dev/sdg1" changed
  2 physical volumes changed / 0 physical volumes not changed
[root@grant-01 archive]# pvchange --addtag fast /dev/sdc1 /dev/sdd1
  Physical volume "/dev/sdc1" changed
  Physical volume "/dev/sdd1" changed
  2 physical volumes changed / 0 physical volumes not changed

[root@grant-01 archive]# lvcreate --yes  --type raid1 -m 1 -L 4G -n display_writecache writecache_sanity @slow
  Logical volume "display_writecache" created.
[root@grant-01 archive]# lvconvert --yes --raidintegrity y writecache_sanity/display_writecache
  Using integrity block size 512 for unknown file system block size, logical block size 512, physical block size 4096.
  Logical volume writecache_sanity/display_writecache has added integrity.

[root@grant-01 archive]# lvcreate --yes  -L 4G -n pool writecache_sanity @fast
  Logical volume "pool" created.
[root@grant-01 archive]# lvchange -an writecache_sanity/pool
[root@grant-01 archive]# lvconvert --yes --type writecache --cachevol writecache_sanity/pool writecache_sanity/display_writecache
  Using writecache block size 4096 for unknown file system block size, logical block size 512, physical block size 4096.
  WARNING: unable to detect a file system block size on writecache_sanity/display_writecache
  WARNING: using a writecache block size larger than the file system block size may corrupt the file system.
  Logical volume writecache_sanity/display_writecache now has writecache.

[root@grant-01 archive]# lvs -a -o +devices,segtype
  LV                                         VG                Attr       LSize  Pool        Origin                                     Data%  Cpy%Sync Devices                                                                     Type
  display_writecache                         writecache_sanity Cwi-a-C---  4.00g [pool_cvol] [display_writecache_wcorig]                0.00            display_writecache_wcorig(0)                                                writecache
  [display_writecache_wcorig]                writecache_sanity rwi-aoC---  4.00g                                                               100.00   display_writecache_wcorig_rimage_0(0),display_writecache_wcorig_rimage_1(0) raid1
  [display_writecache_wcorig_rimage_0]       writecache_sanity gwi-aor---  4.00g             [display_writecache_wcorig_rimage_0_iorig]        100.00   display_writecache_wcorig_rimage_0_iorig(0)                                 integrity
  [display_writecache_wcorig_rimage_0_imeta] writecache_sanity ewi-ao---- 68.00m                                                                        /dev/sdb1(1025)                                                             linear
  [display_writecache_wcorig_rimage_0_iorig] writecache_sanity -wi-ao----  4.00g                                                                        /dev/sdb1(1)                                                                linear
  [display_writecache_wcorig_rimage_1]       writecache_sanity gwi-aor---  4.00g             [display_writecache_wcorig_rimage_1_iorig]        100.00   display_writecache_wcorig_rimage_1_iorig(0)                                 integrity
  [display_writecache_wcorig_rimage_1_imeta] writecache_sanity ewi-ao---- 68.00m                                                                        /dev/sdg1(1025)                                                             linear
  [display_writecache_wcorig_rimage_1_iorig] writecache_sanity -wi-ao----  4.00g                                                                        /dev/sdg1(1)                                                                linear
  [display_writecache_wcorig_rmeta_0]        writecache_sanity ewi-aor---  4.00m                                                                        /dev/sdb1(0)                                                                linear
  [display_writecache_wcorig_rmeta_1]        writecache_sanity ewi-aor---  4.00m                                                                        /dev/sdg1(0)                                                                linear
  [pool_cvol]                                writecache_sanity Cwi-aoC---  4.00g                                                                        /dev/sdc1(0)                                                                linear

[root@grant-01 archive]# lvconvert --yes --uncache writecache_sanity/display_writecache
  Detaching writecache already clean.
  Logical volume writecache_sanity/display_writecache writecache has been detached.
  Logical volume "pool" successfully removed.




May  4 18:14:14 grant-01 dmeventd[380652]: No longer monitoring RAID device writecache_sanity-display_writecache for events.
May  4 18:14:14 grant-01 dmeventd[380652]: Monitoring RAID device writecache_sanity-display_writecache_wcorig for events.
May  4 18:14:39 grant-01 kernel: md/raid1:mdX: active with 2 out of 2 mirrors
May  4 18:14:39 grant-01 dmeventd[380652]: No longer monitoring RAID device writecache_sanity-display_writecache_wcorig for events.
May  4 18:14:39 grant-01 multipathd[1389]: libdevmapper: ioctl/libdm-iface.c(1987): device-mapper: table ioctl on writecache_sanity-display_writecache_wcorig_rimage_0  failed: No such device or address
May  4 18:14:39 grant-01 multipathd[1389]: libdevmapper: ioctl/libdm-iface.c(1987): device-mapper: table ioctl on writecache_sanity-display_writecache_wcorig_rimage_0_iorig  failed: No such device or address
May  4 18:14:39 grant-01 multipathd[1389]: libdevmapper: ioctl/libdm-iface.c(1987): device-mapper: table ioctl on writecache_sanity-display_writecache_wcorig_rimage_1  failed: No such device or address
May  4 18:14:39 grant-01 multipathd[1389]: libdevmapper: ioctl/libdm-iface.c(1987): device-mapper: table ioctl on writecache_sanity-display_writecache_wcorig_rmeta_0  failed: No such device or address
May  4 18:14:39 grant-01 multipathd[1389]: libdevmapper: ioctl/libdm-iface.c(1987): device-mapper: table ioctl on writecache_sanity-display_writecache_wcorig_rmeta_1  failed: No such device or address
May  4 18:14:39 grant-01 dmeventd[380652]: Monitoring RAID device writecache_sanity-display_writecache for events.


Version-Release number of selected component (if applicable):
kernel-5.14.0-302.el9    BUILT: Thu Apr 20 11:48:37 AM CEST 2023
lvm2-2.03.21-1.el9    BUILT: Fri Apr 21 02:33:33 PM CEST 2023
lvm2-libs-2.03.21-1.el9    BUILT: Fri Apr 21 02:33:33 PM CEST 2023


How reproducible:
Every time

Comment 1 David Teigland 2023-05-04 21:09:59 UTC
multipathd should not be seeing the internal lvm devices, or should ignore them.  I'm also seeing this with just writecache on raid:

multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr1_wcorig_rimage_0  failed: No such device or address
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr1_wcorig_rmeta_0 failed: No such device or address
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr1_wcorig_rmeta_1 failed: No such device or address

Comment 2 David Teigland 2023-05-05 18:31:15 UTC
This also happens with dm-cache on raid:

multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr2_corig_rimage_0  failed: No such device or ad>
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr2_corig_rimage_0_imeta  failed: No such device>
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr2_corig_rimage_0_iorig  failed: No such device>
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr2_corig_rimage_1  failed: No such device or ad>
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr2_corig_rimage_1_iorig  failed: No such device>
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr2_corig_rmeta_0  failed: No such device or add>
multipathd[891]: libdevmapper: ioctl/libdm-iface.c(1998): device-mapper: table ioctl on ff-rr2_corig_rmeta_1  failed: No such device or add>

I think this is an old issue, resulting from the fact that dm-raid devs do not include a dm uuid suffix. Adding a suffix to dm uuids is a magical way of telling blkid to ignore the device:

https://github.com/util-linux/util-linux/blob/master/lib/sysfs.c#L653

(I'm not sure how multipathd is applying this logic for other dm devs that do use suffixes.)
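
As a rough illustration of that suffix heuristic, here is a minimal sketch in C. It is an assumption-laden illustration, not the util-linux code at the link above: the sysfs path and the "LVM-" + 64-character uuid layout are my assumptions about how LVM builds dm uuids.

/* Sketch only: guess whether a dm device is an LVM-internal (hidden)
 * volume by looking for a suffix after the LVM uuid in its dm uuid.
 * This mirrors the idea behind the util-linux link above, but the
 * path and length checks here are assumptions, not copied code. */
#include <stdio.h>
#include <string.h>

#define LVM_UUID_PREFIX "LVM-"
#define LVM_UUID_LEN    64   /* 32-char VG uuid + 32-char LV uuid (assumed) */

static int dm_uuid_is_lvm_private(const char *dm_name)
{
    char path[256], uuid[256];
    FILE *f;

    /* dm uuids are exported under /sys/class/block/<dm-N>/dm/uuid */
    snprintf(path, sizeof(path), "/sys/class/block/%s/dm/uuid", dm_name);
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (!fgets(uuid, sizeof(uuid), f)) {
        fclose(f);
        return 0;
    }
    fclose(f);
    uuid[strcspn(uuid, "\n")] = '\0';

    if (strncmp(uuid, LVM_UUID_PREFIX, strlen(LVM_UUID_PREFIX)))
        return 0;    /* not an LVM device at all */

    /* Anything after the LVM uuid (e.g. "-cow", "-pool", "-cvol") marks
     * a private sub-device.  Per comment 2, the raid _rimage_N/_rmeta_N
     * devs carry no such suffix, so this heuristic does not hide them. */
    return strlen(uuid) > strlen(LVM_UUID_PREFIX) + LVM_UUID_LEN;
}

int main(void)
{
    /* "dm-3" is a placeholder device name for illustration. */
    printf("dm-3 looks LVM-private: %d\n", dm_uuid_is_lvm_private("dm-3"));
    return 0;
}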

Comment 3 Ben Marzinski 2023-05-08 21:31:19 UTC
These messages occur while multipathd is listening for dm events.  When a new dm event occurs, it triggers the multipathd event polling code. The first thing this code does is get a list of all dm devices with the DM_DEVICE_LIST ioctl. Then it calls the DM_DEVICE_TABLE ioctl on each device to see if it's a multipath device. If a device is removed after a dm event is triggered, between the time the dm device list is populated and the time the DM_DEVICE_TABLE ioctl is run on that device, the libdevmapper code will log an error.

Multipathd can work around this, but not without adding pointless extra work that it will do on all non-multipath devices.
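
To make that window concrete, here is a minimal sketch of the list-then-query pattern using public libdevmapper calls (my own illustration, not multipathd source; build with -ldevmapper). The DM_DEVICE_TABLE run inside the loop is the ioctl that fails with "No such device or address" when a device named in the earlier DM_DEVICE_LIST snapshot has already been removed, for example an LVM sub-LV torn down by lvconvert --uncache.

/* Sketch only: illustrates the race described in comment 3 using public
 * libdevmapper calls; this is not multipathd's code. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <libdevmapper.h>

int main(void)
{
    struct dm_task *list_task, *table_task;
    struct dm_names *names;
    unsigned next = 0;

    /* Step 1: DM_DEVICE_LIST takes a snapshot of all dm device names. */
    if (!(list_task = dm_task_create(DM_DEVICE_LIST)))
        return 1;
    if (!dm_task_run(list_task)) {
        dm_task_destroy(list_task);
        return 1;
    }

    names = dm_task_get_names(list_task);
    if (names && names->dev) {
        do {
            names = (struct dm_names *)((char *) names + next);

            /* Step 2: DM_DEVICE_TABLE on each listed device to see whether
             * its target is "multipath".  If the device was removed after
             * step 1, this dm_task_run() fails and libdevmapper itself logs
             * the "table ioctl on ... failed: No such device or address"
             * message seen in this bug. */
            if ((table_task = dm_task_create(DM_DEVICE_TABLE))) {
                if (dm_task_set_name(table_task, names->name) &&
                    dm_task_run(table_task)) {
                    uint64_t start, length;
                    char *target_type = NULL, *params = NULL;

                    dm_get_next_target(table_task, NULL, &start, &length,
                                       &target_type, &params);
                    if (target_type && !strcmp(target_type, "multipath"))
                        printf("%s is a multipath map\n", names->name);
                }
                dm_task_destroy(table_task);
            }

            next = names->next;
        } while (next);
    }

    dm_task_destroy(list_task);
    return 0;
}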

