Description of problem:

The LVM filter is enabled on the host so that LVM scans only local disks. When a user asks to reduce a LUN from a storage domain, vdsm initiates a pvmove to migrate the data off that LUN. However, the pvmove fails with the error below.

===
2021-04-08 14:08:57,235+0530 INFO (tasks/9) [storage.LVM] Moving pv /dev/mapper/3600140502883faba30a48269540d2518 data (vg b60cca61-4081-4b63-b462-e18d40f14e96) (lvm:1304)
2021-04-08 14:09:13,666+0530 WARN (tasks/9) [storage.LVM] All 1 tries have failed: cmd=['/sbin/lvm', 'pvmove', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600140502883faba30a48269540d2518$|^/dev/mapper/3600140596c5c258935c4e0da0e085ca1$|^/dev/mapper/36001405d02635dc615a4ee5b104c9b2e$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '/dev/mapper/3600140502883faba30a48269540d2518', '/dev/mapper/3600140596c5c258935c4e0da0e085ca1', '/dev/mapper/36001405d02635dc615a4ee5b104c9b2e'] rc=5 err=[' LVM command executed by lvmpolld failed.'] (lvm:538)
===

The system log shows the error below.

===
Apr 8 14:08:58 dhcp130-124 lvmpolld[122843]: LVMPOLLD: lvm2 cmd (PID 122846) failed (retcode: 5)
===

This happens because lvmpolld cannot see the storage domain PVs due to the whitelist configuration in lvm.conf. The pvmove worked after disabling lvmpolld by setting "use_lvmpolld = 0" in lvm.conf. The pvmove also worked when we disabled the LVM filter.

Version-Release number of selected component (if applicable):
vdsm-4.40.40-1.el8ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to remove a LUN from a storage domain in an RHV 4.4 environment. The pvmove will fail with the above error.
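For reference, the whitelist filter visible in the pvmove command line above can be reconstructed from the domain's LUN WWIDs. A minimal sketch of that construction (the helper name and logic are assumptions for illustration, not vdsm's actual code):

```python
def build_filter(wwids):
    # Whitelist each multipath device explicitly, anchored with ^...$,
    # then reject everything else with "r|.*|" (the pattern vdsm
    # passes to lvm commands via --config, per the log above).
    accept = "|".join("^/dev/mapper/%s$" % w for w in wwids)
    return '["a|%s|", "r|.*|"]' % accept

flt = build_filter([
    "3600140502883faba30a48269540d2518",
    "3600140596c5c258935c4e0da0e085ca1",
])
# Note: this filter exists only on the pvmove command line; lvmpolld
# reads only /etc/lvm/lvm.conf, which is the crux of this bug.
print(flt)
```

The point is that the per-command whitelist never reaches lvmpolld, so the polling daemon evaluates the host's own lvm.conf filter instead.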
Actual results:
Reducing LUNs from a storage domain fails with the error "LVM command executed by lvmpolld failed".

Expected results:
Reducing LUNs should work.

Additional info:
Trying to reproduce this with vdsm 4.4.6, without success so far. I configured an iscsi storage domain with two luns and removed the second one (which does not hold metadata), but I'm not observing any pvmove attempt in the logs. Perhaps the lun needs to have some data written to it?

Please update vdsm to the latest version and try reproducing the bug again. Also, provide more detailed steps for how the bug can be reproduced, as well as more logs.

vdsm-4.40.60.4-11.git6a1827b00.el8.x86_64
(In reply to Roman Bednář from comment #1)
> Trying to reproduce this with vdsm 4.4.6 without success so far. I
> configured iscsi storage domain with two luns and removed the second one
> (which does not hold metadata) but I'm not observing any pvmove attempt in
> logs. Perhaps the lun needs to have some data written to it?

Of course, pvmove will only be initiated when there is data on the 2nd LUN.
Still unable to reproduce; posting more detailed steps below for further reference. Tested again with a non-empty lun and default settings, but the log shows a successful pvmove. Either some other configuration is required to reproduce this that was not mentioned here yet, or the bug is already fixed in the latest version. If no further feedback is received, this bug will be closed.

1) Create an iscsi storage domain with multiple luns via the GUI.

2) Create a disk and a VM - attach the disk to that VM to populate the lun with data.

3) Find which PV/lun was used for this disk, and make sure the pv is not used for any data other than the disk; a manual pvmove might be needed to change the layout, ex:

[root@host]# lvs -o lv_name,devices,lv_size,tags 1c630112-944a-4d43-b01b-59066ba57d76
  LV                                   Devices                                           LSize   LV Tags
  14441d83-841f-4537-ba5e-023e89a34384 /dev/mapper/360014057f34331e872e45959d393ff68(37) 128.00m IU_ea118436-df2d-4901-b7a0-094be2ace5c8,MD_2,PU_00000000-0000-0000-0000-000000000000
  342a73c1-5ca4-4684-bcf7-e012fec6dc8d /dev/mapper/360014057f34331e872e45959d393ff68(36) 128.00m IU_337a3d70-1f28-4996-92a9-e031d77951c9,MD_1,PU_00000000-0000-0000-0000-000000000000
  f4469643-0a0b-477b-894d-298f7d00b1da /dev/mapper/36001405ef14db8660f14e1ba9f6216fc(0)  1.00g   IU_48d5bb02-ffab-40c6-b16f-0cac923f7302,MD_3,PU_00000000-0000-0000-0000-000000000000
  ...

4) Check which PV maps to which LUN, ex:

[root@host]# multipath -ll /dev/mapper/36001405ef14db8660f14e1ba9f6216fc
36001405ef14db8660f14e1ba9f6216fc dm-0 LIO-ORG,target2-00
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 6:0:0:0 sda 8:0 active ready running

5) Note the model name; this is the LUN we're removing from the GUI. It can be found in the Manage Domain dialog window as "Product ID".

[root@host]# lsblk -S /dev/sda
NAME HCTL    TYPE VENDOR  MODEL      REV TRAN
sda  6:0:0:0 disk LIO-ORG target2-00 4.0 iscsi

6) Put the SD into maintenance mode; this option is hidden under Compute > Data Centers - Storage tab.
(The ovirt-engine service might need a restart.)

7) Go back to Storage > Domains and, under "Manage domain", remove the lun and confirm; then check vdsm.log:

2021-05-06 06:03:30,735-0400 INFO (tasks/5) [storage.SANLock] Acquiring Lease(name='SDM', path='/dev/38456991-2572-45a5-86dc-3a78def784c9/leases', offset=1048576) for host id 1, lvb=False (clusterlock:456)
2021-05-06 06:03:30,944-0400 INFO (tasks/5) [storage.SANLock] Successfully acquired Lease(name='SDM', path='/dev/38456991-2572-45a5-86dc-3a78def784c9/leases', offset=1048576) for host id 1 (clusterlock:500)
2021-05-06 06:03:31,017-0400 DEBUG (tasks/5) [storage.LVM] lvs reloaded (lvm:759)
2021-05-06 06:03:31,113-0400 DEBUG (mailbox-spm) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 dd if=/rhev/data-center/4fc1c8aa-6245-11eb-83ae-525400ea2c38/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=8192000 (cwd None) (commands:211)
2021-05-06 06:03:31,127-0400 INFO (tasks/5) [storage.LVM] Moving pv /dev/mapper/360014053ed8a11da57241a08fc5b83f3 data (vg 38456991-2572-45a5-86dc-3a78def784c9) (lvm:1304)
2021-05-06 06:03:31,150-0400 DEBUG (mailbox-spm) [storage.Misc.excCmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n8192000 bytes (8.2 MB, 7.8 MiB) copied, 0.0257069 s, 319 MB/s\n'; <rc> = 0 (commands:224)
2021-05-06 06:03:33,186-0400 DEBUG (mailbox-spm) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-0 dd if=/rhev/data-center/4fc1c8aa-6245-11eb-83ae-525400ea2c38/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=8192000 (cwd None) (commands:211)
2021-05-06 06:03:33,257-0400 DEBUG (mailbox-spm) [storage.Misc.excCmd] SUCCESS: <err> = b'1+0 records in\n1+0 records out\n8192000 bytes (8.2 MB, 7.8 MiB) copied, 0.0467382 s, 175 MB/s\n'; <rc> = 0 (commands:224)

vdsm-4.40.60.6-6.gita7d59f0b5.el8.x86_64
ovirt-engine-4.4.5.4-0.0.master.20210131105008.gita122fce6af0.el8.noarch
(In reply to Roman Bednář from comment #3)
> [root@host]# lvs -o lv_name,devices,lv_size,tags 1c630112-944a-4d43-b01b-59066ba57d76
>   LV                                   Devices                                           LSize   LV Tags
>   14441d83-841f-4537-ba5e-023e89a34384 /dev/mapper/360014057f34331e872e45959d393ff68(37) 128.00m IU_ea118436-df2d-4901-b7a0-094be2ace5c8,MD_2,PU_00000000-0000-0000-0000-000000000000
>   342a73c1-5ca4-4684-bcf7-e012fec6dc8d /dev/mapper/360014057f34331e872e45959d393ff68(36) 128.00m IU_337a3d70-1f28-4996-92a9-e031d77951c9,MD_1,PU_00000000-0000-0000-0000-000000000000
>   f4469643-0a0b-477b-894d-298f7d00b1da /dev/mapper/36001405ef14db8660f14e1ba9f6216fc(0)  1.00g   IU_48d5bb02-ffab-40c6-b16f-0cac923f7302,MD_3,PU_00000000-0000-0000-0000-000000000000
>   ...

I think this host doesn't have an LVM filter configured? As mentioned in comment 0, the issue is not reproducible without the LVM filter. Also, note that a whitelist LVM filter is added by default in 4.4.

I tested again with the latest version available, vdsm-4.40.50.10-1.el8ev.x86_64, and can reproduce the issue. I added a new LUN to an existing SD and then removed the old LUN from the portal. No other steps were done to reproduce the issue.
===
2021-05-07 11:16:00,136+0530 DEBUG (tasks/0) [common.commands] /usr/bin/taskset --cpu-list 0-15 /usr/bin/sudo -n /sbin/lvm pvmove --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600140502883faba30a48269540d2518$|^/dev/mapper/3600140596c5c258935c4e0da0e085ca1$|^/dev/mapper/36001405d02635dc615a4ee5b104c9b2e$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' /dev/mapper/3600140502883faba30a48269540d2518 /dev/mapper/3600140596c5c258935c4e0da0e085ca1 /dev/mapper/36001405d02635dc615a4ee5b104c9b2e (cwd None) (commands:153)
2021-05-07 11:16:16,393+0530 WARN (tasks/0) [storage.LVM] All 1 tries have failed: cmd=['/sbin/lvm', 'pvmove', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600140502883faba30a48269540d2518$|^/dev/mapper/3600140596c5c258935c4e0da0e085ca1$|^/dev/mapper/36001405d02635dc615a4ee5b104c9b2e$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '/dev/mapper/3600140502883faba30a48269540d2518', '/dev/mapper/3600140596c5c258935c4e0da0e085ca1', '/dev/mapper/36001405d02635dc615a4ee5b104c9b2e'] rc=5 err=[' LVM command executed by lvmpolld failed.'] (lvm:538)
===

Attaching the vdsm.log. In short, pvmove fails if lvmpolld is unable to see the SD LUNs.

====
~]# pvmove --config 'global { use_lvmetad=0 locking_type=1 } devices { filter = [ "a|.*/|" ] }' /dev/mapper/3600140502883faba30a48269540d2518 /dev/mapper/3600140596c5c258935c4e0da0e085ca1 /dev/mapper/36001405d02635dc615a4ee5b104c9b2e
  Detected pvmove in progress for /dev/mapper/3600140502883faba30a48269540d2518.
  WARNING: Ignoring remaining command line arguments.
  /dev/mapper/3600140502883faba30a48269540d2518: Moved: 85.00%
  LVM command executed by lvmpolld failed.
  For more information see lvmpolld messages in syslog or lvmpolld log file.
====
Thanks for the feedback; I managed to reproduce it today. I forgot to check the host filter settings - the filter was indeed missing, and it does not need to be a whitelist. In my case, not having any local volumes, a "reject all" rule did the trick.

A missing filter can be configured using vdsm-tool:

# vdsm-tool config-lvm-filter
Analyzing host...
Found these mounted logical volumes on this host:

This is the recommended LVM filter for this host:

  filter = [ "r|.*|" ]

This filter allows LVM to access the local devices used by the
hypervisor, but not shared storage owned by Vdsm. If you add a new
device to the volume group, you will need to edit the filter manually.

Configure host? [yes,NO] yes
Configuration completed successfully!

Please reboot to verify the configuration.

========

Also, there's a way to recover from a stuck pvmove, although it does not finish the removal. Leaving this here just in case it's needed. A manual pvmove with lvmpolld disabled can be used after activating the SD:

# lvs -o+devices --config='devices/filter = ["a|.*|"]'
  LV                                   VG                                   Attr       LSize  Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  2acd8ce3-e646-4f09-ab7b-c21468b36da3 38456991-2572-45a5-86dc-3a78def784c9 -wI------- <2.38g                                                   pvmove0(0)

# pvmove --config 'global { use_lvmetad=0 locking_type=1 use_lvmpolld=0 } devices { filter = [ "a|.*/|" ] }' /dev/mapper/360014053ed8a11da57241a08fc5b83f3
  Detected pvmove in progress for /dev/mapper/360014053ed8a11da57241a08fc5b83f3.
  /dev/mapper/360014053ed8a11da57241a08fc5b83f3: Moved: 100.00%

# lvs -o+devices --config='devices/filter = ["a|.*|"]'
  LV                                   VG                                   Attr       LSize  Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  2acd8ce3-e646-4f09-ab7b-c21468b36da3 38456991-2572-45a5-86dc-3a78def784c9 -wi-a----- <2.38g                                                   /dev/mapper/36001405c46881ff4a2f45dfb48b65c9d(0)
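A stuck pvmove like the one above can be spotted from the lvs output itself: while the move is in flight, the allocation-policy character in lv_attr is capitalized (the LV is locked against allocation changes) and the Devices column points at the internal pvmove0 volume. A rough detection sketch (hypothetical helper; the row layout is taken from the lvs output above with empty columns collapsed):

```python
def stuck_pvmove(lvs_row):
    """Heuristic: an LV whose lv_attr allocation bit (3rd char) is
    uppercase and whose Devices column references the internal
    pvmove0 LV is in the middle of a pv move."""
    name, vg, attr, size, devices = lvs_row.split()
    return attr[2].isupper() and devices.startswith("pvmove")

# Row from the first lvs output above, whitespace collapsed.
row = ("2acd8ce3-e646-4f09-ab7b-c21468b36da3 "
       "38456991-2572-45a5-86dc-3a78def784c9 -wI------- <2.38g pvmove0(0)")
print(stuck_pvmove(row))  # True
```

After the manual pvmove completes, the same check returns False, since lv_attr goes back to "-wi-a-----" and Devices points at the real PV again.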
Created attachment 1782006 [details]
pvmove_vvvv

I investigated the vdsm flows but did not find anything strange; it looks like an lvm/lvmpolld issue only. I'll loop in the lvm folks so they can provide some feedback as well. Attaching -vvvv output of the attempted pvmove.

From the vdsm point of view, the only things we could do are to disable the filter, which is a bad idea, or to disable the use of lvmpolld; however, I'm not yet sure what the impact of that might be.

# lvs -a -o +devices --config='devices/filter = ["a|.*|"]' 38456991-2572-45a5-86dc-3a78def784c9
  LV                                   VG                                   Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  2acd8ce3-e646-4f09-ab7b-c21468b36da3 38456991-2572-45a5-86dc-3a78def784c9 -wi-a-----  <2.38g /dev/mapper/360014053eb603c6e08d4b09aa5746b6e(0)
  ids                                  38456991-2572-45a5-86dc-3a78def784c9 -wi-ao---- 128.00m /dev/mapper/360014053ed8a11da57241a08fc5b83f3(20)
  inbox                                38456991-2572-45a5-86dc-3a78def784c9 -wi-a----- 128.00m /dev/mapper/360014053ed8a11da57241a08fc5b83f3(21)
  leases                               38456991-2572-45a5-86dc-3a78def784c9 -wi-a-----   2.00g /dev/mapper/360014053ed8a11da57241a08fc5b83f3(23)
  master                               38456991-2572-45a5-86dc-3a78def784c9 -wi-a-----   1.00g /dev/mapper/360014053ed8a11da57241a08fc5b83f3(47)
  metadata                             38456991-2572-45a5-86dc-3a78def784c9 -wi-a----- 128.00m /dev/mapper/360014053ed8a11da57241a08fc5b83f3(19)
  outbox                               38456991-2572-45a5-86dc-3a78def784c9 -wi-a----- 128.00m /dev/mapper/360014053ed8a11da57241a08fc5b83f3(22)
  xleases                              38456991-2572-45a5-86dc-3a78def784c9 -wi-a-----   1.00g /dev/mapper/360014053ed8a11da57241a08fc5b83f3(39)

# pvmove --config=' devices { filter = [ "a|^/dev/mapper/360014053eb603c6e08d4b09aa5746b6e$|^/dev/mapper/360014053ed8a11da57241a08fc5b83f3$|^/dev/mapper/36001405c46881ff4a2f45dfb48b65c9d$|", "r|.*|" ] }' -n /dev/38456991-2572-45a5-86dc-3a78def784c9/2acd8ce3-e646-4f09-ab7b-c21468b36da3 /dev/mapper/360014053eb603c6e08d4b09aa5746b6e
  /dev/mapper/360014053eb603c6e08d4b09aa5746b6e: Moved: 0.00%
  LVM command executed by lvmpolld failed.
  For more information see lvmpolld messages in syslog or lvmpolld log file.

SYSLOG:

May 10 11:16:22 host-vm-devel lvmpolld[14729]: W: #011LVPOLL: PID 14732: STDERR: ' Volume group "38456991-2572-45a5-86dc-3a78def784c9" not found'
May 10 11:16:22 host-vm-devel lvmpolld[14729]: LVMPOLLD: lvm2 cmd (PID 14732) failed (retcode: 5)
May 10 11:16:33 host-vm-devel kernel: connection4:0: detected conn error (1020)
May 10 11:16:33 host-vm-devel kernel: sd 7:0:0:2: [sdc] tag#9 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK cmd_age=9s
May 10 11:16:33 host-vm-devel kernel: sd 7:0:0:2: [sdc] tag#9 CDB: Test Unit Ready 00 00 00 00 00 00
May 10 11:16:33 host-vm-devel kernel: sd 7:0:0:0: [sdb] tag#10 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK cmd_age=5s
May 10 11:16:33 host-vm-devel kernel: sd 7:0:0:0: [sdb] tag#10 CDB: Test Unit Ready 00 00 00 00 00 00
May 10 11:16:33 host-vm-devel kernel: sd 7:0:0:1: [sdd] tag#12 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK cmd_age=4s
May 10 11:16:33 host-vm-devel kernel: sd 7:0:0:1: [sdd] tag#12 CDB: Test Unit Ready 00 00 00 00 00 00
May 10 11:16:33 host-vm-devel iscsid[1072]: iscsid: Kernel reported iSCSI connection 4:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
May 10 11:16:35 host-vm-devel iscsid[1072]: iscsid: connection4:0 is operational after recovery (1 attempts)
May 10 11:16:42 host-vm-devel sanlock[766]: 2021-05-10 11:16:42 5036 [14317]: s4 delta_renew long write time 19 sec
May 10 11:16:48 host-vm-devel kernel: connection4:0: detected conn error (1020)
May 10 11:16:48 host-vm-devel iscsid[1072]: iscsid: Kernel reported iSCSI connection 4:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
May 10 11:16:50 host-vm-devel iscsid[1072]: iscsid: connection4:0 is operational after recovery (1 attempts)
May 10 11:16:55 host-vm-devel sanlock[766]: 2021-05-10 11:16:55 5049 [14317]: s4 delta_renew long write time 11 sec
May 10 11:17:16 host-vm-devel kernel: device-mapper: core: periodic/3: sending ioctl 5401 to DM device without required privilege.
May 10 11:17:37 host-vm-devel systemd[1]: lvm2-lvmpolld.service: Succeeded.
lvmpolld only uses the filter from lvm.conf, so if you pass a custom filter to pvmove, then pvmove and lvmpolld will be seeing different sets of devices. To make this work, pvmove would need to pass its custom filter (i.e. from the command line option) to lvmpolld, and lvmpolld would need to apply that filter for the operations it performs on behalf of pvmove.

(Incidentally, pvmove+lvmpolld will work correctly if you use pvmove with a custom devices file (pvmove --devicesfile foo); pvmove sends the name of the custom devices file to lvmpolld, which then runs its commands using the same.)
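The mismatch described above can be illustrated with a simplified model of LVM filter evaluation (first matching pattern wins; "a|...|" accepts, "r|...|" rejects, and an unmatched device is accepted). This is an illustrative sketch, not LVM's actual parser:

```python
import re

def lvm_filter_accepts(patterns, device):
    # First pattern whose regex matches the device path decides:
    # "a" accepts, "r" rejects; no match means accept.
    for p in patterns:
        action, regex = p[0], p[2:-1]
        if re.search(regex, device):
            return action == "a"
    return True

sd_lun = "/dev/mapper/3600140502883faba30a48269540d2518"

# Filter vdsm passes to pvmove via --config: SD LUNs whitelisted.
cmd_filter = ["a|^/dev/mapper/3600140502883faba30a48269540d2518$|",
              "r|.*|"]

# Host lvm.conf filter ("reject all", as on the reproducer host);
# this is the only filter lvmpolld evaluates.
conf_filter = ["r|.*|"]

assert lvm_filter_accepts(cmd_filter, sd_lun)       # pvmove sees the LUN
assert not lvm_filter_accepts(conf_filter, sd_lun)  # lvmpolld does not
```

Since lvmpolld's view rejects every SD device, its polling command finds no VG to operate on, which matches the "Volume group ... not found" error in the syslog above.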
Thank you for the fast reply, David. Is there any plan to add the ability to pass a custom filter from pvmove to lvmpolld in the near future?

Is the devices file a feature that should currently work as a workaround? Is there any documentation I could check? It seems like --devicesfile is not a valid option for pvmove:

# pvmove --devicesfile /tmp/test
pvmove: unrecognized option '--devicesfile'
  Error during parsing of command line.

# lvm version
  LVM version:     2.03.09(2)-RHEL8 (2020-05-28)
  Library version: 1.02.171-RHEL8 (2020-05-28)
Conceptually it's simple to pass the --config filter string from pvmove to lvmpolld and then from lvmpolld to the lvpoll commands that are run. In practice it would be more difficult because of the complex syntax involved in the filter line. So it's possible to fix but probably not trivial. The devices file will be in 8.5 and 9.0, and even if RHV were to begin using the devices file, it would probably take a while. A workaround is to temporarily disable the host's lvm.conf filter and temporarily let the host see the RHV devices so that lvmpolld can work with them.
The bug is verified.

versions:
vdsm-4.40.70.3-1.el8ev.x86_64
ovirt-engine-4.4.7.3-0.3.el8ev.noarch

steps to reproduce:

1) Run "vdsm-tool config-lvm-filter" on the host; if a filter is missing, this tool will add it.

2) Create an iscsi storage domain with 3 luns via the GUI.

3) Create a disk and a VM - attach the disk to that VM.

4) Find which PV/lun was used for this disk:

[root@storage-ge3-vdsm1 ~]# lvs -olv_name,lv_size,devices,tags --config='devices/filter = ["a|.*|"]' 29b65367-8c84-4895-9c76-f008c7a9d305
  LV                                   LSize   Devices                                           LV Tags
  688b2b53-dbdd-4d13-87a4-ba1602171943 128.00m /dev/mapper/3600a098038304479363f4c4870455031(60) IU_db6868b5-67a6-49f9-8c27-a75bec732a1b,MD_2,PU_00000000-0000-0000-0000-000000000000
  6921f926-ad1d-4c78-a55e-157efb3eec46 128.00m /dev/mapper/3600a098038304479363f4c4870455031(61) IU_7ea25e75-b989-4895-9a8d-4dc9fab73bff,MD_3,PU_00000000-0000-0000-0000-000000000000
  81a21678-9da1-4053-97cc-9c21bac8b190 3.00g   /dev/mapper/3600a098038304479363f4c4870455032(0)  IU_5297200e-ef10-4bcf-b0d4-aee488ca277e,MD_1,PU_00000000-0000-0000-0000-000000000000
  ...

5) Move some data to the LUN:

pvmove --config='devices {filter = ["a|.*|"]} global {use_lvmpolld=0}' -n 81a21678-9da1-4053-97cc-9c21bac8b190 /dev/mapper/3600a098038304479363f4c4870455031 /dev/mapper/3600a098038304479363f4c4870455032

6) Set the SD to maintenance and remove the LUN 3600a098038304479363f4c4870455032.

7) Look in the vdsm log for any errors related to 'moving pv'.

Actual results:
The logs are clean and the LUN removal succeeded.

logs:
[root@storage-ge3-vdsm1 ~]# less /var/log/vdsm/vdsm.log | grep 'moving pv' -i
2021-06-14 17:02:40,761+0300 INFO (tasks/5) [storage.LVM] Moving pv /dev/mapper/3600a098038304479363f4c4870455032 data (vg 29b65367-8c84-4895-9c76-f008c7a9d305) (lvm:1312)
2021-06-14 17:03:37,166+0300 INFO (tasks/0) [storage.LVM] Removing pv /dev/mapper/3600a098038304479363f4c4870455032 from vg 29b65367-8c84-4895-9c76-f008c7a9d305 (lvm:1476)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.4.7]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2864
Due to QE capacity, we are not going to cover this issue in our automation