Description of problem:
MS cluster fails with validation of the SCSI reservation feature.

Version-Release number of selected component (if applicable):
RHV 4

How reproducible:
100%

Steps to Reproduce:
1. Install two hypervisors
2. Create two Windows VMs, each running on one of the hosts
3. Provide a passthrough LUN to both of the VMs
4. Enable SCSI Pass-Through
5. Allow Privileged SCSI I/O
6. Using SCSI Reservation
7. Install AD on one of the VMs and join both of the VMs to the same domain.
8. Run the test in PowerShell:
# Test-Cluster -include "List Disks","List Disks To Be Validated","Validate Disk Failover","Validate SCSI device Vital Product Data (VPD)","Validate SCSI-3 Persistent Reservation"

Actual results:
The cluster verification fails. Logs are located in C:\Windows\Cluster\Reports\

Expected results:
Test passes

Additional info:
There are a couple of identified issues:

1) The libvirt version needs to be at least libvirt-4.5.0-36.el7_9.1
   Bug 1839992 - qemu-pr-helper does not pass scsi reservations due to qemu mount namespace

2) The multipath configuration is incorrect on the hypervisor. The parameter "reservation_key file" is missing; without it the reservation will not work properly. By default the registered keys are stored in /etc/multipath/prkeys. This is configured by 'prkeys_file "/etc/multipath/prkeys"'.

3) There is a bug in device-mapper-multipath preventing the VM from initiating a register-ignore request with no key.
   Bug 1894103 - mpathpersist does not clear reservation keys if --param-sark is set to zeroes

4) Most probably a SELinux issue was identified when the Windows VM was requesting the "SCSI page 83h VPD descriptor". It was not reproduced a second time in our lab, but a customer experienced the same issue.
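The multipath.conf change from issue (2) can be sketched as follows. This is an illustrative fragment written to a temporary file so it is safe to run anywhere; on a real hypervisor you would edit /etc/multipath.conf itself and reload multipathd. The reservation_key and prkeys_file option names are real multipath options; the other defaults shown are just examples.

```shell
#!/bin/sh
# Sketch for issue (2): the defaults section needs "reservation_key file"
# so multipathd reads/writes per-device persistent-reservation keys in the
# prkeys file (/etc/multipath/prkeys by default).
# Written to a temp file here to keep the example safe to run; on a real
# hypervisor edit /etc/multipath.conf and run "multipathd reconfigure".
conf=$(mktemp)
cat > "$conf" <<'EOF'
defaults {
    user_friendly_names yes
    find_multipaths     yes
    reservation_key     file
    # prkeys_file "/etc/multipath/prkeys"   # default location of the key file
}
EOF
if grep -q 'reservation_key[[:space:]]*file' "$conf"; then
    ok=1
    echo "reservation_key is set"
fi
rm -f "$conf"
```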
Hi Qing Wang or Avihai, can you try reproducing this issue with the test packages from http://people.redhat.com/pbonzini/bz1894103/? Either with RHV or directly on top of RHEL should do. The main issue to figure out is the SELinux bug, so I suggest that you first install setroubleshoot. Hopefully audit2allow will give the right lead if you can reproduce. Also, if you can run "strace -ttt -ff" on the qemu-pr-helper process (on both machines) while the MSCS test runs, that might help in creating a new, independent test that runs on Linux and does not require Cluster Services. Thanks!
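The suggested trace could be run along these lines (a sketch: the output path is an arbitrary choice, and the guard makes it a no-op on machines where qemu-pr-helper is not running):

```shell
#!/bin/sh
# Attach strace to qemu-pr-helper with wall-clock timestamps (-ttt) and
# follow forked children (-ff); -o makes strace write one output file per
# traced process/thread (/tmp/prhelper.trace.<pid>).
pid=$(pgrep -n -x qemu-pr-helper) || pid=""
if [ -n "$pid" ]; then
    strace -ttt -ff -o /tmp/prhelper.trace -p "$pid" &
    status=tracing
else
    echo "qemu-pr-helper is not running; nothing to trace"
    status=skipped
fi
```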
Created attachment 1730059 [details] cluster test report with multipath libs 1117
Test passed. Using iscsi as backend.

1. host attach lun first

host A
mpathb (36001405767b90641cb54827b15aeafc2) dm-5 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 18:0:0:0 sde 8:64 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 17:0:0:0 sdd 8:48 active undef running

host B
mpatha (36001405767b90641cb54827b15aeafc2) dm-3 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 16:0:0:0 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 15:0:0:0 sdb 8:16 active undef running

root@dell-per440-07 ~ # cat /etc/multipath.conf
# device-mapper-multipath configuration file
...
defaults {
    user_friendly_names yes
    find_multipaths yes
    enable_foreign "^$"
    reservation_key file
}
...

root@dell-per440-07 ~ # rpm -qa | grep multipath
device-mapper-multipath-debuginfo-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-libs-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-debugsource-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-libs-debuginfo-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-devel-0.8.4-5.el8.bz1894103.x86_64

2.
boot vms on hosts

vm1 run on host A:
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpathb,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

vm2 run on host B:
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpatha,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

3. run "Validate a Configuration" in Failover Cluster Manager

Most steps passed, but there is a warning in "Validate Storage Spaces Persistent Reservation"; I am not sure whether it is a bug.

Validate Storage Spaces Persistent Reservation
Description: Validate that storage supports the SCSI-3 Persistent Reservation commands needed by Storage Spaces to support clustering.
Start: 11/17/2020 4:52:27 AM.
Verifying there are no Persistent Reservations, or Registration keys, on Test Disk 0 from node wqvm1.wqtest.com.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0xa for Test Disk 0 from node wqvm1.wqtest.com.
Issuing Persistent Reservation RESERVE on Test Disk 0 from node wqvm1.wqtest.com using key 0xa.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x100aa for Test Disk 0 from node wqvm2.wqtest.com.
Issuing Persistent Reservation REGISTER using RESERVATION KEY 0xa SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com to change the registered key while holding the reservation for the disk.
Issuing Persistent Reservation REGISTER using RESERVATION KEY 0x100aa SERVICE ACTION RESERVATION KEY 0x100bb for Test Disk 0 from node wqvm2.wqtest.com to change the registered key on node that is not holding the reservation for the disk.
Issuing Persistent Reservation REGISTER using RESERVATION KEY 0xb SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com to change the registered key while holding the reservation for the disk.
Failure issuing call to Persistent Reservation REGISTER. RESERVATION KEY 0xb SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com: The request could not be performed because of an I/O device error.
Test Disk 0 does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem.
Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces.
Stop: 11/17/2020 4:53:12 AM.
I have reviewed the reservation requests and there seems to be another bug in multipath. I am honestly not sure why it worked in my environment, but it seems that multipath does not modify the prkeys file.

The operation:

Issuing Persistent Reservation REGISTER using RESERVATION KEY 0xa SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com to change the registered key while holding the reservation for the disk.

succeeds and the reservation key is changed, but the content of the file /etc/multipath/prkeys does not change. So the key stored there remains 0xa, and that is why the last operation ("REGISTER. RESERVATION KEY 0xb") fails: 0xb does not match the stored key.

I will create a new bug shortly.
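The key-change sequence above can be replayed by hand with mpathpersist to watch the prkeys file. A sketch, assuming a multipath device at /dev/mapper/mpathb (a placeholder name) and prout-type 5 (Write Exclusive - Registrants Only, one common choice for cluster reservations); the whole thing is a no-op when no such device exists:

```shell
#!/bin/sh
# Replay of the failing sequence: register 0xa, reserve, then change the
# key to 0xb with a REGISTER. The suspected bug: the key changes on the
# target, but /etc/multipath/prkeys still records 0xa, so the next
# REGISTER with rk=0xb is rejected against the stale stored key.
DEV=/dev/mapper/mpathb   # placeholder device name (assumption)
if [ -b "$DEV" ]; then
    mpathpersist --out --register-ignore --param-sark=0xa "$DEV"
    mpathpersist --out --reserve --param-rk=0xa --prout-type=5 "$DEV"
    mpathpersist --out --register --param-rk=0xa --param-sark=0xb "$DEV"
    # Compare the key on the target against the key multipathd has on file:
    mpathpersist --in -k -d "$DEV"
    grep -H . /etc/multipath/prkeys
    ran=yes
else
    echo "no multipath device at $DEV; skipping"
    ran=no
fi
```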
Moving to 4.4.6 as the fixes will be introduced in RHEL 8.4
RHEL-8.4 contains the following version - device-mapper-multipath-libs-0.8.4-10.el8.x86_64 which includes the fix. Moving to QA to verify.
Hi, tested and unfortunately getting an issue. After I started the validation part inside MS Failover Cluster, it tries to do a SCSI reservation on the volume and the VM is marked as paused and unreachable due to an unknown storage error. See details below.

Reproduction steps:
1. Have an environment (hosts, clean iscsi volumes)
2. Alter /etc/multipath.conf - add the line "reservation_key file" in the defaults section (change nothing else) - restart services (multipathd, vdsmd)
3. Two VMs with installed roles for MPIO and MS Failover Clustering - they share an iSCSI volume (checked privileged IO, SCSI reservations on both VMs)
4. Configure a storage pool in one of the Windows VMs (check second VM)
5. Run the Cluster Failover Validation wizard

Result: when it tries to do the validation of SCSI-3 persistent reservation, the VM is paused.

engine log:
2021-05-10 09:50:20,926+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-23) [3fba4286] VM '1592d300-4211-45b2-b11a-d27bfc73824b'(windows-a) moved from 'Up' --> 'Paused'
2021-05-10 09:50:20,975+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [3fba4286] EVENT_ID: VM_PAUSED(1,025), VM windows-a has been paused.
2021-05-10 09:50:20,985+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [3fba4286] EVENT_ID: VM_PAUSED_ERROR(139), VM windows-a has been paused due to unknown storage error.
vdsm log from host:
2021-05-10 09:50:20,918+0300 INFO (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') abnormal vm stop device ua-a0620dda-ce8c-402f-be5e-baa545b00b25 error (vm:4936)
2021-05-10 09:50:20,918+0300 INFO (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') CPU stopped: onIOError (vm:5778)
2021-05-10 09:50:20,920+0300 INFO (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') CPU stopped: onSuspend (vm:5778)
2021-05-10 09:50:20,953+0300 WARN (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') device sdb reported I/O error (vm:3901)

libvirt/qemu log:
-blockdev '{"driver":"host_device","filename":"/rhev/data-center/mnt/blockSD/056fa6e5-36c0-4c72-b479-9ed865ed9444/images/ea3b406b-01d3-4b8c-8de2-cc8859be28e7/949765eb-b384-48cc-a915-87d2e1f7710f","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-device scsi-hd,bus=ua-083ed37d-f4df-495a-b08f-27d7a504b936.0,channel=0,scsi-id=0,lun=0,device_id=ea3b406b-01d3-4b8c-8de2-cc8859be28e7,drive=libvirt-2-format,id=ua-ea3b406b-01d3-4b8c-8de2-cc8859be28e7,bootindex=1,write-cache=on,serial=ea3b406b-01d3-4b8c-8de2-cc8859be28e7,werror=stop,rerror=stop \
-blockdev '{"driver":"host_device","filename":"/dev/mapper/3600a098038304479363f4c487045514f","aio":"native","pr-manager":"pr-helper0","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device
scsi-block,bus=ua-083ed37d-f4df-495a-b08f-27d7a504b936.0,channel=0,scsi-id=0,lun=1,share-rw=on,drive=libvirt-1-format,id=ua-a0620dda-ce8c-402f-be5e-baa545b00b25,werror=stop,rerror=stop \

Using the correct multipath packages:
device-mapper-multipath-0.8.4-10.el8.x86_64
device-mapper-multipath-libs-0.8.4-10.el8.x86_64
just checked reservation keys on the volume and they are there

# mpathpersist --in -k -d /dev/mapper/3600a098038304479363f4c487045514f
  PR generation=0x69, 3 registered reservation keys follow:
    0xd54e804a7466734d
    0xd54e804a7466734d
    0xd54e804a7466734d
The disk definition on the VM is as follows:

<disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
  <source dev='/dev/mapper/3600a09803830447a4f244c4657616f6f' index='1'>
    <seclabel model='dac' relabel='no'/>
    <reservations managed='yes'>
      <source type='unix' path='/var/lib/libvirt/qemu/domain-1-Windows-2016-2/pr-helper0.sock' mode='client'/>
    </reservations>
  </source>
  <backingStore/>
  <target dev='sdb' bus='scsi'/>
  <shareable/>
  <alias name='ua-26b4975e-e1d4-4e27-b2c6-2ea0894a571b'/>
  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>

The issue seems to be caused by using error_policy='stop' in

<driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>

which causes the VM to stop when the disk reports an error. It seems the correct setup should use error_policy='report', which passes the error to the guest (thanks Nir Soffer for pointing this out). It can be set up by configuring the engine to PropagateDiskErrors=true:

engine-config -s PropagateDiskErrors=true

This sets error_policy='report' for direct LUNs, and the Windows HA cluster validation passes without any error.

Petr, could you please double check that this really fixes the issue?
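Applying and verifying the change might look like this (a sketch: the domain name "Windows-2016-2" is taken from the socket path in the disk XML above, and both steps are guarded so the script is a no-op where the tools are absent):

```shell
#!/bin/sh
# On the RHV Manager: make the engine generate error_policy='report' for
# direct LUNs, then restart the engine so the option takes effect.
if command -v engine-config >/dev/null 2>&1; then
    engine-config -s PropagateDiskErrors=true
    systemctl restart ovirt-engine
fi
# On the host: after restarting the VM, the LUN's <driver> element should
# show error_policy='report' instead of error_policy='stop'.
if command -v virsh >/dev/null 2>&1; then
    virsh dumpxml Windows-2016-2 2>/dev/null | grep error_policy
fi
done_marker=1
```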
Should be fixed by configuration change (engine-config -s PropagateDiskErrors=true), moving to QA to verify.
(In reply to Vojtech Juranek from comment #17)
> The disk definition on the VM is as follows:
>
> <disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
>   <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
>   <source dev='/dev/mapper/3600a09803830447a4f244c4657616f6f' index='1'>
>     <seclabel model='dac' relabel='no'/>
>     <reservations managed='yes'>
>       <source type='unix' path='/var/lib/libvirt/qemu/domain-1-Windows-2016-2/pr-helper0.sock' mode='client'/>
>     </reservations>
>   </source>
>   <backingStore/>
>   <target dev='sdb' bus='scsi'/>
>   <shareable/>
>   <alias name='ua-26b4975e-e1d4-4e27-b2c6-2ea0894a571b'/>
>   <address type='drive' controller='0' bus='0' target='0' unit='1'/>
> </disk>
>
> The issue seems to be caused by using error_policy='stop' in
>
> <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
>
> which causes the VM to stop when the disk reports an error. It seems the correct setup
> should use error_policy='report', which passes the error to the guest (thanks
> Nir Soffer for pointing this out).
> It can be set up by configuring the engine to PropagateDiskErrors=true:
>
> engine-config -s PropagateDiskErrors=true
>
> This sets error_policy='report' for direct LUNs, and the Windows HA cluster
> validation passes without any error.
>
> Petr, could you please double check that this really fixes the issue?

Yes, it fixed the issue.

Verified with ovirt-engine-4.4.9.3-0.3.el8ev.noarch

It's necessary to do 2 things:

1) enable SCSI reservation in /etc/multipath.conf on all hosts:

defaults {
    ...(omitted)...
    reservation_key file
}

2) configure the engine:

# engine-config -s PropagateDiskErrors=true
# systemctl restart ovirt-engine
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.10]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0461