Bug 1898049 - Windows HA cluster does not work with RHV scsi reservation
Summary: Windows HA cluster does not work with RHV scsi reservation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: libvirt
Version: 4.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.10
Assignee: Vojtech Juranek
QA Contact: Petr Kubica
URL:
Whiteboard:
Depends On: 1892576 1894103 1900522
Blocks: 2019011
Reported: 2020-11-16 09:22 UTC by Roman Hodain
Modified: 2022-02-08 10:05 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the default disk configuration did not propagate disk errors to the client, which caused the virtual machine to stop. As a result, the Windows high availability cluster validator failed and one of the virtual machines in the cluster was paused. In this release, disk errors are propagated to the client using the "engine-config -s PropagateDiskErrors=true" setting. The Windows high availability cluster validator works and all tests, including iSCSI reservations, have passed.
Clone Of:
Environment:
Last Closed: 2022-02-08 10:04:44 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
cluster test report with multipath libs 1117 (87.07 KB, text/html)
2020-11-17 10:05 UTC, qing.wang


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2022:0461 0 None None None 2022-02-08 10:05:02 UTC

Description Roman Hodain 2020-11-16 09:22:49 UTC
Description of problem:
The MS failover cluster fails validation of the SCSI reservation feature.


Version-Release number of selected component (if applicable):
RHV 4

How reproducible:
100%

Steps to Reproduce:
1. Install two hypervisors
2. Create two Windows VMs, each running on one of the hosts
3. Provide a passthrough LUN to both of the VMs
4. Enable "SCSI Pass-Through"
5. Enable "Allow Privileged SCSI I/O"
6. Enable "Using SCSI Reservation"
7. Install AD on one of the VMs and join both VMs to the same domain.
8. Run the test in PowerShell

    # Test-Cluster -include "List Disks","List Disks To Be Validated","Validate Disk Failover","Validate SCSI device Vital Product Data (VPD)","Validate SCSI-3 Persistent Reservation"

Actual results:
The cluster verification fails. Logs are located in 
    C:\Windows\Cluster\Reports\

Expected results:
Test passes

Additional info:

There are a couple of identified issues:
1) libvirt version needs to be at least libvirt-4.5.0-36.el7_9.1

    Bug 1839992 - qemu-pr-helper does not pass scsi reservations due to qemu mount namespace

2) The multipath configuration on the hypervisor is incorrect. The parameter "reservation_key file" is missing. Without it, the reservation will not work properly. By default, the registered keys are stored in /etc/multipath/prkeys, which is configured by 'prkeys_file "/etc/multipath/prkeys"'.

3) There is a bug in device-mapper-multipath preventing the VM from initiating a register-and-ignore request with no key.

    Bug 1894103 - mpathpersist does not clear reservation keys if --param-sark is set to zeroes
    libvirt-4.5.0-36.el7_9.1

4) Most probably an SELinux issue was identified when the Windows VM was requesting the "SCSI page 83h VPD descriptor". It was not reproduced a second time in our lab, but a customer experienced the same issue.
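The multipath change from item 2 can be sketched as the following /etc/multipath.conf defaults fragment (the prkeys_file line only spells out the documented default path; the other options on a real host will differ):

```
defaults {
    # store per-device registration keys instead of one static key
    reservation_key file
    # default location of the key file, shown explicitly
    prkeys_file "/etc/multipath/prkeys"
}
```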

Comment 2 Paolo Bonzini 2020-11-16 09:45:50 UTC
Hi Qing Wang or Avihai, can you try reproducing this issue with the test packages from http://people.redhat.com/pbonzini/bz1894103/? Either with RHV or directly on top of RHEL should do. The main issue to figure out is the SELinux bug, so I suggest that you first install setroubleshoot. Hopefully audit2allow will give the right lead if you can reproduce.

Also, if you can run "strace -ttt -ff" on the qemu-pr-helper process (on both machines) while the MSCS test runs, that might help create a new, independent test that runs on Linux and does not require Cluster Services.

Thanks!

Comment 5 qing.wang 2020-11-17 10:05:33 UTC
Created attachment 1730059 [details]
cluster test report with multipath libs 1117

Comment 6 qing.wang 2020-11-17 10:19:44 UTC
Test passed.

Using iSCSI as the backend:

1. The hosts attach the LUN first
host A
mpathb (36001405767b90641cb54827b15aeafc2) dm-5 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 18:0:0:0 sde 8:64 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 17:0:0:0 sdd 8:48 active undef running


host B
mpatha (36001405767b90641cb54827b15aeafc2) dm-3 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 16:0:0:0 sdc 8:32 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 15:0:0:0 sdb 8:16 active undef running


root@dell-per440-07 ~ # cat /etc/multipath.conf 
# device-mapper-multipath configuration file
...

defaults {
        user_friendly_names yes
        find_multipaths yes
        enable_foreign "^$"
        reservation_key file
}
...

root@dell-per440-07 ~ # rpm -qa|grep multipath
device-mapper-multipath-debuginfo-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-libs-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-debugsource-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-libs-debuginfo-0.8.4-5.el8.bz1894103.x86_64
device-mapper-multipath-devel-0.8.4-5.el8.bz1894103.x86_64


2. Boot the VMs on the hosts
VM 1 runs on host A:
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
    -blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpathb,node-name=drive2,file.pr-manager=helper0 \
    -device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \
    \
	
VM 2 runs on host B:

-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
    -blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpatha,node-name=drive2,file.pr-manager=helper0 \
    -device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \
    \


3. Run "Validate Configuration" in Failover Cluster Manager

Most steps passed, but there is a warning in "Validate Storage Spaces Persistent Reservation"; I am not sure whether it is a bug.


Validate Storage Spaces Persistent Reservation

    Description: Validate that storage supports the SCSI-3 Persistent Reservation commands needed by Storage Spaces to support clustering.
    Start: 11/17/2020 4:52:27 AM.

    Verifying there are no Persistent Reservations, or Registration keys, on Test Disk 0 from node wqvm1.wqtest.com.
    Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0xa for Test Disk 0 from node wqvm1.wqtest.com.
    Issuing Persistent Reservation RESERVE on Test Disk 0 from node wqvm1.wqtest.com using key 0xa.
    Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x100aa for Test Disk 0 from node wqvm2.wqtest.com.
    Issuing Persistent Reservation REGISTER using RESERVATION KEY 0xa SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com to change the registered key while holding the reservation for the disk.
    Issuing Persistent Reservation REGISTER using RESERVATION KEY 0x100aa SERVICE ACTION RESERVATION KEY 0x100bb for Test Disk 0 from node wqvm2.wqtest.com to change the registered key on node that is not holding the reservation for the disk.
    Issuing Persistent Reservation REGISTER using RESERVATION KEY 0xb SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com to change the registered key while holding the reservation for the disk.
    Failure issuing call to Persistent Reservation REGISTER. RESERVATION KEY 0xb SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com: The request could not be performed because of an I/O device error.
    Test Disk 0 does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces.
    Stop: 11/17/2020 4:53:12 AM.

Comment 7 Roman Hodain 2020-11-23 09:05:07 UTC
I have reviewed the reservation requests and there seems to be another bug in multipath. I am honestly not sure why it worked in my environment, but it seems that multipath does not modify the prkeys file. The operation:

    Issuing Persistent Reservation REGISTER using RESERVATION KEY 0xa SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node wqvm1.wqtest.com to change the registered key while holding the reservation for the disk.

Succeeds and the reservation key is changed, but the content of the file /etc/multipath/prkeys does not change. The key there remains 0xa, which is why the last operation, "REGISTER RESERVATION KEY 0xb", fails: 0xb does not match the key in the file. I will create a new bug shortly.
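To make the failure sequence concrete, here is a toy Python model of the REGISTER semantics (illustrative only, not multipathd or qemu-pr-helper code; the key values mirror the test log above):

```python
# Toy model (NOT multipathd/mpathpersist code) of the SCSI-3 Persistent
# Reservation REGISTER flow described above. multipath stores the host's
# registration key in /etc/multipath/prkeys; the bug is that a key change
# (REGISTER key=0xa sark=0xb) succeeds on the LUN but the file is never
# updated, so the next REGISTER built from the stale file key conflicts.

class Lun:
    """Minimal PR state: the single key registered for this initiator."""
    def __init__(self):
        self.registered_key = None

    def register(self, reservation_key, service_action_key):
        # REGISTER AND IGNORE EXISTING KEY is modeled as reservation_key=0
        if reservation_key != 0 and reservation_key != self.registered_key:
            return "conflict"       # the guest sees this as an I/O error
        self.registered_key = service_action_key
        return "ok"

lun = Lun()
prkeys_file = None                  # stands in for /etc/multipath/prkeys

# REGISTER AND IGNORE EXISTING KEY, sark=0xa: key 0xa registered
assert lun.register(0, 0xA) == "ok"
prkeys_file = 0xA                   # file written on initial registration

# Key change 0xa -> 0xb succeeds on the LUN...
assert lun.register(prkeys_file, 0xB) == "ok"
# ...but (the bug) prkeys_file is NOT updated here, so it still holds 0xA.

# The next REGISTER is issued with the stale file key and conflicts:
print(lun.register(prkeys_file, 0xB))   # -> conflict
```

The final call conflicts because the key read back from the never-updated prkeys stand-in no longer matches the key actually registered on the LUN, which the guest observes as an I/O device error.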

Comment 8 Tal Nisan 2021-01-04 15:14:26 UTC
Moving to 4.4.6 as the fixes will be introduced in RHEL 8.4

Comment 9 Eyal Shenitzky 2021-04-19 10:47:58 UTC
RHEL-8.4 contains the following version - device-mapper-multipath-libs-0.8.4-10.el8.x86_64 which includes the fix.

Moving to QA to verify.

Comment 12 Petr Kubica 2021-05-10 07:27:26 UTC
Hi,

tested and unfortunately hit an issue. After I started the validation part inside the MS failover cluster, it tries to do a SCSI reservation on the volume and the VM is marked as paused and unreachable due to an unknown storage error. See details below.

Reproduction steps
1. Have an environment (hosts, clean iSCSI volumes)
2. Alter /etc/multipath.conf - add the line "reservation_key file" to the defaults section (nothing else changed) - restart the services (multipathd, vdsmd)
3. Create two VMs with the MPIO and MS Failover Clustering roles installed - they share an iSCSI volume (privileged I/O and SCSI reservations checked on both VMs)
4. Configure a storage pool in one of the Windows VMs (check the second VM)
5. Run the Cluster Failover Validation wizard
Result: when it tries to do the validation of SCSI-3 persistent reservation, the VM is paused

engine log:
2021-05-10 09:50:20,926+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-23) [3fba4286] VM '1592d300-4211-45b2-b11a-d27bfc73824b'(windows-a) moved from 'Up' --> 'Paused'
2021-05-10 09:50:20,975+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [3fba4286] EVENT_ID: VM_PAUSED(1,025), VM windows-a has been paused.
2021-05-10 09:50:20,985+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-23) [3fba4286] EVENT_ID: VM_PAUSED_ERROR(139), VM windows-a has been paused due to unknown storage error.

vdsm log from host
2021-05-10 09:50:20,918+0300 INFO  (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') abnormal vm stop device ua-a0620dda-ce8c-402f-be5e-baa545b00b25 error  (vm:4936)
2021-05-10 09:50:20,918+0300 INFO  (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') CPU stopped: onIOError (vm:5778)
2021-05-10 09:50:20,920+0300 INFO  (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') CPU stopped: onSuspend (vm:5778)
2021-05-10 09:50:20,953+0300 WARN  (libvirt/events) [virt.vm] (vmId='1592d300-4211-45b2-b11a-d27bfc73824b') device sdb reported I/O error (vm:3901)

libvirt/qemu log:
-blockdev '{"driver":"host_device","filename":"/rhev/data-center/mnt/blockSD/056fa6e5-36c0-4c72-b479-9ed865ed9444/images/ea3b406b-01d3-4b8c-8de2-cc8859be28e7/949765eb-b384-48cc-a915-87d2e1f7710f","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-device scsi-hd,bus=ua-083ed37d-f4df-495a-b08f-27d7a504b936.0,channel=0,scsi-id=0,lun=0,device_id=ea3b406b-01d3-4b8c-8de2-cc8859be28e7,drive=libvirt-2-format,id=ua-ea3b406b-01d3-4b8c-8de2-cc8859be28e7,bootindex=1,write-cache=on,serial=ea3b406b-01d3-4b8c-8de2-cc8859be28e7,werror=stop,rerror=stop \
-blockdev '{"driver":"host_device","filename":"/dev/mapper/3600a098038304479363f4c487045514f","aio":"native","pr-manager":"pr-helper0","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device scsi-block,bus=ua-083ed37d-f4df-495a-b08f-27d7a504b936.0,channel=0,scsi-id=0,lun=1,share-rw=on,drive=libvirt-1-format,id=ua-a0620dda-ce8c-402f-be5e-baa545b00b25,werror=stop,rerror=stop \

Using the correct multipath packages:
device-mapper-multipath-0.8.4-10.el8.x86_64
device-mapper-multipath-libs-0.8.4-10.el8.x86_64

Comment 15 Petr Kubica 2021-05-10 10:29:40 UTC
I just checked the reservation keys on the volume and they are there:

# mpathpersist --in -k -d /dev/mapper/3600a098038304479363f4c487045514f
  PR generation=0x69, 	3 registered reservation keys follow:
    0xd54e804a7466734d
    0xd54e804a7466734d
    0xd54e804a7466734d

Comment 17 Vojtech Juranek 2021-10-14 10:43:45 UTC
The disk definition on the VM is as follows:

    <disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/dev/mapper/3600a09803830447a4f244c4657616f6f' index='1'>
        <seclabel model='dac' relabel='no'/>
        <reservations managed='yes'>
          <source type='unix' path='/var/lib/libvirt/qemu/domain-1-Windows-2016-2/pr-helper0.sock' mode='client'/>
        </reservations>
      </source>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <shareable/>
      <alias name='ua-26b4975e-e1d4-4e27-b2c6-2ea0894a571b'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>


The issue seems to be caused by using error_policy='stop' in

<driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>

which causes the VM to stop when the disk reports an error. It seems the correct setup should use error_policy='report', which passes the error to the guest (thanks to Nir Soffer for pointing this out).
It can be set up by configuring the engine with PropagateDiskErrors=true:

    engine-config -s PropagateDiskErrors=true

This sets error_policy='report' for direct LUNs, and the Windows HA cluster validator passes without any error.
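As a toy illustration of the effect of this setting (not actual ovirt-engine code; only the option name and the two error_policy values come from this bug), the mapping can be modeled as:

```python
# Toy model of how a PropagateDiskErrors-style engine option selects the
# error_policy attribute emitted in the libvirt <driver> element for a
# direct LUN. Not actual ovirt-engine code.
import xml.etree.ElementTree as ET

def driver_element(propagate_disk_errors: bool) -> ET.Element:
    # 'report' propagates disk errors to the guest so the Windows HA
    # cluster validator sees the expected SCSI errors; 'stop' (the old
    # default) pauses the VM on the first I/O error instead.
    policy = "report" if propagate_disk_errors else "stop"
    return ET.Element("driver", {
        "name": "qemu", "type": "raw", "cache": "none",
        "error_policy": policy, "io": "native",
    })

print(ET.tostring(driver_element(True), encoding="unicode"))
```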

Petr, could you please double check that this really fixes the issue?

Comment 18 Vojtech Juranek 2021-10-20 11:15:01 UTC
Should be fixed by configuration change (engine-config -s PropagateDiskErrors=true), moving to QA to verify.

Comment 22 Petr Kubica 2021-11-01 16:10:02 UTC
(In reply to Vojtech Juranek from comment #17)
> [...]
> Petr, could you please double check that this really fixes the issue?
Yes, it fixed the issue.


Verified with
ovirt-engine-4.4.9.3-0.3.el8ev.noarch

It's necessary to do 2 things:
1) enable SCSI persistent reservations in /etc/multipath.conf on all hosts:

defaults {
   ...(omitted)...
   reservation_key file
}

2) configure engine:
# engine-config -s PropagateDiskErrors=true
# systemctl restart ovirt-engine

Comment 27 errata-xmlrpc 2022-02-08 10:04:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.10]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0461

