Bug 1892576
Summary: qemu-pr-helper does not properly handle reservation key registration
Product: Red Hat Enterprise Linux 8
Component: qemu-kvm
qemu-kvm sub component: Storage
Reporter: Roman Hodain <rhodain>
Assignee: Paolo Bonzini <pbonzini>
QA Contact: qing.wang <qinwang>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
CC: coli, jinzhao, juzhang, knoel, pbonzini, qinwang, ribarry, virt-maint, xuwei, zhguo
Version: 8.2
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: All
Fixed In Version: qemu-kvm-common-4.2.0-38.module+el8.4.0+9133+5346b06d.x86_64
Doc Type: If docs needed, set a value
Last Closed: 2021-05-26 09:21:27 UTC
Type: Bug
Regression: ---
Bug Depends On: 1894103, 1900522
Bug Blocks: 1710323, 1898049
Description
Roman Hodain
2020-10-29 08:49:28 UTC
> I believe that the qemu-pr-helper has to request to add the mapping.

As I understand the patchset you linked, libmpathpersist will take care of that (https://www.spinics.net/lists/dm-devel/msg32204.html). qemu-pr-helper does not issue any command; it delegates that to libmpathpersist via mpath_persistent_reserve_out. Maybe qemu-pr-helper cannot access /etc/multipath/prkeys because it drops privileges? You could try "strace" on qemu-pr-helper while you issue the sg_persist command from a Linux guest, and look for EACCES, EPERM or ENOENT errors.

By disabling SELinux the cluster test moved further, but it failed again later:

OK Issuing Persistent Reservation REGISTER AND IGNORE EXISTING for Test Disk 0 from node WIN-V34M49VGQP2.example.com.
OK Issuing call to Persistent Reservation RESERVE on Test Disk 0 from node WIN-V34M49VGQP2.example.com.
OK Issuing Persistent Reservation READ RESERVATION on Test Disk 0 from node WIN-V34M49VGQP2.example.com.
OK Issuing Persistent Reservation REGISTER AND IGNORE EXISTING for Test Disk 0 from node WIN02.example.com.
OK Issuing call to Persistent Reservation RESERVE on Test Disk 0 from node WIN02.example.com.
OK Issuing call Persistent Reservation PREEMPT on Test Disk 0 from registered node WIN02.example.com.
OK Issuing call to Persistent Reservation RESERVE on Test Disk 0 from node WIN02.example.com.
OK Issuing Persistent Reservation READ RESERVATION on Test Disk 0 from node WIN02.example.com.
OK Issuing call Persistent Reservation PREEMPT on Test Disk 0 from unregistered node WIN-V34M49VGQP2.example.com. This is expected to fail.
OK Issuing Persistent Reservation REGISTER AND IGNORE EXISTING for Test Disk 0 from node WIN-V34M49VGQP2.example.com.
OK Issuing call to Persistent Reservation RESERVE on Test Disk 0 from node WIN-V34M49VGQP2.example.com.
FAIL Issuing Persistent Reservation REGISTER AND IGNORE EXISTING to unregister a key for Test Disk 0 from node WIN-V34M49VGQP2.example.com.

Failure issuing call to Persistent Reservation REGISTER AND IGNORE EXISTING to unregister on Test Disk 0 from node WIN-V34M49VGQP2.example.com. It is expected to succeed. The request could not be performed because of an I/O device error.

> By disabling SELinux the cluster test moved further, but it failed again later:
Any clues from setroubleshoot (and from strace)?
I, unfortunately, cannot provide the data for the preempt issue: since I set setenforce to 0, the keys are updated in /etc/multipath/prkeys even if I set SELinux back to enforcing. Is there a way to trace what is requested by the qemu-pr-helper? Or do we have to use just stap or gdb for that purpose?

Some errors are printed on stderr but they end up in /dev/null, I think. We can recover them from the strace if you add the -s128 argument:

write(2, "Oct 29 13:17:17 | 3600c0ff00050b"..., 114) = -1 EPIPE (Broken pipe)

When they happen, qemu-pr-helper sends an ABORTED_COMMAND error to the VM and that presumably causes the test to fail.

sendmsg(12, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0\0\0\2\0\0\0\0p\0\v\0\0\0\0\n\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0"...,

(\2 = CHECK_CONDITION, \v = 11 = ABORTED COMMAND, \10 = 8 = LUN COMMUNICATION FAILURE)

Created attachment 1725258 [details]
list disk
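The bytes called out in the sendmsg() trace above follow the standard SCSI fixed-format sense layout from SPC (response code 0x70/0x71 in byte 0, sense key in byte 2, ASC/ASCQ in bytes 12/13), so they can be decoded mechanically. A minimal sketch, not part of qemu-pr-helper, using a sense buffer reconstructed from the trace:

```python
# Decode a fixed-format SCSI sense buffer (SPC layout):
# byte 0 = response code (0x70/0x71), byte 2 low nibble = sense key,
# bytes 12/13 = additional sense code / qualifier.
SENSE_KEYS = {0x0b: "ABORTED COMMAND"}
ASC_TABLE = {(0x08, 0x00): "LOGICAL UNIT COMMUNICATION FAILURE"}

def decode_sense(sense: bytes) -> dict:
    if sense[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    return {
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional": ASC_TABLE.get((asc, ascq), (hex(asc), hex(ascq))),
    }

# Sense portion of the traced reply: 70 00 0b ... 08 ...
trace = bytes([0x70, 0, 0x0B, 0, 0, 0, 0, 0x0A, 0, 0, 0, 0, 0x08, 0, 0, 0, 0, 0])
print(decode_sense(trace))
```

Running this on the traced bytes reports the same ABORTED COMMAND / LUN COMMUNICATION FAILURE condition described in the comment above.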
Created attachment 1725263 [details]
failover report
I tested 3 scenarios:

Red Hat Enterprise Linux release 8.2 (Ootpa)
4.18.0-193.el8.x86_64
qemu-kvm-common-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
Guest: Windows 2019

Scenario 1: the multipath service is not active; the two VMs attach the disk through different paths, like this:

vm1:
/usr/libexec/qemu-kvm \
-name pub-vm${idx} \
-machine q35 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-device pvpanic,ioport=0x505,id=idZcGD6F \
-device pcie-root-port,id=pcie-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device pcie-root-port,id=pcie-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-device pcie-root-port,id=pcie-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
-device virtio-scsi-pci,id=scsi0,bus=pcie-root-port-3,addr=0x0 \
-device virtio-scsi-pci,id=scsi1,bus=pcie-root-port-4,addr=0x0 \
-blockdev driver=file,node-name=file_disk,cache.direct=off,cache.no-flush=on,filename=/home/windbg/pub-dnode${idx}.qcow2 \
-blockdev driver=qcow2,node-name=protocol_disk,file=file_disk \
-device scsi-hd,drive=protocol_disk,bus=scsi0.0,id=os_disk,bootindex=1 \
\
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/sdb,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

vm2:
...
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/sdc,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \
...

It passed "validate configuration" in Failover Cluster Manager.

Scenario 2: multipath active, the two VMs attach the same mapper path:

-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpatha,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

It failed on the "list disks to be validated" step; please check https://bugzilla.redhat.com/attachment.cgi?id=1725258

Scenario 3: add "reservation_key file" into /etc/multipath.conf, then restart the multipathd service. The other steps are as in scenario 2. It failed on the step "Validate SCSI-3 Persistent Reservation". Please check the report https://bugzilla.redhat.com/attachment.cgi?id=1725263

cat /etc/multipath.conf

defaults {
    user_friendly_names yes
    find_multipaths yes
    enable_foreign "^$"
    reservation_key file
}

blacklist_exceptions {
    property "(SCSI_IDENT_|ID_WWN)"
}

blacklist {
}

Hi,
The request failed in libmpathpersist because the reservation was done with a different key than the one stored in the prkeys file. The content was generated during the test.

170913 read(19, "# Multipath persistent reservation keys, Version : 1.0\n# NOTE: this file is automatically maintained by the multipathd program.\n# You should not need to edit this file in normal circumstances.
\n#\n# Format:\n# prkey wwid\n#\n0x499979fb7466734d 3600c0ff00050b9fda179aa5e01000000\n", 4096) = 273
170913 close(19) = 0
170913 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
170913 write(2, "Oct 30 12:11:32 | 3600c0ff00050b9fda179aa5e01000000: configured reservation key doesn't match: 0x499979fb7466734d\n", 114) = -1 EPIPE (Broken pipe)

(In reply to qing.wang from comment #12)
> I tested 3 scenarios: [scenarios 1-3 and the qemu command lines quoted in full above]

It seems that you run the VMs on the same hypervisor. They have to run on different hosts.

Created attachment 1726108 [details]
failover report 2
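The /etc/multipath/prkeys file read in the strace above is a simple two-column "prkey wwid" table behind comment lines, so the key-mismatch check can be reproduced offline. A minimal parser sketch, assuming only the format shown in the trace (this is not multipathd's actual implementation):

```python
# Parse /etc/multipath/prkeys content: comment lines start with '#',
# data lines are "<prkey> <wwid>" with the key written in hex by multipathd.
def parse_prkeys(text: str) -> dict:
    keys = {}  # wwid -> registered reservation key
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        prkey, wwid = line.split()
        keys[wwid] = int(prkey, 16)
    return keys

# Content as read back in the strace output above.
sample = (
    "# Multipath persistent reservation keys, Version : 1.0\n"
    "# Format:\n"
    "# prkey wwid\n"
    "0x499979fb7466734d 3600c0ff00050b9fda179aa5e01000000\n"
)
print(hex(parse_prkeys(sample)["3600c0ff00050b9fda179aa5e01000000"]))  # -> 0x499979fb7466734d
```

A mismatch between this stored key and the key used for the reservation is exactly what the "configured reservation key doesn't match" message reports.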
Using iscsi as backend:

1. Hosts attach the LUN first:

host A
mpathc (360014058fb70e4134fa40ed9421c1fff) dm-5 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 17:0:0:0 sdd 8:48 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 18:0:0:0 sde 8:64 active undef running

host B
mpathd (360014058fb70e4134fa40ed9421c1fff) dm-5 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 39:0:0:0 sdh 8:112 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 38:0:0:0 sdg 8:96 active undef running

2. Boot the VMs on the hosts:

vm1 runs on host A:
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpathc,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

vm2 runs on host B:
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpathd,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

3. Run "validate configuration" in Failover Cluster Manager.

It failed on the step "Validate SCSI-3 Persistent Reservation". Please check https://bugzilla.redhat.com/attachment.cgi?id=1726108

I added some debugging into qemu-pr-helper, and the PR OUT requests are the following.

Host1, where the failure seems to be:

OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=a5171b77466734d result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_RES_SA key=a5171b77466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_PREE_SA key=a5171b77466734d sa_key=df74c487466734d result=24
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=a5171b77466734d result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_RES_SA key=a5171b77466734d sa_key=00000000 result=24
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_REL_SA key=a5171b77466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=a5171b77466734d result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_RES_SA key=a5171b77466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_REL_SA key=a5171b77466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2

Host2:

OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=0
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=df74c487466734d result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_RES_SA key=df74c487466734d sa_key=00000000 result=24
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_PREE_SA key=df74c487466734d sa_key=a5171b77466734d result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_RES_SA key=df74c487466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_REL_SA key=df74c487466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=2
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=df74c487466734d result=0
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_CLEAR_SA key=df74c487466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=5 rq_servact=MPATH_PROUT_REL_SA key=df74c487466734d sa_key=00000000 result=0
OUT rq_scope=0 rq_type=0 rq_servact=MPATH_PROUT_REG_IGN_SA key=00000000 sa_key=00000000 result=0

The logs are printed out just before

r = mpath_persistent_reserve_out(fd, rq_servact, rq_scope, rq_type, paramp, noisy, verbose);

and the result is the result of mpath_reconstruct_sense(fd, r, sense).

It seems that the MPATH_PROUT_REG_IGN_SA with return code 2 is the problem. If I try that manually I get:

[root@dell-r440-01 ~]# mpathpersist -i -k /dev/mapper/3600c0ff00050b9fda179aa5e01000000
  PR generation=0xb9f, 1 registered reservation key follows:
    0x123456
[root@dell-r440-01 ~]# mpathpersist --out --register-ignore /dev/mapper/3600c0ff00050b9fda179aa5e01000000
Nov 03 08:56:26 | 3600c0ff00050b9fda179aa5e01000000: configured reservation key doesn't match: 0xa5171b077466734d
PR out: command failed
[root@dell-r440-01 ~]# mpathpersist --out --register-ignore --param-sark=0000000 /dev/mapper/3600c0ff00050b9fda179aa5e01000000
Nov 03 08:57:28 | 3600c0ff00050b9fda179aa5e01000000: configured reservation key doesn't match: 0xa5171b077466734d
PR out: command failed

Looks like a bug in libmpathpersist; what do you think, Paolo? BTW, it works if anything other than 000000 is provided. If I use sg_persist directly on the path, then it works even with 000000:

[root@dell-r440-01 ~]# sg_persist --in -k /dev/sdv
  DellEMC   ME4               G280
  Peripheral device type: disk
  PR generation=0xbac, 2 registered reservation keys follow:
    0x345678
    0x345678
[root@dell-r440-01 ~]# sg_persist --out --register-ignore --param-sark=000000 /dev/sdv
  DellEMC   ME4               G280
  Peripheral device type: disk
[root@dell-r440-01 ~]# sg_persist --in -k /dev/sdv
  DellEMC   ME4               G280
  Peripheral device type: disk
  PR generation=0xbad, 1 registered reservation key follows:
    0x345678

> I added some debugging into the qemu-pr-helper and the pr out requests are the following:
If you send me the patch privately, I'll be happy to integrate it upstream in some way!
Regarding SELinux, the /etc/multipath/prkeys file probably should have been in /var too.
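The failing operation in the traces above is REGISTER AND IGNORE EXISTING KEY with a zero service-action key, which per SPC-3 simply removes the initiator's registration; sg_persist accepts it, while libmpathpersist rejects it when the key does not match the configured reservation_key. A toy in-memory model of the expected device-side behavior (purely illustrative, not qemu-pr-helper or libmpathpersist code):

```python
# Toy model of SCSI-3 Persistent Reservation registrations for one LUN.
# Per SPC-3, REGISTER AND IGNORE EXISTING KEY replaces the initiator's
# registration with the service-action key regardless of the current key;
# a zero service-action key removes the registration ("unregister"),
# which is the step the cluster validation test expects to succeed.
class PRDevice:
    def __init__(self):
        self.keys = {}  # initiator id -> registered reservation key

    def register_ignore(self, initiator: str, sa_key: int) -> None:
        if sa_key == 0:
            self.keys.pop(initiator, None)  # zero key unregisters
        else:
            self.keys[initiator] = sa_key

dev = PRDevice()
dev.register_ignore("host1", 0xA5171B77466734D)  # register
dev.register_ignore("host1", 0)                  # unregister, should succeed
print(dev.keys)  # -> {}
```

In the failing case libmpathpersist refuses to even send the command (result=2) instead of letting the device perform the unregistration, which is what looked like a libmpathpersist bug in the comments above.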
Can you test again with packages from http://people.redhat.com/pbonzini/bz1894103/ (and if it passes, with SELinux enabled)? If you include your qemu-pr-helper test logs, I can try to make a standalone testcase that doesn't require MS Cluster Services.

Hi,
I did it and the VM gets paused during the test:

2020-11-12 13:18:33,511+0000 INFO (libvirt/events) [virt.vm] (vmId='1df143b1-a82f-4cdc-8ea1-6f56993ad5bd') abnormal vm stop device ua-fa2f04f1-a920-4a23-9b40-5dff869575e2 error (vm:4493)
2020-11-12 13:18:33,512+0000 INFO (libvirt/events) [virt.vm] (vmId='1df143b1-a82f-4cdc-8ea1-6f56993ad5bd') CPU stopped: onIOError (vm:5551)
2020-11-12 13:18:33,514+0000 INFO (libvirt/events) [virt.vm] (vmId='1df143b1-a82f-4cdc-8ea1-6f56993ad5bd') CPU stopped: onSuspend (vm:5551)
2020-11-12 13:18:33,684+0000 WARN (libvirt/events) [virt.vm] (vmId='1df143b1-a82f-4cdc-8ea1-6f56993ad5bd') device sdb reported I/O error (vm:3643)

sdb is the passthrough disk. I have rerun the test with error_policy='enospace' for the passthrough disk; I assume this is the expected configuration for SCSI reservations (the policy was 'paused' before). I can confirm that the test finished successfully after this setting and applying the test package.

Can you confirm that it fails with SELinux on a fresh host install? And this time it would be great to have the setroubleshoot logs for that too.

(Also, do you have your extended qemu-pr-helper logs to write a standalone reproducer? Or a "strace -ttt" of qemu-pr-helper on both hosts would be a great alternative.)

(In reply to Paolo Bonzini from comment #26)
> Can you confirm that it fails with SELinux on a fresh host install? And this
> time it would be great to have the setroubleshoot logs for that too.
>
> (Also do you have your extended qemu-pr-helper logs to write a standalone
> reproducer?
> Or a "strace -ttt" of qemu-pr-helper on both hosts would be
> great alternatively)

Hi,
I tried to reproduce the SELinux issue on another two hypervisors and I did not hit it. I, unfortunately, cannot reinstall the hypervisors: in order to run the tests I have to misuse our production environment, and we do not have enough resources to accommodate all the VMs from two hypervisors. The environment is memory-overloaded.

BTW, here is the patch, but it is not anything I am proud of. It was just a brute-force patch to get some data from the test. I will attach it to the BZ.

Since bug 1894103 is ON_QA, should this be re-tested with a fresh setup?

(In reply to Paolo Bonzini from comment #29)
> Since bug 1894103 is ON_QA, should this be re-tested with a fresh setup?

Hi Qing,
Can you re-test it too? Thanks.

Test passed on:

Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
4.18.0-262.el8.x86_64
qemu-kvm-common-4.2.0-38.module+el8.4.0+9133+5346b06d.x86_64
edk2-ovmf-20200602gitca407c7246bf-4.el8.noarch
virtio-win-prewhql-0.1-191.iso
device-mapper-multipath-0.8.4-6.el8.x86_64

Test steps:

1. Add "reservation_key file" into /etc/multipath.conf:

cat /etc/multipath.conf
defaults {
    user_friendly_names yes
    find_multipaths yes
    enable_foreign "^$"
    reservation_key file
}

2. Hosts attach the LUN with multipath:

host A
mpathb (36001405c93f45b64db7490cacbdb5465) dm-4 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 17:0:0:0 sdd 8:48 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 18:0:0:0 sde 8:64 active undef running

host B
mpathc (36001405c93f45b64db7490cacbdb5465) dm-5 LIO-ORG,disk0
size=4.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 18:0:0:0 sdg 8:96 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 17:0:0:0 sdf 8:80 active undef running

3.
Boot the VMs on the hosts:

vm1 runs on host A:
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpathb,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

vm2 runs on host B:
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
-blockdev driver=raw,file.driver=host_device,cache.direct=off,cache.no-flush=on,file.filename=/dev/mapper/mpathc,node-name=drive2,file.pr-manager=helper0 \
-device scsi-block,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive2,id=scsi0-0-0-0,share-rw=on,bootindex=2 \

4. Run "validate configuration" in Failover Cluster Manager.

This bug has been verified in https://bugzilla.redhat.com/show_bug.cgi?id=1892576#c31

Hi,
Since the issue described in this bug should be resolved (VERIFIED), could you please close this bug with resolution 'CURRENTRELEASE' if it got fixed? If the fix is not released yet, check whether this will ever get fixed; in case of a negative answer, please change it to WONTFIX. If there is anything else to be done on this BZ, if it is still active, not released yet, and we actually intend to release it, then please ignore my message.

Please note: for bugs which are not included in an errata, please add the 'TestOnly' keyword; bugs with the 'TestOnly' keyword will be closed automatically after GA. TestOnly: use this when there is no code delivery involved, or when code is already upstream and will be incorporated automatically into the next release, for testing purposes only.

Thank you.

According to comment 31, close with CURRENTRELEASE.