Bug 1970030 - In FC multipath scenario VM is paused if one of two paths go down
Summary: In FC multipath scenario VM is paused if one of two paths go down
Keywords:
Status: CLOSED DUPLICATE of bug 1854659
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.5
Assignee: Virtualization Maintenance
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-09 16:36 UTC by Nils Koenig
Modified: 2023-03-14 19:06 UTC (History)
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-11 01:52:35 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
multipath.conf (7.14 KB, text/plain)
2021-06-09 16:36 UTC, Nils Koenig

Description Nils Koenig 2021-06-09 16:36:23 UTC
Created attachment 1789627 [details]
multipath.conf

Hi @all,

we observed a VM being paused in an FC multipath environment when one of the two paths to a directly attached LUN with SCSI passthrough goes down.

The QEMU command line for the disk is:

-blockdev {"driver":"host_device","filename":"/dev/mapper/36000d31005771c00000000000000005a","aio":"native","node-name":"libvirt-4-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
 -blockdev {"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"} -device scsi-block,bus=ua-c31a9cae-ba40-4c5c-8748-0e9407099879.0,channel=0,scsi-id=0,lun=1,share-rw=on,drive=libvirt-4-format,id=ua-7c4e45e3-e577-48e2-bb04-c463c3596bb9,werror=stop,rerror=stop

AFAIR, the werror/rerror options above determine what should happen in case of a read or write error.
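For context, libvirt generates those werror/rerror options from the disk's error policy in the domain XML. A minimal sketch of what the corresponding disk element would look like (reconstructed from libvirt's domain format, not copied from this report):

```xml
<disk type='block' device='lun'>
  <!-- error_policy maps to werror=, rerror_policy maps to rerror= on the QEMU command line -->
  <driver name='qemu' type='raw' cache='none' io='native' error_policy='stop' rerror_policy='stop'/>
  <source dev='/dev/mapper/36000d31005771c00000000000000005a'/>
  <target dev='sda' bus='scsi'/>
</disk>
```

With device='lun' libvirt uses -device scsi-block (SCSI passthrough); with device='disk' it uses -device scsi-hd, matching the two command lines quoted here.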


Without SCSI passthrough, werror|rerror is also set to stop, but the VM is not paused. The only difference I can spot is -device scsi-hd vs. -device scsi-block:

-blockdev {"driver":"host_device","filename":"/dev/mapper/36000d31005771c000000000000000066","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
-device scsi-hd,bus=ua-eff86ed8-d744-43d1-8f69-463c4974691f.0,channel=0,scsi-id=0,lun=0,device_id=3686d9c0-d674-4ddd-b801-a69d7657ea47,drive=libvirt-1-format,id=ua-3686d9c0-d674-4ddd-b801-a69d7657ea47,bootindex=2,write-cache=on,serial=3686d9c0-d674-4ddd-b801-a69d7657ea47,werror=stop,rerror=stop

The multipath is terminated on the hypervisor and the device created by the multipath device mapper is passed into the VM.

# multipath -ll
36000d31005771c00000000000000005a dm-1 COMPELNT,Compellent Vol
size=10T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 14:0:0:1  sdc  8:32   active ready running
  `- 15:0:0:1  sds  65:32  active ready running

Multipath configuration is handled by vdsm, but see the attached multipath.conf.
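Worth noting: the `features='1 queue_if_no_path'` flag in the multipath -ll output above means dm-multipath queues I/O when *all* paths are down, but the failure of a single path should normally be handled transparently by failover. A sketch of how that queueing behavior is typically configured (illustrative values only; the actual settings are in the attached multipath.conf, which vdsm manages):

```
defaults {
    user_friendly_names  no
    # "queue" means queue I/O indefinitely when no path is available,
    # equivalent to the "1 queue_if_no_path" feature shown by multipath -ll.
    # A numeric value instead fails I/O after that many retry intervals.
    no_path_retry        queue
}
```

If SCSI passthrough (scsi-block) bypasses part of this queueing on path loss, that would explain why only the passthrough disk pauses the VM.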

Comment 1 Nils Koenig 2021-06-09 16:51:44 UTC
My questions are:

1. Is it a bug or a feature, that the VM is paused? 

2. How to enable tracing to see what caused the pause?

3. Would setting werror|rerror=report be a workaround, and how to achieve that with RHV?
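Regarding question 3, the libvirt-level equivalent would presumably be an error policy of 'report', which makes QEMU pass the error to the guest instead of pausing. A sketch (hypothetical; in RHV the domain XML is generated by the engine, so this would likely require engine configuration or a vdsm hook rather than manual editing):

```xml
<disk type='block' device='lun'>
  <!-- 'report' propagates I/O errors to the guest instead of pausing the VM -->
  <driver name='qemu' type='raw' error_policy='report' rerror_policy='report'/>
  <source dev='/dev/mapper/36000d31005771c00000000000000005a'/>
  <target dev='sda' bus='scsi'/>
</disk>
```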

Cheers,
Nils

Comment 2 qing.wang 2021-06-10 01:34:41 UTC
This issue looks the same as Bug 1854659 - qemu-kvm: SCSI passthrough does not work properly with an underlying DM-multipath device?

Comment 3 Klaus Heinrich Kiwi 2021-06-10 12:49:26 UTC
(In reply to qing.wang from comment #2)
> This issue looks like same as Bug 1854659 - qemu-kvm: SCSI passthrough does
> not work properly with an underlying DM-multipath device?

Good catch. It certainly feels like the same bug, which would mean this is in the kernel/DM domain (and not virt).

Paolo, can you double-check and update the status of this bug accordingly?

Comment 4 Stefan Hajnoczi 2021-06-10 13:24:59 UTC
Thanks, Qing Wang. Sounds like a duplicate of the BZ you linked.

Comment 5 qing.wang 2021-06-11 01:52:35 UTC

*** This bug has been marked as a duplicate of bug 1854659 ***

