Bug 1974807

Summary: [aarch64] Launch guest with virtio-gpu-pci and virtual smmu causes "virtio_gpu_dequeue_ctrl_func" ERROR
Product: Red Hat Enterprise Linux 8
Component: kernel
kernel sub component: Virtualization
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Version: 8.5
Target Milestone: beta
Target Release: 8.5
Hardware: aarch64
OS: Linux
Reporter: Yihuang Yu <yihyu>
Assignee: Eric Auger <eric.auger>
QA Contact: Yihuang Yu <yihyu>
CC: drjones, eric.auger, jinzhao, juzhang, kraxel, lcapitulino, peterx, qzhang, zhenyzha
Keywords: Triaged
Flags: pm-rhel: mirror+
Last Closed: 2021-06-29 14:57:47 UTC
Type: Bug
Bug Blocks: 1885765    

Description Yihuang Yu 2021-06-22 15:14:38 UTC
Description of problem:
This issue appears once bug 1971821 is fixed. When launching a guest with "-device virtio-gpu-pci,iommu_platform=on" and "-machine virt,gic-version=host,iommu=smmuv3", the guest boots, but the console prints virtio_gpu error messages and the VNC display shows a black screen.

Version-Release number of selected component (if applicable):
host kernel: 5.13.0-0.rc4.33.el9.aarch64
guest kernel: kernel-4.18.0-316.el8.aarch64
qemu version: qemu-kvm-6.0.0-5.el9.aarch64

How reproducible:
always

Steps to Reproduce:
1. Launch a guest with iommu and smmuv3
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
    -blockdev node-name=file_aavmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel850-aarch64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
    -machine virt,gic-version=host,iommu=smmuv3,memory-backend=mem-machine_mem,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0,iommu_platform=on \
    -m 9216 \
    -object memory-backend-ram,size=9216M,id=mem-machine_mem  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'host' \
    -serial unix:'/tmp/avocado_i72teslk/serial-serial0-20210622-102506-U1g0aqA9',server=on,wait=off \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0,iommu_platform=on \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-aarch64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-net-pci,mac=9a:10:11:62:e0:fa,rombar=0,id=idYy6la2,netdev=idi6Dvfu,bus=pcie-root-port-4,addr=0x0,iommu_platform=on  \
    -netdev tap,id=idi6Dvfu,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew \
    -enable-kvm

2. Check the console output and vnc interface

Actual results:
2021-06-22 10:25:34: [    4.470512] [drm] pci: virtio-gpu-pci detected at 0000:03:00.0
2021-06-22 10:25:34: [    4.472270] [drm] features: -virgl +edid
2021-06-22 10:25:34: [    4.474606] [drm] number of scanouts: 1
2021-06-22 10:25:34: [    4.475786] [drm] number of cap sets: 0
2021-06-22 10:25:34: [    4.477680] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
2021-06-22 10:25:34: [    4.482960] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x106)
2021-06-22 10:25:34: [    4.486070] Console: switching to colour frame buffer device 128x48
2021-06-22 10:25:34: [    4.486668] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.486982] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.488421] random: fast init done
2021-06-22 10:25:34: [    4.489963] sd 0:0:0:0: Power-on or device reset occurred
2021-06-22 10:25:34: [    4.490531] virtio_gpu virtio0: [drm] fb0: virtio_gpudrmfb frame buffer device
2021-06-22 10:25:34: [    4.494351] sd 0:0:0:0: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
2021-06-22 10:25:34: [    4.495655] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.496483] sd 0:0:0:0: [sda] Write Protect is off
2021-06-22 10:25:34: [    4.498805] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.501227] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.501783] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.503842] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
2021-06-22 10:25:34: [    4.504724] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.508894]  sda: sda1 sda2 sda3
2021-06-22 10:25:34: [    4.509802] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.514538] sd 0:0:0:0: [sda] Attached SCSI disk
2021-06-22 10:25:34: [    4.514608] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
2021-06-22 10:25:34: [    4.527798] sd 0:0:0:0: Attached scsi generic sg0 type 0

VNC displays a black screen.
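
The response and command numbers in the log above can be decoded against the virtio-gpu control types. The following is a minimal Python sketch, not part of the original report. It assumes the numeric values of the `virtio_gpu_ctrl_type` enum from the VIRTIO specification (only a subset is listed), and the helper name `decode_gpu_errors` is made up for illustration.

```python
import re
from collections import Counter

# Subset of virtio_gpu_ctrl_type values, as assumed from the VIRTIO spec.
COMMANDS = {
    0x0100: "GET_DISPLAY_INFO",
    0x0101: "RESOURCE_CREATE_2D",
    0x0102: "RESOURCE_UNREF",
    0x0103: "SET_SCANOUT",
    0x0104: "RESOURCE_FLUSH",
    0x0105: "TRANSFER_TO_HOST_2D",
    0x0106: "RESOURCE_ATTACH_BACKING",
    0x0107: "RESOURCE_DETACH_BACKING",
}
RESPONSES = {
    0x1200: "ERR_UNSPEC",
    0x1201: "ERR_OUT_OF_MEMORY",
    0x1202: "ERR_INVALID_SCANOUT_ID",
    0x1203: "ERR_INVALID_RESOURCE_ID",
    0x1204: "ERR_INVALID_CONTEXT_ID",
    0x1205: "ERR_INVALID_PARAMETER",
}

# Matches the guest kernel message format shown in the console output.
ERROR_RE = re.compile(
    r"virtio_gpu_dequeue_ctrl_func.*\*ERROR\* response (0x[0-9a-fA-F]+) "
    r"\(command (0x[0-9a-fA-F]+)\)"
)


def decode_gpu_errors(log_text):
    """Count (command name, response name) pairs found in a console log."""
    counts = Counter()
    for match in ERROR_RE.finditer(log_text):
        response = int(match.group(1), 16)
        command = int(match.group(2), 16)
        # Unknown codes are kept as hex strings rather than guessed.
        key = (COMMANDS.get(command, hex(command)),
               RESPONSES.get(response, hex(response)))
        counts[key] += 1
    return counts
```

Under these assumptions, "response 0x1200 (command 0x106)" reads as ERR_UNSPEC for RESOURCE_ATTACH_BACKING, and "response 0x1203 (command 0x105)" as ERR_INVALID_RESOURCE_ID for TRANSFER_TO_HOST_2D. This suggests the backing-store attach failed first and the later transfers then hit an unusable resource.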

Expected results:
No such error messages appear, and VNC displays the graphical interface.

Additional info:

Comment 1 Eric Auger 2021-06-22 16:42:58 UTC
Hmm, you said it happened with a RHEL9 guest, but in the command above I see you launch a RHEL8.5 guest. Could you please clarify?

Comment 2 Yihuang Yu 2021-06-23 00:57:48 UTC
(In reply to Eric Auger from comment #1)
> Hmm, you said it happened with a RHEL9 guest, but in the command above I
> see you launch a RHEL8.5 guest. Could you please clarify?

Eric, this problem affects both RHEL8 and RHEL9 guests. I first hit it with a RHEL9 guest, but now that bug 1971821 is fixed, the RHEL8 guest shows the same problem. This bug tracks the RHEL8 guest.

Comment 3 Eric Auger 2021-06-23 15:49:05 UTC
I am a total beginner at graphics on ARM. I am looking for advice on how to exercise virtio-gpu with RHEL8.5/9.

I installed a RHEL8.5 VM with virt-manager, adding vnc and virtio-gpu. I got the graphical installer and completed the install (note that I could do this only on RHEL8.5, because on RHEL9.0 the mouse was not working properly). Then I patched the xml to add the smmuv3 and added <driver iommu='on'/> to the block, net and virtio-gpu-pci devices. I cannot reproduce the reported issue. Yihuang, is there any way for me to run exactly the same test as you?

Thanks

Eric
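
The XML patching described in this comment could be scripted roughly as follows. This is a hedged Python sketch rather than the procedure actually used. It assumes the libvirt domain XML conventions `<iommu model='smmuv3'/>` under `<devices>` and a `<driver iommu='on'/>` child on virtio devices, and the function name `enable_smmu` is hypothetical.

```python
import xml.etree.ElementTree as ET


def enable_smmu(domain_xml):
    """Return domain XML with an SMMUv3 and iommu='on' on virtio devices."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        raise ValueError("domain XML has no <devices> element")

    # Assumed libvirt syntax for a virtual SMMUv3 on the aarch64 virt machine.
    if devices.find("iommu") is None:
        ET.SubElement(devices, "iommu", {"model": "smmuv3"})

    for dev in devices:
        model = dev.find("model")
        target = dev.find("target")
        is_virtio = (
            (dev.tag in ("interface", "video")
             and model is not None and model.get("type") == "virtio")
            or (dev.tag == "disk"
                and target is not None and target.get("bus") == "virtio")
            or (dev.tag == "controller"
                and dev.get("model") == "virtio-scsi")
        )
        if is_virtio:
            # Reuse an existing <driver> element so its attributes survive.
            driver = dev.find("driver")
            if driver is None:
                driver = ET.SubElement(dev, "driver")
            driver.set("iommu", "on")
    return ET.tostring(root, encoding="unicode")
```

The output would typically be fed back through `virsh define`. Which devices count as "virtio" here is a simplification and may need adjusting for a given domain.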

Comment 4 Eric Auger 2021-06-24 12:04:35 UTC
Correction: with the libvirt test case above I can reproduce the issue *sometimes*, apparently in less than 50% of runs. So my test case is sufficient. This happens with the latest ark kernel as the guest.

../..
[   67.330862] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[   67.540790] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[   70.690482] virtio_gpu_dequeue_ctrl_func: 14 callbacks suppressed

Gerd, do you have any clue of what could be the cause? I suspect a problem in the virtio-gpu driver?

Comment 5 Gerd Hoffmann 2021-06-24 13:38:04 UTC
> Gerd, do you have any clue of what could be the cause? I suspect a problem
> in the virtio-gpu driver?

Anything in the logs on the host?
Anything in the logs with "-d guest_errors" added to qemu cmd line?
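
For a scripted reproducer, the extra logging can be appended to the argument list. A small Python sketch follows, assuming the command is held as a list of strings. The helper name `with_guest_error_log` is hypothetical, while `-d guest_errors` and `-D <logfile>` are standard QEMU logging options.

```python
def with_guest_error_log(qemu_argv, log_path):
    """Return a copy of qemu_argv that logs guest errors to log_path."""
    # -d guest_errors logs invalid operations performed by the guest;
    # -D sends QEMU's log output to a file instead of stderr.
    return list(qemu_argv) + ["-d", "guest_errors", "-D", log_path]
```

For example, `with_guest_error_log(["/usr/libexec/qemu-kvm", "-enable-kvm"], "/tmp/guest_errors.log")` leaves the original list untouched and returns the extended one.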

Comment 6 Eric Auger 2021-06-24 13:52:29 UTC
I cannot reproduce with upstream qemu, whereas with downstream qemu the issue occurs with 20% reproducibility.

Interestingly, the following upstream commit is missing in both 8.5 and 9.0:
9049f8bc44  virtio-gpu: handle partial maps properly (6 weeks ago) <Gerd Hoffmann>

This was the first issue found when investigating BZ1932279. We then found the guest kernel issue, which made us forget the bug in qemu ;-) That bug no longer produces an assert as it did in the past.

With this fix backported to downstream qemu 9.0 I can no longer reproduce. So I will send a backport for both 8.5 and 9.0.
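
With an intermittent failure, a single clean boot says little about whether a fix works. As a rough guide, the Python sketch below estimates how many consecutive clean runs are needed before "fixed" is a reasonable conclusion. It assumes each run fails independently with the quoted 20% rate, which is an idealisation, and the function name `clean_runs_needed` is made up for illustration.

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that (1 - failure_rate)**n <= 1 - confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # An unfixed bug survives n runs unnoticed with probability (1 - p)**n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

At a 20% failure rate, `clean_runs_needed(0.2)` gives 14, since 0.8^14 is roughly 0.044.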

Comment 7 Gerd Hoffmann 2021-06-24 14:25:15 UTC
> Interestingly, the following upstream commit is missing in both 8.5 and 9.0:
> 9049f8bc44  virtio-gpu: handle partial maps properly (6 weeks ago) <Gerd
> Hoffmann>

Ah, right, it landed after the 6.0 release, so it was not picked up by the rebase.

> So I will send a backport for both 8.5 and 9.0.

thanks.

Comment 8 Qunfang Zhang 2021-06-28 02:02:48 UTC
(In reply to Eric Auger from comment #6)
> I cannot reproduce with upstream qemu, whereas with downstream qemu the
> issue occurs with 20% reproducibility.
> 
> Interestingly, the following upstream commit is missing in both 8.5 and 9.0:
> 9049f8bc44  virtio-gpu: handle partial maps properly (6 weeks ago) <Gerd
> Hoffmann>
> 
> This was the first issue found when investigating BZ1932279. We then found
> the guest kernel issue, which made us forget the bug in qemu ;-) That bug
> no longer produces an assert as it did in the past.
> 
> With this fix backported to downstream qemu 9.0 I can no longer reproduce.
> So I will send a backport for both 8.5 and 9.0.

Hi Eric,

This bug has devel_ack+ and qa_ack+, so which DTM and ITM should we set?

Thanks,
Qunfang

Comment 9 Eric Auger 2021-06-29 14:57:47 UTC
So in the end this turns out to be a qemu bug tracked by BZ1932279. Let's close this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1932279 ***