Bug 2079347

Summary: Guest boot hangs with 100% CPU consumption when SCSI disks use the same iothread
Product: Red Hat Enterprise Linux 9
Reporter: qing.wang <qinwang>
Component: qemu-kvm
Assignee: Stefan Hajnoczi <stefanha>
qemu-kvm sub component: virtio-blk,scsi
QA Contact: qing.wang <qinwang>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: chayang, coli, crobinso, jinzhao, juzhang, kkiwi, kwolf, lijin, nsoffer, qizhu, stefanha, virt-maint, xuwei, ymankad, zhenyzha
Version: 9.1
Keywords: Regression, TestBlocker
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: qemu-kvm-7.0.0-4.el9
Doc Type: If docs needed, set a value
Last Closed: 2022-11-15 09:54:42 UTC

Description qing.wang 2022-04-27 12:36:19 UTC
Description of problem:
The guest fails to boot if multiple disks share the same data plane (iothread).

The qemu-kvm process shows 100% CPU consumption on the host.


Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux release 9.1 Beta (Plow)
5.14.0-80.el9.x86_64
qemu-kvm-7.0.0-1.el9.x86_64
seabios-bin-1.16.0-1.el9.noarch

How reproducible:

100%
Steps to Reproduce:
1. Boot the VM with disks using the same iothread:
/usr/libexec/qemu-kvm \
  -name testvm \
  -machine q35 \
  -m  6G \
  -smp 2 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
   \
   \
  -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x3,chassis=1 \
  -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x3.0x1,bus=pcie.0,chassis=2 \
  -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x3.0x2,bus=pcie.0,chassis=3 \
  -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x3.0x3,bus=pcie.0,chassis=4 \
  -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x3.0x4,bus=pcie.0,chassis=5 \
  -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x3.0x5,bus=pcie.0,chassis=6 \
  -device pcie-root-port,id=pcie-root-port-6,port=0x6,addr=0x3.0x6,bus=pcie.0,chassis=7 \
  -device pcie-root-port,id=pcie-root-port-7,port=0x7,addr=0x3.0x7,bus=pcie.0,chassis=8 \
  -device pcie-root-port,id=pcie_extra_root_port_0,bus=pcie.0,addr=0x4  \
  -object iothread,id=iothread0 \
  -object iothread,id=iothread1 \
  -object iothread,id=iothread2 \
  -device virtio-scsi-pci,id=scsi0,bus=pcie-root-port-4,iothread=iothread0 \
  -device virtio-scsi-pci,id=scsi1,bus=pcie-root-port-5,iothread=iothread0 \
  -device virtio-scsi-pci,id=scsi2,bus=pcie-root-port-6,iothread=iothread0 \
  -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,node-name=drive_image1   \
  -device scsi-hd,id=os,drive=drive_image1,bus=scsi0.0,bootindex=0,serial=OS_DISK   \
  \
  -blockdev driver=qcow2,file.driver=file,file.filename=/home/kvm_autotest_root/images/data1.qcow2,node-name=data_image1   \
  -device scsi-hd,id=data1,drive=data_image1,bus=scsi1.0,bootindex=1,serial=DATA_DISK1   \
  -blockdev driver=qcow2,file.driver=file,file.filename=/home/kvm_autotest_root/images/data2.qcow2,node-name=data_image2   \
  -device scsi-hd,id=data2,drive=data_image2,bus=scsi2.0,bootindex=2,serial=DATA_DISK2  \
  -vnc :5 \
  -monitor stdio \
  -qmp tcp:0:5955,server=on,wait=off \
  -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b7,id=nic1,netdev=nicpci,bus=pcie-root-port-7 \
  -netdev tap,id=nicpci \
  -chardev socket,id=qmpmonitor1,path=/var/tmp/run-qmp.log,server=on,wait=off \
  -mon chardev=qmpmonitor1,mode=control \
  -chardev socket,id=hmpmonitor1,path=/var/tmp/run-hmp.log,server=on,wait=off \
  -mon chardev=hmpmonitor1,mode=readline \
  -chardev socket,id=charserial1,path=/var/tmp/run-serial.log,server=on,wait=off \
  -device isa-serial,chardev=charserial1,id=serial1 \
  -chardev file,path=/var/tmp/run-seabios.log,id=charseabios1 \
  -device isa-debugcon,chardev=charseabios1,iobase=0x402 \
  -D debug.log \
  -boot menu=on,reboot-timeout=1000,strict=off

2. Log in to the guest.


Actual results:
Guest boot hangs and login is not possible.
Expected results:
The guest boots successfully.
Additional info:

No issue found on:
Red Hat Enterprise Linux release 9.0 (Plow)
5.14.0-70.13.1.el9_0.x86_64
qemu-kvm-6.2.0-11.el9_0.2.x86_64

The issue does not occur if the SCSI controllers use different iothreads or if iothreads are disabled.
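
As a rough sketch of the first workaround (one iothread per controller), the controller arguments from the reproducer could be generated as below. The helper name controller_args is hypothetical, and the port numbering (pcie-root-port-4 onward) only mirrors the command line in the steps to reproduce; it is an assumption, not a requirement.

```shell
#!/bin/sh
# Hypothetical helper: print the -object/-device arguments for N virtio-scsi
# controllers, giving each controller its own iothread instead of sharing
# iothread0. IDs and root-port numbers follow the reproducer above.
controller_args() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        port=$((i + 4))
        printf '%s\n' "-object iothread,id=iothread$i"
        printf '%s\n' "-device virtio-scsi-pci,id=scsi$i,bus=pcie-root-port-$port,iothread=iothread$i"
        i=$((i + 1))
    done
}

# Print the arguments for the three controllers used in this report.
controller_args 3
```

The printed lines would replace the three "-object iothread" lines and the three "-device virtio-scsi-pci" lines of the reproducer; the rest of the command line stays unchanged.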

Comment 2 qing.wang 2022-04-27 12:57:21 UTC
A boot hang also occurs if a virtio-blk disk and a SCSI disk use the same iothread.
Screenshots are available at:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2079347/2022-04-27/ 

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'EPYC-Rome',+kvm_pv_unhalt \
    \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -object iothread,id=iothread2 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    \
    -blockdev node-name=file_stg1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/stg1.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-blk-pci,id=stg1,drive=drive_stg1,write-cache=on,serial=stg1,iothread=iothread0,bus=pcie-root-port-3,addr=0x0 \
    \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-scsi-pci,id=virtio_scsi_pci1,iothread=iothread0,bus=pcie-root-port-4,addr=0x0 \
    -blockdev node-name=file_stg2,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/stg2.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_stg2,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_stg2 \
    -device scsi-hd,id=stg2,bus=virtio_scsi_pci1.0,drive=drive_stg2,write-cache=on,serial=stg2 \
    \
    -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
    -device virtio-net-pci,mac=9a:39:15:33:c9:1f,id=idykOgu8,netdev=idNL7Fql,bus=pcie-root-port-5,addr=0x0  \
    -netdev tap,id=idNL7Fql,vhost=on  \
    -vnc :5  \
    -monitor stdio \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=7
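
To spot this configuration in a long command line, a small filter along the following lines may help. It is only a sketch: the function name shared_iothreads is hypothetical, and it assumes the command line is available as text on stdin and that devices reference their iothread as "iothread=<id>", as in the commands in this report.

```shell
#!/bin/sh
# Hypothetical helper: read a QEMU command line on stdin and print every
# iothread ID that is referenced by more than one device, with the count.
# "-object iothread,id=..." definitions are not matched because they do not
# contain the string "iothread=".
shared_iothreads() {
    grep -o 'iothread=[^, ]*' \
        | sort \
        | uniq -c \
        | awk '$1 > 1 { sub(/^iothread=/, "", $2); print $2, $1 }'
}

# Example using the two devices from the command above that share iothread0.
printf '%s\n' \
    '-object iothread,id=iothread0' \
    '-device virtio-blk-pci,id=stg1,drive=drive_stg1,iothread=iothread0,bus=pcie-root-port-3' \
    '-device virtio-scsi-pci,id=virtio_scsi_pci1,iothread=iothread0,bus=pcie-root-port-4' \
    | shared_iothreads
```

An empty result means that no iothread is referenced by more than one device in the given text.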

Comment 3 Stefan Hajnoczi 2022-04-27 15:34:05 UTC
I have posted patches to solve the CPU consumption issue but I'm not sure it is related to this BZ:
https://lists.gnu.org/archive/html/qemu-devel/2022-04/msg04724.html

Brew is currently broken so I'm unable to provide an RPM for testing. The patches I linked to apply cleanly on top of qemu-kvm-7.0.0-1.el9.x86_64 or you could build qemu.git/master if you want.

Please let me know if you get a chance to try out these patches and then we could debug further from there. Thanks!

Comment 5 qing.wang 2022-04-28 06:42:39 UTC
Checking CPU consumption on the host shows 100% for qemu-kvm while the boot is blocked:
107116 root      20   0 8617572   1.9g  19852 S 100.3   3.0   5:30.62 qemu-kvm

I am not sure they are the same issue, so I have opened a new bug to track the CPU consumption:
Bug 2079701 - 100% CPU consumption in IOThread
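
For reference, the %CPU value can be pulled out of a top process line such as the one above with a one-line awk filter. This is a sketch that assumes the default procps top column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), where %CPU is the 9th field; the function name top_cpu is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the %CPU column from top process lines on stdin.
# Assumes the default procps layout, in which %CPU is the 9th field.
top_cpu() {
    awk '{ print $9 }'
}

# The sample line reported in this comment.
line='107116 root      20   0 8617572   1.9g  19852 S 100.3   3.0   5:30.62 qemu-kvm'
printf '%s\n' "$line" | top_cpu
```

On a live host the same filter could be fed from top in batch mode, for example "top -b -n 1 | grep qemu-kvm | top_cpu".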

Comment 10 Klaus Heinrich Kiwi 2022-05-09 13:18:29 UTC
(In reply to Stefan Hajnoczi from comment #3)
> I have posted patches to solve the CPU consumption issue but I'm not sure it
> is related to this BZ:
> https://lists.gnu.org/archive/html/qemu-devel/2022-04/msg04724.html
> 
> Brew is currently broken so I'm unable to provide an RPM for testing. The
> patches I linked to apply cleanly on top of qemu-kvm-7.0.0-1.el9.x86_64 or
> you could build qemu.git/master if you want.
> 
> Please let me know if you get a chance to try out these patches and then we
> could debug further from there. Thanks!

Since this is a blocker, can we try again and report if Brew continues to be a problem here?

Comment 11 Stefan Hajnoczi 2022-05-10 08:53:13 UTC
I have kicked off a build here:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=45182730

I am on sick leave. If this needs to be merged urgently please re-assign it to another engineer. Thanks!

Comment 12 qing.wang 2022-05-11 08:59:07 UTC
No issue found on 
Red Hat Enterprise Linux release 9.1 Beta (Plow)
5.14.0-86.el9.x86_64
qemu-kvm-7.0.0-2.el9.stefanha202205100940.x86_64
seabios-bin-1.16.0-1.el9.noarch
edk2-ovmf-20220221gitb24306f15d-1.el9.noarch
virtio-win-prewhql-0.1-215.iso


/usr/libexec/qemu-kvm \
  -name testvm \
  -machine q35 \
  -m  6G \
  -smp 2 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
   \
   \
  -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x3,chassis=1 \
  -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x3.0x1,bus=pcie.0,chassis=2 \
  -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x3.0x2,bus=pcie.0,chassis=3 \
  -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x3.0x3,bus=pcie.0,chassis=4 \
  -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x3.0x4,bus=pcie.0,chassis=5 \
  -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x3.0x5,bus=pcie.0,chassis=6 \
  -device pcie-root-port,id=pcie-root-port-6,port=0x6,addr=0x3.0x6,bus=pcie.0,chassis=7 \
  -device pcie-root-port,id=pcie-root-port-7,port=0x7,addr=0x3.0x7,bus=pcie.0,chassis=8 \
  -device pcie-root-port,id=pcie_extra_root_port_0,bus=pcie.0,addr=0x4  \
  -object iothread,id=iothread0 \
  -object iothread,id=iothread1 \
  -object iothread,id=iothread2 \
  -device virtio-scsi-pci,id=scsi0,bus=pcie-root-port-4,iothread=iothread0 \
  -device virtio-scsi-pci,id=scsi1,bus=pcie-root-port-5,iothread=iothread0 \
  -device virtio-scsi-pci,id=scsi2,bus=pcie-root-port-6,iothread=iothread0 \
  -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,node-name=drive_image1   \
  -device scsi-hd,id=os,drive=drive_image1,bus=scsi0.0,bootindex=0,serial=OS_DISK   \
  \
  -blockdev driver=qcow2,file.driver=file,file.filename=/home/kvm_autotest_root/images/stg1.qcow2,node-name=data_image1   \
  -device scsi-hd,id=data1,drive=data_image1,bus=scsi1.0,bootindex=1,serial=DATA_DISK1   \
  -blockdev driver=qcow2,file.driver=file,file.filename=/home/kvm_autotest_root/images/stg2.qcow2,node-name=data_image2   \
  -device scsi-hd,id=data2,drive=data_image2,bus=scsi2.0,bootindex=2,serial=DATA_DISK2  \
  -vnc :5 \
  -monitor stdio \
  -qmp tcp:0:5955,server=on,wait=off \
  -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b7,id=nic1,netdev=nicpci,bus=pcie-root-port-7 \
  -netdev tap,id=nicpci \
  -chardev socket,id=qmpmonitor1,path=/var/tmp/run-qmp.log,server=on,wait=off \
  -mon chardev=qmpmonitor1,mode=control \
  -chardev socket,id=hmpmonitor1,path=/var/tmp/run-hmp.log,server=on,wait=off \
  -mon chardev=hmpmonitor1,mode=readline \
  -chardev socket,id=charserial1,path=/var/tmp/run-serial.log,server=on,wait=off \
  -device isa-serial,chardev=charserial1,id=serial1 \
  -chardev file,path=/var/tmp/run-seabios.log,id=charseabios1 \
  -device isa-debugcon,chardev=charseabios1,iobase=0x402 \
  -D debug.log \
  -boot menu=on,reboot-timeout=1000,strict=off

The guest can boot and CPU consumption is normal:
top -d1 -n5|grep qemu|awk '{print $10}'
12.0
3.0
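
A check like this could be scripted by classifying the sampled values against a threshold. The sketch below is only illustrative: the function name cpu_status and the limit of 90 are assumptions and do not come from this report.

```shell
#!/bin/sh
# Hypothetical helper: read one CPU percentage per line on stdin and print
# "busy" if any sample is at or above the given limit, otherwise "normal".
cpu_status() {
    limit=$1
    awk -v limit="$limit" '
        $1 + 0 >= limit { busy = 1 }
        END { print (busy ? "busy" : "normal") }
    '
}

# The two samples shown above, checked against an assumed limit of 90%.
printf '12.0\n3.0\n' | cpu_status 90
```

With the value from comment 5 (100.3) the same check would report "busy".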

Comment 14 qing.wang 2022-05-11 09:11:01 UTC
*** Bug 2079701 has been marked as a duplicate of this bug. ***

Comment 18 Yanan Fu 2022-05-23 05:43:23 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 21 qing.wang 2022-05-24 08:00:43 UTC
Test passed using the steps from comment #12 on:
Red Hat Enterprise Linux release 9.1 Beta (Plow)
5.14.0-96.el9.x86_64
qemu-kvm-7.0.0-4.el9.x86_64
seabios-bin-1.16.0-2.el9.noarch
edk2-ovmf-20220221gitb24306f15d-1.el9.noarch
virtio-win-prewhql-0.1-219.iso
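
To check whether a host already carries the fixed build, the installed version-release string can be compared against qemu-kvm-7.0.0-4.el9. The sketch below uses GNU sort -V, which only approximates RPM's own version comparison; the function name version_ge is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version string $1 sorts at or after $2
# under GNU "sort -V". This approximates, but is not identical to, the
# comparison that rpm itself performs.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

fixed='7.0.0-4.el9'
for v in '7.0.0-1.el9' '7.0.0-4.el9'; do
    if version_ge "$v" "$fixed"; then
        echo "$v: contains the fix"
    else
        echo "$v: older than the fixed build"
    fi
done
```

On a real host the first argument could come from "rpm -q --qf '%{VERSION}-%{RELEASE}' qemu-kvm". Note that qemu-kvm-6.2.0 also sorts before the fixed build, although this report found no issue on that release.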

Comment 24 errata-xmlrpc 2022-11-15 09:54:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967