Created attachment 1194598 (125BLN200832)

Description of problem:
The balloon WHQL job "Common scenario stress with IO" fails on Windows Server 2008 32/64-bit. While the job is running, the guest quits and QEMU prints the error message "Virtqueue size exceeded".

Version-Release number of selected component (if applicable):
kernel-3.10.0-493.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64
virtio-win-prewhql-125

How reproducible:
100%

Steps to Reproduce:
1. Boot a client guest:
   /usr/libexec/qemu-kvm -name 125BLN200832EYJ -enable-kvm -m 4G -smp 4 \
   -uuid 267a6a93-2c55-42b3-b5e5-c1cd984ad009 -nodefconfig -nodefaults \
   -chardev socket,id=charmonitor,path=/tmp/125BLN200832EYJ,server,nowait \
   -mon chardev=charmonitor,id=monitor,mode=control \
   -rtc base=localtime,driftfix=slew -boot order=cd,menu=on \
   -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
   -drive file=125BLN200832EYJ,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none \
   -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
   -drive file=en_windows_server_2008_datacenter_enterprise_standard_sp2_x86_dvd_342333.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
   -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
   -drive file=125BLN200832EYJ.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none \
   -global isa-fdc.driveA=drive-fdc0-0-0 \
   -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 \
   -device e1000,netdev=hostnet0,id=net0,mac=00:52:1f:43:46:da,bus=pci.0,addr=0x3 \
   -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 \
   -device usb-tablet,id=input0 -vnc 0.0.0.0:1 -vga cirrus \
   -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,disable-legacy=off,disable-modern=off
2. Run the job "Common scenario stress with IO".
3. Check the guest state.

Actual results:
The guest quits and the job fails.

Expected results:
The job passes.

Additional info:
1. Also tried without virtio 1.0; it fails as well.
2. The WLK log is attached (125BLN200832.cpk).
Isolation:
1. Retested this case with virtio-win-prewhql-126 + qemu (RHEL 7.2) on a RHEL 7.2 host: passed.
   qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64
   kernel-3.10.0-327.3.1.el7.x86_64
   virtio-win-prewhql-126
2. Retested this case with virtio-win-prewhql-126 + qemu (RHEL 7.3) on a RHEL 7.2 host: failed.
   qemu-kvm-rhev-2.6.0-22.el7.x86_64
   kernel-3.10.0-327.3.1.el7.x86_64
   virtio-win-prewhql-126

Since only the qemu version differs between the two runs, this appears to be an issue in the RHEL 7.3 qemu.

Best Regards~
Peixiu Hou
Retested this case with qemu-kvm-rhev-2.6.0-14.el7.x86_64; it passes.

Best Regards~
Peixiu Hou
The relevant QEMU commit is afd9096e.

Are you using the new HCK to test the driver? I can't find this test; it seems to belong to the old WHQL test suite.
(In reply to Gal Hammer from comment #5)
> The relevant QEMU's commit is afd9096e.
>
> Are you using the new HCK to test the driver? I can't find this test and it
> seems it belongs to the old WHQL test suite.

Hi Gal,

We use WLK to run the WHQL jobs for Windows Server 2008 32/64-bit; the WLK version is Windows Logo Kit 1.6. You can use this tool to find the "Common scenario stress with IO" job:
https://www.microsoft.com/en-us/download/details.aspx?id=39359

We use HCK for the Windows Server 2008 R2 through 2012 R2 WHQL jobs, which don't include the "Common scenario stress with IO" job.

Best Regards~
Peixiu Hou
(In reply to Peixiu Hou from comment #6)

I was unable to reproduce with the same qemu/kernel/driver versions. Did you install the balloon service on the client machine?
A quicker way to reproduce this bug is to run the devcon.exe utility in an administrator command prompt window:

FOR /L %i IN (1,1,130) DO devcon.exe restart "PCI\VEN_1AF4&DEV_1045"
I'm concerned this bug can be triggered by rebooting guests, so it could affect customers and become urgent for RHEL 7.2.z/7.3.

Please try this simplified reproducer as root in a Linux guest:

host# qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,file=rhel72.img,format=raw -device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5
guest# for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done

Expected result: The for loop completes successfully.
Actual result: The VM terminates and QEMU prints "Virtqueue size exceeded".

Bug description: The vq->inuse counter is not zeroed when the device resets. Once the counter reaches the virtqueue size, virtqueue_pop() aborts with the error message above. A real-life scenario would be rebooting a guest with virtio-balloon (and stats polling enabled) 129 times.

I will backport two patches from upstream that address this issue.
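To make the counter mechanism in the bug description concrete, here is a toy Python model. It is not QEMU code: `ToyVirtQueue`, `reload_driver_n_times`, and the queue size of 128 are assumptions chosen for illustration. It assumes each driver load makes the device pop one stats buffer that it still holds when the next reset arrives, so one count leaks per reload; in a real guest the exact iteration that fails depends on timing.

```python
"""Toy model of the vq->inuse accounting bug (illustrative, not QEMU source)."""

QUEUE_SIZE = 128  # assumption: size of the balloon virtqueue in this model


class VirtqueueSizeExceeded(Exception):
    """Stands in for QEMU printing "Virtqueue size exceeded" and exiting."""


class ToyVirtQueue:
    """Hypothetical stand-in for a virtqueue; it only tracks the inuse count."""

    def __init__(self, size: int = QUEUE_SIZE) -> None:
        self.size = size
        self.inuse = 0  # buffers popped by the device but not yet pushed back

    def pop(self) -> None:
        # Sanity check: abort once the in-flight count reaches the ring size.
        if self.inuse >= self.size:
            raise VirtqueueSizeExceeded("Virtqueue size exceeded")
        self.inuse += 1

    def reset(self, zero_inuse: bool) -> None:
        # Buggy behaviour: leave inuse untouched. Fixed behaviour: clear it.
        if zero_inuse:
            self.inuse = 0


def reload_driver_n_times(n: int, zero_inuse: bool) -> int:
    """Return how many driver loads complete before the modelled VM aborts."""
    vq = ToyVirtQueue()
    completed = 0
    for _ in range(n):
        vq.reset(zero_inuse)  # unloading/reloading the guest driver resets the device
        try:
            vq.pop()  # device takes the stats buffer and keeps holding it
        except VirtqueueSizeExceeded:
            return completed
        completed += 1
    return completed
```

With the counter left untouched on reset, `reload_driver_n_times(129, zero_inuse=False)` stops after 128 loads, while `reload_driver_n_times(129, zero_inuse=True)` completes all 129. In this model, zeroing `inuse` on reset is what removes the failure; it does not try to describe everything the backported patches change.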
Gal and Ladi,

According to comment #13, a Linux guest hits the same issue. Could you help confirm whether this is a qemu bug or a virtio-win bug, so that we can move it to the correct component?
Hi Li Jin,

(In reply to lijin from comment #14)
> Gal and Ladi,
> According to comment#13,linux guest hit the same issue.
> Could you help to confirm whether this bug is qemu or virtio-win bug,so that
> we can change to the correct component.

This is a QEMU bug. It was found on Windows because the "Common scenario stress with IO" test repeatedly restarts the driver, but it is not virtio-win specific.
Changing the component to qemu according to comment #16 and comment #17.
Fix included in qemu-kvm-rhev-2.6.0-25.el7
------------------------reproduce-----------------------
Test version:
kernel: kernel-3.10.0-418.el7.x86_64
qemu: qemu-kvm-rhev-2.6.0-24.el7.x86_64

1. Boot a Linux guest with:
-drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
-device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5
2. After the guest boots, run in the guest:
for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done

The guest terminates immediately, and qemu outputs:
(qemu) qemu-kvm: Virtqueue size exceeded

The bug is reproduced.

------------------------verification-----------------------
Test version:
kernel: kernel-3.10.0-418.el7.x86_64
qemu: qemu-kvm-rhev-2.6.0-25.el7.x86_64

1. Boot a Linux guest with:
-drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
-device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5
2. After the guest boots, run in the guest:
for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done

The loop completes successfully.

Based on the test results above, moving this bug to VERIFIED.

cmd:
/usr/libexec/qemu-kvm \
    -name 'VM1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga qxl \
    -global kvm-pit.lost_tick_policy=delay \
    -chardev socket,id=qmp_monitor,path=/var/tmp/qmpmonitor,server,nowait \
    -mon chardev=qmp_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idkP1Yip \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
    -device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5 \
    -m 2048 \
    -smp 4,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu host \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -boot order=cdn,once=c,menu=on,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait \
    -netdev tap,id=hostnet,vhost=on \
    -device virtio-net-pci,netdev=hostnet,id=virtio-net
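The guest-side loop used in the reproduce and verification steps can also be scripted, for example when repeating the check across builds. This is a minimal Python sketch, not part of the original test; `reload_virtio_balloon` and `run_command` are hypothetical names, and the script must run as root inside the guest. On an affected qemu the whole VM is killed partway through the loop, so the script never gets to report the failure itself; the host-side QEMU message ("Virtqueue size exceeded") remains the real signal.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit status (rmmod/modprobe need root)."""
    return subprocess.run(list(argv), check=False).returncode


def reload_virtio_balloon(iterations: int = 129, run: Runner = run_command) -> int:
    """Unload and reload virtio_balloon; return the number of full iterations done.

    Stops early if rmmod or modprobe fails inside the guest. On an affected
    host the VM is terminated mid-loop, so the call simply never returns.
    """
    for i in range(iterations):
        for argv in (["rmmod", "virtio_balloon"], ["modprobe", "virtio_balloon"]):
            if run(argv) != 0:
                return i  # iteration i did not complete
    return iterations
```

On a fixed build, `reload_virtio_balloon()` should return 129 and the guest should stay up. Passing a fake `run` callable lets the loop logic be checked without root or a VM.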
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2673.html