Bug 1370703
| Field | Value |
|---|---|
| Summary | [Balloon] WHQL job "Common scenario stress with IO" failed on 2008-32/64 |
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Version | 7.3 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | Peixiu Hou <phou> |
| Assignee | Stefan Hajnoczi <stefanha> |
| QA Contact | Yumei Huang <yuhuang> |
| CC | ailan, chayang, ghammer, jen, juzhang, knoel, lijin, lmiksik, michen, mrezanin, phou, rbalakri, stefanha, virt-maint, yfu, yuhuang |
| Target Milestone | rc |
| Keywords | Regression |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | qemu-kvm-rhev-2.6.0-25.el7 |
| Doc Type | If docs needed, set a value |
| Last Closed | 2016-11-07 21:32:13 UTC |
| Type | Bug |

Description
Peixiu Hou
2016-08-27 04:37:33 UTC
Isolation:

1. Retested this case with virtio-win-prewhql-126 + qemu (rhel7.2) on a rhel7.2 host: it passed.
   qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64
   kernel-3.10.0-327.3.1.el7.x86_64
   virtio-win-prewhql-126

2. Retested this case with virtio-win-prewhql-126 + qemu (rhel7.3) on a rhel7.2 host: it failed.
   qemu-kvm-rhev-2.6.0-22.el7.x86_64
   kernel-3.10.0-327.3.1.el7.x86_64
   virtio-win-prewhql-126

According to the above, this should be a rhel7.3 qemu issue.

Best Regards~
Peixiu Hou

Retested this case with qemu-kvm-rhev-2.6.0-14.el7.x86_64; it passed.

Best Regards~
Peixiu Hou

The relevant QEMU commit is afd9096e.

Are you using the new HCK to test the driver? I can't find this test, and it seems to belong to the old WHQL test suite.

(In reply to Gal Hammer from comment #5)
> The relevant QEMU commit is afd9096e.
>
> Are you using the new HCK to test the driver? I can't find this test, and it
> seems to belong to the old WHQL test suite.

Hi Gal,

We used WLK to test the 2008-32 and 2008-64 WHQL jobs; the WLK version is Windows Logo Kit 1.6. You can use this tool to find the "Common scenario stress with IO" job:
https://www.microsoft.com/en-us/download/details.aspx?id=39359

We used HCK to test the 2008-R2 ~ 2012-R2 WHQL jobs. It doesn't include the "Common scenario stress with IO" job.

Best Regards~
Peixiu Hou

(In reply to Peixiu Hou from comment #6)
> (In reply to Gal Hammer from comment #5)
> > The relevant QEMU commit is afd9096e.
> >
> > Are you using the new HCK to test the driver? I can't find this test, and it
> > seems to belong to the old WHQL test suite.
>
> Hi Gal,
>
> We used WLK to test the 2008-32 and 2008-64 WHQL jobs; the WLK version is
> Windows Logo Kit 1.6. You can use this tool to find the "Common scenario
> stress with IO" job:
> https://www.microsoft.com/en-us/download/details.aspx?id=39359
>
> We used HCK to test the 2008-R2 ~ 2012-R2 WHQL jobs. It doesn't include the
> "Common scenario stress with IO" job.
>
> Best Regards~
> Peixiu Hou

I was unable to reproduce with the same qemu/kernel/driver versions. Did you install the balloon service on the client machine?

A quicker way to reproduce this bug is to use the devcon.exe utility in an administrator command prompt window:

    FOR /L %i IN (1,1,130) DO devcon.exe restart "PCI\VEN_1AF4&DEV_1045"

I'm concerned this bug can be triggered by rebooting guests. Therefore it could affect customers and become urgent in RHEL 7.2.z/7.3.

Please try this simplified reproducer as root in a Linux guest:

    host# qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,file=rhel72.img,format=raw -device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5
    guest# for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done

Expected result: The for loop completes successfully.

Actual result: The VM terminates and QEMU prints "Virtqueue size exceeded".

Bug description: The vq->inuse counter is not zeroed when the device resets, so it keeps growing across resets. Once the counter exceeds the virtqueue size, virtqueue_pop() aborts with the error message above. A real-life scenario would be rebooting a guest with virtio-balloon (and stats polling enabled) 129 times.

I will backport two patches from upstream that address this issue.

Gal and Ladi,
According to comment#13, a Linux guest hits the same issue.
Could you help to confirm whether this is a qemu or a virtio-win bug, so that we can change to the correct component?

Hi Li Jin,

(In reply to lijin from comment #14)
> Gal and Ladi,
> According to comment#13, a Linux guest hits the same issue.
> Could you help to confirm whether this is a qemu or a virtio-win bug, so that
> we can change to the correct component?

This is a QEMU bug. It was found on Windows because the "Common scenario stress with IO" test repeatedly restarts the driver, but it is not virtio-win specific.

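The counter behaviour given in the bug description above can be illustrated with a small toy model. This is only a sketch and not QEMU code: the function name `simulate_reloads`, the queue size of 128, and the assumption that each driver reload leaves exactly one popped element outstanding when the device resets are all illustrative choices made to match the 129-iteration reproducer.

```shell
#!/bin/sh
# Toy model of the failure mode, not QEMU code.
# Assumption: the queue holds 128 entries (hypothetical value chosen to
# match the 129-iteration reproducer in this report).
QUEUE_SIZE=128

# simulate_reloads CYCLES RESET_CLEARS_INUSE
#   CYCLES              number of driver unload/load cycles to simulate
#   RESET_CLEARS_INUSE  1 if a device reset zeroes the counter, 0 if not
# Prints "ok" when every cycle completes; otherwise prints
# "exceeded at cycle N" and returns 1 at the first cycle whose pop
# would overrun the queue. Variables are global to stay POSIX-portable.
simulate_reloads() {
    cycles=$1
    reset_clears_inuse=$2
    inuse=0
    i=1
    while [ "$i" -le "$cycles" ]; do
        # Pop: refuse if the counter already claims the whole queue is in use.
        if [ "$inuse" -ge "$QUEUE_SIZE" ]; then
            echo "exceeded at cycle $i"
            return 1
        fi
        inuse=$((inuse + 1))
        # Driver unload resets the device before the element is returned.
        if [ "$reset_clears_inuse" -eq 1 ]; then
            inuse=0
        fi
        i=$((i + 1))
    done
    echo "ok"
}
```

Under these assumptions, `simulate_reloads 129 0` models the unfixed behaviour and prints "exceeded at cycle 129", while `simulate_reloads 129 1` models a reset that clears the counter and prints "ok".
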
Changed component to qemu according to comment#16 and comment#17.

Fix included in qemu-kvm-rhev-2.6.0-25.el7

------------------------reproduce-----------------------
Test version:
kernel: kernel-3.10.0-418.el7.x86_64
qemu: qemu-kvm-rhev-2.6.0-24.el7.x86_64

1. Boot one Linux guest with:
    -drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
    -device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5 \

2. After the guest boots up, run in the guest:
    for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done

The guest terminates immediately, and qemu outputs:
    (qemu) qemu-kvm: Virtqueue size exceeded

Reproduced this bug successfully.

------------------------verification-----------------------
Test version:
kernel: kernel-3.10.0-418.el7.x86_64
qemu: qemu-kvm-rhev-2.6.0-25.el7.x86_64

1. Boot one Linux guest with:
    -drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
    -device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5 \

2. After the guest boots up, run in the guest:
    for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done

The loop completes successfully.

So, moving this bug to VERIFIED according to the test result above.

cmd:
/usr/libexec/qemu-kvm \
    -name 'VM1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga qxl \
    -global kvm-pit.lost_tick_policy=delay \
    -chardev socket,id=qmp_monitor,path=/var/tmp/qmpmonitor,server,nowait \
    -mon chardev=qmp_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idkP1Yip \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
    -device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5 \
    -m 2048 \
    -smp 4,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu host \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -boot order=cdn,once=c,menu=on,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait \
    -netdev tap,id=hostnet,vhost=on \
    -device virtio-net-pci,netdev=hostnet,id=virtio-net \

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html