| Summary: | [Balloon] Whql Job "Common scenario stress with IO" failed on 2008-32/64 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Peixiu Hou <phou> | ||||
| Component: | qemu-kvm-rhev | Assignee: | Stefan Hajnoczi <stefanha> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Yumei Huang <yuhuang> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.3 | CC: | ailan, chayang, ghammer, jen, juzhang, knoel, lijin, lmiksik, michen, mrezanin, phou, rbalakri, stefanha, virt-maint, yfu, yuhuang | ||||
| Target Milestone: | rc | Keywords: | Regression | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | qemu-kvm-rhev-2.6.0-25.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-11-07 21:32:13 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
|
Description
Peixiu Hou
2016-08-27 04:37:33 UTC
Isolation:
1. Retested this case with virtio-win-prewhql-126 + qemu (rhel7.2) on a rhel7.2 host; it passed.
qemu-kvm-rhev-2.3.0-31.el7_2.3.x86_64
kernel-3.10.0-327.3.1.el7.x86_64
virtio-win-prewhql-126
2. Retested this case with virtio-win-prewhql-126 + qemu (rhel7.3) on a rhel7.2 host; it failed.
qemu-kvm-rhev-2.6.0-22.el7.x86_64
kernel-3.10.0-327.3.1.el7.x86_64
virtio-win-prewhql-126
According to the above, this should be a rhel7.3 qemu issue.
Best Regards~
Peixiu Hou

Retested this case with qemu-kvm-rhev-2.6.0-14.el7.x86_64; it passes.
Best Regards~
Peixiu Hou

The relevant QEMU commit is afd9096e.
Are you using the new HCK to test the driver? I can't find this test and it seems it belongs to the old WHQL test suite.

(In reply to Gal Hammer from comment #5)
> The relevant QEMU's commit is afd9096e.
>
> Are you using the new HCK to test the driver? I can't find this test and it
> seems it belongs to the old WHQL test suite.

Hi Gal,

We used WLK to test the 2008-32&64 WHQL job; the WLK version used is Windows Logo Kit 1.6. You can use this tool to find the "Common scenario stress with IO" job.
https://www.microsoft.com/en-us/download/details.aspx?id=39359

We used HCK to test the 2008-R2 ~ 2012-R2 WHQL jobs. It doesn't include the "Common scenario stress with IO" job.

Best Regards~
Peixiu Hou

(In reply to Peixiu Hou from comment #6)
> (In reply to Gal Hammer from comment #5)
> > The relevant QEMU's commit is afd9096e.
> >
> > Are you using the new HCK to test the driver? I can't find this test and it
> > seems it belongs to the old WHQL test suite.
>
> Hi Gal,
>
> We used WLK to test 2008-32&64 whql job, used wlk version is Windows Logo
> Kit 1.6. You can use this tool to find the "Common scenario stress with IO"
> job.
> https://www.microsoft.com/en-us/download/details.aspx?id=39359.
>
> And we used HCK to test 2008-R2 ~ 2012-R2 whql job. It doesn't include
> "Common scenario stress with IO" job.
>
> Best Regards~
> Peixiu Hou

I was unable to reproduce with the same qemu/kernel/driver versions. Did you install the balloon service on the client machine?

A quicker way to reproduce this bug is to use the devcon.exe utility in an administrator command prompt window:
FOR /L %i IN (1,1,130) DO devcon.exe restart "PCI\VEN_1AF4&DEV_1045"

I'm concerned this bug can be triggered by rebooting guests. Therefore it could affect customers and become urgent in RHEL 7.2.z/7.3.

Please try this simplified reproducer as root in a Linux guest:
host# qemu-system-x86_64 -enable-kvm -m 1024 -cpu host -drive if=virtio,file=rhel72.img,format=raw -device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5
guest# for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done

Expected result:
The for loop completes successfully.

Actual result:
The VM terminates and QEMU prints "Virtqueue size exceeded".

Bug description:
The problem is that the vq->inuse counter is not zeroed when the device resets. This causes virtqueue_pop() to abort with the error message when the counter exceeds the virtqueue size.

A real-life scenario would be rebooting a guest with virtio-balloon (and stats polling enabled) 129 times.

I will backport two patches from upstream that address this issue.

Gal and Ladi,
According to comment #13, a Linux guest hits the same issue.
Could you help confirm whether this is a qemu bug or a virtio-win bug, so that we can change to the correct component?

Hi Li Jin,

(In reply to lijin from comment #14)
> Gal and Ladi,
> According to comment#13,linux guest hit the same issue.
> Could you help to confirm whether this bug is qemu or virtio-win bug,so that
> we can change to the correct component.

This is a QEMU bug. It was found on Windows because the "Common scenario stress with IO" test repeatedly restarts the driver, but it is not virtio-win specific.

Changing component to qemu according to comment #16 and comment #17.

Fix included in qemu-kvm-rhev-2.6.0-25.el7

------------------------reproduce-----------------------
Test version:
kernel: kernel-3.10.0-418.el7.x86_64
qemu: qemu-kvm-rhev-2.6.0-24.el7.x86_64
1. boot one linux guest with:
-drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
-device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5 \
2. after guest boot up, in the guest, do:
for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done
Guest terminates immediately, and qemu output:
(qemu) qemu-kvm: Virtqueue size exceeded
Reproduced this bug successfully.
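Why the guest dies on the 129th reload can be illustrated with a toy model of the counter named in the bug description. This is only a sketch and not QEMU code: the queue size of 128 is an assumption chosen to be consistent with the 129-iteration reproducer, the function name run_reload_loop is made up for illustration, and the model assumes that each driver reload leaves exactly one popped element unaccounted for in vq->inuse.

```shell
#!/usr/bin/env bash
# Toy model of the vq->inuse leak; NOT QEMU code.
# QUEUE_SIZE=128 is an assumption chosen to match the 129-iteration reproducer.
QUEUE_SIZE=128

# run_reload_loop ITERATIONS ZERO_ON_RESET
#   ZERO_ON_RESET=0 models a reset that leaves the counter alone (the bug),
#   ZERO_ON_RESET=1 models a reset that zeroes the counter.
# Prints the 1-based iteration at which the simulated pop fails,
# or 0 if every iteration completes.
run_reload_loop() {
    local iterations=$1 zero_on_reset=$2
    local inuse=0 i
    for ((i = 1; i <= iterations; i++)); do
        # Driver load: the device pops one element and holds on to it.
        if ((inuse >= QUEUE_SIZE)); then
            # The real device would report "Virtqueue size exceeded" here.
            echo "$i"
            return 0
        fi
        inuse=$((inuse + 1))
        # Driver unload: the device is reset and the held element is dropped.
        if ((zero_on_reset)); then
            inuse=0
        fi
    done
    echo 0
}

echo "counter kept across reset:   $(run_reload_loop 129 0)"
echo "counter zeroed on reset:     $(run_reload_loop 129 1)"
```

Under these assumptions the first call prints 129 (the pop fails on the 129th reload) and the second prints 0 (the loop completes), which matches the iteration count used in the reproducer.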
------------------------verification-----------------------
Test version:
kernel: kernel-3.10.0-418.el7.x86_64
qemu: qemu-kvm-rhev-2.6.0-25.el7.x86_64
1. boot one linux guest with:
-drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
-device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5 \
2. after guest boot up, in the guest, do:
for ((i = 0; i < 129; i++)); do rmmod virtio_balloon; modprobe virtio_balloon; done
The loop completes successfully.
So, move this bug to VERIFIED according to the test result above.
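The pass/fail decision in the runs above hinges on whether QEMU prints the error message, so it can be scripted. A minimal sketch, assuming QEMU's output has been captured to a file (for example by redirecting its stderr); the function name classify_qemu_log and the demo file contents are hypothetical:

```shell
#!/usr/bin/env bash
# classify_qemu_log FILE
# Prints "reproduced" if FILE contains the error message from this report,
# otherwise "not reproduced". FILE is assumed to hold captured QEMU output.
classify_qemu_log() {
    local log_file=$1
    if grep -qF 'Virtqueue size exceeded' "$log_file"; then
        echo "reproduced"
    else
        echo "not reproduced"
    fi
}

# Self-contained demo using two made-up logs.
demo_dir=$(mktemp -d)
printf '%s\n' '(qemu) qemu-kvm: Virtqueue size exceeded' > "$demo_dir/failing.log"
printf '%s\n' '(qemu) ' > "$demo_dir/passing.log"
echo "failing.log: $(classify_qemu_log "$demo_dir/failing.log")"
echo "passing.log: $(classify_qemu_log "$demo_dir/passing.log")"
rm -rf "$demo_dir"
```

Matching on the fixed string keeps the check independent of the binary name that prefixes the message (qemu-kvm in the output above).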
cmd:
/usr/libexec/qemu-kvm \
-name 'VM1' \
-sandbox off \
-machine pc \
-nodefaults \
-vga qxl \
-global kvm-pit.lost_tick_policy=delay \
-chardev socket,id=qmp_monitor,path=/var/tmp/qmpmonitor,server,nowait \
-mon chardev=qmp_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idkP1Yip \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
-drive id=virtio-blk-drive,if=virtio,format=qcow2,file=/home/rhel7.3.qcow2 \
-device virtio-balloon-pci,id=virtio-balloon0,guest-stats-polling-interval=5 \
-m 2048 \
-smp 4,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu host \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-boot order=cdn,once=c,menu=on,strict=off \
-enable-kvm \
-monitor stdio \
-qmp tcp:0:4444,server,nowait \
-netdev tap,id=hostnet,vhost=on \
-device virtio-net-pci,netdev=hostnet,id=virtio-net
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2673.html |