Bug 1608226
Summary:           [virtual-network][mq] warning "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" when booting win8+ guests with multi-queue

Product:           Red Hat Enterprise Linux 7
Component:         qemu-kvm-rhev
Version:           7.6
Status:            CLOSED ERRATA
Severity:          high
Priority:          high
Keywords:          Regression
Target Milestone:  rc
Hardware:          Unspecified
OS:                Unspecified
Fixed In Version:  qemu-kvm-rhev-2.12.0-33.el7
Reporter:          Yu Wang <wyu>
Assignee:          jason wang <jasowang>
QA Contact:        Lei Yang <leiyang>
CC:                aadam, ailan, chayang, fdeutsch, gwatson, jasowang, jen, jomurphy, juzhang, maxime.coquelin, mjankula, mrezanin, mtessun, ngu, pezhang, rbarry, siliu, virt-bugs, virt-maint, wyu, ybendito, yturgema, yvugenfi
Cloned To:         1702608 (view as bug list)
Bug Blocks:        1649160, 1702608, 1824126
Type:              Bug
Last Closed:       2019-08-22 09:18:48 UTC
Description — Yu Wang, 2018-07-25 07:24:53 UTC
This only occurs when the guest is booted with multi-queue.

Comment 4 (Sameeh Jubran):
On my setup:
    4.13.9-200.fc26.x86_64
    qemu upstream

I can always reproduce the BSOD with vhost on no matter what, even with NetKVM build 144.

Comment (Sameeh Jubran, in reply to comment #4):
Please ignore; that comment was meant for BZ#1599631.

Comment (xiywang):
hw/net/vhost_net.c +249:

    r = vhost_net_set_backend(&net->dev, &file);

got errno r=-1

Any updates on this?

Comment (jason wang):
OK, I've tested with a Linux guest, but I can't reproduce the issue. More questions:

- Is this issue gone if you disable multiqueue?
- Please paste the strace result.

Thanks

Comment (xiywang):
This bug does not happen on Linux guests. If mq is disabled on Windows, the bug is gone.

- qemu-kvm-rhev-2.10.0-21.el7.x86_64: OK
- qemu-kvm-rhev-2.12.0-1.el7.x86_64: the bug appears

Comment (xiywang):
Created attachment 1476368 [details]
strace output
Comment 15 (xiywang):
The output of strace is uploaded as an attachment.

Comment (jason wang, in reply to comment #15):
Thanks. Do you have an environment for me to try? I don't have Windows guests.

Comment 18 (jason wang):
OK, it looks to me that the driver does not set the vring address correctly, which hits the following check in vhost's vhost_net_set_backend():

    /* Verify that ring has been setup correctly. */
    if (!vhost_vq_access_ok(vq)) {
        r = -EFAULT;
        goto err_vq;
    }

I guess this is a regression in the guest driver? Xi yue, can you try with an earlier Windows driver?

Thanks

Comment (Sameeh Jubran, in reply to comment #18):
I can't reproduce this. It seems to me that the issue is with the host, since 7.5 works fine, but I could be wrong.

Xi yue, can you please try a known good build, such as 144, as Jason requested? Moreover, can you please reproduce while running the following command and provide me with the resulting trace.dat?

    sudo trace-cmd record -p function_graph -g vhost_vq_access_ok

(you need to install trace-cmd using yum)

Thanks :)

Comment (xiywang):
Tested with virtio-win-1.9.4-2.el7.iso; the problem is still there.
Here's the output of trace-cmd:

    # trace-cmd record -p function_graph -g vhost_vq_access_ok
    plugin 'function_graph'
    Hit Ctrl^C to stop recording
    CPU0 data recorded at offset=0x53f000 (0 bytes in size)
    CPU1 data recorded at offset=0x53f000 (4096 bytes in size)
    CPU2 data recorded at offset=0x540000 (0 bytes in size)
    CPU3 data recorded at offset=0x540000 (0 bytes in size)
    CPU4 data recorded at offset=0x540000 (0 bytes in size)
    CPU5 data recorded at offset=0x540000 (0 bytes in size)
    CPU6 data recorded at offset=0x540000 (0 bytes in size)
    CPU7 data recorded at offset=0x540000 (0 bytes in size)
    CPU8 data recorded at offset=0x540000 (0 bytes in size)
    CPU9 data recorded at offset=0x540000 (0 bytes in size)
    CPU10 data recorded at offset=0x540000 (0 bytes in size)
    CPU11 data recorded at offset=0x540000 (0 bytes in size)
    CPU12 data recorded at offset=0x540000 (0 bytes in size)
    CPU13 data recorded at offset=0x540000 (4096 bytes in size)
    CPU14 data recorded at offset=0x541000 (0 bytes in size)
    CPU15 data recorded at offset=0x541000 (0 bytes in size)
    CPU16 data recorded at offset=0x541000 (0 bytes in size)
    CPU17 data recorded at offset=0x541000 (0 bytes in size)
    CPU18 data recorded at offset=0x541000 (0 bytes in size)
    CPU19 data recorded at offset=0x541000 (0 bytes in size)
    CPU20 data recorded at offset=0x541000 (0 bytes in size)
    CPU21 data recorded at offset=0x541000 (0 bytes in size)
    CPU22 data recorded at offset=0x541000 (0 bytes in size)
    CPU23 data recorded at offset=0x541000 (0 bytes in size)
    CPU24 data recorded at offset=0x541000 (0 bytes in size)
    CPU25 data recorded at offset=0x541000 (0 bytes in size)
    CPU26 data recorded at offset=0x541000 (0 bytes in size)
    CPU27 data recorded at offset=0x541000 (0 bytes in size)
    CPU28 data recorded at offset=0x541000 (0 bytes in size)
    CPU29 data recorded at offset=0x541000 (0 bytes in size)
    CPU30 data recorded at offset=0x541000 (0 bytes in size)
    CPU31 data recorded at offset=0x541000 (0 bytes in size)

Comment 21 (xiywang):
Created attachment 1477120 [details]
trace.dat
Comment 22 (Sameeh Jubran, in reply to comment #21):
Since this still reproduces with the old build, it seems like this is a vhost/kernel issue.

From the attached trace file (trace.dat), the path for vhost_vq_access_ok is not very informative, as no function calls show up inside vhost_vq_access_ok — though it could be an issue with the command line that I am not aware of.

Comment (jason wang, in reply to comment #22):
The calls inside vhost_vq_access_ok are static functions. It seems that ftrace doesn't log these functions: https://lkml.org/lkml/2017/3/31/451

Comment 24 (jason wang, in reply to comment #22):
Probably. I just recall there are some changes in 915. Xiyue, can you reproduce it on 914?

Thanks

Comment (xiywang, in reply to comment #24):
Hi Jason, what are the 914 and 915 you mentioned? Could you kindly enlighten me?
Thanks

Comment (xiywang):
Jason, sorry, I didn't download the build in time. Could you provide it again? Thanks

Comment (xiywang):
Hi Jason, tested with the build you provided. The output is as follows:

    # /usr/libexec/qemu-kvm -enable-kvm -m 6G -smp 4 \
        -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
        -drive file=157NICBLUE32CS9,if=none,id=drive-ide0-0-0,format=raw,cache=none \
        -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
        -vnc :0 -vga qxl -M pc \
        -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet1,vhost=on,queues=4 \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:0a:4b:1b:c3,mq=on,vectors=10 \
        -monitor stdio
    QEMU 2.12.0 monitor - type 'help' for more information
    (qemu) start queue 0
    used 7fba74cc3240
    set vring addr!
    used 7fba541f7240
    set vring addr!
    set backend!
    set backend!
    start queue 1
    used 7fba54467240
    set vring addr!
    used 7fba55689240
    set vring addr!
    set backend!
    set backend!
    start queue 2
    set backend!
    set backend!
    set backend!
    set backend!
    set backend!
    qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio
    (the same sequence then repeats a second time)

Comment 30 (jason wang):
So it looks like the vring addr is not set correctly before set status. Looks like a driver bug to me. Sameeh, any idea on this?

Thanks

Comment 32 (Sameeh Jubran, in reply to comment #30):
Hi Jason, I have taken a look at the driver code, and this is what we do:
* We ack all needed features
* We create all queues
* We set DRIVER OK in the status (VIRTIO_CONFIG_S_DRIVER_OK)
* And then we send VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET

Comment 33 (Sameeh Jubran, in reply to comment #32):
Jason, do you think this sequence is buggy?

Comment 34 (Sameeh Jubran):
Xiyue, can you please provide a trace log of enabling the driver and disabling it? Please find info on how to collect the traces in the following document:

https://github.com/virtio-win/kvm-guest-drivers-windows/blob/master/NetKVM/Documentation/Tracing.md

Thanks!

Comment (jason wang, in reply to comment #33):
Looks OK, but you need to make sure the queue address is set before setting the status. E.g., according to the above log, the queue addresses of queue 0 and queue 1 were set, but queue 2 was not set before DRIVER_OK.
Thanks

Comment (Sameeh Jubran):
Let's wait for the NetKVM traces, as they should provide specific information on failures in the driver.

Comment:
Created attachment 1483201 [details]
TraceView.exe with win8.1_32_netkvm

Comment (Pei Zhang, in reply to comment #34):
Hi Sameeh, thanks for your instructions about tracing. The trace file is attached.

Versions:
    3.10.0-945.el7.x86_64
    qemu-kvm-rhev-2.12.0-15.el7.x86_64
    virtio-win-1.9.6-1.el7.noarch
    Win8.1 guest

Best regards,
Pei

Comment:
The problem is that qemu's virtio_net.c optimistically requests vhost to start all the queues, even those that the guest will not use. The driver on Linux allocates all the queues (max_queues); the driver on Windows allocates only those it will use. As a result, if a Windows guest uses fewer than the maximum number of queues, QEMU actually disables vhost.

Comment:
Posted a patch for qemu's virtio-net.c for review.

Comment:
Fix included in qemu-kvm-rhev-2.12.0-33.el7

Comment (verification):
== Steps:
1. Boot a win8 guest with multi-queue.

QEMU command line:

    /usr/libexec/qemu-kvm \
    -enable-kvm \
    -m 6G \
    -smp 4 \
    -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
    -drive file=/home/win8-32-scsi.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,cache=none \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
    -vnc :0 \
    -vga qxl \
    -M pc \
    -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet1,vhost=on,queues=4 \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:0a:4b:1b:c3,mq=on,vectors=10 \
    -monitor stdio

== Reproduced with qemu-kvm-rhev-2.12.0-7.el7.x86_64

After step 1:

    (qemu) qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio
    qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio

So this bug has been reproduced.
== Verified with qemu-kvm-rhev-2.12.0-33.el7.x86_64.rpm

After step 1, the (qemu) monitor doesn't show any error. The guest works well after ping/reboot/shutdown tests. So this bug has been fixed. Moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553