Bug 1608226

Summary: [virtual-network][mq] prompt warning "qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio" when boot with win8+ guests with multi-queue
Product: Red Hat Enterprise Linux 7 Reporter: Yu Wang <wyu>
Component: qemu-kvm-rhevAssignee: jason wang <jasowang>
Status: CLOSED ERRATA QA Contact: Lei Yang <leiyang>
Severity: high Docs Contact:
Priority: high    
Version: 7.6CC: aadam, ailan, chayang, fdeutsch, gwatson, jasowang, jen, jomurphy, juzhang, maxime.coquelin, mjankula, mrezanin, mtessun, ngu, pezhang, rbarry, siliu, virt-bugs, virt-maint, wyu, ybendito, yturgema, yvugenfi
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-33.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1702608 (view as bug list) Environment:
Last Closed: 2019-08-22 09:18:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1649160, 1702608, 1824126    
Attachments:
Description Flags
strace output
none
trace.dat
none
TraceView.exe with win8.1_32_netkvm none

Description Yu Wang 2018-07-25 07:24:53 UTC
Description of problem:


Version-Release number of selected component (if applicable):
Guest:win8+ guests
qemu-img-rhev-2.12.0-7.el7.x86_64
virtio-win-prewhql-157
kernel-3.10.0-920.el7.x86_64
seabios-bin-1.11.0-2.el7.noarch


How reproducible:
100%

Steps to Reproduce:
1. boot guest
/usr/libexec/qemu-kvm -name 157NICBLUE32CS9 -enable-kvm -m 6G -smp 4 -uuid 582b39b2-de68-48c6-8e17-c4f0860d2676 -nodefconfig -nodefaults -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -chardev socket,id=charmonitor,path=/tmp/157NICBLUE32CS9,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -boot order=cd,menu=on -device piix3-usb-uhci,id=usb -drive file=157NICBLUE32CS9,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/home/kvm_autotest_root/iso/ISO/Win8.1/en_windows_8.1_enterprise_with_update_x86_dvd_6050710.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=157NICBLUE32CS9.vfd,if=floppy,id=drive-fdc0-0-0,format=raw,cache=none -netdev tap,script=/etc/qemu-ifup1,downscript=no,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:52:71:47:16:b1 -device usb-tablet,id=input0 -vnc 0.0.0.0:2 -vga std -M pc -netdev tap,script=/etc/qemu-ifup-private,downscript=no,id=hostnet1,vhost=on,queues=4 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:0a:4b:1b:c3,bus=pci.0,mq=on,vectors=10 -monitor stdio

2.
3.

Actual results:
qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio
qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio

Expected results:
no warning

Additional info:
1 cannot reproduce with rhel7.5 release build(qemu-kvm-rhev-2.10.0-21.el7.x86_64), so it is a regression.

Comment 3 Yu Wang 2018-07-25 10:04:37 UTC
This only occurred when boot guest with multi-queue.

Comment 4 Sameeh Jubran 2018-07-25 15:16:24 UTC
On my setup:
 4.13.9-200.fc26.x86_64
 qemu upstream 

I can always reproduce the BSOD with vhsot on no matter what, even with build NetKVM build 144.

Comment 5 Sameeh Jubran 2018-07-25 22:52:43 UTC
(In reply to Sameeh Jubran from comment #4)
> On my setup:
>  4.13.9-200.fc26.x86_64
>  qemu upstream 
> 
> I can always reproduce the BSOD with vhsot on no matter what, even with
> build NetKVM build 144.

Please ignore this comment is meant for BZ#1599631

Comment 6 xiywang 2018-07-26 09:57:07 UTC
hw/net/vhost_net.c +249:
r = vhost_net_set_backend(&net->dev, &file); got errno r=-1

Comment 10 jason wang 2018-08-13 06:26:00 UTC
Any updates on this?

Comment 12 jason wang 2018-08-15 09:02:35 UTC
Ok, so I've test with Linux guest. But I can't reproduce the issue.

More questions:

- Is this issue gone if you disable multiqueue?
- Please paste the strace result.

Thanks

Comment 13 xiywang 2018-08-16 07:40:25 UTC
This bug does not happen on linux guests.

If disable mq on win, the bug is gone.
 
qemu-kvm-rhev-2.10.0-21.el7.x86_64, OK
qemu-kvm-rhev-2.12.0-1.el7.x86_64, the bug appear

Comment 14 xiywang 2018-08-16 07:48:43 UTC
Created attachment 1476368 [details]
strace output

Comment 15 xiywang 2018-08-16 07:50:17 UTC
And the output of strace is uploaded as an attachment.

Comment 16 jason wang 2018-08-16 08:26:31 UTC
(In reply to xiywang from comment #15)
> And the output of strace is uploaded as an attachment.

Thanks, have an environment for me to try? I don't have windows guests.

Thanks

Comment 18 jason wang 2018-08-17 04:04:26 UTC
Ok, it looks to me that the driver does not set vring address correctly which hit the following check in vhost in vhost_net_set_backend():

	/* Verify that ring has been setup correctly. */
	if (!vhost_vq_access_ok(vq)) {
		r = -EFAULT;
		goto err_vq;
	}

I guess this is a regression of guest driver?

Xi yue, can you try with some earlier windows driver?

Thanks

Comment 19 Sameeh Jubran 2018-08-18 23:25:57 UTC
(In reply to jason wang from comment #18)
> Ok, it looks to me that the driver does not set vring address correctly
> which hit the following check in vhost in vhost_net_set_backend():
> 
> 	/* Verify that ring has been setup correctly. */
> 	if (!vhost_vq_access_ok(vq)) {
> 		r = -EFAULT;
> 		goto err_vq;
> 	}
> 
> I guess this is a regression of guest driver?
> 
> Xi yue, can you try with some earlier windows driver?
> 
> Thanks

I can't reproduce this, it seems to me that the issue is with the host since 7.5 works fine, but I could be wrong.

Xi yue,
Can you please try a known good build such as 144 such as jason requested? Moreover can you please reproduce while you run the following command line and provide me with trace.dat?

sudo trace-cmd record -p function_graph -g vhost_vq_access_ok
(you need to install trace-cmd using yum)

Thanks :)

Comment 20 xiywang 2018-08-20 09:44:47 UTC
Tested with virtio-win-1.9.4-2.el7.iso, the problem is still there.

Here's the output of trace-cmd:
# trace-cmd record -p function_graph -g vhost_vq_access_ok
  plugin 'function_graph'
Hit Ctrl^C to stop recording
CPU0 data recorded at offset=0x53f000
    0 bytes in size
CPU1 data recorded at offset=0x53f000
    4096 bytes in size
CPU2 data recorded at offset=0x540000
    0 bytes in size
CPU3 data recorded at offset=0x540000
    0 bytes in size
CPU4 data recorded at offset=0x540000
    0 bytes in size
CPU5 data recorded at offset=0x540000
    0 bytes in size
CPU6 data recorded at offset=0x540000
    0 bytes in size
CPU7 data recorded at offset=0x540000
    0 bytes in size
CPU8 data recorded at offset=0x540000
    0 bytes in size
CPU9 data recorded at offset=0x540000
    0 bytes in size
CPU10 data recorded at offset=0x540000
    0 bytes in size
CPU11 data recorded at offset=0x540000
    0 bytes in size
CPU12 data recorded at offset=0x540000
    0 bytes in size
CPU13 data recorded at offset=0x540000
    4096 bytes in size
CPU14 data recorded at offset=0x541000
    0 bytes in size
CPU15 data recorded at offset=0x541000
    0 bytes in size
CPU16 data recorded at offset=0x541000
    0 bytes in size
CPU17 data recorded at offset=0x541000
    0 bytes in size
CPU18 data recorded at offset=0x541000
    0 bytes in size
CPU19 data recorded at offset=0x541000
    0 bytes in size
CPU20 data recorded at offset=0x541000
    0 bytes in size
CPU21 data recorded at offset=0x541000
    0 bytes in size
CPU22 data recorded at offset=0x541000
    0 bytes in size
CPU23 data recorded at offset=0x541000
    0 bytes in size
CPU24 data recorded at offset=0x541000
    0 bytes in size
CPU25 data recorded at offset=0x541000
    0 bytes in size
CPU26 data recorded at offset=0x541000
    0 bytes in size
CPU27 data recorded at offset=0x541000
    0 bytes in size
CPU28 data recorded at offset=0x541000
    0 bytes in size
CPU29 data recorded at offset=0x541000
    0 bytes in size
CPU30 data recorded at offset=0x541000
    0 bytes in size
CPU31 data recorded at offset=0x541000
    0 bytes in size

Comment 21 xiywang 2018-08-20 09:48:03 UTC
Created attachment 1477120 [details]
trace.dat

Comment 22 Sameeh Jubran 2018-08-20 12:24:26 UTC
(In reply to xiywang from comment #21)
> Created attachment 1477120 [details]
> trace.dat

Since this still reproduces with the old build, it seems like this is a vhost issue / kernel issue.

From the trace file that was attached - trace.dat - it seems that the path for vhost_vq_access_ok is not that informative as there is no function calls inside the vhost_vq_access_ok but it could be an issue with the command line which I am not aware of.

Comment 23 Sameeh Jubran 2018-08-20 12:28:29 UTC
(In reply to Sameeh Jubran from comment #22)
> (In reply to xiywang from comment #21)
> > Created attachment 1477120 [details]
> > trace.dat
> 
> Since this still reproduces with the old build, it seems like this is a
> vhost issue / kernel issue.
> 
> From the trace file that was attached - trace.dat - it seems that the path
> for vhost_vq_access_ok is not that informative as there is no function calls
> inside the vhost_vq_access_ok but it could be an issue with the command line
> which I am not aware of.

The calls inside vhost_vq_access_ok are static functions. It seems that trace doesn't log these function:

https://lkml.org/lkml/2017/3/31/451

Comment 24 jason wang 2018-08-21 05:31:12 UTC
(In reply to Sameeh Jubran from comment #22)
> (In reply to xiywang from comment #21)
> > Created attachment 1477120 [details]
> > trace.dat
> 
> Since this still reproduces with the old build, it seems like this is a
> vhost issue / kernel issue.
> 

Probably, I just recall there's are some changes in 915.

Xiyue, can you reproduce it on 914?

Thanks

Comment 25 xiywang 2018-08-21 08:17:56 UTC
(In reply to jason wang from comment #24)
> (In reply to Sameeh Jubran from comment #22)
> > (In reply to xiywang from comment #21)
> > > Created attachment 1477120 [details]
> > > trace.dat
> > 
> > Since this still reproduces with the old build, it seems like this is a
> > vhost issue / kernel issue.
> > 
> 
> Probably, I just recall there's are some changes in 915.
> 
> Xiyue, can you reproduce it on 914?
> 
> Thanks

Hi Jason,

What is 914 and 915 you mentioned? Could you kindly inlight me?

Thanks

Comment 27 xiywang 2018-08-30 09:55:17 UTC
Jason,

Sorry, I didn't download the build in time, could you provide it again?

Thanks

Comment 29 xiywang 2018-09-06 02:36:36 UTC
Hi Jason,

Tested with the build you provided. The message as follows:

# /usr/libexec/qemu-kvm -enable-kvm -m 6G -smp 4 -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -drive file=157NICBLUE32CS9,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -vnc :0 -vga qxl -M pc -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet1,vhost=on,queues=4 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:0a:4b:1b:c3,mq=on,vectors=10 -monitor stdio
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) start queue 0
used 7fba74cc3240 
set vring addr!
used 7fba541f7240 
set vring addr!
set backend!
set backend!
start queue 1
used 7fba54467240 
set vring addr!
used 7fba55689240 
set vring addr!
set backend!
set backend!
start queue 2
set backend!
set backend!
set backend!
set backend!
set backend!
qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio
start queue 0
used 7fba74cc3240 
set vring addr!
used 7fba541f7240 
set vring addr!
set backend!
set backend!
start queue 1
used 7fba54467240 
set vring addr!
used 7fba55689240 
set vring addr!
set backend!
set backend!
start queue 2
set backend!
set backend!
set backend!
set backend!
set backend!
qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio

Comment 30 jason wang 2018-09-06 03:00:30 UTC
So it looks like the vring addr is not set correctly before set status. Looks like a driver bug to me.

Sameeh, any idea on this?

Thanks

Comment 32 Sameeh Jubran 2018-09-12 08:53:12 UTC
(In reply to jason wang from comment #30)
> So it looks like the vring addr is not set correctly before set status.
> Looks like a driver bug to me.
> 
> Sameeh, any idea on this?
> 
> Thanks

Hi Jason,

I have taken a look at the driver code and that's what we do:
* We ack all needed features
* We create all queues
* We set DRIVER OK in the status (VIRTIO_CONFIG_S_DRIVER_OK)
* And then we send VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET

Comment 33 Sameeh Jubran 2018-09-12 09:04:21 UTC
(In reply to Sameeh Jubran from comment #32)
> (In reply to jason wang from comment #30)
> > So it looks like the vring addr is not set correctly before set status.
> > Looks like a driver bug to me.
> > 
> > Sameeh, any idea on this?
> > 
> > Thanks
> 
> Hi Jason,
> 
> I have taken a look at the driver code and that's what we do:
> * We ack all needed features
> * We create all queues
> * We set DRIVER OK in the status (VIRTIO_CONFIG_S_DRIVER_OK)
> * And then we send VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET

Jason do you think this sequence is buggy??

Comment 34 Sameeh Jubran 2018-09-12 09:11:22 UTC
Xiyue,

Can you please provide me with trace log of enabling the driver and disabling it. 

Please find info on how to collect the traces in the following document:

https://github.com/virtio-win/kvm-guest-drivers-windows/blob/master/NetKVM/Documentation/Tracing.md

Thanks!

Comment 35 jason wang 2018-09-12 10:06:46 UTC
(In reply to Sameeh Jubran from comment #33)
> (In reply to Sameeh Jubran from comment #32)
> > (In reply to jason wang from comment #30)
> > > So it looks like the vring addr is not set correctly before set status.
> > > Looks like a driver bug to me.
> > > 
> > > Sameeh, any idea on this?
> > > 
> > > Thanks
> > 
> > Hi Jason,
> > 
> > I have taken a look at the driver code and that's what we do:
> > * We ack all needed features
> > * We create all queues
> > * We set DRIVER OK in the status (VIRTIO_CONFIG_S_DRIVER_OK)
> > * And then we send VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> 
> Jason do you think this sequence is buggy??

Looks ok, but need to make sure the queue address was set before set status.

E.g according the above log

The queue address of queue 0 and queue 1 were set. But queue 2 was not set before DRIVER_OK.

Thanks

Comment 36 Sameeh Jubran 2018-09-12 11:10:48 UTC
Let's wait for the NetKVM traces as they should provide specific information on failures in the driver.

Comment 38 Pei Zhang 2018-09-14 06:56:39 UTC
Created attachment 1483201 [details]
TraceView.exe with win8.1_32_netkvm

(In reply to Sameeh Jubran from comment #34)
> Xiyue,
> 
> Can you please provide me with trace log of enabling the driver and
> disabling it. 
> 
> Please find info on how to collect the traces in the following document:
> 
> https://github.com/virtio-win/kvm-guest-drivers-windows/blob/master/NetKVM/
> Documentation/Tracing.md
> 
> Thanks!

Hi Sameeh,

Thanks for your instructions about trace. The trace file is attached.

Versions:
3.10.0-945.el7.x86_64
qemu-kvm-rhev-2.12.0-15.el7.x86_64
virtio-win-1.9.6-1.el7.noarch
Win8.1 Guest


Best regards,
Pei

Comment 47 ybendito 2019-02-13 14:52:27 UTC
The problem is that qemu:virtio_net.c optimistically requests vhost to start all the queues (even those that the guest will not use).
Driver on Linux has all the queues (max_queues) allocated, the driver on Windows - only those it will use.
Such a way, if Windows guest uses less than max queues QEMU actually disables vhost.

Posted a patch for qemu:virtio-net.c for review

Comment 64 Miroslav Rezanina 2019-06-20 09:41:53 UTC
Fix included in qemu-kvm-rhev-2.12.0-33.el7

Comment 66 Lei Yang 2019-06-21 09:49:52 UTC
==Steps:

1.Boot win8_guest with multi-queues.

Qemu cli:
/usr/libexec/qemu-kvm \
-enable-kvm \
-m 6G \
-smp 4 \
-cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
-drive file=/home/win8-32-scsi.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,cache=none \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-vnc :0 \
-vga qxl \
-M pc \
-netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet1,vhost=on,queues=4 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:0a:4b:1b:c3,mq=on,vectors=10 \
-monitor stdio \

==Reproduced with qemu-img-rhev-2.12.0-7.el7.x86_64

After steps 1,
(qemu) qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio
qemu-kvm: unable to start vhost net: 14: falling back on userspace virtio

So this bug has been reproduced.

==Verified with qemu-img-rhev-2.12.0-33.el7.x86_64.rpm.

After step 1,(qemu) doesn't have any error info. Guest works well after ping/reboot/shutdwon test.

So this bug has been fixed very well. Move to 'VERIFIED'.

Comment 75 errata-xmlrpc 2019-08-22 09:18:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553