Bug 1792683

Summary: [RHEL8]packed=on: guest fails to recover receiving packets after vhost-user reconnect
Product: Red Hat Enterprise Linux 8
Reporter: Pei Zhang <pezhang>
Component: qemu-kvm
Assignee: Eugenio Pérez Martín <eperezma>
qemu-kvm sub component: Networking
QA Contact: Yanghang Liu <yanghliu>
Status: CLOSED WONTFIX
Docs Contact: Daniel Vozenilek <davozeni>
Severity: medium
Priority: medium
CC: aadam, ailan, amorenoz, chayang, eperezma, jherrman, jinzhao, juzhang, virt-maint, yanghliu
Version: 8.2
Keywords: Reopened, Triaged
Target Milestone: rc
Target Release: 8.6
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Restarting the OVS service on a host might block network connectivity on its running VMs

When the Open vSwitch (OVS) service restarts or crashes on a host, virtual machines (VMs) that are running on this host cannot recover the state of the networking device. As a consequence, VMs might be completely unable to receive packets.

This problem only affects systems that use the packed virtqueue format in their `virtio` networking stack.

To work around this problem, use the `packed=off` parameter in the `virtio` networking device definition to disable packed virtqueue. With packed virtqueue disabled, the state of the networking device can, in some situations, be recovered from RAM.
Story Points: ---
Clone Of:
: 1947422 (view as bug list)
Environment:
Last Closed: 2022-05-02 07:27:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1897025, 1947422    
Attachments:
Description Flags
Full XML none

Description Pei Zhang 2020-01-19 05:18:27 UTC
Created attachment 1653520 [details]
Full XML

Description of problem:
Boot the guest with OVS + vhost-user (packed=on) + DPDK. Then reconnect vhost-user by restarting OVS; testpmd in the guest fails to recover receiving packets.

Version-Release number of selected component (if applicable):
4.18.0-170.el8.x86_64
qemu-kvm-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot OVS, refer to [1]

# sh  boot_ovs_client.sh

2. Boot the guest with vhost-user packed=on; for the full command, refer to [2]

-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \

3. Start testpmd in the guest and start MoonGen on another host; the guest can receive packets, refer to [3]

4. Reconnect vhost-user by restarting OVS

# sh  boot_ovs_client.sh

5. Check testpmd in the guest; packet receiving does not recover, refer to [5]

Actual results:
DPDK packet receiving does not recover after the vhost-user reconnect.

Expected results:
DPDK packet receiving should recover after the vhost-user reconnect.


Additional info:
1. 


Reference:
[1]
# cat boot_ovs_client.sh 
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"

ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0 
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl --if-exists del-br ovsbr1
ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:5e:00.1
ovs-vsctl add-port ovsbr1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
ovs-ofctl del-flows ovsbr1
ovs-ofctl add-flow ovsbr1 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr1 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x154
ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-vsctl set Interface dpdk1 options:n_rxq=2

echo "all done"

[2]
# cat qemu.sh 
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine q35,kernel_irqchip=split \
-cpu host \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/images_nfv-virt-rt-kvm/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:11:22:5f:dd:01,bus=pci.3,addr=0x0 \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \
-monitor stdio \
-vnc :2 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=8589934592,host-nodes=0,policy=bind \
-numa node,memdev=ram-node0 \

[3]
/usr/bin/testpmd \
	-l 1,2,3,4,5 \
	-n 4 \
	-d /usr/lib64/librte_pmd_virtio.so \
	-w 0000:06:00.0 -w 0000:07:00.0 \
	-- \
	--nb-cores=4 \
	-i \
	--disable-rss \
	--rxd=512 --txd=512 \
	--rxq=2 --txq=2

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1363219    RX-missed: 0          RX-bytes:  81793140
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1165880    TX-errors: 0          TX-bytes:  69952800

  Throughput (since last show)
  Rx-pps:       146529          Rx-bps:     70334000
  Tx-pps:       125308          Tx-bps:     60148048
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 1167088    RX-missed: 0          RX-bytes:  70025280
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1362017    TX-errors: 0          TX-bytes:  81721020

  Throughput (since last show)
  Rx-pps:       125309          Rx-bps:     60148432
  Tx-pps:       146531          Tx-bps:     70334984
  ############################################################################


[5]
testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 8825386    RX-missed: 0          RX-bytes:  529524798
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 7548095    TX-errors: 0          TX-bytes:  452887338

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 7549297    RX-missed: 0          RX-bytes:  452959458
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 8824177    TX-errors: 0          TX-bytes:  529452258

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

Comment 3 Ademar Reis 2020-02-05 23:13:18 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 6 Pei Zhang 2021-02-01 10:55:04 UTC
This issue still exists with the latest 8.4-av.

Versions:
4.18.0-278.rt7.43.el8.dt4.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
tuned-2.15.0-1.el8.noarch
libvirt-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
python3-libvirt-6.10.0-1.module+el8.4.0+8948+a39b3f3a.x86_64
openvswitch2.13-2.13.0-86.el8fdp.x86_64
dpdk-20.11-1.el8.x86_64

Comment 13 John Ferlan 2021-09-09 12:42:03 UTC
Bulk update: Move RHEL-AV bugs to RHEL8 with existing RHEL9 clone.

Comment 14 RHEL Program Management 2021-11-01 07:27:01 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 18 RHEL Program Management 2022-05-02 07:27:17 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 19 Yanghang Liu 2022-07-22 15:49:32 UTC
Hi Eugenio,

May I ask if you have a plan to fix this bug in RHEL 8?

If yes, I will reopen this bug.

Comment 20 Eugenio Pérez Martín 2022-07-22 16:00:24 UTC
(In reply to Yanghang Liu from comment #19)
> Hi Eugenio,
> 
> May I ask if you have a plan to fix this bug in RHEL 8?
> 
> If yes, I will reopen this bug.

I think it is not worth fixing for RHEL 8.

It cannot be considered a regression (it never worked with packed virtqueues
in old versions), and fixing it would require introducing other features that
are also complicated for RHEL 8 (inflight_fd).

I'll leave it as WONTFIX unless someone else has a different opinion.

Thanks!

Comment 21 Yanghang Liu 2022-07-24 13:18:15 UTC
(In reply to Eugenio Pérez Martín from comment #20)
> (In reply to Yanghang Liu from comment #19)
> > Hi Eugenio,
> > 
> > May I ask if you have a plan to fix this bug in RHEL 8?
> > 
> > If yes, I will reopen this bug.
> 
> I think it is not worth fixing for RHEL 8.
> 
> It cannot be considered a regression (it never worked with packed
> virtqueues in old versions), and fixing it would require introducing other
> features that are also complicated for RHEL 8 (inflight_fd).
> 
> I'll leave it as WONTFIX unless someone else has a different opinion.
> 
> Thanks!

Thanks Eugenio for the confirmation.

Mark qe_test_coverage- first

Comment 22 Daniel Vozenilek 2022-10-24 13:20:44 UTC
Hi Eugenio,

our docs team marked this issue as a release note candidate for RHEL 8.7.
I'm not sure I understand all the details about this issue. However, I've prepared this first rough RN draft:

"""
.Restarting the OVS service on a host might limit network connectivity on its running guests

Restarting the Open vSwitch (OVS) service on a host might affect network connectivity on all its running guests. Guests might not be able to receive packets at all.
"""

I don't know if it's accurate and I could use your help with adding additional details. 

I assume this issue affects only vhost-user library users? Is there some workaround for this issue?

Thanks,
Daniel

Comment 23 Eugenio Pérez Martín 2022-10-24 14:15:38 UTC
(In reply to Daniel Vozenilek from comment #22)
> Hi Eugenio,
> 
> our docs team marked this issue as a release note candidate for RHEL 8.7.
> I'm not sure I understand all the details about this issue. However, I've
> prepared this first rough RN draft:
> 
> """
> .Restarting the OVS service on a host might limit network connectivity on
> its running guests
> 
> Restarting the Open vSwitch (OVS) service on a host might affect network
> connectivity on all its running guests. Guests might not be able to receive
> packets at all.
> """
> 
> I don't know if it's accurate and I could use your help with adding
> additional details.

Sure, anytime.

This issue only happens if the packed virtqueue is in use. Packed vq is the
new virtqueue format, which comes with better performance and less stress on
the memory bus, but at the cost that the state cannot be recovered if OVS
crashes. We have a blog covering that topic if you want to dig deeper [1].

> 
> I assume this issue affects only vhost-user library users? Is there some
> workaround for this issue?
> 

I think yes, only vhost-user is affected because vhost-net does not support
packed vq at the moment.

Regarding the workaround, it should be possible to make the guest always use
the split virtqueue, the format that predates packed vq. QEMU should be able
to recover the state from a split virtqueue in some situations, but not all
of them are covered. The way to do it in libvirt is with the "packed" virtio
option [2].

Please let me know if you need more information.

[1] https://www.redhat.com/en/blog/packed-virtqueue-how-reduce-overhead-virtio
[2] https://libvirt.org/formatdomain.html#virtio-related-options

> Thanks,
> Daniel
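
For illustration, a minimal sketch of this workaround on the QEMU command line, assuming the reproducer's device definition from the bug description; the only change is packed=on becoming packed=off:

# vhost-user NIC from the reproducer, with packed virtqueues disabled
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=off \

In a libvirt domain XML, the same behavior is controlled through the "packed" virtio option on the interface's <driver> element, as described in [2] above.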

Comment 24 Daniel Vozenilek 2022-10-27 17:52:24 UTC
Thanks for the info Eugenio.
I improved the RN description and included your workaround suggestion. Can you let me know if the description looks good to you now?

"""
.Restarting the OVS service on a host might block network connectivity on its running VMs

When the Open vSwitch (OVS) service restarts or crashes on a host, VMs that are running on this host cannot recover the state of the networking device. As a consequence, VMs might not be able to receive packets at all.

This problem only affects systems that use the packed virtqueue format in their virtio networking stack.

To work around this problem, use the `packed=off` parameter in the virtio networking device definition to disable packed virtqueue. With packed virtqueue disabled, the state of the networking device can, in some situations, be recovered from memory.
"""

Comment 25 Eugenio Pérez Martín 2022-10-31 11:59:16 UTC
(In reply to Daniel Vozenilek from comment #24)
> Thanks for the info Eugenio.
> I improved the RN description and included your workaround suggestion. Can
> you let me know if the description looks good to you now?
> 
> """
> .Restarting the OVS service on a host might block network connectivity on
> its running VMs
> 
> When the Open vSwitch (OVS) service restarts or crashes on a host, VMs that
> are running on this host cannot recover the state of the networking device.
> As a consequence, VMs might not be able to receive packets at all.
> 
> This problem only affects systems that use the packed virtqueue format in
> their virtio networking stack.
> 
> To work around this problem, use the `packed=off` parameter in the virtio
> networking device definition to disable packed virtqueue. With packed
> virtqueue disabled, the state of the networking device can, in some
> situations, be recovered from memory.
> """

Yes, LGTM.

Comment 26 Daniel Vozenilek 2022-11-01 11:41:14 UTC
Thank you for the review, Eugenio.

Thank you, Jiri Herrmann, for your peer review: https://docs.google.com/document/d/1MzZHuyTob606l0IFbBkRUS7BBmwrLT5CGZRF7HLNpaY/edit

I implemented your suggestions and marked the RN as ready for publication.