Bug 1947422
| Summary: | [RHEL9]packed=on: guest fails to recover receiving packets after vhost-user reconnect | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Pei Zhang <pezhang> |
| Component: | qemu-kvm | Assignee: | Eugenio Pérez Martín <eperezma> |
| qemu-kvm sub component: | Networking | QA Contact: | Yanghang Liu <yanghliu> |
| Status: | CLOSED MIGRATED | Docs Contact: | Daniel Vozenilek <davozeni> |
| Severity: | medium | | |
| Priority: | medium | CC: | aadam, ailan, amorenoz, chayang, eperezma, jherrman, jinzhao, juzhang, lvivier, virt-maint |
| Version: | 9.0 | Keywords: | MigratedToJIRA, Triaged |
| Target Milestone: | beta | Flags: | lvivier: mirror- |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: | .Restarting the OVS service on a host might block network connectivity on its running VMs<br>When the Open vSwitch (OVS) service restarts or crashes on a host, virtual machines (VMs) that are running on this host cannot recover the state of the networking device. As a consequence, VMs might be completely unable to receive packets.<br>This problem only affects systems that use the packed virtqueue format in their `virtio` networking stack.<br>To work around this problem, use the `packed=off` parameter in the `virtio` networking device definition to disable packed virtqueue. With packed virtqueue disabled, the state of the networking device can, in some situations, be recovered from RAM. | Story Points: | --- |
| Clone Of: | 1792683 | Environment: | |
| Last Closed: | 2023-04-08 07:28:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1792683 | | |
| Bug Blocks: | 1897025 | | |
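The Doc Text above recommends `packed=off` as the workaround. As a concrete illustration, this is a sketch of how the vhostuser interface from the reproducer below would be defined with packed virtqueue disabled; it simply flips `driver.packed` in the step (4) virt-install arguments (the socket path and MAC address are the reproducer's own values) and is a command-line fragment, not a standalone command:

```sh
# Fragment of the step (4) virt-install command line with the packed virtqueue
# format disabled; only driver.packed changes, every other value is unchanged.
--network type=vhostuser,mac.address=18:66:da:5f:dd:22,model=virtio,\
source.type=unix,source.path=/tmp/vhost-user1,source.mode=server,\
driver.name=vhost,driver.iommu=on,driver.ats=on,driver.packed=off
```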
Description
Pei Zhang
2021-04-08 13:02:04 UTC
Eugenio, if you don't plan to fix this issue in 9.1.0, could you set the ITR to 9.2.0 or '---' (backlog). Thanks

(In reply to Laurent Vivier from comment #5)
> Eugenio,
>
> if you don't plan to fix this issue in 9.1.0, could you set the ITR to 9.2.0
> or '---' (backlog).
>
> Thanks

Moving to ITR 9.2.0, as there are still a few features needed to make this work. Thanks!

This problem can be reproduced in the following test env:

5.14.0-133.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64
dpdk-21.11-1.el9_0.x86_64
libvirt-8.5.0-2.el9.x86_64

The detailed test steps:

(1) setup the first host's test env

```
grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11" >> /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning
reboot
echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
modprobe vfio
modprobe vfio-pci
dpdk-devbind.py --bind=vfio-pci 0000:5e:00.0
dpdk-devbind.py --bind=vfio-pci 0000:5e:00.1
```

(3) start a testpmd on the first host

```
# dpdk-testpmd -l 2,4,6,8,10 --socket-mem 1024,1024 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user1,queues=1,client=1,iommu-support=1' \
    --vdev 'net_vhost1,iface=/tmp/vhost-user2,queues=1,client=1,iommu-support=1' \
    -b 0000:3b:00.0 -b 0000:3b:00.1 -d /usr/lib64/librte_net_vhost.so \
    -- --portmask=f -i --rxd=512 --txd=512 --rxq=1 --txq=1 --nb-cores=4 --forward-mode=io
testpmd> set portlist 0,2,1,3
testpmd> start
```

(4) start a vm with two packed=on vhostuser interfaces [1][2]

```
# virt-install --graphics type=vnc,listen=0.0.0.0 --name=rhel9.1 --machine q35 \
    --vcpu=6,vcpu.placement="static" --memory=8192,hugepages=yes \
    --memorybacking hugepages=yes,size=1,unit=G,locked=yes,access.mode=shared \
    --cpu host,numa.cell0.memory=8388608,numa.cell0.unit='KiB',numa.cell0.id="0",numa.cell0.cpus="0-5",numa.cell0.memAccess="shared" \
    --numatune memory.mode="strict",memory.nodeset="0",memnode.cellid="0",memnode.mode="strict",memnode.nodeset="0" \
    --features pmu.state="off",ioapic.driver="qemu" \
    --memballoon virtio,driver.iommu=on,driver.ats=on \
    --disk path=/home/images_nfv-virt-rt-kvm/rhel9.1.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20,driver.iommu=on,driver.ats=on \
    --network bridge=switch,model=virtio,mac=88:66:da:5f:dd:11,driver.iommu=on,driver.ats=on \
    --osinfo detect=on,require=off --check all=off \
    --iommu model='intel',driver.intremap='on',driver.caching_mode='on',driver.iotlb='on' \
    --cputune vcpupin0.vcpu=0,vcpupin0.cpuset=30,vcpupin1.vcpu=1,vcpupin1.cpuset=28,vcpupin2.vcpu=2,vcpupin2.cpuset=26,vcpupin3.vcpu=3,vcpupin3.cpuset=24,vcpupin4.vcpu=4,vcpupin4.cpuset=22,vcpupin5.vcpu=5,vcpupin5.cpuset=20,emulatorpin.cpuset="25,27,29,31" \
    --network type=vhostuser,mac.address=18:66:da:5f:dd:22,model=virtio,source.type=unix,source.path=/tmp/vhost-user1,source.mode=server,driver.name=vhost,driver.iommu=on,driver.ats=on,driver.packed=on \
    --network type=vhostuser,mac.address=18:66:da:5f:dd:23,model=virtio,source.type=unix,source.path=/tmp/vhost-user2,source.mode=server,driver.name=vhost,driver.iommu=on,driver.ats=on,driver.packed=on \
    --import --noautoconsole --noreboot
```

[1] --network type=vhostuser,mac.address=18:66:da:5f:dd:22,model=virtio,source.type=unix,source.path=/tmp/vhost-user1,source.mode=server,driver.name=vhost,driver.iommu=on,driver.ats=on,driver.packed=on
[2] --network type=vhostuser,mac.address=18:66:da:5f:dd:23,model=virtio,source.type=unix,source.path=/tmp/vhost-user2,source.mode=server,driver.name=vhost,driver.iommu=on,driver.ats=on,driver.packed=on
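Before moving on to the guest-side setup, it can be useful to sanity-check the topology created in steps (3) and (4): QEMU listens on the two unix sockets because the interfaces use source.mode=server, and the testpmd vhost PMD connects to them as a client because of client=1. A minimal sketch of such a check, assuming the rhel9.1 VM name from step (4) and standard libvirt/iproute2 tooling (not part of the original test steps):

```sh
# QEMU should be the listener on both vhost-user sockets
# (the interfaces are defined with source.mode=server).
ss -xlp | grep -E '/tmp/vhost-user[12]'

# The generated domain XML should carry packed='on' in the <driver>
# element of both vhostuser interfaces.
virsh dumpxml rhel9.1 | grep -B 2 -A 2 "packed="
```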
(5) setup the vm's kernel option

```
grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
echo "isolated_cores=1,2,3,4,5" >> /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning
reboot
echo 2 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
modprobe vfio
modprobe vfio-pci
dpdk-devbind.py --bind=vfio-pci 0000:02:00.0
dpdk-devbind.py --bind=vfio-pci 0000:03:00.0
```

(6) start a testpmd in the vm

```
dpdk-testpmd -l 1,2,3 -n 4 -d /usr/lib64/librte_net_virtio.so \
    -- --nb-cores=2 -i --disable-rss --rxd=512 --txd=512 --rxq=1 --txq=1
testpmd> start
```

(7) set up the second host: download the MoonGen tool into the /home/ dir, then run MoonGen

```
grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
echo "isolated_cores=2,4,6,8,10,12,14,16,18" >> /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning
reboot
tuned-adm profile cpu-partitioning
dpdk-devbind.py --bind=vfio-pci 0000:82:00.0
dpdk-devbind.py --bind=vfio-pci 0000:82:00.1
echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 10 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
cd /home/MoonGen/
./build/MoonGen examples/opnfv-vsperf.lua > /tmp/throughput.log
```

(8) check the Traffic Statistics in the vm

```
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 8046105833   RX-missed: 0   RX-bytes: 482766349980
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 5578939316   TX-errors: 0   TX-bytes: 334736358960

  Throughput (since last show)
  Rx-pps:  8274489   Rx-bps: 3971755176
  Tx-pps:  7096092   Tx-bps: 3406124624
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 5578939316   RX-missed: 0   RX-bytes: 334736358960
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 8046105821   TX-errors: 0   TX-bytes: 482766349260

  Throughput (since last show)
  Rx-pps:  7096092   Rx-bps: 3406124624
  Tx-pps:  8274471   Tx-bps: 3971746368
  ############################################################################
```

(9) restart the first host's dpdk-testpmd

```
# pkill dpdk-testpmd
# dpdk-testpmd -l 2,4,6,8,10 --socket-mem 1024,1024 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user1,queues=1,client=1,iommu-support=1' \
    --vdev 'net_vhost1,iface=/tmp/vhost-user2,queues=1,client=1,iommu-support=1' \
    -b 0000:3b:00.0 -b 0000:3b:00.1 -d /usr/lib64/librte_net_vhost.so \
    -- --portmask=f -i --rxd=512 --txd=512 --rxq=1 --txq=1 --nb-cores=4 --forward-mode=io
testpmd> set portlist 0,2,1,3
testpmd> start
```

(10) re-run MoonGen

```
cd /home/MoonGen/
./build/MoonGen examples/opnfv-vsperf.lua > /tmp/throughput.log
```
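Before re-checking the guest statistics in step (11), it helps to confirm that the restarted testpmd really re-established the vhost-user connections and that MoonGen is actually transmitting, so that a silent generator is not mistaken for the bug. A minimal sketch, reusing the /tmp/throughput.log path from step (10) and assuming the default libvirt QEMU log location for the rhel9.1 VM (not part of the original test steps):

```sh
# Confirm MoonGen is still generating traffic after the restart.
tail -n 5 /tmp/throughput.log

# Both vhost-user sockets should show re-established connections on the first host.
ss -x | grep -E '/tmp/vhost-user[12]'

# The VM's QEMU log typically records vhost-user disconnect/reconnect events.
grep -i 'vhost' /var/log/libvirt/qemu/rhel9.1.log | tail -n 20
```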
(11) re-check the Traffic Statistics in the vm

```
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 8046105833   RX-missed: 0   RX-bytes: 482766349980
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 5578939316   TX-errors: 0   TX-bytes: 334736358960

  Throughput (since last show)
  Rx-pps:  0   Rx-bps: 0
  Tx-pps:  0   Tx-bps: 0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 5578939316   RX-missed: 0   RX-bytes: 334736358960
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 8046105821   TX-errors: 0   TX-bytes: 482766349260

  Throughput (since last show)
  Rx-pps:  0   Rx-bps: 0
  Tx-pps:  0   Tx-bps: 0
  ############################################################################
```

This bug cannot be reproduced when the vhost-user interfaces do not have the packed=on option. The traffic statistics in the vm after the vhost-user interfaces reconnect are as follows:

```
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 17545208821   RX-missed: 0   RX-bytes: 1052712529260
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 11816831315   TX-errors: 0   TX-bytes: 709009878900

  Throughput (since last show)
  Rx-pps:  1157235   Rx-bps: 555472864
  Tx-pps:  860093    Tx-bps: 412844640
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 11816831315   RX-missed: 0   RX-bytes: 709009878900
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 17545208629   TX-errors: 0   TX-bytes: 1052712517740

  Throughput (since last show)
  Rx-pps:  860092    Rx-bps: 412844632
  Tx-pps:  1157234   Tx-bps: 555472488
  ############################################################################
```

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.