Bug 1790360
Summary: | qemu-kvm: event flood when vhost-user backed virtio netdev is unexpectedly closed while guest is transmitting | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | Pei Zhang <pezhang>
Component: | qemu-kvm | Assignee: | lulu <lulu>
qemu-kvm sub component: | Networking | QA Contact: | Yanghang Liu <yanghliu>
Status: | CLOSED WONTFIX | Docs Contact: |
Severity: | medium | |
Priority: | medium | CC: | aadam, ailan, amorenoz, chayang, jasowang, jinzhao, juzhang, lulu, lvivier, mhou, pezhang, virt-maint, yanghliu, ymankad
Version: | unspecified | Keywords: | Reopened, Triaged
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | 1782528 | Environment: |
Last Closed: | 2023-09-30 07:28:38 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1782528 | |
Bug Blocks: | | |
Comment 1
Pei Zhang
2020-01-13 08:23:31 UTC
This is for RHEL-AV 8.1.1. Assigned to Adrián Moreno since he owns the RHEL-AV 8.2.0 clone.

*** Bug 1782528 has been marked as a duplicate of this bug. ***

QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Hi Lulu, Adrian,

This issue can no longer be reproduced with qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64. The vhost-user re-connect works very well: the vhost-user NICs recover and there is no packet loss in the guest testpmd.

Testing versions:
qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64   fail
qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64  work

Can we move this bug to 'CurrentRelease'? Thank you.

Best regards,
Pei

That's strange. I browsed quickly through the changes between 4.2.0-9 and 4.2.0-10 and I really don't see anything that might have fixed this.

(In reply to Adrián Moreno from comment #9)
> That's strange. I browsed quickly through the changes between 4.2.0-9 and
> 4.2.0-10 and I really don't see anything that might have fixed this.

Adrian, I'm not sure whether the issue going away is related to the dpdk version. I was testing with the latest dpdk-19.11-3.el8.x86_64 and dpdk 20.02-rc3; both work well with qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64.

Versions info of Comment 8:
4.18.0-179.el8.x86_64
tuned-2.13.0-5.el8.noarch
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64
dpdk-19.11-3.el8.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64

Best regards,
Pei

Hi Adrian,

You are right, the issue still exists. Sorry for the confusion around Comment 8 and Comment 10 (something might have been wrong in my setup or versions).

In the most recent testing, the vhost-user re-connect still does not work well.

Versions:
4.18.0-184.el8.x86_64
tuned-2.13.0-5.el8.noarch
dpdk-19.11-4.el8.x86_64
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64

testpmd> show port stats all

######################## NIC statistics for port 0 ########################
RX-packets: 1068744    RX-missed: 0    RX-bytes: 64124640
RX-errors: 24252557
RX-nombuf: 0
TX-packets: 912338     TX-errors: 0    TX-bytes: 54740280

Throughput (since last show)
Rx-pps: 0    Rx-bps: 0
Tx-pps: 0    Tx-bps: 0
############################################################################

######################## NIC statistics for port 1 ########################
RX-packets: 913500     RX-missed: 0    RX-bytes: 54810000
RX-errors: 24211690
RX-nombuf: 0
TX-packets: 1067596    TX-errors: 0    TX-bytes: 64055760

Throughput (since last show)
Rx-pps: 0    Rx-bps: 0
Tx-pps: 0    Tx-bps: 0
############################################################################

Best regards,
Pei
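[Editorial note] The stats above show the stuck state after the backend drops: RX-errors keep climbing while pps stays at 0. For anyone who wants to observe the "event flood" named in the bug summary on the host side, a minimal sketch follows. It is not taken from this report; the process-matching pattern, the choice of syscalls to count, and the assumption that the QEMU main loop spins on events in the failing state are all assumptions.

# Hypothetical host-side check, not from the bug report: watch the qemu-kvm
# process while the vhost-user backend is down and the guest is transmitting.
QEMU_PID=$(pgrep -f qemu-kvm | head -n1)   # assumes a single qemu-kvm process

# CPU usage of the QEMU process (an abnormally busy main loop suggests a flood)
pidstat -p "$QEMU_PID" 1 5

# Count event-loop syscalls for 5 seconds; SIGINT makes strace print the summary
timeout -s INT 5 strace -c -f -p "$QEMU_PID" -e trace=ppoll,read,write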
(In reply to Pei Zhang from comment #12)
> You are right, the issue still exists.
> [...]
> In the most recent testing, the vhost-user re-connect still does not work well.

Thanks Pei for this update, but I'm working on another urgent problem right now; I will look into this one later.

After discussing with Pei, we plan to move this to AV 8.4.

Hi Lulu, Adrian,

Could you help confirm at the code level whether this issue has been fixed? I may close this bz as CurrentRelease after your confirmation.

In recent 8.4-av testing, vhost-user keeps working well after the ovs re-connect, and no errors show in qemu/kernel/ovs/guest/host.

Versions:
4.18.0-262.el8.x86_64
qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a.x86_64
tuned-2.15.0-0.1.rc1.el8.noarch
libvirt-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64
python3-libvirt-6.6.0-1.module+el8.3.0+7572+bcbf6b90.x86_64
openvswitch2.13-2.13.0-77.el8fdp.x86_64
dpdk-19.11.3-1.el8.x86_64

Results:

Testcase: vhostuser_reconnect_nonrt_iommu_ovs

Packets_loss | Frame_Size | Run_No | Throughput | Avg_Throughput
---|---|---|---|---
0 | 64 | 0 | 21.307395 | 21.307395
0 | 64 | 0 | 21.307384 | 21.307384
0 | 64 | 0 | 21.307395 | 21.307395

Best regards,
Pei

Hi Pei,

(In reply to Pei Zhang from comment #16)
> Could you help confirm at the code level whether this issue has been fixed? I may
> close this bz as CurrentRelease after your confirmation.

I don't think this issue has been fixed at the code level. I don't have my old setup handy, but I just reproduced it on qemu 5.1.0.

> In recent 8.4-av testing, vhost-user keeps working well after the ovs re-connect,
> and no errors show in qemu/kernel/ovs/guest/host.

In this test case, is testpmd transmitting packets in the guest? The problem is triggered when testpmd, in txonly mode, is sending many packets. Also, the larger the queue size, the higher the chance of hitting the issue.
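[Editorial note] To bring the passing reconnect test case closer to the failing scenario Adrián describes, the guest side needs to be transmitting heavily in txonly mode with descriptor rings sized to match the negotiated virtio queue size. The sketch below is illustrative only: the core list, memory arguments and descriptor counts are assumptions, not values taken from this report; they mirror the rx_queue_size='1024' used in the interface definitions in this bug, and libvirt also exposes a tx_queue_size driver attribute that can be raised the same way.

# Illustrative guest-side testpmd invocation (assumed values, not from this report):
# txonly forwarding with 1024-entry rings to stress the TX path while the
# backend is closed and reopened.
dpdk-testpmd -l 1,2 -n 2 --socket-mem 1024 -- \
    -i --auto-start --forward-mode=txonly \
    --txd=1024 --rxd=1024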
(In reply to Adrián Moreno from comment #17)
> I don't think this issue has been fixed at the code level. I don't have my old
> setup handy, but I just reproduced it on qemu 5.1.0.

Adrian, thanks for your fast reply. Could you try qemu 5.2? In my setup, this issue is only gone with qemu 5.2, not with qemu 5.1.

> In this test case, is testpmd transmitting packets in the guest?

Yes, testpmd is running and transmitting packets in the guest.

Best regards,
Pei

Same with 5.2.0.

Host testpmd:

sudo testpmd -l 0,20,21,22,23 --socket-mem=1024 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' \
    --vdev 'net_vhost1,iface=/tmp/vhost-user2,client=1' \
    --no-pci \
    -- --rxq=1 --txq=1 --portmask=f -a --forward-mode=rxonly --nb-cores=4 -i

Guest vhost config:

<interface type='vhostuser'>
  <mac address='56:48:4f:53:54:01'/>
  <source type='unix' path='/tmp/vhost-user1' mode='server'/>
  <model type='virtio'/>
  <driver name='vhost' rx_queue_size='1024'/>
  <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
</interface>
<interface type='vhostuser'>
  <mac address='56:48:4f:53:54:02'/>
  <source type='unix' path='/tmp/vhost-user2' mode='server'/>
  <model type='virtio'/>
  <driver name='vhost' rx_queue_size='1024'/>
  <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
</interface>

Command in guest:

testpmd -l 1,2 --socket-mem 1024 -n 2 -- --portmask=3 -i
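[Editorial note] The report does not spell out the trigger step for this host-testpmd setup, but per the bug summary the backend has to be closed unexpectedly while the guest is transmitting. A minimal sketch of that step follows; the pkill pattern is an assumption, not a command quoted from this report.

# With the guest testpmd transmitting, kill the host testpmd that backs the
# vhost-user sockets to simulate an unexpected backend close
# (the pattern is illustrative; adjust it to match the actual process).
sudo pkill -9 -f 'testpmd.*net_vhost0'

# Then relaunch the same host testpmd command. Since the guest interfaces use
# mode='server', QEMU keeps listening on /tmp/vhost-user1 and /tmp/vhost-user2
# and the backend reconnects, which is when the event flood/hang shows up.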
I think Adrian has answered this.

This issue can still be reproduced with the latest rhel8.4-av.

Versions:
4.18.0-278.rt7.43.el8.dt4.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
tuned-2.15.0-1.el8.noarch
libvirt-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
python3-libvirt-6.10.0-1.module+el8.4.0+8948+a39b3f3a.x86_64
openvswitch2.13-2.13.0-86.el8fdp.x86_64
dpdk-20.11-1.el8.x86_64

There are two ways to reproduce:

1. I can reproduce this issue following Adrian's steps. Thanks Adrian for confirming this issue many times. After many tries, I can confirm this is a different scenario from Comment 16. I will add a new test case after this bug is fixed.

2. I can also reproduce this issue with OVS on the host.

Steps:

(1) Boot OVS.

# ovs-vsctl show
4aa75943-b583-4eb1-9b7b-999c8409f68b
    Bridge "ovsbr1"
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2"}
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2"}

(2) Boot the VM.

<interface type='vhostuser'>
  <mac address='18:66:da:5f:dd:22'/>
  <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
  <target dev='vhost-user0'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
  <alias name='net1'/>
  <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</interface>
<interface type='vhostuser'>
  <mac address='18:66:da:5f:dd:23'/>
  <source type='unix' path='/tmp/vhostuser1.sock' mode='server'/>
  <target dev='vhost-user1'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
  <alias name='net2'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</interface>

(3) In the VM, start testpmd in txonly mode.

dpdk-testpmd \
    --socket-mem 1024 \
    -l 1,4,5 \
    -w 0000:07:00.0 \
    --proc-type auto \
    --file-prefix tx \
    -- \
    --port-topology=chained \
    --disable-rss \
    -i \
    --txq=2 \
    --nb-cores=2 \
    --auto-start \
    --forward-mode=txonly

testpmd> show port stats all

######################## NIC statistics for port 0 ########################
RX-packets: 0          RX-missed: 0    RX-bytes: 0
RX-errors: 0
RX-nombuf: 0
TX-packets: 35477152   TX-errors: 0    TX-bytes: 2270538240

Throughput (since last show)
Rx-pps: 0          Rx-bps: 0
Tx-pps: 5651028    Tx-bps: 2893319344
############################################################################

(4) On the host, restart OVS; the VM hangs. This issue is reproduced.
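[Editorial note] Step (4) is the trigger. A minimal sketch of it is below; the systemd unit name and the recovery check are assumptions based on a typical OVS-DPDK deployment, not commands quoted from this report.

# On the host, restart the vhost-user backend (OVS) while the guest testpmd is
# transmitting; this is the unexpected close described in the bug summary.
systemctl restart openvswitch

# Confirm the dpdkvhostuserclient ports came back after the restart.
ovs-vsctl show

# On a healthy reconnect the guest testpmd keeps reporting non-zero Tx-pps in
# "show port stats all"; in the failing case the guest hangs instead.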
Hello Cindy,

As in Comment 21, this issue still exists and it can cause a guest hang. So I would ask: do we plan to fix it in rhel8.4-av? Thanks a lot.

If you need to debug on my setup, feel free to let me know; I can prepare it for you.

Best regards,
Pei

(In reply to Pei Zhang from comment #22)
> As in Comment 21, this issue still exists and it can cause a guest hang. So I
> would ask: do we plan to fix it in rhel8.4-av?

Sure, thanks Pei. I'm checking this bug now and will let you know whether we can catch rhel 8.4-av.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Re-opening this bug as the issue still exists with rhel8.5-av:

Versions:
4.18.0-322.el8.x86_64
qemu-kvm-6.0.0-23.module+el8.5.0+11740+35571f13.x86_64
dpdk-20.11-3.el8.x86_64

Cindy, feel free to let me know if you have any other comments or concerns about the re-open. Thanks a lot.

Best regards,
Pei

Sure, please reopen it. Sorry for my late reply.

Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

This issue still exists with rhel8.6:

4.18.0-367.el8.x86_64
qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64

Re-opening the bug. Cindy, feel free to correct me if there are any concerns :)

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Hi lulu,

May I ask whether you plan to fix this bug in rhel9? Is it OK if we re-open this bug?

(In reply to Yanghang Liu from comment #41)
> May I ask whether you plan to fix this bug in rhel9?
>
> Is it OK if we re-open this bug?

Sure, I will be working on this.

Thanks Cindy.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.