Bug 1790360

Summary: qemu-kvm: event flood when vhost-user backed virtio netdev is unexpectedly closed while guest is transmitting
Product: Red Hat Enterprise Linux 9
Reporter: Pei Zhang <pezhang>
Component: qemu-kvm
Assignee: lulu <lulu>
qemu-kvm sub component: Networking
QA Contact: Yanghang Liu <yanghliu>
Status: CLOSED WONTFIX
Docs Contact:
Severity: medium
Priority: medium
CC: aadam, ailan, amorenoz, chayang, jasowang, jinzhao, juzhang, lulu, lvivier, mhou, pezhang, virt-maint, yanghliu, ymankad
Version: unspecified
Keywords: Reopened, Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1782528
Environment:
Last Closed: 2023-09-30 07:28:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1782528
Bug Blocks:

Comment 1 Pei Zhang 2020-01-13 08:23:31 UTC
RHEL8.2-AV Versions:

4.18.0-168.el8.x86_64
qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739.x86_64

Comment 2 Rick Barry 2020-01-16 19:39:01 UTC
This is for RHEL-AV 8.1.1. Assigned to Adrián Moreno, since he owns the RHEL-AV 8.2.0 clone.

Comment 6 Pei Zhang 2020-02-04 09:42:07 UTC
*** Bug 1782528 has been marked as a duplicate of this bug. ***

Comment 7 Ademar Reis 2020-02-05 23:12:27 UTC
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Comment 8 Pei Zhang 2020-02-19 04:07:54 UTC
Hi Lulu, Adrian,

This issue can no longer be reproduced with qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64. The vhost-user reconnect works well: the vhost-user NICs recover and guest testpmd shows no packet loss.

Testing versions:
qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64   fail 
qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64  work

Can we move this bug to 'CurrentRelease'? Thank you.

Best regards,

Pei

Comment 9 Adrián Moreno 2020-02-19 08:39:20 UTC
That's strange. I browsed quickly through the changes between 4.2.0-9 and 4.2.0-10 and I really don't see anything that might have fixed this.

Comment 10 Pei Zhang 2020-02-19 10:14:41 UTC
(In reply to Adrián Moreno from comment #9)
> That's strange. I browsed quickly through the changes between 4.2.0-9 and
> 4.2.0-10 and I really don't see anything that might have fixed this.

Adrian, 

I'm not sure whether the issue's disappearance is related to the DPDK version. I tested with the latest dpdk-19.11-3.el8.x86_64 and with dpdk 20.02-rc3; both work well with qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64.


Version info for Comment 8:
4.18.0-179.el8.x86_64
tuned-2.13.0-5.el8.noarch
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64
dpdk-19.11-3.el8.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.x86_64


Best regards,

Pei

Comment 12 Pei Zhang 2020-02-28 07:13:53 UTC
Hi Adrian, 

You are right, the issue still exists. Sorry for the confusion caused by Comment 8 and Comment 10 (something was probably wrong in my setup or versions).

In the latest testing, vhost-user reconnect still does not work well; after the reconnect, both ports show large RX-error counts and zero throughput:

Versions:
4.18.0-184.el8.x86_64
tuned-2.13.0-5.el8.noarch
dpdk-19.11-4.el8.x86_64
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1068744    RX-missed: 0          RX-bytes:  64124640
  RX-errors: 24252557
  RX-nombuf:  0         
  TX-packets: 912338     TX-errors: 0          TX-bytes:  54740280

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 913500     RX-missed: 0          RX-bytes:  54810000
  RX-errors: 24211690
  RX-nombuf:  0         
  TX-packets: 1067596    TX-errors: 0          TX-bytes:  64055760

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################


Best regards,

Pei

Comment 13 lulu@redhat.com 2020-02-28 07:20:23 UTC
(In reply to Pei Zhang from comment #12)
> [quoted text of comment #12 trimmed]

Thanks, Pei, for this update. I'm currently working on another urgent problem and will work on this one later.

Comment 14 lulu@redhat.com 2020-09-21 05:49:43 UTC
After discussing with Pei, we plan to move this to AV 8.4.

Comment 16 Pei Zhang 2020-12-16 08:04:24 UTC
Hi Lulu, Adrian,

Could you help confirm at the code level whether this issue has been fixed? I can close this bz as CurrentRelease after your confirmation.

In recent 8.4-av testing, vhost-user keeps working well after an OVS reconnect, and no errors show up in qemu/kernel/ovs/guest/host.

Versions:
4.18.0-262.el8.x86_64
qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a.x86_64
tuned-2.15.0-0.1.rc1.el8.noarch
libvirt-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64
python3-libvirt-6.6.0-1.module+el8.3.0+7572+bcbf6b90.x86_64
openvswitch2.13-2.13.0-77.el8fdp.x86_64
dpdk-19.11.3-1.el8.x86_64

Results:
Testcase: vhostuser_reconnect_nonrt_iommu_ovs
Packets_loss Frame_Size Run_No Throughput Avg_Throughput
0 64 0 21.307395 21.307395
0 64 0 21.307384 21.307384
0 64 0 21.307395 21.307395


Best regards,

Pei

Comment 17 Adrián Moreno 2020-12-16 09:36:28 UTC
Hi Pei,


(In reply to Pei Zhang from comment #16)
> Hi Lulu, Adrian,
> 
> Could you help confirm at the code level whether this issue has been fixed?
> I can close this bz as CurrentRelease after your confirmation.
> 

I don't think this issue has been fixed at the code level.
I don't have my old setup handy, but I just reproduced it on qemu 5.1.0.


> In recent 8.4-av testing, vhost-user keeps working well after an OVS
> reconnect, and no errors show up in qemu/kernel/ovs/guest/host.
>

In this test case, is testpmd transmitting packets in the guest?
The problem is triggered when testpmd, in txonly mode, is sending many packets.
Also, the larger the queue size, the higher the chance of hitting the issue.

Comment 18 Pei Zhang 2020-12-16 10:32:11 UTC
(In reply to Adrián Moreno from comment #17)
> Hi Pei,
> 
> 
> (In reply to Pei Zhang from comment #16)
> > [quoted text of comment #16 trimmed]
> 
> I don't think this issue has been fixed at the code level.
> I don't have my old setup handy, but I just reproduced it on qemu 5.1.0.

Adrian, 

Thanks for your fast reply. Could you try qemu 5.2? In my setup, this issue is gone only with qemu 5.2, not with qemu 5.1.


>
> > [quoted text of comment #16 trimmed]
>
> In this test case, is testpmd transmitting packets in the guest?

Yes, testpmd is running and transmitting packets in the guest.

Best regards,

Pei

Comment 19 Adrián Moreno 2020-12-16 10:43:11 UTC
Same result with 5.2.0.

Host testpmd:
sudo testpmd -l 0,20,21,22,23 --socket-mem=1024 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' \
    --vdev 'net_vhost1,iface=/tmp/vhost-user2,client=1' \
    --no-pci \
    -- \
    --rxq=1 --txq=1 --portmask=f -a --forward-mode=rxonly --nb-cores=4 -i

Guest vhost config:
    <interface type='vhostuser'> 
      <mac address='56:48:4f:53:54:01'/> 
      <source type='unix' path='/tmp/vhost-user1' mode='server'/> 
      <model type='virtio'/> 
      <driver name='vhost' rx_queue_size='1024'/> 
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/> 
    </interface> 
    <interface type='vhostuser'> 
      <mac address='56:48:4f:53:54:02'/> 
      <source type='unix' path='/tmp/vhost-user2' mode='server'/> 
      <model type='virtio'/> 
      <driver name='vhost' rx_queue_size='1024'/> 
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/> 
    </interface>

Command in guest:
testpmd -l 1,2 --socket-mem 1024 -n 2 -- --portmask=3 -i
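
For reference, the trigger step is not spelled out above. Based on the bug summary (the backend is unexpectedly closed while the guest is transmitting), a minimal sketch of it, assuming the host testpmd above is the only testpmd process on the host:

In the guest testpmd prompt, start transmitting:
testpmd> set fwd txonly
testpmd> start

Then, on the host, close the vhost-user backend while the guest is transmitting
(Ctrl+C in the host testpmd session, or):
# pkill testpmd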

Comment 20 lulu@redhat.com 2020-12-28 05:46:15 UTC
I think Adrián has already answered this.

Comment 21 Pei Zhang 2021-02-01 10:18:45 UTC
This issue can still be reproduced with the latest rhel8.4-av.

Versions:
4.18.0-278.rt7.43.el8.dt4.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
tuned-2.15.0-1.el8.noarch
libvirt-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
python3-libvirt-6.10.0-1.module+el8.4.0+8948+a39b3f3a.x86_64
openvswitch2.13-2.13.0-86.el8fdp.x86_64
dpdk-20.11-1.el8.x86_64

There are two ways to reproduce it:

1. I can reproduce this issue following Adrian's steps (Comment 19). Thanks, Adrian, for confirming this issue multiple times. After many tries, I can confirm this is a different scenario from Comment 16. I will add a new test case after this bug is fixed.


2. I can also reproduce this issue with OVS on the host.

Steps:
(1) Boot OVS

# ovs-vsctl show
4aa75943-b583-4eb1-9b7b-999c8409f68b
    Bridge "ovsbr1"
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2"}
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2"}
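
For reference, a sketch of the ovs-vsctl commands that would produce the bridge layout above. This is a minimal sketch, assuming OVS is already running with DPDK initialized; the PCI addresses and socket paths are taken from the output:

# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk \
      options:dpdk-devargs=0000:5e:00.0 options:n_rxq=2
# ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 \
      type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock

(ovsbr1/dpdk1/vhost-user1 are analogous, with 0000:5e:00.1 and /tmp/vhostuser1.sock.)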

(2) Boot VM

    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:22'/>
      <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
      <target dev='vhost-user0'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:23'/>
      <source type='unix' path='/tmp/vhostuser1.sock' mode='server'/>
      <target dev='vhost-user1'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>

(3) In the VM, start testpmd in txonly mode

dpdk-testpmd  \
        --socket-mem 1024 \
        -l 1,4,5 \
        -w 0000:07:00.0    \
        --proc-type auto \
        --file-prefix tx  \
        --     \
        --port-topology=chained     \
        --disable-rss \
        -i \
        --txq=2 \
        --nb-cores=2 \
        --auto-start  \
        --forward-mode=txonly

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 0          RX-missed: 0          RX-bytes:  0
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 35477152   TX-errors: 0          TX-bytes:  2270538240

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:      5651028          Tx-bps:   2893319344
  ############################################################################


(4) On the host, restart OVS; the VM hangs. This issue is reproduced.
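
A sketch of the restart in step (4). The exact service name is an assumption: on RHEL, OVS typically runs as the openvswitch systemd unit, and restarting it (or killing ovs-vswitchd) unexpectedly closes the vhost-user backend:

# systemctl restart openvswitch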

Comment 22 Pei Zhang 2021-02-01 10:20:05 UTC
Hello Cindy, 

As described in Comment 21, this issue still exists and can cause a guest hang. So I would like to ask: do we plan to fix it in rhel8.4-av? Thanks a lot.

If you need to debug on my setup, feel free to let me know; I can prepare it for you.

Best regards,

Pei

Comment 24 lulu@redhat.com 2021-02-02 02:09:09 UTC
(In reply to Pei Zhang from comment #22)
> [quoted text of comment #22 trimmed]

Sure. Thanks, Pei. I'm checking this bug now and will let you know whether we can make rhel 8.4-av.

Comment 29 RHEL Program Management 2021-07-13 07:30:05 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 30 Pei Zhang 2021-07-13 12:44:59 UTC
Re-opening this bug, as the issue still exists with rhel8.5-av:

Versions:
4.18.0-322.el8.x86_64
qemu-kvm-6.0.0-23.module+el8.5.0+11740+35571f13.x86_64
dpdk-20.11-3.el8.x86_64

Cindy,

Feel free to let me know if you have any other comments or concerns about the re-open. Thanks a lot.

Best regards,

Pei

Comment 32 lulu@redhat.com 2021-07-20 06:42:59 UTC
Sure, please reopen it. Sorry for my late reply.

Comment 34 John Ferlan 2021-09-09 12:34:21 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 36 RHEL Program Management 2022-01-13 07:27:23 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 37 Pei Zhang 2022-02-23 07:53:45 UTC
This issue still exists with rhel8.6.

4.18.0-367.el8.x86_64
qemu-kvm-6.2.0-5.module+el8.6.0+14025+ca131e0a.x86_64

Re-opening the bug.

Cindy, feel free to correct me if you have any concerns. :)

Comment 40 RHEL Program Management 2022-08-23 07:27:48 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 41 Yanghang Liu 2022-08-31 01:41:06 UTC
Hi lulu,

May I ask whether you plan to fix this bug in rhel9?

Is it OK if we re-open this bug?

Comment 42 lulu@redhat.com 2022-08-31 06:57:46 UTC
(In reply to Yanghang Liu from comment #41)
> Hi lulu,
> 
> May I ask whether you plan to fix this bug in rhel9?
> 
> Is it OK if we re-open this bug?

Sure, I will be working on this.
Thanks,
Cindy

Comment 45 RHEL Program Management 2023-03-03 07:27:51 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 48 RHEL Program Management 2023-09-30 07:28:38 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.