Description of problem:

When one port of dpdkbond0 on the compute host is disabled, connectivity breaks. We mirrored/snooped the external switch port corresponding to the remaining working port and ran ovs-tcpdump on the dpdkbond0 interface on compute host dpdk-2 while pinging from 10.145.131.72 to 10.145.131.29 (10.145.131.29 is a VM on compute host dpdk-2). The trace from the external switch (pcap attached) shows only one-way packet flow: ICMP requests from 10.145.131.72 to 10.145.131.29, with no replies. The trace from dpdk-2 (dpdk-2-test-2018-10-29.pcap) shows both ICMP requests and ICMP replies; one odd thing is that two ICMP replies with the same sequence number were captured in the dpdk-2 trace. The customer suspects the fault is in OVS on the compute host.

The issue is reproducible, and not only on the dpdk-2 compute host but on many other compute hosts (the customer believes it happens on all compute hosts in that environment, but they did not have time to test all of them).

Version-Release number of selected component (if applicable):
~~~
[akaris@collab-shell sosreport-20181030-155034]$ grep openvswitch overcloud-compute-dpdk-2.localdomain/installed-rpms
openstack-neutron-openvswitch-9.4.1-19.el7ost.noarch   Fri Aug 3 01:52:04 2018
openvswitch-2.9.0-19.el7fdp.1.x86_64                   Fri Aug 3 01:51:55 2018
openvswitch-test-2.9.0-56.el7fdp.noarch                Mon Oct 29 19:22:15 2018
python-openvswitch-2.9.0-56.el7fdp.noarch              Mon Oct 29 19:22:15 2018
~~~

Steps to Reproduce:

Deploy the latest RHOSP 10, which uses OVS 2.9.0, and enable DPDK for the compute host. On the compute host, use a DPDK bond with LACP for the VNF traffic:
~~~
[akaris@collab-shell 02241525]$ grep BondInterfaceOvsDpdkOptions templates20181029.zip/ -R -C3
templates20181029.zip/compute-dpdk.yaml- description: linux bond options
templates20181029.zip/compute-dpdk.yaml- default: 'mode=802.3ad lacp_rate=1'
templates20181029.zip/compute-dpdk.yaml- type: string
templates20181029.zip/compute-dpdk.yaml: BondInterfaceOvsDpdkOptions:
templates20181029.zip/compute-dpdk.yaml- description: linux bond options
templates20181029.zip/compute-dpdk.yaml- default: 'bond_mode=balance-tcp lacp=active other_config={lacp-time=fast}'
templates20181029.zip/compute-dpdk.yaml- type: string
--
templates20181029.zip/compute-dpdk.yaml- -
templates20181029.zip/compute-dpdk.yaml- type: ovs_dpdk_bond
templates20181029.zip/compute-dpdk.yaml- name: dpdkbond0
templates20181029.zip/compute-dpdk.yaml: ovs_options: {get_param: BondInterfaceOvsDpdkOptions}
templates20181029.zip/compute-dpdk.yaml- members:
templates20181029.zip/compute-dpdk.yaml- -
templates20181029.zip/compute-dpdk.yaml- type: ovs_dpdk_port
~~~

Deploy a VM on the compute host enabled with OVS-DPDK. Start a flow to that VM, e.g. continuously ping the VM's IP from an external server.
Start ovs-tcpdump on the dpdkbond interface on the compute host where the VM is deployed; start tcpdump inside the destination VM; start tcpdump on the source external server. Using a tool on the external switch (e.g. port mirroring), determine which switch port is receiving the ICMP reply flow sent from the VM on the compute host. Then either disable that port on the external switch or disable the corresponding dpdk port on the compute host. You will then see:

in the tcpdump on the source external server, only ICMP requests, no ICMP replies
in the ovs-tcpdump on the dpdkbond interface on the compute host, both ICMP requests and ICMP replies
in the tcpdump inside the destination VM, both ICMP requests and ICMP replies

Note: ovs-appctl bond/show and ovs-appctl lacp/show correctly reflect the port status when either the port on the external switch or the corresponding dpdk port on the compute host is disabled, i.e. everything looks correct if you only check with these commands or with the equivalent command on the external switch. In reality, however, packets belonging to the old path (i.e. packets sent from the VM to the external world that used to go out over the now-disabled port) are dropped somewhere. My guess is that these packets are dropped by OVS.
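For reference, a minimal sketch of the capture and status commands referenced above. The bridge and member port names (br-link0, dpdk0) are assumptions based on the environment shown later in this report:
~~~
# On the compute host: capture on the OVS DPDK bond (names are assumptions)
ovs-tcpdump -i dpdkbond0 -w /tmp/dpdkbond0.pcap icmp

# Bond/LACP state as seen by OVS; these report the expected member status
# even while the ICMP replies are being dropped
ovs-appctl bond/show dpdkbond0
ovs-appctl lacp/show dpdkbond0

# Optionally disable one bond member from the OVS side instead of the switch side
ovs-ofctl mod-port br-link0 dpdk0 down
~~~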
This seems to have something to do with the FDB:
~~~
(overcloud) [stack@undercloud-r430 ~]$ ping 10.0.0.105
PING 10.0.0.105 (10.0.0.105) 56(84) bytes of data.
64 bytes from 10.0.0.105: icmp_seq=1 ttl=64 time=0.322 ms
64 bytes from 10.0.0.105: icmp_seq=2 ttl=64 time=0.194 ms
64 bytes from 10.0.0.105: icmp_seq=3 ttl=64 time=0.263 ms
64 bytes from 10.0.0.105: icmp_seq=4 ttl=64 time=0.183 ms
(...)
~~~
~~~
S4048-ON-sw(conf-if-te-1/14)#shut
S4048-ON-sw(conf-if-te-1/14)#
~~~
At this moment, the ping stops.
~~~
[root@overcloud-computeovsdpdk-0 ~]# ovs-appctl fdb/show br-link0
 port  VLAN  MAC                Age
    2   905  fa:16:3e:be:78:24    1
   10   905  52:54:00:32:28:9a    1
LOCAL   902  a0:36:9f:e3:de:c8    0
   10   902  c6:4a:3a:39:3f:8a    0
[root@overcloud-computeovsdpdk-0 ~]# ovs-appctl fdb/show br-link0
 port  VLAN  MAC                Age
LOCAL   902  a0:36:9f:e3:de:c8    1
    2   905  fa:16:3e:be:78:24    1
   10   905  52:54:00:32:28:9a    1
   10   902  c6:4a:3a:39:3f:8a    1
~~~
Stopping the ping:
~~~
^C
--- 10.0.0.105 ping statistics ---
15 packets transmitted, 4 received, 73% packet loss, time 14000ms
rtt min/avg/max/mdev = 0.183/0.240/0.322/0.058 ms
~~~
Flushing FDB:
~~~
[root@overcloud-computeovsdpdk-0 ~]# ovs-appctl fdb/flush
table successfully flushed
[root@overcloud-computeovsdpdk-0 ~]# ovs-appctl fdb/show br-link0
 port  VLAN  MAC                Age
LOCAL   902  a0:36:9f:e3:de:c8    1
   10   902  c6:4a:3a:39:3f:8a    1
[root@overcloud-computeovsdpdk-0 ~]# ovs-appctl fdb/show br-link0
 port  VLAN  MAC                Age
    2   905  fa:16:3e:be:78:24    3
   10   905  52:54:00:32:28:9a    3
   10   902  c6:4a:3a:39:3f:8a    1
LOCAL   902  a0:36:9f:e3:de:c8    1
~~~
Ping now works:
~~~
(overcloud) [stack@undercloud-r430 ~]$ ping 10.0.0.105
PING 10.0.0.105 (10.0.0.105) 56(84) bytes of data.
64 bytes from 10.0.0.105: icmp_seq=1 ttl=64 time=0.295 ms
64 bytes from 10.0.0.105: icmp_seq=2 ttl=64 time=0.208 ms
^C
--- 10.0.0.105 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.208/0.251/0.295/0.046 ms
(overcloud) [stack@undercloud-r430 ~]$
~~~
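To watch the learned-entry behaviour live while reproducing, something like the following can be used (a sketch; the bridge name br-link0 is taken from the output above):
~~~
# Refresh the FDB of br-link0 every second and highlight changes, so the
# learned entry for the VM MAC can be seen moving or aging out
watch -d -n 1 'ovs-appctl fdb/show br-link0'

# Workaround observed above: flush the FDB of the bridge to force re-learning
ovs-appctl fdb/flush br-link0
~~~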
Also, I cannot reproduce the issue in my lab with:

0) stop the ping, wait 10 seconds
a) ovs-appctl fdb/flush
b) shut down the switch port
c) start pinging the instance

I think, though, that I have to wait for my switch's MAC address table to time out (which in my case is 10 seconds) before I can ping again:
~~~
S4048-ON-sw(conf-if-te-1/19)#do show mac-address-table | grep Po
 902    a0:36:9f:e3:de:c8    Dynamic    Po 1    Active
 905    fa:16:3e:be:78:24    Dynamic    Po 1    Active
S4048-ON-sw(conf-if-te-1/19)#do show mac-address-table | grep Po
 902    a0:36:9f:e3:de:c8    Dynamic    Po 1    Active
 905    fa:16:3e:be:78:24    Dynamic    Po 1    Active
S4048-ON-sw(conf-if-te-1/19)#do show mac-address-table | grep Po
 902    a0:36:9f:e3:de:c8    Dynamic    Po 1    Active
S4048-ON-sw(conf-if-te-1/19)#
~~~
We ping from outside to the VM. The echo request makes it to the VM, and the VM replies with an echo reply. The echo reply is not making it from OVS DPDK to the wire.
Another observation:
* I can reproduce the issue with ovs-tcpdump running.
* I was then reading through manuals and researching, and simply kept the environment in this "broken" state for the flow.
* I Ctrl-C'ed ovs-tcpdump after a while, and the ping resumed.

It looks like something forced a recalculation of the hash within OVS, which made things work again.
FWIW, I could not reproduce comment #20 after that.

While reproducing the issue, I captured a reply and then sent it through ofproto/trace:
~~~
[root@overcloud-computeovsdpdk-0 ~]# ovs-appctl ofproto/trace br-link0 in_port=2 52540032289afa163ebe782481000389080045000054b43000004001b20f0a0000690a00000100005a21613e0092d67cdb5b00000000ca62090000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
Flow: icmp,in_port=2,dl_vlan=905,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:be:78:24,dl_dst=52:54:00:32:28:9a,nw_src=10.0.0.105,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0

bridge("br-link0")
------------------
 0. priority 0
    NORMAL
     -> forwarding to learned port

Final flow: unchanged
Megaflow: recirc_id=0,eth,ip,in_port=2,dl_vlan=905,dl_src=fa:16:3e:be:78:24,dl_dst=52:54:00:32:28:9a,nw_src=0.0.0.0/1,nw_dst=0.0.0.0/1,nw_frag=no
Datapath actions: hash(hash_l4(0)),recirc(0x38f)

[root@overcloud-computeovsdpdk-0 ~]# ovs-appctl ofproto/trace br-link0 "icmp,in_port=2,dl_vlan=905,dl_vlan_pcp=0,dl_src=fa:16:3e:be:78:24,dl_dst=52:54:00:32:28:9a,nw_src=10.0.0.105,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0"
Flow: icmp,in_port=2,dl_vlan=905,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:be:78:24,dl_dst=52:54:00:32:28:9a,nw_src=10.0.0.105,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0

bridge("br-link0")
------------------
 0. priority 0
    NORMAL
     -> forwarding to learned port

Final flow: unchanged
Megaflow: recirc_id=0,eth,ip,in_port=2,dl_vlan=905,dl_src=fa:16:3e:be:78:24,dl_dst=52:54:00:32:28:9a,nw_src=0.0.0.0/1,nw_dst=0.0.0.0/1,nw_frag=no
Datapath actions: hash(hash_l4(0)),recirc(0x38f)
[root@overcloud-computeovsdpdk-0 ~]#
~~~
I could get rid of the issue by running:
~~~
ovs-appctl dpctl/del-flows
~~~
With that, the issue seems to stay gone for a while longer. I also see more hashes in the bond/show output. I'll have to look into this again tomorrow.
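A sketch of how this workaround and its effect on the bond hashes can be checked. The bond name dpdkbond0 is the one from the customer environment; in the later software-only repro it is dpdkbond1:
~~~
# Before: inspect which hash buckets are assigned to which bond member
ovs-appctl bond/show dpdkbond0

# Workaround: delete all datapath flows so they are re-created
# (and re-hashed) by the next packets
ovs-appctl dpctl/del-flows

# After: the datapath flow table is empty until traffic repopulates it
ovs-appctl dpctl/dump-flows
ovs-appctl bond/show dpdkbond0
~~~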
Just as an FYI: in an active-active LAG, if you shut down a port, there may be frame loss (if you want to prevent frame loss, you'd use active-backup). I want to understand the issue being reported here more clearly, so I'll ask Andreas to join a call with me.

I did create a software-only test rig (see https://github.com/orgcandman/netunits/blob/master/openvswitch/openvswitch.sh#L332 for details) and don't see any traffic loss using Open vSwitch from upstream/master. This is only with veth ports, not a dpdk port - but the logic should be the same, iirc. I'm not sure why any flow deletion would make a difference here.

Further, I don't see any truly significant changes from 2.6 to 2.9 w.r.t. the bonding code, but there could be one or two commits that point to interesting behavior changes (all bug fixes, which is even stranger). Specifically, these were introduced between 2.6 and 2.9:

5fef88eaefee ("bond: send learning pkg when non active slave failed.")
deb67947e753 ("bond: Adjust bond hash masks")
42781e77035d ("bond: Unify hash functions in hash action and entry lookup.")
e5c4f8276b55 ("ofproto/bond: Drop traffic in balance-tcp mode without lacp.")
303539348848 ("ofproto/bond: Validate active-slave mac.")

-Aaron
I can reproduce with the following setup:

Remove anything neutron:
~~~
systemctl stop neutron-openvswitch-agent
mv /etc/openvswitch/conf.db{,.back}
systemctl restart openvswitch
~~~

Set up test:
~~~
ip link add veth0 type veth peer name veth1
ip link set dev veth1 netns test
ip link set dev veth0 up
ip netns exec test ip link set dev lo up
ip netns exec test ip link set dev veth1 up
ip netns exec test ip a a dev veth1 10.0.0.200/24

socket_mem=4096,4096
pmd_cpu_mask=17c0017c
host_cpu_mask=100001
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-bond br0 dpdkbond0 dpdk10 dpdk11 -- set interface dpdk10 type=dpdk "options:dpdk-devargs=0000:05:00.0" -- set interface dpdk11 type=dpdk "options:dpdk-devargs=0000:05:00.1" -- set port dpdkbond0 lacp=active -- set port dpdkbond0 bond_mode=balance-tcp -- set port dpdkbond0 other-config:lacp-time=fast
ovs-vsctl add-port br0 veth0 tag=905
~~~

Ping:
~~~
ping 10.0.0.200
~~~

Shut down port:
~~~
S4048-ON-sw(conf-if-te-1/19)#int te1/14
S4048-ON-sw(conf-if-te-1/14)#shut
~~~
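After the bond is created, the LACP negotiation and per-member state can be confirmed before triggering the failover (a sketch; the names match the setup above):
~~~
# lacp_status should be "negotiated" and both members enabled before the test
ovs-appctl lacp/show dpdkbond0
ovs-appctl bond/show dpdkbond0

# Note: these commands keep reporting the expected state after the switch port
# is shut down, even while the ICMP replies are being dropped
~~~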
I reproduced in software only. Test setup:
~~~
systemctl stop neutron-openvswitch-agent
mv /etc/openvswitch/conf.db{,.back}
systemctl restart openvswitch

socket_mem=4096,4096
pmd_cpu_mask=17c0017c
host_cpu_mask=100001
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask
ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev

ip link add ovs-bond-if0 type veth peer name lx-bond-if0
ip link add ovs-bond-if1 type veth peer name lx-bond-if1
ip link add lx-bond0 type bond miimon 100 mode 802.3ad
ip link set dev lx-bond-if0 master lx-bond0
ip link set dev lx-bond-if1 master lx-bond0
ip link set dev lx-bond-if0 up
ip link set dev lx-bond-if1 up
ip link set dev ovs-bond-if0 up
ip link set dev ovs-bond-if1 up
ip link set dev lx-bond0 up
ovs-vsctl add-bond br1 dpdkbond1 ovs-bond-if0 ovs-bond-if1 -- set port dpdkbond1 lacp=active -- set port dpdkbond1 bond_mode=balance-tcp -- set port dpdkbond1 other-config:lacp-time=fast

ip link add name lx-bond0.905 link lx-bond0 type vlan id 905
ip link set dev lx-bond0.905 up
ip a a dev lx-bond0.905 192.168.123.10/24

ip link add veth2 type veth peer name veth3
ip netns add test2
ip link set dev veth2 netns test2
ip link set dev veth3 up
ip netns exec test2 ip link set dev lo up
ip netns exec test2 ip link set dev veth2 up
ip netns exec test2 ip a a dev veth2 192.168.123.11/24
ovs-vsctl add-port br1 veth3 tag=905
~~~
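In this back-to-back setup, both ends of the LAG live on the same host, so the negotiation can be verified on both sides before the failover test (a sketch using the names from the setup above):
~~~
# Linux-bond side: both slaves should show "MII Status: up" and 802.3ad
# aggregator information
cat /proc/net/bonding/lx-bond0

# OVS side: lacp_status should be "negotiated" and both members enabled
ovs-appctl lacp/show dpdkbond1
ovs-appctl bond/show dpdkbond1
~~~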
Start a ping:
~~~
[root@overcloud-compute-0 ~]# ping 192.168.123.11
PING 192.168.123.11 (192.168.123.11) 56(84) bytes of data.
64 bytes from 192.168.123.11: icmp_seq=1 ttl=64 time=0.231 ms
64 bytes from 192.168.123.11: icmp_seq=2 ttl=64 time=0.128 ms
64 bytes from 192.168.123.11: icmp_seq=3 ttl=64 time=0.131 ms
64 bytes from 192.168.123.11: icmp_seq=4 ttl=64 time=0.136 ms
64 bytes from 192.168.123.11: icmp_seq=5 ttl=64 time=0.128 ms
64 bytes from 192.168.123.11: icmp_seq=6 ttl=64 time=0.134 ms
64 bytes from 192.168.123.11: icmp_seq=7 ttl=64 time=0.132 ms
64 bytes from 192.168.123.11: icmp_seq=8 ttl=64 time=0.126 ms
64 bytes from 192.168.123.11: icmp_seq=9 ttl=64 time=0.134 ms
64 bytes from 192.168.123.11: icmp_seq=10 ttl=64 time=0.134 ms
64 bytes from 192.168.123.11: icmp_seq=11 ttl=64 time=0.129 ms
64 bytes from 192.168.123.11: icmp_seq=12 ttl=64 time=0.134 ms
64 bytes from 192.168.123.11: icmp_seq=13 ttl=64 time=0.136 ms
64 bytes from 192.168.123.11: icmp_seq=14 ttl=64 time=0.128 ms
64 bytes from 192.168.123.11: icmp_seq=15 ttl=64 time=0.135 ms
64 bytes from 192.168.123.11: icmp_seq=16 ttl=64 time=0.136 ms
64 bytes from 192.168.123.11: icmp_seq=17 ttl=64 time=0.132 ms
64 bytes from 192.168.123.11: icmp_seq=18 ttl=64 time=0.134 ms
64 bytes from 192.168.123.11: icmp_seq=19 ttl=64 time=0.137 ms
64 bytes from 192.168.123.11: icmp_seq=20 ttl=64 time=0.130 ms
64 bytes from 192.168.123.11: icmp_seq=21 ttl=64 time=0.135 ms
64 bytes from 192.168.123.11: icmp_seq=22 ttl=64 time=0.133 ms
64 bytes from 192.168.123.11: icmp_seq=23 ttl=64 time=0.131 ms
64 bytes from 192.168.123.11: icmp_seq=24 ttl=64 time=0.138 ms
64 bytes from 192.168.123.11: icmp_seq=25 ttl=64 time=0.140 ms
64 bytes from 192.168.123.11: icmp_seq=26 ttl=64 time=0.130 ms
~~~

Shut down the "critical interface" and the ping "freezes" (packet loss):
~~~
[root@overcloud-compute-0 ~]# ip link set dev ovs-bond-if1 down
~~~

Switch the load balancing mode to balance-slb and this works again:
~~~
[root@overcloud-compute-0 ~]# ovs-vsctl set port dpdkbond1 bond_mode=balance-slb
~~~
~~~
64 bytes from 192.168.123.11: icmp_seq=63 ttl=64 time=0.145 ms
64 bytes from 192.168.123.11: icmp_seq=64 ttl=64 time=0.135 ms
64 bytes from 192.168.123.11: icmp_seq=65 ttl=64 time=0.129 ms
^C
--- 192.168.123.11 ping statistics ---
65 packets transmitted, 29 received, 55% packet loss, time 64000ms
rtt min/avg/max/mdev = 0.126/0.136/0.231/0.022 ms
~~~
~~~
[root@overcloud-compute-0 ~]# ovs-appctl bond/show
---- dpdkbond1 ----
bond_mode: balance-slb
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 9161 ms
lacp_status: negotiated
lacp_fallback_ab: false
active slave mac: 66:29:92:43:09:23(ovs-bond-if0)

slave ovs-bond-if0: enabled
  active slave
  may_enable: true

slave ovs-bond-if1: disabled
  may_enable: false
~~~
And I can make the problem move to the other port by:
~~~
[root@overcloud-compute-0 ~]# ip link set dev ovs-bond-if1 down
[root@overcloud-compute-0 ~]# ovs-vsctl set port dpdkbond1 bond_mode=balance-slb
~~~
Start the ping now so that it uses if0, then switch the mode to balance-tcp and enable the port again:
~~~
[root@overcloud-compute-0 ~]# ovs-vsctl set port dpdkbond1 bond_mode=balance-tcp
[root@overcloud-compute-0 ~]# ip link set dev ovs-bond-if1 up
~~~
And I can "fix" the problem in the same manner with a dpctl/del-flows: ~~~ flow-dump from non-dpdk interfaces: recirc_id(0x1),dp_hash(0xfcc63ed7/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x8100),vlan(vid=905,pcp=0),encap(eth_type(0x0806)), packets:0, bytes:0, used:never, actions:3 recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=5e:5c:65:9a:c4:fa,dst=b6:f8:52:24:fd:a4),eth_type(0x8100),vlan(vid=905,pcp=0),encap(eth_type(0x0806)), packets:1, bytes:46, used:2.266s, actions:pop_vlan,4 recirc_id(0x1),dp_hash(0xa93f2709/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x8100),vlan(vid=905,pcp=0),encap(eth_type(0x0806)), packets:0, bytes:0, used:never, actions:3 recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x8809), packets:207, bytes:25668, used:0.737s, actions:userspace(pid=0,slow_path(lacp)) recirc_id(0),in_port(4),packet_type(ns=0,id=0),eth(src=b6:f8:52:24:fd:a4,dst=5e:5c:65:9a:c4:fa),eth_type(0x0800),ipv4(frag=no), packets:252, bytes:24696, used:0.268s, actions:push_vlan(vid=905,pcp=0),hash(hash_l4(0)),recirc(0x1) recirc_id(0x1),dp_hash(0xfcc63ed7/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x8100),vlan(vid=905,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:220, bytes:22440, used:0.268s, actions:2 recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=5e:5c:65:9a:c4:fa,dst=b6:f8:52:24:fd:a4),eth_type(0x8100),vlan(vid=905,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:11, bytes:1122, used:0.268s, actions:pop_vlan,4 recirc_id(0),in_port(4),packet_type(ns=0,id=0),eth(src=b6:f8:52:24:fd:a4,dst=5e:5c:65:9a:c4:fa),eth_type(0x0806), packets:1, bytes:42, used:2.266s, actions:push_vlan(vid=905,pcp=0),hash(hash_l4(0)),recirc(0x1) ~~~ ~~~ ping (...) (...) 64 bytes from 192.168.123.11: icmp_seq=103 ttl=64 time=0.146 ms 64 bytes from 192.168.123.11: icmp_seq=104 ttl=64 time=0.146 ms 64 bytes from 192.168.123.11: icmp_seq=105 ttl=64 time=0.136 ms 64 bytes from 192.168.123.11: icmp_seq=106 ttl=64 time=0.158 ms ~~~ Packet loss until ... ~~~ [root@overcloud-compute-0 ~]# ovs-appctl dpctl/del-flows [root@overcloud-compute-0 ~]# ~~~ ~~~ 64 bytes from 192.168.123.11: icmp_seq=127 ttl=64 time=0.199 ms 64 bytes from 192.168.123.11: icmp_seq=128 ttl=64 time=0.190 ms 64 bytes from 192.168.123.11: icmp_seq=129 ttl=64 time=0.195 ms 64 bytes from 192.168.123.11: icmp_seq=130 ttl=64 time=0.188 ms ^C --- 192.168.123.11 ping statistics --- 130 packets transmitted, 103 received, 20% packet loss, time 128999ms rtt min/avg/max/mdev = 0.128/0.146/0.199/0.014 ms [root@overcloud-compute-0 ~]# ~~~ The flows after that are completely deleted, but the ping works: [root@overcloud-compute-0 ~]# ovs-appctl dpctl/dump-flows [root@overcloud-compute-0 ~]# ovs-vsctl show a1e1a7e9-2398-4b3c-8f73-53d6c5f90b00 Bridge "br1" Port "dpdkbond1" Interface "ovs-bond-if0" Interface "ovs-bond-if1" Port "veth3" tag: 905 Interface "veth3" Port "br1" Interface "br1" type: internal ovs_version: "2.9.0" [root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br1 cookie=0x0, duration=890.168s, table=0, n_packets=2093, n_bytes=239076, priority=0 actions=NORMAL [root@overcloud-compute-0 ~]#
Couldn't reproduce with:
~~~
ip a a dev lx-bond0 192.168.124.10/24

ip link add veth4 type veth peer name veth5
ip link set dev veth4 netns test2
ip netns exec test2 ip link set dev veth4 up
ip link set dev veth5 up
ip netns exec test2 ip a a dev veth4 192.168.124.11/24
ovs-vsctl add-port br1 veth5
~~~
~~~
[root@overcloud-compute-0 ~]# ovs-vsctl show
a1e1a7e9-2398-4b3c-8f73-53d6c5f90b00
    Bridge "br1"
        Port "veth5"
            Interface "veth5"
        Port "dpdkbond1"
            Interface "ovs-bond-if0"
            Interface "ovs-bond-if1"
        Port "veth3"
            tag: 905
            Interface "veth3"
        Port "br1"
            Interface "br1"
                type: internal
    ovs_version: "2.9.0"
[root@overcloud-compute-0 ~]# ip netns exec test2 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
27: veth2@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b6:f8:52:24:fd:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.123.11/24 scope global veth2
       valid_lft forever preferred_lft forever
    inet6 fe80::b4f8:52ff:fe24:fda4/64 scope link
       valid_lft forever preferred_lft forever
29: veth4@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fe:f2:83:bd:73:46 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.124.11/24 scope global veth4
       valid_lft forever preferred_lft forever
    inet6 fe80::fcf2:83ff:febd:7346/64 scope link
       valid_lft forever preferred_lft forever
[root@overcloud-compute-0 ~]# ip a ls dev veth3
26: veth3@if27: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9a:4b:f6:b3:e6:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::984b:f6ff:feb3:e6c8/64 scope link
       valid_lft forever preferred_lft forever
[root@overcloud-compute-0 ~]# ip a ls dev veth5
28: veth5@if29: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9e:a6:4a:a6:0a:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@overcloud-compute-0 ~]# ip a ls dev lx-bond0
24: lx-bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:5c:65:9a:c4:fa brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.10/24 scope global lx-bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::5c5c:65ff:fe9a:c4fa/64 scope link
       valid_lft forever preferred_lft forever
[root@overcloud-compute-0 ~]# ip a ls dev lx-bond0.905
25: lx-bond0.905@lx-bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:5c:65:9a:c4:fa brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.10/24 scope global lx-bond0.905
       valid_lft forever preferred_lft forever
    inet6 fe80::5c5c:65ff:fe9a:c4fa/64 scope link
       valid_lft forever preferred_lft forever
[root@overcloud-compute-0 ~]#
~~~
https://github.com/openvswitch/ovs/commit/35fe9efb2f02b89f3c5d7ac1ef4608464e4f1541 Resolves at least the ability to recover from failover (there may still be some packet loss during bond rebalancing I think).
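To check whether a given build already carries this upstream fix, one can search for the commit in a source tree or skim the package changelog (a sketch; whether the downstream changelog references this commit or this bug is an assumption):
~~~
# In an Open vSwitch git checkout: is the upstream commit present?
git log --oneline | grep -i 35fe9efb2f02

# On an installed system: inspect the most recent RPM changelog entries
rpm -q --changelog openvswitch | head -n 40
~~~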
*** Bug 1644982 has been marked as a duplicate of this bug. ***
Waiting on which version we should apply the hotfix to, i.e. which openvswitch version we should make this hotfix available in.
(In reply to Aaron Conole from comment #32)
> https://github.com/openvswitch/ovs/commit/35fe9efb2f02b89f3c5d7ac1ef4608464e4f1541
>
> Resolves at least the ability to recover from failover (there may still be
> some packet loss during bond rebalancing I think).

The Open vSwitch team is working on providing this backport ^. The backport has to be for 2.9.0 FDP.
(In reply to Andreas Karis from comment #26)
> I can reproduce with the following setup:
> (...)

Hi, with the script below I cannot reproduce this on either ovs 2.9.0-83.el7fdp.1 or 2.9.0-81.el7fdp; both of them work fine. I do not know which version has this issue, so please tell me. Thanks.

~~~
ip netns del test
ip netns add test
#ip li del veth0
ip link add veth0 type veth peer name veth1
ip link set dev veth1 netns test
ip link set dev veth0 up
ip netns exec test ip link set dev lo up
ip netns exec test ip link set dev veth1 up
ip netns exec test ip a a dev veth1 10.0.0.200/24

swcfg port_up 5200-2 et-0/0/22
swcfg port_up 5200-2 et-0/0/23
swcfg cleanup_port_channel 5200-2 'et-0/0/22 et-0/0/23'
swcfg setup_port_channel 5200-2 'et-0/0/22 et-0/0/23' 'active'

systemctl restart openvswitch
sleep 3
ovs-vsctl --if-exists del-br br0

socket_mem=4096,4096
#pmd_cpu_mask=17c0017c
pmd_cpu_mask=0xa000
host_cpu_mask=0x1
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

ovs-vsctl add-bond br0 dpdkbond0 dpdk10 dpdk11 -- set interface dpdk10 type=dpdk "options:dpdk-devargs=0000:86:00.0" -- set interface dpdk11 type=dpdk "options:dpdk-devargs=0000:86:00.1" -- set port dpdkbond0 lacp=active -- set port dpdkbond0 bond_mode=balance-tcp -- set port dpdkbond0 other-config:lacp-time=fast

ovs-vsctl add-port br0 veth0 tag=905

ip ad ad 10.0.0.100/24 dev veth0

########
ping 10.0.0.200
~~~
(In reply to wanghekai from comment #59)
> (...)

Is there a need to add some OpenFlow flows to Open vSwitch?
I am sorry, I cannot reproduce it with the commands below:
~~~
modprobe -r veth
modprobe -r bonding
modprobe -r 8021q

systemctl restart openvswitch
ovs-vsctl --if-exists del-br br0
ovs-vsctl --if-exists del-br br1

socket_mem=4096,4096
pmd_cpu_mask=0xa000
host_cpu_mask=0x1
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask

ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev

ip link add ovs-bond-if0 type veth peer name lx-bond-if0
ip link add ovs-bond-if1 type veth peer name lx-bond-if1
ip link add lx-bond0 type bond miimon 100 mode 802.3ad
ip link set dev lx-bond-if0 master lx-bond0
ip link set dev lx-bond-if1 master lx-bond0
ip link set dev lx-bond-if0 up
ip link set dev lx-bond-if1 up
ip link set dev ovs-bond-if0 up
ip link set dev ovs-bond-if1 up
ip link set dev lx-bond0 up
ovs-vsctl add-bond br1 dpdkbond1 ovs-bond-if0 ovs-bond-if1 -- set port dpdkbond1 lacp=active -- set port dpdkbond1 bond_mode=balance-tcp -- set port dpdkbond1 other-config:lacp-time=fast

ip link add name lx-bond0.905 link lx-bond0 type vlan id 905
ip link set dev lx-bond0.905 up
ip a a dev lx-bond0.905 192.168.123.10/24

ip link add veth2 type veth peer name veth3
ip netns add test2
ip link set dev veth2 netns test2
ip link set dev veth3 up
ip netns exec test2 ip link set dev lo up
ip netns exec test2 ip link set dev veth2 up
ip netns exec test2 ip a a dev veth2 192.168.123.11/24
ovs-vsctl add-port br1 veth3 tag=905

ping 192.168.123.11

*************************************************************
ip link set dev ovs-bond-if1 down
ovs-vsctl set port dpdkbond1 bond_mode=balance-slb
~~~

My host info:
~~~
[root@dell-per730-21 new-bonding]# rpm -qa | grep openv
python-openvswitch-2.9.0-56.el7fdp.noarch
openvswitch-selinux-extra-policy-1.0-9.el7fdp.noarch
openvswitch-2.9.0-56.el7fdp.x86_64
openvswitch-test-2.9.0-56.el7fdp.noarch
[root@dell-per730-21 new-bonding]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
[root@dell-per730-21 new-bonding]# uname -r
3.10.0-957.el7.x86_64
~~~
You may need to test:

Start a ping:
~~~
ping 192.168.123.11
~~~
Then, in a second CLI, while the ping is running (!), try the following sequence of commands:
~~~
ip link set dev ovs-bond-if1 down   # wait here and see if the ping times out
ip link set dev ovs-bond-if1 up     # if not, bring the interface back up and try with the other interface
ip link set dev ovs-bond-if0 down   # wait here and see if the ping times out
ip link set dev ovs-bond-if0 up     # bringing the interface back up should make packets flow again
~~~
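A scripted version of that check, as a sketch: it assumes the interface names and target IP from the software-only repro above, and that a continuous ping is not required since the script probes reachability itself.
~~~
#!/bin/bash
# Toggle each OVS bond member down/up and report whether the namespace IP
# from the repro (192.168.123.11) is still reachable from the host.
TARGET=192.168.123.11

for ifc in ovs-bond-if1 ovs-bond-if0; do
    echo "=== taking $ifc down ==="
    ip link set dev "$ifc" down
    sleep 5
    # 3 probes, 1 second timeout each; loss here reproduces the bug
    ping -c 3 -W 1 "$TARGET" >/dev/null && echo "OK with $ifc down" || echo "LOSS with $ifc down"
    ip link set dev "$ifc" up
    sleep 5
    ping -c 3 -W 1 "$TARGET" >/dev/null && echo "OK after $ifc back up" || echo "LOSS after $ifc back up"
done
~~~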
~~~
[root@dell-per730-21 new-bonding]# rpm -qa | grep openv
openvswitch-2.9.0-56.el7fdp.x86_64
python-openvswitch-2.9.0-56.el7fdp.noarch
openvswitch-test-2.9.0-56.el7fdp.noarch
openvswitch-selinux-extra-policy-1.0-9.el7fdp.noarch
[root@dell-per730-21 new-bonding]#
[root@dell-per730-21 new-bonding]# modprobe -r veth
[root@dell-per730-21 new-bonding]# modprobe -r bonding
[root@dell-per730-21 new-bonding]# modprobe -r 8021q
[root@dell-per730-21 new-bonding]# systemctl restart openvswitch
[root@dell-per730-21 new-bonding]# ovs-vsctl --if-exists del-br br0
[root@dell-per730-21 new-bonding]# ovs-vsctl --if-exists del-br br1
[root@dell-per730-21 new-bonding]# socket_mem=4096,4096
[root@dell-per730-21 new-bonding]# pmd_cpu_mask=0xa000
[root@dell-per730-21 new-bonding]# host_cpu_mask=0x1
[root@dell-per730-21 new-bonding]# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
[root@dell-per730-21 new-bonding]# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=$socket_mem
[root@dell-per730-21 new-bonding]# ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=$pmd_cpu_mask
[root@dell-per730-21 new-bonding]# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=$host_cpu_mask
[root@dell-per730-21 new-bonding]#
[root@dell-per730-21 new-bonding]# ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev
[root@dell-per730-21 new-bonding]# ip link add ovs-bond-if0 type veth peer name lx-bond-if0
[root@dell-per730-21 new-bonding]# ip link add ovs-bond-if1 type veth peer name lx-bond-if1
[root@dell-per730-21 new-bonding]# ip link add lx-bond0 type bond miimon 100 mode 802.3ad
[root@dell-per730-21 new-bonding]# ip link set dev lx-bond-if0 master lx-bond0
[root@dell-per730-21 new-bonding]# ip link set dev lx-bond-if1 master lx-bond0
[root@dell-per730-21 new-bonding]# ip link set dev lx-bond-if0 up
[root@dell-per730-21 new-bonding]# ip link set dev lx-bond-if1 up
[root@dell-per730-21 new-bonding]# ip link set dev ovs-bond-if0 up
[root@dell-per730-21 new-bonding]# ip link set dev ovs-bond-if1 up
[root@dell-per730-21 new-bonding]# ip link set dev lx-bond0 up
[root@dell-per730-21 new-bonding]# ovs-vsctl add-bond br1 dpdkbond1 ovs-bond-if0 ovs-bond-if1 -- set port dpdkbond1 lacp=active -- set port dpdkbond1 bond_mode=balance-tcp -- set port dpdkbond1 other-config:lacp-time=fast
[root@dell-per730-21 new-bonding]# ip link add name lx-bond0.905 link lx-bond0 type vlan id 905
[root@dell-per730-21 new-bonding]# ip link set dev lx-bond0.905 up
[root@dell-per730-21 new-bonding]# ip a a dev lx-bond0.905 192.168.123.10/24
[root@dell-per730-21 new-bonding]# ip link add veth2 type veth peer name veth3
[root@dell-per730-21 new-bonding]# ip netns add test2
[root@dell-per730-21 new-bonding]# ip link set dev veth2 netns test2
[root@dell-per730-21 new-bonding]# ip link set dev veth3 up
[root@dell-per730-21 new-bonding]# ip netns exec test2 ip link set dev lo up
[root@dell-per730-21 new-bonding]# ip netns exec test2 ip link set dev veth2 up
[root@dell-per730-21 new-bonding]# ip netns exec test2 ip a a dev veth2 192.168.123.11/24
[root@dell-per730-21 new-bonding]# ovs-vsctl add-port br1 veth3 tag=905
[root@dell-per730-21 new-bonding]# ping 192.168.123.11
PING 192.168.123.11 (192.168.123.11) 56(84) bytes of data.
64 bytes from 192.168.123.11: icmp_seq=1 ttl=64 time=0.325 ms
64 bytes from 192.168.123.11: icmp_seq=2 ttl=64 time=0.112 ms
64 bytes from 192.168.123.11: icmp_seq=3 ttl=64 time=0.113 ms
64 bytes from 192.168.123.11: icmp_seq=4 ttl=64 time=0.104 ms
64 bytes from 192.168.123.11: icmp_seq=5 ttl=64 time=0.103 ms
64 bytes from 192.168.123.11: icmp_seq=6 ttl=64 time=0.101 ms
64 bytes from 192.168.123.11: icmp_seq=7 ttl=64 time=0.107 ms
64 bytes from 192.168.123.11: icmp_seq=8 ttl=64 time=0.118 ms
64 bytes from 192.168.123.11: icmp_seq=9 ttl=64 time=0.437 ms
64 bytes from 192.168.123.11: icmp_seq=10 ttl=64 time=0.108 ms
64 bytes from 192.168.123.11: icmp_seq=11 ttl=64 time=0.104 ms
64 bytes from 192.168.123.11: icmp_seq=12 ttl=64 time=0.109 ms
64 bytes from 192.168.123.11: icmp_seq=13 ttl=64 time=0.106 ms
64 bytes from 192.168.123.11: icmp_seq=14 ttl=64 time=0.105 ms
64 bytes from 192.168.123.11: icmp_seq=15 ttl=64 time=0.104 ms
64 bytes from 192.168.123.11: icmp_seq=16 ttl=64 time=0.104 ms
64 bytes from 192.168.123.11: icmp_seq=17 ttl=64 time=0.107 ms
64 bytes from 192.168.123.11: icmp_seq=18 ttl=64 time=0.106 ms
64 bytes from 192.168.123.11: icmp_seq=19 ttl=64 time=0.104 ms
64 bytes from 192.168.123.11: icmp_seq=20 ttl=64 time=0.103 ms
64 bytes from 192.168.123.11: icmp_seq=21 ttl=64 time=0.103 ms
64 bytes from 192.168.123.11: icmp_seq=22 ttl=64 time=0.107 ms
64 bytes from 192.168.123.11: icmp_seq=23 ttl=64 time=0.106 ms
64 bytes from 192.168.123.11: icmp_seq=24 ttl=64 time=0.104 ms
64 bytes from 192.168.123.11: icmp_seq=25 ttl=64 time=0.103 ms
64 bytes from 192.168.123.11: icmp_seq=26 ttl=64 time=0.101 ms
^C
--- 192.168.123.11 ping statistics ---
88 packets transmitted, 26 received, 70% packet loss, time 87000ms
rtt min/avg/max/mdev = 0.101/0.127/0.437/0.075 ms
~~~

Can reproduce it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0053