Bug 2138133

Summary: [16.2][OVN][HWOFFLOAD][TRANSPARENT VLAN] Flows removed after 2 minutes
Product: Red Hat OpenStack
Reporter: Miguel Angel Nieto <mnietoji>
Component: openvswitch
Assignee: Eelco Chaudron <echaudro>
Status: CLOSED NOTABUG
QA Contact: Eran Kuris <ekuris>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.2 (Train)
CC: apevec, cfontain, chrisw, echaudro, egarciar, fleitner, hakhande, mleitner
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-11-25 07:31:47 UTC
Type: Bug

Description Miguel Angel Nieto 2022-10-27 10:51:07 UTC
Description of problem:
I configured transparent VLAN on an OVN hwoffload deployment. Offload works properly for VLAN, Geneve, and VLAN trunk, but with transparent VLAN the flows are removed and reinstalled every 2 minutes. I think that flows should not be removed while traffic is in progress.

On the representor port we see one packet every 2 minutes, because that packet is sent through the kernel instead of being offloaded. This happens only on the compute node where the iperf client runs.

listening on ens6f1_5, link-type EN10MB (Ethernet), capture size 262144 bytes
10:15:58.903654 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52614 > 60.60.220.100.8138: Flags [.], seq 3603368222:3603377170, ack 2633797991, win 210, options [nop,nop,TS val 507648 ecr 482346], length 8948
10:16:00.210737 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 78: vlan 146, p 0, ethertype IPv4, 60.60.220.100.8138 > 60.60.220.101.52616: Flags [S.], seq 2938696362, ack 3127832769, win 26844, options [mss 8960,sackOK,TS val 510640 ecr 508948,nop,wscale 7], length 0
10:16:26.103726 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52614 > 60.60.220.100.8138: Flags [.], seq 0:8948, ack 1, win 210, options [nop,nop,TS val 534848 ecr 482346], length 8948
10:16:53.175590 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 561920 ecr 510640], length 8948
10:16:58.183576 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:16:58.189611 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:17:46.295579 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 615040 ecr 510640], length 8948
10:17:51.303565 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:17:51.317553 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:18:01.959679 ae:87:9c:a3:1c:3d > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::ac87:9cff:fea3:1c3d > ff02::2: ICMP6, router solicitation, length 16
10:19:32.663455 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 721408 ecr 510640], length 8948
10:19:37.671446 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:19:37.678721 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:21:32.983405 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 841728 ecr 510640], length 8948
10:21:37.991386 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:21:37.997580 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:23:33.303393 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 962048 ecr 510640], length 8948
10:23:38.311380 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:23:38.318870 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:25:33.623384 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 1082368 ecr 510640], length 8948
10:25:38.631376 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:25:38.637627 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:27:33.943365 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 1202688 ecr 510640], length 8948
10:27:38.951354 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:27:38.957637 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:29:34.263363 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 9018: vlan 146, p 0, ethertype IPv4, 60.60.220.101.52616 > 60.60.220.100.8138: Flags [.], seq 1:8949, ack 1, win 210, options [nop,nop,TS val 1323008 ecr 510640], length 8948
10:29:39.271346 fa:16:3e:72:f8:35 > fa:16:3e:9a:03:a8, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.100 tell 60.60.220.101, length 34
10:29:39.277661 fa:16:3e:9a:03:a8 > fa:16:3e:72:f8:35, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Reply 60.60.220.100 is-at fa:16:3e:9a:03:a8, length 38
10:33:19.463689 ae:87:9c:a3:1c:3d > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::ac87:9cff:fea3:1c3d > ff02::2: ICMP6, router solicitation, length 16
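
The 2-minute cadence can be checked directly against the capture. The sketch below is only an illustration: the timestamps are copied from the 9018-byte frames of the 60.60.220.101.52616 stream in the capture, and the helper name `gaps_seconds` is hypothetical, not part of any tool used in this bug.

```python
from datetime import datetime

# Timestamps of the 9018-byte frames sent from port 52616,
# copied from the representor-port capture.
STAMPS = [
    "10:16:53.175590",
    "10:17:46.295579",
    "10:19:32.663455",
    "10:21:32.983405",
    "10:23:33.303393",
    "10:25:33.623384",
    "10:27:33.943365",
    "10:29:34.263363",
]


def gaps_seconds(stamps):
    """Return the gap in seconds between consecutive HH:MM:SS.ffffff stamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds(), 2) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_seconds(STAMPS):
        print(gap)
```

For these timestamps the gaps settle at roughly 120.3 seconds from the third one onward, which matches the 2-minute interval described in the report. Each of these frames also shows the same relative sequence range (seq 1:8949).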

Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20220902.n.1
openvswitch2.15-2.15.0-109.el8fdp.x86_64
ovn-2021-21.12.0-82.el8fdp.x86_64


How reproducible:
Deploy the 16.2 OVN hwoffload templates: ospd-16.2-geneve-ovn-hw-offload-ctlplane-dataplane-bonding-hybrid
Create 2 VMs using transparent VLAN
Run the iperf server in one VM and the iperf client in the other:
iperf -s -B 60.60.220.100 -p 8138
iperf -c 60.60.220.100 -T s2 -p 8138 -t 10000


Actual results:
Packets appear on the representor port every 2 minutes as flows are removed.


Expected results:
Packets should appear on the representor port only at the beginning of the transmission.


Additional info:

Comment 1 Flavio Leitner 2022-10-28 18:37:42 UTC
Are there any other tests or changes in the cluster happening in parallel?
Because if you are adding or removing ports from OVS, for example, the datapath flows might have to change as well, and then you would see the behavior reported.
fbl

Comment 2 Miguel Angel Nieto 2022-10-31 08:25:36 UTC
No, there is nothing happening in parallel.

This behaviour only happens with transparent VLAN and TCP traffic. It works fine for ICMP and UDP, and for VLAN, Geneve, or VLAN trunk.

Comment 3 Eelco Chaudron 2022-11-04 13:08:43 UTC
Is there an easy way to replicate this without OCP (or, even better, without OVN ;)?

Comment 4 Eelco Chaudron 2022-11-16 10:56:55 UTC
Any update on #3?

Comment 5 Miguel Angel Nieto 2022-11-16 11:44:22 UTC
Hi, I do not know how to reproduce it with OVS only. Maybe I can reproduce it the same way as when I opened the bug, and you can connect to the setup to debug it.

Comment 7 Elvira 2022-11-21 15:03:22 UTC
Changing DFG to NFV. Please reach back if there is anything we need to check. Thanks!

Comment 10 Eelco Chaudron 2022-11-24 13:39:24 UTC
To confirm, after changing the MTU to 1500, I do not see the representor traffic for the active TCP session. Only the occasional ARPs for which the flows have timed out.

[heat-admin@computehwoffload-r740-0 ~]$ sudo tcpdump -i ens6f1_0 -nne
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens6f1_0, link-type EN10MB (Ethernet), capture size 262144 bytes


13:33:35.351301 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:33:35.351355 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:34:10.679312 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:34:10.679381 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:34:46.007310 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:34:46.007372 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:35:21.335277 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:35:21.335343 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:35:56.663360 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:35:56.663423 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:36:31.991377 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:36:31.991437 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:37:07.319411 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:37:07.319467 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:37:42.647356 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:37:42.647419 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34
13:38:17.975361 fa:16:3e:a9:06:cf > fa:16:3e:80:38:21, ethertype 802.1Q (0x8100), length 56: vlan 146, p 0, ethertype ARP, Request who-has 60.60.220.101 tell 60.60.220.100, length 38                                                                                                                                                                        
13:38:17.975422 fa:16:3e:80:38:21 > fa:16:3e:a9:06:cf, ethertype 802.1Q (0x8100), length 52: vlan 146, p 0, ethertype ARP, Reply 60.60.220.101 is-at fa:16:3e:80:38:21, length 34

Comment 11 Miguel Angel Nieto 2022-11-25 07:31:47 UTC
The issue was caused by the MTU: after reducing the MTU in the VM, it works fine. Closing the BZ.
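
For reference, the frame lengths in the captures can be related to the guest MTU with simple arithmetic. The sketch below is an illustration under stated assumptions, not something taken from the bug: the helper name `frame_length` is hypothetical, and it assumes IPv4 without IP options, a TCP header carrying only the timestamp option (as in "options [nop,nop,TS ...]"), and a single 802.1Q tag. The bug itself does not record the MTU value used before the change to 1500.

```python
# Header sizes in bytes. The TCP option size is an assumption based on the
# "options [nop,nop,TS ...]" field in the capture: two 1-byte NOPs plus a
# 10-byte timestamp option.
ETH_HEADER = 14
VLAN_TAG = 4
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TS_OPTION = 12


def frame_length(tcp_payload, vlan_tags=1):
    """Return (ip_packet_size, on_wire_frame_size) for one TCP segment."""
    ip_packet = tcp_payload + TCP_TS_OPTION + TCP_HEADER + IPV4_HEADER
    frame = ip_packet + ETH_HEADER + VLAN_TAG * vlan_tags
    return ip_packet, frame


if __name__ == "__main__":
    # 8948 is the TCP payload length reported in the original capture.
    print(frame_length(8948))
    # 1448 would be the corresponding payload for a 1500-byte MTU.
    print(frame_length(1448))
```

Under these assumptions, a payload of 8948 bytes gives a 9000-byte IP packet and a 9018-byte tagged frame, which agrees with the "length 9018" entries in the original capture and with the "mss 8960" advertised in the SYN-ACK (8960 + 40 = 9000). With an MTU of 1500 the same calculation gives a 1518-byte tagged frame.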