Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2127167

Summary: [OVS HW offload] [ML2-OVS] ping takes a long time to reply (vlan provider network)
Product: Red Hat OpenStack
Reporter: Miguel Angel Nieto <mnietoji>
Component: openvswitch
Assignee: Open vSwitch development team <ovs-team>
Status: CLOSED NOTABUG
QA Contact: Eran Kuris <ekuris>
Severity: medium
Docs Contact:
Priority: high
Version: 17.0 (Wallaby)
CC: apevec, cfontain, chrisw, egarciar, eshulman, gurpsing, hakhande, mlavalle, mleitner, ralonsoh
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-12 14:04:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: tcpdump files (flags: none)

Description Miguel Angel Nieto 2022-09-15 14:50:11 UTC
Created attachment 1912081 [details]
tcpdump files

Description of problem:
I have configured OVS HW offload using ML2-OVS and created 2 VMs on a VLAN provider network. I am having issues pinging from one VM to the other over the VLAN provider network. There is no issue with VXLAN.

vm1: 30.30.220.141  fa:16:3e:f2:fb:b4
vm2: 30.30.220.198  fa:16:3e:a8:2c:73
ping from vm1 to vm2

I attach some tcpdump files captured in 4 places:
1. NIC interface inside the guest (sender)
2. NIC interface inside the guest (receiver)
3. representor port on the hypervisor (sender VM)
4. representor port on the hypervisor (receiver VM)

I have run ping several times; after a while, the first run takes a long time to get an ICMP reply. For example, the ICMP request with id 22172 needed 46 seconds to receive a reply:
08:07:33.288990 fa:16:3e:f2:fb:b4 > fa:16:3e:a8:2c:73, ethertype IPv4 (0x0800), length 98: 30.30.220.141 > 30.30.220.198: ICMP echo request, id 22172, seq 1, length 64
.....
08:08:19.306938 fa:16:3e:f2:fb:b4 > fa:16:3e:a8:2c:73, ethertype IPv4 (0x0800), length 98: 30.30.220.141 > 30.30.220.198: ICMP echo request, id 22172, seq 47, length 64
08:08:19.307554 fa:16:3e:a8:2c:73 > fa:16:3e:f2:fb:b4, ethertype IPv4 (0x0800), length 98: 30.30.220.198 > 30.30.220.141: ICMP echo reply, id 22172, seq 47, length 64
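The ~46 second delay quoted above can be confirmed directly from the capture timestamps (first request at seq 1 vs. the first reply, at seq 47). A minimal sketch, assuming both packets fall on the same capture day:

```python
from datetime import datetime

# Timestamps copied from the tcpdump excerpt above.
request_ts = datetime.strptime("08:07:33.288990", "%H:%M:%S.%f")  # echo request, seq 1
reply_ts = datetime.strptime("08:08:19.307554", "%H:%M:%S.%f")    # echo reply, seq 47

delay = (reply_ts - request_ts).total_seconds()
print(f"ICMP reply delay: {delay:.1f} s")  # prints "ICMP reply delay: 46.0 s"
```

Since ping sends one request per second by default, the reply arriving at seq 47 is consistent with the first 46 requests going unanswered.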



Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220909.n.0


How reproducible:
Deploy the ml2-ovs hw-offload templates.
Spawn 2 VMs (this can be done with the test case: python -m testtools.run nfv_tempest_plugin.tests.scenario.test_nfv_offload.TestNfvOffload.test_offload_udp. Stop execution after the VMs are created.)
Ping from one VM to the other over the VLAN provider network. Usually the first attempt works, but the issue is easy to reproduce by running ping 2 or 3 times.



Actual results:
ping fails (long delay before ICMP replies arrive)

Expected results:
ping works


Additional info:
I will upload tcpdump captures and sosreport

Comment 2 Miguel Angel Nieto 2022-09-16 12:41:19 UTC
Somehow this looks like the same issue as the following BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=2108213

That other BZ was opened with the offload OVN templates using a VLAN provider network, while this new one is with the offload ml2-ovs templates. With ml2-ovs I see 2 issues:
1. ping issues on the VLAN provider network
2. offload not working properly (as happens with the OVN templates)

With the OVN templates I did not see the ping issue.

Both issues are solved after I reboot the computes.

Comment 3 Elvira 2022-09-19 13:52:53 UTC
Hi Miguel Angel. Should we close this BZ as a duplicate of the one you have mentioned?

Comment 4 Miguel Angel Nieto 2022-09-26 07:13:26 UTC
I would wait until we clarify the other BZ. They look like they have a similar root cause (ARP), but the behaviour is different: in the other BZ there is no issue with ping.

Comment 5 Miguel Angel Nieto 2022-09-28 22:27:36 UTC
I think this is a different bug than [1]. [1] is solved with a newer kernel, but this BZ is not; see [2].


[1] https://bugzilla.redhat.com/show_bug.cgi?id=2108213
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2108213#c18

Comment 11 Miguel Angel Nieto 2023-01-12 14:04:30 UTC
Hi

I am no longer able to reproduce this issue as described here. With 2 VMs, I have been able to run regression successfully using puddle RHOS-17.0-RHEL-9-20221213.n.1.

So, I am going to close this BZ.