Bug 2127167 - [OVS HW offload] [ML2-OVS] ping takes a long time to reply (vlan provider network)
Summary: [OVS HW offload] [ML2-OVS] ping takes a long time to reply (vlan provider network)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Open vSwitch development team
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-09-15 14:50 UTC by Miguel Angel Nieto
Modified: 2023-01-15 08:36 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-12 14:04:30 UTC
Target Upstream Version:
Embargoed:


Attachments
tcpdump files (10.85 KB, application/gzip), 2022-09-15 14:50 UTC, Miguel Angel Nieto


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker NFV-2650 0 None None None 2022-09-26 14:17:01 UTC
Red Hat Issue Tracker OSP-18732 0 None None None 2022-09-15 15:26:30 UTC

Description Miguel Angel Nieto 2022-09-15 14:50:11 UTC
Created attachment 1912081 [details]
tcpdump files

Description of problem:
I have configured OVS HW offload using ML2/OVS. I have configured 2 VMs on a VLAN provider network, and I am having issues pinging from one VM to the other over that network. There is no issue with VXLAN.
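For reference, whether HW offload is actually enabled on a compute can be checked with standard OVS and ethtool commands. This is only a diagnostic sketch; the interface name below is a placeholder, not taken from this deployment:

  # OVS-level switch: other_config:hw-offload should be "true"
  $ ovs-vsctl get Open_vSwitch . other_config:hw-offload
  # NIC-level switch: hw-tc-offload should be "on" for the PF and its representors (placeholder name enp3s0f0)
  $ ethtool -k enp3s0f0 | grep hw-tc-offload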

vm1: 30.30.220.141  fa:16:3e:f2:fb:b4
vm2: 30.30.220.198  fa:16:3e:a8:2c:73
ping from vm1 to vm2

I attach tcpdump captures taken in 4 places (example commands below):
1. NIC interface inside the guest (sender)
2. NIC interface inside the guest (receiver)
3. Representor port on the hypervisor (sender VM)
4. Representor port on the hypervisor (receiver VM)
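A minimal sketch of how such captures can be taken, using placeholder interface names (eth0 inside the guest, enp3s0f0_0 for the representor on the hypervisor):

  # inside the guest (run on both sender and receiver), keep only ICMP traffic
  $ tcpdump -i eth0 -nn -e icmp -w guest.pcap
  # on the hypervisor, on the VF representor port plugged into the VM
  $ tcpdump -i enp3s0f0_0 -nn -e icmp -w representor.pcap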

I have run ping several times, and the first time after a while it takes a long time to get an ICMP reply. For example, the ICMP packet with id 22172 needed 46 seconds to receive its reply:
08:07:33.288990 fa:16:3e:f2:fb:b4 > fa:16:3e:a8:2c:73, ethertype IPv4 (0x0800), length 98: 30.30.220.141 > 30.30.220.198: ICMP echo request, id 22172, seq 1, length 64
.....
08:08:19.306938 fa:16:3e:f2:fb:b4 > fa:16:3e:a8:2c:73, ethertype IPv4 (0x0800), length 98: 30.30.220.141 > 30.30.220.198: ICMP echo request, id 22172, seq 47, length 64
08:08:19.307554 fa:16:3e:a8:2c:73 > fa:16:3e:f2:fb:b4, ethertype IPv4 (0x0800), length 98: 30.30.220.198 > 30.30.220.141: ICMP echo reply, id 22172, seq 47, length 64
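When the reply only shows up after ~46 seconds like this, a useful thing to check on the computes is whether the ICMP flows were actually offloaded to hardware during the gap. A hedged diagnostic sketch (the representor name is again a placeholder):

  # list only the datapath flows that OVS managed to offload to the NIC
  $ ovs-appctl dpctl/dump-flows type=offloaded
  # inspect the TC rules and their packet/byte counters on the representor (placeholder enp3s0f0_0)
  $ tc -s filter show dev enp3s0f0_0 ingress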



Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220909.n.0


How reproducible:
Deploy the ML2/OVS HW offload templates.
Spawn 2 VMs (this can be done with the test case "python -m testtools.run nfv_tempest_plugin.tests.scenario.test_nfv_offload.TestNfvOffload.test_offload_udp"; stop the execution after the VMs are created).
Ping from one VM to the other over the VLAN provider network. Usually the first attempt works, but the issue is easy to reproduce by running ping 2 or 3 times (see the sketch below).
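A minimal manual reproduction, assuming the two VMs and addresses from the description above, is just repeated pings; -D prints timestamps so a delayed reply is easy to spot:

  # run from vm1 (30.30.220.141) towards vm2 over the VLAN provider network,
  # and repeat the command 2 or 3 times
  $ ping -D -c 30 30.30.220.198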



Actual results:
failing ping (ICMP replies delayed by tens of seconds)

Expected results:
ping should work


Additional info:
I will upload the tcpdump captures and a sosreport.

Comment 2 Miguel Angel Nieto 2022-09-16 12:41:19 UTC
This somehow looks like the same issue as the following BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=2108213

That other BZ was opened with the offload OVN templates using a VLAN provider network, while this new one is with the offload ML2/OVS templates. With ML2/OVS I see 2 issues:
1. Ping issues on the VLAN provider network.
2. Offload not working properly (as also happens with the OVN templates).

With the OVN templates I didn't see the ping issue.

Both issues are solved after I reboot the computes.

Comment 3 Elvira 2022-09-19 13:52:53 UTC
Hi Miguel Angel. Should we close this BZ as a duplicate of the one you have mentioned?

Comment 4 Miguel Angel Nieto 2022-09-26 07:13:26 UTC
I would wait until we clarify the other BZ. It looks like they have a similar root cause (ARP), but the behaviour is different: in the other BZ there is no issue with ping.

Comment 5 Miguel Angel Nieto 2022-09-28 22:27:36 UTC
I think this is a different BZ from [1]. [1] is solved with a newer kernel, but this one is not; see [2].


[1] https://bugzilla.redhat.com/show_bug.cgi?id=2108213
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2108213#c18

Comment 11 Miguel Angel Nieto 2023-01-12 14:04:30 UTC
Hi

I am no longer able to reproduce this issue as described here. With 2 VMs, I have been able to run the regression successfully using puddle RHOS-17.0-RHEL-9-20221213.n.1.

So I am going to close this BZ.

