Bug 1600115 - ping loss of first packet with OVN l3 logical router.
Summary: ping loss of first packet with OVN l3 logical router.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z4
Target Release: 13.0 (Queens)
Assignee: lorenzo bianconi
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks: 1701893 1637466 1728318 1728674
Reported: 2018-07-11 13:03 UTC by Eran Kuris
Modified: 2019-09-09 16:39 UTC

Fixed In Version: openvswitch-2.9.0-76.el7fdn
Doc Type: Enhancement
Doc Text:
Previously, the first packet of a new connection using an OVN logical router was used to discover the MAC address of the destination. This resulted in the loss of the first packet on the new connection. This enhancement adds the capability to correctly queue the first packet of a new connection, which prevents the loss of that packet.
Clone Of:
Clones: 1637466 1701893
Environment:
Last Closed: 2019-04-22 07:15:16 UTC
Target Upstream Version:



Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2019:0081 | None | None | None | 2019-01-16 17:53:08 UTC

Description Eran Kuris 2018-07-11 13:03:49 UTC
Description of problem:
I have configured an OVN L3 logical router for external traffic.
The first time I ping from a VM to the external network, or from the external network to the VM, after bringing up the OVN L3 router, the first packet is lost. Afterward everything works: no packet loss is observed, even when pinging again after some interval.

Basically, if OVN doesn't know the MAC address of the next hop, it internally sends an ARP request and, when it gets the reply, stores the MAC for that IP in the mac_binding table. The first packet is dropped in the meantime.

The issue is just the lack of packet buffering.  To fix it, we need to
implement packet buffering somehow.
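
For illustration only, here is a minimal Python sketch of what "packet buffering" could mean here. It is a toy model, not OVN or openvswitch code: the class NextHopResolver, its methods, and the per-IP queue limit are all hypothetical names and choices made up for this example. A packet whose next-hop MAC is unknown is queued while a single ARP request is outstanding, and the queue is flushed once the reply populates the binding.

```python
from collections import defaultdict, deque


class NextHopResolver:
    """Toy model of next-hop MAC resolution with buffering (hypothetical, not OVN code)."""

    def __init__(self, max_pending_per_ip=4):
        # IP -> MAC; stands in for the mac_binding table mentioned above.
        self.mac_binding = {}
        # IP -> packets waiting for resolution. A bounded deque silently
        # discards the oldest packet once the (assumed) limit is reached.
        self.pending = defaultdict(lambda: deque(maxlen=max_pending_per_ip))
        self.sent = []          # (dst_mac, payload) pairs that were forwarded
        self.arp_requests = []  # IPs for which an ARP request was sent

    def forward(self, next_hop_ip, payload):
        mac = self.mac_binding.get(next_hop_ip)
        if mac is not None:
            self.sent.append((mac, payload))
            return
        # Unknown MAC: queue the packet instead of dropping it, and send
        # only one ARP request per unresolved IP.
        if not self.pending[next_hop_ip]:
            self.arp_requests.append(next_hop_ip)
        self.pending[next_hop_ip].append(payload)

    def on_arp_reply(self, ip, mac):
        # Learn the binding, then flush everything that was waiting for it.
        self.mac_binding[ip] = mac
        for payload in self.pending.pop(ip, ()):
            self.sent.append((mac, payload))


resolver = NextHopResolver()
resolver.forward("10.0.0.1", "icmp_seq=1")   # MAC unknown: queued, ARP sent
resolver.forward("10.0.0.1", "icmp_seq=2")   # still unknown: queued, no new ARP
resolver.on_arp_reply("10.0.0.1", "fa:16:3e:00:00:01")
print(resolver.sent)
```

Without the queue, the first forward() call would have nothing to do with the packet except drop it, which matches the behavior reported above.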


Version-Release number of selected component (if applicable):
OSP13 2018-07-06.1

[root@controller-0 ~]# rpm -qa |grep ovn 
openvswitch-ovn-common-2.9.0-19.el7fdp.1.x86_64
openvswitch-ovn-host-2.9.0-19.el7fdp.1.x86_64
python-networking-ovn-metadata-agent-4.0.1-0.20180420150812.c7c16d4.el7ost.noarch
openvswitch-ovn-central-2.9.0-19.el7fdp.1.x86_64
python-networking-ovn-4.0.1-0.20180420150812.c7c16d4.el7ost.noarch
puppet-ovn-12.4.0-0.20180329043503.36ff219.el7ost.noarch


How reproducible:
always

Steps to Reproduce:
1. Boot a new VM on an OVN setup.
2. Assign a floating IP (FIP) to it.
3. Ping the VM from the external network; the first packet is lost.

Actual results:
The first packet is lost.

Expected results:
no packet loss

Additional info:
upstream thread 

https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/thread.html

https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046403.html

Comment 1 Numan Siddique 2018-07-11 13:44:05 UTC
As mentioned in the upstream mailing list discussions, this is a known limitation and it is by design. It has to be fixed in openvswitch.


https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046409.html

Comment 2 Lucas Alvares Gomes 2018-07-24 13:22:09 UTC
Since the fix should go to openvswitch first, I'm changing the component for this bz.

Comment 4 Shelley Dunne 2018-10-12 15:34:09 UTC
Updating Target Milestone to z3 for all Modified medium bugs

Comment 19 Eran Kuris 2018-12-11 08:47:08 UTC
fix verified:
(overcloud) [stack@undercloud-0 ~]$ ssh root@10.0.0.222
The authenticity of host '10.0.0.222 (10.0.0.222)' can't be established.
ECDSA key fingerprint is SHA256:RDa93tjACDaE1cZmnNkiPkWGLHYErQDXqwa5ouH1tbk.
ECDSA key fingerprint is MD5:b2:48:bd:78:27:07:a6:ad:1f:b8:ff:91:52:49:6a:ba.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.222' (ECDSA) to the list of known hosts.
root@10.0.0.222's password: 
[root@net-64-1-vm-1 ~]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=59.4 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=58.7 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=57.9 ms

OpenStack/13.0-RHEL-7/2018-12-07.1/

openvswitch-2.9.0-81.el7fdp.x86_64

Comment 23 errata-xmlrpc 2019-01-16 17:52:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0081

Comment 24 Eran Kuris 2019-04-18 12:54:49 UTC
I am hitting the same issue when working with the VLAN tenant network type.
OpenStack/13.0-RHEL-7/2019-04-10.1
openvswitch-2.9.0-103.el7fdp.x86_64

Comment 25 Eran Kuris 2019-04-18 12:57:47 UTC
Following up on the above: I see the problem with traffic going from the external network to the instance.

--- 10.46.21.216 ping statistics ---
388 packets transmitted, 386 received, 0% packet loss, time 387044ms
rtt min/avg/max/mdev = 0.413/0.603/1.430/0.103 ms


2 packets lost
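
The "0% packet loss" in the summary and the 2 lost packets are consistent if ping truncates the percentage to a whole number. That truncation is an assumption about the ping build used here, not something shown in the output. A quick Python check of the arithmetic (the function name loss_percent is made up for this example):

```python
def loss_percent(transmitted, received):
    """Return (lost packets, exact loss %, loss % truncated to an integer)."""
    lost = transmitted - received
    exact = 100.0 * lost / transmitted
    truncated = (lost * 100) // transmitted  # assumed integer truncation
    return lost, exact, truncated


lost, exact, truncated = loss_percent(388, 386)
print(lost, round(exact, 2), truncated)  # prints: 2 0.52 0
```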

Comment 26 Jon Schlueter 2019-04-22 07:15:16 UTC
Eran,

Since this bug was shipped on an advisory, please clone this bug or file another for the failure.

Comment 27 Eran Kuris 2019-04-22 12:50:09 UTC
(In reply to Jon Schlueter from comment #26)
> Eran,
> 
> Since this bug was shipped on an advisory, please clone this bug or file
> another for the failure.

No problem, I cloned this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1701893

