Bug 2175037

Summary: ACK packets are dropped inside br-int when a "direct routing" load balancer is running on an instance
Product: Red Hat OpenStack
Component: openstack-neutron
Version: 16.1 (Train)
Status: NEW
Severity: medium
Priority: unspecified
Hardware: x86_64
OS: Linux
Type: Bug
Reporter: yatanaka
Assignee: Jakub Libosvar <jlibosva>
QA Contact: Eran Kuris <ekuris>
CC: astupnik, averdagu, chrisw, gthiemon, jlibosva, mlavalle, scohen

Description yatanaka 2023-03-03 00:24:34 UTC
Description of problem:

Summary: ACK packets are dropped inside br-int in an ml2/OVN environment.

The customer is running a "direct routing" load balancer [1] on an instance.

  - InstanceA : client 
  - InstanceB : direct routing loadbalancer
  - InstanceC : real server

  Note: InstanceA, InstanceB, and InstanceC are placeholders. I have replaced the actual instance names.

All instances are connected to the same provider network.
InstanceA and InstanceB are on the same compute node, and InstanceC is on a different compute node.
I took packet captures on the tap-XXX interfaces and on the physical interfaces of the compute nodes; the results are summarized below.
The SYN+ACK packet bypasses the load balancer, which is the expected behavior of a "direct routing" load balancer.
The SYN and SYN+ACK packets arrive at their destinations correctly, but the ACK packet does not arrive at the load balancer.
Because InstanceA and InstanceB run on the same compute node, the ACK packet never leaves that node, so it must have been dropped in br-int on that compute node.

~~~
|--------------------- computeA --------------------------------|                |---- computeB ----------|

 Client                       Loadbalancer                                              Server
   InstanceA                    InstanceB                                                 InstanceC
   aa:aa:aa:aa:aa:aa            bb:bb:bb:bb:bb:bb                                         cc:cc:cc:cc:cc:cc
   10.0.0.1                     10.0.0.100(VIP)                                           10.0.0.100(VIP)
     |                           |                                                         |
     |  TCP SYN                  |                                                         |
     |  src: aa:aa:aa:aa:aa:aa   |                                                         |
     |       10.0.0.1            |                                                         |
     |  dst: bb:bb:bb:bb:bb:bb   |                                                         |
     |       10.0.0.100          |                                                         |
     |-------------------------->|                                                         |
     |                           |  TCP SYN                                                |
     |                           |  src: bb:bb:bb:bb:bb:bb                                 |
     |                           |       10.0.0.1                                          |
     |                           |  dst: cc:cc:cc:cc:cc:cc                                 |
     |                           |       10.0.0.100                  VLAN 3                |
     |                           |---------------------------> eth1 - - - - > eth1 ------->|
     |                           |                           computeA       computeB       |
     |              TCP SYN+ACK                                                            |
     |              src: cc:cc:cc:cc:cc:cc                                                 |
     |                   10.0.0.100                                                        |
     |              dst: aa:aa:aa:aa:aa:aa                                                 |
     |                   10.0.0.1                                     VLAN 3               |
     |<------------------------------------------------------- eth1 < - - - - eth1 <-------|
     |                           |                           computeA       computeB       |
     |  TCP ACK                  |                                                         |
     |  src: aa:aa:aa:aa:aa:aa   |                                                         |
     |       10.0.0.1            |                                                         |
     |  dst: bb:bb:bb:bb:bb:bb   |                                                         |
     |       10.0.0.100          |                                                         |
     |---------------------->*   |                                                         |
     |                   dropped |                                                         |
     |                    here   |                                                         |
     |                           |                                                         |

  Note: The actual names, IP addresses, and MAC addresses have been replaced with placeholders.
~~~
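
To make the addressing in the diagram explicit, here is a minimal Python sketch of the direct-routing behavior described above: the load balancer rewrites only the MAC addresses and leaves the IP header untouched, and the real server replies straight to the client. The `Frame` type and the `lb_forward` and `server_reply` functions are hypothetical names that only model the diagram; they are not part of OVN, OVS, or LVS.

```python
from dataclasses import dataclass, replace

# Placeholder addresses taken from the diagram above.
CLIENT_MAC = "aa:aa:aa:aa:aa:aa"
LB_MAC = "bb:bb:bb:bb:bb:bb"
SERVER_MAC = "cc:cc:cc:cc:cc:cc"
CLIENT_IP = "10.0.0.1"
VIP = "10.0.0.100"


@dataclass(frozen=True)
class Frame:
    """Hypothetical, simplified view of one TCP segment on the wire."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    flags: str


def lb_forward(frame: Frame) -> Frame:
    """Direct routing: rewrite the MAC addresses only, keep the IP header."""
    return replace(frame, src_mac=LB_MAC, dst_mac=SERVER_MAC)


def server_reply(frame: Frame) -> Frame:
    """The real server answers from the VIP directly to the client.

    The client MAC is hardcoded here; a real server would resolve it via ARP.
    """
    return Frame(SERVER_MAC, CLIENT_MAC, frame.dst_ip, frame.src_ip, "SYN+ACK")


syn = Frame(CLIENT_MAC, LB_MAC, CLIENT_IP, VIP, "SYN")
forwarded_syn = lb_forward(syn)
syn_ack = server_reply(forwarded_syn)
ack = Frame(CLIENT_MAC, LB_MAC, CLIENT_IP, VIP, "ACK")  # the packet that is dropped

for name, frame in [("SYN", syn), ("forwarded SYN", forwarded_syn),
                    ("SYN+ACK", syn_ack), ("ACK", ack)]:
    print(f"{name}: {frame}")
```

In this model, 10.0.0.1 is the source IP of both the original SYN (entering br-int from InstanceA's port) and the forwarded SYN (entering br-int from InstanceB's port), while the SYN+ACK never passes through InstanceB.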

Allowed address pairs, security groups, and port security have been disabled on the port with the following commands.

~~~
    # neutron port-update --no-allowed-address-pairs <port>
    # neutron port-update --no-security-groups <port>
    # openstack port set --disable-port-security <port>
~~~

The issue always reproduces.
It does not occur in an ml2/OVS environment.
It is only observed in an ml2/OVN environment.

I checked with "ovn-trace" and "ovs-appctl ofproto/trace", but neither trace shows the ACK packet being dropped.
As far as I can tell from these commands, the ACK packet should arrive at the load balancer successfully.

Because I suspected Open vSwitch conntrack, I asked the customer to run "ovs-appctl dpctl/dump-conntrack" while the issue was being reproduced, but the connection is shown as ESTABLISHED.
So conntrack recognizes the ACK packets, and I don't think conntrack dropped them.
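
The conntrack check described above could be scripted roughly as follows. This is a hedged Python sketch that filters saved `ovs-appctl dpctl/dump-conntrack` output for the client-to-VIP flow and reports the TCP state per zone. The sample lines, ports, and zone number are made up for illustration and are not taken from the customer's environment; the parsing assumes the usual `key=value` layout of the dump, and `tcp_states` is a hypothetical helper name.

```python
import re

# Illustrative dump lines (assumed layout, invented ports and zone).
SAMPLE_DUMP = """\
tcp,orig=(src=10.0.0.1,dst=10.0.0.100,sport=40000,dport=80),reply=(src=10.0.0.100,dst=10.0.0.1,sport=80,dport=40000),zone=5,protoinfo=(state=ESTABLISHED)
udp,orig=(src=10.0.0.1,dst=10.0.0.2,sport=5353,dport=53),reply=(src=10.0.0.2,dst=10.0.0.1,sport=53,dport=5353),zone=5
"""

ORIG_RE = re.compile(r"orig=\(src=([^,]+),dst=([^,]+),")
STATE_RE = re.compile(r"protoinfo=\(state=([A-Z_0-9]+)\)")
ZONE_RE = re.compile(r"zone=(\d+)")


def tcp_states(dump, src, dst):
    """Return (zone, state) for every TCP entry whose original direction is src -> dst."""
    results = []
    for line in dump.splitlines():
        if not line.startswith("tcp,"):
            continue  # only TCP entries carry a protoinfo state
        orig = ORIG_RE.search(line)
        if orig is None or orig.groups() != (src, dst):
            continue
        zone = ZONE_RE.search(line)
        state = STATE_RE.search(line)
        results.append((int(zone.group(1)) if zone else None,
                        state.group(1) if state else None))
    return results


print(tcp_states(SAMPLE_DUMP, "10.0.0.1", "10.0.0.100"))
```

If the same connection appears in more than one conntrack zone, listing the state per zone in this way may help to compare how each zone sees the handshake.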

The FDB is not a suspect, because the ACK packets are dropped in br-int on the compute node and br-int does not have NORMAL rules.



Version-Release number of selected component (if applicable):
RHOSP 16.1.2 + ml2/OVN


How reproducible:
Always


Steps to Reproduce:
1. Deploy an overcloud with ml2/OVN.
2. Create a provider network.
3. Create three instances on the provider network.
   The first is a client, the second is a direct routing load balancer, and the third is the real server.
4. Send a request from the client to the load balancer.


Actual results:
Only the ACK packets are dropped before arriving at the load balancer.


Expected results:
No packets are dropped.


Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/load_balancer_administration/s1-lvs-direct-vsa