Bug 1554233 - Flow based ping responder fails since packet state goes to -trk
Summary: Flow based ping responder fails since packet state goes to -trk
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Rashid Khan
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks: 1534886
Reported: 2018-03-12 07:34 UTC by Aswin Suryanarayanan
Modified: 2023-09-14 04:17 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-07 16:18:16 UTC
Target Upstream Version:
Embargoed:


Description Aswin Suryanarayanan 2018-03-12 07:34:58 UTC
Description of problem:

The flow-based ping responder fails because the packet's ct_state becomes -trk and the security group rules drop it. In the ODL router the ping responder is flow based (table 21 in [1]), and the packet it creates appears to have an invalid conntrack state.

The issue can be reproduced with the flows in [1]; [2] shows the stats.

[1] https://gist.github.com/aswinsuryan/dd967b2bd404cd4e0671b2a3fb5b358e
[2] https://gist.github.com/aswinsuryan/4c76be32141bf986f6d0302828c87431


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
Install the flows from the reproducer [1] on OVS 2.9 with the kernel module (kmod) installed.

Actual results:
Ping is not successful and ct_state is -trk

Expected results:
Ping should be successful and ct_state should be +trk+est


Additional info:

Comment 5 Eric Garver 2018-04-11 20:16:42 UTC
This is caused by the kernel datapath not supporting ICMP actions, so they are executed in userspace. I'll try to explain what's happening, but it's tricky because some parts happen in the kernel and some in userspace.

  1) echo request ingresses OVS bridge
    - ct(commit) action causes a ct lookup in the kernel and a commit to conntrack

  2) echo request is modified into a reply (set_field:0->icmp_type)
    - flow slow pathed to userspace
    - action not supported by the kernel datapath so it must be sent to userspace
    - userspace happily does the ICMP packet modifications

  3) ct_clear action occurs
    - since we changed the tuple in #2, the conntrack state is no longer valid for the frame, so it's cleared

  4) ct(table=3) action occurs and triggers a recirculation
    - since the kernel datapath is in use, but flow is currently executing in userspace, conntrack actions are sent _independently_ to the kernel. The flow is forked and recirculated.
    - a copy of the packet is sent to the kernel to perform the ct() action
    - a copy of the packet is sent to the kernel to perform the recirc action

  5) kernel performs recirc action
    - this continues execution at table=3, which was specified by ct(table=3) action
    - no ct() lookup actually occurs in the kernel, because it was done independently in #4 above. As such, the packet is ct_state=-trk.
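
The five steps above can be traced with a small toy model. This is only a sketch of the reasoning, not Open vSwitch code: the `Packet` class, `kernel_ct` and `run_pipeline` are hypothetical names, and the model tracks nothing but the trk bit (it does not model est or any real conntrack table). It assumes, as described in step 4, that the ct() lookup and the recirculation act on separate copies of the packet.

```python
"""Toy model of the ct_state transitions described in steps 1-5.

NOT Open vSwitch code; every name here is hypothetical.
"""
from dataclasses import dataclass, field


@dataclass
class Packet:
    icmp_type: int = 8       # 8 = echo request, 0 = echo reply
    ct_state: str = "-trk"   # untracked until a kernel ct() lookup tags it
    trace: list = field(default_factory=list)


def kernel_ct(pkt, commit=False):
    """Model a kernel conntrack lookup: the packet becomes tracked."""
    pkt.ct_state = "+trk"
    pkt.trace.append("kernel: ct(commit)" if commit else "kernel: ct()")


def run_pipeline(workaround=False):
    pkt = Packet()

    # Step 1: ct(commit) runs in the kernel datapath.
    kernel_ct(pkt, commit=True)

    # Step 2: the ICMP set_field is not supported by the kernel datapath,
    # so the flow is slow pathed and userspace rewrites the packet.
    pkt.icmp_type = 0
    pkt.trace.append("userspace: set_field:0->icmp_type")

    # Step 3: ct_clear drops the now stale conntrack state.
    pkt.ct_state = "-trk"
    pkt.trace.append("userspace: ct_clear")

    # Step 4: ct(table=3) issued from userspace is split in two. The lookup
    # runs on one copy, while the recirculation carries a different copy.
    ct_copy = Packet(icmp_type=pkt.icmp_type)
    kernel_ct(ct_copy)  # the result stays on this copy and is never reused
    recirc_copy = Packet(
        icmp_type=pkt.icmp_type,
        trace=pkt.trace + ["kernel: recirc to table 3"],
    )

    # Step 5: table 3 sees the recirculated copy, which had no lookup.
    if workaround:
        # A second ct(table=...) issued while executing in the kernel.
        kernel_ct(recirc_copy)
    return recirc_copy


if __name__ == "__main__":
    print(run_pipeline().ct_state)
    print(run_pipeline(workaround=True).ct_state)
```

In this model the recirculated packet reaches table 3 untracked unless a further lookup is issued from the kernel side, which is the behaviour reported in this bug.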


__WORKAROUND__

A workaround is to perform another ct(table=xxx) action in step 5 to force a ct lookup in the kernel. This works because the flow is executing in the kernel at that point, not in userspace.

I have verified this works using the OVS testsuite.
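
As a rough illustration of what such a pipeline change could look like, the sketch below builds one extra flow in ovs-ofctl syntax and prints the matching add-flow command. The bridge name `br-int`, the priority, and the table numbers are assumptions made for the example (the comment above only says ct(table=xxx)), and the helper names are hypothetical; the real pipeline would need its own table layout.

```python
# Hypothetical values: the bridge name, priority and table numbers are
# illustrative only; the bug report just says ct(table=xxx).
BRIDGE = "br-int"
RECIRC_TABLE = 3   # table named in ct(table=3) in the steps above
NEXT_TABLE = 4     # assumed table where processing continues once tracked


def workaround_flows(recirc_table=RECIRC_TABLE, next_table=NEXT_TABLE):
    """Return flow strings (ovs-ofctl syntax) adding the extra ct() lookup."""
    return [
        # ICMP that arrives untracked after the recirculation gets a second
        # conntrack lookup, this time issued from the kernel datapath.
        f"table={recirc_table},priority=100,icmp,ct_state=-trk,"
        f"actions=ct(table={next_table})",
    ]


def add_flow_commands(bridge=BRIDGE):
    """Wrap each flow in an ovs-ofctl add-flow command line."""
    return [f'ovs-ofctl add-flow {bridge} "{flow}"' for flow in workaround_flows()]


if __name__ == "__main__":
    for cmd in add_flow_commands():
        print(cmd)
```

The existing rules for that table would then have to live in the table that the second ct() action recirculates to.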

Comment 6 Aswin Suryanarayanan 2018-04-17 06:28:13 UTC
(In reply to Eric Garver from comment #5)

Eric,
Does this workaround need to be applied in the pipeline, or in the OVS code base?

Comment 7 Eric Garver 2018-04-17 12:43:23 UTC
(In reply to Aswin Suryanarayanan from comment #6)
> 
> Eric,
> Does this workaround need to be applied in the pipeline, or in the OVS code base?

Pipeline.

Comment 12 Bernard Cafarelli 2020-01-07 16:18:16 UTC
Closing this bug: it was opened 2 years ago, has had no recent activity and no PM or customer request, and was originally targeting ODL. Feel free to reopen if interest appears again.

Comment 13 Red Hat Bugzilla 2023-09-14 04:17:37 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

