Bug 1515815 - [Netvirt][NAT] SNAT flows are not removed after removing an external interface of a router
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z1
Target Release: 13.0 (Queens)
Assignee: Aswin Suryanarayanan
QA Contact: Noam Manos
URL:
Whiteboard: Netvirt
Duplicates: 1558523
Depends On:
Blocks: 1414431 1528948
 
Reported: 2017-11-21 12:27 UTC by Itzik Brown
Modified: 2018-10-18 07:22 UTC (History)

Fixed In Version: opendaylight-8.3.0-1.el7ost
Doc Type: Known Issue
Doc Text:
When the router gateway is cleared, the Layer 3 flows related to learned IP addresses are not removed. The learned IP addresses include the PNF and external gateway IP addresses. This leads to stale flows, but not to any functional issue. The external gateway and its IP address do not change frequently. The stale flows are removed when the external network is deleted.
Clone Of:
Environment:
N/A
Last Closed: 2018-07-19 13:53:05 UTC
Target Upstream Version:


Attachments
ODL Check SNAT flows scenario (115.54 KB, text/plain)
2018-07-12 12:12 UTC, Noam Manos
1) check snat with pre-created network objects (63.39 KB, text/plain)
2018-07-12 12:14 UTC, Noam Manos
2) check snat after removing network objects (21.47 KB, text/plain)
2018-07-12 12:15 UTC, Noam Manos
3) check snat after network objects were created again (37.31 KB, text/plain)
2018-07-12 12:17 UTC, Noam Manos


Links
System                  ID              Priority  Status  Summary  Last Updated
OpenDaylight Bug        NETVIRT-1157    None      None    None     2018-03-22 06:58:23 UTC
OpenDaylight gerrit     70375           None      None    None     2018-06-29 06:28:17 UTC
OpenDaylight gerrit     71876           None      None    None     2018-06-29 06:27:12 UTC
Red Hat Product Errata  RHBA-2018:2215  None      None    None     2018-07-19 13:53:47 UTC

Description Itzik Brown 2017-11-21 12:27:43 UTC
Description of problem:
After removing the external interface of a router, none of the SNAT flows are removed.
After removing the router itself, there are still flows on a compute node with the IP of the router.

An example of flows:
cookie=0x8000003, duration=6586.136s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x324b0/0xfffffe,nw_dst=10.0.0.214 actions=goto_table:25

cookie=0x122201d9, duration=205.588s, table=81, n_packets=0, n_bytes=0, priority=100,arp,metadata=0x4157e000000/0xfffffffff000000,arp_tpa=10.0.0.214,arp_op=1 actions=move:NXM_OF_ETH_SRC[]>NXM_OF_ETH_DST[],set_field:fa:16:3e:9a:c9:20>eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]>NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]>NXM_OF_ARP_TPA[],load:0xfa163e9ac920->NXM_NX_ARP_SHA[],load:0xa0000d6->NXM_OF_ARP_SPA[],load:0->NXM_OF_IN_PORT[],load:0x400->NXM_NX_REG6[],write_metadata:0/0x1,goto_table:220
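
The table=81 flow above also carries the router address in hexadecimal form in its load: actions. As a minimal sketch (the helper names hex_to_ipv4 and hex_to_mac are made up for illustration and are not part of any OVS or ODL tooling), those values can be decoded to check that they correspond to 10.0.0.214 and fa:16:3e:9a:c9:20:

```python
import ipaddress


def hex_to_ipv4(value):
    """Decode a hex literal such as '0xa0000d6' into dotted-quad form."""
    return str(ipaddress.IPv4Address(int(value, 16)))


def hex_to_mac(value):
    """Decode a 48-bit hex literal such as '0xfa163e9ac920' into a MAC string."""
    raw = int(value, 16).to_bytes(6, "big")
    return ":".join(f"{byte:02x}" for byte in raw)


if __name__ == "__main__":
    # Values copied from the load: actions of the table=81 flow.
    print(hex_to_ipv4("0xa0000d6"))      # 10.0.0.214
    print(hex_to_mac("0xfa163e9ac920"))  # fa:16:3e:9a:c9:20
```

Both decoded values agree with the arp_tpa and set_field values in the same flow, which suggests the ARP responder entry still refers to the router address.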

Version-Release number of selected component (if applicable):
Carbon opendaylight-6.2.0-4.el7ost.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
There should be no flows after removing an external interface from the router

Additional info:
u/s bug - https://jira.opendaylight.org/browse/NETVIRT-1020

Comment 1 Aswin Suryanarayanan 2017-12-12 11:52:35 UTC
(In reply to Itzik Brown from comment #0)

The flow with "nw_dst=10.0.0.214 actions=goto_table:25" is a FIP (floating IP) flow, so this seems to be an issue related to FIP. Do you have any steps to reproduce this? Was this the result of a Tempest test?

I tried creating FIPs for VMs across compute nodes in multiple scenarios and didn't observe this issue.

Comment 2 Itzik Brown 2017-12-12 13:17:30 UTC
Currently, after removing the external network, these are the flows:

# ovs-ofctl -O OpenFlow13 dump-flows br-int |grep 10.0.0
     cookie=0x8000000, duration=85485.560s, table=0, n_packets=756488, n_bytes=84687076, priority=4,in_port=1,vlan_tci=0x0000/0x1fff actions=write_metadata:0x110000000001/0xffffff0000000001,goto_table:17
     cookie=0x8000001, duration=776.462s, table=17, n_packets=633, n_bytes=70526, priority=10,metadata=0x110000000000/0xffffff0000000000 actions=load:0x1926a->NXM_NX_REG3[0..24],write_metadata:0x90001100000324d4/0xfffffffffffffffe,goto_table:19
     cookie=0x8040000, duration=776.462s, table=17, n_packets=629, n_bytes=70078, priority=10,metadata=0x9000110000000000/0xffffff0000000000 actions=load:0x11->NXM_NX_REG1[0..19],load:0x1388->NXM_NX_REG7[0..15],write_metadata:0xa000111388000000/0xfffffffffffffffe,goto_table:43
     cookie=0x1080000, duration=85470.515s, table=19, n_packets=759100, n_bytes=85337506, priority=0 actions=resubmit(,17)
     cookie=0x1030000, duration=85470.515s, table=20, n_packets=0, n_bytes=0, priority=0 actions=goto_table:80
     cookie=0x8000003, duration=746.867s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x324da/0xfffffe,nw_dst=10.0.0.1 actions=set_field:52:54:00:67:51:ed->eth_dst,load:0x1100->NXM_NX_REG6[],resubmit(,220)
     cookie=0x8000003, duration=85487.369s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x33c22/0xfffffe,nw_dst=10.0.0.0/24 actions=write_metadata:0x1770033c22/0xfffffffffe,goto_table:22
     cookie=0x8000003, duration=768.846s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x324da/0xfffffe,nw_dst=10.0.0.0/24 actions=write_metadata:0x13880324da/0xfffffffffe,goto_table:22
     cookie=0x8000004, duration=85487.369s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33c22/0xfffffe,nw_dst=10.0.0.255 actions=drop
     cookie=0x8000004, duration=768.846s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x324da/0xfffffe,nw_dst=10.0.0.255 actions=drop
     cookie=0x1080000, duration=85470.515s, table=23, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17)
     cookie=0x8800011, duration=776.451s, table=55, n_packets=0, n_bytes=0, priority=10,tun_id=0x11,metadata=0x110000000000/0xfffff0000000000 actions=drop
     cookie=0x1030000, duration=85470.513s, table=80, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17)
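
Several of the lines above (for example the table=17 and table=19 entries) contain no 10.0.0.x address at all. This is presumably because the unescaped dots in the grep pattern match any character, so strings such as cookie=0x1080000 match as well. A rough sketch of a stricter filter follows, assuming the dump has been saved as text. The function name flows_touching_subnet and the choice of match fields are illustrative, not an existing tool:

```python
import ipaddress
import re

# Match fields that carry an IPv4 address, optionally with a prefix or netmask.
IP_FIELD = re.compile(
    r"\b(?:nw_src|nw_dst|arp_spa|arp_tpa)=(\d+\.\d+\.\d+\.\d+(?:/[\d.]+)?)"
)


def flows_touching_subnet(dump_lines, cidr):
    """Return flow lines whose IPv4 match fields overlap the given subnet."""
    subnet = ipaddress.ip_network(cidr)
    hits = []
    for line in dump_lines:
        for value in IP_FIELD.findall(line):
            # strict=False accepts host addresses as well as network prefixes.
            if ipaddress.ip_network(value, strict=False).overlaps(subnet):
                hits.append(line.strip())
                break
    return hits


if __name__ == "__main__":
    sample = [
        "cookie=0x1080000, table=19, priority=0 actions=resubmit(,17)",
        "cookie=0x8000003, table=21, priority=42,ip,nw_dst=10.0.0.1 actions=resubmit(,220)",
        "cookie=0x8000003, table=21, priority=34,ip,nw_dst=10.0.0.0/24 actions=goto_table:22",
    ]
    for flow in flows_touching_subnet(sample, "10.0.0.0/24"):
        print(flow)
```

With the shortened sample lines, only the two table=21 flows are reported; the table=19 default flow is skipped even though its cookie would match the grep pattern.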

After talking with Aswin, it should be fixed in 6.2.0-5.

Comment 3 Aswin Suryanarayanan 2017-12-14 06:57:14 UTC
Fixed in version 6.2.0-5

Comment 8 Itzik Brown 2018-03-22 06:59:32 UTC
There are stale flows as reported in the u/s bug.
Checked with opendaylight-8.0.0-2.el7ost.noarch

Comment 9 Itzik Brown 2018-03-22 07:01:47 UTC
*** Bug 1558523 has been marked as a duplicate of this bug. ***

Comment 11 Mike Kolesnik 2018-04-16 06:42:17 UTC
Aswin, any progress on this?

Comment 12 Aswin Suryanarayanan 2018-04-16 06:52:59 UTC
[1] solves some of them, but there are still occasional stale flows in some tables, which I am working on.

[1] https://git.opendaylight.org/gerrit/#/c/70375/

Comment 17 Mike Kolesnik 2018-05-21 12:38:44 UTC
Aswin,

Any update on this?

I see the partial fix was merged a while ago. Is there anything else still needed to fix this?

Comment 18 Aswin Suryanarayanan 2018-05-21 13:59:01 UTC
There is one more patch, which should solve this bug.

https://git.opendaylight.org/gerrit/#/c/71495/

Comment 19 Mike Kolesnik 2018-05-31 10:12:59 UTC
Based on a discussion with Aswin, it seems these stale flows aren't causing functional failures, so I am lowering the priority of the bug.

Comment 24 Noam Manos 2018-07-12 12:10:04 UTC
Verified with the following scenario (see scenario output attachments).

# Create a CirrOS image.
# Open security group rules for ICMP and SSH.
# Create an external network and a subnet.
# Create a router and attach an interface.
# Create a tenant network.
# Attach the tenant network to the router.
# Create a floating IP.
# Launch an instance and associate a Floating IP with the instance.
# Check connectivity.

# Check OVS SNAT flows - with Network objects already created (see snat output 1).

# Delete Network, Subnet, Router, Ports and Floating IP.

# Check OVS SNAT flows on each controller - After Network objects were removed (see snat output 2).

# Re-create Network, Subnet, Router and Floating IP, and check connectivity.

# Check OVS SNAT flows on each controller - After Network objects were re-created (see snat output 3).
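
The comparison between snat output 1 and snat output 2 could be automated roughly as follows. This is only a sketch and not the script used for this verification: it assumes two saved ovs-ofctl dump-flows outputs, and the function names and the list of counters to ignore are hypothetical.

```python
import re

# Counters and timers that change between dumps and must be ignored.
VOLATILE = re.compile(r"\b(?:duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*")


def normalize(line):
    """Strip volatile counters so the same flow compares equal across dumps."""
    return VOLATILE.sub("", line).strip()


def mentions_ip(flow, ip):
    """True if the flow text contains exactly this address (not a longer one)."""
    return re.search(r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])", flow) is not None


def surviving_flows(before_lines, after_lines, removed_ips):
    """Flows seen before deletion that still exist afterwards and mention a removed IP."""
    before = {normalize(line) for line in before_lines if line.strip()}
    survivors = []
    for line in after_lines:
        flow = normalize(line)
        if flow in before and any(mentions_ip(flow, ip) for ip in removed_ips):
            survivors.append(flow)
    return survivors


if __name__ == "__main__":
    before = [
        "cookie=0x8000003, duration=6586.136s, table=21, n_packets=0, n_bytes=0, "
        "priority=42,ip,metadata=0x324b0/0xfffffe,nw_dst=10.0.0.214 actions=goto_table:25",
    ]
    after = [
        "cookie=0x8000003, duration=7000.500s, table=21, n_packets=3, n_bytes=180, "
        "priority=42,ip,metadata=0x324b0/0xfffffe,nw_dst=10.0.0.214 actions=goto_table:25",
    ]
    for flow in surviving_flows(before, after, ["10.0.0.214"]):
        print("stale:", flow)
```

An empty result after the delete step would correspond to the expected behaviour described in this bug; any printed flow would be a candidate stale flow to inspect by hand.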

Comment 25 Noam Manos 2018-07-12 12:12:45 UTC
Created attachment 1458382 [details]
ODL Check SNAT flows scenario

(Output is recorded starting from the first SNAT check, after the objects had already been created.)

Comment 26 Noam Manos 2018-07-12 12:14:33 UTC
Created attachment 1458383 [details]
1) check snat with pre-created network objects

Comment 27 Noam Manos 2018-07-12 12:15:23 UTC
Created attachment 1458384 [details]
2) check snat after removing network objects

Comment 28 Noam Manos 2018-07-12 12:17:09 UTC
Created attachment 1458385 [details]
3) check snat after network objects were created again

Comment 30 errata-xmlrpc 2018-07-19 13:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215

