Bug 1515815

Summary: [Netvirt][NAT] SNAT flows are not removed after removing an external interface of a router
Product: Red Hat OpenStack
Reporter: Itzik Brown <itbrown>
Component: opendaylight
Assignee: Aswin Suryanarayanan <asuryana>
Status: CLOSED ERRATA
QA Contact: Noam Manos <nmanos>
Severity: high
Docs Contact:
Priority: medium
Version: 12.0 (Pike)
CC: aadam, asuryana, itbrown, jschluet, mkolesni, nyechiel
Target Milestone: z1
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard: Netvirt
Fixed In Version: opendaylight-8.3.0-1.el7ost
Doc Type: Known Issue
Doc Text:
When the router gateway is cleared, the Layer 3 flows related to learned IP addresses are not removed. The learned IP addresses include the PNF and external gateway IP addresses. This leads to stale flows, but not to any functional issue. The external gateway and its IP address do not change frequently. The stale flows are removed when the external network is deleted.
Story Points: ---
Clone Of:
Environment: N/A
Last Closed: 2018-07-19 13:53:05 UTC
Type: Bug
Bug Blocks: 1414431, 1528948    
Attachments:
Description | Flags
ODL Check SNAT flows scenario | none
1) check snat with pre-created network objects | none
2) check snat after removing network objects | none
3) check snat after network objects were created again | none

Description Itzik Brown 2017-11-21 12:27:43 UTC
Description of problem:
After removing the external interface of a router, the SNAT flows are not removed.
After removing the router itself, there are still flows on a compute node that reference the router's IP address.

An example of flows:
cookie=0x8000003, duration=6586.136s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x324b0/0xfffffe,nw_dst=10.0.0.214 actions=goto_table:25

cookie=0x122201d9, duration=205.588s, table=81, n_packets=0, n_bytes=0, priority=100,arp,metadata=0x4157e000000/0xfffffffff000000,arp_tpa=10.0.0.214,arp_op=1 actions=move:NXM_OF_ETH_SRC[]>NXM_OF_ETH_DST[],set_field:fa:16:3e:9a:c9:20>eth_src,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]>NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]>NXM_OF_ARP_TPA[],load:0xfa163e9ac920->NXM_NX_ARP_SHA[],load:0xa0000d6->NXM_OF_ARP_SPA[],load:0->NXM_OF_IN_PORT[],load:0x400->NXM_NX_REG6[],write_metadata:0/0x1,goto_table:220
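
To make flow lines like the two above easier to inspect, they can be split into their match fields and their actions. The following is a minimal sketch, assuming the text layout produced by `ovs-ofctl dump-flows`; `parse_flow` is a hypothetical helper written for illustration and is not part of OVS or OpenDaylight.

```python
import re


def parse_flow(line):
    """Split one dump-flows line into a dict of match fields plus 'actions'.

    Hypothetical helper for illustration only. Bare protocol keywords such
    as 'ip' or 'arp' are stored with the value True.
    """
    # Everything after " actions=" is kept as one string, because action
    # lists contain commas of their own.
    head, _, actions = line.strip().partition(" actions=")
    fields = {}
    for token in re.split(r",\s*", head):
        key, sep, value = token.partition("=")
        fields[key.strip()] = value if sep else True
    fields["actions"] = actions
    return fields


flow = parse_flow(
    "cookie=0x8000003, duration=6586.136s, table=21, n_packets=0, n_bytes=0, "
    "priority=42,ip,metadata=0x324b0/0xfffffe,nw_dst=10.0.0.214 "
    "actions=goto_table:25"
)
print(flow["table"], flow["nw_dst"], flow["actions"])
```

With the fields separated, a check such as "table 21 flow whose nw_dst is the router IP" becomes a plain dictionary lookup instead of a text search.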

Version-Release number of selected component (if applicable):
Carbon opendaylight-6.2.0-4.el7ost.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
There should be no flows after removing an external interface from the router

Additional info:
u/s bug - https://jira.opendaylight.org/browse/NETVIRT-1020

Comment 1 Aswin Suryanarayanan 2017-12-12 11:52:35 UTC
(In reply to Itzik Brown from comment #0)

The flow matching "nw_dst=10.0.0.214  actions=goto_table:25" is a floating IP (FIP) flow, so this seems to be a FIP-related issue. Do you have any steps to reproduce it? Was it the result of a Tempest test?

I tried creating FIPs for VMs across compute nodes in multiple scenarios and did not observe this issue.

Comment 2 Itzik Brown 2017-12-12 13:17:30 UTC
Currently, after removing the external network, these are the flows:

# ovs-ofctl -O OpenFlow13 dump-flows br-int |grep 10.0.0
     cookie=0x8000000, duration=85485.560s, table=0, n_packets=756488, n_bytes=84687076, priority=4,in_port=1,vlan_tci=0x0000/0x1fff actions=write_metadata:0x110000000001/0xffffff0000000001,goto_table:17
     cookie=0x8000001, duration=776.462s, table=17, n_packets=633, n_bytes=70526, priority=10,metadata=0x110000000000/0xffffff0000000000 actions=load:0x1926a->NXM_NX_REG3[0..24],write_metadata:0x90001100000324d4/0xfffffffffffffffe,goto_table:19
     cookie=0x8040000, duration=776.462s, table=17, n_packets=629, n_bytes=70078, priority=10,metadata=0x9000110000000000/0xffffff0000000000 actions=load:0x11->NXM_NX_REG1[0..19],load:0x1388->NXM_NX_REG7[0..15],write_metadata:0xa000111388000000/0xfffffffffffffffe,goto_table:43
     cookie=0x1080000, duration=85470.515s, table=19, n_packets=759100, n_bytes=85337506, priority=0 actions=resubmit(,17)
     cookie=0x1030000, duration=85470.515s, table=20, n_packets=0, n_bytes=0, priority=0 actions=goto_table:80
     cookie=0x8000003, duration=746.867s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x324da/0xfffffe,nw_dst=10.0.0.1 actions=set_field:52:54:00:67:51:ed->eth_dst,load:0x1100->NXM_NX_REG6[],resubmit(,220)
     cookie=0x8000003, duration=85487.369s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x33c22/0xfffffe,nw_dst=10.0.0.0/24 actions=write_metadata:0x1770033c22/0xfffffffffe,goto_table:22
     cookie=0x8000003, duration=768.846s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x324da/0xfffffe,nw_dst=10.0.0.0/24 actions=write_metadata:0x13880324da/0xfffffffffe,goto_table:22
     cookie=0x8000004, duration=85487.369s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33c22/0xfffffe,nw_dst=10.0.0.255 actions=drop
     cookie=0x8000004, duration=768.846s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x324da/0xfffffe,nw_dst=10.0.0.255 actions=drop
     cookie=0x1080000, duration=85470.515s, table=23, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17)
     cookie=0x8800011, duration=776.451s, table=55, n_packets=0, n_bytes=0, priority=10,tun_id=0x11,metadata=0x110000000000/0xfffff0000000000 actions=drop
     cookie=0x1030000, duration=85470.513s, table=80, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17)

After talking with Aswin, it should be fixed in 6.2.0-5.
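
A note on reading the dump above: in `grep 10.0.0` each `.` matches any character, so a line can match on a cookie such as 0x1080000 or on metadata such as 0x110000000000 without containing any address from 10.0.0.0/24. A stricter check looks only at IP match fields. The following is a minimal sketch, assuming the field names seen in these dumps (`nw_src`, `nw_dst`, `arp_spa`, `arp_tpa`); `references_subnet` is a hypothetical helper, not an OVS or OpenDaylight API.

```python
import ipaddress
import re

# IP-carrying match fields as they appear in the dumps in this report.
IP_FIELD = re.compile(
    r"\b(?:nw_src|nw_dst|arp_spa|arp_tpa)=(\d+\.\d+\.\d+\.\d+(?:/\d+)?)"
)


def references_subnet(flow_line, cidr):
    """Return True if an IP match field of flow_line overlaps the subnet.

    Hypothetical helper for illustration only. Only the match part of the
    flow is inspected, not its actions.
    """
    subnet = ipaddress.ip_network(cidr)
    match_part = flow_line.partition(" actions=")[0]
    for value in IP_FIELD.findall(match_part):
        # strict=False lets host addresses such as 10.0.0.1 parse as /32.
        if ipaddress.ip_network(value, strict=False).overlaps(subnet):
            return True
    return False


lines = [
    "cookie=0x1080000, duration=85470.515s, table=19, n_packets=759100, "
    "n_bytes=85337506, priority=0 actions=resubmit(,17)",
    "cookie=0x8000003, duration=746.867s, table=21, n_packets=0, n_bytes=0, "
    "priority=42,ip,metadata=0x324da/0xfffffe,nw_dst=10.0.0.1 "
    "actions=set_field:52:54:00:67:51:ed->eth_dst,load:0x1100->NXM_NX_REG6[],"
    "resubmit(,220)",
]
for line in lines:
    print(references_subnet(line, "10.0.0.0/24"), line[:40])
```

Applied to the dump above, a filter like this would keep only the table 21 and table 22 entries, which are the ones that actually match on 10.0.0.x addresses.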

Comment 3 Aswin Suryanarayanan 2017-12-14 06:57:14 UTC
Fixed in version 6.2.0-5

Comment 8 Itzik Brown 2018-03-22 06:59:32 UTC
There are stale flows as reported in the u/s bug.
Checked with opendaylight-8.0.0-2.el7ost.noarch

Comment 9 Itzik Brown 2018-03-22 07:01:47 UTC
*** Bug 1558523 has been marked as a duplicate of this bug. ***

Comment 11 Mike Kolesnik 2018-04-16 06:42:17 UTC
Aswin, any progress on this?

Comment 12 Aswin Suryanarayanan 2018-04-16 06:52:59 UTC
[1] solves some of them, but there are still occasional stale flows in some tables, which I am working on.

[1] https://git.opendaylight.org/gerrit/#/c/70375/

Comment 17 Mike Kolesnik 2018-05-21 12:38:44 UTC
Aswin,

Any update on this?

I see the partial fix was merged a while ago, is there anything else still needed to fix this?

Comment 18 Aswin Suryanarayanan 2018-05-21 13:59:01 UTC
There is one more patch, which should solve this bug.

https://git.opendaylight.org/gerrit/#/c/71495/

Comment 19 Mike Kolesnik 2018-05-31 10:12:59 UTC
Based on a discussion with Aswin, it seems these stale flows aren't causing functional failures, so I am lowering the priority of the bug.

Comment 24 Noam Manos 2018-07-12 12:10:04 UTC
Verified with the following scenario (see scenario output attachments).

# Create a CirrOS image.
# Open security group rules for ICMP and SSH.
# Create an external network and a subnet.
# Create a router and attach an interface.
# Create a tenant network.
# Attach the tenant network to the router.
# Create a floating IP.
# Launch an instance and associate a Floating IP to the instance.
# Check connectivity.

# Check OVS SNAT flows - with network objects already created (see snat output 1).

# Delete Network, Subnet, Router, Ports and Floating IP.

# Check OVS SNAT flows on each controller - After Network objects were removed (see snat output 2)

# Re-create Network, Subnet, Router and Floating IP, and check connectivity.

# Check OVS SNAT flows on each controller - After Network objects were re-created (see snat output 3).
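
The comparison between the flow dumps in this scenario can be sketched as follows. This is only an illustration, assuming each dump is the saved text of `ovs-ofctl dump-flows`; `flow_key` and `surviving_flows` are hypothetical helpers and are not the tooling used for the attached outputs. Counters and timers are dropped first, because they differ between any two dumps of the same flow.

```python
import re

# Fields that change on every dump and say nothing about flow identity.
VOLATILE = re.compile(
    r"\b(?:duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def flow_key(line):
    """Return the flow line with volatile counters and timers removed."""
    return VOLATILE.sub("", line.strip())


def surviving_flows(before_dump, after_dump):
    """List flows from after_dump that were already present in before_dump.

    Hypothetical helper for illustration only.
    """
    before = {
        flow_key(line) for line in before_dump.splitlines() if "table=" in line
    }
    return [
        line.strip()
        for line in after_dump.splitlines()
        if "table=" in line and flow_key(line) in before
    ]
```

Given the text of snat output 1 as `before_dump` and snat output 2 as `after_dump`, the result lists the flows that survived the deletion. Default pipeline flows survive as well, so the list still needs a manual review for entries that reference the deleted router or network.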

Comment 25 Noam Manos 2018-07-12 12:12:45 UTC
Created attachment 1458382 [details]
ODL Check SNAT flows scenario

(Output was recorded starting from the first SNAT check, after the objects had already been created.)

Comment 26 Noam Manos 2018-07-12 12:14:33 UTC
Created attachment 1458383 [details]
1) check snat with pre-created network objects

Comment 27 Noam Manos 2018-07-12 12:15:23 UTC
Created attachment 1458384 [details]
2) check snat after removing network objects

Comment 28 Noam Manos 2018-07-12 12:17:09 UTC
Created attachment 1458385 [details]
3) check snat after network objects were created again

Comment 30 errata-xmlrpc 2018-07-19 13:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215