Bug 1568012 - [NetvirtIssues] In a VLAN tenant network, after removal of a Floating IP from an instance there is no connectivity to an external Ip
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 13.0 (Queens)
Assignee: Sridhar Gaddam
QA Contact: Itzik Brown
URL:
Whiteboard: NetvirtIssues
Depends On:
Blocks:
Reported: 2018-04-16 15:12 UTC by Itzik Brown
Modified: 2018-10-18 07:25 UTC (History)

Fixed In Version: opendaylight-8.0.0-11.el7ost
Doc Type: Known Issue
Doc Text:
Connecting to an external IP fails after a floating IP is associated with an instance and then disassociated. This happens in a tenant VLAN network when:
* a VM spawned on a non-NAPT switch is associated with a floating IP, and
* the floating IP is then removed.
This sporadically results in a missing flow in the FIB table of the NAPT switch. Because of the missing FIB table entry, the VM loses connectivity to the public network. Re-associating the floating IP with the VM restores connectivity to the public network. As long as the floating IP is associated with the VM, the VM can connect to the internet. However, this workaround consumes a public IP (floating IP) from the external network.
Clone Of:
Environment:
N/A
Last Closed: 2018-06-27 13:51:11 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
OpenDaylight Bug NETVIRT-1080 0 None None None 2018-05-17 18:13:07 UTC
OpenDaylight gerrit 69777 0 None None None 2018-05-17 18:14:34 UTC
Red Hat Product Errata RHEA-2018:2086 0 None None None 2018-06-27 13:51:45 UTC

Description Itzik Brown 2018-04-16 15:12:38 UTC
Description of problem:
After associating a Floating IP with an instance and then disassociating it, connectivity from the instance to an external IP fails.

Version-Release number of selected component (if applicable):
opendaylight-8.0.0-5.el7ost.noarch

How reproducible:
It doesn't happen all the time. It may be related to the locations of the NAPT switch and the instance.

Steps to Reproduce:
1. Launch an instance and associate it with a FIP
2. Connect to the instance via console or from another instance	
3. Check connectivity to an external IP (that is not part of Neutron)
4. Disassociate the FIP from the instance
5. Check connectivity to an external IP (that is not part of Neutron)
6. Verify that there is no connectivity 
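
Steps 1 and 4 can be scripted. Below is a minimal sketch, assuming the `openstack` CLI is installed and authenticated. The helper name `fip_commands`, the server name, and the floating IP address are hypothetical placeholders. The connectivity checks in steps 3 and 5 still have to be run from inside the instance.

```python
import subprocess


def fip_commands(server, floating_ip):
    """Build the CLI invocations for step 1 (associate) and step 4 (disassociate)."""
    return {
        "associate": ["openstack", "server", "add", "floating", "ip",
                      server, floating_ip],
        "disassociate": ["openstack", "server", "remove", "floating", "ip",
                         server, floating_ip],
    }


def run(cmd):
    """Run one command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

For example, `run(fip_commands("vm1", "203.0.113.10")["disassociate"])` would perform step 4 for a hypothetical instance `vm1`.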



Actual results:


Expected results:


Additional info:

Comment 3 Sridhar Gaddam 2018-04-25 10:55:06 UTC
I tried the same use case in a multi-node setup where the NAPT switch is scheduled on a different Compute node from the one where the VM is spawned, but could not reproduce the issue. I tried multiple times, and the use case seems to work fine.

@Itzik, please attach logs to this BZ and, if possible, give me access to the setup when the issue is seen so that I can take a closer look at it.

Comment 7 Aswin Suryanarayanan 2018-05-17 18:15:59 UTC
The removal of the FIP was causing the external learned IPs to be removed. This is solved by
https://git.opendaylight.org/gerrit/#/c/69777/

Comment 20 Itzik Brown 2018-05-30 06:37:50 UTC
On bare metal it doesn't work with 8.0.0-11

Comment 21 Itzik Brown 2018-05-30 07:22:25 UTC
To be clear, it's a bare-metal setup with VLAN.

Comment 22 Sridhar Gaddam 2018-06-01 13:29:15 UTC
Some updates:
-------------

This issue is seen only with VLAN tenant networks and NOT with VxLAN tenant networks.

           +---------------------------+
           | 8.8.8.8 (External Server) |
           +-----------+---------------+
                       |
  +------+-------------+----------------------+
         |                External Network (FLAT/VLAN)
         |
         |   ---+-------------------------------+-------+
         |      |          Tenant VLAN Network  |
         |      |                               |
         |      |                               |
       +-+------+---------+         +-----------+---------+
       |                  |         |     ComputeNode     |
       |    NAPT Switch   |         |     hosting VM      |
       |                  |         |     (10.0.0.8)      |
       +------------------+         +---------------------+

So, when the issue was reproduced in a VLAN tenant network, the problem was a missing Table-21 entry after DNAT (sample flow [*]) for the return traffic from 8.8.8.8 to the VM. I had a look at the config datastore, and the flow was missing from the config store as well.

[*] table=21, priority=42,ip,metadata=0x30d46/0xfffffe,nw_dst=10.0.0.8 actions=set_field:fa:16:3e:62:20:80->eth_dst,load:0x700->NXM_NX_REG6[],resubmit(,220)
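
One way to check for this entry is to scan a flow dump for a Table-21 flow whose `nw_dst` is the VM's IP. Below is a minimal sketch, assuming the dump has already been captured as text. The function name `has_fib_flow` is hypothetical, and the matching is a plain token comparison rather than a full flow parser.

```python
import re


def has_fib_flow(dump_text, vm_ip, table=21):
    """Return True if any flow in dump_text is in `table` and matches nw_dst=vm_ip."""
    for line in dump_text.splitlines():
        # Everything before " actions=" is the match part of the flow.
        match_part = line.split(" actions=", 1)[0]
        # Match fields are separated by commas and/or whitespace.
        fields = set(re.split(r"[,\s]+", match_part.strip()))
        if f"table={table}" in fields and f"nw_dst={vm_ip}" in fields:
            return True
    return False


# The sample flow from this comment, used as input.
sample = (
    "table=21, priority=42,ip,metadata=0x30d46/0xfffffe,nw_dst=10.0.0.8 "
    "actions=set_field:fa:16:3e:62:20:80->eth_dst,"
    "load:0x700->NXM_NX_REG6[],resubmit(,220)"
)
print(has_fib_flow(sample, "10.0.0.8"))
```

The dump text could come from `ovs-ofctl -O OpenFlow13 dump-flows br-int` on the NAPT switch, assuming the integration bridge is named `br-int`. A `False` result for the VM's IP after the FIP is removed would correspond to the missing entry described above.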

Comment 30 errata-xmlrpc 2018-06-27 13:51:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

