Bug 1753216

Summary:	Duplicated IPs on InfraNodes using automatic egress IP
Product:	OpenShift Container Platform	Reporter:	Bruno Lima <blima>
Component:	Networking	Assignee:	Dan Winship <danw>
Networking sub component:	openshift-sdn	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	aghadge, aivaraslaimikis, bbennett, danw, huirwang, jnordell, natanfranghieru, palonsor, piqin, swasthan
Version:	3.11.0
Target Milestone:	---
Target Release:	4.3.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: When the SDN pod was restarted on a node, it did not clean up any old Egress IPs. Consequence: If the set of Egress IPs assigned to a node changed while the SDN pod was not running (eg, because multiple services on the node were restarted at the same time) then the node might continue to claim that it owned the Egress IP even after the IP had been assigned to another node, causing traffic to that IP to be delivered to the wrong node and be lost. Fix: The SDN pod now cleans up stale Egress IPs at startup. Result: Nodes should not fight over ownership of Egress IPs.	Story Points:	---
Clone Of:
Clones:	1762235 1772904 1772905 (view as bug list)		Environment:
Last Closed:	2020-01-23 11:06:16 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1762235, 1772904, 1772905

Description Bruno Lima 2019-09-18 12:30:38 UTC

Description of problem:

Egress IPs are duplicated across infra nodes for an unknown reason, which breaks communication between pods and destinations outside the project.

When we apply an egress IP, the same IP is unexpectedly allocated on other nodes as well.

It is as if the IP was not correctly reallocated to another node.

Version-Release number of selected component (if applicable):

OpenShift Master: v3.11.117
Kubernetes Master: v1.11.0+d4cacc0
OpenShift Web Console: v3.11.117


How reproducible:

Apply an automatic egress IP to a project.

Steps to Reproduce:
1. Create a project
$ oc new-project <projectname> --description="<description>" --display-name="<display_name>"

2. Apply automatic egressIP
$ oc patch netnamespace <projectname> -p '{"egressIPs": ["<IP_address>"]}'
$ oc patch hostsubnet node1 -p '{"egressCIDRs": ["192.168.1.0/24"]}'

3. Check hostsubnet
$ oc get hostsubnet

Actual results:

The egress IP is applied, but a few minutes later the same IP also appears on another host.


Expected results:

The egress IP is applied on only one host.


Additional info:

After we restart docker and atomic-openshift-node, everything works as expected.

$ systemctl restart docker atomic-openshift-node

Comment 1 natanfranghieru 2019-09-18 13:48:45 UTC

I have the same problem; here are more details:

We have 3 infra nodes, each with the following egress CIDRs:

NAME	HOST	HOST IP		SUBNET		EGRESS CIDRS		EGRESS IPS
infra01	infra01	192.168.1.247	10.144.16.0/23	[192.168.1.0/24]	[192.168.1.19, 192.168.1.25, 192.168.1.22, 192.168.1.12, 192.168.1.14, 192.168.1.24, 192.168.1.21]
infra02	infra02	192.168.1.246	10.144.14.0/23	[192.168.1.0/24]	[192.168.1.15, 192.168.1.29, 192.168.1.11, 192.168.1.30, 192.168.1.27, 192.168.1.17]
infra03	infra03	192.168.1.245	10.144.10.0/23	[192.168.1.0/24]	[192.168.1.28, 192.168.1.13, 192.168.1.31, 192.168.1.16, 192.168.1.20, 192.168.1.18]

We are assigning one egress IP per project:

NAME		NETID		EGRESS IPS
project01	omitted		[192.168.1.17]
project02	omitted		[192.168.1.28]
project03	omitted		[192.168.1.20]
project04	omitted		[192.168.1.29]
project05	omitted		[192.168.1.27]
project06	omitted		[192.168.1.16]
project07	omitted		[192.168.1.25]
project08	omitted		[192.168.1.21]
project09	omitted		[192.168.1.14]
project10	omitted		[192.168.1.13]
project11	omitted		[192.168.1.31]
project12	omitted		[192.168.1.19]
project13	omitted		[192.168.1.15]
project14	omitted		[192.168.1.18]
project15	omitted		[192.168.1.30]
project16	omitted		[192.168.1.12]
project17	omitted		[192.168.1.11]
project18	omitted		[192.168.1.24]
project19	omitted		[192.168.1.22]

On the infra nodes, the egress IPs show up on the interface as secondary addresses:

infra01:
    inet 192.168.1.14/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.12/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.25/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.19/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.24/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.22/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.21/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever

infra02:
    inet 192.168.1.15/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.27/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.11/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.30/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.29/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.17/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever

infra03:
    inet 192.168.1.13/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.28/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.20/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.31/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.16/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.18/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever

Some events (we don't know exactly which ones, but a docker restart is one of them) trigger the egress IP engine to move IPs between nodes. For example, when the docker service had a problem on infra03, the egress IPs that were on infra03 were migrated to infra01 and infra02:

NAME	HOST	HOST IP		SUBNET		EGRESS CIDRS		EGRESS IPS
infra01	infra01	192.168.1.247	10.144.16.0/23	[192.168.1.0/24]	[192.168.1.19, 192.168.1.25, 192.168.1.22, 192.168.1.12, 192.168.1.14, 192.168.1.24, 192.168.1.21, 192.168.1.28, 192.168.1.31, 192.168.1.20]
infra02	infra02	192.168.1.246	10.144.14.0/23	[192.168.1.0/24]	[192.168.1.15, 192.168.1.29, 192.168.1.11, 192.168.1.30, 192.168.1.27, 192.168.1.17, 192.168.1.13, 192.168.1.16, 192.168.1.18]
infra03	infra03	192.168.1.245	10.144.10.0/23	[192.168.1.0/24]	[]

The IPs are allocated on interfaces too:
infra01:
    inet 192.168.1.14/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.12/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.25/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.19/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.24/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.22/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.21/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.28/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.20/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.31/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever

infra02:
    inet 192.168.1.15/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.27/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.11/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.30/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.29/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.17/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.13/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.16/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.18/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever

And here is the bug: the IPs are not removed from the interface on infra03, so the same IPs are configured on two nodes at once. This causes outages, because traffic sometimes goes to the right node and sometimes to the wrong one.

infra03:
    inet 192.168.1.13/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.28/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.20/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.31/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.16/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.18/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever

The expected behavior is that all the secondary IPs that were migrated should be deleted from the interface.
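
To make the overlap explicit, here is a small Python sketch that cross-checks the secondary addresses listed above (after the migration) and reports every IP configured on more than one node. The function name and the data layout are only illustrative; the addresses are copied from the listings in this comment.

```python
from collections import defaultdict


def find_duplicate_claims(addrs_by_node):
    """Return {ip: [nodes]} for every IP configured on more than one node."""
    nodes_by_ip = defaultdict(set)
    for node, addrs in addrs_by_node.items():
        for ip in addrs:
            nodes_by_ip[ip].add(node)
    return {ip: sorted(nodes) for ip, nodes in nodes_by_ip.items() if len(nodes) > 1}


def expand(last_octets, prefix="192.168.1."):
    """Build full addresses from last octets to keep the sample data short."""
    return [prefix + str(octet) for octet in last_octets]


# Secondary addresses per node after the egress IPs moved off infra03,
# taken from the "ip addr" listings in this comment.
addrs_by_node = {
    "infra01": expand([14, 12, 25, 19, 24, 22, 21, 28, 20, 31]),
    "infra02": expand([15, 27, 11, 30, 29, 17, 13, 16, 18]),
    "infra03": expand([13, 28, 20, 31, 16, 18]),
}

if __name__ == "__main__":
    for ip, nodes in sorted(find_duplicate_claims(addrs_by_node).items()):
        print(ip, "is configured on", ", ".join(nodes))
```

With the data above, every address still present on infra03 is reported as also being configured on infra01 or infra02.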

OpenShift Master: v3.11.117
Kubernetes Master: v1.11.0+d4cacc0
OpenShift Web Console: v3.11.117

Comment 2 natanfranghieru 2019-09-18 14:00:18 UTC

One more detail: this bug also happens in 3.11.141, as we confirmed in our lab.

Comment 3 Dan Winship 2019-09-18 15:26:52 UTC

Hm... so I guess the problem is that the docker restart on infra03 also causes openshift-sdn to restart, and when it comes back up, the egress IPs have already been removed from the HostSubnet object, so it doesn't explicitly get told to remove them, and it doesn't actually even know that they are former egress IPs.

It shouldn't be hard to fix, but there's no easy workaround until we do. They'll need to just ensure that the old egress IPs get cleaned up manually when this happens.

Comment 4 Dan Winship 2019-09-18 16:01:36 UTC

QE: to reproduce/test:

  - Create a namespace+node with an egress IP, as in other tests. (Use a manually-assigned
    egress IP, it will be simpler.) Confirm that it works.

  - On the node with the egress IP, kill the sdn pod and prevent it from being restarted.
    Not sure what the best way to do that is... 

  - Remove the egress IP from the HostSubnet it's on, add it to a different HostSubnet.
    Confirm that "ip addr" on both nodes shows the egress IP.

  - Allow a new sdn pod to be started on the old egress node.

  - Expected behavior: "ip addr" on the old egress node shows that the egress IP has now
    been removed. (Current/buggy behavior: the egress IP still exists after restart.)
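
For the "ip addr" checks in the steps above, a helper along these lines could be used. This is only a sketch in Python: it assumes the egress IP shows up as a "secondary" inet line in the format seen in comment 1, and the function names are made up for illustration. The primary-address line in the sample is an assumed format, not copied from a real node.

```python
import re

# Matches lines such as:
#     inet 192.168.1.13/24 brd 192.168.1.255 scope global secondary mgmt
_SECONDARY_INET = re.compile(
    r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)/\d+\s.*\bsecondary\b"
)


def secondary_ipv4_addrs(ip_addr_output):
    """Return the set of secondary IPv4 addresses found in `ip addr` output."""
    addrs = set()
    for line in ip_addr_output.splitlines():
        match = _SECONDARY_INET.match(line)
        if match:
            addrs.add(match.group(1))
    return addrs


def has_egress_ip(ip_addr_output, egress_ip):
    """True if the given egress IP is present as a secondary address."""
    return egress_ip in secondary_ipv4_addrs(ip_addr_output)


# Sample input. The first inet line (primary address) is an assumed format;
# the secondary line follows the listings in comment 1.
SAMPLE = """\
    inet 192.168.1.245/24 brd 192.168.1.255 scope global mgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.13/24 brd 192.168.1.255 scope global secondary mgmt
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    print(has_egress_ip(SAMPLE, "192.168.1.13"))
```

Run against the output from the old egress node after the new sdn pod starts, the check should come back false once the fix is in place.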

Comment 5 natanfranghieru 2019-09-18 16:55:19 UTC

(In reply to Dan Winship from comment #3)
> Hm... so I guess the problem is that the docker restart on infra03 also
> causes openshift-sdn to restart, and when it comes back up, the egress IPs
> have already been removed from the HostSubnet object, so it doesn't
> explicitly get told to remove them, and it doesn't actually even know that
> they are former egress IPs.
> 
> It shouldn't be hard to fix, but there's no easy workaround until we do.
> They'll need to just ensure that the old egress IPs get cleaned up manually
> when this happens.

Thanks! We've already written a Python script that runs every 5 minutes and compares the hostsubnet against the ip addr output of every infra node. If a node has more IP addresses than egress IPs, the script deletes the leftover IPs from that node. It doesn't prevent the error, but at least we suffer for at most 5 minutes. It doesn't happen often, but when it does the impact is big, because the pods can't access anything outside the project.
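
The core of that periodic check can be sketched roughly as below. This is not the actual script, only an illustration: the function names are hypothetical, the commands are generated as strings rather than executed, and it assumes that every secondary address on the device is an egress IP (any other secondary address, such as a VIP, would wrongly be flagged). The device name "mgmt" and the /24 prefix come from the listings in comment 1.

```python
import ipaddress


def stale_egress_ips(secondary_addrs, assigned_egress_ips):
    """IPs present on the interface but no longer listed in the HostSubnet."""
    stale = set(secondary_addrs) - set(assigned_egress_ips)
    return sorted(stale, key=ipaddress.ip_address)


def cleanup_commands(secondary_addrs, assigned_egress_ips, dev, prefix_len=24):
    """Build (but do not run) the `ip addr del` commands for the stale IPs."""
    return [
        f"ip addr del {ip}/{prefix_len} dev {dev}"
        for ip in stale_egress_ips(secondary_addrs, assigned_egress_ips)
    ]


if __name__ == "__main__":
    # infra03 after the migration: six leftover addresses, empty EGRESS IPS.
    on_interface = [
        "192.168.1.13", "192.168.1.28", "192.168.1.20",
        "192.168.1.31", "192.168.1.16", "192.168.1.18",
    ]
    assigned = []
    for command in cleanup_commands(on_interface, assigned, dev="mgmt"):
        print(command)
```

Keeping the comparison separate from the command execution makes it easy to dry-run the check on a node before letting it delete anything.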

Comment 8 Dan Winship 2019-10-09 14:50:07 UTC

(In reply to Dan Winship from comment #4)
> QE: to reproduce/test:

additional test:

  - If you restart openshift-sdn *without* reassigning the egress IPs, it
    doesn't remove them on startup. (Check the logs to make sure of this;
    if it removes the IP but then adds it back, that counts as failure.
    It shouldn't remove it in the first place.)
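
The behavior expected by this test and the one in comment 4 can be modelled as a set difference at startup. The Python sketch below is only a model of that expectation, not the openshift-sdn implementation; how the SDN pod identifies leftover egress IPs on the interface is not described in this report, so `found_on_interface` is simply taken as given.

```python
def plan_startup_sync(found_on_interface, assigned):
    """Decide what to do with each egress IP when the SDN pod starts.

    found_on_interface: egress IPs left on the node from a previous run
    assigned: egress IPs currently listed in the node's HostSubnet
    """
    found, wanted = set(found_on_interface), set(assigned)
    return {
        "remove": sorted(found - wanted),  # stale: reassigned while we were down
        "add": sorted(wanted - found),     # newly assigned, not yet configured
        "keep": sorted(found & wanted),    # untouched: no remove-then-re-add
    }


if __name__ == "__main__":
    # Restart without any reassignment: nothing may be removed.
    print(plan_startup_sync({"192.168.1.28"}, {"192.168.1.28"}))
    # Restart after 192.168.1.13 moved to another node: only that IP goes.
    print(plan_startup_sync({"192.168.1.13", "192.168.1.28"}, {"192.168.1.28"}))
```

An IP in the "keep" set must never appear in the logs as removed and then added back.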

Comment 15 errata-xmlrpc 2020-01-23 11:06:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062