Bug 2097243 - NodeIP is used instead of EgressIP
Summary: NodeIP is used instead of EgressIP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Surya Seetharaman
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks: 2105101
Reported: 2022-06-15 09:38 UTC by Anand T N
Modified: 2023-01-17 19:50 UTC
CC: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When an egressIP pod was deleted and recreated with the same namespace/name within 60 seconds, a stale cache entry from the older pod was used when setting up the new pod, so wrong SNATs and re-route policies were configured for the egressIP pod. Consequence: Because the wrong SNAT was configured, packets could leave the cluster with the nodeIP instead of the egressIP. Fix: The stale cache entry is no longer used; the new pod's information is used to construct the SNATs and policies. Result: Traffic leaves the cluster through the egressIP instead of the nodeIP.
Clone Of:
Environment:
Last Closed: 2023-01-17 19:50:02 UTC
Target Upstream Version:
Embargoed:
surya: needinfo-




Links
Github openshift/ovn-kubernetes pull 1169 (Merged): Bug 2097243: Fix egressips for pods recreated with same name (2022-08-18 16:55:56 UTC)
Github ovn-org/ovn-kubernetes pull 3050 (Merged): EIP: Fix stateful set pods recreated with same name (2022-08-18 16:56:28 UTC)
Red Hat Product Errata RHSA-2022:7399 (2023-01-17 19:50:34 UTC)

Description Anand T N 2022-06-15 09:38:54 UTC
Description of problem:

NodeIP was used instead of EgressIP when we tried to connect to an external service from the pod

OCP 4.10

Replicated the issue in multiple other OCP versions as well

Comment 11 Surya Seetharaman 2022-06-24 07:58:00 UTC
We know why this is happening (thanks to Patryk, who made this a lot more fun #peer-debugging). At first we thought it was an egressIP bug, but it's not. When we delete a pod in OVNK, we do oc.logicalPortCache.remove(logicalPort) from deleteLogicalPort, which in turn marks the pod's port as ready for deletion:

func (c *portCache) remove(logicalPort string) {
	c.Lock()
	defer c.Unlock()
	info, ok := c.cache[logicalPort]
	if !ok || !info.expires.IsZero() {
		klog.Infof("SURYA: port-cache(%s): port not found in cache or already marked for removal", logicalPort)
		return
	}
	info.expires = time.Now().Add(time.Minute)
	klog.Infof("SURYA: port-cache(%s): scheduling port for removal at %v", logicalPort, info.expires)

	// Removal must be deferred since, for example, NetworkPolicy pod handlers
	// may run after the main pod handler and look for items in the cache
	// on delete events.
	go func() {
		select {
		case <-time.After(time.Minute):
			c.Lock()
			defer c.Unlock()
			// remove the port info if its expiration time has elapsed.
			// This makes sure that we don't prematurely remove a port
			// that was deleted and re-added before the timer expires.
			if info, ok := c.cache[logicalPort]; ok && !info.expires.IsZero() {
				if time.Now().After(info.expires) {
					klog.Infof("SURYA: port-cache(%s): removing port", logicalPort)
					delete(c.cache, logicalPort)
				}
			}
		case <-c.stopChan:
			break
		}
	}()
}

The actual entry gets removed 60 seconds after the pod deletion to allow network policy handlers to do their work. So the workaround for users is to wait at least 60 seconds before recreating a new pod with the same name in the same namespace.
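
As an illustration only of the direction a fix could take (this is a rough sketch, not the actual ovn-kubernetes code; the getLive helper, the uid field, and the podUID argument are made up for the sketch), the idea is that a lookup should not hand back an entry that is pending removal or that belongs to a different pod instance:

// Illustration only -- not the actual ovn-kubernetes implementation.
package main

import (
	"fmt"
	"sync"
	"time"
)

type lsInfo struct {
	name    string
	uid     string    // UID of the pod instance that owns this entry
	ips     []string
	expires time.Time // non-zero once the entry is scheduled for removal
}

type portCache struct {
	sync.Mutex
	cache map[string]*lsInfo
}

// getLive returns the entry for logicalPort only if it is not pending
// removal and still belongs to the pod identified by podUID, so a pod
// recreated with the same namespace/name but a new UID never sees the
// stale IPs of its predecessor.
func (c *portCache) getLive(logicalPort, podUID string) (*lsInfo, error) {
	c.Lock()
	defer c.Unlock()
	info, ok := c.cache[logicalPort]
	if !ok {
		return nil, fmt.Errorf("logical port %s not found in cache", logicalPort)
	}
	if !info.expires.IsZero() {
		return nil, fmt.Errorf("logical port %s is scheduled for removal", logicalPort)
	}
	if info.uid != podUID {
		return nil, fmt.Errorf("logical port %s belongs to a different pod instance", logicalPort)
	}
	return info, nil
}

func main() {
	c := &portCache{cache: map[string]*lsInfo{
		"foo_pod6": {name: "foo_pod6", uid: "old-uid", ips: []string{"10.131.0.24/23"}},
	}}
	// A recreated pod shows up with a new UID, so the stale entry is rejected
	// instead of being used to build SNATs and re-route policies.
	if _, err := c.getLive("foo_pod6", "new-uid"); err != nil {
		fmt.Println("stale entry rejected:", err)
	}
}

The actual change differs in detail (see the upstream PR linked below); the sketch only conveys that the recreated pod's own identity, not the cached name alone, must drive the SNATs and re-route policies.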


Reproducer: Create a pod in an egress namespace, delete the pod, and recreate it within 60 seconds.

Add the pod:

I0624 07:35:33.576902       1 egressip.go:860] SURYA: logicaPort doesn't exist: <nil>:logical port foo_pod6 not found in cache
I0624 07:35:33.663414       1 pods.go:619] SURYA portInfo for foo_pod6: &{foo_pod6 3ca2e837-2670-4961-a7a7-481365cacbc8 ci-ln-4s53gzb-72292-khx2x-worker-a-tzl99 [0xc00335ab10] [10 88 10 131 0 24] {0 0 <nil>}}:[10.131.0.24/23]
I0624 07:35:33.663594       1 egressip.go:868] SURYA: logicaPort exists: [10.131.0.24/23]:[10.131.0.24/23]
I0624 07:35:33.668321       1 egressip.go:881] SURYA: Adding pod egress IP status: {ci-ln-4s53gzb-72292-khx2x-worker-c-rqq88 10.0.128.100} for EgressIP: egressips-prod and pod: pod6/foo
I0624 07:36:19.179266       1 egressip.go:961] SURYA: Deleting pod egress IP status: {ci-ln-4s53gzb-72292-khx2x-worker-c-rqq88 10.0.128.100} for EgressIP: egressips-prod and pod: pod6/foo

Now we delete the pod:

I0624 07:36:19.190860       1 pods.go:127] SURYA: Deleting pod: foo/pod6
I0624 07:36:19.196854       1 pods.go:190] SURYA: removing foo_pod6 from cache
I0624 07:36:19.196886       1 port_cache.go:70] SURYA: port-cache(foo_pod6): scheduling port for removal at 2022-06-24 07:37:19.196880848 +0000 UTC m=+4638.978930633        --------> WE SCHEDULE THIS FOR PORT DELETION
I0624 07:36:19.196912       1 pods.go:196] SURYA: I still have an entry at this port! &{foo_pod6 3ca2e837-2670-4961-a7a7-481365cacbc8 ci-ln-4s53gzb-72292-khx2x-worker-a-tzl99 [0xc00335ab10] [10 88 10 131 0 24] {13881633693375408592 4638978930633 0x2cc2400}}

Now we re-add the pod back:

I0624 07:36:59.691517       1 egressip.go:868] SURYA: logicaPort exists logicalPortCache: [10.131.0.24/23]: podIP: [10.131.0.25/23] -> WE PICK THE OLD INFO FOR THE PORT SINCE THE PORT WAS SCHEDULED FOR DELETION AT 07:37:19
I0624 07:36:59.691564       1 egressip.go:870] SURYA: deletePerPodGRSNAT failed: failed to get node : node "" not found
I0624 07:36:59.745100       1 pods.go:619] SURYA portInfo for foo_pod6: &{foo_pod6 04b9d7f0-f90d-4f02-812f-f49483e93ca6 ci-ln-4s53gzb-72292-khx2x-worker-a-tzl99 [0xc00354f9e0] [10 88 10 131 0 25] {0 0 <nil>}}:[10.131.0.25/23]
I0624 07:36:59.748583       1 egressip.go:881] SURYA: Adding pod egress IP status: {ci-ln-4s53gzb-72292-khx2x-worker-c-rqq88 10.0.128.100} for EgressIP: egressips-prod and pod: pod6/foo

Will work on a fix for this; meanwhile, please inform users to leave a 75-second gap between deleting a pod and recreating it with the same name as a workaround, if this is blocking them. Also note there was a SNAT bug we fixed in 4.10.15 (https://bugzilla.redhat.com/show_bug.cgi?id=2083116) that users might run into, so it might be good to upgrade.
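
If the recreation is automated (the upstream issue shows up with statefulset-style pods that come back with the same name), the same gap can be enforced in code. Below is a rough client-go sketch of the workaround, nothing more; recreateWithGap and its arguments are placeholders, not something that exists in OVNK:

// Workaround sketch only: enforce a gap longer than the 60-second cache
// timer between deleting a pod and recreating it with the same namespace/name.
package workaround

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func recreateWithGap(ctx context.Context, client kubernetes.Interface, namespace string, pod *corev1.Pod) error {
	// Delete the old pod; its logicalPortCache entry is then scheduled
	// for removal 60 seconds later.
	if err := client.CoreV1().Pods(namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
		return err
	}
	// Wait out the cache timer plus some margin. A production version
	// would also confirm the old pod object is actually gone first.
	time.Sleep(75 * time.Second)
	// Recreate the pod with the same name; it now gets a fresh cache entry.
	newPod := pod.DeepCopy()
	newPod.ResourceVersion = ""
	newPod.UID = ""
	_, err := client.CoreV1().Pods(namespace).Create(ctx, newPod, metav1.CreateOptions{})
	return err
}

Once the fix merges, the sleep becomes unnecessary.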

Comment 12 Surya Seetharaman 2022-06-24 08:06:17 UTC
A working case:

Delete a pod:

I0624 07:59:22.570022       1 pods.go:127] SURYA: Deleting pod: foo/pod6
I0624 07:59:22.580127       1 pods.go:190] SURYA: removing foo_pod6 from cache
I0624 07:59:22.580158       1 port_cache.go:70] SURYA: port-cache(foo_pod6): scheduling port for removal at 2022-06-24 08:00:22.580151465 +0000 UTC m=+6022.362201272
I0624 07:59:22.580201       1 pods.go:196] SURYA: I still have an entry at this port! &{foo_pod6 04b9d7f0-f90d-4f02-812f-f49483e93ca6 ci-ln-4s53gzb-72292-khx2x-worker-a-tzl99 [0xc00354f9e0] [10 88 10 131 0 25] {13881635178743621801 6022362201272 0x2cc2400}}
I0624 07:59:22.580256       1 egressip.go:961] SURYA: Deleting pod egress IP status: {ci-ln-4s53gzb-72292-khx2x-worker-c-rqq88 10.0.128.100} for EgressIP: egressips-prod and pod: pod6/foo
I0624 07:59:22.596156       1 egressip.go:978] SURYA: DELETED

Wait until the logicalPortCache entry is actually removed, which takes > 60 seconds [2022-06-24 08:00:22.580151465]:

I0624 08:00:22.580793       1 port_cache.go:85] SURYA: port-cache(foo_pod6): removing port


Recreate the pod:

I0624 08:01:03.954024       1 egressip.go:860] SURYA: logicaPort doesn't exist: <nil>:logical port foo_pod6 not found in cache
I0624 08:01:04.022411       1 pods.go:619] SURYA portInfo for foo_pod6: &{foo_pod6 7c7fbec9-f91a-4fd6-a49c-c44d5d7eba18 ci-ln-4s53gzb-72292-khx2x-worker-a-tzl99 [0xc0038e74d0] [10 88 10 131 0 28] {0 0 <nil>}}:[10.131.0.28/23]
I0624 08:01:04.022503       1 egressip.go:868] SURYA: logicaPort exists: [10.131.0.28/23]:[10.131.0.28/23]
I0624 08:01:04.026573       1 egressip.go:881] SURYA: Adding pod egress IP status: {ci-ln-4s53gzb-72292-khx2x-worker-c-rqq88 10.0.128.100} for EgressIP: egressips-prod and pod: pod6/foo


This will cause the correct SNAT to get configured:

sh-4.4# ovn-nbctl --no-leader-only lr-nat-list GR_ci-ln-4s53gzb-72292-khx2x-worker-c-rqq88
TYPE             EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP            EXTERNAL_MAC         LOGICAL_PORT
snat             10.0.128.100                        10.131.0.28


Versus in the wrong case, a wrong SNAT for the old pod was getting configured that wouldn't match on the EIP, and depending on where the pod is (egress node or non-egress node) we'd hit https://bugzilla.redhat.com/show_bug.cgi?id=2083116 as well. That's why there are inconsistencies in the user's curls.

So moving forward, it would be good to upgrade to a release containing the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2083116 (because in some cases wrong SNATs were being deleted), and also to wait for this bug to be fixed if the user wants to recreate pods with the same name in less than 60 seconds.

Comment 13 Surya Seetharaman 2022-06-27 10:47:04 UTC
Upstream fix has been posted: https://github.com/ovn-org/ovn-kubernetes/pull/3050. Moving bug into POST state.

Comment 31 milti leonard 2022-08-26 17:39:43 UTC
@pdiak, was the support case 03257527 information reviewed in the context of this ticket at all?

Comment 37 errata-xmlrpc 2023-01-17 19:50:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

