Bug 1857743 - NodePort stuck open within SDN after unidling
Summary: NodePort stuck open within SDN after unidling
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Surya Seetharaman
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1870064 1870303
Reported: 2020-07-16 13:19 UTC by Matthew Robson
Modified: 2020-10-27 16:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In the unidling proxy mode of openshift-sdn, the code assumed that a service's endpoint is always deleted after the service itself. That ordering is not guaranteed, and the bug was triggered when the endpoint was deleted before the service.
Consequence: Although both the service and the endpoint were deleted, the NodePort was not released and remained stuck open in unidling proxy mode.
Fix: Cleanup no longer depends on the order in which the service and endpoint are deleted. All associated resources are removed regardless of which of the two is deleted first. This was fixed in openshift-sdn 4.6 and backported to 3.11.
Result: The NodePort is released correctly.
Clone Of:
Clones: 1870064
Environment:
Last Closed: 2020-10-27 16:15:14 UTC
Target Upstream Version:
Embargoed:
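
The Doc Text above describes making cleanup independent of the order in which the service and its endpoint are deleted. The following is a minimal sketch of that idea only. It does not reproduce the actual change in openshift sdn pull 172; the type `proxyState`, its fields, and the handler names are hypothetical and were chosen for illustration.

```go
package main

import "fmt"

// proxyState is a hypothetical stand-in for the proxy's bookkeeping.
// It is not the real openshift-sdn data structure.
type proxyState struct {
	services  map[string]bool // services that still exist
	endpoints map[string]bool // endpoints objects that still exist
	openPorts map[string]int  // service name -> NodePort held open on the host
}

func newProxyState() *proxyState {
	return &proxyState{
		services:  map[string]bool{},
		endpoints: map[string]bool{},
		openPorts: map[string]int{},
	}
}

// add records a service, its endpoints, and the NodePort opened for it.
func (p *proxyState) add(name string, nodePort int) {
	p.services[name] = true
	p.endpoints[name] = true
	p.openPorts[name] = nodePort
}

// releasePort is idempotent: releasing a port that is already gone is a no-op.
func (p *proxyState) releasePort(name string) {
	delete(p.openPorts, name)
}

// onServiceDelete releases the port without checking whether the
// endpoints object still exists.
func (p *proxyState) onServiceDelete(name string) {
	delete(p.services, name)
	p.releasePort(name)
}

// onEndpointsDelete drops the endpoints record. If the service is already
// gone, it also makes sure no port lingers.
func (p *proxyState) onEndpointsDelete(name string) {
	delete(p.endpoints, name)
	if !p.services[name] {
		p.releasePort(name)
	}
}

func main() {
	// Exercise both deletion orders; neither should leave a port behind.
	for _, endpointsFirst := range []bool{true, false} {
		p := newProxyState()
		p.add("web", 30080)
		if endpointsFirst {
			p.onEndpointsDelete("web")
			p.onServiceDelete("web")
		} else {
			p.onServiceDelete("web")
			p.onEndpointsDelete("web")
		}
		fmt.Printf("endpoints deleted first=%v, ports still open=%d\n",
			endpointsFirst, len(p.openPorts))
	}
}
```

The design point in this sketch is that each delete handler is safe to run in either position and that releasing the port is idempotent, so neither handler has to assume the other has already run.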



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sdn pull 172 | 0 | None | closed | Bug 1857743: Port stuck open when ep deleted before svc in unidling mode | 2020-11-19 08:33:48 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:15:45 UTC

Description Matthew Robson 2020-07-16 13:19:26 UTC
Description of problem:

Customer is leveraging NodePort for service access through their DMZ for firewall purposes.

One of their applications was idled and never came back when traffic resumed.

The observation is that the pod returned to service but was not accessible via the NodePort. The pod was accessible directly via its endpoints, via a non-NodePort service / Route, and via a new NodePort service.

The NodePort was correctly bound and listening on the host, but was not receiving any traffic.

Deleting the NodePort service did not release the in-use port on the host, but it did remove the service from etcd.
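
One way to confirm from the node itself that a port is still held after the service is gone is to try binding it. The sketch below is a generic diagnostic and was not part of the original report; `portInUse` and `holdFreePort` are hypothetical helper names. It treats any bind failure as "in use", which is a simplification: a bind can also fail for other reasons, such as insufficient privileges.

```go
package main

import (
	"fmt"
	"net"
)

// portInUse reports whether binding the given TCP port on all local
// addresses fails. A failure most commonly means another socket on this
// host already holds the port.
func portInUse(port int) bool {
	ln, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
	if err != nil {
		return true
	}
	ln.Close()
	return false
}

// holdFreePort binds an OS-chosen free port and returns the listener and
// the port number. It exists only to make the demonstration self-contained.
func holdFreePort() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", ":0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := holdFreePort()
	if err != nil {
		fmt.Println("could not bind a free port:", err)
		return
	}
	fmt.Printf("port %d in use while held: %v\n", port, portInUse(port))
	ln.Close()
	fmt.Printf("port %d in use after release: %v\n", port, portInUse(port))
}
```

On an affected node, calling `portInUse` with the deleted service's NodePort number would be expected to keep returning true until the process holding the socket releases it.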


Version-Release number of selected component (if applicable):
3.10


How reproducible:
First time this has been observed. Customer has 60 NodePort services.


Steps to Reproduce:
1. TBD
2.
3.

Actual results:
Service is inaccessible.

Expected results:
Should recover after unidling.

Additional info:

Comment 21 zhaozhanqi 2020-08-26 09:03:34 UTC
Verified this bug on 4.6.0-0.nightly-2020-08-20-234448

Followed the steps in comment 9.

Comment 23 errata-xmlrpc 2020-10-27 16:15:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

