Bug 1956535 - Multiple REDIRECT/DNAT IPTable rules on the node causing "connection refused" error while accessing the idled POD [NEEDINFO]
Summary: Multiple REDIRECT/DNAT IPTable rules on the node causing "connection refused"...
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Dan Winship
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-03 20:48 UTC by Swadeep Asthana
Modified: 2021-05-14 13:25 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-05 11:20:59 UTC
Target Upstream Version:
swasthan: needinfo? (danw)



Comment 1 Andrew McDermott 2021-05-04 14:14:17 UTC
If the root cause is duplicate iptables rules, then moving this to SDN for analysis/triage, as the network edge components (routing, DNS) don't manipulate iptables.
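One quick way to check a node for duplicate REDIRECT/DNAT rules is to filter `iptables-save` output and look for repeated lines. This is a hedged sketch: the rule text below is illustrative sample data, not taken from this bug, so the pipeline can be demonstrated without access to a node.

```shell
# Illustrative sample of iptables-save output containing one duplicated rule
# (hypothetical addresses/ports; on a real node, pipe iptables-save directly).
cat > /tmp/iptables-sample.txt <<'EOF'
-A PREROUTING -d 172.30.1.1/32 -p tcp --dport 80 -j REDIRECT --to-ports 8080
-A PREROUTING -d 172.30.1.1/32 -p tcp --dport 80 -j REDIRECT --to-ports 8080
-A PREROUTING -d 172.30.2.2/32 -p tcp --dport 443 -j DNAT --to-destination 10.128.0.5:8443
EOF

# On a live node this would be:
#   iptables-save | grep -E 'REDIRECT|DNAT' | sort | uniq -d
# uniq -d prints only lines that occur more than once, i.e. the duplicates.
grep -E 'REDIRECT|DNAT' /tmp/iptables-sample.txt | sort | uniq -d
```

Here only the duplicated REDIRECT rule is printed; an empty result would mean no exact-duplicate REDIRECT/DNAT rules on the node.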

Comment 2 Alexander Constantinescu 2021-05-05 11:20:59 UTC

*** This bug has been marked as a duplicate of bug 1953705 ***

Comment 3 emmanuel.quiroga 2021-05-05 14:00:45 UTC
This is the correct bugzilla. Please keep open for follow up.

Comment 4 Dan Winship 2021-05-05 15:19:27 UTC
This is not the same as bug 1953705.

Comment 6 Dan Winship 2021-05-11 12:25:32 UTC
It's not clear if this is the same bug as 1953705 or not, so let's get the same info:

It appears that if you idle a service with an "old" oc binary (4.6.16 or earlier, or most 4.7 alpha/beta builds) in a "new" cluster (4.6.17 or later, 4.7.0-rc.1 and later, or any 4.8 nightly) then it will not unidle correctly when it receives traffic. (openshift-sdn will emit the NeedPods event but the controller will not scale it up.)

@swasthan, can you confirm the versions of OCP you are using and the version of the "oc" binary that you are using to idle the pods? ("oc version" will tell you both.)

If you are using an "old" oc binary, then getting an updated binary should fix the bug.
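To tell whether a given client falls on the "old" side of the 4.6.17 cutoff, the client version reported by `oc version` can be compared version-numerically with `sort -V`. A minimal sketch, using hypothetical sample output so it runs without a cluster (the "Client Version:" line format is an assumption about the oc output; substitute real `oc version` output):

```shell
# Hypothetical sample of "oc version" output; on a real system use: oc version
sample='Client Version: 4.6.16
Server Version: 4.6.21'

# Extract the client version string.
client=$(printf '%s\n' "$sample" | awk -F': ' '/^Client Version/ {print $2}')

# sort -V compares version numbers; if the client sorts strictly before
# 4.6.17, it predates the fix described above.
oldest=$(printf '%s\n4.6.17\n' "$client" | sort -V | head -n1)
if [ "$client" = "$oldest" ] && [ "$client" != "4.6.17" ]; then
  echo "oc client $client predates 4.6.17 - update the oc binary before idling"
fi
```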

If not, then please create a new deployment and service (i.e., one that has not been previously idled) and:

1. idle the service
2. get the output of "oc get service NAME -o yaml" and "oc get ep NAME -o yaml"
3. try to connect to the service
4. get the output of "oc get service NAME -o yaml" and "oc get ep NAME -o yaml" again
5. get the output of "oc get events -o yaml"
6. get the output of "oc get pods -n NAMESPACE" (to confirm whether pods have been recreated for the deployment)
7. tar/zip up all the files and attach them to this bug
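The collection steps above could be scripted roughly as follows. This is a sketch, not a verified procedure: NAME and NAMESPACE are placeholders, and the `OC="echo oc"` line makes the script only print the oc commands it would run (so it executes without a cluster); remove the `echo` to run it for real, and connect to the service where indicated between the two snapshots.

```shell
NAME=my-service        # placeholder: your service/deployment name
NAMESPACE=my-namespace # placeholder: its namespace
OC="echo oc"           # dry-run: prints commands; set OC=oc on a real cluster

# 1. idle the service
$OC idle "$NAME" -n "$NAMESPACE"                 > idle.txt
# 2. snapshot the service and endpoints while idled
$OC get service "$NAME" -n "$NAMESPACE" -o yaml  > service-before.yaml
$OC get ep "$NAME" -n "$NAMESPACE" -o yaml       > endpoints-before.yaml
# 3. ... try to connect to the service here ...
# 4. snapshot again after the connection attempt
$OC get service "$NAME" -n "$NAMESPACE" -o yaml  > service-after.yaml
$OC get ep "$NAME" -n "$NAMESPACE" -o yaml       > endpoints-after.yaml
# 5. events
$OC get events -n "$NAMESPACE" -o yaml           > events.yaml
# 6. confirm whether pods were recreated
$OC get pods -n "$NAMESPACE"                     > pods.txt
# 7. bundle everything for attaching to the bug
tar czf unidle-debug.tar.gz *.yaml *.txt
```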

