2056735 – Pods on pod network lose ability to connect to internal Ingress VIP until ovnkube-node is restarted


Keywords:
Status:	CLOSED DUPLICATE of bug 2022042
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Mohamed Mahmoud
QA Contact:	Anurag saxena
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-02-22 00:57 UTC by milti leonard
Modified:	2023-09-15 01:19 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-04-05 12:16:57 UTC
Target Upstream Version:
Embargoed:

Description milti leonard 2022-02-22 00:57:46 UTC

Description of problem:
We have noticed that pods connected to the pod network are sometimes unable to reach the internal Ingress VIP hosted by keepalived.
For example, the openshift-console pods get "connection refused" when trying to reach the OAuth endpoint to validate logins.
We run a test from the fluentd pod on each node that tries to hit the console endpoint, and on some nodes it cannot connect. Sample output from a failing node:
Test apps VIP for cld-paas-d-eusw1b-2-d9mlc-worker-storage-q5hfg
* Rebuilt URL to: https://console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com/
* Uses proxy env variable NO_PROXY == ‘.aexp.com,.cluster.local,.svc,10.10.60.0/23,127.0.0.1,172.28.128.0/17,192.168.0.0/16,api-int.cld-paas-d-eusw1b-2.phx.aexp.com,localhost’
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Trying 10.10.60.19...
* TCP_NODELAY set
* connect to 10.10.60.19 port 443 failed: Connection refused
* Failed to connect to console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com port 443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com port 443: Connection refused
command terminated with exit code 7
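
The per-node test described above could be scripted roughly as follows. This is only a sketch, not the script used in this report: the function names `build_probe_cmd`, `classify`, and `probe` are hypothetical, the caller is assumed to supply the namespace and the list of fluentd pod names, and it assumes `oc exec` passes through curl's exit status (as the "exit code 7" line above suggests).

```python
import subprocess


def build_probe_cmd(namespace, pod, url, timeout=5):
    """Build the argv that runs curl inside a pod via `oc exec`."""
    return [
        "oc", "exec", "-n", namespace, pod, "--",
        "curl", "-sk", "-o", "/dev/null",
        "--connect-timeout", str(timeout), url,
    ]


def classify(returncode):
    """Map an exit status to a short label (curl uses 7 for a failed connect)."""
    if returncode == 0:
        return "ok"
    if returncode == 7:
        return "connection failed"
    return f"other error (exit {returncode})"


def probe(namespace, pods, url):
    """Run the probe from every pod in `pods` and return {pod: label}."""
    results = {}
    for pod in pods:
        proc = subprocess.run(
            build_probe_cmd(namespace, pod, url),
            capture_output=True, text=True,
        )
        results[pod] = classify(proc.returncode)
    return results
```

Calling `probe()` with one fluentd pod per node and the console URL from the output above would give a per-pod view of which nodes currently fail to reach the VIP.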
We have found that bouncing the ovnkube-node for the node the pod is running on clears things up, but the issue sometimes appears again.
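
One way the workaround might be automated is sketched below. It interprets "bouncing" as deleting the ovnkube-node pod on the affected node so that it gets recreated, which may differ from what was actually done here. The namespace `openshift-ovn-kubernetes` and the label `app=ovnkube-node` are assumptions that should be verified on the cluster, and the function names are hypothetical.

```python
import subprocess

# Assumptions: verify both values on the target cluster before use.
OVN_NAMESPACE = "openshift-ovn-kubernetes"
OVN_LABEL = "app=ovnkube-node"


def find_cmd(node):
    """argv that lists the ovnkube-node pod(s) scheduled on `node`."""
    return [
        "oc", "get", "pods", "-n", OVN_NAMESPACE, "-l", OVN_LABEL,
        "--field-selector", f"spec.nodeName={node}", "-o", "name",
    ]


def delete_cmd(pod_name):
    """argv that deletes one pod; `pod_name` is in `pod/<name>` form."""
    return ["oc", "delete", "-n", OVN_NAMESPACE, pod_name]


def restart_ovnkube_node(node, dry_run=True):
    """Return the delete commands for `node`; run them only if dry_run is False."""
    out = subprocess.run(
        find_cmd(node), check=True, capture_output=True, text=True
    ).stdout
    cmds = [delete_cmd(name) for name in out.split()]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

The `dry_run` default is deliberate: it lets the commands be reviewed before any pod is deleted.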

Version-Release number of selected component (if applicable):
OCPv4.8.2

How reproducible:
N/A

Steps to Reproduce:
1.
2.
3.

Actual results:
Intermittently, pods are unable to connect to the internal VIP. Restarting the ovnkube-node container sometimes helps, but the issue recurs and the workaround is becoming less effective.

Expected results:
Pods are able to connect to the internal Ingress VIP over the pod network.

Additional info:
There are sosreports and gathers attached to the ticket. AMEX has been removed from BZ2055251, and this BZ has been opened for the customer for further investigation of the issue.

Comment 7 Mohamed Mahmoud 2022-04-05 12:16:57 UTC


*** This bug has been marked as a duplicate of bug 2022042 ***

Comment 8 Red Hat Bugzilla 2023-09-15 01:19:42 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
