Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2056735

Summary: Pods on pod network lose ability to connect to internal Ingress VIP until ovnkube-node is restarted
Product: OpenShift Container Platform
Reporter: milti leonard <mleonard>
Component: Networking
Assignee: Mohamed Mahmoud <mmahmoud>
Networking sub component: ovn-kubernetes
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DUPLICATE
Severity: high
Priority: high
CC: anbhat, ffernand, skanakal
Version: 4.8
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-04-05 12:16:57 UTC
Type: Bug

Description milti leonard 2022-02-22 00:57:46 UTC
Description of problem:
We have noticed that pods attached to the pod network are sometimes unable to reach the internal Ingress VIP hosted by keepalived.
For example, the openshift-console pods get "connection refused" when trying to reach the OAuth endpoint to validate logins.
We run a test from the fluentd pod on each node against the console endpoint, and on some nodes the connection fails:
Test apps VIP for cld-paas-d-eusw1b-2-d9mlc-worker-storage-q5hfg
* Rebuilt URL to: https://console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com/
* Uses proxy env variable NO_PROXY == ‘.aexp.com,.cluster.local,.svc,10.10.60.0/23,127.0.0.1,172.28.128.0/17,192.168.0.0/16,api-int.cld-paas-d-eusw1b-2.phx.aexp.com,localhost’
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.10.60.19...
* TCP_NODELAY set
* connect to 10.10.60.19 port 443 failed: Connection refused
* Failed to connect to console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com port 443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com port 443: Connection refused
command terminated with exit code 7
We have found that bouncing the ovnkube-node pod on the node where the affected pod is running clears things up, but sometimes the issue appears again.

Version-Release number of selected component (if applicable):
OCP 4.8.2

How reproducible:
N/A

Steps to Reproduce:
1.
2.
3.

Actual results:
Intermittently, pods are unable to connect to the internal VIP. At times, restarting the ovnkube-node container works, but the issue reasserts itself and the workaround is becoming less and less effective.

Expected results:
Pods on the pod network are able to connect to the internal Ingress VIP.

Additional info:
There are sosreports and must-gathers attached to the ticket. AMEX has been removed from BZ 2055251, and this BZ has been opened for the customer so the issue can be investigated further.

Comment 7 Mohamed Mahmoud 2022-04-05 12:16:57 UTC

*** This bug has been marked as a duplicate of bug 2022042 ***

Comment 8 Red Hat Bugzilla 2023-09-15 01:19:42 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days