Description of problem:
We have noticed that pods connected to the pod network are sometimes unable to reach the internal Ingress VIP hosted by keepalived. For example, the openshift-console pods get "connection refused" when trying to reach the OAuth endpoint to validate logins. We run a test from the fluentd pod on each node that tries to hit the console endpoint, and on some nodes it cannot connect.

Test apps VIP for cld-paas-d-eusw1b-2-d9mlc-worker-storage-q5hfg
* Rebuilt URL to: https://console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com/
* Uses proxy env variable NO_PROXY == '.aexp.com,.cluster.local,.svc,10.10.60.0/23,127.0.0.1,172.28.128.0/17,192.168.0.0/16,api-int.cld-paas-d-eusw1b-2.phx.aexp.com,localhost'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
*   Trying 10.10.60.19...
* TCP_NODELAY set
* connect to 10.10.60.19 port 443 failed: Connection refused
* Failed to connect to console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com port 443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to console-openshift-console.apps.cld-paas-d-eusw1b-2.phx.aexp.com port 443: Connection refused
command terminated with exit code 7

We have found that bouncing the ovnkube-node for the node the pod is running on clears things up, but the issue sometimes appears again.

Version-Release number of selected component (if applicable):
OCPv4.8.2

How reproducible:
N/A

Steps to Reproduce:
1.
2.
3.

Actual results:
Intermittently, pods are unable to connect to the internal VIP. At times restarting the ovnkube-node container works, but the issue reasserts itself and the workaround is increasingly losing effectiveness.

Expected results:
Pods are able to connect over the pod network.

Additional info:
There are sosreports and gathers attached to the ticket.
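For anyone who wants to repeat the per-node check described above, here is a minimal Python sketch of how it might be automated. It is not the script used for this report: the helper names `refused_targets` and `check_from_pod` are hypothetical, it assumes the `oc` client is logged in to the cluster and that `curl` is available in the target pod, and the regular expression only matches the "connect to ... failed: Connection refused" message format shown in the curl output in this report.

```python
import re
import subprocess

# Matches the curl trace line seen in this report, e.g.
# "* connect to 10.10.60.19 port 443 failed: Connection refused"
REFUSED = re.compile(r"connect to (\S+) port (\d+) failed: Connection refused")


def refused_targets(curl_verbose_output):
    """Return (address, port) pairs for which curl reported a refused connect."""
    return [(addr, int(port)) for addr, port in REFUSED.findall(curl_verbose_output)]


def check_from_pod(namespace, pod, url):
    """Run curl inside a pod via `oc exec` and return the refused targets.

    namespace and pod are supplied by the caller (hypothetical inputs);
    curl -v writes its trace to stderr, -s hides the progress meter,
    -k skips certificate verification, -o /dev/null discards the body.
    """
    result = subprocess.run(
        ["oc", "exec", "-n", namespace, pod, "--",
         "curl", "-skv", "-o", "/dev/null", url],
        capture_output=True,
        text=True,
    )
    return refused_targets(result.stderr)
```

Calling `check_from_pod` once per fluentd pod and recording which nodes return a non-empty list would give a per-node view of where the VIP is unreachable. Only the explicit connect failure is matched, so the later summary line "Failed to connect to ... port 443" is not counted twice.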
AMEX has been removed from BZ2055251, and this BZ has been opened for the customer for further investigation of the issue.
*** This bug has been marked as a duplicate of bug 2022042 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days