Bug 2111733
| Summary: | pod cannot access kubernetes service | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao> |
| Component: | Networking | Assignee: | Surya Seetharaman <surya> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | anbhat, dcbw, lwan, mifiedle, surya, wking |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.12.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-17 19:53:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2112111 | | |
| Bug Blocks: | 2111619 | | |
Description
zhaozhanqi
2022-07-28 04:09:07 UTC
must-gather logs: http://file.apac.redhat.com/~zzhao/must-gather-124715-307932630.tar.gz

The issue happens on version 4.11.0-rc.5, so it should not be related to upgrade. It cannot always be reproduced.

This bug might be the same as https://bugzilla.redhat.com/show_bug.cgi?id=2111619#c4

I need a kubeconfig or sos-report from ip-10-0-61-174.us-east-2.compute.internal so that I can check the ovs dump-groups output on the node where the router pod lives and make sure the necessary flows were installed properly for the k8s API clusterIP service. If I had access I could do an ovs trace. So far, from the ovn-controller logs alone provided in the must-gather, I didn't spot the group mod issue for 10.0.48.125:6443, 10.0.65.181:6443, or 10.0.70.203:6443.

The controller is also seeing long polls:

2022-07-27T05:48:57.775586061Z 2022-07-27T05:48:47.067Z|00005|timeval(ovn_pinctrl0)|WARN|Unreasonably long 163228ms poll interval (0ms user, 3126ms system)
2022-07-27T06:07:04.726817283Z 2022-07-27T06:06:55.617Z|00548|timeval|WARN|Unreasonably long 1318281ms poll interval (0ms user, 19886ms system)

Let's use this bug to track the actual fix from OVN; it will track the bump to an OVN version where this can be fixed properly.

I still have not hit this issue today with several kinds of testing, including:
1. Create more than 200 pods on 3 workers.
2. Restart openvswitch on a worker.
3. Delete the openshift-ovn-kubernetes pods.
4. Reboot all workers.
5. Delete all 200 pods and recreate them.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
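
For anyone triaging a similar must-gather, the "Unreasonably long ... poll interval" warnings quoted in this report can be pulled out of ovn-controller logs with a short script. This is only a minimal sketch: the regular expression is derived from the two log lines quoted in this bug, the function name `parse_long_polls` and the `threshold_ms` parameter are hypothetical, and labelling lines without a thread suffix as "main" is an assumption made here for readability.

```python
import re

# Pattern derived from the two warning lines quoted in this bug report.
# The optional "(...)" after "timeval" carries a thread name such as ovn_pinctrl0.
LONG_POLL_RE = re.compile(
    r"\|timeval(?:\((?P<thread>[^)]*)\))?\|WARN\|"
    r"Unreasonably long (?P<total>\d+)ms poll interval "
    r"\((?P<user>\d+)ms user, (?P<system>\d+)ms system\)"
)


def parse_long_polls(lines, threshold_ms=0):
    """Return one dict per long-poll warning whose total is >= threshold_ms."""
    results = []
    for line in lines:
        match = LONG_POLL_RE.search(line)
        if match is None:
            continue  # not a long-poll warning
        total_ms = int(match.group("total"))
        if total_ms < threshold_ms:
            continue
        results.append({
            # Assumption: no thread suffix is labelled "main" here.
            "thread": match.group("thread") or "main",
            "total_ms": total_ms,
            "user_ms": int(match.group("user")),
            "system_ms": int(match.group("system")),
        })
    return results


# The two lines quoted in this report, used as sample input.
SAMPLE = [
    "2022-07-27T05:48:57.775586061Z 2022-07-27T05:48:47.067Z|00005|timeval(ovn_pinctrl0)|WARN|"
    "Unreasonably long 163228ms poll interval (0ms user, 3126ms system)",
    "2022-07-27T06:07:04.726817283Z 2022-07-27T06:06:55.617Z|00548|timeval|WARN|"
    "Unreasonably long 1318281ms poll interval (0ms user, 19886ms system)",
]

if __name__ == "__main__":
    for entry in parse_long_polls(SAMPLE):
        print(entry)
```

Passing a non-zero `threshold_ms` narrows the output to the worst stalls, for example only intervals longer than 1000000 ms.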