Bug 1894667
Summary: | [release 4.5] cluster-openshift-controller-manager-operator: Fix bug in reflector not recovering from "Too large resource version" | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Lukasz Szaszkiewicz <lszaszki> |
Component: | openshift-controller-manager | Assignee: | Gabe Montero <gmontero> |
Status: | CLOSED ERRATA | QA Contact: | wewang <wewang> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.5 | CC: | adam.kaplan, aos-bugs, gmontero, mfojtik, palonsor |
Target Milestone: | --- | ||
Target Release: | 4.5.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | devex | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: An upstream Kubernetes bug prevented the API client from recovering reasonably quickly after a TCP reset. Because controllers and operators inherently maintain client connections to the API server, they could be affected by this.
Consequence: Client logs could be flooded with "Timeout: Too large resource version" errors when connectivity was lost and then regained.
Fix: The upstream Kubernetes 1.18 fix was pulled into samples operator 4.5.z.
Result: openshift-controller-manager operator is no longer susceptible to this hot loop of error messages.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-11-17 16:06:00 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1879901 |
Description
Lukasz Szaszkiewicz
2020-11-04 18:06:33 UTC
Verified in
4.5.0-0.nightly-2020-11-05-223728
Steps
1. disconnect one node for a few minutes
sh-4.2# cat > test.sh << EOF
> ifconfig ens3 down
> sleep 300
> ifconfig ens3 up
> EOF
sh-4.2# sh ./test.sh
2. After the node's connectivity is restored, check the logs of the pods in openshift-controller-manager-operator; there should be no "Timeout: Too large resource version" messages.
[wewang@wangwen ~]$ oc logs -f pod/openshift-controller-manager-operator-55f49cc48f-ffksz -n openshift-controller-manager-operator |grep "Timeout: Too large resource version"
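The log check in step 2 could also be scripted so that it yields an explicit pass/fail result rather than relying on empty grep output. The following is a minimal sketch, not part of the original verification; the function name `check_no_too_large_rv` is hypothetical, and it assumes the operator log is supplied on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): reads log text on stdin
# and fails if the error message tracked by this bug appears in it.
check_no_too_large_rv() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so "|| true" keeps the substitution from failing.
    count=$(grep -c "Timeout: Too large resource version" || true)
    if [ "$count" -gt 0 ]; then
        echo "FAIL: $count matching log lines"
        return 1
    fi
    echo "PASS: no matching log lines"
    return 0
}
```

One possible usage, with the pod name taken from the session above, is `oc logs pod/openshift-controller-manager-operator-55f49cc48f-ffksz -n openshift-controller-manager-operator | check_no_too_large_rv`. The `-f` flag is left out here on purpose so that the pipeline terminates once the existing log has been read.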
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.19 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5051