Bug 1877367

Summary: KubeAPIErrorsHigh firing due to "too large resource version"
Product: OpenShift Container Platform Reporter: Lukasz Szaszkiewicz <lszaszki>
Component: kube-apiserver Assignee: Lukasz Szaszkiewicz <lszaszki>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.6 CC: aos-bugs, fhirtz, kewang, mfojtik, palonsor, pkanthal, rsandu, sferguso, xxia
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1877346 Environment:
Last Closed: 2020-10-27 16:38:53 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1877346    

Comment 1 Lukasz Szaszkiewicz 2020-09-09 13:20:23 UTC
Once the rebase (1.19) PR lands https://github.com/openshift/kubernetes/pull/325 the fixes (https://github.com/openshift/origin/pull/25489 and https://github.com/openshift/origin/pull/25490) will be present in 4.6.

Comment 4 Ke Wang 2020-09-22 11:10:02 UTC
(In reply to Lukasz Szaszkiewicz from comment #1)
> Once the rebase (1.19) PR lands
> https://github.com/openshift/kubernetes/pull/325 the fixes
> (https://github.com/openshift/origin/pull/25489 and
> https://github.com/openshift/origin/pull/25490) will be present in 4.6.

Hi Lukasz, PRs 25489 and 25490 have not been merged yet. Could you take a look? Until they are merged, verification cannot proceed.

Comment 5 Ke Wang 2020-09-23 03:20:44 UTC
CC: lszaszki. PRs 25489 and 25490 are 4.5 fixes; we need the corresponding fixes for 4.6 here.

Comment 6 Ke Wang 2020-09-23 15:07:06 UTC
OCP 4.6 has already been rebased onto kube 1.19. Verification: with a master node connected to the cluster, disconnect it from the network for 5 minutes. After the network recovers, the kubelet reconnects to the API server as before. The kubelet and pod logs show that the "Timeout: Too large resource version" errors no longer occur (both greps below return nothing):
# cat ~/test.sh 
ifconfig ens5 down
sleep 300
ifconfig ens5 up

# ./test.sh &

# pwd
/var/log/pods
# grep -nr 'Timeout: Too large resource version' openshift-*
# journalctl -b -u kubelet | grep 'Timeout: Too large resource version'
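The two greps in comment 6 signal success only by printing nothing. Below is a minimal sketch of an explicit pass/fail check for the same error string, assuming bash on the node. The function name check_no_too_large_rv is hypothetical and was not part of the original verification.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): reads log text on stdin
# and reports whether any "Timeout: Too large resource version" lines appear.
set -u

PATTERN='Timeout: Too large resource version'

check_no_too_large_rv() {
  local hits
  # grep -c prints the number of matching lines ("0" when there are none);
  # "|| true" swallows grep's exit status 1 for zero matches.
  hits=$(grep -c -- "$PATTERN" || true)
  if [ "$hits" -eq 0 ]; then
    echo "PASS: no '$PATTERN' errors found"
    return 0
  fi
  echo "FAIL: $hits line(s) contain '$PATTERN'"
  return 1
}
```

Usage on the node after the network is restored: `journalctl -b -u kubelet | check_no_too_large_rv`. For the pod logs, use `cat /var/log/pods/openshift-*/*/*.log | check_no_too_large_rv`. That glob assumes the usual `/var/log/pods/<namespace>_<pod>_<uid>/<container>/*.log` layout.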

Comment 8 errata-xmlrpc 2020-10-27 16:38:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196