Bug 1877367

Summary:	KubeAPIErrorsHigh firing due to "too large resource version"
Product:	OpenShift Container Platform	Reporter:	Lukasz Szaszkiewicz <lszaszki>
Component:	kube-apiserver	Assignee:	Lukasz Szaszkiewicz <lszaszki>
Status:	CLOSED ERRATA	QA Contact:	Ke Wang <kewang>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.6	CC:	aos-bugs, fhirtz, kewang, mfojtik, palonsor, pkanthal, rsandu, sferguso, xxia
Target Milestone:	---
Target Release:	4.6.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1877346	Environment:
Last Closed:	2020-10-27 16:38:53 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1877346

Comment 1 Lukasz Szaszkiewicz 2020-09-09 13:20:23 UTC

Once the rebase (1.19) PR lands https://github.com/openshift/kubernetes/pull/325 the fixes (https://github.com/openshift/origin/pull/25489 and https://github.com/openshift/origin/pull/25490) will be present in 4.6.

Comment 4 Ke Wang 2020-09-22 11:10:02 UTC

(In reply to Lukasz Szaszkiewicz from comment #1)
> Once the rebase (1.19) PR lands
> https://github.com/openshift/kubernetes/pull/325 the fixes
> (https://github.com/openshift/origin/pull/25489 and
> https://github.com/openshift/origin/pull/25490) will be present in 4.6.

Hi Lukasz, PRs 25489 and 25490 have not been merged yet. Could you take a look? Verification cannot proceed until they are merged.

Comment 5 Ke Wang 2020-09-23 03:20:44 UTC

CC: lszaszki. PRs 25489 and 25490 are 4.5 fixes; we need the corresponding fixes for 4.6 here.

Comment 6 Ke Wang 2020-09-23 15:07:06 UTC

OCP 4.6 has already been rebased to kube 1.19. To verify, take a master node that is connected to the cluster and disconnect it from the network for 5 minutes. After the network recovers, the kubelet reconnects to the API server as before. Checking the kubelet and pod logs afterwards shows that these timeouts no longer occur:
# cat ~/test.sh
ifconfig ens5 down
sleep 300
ifconfig ens5 up

# ./test.sh &

# pwd
/var/log/pods
# grep -nr 'Timeout: Too large resource version' openshift-*
# journalctl -b -u kubelet | grep 'Timeout: Too large resource version'
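
The log check above could be wrapped in a small helper so that the result is an explicit count rather than empty grep output. This is only a sketch: the function name count_too_large_rv is hypothetical and not part of any OpenShift tooling, and it assumes the error text matches the string grepped for above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenShift): counts log lines that contain
# the error message this bug is about. Reads log text from stdin.
PATTERN='Timeout: Too large resource version'

count_too_large_rv() {
    # -F treats the pattern as a fixed string, -c prints the number of
    # matching lines. grep exits 1 when nothing matches, so mask that
    # status to keep the helper usable under "set -e".
    grep -c -F -- "$PATTERN" || true
}

# Example usage on a node, assuming kubelet runs as a systemd unit:
#   journalctl -b -u kubelet | count_too_large_rv
```

A count of 0 after the network has been restored corresponds to the empty grep results shown above.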

Comment 8 errata-xmlrpc 2020-10-27 16:38:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196