Bug 1893637

Summary: [release 4.5] kubernetes-kube-storage-version-migrator: Fix bug in reflector not recovering from "Too large resource version"
Product: OpenShift Container Platform
Component: kube-storage-version-migrator
Version: 4.5
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Lukasz Szaszkiewicz <lszaszki>
Assignee: Lukasz Szaszkiewicz <lszaszki>
QA Contact: Ke Wang <kewang>
Docs Contact:
CC: aos-bugs, kewang, palonsor, sanchezl
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Clone Of: 1880327
Environment:
Last Closed: 2020-11-17 16:06:00 UTC
Type: ---
Bug Depends On: 1880369, 1881819    
Bug Blocks: 1879901    

Comment 1 Ke Wang 2020-11-03 07:25:20 UTC
This bug's PR is dev-approved but not yet merged, so I followed DPTP-660 to do pre-merge verification (the QE pre-merge verification goal of OCPQE-815), using the bot to launch a cluster with the open PR. Here are the verification steps:

Ran one round of etcd encryption:

$ oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.ci.test-2020-11-03-004049-ci-ln-33i2nht   True        False         153m    Cluster version is 4.5.0-0.ci.test-2020-11-03-004049-ci-ln-33i2nht

$ oc patch apiserver/cluster -p '{"spec":{"encryption":{"type":"aescbc"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
All resources encrypted: secrets, configmaps

Ran one round of etcd decryption:
$ oc patch apiserver/cluster -p '{"spec":{"encryption": {"type":"identity"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
Encryption mode set to identity and everything is decrypted

Encryption and decryption for etcd both work well.
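
The encryption state does not change the moment the patch is applied, so the condition check above usually has to be repeated. A minimal sketch of a polling helper follows. It reuses the same `oc get kubeapiserver` query; the function names, attempt count, and interval are hypothetical and not part of the original verification.

```shell
#!/bin/sh
# Sketch only: poll the kube-apiserver "Encrypted" condition until its
# output contains the expected text. Function names and defaults are
# assumptions made for illustration.

# Print the reason and message of the "Encrypted" condition
# (same jsonpath query as in the steps above).
encrypted_condition() {
    oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
}

# wait_for_encrypted_message EXPECTED [ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 as soon as EXPECTED appears in the condition output,
# or 1 after ATTEMPTS unsuccessful checks.
wait_for_encrypted_message() {
    expected=$1
    attempts=${2:-60}
    interval=${3:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if encrypted_condition | grep -qF -- "$expected"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

For example, after patching to `aescbc` one might call `wait_for_encrypted_message 'All resources encrypted'`, and after patching to `identity`, `wait_for_encrypted_message 'everything is decrypted'`.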

Following the upstream issue https://github.com/kubernetes/kubernetes/issues/91073, I disconnected a node from the network for a few minutes and, after the network was restored, checked whether the message 'Timeout: Too large resource version' could still be found in the log files.

$ oc debug node/<master>

# cat ~/test.sh 
ifconfig ens4 down
sleep 300
ifconfig ens4 up

# ./test.sh &

# pwd
# grep -nr 'Timeout: Too large resource version' openshift-*
No results found

# journalctl -b -u kubelet | grep 'Timeout: Too large resource version'
No results found
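
The log check above can be wrapped in a small helper that reports a count instead of relying on empty grep output. This is a sketch under assumptions: the function name is hypothetical, and the directory to scan is passed in by the caller because the log path is not shown in the transcript.

```shell
#!/bin/sh
# Sketch only: count lines containing the reflector error message in all
# files under a directory. The function name is an assumption.

# count_rv_errors DIR
# Prints the number of matching lines found recursively under DIR.
count_rv_errors() {
    dir=$1
    # -r: recurse, -F: treat the pattern as a fixed string.
    # tr strips the padding that some wc implementations add.
    grep -rF 'Timeout: Too large resource version' "$dir" 2>/dev/null | wc -l | tr -d ' '
}
```

A result of `0` corresponds to the "No results found" outcome above. The same pattern can be applied to `journalctl -b -u kubelet` output by piping it through `grep -cF` with the same message.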

Such error messages no longer appear in the kubelet and pod log files, as expected, so the bug is pre-merge verified. After the PR is merged, the bot will move the bug to VERIFIED automatically; if that does not work, I will move it manually.

Comment 5 errata-xmlrpc 2020-11-17 16:06:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.19 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.