Bug 1893637 - [release 4.5] kubernetes-kube-storage-version-migrator: Fix bug in reflector not recovering from "Too large resource version"
Summary: [release 4.5] kubernetes-kube-storage-version-migrator: Fix bug in reflector ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-storage-version-migrator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Lukasz Szaszkiewicz
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On: 1880369 1881819
Blocks: 1879901
 
Reported: 2020-11-02 09:10 UTC by Lukasz Szaszkiewicz
Modified: 2020-11-17 16:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1880327
Environment:
Last Closed: 2020-11-17 16:06:00 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes-kube-storage-version-migrator pull 161 0 None closed Bug 1893637: Fix bug in reflector not recovering from "Too large resource version" 2021-02-02 06:42:20 UTC
Red Hat Product Errata RHBA-2020:5051 0 None None None 2020-11-17 16:06:29 UTC

Comment 1 Ke Wang 2020-11-03 07:25:20 UTC
This bug's PR is dev-approved but not yet merged, so I'm following issue DPTP-660 to do pre-merge verification (the QE goal tracked in issue OCPQE-815), using the bot to launch a cluster with the open PR. Here are the verification steps:

Ran one encryption of etcd:

$ oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.ci.test-2020-11-03-004049-ci-ln-33i2nht   True        False         153m    Cluster version is 4.5.0-0.ci.test-2020-11-03-004049-ci-ln-33i2nht

$ oc patch apiserver/cluster -p '{"spec":{"encryption":{"type":"aescbc"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
EncryptionCompleted
All resources encrypted: secrets, configmaps

Ran one decryption of etcd:
$ oc patch apiserver/cluster -p '{"spec":{"encryption": {"type":"identity"}}}' --type merge
apiserver.config.openshift.io/cluster patched

$ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="Encrypted")]}{.reason}{"\n"}{.message}{"\n"}'
DecryptionCompleted
Encryption mode set to identity and everything is decrypted
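
The two status checks above could be wrapped in a small polling helper, since the migration takes a while to finish. This is only a minimal sketch: the function name wait_for_encryption_reason and its attempts/interval parameters are hypothetical (not part of oc), and it assumes the "Encrypted" condition's reason eventually becomes EncryptionCompleted or DecryptionCompleted as shown above.

```shell
# Hypothetical helper (not part of oc): poll the kubeapiserver "Encrypted"
# condition until its reason equals the expected value, or give up.
#   $1 = expected reason, $2 = max attempts (default 60),
#   $3 = seconds between attempts (default 30)
wait_for_encryption_reason() {
  local expected="$1"
  local attempts="${2:-60}"
  local interval="${3:-30}"
  local reason=""
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Same condition as queried above, reduced to the reason field only.
    reason="$(oc get kubeapiserver -o=jsonpath='{.items[0].status.conditions[?(@.type=="Encrypted")].reason}')"
    if [ "$reason" = "$expected" ]; then
      echo "$reason"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for $expected (last reason: ${reason:-none})" >&2
  return 1
}
```

For example, `wait_for_encryption_reason EncryptionCompleted` after the aescbc patch, and `wait_for_encryption_reason DecryptionCompleted` after the identity patch.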

Encryption and decryption of etcd both work.


Following upstream issue https://github.com/kubernetes/kubernetes/issues/91073, I disconnected a node from the network for a few minutes and, after the network was restored, checked whether the message 'Timeout: Too large resource version' still appears in the log files.

$ oc debug node/<master>

# cat ~/test.sh 
ifconfig ens4 down
sleep 300
ifconfig ens4 up

# ./test.sh &

# pwd
/var/log/pods
# grep -nr 'Timeout: Too large resource version' openshift-*
No results found

# journalctl -b -u kubelet | grep 'Timeout: Too large resource version'
No results found
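
The pod-log check above can be turned into a pass/fail step that reports a count instead of relying on empty grep output. A minimal sketch, assuming the logs are plain-text files under the given directory; the function name count_reflector_errors is hypothetical.

```shell
# Hypothetical helper: print how many log lines under a directory contain
# the reflector error message that this bug is about.
#   $1 = directory to scan recursively (e.g. /var/log/pods)
count_reflector_errors() {
  local log_dir="$1"
  local pattern='Timeout: Too large resource version'
  # grep exits non-zero when nothing matches; tolerate that so the
  # function still prints 0 instead of failing.
  { grep -rhF -- "$pattern" "$log_dir" 2>/dev/null || true; } | wc -l | tr -d ' '
}
```

Run as `count_reflector_errors /var/log/pods`; a result of 0 corresponds to the "No results found" outcome above. The kubelet journal would still need the separate journalctl check.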

These error messages no longer appear in the kubelet or pod log files, as expected, so the bug is pre-merge verified. After the PR is merged, the bot will move the bug to VERIFIED automatically; if that does not work, I will move it manually.

Comment 5 errata-xmlrpc 2020-11-17 16:06:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.19 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5051

