Bug 1880369
Summary: | [release 4.6] Top-level cross-platform: Fix bug in reflector not recovering from "Too large resource version" | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Lukasz Szaszkiewicz <lszaszki> |
Component: | kube-apiserver | Assignee: | Lukasz Szaszkiewicz <lszaszki> |
Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.6 | CC: | aos-bugs, mfojtik, obulatov, sgreene, xxia |
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2020-10-27 16:42:21 UTC | Type: | Bug |
Bug Depends On: | 1880302, 1880320, 1880329, 1880331, 1880343, 1880344 | ||
Bug Blocks: | 1879901, 1879991, 1880301, 1880304, 1880307, 1880309, 1880311, 1880313, 1880315, 1880318, 1880322, 1880324, 1880326, 1880327, 1880333, 1880337, 1880341, 1880353, 1880354, 1880357, 1880359, 1880360, 1880366, 1880368, 1881072, 1881079, 1881107, 1881109, 1881134, 1881819, 1881963, 1882055, 1882071, 1882073, 1882077, 1882210, 1882379, 1893637, 1894666 |
Description
Lukasz Szaszkiewicz
2020-09-18 11:18:30 UTC
Just discussed with Lukasz a quick way to verify this: the operators should be using client-go from Kubernetes 1.18.6 or later (k8s.io/client-go v0.18.6 or newer).

- cluster-authentication-operator checking,
$ git clone https://github.com/openshift/cluster-authentication-operator.git

$ cd cluster-authentication-operator

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-20-184226 | grep authentication
cluster-authentication-operator https://github.com/openshift/cluster-authentication-operator 9b643c807d83bda5422fec7d5185cb1dcc0a9019

$ git checkout -b 4.6.0-0.nightly-2020-09-20-184226 9b643c80
Switched to a new branch '4.6.0-0.nightly-2020-09-20-184226'

$ grep 'k8s.io/client-go' go.mod
k8s.io/client-go v0.19.0
k8s.io/client-go => k8s.io/client-go v0.19.0-rc.2

- cluster-kube-apiserver-operator checking,
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-20-184226 | grep kube-apiserver
cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator a8772111e17be9f4e1889d7f234d73d723f0c916

$ git checkout -b 4.6.0-0.nightly-2020-09-20-184226 a8772111
Switched to a new branch '4.6.0-0.nightly-2020-09-20-184226'

$ grep 'k8s.io/client-go' go.mod
k8s.io/client-go v0.19.0

- cluster-openshift-apiserver-operator checking,
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-20-184226 | grep cluster-openshift-apiserver-operator
cluster-openshift-apiserver-operator https://github.com/openshift/cluster-openshift-apiserver-operator c8aa8510137ce04277ba222f6b0b245291719245

$ git checkout -b 4.6.0-0.nightly-2020-09-20-184226 c8aa8510
Switched to a new branch '4.6.0-0.nightly-2020-09-20-184226'

$ grep 'k8s.io/client-go' go.mod
k8s.io/client-go v0.19.0

All are at the expected version, so I'm moving the bug to VERIFIED.
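The verification rule above reduces to one comparison: the `k8s.io/client-go` tag in each go.mod must be at least v0.18.6, the module tag matching Kubernetes 1.18.6. The Go program below is a minimal sketch of that comparison, not part of any OpenShift tooling; `parseVersion`, `comparePre`, and `atLeast` are hypothetical helpers. It follows SemVer ordering, so a pre-release such as v0.19.0-rc.2 counts as newer than v0.18.6, and it refuses old-style `+incompatible` tags instead of comparing them. It only checks version order; whether a given release candidate actually contains the upstream fix still has to be confirmed separately.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a module tag vMAJOR.MINOR.PATCH[-PRERELEASE] into its
// numeric core and pre-release part (empty for a final release).
func parseVersion(s string) (core [3]int, pre string, err error) {
	if strings.HasSuffix(s, "+incompatible") {
		// Old-style client-go tags (e.g. v12.0.0) are not comparable with v0.x.y.
		return core, "", fmt.Errorf("%s: +incompatible tag, check the replace directive", s)
	}
	rest := strings.TrimPrefix(s, "v")
	if i := strings.IndexByte(rest, '-'); i >= 0 {
		rest, pre = rest[:i], rest[i+1:]
	}
	parts := strings.Split(rest, ".")
	if len(parts) != 3 {
		return core, "", fmt.Errorf("%s: want vMAJOR.MINOR.PATCH", s)
	}
	for i, p := range parts {
		if core[i], err = strconv.Atoi(p); err != nil {
			return core, "", fmt.Errorf("%s: %v", s, err)
		}
	}
	return core, pre, nil
}

func cmpInt(a, b int) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

// comparePre orders pre-release strings as SemVer does: a final release ("")
// sorts after any pre-release; numeric identifiers compare numerically and first.
func comparePre(a, b string) int {
	switch {
	case a == b:
		return 0
	case a == "":
		return 1
	case b == "":
		return -1
	}
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		an, aErr := strconv.Atoi(as[i])
		bn, bErr := strconv.Atoi(bs[i])
		switch {
		case aErr == nil && bErr == nil:
			if c := cmpInt(an, bn); c != 0 {
				return c
			}
		case aErr == nil:
			return -1
		case bErr == nil:
			return 1
		default:
			if c := strings.Compare(as[i], bs[i]); c != 0 {
				return c
			}
		}
	}
	return cmpInt(len(as), len(bs))
}

// atLeast reports whether module tag got sorts at or after minimum.
func atLeast(got, minimum string) (bool, error) {
	gc, gp, err := parseVersion(got)
	if err != nil {
		return false, err
	}
	mc, mp, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range gc {
		if c := cmpInt(gc[i], mc[i]); c != 0 {
			return c > 0, nil
		}
	}
	return comparePre(gp, mp) >= 0, nil
}

func main() {
	for _, got := range []string{"v0.19.0", "v0.19.0-rc.2", "v0.18.3", "v12.0.0+incompatible"} {
		ok, err := atLeast(got, "v0.18.6")
		fmt.Printf("%-22s ok=%v err=%v\n", got, ok, err)
	}
}
```

When a go.mod has both a require line and a replace directive, as cluster-authentication-operator does above, the version to feed into `atLeast` is the replacement target (here v0.19.0-rc.2), since that is what the build uses.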
(In reply to Ke Wang from comment #3)
I think that we need to check all operators. You will find the complete list in the "Depends On" field.

Adding more operators whose PRs have already been merged:

- machine-api-operator checking,
$ git clone https://github.com/openshift/machine-api-operator.git
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-20-184226 | grep machine-api-operator
machine-api-operator https://github.com/openshift/machine-api-operator f304d771cd3bc40305430b6a3f1ff8ca9a56e91d
$ git checkout -b 4.6.0-0.nightly-2020-09-20-184226 f304d77
$ grep 'k8s.io/client-go' go.mod
k8s.io/client-go v0.19.0

- service-ca-operator checking,
$ git clone https://github.com/openshift/service-ca-operator.git
$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-20-184226 | grep service-ca-operator
service-ca-operator https://github.com/openshift/service-ca-operator 3e554169d9068c047b377223d4092e5f7279f8be
$ git checkout -b 4.6.0-0.nightly-2020-09-20-184226 3e55416
$ grep 'k8s.io/client-go' go.mod
k8s.io/client-go v0.19.0-rc.2

Ran a quick test: I disconnected a master node from the network for a few minutes. After reconnecting it, I keep receiving error messages like these from csi-snapshotter:

# grep -nr 'Too large resource version' openshift-*
...
openshift-cluster-csi-drivers_aws-ebs-csi-driver-controller-6666d4bb4b-pwtnj_f776c34f-84d5-4080-aa1f-26f2dff1cb7f/csi-snapshotter/0.log:292:2020-09-21T10:22:17.156081404+00:00 stderr F E0921 10:22:17.156058 1 reflector.go:178] github.com/kubernetes-csi/external-snapshotter/pkg/client/informers/externalversions/factory.go:117: Failed to list *v1beta1.VolumeSnapshotContent: Timeout: Too large resource version: 425501, current: 380908
openshift-cluster-csi-drivers_aws-ebs-csi-driver-controller-6666d4bb4b-pwtnj_f776c34f-84d5-4080-aa1f-26f2dff1cb7f/csi-snapshotter/0.log:295:2020-09-21T10:23:28.974718812+00:00 stderr F E0921 10:23:28.974702 1 reflector.go:178] github.com/kubernetes-csi/external-snapshotter/pkg/client/informers/externalversions/factory.go:117: Failed to list *v1beta1.VolumeSnapshotClass: Timeout: Too large resource version: 425500, current: 423949
...

# grep -nr 'Too large resource version' openshift-* | wc -l
348

I also checked https://github.com/openshift/cluster-csi-snapshot-controller-operator/blob/master/go.mod; it still uses the old version:
k8s.io/client-go v0.18.3

I checked the cluster status; it works well.

Lukasz, do you think we need to submit a patch for this?

(In reply to Ke Wang from comment #7)

Yes, created https://bugzilla.redhat.com/show_bug.cgi?id=1881134
I'm not sure whether the cluster would immediately go into a degraded state. It might, since the CSI controller was unable to make progress. Thanks for catching this.
Hi Lukasz, I finished checking client-go for all operators, see below:

$ ./check-client-go.sh
#### operator:aws-ebs-csi-driver-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:baremetal-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.17.4 // Required by prometheus-operator
-----------------------------
#### operator:cloud-credential-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-authentication-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
k8s.io/client-go => k8s.io/client-go v0.19.0-rc.2
-----------------------------
#### operator:cluster-autoscaler-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
replace k8s.io/client-go => k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-config-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-csi-snapshot-controller-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-dns-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.2
-----------------------------
#### operator:cluster-etcd-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-image-registry-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-ingress-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-kube-apiserver-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-kube-controller-manager-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-kube-scheduler-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-kube-storage-version-migrator-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.2
k8s.io/client-go => k8s.io/client-go v0.18.2
-----------------------------
#### operator:cluster-monitoring-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.18.3
-----------------------------
#### operator:cluster-network-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
replace k8s.io/client-go => k8s.io/client-go v0.18.3
-----------------------------
#### operator:cluster-node-tuning-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
k8s.io/client-go => k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-openshift-apiserver-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-openshift-controller-manager-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-samples-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0-rc.3
-----------------------------
#### operator:cluster-storage-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:cluster-version-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0-rc.2
-----------------------------
#### operator:console-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0-rc.2
-----------------------------
#### operator:csi-driver-manila-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:insights-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v11.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.17.1
-----------------------------
#### operator:machine-api-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:machine-config-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
k8s.io/client-go => k8s.io/client-go v0.19.0
-----------------------------
#### operator:operator-lifecycle-manager
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.2
-----------------------------
#### operator:operator-marketplace
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: grep: go.mod: No such file or directory
-----------------------------
#### operator:operator-registry
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.2
-----------------------------
#### operator:ovirt-csi-driver-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0
-----------------------------
#### operator:prometheus-config-reloader
./check-client-go.sh: line 7: cd: prometheus-config-reloader: No such file or directory
fatal: not a git repository (or any of the parent directories): .git
client-go version: grep: go.mod: No such file or directory
-----------------------------
#### operator:prometheus-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.3
-----------------------------
#### operator:service-ca-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.19.0-rc.2
-----------------------------

Thanks Ke, this is awesome!

#### operator:baremetal-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.17.4 // Required by prometheus-operator
-----------------------------
Is this the operator https://github.com/openshift/cluster-baremetal-operator/blob/master/go.mod#L16 ?
It seems to be using client-go in the correct version.
BTW: It doesn't show up when you do "oc get co".

#### operator:cluster-kube-storage-version-migrator-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.2
k8s.io/client-go => k8s.io/client-go v0.18.2
-----------------------------
Updated today - https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/160/commits

#### operator:cluster-monitoring-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.18.3
-----------------------------
Seems to be using the correct version: https://github.com/openshift/cluster-monitoring-operator/blob/master/go.mod#L31

#### operator:cluster-network-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
replace k8s.io/client-go => k8s.io/client-go v0.18.3
-----------------------------
Created https://bugzilla.redhat.com/show_bug.cgi?id=1882071

#### operator:insights-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v11.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.17.1
-----------------------------
Created https://bugzilla.redhat.com/show_bug.cgi?id=1882073

#### operator:operator-lifecycle-manager
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.2
-----------------------------
Created https://bugzilla.redhat.com/show_bug.cgi?id=1882077

#### operator:operator-marketplace
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: grep: go.mod: No such file or directory
-----------------------------
I think this belongs to OLM; I asked them to create follow-up BZs for their components: https://bugzilla.redhat.com/show_bug.cgi?id=1882077#c0

#### operator:operator-registry
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.2
-----------------------------
Is this image-registry, https://github.com/openshift/image-registry/blob/master/go.mod#L42 ?

#### operator:prometheus-config-reloader
./check-client-go.sh: line 7: cd: prometheus-config-reloader: No such file or directory
fatal: not a git repository (or any of the parent directories): .git
client-go version: grep: go.mod: No such file or directory
-----------------------------
I will double-check which team this belongs to.
BTW: It doesn't show up when you do "oc get co".

#### operator:prometheus-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.3
-----------------------------
Merged yesterday: https://github.com/openshift/prometheus-operator/pull/93 (https://bugzilla.redhat.com/show_bug.cgi?id=1881072)
BTW: It doesn't show up when you do "oc get co".

#### operator:baremetal-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.17.4 // Required by prometheus-operator
-----------------------------
Is this the operator https://github.com/openshift/cluster-baremetal-operator/blob/master/go.mod#L16 ?
It seems to be using client-go in the correct version.
BTW: It doesn't show up when you do "oc get co".
^^^^ The change was already in master but not yet picked up in the latest 4.6 payload.

#### operator:cluster-monitoring-operator
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v12.0.0+incompatible
k8s.io/client-go => k8s.io/client-go v0.18.3
-----------------------------
Seems to be using the correct version: https://github.com/openshift/cluster-monitoring-operator/blob/master/go.mod#L31
^^^^ The change was already in master but not yet picked up in the latest 4.6 payload.
#### operator:operator-registry
Switched to a new branch '4.6.0-0.nightly-2020-09-22-213802'
client-go version: k8s.io/client-go v0.18.2
-----------------------------
Is this image-registry, https://github.com/openshift/image-registry/blob/master/go.mod#L42 ?
^^^^ The change was already in master but not yet picked up in the latest 4.6 payload.

#### operator:prometheus-config-reloader
./check-client-go.sh: line 7: cd: prometheus-config-reloader: No such file or directory
fatal: not a git repository (or any of the parent directories): .git
client-go version: grep: go.mod: No such file or directory
-----------------------------
I will double-check which team this belongs to.
BTW: It doesn't show up when you do "oc get co".
^^^^ It belongs to prometheus-operator.

All the others have changed to the expected versions; I will do the next round in a couple of days.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
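Several entries in the client-go survey (cluster-monitoring-operator, cluster-network-operator, baremetal-operator, insights-operator) print a require line such as `k8s.io/client-go v12.0.0+incompatible` next to a replace directive. In Go modules, a replace directive in the main module's go.mod overrides the required version, so the replacement target is what actually gets built. The sketch below resolves that effective version with a hypothetical `effectiveVersion` helper that only understands the directive forms visible in this thread; it is a simplification, and on a real checkout `go list -m k8s.io/client-go` reports the replacement after `=>` directly.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveVersion returns the version of module mod that a go.mod builds
// with: a matching replace directive overrides the require line.
// Sketch only: it understands the single-line and "( ... )" block forms seen
// in the survey, but not local-path replacements or other edge cases.
func effectiveVersion(gomod, mod string) (version string, replaced bool) {
	block := "" // "require" or "replace" while inside a "( ... )" block
	var required, replaceOld, replaceNew string
	for _, raw := range strings.Split(gomod, "\n") {
		line := raw
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i] // drop comments such as "// indirect"
		}
		f := strings.Fields(line)
		if len(f) == 0 {
			continue
		}
		kind := block
		switch {
		case len(f) == 2 && f[1] == "(":
			block = f[0]
			continue
		case f[0] == ")":
			block = ""
			continue
		case f[0] == "require" || f[0] == "replace":
			kind, f = f[0], f[1:] // single-line directive
		}
		if len(f) < 2 || f[0] != mod {
			continue
		}
		switch kind {
		case "require":
			required = f[1]
		case "replace":
			// Accepts "mod => target vX" and "mod vOld => target vX".
			for i, tok := range f {
				if tok == "=>" && i+2 < len(f) {
					replaceNew = f[i+2]
					if i == 2 {
						replaceOld = f[1]
					}
				}
			}
		}
	}
	// A version-qualified replace only applies to that exact required version.
	if replaceNew != "" && (replaceOld == "" || replaceOld == required) {
		return replaceNew, true
	}
	return required, false
}

func main() {
	// Synthetic go.mod shaped like the cluster-monitoring-operator grep output.
	gomod := `module example.com/hypothetical-operator

require (
	k8s.io/client-go v12.0.0+incompatible
)

replace (
	k8s.io/client-go => k8s.io/client-go v0.18.3
)
`
	v, replaced := effectiveVersion(gomod, "k8s.io/client-go")
	fmt.Printf("effective k8s.io/client-go: %s (replaced=%v)\n", v, replaced)
}
```

Read this way, the survey gives effective versions of v0.18.3 for cluster-monitoring-operator and cluster-network-operator, v0.17.4 for baremetal-operator, and v0.17.1 for insights-operator, all below v0.18.6 in that payload.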