Bug 2005581
Summary: | 4.8.12 to 4.9 upgrade hung due to cluster-version-operator pod CrashLoopBackOff: error creating clients: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Wei Duan <wduan> | |
Component: | Cluster Version Operator | Assignee: | W. Trevor King <wking> | |
Status: | CLOSED ERRATA | QA Contact: | Yang Yang <yanyang> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 4.9 | CC: | abhbaner, aos-bugs, wking, yanyang | |
Target Milestone: | --- | |||
Target Release: | 4.10.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2006145 | Environment: | |
Last Closed: | 2022-03-10 16:11:55 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
Bug Depends On: | ||||
Bug Blocks: | 2006145 |
Description
Wei Duan
2021-09-18 10:26:13 UTC
Same thing going on in CI, e.g. [1]:

    $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1439895817182777344/artifacts/e2e-aws-upgrade/pods.json | jq -r '.items[] | select(.metadata.name | startswith("cluster-version-operator-")).status.containerStatuses[] | .state.waiting.reason + " " + (.restartCount | tostring) + "\n\n" + .lastState.terminated.message'
    CrashLoopBackOff 34

    4.9.0-202109161743.p0.git.43d63b8.assembly.stream-43d63b8
    F0920 13:23:23.565439 1 start.go:24] error: error creating clients: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
    goroutine 1 [running]:
    ...

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1439895817182777344

Comparing with a healthy 4.8.11 -> 4.9.0-rc.1 job [1], the issue is the recent volume change from bug 2002834 (backported to 4.9 as bug 2004568):

    $ diff -u \
    > <(curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1437413099530358784/artifacts/launch/deployments.json | jq '.items[] | select(.metadata.name == "cluster-version-operator").spec.template.spec.containers[].volumeMounts[]') \
    > <(curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1439895817182777344/artifacts/e2e-aws-upgrade/deployments.json | jq '.items[] | select(.metadata.name == "cluster-version-operator").spec.template.spec.containers[].volumeMounts[]')
    --- /dev/fd/63 2021-09-20 15:39:31.090945777 -0700
    +++ /dev/fd/62 2021-09-20 15:39:31.092945777 -0700
    @@ -13,8 +13,3 @@
       "name": "serving-cert",
       "readOnly": true
     }
    -{
    -  "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
    -  "name": "kube-api-access",
    -  "readOnly": true
    -}

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1437413099530358784

It's not reproduced on an upgrade from 4.9.0-rc.1 to 4.10.0-0.nightly-2021-09-21-181111

Reproduced on an upgrade from 4.9.0-0.nightly-2021-09-21-215600 to 4.10.0-0.nightly-2021-09-21-102830 because 4.9.0-0.nightly-2021-09-21-215600 contains the fix in which CVO explicitly sets the kube-api-access, while 4.10.0-0.nightly-2021-09-21-102830 does not.

Procedure to reproduce it:

    # oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.0-0.nightly-2021-09-21-215600   True        True          5m32s   Working towards 4.10.0-0.nightly-2021-09-21-102830: 18 of 739 done (2% complete)

    # oc get po -n openshift-cluster-version
    NAME                                        READY   STATUS             RESTARTS      AGE
    cluster-version-operator-55cfc7966d-vgzt4   0/1     CrashLoopBackOff   4 (43s ago)   2m11s
    version--gvs6t--1-hxl8w                     0/1     Completed          0             5m42s

    # oc logs pod/cluster-version-operator-55cfc7966d-vgzt4 -n openshift-cluster-version
    I0922 08:59:29.116670 1 start.go:21] ClusterVersionOperator 4.10.0-202109162043.p0.git.e4eefca.assembly.stream-e4eefca
    F0922 08:59:29.116990 1 start.go:24] error: error creating clients: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
    goroutine 1 [running]:
    k8s.io/klog/v2.stacks(0xc000012001, 0xc0001ba1c0, 0xb8, 0xd1)
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/klog/v2/klog.go:1021 +0xb9
    k8s.io/klog/v2.(*loggingT).output(0x2ad7120, 0xc000000003, 0x0, 0x0, 0xc00013d9d0, 0x22e7978, 0x8, 0x18, 0x0)
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/klog/v2/klog.go:970 +0x191
    k8s.io/klog/v2.(*loggingT).printf(0x2ad7120, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1c6ecc9, 0x9, 0xc000128fe0, 0x1, ...)
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/klog/v2/klog.go:751 +0x191
    k8s.io/klog/v2.Fatalf(...)
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/klog/v2/klog.go:1509
    main.init.3.func1(0xc0004d6000, 0xc00013d960, 0x0, 0x7)
        /go/src/github.com/openshift/cluster-version-operator/cmd/start.go:24 +0x1ed
    github.com/spf13/cobra.(*Command).execute(0xc0004d6000, 0xc00013d8f0, 0x7, 0x7, 0xc0004d6000, 0xc00013d8f0)
        /go/src/github.com/openshift/cluster-version-operator/vendor/github.com/spf13/cobra/command.go:854 +0x2c2
    github.com/spf13/cobra.(*Command).ExecuteC(0x2ac3380, 0xc000000180, 0xc00005c740, 0x46ef85)
        /go/src/github.com/openshift/cluster-version-operator/vendor/github.com/spf13/cobra/command.go:958 +0x375
    github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/cluster-version-operator/vendor/github.com/spf13/cobra/command.go:895
    main.main()
        /go/src/github.com/openshift/cluster-version-operator/cmd/main.go:26 +0x53

    goroutine 6 [chan receive]:
    k8s.io/klog/v2.(*loggingT).flushDaemon(0x2ad7120)
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/klog/v2/klog.go:1164 +0x8b
    created by k8s.io/klog/v2.init.0
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/klog/v2/klog.go:418 +0xdf

Verified with 4.10.0-0.nightly-2021-09-22-061245 and passed

    # oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.0-0.nightly-2021-09-21-215600   True        True          11m     Working towards 4.10.0-0.nightly-2021-09-22-061245: 95 of 739 done (12% complete)

    # oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.10.0-0.nightly-2021-09-22-061245   True        False         20m     Cluster version is 4.10.0-0.nightly-2021-09-22-061245

    # oc get pod/cluster-version-operator-c966c8955-nlc6v -ojson -n openshift-cluster-version | jq -r .spec.volumes[]
    {
      "hostPath": {
        "path": "/etc/ssl/certs",
        "type": ""
      },
      "name": "etc-ssl-certs"
    }
    {
      "hostPath": {
        "path": "/etc/cvo/updatepayloads",
        "type": ""
      },
      "name": "etc-cvo-updatepayloads"
    }
    {
      "name": "serving-cert",
      "secret": {
        "defaultMode": 420,
        "secretName": "cluster-version-operator-serving-cert"
      }
    }
    {
      "name": "kube-api-access",
      "projected": {
        "defaultMode": 420,
        "sources": [
          {
            "serviceAccountToken": {
              "expirationSeconds": 3600,
              "path": "token"
            }
          },
          {
            "configMap": {
              "items": [
                {
                  "key": "ca.crt",
                  "path": "ca.crt"
                }
              ],
              "name": "kube-root-ca.crt"
            }
          },
          {
            "downwardAPI": {
              "items": [
                {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.namespace"
                  },
                  "path": "namespace"
                }
              ]
            }
          }
        ]
      }
    }

    # oc get pod/cluster-version-operator-c966c8955-nlc6v -ojson -n openshift-cluster-version | jq -r .spec.containers[].volumeMounts
    [
      {
        "mountPath": "/etc/ssl/certs",
        "name": "etc-ssl-certs",
        "readOnly": true
      },
      {
        "mountPath": "/etc/cvo/updatepayloads",
        "name": "etc-cvo-updatepayloads",
        "readOnly": true
      },
      {
        "mountPath": "/etc/tls/serving-cert",
        "name": "serving-cert",
        "readOnly": true
      },
      {
        "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
        "name": "kube-api-access",
        "readOnly": true
      }
    ]

Moving it to verified state.

Bug 2006145 fixed this in 4.9 before GA, and it didn't apply to 4.8, so we won't need to block any edges in graph-data on this bug, and I'm dropping UpgradeBlocker.

*** Bug 2007228 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056