Description of problem:

Install fails because the monitoring cluster operator never finishes rolling out the openshift-state-metrics Deployment:

...at https://api.ci-op-9259g0x6-067ff.origin-ci-int-aws.dev.rhcloud.com:6443..."
level=info msg="API v1.19.0-rc.2.1075+6a59bc4c1d0117-dirty up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Destroying the bootstrap resources..."
level=info msg="Waiting up to 40m0s for the cluster at https://api.ci-op-9259g0x6-067ff.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
E1007 14:44:45.487611 37 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: Get "https://api.ci-op-9259g0x6-067ff.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusterversions?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dversion&resourceVersion=22717&timeoutSeconds=359&watch=true": dial tcp 54.237.212.88:6443: connect: connection refused
E1007 14:48:32.230350 37 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: Get "https://api.ci-op-9259g0x6-067ff.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusterversions?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dversion&resourceVersion=23451&timeoutSeconds=469&watch=true": dial tcp 52.86.173.82:6443: connect: connection refused
level=info msg="Cluster operator insights Disabled is False with AsExpected: "
level=info msg="Cluster operator monitoring Available is False with : "
level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
level=error msg="Cluster operator monitoring Degraded is True with UpdatingopenshiftStateMetricsFailed: Failed to rollout the stack. Error: running task Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: got 1 unavailable replicas"
level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: marketplace, monitoring"

(The reflector "connection refused" errors look like the usual transient blips while the bootstrap control plane is torn down; the actionable failure is the monitoring Degraded condition.)

Version-Release number of selected component (if applicable):

4.7

CI search and example jobs:

https://search.ci.openshift.org/?search=UpdatingopenshiftStateMetricsFailed&maxAge=48h&context=1&type=bug%2Bjunit&name=4.7&maxMatches=5&maxBytes=20971520&groupBy=job
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/25585/pull-ci-openshift-origin-master-e2e-aws-disruptive/1313842655171448832
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1313852661115654144
Digging into the promotion failure [1] from comment 0:

level=error msg="Cluster operator monitoring Degraded is True with UpdatingopenshiftStateMetricsFailed: Failed to rollout the stack. Error: running task Updating openshift-state-metrics failed: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: got 1 unavailable replicas"

Looking at the Deployment:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1313852661115654144/artifacts/e2e-gcp/deployments.json | gunzip | jq -r '.items[] | select(.metadata.name == "openshift-state-metrics").status'
{
  "conditions": [
    {
      "lastTransitionTime": "2020-10-07T15:02:45Z",
      "lastUpdateTime": "2020-10-07T15:02:45Z",
      "message": "Deployment does not have minimum availability.",
      "reason": "MinimumReplicasUnavailable",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2020-10-07T15:12:46Z",
      "lastUpdateTime": "2020-10-07T15:12:46Z",
      "message": "ReplicaSet \"openshift-state-metrics-7d5967f58\" has timed out progressing.",
      "reason": "ProgressDeadlineExceeded",
      "status": "False",
      "type": "Progressing"
    }
  ],
  "observedGeneration": 9,
  "replicas": 1,
  "unavailableReplicas": 1,
  "updatedReplicas": 1
}

Looking at the pod:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1313852661115654144/artifacts/e2e-gcp/pods.json | jq -r '.items[] | select(.metadata.name | startswith("openshift-state-metrics-")).status.containerStatuses[] | select(.name == "openshift-state-metrics")'
{
  "containerID": "cri-o://ee31975808c3296ed0ebf072ae4d5377114ad9d284d01f22a6cd52813acbedf8",
  "image": "registry.svc.ci.openshift.org/ocp/4.7-2020-10-07-143852@sha256:c260f540eccbba3fb29f262061839e0f528c9d689a88ddd52c3e0c3c1a0dfba0",
  "imageID": "registry.svc.ci.openshift.org/ocp/4.7-2020-10-07-143852@sha256:c260f540eccbba3fb29f262061839e0f528c9d689a88ddd52c3e0c3c1a0dfba0",
  "lastState": {
    "terminated": {
      "containerID": "cri-o://ee31975808c3296ed0ebf072ae4d5377114ad9d284d01f22a6cd52813acbedf8",
      "exitCode": 2,
      "finishedAt": "2020-10-07T15:41:38Z",
      "reason": "Error",
      "startedAt": "2020-10-07T15:41:38Z"
    }
  },
  "name": "openshift-state-metrics",
  "ready": false,
  "restartCount": 12,
  "started": false,
  "state": {
    "waiting": {
      "message": "back-off 5m0s restarting failed container=openshift-state-metrics pod=openshift-state-metrics-7d5967f58-kp4wb_openshift-monitoring(e2f9178d-0e26-46bb-812e-afd9feb8e431)",
      "reason": "CrashLoopBackOff"
    }
  }
}

Pod logs:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1313852661115654144/artifacts/e2e-gcp/pods/openshift-monitoring_openshift-state-metrics-7d5967f58-kp4wb_openshift-state-metrics_previous.log
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x13e3c44]

goroutine 1 [running]:
github.com/openshift/openshift-state-metrics/pkg/options.(*Options).AddFlags(0xc00020bf40)
	/go/src/github.com/openshift/openshift-state-metrics/pkg/options/options.go:44 +0x104
main.main()
	/go/src/github.com/openshift/openshift-state-metrics/main.go:46 +0xba

Ah. So that's pretty clear. Presumably the 'logtostderr' access needs adjusting after [2]. In the meantime, let's revert...
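For anyone reading along, here is a minimal sketch of that failure mode. It is not the actual openshift-state-metrics source: only the Options/AddFlags names and the "logtostderr" flag come from the traceback and the hypothesis above. The mechanics are standard library semantics: flag.FlagSet.Lookup returns nil when the named flag was never registered, and dereferencing that nil *flag.Flag produces exactly this kind of SIGSEGV.

package main

import "flag"

type Options struct {
	flags *flag.FlagSet
}

// AddFlags mirrors the call site in the traceback. If the klog flag
// registration moved (e.g. after the bump in [2], klog flags may only be
// added once klog.InitFlags is called on this FlagSet), the lookup below
// returns nil and the Value dereference panics.
func (o *Options) AddFlags() {
	o.flags.Lookup("logtostderr").Value.Set("true") // panics: "logtostderr" was never registered

	// A defensive variant would register the klog flags first, or
	// nil-check the lookup:
	//
	//   if f := o.flags.Lookup("logtostderr"); f != nil {
	//       f.Value.Set("true")
	//   }
}

func main() {
	o := &Options{flags: flag.NewFlagSet("openshift-state-metrics", flag.ExitOnError)}
	o.AddFlags()
}

Running this crashes on startup with "panic: runtime error: invalid memory address or nil pointer dereference", matching the previous-container log above, which is consistent with the pod never becoming ready and the Deployment reporting ProgressDeadlineExceeded.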
[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1313852661115654144
[2]: https://github.com/openshift/openshift-state-metrics/pull/59
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633