Version: 4.8.0-0.nightly-2021-05-21-233425

Steps to reproduce:
Deploy SNO (single-node OpenShift) with IPv4.

Result:
The monitoring operator reports as failed. The cluster-monitoring-operator container crash-loops with:
panic: runtime error: invalid memory address or nil pointer dereference

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          76m     Unable to apply 4.8.0-0.nightly-2021-05-21-233425: the cluster operator monitoring has not yet successfully rolled out

# oc get pod -A | grep -v Run | grep -v Comple
NAMESPACE              NAME                                          READY   STATUS             RESTARTS   AGE
openshift-monitoring   cluster-monitoring-operator-fdb9d949c-44w8r   1/2     CrashLoopBackOff   12         41m

# oc describe pod -n openshift-monitoring cluster-monitoring-operator-fdb9d949c-44w8r
Name:                 cluster-monitoring-operator-fdb9d949c-44w8r
Namespace:            openshift-monitoring
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 openshift-master-0.qe1.kni.lab.eng.bos.redhat.com/10.19.134.13
Start Time:           Sun, 23 May 2021 20:59:17 -0400
Labels:               app=cluster-monitoring-operator
                      pod-template-hash=fdb9d949c
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{ "name": "", "interface": "eth0", "ips": [ "10.128.0.96" ], "default": true, "dns": {} }]
                      k8s.v1.cni.cncf.io/networks-status:
                        [{ "name": "", "interface": "eth0", "ips": [ "10.128.0.96" ], "default": true, "dns": {} }]
                      openshift.io/scc: restricted
                      workload.openshift.io/warning: the node "openshift-master-0.qe1.kni.lab.eng.bos.redhat.com" does not have resource "management.workload.openshift.io/cores"
Status:               Running
IP:                   10.128.0.96
IPs:
  IP:           10.128.0.96
Controlled By:  ReplicaSet/cluster-monitoring-operator-fdb9d949c
Containers:
  kube-rbac-proxy:
    Container ID:  cri-o://d40eaff5abbbbb282377648874bc863f40701d00426b7e2a016edb9f3b8f27b4
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:73daea39b02fbf384a6c0fdc5db7b6034d45112004633d72f508b31c6c5f1c3f
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:73daea39b02fbf384a6c0fdc5db7b6034d45112004633d72f508b31c6c5f1c3f
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --logtostderr
      --secure-listen-address=:8443
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
      --upstream=http://127.0.0.1:8080/
      --tls-cert-file=/etc/tls/private/tls.crt
      --tls-private-key-file=/etc/tls/private/tls.key
    State:          Running
      Started:      Sun, 23 May 2021 20:59:21 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     1m
      memory:  20Mi
    Environment:  <none>
    Mounts:
      /etc/tls/private from cluster-monitoring-operator-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from cluster-monitoring-operator-token-zqgsw (ro)
  cluster-monitoring-operator:
    Container ID:  cri-o://d4ba04b17e939f768d02b40daaba636b44aeebb0caf132c4931e76ae73717b0b
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:478333df826fbd4534d5bfc8f27ea5b01bb531d62cb18832a9da4d6a8bcc538f
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:478333df826fbd4534d5bfc8f27ea5b01bb531d62cb18832a9da4d6a8bcc538f
    Port:          <none>
    Host Port:     <none>
    Args:
      -namespace=openshift-monitoring
      -namespace-user-workload=openshift-user-workload-monitoring
      -configmap=cluster-monitoring-config
      -release-version=$(RELEASE_VERSION)
      -logtostderr=true
      -v=2
      -images=prometheus-operator=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:01bcc9143ee529339cba78255b0eef80014022d43b78df6c00f3b90949a3e54b
      -images=prometheus-config-reloader=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7e27249e9080ed72bd2a720e62391dcd81f589a565978e7830aaa34f15daee4f
      -images=configmap-reloader=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5897a3a33b597a6d97912f3715642de762b64f0c39c982975d0417672351a1b5
      -images=prometheus=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1bfa5629bfde2d2e045a600e0f83d3a47ad3740b2051e0f6f87ee02467af9330
      -images=alertmanager=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4cb695db9dd455904b6f23b9b1201040286fa37d98aeb8fe1302f5c0f1794e83
      -images=grafana=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:493ef314d2c1977ff66d65d3d905377f0a297d030d895078512bfbc878a8781f
      -images=oauth-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:013f14899294e4d6e18aab7ea9d0b6d98db99e477f49607d9287dc5caba3ec5d
      -images=node-exporter=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5f57488e90465919487e47abffda690f346c108180b159b51cd693dbba197b1
      -images=kube-state-metrics=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d427b8a5548b85b8fe8a2a673dfec821bf898197d8530c8372fd4053accb3179
      -images=openshift-state-metrics=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:31ec557de595a7caf68a638a0363ac91dda60a85649b6b34d91d849b435384b0
      -images=kube-rbac-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:73daea39b02fbf384a6c0fdc5db7b6034d45112004633d72f508b31c6c5f1c3f
      -images=telemeter-client=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:354c44628376adcc8e68592cdf4a67ba83f453d8a2deca013fa720be43beb1f4
      -images=prom-label-proxy=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:833933f54d0d72bf2a6195b05800155a955c77ab961c468a1d724b777acf7cbb
      -images=k8s-prometheus-adapter=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:59cba985b3ba921ce66139743b463503ce7a83284f424e63b5ede6e278c6b623
      -images=thanos=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55aaf9dc4d1e8495b543d886688db12b72b69a2826d89d3b0f44f0dbed30f86d
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      tasks.go:46] ran task 8 of 16: Updating node-exporter
I0524 01:37:50.173099       1 tasks.go:46] ran task 11 of 16: Updating prometheus-adapter
I0524 01:37:50.240853       1 tasks.go:46] ran task 12 of 16: Updating Telemeter client
I0524 01:37:51.211841       1 tasks.go:46] ran task 1 of 16: Updating Prometheus Operator
I0524 01:37:51.663992       1 tasks.go:46] ran task 4 of 16: Updating Grafana
I0524 01:37:53.381295       1 tasks.go:46] ran task 14 of 16: Updating Thanos Querier
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x17f5c58]

goroutine 296 [running]:
github.com/openshift/cluster-monitoring-operator/pkg/client.(*Client).DeletePodDisruptionBudget(0xc000155970, 0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/client/client.go:451 +0x98
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*PrometheusUserWorkloadTask).destroy(0xc000758738, 0x226e214, 0xc000208070)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/prometheus_user_workload.go:311 +0x4ba
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*PrometheusUserWorkloadTask).Run(0xc000758738, 0xc000000000, 0x0)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/prometheus_user_workload.go:44 +0x6c
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).ExecuteTask(...)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:66
github.com/openshift/cluster-monitoring-operator/pkg/tasks.(*TaskRunner).RunAll.func1(0x0, 0x0)
	/go/src/github.com/openshift/cluster-monitoring-operator/pkg/tasks/tasks.go:45 +0x1d1
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc000c81140, 0xc00044bb80)
	/go/src/github.com/openshift/cluster-monitoring-operator/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x59
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/src/github.com/openshift/cluster-monitoring-operator/vendor/golang.org/x/sync/errgroup/errgroup.go:54 +0x66

      Exit Code:    2
      Started:      Sun, 23 May 2021 21:37:48 -0400
      Finished:     Sun, 23 May 2021 21:38:00 -0400
    Ready:          False
    Restart Count:  12
    Requests:
      cpu:     10m
      memory:  75Mi
    Environment:
      RELEASE_VERSION:  4.8.0-0.nightly-2021-05-21-233425
    Mounts:
      /etc/cluster-monitoring-operator/telemetry from telemetry-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from cluster-monitoring-operator-token-zqgsw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  telemetry-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      telemetry-config
    Optional:  false
  cluster-monitoring-operator-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-monitoring-operator-tls
    Optional:    true
  cluster-monitoring-operator-token-zqgsw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-monitoring-operator-token-zqgsw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
  Type     Reason          Age                  From               Message
  ----     ------          ----                 ----               -------
  Normal   Scheduled       41m                  default-scheduler  Successfully assigned openshift-monitoring/cluster-monitoring-operator-fdb9d949c-44w8r to openshift-master-0.qe1.kni.lab.eng.bos.redhat.com
  Normal   AddedInterface  41m                  multus             Add eth0 [10.128.0.96/23]
  Normal   Pulled          41m                  kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:73daea39b02fbf384a6c0fdc5db7b6034d45112004633d72f508b31c6c5f1c3f" already present on machine
  Normal   Created         41m                  kubelet            Created container kube-rbac-proxy
  Normal   Started         41m                  kubelet            Started container kube-rbac-proxy
  Normal   Pulled          39m (x5 over 41m)    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:478333df826fbd4534d5bfc8f27ea5b01bb531d62cb18832a9da4d6a8bcc538f" already present on machine
  Normal   Created         39m (x5 over 41m)    kubelet            Created container cluster-monitoring-operator
  Normal   Started         39m (x5 over 41m)    kubelet            Started container cluster-monitoring-operator
  Warning  BackOff         98s (x174 over 41m)  kubelet            Back-off restarting failed container
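For context: the trace shows (*PrometheusUserWorkloadTask).destroy calling Client.DeletePodDisruptionBudget with 0x0 as the PodDisruptionBudget argument, i.e. a nil pointer that the client then dereferences. A plausible reading is that the PDB asset is only generated for multi-node topologies, so on SNO the destroy path hands the client a nil object. The following is a minimal, self-contained Go sketch of that crash pattern and the obvious nil-guard shape of a fix; the type and function names are simplified stand-ins, not the operator's actual code.

package main

import "fmt"

// PodDisruptionBudget is a hypothetical stand-in for the real policy/v1beta1 type.
type PodDisruptionBudget struct {
	Namespace, Name string
}

// deletePodDisruptionBudget mimics the failing client method: it reads fields
// of pdb without a nil check, so pdb == nil panics with
// "invalid memory address or nil pointer dereference".
func deletePodDisruptionBudget(pdb *PodDisruptionBudget) error {
	fmt.Printf("deleting PDB %s/%s\n", pdb.Namespace, pdb.Name)
	return nil
}

// pdbForTopology mimics an asset factory that only yields a PDB when the
// cluster has more than one node; returning nil on single-node is an
// assumption based on the stack trace, not confirmed against the source.
func pdbForTopology(singleNode bool) *PodDisruptionBudget {
	if singleNode {
		return nil
	}
	return &PodDisruptionBudget{
		Namespace: "openshift-user-workload-monitoring",
		Name:      "prometheus-user-workload",
	}
}

func main() {
	pdb := pdbForTopology(true) // SNO: no PDB is generated

	// Guarding the call is the general shape of a fix; the actual change
	// shipped for this bug lives in the PR linked in the comments below.
	if pdb == nil {
		fmt.Println("single-node topology: no PDB to delete")
		return
	}
	_ = deletePodDisruptionBudget(pdb)
}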
Checked with the same 4.8.0-0.nightly-2021-05-21-233425 payload: there is no issue on a HighlyAvailable cluster, so this bug is specific to SNO.

# oc -n openshift-monitoring get pod | grep cluster-monitoring-operator
cluster-monitoring-operator-fdb9d949c-vkl5q   2/2   Running   2   7h14m
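As a side note, one way to confirm which topology a cluster reports (assuming the controlPlaneTopology status field introduced on the Infrastructure resource for 4.8 SNO support; expect SingleReplica on SNO and HighlyAvailable on a multi-node cluster):

# oc get infrastructure cluster -o jsonpath='{.status.controlPlaneTopology}{"\n"}'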
Bug 1963833 is a duplicate of this one, but a PR [1] has already been created and linked against bug 1963833, so I would prefer that this report be the one closed as the duplicate.

[1] https://github.com/openshift/cluster-monitoring-operator/pull/1176
*** This bug has been marked as a duplicate of bug 1963833 ***