Description of problem (please be as detailed as possible and provide log snippets):
rook-ceph-operator is in CLBO (CrashLoopBackOff) state after upgrading 4.8 ---> 4.9 ---> 4.10

Version of all relevant components (if applicable):
Upgraded from 4.8 ---> 4.9 (ocs-operator.v4.9.4) ---> 4.10 (ocs-registry:4.10.0-189)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. In external mode, upgrade the cluster from 4.8 ---> 4.9 ---> 4.10
2. Check that all pods are running


Actual results:

$ oc get csv ocs-operator.v4.10.0
NAME                   DISPLAY                       VERSION   REPLACES              PHASE
ocs-operator.v4.10.0   OpenShift Container Storage   4.10.0    ocs-operator.v4.9.4   Installing

> pods status

$ oc get pods
NAME                                               READY   STATUS             RESTARTS        AGE
csi-addons-controller-manager-c6f4bcfdb-q2php      2/2     Running            0               23h
csi-cephfsplugin-fsj7w                             3/3     Running            0               25h
csi-cephfsplugin-kljl4                             3/3     Running            0               25h
csi-cephfsplugin-provisioner-58c7b655f-d85dh       6/6     Running            0               25h
csi-cephfsplugin-provisioner-58c7b655f-xllb6       6/6     Running            0               25h
csi-cephfsplugin-r9f68                             3/3     Running            0               25h
csi-rbdplugin-22x49                                3/3     Running            0               12h
csi-rbdplugin-provisioner-5bc5c7fcd9-ld9td         6/6     Running            0               12h
csi-rbdplugin-provisioner-5bc5c7fcd9-m5rxx         6/6     Running            0               12h
csi-rbdplugin-x45gw                                3/3     Running            0               12h
csi-rbdplugin-xmmtn                                3/3     Running            0               12h
noobaa-core-0                                      1/1     Running            0               25h
noobaa-db-pg-0                                     1/1     Running            0               25h
noobaa-endpoint-564b5c9b76-mk8qd                   1/1     Running            0               25h
noobaa-operator-764b8f8569-tzf7b                   1/1     Running            0               23h
ocs-metrics-exporter-7c5d8b7bd9-q4q6n              1/1     Running            0               23h
ocs-operator-d7fd9f5fb-925zg                       1/1     Running            0               23h
odf-console-f987957d9-79bld                        1/1     Running            0               23h
odf-operator-controller-manager-7f97874489-bjkrl   2/2     Running            0               23h
rook-ceph-operator-7cb464db7d-mhq6n                0/1     CrashLoopBackOff   5 (2m37s ago)   5m58s
rook-ceph-tools-external-5f456fb6cb-rd7j9          1/1     Running            0               25h

Expected results:
The upgrade should succeed and all pods should be in Running state.

Additional info:

> csv events

$ oc describe csv ocs-operator.v4.10.0
Name:         ocs-operator.v4.10.0
Namespace:    openshift-storage
Labels:       full_version=4.10.0-189
Events:
  Type     Reason              Age                    From                        Message
  ----     ------              ----                   ----                        -------
  Normal   NeedsReinstall      157m (x600 over 23h)   operator-lifecycle-manager  installing: waiting for deployment rook-ceph-operator to become ready: deployment "rook-ceph-operator" not available: Deployment does not have minimum availability.
  Warning  InstallCheckFailed  7m48s (x310 over 22h)  operator-lifecycle-manager  install timeout
  Normal   InstallSucceeded    102s (x396 over 23h)   operator-lifecycle-manager  install strategy completed with no errors

> the rook-ceph-operator log shows a runtime panic (nil pointer dereference):

2022-03-16 13:40:21.917865 I | op-bucket-prov: successfully reconciled bucket provisioner
I0316 13:40:21.917924       1 manager.go:135] objectbucket.io/provisioner-manager "msg"="starting provisioner" "name"="openshift-storage.ceph.rook.io/bucket"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1462005]

goroutine 1045 [running]:
github.com/rook/rook/pkg/apis/ceph.rook.io/v1.(*CephObjectStore).GetObjectKind(0x0, 0x0, 0x0)
        <autogenerated>:1 +0x5
github.com/rook/rook/pkg/operator/ceph/reporting.ReportReconcileResult(0xc00000c150, 0x23a9980, 0xc0009b47c0, 0x23df8f0, 0x0, 0xc000cb6c00, 0x0, 0x2370c80, 0xc001408150, 0xc001408150, ...)
        /remote-source/rook/app/pkg/operator/ceph/reporting/reporting.go:46 +0x4f
github.com/rook/rook/pkg/operator/ceph/object.(*ReconcileCephObjectStore).Reconcile(0xc0000c3080, 0x23afb78, 0xc000b14270, 0xc000f60f48, 0x11, 0xc0001649c0, 0x2b, 0xc000b14270, 0xc000b14210, 0xc000b42db0, ...)
        /remote-source/rook/app/pkg/operator/ceph/object/controller.go:159 +0xc9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc000ae3220, 0x23afb78, 0xc000b14210, 0xc000f60f48, 0x11, 0xc0001649c0, 0x2b, 0xc000b14200, 0x0, 0x0, ...)
        /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x247
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000ae3220, 0x23afad0, 0xc0009b4fc0, 0x1e2bc00, 0xc000f9a4c0)
        /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x305
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000ae3220, 0x23afad0, 0xc0009b4fc0, 0x0)
        /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc000d3c930, 0xc000ae3220, 0x23afad0, 0xc0009b4fc0)
        /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x425

Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10829/consoleFull

must gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/vavuthu-bz2064107/vavuthu-bz2064107_20220315T110140/logs/must-gather/
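For context on the panic above: the GetObjectKind frame is invoked with a 0x0 receiver, i.e. ReportReconcileResult was handed a nil *CephObjectStore, and the first method call on that object dereferences a nil pointer, which is why the operator keeps crash-looping on every reconcile of the object store. The Go snippet below is a minimal, self-contained sketch of that failure pattern under stand-in names (objectStore, kinded, reportResult are hypothetical, not the actual Rook code); it also notes the Go subtlety that a typed nil pointer stored in an interface is not equal to nil, so a simple obj == nil check inside the helper would not catch it.

// sketch_nil_reconcile.go -- illustrative only, not the actual Rook source.
package main

import "fmt"

// objectStore stands in for a CR type such as cephv1.CephObjectStore.
type objectStore struct {
	kind string
}

// GetObjectKind has a pointer receiver; calling it on a nil receiver and then
// reading a field is exactly the kind of dereference that raises SIGSEGV.
func (o *objectStore) GetObjectKind() string {
	return o.kind
}

// kinded stands in for the runtime.Object-style interface the helper accepts.
type kinded interface {
	GetObjectKind() string
}

// reportResult stands in for a helper like reporting.ReportReconcileResult
// that assumes it is always handed a usable, non-nil object.
func reportResult(obj kinded) {
	// Note: obj == nil is false here even when the caller passed in a nil
	// *objectStore, because a typed nil pointer wrapped in an interface is
	// not the nil interface value. The method call below is what panics.
	fmt.Println("reconciled object of kind:", obj.GetObjectKind())
}

func main() {
	var store *objectStore // nil, e.g. a CR that was never fetched or initialized

	// Guarding in the caller avoids the crash loop; remove this check and the
	// program panics with "invalid memory address or nil pointer dereference",
	// matching the operator log above.
	if store == nil {
		fmt.Println("skipping report: object store CR is nil")
		return
	}
	reportResult(store)
}

Whether the fix tracked in bug 2061675 adds a guard like this or avoids passing a nil object earlier in the reconcile is not visible from this log alone.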
This behavior looks the same as that from https://bugzilla.redhat.com/show_bug.cgi?id=2061675. Supposedly, the fix for 2061675 is present in the version being tested here, but I wonder if maybe it isn't present until the next release. @vavuthu Is there a newer version of ODF 4.10 that can be used to re-test this behavior to see if it persists?
(In reply to Blaine Gardner from comment #4)
> This behavior looks the same as that from
> https://bugzilla.redhat.com/show_bug.cgi?id=2061675. Supposedly, the fix for
> 2061675 is present in the version being tested here, but I wonder if maybe
> it isn't present until the next release.
>
> @vavuthu Is there a newer version of ODF 4.10 that can be used to
> re-test this behavior to see if it persists?

Tested with the latest version of 4.10 (4.10.0-198) and didn't see the issue.

Job (4.9 to 4.10 external upgrade): https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10914/
4.8 to 4.9 external upgrade: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3542/console

$ oc get csv
NAME                              DISPLAY                       VERSION   REPLACES              PHASE
mcg-operator.v4.10.0              NooBaa Operator               4.10.0    mcg-operator.v4.9.4   Succeeded
ocs-operator.v4.10.0              OpenShift Container Storage   4.10.0    ocs-operator.v4.9.4   Succeeded
odf-csi-addons-operator.v4.10.0   CSI Addons                    4.10.0                          Succeeded
odf-operator.v4.10.0              OpenShift Data Foundation     4.10.0    odf-operator.v4.9.4   Succeeded

$ oc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
csi-addons-controller-manager-57bbfd7479-gqv9w     2/2     Running   0          88m
csi-cephfsplugin-9rp65                             3/3     Running   0          87m
csi-cephfsplugin-khkdm                             3/3     Running   0          88m
csi-cephfsplugin-provisioner-579ddb8f44-96d4f      6/6     Running   0          88m
csi-cephfsplugin-provisioner-579ddb8f44-ckz7g      6/6     Running   0          88m
csi-cephfsplugin-t7wsp                             3/3     Running   0          87m
csi-rbdplugin-68vzt                                3/3     Running   0          88m
csi-rbdplugin-g4msp                                3/3     Running   0          87m
csi-rbdplugin-provisioner-58887668cb-4nqwf         6/6     Running   0          88m
csi-rbdplugin-provisioner-58887668cb-t745h         6/6     Running   0          88m
csi-rbdplugin-w657c                                3/3     Running   0          88m
noobaa-core-0                                      1/1     Running   0          87m
noobaa-db-pg-0                                     1/1     Running   0          87m
noobaa-endpoint-8469489b8f-gb98t                   1/1     Running   0          88m
noobaa-endpoint-8469489b8f-r6fk5                   1/1     Running   0          87m
noobaa-operator-56948bd958-jqz7g                   1/1     Running   0          89m
ocs-metrics-exporter-7fd6498c-9gkx2                1/1     Running   0          88m
ocs-operator-8b49d4986-zmckr                       1/1     Running   0          88m
odf-console-58b4b85cb-b2d6w                        1/1     Running   0          90m
odf-operator-controller-manager-7488dc497c-dwfpf   2/2     Running   0          90m
rook-ceph-operator-5c54b594f-96rtd                 1/1     Running   0          88m
rook-ceph-tools-external-594b6f7978-bjzv4          1/1     Running   0          3h3m

As this issue is not seen in the latest version, we can close this bug.
Great. Thanks. Closing this since it seems to have been a duplicate of 2061675, given that the issue can no longer be reproduced with the latest version.

*** This bug has been marked as a duplicate of bug 2061675 ***