Description of problem (please be as detailed as possible and provide log snippets):

The exporter daemons crash during upgrade from 4.13 to 4.14 because 4.13 does not carry the required Ceph upstream fix, which was only recently delivered to 4.14. Disabling the exporter in 4.13 will therefore resolve this.

Version of all relevant components (if applicable):

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
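For reference, a quick way to confirm the crashing exporter daemons before applying the change is to list them and pull the previous container log. This is only a sketch: the openshift-storage namespace is the default for ODF, and the app=rook-ceph-exporter label is an assumption about how Rook labels these pods.

$ oc -n openshift-storage get pods -l app=rook-ceph-exporter -o wide        # crash-looping pods show restarts / CrashLoopBackOff
$ oc -n openshift-storage logs <rook-ceph-exporter-pod-name> --previous     # previous-container log should contain the crash backtrace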
(In reply to Travis Nielsen from comment #2)
> The exporter was enabled in 4.13 for the rbd-mirroring metrics in
> https://bugzilla.redhat.com/show_bug.cgi?id=2192875.
>
> If we disable the exporter in 4.13, those metrics will no longer be
> collected. Why are those not needed anymore?
> If this is the only fix to the upgrade issue, we need to understand all the
> impact of this change.

AFAIK, the RDR metrics are targeted for 4.14, so we don't need to expose them in 4.13; that is why disabling the exporter is acceptable.
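To gauge the monitoring impact of removing the exporter, the remaining scrape targets can be compared before and after the change. A rough sketch, assuming the default openshift-storage namespace; the exact ServiceMonitor names vary by release, so an exporter-specific entry is an assumption:

$ oc -n openshift-storage get servicemonitors                       # any exporter-specific ServiceMonitor should disappear once it is disabled
$ oc -n openshift-storage get pods | grep ocs-metrics-exporter      # metrics served by ocs-metrics-exporter are unaffected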
Verified with the below versions.

1. Fresh deployment with 4.13.4:

> csv
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE
mcg-operator.v4.13.4-rhodf              NooBaa Operator               4.13.4-rhodf   mcg-operator.v4.13.3-rhodf              Succeeded
ocs-operator.v4.13.4-rhodf              OpenShift Container Storage   4.13.4-rhodf   ocs-operator.v4.13.3-rhodf              Succeeded
odf-csi-addons-operator.v4.13.4-rhodf   CSI Addons                    4.13.4-rhodf   odf-csi-addons-operator.v4.13.3-rhodf   Succeeded
odf-operator.v4.13.4-rhodf              OpenShift Data Foundation     4.13.4-rhodf   odf-operator.v4.13.3-rhodf              Succeeded

> pods
pod/compute-0-debug                                                   1/1   Running     0   73s     10.1.113.2     compute-0   <none>   <none>
pod/compute-1-debug                                                   1/1   Running     0   74s     10.1.112.244   compute-1   <none>   <none>
pod/compute-2-debug                                                   1/1   Running     0   73s     10.1.113.1     compute-2   <none>   <none>
pod/csi-addons-controller-manager-688cc884bb-npv5m                    2/2   Running     0   9m35s   10.131.0.16    compute-1   <none>   <none>
pod/csi-cephfsplugin-bmrth                                            2/2   Running     0   6m46s   10.1.113.2     compute-0   <none>   <none>
pod/csi-cephfsplugin-bx8dx                                            2/2   Running     0   6m46s   10.1.113.1     compute-2   <none>   <none>
pod/csi-cephfsplugin-j2ffr                                            2/2   Running     0   6m46s   10.1.112.244   compute-1   <none>   <none>
pod/csi-cephfsplugin-provisioner-6497549b7d-5qrhl                     5/5   Running     0   6m46s   10.129.2.19    compute-2   <none>   <none>
pod/csi-cephfsplugin-provisioner-6497549b7d-jkjf6                     5/5   Running     0   6m46s   10.128.2.23    compute-0   <none>   <none>
pod/csi-rbdplugin-2frll                                               3/3   Running     0   6m46s   10.1.112.244   compute-1   <none>   <none>
pod/csi-rbdplugin-h2qwb                                               3/3   Running     0   6m46s   10.1.113.1     compute-2   <none>   <none>
pod/csi-rbdplugin-provisioner-8bcbb667f-hfvgv                         6/6   Running     0   6m46s   10.129.2.18    compute-2   <none>   <none>
pod/csi-rbdplugin-provisioner-8bcbb667f-pk8n9                         6/6   Running     0   6m46s   10.131.0.19    compute-1   <none>   <none>
pod/csi-rbdplugin-r57rq                                               3/3   Running     0   6m46s   10.1.113.2     compute-0   <none>   <none>
pod/must-gather-gb7qk-helper                                          1/1   Running     0   74s     10.129.2.31    compute-2   <none>   <none>
pod/noobaa-core-0                                                     1/1   Running     0   3m23s   10.131.0.27    compute-1   <none>   <none>
pod/noobaa-db-pg-0                                                    1/1   Running     0   3m24s   10.128.2.35    compute-0   <none>   <none>
pod/noobaa-endpoint-586ccf6f76-clljw                                  1/1   Running     0   2m33s   10.131.0.30    compute-1   <none>   <none>
pod/noobaa-operator-55d8db996f-g75bl                                  1/1   Running     0   9m22s   10.131.0.17    compute-1   <none>   <none>
pod/ocs-metrics-exporter-76cfc5dbcc-x5jlt                             1/1   Running     0   9m25s   10.128.2.22    compute-0   <none>   <none>
pod/ocs-operator-8968678dd-dvxbw                                      1/1   Running     0   9m26s   10.128.2.21    compute-0   <none>   <none>
pod/odf-console-b68b6665-4kbkv                                        1/1   Running     0   9m51s   10.129.2.15    compute-2   <none>   <none>
pod/odf-operator-controller-manager-7bdc8b5845-kxqtv                  2/2   Running     0   9m51s   10.128.2.18    compute-0   <none>   <none>
pod/rook-ceph-crashcollector-compute-0-5c874c676b-5c5vs               1/1   Running     0   4m24s   10.128.2.27    compute-0   <none>   <none>
pod/rook-ceph-crashcollector-compute-1-856786dcc5-jprg5               1/1   Running     0   4m49s   10.131.0.23    compute-1   <none>   <none>
pod/rook-ceph-crashcollector-compute-2-cbd55f756-55zhc                1/1   Running     0   4m48s   10.129.2.22    compute-2   <none>   <none>
pod/rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6b87c888hvxl7   2/2   Running     0   3m40s   10.128.2.33    compute-0   <none>   <none>
pod/rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6fd7597788l9w   2/2   Running     0   3m38s   10.129.2.25    compute-2   <none>   <none>
pod/rook-ceph-mgr-a-597cc7b664-cb5mq                                  2/2   Running     0   4m49s   10.131.0.22    compute-1   <none>   <none>
pod/rook-ceph-mon-a-564fdf78bc-8s2hc                                  2/2   Running     0   5m58s   10.129.2.21    compute-2   <none>   <none>
pod/rook-ceph-mon-b-5868dfbdc-lpgsj                                   2/2   Running     0   5m24s   10.128.2.26    compute-0   <none>   <none>
pod/rook-ceph-mon-c-5d8d957c95-c29b8                                  2/2   Running     0   5m5s    10.131.0.21    compute-1   <none>   <none>
pod/rook-ceph-operator-844579548f-jdlkw                               1/1   Running     0   6m51s   10.129.2.17    compute-2   <none>   <none>
pod/rook-ceph-osd-0-6fcb9b77fc-p485b                                  2/2   Running     0   4m1s    10.131.0.25    compute-1   <none>   <none>
pod/rook-ceph-osd-1-596dff5d78-md7t6                                  2/2   Running     0   3m57s   10.128.2.29    compute-0   <none>   <none>
pod/rook-ceph-osd-2-679d56dc64-jkrbv                                  2/2   Running     0   3m56s   10.129.2.24    compute-2   <none>   <none>
pod/rook-ceph-osd-prepare-8665b50972b04512c9c395e41ce5e174-qfhjr      0/1   Completed   0   4m27s   10.128.2.28    compute-0   <none>   <none>
pod/rook-ceph-osd-prepare-d4e430e34c62be49db03f4c8e16bbbe3-lw6wz      0/1   Completed   0   4m27s   10.131.0.24    compute-1   <none>   <none>
pod/rook-ceph-osd-prepare-e9f0014782b5e190d23c6f28f9c2bb20-7557j      0/1   Completed   0   4m27s   10.129.2.23    compute-2   <none>   <none>
pod/rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-68df6cflqtn4   2/2   Running     0   3m25s   10.131.0.26    compute-1   <none>   <none>
pod/rook-ceph-tools-84cd6ffb6-hnsgj                                   1/1   Running     0   3m43s   10.128.2.32    compute-0   <none>   <none>

> As expected, there are no "rook-ceph-exporter*" pods.
> must gather: https://url.corp.redhat.com/cc5fcc2
> job link: https://url.corp.redhat.com/5461caf

2. Upgrade from 4.13.3-6 to 4.13.4-2:

> before upgrade, rook-ceph-exporter pods exist

$ oc get csv
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE
mcg-operator.v4.13.3-rhodf              NooBaa Operator               4.13.3-rhodf   mcg-operator.v4.13.2-rhodf              Succeeded
ocs-operator.v4.13.3-rhodf              OpenShift Container Storage   4.13.3-rhodf   ocs-operator.v4.13.2-rhodf              Succeeded
odf-csi-addons-operator.v4.13.3-rhodf   CSI Addons                    4.13.3-rhodf   odf-csi-addons-operator.v4.13.2-rhodf   Succeeded
odf-operator.v4.13.3-rhodf              OpenShift Data Foundation     4.13.3-rhodf   odf-operator.v4.13.2-rhodf              Succeeded

$ oc get csv odf-operator.v4.13.3-rhodf -o yaml | grep -i full_version
full_version: 4.13.3-6

$ oc get pods | grep -i ceph-exporter
rook-ceph-exporter-compute-0-7dc7797956-knqnz   1/1   Running   0   165m
rook-ceph-exporter-compute-1-9896f587c-7m4hh    1/1   Running   0   165m
rook-ceph-exporter-compute-2-f9774b458-vcmfx    1/1   Running   0   165m

> after upgrade

$ oc get csv
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE
mcg-operator.v4.13.4-rhodf              NooBaa Operator               4.13.4-rhodf   mcg-operator.v4.13.3-rhodf              Succeeded
ocs-operator.v4.13.4-rhodf              OpenShift Container Storage   4.13.4-rhodf   ocs-operator.v4.13.3-rhodf              Succeeded
odf-csi-addons-operator.v4.13.4-rhodf   CSI Addons                    4.13.4-rhodf   odf-csi-addons-operator.v4.13.3-rhodf   Succeeded
odf-operator.v4.13.4-rhodf              OpenShift Data Foundation     4.13.4-rhodf   odf-operator.v4.13.3-rhodf              Succeeded

$ oc get csv odf-operator.v4.13.4-rhodf -o yaml | grep -i full_version
full_version: 4.13.4-2

$ oc get pods | grep -i ceph-exporter
$

> job link: https://url.corp.redhat.com/203e604
> logs: https://url.corp.redhat.com/c960bf7
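The post-upgrade check above can also be scripted so it fails loudly if any exporter resource survives. A minimal sketch, assuming the default openshift-storage namespace and the rook-ceph-exporter name prefix seen above:

#!/bin/bash
# Fail if any rook-ceph-exporter pod or deployment remains after the upgrade.
NS=openshift-storage   # assumption: default ODF namespace
if oc -n "$NS" get pods,deployments 2>/dev/null | grep -i rook-ceph-exporter; then
    echo "FAIL: rook-ceph-exporter resources still present after upgrade"
    exit 1
fi
echo "PASS: no rook-ceph-exporter resources found"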
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.4 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:6146