Description of problem (please be as detailed as possible and provide log snippets):

I see that the OSD prepare jobs never complete:

$ oc get jobs -n openshift-storage
NAME                                                      COMPLETIONS   DURATION   AGE
rook-ceph-osd-prepare-6a0978641422c498e5e0b41e7c87e228    0/1           24h        24h
rook-ceph-osd-prepare-8ef2e36cd02587aa3419a9f80dbd0029    0/1           24h        24h
rook-ceph-osd-prepare-c7ff8662a257798d2324a04f91e5b0bb    0/1           24h        24h

$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS                  RESTARTS        AGE
csi-addons-controller-manager-675f5fd4d8-z6nrc                    2/2     Running                 0               25h
csi-cephfsplugin-5k45d                                            2/2     Running                 0               25h
csi-cephfsplugin-bfh4q                                            2/2     Running                 0               25h
csi-cephfsplugin-f72q2                                            2/2     Running                 0               25h
csi-cephfsplugin-provisioner-5cbc66774f-bclcz                     5/5     Running                 0               25h
csi-cephfsplugin-provisioner-5cbc66774f-rhsqs                     5/5     Running                 0               25h
csi-rbdplugin-5k96q                                               3/3     Running                 0               25h
csi-rbdplugin-65flp                                               3/3     Running                 0               25h
csi-rbdplugin-h9c82                                               3/3     Running                 0               25h
csi-rbdplugin-provisioner-584f74c4b5-hm2zt                        6/6     Running                 0               25h
csi-rbdplugin-provisioner-584f74c4b5-rdx9t                        6/6     Running                 0               25h
noobaa-operator-754bd488d-7lbgj                                   1/1     Running                 0               25h
ocs-metrics-exporter-7467bf64f8-sqhw6                             1/1     Running                 0               25h
ocs-operator-55d4999fc5-rlwxb                                     1/1     Running                 0               25h
odf-console-779b55b44d-j6z2t                                      1/1     Running                 0               25h
odf-operator-controller-manager-5b58ff8bf8-vz4n9                  2/2     Running                 0               25h
rook-ceph-crashcollector-compute-0-6579df6bff-cbvxn               1/1     Running                 0               25h
rook-ceph-crashcollector-compute-1-cc7885564-tslxc                1/1     Running                 0               25h
rook-ceph-crashcollector-compute-2-94bb898c9-w749v                1/1     Running                 0               25h
rook-ceph-exporter-compute-0-b6c56fbb6-g5mds                      0/1     CreateContainerError    0               25h
rook-ceph-exporter-compute-1-575549cfdc-nwc2b                     0/1     CreateContainerError    0               25h
rook-ceph-exporter-compute-1-b869f47fc-zr5fv                      0/1     CreateContainerError    0               25h
rook-ceph-exporter-compute-2-dd557f498-wx9rw                      0/1     CreateContainerError    0               25h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8dd6c9855bzrh   2/2     Running                 0               25h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-54d98b886fd2p   2/2     Running                 0               25h
rook-ceph-mgr-a-857fb49f5-zzgzf                                   2/2     Running                 0               25h
rook-ceph-mon-a-5c4dcb87fb-xbstm                                  2/2     Running                 0               25h
rook-ceph-mon-b-849677cf8c-4lv2w                                  2/2     Running                 0               25h
rook-ceph-mon-c-6dfb874f99-bdgnc                                  2/2     Running                 0               25h
rook-ceph-operator-7db78f9fb6-2wxbh                               1/1     Running                 0               25h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-566dd7cz6csv   1/2     Running                 339 (73s ago)   25h

And we see the rook-ceph-exporter pods in CreateContainerError.

Version of all relevant components (if applicable):
ocs-operator.v4.13.0-86

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Cannot deploy.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Probably yes

Can this issue be reproduced from the UI?
Haven't tried

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install ODF 4.13 with LSO

Actual results:
No OSD pods are created.

Expected results:
OSD pods are created.

Additional info:
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-107vue1cslv33-a/j-107vue1cslv33-a_20230221T105029/logs/failed_testcase_ocs_logs_1676978595/test_deployment_ocs_logs/
Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-vsphere-upi-encryption-1az-rhcos-vsan-lso-vmdk-3m-3w-acceptance/107/
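For anyone triaging a similar state, a minimal diagnostic sketch (these commands are not from the original report; the app=rook-ceph-osd-prepare label selector is assumed from Rook's usual labelling) is to pull the logs of the stuck prepare jobs and of the operator, which is where the ceph-volume error quoted later in this bug shows up:

# Sketch only; adjust namespace/labels/names to the failing cluster.
oc -n openshift-storage get jobs,pods -l app=rook-ceph-osd-prepare
oc -n openshift-storage logs -l app=rook-ceph-osd-prepare --tail=200
# The Rook operator log also records why OSD provisioning failed:
oc -n openshift-storage logs deploy/rook-ceph-operator | grep -i "failed to provision OSD"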
Also observing this issue on IBM Z, with version v4.13.0-87.stable.

# oc get po -n openshift-storage
NAME                                                              READY   STATUS                  RESTARTS        AGE
csi-addons-controller-manager-868bd8dd99-m74gb                    2/2     Running                 0               73m
csi-cephfsplugin-jv7cc                                            2/2     Running                 0               72m
csi-cephfsplugin-pflc2                                            2/2     Running                 0               72m
csi-cephfsplugin-provisioner-77c4b58f4d-ls6cl                     5/5     Running                 0               72m
csi-cephfsplugin-provisioner-77c4b58f4d-lx9dk                     5/5     Running                 0               72m
csi-cephfsplugin-zzqhg                                            2/2     Running                 0               72m
csi-rbdplugin-bs7hh                                               3/3     Running                 0               72m
csi-rbdplugin-provisioner-668f9f7cc4-7dc57                        6/6     Running                 0               72m
csi-rbdplugin-provisioner-668f9f7cc4-lt6mb                        6/6     Running                 0               72m
csi-rbdplugin-wbqlq                                               3/3     Running                 0               72m
csi-rbdplugin-xbbtm                                               3/3     Running                 0               72m
noobaa-operator-7d484677fc-wn4k2                                  1/1     Running                 0               73m
ocs-metrics-exporter-7c5985796-ks9bm                              1/1     Running                 0               73m
ocs-operator-b4765698c-7r6xg                                      1/1     Running                 0               73m
odf-console-5b4cb5c44b-256d2                                      1/1     Running                 0               73m
odf-operator-controller-manager-794ddf57b4-jplhp                  2/2     Running                 0               73m
rook-ceph-crashcollector-worker-0.ocsa3e25001.lnxero1.boe-6vmpz   1/1     Running                 0               70m
rook-ceph-crashcollector-worker-1.ocsa3e25001.lnxero1.boe-5ptpn   1/1     Running                 0               70m
rook-ceph-crashcollector-worker-2.ocsa3e25001.lnxero1.boe-brlqp   1/1     Running                 0               70m
rook-ceph-exporter-worker-0.ocsa3e25001.lnxero1.boe-58c484gsvng   0/1     CreateContainerError    0               70m
rook-ceph-exporter-worker-0.ocsa3e25001.lnxero1.boe-87fc4dbgx7x   0/1     CreateContainerError    0               70m
rook-ceph-exporter-worker-1.ocsa3e25001.lnxero1.boe-859549bl94h   0/1     CreateContainerError    0               70m
rook-ceph-exporter-worker-2.ocsa3e25001.lnxero1.boe-dc477d7d9jb   0/1     CreateContainerError    0               70m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-789776769x7ql   2/2     Running                 0               70m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-57bcf5d8zmhmx   2/2     Running                 0               70m
rook-ceph-mgr-a-6c7fb66b77-vxxmd                                  2/2     Running                 0               70m
rook-ceph-mon-a-84c5b9678-tpjmx                                   2/2     Running                 0               72m
rook-ceph-mon-b-669669f847-r98m4                                  2/2     Running                 0               71m
rook-ceph-mon-c-7f9469b45-czdcv                                   2/2     Running                 0               71m
rook-ceph-operator-5dcf9494cd-9xs2d                               1/1     Running                 0               73m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7d8b8f9fsj75   1/2     Running                 16 (4m50s ago)  70m
The operator log [1] shows the failure below from ceph-volume when attempting to create the OSD. This seems similar to a ceph-volume issue which has been fixed upstream [2].

The Ceph version in this repro is:
ceph version 17.2.5-67.el9cp (0462778d88af57caea127c35d7b78e21ff0aef24) quincy (stable)

This is coming from the downstream image:
quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74

Guillaume, does this look like the same or related issue? If so, sounds like we just need to pick that up downstream.

2023-02-21T11:32:32.759232665Z 2023-02-21 11:32:32.759187 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-localblock-0-data-0bhwgg. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to initialize devices on PVC: failed to run ceph-volume. stderr: Bad argument "/mnt/ocs-deviceset-localblock-0-data-0bhwgg", expected an absolute path in /dev/ or /sys or a unit name: Invalid argument
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c5714959-d016-4467-aa24-c84135f1448f
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z --> Was unable to complete a new OSD, will rollback changes
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
2023-02-21T11:32:32.759232665Z  stderr: purged osd.0
2023-02-21T11:32:32.759232665Z Traceback (most recent call last):
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 91, in safe_prepare
2023-02-21T11:32:32.759232665Z     self.prepare()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
2023-02-21T11:32:32.759232665Z     return func(*a, **kw)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 127, in prepare
2023-02-21T11:32:32.759232665Z     prepare_bluestore(
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 51, in prepare_bluestore
2023-02-21T11:32:32.759232665Z     block = prepare_dmcrypt(key, block, 'block', fsid)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 23, in prepare_dmcrypt
2023-02-21T11:32:32.759232665Z     kname = disk.lsblk(device)['KNAME']
2023-02-21T11:32:32.759232665Z KeyError: 'KNAME'
2023-02-21T11:32:32.759232665Z
2023-02-21T11:32:32.759232665Z During handling of the above exception, another exception occurred:
2023-02-21T11:32:32.759232665Z
2023-02-21T11:32:32.759232665Z Traceback (most recent call last):
2023-02-21T11:32:32.759232665Z   File "/usr/sbin/ceph-volume", line 33, in <module>
2023-02-21T11:32:32.759232665Z     sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 41, in __init__
2023-02-21T11:32:32.759232665Z     self.main(self.argv)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2023-02-21T11:32:32.759232665Z     return f(*a, **kw)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
2023-02-21T11:32:32.759232665Z     terminal.dispatch(self.mapper, subcommand_args)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2023-02-21T11:32:32.759232665Z     instance.main()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
2023-02-21T11:32:32.759232665Z     terminal.dispatch(self.mapper, self.argv)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2023-02-21T11:32:32.759232665Z     instance.main()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 169, in main
2023-02-21T11:32:32.759232665Z     self.safe_prepare(self.args)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 95, in safe_prepare
2023-02-21T11:32:32.759232665Z     rollback_osd(self.args, self.osd_id)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
2023-02-21T11:32:32.759232665Z     Zap(['--destroy', '--osd-id', osd_id]).main()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 404, in main
2023-02-21T11:32:32.759232665Z     self.zap_osd()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
2023-02-21T11:32:32.759232665Z     return func(*a, **kw)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 301, in zap_osd
2023-02-21T11:32:32.759232665Z     devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 87, in find_associated_devices
2023-02-21T11:32:32.759232665Z     raise RuntimeError('Unable to find any LV for zapping OSD: '

[1] http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-107vue1cslv33-a/j-107vue1cslv33-a_20230221T105029/logs/failed_testcase_ocs_logs_1676978595/test_deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0bfc087e607cb167604734bc029edbccaf247d749e80e7db39901916cb85226a/namespaces/openshift-storage/pods/rook-ceph-operator-7db78f9fb6-2wxbh/rook-ceph-operator/rook-ceph-operator/logs/current.log
[2] https://tracker.ceph.com/issues/58137
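For orientation (this note is not part of the original comment): the stderr begins with the "expected an absolute path in /dev/ or /sys" complaint that udev tooling prints when asked about a path outside /dev or /sys, and the traceback dies on disk.lsblk(device)['KNAME'], so ceph-volume is apparently being handed the PVC mount path /mnt/<deviceset> and cannot resolve it to a kernel device name. A rough shell illustration of that failure mode, with hypothetical paths and approximate output:

# Rough illustration only; not ceph-volume source, and exact messages vary.
# Querying a path outside /dev or /sys gives the udev tooling nothing to work with:
udevadm info /mnt/ocs-deviceset-localblock-0-data-0bhwgg
# -> Bad argument "...", expected an absolute path in /dev/ or /sys or a unit name
#
# When the device properties cannot be resolved, the dict that raw/prepare.py
# indexes has no 'KNAME' entry, hence KeyError: 'KNAME'; the rollback then
# fails as well because no LV was ever created ("Unable to find any LV for
# zapping OSD" in the second traceback above).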
Same issue on VMware IPI [ODF 4.13].

$ oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels:       full_version=4.13.0-88

Server Version: 4.13.0-0.nightly-2023-02-21-014524

$ oc get pods
NAME                                                              READY   STATUS                  RESTARTS       AGE
csi-addons-controller-manager-7bfd5fb7cf-lk5f2                    2/2     Running                 0              168m
csi-cephfsplugin-2hhbb                                            2/2     Running                 0              48m
csi-cephfsplugin-djqvv                                            2/2     Running                 0              48m
csi-cephfsplugin-g4h67                                            2/2     Running                 0              48m
csi-cephfsplugin-provisioner-57b59c7588-pr2c8                     5/5     Running                 0              48m
csi-cephfsplugin-provisioner-57b59c7588-rgpth                     5/5     Running                 0              48m
csi-rbdplugin-7zlv4                                               3/3     Running                 0              48m
csi-rbdplugin-d8bsm                                               3/3     Running                 0              48m
csi-rbdplugin-p7tnh                                               3/3     Running                 0              48m
csi-rbdplugin-provisioner-79744c94b9-fpzgz                        6/6     Running                 0              48m
csi-rbdplugin-provisioner-79744c94b9-s8wgt                        6/6     Running                 0              48m
noobaa-operator-7f4f4756c-9rqsv                                   1/1     Running                 0              168m
ocs-metrics-exporter-64f44dbc4b-wlndn                             1/1     Running                 0              167m
ocs-operator-6bc4c886bc-jhfwx                                     1/1     Running                 0              167m
odf-console-55f557999f-dlzql                                      1/1     Running                 0              168m
odf-operator-controller-manager-746575b65-hwnjm                   2/2     Running                 0              168m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-hmk99rt   1/1     Running                 0              36m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-nrh6xcz   1/1     Running                 0              36m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-vjwjxz5   1/1     Running                 0              36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq   0/1     CreateContainerError    0              36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-nrqx9-68rrvz5   0/1     CreateContainerError    0              36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-vjlkz-64z655z   0/1     CreateContainerError    0              36m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-75b6cfb8sf6qg   2/2     Running                 0              34m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-74575cd777mp2   2/2     Running                 0              34m
rook-ceph-mgr-a-6cf948bc97-x4bnd                                  2/2     Running                 0              36m
rook-ceph-mon-a-f8648c4f9-6rjqb                                   2/2     Running                 0              38m
rook-ceph-mon-b-56bb9f5957-wq22d                                  2/2     Running                 0              38m
rook-ceph-mon-c-6888bfbd99-g92dw                                  2/2     Running                 0              37m
rook-ceph-operator-7b48fdc47-qr8m5                                1/1     Running                 0              167m
rook-ceph-osd-0-d99c68b4f-dg4l7                                   2/2     Running                 0              34m
rook-ceph-osd-prepare-6eb5add5463b09dc9ee447eb1a6ab358-7sfl6      0/1     Completed               0              36m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-8964675g44cd   1/2     Running                 8 (2m1s ago)   34m

$ oc get storageclusters.ocs.openshift.io
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   48m   Progressing              2023-02-23T11:40:32Z   4.13.0

Status:
  Conditions:
    Last Heartbeat Time:   2023-02-23T12:30:25Z
    Last Transition Time:  2023-02-23T11:40:34Z
    Message:               Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-ceph-rbd]
    Reason:                ReconcileFailed
    Status:                False
    Type:                  ReconcileComplete

$ oc describe pod rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq
Events:
  Type     Reason          Age                 From               Message
  ----     ------          ----                ----               -------
  Normal   Scheduled       37m                 default-scheduler  Successfully assigned openshift-storage/rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq to oviner59-vmware-i-25lqc-worker-hmpj6
  Normal   AddedInterface  37m                 multus             Add eth0 [10.129.2.30/23] from ovn-kubernetes
  Normal   Pulled          37m                 kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
  Normal   Created         37m                 kubelet            Created container chown-container-data-dir
  Normal   Started         37m                 kubelet            Started container chown-container-data-dir
  Warning  Failed          36m                 kubelet            Error: container create failed: time="2023-02-23T11:52:43Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                 kubelet            Error: container create failed: time="2023-02-23T11:52:44Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                 kubelet            Error: container create failed: time="2023-02-23T11:52:56Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                 kubelet            Error: container create failed: time="2023-02-23T11:53:10Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                 kubelet            Error: container create failed: time="2023-02-23T11:53:23Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                 kubelet            Error: container create failed: time="2023-02-23T11:53:37Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m                 kubelet            Error: container create failed: time="2023-02-23T11:53:48Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m                 kubelet            Error: container create failed: time="2023-02-23T11:54:02Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m                 kubelet            Error: container create failed: time="2023-02-23T11:54:17Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m (x2 over 35m)   kubelet            (combined from similar events): Error: container create failed: time="2023-02-23T11:54:42Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Normal   Pulled          116s (x161 over 36m)  kubelet          Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
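Not part of the original comment, but a quick way to confirm what the kubelet is reporting, i.e. that the rhceph image referenced above does not ship a ceph-exporter binary on its PATH, is to exec into any healthy pod that runs the same image (the crash collector pods do) and look it up. A sketch, using a pod name taken from this report (substitute one from your own cluster):

# Sketch only: pick any Running pod that uses the same quay.io/rhceph-dev/rhceph image.
oc -n openshift-storage rsh rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-hmk99rt \
  sh -c 'command -v ceph-exporter || echo "ceph-exporter not found in PATH"'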
Known Ceph issue. The latest plan is to have this fix in Ceph 6.0.

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2170925
We are also observing this issue on Power, after creating the StorageSystem with Multus enabled.

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get clusterversion
NAME      VERSION                                      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-ppc64le-2023-02-17-084453   True        False         2d      Cluster version is 4.13.0-0.nightly-ppc64le-2023-02-17-084453

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels:       full_version=4.13.0-92

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS                  RESTARTS         AGE
csi-addons-controller-manager-65d8d5494c-6xqm2                    2/2     Running                 0                74m
csi-cephfsplugin-9bscw                                            2/2     Running                 0                17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-5k2vh      1/1     Running                 0                17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-rcnhw      1/1     Running                 0                17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-whk56      1/1     Running                 0                17h
csi-cephfsplugin-provisioner-796b5c797b-bjxnf                     5/5     Running                 0                17h
csi-cephfsplugin-provisioner-796b5c797b-srb7m                     5/5     Running                 0                17h
csi-cephfsplugin-r4kgl                                            2/2     Running                 0                17h
csi-cephfsplugin-z58wm                                            2/2     Running                 0                17h
csi-rbdplugin-2vn7s                                               3/3     Running                 0                74m
csi-rbdplugin-7glk6                                               3/3     Running                 0                74m
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-gw8g4         1/1     Running                 0                17h
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-jvcf7         1/1     Running                 0                17h
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-qlnr6         1/1     Running                 0                17h
csi-rbdplugin-provisioner-76868b57b-cd2kq                         6/6     Running                 0                74m
csi-rbdplugin-provisioner-76868b57b-qtqpv                         6/6     Running                 0                74m
csi-rbdplugin-rjpzw                                               3/3     Running                 0                74m
noobaa-operator-65fd7fd66b-csbbn                                  1/1     Running                 0                75m
ocs-metrics-exporter-5d5b75d775-qd6z2                             1/1     Running                 0                75m
ocs-operator-fb99f4b-mrlm5                                        1/1     Running                 0                75m
odf-console-df4db7d66-m2r9f                                       1/1     Running                 0                76m
odf-operator-controller-manager-559d5c8958-hqdrl                  2/2     Running                 0                76m
rook-ceph-crashcollector-390279bcc8f75bdec1ffce3b8152fb1b-6fdt9   1/1     Running                 0                75m
rook-ceph-crashcollector-3d442b29c4d43fa6c6654a521ab8e866-rmwnw   1/1     Running                 0                74m
rook-ceph-crashcollector-d67f4488231c2d93d9117a394e78de57-k6rrm   1/1     Running                 0                75m
rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z   0/1     CreateContainerError    0                17h
rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-7547d5vn469   0/1     CreateContainerError    0                7h31m
rook-ceph-exporter-3d442b29c4d43fa6c6654a521ab8e866-8667779ttbt   0/1     CreateContainerError    0                7h31m
rook-ceph-exporter-3d442b29c4d43fa6c6654a521ab8e866-b9ff76klx4n   0/1     CreateContainerError    0                7h28m
rook-ceph-exporter-d67f4488231c2d93d9117a394e78de57-567c46zczpq   0/1     CreateContainerError    0                7h31m
rook-ceph-exporter-d67f4488231c2d93d9117a394e78de57-887f9b6bkqj   0/1     CreateContainerError    0                17h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5495c847lj6s5   2/2     Running                 0                17h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-794bbf96dgk85   2/2     Running                 0                17h
rook-ceph-mgr-a-5bccc8cff8-s6w9q                                  3/3     Running                 0                17h
rook-ceph-mon-a-744bbfb85d-95np9                                  2/2     Running                 0                17h
rook-ceph-mon-b-5ddfd8fbb7-55pb9                                  2/2     Running                 0                17h
rook-ceph-mon-c-fc6c5789-lkqnh                                    2/2     Running                 0                17h
rook-ceph-operator-7f5bd8884c-nwxgw                               1/1     Running                 0                75m
rook-ceph-osd-0-69b97cb99c-jqmjn                                  2/2     Running                 0                7h30m
rook-ceph-osd-1-5cf854cbf8-ffzlm                                  2/2     Running                 0                7h29m
rook-ceph-osd-2-7cb59dd54-f25fq                                   2/2     Running                 0                7h28m
rook-ceph-osd-prepare-10eb7a6b0fd146a33ba8e36ba2f9e992-t62hg      0/1     Completed               0                17h
rook-ceph-osd-prepare-607f980ddb507c9429f3970fb79f9e79-2gwth      0/1     Completed               0                17h
rook-ceph-osd-prepare-d7a1a3dce4f33f9d6dfd00a4e026bc19-6rk2g      0/1     Completed               0                17h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-69f6db5lr8xr   1/2     Running                 228 (2m56s ago)  17h

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get storageclusters.ocs.openshift.io
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   17h   Progressing              2023-02-28T12:30:14Z   4.13.0

Status:
  Conditions:
    Last Heartbeat Time:   2023-03-01T05:36:02Z
    Last Transition Time:  2023-02-28T12:30:15Z
    Message:               Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-ceph-rbd]
    Reason:                ReconcileFailed
    Status:                False
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2023-02-28T12:30:15Z
    Last Transition Time:  2023-02-28T12:30:15Z
    Message:               Initializing StorageCluster
    Reason:                Init

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc describe pod rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z
Name:             rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z
Namespace:        openshift-storage
Priority:         0
Service Account:  default
Node:             syd05-worker-2.nara1-cicd-odf-1c53.redhat.com/192.168.0.164
Start Time:       Tue, 28 Feb 2023 07:32:48 -0500
Labels:           app=rook-ceph-exporter
                  ceph-version=17.2.5-67
                  kubernetes.io/hostname=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
                  node_name=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
                  pod-template-hash=6fc75cf4ff
                  rook-version=v4.13.0-0.4abaa33873c8984c8df04d06debc120eb61919c9
Annotations:      k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.128.2.238" ], "default": true, "dns": {} }]
                  k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.128.2.238" ], "default": true, "dns": {} }]
                  openshift.io/scc: rook-ceph
                  prometheus.io/port: 9926
                  prometheus.io/scrape: true
Status:           Pending
IP:               10.128.2.238
IPs:
  IP:  10.128.2.238
Controlled By:  ReplicaSet/rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cf4ff
Init Containers:
  chown-container-data-dir:
    Container ID:  cri-o://8215f6e89ce41c13490c744b9dce3893c8d53583db57ce5d1341a1b19b0067fd
    Image:         quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
    Image ID:      quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
    Port:          <none>
    Host Port:     <none>
    Command:
      chown
    Args:
      --verbose
      --recursive
      ceph:ceph
      /var/log/ceph
      /var/lib/ceph/crash
      /run/ceph
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 28 Feb 2023 07:32:52 -0500
      Finished:     Tue, 28 Feb 2023 07:32:52 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /run/ceph from ceph-daemons-sock-dir (rw)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/rook/openshift-storage from ceph-conf-dir (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfr5m (ro)
Containers:
  ceph-exporter:
    Container ID:
    Image:         quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      ceph-exporter
    Args:
      --conf
      /var/lib/rook/openshift-storage/openshift-storage.config
      --sock-dir
      /run/ceph
      --port
      9926
      --prio-limit
      5
      --stats-period
      5
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /run/ceph from ceph-daemons-sock-dir (rw)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/rook/openshift-storage from ceph-conf-dir (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfr5m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  rook-config-override:
    Type:               Projected (a volume that contains injected data from multiple sources)
    ConfigMapName:      rook-config-override
    ConfigMapOptional:  <nil>
  ceph-daemons-sock-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/exporter
    HostPathType:  DirectoryOrCreate
  rook-ceph-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/openshift-storage/log
    HostPathType:
  rook-ceph-crash:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/openshift-storage/crash
    HostPathType:
  ceph-conf-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/openshift-storage
    HostPathType:  Directory
  kube-api-access-gfr5m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:        BestEffort
Node-Selectors:   kubernetes.io/hostname=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
Tolerations:      node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
                  node.ocs.openshift.io/storage=true:NoSchedule
Events:
  Type     Reason  Age                     From     Message
  ----     ------  ----                    ----     -------
  Warning  Failed  139m (x3807 over 17h)   kubelet  (combined from similar events): Error: container create failed: time="2023-03-01T03:17:51Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Normal   Pulled  4m39s (x4398 over 17h)  kubelet  Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
(In reply to narayanspg from comment #8)
> We are also observing this issue on Power, after creating the StorageSystem
> with Multus enabled.
>
> Events:
>   Type     Reason  Age                     From     Message
>   ----     ------  ----                    ----     -------
>   Warning  Failed  139m (x3807 over 17h)   kubelet  (combined from similar events): Error: container create failed: time="2023-03-01T03:17:51Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
>   Normal   Pulled  4m39s (x4398 over 17h)  kubelet  Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine

We are planning to disable the Ceph exporter in Rook. This is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=2173934.
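Not from the original comment: once a build that disables the exporter (or ships the missing binary) is in place, a simple sanity check is that no exporter workloads are left in CreateContainerError. A sketch, using the app=rook-ceph-exporter label shown in the pod description above:

# Sketch only: expect either no resources or healthy pods after the fix lands.
oc -n openshift-storage get deploy,pods -l app=rook-ceph-exporter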
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3742
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days