Bug 2172521 - No OSD pods are created for 4.13 LSO deployment
Summary: No OSD pods are created for 4.13 LSO deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.13.0
Assignee: Santosh Pillai
QA Contact: Petr Balogh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-02-22 12:37 UTC by Petr Balogh
Modified: 2023-12-08 04:32 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Rook has a specific use case where devices are copied into /mnt. If the basename (under /mnt) differs from the original device name, the existing logic cannot match it. Consequence: The OSD prepare pod fails to find the device. Fix: Append the device to the lsblk command and return the result for that device. Result: The OSD prepare pod is able to prepare an OSD on the device.
Clone Of:
Environment:
Last Closed: 2023-06-21 15:24:01 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2023:3742 0 None None None 2023-06-21 15:24:27 UTC

Description Petr Balogh 2023-02-22 12:37:50 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
I see that there are jobs that have not completed:
$ oc get jobs -n openshift-storage
NAME                                                     COMPLETIONS   DURATION   AGE
rook-ceph-osd-prepare-6a0978641422c498e5e0b41e7c87e228   0/1           24h        24h
rook-ceph-osd-prepare-8ef2e36cd02587aa3419a9f80dbd0029   0/1           24h        24h
rook-ceph-osd-prepare-c7ff8662a257798d2324a04f91e5b0bb   0/1           24h        24h

$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS                 RESTARTS        AGE
csi-addons-controller-manager-675f5fd4d8-z6nrc                    2/2     Running                0               25h
csi-cephfsplugin-5k45d                                            2/2     Running                0               25h
csi-cephfsplugin-bfh4q                                            2/2     Running                0               25h
csi-cephfsplugin-f72q2                                            2/2     Running                0               25h
csi-cephfsplugin-provisioner-5cbc66774f-bclcz                     5/5     Running                0               25h
csi-cephfsplugin-provisioner-5cbc66774f-rhsqs                     5/5     Running                0               25h
csi-rbdplugin-5k96q                                               3/3     Running                0               25h
csi-rbdplugin-65flp                                               3/3     Running                0               25h
csi-rbdplugin-h9c82                                               3/3     Running                0               25h
csi-rbdplugin-provisioner-584f74c4b5-hm2zt                        6/6     Running                0               25h
csi-rbdplugin-provisioner-584f74c4b5-rdx9t                        6/6     Running                0               25h
noobaa-operator-754bd488d-7lbgj                                   1/1     Running                0               25h
ocs-metrics-exporter-7467bf64f8-sqhw6                             1/1     Running                0               25h
ocs-operator-55d4999fc5-rlwxb                                     1/1     Running                0               25h
odf-console-779b55b44d-j6z2t                                      1/1     Running                0               25h
odf-operator-controller-manager-5b58ff8bf8-vz4n9                  2/2     Running                0               25h
rook-ceph-crashcollector-compute-0-6579df6bff-cbvxn               1/1     Running                0               25h
rook-ceph-crashcollector-compute-1-cc7885564-tslxc                1/1     Running                0               25h
rook-ceph-crashcollector-compute-2-94bb898c9-w749v                1/1     Running                0               25h
rook-ceph-exporter-compute-0-b6c56fbb6-g5mds                      0/1     CreateContainerError   0               25h
rook-ceph-exporter-compute-1-575549cfdc-nwc2b                     0/1     CreateContainerError   0               25h
rook-ceph-exporter-compute-1-b869f47fc-zr5fv                      0/1     CreateContainerError   0               25h
rook-ceph-exporter-compute-2-dd557f498-wx9rw                      0/1     CreateContainerError   0               25h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8dd6c9855bzrh   2/2     Running                0               25h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-54d98b886fd2p   2/2     Running                0               25h
rook-ceph-mgr-a-857fb49f5-zzgzf                                   2/2     Running                0               25h
rook-ceph-mon-a-5c4dcb87fb-xbstm                                  2/2     Running                0               25h
rook-ceph-mon-b-849677cf8c-4lv2w                                  2/2     Running                0               25h
rook-ceph-mon-c-6dfb874f99-bdgnc                                  2/2     Running                0               25h
rook-ceph-operator-7db78f9fb6-2wxbh                               1/1     Running                0               25h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-566dd7cz6csv   1/2     Running                339 (73s ago)   25h


We also see the rook-ceph-exporter pods in CreateContainerError.

Version of all relevant components (if applicable):
ocs-operator.v4.13.0-86


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Cannot deploy

Is there any workaround available to the best of your knowledge?
NO

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Probably yes

Can this issue be reproduced from the UI?
Haven't tried

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install ODF 4.13 with LSO


Actual results:
No OSD pods created

Expected results:
OSD pods are created

Additional info:

Logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-107vue1cslv33-a/j-107vue1cslv33-a_20230221T105029/logs/failed_testcase_ocs_logs_1676978595/test_deployment_ocs_logs/

Jenkins job:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-vsphere-upi-encryption-1az-rhcos-vsan-lso-vmdk-3m-3w-acceptance/107/

Comment 4 Sravika 2023-02-22 16:42:35 UTC
Also observing this issue on IBM Z, with version v4.13.0-87.stable.

# oc get po -n openshift-storage
NAME                                                              READY   STATUS                 RESTARTS         AGE
csi-addons-controller-manager-868bd8dd99-m74gb                    2/2     Running                0                73m
csi-cephfsplugin-jv7cc                                            2/2     Running                0                72m
csi-cephfsplugin-pflc2                                            2/2     Running                0                72m
csi-cephfsplugin-provisioner-77c4b58f4d-ls6cl                     5/5     Running                0                72m
csi-cephfsplugin-provisioner-77c4b58f4d-lx9dk                     5/5     Running                0                72m
csi-cephfsplugin-zzqhg                                            2/2     Running                0                72m
csi-rbdplugin-bs7hh                                               3/3     Running                0                72m
csi-rbdplugin-provisioner-668f9f7cc4-7dc57                        6/6     Running                0                72m
csi-rbdplugin-provisioner-668f9f7cc4-lt6mb                        6/6     Running                0                72m
csi-rbdplugin-wbqlq                                               3/3     Running                0                72m
csi-rbdplugin-xbbtm                                               3/3     Running                0                72m
noobaa-operator-7d484677fc-wn4k2                                  1/1     Running                0                73m
ocs-metrics-exporter-7c5985796-ks9bm                              1/1     Running                0                73m
ocs-operator-b4765698c-7r6xg                                      1/1     Running                0                73m
odf-console-5b4cb5c44b-256d2                                      1/1     Running                0                73m
odf-operator-controller-manager-794ddf57b4-jplhp                  2/2     Running                0                73m
rook-ceph-crashcollector-worker-0.ocsa3e25001.lnxero1.boe-6vmpz   1/1     Running                0                70m
rook-ceph-crashcollector-worker-1.ocsa3e25001.lnxero1.boe-5ptpn   1/1     Running                0                70m
rook-ceph-crashcollector-worker-2.ocsa3e25001.lnxero1.boe-brlqp   1/1     Running                0                70m
rook-ceph-exporter-worker-0.ocsa3e25001.lnxero1.boe-58c484gsvng   0/1     CreateContainerError   0                70m
rook-ceph-exporter-worker-0.ocsa3e25001.lnxero1.boe-87fc4dbgx7x   0/1     CreateContainerError   0                70m
rook-ceph-exporter-worker-1.ocsa3e25001.lnxero1.boe-859549bl94h   0/1     CreateContainerError   0                70m
rook-ceph-exporter-worker-2.ocsa3e25001.lnxero1.boe-dc477d7d9jb   0/1     CreateContainerError   0                70m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-789776769x7ql   2/2     Running                0                70m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-57bcf5d8zmhmx   2/2     Running                0                70m
rook-ceph-mgr-a-6c7fb66b77-vxxmd                                  2/2     Running                0                70m
rook-ceph-mon-a-84c5b9678-tpjmx                                   2/2     Running                0                72m
rook-ceph-mon-b-669669f847-r98m4                                  2/2     Running                0                71m
rook-ceph-mon-c-7f9469b45-czdcv                                   2/2     Running                0                71m
rook-ceph-operator-5dcf9494cd-9xs2d                               1/1     Running                0                73m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7d8b8f9fsj75   1/2     Running                16 (4m50s ago)   70m

Comment 5 Travis Nielsen 2023-02-22 18:36:43 UTC
The operator log [1] shows the below failure from ceph-volume when attempting to create the OSD. 

This seems similar to a ceph-volume issue which has been fixed upstream [2].

The Ceph version in this repro is: 
ceph version 17.2.5-67.el9cp (0462778d88af57caea127c35d7b78e21ff0aef24) quincy (stable)

This is coming from the downstream image: 
quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74

Guillaume, does this look like the same or a related issue? If so, it sounds like we just need to pick that fix up downstream.


2023-02-21T11:32:32.759232665Z 2023-02-21 11:32:32.759187 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-localblock-0-data-0bhwgg. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to initialize devices on PVC: failed to run ceph-volume. stderr: Bad argument "/mnt/ocs-deviceset-localblock-0-data-0bhwgg", expected an absolute path in /dev/ or /sys or a unit name: Invalid argument
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c5714959-d016-4467-aa24-c84135f1448f
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z --> Was unable to complete a new OSD, will rollback changes
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
2023-02-21T11:32:32.759232665Z  stderr: purged osd.0
2023-02-21T11:32:32.759232665Z Traceback (most recent call last):
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 91, in safe_prepare
2023-02-21T11:32:32.759232665Z     self.prepare()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
2023-02-21T11:32:32.759232665Z     return func(*a, **kw)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 127, in prepare
2023-02-21T11:32:32.759232665Z     prepare_bluestore(
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 51, in prepare_bluestore
2023-02-21T11:32:32.759232665Z     block = prepare_dmcrypt(key, block, 'block', fsid)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 23, in prepare_dmcrypt
2023-02-21T11:32:32.759232665Z     kname = disk.lsblk(device)['KNAME']
2023-02-21T11:32:32.759232665Z KeyError: 'KNAME'
2023-02-21T11:32:32.759232665Z 
2023-02-21T11:32:32.759232665Z During handling of the above exception, another exception occurred:
2023-02-21T11:32:32.759232665Z 
2023-02-21T11:32:32.759232665Z Traceback (most recent call last):
2023-02-21T11:32:32.759232665Z   File "/usr/sbin/ceph-volume", line 33, in <module>
2023-02-21T11:32:32.759232665Z     sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 41, in __init__
2023-02-21T11:32:32.759232665Z     self.main(self.argv)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2023-02-21T11:32:32.759232665Z     return f(*a, **kw)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
2023-02-21T11:32:32.759232665Z     terminal.dispatch(self.mapper, subcommand_args)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2023-02-21T11:32:32.759232665Z     instance.main()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
2023-02-21T11:32:32.759232665Z     terminal.dispatch(self.mapper, self.argv)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2023-02-21T11:32:32.759232665Z     instance.main()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 169, in main
2023-02-21T11:32:32.759232665Z     self.safe_prepare(self.args)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 95, in safe_prepare
2023-02-21T11:32:32.759232665Z     rollback_osd(self.args, self.osd_id)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
2023-02-21T11:32:32.759232665Z     Zap(['--destroy', '--osd-id', osd_id]).main()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 404, in main
2023-02-21T11:32:32.759232665Z     self.zap_osd()
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
2023-02-21T11:32:32.759232665Z     return func(*a, **kw)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 301, in zap_osd
2023-02-21T11:32:32.759232665Z     devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
2023-02-21T11:32:32.759232665Z   File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 87, in find_associated_devices
2023-02-21T11:32:32.759232665Z     raise RuntimeError('Unable to find any LV for zapping OSD: '


[1] http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-107vue1cslv33-a/j-107vue1cslv33-a_20230221T105029/logs/failed_testcase_ocs_logs_1676978595/test_deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0bfc087e607cb167604734bc029edbccaf247d749e80e7db39901916cb85226a/namespaces/openshift-storage/pods/rook-ceph-operator-7db78f9fb6-2wxbh/rook-ceph-operator/rook-ceph-operator/logs/current.log

[2] https://tracker.ceph.com/issues/58137

Comment 6 Oded 2023-02-23 12:31:41 UTC
Same issue on VMware IPI [ODF 4.13]

$  oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels:       full_version=4.13.0-88

Server Version: 4.13.0-0.nightly-2023-02-21-014524

$ oc get pods 
NAME                                                              READY   STATUS                 RESTARTS       AGE
csi-addons-controller-manager-7bfd5fb7cf-lk5f2                    2/2     Running                0              168m
csi-cephfsplugin-2hhbb                                            2/2     Running                0              48m
csi-cephfsplugin-djqvv                                            2/2     Running                0              48m
csi-cephfsplugin-g4h67                                            2/2     Running                0              48m
csi-cephfsplugin-provisioner-57b59c7588-pr2c8                     5/5     Running                0              48m
csi-cephfsplugin-provisioner-57b59c7588-rgpth                     5/5     Running                0              48m
csi-rbdplugin-7zlv4                                               3/3     Running                0              48m
csi-rbdplugin-d8bsm                                               3/3     Running                0              48m
csi-rbdplugin-p7tnh                                               3/3     Running                0              48m
csi-rbdplugin-provisioner-79744c94b9-fpzgz                        6/6     Running                0              48m
csi-rbdplugin-provisioner-79744c94b9-s8wgt                        6/6     Running                0              48m
noobaa-operator-7f4f4756c-9rqsv                                   1/1     Running                0              168m
ocs-metrics-exporter-64f44dbc4b-wlndn                             1/1     Running                0              167m
ocs-operator-6bc4c886bc-jhfwx                                     1/1     Running                0              167m
odf-console-55f557999f-dlzql                                      1/1     Running                0              168m
odf-operator-controller-manager-746575b65-hwnjm                   2/2     Running                0              168m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-hmk99rt   1/1     Running                0              36m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-nrh6xcz   1/1     Running                0              36m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-vjwjxz5   1/1     Running                0              36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq   0/1     CreateContainerError   0              36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-nrqx9-68rrvz5   0/1     CreateContainerError   0              36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-vjlkz-64z655z   0/1     CreateContainerError   0              36m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-75b6cfb8sf6qg   2/2     Running                0              34m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-74575cd777mp2   2/2     Running                0              34m
rook-ceph-mgr-a-6cf948bc97-x4bnd                                  2/2     Running                0              36m
rook-ceph-mon-a-f8648c4f9-6rjqb                                   2/2     Running                0              38m
rook-ceph-mon-b-56bb9f5957-wq22d                                  2/2     Running                0              38m
rook-ceph-mon-c-6888bfbd99-g92dw                                  2/2     Running                0              37m
rook-ceph-operator-7b48fdc47-qr8m5                                1/1     Running                0              167m
rook-ceph-osd-0-d99c68b4f-dg4l7                                   2/2     Running                0              34m
rook-ceph-osd-prepare-6eb5add5463b09dc9ee447eb1a6ab358-7sfl6      0/1     Completed              0              36m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-8964675g44cd   1/2     Running                8 (2m1s ago)   34m



$ oc get storageclusters.ocs.openshift.io 
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   48m   Progressing              2023-02-23T11:40:32Z   4.13.0

Status:
  Conditions:
    Last Heartbeat Time:   2023-02-23T12:30:25Z
    Last Transition Time:  2023-02-23T11:40:34Z
    Message:               Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-ceph-rbd]
    Reason:                ReconcileFailed
    Status:                False
    Type:                  ReconcileComplete

$ oc describe pod rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       37m                   default-scheduler  Successfully assigned openshift-storage/rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq to oviner59-vmware-i-25lqc-worker-hmpj6
  Normal   AddedInterface  37m                   multus             Add eth0 [10.129.2.30/23] from ovn-kubernetes
  Normal   Pulled          37m                   kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
  Normal   Created         37m                   kubelet            Created container chown-container-data-dir
  Normal   Started         37m                   kubelet            Started container chown-container-data-dir
  Warning  Failed          36m                   kubelet            Error: container create failed: time="2023-02-23T11:52:43Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                   kubelet            Error: container create failed: time="2023-02-23T11:52:44Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                   kubelet            Error: container create failed: time="2023-02-23T11:52:56Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                   kubelet            Error: container create failed: time="2023-02-23T11:53:10Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                   kubelet            Error: container create failed: time="2023-02-23T11:53:23Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          36m                   kubelet            Error: container create failed: time="2023-02-23T11:53:37Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m                   kubelet            Error: container create failed: time="2023-02-23T11:53:48Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m                   kubelet            Error: container create failed: time="2023-02-23T11:54:02Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m                   kubelet            Error: container create failed: time="2023-02-23T11:54:17Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Warning  Failed          35m (x2 over 35m)     kubelet            (combined from similar events): Error: container create failed: time="2023-02-23T11:54:42Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Normal   Pulled          116s (x161 over 36m)  kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine

Comment 7 Santosh Pillai 2023-02-23 13:04:18 UTC
Known Ceph issue. The latest plan is to have this fixed in Ceph 6.0. Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2170925

Comment 8 narayanspg 2023-03-01 05:40:18 UTC
We are also observing this issue on Power, after creating a StorageSystem with Multus enabled.

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get clusterversion
NAME      VERSION                                      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-ppc64le-2023-02-17-084453   True        False         2d      Cluster version is 4.13.0-0.nightly-ppc64le-2023-02-17-084453


[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels:       full_version=4.13.0-92
      

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS                 RESTARTS          AGE
csi-addons-controller-manager-65d8d5494c-6xqm2                    2/2     Running                0                 74m
csi-cephfsplugin-9bscw                                            2/2     Running                0                 17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-5k2vh      1/1     Running                0                 17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-rcnhw      1/1     Running                0                 17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-whk56      1/1     Running                0                 17h
csi-cephfsplugin-provisioner-796b5c797b-bjxnf                     5/5     Running                0                 17h
csi-cephfsplugin-provisioner-796b5c797b-srb7m                     5/5     Running                0                 17h
csi-cephfsplugin-r4kgl                                            2/2     Running                0                 17h
csi-cephfsplugin-z58wm                                            2/2     Running                0                 17h
csi-rbdplugin-2vn7s                                               3/3     Running                0                 74m
csi-rbdplugin-7glk6                                               3/3     Running                0                 74m
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-gw8g4         1/1     Running                0                 17h
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-jvcf7         1/1     Running                0                 17h
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-qlnr6         1/1     Running                0                 17h
csi-rbdplugin-provisioner-76868b57b-cd2kq                         6/6     Running                0                 74m
csi-rbdplugin-provisioner-76868b57b-qtqpv                         6/6     Running                0                 74m
csi-rbdplugin-rjpzw                                               3/3     Running                0                 74m
noobaa-operator-65fd7fd66b-csbbn                                  1/1     Running                0                 75m
ocs-metrics-exporter-5d5b75d775-qd6z2                             1/1     Running                0                 75m
ocs-operator-fb99f4b-mrlm5                                        1/1     Running                0                 75m
odf-console-df4db7d66-m2r9f                                       1/1     Running                0                 76m
odf-operator-controller-manager-559d5c8958-hqdrl                  2/2     Running                0                 76m
rook-ceph-crashcollector-390279bcc8f75bdec1ffce3b8152fb1b-6fdt9   1/1     Running                0                 75m
rook-ceph-crashcollector-3d442b29c4d43fa6c6654a521ab8e866-rmwnw   1/1     Running                0                 74m
rook-ceph-crashcollector-d67f4488231c2d93d9117a394e78de57-k6rrm   1/1     Running                0                 75m
rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z   0/1     CreateContainerError   0                 17h
rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-7547d5vn469   0/1     CreateContainerError   0                 7h31m
rook-ceph-exporter-3d442b29c4d43fa6c6654a521ab8e866-8667779ttbt   0/1     CreateContainerError   0                 7h31m
rook-ceph-exporter-3d442b29c4d43fa6c6654a521ab8e866-b9ff76klx4n   0/1     CreateContainerError   0                 7h28m
rook-ceph-exporter-d67f4488231c2d93d9117a394e78de57-567c46zczpq   0/1     CreateContainerError   0                 7h31m
rook-ceph-exporter-d67f4488231c2d93d9117a394e78de57-887f9b6bkqj   0/1     CreateContainerError   0                 17h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5495c847lj6s5   2/2     Running                0                 17h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-794bbf96dgk85   2/2     Running                0                 17h
rook-ceph-mgr-a-5bccc8cff8-s6w9q                                  3/3     Running                0                 17h
rook-ceph-mon-a-744bbfb85d-95np9                                  2/2     Running                0                 17h
rook-ceph-mon-b-5ddfd8fbb7-55pb9                                  2/2     Running                0                 17h
rook-ceph-mon-c-fc6c5789-lkqnh                                    2/2     Running                0                 17h
rook-ceph-operator-7f5bd8884c-nwxgw                               1/1     Running                0                 75m
rook-ceph-osd-0-69b97cb99c-jqmjn                                  2/2     Running                0                 7h30m
rook-ceph-osd-1-5cf854cbf8-ffzlm                                  2/2     Running                0                 7h29m
rook-ceph-osd-2-7cb59dd54-f25fq                                   2/2     Running                0                 7h28m
rook-ceph-osd-prepare-10eb7a6b0fd146a33ba8e36ba2f9e992-t62hg      0/1     Completed              0                 17h
rook-ceph-osd-prepare-607f980ddb507c9429f3970fb79f9e79-2gwth      0/1     Completed              0                 17h
rook-ceph-osd-prepare-d7a1a3dce4f33f9d6dfd00a4e026bc19-6rk2g      0/1     Completed              0                 17h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-69f6db5lr8xr   1/2     Running                228 (2m56s ago)   17h
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]#

[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get storageclusters.ocs.openshift.io
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   17h   Progressing              2023-02-28T12:30:14Z   4.13.0


Status:
  Conditions:
    Last Heartbeat Time:   2023-03-01T05:36:02Z
    Last Transition Time:  2023-02-28T12:30:15Z
    Message:               Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-ceph-rbd]
    Reason:                ReconcileFailed
    Status:                False
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2023-02-28T12:30:15Z
    Last Transition Time:  2023-02-28T12:30:15Z
    Message:               Initializing StorageCluster
    Reason:                Init


[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc describe pod rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z
Name:             rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z
Namespace:        openshift-storage
Priority:         0
Service Account:  default
Node:             syd05-worker-2.nara1-cicd-odf-1c53.redhat.com/192.168.0.164
Start Time:       Tue, 28 Feb 2023 07:32:48 -0500
Labels:           app=rook-ceph-exporter
                  ceph-version=17.2.5-67
                  kubernetes.io/hostname=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
                  node_name=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
                  pod-template-hash=6fc75cf4ff
                  rook-version=v4.13.0-0.4abaa33873c8984c8df04d06debc120eb61919c9
Annotations:      k8s.v1.cni.cncf.io/network-status:
                    [{
                        "name": "openshift-sdn",
                        "interface": "eth0",
                        "ips": [
                            "10.128.2.238"
                        ],
                        "default": true,
                        "dns": {}
                    }]
                  k8s.v1.cni.cncf.io/networks-status:
                    [{
                        "name": "openshift-sdn",
                        "interface": "eth0",
                        "ips": [
                            "10.128.2.238"
                        ],
                        "default": true,
                        "dns": {}
                    }]
                  openshift.io/scc: rook-ceph
                  prometheus.io/port: 9926
                  prometheus.io/scrape: true
Status:           Pending
IP:               10.128.2.238
IPs:
  IP:           10.128.2.238
Controlled By:  ReplicaSet/rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cf4ff
Init Containers:
  chown-container-data-dir:
    Container ID:  cri-o://8215f6e89ce41c13490c744b9dce3893c8d53583db57ce5d1341a1b19b0067fd
    Image:         quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
    Image ID:      quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
    Port:          <none>
    Host Port:     <none>
    Command:
      chown
    Args:
      --verbose
      --recursive
      ceph:ceph
      /var/log/ceph
      /var/lib/ceph/crash
      /run/ceph
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 28 Feb 2023 07:32:52 -0500
      Finished:     Tue, 28 Feb 2023 07:32:52 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /run/ceph from ceph-daemons-sock-dir (rw)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/rook/openshift-storage from ceph-conf-dir (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfr5m (ro)
Containers:
  ceph-exporter:
    Container ID:
    Image:         quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      ceph-exporter
    Args:
      --conf
      /var/lib/rook/openshift-storage/openshift-storage.config
      --sock-dir
      /run/ceph
      --port
      9926
      --prio-limit
      5
      --stats-period
      5
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /run/ceph from ceph-daemons-sock-dir (rw)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/rook/openshift-storage from ceph-conf-dir (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfr5m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  rook-config-override:
    Type:               Projected (a volume that contains injected data from multiple sources)
    ConfigMapName:      rook-config-override
    ConfigMapOptional:  <nil>
  ceph-daemons-sock-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/exporter
    HostPathType:  DirectoryOrCreate
  rook-ceph-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/openshift-storage/log
    HostPathType:
  rook-ceph-crash:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/openshift-storage/crash
    HostPathType:
  ceph-conf-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/openshift-storage
    HostPathType:  Directory
  kube-api-access-gfr5m:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/hostname=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
                             node.ocs.openshift.io/storage=true:NoSchedule
Events:
  Type     Reason  Age                     From     Message
  ----     ------  ----                    ----     -------
  Warning  Failed  139m (x3807 over 17h)   kubelet  (combined from similar events): Error: container create failed: time="2023-03-01T03:17:51Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
  Normal   Pulled  4m39s (x4398 over 17h)  kubelet  Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]#

Comment 9 Santosh Pillai 2023-03-01 06:04:21 UTC
(In reply to narayanspg from comment #8)
> We are also observing this issue on Power. after creating storagesystem with
> multus enabled.
> 
>
> Events:
>   Type     Reason  Age                     From     Message
>   ----     ------  ----                    ----     -------
>   Warning  Failed  139m (x3807 over 17h)   kubelet  (combined from similar
> events): Error: container create failed: time="2023-03-01T03:17:51Z"
> level=error msg="runc create failed: unable to start container process:
> exec: \"ceph-exporter\": executable file not found in $PATH"
>   Normal   Pulled  4m39s (x4398 over 17h)  kubelet  Container image
> "quay.io/rhceph-dev/rhceph@sha256:
> c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already
> present on machine
> [root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]#


We are planning to disable the ceph exporter in Rook. This is being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=2173934

Comment 17 errata-xmlrpc 2023-06-21 15:24:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742

Comment 18 Red Hat Bugzilla 2023-12-08 04:32:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

