Bug 2172521
| Summary: | No OSD pods are created for 4.13 LSO deployment | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Petr Balogh <pbalogh> |
| Component: | rook | Assignee: | Santosh Pillai <sapillai> |
| Status: | CLOSED ERRATA | QA Contact: | Petr Balogh <pbalogh> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | unspecified | CC: | muagarwa, nberry, ngowda, ocs-bugs, odf-bz-bot, oviner, sapillai, sbalusu, tnielsen |
| Target Milestone: | --- | Keywords: | Automation, Regression |
| Target Release: | ODF 4.13.0 | Flags: | sheggodu: needinfo? (sapillai) |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-06-21 15:24:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Doc Text:
Cause: Rook has a specific use case where devices are copied into /mnt. If the basename (in /mnt) differs from the original device name, the current logic cannot match it.
Consequence: The OSD prepare pod fails to find the device.
Fix: Append the device to the lsblk command and return the result.
Result: The OSD prepare pod is able to prepare an OSD on the device.
Description
Petr Balogh
2023-02-22 12:37:50 UTC
Also observing this issue on IBM Z and also with the version v4.13.0-87.stable.

# oc get po -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-868bd8dd99-m74gb 2/2 Running 0 73m
csi-cephfsplugin-jv7cc 2/2 Running 0 72m
csi-cephfsplugin-pflc2 2/2 Running 0 72m
csi-cephfsplugin-provisioner-77c4b58f4d-ls6cl 5/5 Running 0 72m
csi-cephfsplugin-provisioner-77c4b58f4d-lx9dk 5/5 Running 0 72m
csi-cephfsplugin-zzqhg 2/2 Running 0 72m
csi-rbdplugin-bs7hh 3/3 Running 0 72m
csi-rbdplugin-provisioner-668f9f7cc4-7dc57 6/6 Running 0 72m
csi-rbdplugin-provisioner-668f9f7cc4-lt6mb 6/6 Running 0 72m
csi-rbdplugin-wbqlq 3/3 Running 0 72m
csi-rbdplugin-xbbtm 3/3 Running 0 72m
noobaa-operator-7d484677fc-wn4k2 1/1 Running 0 73m
ocs-metrics-exporter-7c5985796-ks9bm 1/1 Running 0 73m
ocs-operator-b4765698c-7r6xg 1/1 Running 0 73m
odf-console-5b4cb5c44b-256d2 1/1 Running 0 73m
odf-operator-controller-manager-794ddf57b4-jplhp 2/2 Running 0 73m
rook-ceph-crashcollector-worker-0.ocsa3e25001.lnxero1.boe-6vmpz 1/1 Running 0 70m
rook-ceph-crashcollector-worker-1.ocsa3e25001.lnxero1.boe-5ptpn 1/1 Running 0 70m
rook-ceph-crashcollector-worker-2.ocsa3e25001.lnxero1.boe-brlqp 1/1 Running 0 70m
rook-ceph-exporter-worker-0.ocsa3e25001.lnxero1.boe-58c484gsvng 0/1 CreateContainerError 0 70m
rook-ceph-exporter-worker-0.ocsa3e25001.lnxero1.boe-87fc4dbgx7x 0/1 CreateContainerError 0 70m
rook-ceph-exporter-worker-1.ocsa3e25001.lnxero1.boe-859549bl94h 0/1 CreateContainerError 0 70m
rook-ceph-exporter-worker-2.ocsa3e25001.lnxero1.boe-dc477d7d9jb 0/1 CreateContainerError 0 70m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-789776769x7ql 2/2 Running 0 70m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-57bcf5d8zmhmx 2/2 Running 0 70m
rook-ceph-mgr-a-6c7fb66b77-vxxmd 2/2 Running 0 70m
rook-ceph-mon-a-84c5b9678-tpjmx 2/2 Running 0 72m
rook-ceph-mon-b-669669f847-r98m4 2/2 Running 0 71m
rook-ceph-mon-c-7f9469b45-czdcv 2/2 Running 0 71m
rook-ceph-operator-5dcf9494cd-9xs2d 1/1 Running 0 73m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7d8b8f9fsj75 1/2 Running 16 (4m50s ago) 70m

The operator log [1] shows the below failure from ceph-volume when attempting to create the OSD.
This seems similar to a ceph-volume issue which has been fixed upstream [2].
The Ceph version in this repro is:
ceph version 17.2.5-67.el9cp (0462778d88af57caea127c35d7b78e21ff0aef24) quincy (stable)
This is coming from the downstream image:
quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
Guillaume, does this look like the same or related issue? If so, sounds like we just need to pick that up downstream.
2023-02-21T11:32:32.759232665Z 2023-02-21 11:32:32.759187 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-localblock-0-data-0bhwgg. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to initialize devices on PVC: failed to run ceph-volume. stderr: Bad argument "/mnt/ocs-deviceset-localblock-0-data-0bhwgg", expected an absolute path in /dev/ or /sys or a unit name: Invalid argument
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new c5714959-d016-4467-aa24-c84135f1448f
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph-authtool --gen-print-key
2023-02-21T11:32:32.759232665Z --> Was unable to complete a new OSD, will rollback changes
2023-02-21T11:32:32.759232665Z Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
2023-02-21T11:32:32.759232665Z stderr: purged osd.0
2023-02-21T11:32:32.759232665Z Traceback (most recent call last):
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 91, in safe_prepare
2023-02-21T11:32:32.759232665Z self.prepare()
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
2023-02-21T11:32:32.759232665Z return func(*a, **kw)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 127, in prepare
2023-02-21T11:32:32.759232665Z prepare_bluestore(
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 51, in prepare_bluestore
2023-02-21T11:32:32.759232665Z block = prepare_dmcrypt(key, block, 'block', fsid)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 23, in prepare_dmcrypt
2023-02-21T11:32:32.759232665Z kname = disk.lsblk(device)['KNAME']
2023-02-21T11:32:32.759232665Z KeyError: 'KNAME'
2023-02-21T11:32:32.759232665Z
2023-02-21T11:32:32.759232665Z During handling of the above exception, another exception occurred:
2023-02-21T11:32:32.759232665Z
2023-02-21T11:32:32.759232665Z Traceback (most recent call last):
2023-02-21T11:32:32.759232665Z File "/usr/sbin/ceph-volume", line 33, in <module>
2023-02-21T11:32:32.759232665Z sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 41, in __init__
2023-02-21T11:32:32.759232665Z self.main(self.argv)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2023-02-21T11:32:32.759232665Z return f(*a, **kw)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
2023-02-21T11:32:32.759232665Z terminal.dispatch(self.mapper, subcommand_args)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2023-02-21T11:32:32.759232665Z instance.main()
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
2023-02-21T11:32:32.759232665Z terminal.dispatch(self.mapper, self.argv)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2023-02-21T11:32:32.759232665Z instance.main()
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 169, in main
2023-02-21T11:32:32.759232665Z self.safe_prepare(self.args)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 95, in safe_prepare
2023-02-21T11:32:32.759232665Z rollback_osd(self.args, self.osd_id)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
2023-02-21T11:32:32.759232665Z Zap(['--destroy', '--osd-id', osd_id]).main()
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 404, in main
2023-02-21T11:32:32.759232665Z self.zap_osd()
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
2023-02-21T11:32:32.759232665Z return func(*a, **kw)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 301, in zap_osd
2023-02-21T11:32:32.759232665Z devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
2023-02-21T11:32:32.759232665Z File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 87, in find_associated_devices
2023-02-21T11:32:32.759232665Z raise RuntimeError('Unable to find any LV for zapping OSD: '
[1] http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-107vue1cslv33-a/j-107vue1cslv33-a_20230221T105029/logs/failed_testcase_ocs_logs_1676978595/test_deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0bfc087e607cb167604734bc029edbccaf247d749e80e7db39901916cb85226a/namespaces/openshift-storage/pods/rook-ceph-operator-7db78f9fb6-2wxbh/rook-ceph-operator/rook-ceph-operator/logs/current.log
[2] https://tracker.ceph.com/issues/58137
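For reference, the Doc Text above and the upstream tracker [2] describe the fix as appending the device to the lsblk invocation, so the node Rook copies under /mnt (whose basename no longer matches any entry in a full-device listing) is resolved directly by the kernel instead of being looked up by name. Below is a minimal Python sketch of that idea only; the lsblk_device helper name and the output columns are illustrative assumptions, not the actual ceph-volume patch.

```python
import shlex
import subprocess

def lsblk_device(device: str) -> dict:
    """Return lsblk fields for a single device node.

    Passing the device path to lsblk lets the kernel resolve the block
    device behind it, even when the node lives under /mnt with a name
    that differs from the original /dev entry.
    """
    out = subprocess.run(
        ["lsblk", "--nodeps", "--pairs", "--paths",
         "--output", "NAME,KNAME,PKNAME,TYPE", device],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # --pairs output looks like: NAME="/dev/sdb" KNAME="/dev/sdb" TYPE="disk"
    return dict(item.split("=", 1) for item in shlex.split(out))

if __name__ == "__main__":
    # Device path taken from the failing OSD prepare pod in the log above.
    print(lsblk_device("/mnt/ocs-deviceset-localblock-0-data-0bhwgg").get("KNAME"))
```

With the old approach (list every device, match by basename), the /mnt name never matches and the 'KNAME' lookup raises the KeyError shown in the traceback; with the device appended, lsblk reports the backing kernel device and the prepare step can proceed.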
Same issue on VMware IPI [ODF 4.13]
$ oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels: full_version=4.13.0-88
Server Version: 4.13.0-0.nightly-2023-02-21-014524
$ oc get pods
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-7bfd5fb7cf-lk5f2 2/2 Running 0 168m
csi-cephfsplugin-2hhbb 2/2 Running 0 48m
csi-cephfsplugin-djqvv 2/2 Running 0 48m
csi-cephfsplugin-g4h67 2/2 Running 0 48m
csi-cephfsplugin-provisioner-57b59c7588-pr2c8 5/5 Running 0 48m
csi-cephfsplugin-provisioner-57b59c7588-rgpth 5/5 Running 0 48m
csi-rbdplugin-7zlv4 3/3 Running 0 48m
csi-rbdplugin-d8bsm 3/3 Running 0 48m
csi-rbdplugin-p7tnh 3/3 Running 0 48m
csi-rbdplugin-provisioner-79744c94b9-fpzgz 6/6 Running 0 48m
csi-rbdplugin-provisioner-79744c94b9-s8wgt 6/6 Running 0 48m
noobaa-operator-7f4f4756c-9rqsv 1/1 Running 0 168m
ocs-metrics-exporter-64f44dbc4b-wlndn 1/1 Running 0 167m
ocs-operator-6bc4c886bc-jhfwx 1/1 Running 0 167m
odf-console-55f557999f-dlzql 1/1 Running 0 168m
odf-operator-controller-manager-746575b65-hwnjm 2/2 Running 0 168m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-hmk99rt 1/1 Running 0 36m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-nrh6xcz 1/1 Running 0 36m
rook-ceph-crashcollector-oviner59-vmware-i-25lqc-worker-vjwjxz5 1/1 Running 0 36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq 0/1 CreateContainerError 0 36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-nrqx9-68rrvz5 0/1 CreateContainerError 0 36m
rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-vjlkz-64z655z 0/1 CreateContainerError 0 36m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-75b6cfb8sf6qg 2/2 Running 0 34m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-74575cd777mp2 2/2 Running 0 34m
rook-ceph-mgr-a-6cf948bc97-x4bnd 2/2 Running 0 36m
rook-ceph-mon-a-f8648c4f9-6rjqb 2/2 Running 0 38m
rook-ceph-mon-b-56bb9f5957-wq22d 2/2 Running 0 38m
rook-ceph-mon-c-6888bfbd99-g92dw 2/2 Running 0 37m
rook-ceph-operator-7b48fdc47-qr8m5 1/1 Running 0 167m
rook-ceph-osd-0-d99c68b4f-dg4l7 2/2 Running 0 34m
rook-ceph-osd-prepare-6eb5add5463b09dc9ee447eb1a6ab358-7sfl6 0/1 Completed 0 36m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-8964675g44cd 1/2 Running 8 (2m1s ago) 34m
$ oc get storageclusters.ocs.openshift.io
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 48m Progressing 2023-02-23T11:40:32Z 4.13.0
Status:
Conditions:
Last Heartbeat Time: 2023-02-23T12:30:25Z
Last Transition Time: 2023-02-23T11:40:34Z
Message: Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-ceph-rbd]
Reason: ReconcileFailed
Status: False
Type: ReconcileComplete
$ oc describe pod rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37m default-scheduler Successfully assigned openshift-storage/rook-ceph-exporter-oviner59-vmware-i-25lqc-worker-hmpj6-c4wf4lq to oviner59-vmware-i-25lqc-worker-hmpj6
Normal AddedInterface 37m multus Add eth0 [10.129.2.30/23] from ovn-kubernetes
Normal Pulled 37m kubelet Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
Normal Created 37m kubelet Created container chown-container-data-dir
Normal Started 37m kubelet Started container chown-container-data-dir
Warning Failed 36m kubelet Error: container create failed: time="2023-02-23T11:52:43Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 36m kubelet Error: container create failed: time="2023-02-23T11:52:44Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 36m kubelet Error: container create failed: time="2023-02-23T11:52:56Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 36m kubelet Error: container create failed: time="2023-02-23T11:53:10Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 36m kubelet Error: container create failed: time="2023-02-23T11:53:23Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 36m kubelet Error: container create failed: time="2023-02-23T11:53:37Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 35m kubelet Error: container create failed: time="2023-02-23T11:53:48Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 35m kubelet Error: container create failed: time="2023-02-23T11:54:02Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 35m kubelet Error: container create failed: time="2023-02-23T11:54:17Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Warning Failed 35m (x2 over 35m) kubelet (combined from similar events): Error: container create failed: time="2023-02-23T11:54:42Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Normal Pulled 116s (x161 over 36m) kubelet Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
Known ceph issue. The latest plan is to have this fix in ceph 6.0. Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2170925

We are also observing this issue on Power, after creating the storagesystem with multus enabled.
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.0-0.nightly-ppc64le-2023-02-17-084453 True False 2d Cluster version is 4.13.0-0.nightly-ppc64le-2023-02-17-084453
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels: full_version=4.13.0-92
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get pods
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-65d8d5494c-6xqm2 2/2 Running 0 74m
csi-cephfsplugin-9bscw 2/2 Running 0 17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-5k2vh 1/1 Running 0 17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-rcnhw 1/1 Running 0 17h
csi-cephfsplugin-holder-ocs-storagecluster-cephcluster-whk56 1/1 Running 0 17h
csi-cephfsplugin-provisioner-796b5c797b-bjxnf 5/5 Running 0 17h
csi-cephfsplugin-provisioner-796b5c797b-srb7m 5/5 Running 0 17h
csi-cephfsplugin-r4kgl 2/2 Running 0 17h
csi-cephfsplugin-z58wm 2/2 Running 0 17h
csi-rbdplugin-2vn7s 3/3 Running 0 74m
csi-rbdplugin-7glk6 3/3 Running 0 74m
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-gw8g4 1/1 Running 0 17h
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-jvcf7 1/1 Running 0 17h
csi-rbdplugin-holder-ocs-storagecluster-cephcluster-qlnr6 1/1 Running 0 17h
csi-rbdplugin-provisioner-76868b57b-cd2kq 6/6 Running 0 74m
csi-rbdplugin-provisioner-76868b57b-qtqpv 6/6 Running 0 74m
csi-rbdplugin-rjpzw 3/3 Running 0 74m
noobaa-operator-65fd7fd66b-csbbn 1/1 Running 0 75m
ocs-metrics-exporter-5d5b75d775-qd6z2 1/1 Running 0 75m
ocs-operator-fb99f4b-mrlm5 1/1 Running 0 75m
odf-console-df4db7d66-m2r9f 1/1 Running 0 76m
odf-operator-controller-manager-559d5c8958-hqdrl 2/2 Running 0 76m
rook-ceph-crashcollector-390279bcc8f75bdec1ffce3b8152fb1b-6fdt9 1/1 Running 0 75m
rook-ceph-crashcollector-3d442b29c4d43fa6c6654a521ab8e866-rmwnw 1/1 Running 0 74m
rook-ceph-crashcollector-d67f4488231c2d93d9117a394e78de57-k6rrm 1/1 Running 0 75m
rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z 0/1 CreateContainerError 0 17h
rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-7547d5vn469 0/1 CreateContainerError 0 7h31m
rook-ceph-exporter-3d442b29c4d43fa6c6654a521ab8e866-8667779ttbt 0/1 CreateContainerError 0 7h31m
rook-ceph-exporter-3d442b29c4d43fa6c6654a521ab8e866-b9ff76klx4n 0/1 CreateContainerError 0 7h28m
rook-ceph-exporter-d67f4488231c2d93d9117a394e78de57-567c46zczpq 0/1 CreateContainerError 0 7h31m
rook-ceph-exporter-d67f4488231c2d93d9117a394e78de57-887f9b6bkqj 0/1 CreateContainerError 0 17h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5495c847lj6s5 2/2 Running 0 17h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-794bbf96dgk85 2/2 Running 0 17h
rook-ceph-mgr-a-5bccc8cff8-s6w9q 3/3 Running 0 17h
rook-ceph-mon-a-744bbfb85d-95np9 2/2 Running 0 17h
rook-ceph-mon-b-5ddfd8fbb7-55pb9 2/2 Running 0 17h
rook-ceph-mon-c-fc6c5789-lkqnh 2/2 Running 0 17h
rook-ceph-operator-7f5bd8884c-nwxgw 1/1 Running 0 75m
rook-ceph-osd-0-69b97cb99c-jqmjn 2/2 Running 0 7h30m
rook-ceph-osd-1-5cf854cbf8-ffzlm 2/2 Running 0 7h29m
rook-ceph-osd-2-7cb59dd54-f25fq 2/2 Running 0 7h28m
rook-ceph-osd-prepare-10eb7a6b0fd146a33ba8e36ba2f9e992-t62hg 0/1 Completed 0 17h
rook-ceph-osd-prepare-607f980ddb507c9429f3970fb79f9e79-2gwth 0/1 Completed 0 17h
rook-ceph-osd-prepare-d7a1a3dce4f33f9d6dfd00a4e026bc19-6rk2g 0/1 Completed 0 17h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-69f6db5lr8xr 1/2 Running 228 (2m56s ago) 17h
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]#
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc get storageclusters.ocs.openshift.io
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 17h Progressing 2023-02-28T12:30:14Z 4.13.0
Status:
Conditions:
Last Heartbeat Time: 2023-03-01T05:36:02Z
Last Transition Time: 2023-02-28T12:30:15Z
Message: Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-ceph-rbd]
Reason: ReconcileFailed
Status: False
Type: ReconcileComplete
Last Heartbeat Time: 2023-02-28T12:30:15Z
Last Transition Time: 2023-02-28T12:30:15Z
Message: Initializing StorageCluster
Reason: Init
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]# oc describe pod rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z
Name: rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cnkh7z
Namespace: openshift-storage
Priority: 0
Service Account: default
Node: syd05-worker-2.nara1-cicd-odf-1c53.redhat.com/192.168.0.164
Start Time: Tue, 28 Feb 2023 07:32:48 -0500
Labels: app=rook-ceph-exporter
ceph-version=17.2.5-67
kubernetes.io/hostname=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
node_name=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
pod-template-hash=6fc75cf4ff
rook-version=v4.13.0-0.4abaa33873c8984c8df04d06debc120eb61919c9
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.128.2.238"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "openshift-sdn",
"interface": "eth0",
"ips": [
"10.128.2.238"
],
"default": true,
"dns": {}
}]
openshift.io/scc: rook-ceph
prometheus.io/port: 9926
prometheus.io/scrape: true
Status: Pending
IP: 10.128.2.238
IPs:
IP: 10.128.2.238
Controlled By: ReplicaSet/rook-ceph-exporter-390279bcc8f75bdec1ffce3b8152fb1b-6fc75cf4ff
Init Containers:
chown-container-data-dir:
Container ID: cri-o://8215f6e89ce41c13490c744b9dce3893c8d53583db57ce5d1341a1b19b0067fd
Image: quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
Image ID: quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
Port: <none>
Host Port: <none>
Command:
chown
Args:
--verbose
--recursive
ceph:ceph
/var/log/ceph
/var/lib/ceph/crash
/run/ceph
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 28 Feb 2023 07:32:52 -0500
Finished: Tue, 28 Feb 2023 07:32:52 -0500
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/run/ceph from ceph-daemons-sock-dir (rw)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/rook/openshift-storage from ceph-conf-dir (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfr5m (ro)
Containers:
ceph-exporter:
Container ID:
Image: quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74
Image ID:
Port: <none>
Host Port: <none>
Command:
ceph-exporter
Args:
--conf
/var/lib/rook/openshift-storage/openshift-storage.config
--sock-dir
/run/ceph
--port
9926
--prio-limit
5
--stats-period
5
State: Waiting
Reason: CreateContainerError
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/run/ceph from ceph-daemons-sock-dir (rw)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/rook/openshift-storage from ceph-conf-dir (rw)
/var/log/ceph from rook-ceph-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfr5m (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
rook-config-override:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: rook-config-override
ConfigMapOptional: <nil>
ceph-daemons-sock-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/exporter
HostPathType: DirectoryOrCreate
rook-ceph-log:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/log
HostPathType:
rook-ceph-crash:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/crash
HostPathType:
ceph-conf-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage
HostPathType: Directory
kube-api-access-gfr5m:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: BestEffort
Node-Selectors: kubernetes.io/hostname=syd05-worker-2.nara1-cicd-odf-1c53.redhat.com
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
node.ocs.openshift.io/storage=true:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Failed 139m (x3807 over 17h) kubelet (combined from similar events): Error: container create failed: time="2023-03-01T03:17:51Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
Normal Pulled 4m39s (x4398 over 17h) kubelet Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
[root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]#
(In reply to narayanspg from comment #8)
> We are also observing this issue on Power. after creating storagesystem with
> multus enabled.
>
> Events:
>   Type     Reason  Age                     From     Message
>   ----     ------  ----                    ----     -------
>   Warning  Failed  139m (x3807 over 17h)   kubelet  (combined from similar events): Error: container create failed: time="2023-03-01T03:17:51Z" level=error msg="runc create failed: unable to start container process: exec: \"ceph-exporter\": executable file not found in $PATH"
>   Normal   Pulled  4m39s (x4398 over 17h)  kubelet  Container image "quay.io/rhceph-dev/rhceph@sha256:c4cceafa24f984bfa8aaa8937df0c545c21f37c35cc4661db8ee4f010bddfb74" already present on machine
> [root@nara1-cicd-odf-1c53-syd05-bastion-0 ~]#

We are planning to disable ceph exporter in rook. This is being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=2173934

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742