I think this is the issue:

```
op-osd: OSD orchestration status for PVC ocs-deviceset-sc-odf-2-data-27lr657 is "failed"
2023-01-24T15:44:23.415980210Z 2023-01-24 15:44:23.415968 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-sc-odf-2-data-27lr657. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to generate osd keyring: failed to get or create auth key for client.bootstrap-osd: failed get-or-create-key client.bootstrap-osd: exit status 1}
2023-01-24T15:44:23.433564120Z 2023-01-24 15:44:23.433532 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: 15 failures encountered while running osds on nodes in namespace "openshift-storage".
```

Did you try this solution: https://access.redhat.com/solutions/3524771? It looks like the same scenario.
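Not a fix in itself, but to confirm the keyring error above, the bootstrap-osd key can be checked from the rook-ceph toolbox. A minimal sketch, assuming the rook-ceph-tools pod is deployed in openshift-storage and the mons are reachable:

```
# Locate the toolbox pod (assumes the standard app=rook-ceph-tools label)
TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n1)

# Does the mon cluster know the bootstrap-osd client at all?
oc -n openshift-storage rsh "$TOOLS_POD" ceph auth get client.bootstrap-osd

# If the key is missing, it can be re-created with the standard bootstrap-osd profile
oc -n openshift-storage rsh "$TOOLS_POD" \
  ceph auth get-or-create client.bootstrap-osd mon 'allow profile bootstrap-osd'
```

If get-or-create fails from the toolbox in the same way, that points at the mons rather than at the OSD provisioning path.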
@mduasope I don't see any osd-prepare pods or logs in the must-gather I'm looking at; could you attach them here? The only error I see in the rook operator logs is:

```
2023-01-24T15:44:23.415980210Z 2023-01-24 15:44:23.415968 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-sc-odf-2-data-27lr657. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to generate osd keyring: failed to get or create auth key for client.bootstrap-osd: failed get-or-create-key client.bootstrap-osd: exit status 1}
```

which says the OSD is trying to come up but the keys are missing.
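For reference, a minimal way to collect them (a sketch, assuming the default app=rook-ceph-osd-prepare label and the openshift-storage namespace):

```
# List the OSD prepare pods created by the rook operator
oc -n openshift-storage get pods -l app=rook-ceph-osd-prepare -o wide

# Dump the logs of each prepare pod to a file so they can be attached here
for p in $(oc -n openshift-storage get pods -l app=rook-ceph-osd-prepare -o name); do
  oc -n openshift-storage logs "$p" --all-containers > "$(basename "$p").log"
done
```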
Were you able to clean the device and get this working?
Hi. A few questions to better understand what's happening:

- A node failed and was replaced by a new node. Is that the only event that led to the current situation with the cluster?
- The logs suggest the OSD is using the same PV `local-pv-2a0b2a3` even on the newly added node. Is that correct? IIUC, when a new node is added, the LSO operator creates a new PV from the disks on that node, so the OSD on the new node should use a new PV.
- When the old node failed, were the failing OSDs removed from it using the OSD removal job (see the sketch below)?
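For that last question, the usual flow before re-adding capacity is to purge the failed OSDs with the removal template shipped by ODF. A rough sketch, assuming the default ocs-osd-removal template in the openshift-storage namespace, with <failed-osd-id> as a placeholder (exact template/job names can vary between ODF versions):

```
# Launch the OSD removal job for the OSDs that lived on the failed node
oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=<failed-osd-id> \
  | oc create -n openshift-storage -f -

# Wait for the removal job to complete and check its output
oc -n openshift-storage get pods -l job-name=ocs-osd-removal-job
oc -n openshift-storage logs -l job-name=ocs-osd-removal-job --tail=-1
```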
Re-assigning to Santosh as he is looking at it now. Thanks.
Removing needinfo since Santosh is looking into it.
This is a known ODF issue and is currently being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=2102304.
Hi Alicia, a few questions:

1. Is the issue in comment 17 the only one the customer is facing right now?
2. Does the issue in comment 17 prevent the user from continuing further?
Hi. So the customer added a new node and still doesn't see any new OSDs coming up. Does that summarize the current situation correctly?

I looked at the must-gather logs: all the OSD prepare pods are in a Pending state due to the following error:

```
'0/23 nodes are available: 10 node(s) didn''t match Pod''s node affinity/selector, 2 node(s) had taint {node-role.kubernetes.io/spectrum: }, that the pod didn''t tolerate, 3 node(s) had taint {node-role.kubernetes.io/infra: reserved}, that the pod didn''t tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn''t tolerate, 5 node(s) didn''t find available persistent volumes to bind.'
```

```
pod/rook-ceph-osd-prepare-297231d095a8b2f3cce9df0c6c95f036-gzs4s   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-4750a47e6acc2ed202fcd5e904355c6f-s2c2w   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-902a8387150a5d7492e83406b88b7312-kp9gv   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-97ca5e82f872abfea03437c8783882e3-2f5w6   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-abb427bd904e400157f3f6ce14328332-lc8zx   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-b0e69ad309df460a3f979a39928800e7-9qhrb   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-c680a538497b3a87011db28b856e0b4e-h98r9   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-cd84b42e320034e2b5447e34441728ef-bn5jm   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-d8bd70f44817b51607869df9e6cc491c-qvtfv   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-daafab739afa1600c6d1d4e311e98c15-56hrc   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-e9c94c244fc6ef4b02a2d75fc74d47f0-7ffxh   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
pod/rook-ceph-osd-prepare-fe4e80997e70ed70447be3c38d9314c3-zlfbp   0/1   Pending   0   18d   <none>   <none>   <none>   <none>
```
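The "5 node(s) didn''t find available persistent volumes to bind" part points at the LSO-provided PVs rather than at Rook itself. A quick way to narrow it down (the pod name is taken from the list above; assuming LSO runs in its default openshift-local-storage namespace):

```
# Scheduler events show exactly which constraint (taint, affinity, PV binding) blocks the pod
oc -n openshift-storage describe pod rook-ceph-osd-prepare-297231d095a8b2f3cce9df0c6c95f036-gzs4s

# Did LSO actually create Available local PVs for the disks on the new node?
oc get pv -o wide | grep local-pv

# LSO-side view of volume sets and discovered devices
# (the discovery results exist only if LocalVolumeSet/auto-discovery is in use)
oc -n openshift-local-storage get localvolumeset,localvolumediscoveryresult
```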
Hello, I do not agree with the bug closure. This still needs to be fixed or documented.