Description of problem (please be as detailed as possible and provide log snippets):
An ObjectNotFound('RADOS object not found (error calling conf_read_file)') message is displayed in the rook-ceph-operator log when attempting CephFS volume recovery.

Version of all relevant components (if applicable):
OCP 4.17.0-0.nightly-2024-08-19-165854
ODF 4.17.0-84.stable provided by Red Hat

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
NA

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install ODF 4.17.0-84 on IBM Cloud.
2. Create a deployment pod; on my test setup I created a logwriter-ceph pod.
3. Add a taint to the node:
```oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute```
Wait for some time (if the application pod and the rook operator are on the same node, wait a bit longer), then check the NetworkFence CR status; its state should be Fenced.
4. The NetworkFence is not created. Check the rook-ceph-operator log, where the following message is seen (a command sketch for these checks follows the log excerpt in Additional info):
ceph-cluster-controller: failed to handle node failure. failed to create network fence for node "jopinto-clu19-c4pcf-worker-0-jbcww".: failed to fence cephfs subvolumes: failed to get ceph status for check active mds: failed to get status. . Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)'): exit status 1

Actual results:
NetworkFence is not created.

Expected results:
NetworkFence should be created.

Additional info:
rook-ceph-operator log:

2024-08-28 05:35:52.772621 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down and a possible node drain is detected
2024-08-28 05:35:52.929902 I | ceph-cluster-controller: Found taint: Key=node.kubernetes.io/out-of-service, Value=nodeshutdown on node jopinto-clu19-c4pcf-worker-0-jbcww
2024-08-28 05:35:52.929931 I | ceph-cluster-controller: volumeInUse after split based on '^' [csi.vsphere.vmware.com 64c02db6-3edd-458d-bc79-36461858cb42]
2024-08-28 05:35:52.929938 I | ceph-cluster-controller: volumeInUse after split based on '^' [csi.vsphere.vmware.com b1954e96-d763-4344-993a-9f8022d0520c]
2024-08-28 05:35:52.929943 I | ceph-cluster-controller: volumeInUse after split based on '^' [openshift-storage.cephfs.csi.ceph.com 0001-0011-openshift-storage-0000000000000001-48286e7e-84e5-443f-a064-39a3f0609885]
2024-08-28 05:35:53.545803 I | ceph-cluster-controller: node "jopinto-clu19-c4pcf-worker-0-jbcww" require fencing, found cephfs subvolumes in use
2024-08-28 05:35:54.107446 I | ceph-spec: parsing mon endpoints: a=172.30.195.199:3300,b=172.30.15.18:3300,c=172.30.195.182:3300
2024-08-28 05:35:54.199578 I | ceph-block-pool-controller: skipping reconcile since operator is still initializing
2024-08-28 05:35:54.509763 I | ceph-spec: parsing mon endpoints: a=172.30.195.199:3300,b=172.30.15.18:3300,c=172.30.195.182:3300
2024-08-28 05:35:54.509856 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2024-08-28 05:35:54.509985 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2024-08-28 05:35:54.543839 E | ceph-object-store-user-controller: failed to reconcile CephObjectStoreUser "openshift-storage/noobaa-ceph-objectstore-user". failed to initialized rgw admin ops client api: failed to create or retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "ocs-storagecluster-cephobjectstore": skipping reconcile since operator is still initializing
2024-08-28 05:35:54.906487 I | ceph-spec: parsing mon endpoints: a=172.30.195.199:3300,b=172.30.15.18:3300,c=172.30.195.182:3300
2024-08-28 05:35:54.906522 I | ceph-cluster-controller: fencing cephfs subvolume "pvc-0ca18ad2-1123-4ff8-8e94-500b565c0dc4" on node "jopinto-clu19-c4pcf-worker-0-jbcww"
2024-08-28 05:35:55.006105 E | ceph-cluster-controller: failed to handle node failure. failed to create network fence for node "jopinto-clu19-c4pcf-worker-0-jbcww".: failed to fence cephfs subvolumes: failed to get ceph status for check active mds: failed to get status. . Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)'): exit status 1
2024-08-28 05:35:55.306810 I | ceph-spec: parsing mon endpoints: a=172.30.195.199:3300,b=172.30.15.18:3300,c=172.30.195.182:3300
2024-08-28 05:35:55.508108 E | ceph-csi: failed to delete CSI-operator Ceph Connection "". cephconnections.csi.ceph.io is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot deletecollection resource "cephconnections" in API group "csi.ceph.io" in the namespace "openshift-storage"
2024-08-28 05:35:55.509410 E | ceph-csi: failed to delete CSI-operator client profile "". clientprofiles.csi.ceph.io is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot deletecollection resource "clientprofiles" in API group "csi.ceph.io" in the namespace "openshift-storage"
2024-08-28 05:35:55.510457 E | ceph-csi: failed to delete CSI-operator Ceph Connection "". cephconnections.csi.ceph.io is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot deletecollection resource "cephconnections" in API group "csi.ceph.io" in the namespace "openshift-storage"
2024-08-28 05:35:55.511470 E | ceph-csi: failed to delete CSI-operator client profile "". clientprofiles.csi.ceph.io is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot deletecollection resource "clientprofiles" in API group "csi.ceph.io" in the namespace "openshift-storage"
2024-08-28 05:35:55.512458 E | ceph-csi: failed to delete CSI-operator driver config "". drivers.csi.ceph.io is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot deletecollection resource "drivers" in API group "csi.ceph.io" in the namespace "openshift-storage"
2024-08-28 05:35:55.513542 E | ceph-csi: failed to delete CSI-operator operator config "". operatorconfigs.csi.ceph.io is forbidden: User "system:serviceaccount:openshift-storage:rook-ceph-system" cannot deletecollection resource "operatorconfigs" in API group "csi.ceph.io" in the namespace "openshift-storage"
2024-08-28 05:35:55.517559 I | ceph-csi: Kubernetes version is 1.30
2024-08-28 05:35:55.920387 I | ceph-csi: skipping csi version check, since unsupported versions are allowed or csi is disabled
2024-08-28 05:35:56.043670 I | ceph-csi: successfully started CSI Ceph RBD driver
2024-08-28 05:35:56.118578 I | ceph-spec: parsing mon endpoints: a=172.30.195.199:3300,b=172.30.15.18:3300,c=172.30.195.182:3300
2024-08-28 05:35:56.118667 I | ceph-fs-subvolumegroup-controller: creating ceph filesystem subvolume group ocs-storagecluster-cephfilesystem-csi in namespace openshift-storage
2024-08-28 05:35:56.118674 I | cephclient: creating cephfs "ocs-storagecluster-cephfilesystem" subvolume group "csi"
2024-08-28 05:35:56.165126 I | ceph-csi: successfully started CSI CephFS driver
2024-08-28 05:35:56.186958 I | ceph-csi: CSIDriver object updated for driver "openshift-storage.rbd.csi.ceph.com"
2024-08-28 05:35:56.200826 I | ceph-csi: CSIDriver object updated for driver "openshift-storage.cephfs.csi.ceph.com"
2024-08-28 05:35:56.200850 I | op-k8sutil: removing daemonset csi-nfsplugin if it exists
2024-08-28 05:35:56.204169 I | op-k8sutil: removing deployment csi-nfsplugin-provisioner if it exists
2024-08-28 05:35:56.302405 I | ceph-fs-subvolumegroup-controller: skipping reconcile since operator is still initializing
2024-08-28 05:35:56.310362 I | ceph-csi: successfully removed CSI NFS driver
2024-08-28 05:35:56.864420 I | ceph-spec: parsing mon endpoints: a=172.30.195.199:3300,b=172.30.15.18:3300,c=172.30.195.182:3300
2024-08-28 05:35:56.965705 I | ceph-block-pool-controller: skipping reconcile since operator is still initializing
2024-08-28 05:35:57.116948 I | ceph-spec: parsing mon endpoints: a=172.30.195.199:3300,b=172.30.15.18:3300,c=172.30.195.182:3300
2024-08-28 05:35:57.117085 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2024-08-28 05:35:57.117357 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2024-08-28 05:35:57.163901 E | ceph-object-store-user-controller: failed to reconcile CephObjectStoreUser "openshift-storage/ocs-storagecluster-cephobjectstoreuser". failed to initialized rgw admin ops client api: failed to create or retrieve rgw admin ops user: failed to create object user "rgw-admin-ops-user". error code 1 for object store "ocs-storagecluster-cephobjectstore": skipping reconcile since operator is still initializing
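For reference, the checks from the steps above can be run with commands along these lines. This is a minimal sketch, assuming the csi-addons NetworkFence CRD (networkfences.csiaddons.openshift.io) is installed and that the rook-ceph-tools toolbox deployment is available in openshift-storage; <node-name> and <fence-name> are placeholders.

```
# Taint the failed node to trigger out-of-service fencing.
oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

# Check whether a NetworkFence CR was created and inspect its state
# (resource name assumes the csi-addons NetworkFence API).
oc get networkfences.csiaddons.openshift.io
oc describe networkfences.csiaddons.openshift.io <fence-name>

# Look for the fencing error in the rook-ceph-operator log.
oc logs -n openshift-storage deploy/rook-ceph-operator | grep -i "network fence"

# The conf_read_file error suggests the operator could not load a ceph.conf when
# running `ceph status`; for comparison, check cluster status from the toolbox
# pod (if it is deployed).
oc rsh -n openshift-storage deploy/rook-ceph-tools ceph status

# To recover the node after the test, remove the taint.
oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```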
Please update the RDT flag/text appropriately.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:8676