Bug 2259668
| Summary: | Network fence with rbd_csi driver gets created upon cephfs volume recovery | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Joy John Pinto <jopinto> |
| Component: | rook | Assignee: | Subham Rai <srai> |
| Status: | CLOSED ERRATA | QA Contact: | Joy John Pinto <jopinto> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.15 | CC: | aaaggarw, ebenahar, mrajanna, muagarwa, odf-bz-bot, sapillai, sheggodu, srai, tnielsen |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.17.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.15.0-142 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-10-30 14:26:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2262070, 2265124 | ||
Hi @Joy, is this the same result that you shared over chat? There you mentioned that the fencing succeeded and the pod got rescheduled on another node too. Can you also share the `oc describe` output for the NetworkFence? It will have a Result field.

(venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME DRIVER CIDRS FENCESTATE AGE RESULT
compute-2 openshift-storage.rbd.csi.ceph.com ["100.64.0.7/32"] Fenced 77s Succeeded
(venv) [jopinto@jopinto ceph-csi]$ oc describe networkfences
Name: compute-2
Namespace:
Labels: <none>
Annotations: <none>
API Version: csiaddons.openshift.io/v1alpha1
Kind: NetworkFence
Metadata:
Creation Timestamp: 2024-01-23T13:05:51Z
Finalizers:
csiaddons.openshift.io/network-fence
Generation: 1
Managed Fields:
API Version: csiaddons.openshift.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:ownerReferences:
.:
k:{"uid":"db2eb584-6ebb-4865-8902-a17241e49ff2"}:
f:spec:
.:
f:cidrs:
f:driver:
f:fenceState:
f:parameters:
.:
f:clusterID:
f:secret:
.:
f:name:
f:namespace:
Manager: rook
Operation: Update
Time: 2024-01-23T13:05:51Z
API Version: csiaddons.openshift.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"csiaddons.openshift.io/network-fence":
Manager: csi-addons-manager
Operation: Update
Time: 2024-01-23T13:05:52Z
API Version: csiaddons.openshift.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:message:
f:result:
Manager: csi-addons-manager
Operation: Update
Subresource: status
Time: 2024-01-23T13:06:13Z
Owner References:
API Version: ceph.rook.io/v1
Block Owner Deletion: true
Controller: true
Kind: CephCluster
Name: ocs-storagecluster-cephcluster
UID: db2eb584-6ebb-4865-8902-a17241e49ff2
Resource Version: 123463
UID: 0e088dd2-64b8-460c-b057-21a9803abb52
Spec:
Cidrs:
100.64.0.7/32
Driver: openshift-storage.rbd.csi.ceph.com
Fence State: Fenced
Parameters:
Cluster ID: openshift-storage
Secret:
Name: rook-csi-rbd-provisioner
Namespace: openshift-storage
Status:
Message: fencing operation successful
Result: Succeeded
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning OwnerRefInvalidNamespace 89s garbage-collector-controller ownerRef [ceph.rook.io/v1/CephCluster, namespace: , name: ocs-storagecluster-cephcluster, uid: db2eb584-6ebb-4865-8902-a17241e49ff2] does not exist in namespace ""
(venv) [jopinto@jopinto ceph-csi]$
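As an aside, the Result and Message fields asked about above can also be read directly with jsonpath instead of scanning the full `oc describe` output; the CR name below is taken from the listing above:

```shell
# Pull only the status fields from the (cluster-scoped) NetworkFence CR.
oc get networkfence compute-2 \
  -o jsonpath='{.status.result}{"\t"}{.status.message}{"\n"}'
```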
Hi @Riya,

After using a different deployment pod YAML (logwriter app pod, https://url.corp.redhat.com/791fe61), the pod got rescheduled on a new node after applying the 'out-of-service=nodeshutdown:NoExecute' taint, and the fencing state became successful. But it is still using the openshift-storage.rbd.csi.ceph.com driver during NetworkFence creation.

(In reply to Joy John Pinto from comment #6)
> Hi @Riya,
>
> After using a different deployment pod yaml (logwriter app pod
> https://url.corp.redhat.com/791fe61) the pod got rescheduled on new node
> after applying 'out-of-service=nodeshutdown:NoExecute' label and fecing
> state became successful, But its still using
> openshift-storage.rbd.csi.ceph.com driver during networkfence creation.

Hi @jopinto, can you also share the PVs that you had for this cluster? Were you having both RBD and CephFS PVs?

Removing the needinfo flag, as a reproduction environment was provided for debugging.

With OCP 4.15 and ODF 4.15.0-139, upon applying the 'noschedule' taint on the failed node with a CephFS volume, a NetworkFence is created but it is in Failed state, and the CIDR list is empty:
[jopinto@jopinto cephfeb13]$ oc get nodes
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 18h v1.28.6+f1618d5
compute-1 NotReady worker 18h v1.28.6+f1618d5
compute-2 Ready worker 18h v1.28.6+f1618d5
control-plane-0 Ready control-plane,master 19h v1.28.6+f1618d5
control-plane-1 Ready control-plane,master 19h v1.28.6+f1618d5
control-plane-2 Ready control-plane,master 19h v1.28.6+f1618d5
[jopinto@jopinto cephfeb13]$ oc get networkfences.csiaddons.openshift.io
NAME DRIVER CIDRS FENCESTATE AGE RESULT
compute-1-cephfs openshift-storage.cephfs.csi.ceph.com [] Fenced 12m Failed
[jopinto@jopinto cephfeb13]$ oc describe networkfence
Name: compute-1-cephfs
Namespace:
Labels: <none>
Annotations: <none>
API Version: csiaddons.openshift.io/v1alpha1
Kind: NetworkFence
Metadata:
Creation Timestamp: 2024-02-14T05:16:36Z
Finalizers:
csiaddons.openshift.io/network-fence
Generation: 1
Managed Fields:
API Version: csiaddons.openshift.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"csiaddons.openshift.io/network-fence":
Manager: csi-addons-manager
Operation: Update
Time: 2024-02-14T05:16:36Z
API Version: csiaddons.openshift.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:spec:
.:
f:cidrs:
f:driver:
f:fenceState:
f:parameters:
.:
f:clusterID:
f:secret:
.:
f:name:
f:namespace:
Manager: rook
Operation: Update
Time: 2024-02-14T05:16:36Z
API Version: csiaddons.openshift.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:message:
f:result:
Manager: csi-addons-manager
Operation: Update
Subresource: status
Time: 2024-02-14T05:19:23Z
Resource Version: 712414
UID: cdef49b2-a9d2-403b-aa89-7e474c99aaed
Spec:
Cidrs:
Driver: openshift-storage.cephfs.csi.ceph.com
Fence State: Fenced
Parameters:
Cluster ID: openshift-storage
Secret:
Name: rook-csi-cephfs-provisioner
Namespace: openshift-storage
Status:
Message: rpc error: code = InvalidArgument desc = CIDR block cannot be empty
Result: Failed
Events: <none>
snippet of ocs-operator log:
2024-02-14 05:16:34.697118 I | ceph-cluster-controller: Found taint: Key=node.kubernetes.io/out-of-service, Value=nodeshutdown on node compute-1
2024-02-14 05:16:34.697153 I | ceph-cluster-controller: volumeInUse after split based on '^' [csi.vsphere.vmware.com 868f32f9-11fb-40f9-bd07-b383de2817f6]
2024-02-14 05:16:34.697158 I | ceph-cluster-controller: volumeInUse after split based on '^' [csi.vsphere.vmware.com e88d2a6d-7871-4940-9aa8-911e4f21d179]
2024-02-14 05:16:34.697163 I | ceph-cluster-controller: volumeInUse after split based on '^' [openshift-storage.cephfs.csi.ceph.com 0001-0011-openshift-storage-0000000000000001-7c9309f4-f5b1-4e7a-a272-756a78a14884]
2024-02-14 05:16:34.697167 I | ceph-cluster-controller: volumeInUse after split based on '^' [openshift-storage.cephfs.csi.ceph.com 0001-0011-openshift-storage-0000000000000001-8524a835-3bbc-4baa-8dbd-4b9b7de26773]
2024-02-14 05:16:35.153415 I | ceph-cluster-controller: node "compute-1" require fencing, found cephFS volumes in use
2024-02-14 05:16:35.866848 I | ceph-spec: parsing mon endpoints: d=172.30.246.217:3300,a=172.30.62.137:3300,c=172.30.4.34:3300
2024-02-14 05:16:35.866944 I | ceph-cluster-controller: fencing cephfs volume "pvc-01775025-8010-4497-9178-b96912f17d50" on node "compute-1"
2024-02-14 05:16:36.618528 W | ceph-cluster-controller: Blocking node IP []
2024-02-14 05:16:36.868706 I | ceph-cluster-controller: successfully created network fence CR for node "compute-1"
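The `volumeInUse after split based on '^'` log lines above come from Rook splitting each entry of the node's `status.volumesInUse` into a CSI driver name and a volume handle, which is how it decides whether CephFS volumes are in use on the node. A minimal shell sketch of that split (illustrative only — Rook itself does this in Go, and the prefix handling here is an assumption):

```shell
# Illustrative only: split a volumesInUse entry of the form
# "kubernetes.io/csi/<driver>^<volumeHandle>" on the '^' separator.
entry="kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com^0001-0011-openshift-storage-0000000000000001-7c9309f4-f5b1-4e7a-a272-756a78a14884"
entry="${entry#kubernetes.io/csi/}"   # drop the kubelet CSI plugin prefix
driver="${entry%%^*}"                 # everything before the first '^'
handle="${entry#*^}"                  # everything after the first '^'
echo "driver: $driver"
echo "handle: $handle"
```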
Moving it back to 'assigned' state as per https://bugzilla.redhat.com/show_bug.cgi?id=2259668#c20

Verified with OCP 4.15 and ODF 4.15.0-144. Upon tainting the node, the NetworkFence gets created and goes to Succeeded state:

[jopinto@jopinto ceph-csi]$ oc adm taint nodes compute-0 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
node/compute-0 tainted
[jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME                                 DRIVER                                  CIDRS               FENCESTATE   AGE    RESULT
compute-0-cephfs-openshift-storage   openshift-storage.cephfs.csi.ceph.com   ["100.64.0.7/32"]   Fenced       4m1s   Succeeded

After untainting the node, the NetworkFence gets deleted, but the corresponding entry in 'ceph osd blocklist ls' still remains:

[jopinto@jopinto ceph-csi]$ oc adm taint nodes compute-0 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
node/compute-0 untainted
[jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
No resources found

sh-5.1$ ceph osd blocklist ls
10.131.0.46:6801/4289200221 2024-02-20T16:30:06.541675+0000
10.131.0.46:6800/4289200221 2024-02-20T16:30:06.541675+0000
100.64.0.7:0/2258816229 2024-02-19T17:27:55.434936+0000
10.128.2.83:6800/2653172858 2024-02-20T16:27:47.800520+0000
10.131.0.46:6801/152410105 2024-02-20T16:27:06.540832+0000
10.128.2.83:6801/2653172858 2024-02-20T16:27:47.800520+0000
10.131.0.46:6800/152410105 2024-02-20T16:27:06.540832+0000

Hence moving it back to assigned state.

I also tested this feature on IBM Power (ppc64le).
Build used: v4.15.0-143.stable
Upon tainting the node on which application pod was scheduled, networkfence got created and went to succeeded state.
[root@rdr-rhcs-bastion-0 ~]# oc get pods -o wide |grep logwriter
logwriter-cephfs-5b99f4dcc8-j5nfn 1/1 Running 0 44m 10.131.0.210 worker-2 <none> <none>
[root@rdr-rhcs-bastion-0 ~]# oc adm taint node worker-2 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
node/worker-2 tainted
[root@rdr-rhcs-bastion-0 ~]# oc get networkfence
NAME DRIVER CIDRS FENCESTATE AGE RESULT
worker-2-cephfs-openshift-storage openshift-storage.cephfs.csi.ceph.com ["100.64.0.5/32"] Fenced 19s Succeeded
[root@rdr-rhcs-bastion-0 ~]# oc describe networkfence worker-2-cephfs-openshift-storage
Name: worker-2-cephfs-openshift-storage
Namespace:
Labels: cephClusterUID=33f2e385-a0b8-4b6a-ae62-ddf76962e1bc
Annotations: <none>
API Version: csiaddons.openshift.io/v1alpha1
Kind: NetworkFence
Metadata:
Creation Timestamp: 2024-02-20T11:35:10Z
Finalizers:
csiaddons.openshift.io/network-fence
Generation: 1
Resource Version: 3892979
UID: 1ade9c9b-ce1d-4487-98a4-e1bb754df6e1
Spec:
Cidrs:
100.64.0.5/32
Driver: openshift-storage.cephfs.csi.ceph.com
Fence State: Fenced
Parameters:
Cluster ID: openshift-storage
Secret:
Name: rook-csi-cephfs-provisioner
Namespace: openshift-storage
Status:
Message: fencing operation successful
Result: Succeeded
Events: <none>
Ceph osd blocklist ls also shows blocklisted CIDR:
sh-5.1$ ceph osd blocklist ls
10.129.2.198:6801/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6801/3519091161 2024-02-20T14:45:28.638578+0000
10.128.2.61:6800/3519091161 2024-02-20T14:45:28.638578+0000
10.129.2.198:6800/929584275 2024-02-20T14:42:59.458096+0000
10.128.2.61:6801/304212021 2024-02-20T14:42:28.637783+0000
10.129.2.198:6801/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.173:6800/2334388931 2024-02-20T12:22:09.194626+0000
100.64.0.5:0/3657037826 2024-02-20T12:35:10.539542+0000 ----->> this one
10.128.2.61:6800/4007831461 2024-02-20T14:24:28.635649+0000
10.129.2.173:0/532969118 2024-02-20T12:22:09.194626+0000
10.128.2.61:6800/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/304212021 2024-02-20T14:42:28.637783+0000
10.128.2.61:6801/3326064042 2024-02-20T14:30:28.636763+0000
10.128.2.61:6800/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6801/2417995690 2024-02-20T14:36:59.459907+0000
10.128.2.61:6800/3326064042 2024-02-20T14:30:28.636763+0000
10.129.2.198:6800/2962405343 2024-02-20T14:24:59.456604+0000
10.129.2.173:0/70648446 2024-02-20T12:22:09.194626+0000
10.128.2.61:6801/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/1111294452 2024-02-20T14:27:59.456099+0000
10.129.2.198:6800/3415358265 2024-02-20T14:30:59.455236+0000
10.129.2.198:6801/3415358265 2024-02-20T14:30:59.455236+0000
10.128.2.61:6801/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6800/1798926241 2024-02-20T14:33:59.456596+0000
10.128.2.61:6800/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.198:6801/929584275 2024-02-20T14:42:59.458096+0000
10.129.2.173:0/2547077582 2024-02-20T12:22:09.194626+0000
10.129.2.198:6801/1798926241 2024-02-20T14:33:59.456596+0000
10.129.2.198:6800/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6800/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.198:6800/1111294452 2024-02-20T14:27:59.456099+0000
10.128.2.61:6801/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.173:6801/2334388931 2024-02-20T12:22:09.194626+0000
10.128.2.61:6801/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/2417995690 2024-02-20T14:36:59.459907+0000
listed 38 entries
After untainting the node, the NetworkFence gets deleted immediately, but the corresponding entry in 'ceph osd blocklist ls' takes time to clear; it took around 10 minutes for the blocklisted CIDR to be removed.
[root@rdr-rhcs-bastion-0 ~]# oc adm taint node worker-2 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
node/worker-2 untainted
sh-5.1$ ceph osd blocklist ls
10.129.2.198:6801/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6801/3519091161 2024-02-20T14:45:28.638578+0000
10.128.2.61:6800/3519091161 2024-02-20T14:45:28.638578+0000
10.129.2.198:6800/929584275 2024-02-20T14:42:59.458096+0000
10.128.2.61:6801/304212021 2024-02-20T14:42:28.637783+0000
10.129.2.198:6801/1005064452 2024-02-20T14:39:59.455103+0000
100.64.0.5:0/3657037826 2024-02-20T12:35:10.539542+0000
10.128.2.61:6800/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/304212021 2024-02-20T14:42:28.637783+0000
10.128.2.61:6801/3326064042 2024-02-20T14:30:28.636763+0000
10.128.2.61:6800/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6801/2417995690 2024-02-20T14:36:59.459907+0000
10.128.2.61:6800/3326064042 2024-02-20T14:30:28.636763+0000
10.129.2.198:6800/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/1111294452 2024-02-20T14:27:59.456099+0000
10.129.2.198:6800/3415358265 2024-02-20T14:30:59.455236+0000
10.129.2.198:6801/3415358265 2024-02-20T14:30:59.455236+0000
10.128.2.61:6801/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6800/1798926241 2024-02-20T14:33:59.456596+0000
10.128.2.61:6800/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.198:6801/929584275 2024-02-20T14:42:59.458096+0000
10.129.2.198:6801/1798926241 2024-02-20T14:33:59.456596+0000
10.129.2.198:6800/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6800/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.198:6800/1111294452 2024-02-20T14:27:59.456099+0000
10.128.2.61:6801/484956839 2024-02-20T14:36:28.635064+0000
10.128.2.61:6801/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/2417995690 2024-02-20T14:36:59.459907+0000
listed 33 entries
When I checked after 10 minutes, it had been removed:
sh-5.1$ ceph osd blocklist ls
10.129.2.198:6801/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6801/3519091161 2024-02-20T14:45:28.638578+0000
10.128.2.61:6800/3519091161 2024-02-20T14:45:28.638578+0000
10.129.2.198:6800/929584275 2024-02-20T14:42:59.458096+0000
10.128.2.61:6801/304212021 2024-02-20T14:42:28.637783+0000
10.129.2.198:6801/1005064452 2024-02-20T14:39:59.455103+0000
10.128.2.61:6800/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/304212021 2024-02-20T14:42:28.637783+0000
10.128.2.61:6801/3326064042 2024-02-20T14:30:28.636763+0000
10.128.2.61:6800/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6801/2417995690 2024-02-20T14:36:59.459907+0000
10.128.2.61:6800/3326064042 2024-02-20T14:30:28.636763+0000
10.129.2.198:6800/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/1111294452 2024-02-20T14:27:59.456099+0000
10.129.2.198:6800/3415358265 2024-02-20T14:30:59.455236+0000
10.129.2.198:6801/3415358265 2024-02-20T14:30:59.455236+0000
10.128.2.61:6801/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6800/1798926241 2024-02-20T14:33:59.456596+0000
10.128.2.61:6800/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.198:6801/929584275 2024-02-20T14:42:59.458096+0000
10.129.2.198:6801/1798926241 2024-02-20T14:33:59.456596+0000
10.129.2.198:6800/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6800/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.198:6800/1111294452 2024-02-20T14:27:59.456099+0000
10.128.2.61:6801/484956839 2024-02-20T14:36:28.635064+0000
10.128.2.61:6801/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/2417995690 2024-02-20T14:36:59.459907+0000
listed 32 entries
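Since the blocklist entry can linger for several minutes after the NetworkFence is deleted, it may be useful to poll for it, or remove a stale entry by hand from the Ceph toolbox. A sketch of both, with the toolbox deployment name, namespace, CIDR, and the exact blocklist address taken from the session above as assumptions:

```shell
# Poll the blocklist (via the rook-ceph toolbox) until the node CIDR
# disappears; the 30s interval and names below are assumptions.
CIDR="100.64.0.5"
while oc -n openshift-storage exec deploy/rook-ceph-tools -- \
      ceph osd blocklist ls 2>/dev/null | grep -q "$CIDR"; do
  echo "still blocklisted: $CIDR"
  sleep 30
done
echo "blocklist entry for $CIDR cleared"

# Alternatively, remove a specific stale entry by its exact address
# (example address from the listing above).
oc -n openshift-storage exec deploy/rook-ceph-tools -- \
  ceph osd blocklist rm 100.64.0.5:0/3657037826
```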
Are there any blockers to providing devel ack for this BZ? If not, please provide the devel ack.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676
Description of problem (please be detailed as possible and provide log snippets):

Network fence with rbd_csi driver gets created upon cephfs volume recovery.

During CephFS volume recovery, upon tainting the node that has the CephFS volume with the nodeshutdown:NoExecute taint, the NetworkFence that subsequently gets created uses openshift-storage.rbd.csi.ceph.com as the driver, and the fencing result shows as Failed:

(venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME        DRIVER                               CIDRS               FENCESTATE   AGE     RESULT
compute-1   openshift-storage.rbd.csi.ceph.com   ["100.64.0.5/32"]   Fenced       4m36s   Failed

Version of all relevant components (if applicable):
OCP 4.15
ODF 4.15.0-120

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
NA

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install OpenShift Data Foundation and deploy an app pod on the same node as the rook-ceph operator pod.
2. Shut down the node on which the CephFS RWO pod is deployed.
3. Once the node is down, add the taint:
   oc taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
   Wait for some time (if the application pod and the rook operator are on the same node, wait a bit longer), then check the NetworkFence CR status and make sure its state is Fenced:
   (venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
   NAME        DRIVER                               CIDRS               FENCESTATE   AGE     RESULT
   compute-1   openshift-storage.rbd.csi.ceph.com   ["100.64.0.5/32"]   Fenced       4m36s   Failed
4. Wait for the pod to come up on a new node.
Actual results:
Network fence uses the RBD CSI driver and the fence state shows as Failed.

Expected results:
Network fence should use the CephFS CSI driver.

Additional info:

(venv) [jopinto@jopinto ceph-csi]$ oc get pods -o wide
NAME                               READY   STATUS              RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
wordpress-mysql-76c9c75d64-jfmsq   0/1     ContainerCreating   0          4m26s   <none>        compute-2   <none>           <none>
wordpress-mysql-76c9c75d64-wlwc8   0/1     Terminating         0          9m52s   10.131.0.40   compute-1   <none>           <none>
(venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME        DRIVER                               CIDRS               FENCESTATE   AGE     RESULT
compute-1   openshift-storage.rbd.csi.ceph.com   ["100.64.0.5/32"]   Fenced       4m36s   Failed
(venv) [jopinto@jopinto ceph-csi]$ oc describe pvc mysql-pv-claim1
Name:          mysql-pv-claim1
Namespace:     ca
StorageClass:  ocs-storagecluster-cephfs
Status:        Bound
Volume:        pvc-a5ef834f-012f-4ffe-93d1-1681f7c54338
Labels:        app=wordpress
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       wordpress-mysql-76c9c75d64-jfmsq
               wordpress-mysql-76c9c75d64-wlwc8
Events:
  Type    Reason                 Age                From                         Message
  ----    ------                 ----               ----                         -------
  Normal  ExternalProvisioning   11m (x2 over 11m)  persistentvolume-controller  Waiting for a volume to be created either by the external provisioner 'openshift-storage.cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  Provisioning           11m                openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-99dddf64d-xn8bz_67f4c779-e5ba-48b7-b51c-9339a0352eaa  External provisioner is provisioning volume for claim "ca/mysql-pv-claim1"
  Normal  ProvisioningSucceeded  11m                openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-99dddf64d-xn8bz_67f4c779-e5ba-48b7-b51c-9339a0352eaa  Successfully provisioned volume pvc-a5ef834f-012f-4ffe-93d1-1681f7c54338

Deployment pod and PVC YAML:

(venv) [jopinto@jopinto ceph-csi]$ cat mysql_ceph.yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  ports:
    - port: 3306
  selector:
    app: wordpress
    tier: mysql
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim1
  labels:
    app: wordpress
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
    tier: mysql
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      containers:
        - image: mysql:5.6
          name: mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pv-claim1