2259668 – Network fence with rbd_csi driver gets created upon cephfs volume recovery

Summary: Network fence with rbd_csi driver gets created upon cephfs volume recovery

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	rook
Sub Component:
Version:	4.15
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	ODF 4.17.0
Assignee:	Subham Rai
QA Contact:	Joy John Pinto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2262070 2265124

Reported:	2024-01-22 15:56 UTC by Joy John Pinto
Modified:	2024-10-30 14:26 UTC
CC List:	9 users
Fixed In Version:	4.15.0-142
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-10-30 14:26:14 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	red-hat-storage rook pull 568	None	open	Bug 2259668: csi: update network fence CR name	2024-02-06 13:32:43 UTC
Github	red-hat-storage rook pull 574	None	open	BUG 2259668: core: Continue processing PVs for network fencing when no node IPs found	2024-02-14 19:46:46 UTC
Github	rook rook pull 13768	None	Merged	core: Continue processing PVs for network fencing when no node IPs found	2024-02-20 16:59:59 UTC
Red Hat Product Errata	RHSA-2024:8676	None	None	None	2024-10-30 14:26:27 UTC

Description Joy John Pinto 2024-01-22 15:56:19 UTC

Description of problem (please be as detailed as possible and provide log
snippets):
Network fence with rbd_csi driver gets created upon cephfs volume recovery

During CephFS volume recovery, after the node hosting the CephFS volume is tainted with node.kubernetes.io/out-of-service=nodeshutdown:NoExecute, the network fence that gets created uses openshift-storage.rbd.csi.ceph.com as the driver, and the fencing result shows as Failed.

(venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME        DRIVER                               CIDRS               FENCESTATE   AGE     RESULT
compute-1   openshift-storage.rbd.csi.ceph.com   ["100.64.0.5/32"]   Fenced       4m36s   Failed

Version of all relevant components (if applicable):

OCP 4.15
ODF 4.15.0-120

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
NA

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install OpenShift Data Foundation and deploy an app pod on the same node as the Rook Ceph operator pod.
2. Shut down the node on which the CephFS RWO pod is deployed.
3. Once the node is down, add the taint:
   oc adm taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
   Wait for some time (if the application pod and the Rook operator are on the same node, wait a bit longer), then check the NetworkFence CR status and make sure its state is Fenced.

(venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME        DRIVER                               CIDRS               FENCESTATE   AGE     RESULT
compute-1   openshift-storage.rbd.csi.ceph.com   ["100.64.0.5/32"]   Fenced       4m36s   Failed

4. Wait for pod to come up on new node. 


Actual results:
The network fence uses the RBD CSI driver and the fencing result shows as Failed.

Expected results:
Network fence should use cephfs csi driver 

Additional info:
(venv) [jopinto@jopinto ceph-csi]$ oc get pods -o wide
NAME                               READY   STATUS              RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
wordpress-mysql-76c9c75d64-jfmsq   0/1     ContainerCreating   0          4m26s   <none>        compute-2   <none>           <none>
wordpress-mysql-76c9c75d64-wlwc8   0/1     Terminating         0          9m52s   10.131.0.40   compute-1   <none>           <none>

(venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME        DRIVER                               CIDRS               FENCESTATE   AGE     RESULT
compute-1   openshift-storage.rbd.csi.ceph.com   ["100.64.0.5/32"]   Fenced       4m36s   Failed

(venv) [jopinto@jopinto ceph-csi]$ oc describe pvc mysql-pv-claim1
Name:          mysql-pv-claim1
Namespace:     ca
StorageClass:  ocs-storagecluster-cephfs
Status:        Bound
Volume:        pvc-a5ef834f-012f-4ffe-93d1-1681f7c54338
Labels:        app=wordpress
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       wordpress-mysql-76c9c75d64-jfmsq
               wordpress-mysql-76c9c75d64-wlwc8
Events:
  Type    Reason                 Age                From                                                                                                                     Message
  ----    ------                 ----               ----                                                                                                                     -------
  Normal  ExternalProvisioning   11m (x2 over 11m)  persistentvolume-controller                                                                                              Waiting for a volume to be created either by the external provisioner 'openshift-storage.cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal  Provisioning           11m                openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-99dddf64d-xn8bz_67f4c779-e5ba-48b7-b51c-9339a0352eaa  External provisioner is provisioning volume for claim "ca/mysql-pv-claim1"
  Normal  ProvisioningSucceeded  11m                openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-99dddf64d-xn8bz_67f4c779-e5ba-48b7-b51c-9339a0352eaa  Successfully provisioned volume pvc-a5ef834f-012f-4ffe-93d1-1681f7c54338
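
A quick way to confirm the mismatch reported here is to compare the DRIVER column of the NetworkFence listing with the provisioner recorded on the PVC. The Python sketch below is only illustrative: the helper names fence_rows and mismatched_fences are hypothetical, and it assumes the whitespace-separated table layout shown in the oc output above.

```python
def fence_rows(table: str) -> list[dict[str, str]]:
    """Parse the whitespace-separated table printed by `oc get networkfences`."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    # Each cell in the listing is free of spaces, so a plain split is enough.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def mismatched_fences(table: str, provisioner: str) -> list[str]:
    """Return the names of fences whose DRIVER differs from the PVC provisioner."""
    return [row["NAME"] for row in fence_rows(table) if row["DRIVER"] != provisioner]


# Output copied from this report.
OBSERVED = """\
NAME        DRIVER                               CIDRS               FENCESTATE   AGE     RESULT
compute-1   openshift-storage.rbd.csi.ceph.com   ["100.64.0.5/32"]   Fenced       4m36s   Failed
"""

# Provisioner taken from the PVC annotations above.
print(mismatched_fences(OBSERVED, "openshift-storage.cephfs.csi.ceph.com"))
# prints ['compute-1']
```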


deployment pod and pvc yaml:
(venv) [jopinto@jopinto ceph-csi]$ cat mysql_ceph.yaml 
apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  ports:
    - port: 3306
  selector:
    app: wordpress
    tier: mysql
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim1
  labels:
    app: wordpress
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
    tier: mysql
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      containers:
        - image: mysql:5.6
          name: mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pv-claim1

Comment 2 Riya SInghal 2024-01-22 16:45:33 UTC

Hi @Joy
Is this the same result that you shared over chat?
There you mentioned that the fencing succeeded and the pod got rescheduled on another node too.

Comment 3 Riya SInghal 2024-01-22 16:51:09 UTC

Can you also share the oc describe output for the network fence? It will have a result field.

Comment 5 Joy John Pinto 2024-01-23 13:09:01 UTC

(venv) [jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io
NAME        DRIVER                               CIDRS               FENCESTATE   AGE   RESULT
compute-2   openshift-storage.rbd.csi.ceph.com   ["100.64.0.7/32"]   Fenced       77s   Succeeded
(venv) [jopinto@jopinto ceph-csi]$ oc describe networkfences
Name:         compute-2
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  csiaddons.openshift.io/v1alpha1
Kind:         NetworkFence
Metadata:
  Creation Timestamp:  2024-01-23T13:05:51Z
  Finalizers:
    csiaddons.openshift.io/network-fence
  Generation:  1
  Managed Fields:
    API Version:  csiaddons.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"db2eb584-6ebb-4865-8902-a17241e49ff2"}:
      f:spec:
        .:
        f:cidrs:
        f:driver:
        f:fenceState:
        f:parameters:
          .:
          f:clusterID:
        f:secret:
          .:
          f:name:
          f:namespace:
    Manager:      rook
    Operation:    Update
    Time:         2024-01-23T13:05:51Z
    API Version:  csiaddons.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"csiaddons.openshift.io/network-fence":
    Manager:      csi-addons-manager
    Operation:    Update
    Time:         2024-01-23T13:05:52Z
    API Version:  csiaddons.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:message:
        f:result:
    Manager:      csi-addons-manager
    Operation:    Update
    Subresource:  status
    Time:         2024-01-23T13:06:13Z
  Owner References:
    API Version:           ceph.rook.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  CephCluster
    Name:                  ocs-storagecluster-cephcluster
    UID:                   db2eb584-6ebb-4865-8902-a17241e49ff2
  Resource Version:        123463
  UID:                     0e088dd2-64b8-460c-b057-21a9803abb52
Spec:
  Cidrs:
    100.64.0.7/32
  Driver:       openshift-storage.rbd.csi.ceph.com
  Fence State:  Fenced
  Parameters:
    Cluster ID:  openshift-storage
  Secret:
    Name:       rook-csi-rbd-provisioner
    Namespace:  openshift-storage
Status:
  Message:  fencing operation successful
  Result:   Succeeded
Events:
  Type     Reason                    Age   From                          Message
  ----     ------                    ----  ----                          -------
  Warning  OwnerRefInvalidNamespace  89s   garbage-collector-controller  ownerRef [ceph.rook.io/v1/CephCluster, namespace: , name: ocs-storagecluster-cephcluster, uid: db2eb584-6ebb-4865-8902-a17241e49ff2] does not exist in namespace ""
(venv) [jopinto@jopinto ceph-csi]$

Comment 6 Joy John Pinto 2024-01-23 13:18:43 UTC

Hi @Riya, 

After using a different deployment pod YAML (logwriter app pod https://url.corp.redhat.com/791fe61), the pod got rescheduled on a new node after applying the 'out-of-service=nodeshutdown:NoExecute' taint and the fencing result became Succeeded, but it is still using the openshift-storage.rbd.csi.ceph.com driver during networkfence creation.

Comment 9 Riya SInghal 2024-01-24 09:38:57 UTC

(In reply to Joy John Pinto from comment #6)
> Hi @Riya, 
> 
> After using a different deployment pod YAML (logwriter app pod
> https://url.corp.redhat.com/791fe61), the pod got rescheduled on a new node
> after applying the 'out-of-service=nodeshutdown:NoExecute' taint and the
> fencing result became Succeeded, but it is still using the
> openshift-storage.rbd.csi.ceph.com driver during networkfence creation.

Hi @jopinto
Can you also share the PVs that you had on this cluster?
Did you have both RBD and CephFS PVs?

Comment 15 Joy John Pinto 2024-02-09 05:03:18 UTC

Removing the needinfo flag, as a reproduction environment was provided for debugging.

Comment 19 Joy John Pinto 2024-02-14 07:06:38 UTC

With OCP 4.15 and ODF 4.15.0-139, upon applying the 'noschedule' taint on the failed node with a CephFS volume, the networkfence is created but it is in Failed state. Also, the CIDR list is empty.

[jopinto@jopinto cephfeb13]$ oc get nodes
NAME              STATUS     ROLES                  AGE   VERSION
compute-0         Ready      worker                 18h   v1.28.6+f1618d5
compute-1         NotReady   worker                 18h   v1.28.6+f1618d5
compute-2         Ready      worker                 18h   v1.28.6+f1618d5
control-plane-0   Ready      control-plane,master   19h   v1.28.6+f1618d5
control-plane-1   Ready      control-plane,master   19h   v1.28.6+f1618d5
control-plane-2   Ready      control-plane,master   19h   v1.28.6+f1618d5


[jopinto@jopinto cephfeb13]$ oc get networkfences.csiaddons.openshift.io  
NAME               DRIVER                                  CIDRS   FENCESTATE   AGE   RESULT
compute-1-cephfs   openshift-storage.cephfs.csi.ceph.com   []      Fenced       12m   Failed

[jopinto@jopinto cephfeb13]$ oc describe networkfence
Name:         compute-1-cephfs
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  csiaddons.openshift.io/v1alpha1
Kind:         NetworkFence
Metadata:
  Creation Timestamp:  2024-02-14T05:16:36Z
  Finalizers:
    csiaddons.openshift.io/network-fence
  Generation:  1
  Managed Fields:
    API Version:  csiaddons.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"csiaddons.openshift.io/network-fence":
    Manager:      csi-addons-manager
    Operation:    Update
    Time:         2024-02-14T05:16:36Z
    API Version:  csiaddons.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:cidrs:
        f:driver:
        f:fenceState:
        f:parameters:
          .:
          f:clusterID:
        f:secret:
          .:
          f:name:
          f:namespace:
    Manager:      rook
    Operation:    Update
    Time:         2024-02-14T05:16:36Z
    API Version:  csiaddons.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:message:
        f:result:
    Manager:         csi-addons-manager
    Operation:       Update
    Subresource:     status
    Time:            2024-02-14T05:19:23Z
  Resource Version:  712414
  UID:               cdef49b2-a9d2-403b-aa89-7e474c99aaed
Spec:
  Cidrs:
  Driver:       openshift-storage.cephfs.csi.ceph.com
  Fence State:  Fenced
  Parameters:
    Cluster ID:  openshift-storage
  Secret:
    Name:       rook-csi-cephfs-provisioner
    Namespace:  openshift-storage
Status:
  Message:  rpc error: code = InvalidArgument desc = CIDR block cannot be empty
  Result:   Failed
Events:     <none>
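
The failure above comes from a fence being created with an empty CIDR list. The linked upstream change is titled "Continue processing PVs for network fencing when no node IPs found"; the Python sketch below illustrates that idea only and is not the Rook implementation (which is written in Go). The function names, the PV names, and the PV-to-IP mapping are all hypothetical.

```python
import ipaddress


def host_cidrs(ips: list[str]) -> list[str]:
    """Turn node IPs into single-host CIDR blocks (100.64.0.7 -> 100.64.0.7/32)."""
    return [str(ipaddress.ip_network(ip)) for ip in ips]


def first_fenceable(pv_to_ips: dict[str, list[str]]):
    """Return (pv, cidrs) for the first PV that yields at least one node IP.

    PVs with no IPs are skipped so that a fence is never requested with an
    empty CIDR list; None is returned if no PV yields an IP.
    """
    for pv, ips in pv_to_ips.items():
        cidrs = host_cidrs(ips)
        if not cidrs:
            continue  # keep looking instead of fencing with an empty list
        return pv, cidrs
    return None


# Hypothetical input: the first PV resolves to no IPs, the second to one.
print(first_fenceable({"pvc-a": [], "pvc-b": ["100.64.0.7"]}))
# prints ('pvc-b', ['100.64.0.7/32'])
```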



snippet of ocs-operator log:
2024-02-14 05:16:34.697118 I | ceph-cluster-controller: Found taint: Key=node.kubernetes.io/out-of-service, Value=nodeshutdown on node compute-1
2024-02-14 05:16:34.697153 I | ceph-cluster-controller: volumeInUse after split based on '^' [csi.vsphere.vmware.com 868f32f9-11fb-40f9-bd07-b383de2817f6]
2024-02-14 05:16:34.697158 I | ceph-cluster-controller: volumeInUse after split based on '^' [csi.vsphere.vmware.com e88d2a6d-7871-4940-9aa8-911e4f21d179]
2024-02-14 05:16:34.697163 I | ceph-cluster-controller: volumeInUse after split based on '^' [openshift-storage.cephfs.csi.ceph.com 0001-0011-openshift-storage-0000000000000001-7c9309f4-f5b1-4e7a-a272-756a78a14884]
2024-02-14 05:16:34.697167 I | ceph-cluster-controller: volumeInUse after split based on '^' [openshift-storage.cephfs.csi.ceph.com 0001-0011-openshift-storage-0000000000000001-8524a835-3bbc-4baa-8dbd-4b9b7de26773]
2024-02-14 05:16:35.153415 I | ceph-cluster-controller: node "compute-1" require fencing, found cephFS volumes in use
2024-02-14 05:16:35.866848 I | ceph-spec: parsing mon endpoints: d=172.30.246.217:3300,a=172.30.62.137:3300,c=172.30.4.34:3300
2024-02-14 05:16:35.866944 I | ceph-cluster-controller: fencing cephfs volume "pvc-01775025-8010-4497-9178-b96912f17d50" on node "compute-1"
2024-02-14 05:16:36.618528 W | ceph-cluster-controller: Blocking node IP []
2024-02-14 05:16:36.868706 I | ceph-cluster-controller: successfully created network fence CR for node "compute-1"
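
The "volumeInUse after split" lines suggest that the operator splits each in-use volume entry on '^' into a driver name and a volume handle. A minimal Python sketch of that grouping follows. It assumes the entries carry the usual Kubernetes "kubernetes.io/csi/" prefix (the prefix itself is not shown in the log), and the function names are hypothetical.

```python
from collections import defaultdict

CSI_PREFIX = "kubernetes.io/csi/"  # assumed prefix, not visible in the log


def volumes_by_driver(volumes_in_use: list[str]) -> dict[str, list[str]]:
    """Group volume handles by CSI driver name, splitting each entry on '^'."""
    grouped = defaultdict(list)
    for entry in volumes_in_use:
        if not entry.startswith(CSI_PREFIX):
            continue  # not a CSI volume
        driver, sep, handle = entry[len(CSI_PREFIX):].partition("^")
        if sep:
            grouped[driver].append(handle)
    return dict(grouped)


def ceph_drivers(grouped: dict[str, list[str]]) -> list[str]:
    """Keep only the Ceph CSI drivers; other drivers need no Ceph fence."""
    return [d for d in grouped if d.endswith(".csi.ceph.com")]


# Driver names and handles copied from the log; the prefix is the assumption.
IN_USE = [
    "kubernetes.io/csi/csi.vsphere.vmware.com^868f32f9-11fb-40f9-bd07-b383de2817f6",
    "kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com^"
    "0001-0011-openshift-storage-0000000000000001-7c9309f4-f5b1-4e7a-a272-756a78a14884",
]

print(ceph_drivers(volumes_by_driver(IN_USE)))
# prints ['openshift-storage.cephfs.csi.ceph.com']
```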

Comment 21 Joy John Pinto 2024-02-14 10:02:25 UTC

Moving it back to 'assigned' state as per https://bugzilla.redhat.com/show_bug.cgi?id=2259668#c20

Comment 25 Joy John Pinto 2024-02-19 16:53:19 UTC

Verified with OCP 4.15 and ODF 4.15.0-144.

Upon tainting the node, the networkfence gets created and goes to Succeeded state:

[jopinto@jopinto ceph-csi]$ oc adm taint nodes compute-0 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
node/compute-0 tainted

[jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io  
NAME                                 DRIVER                                  CIDRS               FENCESTATE   AGE    RESULT
compute-0-cephfs-openshift-storage   openshift-storage.cephfs.csi.ceph.com   ["100.64.0.7/32"]   Fenced       4m1s   Succeeded
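
Across the builds in this report the CR name changed from "compute-1" to "compute-1-cephfs" and then to "compute-0-cephfs-openshift-storage". The Python sketch below shows one naming scheme consistent with the last form; it is inferred from the output only. Whether the final segment is the cluster ID or the namespace cannot be told from this report (both are "openshift-storage" here), and the "rbd" branch is an assumption, since no RBD fence with the new naming appears in this bug.

```python
def fence_name(node: str, driver: str, cluster_id: str) -> str:
    """Build a NetworkFence name of the form <node>-<kind>-<cluster_id>.

    Hypothetical helper: including the driver kind keeps an RBD fence and a
    CephFS fence for the same node from sharing one CR name.
    """
    kind = "cephfs" if ".cephfs." in driver else "rbd"  # "rbd" is assumed
    return f"{node}-{kind}-{cluster_id}"


print(fence_name("compute-0", "openshift-storage.cephfs.csi.ceph.com", "openshift-storage"))
# prints compute-0-cephfs-openshift-storage
```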

After untainting the node, the networkfence gets deleted, but the corresponding entry in 'ceph osd blocklist ls' still remains:

[jopinto@jopinto ceph-csi]$ oc adm taint nodes compute-0 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
node/compute-0 untainted

[jopinto@jopinto ceph-csi]$ oc get networkfences.csiaddons.openshift.io  
No resources found
[jopinto@jopinto ceph-csi]$ 

sh-5.1$ ceph osd blocklist ls
10.131.0.46:6801/4289200221 2024-02-20T16:30:06.541675+0000
10.131.0.46:6800/4289200221 2024-02-20T16:30:06.541675+0000
100.64.0.7:0/2258816229 2024-02-19T17:27:55.434936+0000
10.128.2.83:6800/2653172858 2024-02-20T16:27:47.800520+0000
10.131.0.46:6801/152410105 2024-02-20T16:27:06.540832+0000
10.128.2.83:6801/2653172858 2024-02-20T16:27:47.800520+0000
10.131.0.46:6800/152410105 2024-02-20T16:27:06.540832+0000

Hence moving it back to assigned state

Comment 26 Aaruni Aggarwal 2024-02-20 12:50:58 UTC

I also tested this feature on IBM Power (ppc64le)

Build used: v4.15.0-143.stable

Upon tainting the node on which application pod was scheduled, networkfence got created and went to succeeded state. 

[root@rdr-rhcs-bastion-0 ~]# oc get pods -o wide |grep logwriter
logwriter-cephfs-5b99f4dcc8-j5nfn                                 1/1     Running     0             44m   10.131.0.210    worker-2   <none>           <none>

[root@rdr-rhcs-bastion-0 ~]# oc adm taint node worker-2 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
node/worker-2 tainted

[root@rdr-rhcs-bastion-0 ~]# oc get networkfence
NAME                                DRIVER                                  CIDRS               FENCESTATE   AGE   RESULT
worker-2-cephfs-openshift-storage   openshift-storage.cephfs.csi.ceph.com   ["100.64.0.5/32"]   Fenced       19s   Succeeded

[root@rdr-rhcs-bastion-0 ~]# oc describe networkfence worker-2-cephfs-openshift-storage
Name:         worker-2-cephfs-openshift-storage
Namespace:
Labels:       cephClusterUID=33f2e385-a0b8-4b6a-ae62-ddf76962e1bc
Annotations:  <none>
API Version:  csiaddons.openshift.io/v1alpha1
Kind:         NetworkFence
Metadata:
  Creation Timestamp:  2024-02-20T11:35:10Z
  Finalizers:
    csiaddons.openshift.io/network-fence
  Generation:        1
  Resource Version:  3892979
  UID:               1ade9c9b-ce1d-4487-98a4-e1bb754df6e1
Spec:
  Cidrs:
    100.64.0.5/32
  Driver:       openshift-storage.cephfs.csi.ceph.com
  Fence State:  Fenced
  Parameters:
    Cluster ID:  openshift-storage
  Secret:
    Name:       rook-csi-cephfs-provisioner
    Namespace:  openshift-storage
Status:
  Message:  fencing operation successful
  Result:   Succeeded
Events:     <none>
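
For reference, the spec fields visible in the describe output can be assembled into a NetworkFence manifest as sketched below in Python. The field names are taken from the managed-fields listing earlier in this report (cidrs, driver, fenceState, parameters.clusterID, secret.name, secret.namespace). The helper name is hypothetical, and in this scenario the CR is created by Rook rather than by hand.

```python
import json


def network_fence_manifest(name, driver, cidrs, cluster_id, secret_name, secret_namespace):
    """Return a dict shaped like the NetworkFence CR described above."""
    if not cidrs:
        # The CSI side rejects an empty list ("CIDR block cannot be empty").
        raise ValueError("cidrs must not be empty")
    return {
        "apiVersion": "csiaddons.openshift.io/v1alpha1",
        "kind": "NetworkFence",
        "metadata": {"name": name},
        "spec": {
            "cidrs": list(cidrs),
            "driver": driver,
            "fenceState": "Fenced",
            "parameters": {"clusterID": cluster_id},
            "secret": {"name": secret_name, "namespace": secret_namespace},
        },
    }


# Values copied from the describe output above.
manifest = network_fence_manifest(
    name="worker-2-cephfs-openshift-storage",
    driver="openshift-storage.cephfs.csi.ceph.com",
    cidrs=["100.64.0.5/32"],
    cluster_id="openshift-storage",
    secret_name="rook-csi-cephfs-provisioner",
    secret_namespace="openshift-storage",
)
print(json.dumps(manifest, indent=2))
```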

'ceph osd blocklist ls' also shows the blocklisted CIDR:

sh-5.1$ ceph osd blocklist ls
10.129.2.198:6801/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6801/3519091161 2024-02-20T14:45:28.638578+0000
10.128.2.61:6800/3519091161 2024-02-20T14:45:28.638578+0000
10.129.2.198:6800/929584275 2024-02-20T14:42:59.458096+0000
10.128.2.61:6801/304212021 2024-02-20T14:42:28.637783+0000
10.129.2.198:6801/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.173:6800/2334388931 2024-02-20T12:22:09.194626+0000
100.64.0.5:0/3657037826 2024-02-20T12:35:10.539542+0000         ----->> this one
10.128.2.61:6800/4007831461 2024-02-20T14:24:28.635649+0000
10.129.2.173:0/532969118 2024-02-20T12:22:09.194626+0000
10.128.2.61:6800/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/304212021 2024-02-20T14:42:28.637783+0000
10.128.2.61:6801/3326064042 2024-02-20T14:30:28.636763+0000
10.128.2.61:6800/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6801/2417995690 2024-02-20T14:36:59.459907+0000
10.128.2.61:6800/3326064042 2024-02-20T14:30:28.636763+0000
10.129.2.198:6800/2962405343 2024-02-20T14:24:59.456604+0000
10.129.2.173:0/70648446 2024-02-20T12:22:09.194626+0000
10.128.2.61:6801/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/1111294452 2024-02-20T14:27:59.456099+0000
10.129.2.198:6800/3415358265 2024-02-20T14:30:59.455236+0000
10.129.2.198:6801/3415358265 2024-02-20T14:30:59.455236+0000
10.128.2.61:6801/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6800/1798926241 2024-02-20T14:33:59.456596+0000
10.128.2.61:6800/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.198:6801/929584275 2024-02-20T14:42:59.458096+0000
10.129.2.173:0/2547077582 2024-02-20T12:22:09.194626+0000
10.129.2.198:6801/1798926241 2024-02-20T14:33:59.456596+0000
10.129.2.198:6800/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6800/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.198:6800/1111294452 2024-02-20T14:27:59.456099+0000
10.128.2.61:6801/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.173:6801/2334388931 2024-02-20T12:22:09.194626+0000
10.128.2.61:6801/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/2417995690 2024-02-20T14:36:59.459907+0000
listed 38 entries


After untainting the node, the networkfence gets deleted immediately, but the corresponding entry in 'ceph osd blocklist ls' takes time to clear.
It took around 10 minutes for the blocklisted CIDR to be removed.

[root@rdr-rhcs-bastion-0 ~]# oc adm taint node worker-2 node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
node/worker-2 untainted

sh-5.1$ ceph osd blocklist ls
10.129.2.198:6801/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6801/3519091161 2024-02-20T14:45:28.638578+0000
10.128.2.61:6800/3519091161 2024-02-20T14:45:28.638578+0000
10.129.2.198:6800/929584275 2024-02-20T14:42:59.458096+0000
10.128.2.61:6801/304212021 2024-02-20T14:42:28.637783+0000
10.129.2.198:6801/1005064452 2024-02-20T14:39:59.455103+0000
100.64.0.5:0/3657037826 2024-02-20T12:35:10.539542+0000
10.128.2.61:6800/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/304212021 2024-02-20T14:42:28.637783+0000
10.128.2.61:6801/3326064042 2024-02-20T14:30:28.636763+0000
10.128.2.61:6800/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6801/2417995690 2024-02-20T14:36:59.459907+0000
10.128.2.61:6800/3326064042 2024-02-20T14:30:28.636763+0000
10.129.2.198:6800/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/1111294452 2024-02-20T14:27:59.456099+0000
10.129.2.198:6800/3415358265 2024-02-20T14:30:59.455236+0000
10.129.2.198:6801/3415358265 2024-02-20T14:30:59.455236+0000
10.128.2.61:6801/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6800/1798926241 2024-02-20T14:33:59.456596+0000
10.128.2.61:6800/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.198:6801/929584275 2024-02-20T14:42:59.458096+0000
10.129.2.198:6801/1798926241 2024-02-20T14:33:59.456596+0000
10.129.2.198:6800/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6800/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.198:6800/1111294452 2024-02-20T14:27:59.456099+0000
10.128.2.61:6801/484956839 2024-02-20T14:36:28.635064+0000
10.128.2.61:6801/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/2417995690 2024-02-20T14:36:59.459907+0000
listed 33 entries

When I checked after 10 minutes, it had been removed.

sh-5.1$ ceph osd blocklist ls
10.129.2.198:6801/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6801/3519091161 2024-02-20T14:45:28.638578+0000
10.128.2.61:6800/3519091161 2024-02-20T14:45:28.638578+0000
10.129.2.198:6800/929584275 2024-02-20T14:42:59.458096+0000
10.128.2.61:6801/304212021 2024-02-20T14:42:28.637783+0000
10.129.2.198:6801/1005064452 2024-02-20T14:39:59.455103+0000
10.128.2.61:6800/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/4007831461 2024-02-20T14:24:28.635649+0000
10.128.2.61:6800/304212021 2024-02-20T14:42:28.637783+0000
10.128.2.61:6801/3326064042 2024-02-20T14:30:28.636763+0000
10.128.2.61:6800/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6801/2417995690 2024-02-20T14:36:59.459907+0000
10.128.2.61:6800/3326064042 2024-02-20T14:30:28.636763+0000
10.129.2.198:6800/2962405343 2024-02-20T14:24:59.456604+0000
10.128.2.61:6801/3652878968 2024-02-20T14:27:28.634523+0000
10.129.2.198:6801/1111294452 2024-02-20T14:27:59.456099+0000
10.129.2.198:6800/3415358265 2024-02-20T14:30:59.455236+0000
10.129.2.198:6801/3415358265 2024-02-20T14:30:59.455236+0000
10.128.2.61:6801/3655404529 2024-02-20T14:33:28.642924+0000
10.129.2.198:6800/1798926241 2024-02-20T14:33:59.456596+0000
10.128.2.61:6800/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/1005064452 2024-02-20T14:39:59.455103+0000
10.129.2.198:6801/929584275 2024-02-20T14:42:59.458096+0000
10.129.2.198:6801/1798926241 2024-02-20T14:33:59.456596+0000
10.129.2.198:6800/408667598 2024-02-20T14:45:59.457093+0000
10.128.2.61:6800/484956839 2024-02-20T14:36:28.635064+0000
10.129.2.198:6800/1111294452 2024-02-20T14:27:59.456099+0000
10.128.2.61:6801/484956839 2024-02-20T14:36:28.635064+0000
10.128.2.61:6801/3446143648 2024-02-20T14:39:28.643747+0000
10.129.2.198:6800/2417995690 2024-02-20T14:36:59.459907+0000
listed 32 entries
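
Checking by eye whether the fenced address is still present in a long blocklist is error-prone. The Python sketch below parses output in the format shown above and reports whether a given IP is still listed. It assumes each entry is "ip:port/nonce" followed by a timestamp; the function names are hypothetical, and the sample lines are copied from the listings in this report.

```python
from datetime import datetime


def parse_blocklist(output: str) -> list[tuple[str, int, int, datetime]]:
    """Parse `ceph osd blocklist ls` output into (ip, port, nonce, timestamp)."""
    entries = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2 or "/" not in parts[0]:
            continue  # skips blank lines and the "listed N entries" footer
        addr, _, nonce = parts[0].partition("/")
        ip, _, port = addr.rpartition(":")
        stamp = datetime.strptime(parts[1], "%Y-%m-%dT%H:%M:%S.%f%z")
        entries.append((ip, int(port), int(nonce), stamp))
    return entries


def is_blocklisted(entries, ip: str) -> bool:
    """True if any entry refers to the given IP address."""
    return any(entry[0] == ip for entry in entries)


SAMPLE = """\
10.131.0.46:6801/4289200221 2024-02-20T16:30:06.541675+0000
100.64.0.7:0/2258816229 2024-02-19T17:27:55.434936+0000
listed 2 entries
"""

entries = parse_blocklist(SAMPLE)
print(is_blocklisted(entries, "100.64.0.7"))  # prints True
print(is_blocklisted(entries, "100.64.0.5"))  # prints False
```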

Comment 37 Sunil Kumar Acharya 2024-08-26 11:22:42 UTC

Are there any blockers to provide devel ack for this bz? If not, please provide the devel ack.

Comment 43 errata-xmlrpc 2024-10-30 14:26:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676
