Description of problem (please be as detailed as possible and provide log snippets):
- The logic to increase the PDB "max unavailable" as OSDs are added to a node is handled by the OCS/ODF operator.
- We should not expect manual changes to it in order to allow nodes to be drained, for example when upgrading OCP nodes.

Version of all relevant components (if applicable):
- OCP 4.10.23
- ODF 4.10.5

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
- The customer needs to manually increase the number of unavailable OSDs in the PDB object definition, or manually delete the OSD pod running on the node which is being drained.

Is there any workaround available to the best of your knowledge?
- Manually increase the number of unavailable OSDs if they run on the same node which is being drained, or delete the pod manually.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
- Unknown

Can this issue be reproduced from the UI?
- Unknown

If this is a regression, please provide more details to justify this:

Steps to Reproduce in the customer environment:
1. Have OCP with ODF installed.
2. Have more than 1 OSD per storage node.
3. Upgrade/drain 1 storage node.

Actual results:
- The node won't be drained since the OSD cannot be evicted from the node due to the number of unavailable OSDs allowed in the PDB object, which defaults to 1.

~~~
error when evicting pods/"rook-ceph-osd-5-86455f59c4-t4lbb" -n "openshift-storage" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
~~~

Expected results:
- If the customer adds OSDs to ODF such that there are two OSDs per node, the PDB should increase the `max unavailable` value accordingly to allow the node to be drained, for example in an OCP node upgrade scenario.
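For reference, the manual workaround described above boils down to something like the following. This is a sketch only: the PDB name `rook-ceph-osd` is the assumed default, and the pod name is taken from the log snippet above.

~~~
# Inspect the OSD disruption budget in the ODF namespace
oc get pdb -n openshift-storage

# Workaround 1 (assumed default PDB name): temporarily raise maxUnavailable so the drain can proceed
oc patch pdb rook-ceph-osd -n openshift-storage --type merge -p '{"spec":{"maxUnavailable":2}}'

# Workaround 2: delete the OSD pod stuck on the node being drained (pod name from the log above)
oc delete pod rook-ceph-osd-5-86455f59c4-t4lbb -n openshift-storage
~~~

The point of this bug is that neither manual step should be required.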
This is by design. Once a node starts being drained, one of the OSDs is expected to go down; the rook operator will then adjust the PDBs dynamically to allow all OSDs on that node (or in that failure domain) to go down as well. If the rook operator was on the same drained node, the adjustment may take a bit longer while the operator starts on another node.

See the design here: https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md

Do you not see the PDBs adjusted automatically when an OSD pod goes down?
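The expected behavior can be verified during a drain with something like the following (a sketch only; it assumes the default namespace and PDB names seen elsewhere in this bug, `openshift-storage` for ODF or `rook-ceph` upstream):

~~~
# Before the drain: a single OSD PDB with MAX UNAVAILABLE = 1 is expected
oc get pdb -n openshift-storage

# In another terminal, start the drain, e.g.:
#   oc adm drain <node> --ignore-daemonsets --delete-emptydir-data

# While the drain runs, the operator should replace the single OSD PDB with
# per-failure-domain "blocking" PDBs (maxUnavailable=0) covering the OTHER
# failure domains, which lets the remaining OSDs on the drained node be evicted.
watch -n2 'oc get pdb -n openshift-storage'

# If eviction stays blocked, check whether the operator itself was evicted from
# the drained node and is still being rescheduled:
oc get pods -n openshift-storage -o wide | grep rook-ceph-operator
~~~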
Santosh PTAL, thanks
I'll check and get back to you as soon as I can.
Got occupied with some other tasks. I'll check and get back to you this week. Is https://bugzilla.redhat.com/show_bug.cgi?id=2116358#c19 the latest update from the customer, or have there been new instances where the customer is not seeing the correct behavior with PDBs?
Correct, the latest update from the customer is https://bugzilla.redhat.com/show_bug.cgi?id=2116358#c19. At least the customer did not report any new scenario involving PDBs.
Hi Javier,

Sorry for the delay. I'm looking at the must-gather logs `inspect.local.359676569811226791` for comment 19. Can you please confirm that these logs are related to comment 19?

Looking at the rook-operator logs at `inspect.local.359676569811226791/namespaces/openshift-storage/pods/rook-ceph-operator-6985f85bcb-ncrms/rook-ceph-operator/rook-ceph-operator/logs/current.log`, a few things I observed:

1. One of the mon pods is down and it never came back:

-------
2022-11-07T09:15:03.483731749Z 2022-11-07 09:15:03.483630 E | op-mon: failed to schedule mon "g". failed to schedule canary pod(s)
2022-11-07T09:15:03.489879925Z 2022-11-07 09:15:03.489782 I | op-mon: cleaning up canary monitor deployment "rook-ceph-mon-g-canary"
2022-11-07T09:15:03.511533755Z 2022-11-07 09:15:03.511507 I | op-mon: scaling the mon "c" deployment to replica 1
2022-11-07T09:15:03.532785605Z 2022-11-07 09:15:03.532714 E | op-mon: failed to failover mon "c". failed to place new mon on a node: failed to schedule mons
2022-11-07T09:15:03.532785605Z 2022-11-07 09:15:03.532726 I | op-mon: allow voluntary mon drain after failover
.....
...
..
op-mon: mon "c" not found in quorum, waiting for timeout (554 seconds left) before failover
-------

2. The events suggest that:

```
07:59:40 openshift-storage rook-ceph-mon-g-canary-5c7c748cfb-c2tn8 FailedScheduling 0/5 nodes are available: 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) didn't match pod anti-affinity rules.
```

So my assumption is that the last drained node never came back up, which is why we are seeing the unexpected behavior.
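If it helps to confirm this in the customer environment, something like the following would show whether a node was left cordoned and where the mon/operator pods ended up (a sketch; it only uses standard `oc` queries against the `openshift-storage` namespace):

~~~
# Any node still cordoned (SchedulingDisabled) after the drain?
oc get nodes

# Current placement of the mon and operator pods
oc get pods -n openshift-storage -o wide | grep -E 'rook-ceph-mon|rook-ceph-operator'

# Recent scheduling failures, e.g. for the mon canary pod
oc get events -n openshift-storage --field-selector reason=FailedScheduling
~~~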
Hi Santosh, Happy new year!

Thanks for looking at this.

The latest time frame when this scenario was seen in the customer environment was on Nov 4th around 11 AM, when the worker-12 node was brought down:

~~~
I1104 11:15:54.830513 20381 drain.go:44] Initiating cordon on node (currently schedulable: true)
I1104 11:15:54.867419 20381 drain.go:66] cordon succeeded on node (currently schedulable: false)
I1104 11:15:54.867442 20381 update.go:1956] Node has been successfully cordoned
I1104 11:15:54.869633 20381 update.go:1956] Update prepared; beginning drain
E1104 11:15:58.555501 20381 daemon.go:335] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-7s5ck, openshift-cnv/bridge-marker-wpvj7, openshift-cnv/kube-cni-linux-bridge-plugin-j4xj5, openshift-cnv/nmstate-handler-72lpw, openshift-cnv/virt-handler-d5hcf, openshift-controller-manager/controller-manager-xnsz2, openshift-dns/dns-default-fxvnk, openshift-dns/node-resolver-d27mx, openshift-image-registry/node-ca-75xs8, openshift-ingress-canary/ingress-canary-b6sw6, openshift-local-storage/diskmaker-discovery-zw5hp, openshift-local-storage/diskmaker-manager-v44f8, openshift-machine-api/metal3-image-cache-xh5vz, openshift-machine-config-operator/machine-config-daemon-m99n2, openshift-machine-config-operator/machine-config-server-h5sn8, openshift-monitoring/node-exporter-tqhsk, openshift-multus/multus-additional-cni-plugins-fbcqf, openshift-multus/multus-admission-controller-nbxwz, openshift-multus/multus-z9lt7, openshift-multus/network-metrics-daemon-6pbmh, openshift-network-diagnostics/network-check-target-82slr, openshift-sdn/sdn-48f4f, openshift-sdn/sdn-controller-z7drv, openshift-sriov-network-operator/network-resources-injector-rnkg9, openshift-sriov-network-operator/sriov-device-plugin-pxgw6, openshift-sriov-network-operator/sriov-network-config-daemon-8qbxj, openshift-storage/csi-cephfsplugin-f88ln, openshift-storage/csi-rbdplugin-b2hsk; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: openshift-kube-apiserver/kube-apiserver-guard-worker-12, openshift-kube-controller-manager/kube-controller-manager-guard-worker-12, openshift-kube-scheduler/openshift-kube-scheduler-guard-worker-12, openshift-marketplace/certified-operators-m5cz9, openshift-marketplace/redhat-operators-pmcgc
I1104 11:15:58.556703 20381 daemon.go:335] evicting pod openshift-storage/rook-ceph-osd-5-86455f59c4-d4tpj
...
I1104 11:15:58.558112 20381 daemon.go:335] evicting pod openshift-storage/rook-ceph-osd-1-785c747d65-hw9lx
E1104 11:15:58.575057 20381 daemon.go:335] error when evicting pods/"rook-ceph-osd-5-86455f59c4-d4tpj" -n "openshift-storage" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
...
E1104 11:16:04.794348 20381 daemon.go:335] error when evicting pods/"rook-ceph-osd-1-785c747d65-hw9lx" -n "openshift-storage" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
...
I1104 11:17:16.672382 20381 daemon.go:320] Evicted pod openshift-storage/rook-ceph-osd-1-785c747d65-hw9lx
I1104 11:17:20.970957 20381 daemon.go:335] evicting pod openshift-storage/rook-ceph-osd-5-86455f59c4-d4tpj
E1104 11:17:21.735658 20381 daemon.go:335] error when evicting pods/"rook-ceph-osd-5-86455f59c4-d4tpj" -n "openshift-storage" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
~~~

This happened on a Friday, so the customer left it over the weekend to give the drain process enough time to complete.
This process didn't complete. The mon pod that didn't come back was the mon-c pod, which was drained from the worker-12 node on Friday 11/4:

~~~
2022-11-07T09:15:03.483731749Z 2022-11-07 09:15:03.483630 E | op-mon: failed to schedule mon "g". failed to schedule canary pod(s)
2022-11-07T09:15:03.489879925Z 2022-11-07 09:15:03.489782 I | op-mon: cleaning up canary monitor deployment "rook-ceph-mon-g-canary"
2022-11-07T09:15:03.511533755Z 2022-11-07 09:15:03.511507 I | op-mon: scaling the mon "c" deployment to replica 1
2022-11-07T09:15:03.532785605Z 2022-11-07 09:15:03.532714 E | op-mon: failed to failover mon "c". failed to place new mon on a node: failed to schedule mons
2022-11-07T09:15:03.532785605Z 2022-11-07 09:15:03.532726 I | op-mon: allow voluntary mon drain after failover
~~~

The event you shared shows the worker-12 node still in unschedulable status because the drain was hung trying to evict the second OSD from the node. Thus, the node didn't reboot, and the unschedulable condition was not removed until the second OSD pod was forcefully killed to allow the node to be drained and rebooted.

~~~
07:59:40 openshift-storage rook-ceph-mon-g-canary-5c7c748cfb-c2tn8 FailedScheduling 0/5 nodes are available: 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) didn't match pod anti-affinity rules.
~~~

I hope this clarifies the scenario; if not, please let me know.
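For completeness, while the drain was stuck in that state, the PDB side could have been checked with something like the following (a sketch, assuming the default PDB names; per the design linked earlier, the single `rook-ceph-osd` PDB should have been replaced with per-host blocking PDBs once the first OSD on worker-12 went down):

~~~
# Is the drain still blocked on the OSD PDB(s)?
oc get pdb -n openshift-storage

# Which OSD pods are still on the node being drained?
oc get pods -n openshift-storage -o wide | grep rook-ceph-osd | grep worker-12

# Last resort actually used here: delete the remaining OSD pod so the drain can
# finish (the manual step this bug argues should not be necessary)
oc delete pod rook-ceph-osd-5-86455f59c4-d4tpj -n openshift-storage
~~~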
(In reply to Javier Coscia from comment #34)
> Hi Santosh, Happy new year!
>
> Thanks for looking at this.
>
> The latest time frame when this scenario was seen in the customer environment
> was on Nov 4th around 11 AM, when the worker-12 node was brought down.
>
> I hope this clarifies the scenario; if not, please let me know.

Thanks for the clarification. I'll try to set up a similar cluster (multiple OSDs on a node) and try to reproduce the behavior from comment 19 again. I tried to reproduce this last time but couldn't; I'll give it another shot. I'll get back to you with the results by tomorrow.
Tried testing this out locally on minikube. The failure domain was Node and each node had 2 OSDs.

oc get pods -o wide -n rook-ceph
NAME                                                     READY   STATUS      RESTARTS   AGE     IP               NODE           NOMINATED NODE   READINESS GATES
csi-cephfsplugin-b5b8g                                   2/2     Running     0          23m     192.168.50.115   minikube-m03   <none>           <none>
csi-cephfsplugin-provisioner-569f96898b-bcc55            5/5     Running     0          23m     10.244.3.4       minikube-m04   <none>           <none>
csi-cephfsplugin-provisioner-569f96898b-lq2rs            5/5     Running     0          23m     10.244.2.4       minikube-m03   <none>           <none>
csi-cephfsplugin-qh6r9                                   2/2     Running     0          23m     192.168.50.36    minikube-m02   <none>           <none>
csi-cephfsplugin-w4qgt                                   2/2     Running     0          23m     192.168.50.70    minikube-m04   <none>           <none>
csi-rbdplugin-25g7w                                      2/2     Running     0          23m     192.168.50.70    minikube-m04   <none>           <none>
csi-rbdplugin-d2jtn                                      2/2     Running     0          23m     192.168.50.115   minikube-m03   <none>           <none>
csi-rbdplugin-provisioner-5d4578b479-2wx4v               5/5     Running     0          23m     10.244.2.3       minikube-m03   <none>           <none>
csi-rbdplugin-provisioner-5d4578b479-9dlph               5/5     Running     0          18m     10.244.3.14      minikube-m04   <none>           <none>
csi-rbdplugin-x7t5v                                      2/2     Running     0          23m     192.168.50.36    minikube-m02   <none>           <none>
rook-ceph-crashcollector-minikube-m02-56d95c749f-jmt89   1/1     Running     0          2m      10.244.1.25      minikube-m02   <none>           <none>
rook-ceph-crashcollector-minikube-m03-58db9f774-2qhsl    1/1     Running     0          22m     10.244.2.7       minikube-m03   <none>           <none>
rook-ceph-crashcollector-minikube-m04-58fc88874f-h48jh   1/1     Running     0          21m     10.244.3.12      minikube-m04   <none>           <none>
rook-ceph-mgr-a-b8d58d8f9-g7wch                          3/3     Running     0          22m     10.244.3.7       minikube-m04   <none>           <none>
rook-ceph-mgr-b-b767d5f96-b5jw6                          3/3     Running     0          18m     10.244.2.12      minikube-m03   <none>           <none>
rook-ceph-mon-a-58f64dbb87-5s7gq                         2/2     Running     0          23m     10.244.3.6       minikube-m04   <none>           <none>
rook-ceph-mon-b-644b5ddf94-bcjhv                         2/2     Running     0          3m55s   10.244.1.24      minikube-m02   <none>           <none>
rook-ceph-mon-c-5cb6444c94-vfqv9                         2/2     Running     0          22m     10.244.2.6       minikube-m03   <none>           <none>
rook-ceph-operator-66d89f9c7c-lbvl4                      1/1     Running     0          3m55s   10.244.2.14      minikube-m03   <none>           <none>
rook-ceph-osd-0-7dc8d5dd97-txm84                         2/2     Running     0          3m29s   10.244.1.22      minikube-m02   <none>           <none>
rook-ceph-osd-1-7f84fdcdb6-4f94p                         2/2     Running     0          21m     10.244.3.10      minikube-m04   <none>           <none>
rook-ceph-osd-2-f6979b7b-n2jrt                           2/2     Running     0          21m     10.244.2.9       minikube-m03   <none>           <none>
rook-ceph-osd-3-6b76ff8696-2mk8g                         2/2     Running     0          3m55s   10.244.1.23      minikube-m02   <none>           <none>
rook-ceph-osd-4-7545d757d8-6m977                         2/2     Running     0          21m     10.244.3.11      minikube-m04   <none>           <none>
rook-ceph-osd-5-59d69d86f9-q8qg5                         2/2     Running     0          21m     10.244.2.10      minikube-m03   <none>           <none>
rook-ceph-osd-prepare-minikube-m03--1-j4m97              0/1     Completed   0          3m26s   10.244.2.16      minikube-m03   <none>           <none>
rook-ceph-osd-prepare-minikube-m04--1-d8bgv              0/1     Completed   0          3m22s   10.244.3.18      minikube-m04   <none>           <none>
rook-ceph-tools-598f4566db-989v8                         1/1     Running     0          18m     10.244.3.15      minikube-m04   <none>           <none>
----------------------------------

Observe node `minikube-m03`. It has osd-2 and osd-5, and the rook operator is also running on this node.
oc get pods -o wide -n rook-ceph | grep minikube-m03
csi-cephfsplugin-b5b8g                                   2/2     Running     0          24m     192.168.50.115   minikube-m03   <none>           <none>
csi-cephfsplugin-provisioner-569f96898b-lq2rs            5/5     Running     0          24m     10.244.2.4       minikube-m03   <none>           <none>
csi-rbdplugin-d2jtn                                      2/2     Running     0          24m     192.168.50.115   minikube-m03   <none>           <none>
csi-rbdplugin-provisioner-5d4578b479-2wx4v               5/5     Running     0          24m     10.244.2.3       minikube-m03   <none>           <none>
rook-ceph-crashcollector-minikube-m03-58db9f774-2qhsl    1/1     Running     0          23m     10.244.2.7       minikube-m03   <none>           <none>
rook-ceph-mgr-b-b767d5f96-b5jw6                          3/3     Running     0          20m     10.244.2.12      minikube-m03   <none>           <none>
rook-ceph-mon-c-5cb6444c94-vfqv9                         2/2     Running     0          24m     10.244.2.6       minikube-m03   <none>           <none>
rook-ceph-operator-66d89f9c7c-lbvl4                      1/1     Running     0          5m12s   10.244.2.14      minikube-m03   <none>           <none>
rook-ceph-osd-2-f6979b7b-n2jrt                           2/2     Running     0          23m     10.244.2.9       minikube-m03   <none>           <none>
rook-ceph-osd-5-59d69d86f9-q8qg5                         2/2     Running     0          23m     10.244.2.10      minikube-m03   <none>           <none>
rook-ceph-osd-prepare-minikube-m03--1-j4m97              0/1     Completed   0          4m43s   10.244.2.16      minikube-m03   <none>           <none>
---------------------

PDBs before the tests:

Every 2.0s: oc get pdb -n rook-ceph                                      localhost.localdomain: Wed Jan 4 10:38:21 2023

NAME                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mgr-pdb   N/A             1                 1                     24m
rook-ceph-mon-pdb   N/A             1                 1                     25m
rook-ceph-osd       N/A             1                 1                     3m45s
----------------------------------------------------

Tests:

- Drained minikube-m03.

Every 2.0s: oc get pdb -n rook-ceph                                      localhost.localdomain: Wed Jan 4 10:39:29 2023

NAME                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mgr-pdb   N/A             1                 0                     25m
rook-ceph-mon-pdb   N/A             1                 0                     26m
rook-ceph-osd       N/A             1                 0                     4m53s

Initially no blocking PDBs were created because the rook operator was also removed. Once the operator got deployed on another node, it created the blocking PDBs on the other nodes.

Every 2.0s: oc get pdb -n rook-ceph                                      localhost.localdomain: Wed Jan 4 10:40:35 2023

NAME                              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mgr-pdb                 N/A             1                 1                     26m
rook-ceph-mon-pdb                 N/A             1                 0                     27m
rook-ceph-osd-host-minikube-m02   N/A             0                 0                     48s
rook-ceph-osd-host-minikube-m04   N/A             0                 0                     48s

And the node minikube-m03 was drained successfully.

$ kubectl drain minikube-m03 --ignore-daemonsets --delete-local-data --force
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/minikube-m03 cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/kindnet-l5vdb, kube-system/kube-proxy-hpjkn, rook-ceph/csi-cephfsplugin-b5b8g, rook-ceph/csi-rbdplugin-d2jtn
evicting pod rook-ceph/rook-ceph-osd-prepare-minikube-m03--1-j4m97
evicting pod rook-ceph/rook-ceph-mgr-b-b767d5f96-b5jw6
evicting pod rook-ceph/csi-cephfsplugin-provisioner-569f96898b-lq2rs
evicting pod rook-ceph/csi-rbdplugin-provisioner-5d4578b479-2wx4v
evicting pod rook-ceph/rook-ceph-crashcollector-minikube-m03-58db9f774-2qhsl
evicting pod rook-ceph/rook-ceph-operator-66d89f9c7c-lbvl4
evicting pod rook-ceph/rook-ceph-mon-c-5cb6444c94-vfqv9
evicting pod rook-ceph/rook-ceph-osd-2-f6979b7b-n2jrt
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
error when evicting pods/"rook-ceph-osd-5-59d69d86f9-q8qg5" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/rook-ceph-osd-prepare-minikube-m03--1-j4m97 evicted
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
error when evicting pods/"rook-ceph-osd-5-59d69d86f9-q8qg5" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/rook-ceph-mgr-b-b767d5f96-b5jw6 evicted
pod/csi-cephfsplugin-provisioner-569f96898b-lq2rs evicted
pod/csi-rbdplugin-provisioner-5d4578b479-2wx4v evicted
pod/rook-ceph-crashcollector-minikube-m03-58db9f774-2qhsl evicted
I0104 10:39:20.152678  150443 request.go:682] Waited for 1.0774003s due to client-side throttling, not priority and fairness, request: GET:https://192.168.50.186:8443/api/v1/namespaces/rook-ceph/pods/rook-ceph-operator-66d89f9c7c-lbvl4
pod/rook-ceph-operator-66d89f9c7c-lbvl4 evicted
pod/rook-ceph-mon-c-5cb6444c94-vfqv9 evicted
pod/rook-ceph-osd-2-f6979b7b-n2jrt evicted
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
error when evicting pods/"rook-ceph-osd-5-59d69d86f9-q8qg5" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
error when evicting pods/"rook-ceph-osd-5-59d69d86f9-q8qg5" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
error when evicting pods/"rook-ceph-osd-5-59d69d86f9-q8qg5" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
error when evicting pods/"rook-ceph-osd-5-59d69d86f9-q8qg5" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
error when evicting pods/"rook-ceph-osd-5-59d69d86f9-q8qg5" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod rook-ceph/rook-ceph-osd-5-59d69d86f9-q8qg5
pod/rook-ceph-osd-5-59d69d86f9-q8qg5 evicted
node/minikube-m03 drained
--------------

Note that it took some time to evict the `rook-ceph-osd-5-59d69d86f9-q8qg5` pod because the rook operator was also down. Once the operator got deployed on another node, it created the blocking PDBs correctly, and `rook-ceph-osd-5-59d69d86f9-q8qg5` got evicted as well.

I tried draining multiple nodes one at a time; the same result was observed.

This was local testing with rook. Next I'll try to test this with ODF with the same configuration as the customer. I'll update the results soon.
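For the follow-up ODF test, the drain/uncordon cycle can be scripted roughly as follows. This is only a sketch: the node names and the `openshift-storage` namespace are assumptions based on the customer setup, not the exact commands that will be run.

~~~
#!/usr/bin/env bash
# Drain storage nodes one at a time and watch whether the operator swaps in the
# per-host blocking PDBs before the second OSD on the node is evicted.
set -euo pipefail

NAMESPACE=openshift-storage            # use rook-ceph for the upstream test above
NODES=(worker-10 worker-11 worker-12)  # assumed node names; adjust to the cluster

for node in "${NODES[@]}"; do
  echo "### Draining ${node}"
  oc adm drain "${node}" --ignore-daemonsets --delete-emptydir-data --timeout=30m &
  drain_pid=$!

  # Poll the PDBs while the drain is running
  while kill -0 "${drain_pid}" 2>/dev/null; do
    oc get pdb -n "${NAMESPACE}"
    sleep 10
  done
  wait "${drain_pid}" || echo "drain of ${node} did not complete cleanly"

  echo "### Uncordoning ${node}"
  oc adm uncordon "${node}"

  # Check Ceph health (.status.ceph.health) before moving to the next node
  oc get cephcluster -n "${NAMESPACE}"
done
~~~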