Description of problem (please be as detailed as possible and provide log snippets):

When the OSD of a node on a 3 node OCS cluster fails, no blocking PDBs are created on the other two nodes even when PGs are unhealthy. Ideally, in such a situation, blocking PDBs should be created so that node drains on the non-failing nodes are blocked.

See https://bugzilla.redhat.com/show_bug.cgi?id=1950419#c28 for more details.

Version of all relevant components (if applicable):

oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-456.ci   OpenShift Container Storage   4.8.0-456.ci              Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
NA

Is there any workaround available to the best of your knowledge?
Replace the failed OSD.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
[Test cluster: 3 OCS worker nodes with one OSD on each node]
1. Create an OCP + OCS cluster with 3 master and 3 worker nodes
2. Create an RBD volume and write data (this is to ensure that PGs are degraded when an OSD fails)
3. Fail one of the OSDs on one node (I force-detached the disk from AWS)
4. Watch for any change in PDBs

Actual results:
No blocking PDBs are created.

Expected results:
Blocking PDBs should be created.

Additional info:
Santosh, can you take a look?
I'll look into it this week.
How's it looking?
Tested it with Rook on a 3 node minikube cluster by deleting the disk from VirtualBox. Observed the following:

1. The OSD pod (for which the disk was removed) went into CLBO state:

```
oc get pods -n rook-ceph -o wide | grep osd
rook-ceph-osd-0-77b7459f77-l27r2           0/1   CrashLoopBackOff   5   19m   10.244.2.9    minikube-m03   <none>   <none>
rook-ceph-osd-1-749b5fbd74-5gg47           1/1   Running            0   19m   10.244.3.8    minikube-m04   <none>   <none>
rook-ceph-osd-2-85984c996d-6rxht           1/1   Running            0   19m   10.244.1.10   minikube-m02   <none>   <none>
rook-ceph-osd-prepare-minikube-m02-gtxrs   0/1   Completed          0   20m   10.244.1.9    minikube-m02   <none>   <none>
rook-ceph-osd-prepare-minikube-m03-5lr6b   0/1   Completed          0   20m   10.244.2.7    minikube-m03   <none>   <none>
rook-ceph-osd-prepare-minikube-m04-29hhd   0/1   Completed          0   20m   10.244.3.7    minikube-m04   <none>   <none>
```

2. Ceph status was degraded:

```
Every 2.0s: ceph status            rook-ceph-tools-78cdfd976c-dmj98: Tue Sep 7 08:39:04 2021

  cluster:
    id:     cff26850-1cfc-4542-8bed-bb19c42523e9
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            Degraded data redundancy: 224/672 objects degraded (33.333%), 34 pgs degraded, 81 pgs undersized
            1 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 21m)
    mgr: a(active, since 20m)
    osd: 3 osds: 2 up (since 4m), 3 in (since 21m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    pools:   8 pools, 81 pgs
    objects: 224 objects, 9.8 KiB
    usage:   34 MiB used, 30 GiB / 30 GiB avail
    pgs:     224/672 objects degraded (33.333%)
             47 active+undersized
             34 active+undersized+degraded
```

3. Blocking PDBs got created successfully on the other failure domains (nodes):

```
Every 2.0s: oc get pdb -n rook-ceph            localhost.localdomain: Tue Sep 7 14:09:39 2021

NAME                              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mon-pdb                 N/A             1                 1                     22m
rook-ceph-osd-host-minikube-m02   N/A             0                 0                     5m15s
rook-ceph-osd-host-minikube-m04   N/A             0                 0                     5m15s
```

Rook logs:

```
2021-09-07 08:34:24.245486 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down but no node drain is detected
2021-09-07 08:34:24.845075 I | clusterdisruption-controller: osd is down in failure domain "minikube-m03" and pgs are not active+clean. pg health: "cluster is not fully clean. PGs: [{StateName:active+clean Count:52} {StateName:stale+active+clean Count:29}]"
2021-09-07 08:34:24.853990 I | clusterdisruption-controller: creating temporary blocking pdb "rook-ceph-osd-host-minikube-m02" with maxUnavailable=0 for "host" failure domain "minikube-m02"
2021-09-07 08:34:24.865325 I | clusterdisruption-controller: creating temporary blocking pdb "rook-ceph-osd-host-minikube-m04" with maxUnavailable=0 for "host" failure domain "minikube-m04"
2021-09-07 08:34:24.888968 I | clusterdisruption-controller: deleting the default pdb "rook-ceph-osd" with maxUnavailable=1 for all osd
```

So this looks expected. I'll test it a few more times to see if there is any inconsistency in the behavior.
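For illustration only, here is a minimal Go sketch of the behavior the logs above describe: when an OSD is down in one failure domain and PGs are not active+clean, create a maxUnavailable=0 ("blocking") PDB for each of the other failure domains. This is not Rook's actual code; the function names, signatures, and the pod selector labels are assumptions.

```go
package sketch

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// blockingPDBName builds names like "rook-ceph-osd-host-minikube-m02", matching
// the PDBs seen in the output above.
func blockingPDBName(failureDomainType, failureDomainName string) string {
	return fmt.Sprintf("rook-ceph-osd-%s-%s", failureDomainType, failureDomainName)
}

// createBlockingPDBs creates a maxUnavailable=0 PDB for each healthy failure
// domain so that node drains there stay blocked while PGs are degraded.
// (Hypothetical helper, not part of Rook.)
func createBlockingPDBs(ctx context.Context, cs kubernetes.Interface, namespace, failureDomainType string, healthyDomains []string) error {
	zero := intstr.FromInt(0)
	for _, fd := range healthyDomains {
		pdb := &policyv1.PodDisruptionBudget{
			ObjectMeta: metav1.ObjectMeta{
				Name:      blockingPDBName(failureDomainType, fd),
				Namespace: namespace,
			},
			Spec: policyv1.PodDisruptionBudgetSpec{
				MaxUnavailable: &zero,
				// Illustrative selector; the real labels on OSD pods may differ.
				Selector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"app": "rook-ceph-osd"},
				},
			},
		}
		if _, err := cs.PolicyV1().PodDisruptionBudgets(namespace).Create(ctx, pdb, metav1.CreateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```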
I was able to reproduce this bug on an OpenShift AWS instance when deploying Rook with the OCS operator.

The possible root cause is this line of code: https://github.com/rook/rook/blob/ab728e0183c92e059af7d663b287b00e95d6e175/pkg/operator/ceph/disruption/clusterdisruption/osd.go#L525

Rook checks whether an OSD is down by looking at `ReadyReplicas` on the OSD deployment. When the OSD pod is in CLBO due to a disk failure, there is a delay before the deployment's `Status.ReadyReplicas` is updated to 0. Although the pod is in CLBO, the `ReadyReplicas` count is still 1 when Rook checks it. Because of this delay, Rook misses that any OSD is down at all, so no blocking PDBs are created for the other failure domains, and only the default PDB remains, with its `AllowedDisruptions` count set to 0. (Note: this delay was observed on OpenShift AWS instances and not on local minikube instances of Rook.)

One possible solution is to reconcile again when the `AllowedDisruptions` count in the default PDB is 0. A sketch of this idea follows below.
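A minimal Go sketch of the race and the proposed mitigation, assuming a controller-runtime style reconciler. This is not Rook's implementation; the function names `osdIsDown` and `requeueIfBlocked` and the 30-second requeue interval are illustrative assumptions.

```go
package sketch

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	policyv1 "k8s.io/api/policy/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// osdIsDown mirrors the kind of check referenced at osd.go#L525: treat an OSD
// as down when its Deployment reports zero ready replicas. While the pod is in
// CrashLoopBackOff, Status.ReadyReplicas can still read 1 for a short time, so
// a single reconcile pass can miss the failure.
func osdIsDown(d *appsv1.Deployment) bool {
	return d.Status.ReadyReplicas == 0
}

// requeueIfBlocked sketches the proposed mitigation: when the default OSD PDB
// reports zero allowed disruptions, requeue the reconcile so OSD health is
// re-checked after the Deployment status has caught up.
func requeueIfBlocked(defaultPDB *policyv1.PodDisruptionBudget) (ctrl.Result, bool) {
	if defaultPDB.Status.DisruptionsAllowed == 0 {
		return ctrl.Result{RequeueAfter: 30 * time.Second}, true
	}
	return ctrl.Result{}, false
}
```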
This should just be in post, right?
Yeah, sorry. Only the upstream patch is ready, so it should be in post.
Santosh, could you open the downstream backport PR?
Tested on an OCP + OCS cluster. Detached the volumes manually from the AWS console, and blocking PDBs were created as expected.

```
[asandler@fedora ~]$ oc get pods -A | grep osd
openshift-storage   rook-ceph-osd-0-55f5495846-bpmgx   1/2   CrashLoopBackOff   3 (41s ago)   87m

[asandler@fedora ~]$ oc get pdb -A
openshift-storage   rook-ceph-osd-zone-us-east-2b   N/A   0   0   73s
openshift-storage   rook-ceph-osd-zone-us-east-2c   N/A   0   0   73s
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:5086