Bug 1984396
| Summary: | Failing the only OSD of a node on a 3 node cluster doesn't create blocking PDBs | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | krishnaram Karthick <kramdoss> |
| Component: | rook | Assignee: | Santosh Pillai <sapillai> |
| Status: | CLOSED ERRATA | QA Contact: | Anna Sandler <asandler> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | jelopez, madam, muagarwa, ocs-bugs, odf-bz-bot, sapillai, sostapov, tnielsen |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | ODF 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | v4.9.0-158.ci | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-13 17:44:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
krishnaram Karthick
2021-07-21 11:11:00 UTC
Santosh, can you take a look?

I'll look into it this week.

How's it looking?

Tested it with Rook on a 3-node minikube cluster by deleting the disk from VirtualBox. Observed the following:
1. The OSD pod (for which the disk was removed) went into CrashLoopBackOff (CLBO) state:
```
oc get pods -n rook-ceph -o wide | grep osd
rook-ceph-osd-0-77b7459f77-l27r2 0/1 CrashLoopBackOff 5 19m 10.244.2.9 minikube-m03 <none> <none>
rook-ceph-osd-1-749b5fbd74-5gg47 1/1 Running 0 19m 10.244.3.8 minikube-m04 <none> <none>
rook-ceph-osd-2-85984c996d-6rxht 1/1 Running 0 19m 10.244.1.10 minikube-m02 <none> <none>
rook-ceph-osd-prepare-minikube-m02-gtxrs 0/1 Completed 0 20m 10.244.1.9 minikube-m02 <none> <none>
rook-ceph-osd-prepare-minikube-m03-5lr6b 0/1 Completed 0 20m 10.244.2.7 minikube-m03 <none> <none>
rook-ceph-osd-prepare-minikube-m04-29hhd 0/1 Completed 0 20m 10.244.3.7 minikube-m04 <none> <none>
```
2. Ceph status was degraded:
```
Every 2.0s: ceph status rook-ceph-tools-78cdfd976c-dmj98: Tue Sep 7 08:39:04 2021
cluster:
id: cff26850-1cfc-4542-8bed-bb19c42523e9
health: HEALTH_WARN
1 osds down
1 host (1 osds) down
Degraded data redundancy: 224/672 objects degraded (33.333%), 34 pgs degraded, 81 pgs undersized
1 daemons have recently crashed
services:
mon: 3 daemons, quorum a,b,c (age 21m)
mgr: a(active, since 20m)
osd: 3 osds: 2 up (since 4m), 3 in (since 21m)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
pools: 8 pools, 81 pgs
objects: 224 objects, 9.8 KiB
usage: 34 MiB used, 30 GiB / 30 GiB avail
pgs: 224/672 objects degraded (33.333%)
47 active+undersized
34 active+undersized+degraded
```
3. Blocking PDBs were created successfully on the other failure domains (nodes):
```
Every 2.0s: oc get pdb -n rook-ceph localhost.localdomain: Tue Sep 7 14:09:39 2021
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
rook-ceph-mon-pdb N/A 1 1 22m
rook-ceph-osd-host-minikube-m02 N/A 0 0 5m15s
rook-ceph-osd-host-minikube-m04 N/A 0 0 5m15s
```
rook logs:
```
2021-09-07 08:34:24.245486 I | clusterdisruption-controller: osd "rook-ceph-osd-0" is down but no node drain is detected
2021-09-07 08:34:24.845075 I | clusterdisruption-controller: osd is down in failure domain "minikube-m03" and pgs are not active+clean. pg health: "cluster is not fully clean. PGs: [{StateName:active+clean Count:52} {StateName:stale+active+clean Count:29}]"
2021-09-07 08:34:24.853990 I | clusterdisruption-controller: creating temporary blocking pdb "rook-ceph-osd-host-minikube-m02" with maxUnavailable=0 for "host" failure domain "minikube-m02"
2021-09-07 08:34:24.865325 I | clusterdisruption-controller: creating temporary blocking pdb "rook-ceph-osd-host-minikube-m04" with maxUnavailable=0 for "host" failure domain "minikube-m04"
2021-09-07 08:34:24.888968 I | clusterdisruption-controller: deleting the default pdb "rook-ceph-osd" with maxUnavailable=1 for all osd
```
So this looks as expected (a sketch of the shape of these blocking PDBs follows below).
I'll test it a few more times to see if there is any inconsistency in the behavior.
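For reference, here is a minimal Go sketch of what creating such a temporary blocking PDB can look like against the Kubernetes API. This is not the actual Rook source: the PDB name format mirrors the names in the output above, but the label selector keys (`app`, `topology-location-<type>`) are assumptions for illustration.

```
// Sketch only (NOT the actual Rook code): building a temporary "blocking"
// PodDisruptionBudget with maxUnavailable=0 for one host failure domain, so
// that no OSD pod on that host can be removed by a voluntary disruption
// (e.g. a node drain) while PGs are not yet active+clean.
package sketch

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createBlockingPDB creates a PDB such as "rook-ceph-osd-host-minikube-m02"
// that allows zero disruptions for the OSD pods of one failure domain.
func createBlockingPDB(ctx context.Context, c kubernetes.Interface, namespace, fdType, fdName string) error {
	maxUnavailable := intstr.FromInt(0) // zero voluntary disruptions allowed
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			// e.g. rook-ceph-osd-host-minikube-m02
			Name:      fmt.Sprintf("rook-ceph-osd-%s-%s", fdType, fdName),
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MaxUnavailable: &maxUnavailable,
			Selector: &metav1.LabelSelector{
				// Hypothetical selector; Rook's real label keys may differ.
				MatchLabels: map[string]string{
					"app":                         "rook-ceph-osd",
					"topology-location-" + fdType: fdName,
				},
			},
		},
	}
	_, err := c.PolicyV1().PodDisruptionBudgets(namespace).Create(ctx, pdb, metav1.CreateOptions{})
	return err
}
```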
I was able to reproduce this bug on an OpenShift AWS instance when deploying Rook with the OCS operator.

The possible root cause is this line of code: https://github.com/rook/rook/blob/ab728e0183c92e059af7d663b287b00e95d6e175/pkg/operator/ceph/disruption/clusterdisruption/osd.go#L525

Rook checks whether an OSD is down by looking at `ReadyReplicas` in the OSD deployment. When an OSD pod goes into CLBO because of a disk failure, there is a delay before the deployment's `Status.ReadyReplicas` is updated to 0: the pod is already in CLBO, but the `ReadyReplicas` count is still 1 when Rook checks it. This delay causes Rook to miss that any OSD is down at all, so no blocking PDBs are created for the other failure domains and only the default PDB remains, with its allowed disruptions count set to 0. (Note: this delay was observed on OpenShift AWS instances, not on local minikube instances of Rook.)

One possible solution is to reconcile again whenever the allowed disruptions count in the default PDB is 0; a sketch of this idea appears at the end of this report.

This should just be in POST, right?

Yeah, sorry. Only the upstream patch is ready. It should be POST.

Santosh, could you open the downstream backport PR?

Tested on an OCP + OCS cluster: detached the volumes manually from the AWS console, and blocking PDBs were created as expected.

```
[asandler@fedora ~]$ oc get pods -A | grep osd
openshift-storage    rook-ceph-osd-0-55f5495846-bpmgx    1/2    CrashLoopBackOff    3 (41s ago)    87m
[asandler@fedora ~]$ oc get pdb -A
openshift-storage    rook-ceph-osd-zone-us-east-2b    N/A    0    0    73s
openshift-storage    rook-ceph-osd-zone-us-east-2c    N/A    0    0    73s
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086
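To make the race and the proposed fix concrete, here is a minimal Go sketch. It is not the actual Rook implementation: `Deployment.Status.ReadyReplicas` and the PDB's `Status.DisruptionsAllowed` are the real Kubernetes API fields, but the function names and surrounding logic are hypothetical illustrations of the check and the proposed requeue.

```
// Sketch of the racy check and the proposed workaround. NOT the actual Rook
// implementation; only the Kubernetes API fields used here are real.
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	policyv1 "k8s.io/api/policy/v1"
)

// osdIsDown mirrors the check described above: an OSD is treated as down
// once its deployment reports zero ready replicas. Right after a disk
// failure the pod is already in CrashLoopBackOff, but Status.ReadyReplicas
// can still read 1 until the readiness state propagates, so a single
// reconcile pass can miss the failure entirely.
func osdIsDown(d *appsv1.Deployment) bool {
	return d.Status.ReadyReplicas == 0
}

// shouldRequeue captures the proposed fix: if no OSD looks down but the
// default PDB already reports zero allowed disruptions, the ReadyReplicas
// view may simply be stale, so schedule another reconcile rather than
// concluding that everything is healthy.
func shouldRequeue(downOSDs int, defaultPDB *policyv1.PodDisruptionBudget) bool {
	return downOSDs == 0 && defaultPDB.Status.DisruptionsAllowed == 0
}
```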