Bug 2091623
| Summary: | [MS Tracker] ceph status is in Warning after provider add-on upgrade from v2.0.1 to v2.0.2 | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | suchita <sgatfane> | |
| Component: | odf-managed-service | Assignee: | Nobody <nobody> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Neha Berry <nberry> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.10 | CC: | aeyal, dbindra, fbalak, ocs-bugs, odf-bz-bot, rchikatw, sapillai, ykukreja | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2099212 (view as bug list) | Environment: | ||
| Last Closed: | 2023-03-13 11:58:44 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2099212 | |||
Possible root cause:
```
2022-05-30 11:38:45.089458 I | clusterdisruption-controller: all "zone" failure domains: [us-east-1a us-east-1b us-east-1c]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+clean Count:1305} {StateName:active+recovery_wait+undersized+degraded+remapped Count:6} {StateName:active+recovering+undersized+remapped Count:2}]"
2022-05-30 11:39:16.400384 I | clusterdisruption-controller: all PGs are active+clean. Restoring default OSD pdb settings
2022-05-30 11:39:16.400402 I | clusterdisruption-controller: creating the default pdb "rook-ceph-osd" with maxUnavailable=1 for all osd
2022-05-30 11:39:16.431569 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-zone-us-east-1a" with maxUnavailable=0 for "zone" failure domain "us-east-1a"
2022-05-30 11:39:16.437856 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-zone-us-east-1b" with maxUnavailable=0 for "zone" failure domain "us-east-1b"
2022-05-30 11:39:16.442985 I | clusterdisruption-controller: deleting temporary blocking pdb with "rook-ceph-osd-zone-us-east-1c" with maxUnavailable=0 for "zone" failure domain "us-east-1c"
2022-05-30 11:39:16.454078 I | clusterdisruption-controller: reconciling osd pdb reconciler as the allowed disruptions in default pdb is 0
2022-05-30 11:39:47.696835 I | clusterdisruption-controller: all "zone" failure domains: [us-east-1a us-east-1b us-east-1c]. osd is down in failure domain: "". active node drains: false. pg health: "cluster is not fully clean. PGs: [{StateName:active+clean Count:1312} {StateName:active+recovering+remapped Count:1}]"
```
`11:39:16.400384` suggests that all the PGs were active+clean, so the default PDB with `maxUnavailable=1` (allowed disruptions = 1) was created and the temporary blocking PDBs were removed.
But right after that, `11:39:16.454078` suggests that `AllowedDisruptions` in the OSD PDB resource is 0.
So either:
1. The PDB took some time to update the `AllowedDisruptions` value back to 1, or
2. An OSD went down temporarily for a very short duration.
IMO, `1` is more likely than `2`.
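If `1` is the cause, the delay should be observable directly on the PDB object. A minimal way to watch it (a sketch, assuming the `openshift-storage` namespace and the default PDB name `rook-ceph-osd` from the log above):
```
# Watch the default OSD PDB; ALLOWED DISRUPTIONS briefly staying at 0 after the
# PGs report active+clean would support hypothesis 1.
oc get pdb rook-ceph-osd -n openshift-storage -w

# Or query the status field directly:
oc get pdb rook-ceph-osd -n openshift-storage -o jsonpath='{.status.disruptionsAllowed}{"\n"}'
```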
As a result, the controller reconciled again. `11:39:47.696835` suggests that PGs were not active+clean during this reconcile, so it added the `noout` flag on the failure domain `us-east-1a`:
```
ceph osd dump -f json
"crush_node_flags":{"us-east-1a":["noout"]},"device_class_flags":{},"stretch_mode":{"stretch_mode_enabled":false,"stretch_bucket_count":0,"degraded_stretch_mode":0,"recovering_stretch_mode":0,"stretch_mode_bucket":0}}
```
As a result of this flag, we are seeing this warning message in the ceph status.
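For reference, the flagged failure domains can be pulled straight out of that dump; a quick sketch (assuming `jq` is available wherever the command is run, e.g. on a workstation piping the toolbox output):
```
# List only the failure domains that currently carry the "noout" flag.
ceph osd dump -f json | jq -r '.crush_node_flags
  | to_entries[]
  | select(.value | index("noout"))
  | .key'
# Prints "us-east-1a" for the output above.
```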
Workaround:
Manually unset the flag:
`ceph osd unset noout us-east-1a`
This looks like a negative case and won't be reproducible every time, so it is not a blocker and is lower priority, IMO.
But it should be handled in Rook.
(In reply to Santosh Pillai from comment #2)
> Workaround:
> Manually unset the flag:
> `ceph osd unset noout us-east-1a`

The correct command is `ceph osd unset-group noout us-east-1a`.

> This looks like negative case and won't be reproducible every time. So not a blocker and lower priority, IMO.
> But should be handled in rook.

Seems like a candidate for an SOP @ykukreja

Hi Sahina,
I agree that this would be worth tracking via an SOP.
Though right now, it seems very detailed and jargon-heavy from my point of view. Therefore, I'd suggest that someone from the ODF team be nominated to draft the SOP, considering the intricacies and granular product-level details associated with this problem (just like all the other product-level ODF SOPs written in the past).
Nothing too descriptive. Just the following items:
- Trigger of the problem/alert (CephClusterWarningState, it seems)
- How to recognize/confirm this problem - the output of `oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage | grep tool | awk '{print $1}') ceph status`, I guess
- Troubleshooting - I have a few doubts around this: Should the `ceph osd unset-group` command be executed for each failure domain or just one? And how to recognize those failure domains? (A rough sketch follows this list.)
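For what it's worth, a rough, untested sketch of what the troubleshooting step could look like; the toolbox pod selector (`rook-ceph-tools`), the `openshift-storage` namespace, and locally available `jq` are assumptions, so whoever drafts the SOP should verify the details:
```
#!/usr/bin/env bash
set -euo pipefail

NS=openshift-storage
# Assumption: the toolbox pod name contains "rook-ceph-tools".
TOOLS=$(oc -n "$NS" get pods -o name | grep rook-ceph-tools | head -n1)

# 1. Confirm the warning really comes from a CRUSH-node flag.
oc -n "$NS" rsh "$TOOLS" ceph health detail

# 2. Unset "noout" for every failure domain that still carries it.
for fd in $(oc -n "$NS" rsh "$TOOLS" ceph osd dump -f json \
              | jq -r '.crush_node_flags | to_entries[] | select(.value | index("noout")) | .key'); do
    oc -n "$NS" rsh "$TOOLS" ceph osd unset-group noout "$fd"
done

# 3. Verify the cluster returns to HEALTH_OK.
oc -n "$NS" rsh "$TOOLS" ceph status
```
Running `unset-group` once per failure domain keeps the loop simple; whether several failure domains can also be passed in a single invocation would need to be confirmed against the Ceph CLI.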
And considering that MTSRE would be the reader/customer of this SOP, I'll review it and check whether I understand it well enough to follow it properly if this problem occurs.
How does that sound?
(In reply to Yashvardhan Kukreja from comment #5)
> - Troubleshooting - I have a few doubts around this: Should the ceph osd
> unset-group command executed for each failure domain or just one failure
> domain? How the recognize those failure domains?

I guess I get this. First, fetch the details of the crush node flags:
```
ceph osd dump -f json
```
Under those flags, fetch the failure domain associated with `["noout"]`. For example, for the following output:
```
"crush_node_flags":{"us-east-1a":["noout"]},"device_class_flags":{},"stretch_mode":{"stretch_mode_enabled":false,"stretch_bucket_count":0,"degraded_stretch_mode":0,"recovering_stretch_mode":0,"stretch_mode_bucket":0}}
```
"us-east-1a" would be that failure domain.

Finally, execute the `ceph osd unset-group noout <failure-domain>` command for each of those failure domains one by one, OR should it be executed like `ceph osd unset-group noout <failure-domain-1> <failure-domain-2> <failure-domain-3>`?

Also, where would this ceph command be executed? Is it going to be in the provider cluster under the toolbox pod?

(In reply to Yashvardhan Kukreja from comment #6)
> Also, where would this ceph command be executed? Is it going to be in the
> provider cluster under the toolbox pod?

Yes, we executed it in the provider cluster toolbox pod to come out of the health warn state.

Moving to ON_QA as the tracker bug is in closed state.

Last upgrade path was done from deployer 2.0.9 to 2.0.10 without any issues. --> VERIFIED

(In reply to Filip Balák from comment #16)
> Last upgrade path was done from deployer 2.0.9 to 2.0.10 without any issues.
> --> VERIFIED

As per the comment, this issue is fixed. Please feel free to reopen if it reproduces. Thanks.
Description of problem:
We have 2 setups: appliance mode and appliance mode clusters with a private link. Both provider clusters got upgraded. However, one of the provider clusters' Ceph status is in a warning state; it seems the NOOUT flag is not removed.

Version-Release number of selected component (if applicable):
========CSV ======
```
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.2                      NooBaa Operator               4.10.2            mcg-operator.v4.10.1                      Succeeded
ocs-operator.v4.10.2                      OpenShift Container Storage   4.10.2            ocs-operator.v4.10.0                      Succeeded
ocs-osd-deployer.v2.0.2                   OCS OSD Deployer              2.0.2             ocs-osd-deployer.v2.0.1                   Succeeded
odf-csi-addons-operator.v4.10.2           CSI Addons                    4.10.2            odf-csi-addons-operator.v4.10.0           Succeeded
odf-operator.v4.10.2                      OpenShift Data Foundation     4.10.2            odf-operator.v4.10.0                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.418-6459408   Route Monitor Operator        0.1.418-6459408   route-monitor-operator.v0.1.408-c2256a2   Succeeded
```
--------------
```
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.14   True        False         8h      Error while reconciling 4.10.14: the cluster operator insights is degraded
```

How reproducible:
1/2

Steps to Reproduce:
1. Create an appliance provider cluster with 2 consumers
2. Upgrade the ODF deployer version
3.

Actual results:
=====ceph status ====
```
Mon May 30 02:14:34 PM UTC 2022
  cluster:
    id:     117ddfde-1253-49f8-8709-9a097124651e
    health: HEALTH_WARN
            1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set

  services:
    mon: 3 daemons, quorum a,b,c (age 8h)
    mgr: a(active, since 8h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 15 osds: 15 up (since 3h), 15 in (since 8h)

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 1313 pgs
    objects: 69.54k objects, 270 GiB
    usage:   787 GiB used, 59 TiB / 60 TiB avail
    pgs:     1313 active+clean

  io:
    client:   153 KiB/s rd, 167 KiB/s wr, 39 op/s rd, 38 op/s wr
```

Expected results:
Ceph health should be OK

Additional info:
--------------Few OC output ---------------------
======= storagecluster ==========
```
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   8h    Ready              2022-05-30T06:03:29Z
```
--------------
======= cephcluster ==========
```
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH        EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          8h    Ready   Cluster created successfully   HEALTH_WARN
```
======= cluster health status=====
```
HEALTH_WARN 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
Mon May 30 02:14:15 PM UTC 2022
```
======ceph osd tree ===
```
ID   CLASS  WEIGHT    TYPE NAME                            STATUS  REWEIGHT  PRI-AFF
 -1         60.00000  root default
 -5         60.00000      region us-east-1
 -4         20.00000          zone us-east-1a
 -3          4.00000              host default-0-data-0tdh8b
  0    ssd   4.00000                  osd.0                    up   1.00000  1.00000
-39          4.00000              host default-1-data-0ps629
  5    ssd   4.00000                  osd.5                    up   1.00000  1.00000
-29          4.00000              host default-1-data-4t9tmm
  4    ssd   4.00000                  osd.4                    up   1.00000  1.00000
-31          4.00000              host default-2-data-0fr967
  3    ssd   4.00000                  osd.3                    up   1.00000  1.00000
-25          4.00000              host default-2-data-1qkmdw
  2    ssd   4.00000                  osd.2                    up   1.00000  1.00000
-10         20.00000          zone us-east-1b
 -9          4.00000              host default-0-data-1z5w9r
  1    ssd   4.00000                  osd.1                    up   1.00000  1.00000
-35          4.00000              host default-0-data-28rhlr
  6    ssd   4.00000                  osd.6                    up   1.00000  1.00000
-37          4.00000              host default-0-data-3pc46c
  8    ssd   4.00000                  osd.8                    up   1.00000  1.00000
-27          4.00000              host default-1-data-1p4k2t
  9    ssd   4.00000                  osd.9                    up   1.00000  1.00000
-33          4.00000              host default-2-data-4d6pw2
  7    ssd   4.00000                  osd.7                    up   1.00000  1.00000
-14         20.00000          zone us-east-1c
-17          4.00000              host default-0-data-4hj5gs
 12    ssd   4.00000                  osd.12                   up   1.00000  1.00000
-13          4.00000              host default-1-data-25brsh
 14    ssd   4.00000                  osd.14                   up   1.00000  1.00000
-19          4.00000              host default-1-data-39gvhz
 11    ssd   4.00000                  osd.11                   up   1.00000  1.00000
-21          4.00000              host default-2-data-2pdzz6
 10    ssd   4.00000                  osd.10                   up   1.00000  1.00000
-23          4.00000              host default-2-data-3s5955
 13    ssd   4.00000                  osd.13                   up   1.00000  1.00000
```
=========ceph versions========
```
{
    "mon": {
        "ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)": 1
    },
    "osd": {
        "ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)": 15
    },
    "mds": {
        "ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)": 21
    }
}
```
=========rados df=====
```
POOL_NAME                                                            USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD       WR_OPS  WR       USED COMPR  UNDER COMPR
cephblockpool-storageconsumer-14d18631-37ea-4fbe-baca-f102ea34c6cf   47 KiB   5        0       15      0                   0        0         158     133 KiB  693008  11 GiB   0 B         0 B
cephblockpool-storageconsumer-53ca4e1e-e945-422e-8333-9dacf8c4029d   12 KiB   1        0       3       0                   0        0         0       0 B      0       0 B      0 B         0 B
cephblockpool-storageconsumer-6a8c7284-bb38-4f25-bd68-9e20f23773df   12 KiB   1        0       3       0                   0        0         0       0 B      0       0 B      0 B         0 B
cephblockpool-storageconsumer-a2233267-d7f3-449b-b33c-f9ed1e75f1d5   411 GiB  38410    0       115230  0                   0        0         189887  731 MiB  239417  1.0 GiB  0 B         0 B
cephblockpool-storageconsumer-ddc412aa-6102-4292-9e2f-05ce45c5ea68   12 KiB   1        0       3       0                   0        0         0       0 B      0       0 B      0 B         0 B
device_health_metrics                                                0 B      0        0       0       0                   0        0         0       0 B      0       0 B      0 B         0 B
ocs-storagecluster-cephblockpool                                     12 KiB   1        0       3       0                   0        0         0       0 B      0       0 B      0 B         0 B
ocs-storagecluster-cephfilesystem-data0                              364 GiB  31066    0       93198   0                   0        0         135215  528 MiB  133747  522 MiB  0 B         0 B
ocs-storagecluster-cephfilesystem-metadata                           155 MiB  55       0       165     0                   0        0         65373   87 MiB   16168   63 MiB   0 B         0 B

total_objects    69540
total_used       787 GiB
total_avail      59 TiB
total_space      60 TiB
```
=========ceph df=====
```
--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED     RAW USED  %RAW USED
ssd    60 TiB  59 TiB  787 GiB  787 GiB   1.28
TOTAL  60 TiB  59 TiB  787 GiB  787 GiB   1.28

--- POOLS ---
POOL                                                                 ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics                                                1   1    0 B      0        0 B      0      17 TiB
ocs-storagecluster-cephblockpool                                     2   128  19 B     1        12 KiB   0      17 TiB
ocs-storagecluster-cephfilesystem-metadata                           3   32   51 MiB   55       155 MiB  0      17 TiB
ocs-storagecluster-cephfilesystem-data0                              4   512  121 GiB  31.07k   364 GiB  0.70   17 TiB
cephblockpool-storageconsumer-6a8c7284-bb38-4f25-bd68-9e20f23773df   5   128  19 B     1        12 KiB   0      17 TiB
cephblockpool-storageconsumer-ddc412aa-6102-4292-9e2f-05ce45c5ea68   6   128  19 B     1        12 KiB   0      17 TiB
cephblockpool-storageconsumer-53ca4e1e-e945-422e-8333-9dacf8c4029d   7   128  19 B     1        12 KiB   0      17 TiB
cephblockpool-storageconsumer-a2233267-d7f3-449b-b33c-f9ed1e75f1d5   8   128  137 GiB  38.41k   411 GiB  0.79   17 TiB
cephblockpool-storageconsumer-14d18631-37ea-4fbe-baca-f102ea34c6cf   9   128  12 KiB   5        47 KiB   0      17 TiB
```
=========ceph osd df=====
```
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP     META     AVAIL    %USE  VAR   PGS  STATUS
 0  ssd    4.00000  1.00000   4 TiB   48 GiB   47 GiB   11 KiB   1.2 GiB  4.0 TiB  1.17  0.92  267  up
 5  ssd    4.00000  1.00000   4 TiB   58 GiB   57 GiB   14 KiB   780 MiB  3.9 TiB  1.41  1.10  271  up
 4  ssd    4.00000  1.00000   4 TiB   46 GiB   46 GiB   16 KiB   345 MiB  4.0 TiB  1.13  0.88  246  up
 3  ssd    4.00000  1.00000   4 TiB   57 GiB   57 GiB   21 KiB   751 MiB  3.9 TiB  1.40  1.10  254  up
 2  ssd    4.00000  1.00000   4 TiB   53 GiB   52 GiB   22 KiB   983 MiB  3.9 TiB  1.29  1.01  275  up
 1  ssd    4.00000  1.00000   4 TiB   54 GiB   53 GiB   16 KiB   599 MiB  3.9 TiB  1.32  1.03  264  up
 6  ssd    4.00000  1.00000   4 TiB   52 GiB   51 GiB   13 KiB   935 MiB  3.9 TiB  1.26  0.98  260  up
 8  ssd    4.00000  1.00000   4 TiB   50 GiB   49 GiB   17 KiB   1.0 GiB  4.0 TiB  1.23  0.96  256  up
 9  ssd    4.00000  1.00000   4 TiB   53 GiB   52 GiB   16 KiB   742 MiB  3.9 TiB  1.29  1.01  257  up
 7  ssd    4.00000  1.00000   4 TiB   54 GiB   53 GiB   23 KiB   584 MiB  3.9 TiB  1.32  1.03  276  up
12  ssd    4.00000  1.00000   4 TiB   50 GiB   49 GiB   19 KiB   667 MiB  4.0 TiB  1.22  0.95  267  up
14  ssd    4.00000  1.00000   4 TiB   55 GiB   55 GiB   12 KiB   577 MiB  3.9 TiB  1.35  1.05  265  up
11  ssd    4.00000  1.00000   4 TiB   57 GiB   56 GiB   12 KiB   779 MiB  3.9 TiB  1.39  1.08  264  up
10  ssd    4.00000  1.00000   4 TiB   52 GiB   51 GiB   21 KiB   812 MiB  3.9 TiB  1.26  0.98  261  up
13  ssd    4.00000  1.00000   4 TiB   49 GiB   48 GiB   17 KiB   903 MiB  4.0 TiB  1.19  0.93  256  up
                      TOTAL   60 TiB  787 GiB  776 GiB  256 KiB  11 GiB   59 TiB   1.28
MIN/MAX VAR: 0.88/1.10  STDDEV: 0.08
```
====ceph fs status===
```
ocs-storagecluster-cephfilesystem - 20 clients
=================================
RANK  STATE           MDS                                   ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active          ocs-storagecluster-cephfilesystem-a   Reqs: 0 /s   156    159    58     100
 0-s  standby-replay  ocs-storagecluster-cephfilesystem-b   Evts: 0 /s   146    149    48     0
POOL                                         TYPE      USED   AVAIL
ocs-storagecluster-cephfilesystem-metadata   metadata  154M   16.7T
ocs-storagecluster-cephfilesystem-data0      data      363G   16.7T
MDS version: ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
```