Bug 2117398
| Summary: | Ceph is in a warning state right after deployment even though it has enough space. | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | ceph | Assignee: | Neha Ojha <nojha> |
| ceph sub component: | RADOS | QA Contact: | Elad <ebenahar> |
| Status: | CLOSED WORKSFORME | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | bniver, madam, muagarwa, ocs-bugs, odf-bz-bot, pdhange, pdhiran |
| Version: | 4.10 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-09-26 13:55:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The issue is reproducible:
cluster:
id: 599932f3-90b2-49c4-a6c0-a2531dd2e694
health: HEALTH_WARN
Degraded data redundancy: 8899/299829 objects degraded (2.968%), 8 pgs degraded, 10 pgs undersized
services:
mon: 3 daemons, quorum a,b,c (age 102m)
mgr: a(active, since 101m)
mds: 1/1 daemons up, 1 hot standby
osd: 9 osds: 9 up (since 101m), 9 in (since 101m); 10 remapped pgs
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 11 pools, 369 pgs
objects: 99.94k objects, 213 GiB
usage: 655 GiB used, 425 GiB / 1.1 TiB avail
pgs: 8899/299829 objects degraded (2.968%)
5961/299829 objects misplaced (1.988%)
359 active+clean
8 active+recovery_wait+undersized+degraded+remapped
2 active+recovering+undersized+remapped
io:
client: 30 MiB/s rd, 1.1 MiB/s wr, 36 op/s rd, 66 op/s wr
recovery: 28 MiB/s, 12 objects/s
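The degraded and undersized PGs above indicate recovery still in progress shortly after deployment. As a minimal sketch (not part of the original report, assuming CLI access such as the rook-ceph toolbox pod in ODF), these are commands an operator might use to watch that recovery drain:

```
# Hedged sketch: watching recovery progress; not taken from this BZ.
ceph -s              # overall health and recovery throughput
ceph health detail   # expands each HEALTH_WARN item
ceph pg ls degraded  # list only the PGs that are still degraded
```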
Checked the cluster again after a few hours and the status had recovered on its own:
cluster:
id: 599932f3-90b2-49c4-a6c0-a2531dd2e694
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 13h)
mgr: a(active, since 13h)
mds: 1/1 daemons up, 1 hot standby
osd: 9 osds: 9 up (since 13h), 9 in (since 13h)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 11 pools, 369 pgs
objects: 99.98k objects, 214 GiB
usage: 649 GiB used, 431 GiB / 1.1 TiB avail
pgs: 369 active+clean
io:
client: 852 B/s rd, 19 KiB/s wr, 1 op/s rd, 1 op/s wr
@sasha The cluster was in HEALTH_WARN because one of the OSDs reached 75% of its capacity (the nearfull ratio is set to 0.75 by default in ODF). This is expected behavior. Check the "ceph osd df" / "ceph osd df tree" output to track each OSD's %USE (percentage used); see also the command sketch after this thread. The Ceph documentation [1] covers nearfull OSDs:

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/troubleshooting_guide/index#near-full-osds_diag

Can you provide the "ceph osd df tree" output if the cluster is still available? Let me know if you have any further queries. Feel free to close this BZ as not a bug.

Hi Prashant. Is the data redundancy message related to the same issue?
"Degraded data redundancy: 811/295932 objects degraded (0.274%), 2 pgs degraded, 3 pgs undersized"
It seems to raise a warning as well.

(In reply to Alexander Chuzhoy from comment #5)
> Hi Prashant.
>
> Is the data redundancy message related to the same issue?
> "Degraded data redundancy: 811/295932 objects degraded (0.274%), 2 pgs degraded, 3 pgs undersized"
> It seems to raise a warning as well.

The data redundancy message is related to recovery, and that warning will clear once recovery completes. The HEALTH_WARN caused by the nearfull OSD tells the end user to add more OSDs because the cluster is getting full.

Hi Alexander, are we good to close this BZ? Let me know if you have any further questions.

Hi Prashant. Let's close it. Thank you :)
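As a rough illustration of the advice above, this is a minimal sketch of how one might confirm which OSD crossed the nearfull threshold; running it via the rook-ceph toolbox pod is an assumption, not something stated in this BZ.

```
# Hedged sketch: identifying the nearfull OSD and the configured thresholds.
ceph health detail          # names the nearfull OSD(s) explicitly
ceph osd df tree            # per-OSD %USE and PG counts, grouped by host
ceph osd dump | grep ratio  # shows full_ratio / backfillfull_ratio / nearfull_ratio
```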
Versions:
mcg-operator.v4.10.5
ocs-operator.v4.10.5
odf-csi-addons-operator.v4.10.5
odf-operator.v4.10.5
OCP: 4.10.24

Deployed ODF in a cluster running on KVM virtual machines, where each VM has 3 disks of 120 GB for ODF, so in total we have 9 x 120 GB = 1080 GB of space. With approximately 60% usage, we see 1 nearfull OSD:

cluster:
id: 5acf4f06-4381-446c-b64f-4acfb26006ea
health: HEALTH_WARN
1 nearfull osd(s)
Degraded data redundancy: 811/295932 objects degraded (0.274%), 2 pgs degraded, 3 pgs undersized
11 pool(s) nearfull
services:
mon: 3 daemons, quorum a,b,c (age 2h)
mgr: a(active, since 2h)
mds: 1/1 daemons up, 1 hot standby
osd: 9 osds: 9 up (since 2h), 9 in (since 2h); 3 remapped pgs
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 11 pools, 369 pgs
objects: 98.64k objects, 211 GiB
usage: 646 GiB used, 434 GiB / 1.1 TiB avail
pgs: 811/295932 objects degraded (0.274%)
1971/295932 objects misplaced (0.666%)
366 active+clean
2 active+recovery_wait+undersized+degraded+remapped
1 active+recovering+undersized+remapped
io:
client: 19 MiB/s rd, 1.2 MiB/s wr, 30 op/s rd, 119 op/s wr
recovery: 19 MiB/s, 9 objects/s
progress:
Global Recovery Event (36m)
[===========================.] (remaining: 18s)
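The description shows roughly 60% average utilization (646 GiB of 1.1 TiB) while one OSD is already nearfull; that is possible because data is not spread perfectly evenly across OSDs, so a single OSD can cross the 0.75 threshold well before the cluster average does. As a hedged sketch only (not advice given in this BZ, and manual changes may be reconciled by the rook-ceph operator in ODF), these are commands an operator might consider when one OSD runs hot:

```
# Hedged sketch: options when a single OSD is much fuller than the average.
ceph balancer status               # check whether the balancer is levelling data across OSDs
ceph osd reweight-by-utilization   # one-off rebalance away from over-utilized OSDs
ceph osd set-nearfull-ratio 0.80   # last resort: raise the warning threshold above the 0.75 default
```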