Versions:
mcg-operator.v4.10.5
ocs-operator.v4.10.5
odf-csi-addons-operator.v4.10.5
odf-operator.v4.10.5
OCP: 4.10.24

Deployed ODF in a cluster deployed on KVM virtual machines, where each VM has 3 disks of 120G for ODF. In total this gives us 120 GB * 9 disks, roughly 1.1 TB of raw space.

With approximately 60% usage, we see 1 nearfull OSD:

  cluster:
    id:     5acf4f06-4381-446c-b64f-4acfb26006ea
    health: HEALTH_WARN
            1 nearfull osd(s)
            Degraded data redundancy: 811/295932 objects degraded (0.274%), 2 pgs degraded, 3 pgs undersized
            11 pool(s) nearfull

  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 2h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 2h), 9 in (since 2h); 3 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 369 pgs
    objects: 98.64k objects, 211 GiB
    usage:   646 GiB used, 434 GiB / 1.1 TiB avail
    pgs:     811/295932 objects degraded (0.274%)
             1971/295932 objects misplaced (0.666%)
             366 active+clean
             2   active+recovery_wait+undersized+degraded+remapped
             1   active+recovering+undersized+remapped

  io:
    client:   19 MiB/s rd, 1.2 MiB/s wr, 30 op/s rd, 119 op/s wr
    recovery: 19 MiB/s, 9 objects/s

  progress:
    Global Recovery Event (36m)
      [===========================.] (remaining: 18s)
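For context, a quick back-of-the-envelope check of raw vs. usable space (a minimal Python sketch; the replica-3 factor is an assumption based on the ODF default for these pools, and GB/GiB rounding is glossed over):

# Rough capacity math for this layout (sketch, not taken from the cluster itself).
# Assumes 9 OSDs of 120 GiB each and the default 3-way replication used by ODF pools.
osd_count = 9
osd_size_gib = 120
replication = 3  # assumed ODF default (replica-3 pools)

raw_capacity = osd_count * osd_size_gib       # ~1080 GiB, the "1.1 TiB" in ceph status
usable_capacity = raw_capacity / replication  # ~360 GiB of logical space

logical_data_gib = 211                        # GiB of objects reported by ceph status
raw_used = logical_data_gib * replication     # ~633 GiB, close to the reported 646 GiB used

print(f"raw: {raw_capacity} GiB, usable: {usable_capacity:.0f} GiB, "
      f"expected raw used: {raw_used} GiB ({raw_used / raw_capacity:.0%} of raw)")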
The issue is reproducing:

  cluster:
    id:     599932f3-90b2-49c4-a6c0-a2531dd2e694
    health: HEALTH_WARN
            Degraded data redundancy: 8899/299829 objects degraded (2.968%), 8 pgs degraded, 10 pgs undersized

  services:
    mon: 3 daemons, quorum a,b,c (age 102m)
    mgr: a(active, since 101m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 101m), 9 in (since 101m); 10 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 369 pgs
    objects: 99.94k objects, 213 GiB
    usage:   655 GiB used, 425 GiB / 1.1 TiB avail
    pgs:     8899/299829 objects degraded (2.968%)
             5961/299829 objects misplaced (1.988%)
             359 active+clean
             8   active+recovery_wait+undersized+degraded+remapped
             2   active+recovering+undersized+remapped

  io:
    client:   30 MiB/s rd, 1.1 MiB/s wr, 36 op/s rd, 66 op/s wr
    recovery: 28 MiB/s, 12 objects/s
Came to check the cluster after a few hours and the status got fixed:

  cluster:
    id:     599932f3-90b2-49c4-a6c0-a2531dd2e694
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 13h)
    mgr: a(active, since 13h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 13h), 9 in (since 13h)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 369 pgs
    objects: 99.98k objects, 214 GiB
    usage:   649 GiB used, 431 GiB / 1.1 TiB avail
    pgs:     369 active+clean

  io:
    client: 852 B/s rd, 19 KiB/s wr, 1 op/s rd, 1 op/s wr
@sasha The cluster was in HEALTH_WARN because one of the OSDs reached 75% of its capacity (the nearfull ratio is set to 0.75 by default in ODF). This is expected behavior.

Check the "ceph osd df" / "ceph osd df tree" output to track each OSD's %USE (percentage used). See the Ceph documentation [1] on nearfull OSDs:

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/troubleshooting_guide/index#near-full-osds_diag

Can you provide the "ceph osd df tree" output if the cluster is still available?

Let me know if you have any further queries. Feel free to close this BZ as not a bug.
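In case it helps, here is a minimal Python sketch that flags OSDs approaching the nearfull ratio by parsing "ceph osd df --format json". It assumes the ceph CLI is reachable (e.g. from the rook-ceph toolbox pod) and that the JSON field names ("nodes", "name", "utilization") match your Ceph release; adjust if they differ:

#!/usr/bin/env python3
# Sketch: flag OSDs approaching the nearfull ratio using `ceph osd df` JSON output.
import json
import subprocess

NEARFULL_RATIO = 0.75  # default mon_osd_nearfull_ratio in ODF/Ceph

out = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
nodes = json.loads(out).get("nodes", [])

# Print OSDs from most to least used and mark anything at or above the ratio.
for osd in sorted(nodes, key=lambda n: n.get("utilization", 0), reverse=True):
    util = osd.get("utilization", 0) / 100.0
    flag = "NEARFULL" if util >= NEARFULL_RATIO else "ok"
    print(f"{osd.get('name', '?'):>8}  {util:6.1%}  {flag}")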
Hi Prashant.

Is the data redundancy message related to the same issue?
"
Degraded data redundancy: 811/295932 objects degraded (0.274%), 2 pgs degraded, 3 pgs undersized
"

It seems like it also raises a warning.
(In reply to Alexander Chuzhoy from comment #5)
> Hi Prashant.
>
> Is the data redundancy message related to the same issue?
> "
> Degraded data redundancy: 811/295932 objects degraded (0.274%), 2 pgs
> degraded, 3 pgs undersized
> "
>
> It seems like it also raises a warning.

The data redundancy message is related to recovery; that warning will clear once recovery completes. The HEALTH_WARN caused by the nearfull OSD notifies the end user to add more OSDs to the cluster, as the cluster is getting full.
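If you prefer to watch the recovery drain rather than re-checking manually, a small polling loop like this Python sketch works (run from the toolbox pod; the "pgmap"/"degraded_objects" and "health"/"status" JSON fields are assumptions based on recent Ceph releases):

#!/usr/bin/env python3
# Sketch: poll `ceph status` JSON until no objects remain degraded.
import json
import subprocess
import time

while True:
    status = json.loads(subprocess.check_output(["ceph", "status", "--format", "json"]))
    degraded = status.get("pgmap", {}).get("degraded_objects", 0)
    health = status.get("health", {}).get("status", "UNKNOWN")
    print(f"health={health} degraded_objects={degraded}")
    if degraded == 0:
        break
    time.sleep(30)  # recovery typically clears these warnings on its own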
Hi Alexander,

Are we good to close this BZ? Let me know if you have any further questions.
Hi Prashant. Let's close it. Thank you :)