Bug 1896959
| Field | Value |
|---|---|
| Summary | PG autoscaler did not respond to storage pool consuming space |
| Product | [Red Hat Storage] Red Hat OpenShift Container Storage |
| Component | ceph |
| Version | 4.2 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED NOTABUG |
| Severity | high |
| Priority | unspecified |
| Reporter | Ben England <bengland> |
| Assignee | Neha Ojha <nojha> |
| QA Contact | Raz Tamir <ratamir> |
| CC | assingh, bbenshab, bniver, ebenahar, jdurgin, jhopper, kramdoss, madam, muagarwa, nojha, ocs-bugs, owasserm |
| Keywords | AutomationBackLog, Performance |
| Flags | kramdoss: needinfo+ |
| Target Milestone | --- |
| Target Release | --- |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2021-01-22 09:01:12 UTC |
| Attachments | must_gather for comment#6 (attachment 1731025) |
Description (Ben England, 2020-11-11 22:44:44 UTC)
Most likely a Ceph issue that I don't see us fixing right now for 4.6.0 (and also unlikely to be a new issue) - deferring to 4.7.

This was already seen before and fixed as part of bug 1782756, so this is most likely a regression. In addition, the situation is exposed more easily for CNV - bug 1897351.

(In reply to Elad from comment #3)
> This was already seen before and fixed as part of 1782756.

The correct one is https://bugzilla.redhat.com/show_bug.cgi?id=1797918

I was able to reproduce this issue by performing the following steps:
1) On a 3-node OCS cluster with one 512 GB OSD per node, fill up the capacity.
2) Add capacity: 1 more OSD per node.
3) Follow https://access.redhat.com/solutions/3001761 so that recovery IOs can start.
4) Allow the rebalance to the new OSDs to complete. [At this point, we could see that the PGs are not quite equally distributed.]

cat ceph_osd_df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 1   ssd 0.50000  1.00000 512 GiB 380 GiB 379 GiB 43 KiB 1.5 GiB  132 GiB 74.27 1.49 189     up
 5   ssd 0.50000  1.00000 512 GiB 129 GiB 128 GiB 20 KiB 1024 MiB 383 GiB 25.15 0.50  78     up
 2   ssd 0.50000  1.00000 512 GiB 388 GiB 386 GiB 39 KiB 1.5 GiB  124 GiB 75.70 1.52 198     up
 4   ssd 0.50000  1.00000 512 GiB 129 GiB 128 GiB 27 KiB 1024 MiB 383 GiB 25.28 0.51  72     up
 0   ssd 0.50000  1.00000 512 GiB 366 GiB 364 GiB 47 KiB 1.5 GiB  146 GiB 71.43 1.43 167     up
 3   ssd 0.50000  1.00000 512 GiB 140 GiB 139 GiB 20 KiB 1024 MiB 372 GiB 27.35 0.55  97     up
                    TOTAL 3 TiB   1.5 TiB 1.5 TiB 199 KiB 7.5 GiB  1.5 TiB 49.86
MIN/MAX VAR: 0.50/1.52  STDDEV: 23.98

5) Write more data to fill up the cluster. [At this point, one of the OSDs hit the full ratio well ahead of the other (new) OSDs.]

Created attachment 1731025 [details]
must_gather for comment#6

(In reply to krishnaram Karthick from comment #7)
> Created attachment 1731025 [details]
> must_gather for comment#6

Thanks for the detailed data Karthick. In your case, the cluster was not rebalanced yet; there were still many backfilling pgs:

  data:
    pools:   3 pools, 288 pgs
    objects: 130.90k objects, 505 GiB
    usage:   1.5 TiB used, 1.5 TiB / 3 TiB avail
    pgs:     96496/392685 objects misplaced (24.573%)
             232 active+clean
             54  active+remapped+backfill_wait
             2   active+remapped+backfilling

  io:
    client:   853 B/s rd, 303 MiB/s wr, 1 op/s rd, 94 op/s wr
    recovery: 15 MiB/s, 3 objects/s

The balancer won't run until <5% of objects are misplaced. As you can see, at this point in time nearly 25% of the objects were still being rebalanced, so in this case the balancer hasn't run at all. You can verify this by observing that there are no upmaps in ceph_osd_dump, which is how the balancer redistributes pgs.

What happened after this must-gather was taken? I'd expect backfill to complete, and then the balancer to redistribute pgs as needed at that point.
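For reference, a minimal sketch of how this state can be checked from the Ceph CLI (for example from the rook-ceph toolbox pod), assuming a Nautilus-or-later release; the <5% threshold mentioned above corresponds to the mgr option target_max_misplaced_ratio:

# Overall recovery state and the fraction of misplaced objects; the balancer
# stays idle while this fraction is above target_max_misplaced_ratio (0.05 by default).
ceph status
ceph config get mgr target_max_misplaced_ratio

# Balancer state: whether it is enabled, its mode, and whether a plan is active.
ceph balancer status

# The upmap balancer works by installing pg_upmap_items entries in the OSD map;
# if nothing shows up here, the balancer has not redistributed any PGs yet.
ceph osd dump | grep upmap

While the misplaced ratio is still above the threshold, an empty grep result here is expected rather than a sign of a balancer failure.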
(In reply to Josh Durgin from comment #8)
> What happened after this must-gather was taken? I'd expect backfill to
> complete, and then the balancer to redistribute pgs as needed at that point.

Thanks Josh. I don't have the cluster anymore; QE's AWS clusters automatically get deleted after 12 hours. I'll rerun this test and update once I have the results.

I reran the test and waited for a long time. I see that this time the OSDs are more evenly distributed.

After expanding to 6 OSDs:

ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 2   ssd 0.50000  1.00000 512 GiB 216 GiB 215 GiB 72 KiB 1.5 GiB  296 GiB 42.24 0.96 133     up
 3   ssd 0.50000  1.00000 512 GiB 236 GiB 235 GiB 27 KiB 1024 MiB 276 GiB 46.01 1.04 155     up
 1   ssd 0.50000  1.00000 512 GiB 226 GiB 224 GiB 72 KiB 1.5 GiB  286 GiB 44.04 1.00 146     up
 4   ssd 0.50000  1.00000 512 GiB 226 GiB 225 GiB 27 KiB 1024 MiB 286 GiB 44.21 1.00 140     up
 0   ssd 0.50000  1.00000 512 GiB 239 GiB 238 GiB 75 KiB 1.6 GiB  273 GiB 46.75 1.06 149     up
 5   ssd 0.50000  1.00000 512 GiB 213 GiB 212 GiB 45 KiB 1024 MiB 299 GiB 41.51 0.94 139     up
                    TOTAL 3 TiB   1.3 TiB 1.3 TiB 321 KiB 7.6 GiB  1.7 TiB 44.13
MIN/MAX VAR: 0.94/1.06  STDDEV: 1.86

After expanding to 9 OSDs:

ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 2   ssd 0.50000  1.00000 512 GiB 293 GiB 292 GiB 42 KiB 1.4 GiB  219 GiB 57.25 0.96  93     up
 3   ssd 0.50000  1.00000 512 GiB 307 GiB 305 GiB 40 KiB 1.7 GiB  205 GiB 59.92 1.01 103     up
 6   ssd 0.50000  1.00000 512 GiB 313 GiB 312 GiB 32 KiB 1024 MiB 199 GiB 61.19 1.03  92     up
 1   ssd 0.50000  1.00000 512 GiB 279 GiB 277 GiB 55 KiB 1.8 GiB  233 GiB 54.54 0.92  93     up
 4   ssd 0.50000  1.00000 512 GiB 307 GiB 305 GiB 43 KiB 1.5 GiB  205 GiB 59.87 1.01  95     up
 7   ssd 0.50000  1.00000 512 GiB 328 GiB 327 GiB 35 KiB 1024 MiB 184 GiB 63.98 1.08  99     up
 0   ssd 0.50000  1.00000 512 GiB 328 GiB 327 GiB 51 KiB 1.4 GiB  184 GiB 64.05 1.08 101     up
 5   ssd 0.50000  1.00000 512 GiB 272 GiB 271 GiB 39 KiB 1.4 GiB  240 GiB 53.20 0.89  89     up
 8   ssd 0.50000  1.00000 512 GiB 313 GiB 312 GiB 24 KiB 1024 MiB 199 GiB 61.05 1.03  98     up
                    TOTAL 4.5 TiB 2.7 TiB 2.7 TiB 366 KiB 12 GiB   1.8 TiB 59.45
MIN/MAX VAR: 0.89/1.08  STDDEV: 3.58
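For reference, the evenness reported above (the MIN/MAX VAR and STDDEV figures) can be summarized without eyeballing the tables. A rough sketch, assuming jq is available and that the JSON output of ceph osd df exposes the .nodes[].utilization field (this may differ between Ceph releases):

# Current balancer score for the cluster (lower means a more even distribution).
ceph balancer eval

# Min, max and mean per-OSD utilization from the JSON form of 'ceph osd df';
# the .nodes[].utilization path is an assumption and may vary by release.
ceph osd df -f json | jq '[.nodes[].utilization] | min, max, (add/length)'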
(In reply to krishnaram Karthick from comment #10)
> I reran the test and waited for a long time.
> I see that this time the OSDs are more evenly distributed.

Good - so what's the next step?

> [ceph osd df output after expanding to 6 and 9 OSDs, quoted from comment #10]

As the balancer is working as expected, this is not a regression or blocker.

Removing the blocker flag as discussed in the OCS meeting yesterday; probably this should be closed as not a bug.

(In reply to Yaniv Kaul from comment #11)
> Good - so what's the next step?

Reaching out to the performance team running CNV workloads to see if this is seen on a scaled-up cluster with a CNV workload, as that is where the issue was originally seen. Moving out of 4.6; once we have the inputs from the perf team, we can move forward.
(In reply to krishnaram Karthick from comment #13)
> Reaching out to the performance team running CNV workloads to see if this is
> seen on a scaled-up cluster with a CNV workload, as that is where the issue
> was originally seen.

Any update on this?

(In reply to Josh Durgin from comment #15)
> Any update on this?

The last time I reached out, I couldn't get a CNV system that runs with the storage capacity described in the bug. But I'm retaining the needinfo to check once again, or maybe to see if there is an automated test that we could run on our test environments.

Please reopen if you see this again. Removing the needinfo flag. We weren't able to reproduce this scenario.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.