Bug 2153008
| Summary: | [GSS] False alert of "Cluster Object Store is in unhealthy state for more than 15s" | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Sonal <sarora> |
| Component: | ceph | Assignee: | Matt Benjamin (redhat) <mbenjamin> |
| ceph sub component: | RGW | QA Contact: | Elad <ebenahar> |
| Status: | NEW --- | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | bniver, brgardne, jthottan, mbenjamin, mkasturi, muagarwa, nojha, odf-bz-bot, rzarzyns, sostapov, tnielsen, vumrao |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Sonal
2022-12-13 18:45:23 UTC
RGW is returning error 500 and I'm not sure why. I copied a section of the RGW logs below, but I think we need to get someone experienced with RGW to take a look. I don't see anything misconfigured so far.

```
2022-11-22T09:36:30.191955329+01:00 debug 2022-11-22T08:36:30.190+0000 7fa5ac5f3700 1 ====== starting new request req=0x7fa4d5c45630 =====
2022-11-22T09:36:30.197800379+01:00 debug 2022-11-22T08:36:30.196+0000 7fa53d515700 0 WARNING: set_req_state_err err_no=5 resorting to 500
2022-11-22T09:36:30.198185831+01:00 debug 2022-11-22T08:36:30.197+0000 7fa53d515700 1 ====== req done req=0x7fa4d5c45630 op status=-5 http_status=500 latency=0.007000081s ======
2022-11-22T09:36:30.198280655+01:00 debug 2022-11-22T08:36:30.197+0000 7fa53d515700 1 beast: 0x7fa4d5c45630: 10.128.4.21 - noobaa-ceph-objectstore-user [22/Nov/2022:08:36:30.190 +0000] "PUT /nb.1637066608462.apps.ocp-test.openshift-dpc.local/noobaa_blocks/6193a7700e500e002327f8d6/blocks_tree/other.blocks/_test_store_perf HTTP/1.1" 500 1371 - "aws-sdk-nodejs/2.1127.0 linux/v14.18.2 promise" - latency=0.007000081s
2022-11-22T09:36:30.338275062+01:00 debug 2022-11-22T08:36:30.337+0000 7fa552d40700 1 ====== starting new request req=0x7fa4d5c45630 =====
2022-11-22T09:36:30.345020431+01:00 debug 2022-11-22T08:36:30.344+0000 7fa52f4f9700 0 WARNING: set_req_state_err err_no=5 resorting to 500
2022-11-22T09:36:30.345648597+01:00 debug 2022-11-22T08:36:30.344+0000 7fa52f4f9700 1 ====== req done req=0x7fa4d5c45630 op status=-5 http_status=500 latency=0.007000081s ======
2022-11-22T09:36:30.345983860+01:00 debug 2022-11-22T08:36:30.344+0000 7fa52f4f9700 1 beast: 0x7fa4d5c45630: 10.128.4.21 - noobaa-ceph-objectstore-user [22/Nov/2022:08:36:30.337 +0000] "PUT /nb.1637066608462.apps.ocp-test.openshift-dpc.local/noobaa_blocks/6193a7700e500e002327f8d6/blocks_tree/other.blocks/_test_store_perf HTTP/1.1" 500 1371 - "aws-sdk-nodejs/2.1127.0 linux/v14.18.2 promise" - latency=0.007000081s
2022-11-22T09:36:30.607642277+01:00 debug 2022-11-22T08:36:30.606+0000 7fa4f6c88700 1 ====== starting new request req=0x7fa4d5c45630 =====
2022-11-22T09:36:30.614114065+01:00 debug 2022-11-22T08:36:30.613+0000 7fa544523700 0 WARNING: set_req_state_err err_no=5 resorting to 500
2022-11-22T09:36:30.614479947+01:00 debug 2022-11-22T08:36:30.613+0000 7fa544523700 1 ====== req done req=0x7fa4d5c45630 op status=-5 http_status=500 latency=0.007000081s ======
2022-11-22T09:36:30.614586529+01:00 debug 2022-11-22T08:36:30.613+0000 7fa544523700 1 beast: 0x7fa4d5c45630: 10.128.4.21 - noobaa-ceph-objectstore-user [22/Nov/2022:08:36:30.606 +0000] "PUT /nb.1637066608462.apps.ocp-test.openshift-dpc.local/noobaa_blocks/6193a7700e500e002327f8d6/blocks_tree/other.blocks/_test_store_perf HTTP/1.1" 500 1371 - "aws-sdk-nodejs/2.1127.0 linux/v14.18.2 promise" - latency=0.007000081s
2022-11-22T09:36:30.628885237+01:00 debug 2022-11-22T08:36:30.628+0000 7fa5a95ed700 1 ====== starting new request req=0x7fa4d5c45630 =====
2022-11-22T09:36:30.631082542+01:00 debug 2022-11-22T08:36:30.630+0000 7fa5a95ed700 1 ====== req done req=0x7fa4d5c45630 op status=-2 http_status=204 latency=0.002000023s ======
2022-11-22T09:36:30.631371320+01:00 debug 2022-11-22T08:36:30.630+0000 7fa5a95ed700 1 beast: 0x7fa4d5c45630: 10.128.4.21 - noobaa-ceph-objectstore-user [22/Nov/2022:08:36:30.628 +0000] "DELETE /nb.1637066608462.apps.ocp-test.openshift-dpc.local/noobaa_blocks/6193a7700e500e002327f8d6/blocks_tree/other.blocks/test-delete-non-existing-key-1669106190622 HTTP/1.1" 204 0 - "aws-sdk-nodejs/2.1127.0 linux/v14.18.2 promise" - latency=0.002000023s
2022-11-22T09:36:37.089010119+01:00 debug 2022-11-22T08:36:37.088+0000 7fa58c5b3700 1 ====== starting new request req=0x7fa61c6ce630 =====
2022-11-22T09:36:37.089444074+01:00 debug 2022-11-22T08:36:37.088+0000 7fa58c5b3700 1 ====== req done req=0x7fa61c6ce630 op status=0 http_status=200 latency=0.000000000s ======
2022-11-22T09:36:37.089621302+01:00 debug 2022-11-22T08:36:37.088+0000 7fa58c5b3700 1 beast: 0x7fa61c6ce630: 10.128.4.1 - - [22/Nov/2022:08:36:37.088 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.000000000s
2022-11-22T09:36:47.088527876+01:00 debug 2022-11-22T08:36:47.086+0000 7fa50b4b1700 1 ====== starting new request req=0x7fa61c6ce630 =====
2022-11-22T09:36:47.088995912+01:00 debug 2022-11-22T08:36:47.087+0000 7fa50b4b1700 1 ====== req done req=0x7fa61c6ce630 op status=0 http_status=200 latency=0.001000012s ======
2022-11-22T09:36:47.089135565+01:00 debug 2022-11-22T08:36:47.087+0000 7fa50b4b1700 1 beast: 0x7fa61c6ce630: 10.128.4.1 - - [22/Nov/2022:08:36:47.086 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.001000012s
2022-11-22T09:36:47.317367259+01:00 debug 2022-11-22T08:36:47.315+0000 7fa607eaa700 0 rgw UsageLogger: WARNING: RGWRados::log_usage(): user name empty (bucket=), skipping
2022-11-22T09:36:57.087894797+01:00 debug 2022-11-22T08:36:57.085+0000 7fa55954d700 1 ====== starting new request req=0x7fa61c6ce630 =====
2022-11-22T09:36:57.088352163+01:00 debug 2022-11-22T08:36:57.086+0000 7fa55954d700 1 ====== req done req=0x7fa61c6ce630 op status=0 http_status=200 latency=0.000000000s ======
2022-11-22T09:36:57.088488432+01:00 debug 2022-11-22T08:36:57.086+0000 7fa55954d700 1 beast: 0x7fa61c6ce630: 10.128.4.1 - - [22/Nov/2022:08:36:57.085 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.000000000s
2022-11-22T09:37:07.088412567+01:00 debug 2022-11-22T08:37:07.087+0000 7fa51ccd4700 1 ====== starting new request req=0x7fa4d5c45630 =====
2022-11-22T09:37:07.088918407+01:00 debug 2022-11-22T08:37:07.087+0000 7fa51ccd4700 1 ====== req done req=0x7fa4d5c45630 op status=0 http_status=200 latency=0.000000000s ======
2022-11-22T09:37:07.089134150+01:00 debug 2022-11-22T08:37:07.087+0000 7fa51ccd4700 1 beast: 0x7fa4d5c45630: 10.128.4.1 - - [22/Nov/2022:08:37:07.087 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.000000000s
```

In the meantime, could you increase the log level for the RGW, then restart the RGW pod, and then collect a new must-gather after 10 minutes or so?
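
For convenience, this is roughly how the log level can be raised on a Rook/ODF cluster. A sketch only, assuming the default `openshift-storage` namespace, the stock Rook pod labels, and that the toolbox pod is deployed; the must-gather image reference is a placeholder to fill in for your ODF version:

```
# Raise RGW verbosity from the Rook toolbox pod
oc -n openshift-storage rsh $(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
ceph config set client.rgw debug_rgw 20
ceph config set client.rgw debug_ms 1
exit

# Restart the RGW pod; the Rook operator recreates it with the new settings
oc -n openshift-storage delete pod -l app=rook-ceph-rgw

# After ~10 minutes of traffic, collect a fresh must-gather
oc adm must-gather --image=<odf-must-gather-image>
```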
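
It may also help to replay one of the failing PUTs outside NooBaa: err_no=5 / op status=-5 is EIO, presumably surfacing from the backing RADOS call, and the requests themselves are plain S3 PUTs. A minimal sketch with the AWS CLI; the secret name follows Rook's usual `rook-ceph-object-user-<store>-<user>` convention, and the `<store>`, `<rgw-service>`, and credential placeholders have to be filled in for this cluster:

```
# Pull the S3 credentials for the NooBaa object-store user from its Rook secret
oc -n openshift-storage get secret \
  rook-ceph-object-user-<store>-noobaa-ceph-objectstore-user \
  -o jsonpath='{.data.AccessKey}' | base64 -d
oc -n openshift-storage get secret \
  rook-ceph-object-user-<store>-noobaa-ceph-objectstore-user \
  -o jsonpath='{.data.SecretKey}' | base64 -d

# Replay the failing PUT from the logs against the RGW service endpoint
export AWS_ACCESS_KEY_ID=<AccessKey>
export AWS_SECRET_ACCESS_KEY=<SecretKey>
echo test > /tmp/obj
aws --endpoint-url http://<rgw-service>:80 s3api put-object \
  --bucket nb.1637066608462.apps.ocp-test.openshift-dpc.local \
  --key noobaa_blocks/6193a7700e500e002327f8d6/blocks_tree/other.blocks/_test_store_perf \
  --body /tmp/obj
```

If that PUT reproduces the 500 with the raised log level, the new must-gather should capture exactly where the EIO originates.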