Bug 2000133
| Summary: | rgw pod stuck in CrashLoopBackOff while installing odf-operator via UI on VMware cluster | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Aman Agrawal <amagrawa> |
| Component: | rook | Assignee: | Jiffin <jthottan> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | hnallurv, jrivera, jthottan, madam, muagarwa, nberry, nigoyal, ocs-bugs, odf-bz-bot, sbaldwin, srai, tnielsen |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | Flags: | muagarwa: needinfo? (sbaldwin) |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-01-24 16:21:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2000190 | | |
| Bug Blocks: | | | |
Description
Aman Agrawal
2021-09-01 13:13:08 UTC
Travis Nielsen (comment #3):

The rgw log shows an error with the cert or zonegroup configuration. @Jiffin Can you take a look?

```
2021-09-01T13:13:39.863734009Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  0 deferred set uid:gid to 167:167 (ceph:ceph)
2021-09-01T13:13:39.863847681Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  0 ceph version 16.2.0-81.el8cp (8908ce967004ed706acb5055c01030e6ecd06036) pacific (stable), process radosgw, pid 478
2021-09-01T13:13:39.863847681Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  0 framework: beast
2021-09-01T13:13:39.863847681Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  0 framework conf key: port, val: 8080
2021-09-01T13:13:39.863847681Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  0 framework conf key: ssl_port, val: 443
2021-09-01T13:13:39.863847681Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  0 framework conf key: ssl_certificate, val: /etc/ceph/private/rgw-cert.pem
2021-09-01T13:13:39.863847681Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  0 framework conf key: ssl_private_key, val: /etc/ceph/private/rgw-key.pem
2021-09-01T13:13:39.863860573Z debug 2021-09-01T13:13:39.861+0000 7f309e2ef480  1 radosgw_Main not setting numa affinity
2021-09-01T13:13:39.881728097Z debug 2021-09-01T13:13:39.879+0000 7f309e2ef480  0 failed reading zonegroup info: ret -2 (2) No such file or directory
2021-09-01T13:13:39.881728097Z debug 2021-09-01T13:13:39.879+0000 7f309e2ef480  0 ERROR: failed to start notify service ((2) No such file or directory
2021-09-01T13:13:39.881728097Z debug 2021-09-01T13:13:39.879+0000 7f309e2ef480  0 ERROR: failed to init services (ret=(2) No such file or directory)
2021-09-01T13:13:39.883725963Z debug 2021-09-01T13:13:39.881+0000 7f309e2ef480 -1 Couldn't init storage provider (RADOS)
```

---

Jiffin (in reply to Travis Nielsen from comment #3; quoting the rgw log above):

I was not able to find anything suspicious in the last few error messages. Can you please re-collect the logs at debug level 20? Add the following to the rook-config-override ConfigMap and restart the rgw pod:

```
[client.rgw.ocs.storagecluster.cephobjectstore.a]
debug rgw = 20/20
```

---

As of now, the OCS must-gather command doesn't collect every log affected by the recent re-branding changes. Please follow bug 2000190 for more details. I am not sure how else I could help you here.

---

Hi Neha/Mudit/Jose/Nitin, could you please help us here?

---

Elad (comment #10):

Hi Jiffin, Travis,

How should we change the RGW log level? Is it done using a ConfigMap?

---

Jiffin (comment #11, in reply to Elad from comment #10):

> How should we change RGW log level? Is it done using a configMap?

In Rook, this can be done by adding the following to the "rook-config-override" ConfigMap and restarting the rgw pod (if the pod has already started):

```
[client.rgw.ocs.storagecluster.cephobjectstore.a]
debug rgw = 20/20
```

---

Removing needinfo since Jiffin answered the logging question. Aman, can you please reproduce this with the logging instructions provided by Jiffin in comment #11?

---

Setting needinfo back on Aman; we still need help reproducing this issue (with the appropriate debug log level).

---

Thanks Aman. I guess we can keep it open for some time; if there is no further instance of this, we can close it.

---

I don't see this as a test blocker if it is not even reproducible, so I am removing the blocker flag. Please re-flag if required.

---

Please reopen if this is reproducible. Please open a new bug if this reproduces with the increased logging.
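For reference, the rook-config-override edit described above can also be expressed as a full ConfigMap manifest. This is a minimal sketch, not taken from the bug itself: the `openshift-storage` namespace is an assumption (it is the default for OCS/ODF deployments but is not stated here), and the client section name must match the actual CephObjectStore name on the cluster.

```yaml
# Hypothetical manifest sketching Jiffin's suggested override.
# Assumptions: the default OCS/ODF namespace "openshift-storage" and the
# object store name "ocs-storagecluster-cephobjectstore"; adjust both to
# your cluster. Rook merges the "config" key into the Ceph config of its
# daemons; the rgw pod must be restarted to pick up the change.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: openshift-storage
data:
  config: |
    [client.rgw.ocs.storagecluster.cephobjectstore.a]
    debug rgw = 20/20
```

Applying a manifest like this (e.g. with `oc apply -f`) and then deleting the rgw pod so it restarts should produce the level-20 rgw logs requested in the triage above.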