Bug 2214499
Summary: | [Tracker for https://bugzilla.redhat.com/show_bug.cgi?id=2266035] ceph-client.admin crashed in ceph-exporter thread with "throw_invalid_argument(char const*, boost::source_location const&)+0x37) [0x557c40cab267]" | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Prasad Desala <tdesala> | |
Component: | rook | Assignee: | Divyansh Kamboj <dkamboj> | |
Status: | CLOSED ERRATA | QA Contact: | Nagendra Reddy <nagreddy> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 4.13 | CC: | amagrawa, athakkar, bhubbard, bkunal, brgardne, dkamboj, dmoessne, kramdoss, mduasope, muagarwa, nagreddy, nthomas, odf-bz-bot, prsurve, sagrawal, sapillai, sarora, sheggodu, srai, tnielsen | |
Target Milestone: | --- | Keywords: | Automation, Reopened | |
Target Release: | ODF 4.16.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2266035 | Environment: | |
Last Closed: | 2024-07-17 13:10:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 2266035 | |||
Bug Blocks: |
Description
Prasad Desala
2023-06-13 07:19:22 UTC
Hi @Prasad, this looks like a dup of https://bugzilla.redhat.com/show_bug.cgi?id=2232226, but that bug is being hit during upgrade from 4.13 to 4.14.

4.14 is using the ceph build which has exporter changes, but I can see the same crash there.

{
    "backtrace": [
        "/lib64/libc.so.6(+0x54df0) [0x7f122f9a0df0]",
        "/lib64/libc.so.6(+0xa154c) [0x7f122f9ed54c]",
        "raise()",
        "abort()",
        "/lib64/libstdc++.so.6(+0xa1a01) [0x7f122fc11a01]",
        "/lib64/libstdc++.so.6(+0xad37c) [0x7f122fc1d37c]",
        "/lib64/libstdc++.so.6(+0xad3e7) [0x7f122fc1d3e7]",
        "/lib64/libstdc++.so.6(+0xad649) [0x7f122fc1d649]",
        "ceph-exporter(+0x29767) [0x565474e04767]",
        "(boost::json::detail::throw_invalid_argument(char const*, boost::source_location const&)+0x37) [0x565474e18267]",
        "ceph-exporter(+0x65947) [0x565474e40947]",
        "(DaemonMetricCollector::dump_asok_metrics()+0x1de7) [0x565474e209e7]",
        "ceph-exporter(+0x45e20) [0x565474e20e20]",
        "ceph-exporter(+0x5caed) [0x565474e37aed]",
        "ceph-exporter(+0xab6df) [0x565474e866df]",
        "(DaemonMetricCollector::main()+0x212) [0x565474e0abf2]",
        "main()",
        "/lib64/libc.so.6(+0x3feb0) [0x7f122f98beb0]",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "17.2.6-105.el9cp",
    "crash_id": "2023-08-14T23:05:06.978999Z_273bd10e-1d27-4e80-ab3c-838c6c5a9519",
    "entity_name": "client.admin",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "9.2 (Plow)",
    "os_version_id": "9.2",
    "process_name": "ceph-exporter",
    "stack_sig": "03972c98be910d1ce25645fdd11917d43497d8e45963b63cf072b005e7daee44",
    "timestamp": "2023-08-14T23:05:06.978999Z",
    "utsname_hostname": "rook-ceph-exporter-compute-0-68fdf6c8b5-rbdqb",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-284.25.1.el9_2.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Thu Jul 20 09:11:28 EDT 2023"
}

We can have the setup whenever you want as this is always reproducible.

@athakkar it looks like this is still reproducible.
Since this is a bug for 4.14, please take a look with appropriate priority.

Can we try with 4.14.0-117? It should be fixed.

Marking it a blocker for now.

4.15 already has the fix for the crash backported from what I see: https://github.com/red-hat-storage/rook/blob/3e3dba07d1fd6f730b856f2175de238fac6e6b5a/pkg/operator/ceph/cluster/nodedaemon/exporter.go#L123C1-L124C1
The reason looks similar though.

Nagendra, please try again with the latest ODF build.

The issue is that ceph-exporter doesn't handle exceptions while parsing JSON, so we need to add try-catch blocks to mitigate these crashes and log an error instead. I'm working on the unit tests to verify that the fix works.

(In reply to Divyansh Kamboj from comment #37)
> The issue is that ceph-exporter doesn't handle exceptions while parsing
> JSON, so we need to add try-catch blocks to mitigate these crashes and
> log an error instead. I'm working on the unit tests to verify that the
> fix works.

Why are we getting invalid json? Is there some other underlying issue?

> Why are we getting invalid json? Is there some other underlying issue?
The JSON isn't "invalid" per se, but the function that parses the JSON into `object` and `array` throws an exception when certain data points are not populated. So the fix is a try-catch block to catch those exceptions.
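For anyone following along, here is a minimal sketch of the pattern described above, using only the standard library. It is not the ceph-exporter code and not the upstream patch: `Value`, `Counters` and `collect_metrics` are hypothetical names, and `Value::as_object()` only mimics the behaviour of a checked accessor that throws when the value is not the expected kind (Boost.JSON's checked accessors such as `as_object()` behave this way, which matches the `throw_invalid_argument` frame in the backtrace).

```cpp
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-in for a parsed perf-counter dump: name -> value.
using Counters = std::map<std::string, double>;

// Hypothetical stand-in for a parsed JSON value. It may be empty (a data
// point that was never populated), a number, or an object of counters.
struct Value {
    std::variant<std::monostate, double, Counters> data;

    // Checked accessor: throws if the value is not an object. An uncaught
    // throw here ends in std::terminate/abort, as seen in the backtrace.
    const Counters& as_object() const {
        if (const Counters* obj = std::get_if<Counters>(&data)) {
            return *obj;
        }
        throw std::invalid_argument("value is not an object");
    }
};

// Sums the counters for one daemon. On an unexpected shape it logs the
// error and returns false, leaving `total` untouched, instead of letting
// the exception escape and kill the whole process.
bool collect_metrics(const std::string& daemon, const Value& perf_dump,
                     double& total) {
    try {
        double sum = 0.0;
        for (const auto& entry : perf_dump.as_object()) {
            sum += entry.second;
        }
        total = sum;
        return true;
    } catch (const std::invalid_argument& e) {
        std::cerr << "skipping metrics for " << daemon << ": " << e.what()
                  << '\n';
        return false;
    }
}
```

With this shape, a daemon whose dump is missing data is skipped and reported in the log, while the remaining daemons are still collected.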
Bug in NEW/ASSIGNED state. Moving the bug to 4.15.3 for a decision on RCA/FIX.

*** Bug 2269122 has been marked as a duplicate of this bug. ***

*** Bug 2255648 has been marked as a duplicate of this bug. ***

Moving the bug to 4.15.4 as we have reached the limit on bugs intake for 4.15.3.

Upstream PR https://github.com/ceph/ceph/pull/55773 has been merged, getting the patch backported to 7.1.
Relevant BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2266035

It just got backported to ceph 7.1 downstream. afaik 4.16 uses 7.1, so we can mark it modified for 4.16? wdyt @sheggodu

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2024:4591

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days