Bug 2400565
| Summary: | [Tentacle] [Upgrade][RHEL10] Upgrade to latest build of Tentacle failing with mgr getting crash in SnapRealmInfoNew::decode() | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Manisha Saini <msaini> |
| Component: | CephFS | Assignee: | Dhairya Parmar <dparmar> |
| Status: | CLOSED ERRATA | QA Contact: | sumr |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 9.0 | CC: | ceph-eng-bugs, cephqe-warriors, ngangadh, nojha, pdhiran, prallabh, vshankar |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-20.1.0-40 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2026-01-29 07:00:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536
Description of problem:
========================
While upgrading to the latest Tentacle build (quay.io/rhceph-ci/rhceph@sha256:4c8be91d885a24d3960d12b0fcf1b2ae2ea983771313285290887418fb6a4daf), the upgrade failed with ceph-mgr crashing. The issue was consistently observed in the following upgrade scenarios:

1. Squid → Tentacle
2. Previous Tentacle build → Latest Tentacle build

Core backtrace:

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/ceph-mgr -n mgr.cali015.qfxkhz -f --setuser ceph --setgroup ceph --def'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f8e3818dedc in __pthread_kill_implementation () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f8a57630640 (LWP 168))]
(gdb) bt
#0  0x00007f8e3818dedc in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007f8e38140b46 in raise () from /lib64/libc.so.6
#2  0x00007f8e3812a8c5 in abort () from /lib64/libc.so.6
#3  0x00007f8e383c7b21 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#4  0x00007f8e383d352c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#5  0x00007f8e383d3597 in std::terminate() () from /lib64/libstdc++.so.6
#6  0x00007f8e383d37f9 in __cxa_throw () from /lib64/libstdc++.so.6
#7  0x00007f8e38720205 in ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*) [clone .cold] () from /usr/lib64/ceph/libceph-common.so.2
#8  0x00007f8e38877953 in SnapRealmInfoNew::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&) () from /usr/lib64/ceph/libceph-common.so.2
#9  0x00007f8e2eb40dc0 in get_snap_realm_info(MetaSession*, ceph::buffer::v15_2_0::list::iterator_impl<true>&) [clone .lto_priv.0] () from /lib64/libcephfs.so.2
#10 0x00007f8e2eb48d1f in Client::update_snap_trace(MetaSession*, ceph::buffer::v15_2_0::list const&, SnapRealm**, bool) () from /lib64/libcephfs.so.2
#11 0x00007f8e2eb2d3f7 in Client::handle_client_reply(boost::intrusive_ptr<MClientReply const> const&) () from /lib64/libcephfs.so.2
#12 0x00007f8e2eb2ee62 in Client::ms_dispatch2(boost::intrusive_ptr<Message> const&) () from /lib64/libcephfs.so.2
#13 0x00007f8e3890916d in DispatchQueue::entry() () from /usr/lib64/ceph/libceph-common.so.2
#14 0x00007f8e389a28d1 in DispatchQueue::DispatchThread::entry() () from /usr/lib64/ceph/libceph-common.so.2
#15 0x00007f8e3818c19a in start_thread () from /lib64/libc.so.6
#16 0x00007f8e38211240 in clone3 () from /lib64/libc.so.6
(gdb)
```

Version-Release number of selected component (if applicable):
=============================================================
Upgrade from ceph version 20.1.0-27 to 20.1.0-30

How reproducible:
================
3/3

Steps to Reproduce:
===================
1. Deploy a Ceph cluster on the N-1 build
2. Upgrade the cluster to the latest Tentacle build

Actual results:
===============
Upgrade fails with ceph-mgr crashing and generating a core dump.

Expected results:
================
The upgrade should complete successfully without any ceph-mgr crashes.
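The backtrace points at a classic mixed-version decode failure: during the upgrade, `SnapRealmInfoNew::decode()` reads a snap trace whose wire format comes from a daemon running a different build, the buffer iterator's `copy()` runs past the end of the payload and throws, and the exception escapes the dispatch thread, so `std::terminate()` aborts the mgr. The sketch below illustrates that mechanism only; `Reader`, `encode_old`, `RealmInfoNew`, and `decode_new` are hypothetical stand-ins, not Ceph's actual types, and `std::out_of_range` stands in for `ceph::buffer::end_of_buffer`:

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical byte reader standing in for ceph::buffer::list::iterator.
struct Reader {
    const std::vector<uint8_t>& buf;
    size_t off = 0;
    void copy(size_t n, void* dst) {
        if (off + n > buf.size())                     // bounds check, as in buffer copy()
            throw std::out_of_range("end of buffer"); // analogue of buffer::end_of_buffer
        std::memcpy(dst, buf.data() + off, n);
        off += n;
    }
    uint64_t get_u64() { uint64_t v; copy(sizeof v, &v); return v; }
};

// Old-format payload: two 8-byte fields (16 bytes total).
std::vector<uint8_t> encode_old(uint64_t ino, uint64_t seq) {
    std::vector<uint8_t> out(16);
    std::memcpy(out.data(), &ino, 8);
    std::memcpy(out.data() + 8, &seq, 8);
    return out;
}

struct RealmInfoNew { uint64_t ino = 0, seq = 0, extra = 0; };

// Newer decoder that unconditionally reads a third field (24 bytes total).
RealmInfoNew decode_new(Reader& r) {
    RealmInfoNew ri;
    ri.ino   = r.get_u64();
    ri.seq   = r.get_u64();
    ri.extra = r.get_u64();  // over-reads an old-format payload and throws
    return ri;
}

// Returns true when decoding an old payload with the new decoder throws.
bool old_payload_underflows() {
    auto payload = encode_old(0x100, 7);
    Reader r{payload};
    try {
        decode_new(r);
    } catch (const std::out_of_range&) {
        return true;  // short read detected, matching the crash in frame #7/#8
    }
    return false;
}
```

Ceph's real encoders wrap structures in a versioned envelope (`struct_v` plus a length prefix via the `DECODE_START`/`DECODE_FINISH` macros) precisely so that decoders can tolerate payloads from other releases; a crash like the one reported indicates that tolerance broke somewhere between 20.1.0-27 and 20.1.0-30 for this structure.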
Additional info:
===============

```
# ceph -s
  cluster:
    id:     a9b05e46-9d2b-11f0-b44c-fa163e3ba847
    health: HEALTH_WARN
            1 failed cephadm daemon(s)
            4 daemons have recently crashed
            Upgrade: Need standby mgr daemon

# ceph orch ps | grep mgr
mgr.ceph-msaini-srl78p-node1-installer.jjqhox  ceph-msaini-srl78p-node1-installer  *:9283,8765,8443  running (5h)  5m ago  35h  479M  -  20.1.0-27.el9cp  edd879ed237d  58746de34772
mgr.ceph-msaini-srl78p-node3.icntuk            ceph-msaini-srl78p-node3            *:8443,9283,8765  error         5m ago  12m  -     -  <unknown>  <unknown>  <unknown>

# ls
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.733734.1759273221000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.741252.1759273246000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.741526.1759273271000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.741790.1759273296000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.742049.1759273321000000.zst
```