
Bug 2400565

Summary: [Tentacle] [Upgrade][RHEL10] Upgrade to latest build of Tentacle failing with mgr getting crash in SnapRealmInfoNew::decode()
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Manisha Saini <msaini>
Component: CephFS
Assignee: Dhairya Parmar <dparmar>
Status: CLOSED ERRATA
QA Contact: sumr
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 9.0
CC: ceph-eng-bugs, cephqe-warriors, ngangadh, nojha, pdhiran, prallabh, vshankar
Target Milestone: ---
Keywords: Regression
Target Release: 9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-20.1.0-40
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2026-01-29 07:00:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:

Description Manisha Saini 2025-09-30 23:13:27 UTC
Description of problem:
========================

The upgrade to the latest Tentacle build (quay.io/rhceph-ci/rhceph@sha256:4c8be91d885a24d3960d12b0fcf1b2ae2ea983771313285290887418fb6a4daf) failed with ceph-mgr crashing.

The issue was consistently observed in the following upgrade scenarios:
1. Squid → Tentacle
2. Previous Tentacle build → Latest Tentacle build

core 
-----
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/ceph-mgr -n mgr.cali015.qfxkhz -f --setuser ceph --setgroup ceph --def'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f8e3818dedc in __pthread_kill_implementation () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f8a57630640 (LWP 168))]
(gdb) bt
#0  0x00007f8e3818dedc in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007f8e38140b46 in raise () from /lib64/libc.so.6
#2  0x00007f8e3812a8c5 in abort () from /lib64/libc.so.6
#3  0x00007f8e383c7b21 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#4  0x00007f8e383d352c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#5  0x00007f8e383d3597 in std::terminate() () from /lib64/libstdc++.so.6
#6  0x00007f8e383d37f9 in __cxa_throw () from /lib64/libstdc++.so.6
#7  0x00007f8e38720205 in ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*) [clone .cold] ()
   from /usr/lib64/ceph/libceph-common.so.2
#8  0x00007f8e38877953 in SnapRealmInfoNew::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&) ()
   from /usr/lib64/ceph/libceph-common.so.2
#9  0x00007f8e2eb40dc0 in get_snap_realm_info(MetaSession*, ceph::buffer::v15_2_0::list::iterator_impl<true>&) [clone .lto_priv.0] ()
   from /lib64/libcephfs.so.2
#10 0x00007f8e2eb48d1f in Client::update_snap_trace(MetaSession*, ceph::buffer::v15_2_0::list const&, SnapRealm**, bool) ()
   from /lib64/libcephfs.so.2
#11 0x00007f8e2eb2d3f7 in Client::handle_client_reply(boost::intrusive_ptr<MClientReply const> const&) () from /lib64/libcephfs.so.2
#12 0x00007f8e2eb2ee62 in Client::ms_dispatch2(boost::intrusive_ptr<Message> const&) () from /lib64/libcephfs.so.2
#13 0x00007f8e3890916d in DispatchQueue::entry() () from /usr/lib64/ceph/libceph-common.so.2
#14 0x00007f8e389a28d1 in DispatchQueue::DispatchThread::entry() () from /usr/lib64/ceph/libceph-common.so.2
#15 0x00007f8e3818c19a in start_thread () from /lib64/libc.so.6
#16 0x00007f8e38211240 in clone3 () from /lib64/libc.so.6
(gdb)



Version-Release number of selected component (if applicable):
=============================================================
Upgrade from ceph version 20.1.0-27 to 20.1.0-30



How reproducible:
================
3/3


Steps to Reproduce:
===================
1. Deploy a Ceph cluster on the N-1 build.
2. Upgrade the cluster to the latest Tentacle build (a command sketch follows below).
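
For reference, the upgrade in this setup is driven through cephadm; a minimal command sketch, assuming a cephadm-managed cluster and using the image digest from the description above ("check" only verifies the image is pullable, "status" reports progress):

# ceph orch upgrade check --image quay.io/rhceph-ci/rhceph@sha256:4c8be91d885a24d3960d12b0fcf1b2ae2ea983771313285290887418fb6a4daf
# ceph orch upgrade start --image quay.io/rhceph-ci/rhceph@sha256:4c8be91d885a24d3960d12b0fcf1b2ae2ea983771313285290887418fb6a4daf
# ceph orch upgrade status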


Actual results:
===============
Upgrade fails with ceph-mgr crashing and generating a core dump.


Expected results:
================
Upgrade should complete successfully without any ceph-mgr crashes.


Additional info:
===============

# ceph -s
  cluster:
    id:     a9b05e46-9d2b-11f0-b44c-fa163e3ba847
    health: HEALTH_WARN
            1 failed cephadm daemon(s)
            4 daemons have recently crashed
            Upgrade: Need standby mgr daemon
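
The "4 daemons have recently crashed" warning can be cross-checked with the crash module; a hedged example (the crash ID below is a placeholder, actual IDs come from the ls output):

# ceph crash ls-new
# ceph crash info <crash-id>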


# ceph orch ps | grep mgr
mgr.ceph-msaini-srl78p-node1-installer.jjqhox     ceph-msaini-srl78p-node1-installer  *:9283,8765,8443  running (5h)      5m ago  35h     479M        -  20.1.0-27.el9cp  edd879ed237d  58746de34772
mgr.ceph-msaini-srl78p-node3.icntuk               ceph-msaini-srl78p-node3            *:8443,9283,8765  error             5m ago  12m        -        -  <unknown>        <unknown>     <unknown>
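
To clear the "Need standby mgr daemon" blocker while triaging, one option (it does not address the underlying decode crash) is to redeploy or restart the errored mgr daemon; the daemon name below is taken from the orch ps output above:

# ceph orch daemon redeploy mgr.ceph-msaini-srl78p-node3.icntuk
# ceph orch daemon restart mgr.ceph-msaini-srl78p-node3.icntuk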



# ls
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.733734.1759273221000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.741252.1759273246000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.741526.1759273271000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.741790.1759273296000000.zst
core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.742049.1759273321000000.zst
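
The backtrace in the description was taken from one of these cores; a minimal sketch of the analysis, assuming the matching ceph-mgr binary and debuginfo are available (for example inside the ceph container via cephadm shell):

# zstd -d core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.733734.1759273221000000.zst
# gdb /usr/bin/ceph-mgr core.ceph-mgr.167.6e5341b06d594fa394978e5c40b142ab.733734.1759273221000000
(gdb) bt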

Comment 12 errata-xmlrpc 2026-01-29 07:00:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536