Description of problem:
ceph -s

Version-Release number of selected component:
ceph-common-2:18.2.0-1.fc39

Additional info:
reporter: libreport-2.17.11
type: CCpp
reason: python3.12 killed by SIGSEGV
journald_cursor: s=9f1834c2842e46078968d850d7623551;i=87b8a6;b=24a9552ddedc44b0bfff8a41d6b1b100;m=35395e25;t=6067dddcc3fa6;x=480a204dbabd5686
executable: /usr/bin/python3.12
cmdline: /usr/bin/python3.12 /usr/bin/ceph -s
cgroup: 0::/user.slice/user-1980.slice/user/app.slice/app-org.gnome.Terminal.slice/vte-spawn-fbbd8bfa-c6ca-4483-984b-c9734903c040.scope
rootdir: /
uid: 1980
kernel: 6.5.5-300.fc39.x86_64
package: ceph-common-2:18.2.0-1.fc39
runlevel: N 5
backtrace_rating: 4
crash_function: std::_Rb_tree_rebalance_for_erase
comment: ceph -s

Truncated backtrace:
Thread no. 1 (22 frames)
 #0 std::_Rb_tree_rebalance_for_erase at ../../../../../libstdc++-v3/src/c++98/tree.cc:289
 #1 std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned int>, std::_Select1st<std::pair<unsigned long const, unsigned int> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned int> > >::_M_erase_aux at /usr/include/c++/13/bits/stl_tree.h:2489
 #2 std::_Rb_tree<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>, std::_Select1st<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) at /usr/include/c++/13/bits/stl_tree.h:1210
 #3 std::multimap<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, Context*, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) at /usr/include/c++/13/bits/stl_multimap.h:715
 #4 CommonSafeTimer<std::mutex>::cancel_all_events at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/common/Timer.cc:206
 #5 CommonSafeTimer<std::mutex>::shutdown at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/common/Timer.cc:66
 #6 MonClient::shutdown at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/mon/MonClient.cc:565
 #7 MonClient::get_monmap_and_config at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/mon/MonClient.cc:202
 #8 librados::v14_2_0::RadosClient::connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/librados/RadosClient.cc:232
 #9 _rados_connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/librados/librados_c.cc:221
 #10 __pyx_pf_5rados_5Rados_28connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/redhat-linux-build/src/pybind/rados/rados.c:23202
 #11 __pyx_pw_5rados_5Rados_29connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/redhat-linux-build/src/pybind/rados/rados.c:23123
 #12 _PyObject_VectorcallTstate at /usr/src/debug/python3.12-3.12.0
 #13 method_vectorcall at /usr/src/debug/python3.12-3.12.0
 #14 PyCFunction_Call at /usr/src/debug/python3.12-3.12.0
 #15 _PyEval_EvalFrameDefault at Python/bytecodes.c:3259
 #16 _PyFunction_Vectorcall at /usr/src/debug/python3.12-3.12.0
 #17 _PyObject_VectorcallTstate at /usr/src/debug/python3.12-3.12.0
 #18 method_vectorcall at /usr/src/debug/python3.12-3.12.0
 #19 thread_run at /usr/src/debug/python3.12-3.12.0
 #20 pythread_wrapper at /usr/src/debug/python3.12-3.12.0
 #22 clone3 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Created attachment 1991075 [details] File: proc_pid_status
Created attachment 1991076 [details] File: maps
Created attachment 1991077 [details] File: limits
Created attachment 1991078 [details] File: environ
Created attachment 1991079 [details] File: open_fds
Created attachment 1991080 [details] File: mountinfo
Created attachment 1991081 [details] File: os_info
Created attachment 1991082 [details] File: cpuinfo
Created attachment 1991083 [details] File: core_backtrace
Created attachment 1991084 [details] File: exploitable
Created attachment 1991085 [details] File: dso_list
Created attachment 1991086 [details] File: backtrace
Upgraded to F39. This worked just fine on F38 and prior versions.
This host actually runs the command every minute via check mk monitoring.

reporter: libreport-2.17.11
type: CCpp
reason: python3.12 killed by SIGSEGV
journald_cursor: s=03f255decddb458ea0ae62ab441cb404;i=c4c7081;b=f00e9a37cc7546118cee0422ccefbc3b;m=1d935020;t=609ac920275b3;x=53a05fee6090c7be
executable: /usr/bin/python3.12
cmdline: /usr/bin/python3.12 /usr/bin/ceph -s
cgroup: 0::/user.slice/user-1000.slice/user/app.slice/vte-spawn-72d1e136-3fcd-429c-9c1b-15d8e90c3695.scope
rootdir: /
uid: 1000
kernel: 6.5.10-300.fc39.x86_64
package: ceph-common-2:18.2.0-1.fc39
runlevel: N 5
backtrace_rating: 4
crash_function: std::_Rb_tree_rebalance_for_erase
Update from Fedora 38 to 39 (thus updating from ceph-common 17.6 to 18.2)

reporter: libreport-2.17.11
type: CCpp
reason: python3.12 killed by SIGSEGV
journald_cursor: s=b8c420ec7eca4f9a9baa0a03c3b4bcd5;i=10fb2;b=107111c190664f248c76c52917556040;m=5e6ac0c8;t=60a0ee77b1627;x=a21fa0301d297dd9
executable: /usr/bin/python3.12
cmdline: /usr/bin/python3.12 /usr/bin/ceph auth get-key client.libvirt
cgroup: 0::/user.slice/user-1000.slice/user/app.slice/app-org.gnome.Terminal.slice/vte-spawn-87987c76-8794-4ee6-9f70-9092aecee322.scope
rootdir: /
uid: 0
kernel: 6.5.11-300.fc39.x86_64
package: ceph-common-2:18.2.0-2.fc39
runlevel: N 5
backtrace_rating: 4
crash_function: std::_Rb_tree_rebalance_for_erase
comment: Update from Fedora 38 to 39 (thus updating from ceph-common 17.6 to 18.2)
Same crash on F39 AArch64 (Fedora Asahi Remix), with the same backtrace. Doesn't look arch-specific.

The backtrace with debuginfod:

#0 std::_Rb_tree_rebalance_for_erase (__z=0xe0073e70, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:296
#1 0x0000ffffe8f17a94 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned int>, std::_Select1st<std::pair<unsigned long const, unsigned int> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned int> > >::_M_erase_aux (__position=..., this=0xffffe6a6e068) at /usr/include/c++/13/bits/stl_tree.h:2489
#2 std::_Rb_tree<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>, std::_Select1st<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) (__position=..., this=0xffffe6a6e068) at /usr/include/c++/13/bits/stl_tree.h:1210
#3 std::multimap<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, Context*, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) (__position=..., this=0xffffe6a6e068) at /usr/include/c++/13/bits/stl_multimap.h:715
#4 CommonSafeTimer<std::mutex>::cancel_all_events (this=this@entry=0xffffe6a6e000) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/common/Timer.cc:206
#5 0x0000ffffe8f1412c [PAC] in CommonSafeTimer<std::mutex>::shutdown (this=0xffffe6a6e000) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/common/Timer.cc:66
#6 0x0000ffffe9156678 [PAC] in MonClient::shutdown (this=0xffffe6a6dc00) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/mon/MonClient.cc:562
#7 0x0000ffffe91575cc [PAC] in MonClient::get_monmap_and_config (this=this@entry=0xffffe6a6dc00) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/mon/MonClient.cc:199
#8 0x0000ffffe96d0a48 [PAC] in librados::v14_2_0::RadosClient::connect (this=0xffffe0063bb0) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/librados/RadosClient.cc:232
#9 0x0000ffffe9667464 [PAC] in _rados_connect (cluster=0xffffe0063bb0) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/librados/librados_c.cc:221
#10 0x0000ffffe97e912c [PAC] in __pyx_pf_5rados_5Rados_28connect () from /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so
#11 0x0000ffffe97e8d9c in __pyx_pw_5rados_5Rados_29connect () from /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so
#12 0x0000ffffe9915e0c in __Pyx_CyFunction_Vectorcall_FASTCALL_KEYWORDS () from /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so
#13 0x0000fffff7b18e40 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0xffffe6a6e5f8, callable=0xffffe9973c60, tstate=0xaaaaaaed6f70) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Include/internal/pycore_call.h:92
#14 method_vectorcall (method=<optimized out>, args=0xfffff7f3e298 <_PyRuntime+76288>, nargsf=<optimized out>, kwnames=0x0) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/classobject.c:69
#15 0x0000fffff7ac9e90 [PAC] in PyCFunction_Call (kwargs=0xffffe6b165c0, args=0xfffff7f3e280 <_PyRuntime+76264>, callable=0xffffe6edac80) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/call.c:387
#16 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=0xfffff7fb0110, throwflag=<optimized out>) at Python/bytecodes.c:3254
#17 0x0000fffff7b18d58 [PAC] in _PyFunction_Vectorcall (kwnames=0x0, nargsf=1, stack=0xffffe6a6e858, func=0xffffe99c0fe0) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/call.c:419
#18 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0xffffe6a6e858, callable=0xffffe99c0fe0, tstate=0xaaaaaaed6f70) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Include/internal/pycore_call.h:92
#19 method_vectorcall (method=<optimized out>, args=0xfffff7f3e298 <_PyRuntime+76288>, nargsf=<optimized out>, kwnames=0x0) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/classobject.c:69
#20 0x0000fffff7c20f24 [PAC] in thread_run (boot_raw=0xffffe0066050) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Modules/_threadmodule.c:1114
#21 0x0000fffff7bd1ef0 [PAC] in pythread_wrapper (arg=<optimized out>) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Python/thread_pthread.h:233
#22 0x0000fffff7800584 [PAC] in start_thread (arg=0xfffff7fa4760) at pthread_create.c:444
#23 0x0000fffff786fc4c [PAC] in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone3.S:76

... suggests a 64-bit pointer got cast to 32 bits somewhere. 0xe0073e70 is not a valid pointer, but 0xffffe0073e70 is (and points to the right thing).
Valgrind immediately complains about uninitialized data:

==4272== Memcheck, a memory error detector
==4272== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==4272== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==4272== Command: /usr/bin/ceph status
==4272==
==4272== Thread 2:
==4272== Use of uninitialised value of size 8
==4272==    at 0x153763D4: std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (tree.cc:297)
==4272==    by 0x14BB7A93: UnknownInlinedFun (stl_tree.h:2494)
==4272==    by 0x14BB7A93: UnknownInlinedFun (stl_tree.h:1210)
==4272==    by 0x14BB7A93: UnknownInlinedFun (stl_multimap.h:715)
==4272==    by 0x14BB7A93: CommonSafeTimer<std::mutex>::cancel_all_events() (Timer.cc:206)
==4272==    by 0x14BB412B: CommonSafeTimer<std::mutex>::shutdown() (Timer.cc:67)
==4272==    by 0x14DF6677: MonClient::shutdown() (MonClient.cc:562)
==4272==    by 0x14DF75CB: MonClient::get_monmap_and_config() (MonClient.cc:199)
==4272==    by 0x148F0A47: librados::v14_2_0::RadosClient::connect() (RadosClient.cc:232)
==4272==    by 0x14887463: rados_connect@@ (librados_c.cc:221)
==4272==    by 0x146A912B: __pyx_pf_5rados_5Rados_28connect (in /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so)
==4272==    by 0x146A8D9B: __pyx_pw_5rados_5Rados_29connect (in /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so)
==4272==    by 0x147D5E0B: __Pyx_CyFunction_Vectorcall_FASTCALL_KEYWORDS (in /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so)
==4272==    by 0x4AD8E3F: UnknownInlinedFun (pycore_call.h:92)
==4272==    by 0x4AD8E3F: method_vectorcall (classobject.c:69)
==4272==    by 0x4A89E8F: UnknownInlinedFun (call.c:387)
==4272==    by 0x4A89E8F: _PyEval_EvalFrameDefault (bytecodes.c:3254)

It actually works in Valgrind though, which strongly suggests this is really just a bug in Ceph where it's passing uninitialized pointers to the stl stuff. That I was getting the low 32 bits of a real pointer was probably a coincidence and that was just the garbage it read.
Reported upstream: https://tracker.ceph.com/issues/63867
I am now quite certain this has nothing to do with Ceph and it's a toolchain/GCC bug. It *is* actually miscompiling things and not copying the full pointer value.

In the disassembly of this C++ template goop:

std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > std::_Rb_tree<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>, std::_Select1st<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::_M_emplace_equal<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>&>(std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>&)

Note that the argument is of type:

std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>

The source code is:

  template<typename _Key, typename _Val, typename _KeyOfValue,
           typename _Compare, typename _Alloc>
    template<typename... _Args>
      auto
      _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
      _M_emplace_equal(_Args&&... __args)
      -> iterator
      {
        _Auto_node __z(*this, std::forward<_Args>(__args)...);
        auto __res = _M_get_insert_equal_pos(__z._M_key());
        return __z._M_insert(__res);
      }

My understanding here is __z is allocated and the argument copied into it. x0 should be this and x1 should be the arg (a pair):

=> 0x0000ffffe8f14e84 <+0>: paciasp
   0x0000ffffe8f14e88 <+4>: stp x29, x30, [sp, #-48]!
   0x0000ffffe8f14e8c <+8>: mov x29, sp
   0x0000ffffe8f14e90 <+12>: stp x19, x20, [sp, #16]
   0x0000ffffe8f14e94 <+16>: mov x19, x0
   0x0000ffffe8f14e98 <+20>: mov x0, #0x30 // #48
   0x0000ffffe8f14e9c <+24>: str x21, [sp, #32]
   0x0000ffffe8f14ea0 <+28>: mov x21, x1 <-- save x1 in x21
   0x0000ffffe8f14ea4 <+32>: bl 0xffffe8e903f0 <_Znwm@plt> <-- allocate new node
   0x0000ffffe8f14ea8 <+36>: mov x20, x0 <-- new node is in x0
   0x0000ffffe8f14eac <+40>: ldr x2, [x19, #16]
   0x0000ffffe8f14eb0 <+44>: add x3, x19, #0x8
   0x0000ffffe8f14eb4 <+48>: ldr x7, [x21] <-- load the first half of the value (the std::chrono::time_point)
   0x0000ffffe8f14eb8 <+52>: str x7, [x0, #32] <-- store it in the new node
   0x0000ffffe8f14ebc <+56>: ldr w1, [x21, #8] <-- BUG: loads 32 bits from value+8, should be 64 bits (a Context*)
   0x0000ffffe8f14ec0 <+60>: str w1, [x0, #40] <-- store it in the new node

At function entry we can see the time_point and Context* in x1:

(gdb) x/2gx (uint64_t*)$x1
0xffffe6a6d6d8: 0x00000a098f5d8400 0x0000ffffe0067bb0

After stepping a bit until +44, the pair pointer is now in x21

(gdb) x/2gx (uint64_t*)$x21
0xffffe6a6d6d8: 0x00000a098f5d8400 0x0000ffffe0067bb0

The newly allocated block is garbage at this point:

(gdb) x/2gx (uint64_t*)($x0 + 32)
0xffffe0067b30: 0x0000000000000011 0x676e697279656b2f

Step a bit more and there's a nice smoking gun:

0x0000ffffe8f14eb8 in std::construct_at<std::pair<unsigned long const, unsigned int>, std::pair<unsigned long const, unsigned int> > (__location=0xffffe0067b30) at /usr/include/c++/13/bits/stl_construct.h:97
97 { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
(gdb)
0x0000ffffe8f14ebc 97 { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }

Note the type name here: it thinks this is now a std::pair<unsigned long const, unsigned int>, which it obviously isn't nor is it equivalent.

Step one more instruction and look at the target memory again:

(gdb) x/2gx (uint64_t*)($x0 + 32)
0xffffe0067b30: 0x00000a098f5d8400 0x676e6972e0067bb0

Obviously only the low 32 bits of the pointer were copied. Let it run and we get the expected segfault:

2023-12-23T00:04:54.779+0900 ffffddfbf180 10 timer(0xffffe6a6e000).timer_thread executing 0x676e6972e0067bb0

Thread 11 "safe_timer" received signal SIGSEGV, Segmentation fault.

GCC is confusing different instantiations of std::pair<> together.
This is an LTO bug. The object file has the correct code, but the linked library does not. Changing `-flto=auto` to `-fno-lto` fixes the problem. Trying to minimize the linker command line now.
I minimized it to two object files and a simple command line. Package here: https://marcan.fedorapeople.org/ltobug.tar.gz

Running the included script will link both object files together, then diff the disassembly of the problem function. On F39 AArch64 with everything up to date, the 64-bit copy of the second element becomes a 32-bit copy.

Perhaps we should turn off LTO for ceph until this is fixed... should we reassign this to GCC/binutils?
Confirmed that disabling LTO in the spec file produces working RPMs without the issue.
For those who may come here looking for an interim solution, here's what worked for me, based on Hector's work above.

Download the ceph srpm from https://dl.fedoraproject.org/pub/fedora/linux/updates/39/Everything/source/tree/Packages/c/

rpm -i ceph-XXXXXXXX.srpm
cd rpmbuild/SPECS
vi ceph.spec

Add the following line to the ceph.spec file; I added it as the first non-comment line.

%global _lto_cflags %{nil}

rpmbuild -ba ceph.spec

Wait for the compilation and packaging to complete. Execute the next line to replace your existing rpms with the non-LTO variants. Adjust as needed in case I missed any package identifiers.

for i in $(rpmquery -qa | grep -E "ceph|rbd|rgw2|rados" | grep -v -E "libvirt|qemu" | grep x86_64); do echo ${i%%-[[:digit:]]*}; sudo rpm -Uvh --force ../RPMS/x86_64/${i%%-[[:digit:]]*}-18*; done
(In reply to Hector Martin from comment #20)
> I minimized it two two object files and a simple command line. Package here:
> https://marcan.fedorapeople.org/ltobug.tar.gz
>
> Running the included script will link both object files together, then diff
> the disassembly of the problem function. On F39 Aarch64 with everything up
> to date, the 64-bit copy of the second element becomes a 32-bit copy.
>
> Perhaps we should turn off LTO for ceph until this is fixed... should we
> reassign this to GCC/binutils?

I think I'd say yes! Disabling LTO across the board seems less than optimal.
I gave it my best shot at renaming and reassigning this to gcc.
(In reply to Hector Martin from comment #20)
> I minimized it two two object files and a simple command line. Package here:
> https://marcan.fedorapeople.org/ltobug.tar.gz
>
> Running the included script will link both object files together, then diff
> the disassembly of the problem function. On F39 Aarch64 with everything up
> to date, the 64-bit copy of the second element becomes a 32-bit copy.
>
> Perhaps we should turn off LTO for ceph until this is fixed... should we
> reassign this to GCC/binutils?

For a really usable reproducer, we'd need the preprocessed sources for the 2 object files plus the full command lines used to compile them; otherwise the reproducer isn't usable for anything but the exact GCC NVR. Compiling those files with -save-temps should leave the *.ii files around.
FEDORA-2024-f7360ebbb2 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-f7360ebbb2
Created attachment 2008347 [details] processed ceph .../rpmbuild/BUILD/ceph-18.2.1/src/common/Timer.cc
fwiw, the relevant Ceph source file, preprocessed, is in https://bugzilla.redhat.com/attachment.cgi?id=2008347 and the command line it was compiled with is:

g++ -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT -DCEPH_INSTALL_DATADIR=\"/usr/share/ceph\" -DCEPH_INSTALL_FULL_PKGLIBDIR=\"/usr/lib64/ceph\" -DCMAKE_INSTALL_LIBDIR=\"lib64\" -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/src/include -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/boost/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/xxHash -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/fmt/include -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -O2 -g -DNDEBUG -std=c++20 -fPIC -U_FORTIFY_SOURCE -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wstrict-null-sentinel -Woverloaded-virtual -fstack-protector-strong -fdiagnostics-color=auto -MD -MT src/common/CMakeFiles/common-common-objs.dir/Timer.cc.o -MF src/common/CMakeFiles/common-common-objs.dir/Timer.cc.o.d -o src/common/CMakeFiles/common-common-objs.dir/Timer.cc.o -c /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/common/Timer.cc
Thanks; do you have the same for SloppyCRCMap.cc ?
FEDORA-2024-f7360ebbb2 has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-f7360ebbb2` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-f7360ebbb2 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
(In reply to Jakub Jelinek from comment #31)
> Thanks; do you have the same for SloppyCRCMap.cc ?

https://bugzilla.redhat.com/attachment.cgi?id=2008407

compiled with

/usr/lib64/ccache/g++ -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT -DCEPH_INSTALL_DATADIR=\"/usr/share/ceph\" -DCEPH_INSTALL_FULL_PKGLIBDIR=\"/usr/lib64/ceph\" -DCMAKE_INSTALL_LIBDIR=\"lib64\" -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/src/include -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/boost/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/xxHash -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/fmt/include -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -O2 -g -DNDEBUG -std=c++20 -fPIC -U_FORTIFY_SOURCE -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wstrict-null-sentinel -Woverloaded-virtual -fstack-protector-strong -fdiagnostics-color=auto -MD -MT src/common/CMakeFiles/common-common-objs.dir/SloppyCRCMap.cc.o -MF src/common/CMakeFiles/common-common-objs.dir/SloppyCRCMap.cc.o.d -o src/common/CMakeFiles/common-common-objs.dir/SloppyCRCMap.cc.o -c /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/common/SloppyCRCMap.cc
FEDORA-2024-f7360ebbb2 has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.
(In reply to Patrick C. F. Ernzer from comment #13)
> upgraded to F39. This worked just fine on F38 and prior versions.
[…]

The bug is fixed for me with ceph-common-18.2.1-4.fc39.x86_64

Thanks.
Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26. Fedora Linux 39 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days