Bug 2241339 - gcc incorrectly compiles ceph with LTO and truncates 64-bit values to 32-bit
Summary: gcc incorrectly compiles ceph with LTO and truncates 64-bit values to 32-bit
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: 39
Hardware: x86_64
OS: Unspecified
unspecified
urgent
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:ab00e03a403b9a475dab3ae8c57...
Depends On:
Blocks:
 
Reported: 2023-09-29 11:50 UTC by bober
Modified: 2025-03-28 04:25 UTC
CC List: 27 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-11-27 21:32:27 UTC
Type: ---
Embargoed:


Attachments
File: proc_pid_status (1.44 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: maps (3.86 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: limits (1.29 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: environ (5.72 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: open_fds (1.09 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: mountinfo (3.91 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: os_info (756 bytes, text/plain)
2023-09-29 11:50 UTC, bober
File: cpuinfo (3.02 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: core_backtrace (25.73 KB, text/plain)
2023-09-29 11:50 UTC, bober
File: exploitable (81 bytes, text/plain)
2023-09-29 11:50 UTC, bober
File: dso_list (531 bytes, text/plain)
2023-09-29 11:50 UTC, bober
File: backtrace (124.24 KB, text/plain)
2023-09-29 11:50 UTC, bober
processed ceph .../rpmbuild/BUILD/ceph-18.2.1/src/common/Timer.cc (10.90 MB, text/plain)
2024-01-11 20:18 UTC, Kaleb KEITHLEY


Links
Ceph Project Bug Tracker 63867 (Private: 0; Priority: None; Status: None; Summary: None; Last Updated: 2023-12-20 17:10:55 UTC)
GNU Compiler Collection 113359 (Private: 0; Priority: P3; Status: UNCONFIRMED; Summary: "[13 Regression] LTO miscompilation of ceph on aarch64"; Last Updated: 2024-01-12 18:46:06 UTC)

Description bober 2023-09-29 11:50:14 UTC
Description of problem:
ceph -s

Version-Release number of selected component:
ceph-common-2:18.2.0-1.fc39

Additional info:
reporter:       libreport-2.17.11
type:           CCpp
reason:         python3.12 killed by SIGSEGV
journald_cursor: s=9f1834c2842e46078968d850d7623551;i=87b8a6;b=24a9552ddedc44b0bfff8a41d6b1b100;m=35395e25;t=6067dddcc3fa6;x=480a204dbabd5686
executable:     /usr/bin/python3.12
cmdline:        /usr/bin/python3.12 /usr/bin/ceph -s
cgroup:         0::/user.slice/user-1980.slice/user/app.slice/app-org.gnome.Terminal.slice/vte-spawn-fbbd8bfa-c6ca-4483-984b-c9734903c040.scope
rootdir:        /
uid:            1980
kernel:         6.5.5-300.fc39.x86_64
package:        ceph-common-2:18.2.0-1.fc39
runlevel:       N 5
backtrace_rating: 4
crash_function: std::_Rb_tree_rebalance_for_erase
comment:        ceph -s

Truncated backtrace:
Thread no. 1 (22 frames)
 #0 std::_Rb_tree_rebalance_for_erase at ../../../../../libstdc++-v3/src/c++98/tree.cc:289
 #1 std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned int>, std::_Select1st<std::pair<unsigned long const, unsigned int> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned int> > >::_M_erase_aux at /usr/include/c++/13/bits/stl_tree.h:2489
 #2 std::_Rb_tree<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>, std::_Select1st<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) at /usr/include/c++/13/bits/stl_tree.h:1210
 #3 std::multimap<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, Context*, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) at /usr/include/c++/13/bits/stl_multimap.h:715
 #4 CommonSafeTimer<std::mutex>::cancel_all_events at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/common/Timer.cc:206
 #5 CommonSafeTimer<std::mutex>::shutdown at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/common/Timer.cc:66
 #6 MonClient::shutdown at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/mon/MonClient.cc:565
 #7 MonClient::get_monmap_and_config at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/mon/MonClient.cc:202
 #8 librados::v14_2_0::RadosClient::connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/librados/RadosClient.cc:232
 #9 _rados_connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/src/librados/librados_c.cc:221
 #10 __pyx_pf_5rados_5Rados_28connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/redhat-linux-build/src/pybind/rados/rados.c:23202
 #11 __pyx_pw_5rados_5Rados_29connect at /usr/src/debug/ceph-18.2.0-1.fc39.x86_64/redhat-linux-build/src/pybind/rados/rados.c:23123
 #12 _PyObject_VectorcallTstate at /usr/src/debug/python3.12-3.12.0
 #13 method_vectorcall at /usr/src/debug/python3.12-3.12.0
 #14 PyCFunction_Call at /usr/src/debug/python3.12-3.12.0
 #15 _PyEval_EvalFrameDefault at Python/bytecodes.c:3259
 #16 _PyFunction_Vectorcall at /usr/src/debug/python3.12-3.12.0
 #17 _PyObject_VectorcallTstate at /usr/src/debug/python3.12-3.12.0
 #18 method_vectorcall at /usr/src/debug/python3.12-3.12.0
 #19 thread_run at /usr/src/debug/python3.12-3.12.0
 #20 pythread_wrapper at /usr/src/debug/python3.12-3.12.0
 #22 clone3 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Comment 1 bober 2023-09-29 11:50:18 UTC
Created attachment 1991075 [details]
File: proc_pid_status

Comment 2 bober 2023-09-29 11:50:20 UTC
Created attachment 1991076 [details]
File: maps

Comment 3 bober 2023-09-29 11:50:21 UTC
Created attachment 1991077 [details]
File: limits

Comment 4 bober 2023-09-29 11:50:22 UTC
Created attachment 1991078 [details]
File: environ

Comment 5 bober 2023-09-29 11:50:24 UTC
Created attachment 1991079 [details]
File: open_fds

Comment 6 bober 2023-09-29 11:50:25 UTC
Created attachment 1991080 [details]
File: mountinfo

Comment 7 bober 2023-09-29 11:50:26 UTC
Created attachment 1991081 [details]
File: os_info

Comment 8 bober 2023-09-29 11:50:28 UTC
Created attachment 1991082 [details]
File: cpuinfo

Comment 9 bober 2023-09-29 11:50:30 UTC
Created attachment 1991083 [details]
File: core_backtrace

Comment 10 bober 2023-09-29 11:50:31 UTC
Created attachment 1991084 [details]
File: exploitable

Comment 11 bober 2023-09-29 11:50:33 UTC
Created attachment 1991085 [details]
File: dso_list

Comment 12 bober 2023-09-29 11:50:34 UTC
Created attachment 1991086 [details]
File: backtrace

Comment 13 Patrick C. F. Ernzer 2023-11-09 00:08:45 UTC
upgraded to F39. This worked just fine on F38 and prior versions.
This host actually runs the command every minute via Checkmk monitoring.


reporter:       libreport-2.17.11
type:           CCpp
reason:         python3.12 killed by SIGSEGV
journald_cursor: s=03f255decddb458ea0ae62ab441cb404;i=c4c7081;b=f00e9a37cc7546118cee0422ccefbc3b;m=1d935020;t=609ac920275b3;x=53a05fee6090c7be
executable:     /usr/bin/python3.12
cmdline:        /usr/bin/python3.12 /usr/bin/ceph -s
cgroup:         0::/user.slice/user-1000.slice/user/app.slice/vte-spawn-72d1e136-3fcd-429c-9c1b-15d8e90c3695.scope
rootdir:        /
uid:            1000
kernel:         6.5.10-300.fc39.x86_64
package:        ceph-common-2:18.2.0-1.fc39
runlevel:       N 5
backtrace_rating: 4
crash_function: std::_Rb_tree_rebalance_for_erase

Comment 14 Rolando Cedillo 2023-11-20 16:00:49 UTC
Update from Fedora 38 to 39 (thus updating from ceph-common 17.6 to 18.2)


reporter:       libreport-2.17.11
type:           CCpp
reason:         python3.12 killed by SIGSEGV
journald_cursor: s=b8c420ec7eca4f9a9baa0a03c3b4bcd5;i=10fb2;b=107111c190664f248c76c52917556040;m=5e6ac0c8;t=60a0ee77b1627;x=a21fa0301d297dd9
executable:     /usr/bin/python3.12
cmdline:        /usr/bin/python3.12 /usr/bin/ceph auth get-key client.libvirt
cgroup:         0::/user.slice/user-1000.slice/user/app.slice/app-org.gnome.Terminal.slice/vte-spawn-87987c76-8794-4ee6-9f70-9092aecee322.scope
rootdir:        /
uid:            0
kernel:         6.5.11-300.fc39.x86_64
package:        ceph-common-2:18.2.0-2.fc39
runlevel:       N 5
backtrace_rating: 4
crash_function: std::_Rb_tree_rebalance_for_erase
comment:        Update from Fedora 38 to 39 (thus updating from ceph-common 17.6 to 18.2)

Comment 15 Hector Martin 2023-12-20 09:09:24 UTC
Same crash on F39 AArch64 (Fedora Asahi Remix), with the same backtrace. Doesn't look arch-specific.

The backtrace with debuginfod:
#0  std::_Rb_tree_rebalance_for_erase (__z=0xe0073e70, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:296
#1  0x0000ffffe8f17a94 in std::_Rb_tree<unsigned long, std::pair<unsigned long const, unsigned int>, std::_Select1st<std::pair<unsigned long const, unsigned int> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned int> > >::_M_erase_aux (__position=..., this=0xffffe6a6e068) at /usr/include/c++/13/bits/stl_tree.h:2489
#2  std::_Rb_tree<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>, std::_Select1st<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) (
    __position=..., this=0xffffe6a6e068) at /usr/include/c++/13/bits/stl_tree.h:1210
#3  std::multimap<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, Context*, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >) (__position=..., this=0xffffe6a6e068) at /usr/include/c++/13/bits/stl_multimap.h:715
#4  CommonSafeTimer<std::mutex>::cancel_all_events (this=this@entry=0xffffe6a6e000) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/common/Timer.cc:206
#5  0x0000ffffe8f1412c [PAC] in CommonSafeTimer<std::mutex>::shutdown (this=0xffffe6a6e000) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/common/Timer.cc:66
#6  0x0000ffffe9156678 [PAC] in MonClient::shutdown (this=0xffffe6a6dc00) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/mon/MonClient.cc:562
#7  0x0000ffffe91575cc [PAC] in MonClient::get_monmap_and_config (this=this@entry=0xffffe6a6dc00) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/mon/MonClient.cc:199
#8  0x0000ffffe96d0a48 [PAC] in librados::v14_2_0::RadosClient::connect (this=0xffffe0063bb0) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/librados/RadosClient.cc:232
#9  0x0000ffffe9667464 [PAC] in _rados_connect (cluster=0xffffe0063bb0) at /usr/src/debug/ceph-18.2.1-1.fc39.aarch64/src/librados/librados_c.cc:221
#10 0x0000ffffe97e912c [PAC] in __pyx_pf_5rados_5Rados_28connect () from /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so
#11 0x0000ffffe97e8d9c in __pyx_pw_5rados_5Rados_29connect () from /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so
#12 0x0000ffffe9915e0c in __Pyx_CyFunction_Vectorcall_FASTCALL_KEYWORDS () from /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so
#13 0x0000fffff7b18e40 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0xffffe6a6e5f8, callable=0xffffe9973c60, tstate=0xaaaaaaed6f70)
    at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Include/internal/pycore_call.h:92
#14 method_vectorcall (method=<optimized out>, args=0xfffff7f3e298 <_PyRuntime+76288>, nargsf=<optimized out>, kwnames=0x0)
    at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/classobject.c:69
#15 0x0000fffff7ac9e90 [PAC] in PyCFunction_Call (kwargs=0xffffe6b165c0, args=0xfffff7f3e280 <_PyRuntime+76264>, callable=0xffffe6edac80)
    at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/call.c:387
#16 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=0xfffff7fb0110, throwflag=<optimized out>) at Python/bytecodes.c:3254
#17 0x0000fffff7b18d58 [PAC] in _PyFunction_Vectorcall (kwnames=0x0, nargsf=1, stack=0xffffe6a6e858, func=0xffffe99c0fe0)
    at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/call.c:419
#18 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0xffffe6a6e858, callable=0xffffe99c0fe0, tstate=0xaaaaaaed6f70)
    at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Include/internal/pycore_call.h:92
#19 method_vectorcall (method=<optimized out>, args=0xfffff7f3e298 <_PyRuntime+76288>, nargsf=<optimized out>, kwnames=0x0)
    at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Objects/classobject.c:69
#20 0x0000fffff7c20f24 [PAC] in thread_run (boot_raw=0xffffe0066050) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Modules/_threadmodule.c:1114
#21 0x0000fffff7bd1ef0 [PAC] in pythread_wrapper (arg=<optimized out>) at /usr/src/debug/python3.12-3.12.1-1.fc39.aarch64/Python/thread_pthread.h:233
#22 0x0000fffff7800584 [PAC] in start_thread (arg=0xfffff7fa4760) at pthread_create.c:444
#23 0x0000fffff786fc4c [PAC] in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone3.S:76

The __z=0xe0073e70 argument in frame #0 suggests that a 64-bit pointer got truncated to 32 bits somewhere: 0xe0073e70 is not a valid pointer, but 0xffffe0073e70 is (and points to the right thing).

Comment 16 Hector Martin 2023-12-20 09:17:29 UTC
Valgrind immediately complains about uninitialized data:

==4272== Memcheck, a memory error detector
==4272== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==4272== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==4272== Command: /usr/bin/ceph status
==4272== 
==4272== Thread 2:
==4272== Use of uninitialised value of size 8
==4272==    at 0x153763D4: std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) (tree.cc:297)
==4272==    by 0x14BB7A93: UnknownInlinedFun (stl_tree.h:2494)
==4272==    by 0x14BB7A93: UnknownInlinedFun (stl_tree.h:1210)
==4272==    by 0x14BB7A93: UnknownInlinedFun (stl_multimap.h:715)
==4272==    by 0x14BB7A93: CommonSafeTimer<std::mutex>::cancel_all_events() (Timer.cc:206)
==4272==    by 0x14BB412B: CommonSafeTimer<std::mutex>::shutdown() (Timer.cc:67)
==4272==    by 0x14DF6677: MonClient::shutdown() (MonClient.cc:562)
==4272==    by 0x14DF75CB: MonClient::get_monmap_and_config() (MonClient.cc:199)
==4272==    by 0x148F0A47: librados::v14_2_0::RadosClient::connect() (RadosClient.cc:232)
==4272==    by 0x14887463: rados_connect@@ (librados_c.cc:221)
==4272==    by 0x146A912B: __pyx_pf_5rados_5Rados_28connect (in /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so)
==4272==    by 0x146A8D9B: __pyx_pw_5rados_5Rados_29connect (in /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so)
==4272==    by 0x147D5E0B: __Pyx_CyFunction_Vectorcall_FASTCALL_KEYWORDS (in /usr/lib64/python3.12/site-packages/rados.cpython-312-aarch64-linux-gnu.so)
==4272==    by 0x4AD8E3F: UnknownInlinedFun (pycore_call.h:92)
==4272==    by 0x4AD8E3F: method_vectorcall (classobject.c:69)
==4272==    by 0x4A89E8F: UnknownInlinedFun (call.c:387)
==4272==    by 0x4A89E8F: _PyEval_EvalFrameDefault (bytecodes.c:3254)

It actually works under Valgrind, though, which strongly suggests this is just a bug in Ceph, where it passes uninitialized pointers to the STL containers. That I was getting the low 32 bits of a real pointer was probably a coincidence; that was just the garbage it read.

Comment 17 Hector Martin 2023-12-20 11:54:15 UTC
Reported upstream: https://tracker.ceph.com/issues/63867

Comment 18 Hector Martin 2023-12-22 15:05:58 UTC
I am now quite certain this has nothing to do with Ceph and it's a toolchain/GCC bug. It *is* actually miscompiling things and not copying the full pointer value.

In the disassembly of this C++ template goop:

std::_Rb_tree_iterator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > std::_Rb_tree<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>, std::_Select1st<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> >, std::less<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > >, std::allocator<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*> > >::_M_emplace_equal<std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>&>(std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>&)

Note that the argument is of type: std::pair<std::chrono::time_point<ceph::mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > const, Context*>

The source code is:

  template<typename _Key, typename _Val, typename _KeyOfValue,
           typename _Compare, typename _Alloc>
    template<typename... _Args>
      auto
      _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
      _M_emplace_equal(_Args&&... __args)
      -> iterator
      {    
        _Auto_node __z(*this, std::forward<_Args>(__args)...);
        auto __res = _M_get_insert_equal_pos(__z._M_key());
        return __z._M_insert(__res);
      }

My understanding is that __z is allocated and the argument is copied into it.

x0 should be this, and x1 should be a pointer to the argument (the pair is passed by reference):

=> 0x0000ffffe8f14e84 <+0>:     paciasp
   0x0000ffffe8f14e88 <+4>:     stp     x29, x30, [sp, #-48]!
   0x0000ffffe8f14e8c <+8>:     mov     x29, sp
   0x0000ffffe8f14e90 <+12>:    stp     x19, x20, [sp, #16]
   0x0000ffffe8f14e94 <+16>:    mov     x19, x0
   0x0000ffffe8f14e98 <+20>:    mov     x0, #0x30                       // #48
   0x0000ffffe8f14e9c <+24>:    str     x21, [sp, #32]
   0x0000ffffe8f14ea0 <+28>:    mov     x21, x1                     <-- save x1 in x21
   0x0000ffffe8f14ea4 <+32>:    bl      0xffffe8e903f0 <_Znwm@plt>  <-- allocate new node
   0x0000ffffe8f14ea8 <+36>:    mov     x20, x0                     <-- new node is in x0
   0x0000ffffe8f14eac <+40>:    ldr     x2, [x19, #16]
   0x0000ffffe8f14eb0 <+44>:    add     x3, x19, #0x8
   0x0000ffffe8f14eb4 <+48>:    ldr     x7, [x21]                   <-- load the first half of the value (the std::chrono::time_point)
   0x0000ffffe8f14eb8 <+52>:    str     x7, [x0, #32]               <-- store it in the new node
   0x0000ffffe8f14ebc <+56>:    ldr     w1, [x21, #8]               <-- BUG: loads 32 bits from value+8, should be 64 bits (a Context*)
   0x0000ffffe8f14ec0 <+60>:    str     w1, [x0, #40]               <-- store it in the new node

At function entry, the time_point and Context* can be seen at the address in x1:

(gdb) x/2gx (uint64_t*)$x1
0xffffe6a6d6d8: 0x00000a098f5d8400      0x0000ffffe0067bb0

After stepping a bit until +44, the pair pointer is now in x21:

(gdb) x/2gx (uint64_t*)$x21
0xffffe6a6d6d8: 0x00000a098f5d8400      0x0000ffffe0067bb0

The newly allocated block is garbage at this point:

(gdb) x/2gx (uint64_t*)($x0 + 32)
0xffffe0067b30: 0x0000000000000011      0x676e697279656b2f

Step a bit more and there's a nice smoking gun:

0x0000ffffe8f14eb8 in std::construct_at<std::pair<unsigned long const, unsigned int>, std::pair<unsigned long const, unsigned int> > (__location=0xffffe0067b30)
    at /usr/include/c++/13/bits/stl_construct.h:97
97          { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }
(gdb) 
0x0000ffffe8f14ebc      97          { return ::new((void*)__location) _Tp(std::forward<_Args>(__args)...); }

Note the type name here: it thinks this is now a std::pair<unsigned long const, unsigned int>, which it obviously is not, and the two types are not equivalent: the second member is a 4-byte unsigned int rather than an 8-byte pointer.

Step one more instruction and look at the target memory again:

(gdb) x/2gx (uint64_t*)($x0 + 32)
0xffffe0067b30: 0x00000a098f5d8400      0x676e6972e0067bb0

Obviously only the low 32 bits of the pointer were copied.

Let it run and we get the expected segfault:

2023-12-23T00:04:54.779+0900 ffffddfbf180 10 timer(0xffffe6a6e000).timer_thread executing 0x676e6972e0067bb0

Thread 11 "safe_timer" received signal SIGSEGV, Segmentation fault.

GCC is conflating different instantiations of std::pair<>.

Comment 19 Hector Martin 2023-12-22 15:35:48 UTC
This is an LTO bug. The object file has the correct code, but the linked library does not. Changing `-flto=auto` to `-fno-lto` fixes the problem. Trying to minimize the linker command line now.

Comment 20 Hector Martin 2023-12-22 15:57:19 UTC
I minimized it to two object files and a simple command line. Package here: https://marcan.fedorapeople.org/ltobug.tar.gz

Running the included script will link both object files together, then diff the disassembly of the problem function. On F39 AArch64 with everything up to date, the 64-bit copy of the second element becomes a 32-bit copy.

Perhaps we should turn off LTO for ceph until this is fixed... should we reassign this to GCC/binutils?

Comment 21 Hector Martin 2023-12-23 02:32:59 UTC
Confirmed that disabling LTO in the spec file produces working RPMs without the issue.

Comment 22 Chris Roadfeldt 2024-01-05 00:16:25 UTC
For those who may come here looking for an interim solution, here's what worked for me, based on Hector's work above.

Download ceph srpm from https://dl.fedoraproject.org/pub/fedora/linux/updates/39/Everything/source/tree/Packages/c/

rpm -i ceph-XXXXXXXX.src.rpm

cd rpmbuild/SPECS

vi ceph.spec

Add the following line to ceph.spec; I added it as the first non-comment line:
%global _lto_cflags %{nil}

rpmbuild -ba ceph.spec

Wait for the compilation and packaging to complete.
Execute the next line to replace your existing rpms with the non-lto variants. Adjust as needed in case I missed any package identifiers.
for i in $(rpmquery -qa | grep -E "ceph|rbd|rgw2|rados" | grep -v -E "libvirt|qemu" | grep x86_64); do echo ${i%%-[[:digit:]]*}; sudo rpm -Uvh --force ../RPMS/x86_64/${i%%-[[:digit:]]*}-18*; done

Comment 23 Kaleb KEITHLEY 2024-01-10 20:09:36 UTC
(In reply to Hector Martin from comment #20)
> I minimized it to two object files and a simple command line. Package here:
> https://marcan.fedorapeople.org/ltobug.tar.gz
> 
> Running the included script will link both object files together, then diff
> the disassembly of the problem function. On F39 AArch64 with everything up
> to date, the 64-bit copy of the second element becomes a 32-bit copy.
> 
> Perhaps we should turn off LTO for ceph until this is fixed... should we
> reassign this to GCC/binutils?

I think I'd say yes!  Disabling LTO across the board seems less than optimal.

Comment 24 Neal Gompa 2024-01-11 10:50:10 UTC
I gave it my best shot at renaming and reassigning this to gcc.

Comment 25 Jakub Jelinek 2024-01-11 12:22:56 UTC
(In reply to Hector Martin from comment #20)
> I minimized it to two object files and a simple command line. Package here:
> https://marcan.fedorapeople.org/ltobug.tar.gz
> 
> Running the included script will link both object files together, then diff
> the disassembly of the problem function. On F39 AArch64 with everything up
> to date, the 64-bit copy of the second element becomes a 32-bit copy.
> 
> Perhaps we should turn off LTO for ceph until this is fixed... should we
> reassign this to GCC/binutils?

For a really usable reproducer, we'd need the preprocessed sources for the two object files plus the full command lines used to compile them; otherwise the reproducer is useless with anything but the exact GCC NVR.
Compiling those files with -save-temps should leave the *.ii files behind.

Comment 26 Fedora Update System 2024-01-11 12:47:35 UTC
FEDORA-2024-f7360ebbb2 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-f7360ebbb2

Comment 29 Kaleb KEITHLEY 2024-01-11 20:18:11 UTC
Created attachment 2008347 [details]
processed ceph .../rpmbuild/BUILD/ceph-18.2.1/src/common/Timer.cc

Comment 30 Kaleb KEITHLEY 2024-01-11 20:22:50 UTC
fwiw, the relevant Ceph source file, preprocessed, is in https://bugzilla.redhat.com/attachment.cgi?id=2008347 and the command line it was compiled with is:

g++ -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT -DCEPH_INSTALL_DATADIR=\"/usr/share/ceph\" -DCEPH_INSTALL_FULL_PKGLIBDIR=\"/usr/lib64/ceph\" -DCMAKE_INSTALL_LIBDIR=\"lib64\" -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/src/include -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/boost/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/xxHash -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/fmt/include -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong   -m64   -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -O2 -g -DNDEBUG -std=c++20 -fPIC   -U_FORTIFY_SOURCE -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wstrict-null-sentinel -Woverloaded-virtual -fstack-protector-strong -fdiagnostics-color=auto -MD -MT src/common/CMakeFiles/common-common-objs.dir/Timer.cc.o -MF src/common/CMakeFiles/common-common-objs.dir/Timer.cc.o.d -o src/common/CMakeFiles/common-common-objs.dir/Timer.cc.o -c /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/common/Timer.cc

Comment 31 Jakub Jelinek 2024-01-11 20:25:46 UTC
Thanks; do you have the same for SloppyCRCMap.cc ?

Comment 32 Fedora Update System 2024-01-12 02:04:23 UTC
FEDORA-2024-f7360ebbb2 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-f7360ebbb2`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-f7360ebbb2

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 34 Kaleb KEITHLEY 2024-01-12 12:49:54 UTC
(In reply to Jakub Jelinek from comment #31)
> Thanks; do you have the same for SloppyCRCMap.cc ?

https://bugzilla.redhat.com/attachment.cgi?id=2008407

compiled with 

/usr/lib64/ccache/g++ -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT -DCEPH_INSTALL_DATADIR=\"/usr/share/ceph\" -DCEPH_INSTALL_FULL_PKGLIBDIR=\"/usr/lib64/ceph\" -DCMAKE_INSTALL_LIBDIR=\"lib64\" -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/src/include -I/home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/boost/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/redhat-linux-build/include -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/xxHash -isystem /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/fmt/include -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong   -m64   -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -O2 -g -DNDEBUG -std=c++20 -fPIC   -U_FORTIFY_SOURCE -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wstrict-null-sentinel -Woverloaded-virtual -fstack-protector-strong -fdiagnostics-color=auto -MD -MT src/common/CMakeFiles/common-common-objs.dir/SloppyCRCMap.cc.o -MF src/common/CMakeFiles/common-common-objs.dir/SloppyCRCMap.cc.o.d -o src/common/CMakeFiles/common-common-objs.dir/SloppyCRCMap.cc.o -c /home/kkeithle/rpmbuild/BUILD/ceph-18.2.1/src/common/SloppyCRCMap.cc

Comment 35 Fedora Update System 2024-01-19 02:42:06 UTC
FEDORA-2024-f7360ebbb2 has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 36 Patrick C. F. Ernzer 2024-01-31 18:10:00 UTC
(In reply to Patrick C. F. Ernzer from comment #13)
> upgraded to F39. This worked just fine on F38 and prior versions.
[…]

bug is fixed for me with ceph-common-18.2.1-4.fc39.x86_64
Thanks.

Comment 37 Aoife Moloney 2024-11-27 21:32:27 UTC
Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26.

Fedora Linux 39 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 38 Red Hat Bugzilla 2025-03-28 04:25:32 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

