Bug 136455
Summary: | (glibc 70+) bad debuginfo after Oct 18th confuses gdb | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Warren Togami <wtogami> |
Component: | gdb | Assignee: | Elena Zannoni <ezannoni> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | rawhide | CC: | cagney, eng-i18n-bugs, jakub, jjohnstn, wtogami |
Target Milestone: | --- | Keywords: | i18n |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2004-11-13 01:33:12 UTC | Type: | --- |
Bug Blocks: | 125997, 135876, 137149 |
Description
Warren Togami
2004-10-20 06:53:27 UTC
It looks like htt_server is getting stuck due to a gdb error, and iiimf-segv-logger, which runs as a process forked from htt_server, is just waiting until that parent process returns an exit status. Although it actually calls waitpid, that somehow does not seem to work.

Confirmed. Rebuilt 12.1-1 using yesterday's new gcc-3.4.2-6 and it fails in the same way. Something in gcc went bad.

Hmm, built with gcc-3.4.2-5 and it behaves just as badly.

im-sdk:
drwxrwsr-x 11 100 buildsys 4096 Oct 18 10:57 12.1-1
drwxrwsr-x 11 100 buildsys 4096 Oct 19 15:32 12.1-2

glibc:
drwxrwsr-x 12 100 buildsys 4096 Oct 14 11:25 2.3.3-68
drwxrwsr-x 12 100 buildsys 4096 Oct 19 01:26 2.3.3-70

gcc:
drwxrwsr-x 12 100 buildsys 4096 Oct 7 08:40 3.4.2-5
drwxrwsr-x 12 100 buildsys 4096 Oct 18 19:36 3.4.2-6

Another (very unlikely) candidate for trouble is glibc -70? If not, then we have to look at all of the build dependencies of im-sdk for something that could have changed.

The traceback being dumped into /var/log/iiim/ gets stuck at a certain point.

==================================================
Process 11565 received signal 1156511
==================================================
Using host libthread_db library "/lib/tls/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread -154016064 (LWP 11565)]
[New Thread -155034704 (LWP 11584)]
[Thread debugging using libthread_db enabled]
[New Thread -154016064 (LWP 11565)]
[New Thread -155034704 (LWP 11584)]
[Thread debugging using libthread_db enabled]
[New Thread -154016064 (LWP 11565)]
[New Thread -155034704 (LWP 11584)]
0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#0 0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0xf6dbe200 in fork () from /lib/tls/libc.so.6
#2 0xf6f61554 in fork () from /lib/tls/libpthread.so.0
#3 0x08055c6f in IMSignal::_segv (this=0x8fb7fc8, num=11) at IMSignal.cpp:94
#4 0x08055dff in IMSignal::segv (this=0x2d44) at IMSignal.cpp:36
#5 0x080552f0 in signal_handler (num=0) at IMSignal.cpp:136
#6 <signal handler called>
#7 0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#8 0xf6df0564 in poll () from /lib/tls/libc.so.6
#9 0x0809206c in IMSocketListen::accept (this=0x8fc0808) at IMUtil.cpp:651
#10 0x0806b44c in IIIMProtocol::accept (this=0x8fc9690, flags=0) at IIIMProtocol.cpp:59
#11 0x080698df in IMScheduler_MTPC::start (this=0x8fb88c0) at IMScheduler_MTPC.cpp:53
#12 0x0804d9ca in IMSvr::start (this=0xfffffffc) at IMScheduler.hh:24
#13 0x0804d755 in main (argc=-4, argv=0xfee7b664) at main.cpp:44
#14 0xf6d49e33 in __libc_start_main () from /lib/tls/libc.so.6
#15 0x0804d621 in _start ()

Thread 2 (Thread -155034704 (LWP 11584)):
#0 0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
        No symbol table info available.
#1 0xf6f5f7e8 in recv () from /lib/tls/libpthread.so.0
        No symbol table info available.
#2 0x08090346 in IMSocketTrans::recv (this=0xfffffe00, p=0xfffffe00, n=4294966784) at IMUtil.cpp:573
        No locals.
#3 0x0807cbe9 in IIIMPTrans_read (ptrans=0x8fc0ba0, buf=0xf6c253e0, nbyte=8) at IIIMPTrans.cpp:72
        st = -512
        n = 8
        p = (unsigned char *) 0xf6c253e0 "�\v�\b�[��\217.��\220\226�\b�\v�\b�[��\030T��n�\a\bx\217�\b���\b\024T�����\b\205������\b8T����\006\b�\v�\b���\b\001"
#4 0xf6fe2ec4 in iiimf_stream_receive (stream=0x8fc8f78, data_s=0xfffffe00, message_ret=0xfffffe00) at misc/stream.c:106
        status = 150735840
        ptr = (const uchar_t *) 0xf6fe7728 "@F\002"
        p = (uchar_t *) 0x80a81dc "\030\200\n\b�����I���\016\234"
        nbyte = 150790048
        message = (IIIMP_message *) 0x8fc0be0
        header = "�\v�\b�[�
        header_len = 150735840
        length = -152725504
        buf = (uchar_t *) 0xf6c25bb0 "�[�����\b�[��\001"
#5 0x0807ca6e in IIIMPTrans::receive (this=0x80a81dc) at IIIMPTrans.cpp:32
        st = -512
        pmes = (IIIMP_message *) 0x8fcdfa0
#6 0x0806b3b2 in IIIMProtocol::receive_and_dispatch (this=0x8fc9690, pims=0x8fc0be0, flags=0) at IIIMP_IMState.hh:35
        pmes = (IIIMP_message *) 0x8fc9690
#7 0x08069833 in IMScheduler_MTPC_thread_entry (priv=0xfffffe00) at IMScheduler_MTPC.cpp:25
        pimp = (class IMProtocol *) 0x8fc9690
        pims = (class IMState *) 0x8fc0be0
#8 0xf6f5a1d5 in start_thread () from /lib/tls/libpthread.so.0
        No symbol table info available.
#9 0xf6dfa19a in clone () from /lib/tls/libc.so.6
        No symbol table info available.

Thread 1 (Thread -154016064 (LWP 11565)):
#0 0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
        No symbol table info available.
#1 0xf6dbe200 in fork () from /lib/tls/libc.so.6
        No symbol table info available.
#2 0xf6f61554 in fork () from /lib/tls/libpthread.so.0
        No symbol table info available.
#3 0x08055c6f in IMSignal::_segv (this=0x8fb7fc8, num=11) at IMSignal.cpp:94
        retval = 11588
#4 0x08055dff in IMSignal::segv (this=0x2d44) at IMSignal.cpp:36
        No locals.
#5 0x080552f0 in signal_handler (num=0) at IMSignal.cpp:136
        pims = (IMSignal *) 0x8fb7fc8
        ph =

Very odd behavior with im-sdk-12.1-2

/usr/sbin/htt_server -nodaemon
==============================
SEGV of this fails as described above.
But if you copy /usr/sbin/htt_server to /tmp and run

/tmp/htt_server -nodaemon

then it works properly on SEGV. Is this because it cannot find the corresponding debuginfo, which has bad symbols?

mkdir /usr/lib/debug/tmp
cp -a /usr/lib/debug/{usr/sbin/htt_server,tmp/htt_server}

Then /tmp/htt_server fails identically. This is bad debuginfo.

More bad news. The same binaries built without stripping fail while debugging in exactly the same way. Stripping was not at fault.

http://people.redhat.com/wtogami/im-sdk/
1: original working 12.1-1 binaries
bad-1: above SRPM rebuilt with matching gcc-3.4.2-5
bad-1-nostrip: above SRPM rebuilt with matching gcc-3.4.2-5 without stripping

If you use the "nostrip" binaries, you must uninstall the debuginfo package and mkdir /usr/src/debug/im-sdk-src-r12_1 in order to trick htt_server into doing its automatic gdb dump.

The only changes between the 1: and bad-1: htt_server binaries are:
1) 4 st_size changes in .dynsym for SHN_UND symbols, caused by libdl.so changes - nothing uses these
2) 4 bytes in the .gnu_debuglink section - most probably the CRC of the debuginfo file

The only changes between the 1: and bad-1: htt_server.debug files are:
1) different DW_AT_decl_line values for some glibc headers (<stdio.h>, <features.h>, <libio.h>, <string.h>, <sys/cdefs.h> had whitespace changes and preprocessor directive changes that don't show up in the debuginfo); the whole .debug_info section is 8 bytes larger, supposedly because some DW_AT_decl_line values grew from < 128 to >= 128 or crossed another uleb128/sleb128 boundary
2) different indirect string offsets into .debug_str, but the strings they reference are identical (I think this is related to the growth of the .debug_info section)

What I did to check this was to run

for i in l a p r m s o; do readelf -w$i *.debug; done

and compare the results (they were identical). readelf -wf and -wF show differences in the .eh_frame section, but that is just a bug in readelf (in the .debug file .eh_frame is SHT_NOBITS).
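The size argument above (a DW_AT_decl_line value crossing from below 128 to 128 or more makes .debug_info grow) follows from how DWARF encodes variable-length integers, assuming the attribute really is stored as uleb128 as the comment supposes. A minimal sketch of unsigned LEB128 encoding, in Python purely for illustration; the name uleb128_encode is hypothetical and is not part of any tool mentioned in this report:

```python
def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128.

    Each output byte carries 7 payload bits; the high bit is set on
    every byte except the last to signal that more bytes follow.
    """
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


if __name__ == "__main__":
    # Hypothetical line numbers on either side of the 7-bit boundaries.
    for line_no in (127, 128, 16383, 16384):
        encoded = uleb128_encode(line_no)
        print(line_no, len(encoded), encoded.hex())
```

Each value that crosses a 7-bit boundary costs one extra byte, so a handful of shifted header line numbers would be enough to explain a section growing by a few bytes without any change in meaning.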
Last, I did

readelf -wi *.debug | sed 's/offset: 0x[0-9a-f]\+)/offset: 0xXX)/g;s/DW_AT_decl_line.*$/DW_AT_decl_line/'

and compared the output between bad and good; there were no differences. So if, as Warren claims, gdb doesn't grok one debuginfo and groks the other, it looks like a gdb bug, not a gcc or binutils/elfutils bug.

The commands used are in: /usr/lib/im/share/iiim/gdbcmd

Any chance this can be fixed real soon, or should this be targeted for RHEL4RC and FC3 updates?

This is actually a big gdb feature that is missing. The problem is that gdb doesn't handle the DW_OP_piece dwarf2 information. There is even a feature request open for RHEL. There is no way this can be fixed quickly. We can maybe add a hackish workaround to make gdb not barf like that.

Would a "hackish workaround to make gdb not barf" be suitable for RHEL4? Meanwhile, could you suggest anything that we can use to work around this problem? We don't need the "big gdb feature that is missing" implemented, but we do need some workaround because it would be VERY BAD to ship this in RHEL4. Adding RHEL4RCBlocker.

export RPM_OPT_FLAGS=`echo $RPM_OPT_FLAGS | sed s/-O2/-O0/`

The binaries built with this seem to avoid this gdb problem. -O1 is just as broken as -O2. I suspect there is a more fine-grained compiler flag that we can use with -O2 to make it work without losing too much performance. Investigating.

I tried to isolate whether a specific compiler flag causes this using:

export RPM_OPT_FLAGS=`echo $RPM_OPT_FLAGS | sed s/-O2/-fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers/`

According to gcc's man page these flags are equivalent to -O1, yet the build doesn't seem to behave that way. -O1 produces symbols that cause gdb to die, while the supposed equivalent does not? Strange...

jakub explained that -O levels are much more complicated than a combination of flags, and that attempting to disable individual compiler optimizations is a bad idea.
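The readelf -wi comparison described earlier in this thread masks the two kinds of expected noise (offsets into .debug_str and DW_AT_decl_line values) before diffing the good and bad dumps. A rough Python equivalent of that sed expression, for illustration only; the function names and the sample lines are hypothetical, and the sample lines merely resemble readelf output rather than being copied from it:

```python
import re

# Mirrors: s/offset: 0x[0-9a-f]\+)/offset: 0xXX)/g
_OFFSET = re.compile(r"offset: 0x[0-9a-f]+\)")
# Mirrors: s/DW_AT_decl_line.*$/DW_AT_decl_line/
_DECL_LINE = re.compile(r"DW_AT_decl_line.*$")


def normalize(line: str) -> str:
    """Mask string-table offsets and declaration line numbers in one dump line."""
    line = _OFFSET.sub("offset: 0xXX)", line)
    return _DECL_LINE.sub("DW_AT_decl_line", line)


def same_after_normalizing(good, bad) -> bool:
    """True if two dumps differ only in the masked fields."""
    return [normalize(l) for l in good] == [normalize(l) for l in bad]


if __name__ == "__main__":
    # Made-up sample lines standing in for two readelf -wi dumps.
    good = [
        "DW_AT_name : (indirect string, offset: 0x1a2b): htt_server",
        "DW_AT_decl_line : 127",
    ]
    bad = [
        "DW_AT_name : (indirect string, offset: 0x1a33): htt_server",
        "DW_AT_decl_line : 130",
    ]
    print(same_after_normalizing(good, bad))
```

If the normalized dumps compare equal, the two debuginfo files describe the same program and differ only in bookkeeping values, which is the basis for pointing at gdb rather than at gcc or binutils.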
We should instead just wait for gdb to be fixed. Meanwhile we are in very bad shape with an un-debuggable im-sdk that causes the desktop to go haywire if it does crash.

So the workaround is to use -gstabs+, which changes from the default DWARF-2 to the old stabs+ debugging symbols. I am testing this now.

Bad news. With -gstabs+ gdb gets stuck in another way.

[root@ibmlaptop tmp]# htt_server -nodaemon
/usr/lib/im/share/iiim/gdbcmd:2: Error in sourced command file:
invalid pointer to member function

<jakub> warren: didn't know iimf is C++
<jakub> warren: stabs are quite bad in expressing C++
<jakub> warren: on the other side, gdb is quite bad at handling dwarf2, so the situation is not really good

elena built -43 in fc3-HEAD, which no longer gets stuck when gdb fails to grok the DWARF-2 symbols. However, we need to check whether the dumped backtrace makes any sense, as elena said it may not. It is probably unlikely that -43 is going into FC3 final, although I would encourage elena to try to ask Sopwith anyway. In any case, -43+ should be pushed to FC3 updates, and this definitely must ship in RHEL4 final.

It will be included in RHEL4 as soon as there is a new build available.

Was it already accepted into dist-4E by mikem?

Yes, actually the 4E version is now .46.

Would you recommend pushing .46 to FC3 updates too?

.43 is heading to FC3 updates.