Bug 136455

Summary:	(glibc 70+) bad debuginfo after Oct 18th confuses gdb
Product:	[Fedora] Fedora	Reporter:	Warren Togami <wtogami>
Component:	gdb	Assignee:	Elena Zannoni <ezannoni>
Status:	CLOSED ERRATA	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	rawhide	CC:	cagney, eng-i18n-bugs, jakub, jjohnstn, wtogami
Target Milestone:	---	Keywords:	i18n
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-11-13 01:33:12 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	125997, 135876, 137149

Description Warren Togami 2004-10-20 06:53:27 UTC

Description of problem:
Between 12.1-1 and 12.1-2 the segv logger broke.  It now gets stuck rather
than logging the backtrace and terminating htt_server.  The output below
gives some clue as to what happened.  The log file in /var/log/iiim/ also
appears to get stuck mid-backtrace.

When htt_server is stuck in that condition, it causes the rest of the
desktop to go haywire.  firefox crashes, gnome-panel crashes,
bug-buddy crashes.  The only way to recover is to kill -9 htt_server.

[root@ibmlaptop imsdk]# htt_server -nodaemon
/usr/lib/im/share/iiim/gdbcmd:2: Error in sourced command file:
Unhandled dwarf expression opcode 0x93
Killed

[root@ibmlaptop tmp]# kill -s SEGV 14304

root     14306 14304  0 20:44 pts/4    00:00:00 [iiimf-segv-logg] <defunct>
root     14304  8981  0 20:44 pts/4    00:00:00 htt_server -nodaemon

[root@ibmlaptop tmp]# kill -9 14304

Version-Release number of selected component (if applicable):
im-sdk-12.1-2

Comment 1 Akira TAGOH 2004-10-20 07:58:45 UTC

It looks like htt_server is getting stuck because of a gdb error, and
iiimf-segv-logger, which is started from a process forked off
htt_server, is just waiting until that parent process gets an exit
status.  Although it does call waitpid, that somehow doesn't seem to work.
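Comment 1 points at the fork-and-waitpid handshake between htt_server and iiimf-segv-logger. The ps output in the description shows the logger as <defunct>, meaning it has exited but has not yet been reaped with waitpid. Below is a minimal Python sketch of that handshake with one deliberate change, which is an assumption and not what im-sdk does: the parent bounds its wait by polling os.waitpid with WNOHANG, so a helper that hangs (for example, a gdb that chokes on debuginfo) cannot wedge the parent indefinitely. wait_with_timeout and spawn_child are hypothetical names, and spawn_child is only a stand-in for the logger.

```python
import os
import signal
import time


def wait_with_timeout(pid, timeout, interval=0.05):
    """Reap child `pid`, giving up after `timeout` seconds.

    Returns the exit code, the negated signal number if the child was
    killed by a signal, or None if it is still running at the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        # WNOHANG: return (0, 0) immediately instead of blocking.
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return -os.WTERMSIG(status)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


def spawn_child(sleep_seconds, exit_code):
    """Hypothetical stand-in for the logger: sleep, then exit."""
    pid = os.fork()
    if pid == 0:
        time.sleep(sleep_seconds)
        os._exit(exit_code)  # never fall back into the parent's code
    return pid


if __name__ == "__main__":
    print(wait_with_timeout(spawn_child(0, 3), 2.0))  # 3
    stuck = spawn_child(30, 0)
    print(wait_with_timeout(stuck, 0.2))  # None: still running
    os.kill(stuck, signal.SIGKILL)
    print(wait_with_timeout(stuck, 2.0))  # -9
```

A blocking os.waitpid(pid, 0) is simpler, but it is exactly the call that never returns if the helper never exits; the polling loop trades a little latency for a guaranteed upper bound.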

Comment 2 Warren Togami 2004-10-20 08:17:24 UTC

Confirmed.  Rebuilt 12.1-1 using yesterday's new gcc-3.4.2-6 and it
fails in the same way.  Something in gcc went bad.

Comment 3 Warren Togami 2004-10-20 10:15:52 UTC

Hmm, built with gcc-3.4.2-5 and it seems to behave just as badly.

im-sdk:
drwxrwsr-x   11 100      buildsys     4096 Oct 18 10:57 12.1-1
drwxrwsr-x   11 100      buildsys     4096 Oct 19 15:32 12.1-2

glibc:
drwxrwsr-x   12 100      buildsys     4096 Oct 14 11:25 2.3.3-68
drwxrwsr-x   12 100      buildsys     4096 Oct 19 01:26 2.3.3-70

gcc:
drwxrwsr-x   12 100      buildsys     4096 Oct  7 08:40 3.4.2-5
drwxrwsr-x   12 100      buildsys     4096 Oct 18 19:36 3.4.2-6

Could another (very unlikely) candidate for trouble be glibc -70?  If
not, then we have to look at all of the build dependencies of im-sdk for
something that could have changed.

Comment 4 Warren Togami 2004-10-20 10:44:29 UTC

Traceback being dumped into /var/log/iiim/ gets stuck at a certain point.

==================================================
Process 11565 received signal 1156511
==================================================
Using host libthread_db library "/lib/tls/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread -154016064 (LWP 11565)]
[New Thread -155034704 (LWP 11584)]
[Thread debugging using libthread_db enabled]
[New Thread -154016064 (LWP 11565)]
[New Thread -155034704 (LWP 11584)]
[Thread debugging using libthread_db enabled]
[New Thread -154016064 (LWP 11565)]
[New Thread -155034704 (LWP 11584)]
0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#0  0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xf6dbe200 in fork () from /lib/tls/libc.so.6
#2  0xf6f61554 in fork () from /lib/tls/libpthread.so.0
#3  0x08055c6f in IMSignal::_segv (this=0x8fb7fc8, num=11) at
IMSignal.cpp:94
#4  0x08055dff in IMSignal::segv (this=0x2d44) at IMSignal.cpp:36
#5  0x080552f0 in signal_handler (num=0) at IMSignal.cpp:136
#6  <signal handler called>
#7  0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#8  0xf6df0564 in poll () from /lib/tls/libc.so.6
#9  0x0809206c in IMSocketListen::accept (this=0x8fc0808) at
IMUtil.cpp:651
#10 0x0806b44c in IIIMProtocol::accept (this=0x8fc9690, flags=0) at
IIIMProtocol.cpp:59
#11 0x080698df in IMScheduler_MTPC::start (this=0x8fb88c0) at
IMScheduler_MTPC.cpp:53
#12 0x0804d9ca in IMSvr::start (this=0xfffffffc) at IMScheduler.hh:24
#13 0x0804d755 in main (argc=-4, argv=0xfee7b664) at main.cpp:44
#14 0xf6d49e33 in __libc_start_main () from /lib/tls/libc.so.6
#15 0x0804d621 in _start ()

Thread 2 (Thread -155034704 (LWP 11584)):
#0  0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#1  0xf6f5f7e8 in recv () from /lib/tls/libpthread.so.0
No symbol table info available.
#2  0x08090346 in IMSocketTrans::recv (this=0xfffffe00, p=0xfffffe00,
n=4294966784) at IMUtil.cpp:573
No locals.
#3  0x0807cbe9 in IIIMPTrans_read (ptrans=0x8fc0ba0, buf=0xf6c253e0,
nbyte=8) at IIIMPTrans.cpp:72
        st = -512
        n = 8
        p = (
    unsigned char *) 0xf6c253e0
"ï¿½\vï¿½\bï¿½[ï¿½ï¿½\217.ï¿½ï¿½\220\226ï¿½\bï¿½\vï¿½\bï¿½[ï¿½ï¿½\030Tï¿½ï¿½nï¿½\a\bx\217ï¿½\bï¿½ï¿½ï¿½\b\024Tï¿½ï¿½ï¿½ï¿½ï¿½\b\205ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½\b8Tï¿½ï¿½ï¿½ï¿½\006\bï¿½\vï¿½\bï¿½ï¿½ï¿½\b\001"
#4  0xf6fe2ec4 in iiimf_stream_receive (stream=0x8fc8f78,
data_s=0xfffffe00, message_ret=0xfffffe00) at misc/stream.c:106
        status = 150735840
        ptr = (const uchar_t *) 0xf6fe7728 "@F\002"
        p = (uchar_t *) 0x80a81dc "\030\200\n\bï¿½ï¿½ï¿½ï¿½ï¿½Iï¿½ï¿½ï¿½\016\234"
        nbyte = 150790048
        message = (IIIMP_message *) 0x8fc0be0
        header = "ï¿½\vï¿½\bï¿½[ï¿½
        header_len = 150735840
        length = -152725504
        buf = (uchar_t *) 0xf6c25bb0 "ï¿½[ï¿½ï¿½ï¿½ï¿½ï¿½\bï¿½[ï¿½ï¿½\001"
#5  0x0807ca6e in IIIMPTrans::receive (this=0x80a81dc) at
IIIMPTrans.cpp:32
        st = -512
        pmes = (IIIMP_message *) 0x8fcdfa0
#6  0x0806b3b2 in IIIMProtocol::receive_and_dispatch (this=0x8fc9690,
pims=0x8fc0be0, flags=0) at IIIMP_IMState.hh:35
        pmes = (IIIMP_message *) 0x8fc9690
#7  0x08069833 in IMScheduler_MTPC_thread_entry (priv=0xfffffe00) at
IMScheduler_MTPC.cpp:25
        pimp = (class IMProtocol *) 0x8fc9690
        pims = (class IMState *) 0x8fc0be0
#8  0xf6f5a1d5 in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#9  0xf6dfa19a in clone () from /lib/tls/libc.so.6
No symbol table info available.

Thread 1 (Thread -154016064 (LWP 11565)):
#0  0xf6fe9782 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#1  0xf6dbe200 in fork () from /lib/tls/libc.so.6
No symbol table info available.
#2  0xf6f61554 in fork () from /lib/tls/libpthread.so.0
No symbol table info available.
#3  0x08055c6f in IMSignal::_segv (this=0x8fb7fc8, num=11) at
IMSignal.cpp:94
        retval = 11588
#4  0x08055dff in IMSignal::segv (this=0x2d44) at IMSignal.cpp:36
No locals.
#5  0x080552f0 in signal_handler (num=0) at IMSignal.cpp:136
        pims = (IMSignal *) 0x8fb7fc8
        ph =

Comment 5 Warren Togami 2004-10-20 10:49:35 UTC

Very odd behavior with im-sdk-12.1-2

/usr/sbin/htt_server -nodaemon
==============================
SEGV of this fails as described above.  But if you copy
/usr/sbin/htt_server to /tmp

/tmp/htt_server -nodaemon
This works properly on SEGV.  Is this because it cannot find the
corresponding debuginfo, which has bad symbols?

Comment 6 Warren Togami 2004-10-20 10:55:06 UTC

mkdir /usr/lib/debug/tmp
cp -a /usr/lib/debug/{usr/sbin/htt_server,tmp/htt_server}

Then /tmp/htt_server fails identically.  This is bad debuginfo.
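Comments 5 and 6 work because gdb finds a binary's separate debuginfo by path. A minimal Python sketch of the three classic locations described in GDB's "Separate Debug Files" documentation for a .gnu_debuglink name: next to the binary, in a .debug subdirectory, and under the global debug directory mirroring the binary's directory. The debug file name htt_server.debug is assumed from Comment 8, debug_file_candidates is a hypothetical helper, and newer gdb versions also search other places (for example by build ID).

```python
import os.path


def debug_file_candidates(binary_path, debuglink_name,
                          global_debug_dir="/usr/lib/debug"):
    """Paths gdb tries for a .gnu_debuglink file, in search order (sketch)."""
    bin_dir = os.path.dirname(os.path.abspath(binary_path))
    return [
        os.path.join(bin_dir, debuglink_name),            # beside the binary
        os.path.join(bin_dir, ".debug", debuglink_name),  # .debug subdir
        # Global directory + the binary's own directory, e.g. /usr/lib/debug/tmp
        global_debug_dir + os.path.join(bin_dir, debuglink_name),
    ]


for binary in ("/usr/sbin/htt_server", "/tmp/htt_server"):
    print(debug_file_candidates(binary, "htt_server.debug")[-1])
# /usr/lib/debug/usr/sbin/htt_server.debug
# /usr/lib/debug/tmp/htt_server.debug
```

This matches the experiment: a copy in /tmp has no debuginfo at /usr/lib/debug/tmp/... until Comment 6 creates it, at which point the copy fails like the original.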

Comment 7 Warren Togami 2004-10-20 13:30:12 UTC

More bad news.  The same binaries built without stripping fail in
exactly the same way while debugging.  Stripping was not at fault.

http://people.redhat.com/wtogami/im-sdk/
1: original working 12.1-1 binaries
bad-1: above SRPM rebuilt with matching gcc-3.4.2-5
bad-1-nostrip: above SRPM rebuilt with matching gcc-3.4.2-5 without
stripping

If you use the "nostrip" binaries, you must uninstall the debuginfo
package, and mkdir /usr/src/debug/im-sdk-src-r12_1 in order to trick
htt_server into doing its automatic gdb dump.

Comment 8 Jakub Jelinek 2004-10-20 14:16:59 UTC

The only changes between 1: and bad-1: htt_server are:
1) 4 st_size changes in .dynsym for SHN_UND symbols caused by libdl.so
   changes - nothing uses these
2) 4 bytes in .gnu_debuglink section - CRC of the debuginfo file most probably
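Point 2 follows from how the stripped binary refers to its debuginfo: .gnu_debuglink stores the debug file's name plus a checksum of that file, so any change to the debuginfo changes only those trailing 4 bytes. A minimal Python sketch of the section layout, assuming the format in GDB's "Separate Debug Files" documentation (name, NUL, zero padding to a 4-byte boundary, 4-byte CRC in target byte order) and that the checksum is the standard CRC-32 that zlib.crc32 computes; build_debuglink and parse_debuglink are hypothetical helpers and the debug file contents are invented.

```python
import struct
import zlib


def build_debuglink(debug_name, debug_bytes, byteorder="<"):
    """Contents of a .gnu_debuglink section ("<" = little-endian, as on i386)."""
    name = debug_name.encode() + b"\0"
    name += b"\0" * (-len(name) % 4)  # pad the name to a 4-byte boundary
    crc = zlib.crc32(debug_bytes) & 0xFFFFFFFF
    return name + struct.pack(byteorder + "I", crc)


def parse_debuglink(section, byteorder="<"):
    """Return (debug file name, CRC) from .gnu_debuglink contents."""
    name = section.split(b"\0", 1)[0].decode()
    (crc,) = struct.unpack(byteorder + "I", section[-4:])
    return name, crc


good = build_debuglink("htt_server.debug", b"debuginfo v1")
bad = build_debuglink("htt_server.debug", b"debuginfo v2")
print(len(good), good[:-4] == bad[:-4], good[-4:] != bad[-4:])
# 24 True True
```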
The only changes between 1: and bad-1: htt_server.debug are:
1) different DW_AT_decl_line values for some glibc headers
   (<stdio.h>, <features.h>, <libio.h>, <string.h>, <sys/cdefs.h>
    had whitespace changes and preprocessor directives changes that
    don't show up in the debuginfo)
   the whole .debug_info section is 8 bytes larger, supposedly
   because some DW_AT_decl_line values grew from < 128 to >= 128
   or other uleb128/sleb128 boundary
2) different indirect string offsets into .debug_str, but the
   strings they are referencing are identical (this is I think
   related to the growth of the .debug_info section)
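The size growth in point 1 rests on LEB128, DWARF's variable-length integer encoding: each byte carries 7 bits of value and sets its high bit when more bytes follow, so 127 fits in one byte but 128 needs two. A minimal Python sketch of the unsigned form; whether the shifted DW_AT_decl_line values were actually LEB128-encoded in these files is Jakub's supposition ("supposedly"), not something the sketch shows.

```python
def uleb128(value):
    """Encode a non-negative integer as DWARF unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


for v in (127, 128, 16383, 16384):
    print(v, uleb128(v).hex())
# 127 7f
# 128 8001
# 16383 ff7f
# 16384 808001
```

Each value that crosses a 7-bit boundary grows by one byte, which is how a handful of renumbered header lines can make .debug_info a few bytes larger and shift every later offset.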
To check this I ran
  for i in l a p r m s o; do readelf -w$i *.debug; done
on both and compared the results (they were identical).
readelf -wf and -wF shows differences in .eh_frame section, but that
is just a bug in readelf (in the .debug file .eh_frame is STT_NOBITS).
Last, I did readelf -wi *.debug | sed 's/offset: 0x[0-9a-f]\+)/offset: 0xXX)/g;s/DW_AT_decl_line.*$/DW_AT_decl_line/'
and compared it between bad and good, there were no differences.
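The sed filter above masks the two fields expected to differ (indirect string offsets and DW_AT_decl_line values) so that anything left over is a real difference. A minimal Python port of that filter plus a diff, as a sketch; normalize and debug_info_diff are hypothetical names and the sample lines are invented in the style of readelf -wi output.

```python
import difflib
import re


def normalize(line):
    """Mask the fields Jakub's sed filter masks in readelf -wi output."""
    # s/offset: 0x[0-9a-f]\+)/offset: 0xXX)/g  -> hide .debug_str offsets
    line = re.sub(r"offset: 0x[0-9a-f]+\)", "offset: 0xXX)", line)
    # s/DW_AT_decl_line.*$/DW_AT_decl_line/    -> hide line numbers
    return re.sub(r"DW_AT_decl_line.*$", "DW_AT_decl_line", line)


def debug_info_diff(good_dump, bad_dump):
    """Unified diff of two normalized dumps; [] means no differences."""
    good = [normalize(line) for line in good_dump.splitlines()]
    bad = [normalize(line) for line in bad_dump.splitlines()]
    return list(difflib.unified_diff(good, bad, "good", "bad", lineterm=""))


# Invented sample lines in the style of readelf -wi output.
good = ("DW_AT_name : (indirect string, offset: 0x2b1): main\n"
        "DW_AT_decl_line : 120")
bad = ("DW_AT_name : (indirect string, offset: 0x2b9): main\n"
       "DW_AT_decl_line : 131")
print(debug_info_diff(good, bad))  # []
```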

So if, as Warren claims, gdb doesn't grok one debuginfo file but groks
the other, it looks like a gdb bug, not a gcc or binutils/elfutils
bug.

Comment 9 Elena Zannoni 2004-10-20 14:43:29 UTC

The commands used are in:
/usr/lib/im/share/iiim/gdbcmd

Comment 10 Warren Togami 2004-10-21 12:27:51 UTC

Any chance this can be fixed real soon, or should this be targeted for
RHEL4RC and FC3 updates?

Comment 11 Elena Zannoni 2004-10-21 15:06:55 UTC

This is actually a big gdb feature that is missing. The problem is
that gdb doesn't handle the DW_OP_piece dwarf2 information. There is
even a feature request open for RHEL.
There is no way this can be fixed quickly. We can maybe add a hackish
workaround to make gdb not barf like that.
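Opcode 0x93 in the error at the top of this report is DW_OP_piece, which describes a variable whose value is split into pieces held in different places, such as two registers. A minimal Python sketch of decoding a location expression that uses it; it covers only DW_OP_reg0..DW_OP_reg31, DW_OP_regx and DW_OP_piece (opcode values from the DWARF specification), decode_location is a hypothetical helper, and the example bytes are invented.

```python
DW_OP_reg0, DW_OP_reg31 = 0x50, 0x6F
DW_OP_regx = 0x90
DW_OP_piece = 0x93


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 at data[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def decode_location(expr):
    """Decode a tiny subset of DWARF location-expression opcodes."""
    ops, pos = [], 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if DW_OP_reg0 <= op <= DW_OP_reg31:
            ops.append(f"DW_OP_reg{op - DW_OP_reg0}")
        elif op == DW_OP_regx:
            reg, pos = read_uleb128(expr, pos)
            ops.append(f"DW_OP_regx {reg}")
        elif op == DW_OP_piece:
            size, pos = read_uleb128(expr, pos)  # piece size in bytes
            ops.append(f"DW_OP_piece {size}")
        else:
            # Opcodes without a case end up here, like gdb's error above.
            raise ValueError(f"Unhandled dwarf expression opcode 0x{op:x}")
    return ops


# An 8-byte value split across two i386 registers:
# first 4 bytes in DWARF reg 0 (eax), next 4 in DWARF reg 2 (edx).
print(decode_location(bytes([0x50, 0x93, 0x04, 0x52, 0x93, 0x04])))
# ['DW_OP_reg0', 'DW_OP_piece 4', 'DW_OP_reg2', 'DW_OP_piece 4']
```

Parsing the piece is the easy part; the missing gdb feature Elena describes is assembling a value from several locations, which this sketch does not attempt.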

Comment 12 Warren Togami 2004-10-25 14:55:20 UTC

Would a "hackish workaround to make gdb not barf" be suitable for
RHEL4?  Meanwhile could you suggest anything that we can use to
workaround this problem?

Comment 13 Warren Togami 2004-10-25 22:42:54 UTC

We don't need the "big gdb feature that is missing" implemented, but
we do need some workaround because it would be VERY BAD to ship this
in RHEL4.  Adding RHEL4RCBlocker.

Comment 14 Warren Togami 2004-10-26 02:51:15 UTC

export RPM_OPT_FLAGS=`echo $RPM_OPT_FLAGS | sed s/-O2/-O0/`

The binaries built with this seem to avoid this gdb problem.  -O1 is
just as broken as -O2.  I suspect there is a more fine-grained compiler
flag that we can use with -O2 to make it work without losing too much
performance.  Investigating.

Comment 15 Warren Togami 2004-10-26 08:04:20 UTC

I tried to isolate if a specific compiler flag causes this using:
export RPM_OPT_FLAGS=`echo $RPM_OPT_FLAGS | sed 's/-O2/-fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers/'`

According to gcc's man page these flags are equivalent to -O1, yet the
builds don't behave the same way: -O1 produces symbols that cause gdb
to die, while the supposedly equivalent flags do not.  Strange...

Comment 16 Warren Togami 2004-10-26 09:53:39 UTC

jakub explained that -O levels are much more complicated than a
combination of flags, and attempting to disable a compiler
optimization is a bad idea.  We should instead just wait for gdb to be
fixed.

Meanwhile we are in very bad shape with an un-debuggable im-sdk, that
causes the desktop to go haywire if it does crash.  So the workaround
is to use -gstabs+ which changes from the default DWARF-2 to old
stabs+ debugging symbols.  I am testing this now.

Comment 17 Warren Togami 2004-10-26 10:26:12 UTC

Bad news.  With -gstabs+ gdb gets stuck in another way.

[root@ibmlaptop tmp]# htt_server -nodaemon
/usr/lib/im/share/iiim/gdbcmd:2: Error in sourced command file:
invalid pointer to member function

Comment 18 Warren Togami 2004-10-26 10:46:51 UTC

<jakub> warren: didn't know iimf is C++
<jakub> warren: stabs are quite bad in expressing C++
<jakub> warren: on the other side, gdb is quite bad at handling
dwarf2, so the situation is not really good

Comment 19 Warren Togami 2004-10-27 13:22:55 UTC

elena built -43 in fc3-HEAD which no longer gets stuck when gdb fails
to grok the DWARF-2 symbols.  However we need to check if the dumped
backtrace makes any sense, as elena said it may not.

It is probably unlikely that -43 will go into FC3 final, although I
would encourage elena to ask Sopwith anyway.  In any case, -43+ should
be pushed to FC3 updates, and this definitely must ship in RHEL4 final.

Comment 20 Elena Zannoni 2004-11-04 22:25:59 UTC

It will be included in RHEL4 as soon as there is a new build available.

Comment 21 Warren Togami 2004-11-12 17:52:55 UTC

Was it already accepted into dist-4E by mikem?

Comment 22 Elena Zannoni 2004-11-12 18:18:13 UTC

Yes, actually the 4E version is now .46.

Comment 23 Warren Togami 2004-11-12 22:48:11 UTC

Would you recommend pushing .46 to FC3 updates too?

Comment 24 Warren Togami 2004-11-13 01:33:12 UTC

.43 is heading to FC3 updates.