Bug 48841 - Crash in glibc
Status: CLOSED NOTABUG
Product: Red Hat Raw Hide
Classification: Retired
Component: glibc
Version: 1.0
Hardware: ia64 Linux
Priority: medium Severity: high
Assigned To: Jakub Jelinek
QA Contact: Aaron Brown
Reported: 2001-07-12 01:53 EDT by Krishnakumar B
Modified: 2016-11-24 09:47 EST
Last Closed: 2001-07-25 05:07:44 EDT
Attachments:
patch to put all the sections in the right places (12.44 KB, patch)
2001-07-13 11:43 EDT, Bill Nottingham
Description Krishnakumar B 2001-07-12 01:53:24 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010628

Description of problem:
When I try to run one of my programs, I get the following crash in glibc:

yoda> gdb Reactor_Exceptions_Test core

warning: core file may not match specified executable file.
Core was generated by `./Reactor_Exceptions_Test'.
Program terminated with signal 6, Aborted.
Reading symbols from /u/kitty/ACE_wrappers/ace/libACE.so...done.
Loaded symbols for /u/kitty/ACE_wrappers/ace/libACE.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.

warning: Unable to set global thread event mask: generic error
[New Thread 1024 (LWP 6386)]
Error while reading shared library symbols:
Cannot enable thread event reporting for Thread 1024 (LWP 6386): generic error
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++-libc6.2-2.so.3...done.
Loaded symbols for /usr/lib/libstdc++-libc6.2-2.so.3
Reading symbols from /lib/libm.so.6.1...done.
Loaded symbols for /lib/libm.so.6.1
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x2000000000743f82 in rt_sigsuspend () at soinit.c:56
56      soinit.c: No such file or directory.
        in soinit.c
kitty> where
#0  0x2000000000743f82 in rt_sigsuspend () at soinit.c:56
#1  0x20000000005f43a0 in __sigsuspend (set=0x0)
    at ../sysdeps/unix/sysv/linux/ia64/sigsuspend.c:38
#2  0x200000000041b7f0 in __pthread_wait_for_restart_signal (
    self=0x2000000000441480) at pthread.c:957
#3  0x2000000000414bf0 in pthread_cond_wait (cond=0x0, 
    mutex=0x6000000000010c50) at restart.h:34
#4  0x20000000001a7c30 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, mutex=@0x6000000000010c50, abstime=0x0)
    at /u/kitty/ACE_wrappers/ace/OS.i:2743
#5  0x20000000001a7e30 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, abstime=0x0) at Synch.cpp:644
#6  0x20000000001b6aa0 in ACE_Thread_Manager::wait (this=0x6000000000010c00, 
    timeout=0x0, abandon_detached_threads=0) at Thread_Manager.cpp:1699
#7  0x4000000000005e90 in main (argc=-22796, argv=0x80000fffffffa6f8)
    at Reactor_Exceptions_Test.cpp:210

System info:

yoda> uname -a
Linux yoda 2.4.5-10 #1 Wed Jun 27 14:13:30 EDT 2001 ia64 unknown

yoda> gcc -v
Reading specs from /usr/lib/gcc-lib/ia64-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-93)

yoda> /lib/libc-2.2.3.so 
GNU C Library stable release version 2.2.3, by Roland McGrath et al.
Copyright (C) 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 2.96 20000731 (Red Hat Linux 7.1 2.96-93).
Compiled on a Linux 2.4.5-4 system on 2001-06-25.
Available extensions:
	GNU libio by Per Bothner
	crypt add-on version 2.1 by Michael Glad and others
	The C stubs add-on version 2.1.2.
	linuxthreads-0.9 by Xavier Leroy
	BIND-8.2.3-T5B
	libthread_db work sponsored by Alpha Processor Inc
	NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Report bugs using the `glibcbug' script to <bugs@gnu.org>.

yoda> ld -v
GNU ld version 2.11.90.0.8 (with BFD 2.11.90.0.8)

yoda> as --version
GNU assembler 2.11.90.0.8
Copyright 2001 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License.  This program has absolutely no warranty.
This assembler was configured for a target of `ia64-redhat-linux'.
File was compiled as follows:

g++ -W -Wall -Wpointer-arith -pipe -O -g -Wno-uninitialized
-fno-implicit-templates   -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS
-D_REENTRANT -DACE_HAS_AIO_CALLS  -I/u/kitty/ACE_wrappers
-DACE_HAS_EXCEPTIONS  -c -o .obj/Reactor_Exceptions_Test.o
Reactor_Exceptions_Test.cpp

How reproducible:
Always

Steps to Reproduce:
1. Run the program again.
Comment 1 Bill Nottingham 2001-07-12 11:05:40 EDT
Was there ever a kernel release where this worked for you?

If you take stock 2.4.5 and add the latest ia64 linux patch at:

ftp://ftp.us.kernel.org/pub/linux/ports/ia64

does the problem persist?
Comment 2 Krishnakumar B 2001-07-12 17:42:35 EDT
The Wolverine beta that I tried had a very *old* kernel, as you mentioned in
your mail, so a lot of tests were failing, including this one. I have not played
around with kernels on this machine. The only kernels that I have tried this on
are kernel-2.4.3-2.10.1 and kernel-2.4.5-10.

As per your suggestion, I am building kernel 2.4.5 with the ia64 patch applied.
I have a couple of questions about that. I copied over Red Hat's
kernel-ia64.config and did a make oldconfig. Is that all right?

The kernel is compiled with -g set. Is that OK?

make bzImage fails, saying that there is no target named bzImage. How do I
compile a compressed kernel image on IA-64? Also, make install fails, saying
install is not a valid target, but modules_install is present. How do I install
the kernel?

I assume that I need to generate an initrd image for the Red Hat configuration and
update elilo.conf.

Any help appreciated.
Comment 3 Bill Nottingham 2001-07-12 17:54:51 EDT
make vmlinux

will make the kernel image, and

make modules

will make the modules.

You can then gzip the resulting vmlinux, and put that in /boot/efi.
Comment 4 Krishnakumar B 2001-07-12 21:49:01 EDT
I compiled a kernel using this:

make vmlinux
make modules
make modules_install
cp vmlinux /boot/efi/vmlinux-2.4.5
gzip -9 vmlinux
cp vmlinux.gz /boot/efi/vmlinuz-2.4.5
cd /boot/efi
mkinitrd initrd-2.4.5.img 2.4.5

Then I modified elilo.conf and added the entries linux (for vmlinuz) and
linux-failsafe (for vmlinux)

I get the following error on booting:

fs0:\>elilo linux-failsafe
ELILO
Loading vmlinux-2.4.5...alloc.c(line 131): allocator AllocatePages (2,2,
-562949953420435, 0x4400000) failed (Not Found)
plain_loader.c (line 227): plain: AllocatePages (-562949953420435, 0x4400000)
for kernel failed.

Exit status code: Load Error.

and drops back to EFI shell.

fs0:\>elilo linux
ELILO
Loading vmlinuz-2.4.5...alloc.c(line 131): allocator AllocatePages (2,2,
3940649673950061,0x4400000) failed (Not Found)
gzip.c (line 366): gzip: AllocatePages ( 3940649673950061,0x4400000) for kernel
failed.
gzip.c (line 474):gzip:

invalid exec header

/

and it hangs.

Obviously, I am doing something brain-damaged. What is it? I am using elilo from
Red Hat rawhide. It boots the Red Hat kernel (2.4.5-10) like a charm.

I have a B3 stepping processor with BIOS 99 from Intel and the latest QuickLogic
BIOS.
Any help is appreciated.
Comment 5 Bill Nottingham 2001-07-13 11:41:38 EDT
Add '-fno-merge-common' to the CFLAGS in the kernel makefile; alternatively, apply
the attached patch to the kernel you're trying to build.
Comment 6 Bill Nottingham 2001-07-13 11:43:04 EDT
Created attachment 23528
patch to put all the sections in the right places
Comment 7 Krishnakumar B 2001-07-13 18:35:43 EDT
I upgraded to 2.4.5 with the latest ia64 patch. The problem still exists. If you
look at the stack trace below, the call to pthread_cond_wait has a null condition
variable. So I think this clobbering of the cond variable is causing the problem.
Here is the new stack trace:

yoda> gdb Reactor_Exceptions_Test core

warning: core file may not match specified executable file.
Core was generated by `./Reactor_Exceptions_Test'.
Program terminated with signal 6, Aborted.
Reading symbols from /u/kitty/ACE_wrappers/ace/libACE.so...done.
Loaded symbols for /u/kitty/ACE_wrappers/ace/libACE.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.

warning: Unable to set global thread event mask: generic error
[New Thread 1024 (LWP 7704)]
Error while reading shared library symbols:
Cannot enable thread event reporting for Thread 1024 (LWP 7704): generic error
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++-libc6.2-2.so.3...done.
Loaded symbols for /usr/lib/libstdc++-libc6.2-2.so.3
Reading symbols from /lib/libm.so.6.1...done.
Loaded symbols for /lib/libm.so.6.1
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
56      soinit.c: No such file or directory.
        in soinit.c
kitty> where
#0  0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
#1  0x20000000005e03a0 in __sigsuspend (set=0x0)
    at ../sysdeps/unix/sysv/linux/ia64/sigsuspend.c:38
#2  0x20000000004077f0 in __pthread_wait_for_restart_signal (
    self=0x200000000042d480) at pthread.c:957
#3  0x2000000000400bf0 in pthread_cond_wait (cond=0x0, 
    mutex=0x6000000000010c50) at restart.h:34
#4  0x20000000001a7e90 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, mutex=@0x6000000000010c50, abstime=0x0)
    at /u/kitty/ACE_wrappers/ace/OS.i:2743
#5  0x20000000001a8090 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, abstime=0x0) at Synch.cpp:644
#6  0x20000000001b6c80 in ACE_Thread_Manager::wait (this=0x6000000000010c00, 
    timeout=0x0, abandon_detached_threads=0) at Thread_Manager.cpp:1699
#7  0x4000000000005e90 in main (argc=-22940, argv=0x80000fffffffa668)
    at Reactor_Exceptions_Test.cpp:210
kitty> quit
yoda>
Comment 8 Jakub Jelinek 2001-07-20 09:14:48 EDT
The backtraces you've shown are for a different thread from the one that aborted;
the thread in the backtrace just sleeps on a condition variable.
I'd suggest you run your program under gdb (since I believe multi-thread
core support in 7.1 is still imperfect - should be better in rawhide) and
use gdb thread commands to locate which exact thread aborted and see why.
Also note that without a reproducible testcase, there is nothing we can do
for this.
Comment 9 Krishnakumar B 2001-07-20 19:52:26 EDT
I upgraded my gcc to gcc-2.96-94 and gdb to GNU gdb Red Hat Linux 7.x
(5.0rh-12). I get the following when running the same program under the debugger.

yoda> gdb Reactor_Exceptions_Test
kitty> r
[New Thread 1024 (LWP 17857)]
[New Thread 2049 (LWP 17860)]
[New Thread 1027 (LWP 17861)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 1027 (LWP 17861)]
0x20000000005e0302 in kill () at soinit.c:56
56      soinit.c: No such file or directory.
        in soinit.c
Current language:  auto; currently c
kitty> info threads
* 3 Thread 1027 (LWP 17861)  0x20000000005e0302 in kill () at soinit.c:56
  2 Thread 2049 (LWP 17860)  0x20000000007300a2 in __syscall_poll ()
    at soinit.c:56
  1 Thread 1024 (LWP 17857)  0x200000000072ff82 in rt_sigsuspend ()
    at soinit.c:56
kitty> thread 3
[Switching to thread 3 (Thread 1027 (LWP 17861))]#0  0x20000000005e0302 in kill
    () at soinit.c:56
56      in soinit.c
kitty> where
#0  0x20000000005e0302 in kill () at soinit.c:56
#1  0x2000000000407d40 in pthread_kill (thread=1027, signo=6) at signals.c:65
#2  0x2000000000408460 in raise (sig=6) at signals.c:232
#3  0x20000000005e2a30 in abort () at ../sysdeps/generic/abort.c:88
#4  0x2000000000495a30 in __terminate () from /usr/lib/libstdc++-libc6.2-2.so.3
#5  0x2000000000407d40 in pthread_kill (thread=9223389629030323312, 
    signo=-10497032) at signals.c:65
#6  0x2000000000496d00 in ia64_throw_helper ()
   from /usr/lib/libstdc++-libc6.2-2.so.3
#7  0x80000fffff5fd530 in ?? ()
#8  0x2000000000407d40 in pthread_kill (thread=13835058055282164491, 
    signo=57984) at signals.c:65
#9  0x200000000029c910 in ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token> >::handle_events (
    this=0xc00000000000028a, max_wait_time=0x600000000000e280)
    at /u/kitty/ACE_wrappers/ace/Select_Reactor_T.cpp:1272
#10 0x80000fffff5ff940 in ?? ()
#11 0x2000000000407d40 in pthread_kill (thread=Cannot access memory at address
0x80000fffff3ffd80
) at signals.c:65
Cannot access memory at address 0x80000fffff3ffda8
kitty> thread 2
[Switching to thread 2 (Thread 2049 (LWP 17860))]#0  0x20000000007300a2 in
__syscall_poll () at soinit.c:56
56      in soinit.c
kitty> where
#0  0x20000000007300a2 in __syscall_poll () at soinit.c:56
#1  0x2000000000721b00 in __poll (fds=0x6000000000018e10, nfds=1, timeout=2000)
    at ../sysdeps/unix/sysv/linux/poll.c:63
#2  0x2000000000402580 in __pthread_manager (arg=0x9) at manager.c:139
#3  0x2000000000403c90 in __pthread_manager_sighandler (sig=9) at manager.c:221
#4  0x2000000000721b00 in __poll (fds=0x9, nfds=2305843009218073728, 
    timeout=7535904) at ../sysdeps/unix/sysv/linux/poll.c:63
#5  0x200000000002e840 in ?? ()
#6  0x2000000000721b00 in __poll (fds=0x6000000000010f60, nfds=32736, 
    timeout=3840) at ../sysdeps/unix/sysv/linux/poll.c:63
#7  0x00007ff1 in ?? ()
#8  0x2000000000721b00 in __poll (fds=0x1, nfds=0, timeout=1)
    at ../sysdeps/unix/sysv/linux/poll.c:63
#9  0x00000000 in ?? ()
kitty> thread 1
[Switching to thread 1 (Thread 1024 (LWP 17857))]#0  0x200000000072ff82 in
rt_sigsuspend () at soinit.c:56
56      in soinit.c
kitty> where
#0  0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
#1  0x20000000005e03a0 in __sigsuspend (set=0x80000fffffffa570)
    at ../sysdeps/unix/sysv/linux/ia64/sigsuspend.c:38
#2  0x20000000004077f0 in __pthread_wait_for_restart_signal (
    self=0x200000000042d480) at pthread.c:957
#3  0x2000000000400bf0 in pthread_cond_wait (cond=0x0, 
    mutex=0x6000000000010c50) at restart.h:34
#4  0x20000000001a7fa0 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, mutex=@0x6000000000010c50, abstime=0x0)
    at /u/kitty/ACE_wrappers/ace/OS.i:2743
#5  0x20000000001a81a0 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, abstime=0x0) at Synch.cpp:644
#6  0x20000000001b6d90 in ACE_Thread_Manager::wait (this=0x6000000000010c00, 
    timeout=0x0, abandon_detached_threads=0) at Thread_Manager.cpp:1699
#7  0x4000000000005e90 in main (argc=-22876, argv=0x80000fffffffa6a8)
    at Reactor_Exceptions_Test.cpp:210
kitty>

Does that help?
Comment 10 Jakub Jelinek 2001-07-23 10:51:49 EDT
It should help you, not me.
From the backtrace it looks like your C++ program throws an exception which
is not caught by anything (and thus __terminate is called).
It certainly does not look like a libc or libpthread bug.
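
For illustration only, the abort path described in this comment (uncaught exception, then terminate, then abort and SIGABRT) can be reproduced in isolation. The sketch below is not taken from the reporter's program: the function names are hypothetical, and it assumes a POSIX system and a modern C++ compiler (noexcept did not exist in gcc 2.96). A forked child lets an exception escape with no handler, and the parent reports which signal ended the child.

```cpp
#include <cassert>
#include <csignal>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: throws an exception that nothing in the child will catch.
void thrower() { throw std::runtime_error("nobody catches this"); }

// An exception escaping a noexcept function makes the runtime call
// std::terminate, whose default handler calls abort().
void no_handler_in_scope() noexcept { thrower(); }

// Runs the uncaught throw in a child process and returns the number of the
// signal that killed it (0 if it exited normally, -1 on fork/wait failure).
int terminating_signal_of_uncaught_throw() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        no_handler_in_scope();
        _exit(0);  // not reached: terminate() aborts the child first
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a working toolchain the function should return SIGABRT, matching the "Program received signal SIGABRT" seen in the gdb session. The same frames (abort called from a terminate routine) would appear whether the handler is genuinely missing or the unwinder fails to find one that exists.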
Comment 11 Krishnakumar B 2001-07-23 13:19:20 EDT
No, my code is right. FWIW, the same piece of code runs fine under a multitude
of compiler and OS combinations, including 64-bit OSes like Tru64, HP-UX, and
Solaris 8. It also runs fine under gcc on Linux ix86. I am catching the
exception. Here is a small trace of activity under Linux ix86:

samba> gdb Reactor_Exceptions_Test
kitty> b 93
Breakpoint 1 at 0x804c811: file Reactor_Exceptions_Test.cpp, line 93.
kitty> r
[New Thread 1024 (LWP 2234)]
[New Thread 2049 (LWP 2237)]
Delayed SIGSTOP caught for LWP 2237.
[New Thread 1026 (LWP 2238)]
Delayed SIGSTOP caught for LWP 2238.
Activity occurred on handle8
got buf = Hello
throw exception

Catch exception
[Switching to Thread 1026 (LWP 2238)]

Breakpoint 1, My_Reactor::handle_events (this=0xbfffe730, max_wait_time=0x0)
    at Reactor_Exceptions_Test.cpp:93
93              ret = -1;
Current language:  auto; currently c++
kitty> list
88              ret = ACE_Reactor::handle_events (max_wait_time);
89            }
90          catch (...)
91            {
92              cout << "Catch exception" << endl;
93              ret = -1;
94            }
95          return ret;
96        }
97
kitty> c
exception return
LWP 2238 exited.
LWP 2237 exited.

Program exited normally.
kitty>
Compare this with the following on IA-64:

yoda> gdb Reactor_Exceptions_Test 
kitty> b 93
Breakpoint 1 at 0x4000000000007912: file Reactor_Exceptions_Test.cpp, line 93.
kitty> r
[New Thread 1024 (LWP 10224)]
[New Thread 2049 (LWP 10227)]
[New Thread 1027 (LWP 10228)]
Activity occurred on handle8
got buf = Hello
throw exception


Program received signal SIGABRT, Aborted.
[Switching to Thread 1027 (LWP 10228)]
0x20000000005e0302 in kill () at soinit.c:56
56      soinit.c: No such file or directory.
        in soinit.c
Current language:  auto; currently c
kitty>

So my catch block is never executed on IA-64. This might not be a bug in glibc
per se, but it is definitely a bug in the compiler-generated code.
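
A reduced test case for this behavior might look like the sketch below. It is not the ACE test: the names inner_handle_events, handle_events, worker and run_in_thread are hypothetical stand-ins, and only the try / catch (...) shape is taken from lines 88-95 of the listing above. It assumes POSIX threads and a current C++ compiler, and may need -pthread when linking.

```cpp
#include <cassert>
#include <pthread.h>
#include <stdexcept>

// Stand-in for ACE_Reactor::handle_events: a callee that always throws.
int inner_handle_events() { throw std::runtime_error("throw exception"); }

// Same shape as My_Reactor::handle_events in the listing: any exception
// from the callee is caught and turned into a -1 return value.
int handle_events() {
    int ret = 0;
    try {
        ret = inner_handle_events();
    } catch (...) {
        ret = -1;
    }
    return ret;
}

// Thread entry point: stores the result of handle_events() for the caller.
extern "C" void* worker(void* arg) {
    *static_cast<int*>(arg) = handle_events();
    return nullptr;
}

// Runs handle_events() on a second thread and returns its result,
// or -2 if the thread could not be created or joined.
int run_in_thread() {
    int result = 0;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, &result) != 0) return -2;
    if (pthread_join(tid, nullptr) != 0) return -2;
    return result;
}
```

If run_in_thread() returns -1, the catch block ran, as in the ix86 session. A SIGABRT instead would correspond to the IA-64 session. Whether a case this small would actually trigger the failure on the affected toolchain is not known from this report.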

Comment 12 Jakub Jelinek 2001-07-25 05:07:38 EDT
Sorry, but without a testcase there is nothing I can do about it.
Comment 13 Krishnakumar B 2001-07-27 19:39:00 EDT
No problem. The test is kind of complicated to reproduce, so I didn't bother
writing a simple test case. Anyway, gcc-3.0 with binutils 2.11.90.0.23 fixes the
crash, so I guess it is a problem with Red Hat's compiler and I should stick
with the official compiler for my compilations...

Thanks anyway.
