From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010628

Description of problem:
When I try to run one of my programs, I get the following crash in glibc:

yoda> gdb Reactor_Exceptions_Test core
warning: core file may not match specified executable file.
Core was generated by `./Reactor_Exceptions_Test'.
Program terminated with signal 6, Aborted.
Reading symbols from /u/kitty/ACE_wrappers/ace/libACE.so...done.
Loaded symbols for /u/kitty/ACE_wrappers/ace/libACE.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
warning: Unable to set global thread event mask: generic error
[New Thread 1024 (LWP 6386)]
Error while reading shared library symbols:
Cannot enable thread event reporting for Thread 1024 (LWP 6386): generic error
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++-libc6.2-2.so.3...done.
Loaded symbols for /usr/lib/libstdc++-libc6.2-2.so.3
Reading symbols from /lib/libm.so.6.1...done.
Loaded symbols for /lib/libm.so.6.1
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x2000000000743f82 in rt_sigsuspend () at soinit.c:56
56 soinit.c: No such file or directory.
in soinit.c
kitty> where
#0 0x2000000000743f82 in rt_sigsuspend () at soinit.c:56
#1 0x20000000005f43a0 in __sigsuspend (set=0x0)
    at ../sysdeps/unix/sysv/linux/ia64/sigsuspend.c:38
#2 0x200000000041b7f0 in __pthread_wait_for_restart_signal (
    self=0x2000000000441480) at pthread.c:957
#3 0x2000000000414bf0 in pthread_cond_wait (cond=0x0,
    mutex=0x6000000000010c50) at restart.h:34
#4 0x20000000001a7c30 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, mutex=@0x6000000000010c50, abstime=0x0)
    at /u/kitty/ACE_wrappers/ace/OS.i:2743
#5 0x20000000001a7e30 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, abstime=0x0) at Synch.cpp:644
#6 0x20000000001b6aa0 in ACE_Thread_Manager::wait (this=0x6000000000010c00,
    timeout=0x0, abandon_detached_threads=0) at Thread_Manager.cpp:1699
#7 0x4000000000005e90 in main (argc=-22796, argv=0x80000fffffffa6f8)
    at Reactor_Exceptions_Test.cpp:210

System info:

yoda> uname -a
Linux yoda 2.4.5-10 #1 Wed Jun 27 14:13:30 EDT 2001 ia64 unknown
yoda> gcc -v
Reading specs from /usr/lib/gcc-lib/ia64-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-93)
yoda> /lib/libc-2.2.3.so
GNU C Library stable release version 2.2.3, by Roland McGrath et al.
Copyright (C) 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 2.96 20000731 (Red Hat Linux 7.1 2.96-93).
Compiled on a Linux 2.4.5-4 system on 2001-06-25.
Available extensions:
    GNU libio by Per Bothner
    crypt add-on version 2.1 by Michael Glad and others
    The C stubs add-on version 2.1.2.
    linuxthreads-0.9 by Xavier Leroy
    BIND-8.2.3-T5B
    libthread_db work sponsored by Alpha Processor Inc
    NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Report bugs using the `glibcbug' script to <bugs>.
yoda> ld -v
GNU ld version 2.11.90.0.8 (with BFD 2.11.90.0.8)
yoda> as --version
GNU assembler 2.11.90.0.8
Copyright 2001 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License. This program has absolutely no warranty.
This assembler was configured for a target of `ia64-redhat-linux'.

File was compiled as follows:

g++ -W -Wall -Wpointer-arith -pipe -O -g -Wno-uninitialized -fno-implicit-templates -D_POSIX_THREADS -D_POSIX_THREAD_SAFE_FUNCTIONS -D_REENTRANT -DACE_HAS_AIO_CALLS -I/u/kitty/ACE_wrappers -DACE_HAS_EXCEPTIONS -c -o .obj/Reactor_Exceptions_Test.o Reactor_Exceptions_Test.cpp

How reproducible:
Always

Steps to Reproduce:
1. Run the program again.
2.
3.

Additional info:
Was there ever a kernel release where this worked for you? If you take stock 2.4.5 and add the latest ia64 linux patch at: ftp://ftp.us.kernel.org/pub/linux/ports/ia64 does the problem persist?
The Wolverine beta that I tried had a very *old* kernel, as you mentioned in your mail, so a lot of tests were failing, including this one. I have not played around with kernels on this machine. The only kernels that I have tried this on are kernel-2.4.3-2.10.1 and kernel-2.4.5-10. As per your suggestion, I am building kernel 2.4.5 with the ia64 patch applied. A couple of questions about that: I copied over RedHat's kernel-ia64.config and did a make oldconfig. Is that all right? The kernel is compiled with -g set. Is that OK? make bzImage fails, saying that there is no target named bzImage. How do I compile a compressed kernel image on IA-64? Also, make install fails, saying install is not a valid target, but modules_install is present. How do I install the kernel? I assume that I need to generate an initrd image for the RedHat configuration and update elilo.conf. Any help appreciated.
make vmlinux will make the kernel image, and make modules will make the modules. You can then gzip the resulting vmlinux, and put that in /boot/efi.
I compiled a kernel using this:

make vmlinux
make modules
make modules_install
cp vmlinux /boot/efi/vmlinux-2.4.5
gzip -9 vmlinux
cp vmlinux.gz /boot/efi/vmlinuz-2.4.5
cd /boot/efi
mkinitrd initrd-2.4.5.img 2.4.5

Then I modified elilo.conf and added the entries linux (for vmlinuz) and linux-failsafe (for vmlinux). I get the following error on booting:

fs0:\>elilo linux-failsafe
ELILO
Loading vmlinux-2.4.5...alloc.c(line 131): allocator AllocatePages (2,2, -562949953420435, 0x4400000) failed (Not Found)
plain_loader.c (line 227): plain: AllocatePages (-562949953420435, 0x4400000) for kernel failed.
Exit status code: Load Error.

and it drops back to the EFI shell.

fs0:\>elilo linux
ELILO
Loading vmlinuz-2.4.5...alloc.c(line 131): allocator AllocatePages (2,2, 3940649673950061,0x4400000) failed (Not Found)
gzip.c (line 366): gzip: AllocatePages ( 3940649673950061,0x4400000) for kernel failed.
gzip.c (line 474):gzip: invalid exec header /

and it hangs. Obviously, I am doing something brain-damaged. What is it? I am using elilo from RedHat rawhide. It boots the RedHat kernel (2.4.5-10) like a charm. I have a B3 stepping processor with BIOS 99 from Intel and the latest QuickLogic BIOS. Any help is appreciated.
Add '-fno-merge-common' to the CFLAGS in the kernel makefile. Alternatively, apply the attached patch to the kernel you're trying to build.
Created attachment 23528: patch to put all the sections in the right places
I upgraded to 2.4.5 with the latest ia64-patch. The problem still exists. If you look at the stack trace below, the call to pthread_cond_wait has a null condition variable. So I think this clobbering of the cond variable is causing the problem. Here is the new stack trace:

yoda> gdb Reactor_Exceptions_Test core
warning: core file may not match specified executable file.
Core was generated by `./Reactor_Exceptions_Test'.
Program terminated with signal 6, Aborted.
Reading symbols from /u/kitty/ACE_wrappers/ace/libACE.so...done.
Loaded symbols for /u/kitty/ACE_wrappers/ace/libACE.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
warning: Unable to set global thread event mask: generic error
[New Thread 1024 (LWP 7704)]
Error while reading shared library symbols:
Cannot enable thread event reporting for Thread 1024 (LWP 7704): generic error
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libstdc++-libc6.2-2.so.3...done.
Loaded symbols for /usr/lib/libstdc++-libc6.2-2.so.3
Reading symbols from /lib/libm.so.6.1...done.
Loaded symbols for /lib/libm.so.6.1
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
56 soinit.c: No such file or directory.
in soinit.c
kitty> where
#0 0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
#1 0x20000000005e03a0 in __sigsuspend (set=0x0)
    at ../sysdeps/unix/sysv/linux/ia64/sigsuspend.c:38
#2 0x20000000004077f0 in __pthread_wait_for_restart_signal (
    self=0x200000000042d480) at pthread.c:957
#3 0x2000000000400bf0 in pthread_cond_wait (cond=0x0,
    mutex=0x6000000000010c50) at restart.h:34
#4 0x20000000001a7e90 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, mutex=@0x6000000000010c50, abstime=0x0)
    at /u/kitty/ACE_wrappers/ace/OS.i:2743
#5 0x20000000001a8090 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, abstime=0x0) at Synch.cpp:644
#6 0x20000000001b6c80 in ACE_Thread_Manager::wait (this=0x6000000000010c00,
    timeout=0x0, abandon_detached_threads=0) at Thread_Manager.cpp:1699
#7 0x4000000000005e90 in main (argc=-22940, argv=0x80000fffffffa668)
    at Reactor_Exceptions_Test.cpp:210
kitty> quit
yoda>
The backtraces you've shown are for a different thread than the one that aborted; the thread in the backtrace just sleeps on a condition variable. I'd suggest you run your program under gdb (since I believe multi-thread core support in 7.1 is still imperfect; it should be better in rawhide) and use gdb thread commands to find exactly which thread aborted and see why. Also note that without a reproducible testcase, there is nothing we can do about this.
I upgraded my gcc to gcc-2.96-94 and gdb to GNU gdb Red Hat Linux 7.x (5.0rh-12). I get the following when running the same program under the debugger:

yoda> gdb Reactor_Exceptions_Test
kitty> r
[New Thread 1024 (LWP 17857)]
[New Thread 2049 (LWP 17860)]
[New Thread 1027 (LWP 17861)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 1027 (LWP 17861)]
0x20000000005e0302 in kill () at soinit.c:56
56 soinit.c: No such file or directory.
in soinit.c
Current language: auto; currently c
kitty> info threads
* 3 Thread 1027 (LWP 17861) 0x20000000005e0302 in kill () at soinit.c:56
  2 Thread 2049 (LWP 17860) 0x20000000007300a2 in __syscall_poll () at soinit.c:56
  1 Thread 1024 (LWP 17857) 0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
kitty> thread 3
[Switching to thread 3 (Thread 1027 (LWP 17861))]#0 0x20000000005e0302 in kill () at soinit.c:56
56 in soinit.c
kitty> where
#0 0x20000000005e0302 in kill () at soinit.c:56
#1 0x2000000000407d40 in pthread_kill (thread=1027, signo=6) at signals.c:65
#2 0x2000000000408460 in raise (sig=6) at signals.c:232
#3 0x20000000005e2a30 in abort () at ../sysdeps/generic/abort.c:88
#4 0x2000000000495a30 in __terminate () from /usr/lib/libstdc++-libc6.2-2.so.3
#5 0x2000000000407d40 in pthread_kill (thread=9223389629030323312,
    signo=-10497032) at signals.c:65
#6 0x2000000000496d00 in ia64_throw_helper ()
    from /usr/lib/libstdc++-libc6.2-2.so.3
#7 0x80000fffff5fd530 in ?? ()
#8 0x2000000000407d40 in pthread_kill (thread=13835058055282164491,
    signo=57984) at signals.c:65
#9 0x200000000029c910 in ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token> >::handle_events (this=0xc00000000000028a, max_wait_time=0x600000000000e280)
    at /u/kitty/ACE_wrappers/ace/Select_Reactor_T.cpp:1272
#10 0x80000fffff5ff940 in ?? ()
#11 0x2000000000407d40 in pthread_kill (thread=Cannot access memory at address 0x80000fffff3ffd80
) at signals.c:65
Cannot access memory at address 0x80000fffff3ffda8
kitty> thread 2
[Switching to thread 2 (Thread 2049 (LWP 17860))]#0 0x20000000007300a2 in __syscall_poll () at soinit.c:56
56 in soinit.c
kitty> where
#0 0x20000000007300a2 in __syscall_poll () at soinit.c:56
#1 0x2000000000721b00 in __poll (fds=0x6000000000018e10, nfds=1, timeout=2000)
    at ../sysdeps/unix/sysv/linux/poll.c:63
#2 0x2000000000402580 in __pthread_manager (arg=0x9) at manager.c:139
#3 0x2000000000403c90 in __pthread_manager_sighandler (sig=9) at manager.c:221
#4 0x2000000000721b00 in __poll (fds=0x9, nfds=2305843009218073728,
    timeout=7535904) at ../sysdeps/unix/sysv/linux/poll.c:63
#5 0x200000000002e840 in ?? ()
#6 0x2000000000721b00 in __poll (fds=0x6000000000010f60, nfds=32736,
    timeout=3840) at ../sysdeps/unix/sysv/linux/poll.c:63
#7 0x00007ff1 in ?? ()
#8 0x2000000000721b00 in __poll (fds=0x1, nfds=0, timeout=1)
    at ../sysdeps/unix/sysv/linux/poll.c:63
#9 0x00000000 in ?? ()
kitty> thread 1
[Switching to thread 1 (Thread 1024 (LWP 17857))]#0 0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
56 in soinit.c
kitty> where
#0 0x200000000072ff82 in rt_sigsuspend () at soinit.c:56
#1 0x20000000005e03a0 in __sigsuspend (set=0x80000fffffffa570)
    at ../sysdeps/unix/sysv/linux/ia64/sigsuspend.c:38
#2 0x20000000004077f0 in __pthread_wait_for_restart_signal (
    self=0x200000000042d480) at pthread.c:957
#3 0x2000000000400bf0 in pthread_cond_wait (cond=0x0,
    mutex=0x6000000000010c50) at restart.h:34
#4 0x20000000001a7fa0 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, mutex=@0x6000000000010c50, abstime=0x0)
    at /u/kitty/ACE_wrappers/ace/OS.i:2743
#5 0x20000000001a81a0 in ACE_Condition_Thread_Mutex::wait (
    this=0x2000000000240120, abstime=0x0) at Synch.cpp:644
#6 0x20000000001b6d90 in ACE_Thread_Manager::wait (this=0x6000000000010c00,
    timeout=0x0, abandon_detached_threads=0) at Thread_Manager.cpp:1699
#7 0x4000000000005e90 in main (argc=-22876, argv=0x80000fffffffa6a8)
    at Reactor_Exceptions_Test.cpp:210
kitty>

Does that help?
It should help you, not me. From the backtrace it looks like your C++ program throws an exception which is not caught by anything (and thus __terminate is called). It certainly does not look like a libc or libpthread bug.
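For illustration, the path described above (an uncaught exception reaching the terminate handler, which calls abort and raises SIGABRT) can be reproduced in isolation. This is a minimal sketch in present-day C++ for a POSIX system, not code from the failing test: raise_error, throw_uncaught and signal_from_uncaught_exception are hypothetical names, and noexcept is used only to guarantee that no caller can catch the exception. With gcc 2.96 the handler shows up in the backtrace as __terminate; in current C++ it is std::terminate.

```cpp
#include <csignal>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: throws an exception that nothing will catch.
static void raise_error()
{
    throw std::runtime_error("nobody catches this");
}

// noexcept guarantees that the exception cannot reach a handler in any
// caller, so the runtime calls std::terminate, which by default aborts.
static void throw_uncaught() noexcept
{
    raise_error();
}

// Runs throw_uncaught() in a child process. Returns the signal that killed
// the child, 0 if the child was not killed by a signal, or -1 on error.
int signal_from_uncaught_exception()
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        throw_uncaught();   // default terminate handler calls abort()
        _exit(0);           // not reached
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a conforming implementation the returned signal is SIGABRT, which matches frames #0 to #4 of the aborting thread: kill, pthread_kill, raise, abort, __terminate. The sketch says nothing about why the exception went uncaught in the first place.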
No, my code is right. FWIW, the same piece of code runs fine under a multitude of compiler and OS combinations, including 64-bit OSes like Tru64, HP-UX, and Solaris 8. It also runs fine under gcc on Linux ix86. I am catching the exception. Here is a small trace of activity under Linux ix86:

samba> gdb Reactor_Exceptions_Test
kitty> b 93
Breakpoint 1 at 0x804c811: file Reactor_Exceptions_Test.cpp, line 93.
kitty> r
[New Thread 1024 (LWP 2234)]
[New Thread 2049 (LWP 2237)]
Delayed SIGSTOP caught for LWP 2237.
[New Thread 1026 (LWP 2238)]
Delayed SIGSTOP caught for LWP 2238.
Activity occurred on handle8
got buf = Hello
throw exception
Catch exception
[Switching to Thread 1026 (LWP 2238)]

Breakpoint 1, My_Reactor::handle_events (this=0xbfffe730, max_wait_time=0x0)
    at Reactor_Exceptions_Test.cpp:93
93 ret = -1;
Current language: auto; currently c++
kitty> list
88 ret = ACE_Reactor::handle_events (max_wait_time);
89 }
90 catch (...)
91 {
92 cout << "Catch exception" << endl;
93 ret = -1;
94 }
95 return ret;
96 }
97
kitty> c
exception return
LWP 2238 exited.
LWP 2237 exited.

Program exited normally.
kitty>

Compare this with the following on IA-64:

yoda> gdb Reactor_Exceptions_Test
kitty> b 93
Breakpoint 1 at 0x4000000000007912: file Reactor_Exceptions_Test.cpp, line 93.
kitty> r
[New Thread 1024 (LWP 10224)]
[New Thread 2049 (LWP 10227)]
[New Thread 1027 (LWP 10228)]
Activity occurred on handle8
got buf = Hello
throw exception

Program received signal SIGABRT, Aborted.
[Switching to Thread 1027 (LWP 10228)]
0x20000000005e0302 in kill () at soinit.c:56
56 soinit.c: No such file or directory.
in soinit.c
Current language: auto; currently c
kitty>

So my catch block is never executed. This might not be a bug in glibc per se, but it is definitely a bug in the compiler-generated code.
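The pattern in the listing above (an exception thrown in a non-main thread and caught by catch (...) in the same thread) can be reduced to a standalone check that does not depend on ACE. This is only a sketch of what a small testcase might look like, not the actual test: handle_input, worker and exception_caught_in_thread are hypothetical names, and plain pthreads stand in for the ACE thread manager and reactor.

```cpp
#include <pthread.h>
#include <stdexcept>

// Hypothetical stand-in for the event handler that throws in the real test.
static void handle_input()
{
    throw std::runtime_error("exception from handler");
}

// Thread entry point: mirrors the try/catch around handle_events at
// lines 88-94 of Reactor_Exceptions_Test.cpp.
static void *worker(void *arg)
{
    bool *caught = static_cast<bool *>(arg);
    try {
        handle_input();
    } catch (...) {
        *caught = true;   // the catch block that was never reached on IA-64
    }
    return 0;
}

// Returns true if the exception thrown in the worker thread was caught there.
bool exception_caught_in_thread()
{
    bool caught = false;
    pthread_t tid;
    if (pthread_create(&tid, 0, worker, &caught) != 0)
        return false;
    pthread_join(tid, 0);
    return caught;
}
```

Calling exception_caught_in_thread() from main and printing the result gives a one-line pass/fail. Whether this reduced form would have triggered the same abort with gcc 2.96 on IA-64 is not known; it covers only the throw-and-catch-in-a-thread part of the original test.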
Sorry, but without a testcase there is nothing I can do about it.
No problem. The test is rather complicated to reproduce, so I didn't bother writing a simple test case. Anyway, gcc-3.0 with binutils 2.11.90.0.23 fixes the crash, so I guess it is a problem with RedHat's compiler and I should stick with the official compiler for my compilations. Thanks anyway.