Bug 103254
Summary: | Kernel crashes on Itanium after few minutes with message indicating compilation errors
---|---
Product: | Red Hat Enterprise Linux 3
Reporter: | Albert Fluegel <tdsc.af>
Component: | kernel
Assignee: | Jason Baron <jbaron>
Status: | CLOSED ERRATA
QA Contact: | Brian Brock <bbrock>
Severity: | medium
Priority: | medium
Version: | 3.0
CC: | knoel
Hardware: | All
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2004-05-13 22:29:31 UTC
Bug Blocks: | 101028
Description
Albert Fluegel
2003-08-28 07:15:19 UTC
Same kernel version works perfectly on Opteron

Did you have an app that segfaulted that caused the core dumping code to execute?

Created attachment 94039 [details]
the binary that core dumps followed by a kernel crash

This is the melim binary from Platform Computing Inc., which comes with the LSF software, version 5.1; see: http://www.platform.com/products/LSF/

Additional findings: the kernel crash occurs exactly when this program gets a SIGTERM. I attached an strace to the process and the last thing I see is:

    strace -f -p 3135
    Process 3135 attached - interrupt to quit
    select(0x1, 0xbffffa58, 0, 0xbffff9d8, 0xbffff9cc) = -514
    --- SIGTERM (Tersizeof(elf_gregset_t) (1024) != sizeof(struct pt_regs) (400)
    minated) @ 40016kernel BUG at /usr/src/build/297471-ia64/BUILD/kernel-2.4.21/linux-2.4.21/include/linux/elfcore.h:94!
    5ce (5009) ---
    Unable to handle kernel NULL pointer dereferencemelim[3135]: Oops 8804682956800

    Pid: 3135, comm: melim

and the rest is as already reported.

Here's what happens on 2.4.21-1.1931.2.393; the main difference is that the machine does not stop working. Output on the console when the melim program gets SIGTERM:

    IA32 syscall #252 issued, maybe we should implement it
    Aug 29 16:49:47 ltuii002 kernel: IA32 syscall #252 issued, maybe we should implement it
    sizeof(elf_gregset_t) (1024) != sizeof(struct pt_regs) (400)
    kernel BUG at /usr/src/build/293850-ia64/BUILD/kernel-2.4.21/linux-2.4.21/include/linux/elfcore.h:94!
    Unable to handle kernel NULL pointer dereferencemelim[3468]: Oops 8804682956800

    Pid: 3468, comm: melim
    EIP is at elf_core_dump [kernel] 0x640 (2.4.21-1.1931.2.393.ent)
    psr : 0000101008026018 ifs : 8000000000000e24 ip : [<e00000000446f260>] Not tainted
    unat: 0000000000000000 pfs : 0000000000000e24 rsc : 0000000000000003
    rnat: 00000000000000bf bsps: 0000000000000fff pr : 8002924155aa9967
    ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
    b0 : e00000000446f250 b6 : e0000000047f80e0 b7 : e0000000047f4fa0
    f6 : 0fffbccccccccc8c00000 f7 : 0ffdcb640000000000000
    f8 : 100029000000000000000 f9 : 10002a000000000000000
    r1 : e000000004c9bd00 r2 : e0000000018a7e60 r3 : 000000000000416a
    r8 : 0000000000000066 r9 : 0000000000000000 r10 : 0000000000000000
    r11 : e0000000018a0000 r12 : e00000001b05ef50 r13 : e00000001b058000
    r14 : 0000000000000001 r15 : 0000000000000000 r16 : e0000000018a7e48
    r17 : 0000000000004000 r18 : 0000000000004000 r19 : e000000004b68580
    r20 : e000000004abb8e8 r21 : e0000000047f4d60 r22 : 0000000000020000
    r23 : e000000004b66d70 r24 : 0000000000000060 r25 : 0000000000000000
    r26 : 0000000000000000 r27 : 00000000100000c0 r28 : 0000000000800000
    r29 : 0000000000000001 r30 : e000000000025a00 r31 : e000000004b66d70

    Call Trace: [<e0000000044155c0>] sp=0xe00000001b05eb60 bsp=0xe00000001b059460 show_stack [kernel] 0x80
    [<e000000004430150>] sp=0xe00000001b05ed20 bsp=0xe00000001b059438 die [kernel] 0x1b0
    [<e000000004451a70>] sp=0xe00000001b05ed20 bsp=0xe00000001b0593d8 ia64_do_page_fault [kernel] 0x310
    [<e00000000440e680>] sp=0xe00000001b05edb0 bsp=0xe00000001b0593d8 ia64_leave_kernel [kernel] 0x0
    [<e00000000446f260>] sp=0xe00000001b05ef50 bsp=0xe00000001b0592b8 elf_core_dump [kernel] 0x640
    [<e00000000452cae0>] sp=0xe00000001b05fd80 bsp=0xe00000001b059260 do_coredump [kernel] 0x500
    [<e0000000044a7810>] sp=0xe00000001b05fdd0 bsp=0xe00000001b0591e8 get_signal_to_deliver [kernel] 0x630
    [<e00000000442e7f0>] sp=0xe00000001b05fdd0 bsp=0xe00000001b059180 ia64_do_signal [kernel] 0xd0
    [<e00000000440eac0>] sp=0xe00000001b05fe50 bsp=0xe00000001b059130 handle_signal_delivery [kernel] 0x40
    [<e00000000440e6f0>] sp=0xe00000001b05fe60 bsp=0xe00000001b059130 ia64_leave_kernel [kernel] 0x70
    Aug 29 16:49:57 ltuii002 kernel: sizeof(elf_gregset_t) (1024) != sizeof(struct pt_regs) (400)
    Aug 29 16:49:57 ltuii002 kernel: kernel BUG at /usr/src/build/293850-ia64/BUILD/kernel-2.4.21/linux-2.4.21/include/linux/elfcore.h:94!
    Aug 29 16:49:57 ltuii002 kernel: Unable to handle kernel NULL pointer dereferencemelim[3468]: Oops 8804682956800
    Aug 29 16:49:57 ltuii002 kernel:
    Aug 29 16:49:57 ltuii002 kernel: Pid: 3468, comm: melim
    Aug 29 16:49:57 ltuii002 kernel: EIP is at elf_core_dump [kernel] 0x640 (2.4.21-1.1931.2.393.ent)
    Aug 29 16:49:57 ltuii002 kernel: psr : 0000101008026018 ifs : 8000000000000e24 ip : [<e00000000446f260>] Not tainted
    Aug 29 16:49:57 ltuii002 kernel: unat: 0000000000000000 pfs : 0000000000000e24 rsc : 0000000000000003
    Aug 29 16:49:57 ltuii002 kernel: rnat: 00000000000000bf bsps: 0000000000000fff pr : 8002924155aa9967
    Aug 29 16:49:57 ltuii002 kernel: ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
    Aug 29 16:49:57 ltuii002 kernel: b0 : e00000000446f250 b6 : e0000000047f80e0 b7 : e0000000047f4fa0
    Aug 29 16:49:57 ltuii002 kernel: f6 : 0fffbccccccccc8c00000 f7 : 0ffdcb640000000000000
    Aug 29 16:49:57 ltuii002 kernel: f8 : 100029000000000000000 f9 : 10002a000000000000000
    Aug 29 16:49:57 ltuii002 kernel: r1 : e000000004c9bd00 r2 : e0000000018a7e60 r3 : 000000000000416a
    Aug 29 16:49:57 ltuii002 kernel: r8 : 0000000000000066 r9 : 0000000000000000 r10 : 0000000000000000
    Aug 29 16:49:57 ltuii002 kernel: r11 : e0000000018a0000 r12 : e00000001b05ef50 r13 : e00000001b058000
    Aug 29 16:49:57 ltuii002 kernel: r14 : 0000000000000001 r15 : 0000000000000000 r16 : e0000000018a7e48
    Aug 29 16:49:58 ltuii002 kernel: r17 : 0000000000004000 r18 : 0000000000004000 r19 : e000000004b68580
    Aug 29 16:49:58 ltuii002 kernel: r20 : e000000004abb8e8 r21 : e0000000047f4d60 r22 : 0000000000020000
    Aug 29 16:49:58 ltuii002 kernel: r23 : e000000004b66d70 r24 : 0000000000000060 r25 : 0000000000000000
    Aug 29 16:49:58 ltuii002 kernel: r26 : 0000000000000000 r27 : 00000000100000c0 r28 : 0000000000800000
    Aug 29 16:49:58 ltuii002 kernel: r29 : 0000000000000001 r30 : e000000000025a00 r31 : e000000004b66d70
    Aug 29 16:49:58 ltuii002 kernel:
    Aug 29 16:49:58 ltuii002 kernel: Call Trace: [<e0000000044155c0>] sp=0xe00000001b05eb60 bsp=0xe00000001b059460 show_stack [kernel] 0x80
    Aug 29 16:49:58 ltuii002 kernel: [<e000000004430150>] sp=0xe00000001b05ed20 bsp=0xe00000001b059438 die [kernel] 0x1b0
    Aug 29 16:49:58 ltuii002 kernel: [<e000000004451a70>] sp=0xe00000001b05ed20 bsp=0xe00000001b0593d8 ia64_do_page_fault [kernel] 0x310
    Aug 29 16:49:58 ltuii002 kernel: [<e00000000440e680>] sp=0xe00000001b05edb0 bsp=0xe00000001b0593d8 ia64_leave_kernel [kernel] 0x0
    Aug 29 16:49:58 ltuii002 kernel: [<e00000000446f260>] sp=0xe00000001b05ef50 bsp=0xe00000001b0592b8 elf_core_dump [kernel] 0x640
    Aug 29 16:49:58 ltuii002 kernel: [<e00000000452cae0>] sp=0xe00000001b05fd80 bsp=0xe00000001b059260 do_coredump [kernel] 0x500
    Aug 29 16:49:58 ltuii002 kernel: [<e0000000044a7810>] sp=0xe00000001b05fdd0 bsp=0xe00000001b0591e8 get_signal_to_deliver [kernel] 0x630
    Aug 29 16:49:58 ltuii002 kernel: [<e00000000442e7f0>] sp=0xe00000001b05fdd0 bsp=0xe00000001b059180 ia64_do_signal [kernel] 0xd0
    Aug 29 16:49:58 ltuii002 kernel: [<e00000000440eac0>] sp=0xe00000001b05fe50 bsp=0xe00000001b059130 handle_signal_delivery [kernel] 0x40
    Aug 29 16:49:58 ltuii002 kernel: [<e00000000440e6f0>] sp=0xe00000001b05fe60 bsp=0xe00000001b059130 ia64_leave_kernel [kernel] 0x70
    <4>IA32 syscall #252 issued, maybe we should implement it
    Aug 29 16:50:10 ltuii002 kernel: <4>IA32 syscall #252 issued, maybe we should implement it

Could it have something to do with the 32-bit nanosleep implementation? I've seen that call one time in gdb just before the machine went down with the .411 kernel:

    Program received signal SIGTERM, Terminated.
    0x400165ce in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
    (gdb) s
    Single stepping until exit from function _dl_sysinfo_int80, which has no line number information.
    (now kill <pid>)
    Program received signal SIGSEGV, Segmentation fault.
    0x400eda8e in nanosleep () from /lib/tls/libc.so.6
    (gdb) s
    Single stepping until exit from function nanosleep, which has no line number information.
    Program terminated with signal SIGSEGV, Segmentation fault.
    The program no longer exists.
    (gdb) s

BTW it is not possible to strace the process termination on the .393 kernel. The only thing I get is:

    select(0x1, 0xbffff828, 0, 0xbffff7a8, 0xbffff79c) = -514
    --- SIGTERM (Terminated) @ 400165ce (bfa) ---
    Process 3450 detached

Here's how to reproduce the problem (kernel messages as reported, but without the kernel crash; in my opinion this should be sufficient to locate the issue):

Write a trivial program that immediately dumps core, e.g.:

    int main(void) { *((char *) 2) = 5; return 0; }

Compile it on an x86 machine (e.g. Xeon) as an i386 executable, then start it on an Itanium. It is important that the coredumpsize resource is not set to 0, so first set it to unlimited (e.g. for csh: limit coredumpsize unlimited or for sh: ulimit -c unlimited). Immediately the following messages appear in the syslog:

    Sep 1 12:58:55 ltuii002 kernel: sizeof(elf_gregset_t) (1024) != sizeof(struct pt_regs) (400)
    Sep 1 12:58:55 ltuii002 kernel: kernel BUG at /usr/src/build/293850-ia64/BUILD/kernel-2.4.21/linux-2.4.21/include/linux/elfcore.h:94!
    Sep 1 12:58:55 ltuii002 kernel: Unable to handle kernel NULL pointer dereferences[22128]: Oops 8804682956800
    Sep 1 12:58:55 ltuii002 kernel:
    Sep 1 12:58:55 ltuii002 kernel: Pid: 22128, comm: s
    Sep 1 12:58:55 ltuii002 kernel: EIP is at elf_core_dump [kernel] 0x640 (2.4.21-1.1931.2.393.ent)
    Sep 1 12:58:55 ltuii002 kernel: psr : 0000101008026038 ifs : 8000000000000e24 ip : [<e00000000446f260>] Not tainted
    Sep 1 12:58:55 ltuii002 kernel: unat: 0000000000000000 pfs : 0000000000000e24 rsc : 0000000000000003
    Sep 1 12:58:55 ltuii002 kernel: rnat: 00000000000000bf bsps: 0000000000000fff pr : 8002924155aa9967
    Sep 1 12:58:55 ltuii002 kernel: ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
    Sep 1 12:58:55 ltuii002 kernel: b0 : e00000000446f250 b6 : e0000000044bc760 b7 : e0000000047f4fa0
    Sep 1 12:58:55 ltuii002 kernel: f6 : 0fffbccccccccc8c00000 f7 : 0ffdcb640000000000000
    Sep 1 12:58:55 ltuii002 kernel: f8 : 100029000000000000000 f9 : 10002a000000000000000
    Sep 1 12:58:55 ltuii002 kernel: r1 : e000000004c9bd00 r2 : e00000003ed57e60 r3 : 000000000001f584
    Sep 1 12:58:55 ltuii002 kernel: r8 : 0000000000000066 r9 : 0000000000000000 r10 : 0000000000000000
    Sep 1 12:58:55 ltuii002 kernel: r11 : e00000003ed50000 r12 : e00000000d5fef50 r13 : e00000000d5f8000
    Sep 1 12:58:55 ltuii002 kernel: r14 : 0000000000000001 r15 : 0000000000000000 r16 : e00000003ed57e48
    Sep 1 12:58:55 ltuii002 kernel: r17 : 0000000000004000 r18 : 0000000000004000 r19 : e000000004b68580
    Sep 1 12:58:55 ltuii002 kernel: r20 : e000000004abb8e8 r21 : e0000000047f4d60 r22 : 0000000000020000
    Sep 1 12:58:55 ltuii002 kernel: r23 : e000000004b66d70 r24 : 0000000000000060 r25 : 0000000000000000
    Sep 1 12:58:55 ltuii002 kernel: r26 : 0000000000000000 r27 : 00000000100000c0 r28 : 0000000000800000
    Sep 1 12:58:55 ltuii002 kernel: r29 : 0000000000000001 r30 : e000000000025a00 r31 : e000000004b66d70
    Sep 1 12:58:55 ltuii002 kernel:
    Sep 1 12:58:55 ltuii002 kernel: Call Trace: [<e0000000044155c0>] sp=0xe00000000d5feb60 bsp=0xe00000000d5f9460 show_stack [kernel] 0x80
    Sep 1 12:58:55 ltuii002 kernel: [<e000000004430150>] sp=0xe00000000d5fed20 bsp=0xe00000000d5f9438 die [kernel] 0x1b0
    Sep 1 12:58:55 ltuii002 kernel: [<e000000004451a70>] sp=0xe00000000d5fed20 bsp=0xe00000000d5f93d8 ia64_do_page_fault [kernel] 0x310
    Sep 1 12:58:55 ltuii002 kernel: [<e00000000440e680>] sp=0xe00000000d5fedb0 bsp=0xe00000000d5f93d8 ia64_leave_kernel [kernel] 0x0
    Sep 1 12:58:55 ltuii002 kernel: [<e00000000446f260>] sp=0xe00000000d5fef50 bsp=0xe00000000d5f92b8 elf_core_dump [kernel] 0x640
    Sep 1 12:58:55 ltuii002 kernel: [<e00000000452cae0>] sp=0xe00000000d5ffd80 bsp=0xe00000000d5f9260 do_coredump [kernel] 0x500
    Sep 1 12:58:55 ltuii002 kernel: [<e0000000044a7810>] sp=0xe00000000d5ffdd0 bsp=0xe00000000d5f91e8 get_signal_to_deliver [kernel] 0x630
    Sep 1 12:58:55 ltuii002 kernel: [<e00000000442e7f0>] sp=0xe00000000d5ffdd0 bsp=0xe00000000d5f9180 ia64_do_signal [kernel] 0xd0
    Sep 1 12:58:55 ltuii002 kernel: [<e00000000440eac0>] sp=0xe00000000d5ffe50 bsp=0xe00000000d5f9130 handle_signal_delivery [kernel] 0x40
    Sep 1 12:58:55 ltuii002 kernel: [<e00000000440e6f0>] sp=0xe00000000d5ffe60 bsp=0xe00000000d5f9130 ia64_leave_kernel [kernel] 0x70

Maybe I'm wrong, but as far as I can see from the code, ia32 core dumps are not really supported under Itanium Linux. So it would probably be better to hardcode coredumpsize = 0 for now?

Created attachment 97280 [details]
/var/log/messages snippet when doing I/O testing on external disks

System seems to work OK but I am perplexed with these messages clogging up the system logfile.
This has long since been fixed. Please update the kernel. Closing.