From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20050104 Red Hat/1.4.3-3.0.7

Description of problem:
Got this kernel panic on one of our Opteron computers:

Kernel BUG at signal:1659
invalid operand: 0000
CPU 0
Pid: 31484, comm: ccrc_g Not tainted
RIP: 0010:[<ffffffff80132d04>]{get_signal_to_deliver+1124}
RSP: 0000:0000010074007e88  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000001
RDX: 0000000000000086 RSI: ffffffff8060eb80 RDI: 00000100bd98ce38
RBP: 0000010074007eb8 R08: 0000000000000000 R09: 00000100bd98ccc0
R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000006
R13: 0000010074006a48 R14: 0000010074007f58 R15: 0000000000000000
FS:  0000002a95d6b0a0(0000) GS:ffffffff805e1440(005b) knlGS:0000000040355080
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000004018b060 CR3: 0000000000101000 CR4: 00000000000006e0

Call Trace: [<ffffffff80132ce1>]{get_signal_to_deliver+1089}
       [<ffffffff80110061>]{do_signal+97}
       [<ffffffff80132e45>]{sys_rt_sigprocmask+213}
       [<ffffffff801aa49c>]{sys32_rt_sigprocmask+156}
       [<ffffffff801104ef>]{intret_signal+45}

Process ccrc_g (pid: 31484, stackpage=10074007000)
Stack: 0000010074007e88 0000000000000000 ffffffff80132ce1 0000010074007f58
       0000010074006a48 0000010074007eb8 0000000000000000 0000000000000000
       ffffffff80110061 0000000000000006 fffffffffffffffa 0000018500007afc
       00000100bcee1d78 0000000000000001 ffffffff80132e45 0000000000000000
       0000000000000000 00000000ffff9930 0000000000000000 0000000000000008
       ffffffff801aa49c 0000000000000020 0000000000000020 0000000000000001
       00000000ffff991c 0000000000007afc 00000000ffff991c 0000000000000000
       ffffffff801104ef 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 00000000ffff991c 0000000000007afc 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000

Call Trace: [<ffffffff80132ce1>]{get_signal_to_deliver+1089}
       [<ffffffff80110061>]{do_signal+97}
       [<ffffffff80132e45>]{sys_rt_sigprocmask+213}
       [<ffffffff801aa49c>]{sys32_rt_sigprocmask+156}
       [<ffffffff801104ef>]{intret_signal+45}

Code: 0f 0b bd 7b 2d 80 ff ff ff ff 7b 06 65 48 8b 04 25 18 00 00

Kernel panic: Fatal exception

does anyone have any clue?

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-27.0.1.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. ????

Additional info:
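For anyone reading the oops by hand: the Code: bytes can be cross-checked against the "Kernel BUG at signal:1659" line. The sketch below assumes the trap is laid out as a ud2 opcode (0f 0b), then an 8-byte little-endian pointer to the file-name string, then a 2-byte little-endian line number. That layout is an assumption about this kernel's BUG() macro, not something stated in this report, and the names bug_frame and decode_bug_frame are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed trap layout (not confirmed by the report):
 *   bytes 0-1   : 0f 0b  (ud2)
 *   bytes 2-9   : little-endian pointer to the file-name string
 *   bytes 10-11 : little-endian source line number
 */
struct bug_frame {
    uint64_t file_ptr;
    uint16_t line;
};

/* First 12 bytes of the Code: line from the oops above. */
static const uint8_t oops_code[] = {
    0x0f, 0x0b, 0xbd, 0x7b, 0x2d, 0x80,
    0xff, 0xff, 0xff, 0xff, 0x7b, 0x06
};

/* Returns 0 on success, -1 if the buffer is too short or
 * does not start with ud2. */
int decode_bug_frame(const uint8_t *code, size_t len, struct bug_frame *out)
{
    uint64_t ptr = 0;
    int i;

    if (len < 12 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;

    /* Assemble the pointer from most significant byte down. */
    for (i = 7; i >= 0; i--)
        ptr = (ptr << 8) | code[2 + i];

    out->file_ptr = ptr;
    out->line = (uint16_t)(code[10] | (code[11] << 8));
    return 0;
}
```

Calling decode_bug_frame(oops_code, sizeof oops_code, &f) under that assumption gives line 1659 (0x067b), which agrees with the BUG line in the oops, and a file pointer of 0xffffffff802d7bbd in kernel address space.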
FYI, the kernel is BUG()'ing right after do_coredump:

kernel/signal.c:

                if (sig_kernel_coredump(signr) &&
                    do_coredump((long)signr, signr, regs)) {
                        ...
                        const int code = signr | 0x80;
DIES HERE>>>>>>>        BUG_ON(!current->signal->group_exit);
                        BUG_ON(current->signal->group_exit_code != code);
                        do_exit(code);
                        /* NOTREACHED */
                }

Since do_coredump() sets current->signal->group_exit, something else must be up.
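To make the two checks easier to reason about, here is a minimal userspace model of them. It is a sketch only: struct signal_model, check_after_coredump and the enum names are hypothetical stand-ins, and nothing here reproduces the kernel's locking or the race itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two fields of current->signal
 * that the checks in kernel/signal.c read. */
struct signal_model {
    bool group_exit;
    int group_exit_code;
};

enum check_result {
    CHECK_PASSED,   /* both conditions hold, do_exit(code) would run */
    FIRST_BUG_ON,   /* group_exit is not set */
    SECOND_BUG_ON   /* group_exit is set but the code differs */
};

/* Same expression as in the quoted snippet: signr | 0x80. */
int coredump_exit_code(int signr)
{
    return signr | 0x80;
}

/* Evaluates the two conditions in the same order as the quoted code. */
enum check_result check_after_coredump(const struct signal_model *sig,
                                       int signr)
{
    const int code = coredump_exit_code(signr);

    if (!sig->group_exit)
        return FIRST_BUG_ON;
    if (sig->group_exit_code != code)
        return SECOND_BUG_ON;
    return CHECK_PASSED;
}
```

For signal 6 the expected code is 6 | 0x80 = 0x86. The register dump shows 6 in RBX and R12 and 0x86 in RDX, which would be consistent with signr being 6, although nothing in this report confirms which register holds which variable. In this model the first check can only fire if group_exit is clear when the checks run.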
Created attachment 109958
RHEL-3 backport of 2.6 patch

I would expect to hit the second BUG_ON there, not the first. There is a known race condition that can lead to this, and this was fixed in 2.6.
A fix for this problem has just been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.10.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html