Bug 145331 - kernel panic in get_signal_to_deliver
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Roland McGrath
QA Contact: Brian Brock
Reported: 2005-01-17 10:17 EST by David Juran
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2005-01-19 18:33:06 EST

Attachments
RHEL-3 backport of 2.6 patch (1.23 KB, patch)
2005-01-19 02:35 EST, Roland McGrath

Description David Juran 2005-01-17 10:17:37 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Gecko/20050104 Red Hat/1.4.3-3.0.7

Description of problem:
Got this kernel panic on one of our Opteron computers:

Kernel BUG at signal:1659
invalid operand: 0000
CPU 0
Pid: 31484, comm: ccrc_g Not tainted
RIP: 0010:[<ffffffff80132d04>]{get_signal_to_deliver+1124}
RSP: 0000:0000010074007e88  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000001
RDX: 0000000000000086 RSI: ffffffff8060eb80 RDI: 00000100bd98ce38
RBP: 0000010074007eb8 R08: 0000000000000000 R09: 00000100bd98ccc0
R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000006
R13: 0000010074006a48 R14: 0000010074007f58 R15: 0000000000000000
FS:  0000002a95d6b0a0(0000) GS:ffffffff805e1440(005b)
knlGS:0000000040355080
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000004018b060 CR3: 0000000000101000 CR4: 00000000000006e0
                                                                     
                               
Call Trace: [<ffffffff80132ce1>]{get_signal_to_deliver+1089}
       [<ffffffff80110061>]{do_signal+97}
[<ffffffff80132e45>]{sys_rt_sigprocmask+213}
       [<ffffffff801aa49c>]{sys32_rt_sigprocmask+156}
[<ffffffff801104ef>]{intret_signal+45}

Process ccrc_g (pid: 31484, stackpage=10074007000)
Stack: 0000010074007e88 0000000000000000 ffffffff80132ce1 0000010074007f58
       0000010074006a48 0000010074007eb8 0000000000000000 0000000000000000
       ffffffff80110061 0000000000000006 fffffffffffffffa 0000018500007afc
       00000100bcee1d78 0000000000000001 ffffffff80132e45 0000000000000000
       0000000000000000 00000000ffff9930 0000000000000000 0000000000000008
       ffffffff801aa49c 0000000000000020 0000000000000020 0000000000000001
       00000000ffff991c 0000000000007afc 00000000ffff991c 0000000000000000
       ffffffff801104ef 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 00000000ffff991c 0000000000007afc 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace: [<ffffffff80132ce1>]{get_signal_to_deliver+1089}
       [<ffffffff80110061>]{do_signal+97}
[<ffffffff80132e45>]{sys_rt_sigprocmask+213}
       [<ffffffff801aa49c>]{sys32_rt_sigprocmask+156}
[<ffffffff801104ef>]{intret_signal+45}

                                                                     
                               
Code: 0f 0b bd 7b 2d 80 ff ff ff ff 7b 06 65 48 8b 04 25 18 00 00
                                                                     
                               
Kernel panic: Fatal exception

Does anyone have any clue?

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-27.0.1.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. ????


Additional info:

Comment 1 Jim Paradis 2005-01-19 00:57:56 EST
FYI, the kernel is BUG()'ing right after do_coredump:

kernel/signal.c:

                if (sig_kernel_coredump(signr) &&
                    do_coredump((long)signr, signr, regs)) {
                        ...
                        const int code = signr | 0x80;
                        BUG_ON(!current->signal->group_exit);   /* <<<<<<< DIES HERE */
                        BUG_ON(current->signal->group_exit_code != code);
                        do_exit(code);
                        /* NOTREACHED */
                }

Since do_coredump() sets current->signal->group_exit, something else must be up.

Comment 2 Roland McGrath 2005-01-19 02:35:30 EST
Created attachment 109958
RHEL-3 backport of 2.6 patch

I would expect to hit the second BUG_ON there, not the first.
There is a known race condition that can lead to this, and it was fixed in 2.6.

Comment 4 Ernie Petrides 2005-01-29 01:07:36 EST
A fix for this problem has just been committed to the RHEL3 U5
patch pool this evening (in kernel version 2.4.21-27.10.EL).

Comment 5 Tim Powers 2005-05-18 09:29:08 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html
