Bug 145331

Summary: kernel panic in get_signal_to_deliver
Product: Red Hat Enterprise Linux 3
Reporter: David Juran <djuran>
Component: kernel
Assignee: Roland McGrath <roland>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: jparadis, mingo, peterm, petrides, riel
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-01-19 23:33:06 UTC
Attachments:
  RHEL-3 backport of 2.6 patch (flags: none)

Description David Juran 2005-01-17 15:17:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20050104 Red Hat/1.4.3-3.0.7

Description of problem:
Got this kernel panic on one of our Opteron computers:

Kernel BUG at signal:1659
invalid operand: 0000
CPU 0
Pid: 31484, comm: ccrc_g Not tainted
RIP: 0010:[<ffffffff80132d04>]{get_signal_to_deliver+1124}
RSP: 0000:0000010074007e88  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000001
RDX: 0000000000000086 RSI: ffffffff8060eb80 RDI: 00000100bd98ce38
RBP: 0000010074007eb8 R08: 0000000000000000 R09: 00000100bd98ccc0
R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000006
R13: 0000010074006a48 R14: 0000010074007f58 R15: 0000000000000000
FS:  0000002a95d6b0a0(0000) GS:ffffffff805e1440(005b) knlGS:0000000040355080
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000004018b060 CR3: 0000000000101000 CR4: 00000000000006e0

Call Trace: [<ffffffff80132ce1>]{get_signal_to_deliver+1089}
       [<ffffffff80110061>]{do_signal+97}
       [<ffffffff80132e45>]{sys_rt_sigprocmask+213}
       [<ffffffff801aa49c>]{sys32_rt_sigprocmask+156}
       [<ffffffff801104ef>]{intret_signal+45}

Process ccrc_g (pid: 31484, stackpage=10074007000)
Stack: 0000010074007e88 0000000000000000 ffffffff80132ce1 0000010074007f58
       0000010074006a48 0000010074007eb8 0000000000000000 0000000000000000
       ffffffff80110061 0000000000000006 fffffffffffffffa 0000018500007afc
       00000100bcee1d78 0000000000000001 ffffffff80132e45 0000000000000000
       0000000000000000 00000000ffff9930 0000000000000000 0000000000000008
       ffffffff801aa49c 0000000000000020 0000000000000020 0000000000000001
       00000000ffff991c 0000000000007afc 00000000ffff991c 0000000000000000
       ffffffff801104ef 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 00000000ffff991c 0000000000007afc 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace: [<ffffffff80132ce1>]{get_signal_to_deliver+1089}
       [<ffffffff80110061>]{do_signal+97}
       [<ffffffff80132e45>]{sys_rt_sigprocmask+213}
       [<ffffffff801aa49c>]{sys32_rt_sigprocmask+156}
       [<ffffffff801104ef>]{intret_signal+45}

Code: 0f 0b bd 7b 2d 80 ff ff ff ff 7b 06 65 48 8b 04 25 18 00 00
Kernel panic: Fatal exception

Does anyone have any clue?

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-27.0.1.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. ????


Additional info:

Comment 1 Jim Paradis 2005-01-19 05:57:56 UTC
FYI, the kernel is BUG()'ing right after do_coredump:

kernel/signal.c:

                if (sig_kernel_coredump(signr) &&
                    do_coredump((long)signr, signr, regs)) {
                        ...
                        const int code = signr | 0x80;
                        BUG_ON(!current->signal->group_exit);   /* <<< DIES HERE */
                        BUG_ON(current->signal->group_exit_code != code);
                        do_exit(code);
                        /* NOTREACHED */
                }

Since do_coredump() sets current->signal->group_exit, something else must be up.



Comment 2 Roland McGrath 2005-01-19 07:35:30 UTC
Created attachment 109958 [details]
RHEL-3 backport of 2.6 patch

I would expect to hit the second BUG_ON there, not the first.
There is a known race condition that can lead to this, and this was fixed in
2.6.

Comment 4 Ernie Petrides 2005-01-29 06:07:36 UTC
A fix for this problem was committed to the RHEL3 U5
patch pool this evening (in kernel version 2.4.21-27.10.EL).


Comment 5 Tim Powers 2005-05-18 13:29:08 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html