Bug 428693

Summary: [RHEL5 U2] kernel BUG at kernel/utrace.c:345!
Product: Red Hat Enterprise Linux 5
Reporter: Jeff Burke <jburke>
Component: kernel
Assignee: Roland McGrath <roland>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: high
Version: 5.2
CC: dzickus, ebachalo, jan.kratochvil, mgahagan
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1604354
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 15:06:23 UTC

Description Jeff Burke 2008-01-14 16:38:41 UTC
Description of problem:
While running the scrashme test, the system panics.

Version-Release number of selected component (if applicable):
2.6.18-68.el5debug

How reproducible:
Intermittent

Steps to Reproduce:
1. Install RHEL 5.1, then install the 2.6.18-68.el5debug kernel.
2. Run the scrashme test from RHTS several times.
  
Actual results:
------------[ cut here ]------------
kernel BUG at kernel/utrace.c:345!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo
crypto_api dm_multipath video sbs backlight i2c_ec i2c_core button battery
asus_acpi ac parport_pc lp parport joydev e100 tg3 mii ide_cd floppy cdrom
serio_raw sg pcspkr dm_snapshot dm_zero dm_mirror dm_mod qla2xxx
scsi_transport_fc ata_piix libata megaraid_mbox sd_mod scsi_mod megaraid_mm ext3
jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    4
EIP:    0060:[<c0456ca1>]    Tainted: P      VLI
EFLAGS: 00010202   (2.6.18-68.el5debug #1) 
EIP is at check_dead_utrace+0x123/0x14d
eax: 00000020   ebx: f3b724d4   ecx: c042fe91   edx: debe7000
esi: ed74ece0   edi: 00000000   ebp: f3b724cc   esp: debe7f44
ds: 007b   es: 007b   ss: 0068
Process scrashme (pid: 3259, ti=debe7000 task=ed74ece0 task.ti=debe7000)
Stack: 00000020 f3b724d4 00000000 f3b724d4 f3b724cc c0456d25 00000000 ed74ece0 
       ed74ece0 f3b724cc ed74ece0 debe7fa4 c0456d59 00000000 00000010 c04293c6 
       f7c0eca0 ed74ed94 00000000 00000004 00000000 f1f4f180 00000000 bffc8e78 
Call Trace:
 [<c0456d25>] remove_detached+0x5a/0x6d
 [<c0456d59>] finish_report_death+0x21/0x24
 [<c04293c6>] do_exit+0x71b/0x79c
 [<c04294bd>] sys_exit_group+0x0/0xd
 [<c0404f7b>] syscall_call+0x7/0xb
 =======================
Code: 98 00 00 00 89 f0 e8 bb 90 fd ff 83 be 98 00 00 00 ff 75 1f b8 20 00 00 00
87 86 90 00 00 00 83 f8 10 c7 04 24 20 00 00 00 74 08 <0f> 0b 59 01 50 c8 63 c0
b8 00 1a 73 c0 e8 5e e8 1b 00 83 3c 24 
EIP: [<c0456ca1>] check_dead_utrace+0x123/0x14d SS:ESP 0068:debe7f44
<0>Kernel panic - not syncing: Fatal exception

Expected results:
The scrashme test should pass without a kernel panic.

Additional info:
This bug seems similar to:
 Bugzilla Bug 351031: utrace: crash - utrace_get_signal
However, since the traceback and line numbers are slightly different, I decided
to open a new bug.
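
When comparing tracebacks like these, the BUG() line number can be cross-checked against the "Code:" bytes of the oops. The sketch below is illustrative only: it assumes the 2.6.18-era x86 BUG() encoding with verbose BUG reporting enabled (i386: ud2, 16-bit line, 32-bit file pointer; x86_64: ud2, "pushq $file", "ret $line"), and the helper name bug_line_from_code is hypothetical. For the i386 trace above, the bytes after the marked <0f> are "0b 59 01 ...", and 0x0159 is 345, matching kernel/utrace.c:345.

```python
def bug_line_from_code(code: str) -> int:
    """Recover the BUG() source line embedded after ud2 in an oops 'Code:' dump.

    Assumes the 2.6.18-era x86 BUG() layouts described above; other kernels
    encode BUG() differently and will not decode correctly.
    """
    tokens = code.split()
    # i386 oopses mark the faulting byte as <xx>; if there is no marker,
    # assume the dump starts at the faulting instruction.
    start = next((i for i, t in enumerate(tokens) if t.startswith("<")), 0)
    b = [int(t.strip("<>"), 16) for t in tokens[start:]]
    if b[:2] != [0x0F, 0x0B]:
        raise ValueError("faulting instruction is not ud2")
    if len(b) >= 10 and b[2] == 0x68 and b[7] == 0xC2:
        # x86_64 layout: ud2; pushq $file (68 imm32); ret $line (c2 imm16)
        lo, hi = b[8], b[9]
    else:
        # i386 layout: ud2; .word line; .long file
        lo, hi = b[2], b[3]
    # The line number is stored little-endian.
    return lo | (hi << 8)


if __name__ == "__main__":
    # Tail of the i386 "Code:" line from the trace above.
    print(bug_line_from_code("83 f8 10 c7 04 24 20 00 00 00 74 08 <0f> 0b 59 01 50 c8 63 c0"))
```

The helper only reads the two line-number bytes; it does not resolve the file pointer, which would need the matching vmlinux.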

I spoke with Roland about this; here is his reply:

"This is no different from what I said before.  To reiterate: Bugs
RHBZ#312961, RHBZ#245735, RHBZ#245429 relate to intermittent/racy
issues in utrace, all brought out by the same sort of torture test.
Those remain unfixed as of now, though all other 5.2 utrace issues 
are fixed by what went into cvs just before xmas.  Since these are
crashing problems, I expect their eventual fixes to go into 5.2 and 
5.1.z as soon they are available, regardless of planned deadlines."

Comment 1 Jeff Burke 2008-01-22 13:46:39 UTC
This was seen again while running the 2.6.18-71.el5 kernel through testing.

Link to log:
http://rhts.lab.boston.redhat.com/testlogs/13959/48993/399225/1712222-test_log--kernel-syscalls-scrashme-multiple-EXTERNALWATCHDOG

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at kernel/utrace.c:345
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 0 
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo
crypto_api dm_multipath video sbs backlight i2c_ec button battery asus_acpi
acpi_memhotplug ac parport_pc lp parport joydev sg i2c_i801 i2c_core shpchp
i5000_edac edac_mc serio_raw ide_cd e1000 cdrom pcspkr dm_snapshot dm_zero
dm_mirror dm_mod ata_piix libata mptsas mptscsih mptbase scsi_transport_sas
sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 28858, comm: scrashme Not tainted 2.6.18-71.el5 #1
RIP: 0010:[<ffffffff800ba6fc>]  [<ffffffff800ba6fc>] check_dead_utrace+0x143/0x172
RSP: 0018:ffff810030e7feb8  EFLAGS: 00010202
RAX: 0000000000000020 RBX: ffff81015a81d100 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff8101aed814c8
RBP: 0000000000000000 R08: ffff8101ac521690 R09: ffff8101ad4e0cc0
R10: ffff81010626bc80 R11: 0000000000000065 R12: ffff8101128fc2c0
R13: 0000000000000020 R14: ffff81015a81d100 R15: 0000000000000028
FS:  0000000000000000(0000) GS:ffffffff8039c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaad28960 CR3: 0000000000201000 CR4: 00000000000006e0
Process scrashme (pid: 28858, threadinfo ffff810030e7e000, task ffff81015a81d100)
Stack:  ffff8101bc211c40 ffff8101128fc2d0 ffff8101128fc2c0 0000000000000000
 0000000000000000 ffffffff800ba79e ffff8101bd324c40 0000000000000010
 ffff81015a81d100 ffff8101128fc2c0 ffff81015a81d238 0000000000000000
Call Trace:
 [<ffffffff800ba79e>] remove_detached+0x73/0x85
 [<ffffffff80015549>] do_exit+0x81c/0x8d0
 [<ffffffff80047100>] cpuset_exit+0x0/0x6c
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 52 09 29 80 c2 59 01 f0 ff 05 73 97 2f 00 49 83 fd 
RIP  [<ffffffff800ba6fc>] check_dead_utrace+0x143/0x172
 RSP <ffff810030e7feb8>


Comment 2 Jeff Burke 2008-02-22 14:08:32 UTC
This was seen again while running the 2.6.18-71.el5 kernel through testing.

Link to log:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2016911

------------[ cut here ]------------
kernel BUG at kernel/utrace.c:345!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo
crypto_api dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac
parport_pc lp parport joydev i2c_piix4 ide_cd tg3 i2c_core cdrom serio_raw
pcspkr sg dm_snapshot dm_zero dm_mirror dm_mod aic94xx libsas libata
scsi_transport_sas aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    9
EIP:    0060:[<c04514fc>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-83.el5 #1) 
EIP is at check_dead_utrace+0x10f/0x133
eax: 00000020   ebx: de640aa0   ecx: 00000246   edx: 00000200
esi: 00000000   edi: d9d4d2c0   ebp: 00000020   esp: e03e1f48
ds: 007b   es: 007b   ss: 0068
Process scrashme (pid: 3050, ti=e03e1000 task=de640aa0 task.ti=e03e1000)
Stack: d9d4d2c8 00000000 d9d4d2c8 d9d4d2c0 c045157a 00000000 de640aa0 de640aa0 
       d9d4d2c0 de640aa0 e03e1fa4 c04515ae 00000000 00000010 c0428b85 f7c0aaa0 
       de640b54 00000000 00000009 00000000 00000000 f70c1ac0 00000000 e03e1000 
Call Trace:
 [<c045157a>] remove_detached+0x5a/0x6d
 [<c04515ae>] finish_report_death+0x21/0x24
 [<c0428b85>] do_exit+0x6e4/0x75e
 [<c0428c75>] sys_exit_group+0x0/0xd
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: 39 8b 93 98 00 00 00 89 d8 e8 e2 dc fd ff 83 bb 98 00 00 00 ff 75 1c b8 20
00 00 00 87 83 90 00 00 00 83 f8 10 66 bd 20 00 74 08 <0f> 0b 59 01 d5 d6 62 c0
f0 ff 05 00 da 6d c0 83 fd 20 75 0b 89 
EIP: [<c04514fc>] check_dead_utrace+0x10f/0x133 SS:ESP 0068:e03e1f48
 <0>Kernel panic - not syncing: Fatal exception
 

Comment 3 Jeff Burke 2008-02-22 14:10:18 UTC
Cut and paste error in Comment #2. It should have said:
"This was seen again while running the 2.6.18-83.el5 kernel through testing"

Jeff

Comment 7 Don Zickus 2008-04-02 16:09:09 UTC
The fix is included in kernel-2.6.18-88.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 9 Mike Gahagan 2008-04-28 21:22:47 UTC
Confirmed that no systems are currently crashing when running the scrashme test
and that the fix is in the -90 kernel.


Comment 11 errata-xmlrpc 2008-05-21 15:06:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html