Description of problem:
While running the scrashme test, the system panics.

Version-Release number of selected component (if applicable):
2.6.18-68.el5debug

How reproducible:
Intermittent

Steps to Reproduce:
1. Install RHEL5.1, then install the 2.6.18-68.el5debug kernel.
2. Run the scrashme test from rhts several times.

Actual results:
------------[ cut here ]------------
kernel BUG at kernel/utrace.c:345!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlight i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport joydev e100 tg3 mii ide_cd floppy cdrom serio_raw sg pcspkr dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc ata_piix libata megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 4
EIP: 0060:[<c0456ca1>] Tainted: P VLI
EFLAGS: 00010202 (2.6.18-68.el5debug #1)
EIP is at check_dead_utrace+0x123/0x14d
eax: 00000020 ebx: f3b724d4 ecx: c042fe91 edx: debe7000
esi: ed74ece0 edi: 00000000 ebp: f3b724cc esp: debe7f44
ds: 007b es: 007b ss: 0068
Process scrashme (pid: 3259, ti=debe7000 task=ed74ece0 task.ti=debe7000)
Stack: 00000020 f3b724d4 00000000 f3b724d4 f3b724cc c0456d25 00000000 ed74ece0
       ed74ece0 f3b724cc ed74ece0 debe7fa4 c0456d59 00000000 00000010 c04293c6
       f7c0eca0 ed74ed94 00000000 00000004 00000000 f1f4f180 00000000 bffc8e78
Call Trace:
 [<c0456d25>] remove_detached+0x5a/0x6d
 [<c0456d59>] finish_report_death+0x21/0x24
 [<c04293c6>] do_exit+0x71b/0x79c
 [<c04294bd>] sys_exit_group+0x0/0xd
 [<c0404f7b>] syscall_call+0x7/0xb
 =======================
Code: 98 00 00 00 89 f0 e8 bb 90 fd ff 83 be 98 00 00 00 ff 75 1f b8 20 00 00 00 87 86 90 00 00 00 83 f8 10 c7 04 24 20 00 00 00 74 08 <0f> 0b 59 01 50 c8 63 c0 b8 00 1a 73 c0 e8 5e e8 1b 00 83 3c 24
EIP: [<c0456ca1>] check_dead_utrace+0x123/0x14d SS:ESP 0068:debe7f44
 <0>Kernel panic - not syncing: Fatal exception

Expected results:
This should pass.

Additional info:
This bug seems similar to Bugzilla Bug 351031 ("utrace: crash - utrace_get_signal"). However, since the traceback and line numbers are a little different, I decided to open a new bug.

I spoke with Roland about this; here was his reply:

"This is no different from what I said before. To reiterate: Bugs RHBZ#312961, RHBZ#245735, RHBZ#245429 relate to intermittent/racy issues in utrace, all brought out by the same sort of torture test. Those remain unfixed as of now, though all other 5.2 utrace issues are fixed by what went into cvs just before xmas. Since these are crashing problems, I expect their eventual fixes to go into 5.2 and 5.1.z as soon as they are available, regardless of planned deadlines."
This was seen again while running the 2.6.18-71.el5 kernel through testing.

Link to log:
http://rhts.lab.boston.redhat.com/testlogs/13959/48993/399225/1712222-test_log--kernel-syscalls-scrashme-multiple-EXTERNALWATCHDOG

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at kernel/utrace.c:345
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 0
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sg i2c_i801 i2c_core shpchp i5000_edac edac_mc serio_raw ide_cd e1000 cdrom pcspkr dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 28858, comm: scrashme Not tainted 2.6.18-71.el5 #1
RIP: 0010:[<ffffffff800ba6fc>] [<ffffffff800ba6fc>] check_dead_utrace+0x143/0x172
RSP: 0018:ffff810030e7feb8 EFLAGS: 00010202
RAX: 0000000000000020 RBX: ffff81015a81d100 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff8101aed814c8
RBP: 0000000000000000 R08: ffff8101ac521690 R09: ffff8101ad4e0cc0
R10: ffff81010626bc80 R11: 0000000000000065 R12: ffff8101128fc2c0
R13: 0000000000000020 R14: ffff81015a81d100 R15: 0000000000000028
FS: 0000000000000000(0000) GS:ffffffff8039c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaad28960 CR3: 0000000000201000 CR4: 00000000000006e0
Process scrashme (pid: 28858, threadinfo ffff810030e7e000, task ffff81015a81d100)
Stack: ffff8101bc211c40 ffff8101128fc2d0 ffff8101128fc2c0 0000000000000000
       0000000000000000 ffffffff800ba79e ffff8101bd324c40 0000000000000010
       ffff81015a81d100 ffff8101128fc2c0 ffff81015a81d238 0000000000000000
Call Trace:
 [<ffffffff800ba79e>] remove_detached+0x73/0x85
 [<ffffffff80015549>] do_exit+0x81c/0x8d0
 [<ffffffff80047100>] cpuset_exit+0x0/0x6c
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

Code: 0f 0b 68 52 09 29 80 c2 59 01 f0 ff 05 73 97 2f 00 49 83 fd
RIP [<ffffffff800ba6fc>] check_dead_utrace+0x143/0x172
 RSP <ffff810030e7feb8>
This was seen again while running the 2.6.18-71.el5 kernel through testing.

Link to log:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2016911

------------[ cut here ]------------
kernel BUG at kernel/utrace.c:345!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac parport_pc lp parport joydev i2c_piix4 ide_cd tg3 i2c_core cdrom serio_raw pcspkr sg dm_snapshot dm_zero dm_mirror dm_mod aic94xx libsas libata scsi_transport_sas aacraid sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 9
EIP: 0060:[<c04514fc>] Not tainted VLI
EFLAGS: 00010202 (2.6.18-83.el5 #1)
EIP is at check_dead_utrace+0x10f/0x133
eax: 00000020 ebx: de640aa0 ecx: 00000246 edx: 00000200
esi: 00000000 edi: d9d4d2c0 ebp: 00000020 esp: e03e1f48
ds: 007b es: 007b ss: 0068
Process scrashme (pid: 3050, ti=e03e1000 task=de640aa0 task.ti=e03e1000)
Stack: d9d4d2c8 00000000 d9d4d2c8 d9d4d2c0 c045157a 00000000 de640aa0 de640aa0
       d9d4d2c0 de640aa0 e03e1fa4 c04515ae 00000000 00000010 c0428b85 f7c0aaa0
       de640b54 00000000 00000009 00000000 00000000 f70c1ac0 00000000 e03e1000
Call Trace:
 [<c045157a>] remove_detached+0x5a/0x6d
 [<c04515ae>] finish_report_death+0x21/0x24
 [<c0428b85>] do_exit+0x6e4/0x75e
 [<c0428c75>] sys_exit_group+0x0/0xd
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: 39 8b 93 98 00 00 00 89 d8 e8 e2 dc fd ff 83 bb 98 00 00 00 ff 75 1c b8 20 00 00 00 87 83 90 00 00 00 83 f8 10 66 bd 20 00 74 08 <0f> 0b 59 01 d5 d6 62 c0 f0 ff 05 00 da 6d c0 83 fd 20 75 0b 89
EIP: [<c04514fc>] check_dead_utrace+0x10f/0x133 SS:ESP 0068:e03e1f48
 <0>Kernel panic - not syncing: Fatal exception
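
Since the addresses and offsets in these tracebacks differ between kernel builds and architectures, it can be easier to compare them by symbol name only. The following is a minimal sketch, not part of the original report, assuming the oops text has been saved as a string with one Call Trace entry per line; the helper name `trace_symbols` is hypothetical.

```python
import re

# Matches Call Trace entries such as "[<c0456d25>] remove_detached+0x5a/0x6d"
# and captures only the symbol name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(oops_text):
    """Return the symbol names listed after 'Call Trace:' in an oops dump."""
    _, sep, tail = oops_text.partition("Call Trace:")
    if not sep:
        return []
    # Stop at the "Code:" line so the EIP/RIP line that follows it
    # (the faulting function) is not counted as a stack frame.
    tail = tail.split("Code:", 1)[0]
    return FRAME_RE.findall(tail)


# Sample taken from the first traceback in this report.
sample = """Call Trace:
 [<c0456d25>] remove_detached+0x5a/0x6d
 [<c0456d59>] finish_report_death+0x21/0x24
 [<c04293c6>] do_exit+0x71b/0x79c
 [<c04294bd>] sys_exit_group+0x0/0xd
 [<c0404f7b>] syscall_call+0x7/0xb
Code: 98 00 00 00
EIP: [<c0456ca1>] check_dead_utrace+0x123/0x14d SS:ESP 0068:debe7f44"""

print(trace_symbols(sample))
```

Running the same helper over each oops gives lists of names that can be compared directly, without the build-specific addresses.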
Cut and paste error in Comment #2. It should have said:

"This was seen again while running the 2.6.18-83.el5 kernel through testing."

Jeff
in kernel-2.6.18-88.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
Confirmed that no systems are currently crashing when running the scrashme test and that the fix is in the -90 kernel.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html