Description of problem:
Sometimes our machines crash hard with the following message. Only a cold boot recovers them.

Mar 28 09:12:03 amun014 kernel: Unable to handle kernel NULL pointer dereference at 000000000000007c RIP:
Mar 28 09:12:03 amun014 kernel: <ffffffff8030b199>{__lock_text_start+1}
Mar 28 09:12:03 amun014 kernel: PML4 1e57c3067 PGD 1e57c4067 PMD 0
Mar 28 09:12:03 amun014 kernel: Oops: 0000 [1] SMP
Mar 28 09:12:03 amun014 kernel: CPU 0
Mar 28 09:12:03 amun014 kernel: Modules linked in: nfsd exportfs nfs lockd nfs_acl md5 ipv6 autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac ohci_hcd hw_random tg3 floppy ext3 jbd mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Mar 28 09:12:03 amun014 kernel: Pid: 8037, comm: sbatchd Not tainted 2.6.9-42.0.8.ELsmp
Mar 28 09:12:03 amun014 kernel: RIP: 0010:[<ffffffff8030b199>] <ffffffff8030b199>{__lock_text_start+1}
Mar 28 09:12:03 amun014 kernel: RSP: 0018:00000101e5197e38 EFLAGS: 00010246
Mar 28 09:12:03 amun014 kernel: RAX: 0000000000020000 RBX: 00000101cbeef478 RCX: 0000002800000000
Mar 28 09:12:03 amun014 kernel: RDX: ffffffff803dc340 RSI: 00000000ffffe000 RDI: 0000000000000078
Mar 28 09:12:03 amun014 kernel: RBP: 00000000ffffe000 R08: 00000000ffffffff R09: 0000000000000000
Mar 28 09:12:03 amun014 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Mar 28 09:12:03 amun014 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mar 28 09:12:03 amun014 kernel: FS: 0000002a9557f980(0000) GS:ffffffff804e5880(0000) knlGS:00000000f7fd46c0
Mar 28 09:12:03 amun014 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 28 09:12:03 amun014 kernel: CR2: 000000000000007c CR3: 0000000000101000 CR4: 00000000000006a0
Mar 28 09:12:03 amun014 kernel: Process sbatchd (pid: 8037, threadinfo 00000101e5196000, task 00000103e0dc77f0)
Mar 28 12:24:07 amun014 syslogd 1.4.1: restart.
Version-Release number of selected component (if applicable):
Linux amun014 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 23 12:49:51 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
We do not know how to reproduce it at the moment. The application always seems to be the same, and sbatchd is the process that crashes the machine. It happens occasionally, not in every job. We run thousands of simulations per day, and this panic occurs in one or two simulations every 2 days.

Actual results:
Panic

Expected results:
No panic; the job runs as all other jobs do.

Additional info:
If you need more information, please let me know.
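As a rough aid for reading the register dump in the oops above, here is a minimal Python sketch that pulls the register values out of the syslog lines and compares the faulting address (CR2) with RDI. It assumes the 2.6-era x86_64 oops layout shown in this report; the function name `parse_oops_registers` and the `sample` excerpt are illustrative only, not part of any existing tool.

```python
import re

# Matches "NAME: <16 hex digits>" pairs such as "RDI: 0000000000000078".
# Assumption: register lines follow the x86_64 layout shown in the report.
# Segment-prefixed values (e.g. "RSP: 0018:...") are intentionally skipped.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}|CR[0-9]):\s*([0-9a-fA-F]{16})\b")

def parse_oops_registers(text):
    """Return {register_name: int} for every plain register value found."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}

# Excerpt copied from the oops in this report.
sample = """\
kernel: RSP: 0018:00000101e5197e38 EFLAGS: 00010246
kernel: RDX: ffffffff803dc340 RSI: 00000000ffffe000 RDI: 0000000000000078
kernel: CR2: 000000000000007c CR3: 0000000000101000 CR4: 00000000000006a0
"""

regs = parse_oops_registers(sample)
# Distance between the faulting address (CR2) and RDI.
fault_offset = regs["CR2"] - regs["RDI"]
print(hex(regs["CR2"]), hex(regs["RDI"]), fault_offset)
```

In this dump CR2 is 4 bytes past RDI, and RDI carries the first function argument on x86_64. That is consistent with, but does not prove, a lock routine being handed a pointer derived from a NULL structure pointer. A full call trace would still be needed to identify the caller.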
Is there any more of the traceback in the logs? Any further information or a reproducer we can work from? Thanks.
Ping. Is there any kernel panic stack traceback that we can use to analyze the issue?
Hi there, this bug seems to have been fixed by one of the recent kernel patches. Unfortunately I have changed workplaces, so I don't know exactly which release solved it, and I only saw your mail today. You may close this ticket. Florian
Closing ticket based on comment #3.