Description of problem:
While postprocessing with the CAE application Medina, the machine stops working with a kernel Oops on Xeon EM64T. The only way to get the machine working again is a hard reset.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.EL

How reproducible:
Every time on Xeon EM64T systems, never on AMD Opteron systems.

Steps to Reproduce:
1. Start the Medina postprocessor.
2. Load a protocol file.

Actual results:

Expected results:

Additional info:
Created attachment 116024: sysreport after the reboot
The binary Nvidia driver for the graphics adapter is loaded. Here is the output from the kernel Oops:
Jun 27 17:33:25 ibm1 kernel: medpost74: Corrupted page table at address 2aad17e000
Jun 27 17:33:25 ibm1 kernel: PML4 203c71067 PGD 203c80067 PMD 1f2f78067 PTE 7ffffe000000002f
Jun 27 17:33:25 ibm1 kernel: Bad pagetable: 000f [1] SMP
Jun 27 17:33:25 ibm1 kernel: CPU 2
Jun 27 17:33:25 ibm1 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc ds yenta_socket pcmcia_core button battery ac nvidia(U) md5 ipv6 uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aic79xx sd_mod scsi_mod
Jun 27 17:33:25 ibm1 kernel: Pid: 4131, comm: medpost74 Tainted: P 2.6.9-5.ELsmp
Jun 27 17:33:25 ibm1 kernel: RIP: 0033:[<0000002a96c17f10>] [<0000002a96c17f10>]
Jun 27 17:33:25 ibm1 kernel: RSP: 002b:0000007fbfffbfa8 EFLAGS: 00010202
Jun 27 17:33:25 ibm1 kernel: RAX: 0000000000002480 RBX: 0000000004669a60 RCX: 0000002aad17e000
Jun 27 17:33:25 ibm1 kernel: RDX: 0000000000400000 RSI: 0000000000000000 RDI: 0000002aace10000
Jun 27 17:33:25 ibm1 kernel: RBP: 0000007fbfffc060 R08: 0000000000000000 R09: 0000000000000200
Jun 27 17:33:25 ibm1 kernel: R10: 0000000000000041 R11: 0000000000000003 R12: 0000000000000000
Jun 27 17:33:25 ibm1 kernel: R13: 0000000000400000 R14: 0000000000002010 R15: 0000000004669a70
Jun 27 17:33:25 ibm1 kernel: FS: 0000002a98009140(0000) GS:ffffffff804bf400(0000) knlGS:0000000000000000
Jun 27 17:33:25 ibm1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 17:33:25 ibm1 kernel: CR2: 0000002aad17e000 CR3: 0000000037e2e000 CR4: 00000000000006e0
Jun 27 17:33:25 ibm1 kernel: Process medpost74 (pid: 4131, threadinfo 0000010203438000, task 00000102038a2030)
Jun 27 17:33:25 ibm1 kernel:
Jun 27 17:33:25 ibm1 kernel: RIP [<0000002a96c17f10>] RSP <0000007fbfffbfa8>
Jun 27 17:33:25 ibm1 kernel: <1>Unable to handle kernel paging request at 000000fe0e63e3b0 RIP:
Jun 27 17:33:25 ibm1 kernel: <ffffffff80120224>{unmap_single+50}
Jun 27 17:33:25 ibm1 kernel: PML4 0
Jun 27 17:33:25 ibm1 kernel: Oops: 0000 [2] SMP
Jun 27 17:33:25 ibm1 kernel: CPU 2
Jun 27 17:33:25 ibm1 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc ds yenta_socket pcmcia_core button battery ac nvidia(U) md5 ipv6 uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aic79xx sd_mod scsi_mod
Jun 27 17:33:25 ibm1 kernel: Pid: 4131, comm: medpost74 Tainted: P 2.6.9-5.ELsmp
Jun 27 17:33:25 ibm1 kernel: RIP: 0010:[<ffffffff80120224>] <ffffffff80120224>{unmap_single+50}
Jun 27 17:33:25 ibm1 kernel: RSP: 0000:0000010203439ac8 EFLAGS: 00010293
Jun 27 17:33:25 ibm1 kernel: RAX: 000001000e6e5000 RBX: 001fffffbffeb276 RCX: 0000000000000000
Jun 27 17:33:25 ibm1 kernel: RDX: ffffffffbffeb276 RSI: ffffff0000000000 RDI: 000001023fe90f30
Jun 27 17:33:25 ibm1 kernel: RBP: 0000000000000002 R08: 0000000000001000 R09: 00000101f2ee4000
Jun 27 17:33:25 ibm1 kernel: R10: 0000010000000000 R11: 0000000000000246 R12: 0000000000000000
Jun 27 17:33:25 ibm1 kernel: R13: 000001023fe90f30 R14: 000001023fe90ec0 R15: 0000010000000000
Jun 27 17:33:25 ibm1 kernel: FS: 0000000000000000(0000) GS:ffffffff804bf400(0000) knlGS:0000000000000000
Jun 27 17:33:25 ibm1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 17:33:25 ibm1 kernel: CR2: 000000fe0e63e3b0 CR3: 0000000037e2e000 CR4: 00000000000006e0
Jun 27 17:33:25 ibm1 kernel: Process medpost74 (pid: 4131, threadinfo 0000010203438000, task 00000102038a2030)
Jun 27 17:33:26 ibm1 kernel: Stack: 0000000000000001 00000101f3371968 0000000000000001 ffffffff80120943
Jun 27 17:33:26 ibm1 kernel: 0000000000000000 000001023e710fc0 00000101f3371950 00000101f3371950
Jun 27 17:33:26 ibm1 kernel: 000000000000036e ffffffffa03d5432
Jun 27 17:33:26 ibm1 kernel: Call Trace:<ffffffff80120943>{swiotlb_unmap_sg+191} <ffffffffa03d5432>{:nvidia:nv_vm_free_pages+283}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa03d3410>{:nvidia:nv_free_pages+734} <ffffffffa01ca083>{:nvidia:_nv001716rm+89}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa01b2acc>{:nvidia:_nv001228rm+150} <ffffffffa01b1aea>{:nvidia:_nv001241rm+154}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa01b1846>{:nvidia:_nv001246rm+60} <ffffffffa02e5713>{:nvidia:_nv004331rm+33}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa02e4eaf>{:nvidia:_nv004179rm+121} <ffffffffa01cf9d2>{:nvidia:_nv001226rm+96}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa01d0d50>{:nvidia:rm_free_unused_clients+128}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa03d1144>{:nvidia:nv_kern_ctl_close+175} <ffffffffa03d1282>{:nvidia:nv_kern_close+252}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80172bcf>{__fput+99} <ffffffff80171810>{filp_close+103}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80136dec>{put_files_struct+101} <ffffffff801375b8>{do_exit+665}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80226ba4>{do_unblank_screen+97} <ffffffff80121e04>{do_page_fault+0}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80122393>{do_page_fault+1423} <ffffffff80165be8>{do_mmap_pgoff+1593}
Jun 27 17:33:26 ibm1 kernel: <ffffffff801dccf8>{__up_write+19} <ffffffff80110a6d>{error_exit+0}
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel: Code: 4c 8b 0c d0 0f 94 c2 85 c9 0f 94 c0 09 d0 a8 01 74 09 fc 4c
Jun 27 17:33:26 ibm1 kernel: RIP <ffffffff80120224>{unmap_single+50} RSP <0000010203439ac8>
Jun 27 17:33:26 ibm1 kernel: CR2: 000000fe0e63e3b0
Jun 27 17:33:26 ibm1 kernel: <1>Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP:
Jun 27 17:33:26 ibm1 kernel: <ffffffff8013331d>{mm_release+70}
Jun 27 17:33:26 ibm1 kernel: PML4 203c71067 PGD 0
Jun 27 17:33:26 ibm1 kernel: Oops: 0000 [3] SMP
Jun 27 17:33:26 ibm1 kernel: CPU 2
Jun 27 17:33:26 ibm1 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc ds yenta_socket pcmcia_core button battery ac nvidia(U) md5 ipv6 uhci_hcd ehci_hcd snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aic79xx sd_mod scsi_mod
Jun 27 17:33:26 ibm1 kernel: Pid: 4131, comm: medpost74 Tainted: P 2.6.9-5.ELsmp
Jun 27 17:33:26 ibm1 kernel: RIP: 0010:[<ffffffff8013331d>] <ffffffff8013331d>{mm_release+70}
Jun 27 17:33:26 ibm1 kernel: RSP: 0000:00000102034398c8 EFLAGS: 00010202
Jun 27 17:33:26 ibm1 kernel: RAX: 00000102038a2030 RBX: 00000102038a2030 RCX: 0000000000000004
Jun 27 17:33:26 ibm1 kernel: RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000002a980091d0
Jun 27 17:33:26 ibm1 kernel: RBP: 0000000000000000 R08: 000000000000000f R09: 0000000000000001
Jun 27 17:33:26 ibm1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jun 27 17:33:26 ibm1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jun 27 17:33:26 ibm1 kernel: FS: 0000000000000000(0000) GS:ffffffff804bf400(0000) knlGS:0000000000000000
Jun 27 17:33:26 ibm1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 17:33:26 ibm1 kernel: CR2: 0000000000000048 CR3: 0000000037e2e000 CR4: 00000000000006e0
Jun 27 17:33:26 ibm1 kernel: Process medpost74 (pid: 4131, threadinfo 0000010203438000, task 00000102038a2030)
Jun 27 17:33:26 ibm1 kernel: Stack: 0000000000000000 0000000000000000 000000fe0e63e3b0 00000102038a2030
Jun 27 17:33:26 ibm1 kernel: 0000000000000009 ffffffff80137467 ffffffff803c7508 0000000000000046
Jun 27 17:33:26 ibm1 kernel: ffffffff803bc66c ffffffffffffffef
Jun 27 17:33:26 ibm1 kernel: Call Trace:<ffffffff80137467>{do_exit+328} <ffffffff80111796>{oops_end+38}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80122286>{do_page_fault+1154} <ffffffff80157130>{free_pages_bulk+682}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80157130>{free_pages_bulk+682} <ffffffff80110a6d>{error_exit+0}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80120224>{unmap_single+50} <ffffffff80120943>{swiotlb_unmap_sg+191}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa03d5432>{:nvidia:nv_vm_free_pages+283} <ffffffffa03d3410>{:nvidia:nv_free_pages+734}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa01ca083>{:nvidia:_nv001716rm+89} <ffffffffa01b2acc>{:nvidia:_nv001228rm+150}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa01b1aea>{:nvidia:_nv001241rm+154} <ffffffffa01b1846>{:nvidia:_nv001246rm+60}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa02e5713>{:nvidia:_nv004331rm+33} <ffffffffa02e4eaf>{:nvidia:_nv004179rm+121}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa01cf9d2>{:nvidia:_nv001226rm+96} <ffffffffa01d0d50>{:nvidia:rm_free_unused_clients+128}
Jun 27 17:33:26 ibm1 kernel: <ffffffffa03d1144>{:nvidia:nv_kern_ctl_close+175} <ffffffffa03d1282>{:nvidia:nv_kern_close+252}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80172bcf>{__fput+99} <ffffffff80171810>{filp_close+103}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80136dec>{put_files_struct+101} <ffffffff801375b8>{do_exit+665}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80226ba4>{do_unblank_screen+97} <ffffffff80121e04>{do_page_fault+0}
Jun 27 17:33:26 ibm1 kernel: <ffffffff80122393>{do_page_fault+1423} <ffffffff80165be8>{do_mmap_pgoff+1593}
Jun 27 17:33:26 ibm1 kernel: <ffffffff801dccf8>{__up_write+19} <ffffffff80110a6d>{error_exit+0}
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel:
Jun 27 17:33:26 ibm1 kernel: Code: 41 8b 45 48 ff c8 7e 53 48 c7 83 08 02 00 00 00 00 00 00 65
Jun 27 17:33:26 ibm1 kernel: RIP <ffffffff8013331d>{mm_release+70} RSP <00000102034398c8>
Judging from the call trace, which runs through the nvidia module's page-freeing code, this looks to me like a bug in the Nvidia driver. We have had no other reports of page table corruption, which lends this theory more credibility. I also recommend updating to the Update 1 (U1) kernel, which fixes a large number of bugs.

*** This bug has been marked as a duplicate of 73733 ***